Explaining DeepSeek and its implications with Chris Manning

Christopher Manning

Christopher Manning, AIX Ventures Investing Partner, shares his thoughts on DeepSeek-R1. The talk covers everything from DeepSeek's history to the geopolitical implications of its rise.


Key Takeaways

  1. DeepSeek is a rising AI company from China, with models approaching GPT-4 levels in reasoning and math, while being highly efficient.

  2. DeepSeek has followed a trajectory similar to OpenAI's, growing from relative obscurity to developing highly competitive LLMs. The company has since released three major iterations of its models:

    • DeepSeek v1: Based on Llama 2 architecture, with early optimizations for hardware efficiency.

    • DeepSeek v2: Introduced architectural innovations like multi-head latent attention and Mixture of Experts (MoE), improving efficiency and reducing computation costs.

    • DeepSeek v3 & R1: These models have state-of-the-art efficiency, leveraging FP8 training, MoE with a high number of experts, and low-rank decomposition for attention mechanisms. They significantly cut down inference costs while maintaining high performance, particularly in reasoning, math, and coding tasks.

  3. China's AI development is robust, with multiple players competing in LLM advancements.

  4. DeepSeek's key innovations:

    • Mixture of Experts (MoE) models with efficient multi-head latent attention

    • FP8 training and inference, significantly reducing computational costs

    • Low-rank decomposition to optimize data flow and efficiency

  5. AI development is moving fast—there are no permanent technological leads, and companies can catch up within months.

  6. Open-source AI is closing the gap with proprietary models, as DeepSeek commits to continued open publication of its advancements.

  7. Geopolitical context: US chip restrictions have not stopped China’s AI growth but accelerated domestic innovation in both AI models and chip production.

  8. AI compute demand is increasing—despite efficiency gains, companies like Nvidia will continue to see high demand.

    DeepSeek and AI in China

    Welcome, everyone. Today, I aim to provide a comprehensive overview of DeepSeek and, more broadly, the landscape of artificial intelligence in China. The discussion will unfold in three key segments. First, I will offer some context on DeepSeek's origins and its role in the evolving AI ecosystem in China. Second, I will delve into a more technical analysis of DeepSeek’s large language models, which may either be the highlight for some or a challenging deep dive for others. Finally, I will discuss the broader implications of these developments, spanning industry, investment, and geopolitical considerations. I will conclude by addressing any questions you may have.

    The Rise of DeepSeek

    To begin, it is worth considering the remarkable trajectory of DeepSeek, a relatively obscure company that has recently made headlines. DeepSeek, led by CEO Liang Wenfeng, emerged from a somewhat unconventional background: it grew out of a medium-frequency quantitative trading operation and ventured into language model development as a secondary initiative. This is not entirely without precedent; D. E. Shaw in the United States, for example, has long been known for its dual focus on finance and biotechnology. Similarly, DeepSeek draws on a deep pool of talent, including many International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI) winners, to push the boundaries of deep learning while maintaining its core financial operations.

    Despite the current media attention, DeepSeek has been an active player in the AI field for some time. Those closely following the progress of large language models would have been aware of its developments well before its recent breakthroughs. Just as OpenAI gained momentum with GPT-2 and GPT-3 before its widespread recognition with ChatGPT, DeepSeek has followed a similar trajectory.

    DeepSeek’s Evolution and Technical Foundations

    DeepSeek released its first large language model (LLM) in late 2023, following the explosion of global interest in generative AI. The model, which can retrospectively be considered version 1, largely mirrored the architecture of Meta's Llama 2 while incorporating key efficiency enhancements. Notably, the company prioritized computational efficiency, implementing innovations such as grouped-query attention and architectural choices favoring depth over width. These choices positioned DeepSeek as a formidable player, with its first model already demonstrating competitive performance relative to GPT-3.5 and Llama 2.
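
    To make the grouped-query idea concrete, the minimal PyTorch sketch below shares a small number of key/value heads across groups of query heads, shrinking the KV cache. It is a toy illustration under assumed shapes, not DeepSeek's actual code.

    ```python
    import torch
    import torch.nn.functional as F

    def grouped_query_attention(q, k, v):
        """Grouped-query attention: many query heads share fewer K/V heads.

        q: (batch, n_q_heads, seq, head_dim)
        k, v: (batch, n_kv_heads, seq, head_dim)
        """
        n_q_heads, head_dim = q.shape[1], q.shape[-1]
        group = n_q_heads // k.shape[1]        # query heads per K/V head
        # Repeat each K/V head so a whole group of query heads shares it.
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
        return F.softmax(scores, dim=-1) @ v

    # Hypothetical sizes: 8 query heads share 2 K/V heads -> 4x smaller KV cache.
    q = torch.randn(1, 8, 16, 64)
    k = torch.randn(1, 2, 16, 64)
    v = torch.randn(1, 2, 16, 64)
    print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
    ```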

    The company continued its steady cadence of advancements, releasing DeepSeek v2 roughly six months later. This iteration introduced several architectural innovations, most notably multi-head latent attention, which sharply reduced the memory cost of attention while preserving quality. Additionally, DeepSeek v2 marked the company's entry into mixture-of-experts (MoE) models, allowing for more computationally efficient scaling. The model, with 236 billion total parameters (only about 21 billion active per token), rivaled or surpassed Western counterparts such as Meta's Llama 3 and Mistral's models in terms of efficiency.
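
    The routing mechanics behind MoE can be sketched in a few lines. The toy layer below is illustrative only: all sizes are arbitrary assumptions, and production MoE layers add load-balancing losses, shared experts, and capacity limits. The point is that each token activates only its top-k experts, so only a small fraction of the layer's parameters does work per token.

    ```python
    import torch
    import torch.nn as nn

    class TinyMoE(nn.Module):
        """Toy mixture-of-experts layer: each token is routed to its top-k
        experts, so only a fraction of the parameters is active per token."""

        def __init__(self, d_model=64, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts))

        def forward(self, x):                          # x: (tokens, d_model)
            weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
            weights = weights / weights.sum(-1, keepdim=True)  # renormalize
            out = torch.zeros_like(x)
            for slot in range(self.k):                 # k expert slots per token
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e           # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
            return out

    moe = TinyMoE()
    print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
    ```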

    Most recently, DeepSeek v3, released in December 2024, and its subsequent reasoning model, DeepSeek-R1, have solidified the company's position in the global AI landscape. The R1 model, optimized for complex reasoning, demonstrates impressive capabilities, particularly in mathematical and coding tasks. Its prominence on the Chatbot Arena leaderboard underscores its competitive standing, with performance rivaling OpenAI's GPT-4 and Google's Gemini models.

    China’s AI Ecosystem Beyond DeepSeek

    While DeepSeek has captured recent attention, it is crucial to recognize that it is not the sole player in China’s burgeoning AI sector. Major technology giants such as Baidu, Alibaba, and Tencent have also developed large-scale models, with Alibaba’s Qianwen (Qwen) series being particularly notable. Meanwhile, emerging players, such as Step AI, have rapidly ascended global leaderboards, further highlighting China’s growing AI capabilities.

    The broader implication is clear: the gap between AI capabilities in China and the West is narrowing at an accelerated pace. The notion that Western firms maintain an insurmountable technological lead is increasingly untenable. Rather, AI development is now characterized by rapid iteration cycles, where even a six-month advantage can be significant but not definitive.

    Technical Innovations in DeepSeek-R1

    DeepSeek's latest models showcase several technical advancements that have driven their performance gains. Among these, the approach to attention mechanisms is particularly noteworthy. Traditional transformer architectures employ standard QKV attention; DeepSeek's multi-head latent attention instead applies a low-rank decomposition to the attention inputs, dramatically reducing computational costs while improving performance. This novel attention mechanism, coupled with a highly efficient MoE implementation and predominant reliance on FP8 precision for training, has enabled DeepSeek to achieve state-of-the-art results at significantly lower compute costs.
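
    For readers who want the low-rank idea in code, here is a simplified sketch of latent-compressed attention. It approximates the concept rather than DeepSeek's implementation: the real multi-head latent attention handles rotary position embeddings separately, which is omitted here, and all dimensions are arbitrary assumptions.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LowRankKVAttention(nn.Module):
        """Sketch of the low-rank idea behind multi-head latent attention:
        each token is compressed to a small latent vector, and keys/values
        are re-expanded from it. Only the latent needs to be cached, which
        shrinks the KV cache by roughly d_model / d_latent."""

        def __init__(self, d_model=512, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.kv_down = nn.Linear(d_model, d_latent)  # compress to latent
            self.k_up = nn.Linear(d_latent, d_model)     # expand latent to keys
            self.v_up = nn.Linear(d_latent, d_model)     # ...and to values
            self.out = nn.Linear(d_model, d_model)

        def split(self, t, b, s):  # (b, s, d_model) -> (b, heads, s, d_head)
            return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)

        def forward(self, x):                            # x: (batch, seq, d_model)
            b, s, _ = x.shape
            latent = self.kv_down(x)                     # all we would cache
            q = self.split(self.q_proj(x), b, s)
            k = self.split(self.k_up(latent), b, s)
            v = self.split(self.v_up(latent), b, s)
            att = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            return self.out((att @ v).transpose(1, 2).reshape(b, s, -1))

    mla = LowRankKVAttention()
    print(mla(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
    ```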

    The efficiency gains extend to inference as well. DeepSeek-R1 reportedly operates at roughly 10% of the inference cost of OpenAI's comparable models. This remarkable cost reduction stems from both architectural optimizations and rigorous engineering at the hardware level, including custom low-level GPU programming beneath Nvidia's standard CUDA libraries. Such efficiency is crucial not only for training but also for real-world deployment, where lifetime inference costs often exceed training costs by an order of magnitude.
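
    To make the precision point concrete, the toy snippet below simulates FP8 (E4M3) rounding using PyTorch's float8 dtype (available in PyTorch 2.1 and later). Production FP8 training relies on hardware tensor cores and finer-grained scaling factors; this sketch only shows the storage-format effect.

    ```python
    import torch

    def fake_fp8_e4m3(x: torch.Tensor) -> torch.Tensor:
        """Simulate FP8 (E4M3) storage: scale into the E4M3 range, round to
        8 bits, then dequantize. Real FP8 pipelines keep the 8-bit values
        and use per-tile scaling; this is only a rounding-error demo."""
        E4M3_MAX = 448.0                           # largest finite E4M3 value
        scale = x.abs().max() / E4M3_MAX
        x8 = (x / scale).to(torch.float8_e4m3fn)   # requires PyTorch >= 2.1
        return x8.to(torch.float32) * scale

    w = torch.randn(1024, 1024)
    err = (fake_fp8_e4m3(w) - w).abs().mean()
    print(f"mean absolute rounding error: {err.item():.5f}")
    # FP8 stores 1 byte per value vs. 2 for BF16: half the memory traffic.
    ```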

    Impact on AIX Ventures Investing Strategy

    At AIX Ventures, we have not been investing in capital-intensive foundation model companies. The capital required results in seed-stage valuations that are inconsistent with the risk level and the early-stage return profile we look for. We continue to be excited about, and to invest in, the application layer, including the intersection of AI & Bio.

    Geopolitical and Industry Implications

    DeepSeek’s rise carries significant implications for the AI industry, investment strategies, and global geopolitics. One pressing question is whether leading Western AI firms—such as OpenAI, Meta, and Microsoft—have overinvested in large-scale compute infrastructure. While it is true that DeepSeek has demonstrated a more compute-efficient approach, large-scale infrastructure remains indispensable for ongoing AI research, deployment, and model iteration.

    From a geopolitical standpoint, U.S. export controls on high-end GPUs have had mixed results. While restrictions have somewhat constrained China’s access to cutting-edge hardware, they have also catalyzed domestic innovation. Companies such as Huawei have been developing competitive AI chips, and DeepSeek has already optimized its models for inference on Huawei’s Ascend 910C chips. Although domestic Chinese chips currently lag in interconnect speed, they are rapidly improving, suggesting that hardware constraints will not be a long-term barrier.

    Additionally, there is speculation regarding the extent of DeepSeek’s compute resources. While official reports state that DeepSeek utilized 2,048 H800 GPUs for its latest training runs, it is likely that the company has access to a significantly larger compute pool. However, its ability to achieve state-of-the-art performance with a fraction of the hardware typically used by Western firms underscores a fundamental shift: efficiency in AI model training is now as critical as sheer computational scale.

    The Future of AI Development

    What does this mean for the broader AI landscape? Several key takeaways emerge:

    1. Open-source AI is closing the gap. The rapid iteration of open-source models, particularly in China, suggests that proprietary AI models may struggle to maintain a sustained lead. DeepSeek’s commitment to open-source development ensures that cutting-edge AI capabilities remain widely accessible.

    2. Geopolitical realities necessitate a recalibrated approach. AI development is no longer confined to a handful of U.S. firms. Policymakers and industry leaders must recognize that innovation is globally distributed, and strategies premised on indefinite Western dominance are increasingly untenable.

    3. Efficiency is the new frontier. While scaling compute remains essential, the ability to maximize performance with constrained resources will define the next phase of AI competition. DeepSeek has demonstrated that with rigorous engineering and novel architectural choices, it is possible to achieve state-of-the-art results with substantially reduced costs.

    Conclusion

    In sum, DeepSeek’s rise exemplifies the rapid evolution of AI and the diminishing technological gap between China and the West. While the competitive landscape remains dynamic, one lesson is clear: there is no longer a singular monopoly on AI innovation. Instead, the field is characterized by relentless iteration, broad accessibility, and a redefined global equilibrium.

    With that, I welcome any questions you may have.