Your Board and the Pace of Change - plus China AI
A quick update to my little article on China, triggered by a trip to Singapore where I found myself surrounded by Chinese EVs... including the DeepSeek news and why it matters to your business.
The last sentence is about the future of your business sector... yes, all business sectors.
The article is below; first, here's the DeepSeek update, which (ironically) I got Claude to help me write. Better in several ways and 30x cheaper... yep, 30x cheaper.
30x is me going for a stroll versus a bullet train. 30x is ten years of Moore's Law.
30x is the low-end estimate...see below.
30x is going from the longbow to the M134 Minigun (yes, I adjusted for operating cost).
Your board should be discussing how you succeed in a world that changes this fast. Odds are you're not prepared, and probably not even preparing.
Top 5 DeepSeek V3.1 Advances
1. Hybrid Reasoning Architecture
The Innovation: DeepSeek V3.1 folds reasoning and non-reasoning capability into a single model: "hybrid inference: Think & Non-Think — one model, two modes". A pair of chat templates toggles the one model between the two paradigms.
Why It Matters: Previous generations required separate models for different capabilities, but V3.1 seamlessly switches between normal chat and deep reasoning modes without losing context.
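The "one model, two templates" idea is simple enough to sketch. Note that the control tokens and message markers below are illustrative assumptions for the sake of the sketch, not DeepSeek's actual chat template:

```python
# Illustrative sketch of "one model, two modes": the same weights are steered
# into thinking or non-thinking behaviour purely by the prompt template.
# The <think>/</think> and <|user|>/<|assistant|> tokens are assumptions
# for illustration, not DeepSeek's exact format.

def build_prompt(user_message: str, thinking: bool) -> str:
    """Build a prompt whose suffix selects the mode.

    thinking=True leaves the think block open, so the model emits its
    chain of thought before answering; thinking=False closes the block
    immediately, so the model answers directly.
    """
    prefix = f"<|user|>{user_message}<|assistant|>"
    return prefix + "<think>" if thinking else prefix + "<think></think>"

# Same model, two modes, chosen per request:
fast = build_prompt("What's our Q3 revenue headline?", thinking=False)
deep = build_prompt("Stress-test our pricing strategy.", thinking=True)
```

The practical upshot for a business: one deployment, one context, and you only pay the reasoning-token overhead on the queries that need it.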
2. Dramatic Cost Advantage
The Numbers: DeepSeek chat costs about $0.27 per million input tokens and $1.10 per million output tokens, while GPT-4o costs $2.50 per million input tokens and $10 per million output tokens - roughly 9x cheaper per token at these list prices; the widely quoted ~30x multiple (29.8x) traces back to DeepSeek's launch pricing of $0.14 in / $0.28 out per million tokens.
Real-World Impact: In one benchmark run, DeepSeek V3.1 came in 68x cheaper than Claude Opus, completing the full test suite for only about $1, and businesses using AI at scale can save 98%+ on processing costs with DeepSeek versus ChatGPT.
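It's worth sanity-checking the multiples from the per-token prices themselves. The 1M-in/1M-out workload mix below is an assumption for illustration, and the launch-pricing figures ($0.14/$0.28) are my reading of where the ~30x number comes from rather than something stated above:

```python
# Back-of-envelope cost comparison from the quoted list prices
# (USD per 1 million tokens).

def blended_cost(input_price: float, output_price: float,
                 m_in: float = 1.0, m_out: float = 1.0) -> float:
    """Cost in USD for m_in million input and m_out million output tokens."""
    return input_price * m_in + output_price * m_out

gpt4o = blended_cost(2.50, 10.00)    # $12.50 for 1M in + 1M out
deepseek = blended_cost(0.27, 1.10)  # $1.37 for the same workload
print(round(gpt4o / deepseek, 1))    # ~9.1x at current list prices

# DeepSeek's launch pricing is where the ~30x figure comes from:
launch = blended_cost(0.14, 0.28)    # $0.42 for the same workload
print(round(gpt4o / launch, 1))      # ~29.8x
```

Either way, the strategic point stands: the per-token floor keeps dropping, and any business plan priced against today's API rates is already stale.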
3. Superior Coding Performance
Benchmark Results: DeepSeek V3.1 achieves a 71.6% pass rate on the Aider programming benchmark, surpassing Claude Opus. On coding benchmarks such as SWE-bench Verified, it scored 66.0%, well ahead of DeepSeek's own R1-0528 at 44.6%.
4. Massive Scale with Efficiency
Architecture: DeepSeek V3.1 is a massive 685-billion-parameter model, an increase from its 671B predecessor, with a Mixture-of-Experts (MoE) architecture that activates only 37 billion parameters per token.
Training Efficiency: DeepSeek-V3 required only 2.788M H800 GPU-hours for its full training run, versus 30.8M GPU-hours for Llama 3.1 (about 11x less compute), and was reportedly trained for roughly 1/18th the cost of GPT-4o.
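The two headline numbers above reduce to simple arithmetic, which makes them easy to check:

```python
# Back-of-envelope numbers from the scale and efficiency claims above.

total_params = 685e9   # total parameters in DeepSeek V3.1
active_params = 37e9   # parameters the MoE router activates per token
print(f"{active_params / total_params:.1%} of weights active per token")
# -> 5.4%: each token pays for a 37B model's compute, not a 685B one's

deepseek_hours = 2.788e6  # H800 GPU-hours, full training run
llama31_hours = 30.8e6    # Llama 3.1 GPU-hours for comparison
print(f"{llama31_hours / deepseek_hours:.1f}x fewer GPU-hours than Llama 3.1")
# -> ~11.0x
```

That ~5% activation rate is the whole trick of Mixture-of-Experts: frontier-scale capacity with mid-size inference cost.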
5. Extended Context and Open Source Accessibility
Context Window: V3.1 has a longer context window of 128,000 tokens, meaning it can take in far more information for any given query and sustain longer conversations with better recall.
Accessibility: The license is now MIT (that's new - the previous DeepSeek V3 had a custom license), making it freely available for commercial use, and the model can run at over 20 tokens/second on a 512GB M3 Ultra Mac Studio ($9,499 of consumer-grade hardware).
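Why a 685B-parameter model fits on a 512GB desktop at all comes down to quantization. The bytes-per-weight figures below are standard rough approximations, not DeepSeek-specific, and the estimate ignores memory for the KV cache and activations:

```python
# Rough memory footprint of 685B parameters at common quantization levels.
# Ignores KV cache and activation memory, so real headroom is tighter.

params = 685e9
for bits, label in [(16, "fp16/bf16"), (8, "8-bit"), (4, "4-bit")]:
    gb = params * bits / 8 / 1e9  # bytes -> GB
    verdict = "fits" if gb < 512 else "does not fit"
    print(f"{label}: ~{gb:.0f} GB -> {verdict} in 512 GB")
# Only the 4-bit case (~342 GB) fits, which is why consumer-grade
# local deployment of a frontier-scale model is suddenly plausible.
```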
DeepSeek's approach fundamentally challenges assumptions about how frontier AI systems should be developed and distributed by making advanced capabilities freely available while potentially undermining competitors' ability to maintain high margins.