Discussion about this post

praxis22

There is a certain sense of psyop theatre to the rush of coverage, though I doubt many people or journalists outside the industry understood the details. Mostly I think it sold well as a story because, on its face, it proved that you could do more with less. Regardless of whether you believe the numbers, it's obvious that DeepSeek has innovated in a space and place that few expected. The very nature of the Chinese tech giants seems to be low-paid drudge work and rigid structures, compared to the collegiate atmosphere spoken about at DeepSeek. That, and they gave away the details of how they did what they did in the paper they published. You don't need the code, as the real innovation was in the process, at least from my understanding of it. V3 came out to almost no fanfare in December; it was R1 that caused the ruckus. It's a decent model as it goes. I have a version distilled into Llama 3.1 70B at Q5_K_M running on a 3090; it's very verbose and a sea change compared to what came before.

https://thesequence.substack.com/p/the-sequence-opinion-489-crazy-how (paid)

Hand-coding PTX and NCCL on the GPU allowed DeepSeek to efficiently train its massive model on a cluster of 2,048 H800 GPUs in just two months.

Michael Spencer

DeepSeek in China is the aha moment some of us experienced with ChatGPT. Young people using it for therapy, for example, is not uncommon. It more or less disrupted other apps there, not just globally.

I'm not sure Western people can fully understand what DeepSeek means, tbh. DeepSeek is a rallying cry for Chinese AI, open-source, and many other things, including China's homegrown semi independence and national strategy.
