DeepSeek FAQ
DeepSeek's R1 model, which is similar to OpenAI's o1, caused a meltdown in the tech industry over the weekend. Many of the revelations that contributed to the meltdown were already included in DeepSeek's V3 announcement over Christmas, and most of the breakthroughs in V3 were actually revealed with the release of the V2 model last January. DeepSeek-V2 introduced two important breakthroughs, DeepSeekMoE and DeepSeekMLA. Their implications only became apparent with V3, which added a new approach to load balancing and introduced multi-token prediction in training. Training V3 cost only $5.576 million, but that figure excludes the costs associated with prior research, experiments, and data.
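The $5.576 million figure is simple arithmetic, and a rough sketch makes clear what it does and does not cover. The GPU-hour breakdown and the $2 per GPU hour rental rate below are taken from the DeepSeek-V3 technical report as I understand it, not from this note. Treat them as assumptions. The names `GPU_HOURS`, `RENTAL_USD_PER_GPU_HOUR`, and `training_cost_usd` are made up for illustration.

```python
# Assumed inputs: H800 GPU hours per stage, as stated in the DeepSeek-V3
# technical report (not in this note).
GPU_HOURS = {
    "pre_training": 2_664_000,
    "context_extension": 119_000,
    "post_training": 5_000,
}

# Assumed rental price in USD per H800 GPU hour (the report's own assumption).
RENTAL_USD_PER_GPU_HOUR = 2


def training_cost_usd(gpu_hours: dict[str, int], rate: int) -> int:
    """Return total GPU hours multiplied by the hourly rental rate."""
    return sum(gpu_hours.values()) * rate


total_hours = sum(GPU_HOURS.values())
cost = training_cost_usd(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)

print(f"{total_hours:,} GPU hours -> ${cost:,}")
# prints: 2,788,000 GPU hours -> $5,576,000
```

The calculation only prices the final training run at a rental rate, which is why prior research, experiments, and data fall outside it.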
#technology #ai #deepseek