RE: LeoThread 2025-02-03 09:39

Part 3/7:

DeepSeek’s advances reflect a broader trend in the tech industry, where efficiency often wins out over sheer size and processing power. U.S. and allied restrictions on advanced chips have forced Chinese tech companies to innovate within tighter hardware constraints.

DeepSeek's methodology centers on a massive model, V3, with 671 billion total parameters, built around a technique known as a “mixture of experts.” This technique resembles a panel of specialists, each covering a different domain: a router sends each piece of input to only the few experts best suited to it, which lets the model allocate processing power more effectively. As a result, only a fraction of the parameters, roughly 37 billion, is active for any given token, which improves throughput while conserving resources.
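
To make the routing idea concrete, here is a minimal toy sketch of top-k expert selection in plain Python. It is not the model's actual implementation: the names `moe_forward`, `make_expert`, and `gate_weights`, the sizes (8 experts, 2 active), and the "experts" themselves (simple scaling functions) are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: just scales its input vector."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # The router scores every expert (dot product of x with that expert's gate row).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Only the top_k experts are run; the rest stay idle for this input.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    random.seed(0)
    num_experts, dim = 8, 4  # assumed toy sizes
    experts = [make_expert(i + 1) for i in range(num_experts)]
    gate_weights = [[random.uniform(-1, 1) for _ in range(dim)]
                    for _ in range(num_experts)]
    x = [0.5, -1.0, 2.0, 0.1]
    output, chosen = moe_forward(x, gate_weights, experts, top_k=2)
    print("experts used:", chosen)
    print("output:", output)
```

In this sketch only 2 of the 8 experts do any work for a given input, which is the source of the savings: the total parameter count can be very large while the compute spent per token stays small.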