Part 4/8:
MiniMax has sidestepped the traditional challenges associated with large context windows. The model uses a mixture-of-experts (MoE) design comprising 32 specialized expert subnetworks. For any given token, only a small fraction of these experts is activated, keeping per-token compute low while still drawing on a total of 456 billion parameters.
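To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer. This is an illustrative toy, not MiniMax's actual implementation: the hidden size, the choice of k, and the expert MLP shape are all assumptions for demonstration; only the count of 32 experts comes from the description above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

d_model = 64          # toy hidden size (the real model is far larger)
num_experts = 32      # matches the 32 experts described above
top_k = 2             # assumption: activate 2 experts per token

# One tiny two-layer MLP "expert" per slot.
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,
     rng.standard_normal((4 * d_model, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                            # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    gates = softmax(np.take_along_axis(logits, top, axis=-1), axis=-1)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per-token loop for clarity
        for slot in range(top_k):
            w_in, w_out = experts[top[t, slot]]
            h = np.maximum(x[t] @ w_in, 0.0)       # expert MLP with ReLU
            out[t] += gates[t, slot] * (h @ w_out)
    return out

tokens = rng.standard_normal((5, d_model))         # 5 toy token embeddings
print(moe_layer(tokens).shape)                     # (5, 64)
```

The key point the sketch illustrates: every token touches only 2 of the 32 expert MLPs, so the compute per token is a small fraction of what running all 456 billion parameters densely would cost.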
The introduction of "lightning attention" is another game-changer. This linear-attention technique scales roughly linearly rather than quadratically with sequence length, allowing the model to generate very long responses at a fraction of the usual cost. Generating a 100,000-token response takes M1 only about a quarter of the floating-point operations its competitors require.
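To show where the linear scaling comes from, here is a generic causal linear-attention recurrence. This is a sketch of the general technique, not MiniMax's actual lightning attention kernel; the feature map and normalization are common choices in the linear-attention literature and are assumptions here.

```python
import numpy as np

def feature_map(x):
    # Positive feature map (an assumption; real kernels vary): elu(x) + 1
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Causal linear attention as a recurrence: O(n) in sequence length.

    Standard softmax attention materializes an (n x n) score matrix, so
    cost grows quadratically with n. Here we instead carry a running
    (d x d) state, so each new token costs a constant amount of work.
    """
    n, d = q.shape
    q, k = feature_map(q), feature_map(k)
    state = np.zeros((d, d))      # running sum of outer(k_i, v_i)
    norm = np.zeros(d)            # running sum of k_i for normalization
    out = np.zeros_like(v)
    for i in range(n):            # one constant-cost step per token
        state += np.outer(k[i], v[i])
        norm += k[i]
        out[i] = (q[i] @ state) / (q[i] @ norm + 1e-6)
    return out

rng = np.random.default_rng(0)
n, d = 1024, 32                   # toy sequence length and head dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(q, k, v).shape)   # (1024, 32), no n x n matrix needed
```

Because the per-token state has a fixed size, doubling the response length roughly doubles the work instead of quadrupling it, which is what makes 100,000-token generations so much cheaper.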