Part 4/11:

What makes these models particularly revolutionary is their mixture-of-experts (MoE) architecture with sparse activation. Instead of activating all parameters for every token, only a fraction is engaged (roughly 3.6 to 5.1 billion parameters per token), which sharply reduces the computational load. This sparse activation makes inference fast and energy-efficient, allowing real-time processing even on affordable hardware.
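
To make the sparse-activation idea concrete, here is a minimal toy sketch of top-k expert routing in Python/NumPy. The expert count, router design, and sizes are illustrative assumptions, not the actual configuration of the models discussed; the point is only that each token touches a small subset of the expert weights.

```python
# Toy sketch of sparse mixture-of-experts routing (illustrative sizes only).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2            # assumed toy sizes, not the real config

# Each expert is a tiny feed-forward weight matrix; the router is a linear scorer.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route one token through only its top-k experts (sparse activation)."""
    scores = x @ router_w                      # router logits, one per expert
    chosen = np.argsort(scores)[-top_k:]       # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts only
    # Only top_k of the n_experts weight matrices are used for this token,
    # which is where the compute and energy savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)                # (8,)
```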

Furthermore, these models support context windows of up to 128,000 tokens, far larger than those of many current models, allowing them to understand and reason over exceptionally long inputs. Users can also adjust the reasoning depth, switching between quick responses and complex, multi-step problem solving, all running locally (see the sketch below).
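
As a hedged sketch of what "adjusting reasoning depth locally" could look like in practice: many local servers expose an OpenAI-compatible endpoint, and some open-weight models accept a reasoning-depth hint in the system prompt. The base URL, model name, and the "Reasoning: high" convention below are assumptions for illustration, not details confirmed by the article.

```python
# Hypothetical example: querying a locally hosted model through an
# OpenAI-compatible endpoint and requesting deeper reasoning.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1",  # assumed local server address
                api_key="not-needed-locally")

response = client.chat.completions.create(
    model="local-moe-model",                           # placeholder model name
    messages=[
        # Some open-weight models read a reasoning-depth hint from the
        # system prompt; adjust or remove this line for your model.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Work through this long report step by step."},
    ],
)
print(response.choices[0].message.content)
```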

Transparent Reasoning and Safety Implications