
Part 4/10:

DeepSeek employs a modified attention mechanism known as Multi-head Latent Attention (MLA), which substantially reduces the memory demanded by traditional transformer attention by compressing the key-value (KV) cache into a compact latent representation. Although attention still scales quadratically with sequence length, the efficiency gains are striking, with estimated savings of 80-90% in memory pressure during inference.
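
As a rough illustration (not DeepSeek's actual code), the sketch below estimates how much per-token KV-cache memory a low-rank latent cache saves compared with caching full per-head keys and values. All model dimensions here are hypothetical placeholders, and the KV cache is only one component of total memory, so the exact percentage will vary with the real configuration.

```python
# Back-of-the-envelope comparison of per-token KV-cache size:
# standard multi-head attention vs. an MLA-style latent cache.
# All dimensions below are hypothetical, chosen only for illustration.

BYTES_PER_VALUE = 2   # fp16/bf16 activations
NUM_LAYERS = 32       # hypothetical depth
NUM_HEADS = 32        # hypothetical number of attention heads
HEAD_DIM = 128        # hypothetical per-head dimension
LATENT_DIM = 1024     # hypothetical width of the compressed KV latent

def mha_kv_bytes_per_token() -> int:
    # Standard attention caches a full key and a full value vector
    # for every head in every layer.
    return NUM_LAYERS * NUM_HEADS * HEAD_DIM * 2 * BYTES_PER_VALUE

def mla_kv_bytes_per_token() -> int:
    # An MLA-style cache stores only one small latent vector per layer;
    # keys and values are re-projected from it at attention time.
    return NUM_LAYERS * LATENT_DIM * BYTES_PER_VALUE

if __name__ == "__main__":
    mha = mha_kv_bytes_per_token()
    mla = mla_kv_bytes_per_token()
    print(f"Standard MHA cache per token: {mha / 1024:.1f} KiB")
    print(f"Latent (MLA-style) cache per token: {mla / 1024:.1f} KiB")
    print(f"Cache reduction: {100 * (1 - mla / mha):.1f}%")
```

With these placeholder numbers the cache shrinks by roughly 87%, which is in the same ballpark as the 80-90% savings cited above, though the true figure depends on the model's actual head count, head dimension, and latent width.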

Understanding the Cost Disparity: DeepSeek vs. OpenAI