
RE: LeoThread 2025-10-18 23-22

in LeoFinance · 15 hours ago

Part 7/11:

  • The device includes dedicated hardware support for FP4 (4-bit floating point), enabling efficient model quantization that keeps models small and fast without sacrificing much quality. In practice, Larry can run models in FP4 mode with accuracy close to FP8, a remarkable feat for such a compact device.
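To make the FP4 idea concrete, here is a minimal sketch of block-scaled 4-bit float quantization. This is purely illustrative, not NVIDIA's actual implementation: it assumes the common FP4 E2M1 value grid (16 representable values, max magnitude 6.0) and a simple per-block absolute-max scale.

```python
# Illustrative sketch only: FP4 (E2M1) can represent just 16 values,
# so a per-block scale factor is used to map real weights onto that grid.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted(-v for v in FP4_GRID) + FP4_GRID  # signed grid

def quantize_fp4(block):
    """Scale a block of weights into FP4 range, round each to the nearest grid value."""
    scale = max(abs(x) for x in block) / 6.0 or 1.0  # 6.0 = max FP4 magnitude
    q = [min(FP4_VALUES, key=lambda v: abs(x / scale - v)) for x in block]
    return q, scale

def dequantize_fp4(q, scale):
    """Recover approximate weights from FP4 codes and the block scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.91, -0.07, 0.48]
q, s = quantize_fp4(weights)
restored = dequantize_fp4(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Even with only 4 bits per weight, the per-block scale keeps the reconstruction error small, which is the same principle that lets FP4 inference stay close to FP8 quality.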

Why This Matters

While Larry isn't the fastest at inference, especially on a single large model, it excels by offering:

  • Massive shared memory enabling multi-model and multi-task workloads

  • Portability — small enough to carry around, yet powerful enough to train or fine-tune models

  • Cost-efficiency — roughly $315/year in operating costs versus over $1,400 for Terry, thanks to its lower power draw and smaller footprint