RE: LeoThread 2025-11-05 23-35

in LeoFinance, 23 days ago

Part 8/15:

  • Inference, or deploying trained models to generate predictions or decisions, tends to be lumpier than training, with workloads that vary with user activity, environment complexity, and model optimization. It is also more cost-sensitive, since inference workloads are performed billions of times over.

Cutress notes that most revenue in AI comes from inference, which is more flexible but harder to optimize at the hardware level because of its lumpy, unpredictable workloads. Hardware designed for training, such as Tesla’s Dojo, must also adapt to inference demands, yet industry-wide, hardware architectures struggle to balance these needs efficiently.
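To see why per-inference cost matters so much, here is a back-of-envelope sketch of lifetime cost for a deployed model. All figures (training cost, cost per query, query volume) are hypothetical assumptions chosen only to illustrate the point that, at billions of queries, inference can dominate total spend:

```python
def lifetime_cost(training_cost, cost_per_inference, num_inferences):
    """Total cost = one-time training run + per-query inference cost."""
    return training_cost + cost_per_inference * num_inferences

# Hypothetical numbers for illustration only -- not from the article.
TRAINING_COST = 50_000_000        # one-time training run, USD (assumed)
COST_PER_INFERENCE = 0.002        # USD per query (assumed)
QUERIES = 100_000_000_000         # 100 billion queries over deployment (assumed)

total = lifetime_cost(TRAINING_COST, COST_PER_INFERENCE, QUERIES)
inference_share = (COST_PER_INFERENCE * QUERIES) / total

print(f"total lifetime cost: ${total:,.0f}")        # $250,000,000
print(f"inference share of cost: {inference_share:.0%}")  # 80%
```

Under these assumed numbers, inference accounts for 80% of lifetime cost, so even a small efficiency gain per query outweighs large savings on the one-time training run.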