RE: LeoThread 2025-11-09 20-32

in LeoFinance, 14 days ago

Part 6/16:

A common issue with reasoning models is their tendency to ramble or keep generating text without ever finishing, especially on lengthy tasks. Hermes 4 addresses this with a specialized fine-tuning stage aimed solely at teaching the model when to stop. The team generated extensive reasoning traces, inserted precise stopping points, and retrained the model to recognize the right moment to conclude its output. The results were impressive: runaway generations decreased by up to 80%, at the cost of accuracy drops of around 5-12% across benchmarks.
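To make the "inserted stopping points" idea concrete, here is a minimal sketch of how one such training example might be built. It is an illustration under stated assumptions, not the team's actual pipeline: the `</think>` marker, the token budget, the function name `insert_stop_point`, and the choice to apply the loss only to the stop marker are all hypothetical details the post does not specify.

```python
from typing import List, Tuple

# Hypothetical stop marker; the post does not say which token is used.
STOP_MARKER = "</think>"


def insert_stop_point(tokens: List[str], budget: int) -> Tuple[List[str], List[int]]:
    """Cut a reasoning trace at `budget` tokens and append a stop marker.

    Returns the new token list plus a 0/1 loss mask that is 1 only on the
    stop marker, so training would target the stopping decision alone
    (an assumption, chosen to mirror "aimed solely at learning when to stop").
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    kept = tokens[:budget]                # keep at most `budget` tokens of reasoning
    example = kept + [STOP_MARKER]        # the inserted stopping point
    mask = [0] * len(kept) + [1]          # learn only from the stop marker
    return example, mask


# A toy "rambling" trace, split on whitespace for simplicity.
trace = "first try x = 2 , check , retry , check again , retry".split()
example, mask = insert_stop_point(trace, budget=5)
print(example)
print(mask)
```

Masking everything except the stop marker is one way to teach stopping without also retraining the reasoning itself; whether Hermes 4 does exactly this is not stated in the post.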

Exemplary Benchmarks and Ethical Alignment

Hermes 4’s performance on various benchmarks underscores its capabilities:

  • MATH-500: 96.3%

  • AIME 24: 81.9%

  • AIME 25: 82.1%

  • GPQA Diamond: 70.5%

  • LiveCodeBench: 61.3%