RE: LeoThread 2025-11-09 20-32

Part 5/11:

Online Mode: The model can abort reasoning early if confidence in a step drops below a set threshold, preventing wasted computation on doomed solutions. This approach can save between 43% and 85% of tokens, significantly reducing resource use, while often improving accuracy.
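To make the early-abort idea concrete, here is a minimal Python sketch. It is not the actual implementation: the function name `generate_with_early_stop`, the per-step confidence scores, and the 0.5 threshold are all assumptions for illustration, and the toy trace uses made-up numbers rather than real model output.

```python
from typing import Iterable, List, Tuple


def generate_with_early_stop(
    steps: Iterable[Tuple[str, float]],
    threshold: float,
) -> Tuple[List[str], bool]:
    """Consume (text, confidence) steps and stop at the first low-confidence one.

    Hypothetical helper: returns the steps kept so far and a flag that says
    whether the trace was aborted before reaching its end.
    """
    kept: List[str] = []
    for text, confidence in steps:
        if confidence < threshold:
            # Abort: the remaining steps are never consumed, which is where
            # the token savings would come from.
            return kept, True
        kept.append(text)
    return kept, False


# Toy trace with invented confidence values (assumption, not real data).
trace = [("step 1", 0.92), ("step 2", 0.88), ("step 3", 0.41), ("step 4", 0.90)]
kept, aborted = generate_with_early_stop(trace, threshold=0.5)
print(kept, aborted)
```

In this toy run the trace is cut at the third step, so the fourth step is never processed. In a real system the steps would come from a streaming generator, so stopping the loop would also stop generation.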

Record-Setting Performance on the AIME 2025 Math Exam

The results of this approach are astonishing. Using the GPT OSS 120B model, which was specifically trained for mathematical reasoning through curriculum learning and exposure to specialized math data, Meta AI achieved the following:

Initial pass accuracy: 91.8%
Traditional majority voting (without DeepConf): 97%
DeepConf with confidence filtering: 99.9% accuracy, with token savings of 84.7%
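
The difference between plain majority voting and voting with confidence filtering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule used here (keep the top fraction of traces ranked by a single confidence score, then vote) is one plausible reading of "confidence filtering", and the names `majority_vote`, `filtered_vote`, and `keep_fraction`, as well as the toy answers and scores, are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among all traces."""
    return Counter(answers).most_common(1)[0][0]


def filtered_vote(traces: List[Tuple[str, float]], keep_fraction: float) -> str:
    """Vote only among the most confident traces.

    Assumed filtering rule: rank traces by confidence, keep the top
    `keep_fraction` of them (at least one), then take a majority vote.
    """
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return majority_vote([answer for answer, _ in ranked[:n_keep]])


# Toy (answer, confidence) pairs; the values are invented for illustration.
traces = [("42", 0.95), ("42", 0.90), ("17", 0.40), ("17", 0.35), ("17", 0.30)]
plain = majority_vote([answer for answer, _ in traces])
filtered = filtered_vote(traces, keep_fraction=0.5)
print(plain, filtered)
```

In this toy example three low-confidence traces outvote two high-confidence ones under plain voting, while filtering first lets the confident traces decide. Whether that helps in practice depends on how well confidence tracks correctness.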