RE: LeoThread 2025-09-22 18:20

Part 3/12:

Previously, the other top models trailed the best score by around 5-8%. After the tweak, Grok 4 achieved an 80% success rate, outpacing giants like Google and OpenAI, which had previously dominated the benchmark. Notably, this leap came without the usual reliance on massive datasets or extensive training, underscoring the power of scaling and clever fine-tuning over raw compute.

Anomaly in the Benchmark

Elon Musk himself was reportedly stunned by these results, particularly by the anomaly in which Grok 4's performance suddenly skyrocketed despite no major update coinciding with the leap. This prompted Musk to reconsider the potential of Grok 5, which could incorporate these breakthroughs and possibly accelerate progress toward AGI within the year.