Part 2/12:
A pivotal moment was the discovery that Grok 4, xAI's model, could beat previous benchmark results without a major model upgrade once independent researchers built on top of it. Jeremy Berman and Eric Pang, two AI researchers unaffiliated with xAI, adapted Grok 4 using novel techniques, particularly open-source program synthesis and test-time adaptation. Their modifications dramatically improved Grok 4's performance, especially on the ARC-AGI benchmark, a test designed to measure an AI's ability to solve unfamiliar, complex puzzles that call for human-like reasoning.