Part 5/11:
The performance figures show the impact of this approach. Starting from Meta's Llama-3-70B-Instruct model, the Self-Taught Evaluator raised the model's accuracy on the RewardBench benchmark from 75.4% to 88.3% after several training iterations, a gain of 12.9 percentage points. This was achieved without any human-labeled preference data, which makes it a notable step for autonomous learning.
With a "majority vote" method, the score rises further to 88.7%, which surpasses many previous models that rely on labor-intensive human annotations. These results suggest that an AI evaluator can teach itself from its own generated data and still maintain a high standard of performance.
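The "majority vote" idea can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes the evaluator is queried several times on the same pair of responses, that each query returns a verdict of "A" or "B", and that the most frequent verdict is taken as the final answer. The names `majority_vote`, `judge_pair`, and `sample_verdict` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(verdicts: Sequence[str]) -> str:
    """Return the most frequent verdict; ties go to the verdict seen first."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    best = max(counts.values())
    # Walk the verdicts in their original order so that tie-breaking
    # is deterministic (the earliest tied verdict wins).
    for verdict in verdicts:
        if counts[verdict] == best:
            return verdict
    raise AssertionError("unreachable: verdicts is non-empty")


def judge_pair(sample_verdict: Callable[[], str], n_samples: int = 5) -> str:
    """Query a (hypothetical) evaluator n_samples times and aggregate the votes."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return majority_vote([sample_verdict() for _ in range(n_samples)])


if __name__ == "__main__":
    # Stand-in for a stochastic evaluator: replays a fixed list of verdicts.
    canned = iter(["A", "B", "A", "A", "B"])
    print(judge_pair(lambda: next(canned), n_samples=5))  # prints "A"
```

Aggregating several sampled judgments smooths out the occasional inconsistent verdict, which fits the small gain reported above (88.3% to 88.7%). An odd number of samples avoids ties when there are only two possible verdicts.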