Part 3/8:
PaperBench comprises 20 recent ML research papers from the International Conference on Machine Learning (ICML), spanning 12 topics. Each paper comes with a rubric developed in collaboration with its authors, which holds the evaluation to a high standard of accuracy.
The challenge for AI agents is not only to understand the research but also to build the codebase from scratch, run the experiments, and troubleshoot the problems that arise along the way. Access to the papers alone is not enough. In PaperBench, an agent is expected to produce verified results without relying on any prior implementation.