Even some of the best AI can’t beat this new benchmark
Every publicly available flagship AI system scored below 11% on Humanity's Last Exam, a benchmark that includes thousands of crowdsourced questions in multiple formats.