Part 3/10:
Next, we see the trajectory of performance on the Google-proof question-answering benchmark (GPQA). The graph, which starts in July 2023, shows a dramatic leap from 25% accuracy, the level of random guessing, to more than 70%, which is comparable to the performance of human experts. This rapid climb raises fair questions about when AIs might surpass human capabilities across a range of tasks, from coding to expert-level domain knowledge.