Part 3/9:
To put these models to the test, Apple devised custom puzzles. The models could not look up answers or use any computational tools, much like a human taking an exam without aids. The puzzles were chosen to be challenging yet solvable by unconventional means, avoiding memorized patterns so that the models could not simply retrieve answers from their training.