OSWorld benchmark: Achieving 38.1% on computer-use tasks, compared to a human score of 72.4%.
WebArena benchmark: Scoring 58.1% on web navigation tasks, still below human level but ahead of previous models.
Limitations include occasional inaccuracies, incomplete website access, and unforeseen errors. OpenAI acknowledges these constraints and commits to ongoing iteration.
The Road Ahead: Marketplaces and Deployment
Looking forward, OpenAI plans to launch a Marketplace by the end of January 2025, featuring ready-to-deploy AI "desks"—custom agents designed to augment or replace specific employee roles. These agents will be customizable, with same-day setup from the First Movers team.