RE: LeoThread 2024-10-22 09:10

in LeoFinance, 11 months ago

Ofir Press, a postdoctoral researcher at Princeton University who helped develop SWE-bench, says that agentic AI tends to lack the ability to plan far ahead and often struggles to recover from errors. “In order to show them to be useful we must obtain strong performance on tough and realistic benchmarks,” he says, such as reliably planning a wide range of trips for a user and booking all the necessary tickets.