Anthropic claims the new 3.5 Sonnet is simply a stronger, more robust model that can do better on coding tasks than even OpenAI’s flagship o1, per the SWE-bench Verified benchmark. Despite not being explicitly trained to do so, the upgraded 3.5 Sonnet self-corrects and retries tasks when it encounters obstacles, and can work toward objectives that require dozens or hundreds of steps.
You are viewing a single comment's thread from: