RE: LeoThread 2024-11-09 06:43

in LeoFinance · 11 months ago

OpenAI Research Finds That Even Its Best Models Give Wrong Answers a Wild Proportion of the Time

OpenAI's latest AI models are shockingly bad at being right.

OpenAI has released a new benchmark, dubbed "SimpleQA," designed to measure the factual accuracy of answers from its own and competing artificial intelligence models.

In doing so, the AI company has revealed just how bad its latest models are at providing correct answers. In its own tests, its cutting-edge o1-preview model, which was released last month, scored an abysmal 42.7 percent success rate on the new benchmark.
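For a sense of what that score means mechanically, here's a minimal Python sketch of how a SimpleQA-style success rate could be tallied. The three grade labels and the sample data are illustrative assumptions, not OpenAI's actual grading harness.

```python
from collections import Counter

def success_rate(grades: list[str]) -> float:
    """Fraction of ALL questions answered correctly; abstentions
    ("not_attempted") count against the model here."""
    return Counter(grades)["correct"] / len(grades) if grades else 0.0

# Hypothetical grades for a tiny five-question set: each answer is
# marked correct, incorrect, or not_attempted (illustrative labels).
sample = ["correct", "incorrect", "correct", "not_attempted", "incorrect"]
print(f"success rate: {success_rate(sample):.1%}")  # -> 40.0%
```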

#openai #simpleqa #ai #technology


In other words, even the cream of the crop of recently announced large language models (LLMs) is far more likely to provide an outright incorrect answer than a right one — a concerning indictment, especially as the tech is starting to pervade many aspects of our everyday lives.

Competing models, like Anthropic's, scored even lower on OpenAI's SimpleQA benchmark, with its recently released Claude 3.5 Sonnet model getting only 28.9 percent of questions right. However, the model was far more inclined to reveal its own uncertainty and decline to answer — which, given the damning results, is probably for the best.
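That trade-off between answering and abstaining is easy to see in code. The sketch below (labels and data invented for illustration) contrasts overall accuracy, where declining to answer counts against the model, with accuracy over only the questions it actually attempted.

```python
from collections import Counter

def attempted_accuracy(grades: list[str]) -> float:
    """Accuracy over only the questions the model chose to answer,
    so abstentions ("not_attempted") neither help nor hurt."""
    c = Counter(grades)
    attempted = c["correct"] + c["incorrect"]
    return c["correct"] / attempted if attempted else 0.0

# A cautious model: it abstains often, lowering its overall score,
# but it's right more often on the questions it does attempt.
cautious = ["correct", "not_attempted", "not_attempted", "correct", "incorrect"]
overall = Counter(cautious)["correct"] / len(cautious)
print(f"overall success rate: {overall:.1%}")                         # -> 40.0%
print(f"accuracy when attempting: {attempted_accuracy(cautious):.1%}")  # -> 66.7%
```

A model that guesses on everything can edge out a cautious one on the headline number while being wrong far more often when it does speak up.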

Worse yet, OpenAI found that its own AI models tend to vastly overestimate their own abilities, a characteristic that can lead to them being highly confident in the falsehoods they concoct.
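Overconfidence of this kind is usually quantified as a calibration gap: the difference between how confident a model says it is and how often it is actually right. Below is a simple expected-calibration-error-style sketch; the bucketing scheme and the data are invented for illustration, not taken from OpenAI's evaluation.

```python
def calibration_gap(results: list[tuple[float, bool]]) -> float:
    """Weighted mean |stated confidence - actual accuracy| across
    confidence buckets (a simple expected-calibration-error measure)."""
    buckets: dict[int, list[tuple[float, bool]]] = {}
    for conf, correct in results:
        buckets.setdefault(min(int(conf * 10), 9), []).append((conf, correct))
    gap = 0.0
    for items in buckets.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(ok for _, ok in items) / len(items)
        gap += abs(mean_conf - accuracy) * len(items) / len(results)
    return gap

# Invented data: a model that claims ~90% confidence but answers
# correctly only about half the time -- i.e., badly overconfident.
results = [(0.9, True), (0.9, False), (0.95, False), (0.85, True), (0.9, False)]
print(f"calibration gap: {calibration_gap(results):.2f}")  # large gap = overconfident
```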