
OpenAI’s research finds that even its best models give wrong answers a wild proportion of the time

BS generator

OpenAI has launched a new benchmark called “SimpleQA,” designed to measure the factual accuracy of its own and competing AI models.

In doing so, the AI company revealed just how bad its latest models are at providing correct answers. In its own tests, its flagship o1-preview model, which was released last month, posted an abysmal 42.7 percent success rate on the new benchmark.

In other words, even the best of the recently announced crop of large language models (LLMs) is far more likely to give an outright incorrect answer than a correct one, a worrying indictment, especially as the technology begins to permeate many aspects of our everyday lives.

Wrong again

Competing models, such as Anthropic’s, scored even lower on OpenAI’s SimpleQA benchmark, with its recently released Claude-3.5-sonnet model answering only 28.9 percent of questions correctly. However, that model was far more inclined to acknowledge its own uncertainty and decline to answer, which, given the damning results, is probably for the best.

Worse yet, OpenAI found that its own AI models tend to vastly overestimate their own abilities, a characteristic that can lead them to be highly confident in the falsehoods they invent.

LLMs have long suffered from “hallucinations,” a fancy term AI companies have come up with for their models’ well-documented tendency to produce answers that are complete BS.

Despite the high odds of coming up with complete fabrications, the world has embraced the technology with open arms, from students generating their homework to developers employed by tech giants generating huge chunks of code.

And the cracks are starting to show. As an example, an AI model used by hospitals and built on OpenAI technology was caught this week introducing frequent hallucinations and inaccuracies while transcribing patient interactions.

Cops in the United States are also starting to embrace AI, a terrifying development that could lead to law enforcement falsely accusing innocent people or amplifying disturbing biases.

OpenAI’s latest findings are yet another worrying sign that current LLMs cannot reliably tell the truth.

It’s a development that should serve as a reminder to treat any LLM output with plenty of skepticism and a willingness to go over the generated text with a fine-toothed comb.

Whether it’s a problem that can be solved with ever-larger training sets, as AI leaders are rushing to reassure investors, remains an open question.

More about OpenAI: AI model used by hospitals caught making up patient details, inventing non-existent drugs and sex acts