OpenAI Research: Why GPT-5 and Chatbots Still Hallucinate

OpenAI says confident errors in AI come from flawed evaluations, and offers a new way to score models.

Emmanuella Madu

OpenAI researchers are taking a closer look at one of the most persistent challenges in AI: hallucinations.

In a new research paper, summarized in an OpenAI blog post, the company defines hallucinations as “plausible but false statements generated by language models.” Despite advances in systems like GPT-5 and ChatGPT, the researchers say hallucinations “remain a fundamental challenge for all large language models” and are unlikely ever to be eliminated entirely.

To demonstrate, the team asked a widely used chatbot for the title of researcher Adam Tauman Kalai’s PhD dissertation. It produced three confident answers, all incorrect. The same thing happened when the chatbot was asked for his birthday.

Why do chatbots get these facts wrong with such confidence? According to the paper, the issue partly stems from pretraining: models learn to predict the next word in a sentence, without distinguishing between true and false statements. While consistent patterns like spelling improve with scale, arbitrary low-frequency facts, like a person’s birthday, remain prone to error.
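As a toy illustration of that distinction (the tiny corpus and code below are invented for this article, not taken from the paper), a simple next-word counter learns a consistent pattern like “was → born” perfectly, but after “on” it has seen every date exactly once and can only guess:

```python
# Toy illustration: a bigram "language model" that only counts which word
# follows which. Consistent patterns are learned; one-off facts are not.
from collections import Counter, defaultdict

corpus = (
    "alice was born on march 3 . "
    "bob was born on june 14 . "
    "carol was born on october 9 . "
    "dave was born on january 27 . "
).split()

# Count next-word frequencies for each word in the corpus.
next_counts = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    next_counts[word][nxt] += 1

# The consistent pattern "was -> born" is predicted every time...
print(next_counts["was"].most_common(1))   # [('born', 4)]

# ...but after "on", each date appears once: any prediction is a guess.
print(next_counts["on"])                   # every month has count 1
```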

Instead of focusing solely on training, the paper points to a deeper issue: how models are evaluated. Current accuracy-based benchmarks encourage guessing, much like multiple-choice exams where random answers can score points. In this setup, saying “I don’t know” always scores zero, while a guess at least has a chance of being right.
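The incentive is easy to see with a back-of-the-envelope calculation; the numbers below (100 unanswerable questions, four plausible options each) are illustrative assumptions, not figures from the paper:

```python
# Illustrative arithmetic only: under plain accuracy scoring,
# guessing beats abstaining in expectation.
n_unknown = 100          # questions the model genuinely cannot answer
p_lucky_guess = 0.25     # e.g. four plausible options per question

expected_score_guessing = n_unknown * p_lucky_guess * 1.0   # 25 points
expected_score_abstaining = n_unknown * 0.0                 # 0 points

print(expected_score_guessing, expected_score_abstaining)   # 25.0 0.0
```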

The researchers argue for a new approach: model evaluations should penalize confident errors more heavily while rewarding uncertainty. In other words, give partial credit when an AI admits it doesn’t know, and deduct more when it makes a bold but wrong claim.
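The paper does not prescribe a specific formula here, but a minimal sketch of such a grader might look like the following; the function name score_answer and the particular credit and penalty values are assumptions chosen for illustration, not OpenAI’s actual rubric:

```python
# A minimal sketch of an uncertainty-aware grader: a confident wrong
# answer costs more than admitting ignorance. (Values are illustrative.)

def score_answer(answer: str, correct: str,
                 abstain_credit: float = 0.25,
                 wrong_penalty: float = -1.0) -> float:
    """Score one answer, rewarding abstention over a confident error."""
    normalized = answer.strip().lower()
    if normalized in {"i don't know", "unsure"}:
        return abstain_credit        # partial credit for admitting uncertainty
    if normalized == correct.strip().lower():
        return 1.0                   # full credit for a correct answer
    return wrong_penalty             # confident error scores below abstaining

# Under plain accuracy, a wrong guess and "I don't know" both score zero,
# so guessing is never worse; under this rule, guessing wrong is strictly worse.
print(score_answer("March 3", "June 14"))       # -1.0
print(score_answer("I don't know", "June 14"))  # 0.25
```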

“It’s not enough to introduce a few new uncertainty-aware tests on the side,” the paper warns. “The widely used, accuracy-based evals need to be updated so that their scoring discourages guessing. If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess.”
