When ChatGPT exploded in 2022, it felt like magic. But behind the curtain, a quiet squad of OpenAI researchers, led by people like Hunter Lightman, was laser-focused on something far less glamorous: high school math competitions.
That team, now known as MathGen, wasn’t chasing viral fame. They were training OpenAI’s models to reason, to work through problems the way humans do rather than just predicting the next likely word. It’s paying off: today, OpenAI has built some of the world’s most advanced AI reasoning models, and one of them even snagged a gold medal at the International Mathematical Olympiad (IMO), a flex that’s both nerdy and revolutionary.
These reasoning models power what OpenAI really wants to build: AI agents that don’t just chat but act, whether that’s booking tickets, writing code, planning your day, or solving a tough problem on the fly.
“Eventually, you’ll just ask the computer for what you need, and it’ll do it,” said Sam Altman. That dream? It’s not far off.
The Agent Era Isn’t an Accident
While ChatGPT was a viral surprise, OpenAI’s agent tech has been a years-long grind. The 2024 debut of the o1 reasoning model was a major leap, and Silicon Valley noticed. Meta poached five of the 21 researchers behind it, dangling offers as high as $100 million. One, Shengjia Zhao, now leads Meta’s Superintelligence Labs. No pressure.
What’s Powering GPT-5?
It’s not just raw compute. The secret sauce is reinforcement learning (RL), a training method in which models are rewarded for being right (and penalized for being wrong). Combined with large language models and a trick called “test-time computation,” OpenAI taught its models not just to answer, but to pause, think, and check their work. They even named one breakthrough system “Strawberry,” because why not.
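To make that concrete, here’s a minimal toy sketch of both ideas, using a simple arithmetic stand-in rather than anything OpenAI has published. Every name below (model_attempt, reward, answer_with_test_time_compute) is made up for illustration.

```python
import random

# Toy sketch only: an arithmetic stand-in for a real LLM.
# Nothing here is an OpenAI API.

def model_attempt(a: int, b: int) -> int:
    """Stand-in 'model': answers a + b, right most of the time."""
    return a + b + random.choice([0, 0, 0, 1, -1])

def reward(a: int, b: int, answer: int) -> int:
    """RL-style verifiable reward: 1 for a correct answer, 0 otherwise."""
    return int(answer == a + b)

def answer_with_test_time_compute(a: int, b: int, budget: int = 8) -> int:
    """'Pause, think, check': spend extra compute on several attempts
    and keep the first one that survives a self-check."""
    attempts = [model_attempt(a, b) for _ in range(budget)]
    for candidate in attempts:
        if candidate - b == a:  # the model re-derives to check its work
            return candidate
    return attempts[-1]  # nothing verified; fall back to the last try

print(reward(17, 25, answer_with_test_time_compute(17, 25)))  # usually 1
```

Swap the arithmetic for competition math problems with known answers and you get the flavor of the setup: the reward is cheap and unambiguous to compute, which is exactly why math contests made such a good proving ground.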
“It felt like reading the thoughts of a person,” said OpenAI researcher Ahmed El-Kishky, describing how the model learned to backtrack when it made mistakes. Creepy? A little. Impressive? Absolutely.
Whether these models are really reasoning like humans is still up for debate. But OpenAI’s argument is simple: if the model can solve hard problems better than anything before it, does it matter how it gets there? Some researchers agree, among them Nathan Lambert of AI2, who says it’s a bit like comparing airplanes to birds: they don’t flap their wings, but they still fly, and fast.
The Next Challenge: Subjective Tasks
Right now, agents do best with black-and-white problems like coding, where the answer can be checked. But shopping? Trip planning? Parking? Not so much. Lightman says the future lies in training models on less verifiable, more subjective tasks, the stuff humans do all day without realizing how complex it is.
“We have some leads on how to do these things,” he hints.
One such lead: a new OpenAI model that spawns multiple agents to tackle a problem in parallel, then picks the best answer. (Think of it like an internal debate team with one job: outthink each other.)
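OpenAI hasn’t said how that model works under the hood, but the general shape of the idea, run several attempts in parallel and keep the winner, is easy to sketch. Everything here is hypothetical: solve_once, score, and best_of_n are stand-ins, not real APIs.

```python
from concurrent.futures import ThreadPoolExecutor
import random

# Hypothetical sketch of 'spawn agents in parallel, pick the best answer'.
# A real system would call a model and use a trained verifier or a vote.

def solve_once(question: str, seed: int) -> str:
    """One 'agent': an independent, randomized attempt at the question."""
    rng = random.Random(seed)
    return f"draft #{rng.randint(1, 100)} for: {question}"

def score(answer: str) -> float:
    """Stand-in judge that rates a draft; here it's just random."""
    return random.random()

def best_of_n(question: str, n: int = 4) -> str:
    """Run n attempts concurrently, then keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        drafts = list(pool.map(lambda seed: solve_once(question, seed), range(n)))
    return max(drafts, key=score)

print(best_of_n("Plan a three-day trip to Lisbon"))
```

The hard part, of course, is the judge. For math you can simply check the answer; for a trip itinerary, deciding which draft is “best” is exactly the kind of subjective call Lightman is talking about.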
And it’s not just OpenAI. Google and Elon Musk’s xAI are chasing similar ideas. So expect the next generation of AI agents to be smarter, faster, and much better at not acting like idiots when you ask them to book a hotel.
What This Means for You
With GPT-5 on the horizon and agents that may soon understand your needs before you even ask, OpenAI is moving toward the holy grail: an AI assistant that just gets you. Not just a chatbot. Not just a tool. A digital brain that works with you. But the question remains: Will AI agents ever truly “understand” what we want, or just get better at faking it?