Dec 20, 2024 1:00 PM

OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills

A day after Google announced its first model capable of reasoning over problems, OpenAI has upped the stakes with an improved version of its own.

NEW YORK NEW YORK DECEMBER 04 OpenAI CEO Sam Altman Visits Making Money With Charles Payne at Fox Business Network...

OpenAI CEO Sam Altman Visits "Making Money With Charles Payne" at Fox Business Network Studios on December 04, 2024 in New York City.Photograph: Mike Coppola/Getty Images

OpenAI today announced an improved version of its most capable artificial intelligence model to date—one that takes even more time to deliberate over questions—just a day after Google announced its first model of this type.

OpenAI’s new model, called o3, replaces o1, which the company introduced in September. Like o1, the new model spends time ruminating over a problem in order to deliver better answers to questions that require step-by-step logical reasoning. (OpenAI chose to skip the “o2” moniker because it's already the name of a mobile carrier in the UK.)

“We view this as the beginning of the next phase of AI,” said OpenAI CEO Sam Altman on a livestream Friday. “Where you can use these models to do increasingly complex tasks that require a lot of reasoning.”

The o3 model scores much higher on several measures than its predecessor, OpenAI says, including ones that measure complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI models’ ability to reason over extremely difficult mathematical and logic problems they’re encountering for the first time.

Google is pursuing a similar line of research. Noam Shazeer, a Google researcher, yesterday revealed in a post on X that the company has developed its own reasoning model, called Gemini 2.0 Flash Thinking. Google’s CEO, Sundar Pichai, called it “our most thoughtful model yet” in his own post. Google’s new model achieved a high score on SWE-Bench, a test that measures a models’ agentic abilities.

However, OpenAI’s new o3 model is 20 percent better than o1. “o3 blew it out of the water,” says Ofir Press, a post-doctoral researcher at Princeton University who helped develop SWE-Bench. “Very surprising increase, not sure how they did it.”

The two dueling models show competition between OpenAI and Google to be fiercer than ever. It is crucial for OpenAI to demonstrate that it can keep making advances as it seeks to attract more investment and build a profitable business. Google is meanwhile desperate to show that it remains at the forefront of AI research.

The new models also show how AI companies are increasingly looking beyond simply scaling up AI models in order to wring greater intelligence out of them.

OpenAI says there are two versions of the new model, o3 and o3-mini. The company is not making the models publicly available yet but says it will invite outsiders to apply to perform testing of them.

OpenAI today also revealed more details of techniques used to align o1. The new method, known as deliberative alignment, involves training a model with a set of safety specifications and having it reason about the nature of the request as well as its own answer it is given to interrogate whether it may contravene its guardrails. The approach makes the model more difficult to trick into misbehavior because its reasoning process can root out attempts at mischief.

Large language models can answer many questions remarkably well, but they often stumble when asked to solve puzzles that require basic math or logic. OpenAI’s o1 incorporates training on step-by-step problem-solving that makes an AI model better able to tackle these types of problems.

Models that reason over problems will also be important as companies seek to deploy so-called AI agents that can reliably figure out how to solve complex problems on a users’ behalf.

“This really signifies that we are really climbing the frontier of utility,” Mark Chen, senior vice president of research at OpenAI said on today’s livestream.

“This model is incredible at programming,” Atlman added.

While a true breakthrough moment has eluded tech giants at the end of the year, the pace of AI announcements has been dizzying of late.

Early this month Google announced a new version of its flagship model, called Gemini 2.0, and demonstrated it as a web browsing helper and as an assistant that sees the world through a smartphone or a pair of smart glasses.

OpenAI has made numerous announcements in the run up to Christmas, including a new version of its video-generating model, a free version of its ChatGPT-powered search engine, and a way to access ChatGPT over the phone by calling 1-800-ChatGPT.

Update 12/20/24 1:16 pm ET: This story has been updated with further comment and detail from OpenAI.

You Might Also Like …