AI Outsmarts 30 of World's Top Mathematicians at Secret Meeting in California

(Image credit: Yuichiro Chino via Getty Images)

Over a weekend in mid-May, a secret math conclave took place. Thirty of the world's most renowned mathematicians gathered in Berkeley, California, some traveling from as far away as the United Kingdom. The group competed against a "reasoning" chatbot, which was tasked with solving problems they had devised to test its mathematical skills. After two days of intense interaction with the bot, the researchers were stunned to find that it could answer some of the hardest solvable problems they posed. "I have colleagues who literally said these models are approaching mathematical genius," says Ken Ono, a mathematician at the University of Virginia who led the meeting and served as one of its judges.

The chatbot was powered by o4-mini, a so-called reasoning large language model (LLM) that OpenAI trained to make highly intricate deductions. Google's Gemini 2.5 Flash has similar capabilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini is trained to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its peers are lighter-weight, more nimble models that train on specialized datasets and with stronger reinforcement from humans. This approach yields a chatbot that can dig much deeper into complex mathematical problems than traditional LLMs can.

To track o4-mini's progress, OpenAI had previously commissioned Epoch AI, a nonprofit that benchmarks LLMs, to write 300 math problems whose solutions had not yet been published. Even traditional LLMs can correctly answer many complex math questions. Yet when Epoch AI posed these new questions, which the models had not been trained on, to several such LLMs, the most successful solved fewer than 2% of them, evidence that these models lacked the ability to reason. But o4-mini would prove to be very different.