For over half a century, the International Mathematical Olympiad (IMO) has stood as the ultimate crucible for young intellects. It is a competition where the brightest teenagers on Earth convene to solve six problems of such excruciating difficulty that even professional mathematicians often struggle to crack them. For decades, the "Gold Medal" standard—solving roughly five out of six of these problems under strict time limits—was considered a bastion of human intuition, creativity, and spark that machines simply could not replicate.
That bastion has fallen.
As we look back from early 2026, the last two years have rewritten the story of artificial intelligence in mathematics. We have witnessed a transition so rapid it felt less like a climb and more like a phase shift. In 2024, AI came within a single point of gold and settled for Silver. In 2025, it claimed the Gold.
This is the story of how silicon learned to reason, not by crunching numbers, but by learning the art of the proof.
The Grand Challenge
To understand the magnitude of this achievement, one must understand the IMO. Unlike a standard math test, you cannot pass the IMO by memorizing formulas. The problems are novel, abstract beasts that require "proofs"—logical arguments written in prose and symbols that establish a truth with absolute certainty.
A typical problem might ask a student to prove a property of a function for all rational numbers, or to find a geometric configuration in a 40-sided polygon. There is no recipe. Solving them requires a flash of insight—a "magic key" that unlocks the logic—followed by pages of rigorous justification.
For years, AI was terrible at this. Large Language Models (LLMs) like GPT-4 were impressive conversationalists but prone to "hallucination"—confidently stating that 2+2=5 if the context nudged them that way. They lacked the ability to self-correct, to backtrack, and most importantly, to reason formally without stumbling.
Then came the summer of 2024.
2024: The Silver Breakthrough
In July 2024, Google DeepMind unveiled two systems that would change the game: AlphaProof and AlphaGeometry 2.
They entered the arena not as a single monolithic brain, but as a specialized team. AlphaGeometry 2 was a "neuro-symbolic" hybrid, designed specifically to see lines, circles, and angles. It didn't just guess; it used a language model to suggest "auxiliary constructions"—like drawing a helpful extra line in a triangle—and then passed those ideas to a symbolic engine that checked the logic with the rigidity of a computer program. It solved the 2024 IMO's geometry problem in a blistering 19 seconds once the statement had been formalized.
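In outline, that propose-and-check loop looks something like the sketch below. It is purely illustrative: `propose_constructions` and `deductive_closure` are placeholder names standing in for the neural suggester and the symbolic deduction engine, not AlphaGeometry 2's real interfaces.

```python
# Illustrative neuro-symbolic loop in the AlphaGeometry style (placeholder
# functions, not the real system): a language model proposes auxiliary
# constructions, and a symbolic engine deduces consequences until the goal
# either follows or more constructions are needed.

def propose_constructions(facts: set[str], goal: str) -> list[str]:
    """Placeholder for the neural model suggesting extra points or lines."""
    return ["midpoint M of AB", "circle through A, B, C"]

def deductive_closure(facts: set[str]) -> set[str]:
    """Placeholder for the symbolic engine: apply geometry rules to a fixpoint."""
    return set(facts)  # a real engine would derive every consequence of the facts

def solve(facts: set[str], goal: str, max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        facts = deductive_closure(facts)
        if goal in facts:
            return True                      # proved symbolically, no guesswork
        for construction in propose_constructions(facts, goal):
            facts.add(construction)          # enrich the diagram and try again
    return goal in deductive_closure(facts)

if __name__ == "__main__":
    print(solve({"triangle ABC", "AB = AC"}, "angle B = angle C"))
```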
But the real star was AlphaProof.
AlphaProof took a different approach. It didn't try to write proofs in English, which is messy and ambiguous. Instead, it translated the IMO problems into Lean, a formal proof language used by mathematicians to verify theorems. Once the problem was in code, AlphaProof used reinforcement learning—the same family of techniques that taught AlphaGo to master the game of Go—to play the "game" of math. It would generate millions of candidate proof steps, constantly checking whether Lean's proof checker accepted them. If a step was valid, it was rewarded. If it was nonsense, it was discarded.
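To make the translation step concrete, here is a toy Lean 4 statement and proof in the same spirit. It is an illustrative example only, not one of AlphaProof's actual formalizations, and it assumes the Mathlib library is available.

```lean
import Mathlib

-- Toy illustration (not an actual AlphaProof formalization): the `theorem`
-- line is the "translated" problem statement; the tactic block after `by`
-- is the proof that Lean checks step by step.
theorem toy_additive_zero (f : ℚ → ℚ)
    (hadd : ∀ x y : ℚ, f (x + y) = f x + f y) : f 0 = 0 := by
  have h := hadd 0 0      -- h : f (0 + 0) = f 0 + f 0
  rw [add_zero] at h      -- h : f 0 = f 0 + f 0
  linarith                -- a quantity equal to its own double must be 0
```

In AlphaProof's setting, the theorem header plays the role of the competition problem, and the search is over tactic sequences: any candidate proof the checker rejects earns no reward.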
The result was a shock to the system. AlphaProof and AlphaGeometry 2 combined to solve 4 out of 6 problems, scoring 28 points, one point below that year's gold-medal cut-off. It was a Silver Medal score.
The "Impossible" Problem 6
The most terrifying feat of the 2024 AI was its conquest of Problem 6. In the lore of the IMO, Problem 6 is the "boss fight"—traditionally the hardest problem of the competition.
The 2024 Problem 6 involved "aquaesulian functions": a bizarre functional equation about functions $f$ from the rational numbers to the rational numbers that satisfy a peculiar either/or condition.
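For reference, the defining condition is that for every pair of rationals $x$ and $y$, at least one of two mirror-image equalities must hold:

$$f(x + f(y)) = f(x) + y \quad \text{or} \quad f(f(x) + y) = x + f(y).$$

The task was to show that, for any such $f$, the expression $f(r) + f(-r)$ takes only a bounded number of distinct values as $r$ ranges over the rationals, and to find the smallest possible bound (the answer is 2).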
Only five human contestants out of over 600 managed to solve it. It required a non-intuitive leap of logic that seemed fundamentally "human." AlphaProof solved it. It didn't just solve it; it generated a rigorous, machine-verified proof that no human could poke a hole in. It was the first time an AI had outperformed the vast majority of gold-medal humans on the hardest problem of the test.
However, the 2024 systems had a weakness. They were slow. AlphaProof took days to find some solutions, churning through massive compute resources. And they failed to solve the combinatorics problems—puzzles involving counting and arrangement—which rely less on formal structure and more on chaotic creativity.
The world wondered: was this a fluke? Or was it the beginning of the end for human dominance in mathematics?
2025: The Golden Standard
The answer came one year later, at the 2025 IMO on the Sunshine Coast in Queensland, Australia.
If 2024 was the year of the "Specialist" (using translation to formal code), 2025 was the year of the "Generalist."
Google DeepMind returned with Gemini Deep Think, an advanced version of its flagship multimodal model equipped with a dedicated reasoning mode. OpenAI joined the fray with an experimental general-purpose reasoning model of its own.
The results were historic.
- Gemini Deep Think: 35 / 42 Points (Gold Medal)
- OpenAI's Model: 35 / 42 Points (Gold Medal)
Both systems solved 5 out of 6 problems. The "Silver" ceiling was shattered. But the way they did it was even more revolutionary than the score.
The Death of Translation
Unlike AlphaProof, which relied on translating math into the formal language Lean, the 2025 models worked primarily in natural language. They read the problems in English and wrote their proofs in English, just like a human contestant.
They achieved this through a paradigm called Test-Time Compute.
In the past, AI models were trained to spit out an answer instantly. "What is the capital of France?" -> "Paris." But math doesn't work like that. If you ask a mathematician to solve an IMO problem, they don't answer in milliseconds. They think. They try a path, hit a dead end, backtrack, try a new angle, and scribble on a scratchpad.
The 2025 models were designed to "think" for hours. DeepMind's "Deep Think" mode allowed the model to generate thousands of internal "chains of thought," branching out like a tree of possibilities. It could recognize when it was going down a wrong path, discard that branch, and pivot to a new strategy.
It wasn't just predicting the next word; it was simulating the process of problem-solving.
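Nobody outside these labs knows exactly how Deep Think orchestrates its parallel chains of thought, but the overall shape described above (propose several continuations, score them, prune weak branches, keep thinking) can be sketched as a simple beam search. The code below is purely illustrative: `propose_steps` and `score_chain` are placeholder stand-ins for the model and its self-critique, not real APIs.

```python
# Minimal sketch of test-time search over chains of thought (illustrative only,
# not DeepMind's actual Deep Think implementation). Candidate reasoning steps
# are proposed, partial chains are scored, and weak branches are pruned.

from dataclasses import dataclass, field

@dataclass
class Chain:
    steps: list[str] = field(default_factory=list)
    score: float = 0.0

def propose_steps(chain: Chain, k: int = 3) -> list[str]:
    """Placeholder for the model proposing k candidate next reasoning steps."""
    return [f"step {len(chain.steps) + 1} (variant {i})" for i in range(k)]

def score_chain(chain: Chain) -> float:
    """Placeholder for a verifier / self-critique pass scoring a partial chain."""
    return -float(len(chain.steps))  # dummy heuristic so the sketch runs end to end

def deep_think(beam_width: int = 4, max_depth: int = 5) -> Chain:
    beam = [Chain()]
    for _ in range(max_depth):
        candidates = []
        for chain in beam:
            for step in propose_steps(chain):
                extended = Chain(chain.steps + [step])
                extended.score = score_chain(extended)
                candidates.append(extended)
        # Keep only the most promising branches; the rest are abandoned,
        # which is the search-level analogue of "backtracking".
        beam = sorted(candidates, key=lambda c: c.score, reverse=True)[:beam_width]
    return beam[0]

if __name__ == "__main__":
    print("\n".join(deep_think().steps))
```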
The Final Frontier: What They Still Can't Do
Despite the Gold medals, there was one glaring "0" on the scorecards of both AI giants in 2025.
Problem 6. Once again, the hardest problem—a combinatorics challenge—remained unsolved by the machines. While they had mastered algebra, geometry, and number theory, the chaotic, open-ended nature of combinatorics proved elusive.
Combinatorics often requires a specific type of visualization or a "coloring argument" (e.g., "imagine painting the grid like a chessboard") that is incredibly difficult to deduce from pattern matching alone. The AIs could follow logical rules perfectly, but they still struggled to invent a brand-new "game" within the problem, which is often what combinatorics demands.
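To make "coloring argument" concrete, here is the textbook example (not one of the IMO problems): an 8×8 board with two opposite corners removed cannot be tiled by dominoes, because every domino covers one black square and one white square, while the two removed corners share a color. A few lines of Python make the imbalance explicit.

```python
# Classic coloring argument: remove two opposite corners from an 8x8 board.
# Both removed squares have the same color under the chessboard coloring
# (r + c) % 2, so the remaining black and white counts differ, and no tiling
# by 2x1 dominoes (each covering one square of each color) can exist.

removed = {(0, 0), (7, 7)}  # opposite corners; both "black" since (r + c) % 2 == 0
squares = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]

black = sum(1 for r, c in squares if (r + c) % 2 == 0)
white = len(squares) - black
print(black, white)  # 30 32: unequal, so a perfect domino tiling is impossible
```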
This failure highlights that we haven't reached "Artificial General Intelligence" (AGI) quite yet. The machines are brilliant at navigating complex logical structures, but they still lack a specific spark of unstructured creativity that human geniuses possess.
The Era of "Post-Truth" Mathematics?
The implications of this mastery extend far beyond a high school competition. We are entering an era in which automated theorem proving moves from a research niche into the mainstream of mathematics.
For centuries, mathematics has been a largely solitary pursuit: a mathematician might spend months verifying a single lemma. Now we have systems that can check the logic of a formal proof almost instantly (using tools like Lean) or generate novel proofs for complex statements (using models like Deep Think).
This changes the definition of "truth." In the past, a theorem was true because a human community agreed the proof looked correct. Now, a theorem can be true because a machine has verified it against the axioms of logic, with a certainty no human can match.
We are already seeing this in 2026. Mathematicians are not being replaced; they are being augmented. They act as "architects," defining the high-level conjectures and the "big picture" strategy, while AI agents act as the "builders," filling in the rigorous logical bricks to construct the palace.
Conclusion
The journey from the Silver of 2024 to the Gold of 2025 was not just an improvement in benchmarks; it was a transformation in the nature of synthetic thought.
We taught silicon to speak. Then we taught it to see. Now, we have taught it to reason.
The Silicon Proof is no longer just an experiment. It is a partner. As we look at the unsolved problems of physics, biology, and cryptography, we are no longer staring at them alone. We have built a mind that can think alongside us, tireless and terrifyingly brilliant, ready to solve the equations of our future.
References:
- https://gregrobison.medium.com/from-silver-to-gold-an-in-depth-analysis-of-googles-gemini-deep-think-and-its-landmark-d3e07f9368cf
- https://www.lesswrong.com/posts/TyCdgpCfX7sfiobsH/ai-achieves-silver-medal-standard-solving-international
- https://medium.com/@lextrackai/ais-gold-medal-moment-at-the-imo-from-scores-to-methods-what-s-truly-been-redefined-9e15f38905ce
- https://simonwillison.net/2025/Jul/19/openai-gold-medal-math-olympiad/
- https://www.lesswrong.com/posts/RcBqeJ8GHM2LygQK3/openai-claims-imo-gold-medal
- https://deepmind.google/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
- https://www.cbsnews.com/news/humans-beat-ai-technology-google-openai-math-olympiad-machines-catching-up/
- https://www.axios.com/2025/07/21/openai-deepmind-math-olympiad-ai
- https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/
- https://the-decoder.com/google-deepminds-gemini-wins-mathematical-olympiad-gold-using-only-natural-language/
- https://garymarcus.substack.com/p/deepmind-and-openai-achieve-imo-gold
- https://www.youtube.com/watch?v=ydnmT8B68Co
- https://www.youtube.com/watch?v=5FMpqA2CELw
- https://www.youtube.com/watch?v=7h3gJfWnDoc
- https://www.youtube.com/watch?v=36HchiQGU4U
- https://intuitionlabs.ai/articles/ai-reasoning-math-olympiad-imo
- https://rits.shanghai.nyu.edu/ai/ai-wins-gold-at-2025-international-mathematical-olympiad/