
Google Gemini 3 Deep Think Crushes Reasoning Benchmarks, Surpassing Claude 4.6 Opus in a Major Leap

The Battle for Superiority: Gemini 3 Deep Think Outshines Claude 4.6 Opus in Scientific Reasoning.

Google has officially released a significant update for Gemini 3 Deep Think (February 2026 version), showcasing a historic breakthrough in abstract reasoning and scientific problem-solving. This update marks a pivotal moment in the AI race, as Gemini 3 moves closer to human-level intelligence in complex academic domains.

Dominating the ARC-AGI-2 Challenge

The headline of this update is Gemini 3 Deep Think’s performance on ARC-AGI-2, a benchmark specifically designed by François Chollet to test a model’s ability to solve novel visual-logic puzzles it has never encountered before.

  • Gemini 3 Deep Think: Achieved a verified score of 84.6%.

  • Claude 4.6 Opus: Scored 68.8%.

Gemini’s lead of nearly 16 percentage points represents a paradigm shift, proving that Google’s "inference-time compute" approach (giving the model more time to "think" before answering) is effectively saturating what was once considered the hardest test for AI.

Conquering "Humanity’s Last Exam"

Gemini 3 Deep Think also set a new record on Humanity’s Last Exam (HLE), a benchmark of graduate-level questions contributed from 50 different countries.

  • New Score: 48.4% (Up from 37.5% in the previous version).

  • Comparison: It significantly outpaced Claude 4.6 Opus, which stands at 40.0%.

This result highlights the model's proficiency in deep scientific knowledge and multi-step academic reasoning.

Availability and API Access

For the first time, Google is making Deep Think available via the Gemini API for select researchers and developers. Additionally, subscribers of Google AI Ultra can now experience the February 2026 update directly within the Gemini app.
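The article does not show what API access looks like. As a minimal, hedged sketch of how a researcher might call such a model through the official `google-genai` Python SDK, assuming a hypothetical model id (`gemini-3-deep-think` is an illustration, not a confirmed identifier):

```python
# Hedged sketch: calling a hypothetical Deep Think model via the google-genai SDK.
# The model id below is an assumption for illustration, not a confirmed identifier.
import os

MODEL_ID = "gemini-3-deep-think"  # hypothetical model id

def build_prompt(puzzle: str) -> str:
    """Wrap a puzzle description in a step-by-step reasoning prompt."""
    return f"Solve step by step, then state the final answer:\n{puzzle}"

if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    from google import genai  # official SDK: pip install google-genai
    client = genai.Client()   # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=build_prompt("Rotate the 3x3 grid 90 degrees clockwise."),
    )
    print(response.text)
```

The network call only runs when a `GEMINI_API_KEY` is present; the prompt helper itself works offline.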

This success stems from inference-time optimization, which is akin to giving the AI an "inner monologue": a thought process it works through before delivering the answer. This method significantly reduces errors on complex research problems compared to traditional model scaling.

Beyond engineering and science, Gemini 3 Deep Think achieved a Codeforces Elo score of 3,455, considered a "Grandmaster" level in competitive programming, vastly surpassing Claude 4.6 Opus (2,352 Elo).

In multimodal understanding, Gemini 3 also dominated with an 81.5% score on MMMU-Pro, demonstrating its proficiency not only in text but also its ability to accurately analyze complex scientific and physics diagrams.

Reported data indicates that each ARC-AGI-2 task costs $13.62 to run in Deep Think mode, meaning that despite the model's intelligence, the economic trade-off requires careful consideration before production-level deployment.
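To make the trade-off concrete, here is a back-of-the-envelope estimate using only the quoted per-task figure (the 100-task count is an arbitrary example, not a documented evaluation size):

```python
# Back-of-the-envelope cost estimate from the $13.62-per-task figure quoted above.
COST_PER_TASK_USD = 13.62

def run_cost(n_tasks: int) -> float:
    """Total cost in USD for running n_tasks in Deep Think mode."""
    return round(n_tasks * COST_PER_TASK_USD, 2)

print(run_cost(100))  # cost of a hypothetical 100-task evaluation run
```

At this rate, per-task pricing dominates quickly: even a modest batch of tasks costs on the order of a thousand dollars.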

 


Source: Google

