AI Tech News

WELCOME

Latest Articles

Fresh insights, updated daily.

When Every Model Scores 88%: Why Benchmark Saturation Is Breaking AI Evaluation
Technology


The Problem No One Wanted to Admit
Frontier models now score 88% on MMLU, bumping against the estimated human-expert ceiling of 89.8%. That's the saturation sig...

6 min read
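The excerpt's saturation argument rests on a single subtraction: how much measurable room is left between the frontier score and the expert ceiling. The minimal sketch below uses only the two figures quoted in the excerpt (88% and 89.8%). The `headroom` function is a hypothetical helper written for illustration. It does not come from the article or from any benchmark tool.

```python
# Hypothetical helper for illustration; not taken from the article or any library.
def headroom(score: float, ceiling: float) -> float:
    """Percentage points left between a model's score and the estimated ceiling."""
    # Round to one decimal to hide floating-point noise (89.8 - 88.0 != 1.8 exactly).
    return round(ceiling - score, 1)

mmlu_frontier = 88.0   # frontier-model MMLU score quoted in the excerpt
expert_ceiling = 89.8  # estimated human-expert ceiling quoted in the excerpt

gap = headroom(mmlu_frontier, expert_ceiling)
share = mmlu_frontier / expert_ceiling  # fraction of the ceiling already reached

print(f"Headroom: {gap} points")
print(f"Share of ceiling reached: {share:.1%}")
```

Only about 1.8 points separate the frontier score from the estimated ceiling, so any further gains can be no larger than that margin.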

Tracked Data

AI Intelligence Index — Top 3 Frontier Models

  • Anthropic: Claude Opus 4.7 (Adaptive Reasoning, Max Effort) scored 57 on 2026-05-17; Claude Opus 4.8 (Adaptive Reasoning, Max Effort) scored 61 on 2026-06-01 and 2026-06-08
  • OpenAI: GPT-5.5 (xhigh) scored 60 on 2026-05-17, 2026-06-01 and 2026-06-08
  • Google DeepMind: Gemini 3.1 Pro Preview scored 57 on 2026-05-17, 2026-06-01 and 2026-06-08

Intelligence Index — Trend


Last updated: 2026-06-08 · 3 data points · artificialanalysis.ai

Latest News
