Adaptive Reasoning in Claude 4.6+: Why Effort Levels Replace Token Budgets for Agentic Workflows
The Paradigm Shift: From Fixed Budgets to Dynamic Effort
Adaptive Thinking is a mode introduced with Opus 4.6 where the model autonomously decides how much to reason for each conversation turn. This represents a fundamental departure from earlier reasoning approaches. Rather than requiring developers to choose between all-or-nothing thinking, the key insight is that reasoning depth should be a function of task complexity, not a fixed pipeline parameter.
The problem it solves is straightforward but persistent: prior approaches forced developers to choose between always paying for thinking (wasteful on simple tasks) or never enabling it (leaving performance on the table for hard tasks). A classification task doesn't warrant the same computational depth as multi-step code generation, yet fixed-budget systems treated both identically.
How Adaptive Reasoning Actually Works
Claude evaluates the complexity of each request and determines whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems.
This is not magic. Adaptive thinking relies on the model's estimate of task complexity, which is influenced by how you phrase the task. Vague prompts get categorized as simple tasks and receive less internal reasoning. Concrete prompts that name specific files, invariants, or constraints get categorized as complex and receive more. The model's complexity estimator reads cues in your prompt and adjusts behavior accordingly.
The Four Effort Levels
Alongside adaptive thinking, Claude 4.6 exposes an explicit effort parameter with four levels: This creates a new optimization pattern: instead of routing between models, you route between effort levels on a single model.
- Low: Pay up to 90% less in simple tasks (using Low) and reserves maximum power for when you really need it.
- Medium: Claude may skip thinking for simpler problems. The middle ground for moderate complexity.
- High (default API): The API default is high (not medium). At high and max effort levels, Claude may think more extensively and can be more likely to exhaust the max_tokens budget.
- Max: Maximum reasoning depth for the hardest problems.
The Economics: Sonnet 4.6 as the Efficiency Play
The effort parameter shifts economics at a tier below the model choice. Claude Opus 4.6 costs about $5 per million input tokens and $25 per million output tokens, while Claude Sonnet 4.6 costs roughly $3 per million input tokens and $15 per million output tokens.
But the real story is performance-per-dollar at different effort levels. Sonnet 4.6 at medium effort scores 79.6% on SWE-bench Verified, just 1.2 points behind Opus 4.6's 80.8% at high effort. At $3/$15 per million tokens versus Opus's $5/$25, that is roughly 60% of the cost for 98.5% of the capability on the industry's most-watched coding benchmark.
This changes the agent routing conversation. For two years, every serious agent pipeline used the same architecture: route simple tasks to a cheap model, hard tasks to an expensive one. Anthropic just made that pattern optional. Now a single Sonnet instance can handle both simple and complex work by adjusting effort, without incurring model-switching overhead.
Task Budgets: The Companion Layer for Long-Horizon Agents
Effort controls depth per step, but agentic workflows span many steps. Task budgets let you tell Claude how many tokens it has for a full agentic loop, including thinking, tool calls, tool results, and output. The model sees a running countdown and uses it to prioritize work and finish gracefully as the budget is consumed.
Effort controls how deeply Claude reasons per step. Task budgets control how much total work Claude does across an agentic loop. The two are complementary: effort tunes depth, task budgets tune breadth.
In practice: Adaptive thinking naturally scales down as the budget depletes. As your token budget runs low, the model throttles reasoning automatically, finishing gracefully rather than hitting a hard wall.
What Changed From the Default Shift Controversy
In early 2026, the field noticed a regression: Developers who spend all day in Claude Code, prompt engineers running complex agent loops, and researchers pushing Claude Opus 4.6 on long-horizon tasks have all described the same subtle but persistent sensation. The model still works. The model still reasons. But the reads-per-edit ratio has dropped, the multi-step plans feel thinner, and the tokens seem to vanish faster than they used to.
Anthropic did not remove the high effort level. They only moved the default. You can set the effort level back to high on a per-request basis through the API, through the system-prompt configuration, or through your Claude Code settings depending on which surface you use.
The practical fix for power users was simple: For most power users this single change is enough to resolve the perceived regression. But it exposed a second dynamic: Prompt design relies on the model's estimate of task complexity, which is influenced by how you phrase the task. Vague requests get low effort; explicit ones trigger deeper reasoning.
Real-World Impact: SRE Workflows and Beyond
In production SRE workflows, adaptive reasoning shows a different value. The reason: adaptive thinking allocates reasoning budget dynamically, minimal overhead during data collection, full depth when building a diagnosis. A system investigating infrastructure events spends light reasoning cycles gathering logs and signals, then switches to maximum reasoning depth when correlating root causes.
Sonnet-4.6 performed similarly to Opus-4.6 on root cause accuracy, and in a few cases even beat it. Both models comfortably outperformed Opus-4.5 on our hardest investigations, but Sonnet-4.6 does it at about 40% less per token.
Claude Code and Multi-Agent Scenarios
In Claude Code, you control the effort level in 3 ways: Automatic (default): Claude detects complexity and adjusts it himself. It works well in 80% of cases.
For teams running multi-agent workflows, cost compounds. Agent Teams spawn multiple Claude Code instances, each with its own context window. A 3-agent team uses roughly 7x more tokens than a standard single-agent session, because each teammate maintains its own context window and runs as a separate Claude instance. Here, effort control becomes essential: delegate simple tasks to Sonnet at low effort, reserve Opus at high effort for bottleneck decisions.
Opus 4.8: Five Effort Levels and Fast Mode
You can also choose "extra" ("xhigh" in Claude Code) or "max", and the model will spend more tokens to get better results. Opus 4.8 introduces finer-grained control. On May 28th, Anthropic launched Claude Opus 4.8, their most capable generally available model to date. They claim a "modest but tangible improvement" over Opus 4.7, with improvements around long-running agents, coding, enterprise workflows, financial analysis, cybersecurity, and multimodal reasoning.
A new feature— Opus 4.8 accepts system messages inside the conversation history after a user turn. This means agents can update instructions during a long-running task without restating the full system prompt, preserving prompt cache hits on earlier turns and reducing input cost on agentic loops. —further optimizes long-horizon workflows.
How Adaptive Reasoning Differs From Fixed Budgets
| Dimension | Fixed Token Budget (4.5 and earlier) | Adaptive Effort (4.6+) |
|---|---|---|
| Decision Model | Developer pre-configures a static limit (e.g., 32k tokens) | Model assesses each request; developer sets effort level as a signal |
| Cost on Easy Tasks | Full budget consumed even for trivial classification | Low/medium effort skips thinking entirely |
| Performance on Hard Tasks | Limited by pre-set ceiling; may undershoot | Max effort provides open-ended reasoning depth |
| Interleaved Reasoning in Agents | Must use separate beta header; non-standard | Adaptive Reasoning automatically enables Interleaved Thinking, allowing reasoning between tool calls. This is the structural reason it is the correct paradigm for agentic workflows: the model can think, call a tool, see the result, think again, and proceed. |
| API Migration Path | N/A—was the standard | Claude Opus 4.7 and later require adaptive thinking — the legacy fixed-budget API is no longer accepted. Deprecated on Claude 4.6. Removed on Claude Opus 4.7 and later: requests with type: 'enabled' return a 400 error. |
Prompt Design: The Hidden Control Surface
Adaptive reasoning respects explicit signals in your prompt. The two main tools you have to influence it are the effort level and prompting. At higher effort, the model thinks more often and more deeply. You can also nudge it with instructions in your prompt: "Think carefully and step by step before responding" pushes it toward more thinking, and "Prioritize responding quickly rather than thinking deeply" pushes it the other way.
This means prompt engineering takes on a new layer. A vague request ("Analyze this code") will auto-detect as simple and receive low reasoning. A precise request ("Review /src/auth/jwt.ts for timing-attack vulnerabilities in the HMAC comparison function") signals complexity and triggers deeper thinking—without changing the effort parameter.
Billing and Summarization: Full Reasoning, Condensed Display
You are charged for the full thinking tokens, not the summarised output. The billed output token count will not match the response token count. The model does the full reasoning; you see the summary; you pay for the full reasoning.
This is a critical detail for cost forecasting. The displayed summary is compressed; your invoice reflects the raw reasoning depth. When thinking is on, Claude produces a summary of its internal reasoning that you can see in the response. On older models, this summary was shown by default. On Opus 4.7, the summary is hidden unless you explicitly ask for it.
Limitations and Caveats
Adaptive thinking isn't magic. The model can overthink, especially at max effort. It can also skip thinking when you wish it hadn't. The complexity estimator is not perfect. Atypical problem structures, proprietary domains, or ambiguous requirements can trigger misclassification.
Additionally, the problem is that adaptive thinking depends on the model's own estimate of task complexity, and that estimate is not perfect. For workflows where reasoning transparency and auditability matter, this introduces a gap: you don't control which turns the model decided to reason on, only the effort level that influences that decision.
When to Use Each Effort Level: A Decision Framework
- Low effort: Stateless tasks—formatting, templating, classification without ambiguity, routine extractions. Suitable when you've validated the prompt works reliably on simpler versions.
- Medium effort: Balanced workflows where 80% of requests are routine and 20% are complex. Claude auto-routes; you pay a middle-ground cost.
- High effort (API default): Coding tasks, multi-step problem-solving, synthesis of ambiguous requirements, anything that could benefit from visible reasoning.
- Max effort: High-stakes decisions, novel problem structures, competitive benchmarks, or when you've validated higher effort materially improves accuracy for your use case.
Key Takeaways
- The model API, response format, error handling, and integration code stay identical. Only a single parameter changes. Migration friction is low.
- Effort levels enable single-model agent pipelines; model routing (Sonnet vs. Opus) becomes optional for many workflows.
- For agentic workflows spanning multiple steps, Task budgets work best when you want Claude to self-regulate token spend on long-horizon tasks, have a predictable per-task cost or latency ceiling to enforce, and want the model to finish gracefully (summarize findings, report progress) as it approaches the budget rather than cutting off mid-action.
- Prompt design directly influences adaptive reasoning; specificity triggers depth without API changes.
- You pay for full reasoning tokens regardless of display; cost forecasting must account for the raw thinking budget, not the visible summary.
What's Next: Broader Adoption and Open Questions
The shift to effort-based reasoning reflects a broader industry trend. GPT 5 runs a system that routes between fast and deliberate paths and exposes developer controls to tune thinking time. In production, the system automatically switches modes; developers can cap or elevate effort as needed. This convergence suggests effort-based control will become a platform-level primitive.
For teams currently building agentic systems, the practical implication is straightforward: effort levels are not optional optimization—they're the correct abstraction for reasoning control in long-horizon workflows. Start with adaptive (the default), measure token usage by effort level in your analytics, and iterate based on cost-per-completion metrics rather than hunches.