Learning Objectives
- Understand what extended thinking is and when it provides value
- Configure thinking budgets and effort levels for different task types
- Use adaptive reasoning on Opus 4.6 to balance speed and quality
- Identify when to use high vs. low effort thinking
- Optimize costs by matching reasoning depth to task complexity
What is Extended Thinking?
Extended thinking gives Claude space to reason step-by-step before formulating a response. It's enabled by default in Claude Code, allowing the model to "think through" complex problems internally before presenting a solution.
This reasoning process happens in a special "thinking phase" where Claude can:
- Break down multi-step problems into logical sequences
- Evaluate multiple approaches and tradeoffs
- Plan file modifications across a codebase
- Debug issues by systematically eliminating possibilities
- Consider edge cases and potential pitfalls
The thinking content is summarized and shown in verbose mode (toggle with Ctrl+O or Cmd+O). In normal mode, you only see the final response—but the reasoning still happened behind the scenes.
When Extended Thinking Adds Value
Extended thinking is particularly valuable for:
- Complex architectural decisions — Evaluating framework choices, design patterns, or system structure
- Challenging bugs — Reasoning through symptoms, hypotheses, and diagnostic steps
- Multi-step planning — Breaking down large features into logical implementation sequences
- Refactoring decisions — Weighing the tradeoffs between different code organization approaches
- Security considerations — Thinking through authentication flows, data validation, and attack surfaces
For simple tasks like "rename this variable" or "add a console.log statement," extended thinking adds minimal value—and you're better off using lower effort levels to save time and cost.
Verbose Mode Tip: Press Ctrl+O (or Cmd+O on Mac) to toggle verbose mode and see Claude's thinking process. This helps you understand how the model approached your problem and builds trust in complex solutions.
Adaptive Reasoning on Opus 4.6
Claude Opus 4.6 introduces adaptive reasoning, which dynamically allocates thinking tokens based on the complexity of your request and the configured effort level. Instead of a fixed thinking budget, Opus adjusts its reasoning depth to match the task.
Effort Levels
Three effort levels control how much reasoning Opus applies:
| Effort Level | Use Case | Speed | Cost | Reasoning Depth |
|---|---|---|---|---|
| Low | Simple tasks, quick fixes, known commands | Fast | Low | Minimal—just enough to complete the task |
| Medium | Moderate complexity, standard features | Balanced | Medium | Reasonable tradeoff evaluation |
| High (default) | Complex problems, architectural decisions, hard bugs | Slower | Higher | Deep multi-step reasoning and planning |
Adaptive reasoning means Opus won't "overthink" a simple task even on high effort—but it will allocate more reasoning capacity when it detects complexity. Conversely, on low effort, Opus limits its reasoning budget even for complex tasks, prioritizing speed over exhaustive analysis.
When to Use Each Effort Level
Use High Effort when:
- Architecting new features or systems
- Refactoring across multiple files
- Debugging obscure issues with unclear root causes
- Evaluating security implications
- Making decisions with significant long-term impact
Use Medium Effort when:
- Implementing well-defined features
- Standard CRUD operations with some business logic
- Routine debugging where symptoms point to likely causes
- Code reviews and optimization suggestions
Use Low Effort when:
- Renaming variables, functions, or files
- Adding logging or comments
- Running known commands (tests, builds, deploys)
- Simple formatting or style fixes
- Quick documentation updates
Cost Awareness: You're charged for all thinking tokens used, even though only a summary is shown. High effort on complex tasks can use thousands of thinking tokens—make sure the reasoning depth justifies the cost.
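The guidance above can be sketched as a small lookup table. This is a minimal illustration and not part of Claude Code: the task labels and the `suggest_effort` helper are hypothetical names invented here, and unknown tasks fall back to high effort only because the table above lists high as the default.

```python
from enum import Enum


class Effort(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task labels, condensed from the lists above.
EFFORT_BY_TASK = {
    "rename": Effort.LOW,
    "add_logging": Effort.LOW,
    "run_known_command": Effort.LOW,
    "format_fix": Effort.LOW,
    "well_defined_feature": Effort.MEDIUM,
    "crud_with_business_logic": Effort.MEDIUM,
    "routine_debugging": Effort.MEDIUM,
    "code_review": Effort.MEDIUM,
    "architecture": Effort.HIGH,
    "multi_file_refactor": Effort.HIGH,
    "obscure_bug": Effort.HIGH,
    "security_review": Effort.HIGH,
}


def suggest_effort(task_kind: str) -> Effort:
    """Return the suggested effort level; unknown tasks default to HIGH."""
    return EFFORT_BY_TASK.get(task_kind, Effort.HIGH)


print(suggest_effort("rename").value)
print(suggest_effort("obscure_bug").value)
```

A table like this is mainly useful as a team convention; the actual switch still happens through the controls described in the next section.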
Configuring Thinking
Claude Code provides multiple ways to control extended thinking behavior.
Quick Toggle
Press Option+T (Mac) or Alt+T (Windows/Linux) to toggle extended thinking on or off for the current conversation. A status message confirms the change:
Extended thinking enabled
This override applies only to the active session—your global default remains unchanged.
Global Default
Set your preferred default in the configuration:
/config
Navigate to Extended Thinking and choose:
- Enabled (default for complex tasks)
- Disabled (faster responses, no thinking phase)
This affects all new conversations but doesn't change active sessions.
Effort Level Adjustment
For models that support adaptive reasoning (Opus 4.6), you can adjust effort on the fly:
/model
Use left/right arrow keys to cycle through effort levels:
- Press right arrow → increase effort (low → medium → high)
- Press left arrow → decrease effort (high → medium → low)
The current effort level is displayed next to the model name.
Environment Variable
For advanced control, set the MAX_THINKING_TOKENS environment variable:
# Disable thinking entirely
export MAX_THINKING_TOKENS=0
# Set a custom budget (non-Opus models only)
export MAX_THINKING_TOKENS=15000
Note: MAX_THINKING_TOKENS is ignored on Opus 4.6 (which uses adaptive reasoning) except when set to 0, which disables thinking entirely.
Settings File
In your Claude Code settings (.claude/settings.json or via the GUI), you can configure:
{
  "alwaysThinkingEnabled": true,
  "defaultEffortLevel": "high"
}
This persists across sessions and projects.
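If you prefer to script this change, the following is a minimal sketch of merging the two keys into an existing settings file without discarding other entries. The key names are copied from the snippet above and are not checked against any schema; `update_settings` is a hypothetical helper, and the demo writes to a temporary directory rather than a real project.

```python
import json
import tempfile
from pathlib import Path


def update_settings(path: Path, changes: dict) -> dict:
    """Merge `changes` into the JSON file at `path`, keeping other keys."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.update(changes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demonstrated in a temporary directory so no real settings are touched.
with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / ".claude" / "settings.json"
    update_settings(settings_path, {"someOtherKey": 1})  # hypothetical pre-existing key
    result = update_settings(
        settings_path,
        {"alwaysThinkingEnabled": True, "defaultEffortLevel": "high"},
    )
    print(json.dumps(result, indent=2))
```

Merging rather than overwriting matters here because the same file may hold unrelated settings.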
How Thinking Works Across Models
Different Claude models handle extended thinking differently.
Opus 4.6: Adaptive Reasoning
Claude Opus 4.6 uses adaptive reasoning, which dynamically allocates thinking tokens based on:
- The configured effort level (low, medium, high)
- The detected complexity of your request
- The context and conversation history
There's no fixed budget—Opus decides how much reasoning to apply. The MAX_THINKING_TOKENS environment variable is ignored (except when set to 0 to disable thinking entirely).
Example: On high effort, a request to "refactor authentication across the app" might use 10,000+ thinking tokens. The same request on low effort might use only 2,000 tokens, resulting in a faster but less thorough plan.
Other Models: Fixed Budget
Models like Sonnet 4.5 and earlier use a fixed thinking budget of up to 31,999 tokens. The model will think until it reaches this limit or completes its reasoning, whichever comes first.
You can lower this budget with MAX_THINKING_TOKENS, but you cannot increase it beyond 31,999.
Example: If you set MAX_THINKING_TOKENS=5000, Sonnet will stop thinking after 5,000 tokens even if more reasoning would help. This saves cost but may reduce solution quality on complex problems.
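The rules for MAX_THINKING_TOKENS described in this section can be summarized in a short function. This is a sketch of the behavior as stated above, not Claude Code's actual implementation: the function name, the `adaptive_model` flag, and the choice to reject negative values are assumptions made for illustration.

```python
from typing import Optional

FIXED_BUDGET_CAP = 31_999  # ceiling for fixed-budget models, per the text above


def effective_thinking_budget(adaptive_model: bool, env_value: Optional[str]) -> Optional[int]:
    """Return the thinking budget in tokens.

    0 means thinking is disabled; None means no fixed budget
    (the adaptive model decides for itself).
    """
    requested = None
    if env_value is not None and env_value.strip():
        requested = int(env_value)
        if requested < 0:
            # Assumption: the document does not say how negatives are handled.
            raise ValueError("MAX_THINKING_TOKENS must be non-negative")

    if requested == 0:
        return 0  # disables thinking on every model
    if adaptive_model:
        return None  # any other value is ignored on adaptive models
    if requested is None:
        return FIXED_BUDGET_CAP
    return min(requested, FIXED_BUDGET_CAP)  # can lower the cap, never raise it


print(effective_thinking_budget(adaptive_model=True, env_value="15000"))
print(effective_thinking_budget(adaptive_model=False, env_value="5000"))
print(effective_thinking_budget(adaptive_model=False, env_value="50000"))
```

In a shell session you would pass `os.environ.get("MAX_THINKING_TOKENS")` as `env_value`.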
The opusplan Model
Claude Code offers a special opusplan model that combines the strengths of both Opus and Sonnet:
- Planning phase uses Opus 4.6 with extended thinking for complex reasoning
- Execution phase uses Sonnet 4.5 for efficient code generation
This hybrid approach gives you:
- Deep reasoning for architectural decisions and multi-step plans
- Fast, cost-effective implementation of those plans
- The best balance of quality and efficiency for large features
To use opusplan:
/model
Select opusplan from the model list. The model will automatically switch between Opus (planning) and Sonnet (execution) based on the task phase.
When to Use opusplan: Perfect for large feature implementations, complex refactors, and projects where you want high-quality planning but don't need Opus-level reasoning for every single code edit.
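Conceptually, the split described above amounts to routing each step by its phase. The sketch below only illustrates that idea and says nothing about how Claude Code performs the switch internally; the model labels, phase names, and `model_for_phase` helper are hypothetical.

```python
# Hypothetical labels; the real model identifiers may differ.
PLANNING_MODEL = "opus-4.6"
EXECUTION_MODEL = "sonnet-4.5"


def model_for_phase(phase: str) -> str:
    """Pick a model for a step based on whether it plans or executes."""
    if phase == "plan":
        return PLANNING_MODEL
    if phase == "execute":
        return EXECUTION_MODEL
    raise ValueError(f"unknown phase: {phase!r}")


steps = [
    ("plan", "design the migration"),
    ("execute", "edit models.py"),
    ("execute", "update tests"),
]
for phase, description in steps:
    print(f"{model_for_phase(phase):<10} {description}")
```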
Cost Considerations
Extended thinking tokens are billed as output tokens for the model you're using, not at the cheaper input rate. Since thinking can add thousands of tokens to a request, costs can escalate quickly on high-effort tasks.
Example Cost Calculation
Assume you're using Opus 4.6 with the following pricing:
- Input tokens: $15 per 1M tokens
- Thinking tokens: $75 per 1M tokens (billed as output tokens)
- Output tokens: $75 per 1M tokens
A complex refactoring request might use:
- 8,000 input tokens (your request + file context)
- 12,000 thinking tokens (high effort adaptive reasoning)
- 6,000 output tokens (code changes and explanation)
Total cost: 8,000 × $15/1M + (12,000 + 6,000) × $75/1M = $0.12 + $1.35 = $1.47
Compare this to low effort on the same request:
- 8,000 input tokens
- 3,000 thinking tokens (minimal reasoning)
- 5,000 output tokens
Total cost: 8,000 × $15/1M + (3,000 + 5,000) × $75/1M = $0.12 + $0.60 = $0.72
Savings: $0.75 per request, a 51% cost reduction. Over hundreds of requests, this adds up.
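To rerun this arithmetic with your own numbers, here is a minimal sketch of a cost function. The `Usage` class and `request_cost` function are names invented for illustration, and the token counts and rates are the assumed figures from the example. The thinking rate is a separate argument because the total is very sensitive to it; the loop evaluates both effort scenarios with thinking priced at the assumed input rate and at the assumed output rate, so you can compare against your actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    thinking_tokens: int
    output_tokens: int


def request_cost(usage: Usage, input_rate: float, thinking_rate: float, output_rate: float) -> float:
    """Return the request cost in dollars; rates are dollars per 1M tokens."""
    total = (
        usage.input_tokens * input_rate
        + usage.thinking_tokens * thinking_rate
        + usage.output_tokens * output_rate
    )
    return total / 1_000_000


# Assumed example rates (dollars per 1M tokens) and token counts.
INPUT_RATE = 15.0
OUTPUT_RATE = 75.0
high = Usage(input_tokens=8_000, thinking_tokens=12_000, output_tokens=6_000)
low = Usage(input_tokens=8_000, thinking_tokens=3_000, output_tokens=5_000)

for label, thinking_rate in (
    ("thinking at input rate", INPUT_RATE),
    ("thinking at output rate", OUTPUT_RATE),
):
    high_cost = request_cost(high, INPUT_RATE, thinking_rate, OUTPUT_RATE)
    low_cost = request_cost(low, INPUT_RATE, thinking_rate, OUTPUT_RATE)
    saving = 1 - low_cost / high_cost
    print(f"{label}: high=${high_cost:.2f} low=${low_cost:.2f} saving={saving:.0%}")
```

Keeping the rates as arguments rather than constants inside the function makes it easy to swap in the current prices for whichever model you use.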
Cost Optimization Strategies
- Match effort to complexity — Don't use high effort for simple tasks
- Use opusplan for large features — Get deep planning without paying Opus rates for every code edit
- Disable thinking for known workflows — When running familiar commands, thinking adds little value
- Monitor usage in verbose mode — See how many thinking tokens are actually being used
- Iterate incrementally — Break large problems into smaller steps to reduce per-request complexity
Common Misconceptions
"Think" Phrases Don't Allocate Tokens
Some users add phrases like "think carefully" or "ultrathink" to their prompts, assuming this allocates more thinking tokens. This is incorrect.
These phrases are interpreted as regular prompt instructions—they might influence the model's behavior slightly, but they don't trigger the extended thinking mechanism or allocate additional thinking tokens.
To actually increase thinking depth, you need to:
- Increase the effort level (/model → adjust with arrow keys)
- Ensure extended thinking is enabled (Option+T / Alt+T)
- Use a model that supports extended thinking (Opus 4.6, Sonnet 4.5, etc.)
Verbose Mode Doesn't Change Thinking
Toggling verbose mode (Ctrl+O / Cmd+O) only changes what you see—it doesn't affect how much thinking actually happens. The model thinks the same amount whether verbose mode is on or off; the toggle just controls visibility of the thinking summary.
Thinking Isn't "Hidden Output"
Some users assume thinking content is secretly included in every response. This is not the case—thinking happens in a separate phase before the output phase, and the thinking tokens are summarized rather than fully reproduced.
When verbose mode is off, you don't see any thinking content, but it still happened and you still paid for those tokens.
Exercise: Compare Effort Levels
Let's see adaptive reasoning in action by solving the same complex problem at different effort levels.
Difficulty: intermediate. Time: 20 min.
Task: Ask Claude to design a rate-limiting system for an API.
Your prompt:
Design a rate-limiting system for a REST API. It should support per-user limits (100 requests/hour), global limits (10,000 requests/hour across all users), and burst allowances. Consider scalability, Redis vs. in-memory storage, and how to handle distributed servers. Provide a concrete implementation plan.
Steps:
1. Enable verbose mode (Ctrl+O / Cmd+O) so you can see the thinking process.
2. Set effort to low (/model → left arrow until "low" is shown) and submit your prompt. Observe how much thinking happens and the depth of the response.
3. Start a new conversation, set effort to medium, and submit the same prompt. Compare the thinking depth and response quality.
4. Start a third conversation, set effort to high (default), and submit the prompt again. Note the difference in reasoning thoroughness.
5. Review all three responses. Which effort level gave you the best balance of speed, cost, and quality for this specific task?
Reflection Questions:
- Did low effort miss important considerations that medium or high effort caught?
- Was high effort "overkill" for this task, or did the extra reasoning add real value?
- How many thinking tokens were used at each level? (Check the verbose output summary.)
- Would you use a different effort level if you were implementing vs. just brainstorming?
Expected Outcome: You'll likely find that high effort provides the most comprehensive analysis (considering edge cases, tradeoffs, and scalability issues), while low effort gives a functional but less nuanced solution. Medium effort often hits the "sweet spot" for well-defined engineering problems.
Practical Guidelines
When to Increase Effort
Increase to high effort when:
- You're architecting a new system or major feature
- The problem has unclear requirements or multiple valid approaches
- You need to evaluate security, scalability, or performance tradeoffs
- You're debugging an issue where the root cause is unclear
- The decision has long-term implications (framework choice, data model design, etc.)
When to Decrease Effort
Decrease to low effort when:
- The task is well-defined with a clear solution path
- You're making routine changes (renames, formatting, logging)
- Speed matters more than exhaustive analysis
- You're running known commands or scripts
- The cost of "overthinking" outweighs the risk of missing an edge case
Hybrid Approach
For large projects, consider a hybrid approach:
- Planning phase: Use high effort (or opusplan) to architect the feature
- Implementation phase: Drop to medium or low effort for routine code edits
- Review phase: Increase to medium effort for debugging and optimization
This balances thorough upfront planning with efficient execution.
Summary
Extended thinking gives Claude space to reason before responding, leading to higher-quality solutions for complex problems. Opus 4.6's adaptive reasoning dynamically allocates thinking tokens based on effort level (low, medium, high), while other models use a fixed budget.
You control thinking via:
- Quick toggle (Option+T / Alt+T)
- Effort level adjustment (/model + arrow keys)
- Global configuration (/config)
- Environment variable (MAX_THINKING_TOKENS)
Remember: you're charged for all thinking tokens, so match effort to complexity. Use high effort for architectural decisions and hard bugs; use low effort for simple tasks. For large features, opusplan gives you the best of both worlds—deep planning with Opus, efficient execution with Sonnet.
Key Takeaway
Extended thinking is powerful but not free. Match your effort level to task complexity — use high effort for architectural decisions and hard bugs, low effort for simple tasks, and consider disabling thinking entirely for routine operations. For large features, opusplan gives you deep planning with Opus and efficient execution with Sonnet.