Extended Thinking & Reasoning Budgets | Claude Code Fundamentals

Learning Objectives

Understand what extended thinking is and when it provides value
Configure thinking budgets and effort levels for different task types
Use adaptive reasoning on Opus 4.6 to balance speed and quality
Identify when to use high vs. low effort thinking
Optimize costs by matching reasoning depth to task complexity

What is Extended Thinking?

Extended thinking gives Claude space to reason step-by-step before formulating a response. It's enabled by default in Claude Code, allowing the model to "think through" complex problems internally before presenting a solution.

This reasoning process happens in a special "thinking phase" where Claude can:

Break down multi-step problems into logical sequences
Evaluate multiple approaches and tradeoffs
Plan file modifications across a codebase
Debug issues by systematically eliminating possibilities
Consider edge cases and potential pitfalls

The thinking content is summarized and shown in verbose mode (toggle with Ctrl+O or Cmd+O). In normal mode, you only see the final response—but the reasoning still happened behind the scenes.

When Extended Thinking Adds Value

Extended thinking is particularly valuable for:

Complex architectural decisions — Evaluating framework choices, design patterns, or system structure
Challenging bugs — Reasoning through symptoms, hypotheses, and diagnostic steps
Multi-step planning — Breaking down large features into logical implementation sequences
Refactoring decisions — Weighing the tradeoffs between different code organization approaches
Security considerations — Thinking through authentication flows, data validation, and attack surfaces

For simple tasks like "rename this variable" or "add a console.log statement," extended thinking adds minimal value—and you're better off using lower effort levels to save time and cost.

Verbose Mode Tip: Press Ctrl+O (or Cmd+O on Mac) to toggle verbose mode and see Claude's thinking process. This helps you understand how the model approached your problem and builds trust in complex solutions.

Adaptive Reasoning on Opus 4.6

Claude Opus 4.6 introduces adaptive reasoning, which dynamically allocates thinking tokens based on the complexity of your request and the configured effort level. Instead of a fixed thinking budget, Opus adjusts its reasoning depth to match the task.

Effort Levels

There are three effort levels that control how much reasoning Opus applies:

Effort Level	Use Case	Speed	Cost	Reasoning Depth
Low	Simple tasks, quick fixes, known commands	Fast	Low	Minimal—just enough to complete the task
Medium	Moderate complexity, standard features	Balanced	Medium	Reasonable tradeoff evaluation
High (default)	Complex problems, architectural decisions, hard bugs	Slower	Higher	Deep multi-step reasoning and planning

Adaptive reasoning means Opus won't "overthink" a simple task even on high effort—but it will allocate more reasoning capacity when it detects complexity. Conversely, on low effort, Opus limits its reasoning budget even for complex tasks, prioritizing speed over exhaustive analysis.

When to Use Each Effort Level

Use High Effort when:

Architecting new features or systems
Refactoring across multiple files
Debugging obscure issues with unclear root causes
Evaluating security implications
Making decisions with significant long-term impact

Use Medium Effort when:

Implementing well-defined features
Standard CRUD operations with some business logic
Routine debugging where symptoms point to likely causes
Code reviews and optimization suggestions

Use Low Effort when:

Renaming variables, functions, or files
Adding logging or comments
Running known commands (tests, builds, deploys)
Simple formatting or style fixes
Quick documentation updates
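
The guidance above can be captured as a simple lookup. The following is a minimal sketch, not part of Claude Code: the task labels, the `Effort` enum, and `suggest_effort` are hypothetical names chosen for illustration, and unknown tasks fall back to high effort only because high is listed as the default level.

```python
from enum import Enum


class Effort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task labels, grouped the same way as the lists above.
EFFORT_BY_TASK = {
    "architecture": Effort.HIGH,
    "multi_file_refactor": Effort.HIGH,
    "obscure_bug": Effort.HIGH,
    "security_review": Effort.HIGH,
    "well_defined_feature": Effort.MEDIUM,
    "crud": Effort.MEDIUM,
    "routine_debugging": Effort.MEDIUM,
    "code_review": Effort.MEDIUM,
    "rename": Effort.LOW,
    "logging": Effort.LOW,
    "run_command": Effort.LOW,
    "formatting": Effort.LOW,
    "docs_update": Effort.LOW,
}


def suggest_effort(task_kind: str) -> Effort:
    """Return the suggested effort level for a task label.

    Unknown labels fall back to HIGH, mirroring the default effort level.
    """
    return EFFORT_BY_TASK.get(task_kind, Effort.HIGH)


for task in ("rename", "crud", "security_review"):
    print(f"{task}: {suggest_effort(task).value}")
```

A lookup like this is only a starting point; the final choice still depends on how much ambiguity the specific request carries.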

Cost Awareness: You're charged for all thinking tokens used, even though only a summary is shown. High effort on complex tasks can use thousands of thinking tokens—make sure the reasoning depth justifies the cost.

Configuring Thinking

Claude Code provides multiple ways to control extended thinking behavior.

Quick Toggle

Press Option+T (Mac) or Alt+T (Windows/Linux) to toggle extended thinking on or off for the current conversation. A status message confirms the change:

Extended thinking enabled

This override applies only to the active session—your global default remains unchanged.

Global Default

Set your preferred default in the configuration:

bash

/config

Navigate to Extended Thinking and choose:

Enabled (the default)
Disabled (faster responses, no thinking phase)

This affects all new conversations but doesn't change active sessions.

Effort Level Adjustment

For models that support adaptive reasoning (Opus 4.6), you can adjust effort on the fly:

bash

/model

Use left/right arrow keys to cycle through effort levels:

Press right arrow → increase effort (low → medium → high)
Press left arrow → decrease effort (high → medium → low)

The current effort level is displayed next to the model name.

Environment Variable

For advanced control, set the MAX_THINKING_TOKENS environment variable:

bash

# Disable thinking entirely (honored on every model, including Opus 4.6)
export MAX_THINKING_TOKENS=0

# Set a custom budget (fixed-budget models only; ignored on Opus 4.6)
export MAX_THINKING_TOKENS=15000

Note: MAX_THINKING_TOKENS is ignored on Opus 4.6 (which uses adaptive reasoning) except when set to 0, which disables thinking entirely.
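
When Claude Code is started from a script rather than an interactive shell, the same variable can be set on the child process environment. This is a minimal sketch under stated assumptions: `env_with_thinking_budget` is a hypothetical helper, and the commented launch line assumes a `claude` executable is on your PATH.

```python
import os
from typing import Mapping, Optional


def env_with_thinking_budget(
    budget: int, base: Optional[Mapping[str, str]] = None
) -> dict:
    """Return a copy of an environment with MAX_THINKING_TOKENS set.

    A budget of 0 disables thinking; the caller's environment is not modified.
    """
    if budget < 0:
        raise ValueError("budget must be zero or a positive integer")
    env = dict(os.environ if base is None else base)
    env["MAX_THINKING_TOKENS"] = str(budget)
    return env


env = env_with_thinking_budget(0, base={"PATH": "/usr/bin"})
print(env["MAX_THINKING_TOKENS"])

# To launch with this environment (assumes `claude` is installed):
# import subprocess
# subprocess.run(["claude"], env=env_with_thinking_budget(0))
```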

Settings File

In your Claude Code settings (.claude/settings.json or via the GUI), you can configure:

json

{
  "alwaysThinkingEnabled": true,
  "defaultEffortLevel": "high"
}

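Settings in a project's .claude/settings.json persist across sessions for that project. To apply them to all of your projects, put them in the user-level ~/.claude/settings.json instead.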
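
If you manage this file from a setup script, it is safer to merge keys into the existing JSON than to overwrite the file. The sketch below shows one way to do that; `update_settings` is a hypothetical helper, and the two key names are copied from the snippet above rather than independently verified.

```python
import json
import tempfile
from pathlib import Path


def update_settings(path: Path, **changes) -> dict:
    """Merge `changes` into a JSON settings file, keeping any other keys."""
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    settings.update(changes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demonstrate against a temporary directory so no real settings are touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".claude" / "settings.json"
    update_settings(target, alwaysThinkingEnabled=True)
    print(update_settings(target, defaultEffortLevel="high"))
```

Pointing `target` at a real settings file applies the same merge to your actual configuration.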

How Thinking Works Across Models

Different Claude models handle extended thinking differently.

Opus 4.6: Adaptive Reasoning

Claude Opus 4.6 uses adaptive reasoning, which dynamically allocates thinking tokens based on:

The configured effort level (low, medium, high)
The detected complexity of your request
The context and conversation history

There's no fixed budget—Opus decides how much reasoning to apply. The MAX_THINKING_TOKENS environment variable is ignored (except when set to 0 to disable thinking entirely).

Example: On high effort, a request to "refactor authentication across the app" might use 10,000+ thinking tokens. The same request on low effort might use only 2,000 tokens, resulting in a faster but less thorough plan.

Other Models: Fixed Budget

Models like Sonnet 4.5 and earlier use a fixed thinking budget of up to 31,999 tokens. The model will think until it reaches this limit or completes its reasoning, whichever comes first.

You can lower this budget with MAX_THINKING_TOKENS, but you cannot increase it beyond 31,999.

Example: If you set MAX_THINKING_TOKENS=5000, Sonnet will stop thinking after 5,000 tokens even if more reasoning would help. This saves cost but may reduce solution quality on complex problems.
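
The rules in this section can be summarized as a small decision function. This is a sketch of the behavior as described above, not Claude Code's actual implementation; `effective_thinking_budget` and its parameters are hypothetical names.

```python
from typing import Optional

FIXED_BUDGET_CAP = 31_999  # ceiling stated above for fixed-budget models


def effective_thinking_budget(
    adaptive: bool, max_thinking_tokens: Optional[int]
) -> Optional[int]:
    """Model how MAX_THINKING_TOKENS is described as being applied.

    Returns 0 when thinking is disabled, None when the model allocates
    thinking adaptively, or the token budget for fixed-budget models.
    """
    if max_thinking_tokens is not None and max_thinking_tokens < 0:
        raise ValueError("max_thinking_tokens must not be negative")
    if max_thinking_tokens == 0:
        return 0  # 0 disables thinking on every model
    if adaptive:
        return None  # any other value is ignored under adaptive reasoning
    if max_thinking_tokens is None:
        return FIXED_BUDGET_CAP  # variable unset: the full fixed budget
    return min(max_thinking_tokens, FIXED_BUDGET_CAP)  # lowered, never raised


print(effective_thinking_budget(adaptive=True, max_thinking_tokens=15_000))
print(effective_thinking_budget(adaptive=False, max_thinking_tokens=5_000))
print(effective_thinking_budget(adaptive=False, max_thinking_tokens=50_000))
```

Passing `adaptive=True` stands in for Opus 4.6 and `adaptive=False` for fixed-budget models such as Sonnet 4.5.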

The `opusplan` Model

Claude Code offers a special opusplan model that combines the strengths of both Opus and Sonnet:

Planning phase uses Opus 4.6 with extended thinking for complex reasoning
Execution phase uses Sonnet 4.5 for efficient code generation

This hybrid approach gives you:

Deep reasoning for architectural decisions and multi-step plans
Fast, cost-effective implementation of those plans
The best balance of quality and efficiency for large features

To use opusplan:

bash

/model

Select opusplan from the model list. Claude Code then uses Opus while in plan mode and switches to Sonnet automatically when it moves on to execution.

When to Use opusplan: Perfect for large feature implementations, complex refactors, and projects where you want high-quality planning but don't need Opus-level reasoning for every single code edit.

Cost Considerations

Extended thinking tokens are billed as output tokens for the model you're using, so they are charged at the output rate rather than the cheaper input rate. Since thinking can add thousands of tokens to a request, costs can escalate quickly on high-effort tasks.

Example Cost Calculation

Assume you're using Opus 4.6 with the following illustrative pricing:

Input tokens: $15 per 1M tokens
Output tokens: $75 per 1M tokens
Thinking tokens: $75 per 1M tokens (billed as output)

A complex refactoring request might use:

8,000 input tokens (your request + file context)
12,000 thinking tokens (high effort adaptive reasoning)
6,000 output tokens (code changes and explanation)

Total cost: 8,000 × $15/1M + (12,000 + 6,000) × $75/1M = $0.12 + $1.35 = $1.47

Compare this to low effort on the same request:

8,000 input tokens
3,000 thinking tokens (minimal reasoning)
5,000 output tokens

Total cost: 8,000 × $15/1M + (3,000 + 5,000) × $75/1M = $0.12 + $0.60 = $0.72

Savings: $0.75 per request, or a 51% cost reduction. Over hundreds of requests, this adds up.
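
This kind of comparison is easy to script for your own token counts. The sketch below is illustrative only: `Rates`, `request_cost`, and `savings_percent` are hypothetical names, the prices are the assumed figures from this example rather than values read from any pricing API, and the thinking rate is set equal to the output rate as an assumption you should check against your own bill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in US dollars per one million tokens (all values are assumptions)."""

    input_per_mtok: float
    output_per_mtok: float
    thinking_per_mtok: float


def request_cost(
    input_tokens: int, thinking_tokens: int, output_tokens: int, rates: Rates
) -> float:
    """Return the dollar cost of one request under the given rates."""
    total = (
        input_tokens * rates.input_per_mtok
        + thinking_tokens * rates.thinking_per_mtok
        + output_tokens * rates.output_per_mtok
    )
    return total / 1_000_000


def savings_percent(baseline: float, alternative: float) -> float:
    """Return how much cheaper `alternative` is than `baseline`, in percent."""
    return (baseline - alternative) / baseline * 100


# Assumption: thinking tokens are charged at the output rate.
rates = Rates(input_per_mtok=15.0, output_per_mtok=75.0, thinking_per_mtok=75.0)

high = request_cost(8_000, 12_000, 6_000, rates)  # high-effort token counts
low = request_cost(8_000, 3_000, 5_000, rates)    # low-effort token counts

print(f"high effort: ${high:.2f}")
print(f"low effort:  ${low:.2f}")
print(f"savings:     {savings_percent(high, low):.0f}%")
```

Keeping the thinking rate as its own field makes it straightforward to re-run the comparison if pricing changes.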

Cost Optimization Strategies

Match effort to complexity — Don't use high effort for simple tasks
Use opusplan for large features — Get deep planning without paying Opus rates for every code edit
Disable thinking for known workflows — When running familiar commands, thinking adds little value
Monitor usage in verbose mode — See how many thinking tokens are actually being used
Iterate incrementally — Break large problems into smaller steps to reduce per-request complexity

Common Misconceptions

"Think" Phrases Don't Allocate Tokens

Some users add phrases like "think carefully" or "ultrathink" to their prompts, assuming this allocates more thinking tokens. This is incorrect.

These phrases are interpreted as regular prompt instructions—they might influence the model's behavior slightly, but they don't trigger the extended thinking mechanism or allocate additional thinking tokens.

To actually increase thinking depth, you need to:

Increase the effort level (/model → adjust with arrow keys)
Ensure extended thinking is enabled (Option+T / Alt+T)
Use a model that supports extended thinking (Opus 4.6, Sonnet 4.5, etc.)

Verbose Mode Doesn't Change Thinking

Toggling verbose mode (Ctrl+O / Cmd+O) only changes what you see—it doesn't affect how much thinking actually happens. The model thinks the same amount whether verbose mode is on or off; the toggle just controls visibility of the thinking summary.

Thinking Isn't "Hidden Output"

Some users assume thinking content is secretly included in every response. This is not the case—thinking happens in a separate phase before the output phase, and the thinking tokens are summarized rather than fully reproduced.

When verbose mode is off, you don't see any thinking content, but it still happened and you still paid for those tokens.

Exercise: Compare Effort Levels

Let's see adaptive reasoning in action by solving the same complex problem at different effort levels.

Compare Effort Levels

Difficulty: intermediate | Time: 20 min

Task: Ask Claude to design a rate-limiting system for an API.

Your prompt:

Design a rate-limiting system for a REST API. It should support per-user limits (100 requests/hour), global limits (10,000 requests/hour across all users), and burst allowances. Consider scalability, Redis vs. in-memory storage, and how to handle distributed servers. Provide a concrete implementation plan.

Steps:

1. Enable verbose mode (Ctrl+O / Cmd+O) so you can see the thinking process.

2. Set effort to low (/model → left arrow until "low" is shown) and submit your prompt. Observe how much thinking happens and the depth of the response.

3. Start a new conversation, set effort to medium, and submit the same prompt. Compare the thinking depth and response quality.

4. Start a third conversation, set effort to high (default), and submit the prompt again. Note the difference in reasoning thoroughness.

5. Review all three responses. Which effort level gave you the best balance of speed, cost, and quality for this specific task?

Reflection Questions:

Did low effort miss important considerations that medium or high effort caught?
Was high effort "overkill" for this task, or did the extra reasoning add real value?
How many thinking tokens were used at each level? (Check the verbose output summary.)
Would you use a different effort level if you were implementing vs. just brainstorming?

Expected Outcome: You'll likely find that high effort provides the most comprehensive analysis (considering edge cases, tradeoffs, and scalability issues), while low effort gives a functional but less nuanced solution. Medium effort often hits the "sweet spot" for well-defined engineering problems.

Practical Guidelines

When to Increase Effort

Increase to high effort when:

You're architecting a new system or major feature
The problem has unclear requirements or multiple valid approaches
You need to evaluate security, scalability, or performance tradeoffs
You're debugging an issue where the root cause is unclear
The decision has long-term implications (framework choice, data model design, etc.)

When to Decrease Effort

Decrease to low effort when:

The task is well-defined with a clear solution path
You're making routine changes (renames, formatting, logging)
Speed matters more than exhaustive analysis
You're running known commands or scripts
The cost of "overthinking" outweighs the risk of missing an edge case

Hybrid Approach

For large projects, consider a hybrid approach:

Planning phase: Use high effort (or opusplan) to architect the feature
Implementation phase: Drop to medium or low effort for routine code edits
Review phase: Return to medium effort (or higher) for debugging and optimization

This balances thorough upfront planning with efficient execution.

Summary

Extended thinking gives Claude space to reason before responding, leading to higher-quality solutions for complex problems. Opus 4.6's adaptive reasoning dynamically allocates thinking tokens based on effort level (low, medium, high), while other models use a fixed budget.

You control thinking via:

Quick toggle (Option+T / Alt+T)
Effort level adjustment (/model + arrow keys)
Global configuration (/config)
Environment variable (MAX_THINKING_TOKENS)

Remember: you're charged for all thinking tokens, so match effort to complexity. Use high effort for architectural decisions and hard bugs; use low effort for simple tasks. For large features, opusplan gives you the best of both worlds—deep planning with Opus, efficient execution with Sonnet.

Key Takeaway

Extended thinking is powerful but not free. Match your effort level to task complexity — use high effort for architectural decisions and hard bugs, low effort for simple tasks, and consider disabling thinking entirely for routine operations. For large features, opusplan gives you deep planning with Opus and efficient execution with Sonnet.
