Context Window Mechanics

Understanding how context windows work — not just that they’re limited — enables you to make better engineering decisions across any AI coding agent (Claude Code, Cursor, Copilot Workspace, and others).

In transformer architectures, each token attends to every other token in the context window. This creates n² pairwise relationships. As context grows:

  1. Each token’s “share” of attention decreases
  2. The model must spread its attention budget across more relationships
  3. Earlier tokens receive progressively less attention
  4. Instructions at the beginning of context can be “diluted” by later content

This is why a model with a 200k-token window doesn’t perform equally well across all 200k tokens: performance varies dramatically across the window.
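
To make the scaling concrete, here is a small illustrative calculation. It is plain arithmetic under a uniform-attention assumption, not a measurement of any particular model:

```python
# Illustrative only: how pairwise attention relationships and the average
# per-token "share" of attention scale as the context grows.
for n in (1_000, 10_000, 100_000, 200_000):
    pairs = n * n          # every token attends to every other token
    avg_share = 1 / n      # uniform-attention approximation of each token's share
    print(f"{n:>9,} tokens -> {pairs:>18,} pairwise relationships, "
          f"avg share per token = {avg_share:.6f}")
```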

Models exhibit strong recency bias — they attend more strongly to recent tokens. This has practical implications (one way to apply it is sketched after this list):

  • Instructions near the end of context are followed more reliably
  • The system prompt (beginning of context) can be overridden by later content
  • Recent file reads have more influence than earlier ones
  • The last correction you give matters more than the first
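
One way to work with this bias rather than against it is to restate the most important constraint at the end of the prompt. The sketch below uses a generic chat-message structure and a hypothetical `critical_constraint` argument; it is not tied to any particular agent’s API:

```python
# Sketch: keep the system prompt at the start, but repeat the constraint
# that must not be violated as the final message, where recency bias
# makes it most likely to be followed.
def build_messages(system_prompt: str, history: list[dict], critical_constraint: str) -> list[dict]:
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": f"Reminder: {critical_constraint}"})
    return messages
```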

When an AI coding agent hits roughly 95% context utilization, it typically triggers auto-compaction (sketched in code after this list):

  1. The full conversation history is passed to a summarization model
  2. The model preserves: architectural decisions, unresolved issues, implementation details, modified file list
  3. The model discards: redundant tool outputs, duplicate messages, resolved discussions
  4. The conversation continues with the compressed summary
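
A rough sketch of that loop in Python, assuming a hypothetical `count_tokens` tokenizer and `summarize` summarization call (real agents differ in their thresholds and formats):

```python
CONTEXT_LIMIT = 200_000        # assumed window size for this sketch
COMPACT_THRESHOLD = 0.95       # the ~95% utilization trigger described above

def maybe_compact(messages, count_tokens, summarize):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < COMPACT_THRESHOLD * CONTEXT_LIMIT:
        return messages  # still under the trigger point; nothing to do
    # Condense the history: keep decisions, open issues, and modified files;
    # drop redundant tool output and resolved discussion.
    summary = summarize(messages)
    return [{"role": "user", "content": f"Summary of prior work:\n{summary}"}]
```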

What survives compaction:

  • Key decisions and their rationale
  • Current task state and progress
  • File modification history
  • Active constraints and requirements

What typically doesn’t survive:

  • Exact code snippets from earlier reads
  • Build/test output details
  • The full text of earlier discussion
  • Nuanced instructions from early in the conversation

You can influence what compaction preserves via your agent configuration file:

```markdown
## Compaction Instructions
When compacting, always preserve:
- The full list of modified files
- All test commands that have been run
- The current implementation plan and progress
- Any architectural decisions made during this session
```

In tools that support manual compaction, you can trigger it directly with specific instructions; see the Tool Configuration Reference for your tool’s compact command. For example:

```
Compact your context. Focus on the API migration changes. Preserve the file list,
migration sequence, and remaining steps. Discard exploration output.
```

Some AI coding agents support selective compaction from a checkpoint rather than compacting the entire conversation. This condenses messages from a selected point forward while keeping earlier context intact — useful when exploration filled the context but you want to preserve the initial plan. Check your tool’s documentation for equivalent checkpoint or session-management features.
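
A minimal sketch of the idea, again with a hypothetical `summarize` call: everything before the checkpoint (for example, the approved plan) is kept verbatim, and only the later exploration is condensed:

```python
def compact_from_checkpoint(messages, checkpoint_index, summarize):
    kept = messages[:checkpoint_index]                   # plan and early decisions stay intact
    condensed = summarize(messages[checkpoint_index:])   # noisy exploration gets squashed
    return kept + [{"role": "user", "content": f"Condensed exploration:\n{condensed}"}]
```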

| Situation | Action |
|---|---|
| Starting a new task | Clear your context or start a fresh session for a clean start |
| Between research and planning | Compact your context, preserving research findings |
| Between planning and implementation | Compact your context, preserving the plan |
| After fixing a bug | Clear your context if moving to unrelated work |
| Context at 60% during a complex task | Consider compacting proactively |
| Context at 80%+ | Compact immediately or start fresh |
| After 2+ failed correction attempts | Clear your context and start with a better prompt |