# Context Engineering
## From Prompt Engineering to Context Engineering

Prompt engineering is writing a good prompt. Context engineering is the discipline of curating and maintaining the optimal set of tokens during LLM inference — across system instructions, tools, external data, and message history.
The distinction matters because agentic AI doesn’t operate on single prompts. It operates across extended interactions where the full context state — everything the model can see — determines output quality.
> “Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.”
>
> — Anthropic Engineering
## Why Context Degrades

### The Attention Budget

Every token in the context window depletes a finite attention resource. In transformer architectures, each token must attend to every other token (n² pairwise relationships). As context grows, these relationships stretch thin.
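The quadratic growth is easy to see with a toy calculation. A minimal sketch (the function name is illustrative, not from any library):

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise attention relationships for n tokens (n * n)."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention workload.
for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_pairs(n):>12,} pairwise relationships")
```

Four times the tokens means sixteen times the pairwise relationships, which is why each additional token carries a real cost.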
The practical impact:
| Context Utilization | Typical Behavior |
|---|---|
| 0-40% | Optimal reasoning, strong instruction following |
| 40-60% | Good performance, slight recall degradation |
| 60-80% | Noticeable degradation, may miss earlier instructions |
| 80-95% | Significant quality loss, “forgetting” earlier context |
| 95%+ | Auto-compaction triggers, potential information loss |
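The table above can be read as a simple threshold function. A sketch only: the bands are the heuristics from the table, not measured benchmarks:

```python
def utilization_band(pct: float) -> str:
    """Rough expected behavior for a given context-utilization percentage."""
    if pct < 40:
        return "optimal reasoning, strong instruction following"
    if pct < 60:
        return "good performance, slight recall degradation"
    if pct < 80:
        return "noticeable degradation, may miss earlier instructions"
    if pct < 95:
        return "significant quality loss, forgetting earlier context"
    return "auto-compaction triggers, potential information loss"
```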
### Context Rot

As an agentic session progresses, the context accumulates:
- File contents from exploration (often the largest consumer)
- Failed approaches and their error messages
- Stale information about files since modified
- Tool outputs that served their purpose but remain in context
This is context rot — the progressive contamination of context with low-signal tokens that dilute the high-signal ones.
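Countering rot means actively trimming low-signal entries. A minimal sketch, assuming a hypothetical `Message` record with a `stale` flag; a real agent would summarize rather than discard outright:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str            # "user", "assistant", or "tool"
    content: str
    stale: bool = False  # marked when the underlying file has since changed

def prune_context(messages: list[Message], keep_recent_tools: int = 3) -> list[Message]:
    """Drop stale entries and all but the most recent tool outputs."""
    tool_indices = [i for i, m in enumerate(messages) if m.role == "tool"]
    recent_tools = set(tool_indices[-keep_recent_tools:])
    return [
        m for i, m in enumerate(messages)
        if not m.stale and (m.role != "tool" or i in recent_tools)
    ]
```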
## The Five Pillars of Context Engineering

### 1. Minimize: Load Less

The best token is one you never load. Before adding anything to context, ask: Does the agent need this to complete the task?
Anti-pattern:

```
Here's the entire auth module: [2000 lines of code]

Now add rate limiting.
```

Engineered approach:

```
Add rate limiting to the auth endpoints.
Look at src/middleware/auth.ts for the current auth flow.
Follow the pattern in src/middleware/cors.ts as a reference.
```

The AI agent loads only what it needs, when it needs it.
### 2. Structure: Organize Information

Use clear delimiters, headers, and semantic organization. Models process structured information more reliably than flat text.

```
## Current Behavior
The /api/users endpoint returns all users without pagination.

## Desired Behavior
Add cursor-based pagination with a default page size of 20.

## Constraints
- Must be backwards-compatible (unpaginated requests still work)
- Use the same pagination pattern as /api/orders (see src/api/orders.ts)

## Verification
- Existing tests must pass
- Add tests for: first page, middle page, last page, empty results
```

### 3. Isolate: Separate Concerns

Different tasks should use different contexts. This is the principle behind sub-agents and the Research → Plan → Implement workflow.
```
┌─────────────────────────────────┐
│ Main Agent (Orchestrator)       │
│ Clean context: plan + decisions │
├────────────┬────────────────────┤
│            │                    │
│  ┌─────────▼──────────┐         │
│  │ Research Sub-Agent │         │
│  │ Reads 50 files     │         │
│  │ Returns 500 tokens │         │
│  └────────────────────┘         │
│                                 │
│  ┌──────────────────────┐       │
│  │ Implementation Agent │       │
│  │ Clean context + plan │       │
│  │ Worktree isolation   │       │
│  └──────────────────────┘       │
└─────────────────────────────────┘
```

A research sub-agent might consume 50,000 tokens exploring files — but returns a 1,000-token summary. The main agent’s context stays clean.
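The isolation boundary can be sketched as a function whose local variables never escape. All names and paths here are illustrative:

```python
def research_subagent(question: str, paths: list[str], read_file) -> str:
    """Explore files in an isolated context and return only a short summary.

    The raw file contents never leave this function; only the summary
    crosses back into the orchestrator's context.
    """
    total_chars = sum(len(read_file(p)) for p in paths)
    return f"Reviewed {len(paths)} files ({total_chars} chars) for: {question}"

# The orchestrator records the summary, not the files it came from.
main_context: list[str] = []
main_context.append(
    research_subagent("How does session auth work?",
                      ["src/auth.ts", "src/session.ts"],  # hypothetical paths
                      lambda p: "x" * 25_000)
)
```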
### 4. Refresh: Compact Proactively

Don’t wait for auto-compaction at 95%. Trigger a context compaction with specific instructions at natural breakpoints. Use your tool’s compaction command or start a fresh session, passing explicit preservation instructions:

```
Compact context. Preserve: the API design decisions, the list of files modified,
and the test commands. Discard: file exploration, build logs, failed approaches.
```

The Frequent Intentional Compaction (FIC) methodology targets 40-60% context utilization:
- Complete a phase (research, planning, or implementation)
- Compact with phase-specific instructions
- Begin the next phase with clean context
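The FIC trigger condition above can be expressed as a small predicate. A sketch, assuming the 60% upper bound of the target band:

```python
def should_compact(tokens_used: int, window: int, phase_done: bool,
                   target: float = 0.6) -> bool:
    """Compact at a phase boundary, or when utilization leaves the target band.

    Mirrors the FIC idea: compact deliberately, don't wait for the
    forced auto-compaction near 95%.
    """
    utilization = tokens_used / window
    return phase_done or utilization > target
```

For example, 130,000 tokens used in a 200,000-token window is 65% utilization, so the predicate fires even mid-phase.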
### 5. Persist: Save Outside Context

Critical information should exist outside the context window:

- Agent configuration files — loaded automatically each session (see Tool Configuration Reference for filename conventions)
- Scratchpad files — `NOTES.md` and `PLAN.md` for active work
- Memory system — persists across compaction boundaries
- Git commits — the ultimate persistence mechanism
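The scratchpad pattern is just an append-only file the agent re-reads after compaction. A minimal sketch (the helper name is hypothetical):

```python
from pathlib import Path

def persist_decision(note: str, scratchpad: Path) -> None:
    """Append a decision to a scratchpad file that survives compaction.

    After compaction or a fresh session, the agent re-reads this file
    instead of relying on conversational memory.
    """
    with scratchpad.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

# e.g. persist_decision("Use cursor-based pagination", Path("NOTES.md"))
```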
## Dynamic Context Retrieval

Rather than pre-loading everything, let agents discover context at runtime:

```
Look at how existing widgets work on the home page.
HotDogWidget.php is a good example. Follow that pattern.
```

The agent reads only what’s relevant, guided by your pointer.

```
Explore the auth module. Start with the entry point
and trace the session flow. Document what you find.
```

The agent incrementally discovers context, building understanding organically.

```
Read the agent configuration file for the project
structure overview, then use grep to find all rate-limiting-related code.
```

This combines upfront retrieval (the agent configuration file) with runtime discovery (grep).
## Measuring Context Health

Track these signals during sessions:
- Context utilization % — configure a status line to show this continuously
- Instruction adherence — is the AI agent following your agent configuration file rules?
- Recall accuracy — does the AI agent remember decisions from earlier in the session?
- Response quality — are outputs getting less precise or more generic?
If any of these degrade, it’s time to compact your context or start a fresh session.
For tool-specific instructions on configuring status lines, compaction commands, and memory systems, see the Tool Configuration Reference.
## Key Takeaways

- Context is a finite resource with diminishing returns — treat it as precious
- Target 40-60% utilization for complex reasoning tasks
- Use sub-agents for exploration to keep the main context clean
- Compact proactively between phases, not just when forced
- Let agents discover context dynamically rather than pre-loading everything
- Persist critical information outside the context window