
Context Engineering

From Prompt Engineering to Context Engineering


Prompt engineering is writing a good prompt. Context engineering is the discipline of curating and maintaining the optimal set of tokens during LLM inference — across system instructions, tools, external data, and message history.

The distinction matters because agentic AI doesn’t operate on single prompts. It operates across extended interactions where the full context state — everything the model can see — determines output quality.

“Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.”

— Anthropic Engineering

Every token in the context window depletes a finite attention resource. In transformer architectures, each token must attend to every other token (n² pairwise relationships). As context grows, these relationships stretch thin.
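The quadratic cost is easy to see with a rough calculation (a simplified sketch that ignores optimizations such as windowed or sparse attention):

```typescript
// Pairwise attention relationships grow quadratically with context length.
function attentionPairs(tokens: number): number {
  return tokens * tokens;
}

// Doubling the context quadruples the attention work:
attentionPairs(1_000); // 1,000,000 pairwise relationships
attentionPairs(2_000); // 4,000,000
attentionPairs(4_000); // 16,000,000
```

This is why trimming even a few thousand low-signal tokens has an outsized effect: the savings compound across every other token in the window.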

The practical impact:

| Context Utilization | Typical Behavior |
| --- | --- |
| 0-40% | Optimal reasoning, strong instruction following |
| 40-60% | Good performance, slight recall degradation |
| 60-80% | Noticeable degradation, may miss earlier instructions |
| 80-95% | Significant quality loss, “forgetting” earlier context |
| 95%+ | Auto-compaction triggers, potential information loss |
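These thresholds can be turned into a simple guard in agent tooling. A minimal sketch (the band boundaries are the ones from the table above, not a universal constant):

```typescript
// Classify context utilization into the behavior bands described above.
function utilizationBand(percent: number): string {
  if (percent < 40) return "optimal";
  if (percent < 60) return "good";
  if (percent < 80) return "degrading";
  if (percent < 95) return "significant loss";
  return "auto-compaction";
}

utilizationBand(35); // "optimal"
utilizationBand(72); // "degrading"
```

A status line or pre-flight check built on a function like this lets you act on degradation before it shows up in output quality.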

As an agentic session progresses, the context accumulates:

  • File contents from exploration (often the largest consumer)
  • Failed approaches and their error messages
  • Stale information about files that have since been modified
  • Tool outputs that served their purpose but remain in context

This is context rot — the progressive contamination of context with low-signal tokens that dilute the high-signal ones.

The best token is one you never load. Before adding anything to context, ask: Does the agent need this to complete the task?

Anti-pattern:

Here's the entire auth module: [2000 lines of code]
Now add rate limiting.

Engineered approach:

Add rate limiting to the auth endpoints.
Look at src/middleware/auth.ts for the current auth flow.
Follow the pattern in src/middleware/cors.ts as a reference.

The AI agent loads only what it needs, when it needs it.

Use clear delimiters, headers, and semantic organization. Models process structured information more reliably than flat text.

## Current Behavior
The /api/users endpoint returns all users without pagination.
## Desired Behavior
Add cursor-based pagination with a default page size of 20.
## Constraints
- Must be backwards-compatible (unpaginated requests still work)
- Use the same pagination pattern as /api/orders (see src/api/orders.ts)
## Verification
- Existing tests must pass
- Add tests for: first page, middle page, last page, empty results
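A structure like the one above is easy to generate programmatically. A minimal sketch (the helper name and section titles are just illustrative, taken from this example):

```typescript
// Build a structured prompt from named sections, using markdown headers as delimiters.
function buildPrompt(sections: Record<string, string>): string {
  return Object.entries(sections)
    .map(([title, body]) => `## ${title}\n${body.trim()}`)
    .join("\n\n");
}

const prompt = buildPrompt({
  "Current Behavior": "The /api/users endpoint returns all users without pagination.",
  "Desired Behavior": "Add cursor-based pagination with a default page size of 20.",
});
```

The payoff is consistency: every task the agent receives has the same predictable shape, which models follow more reliably than free-form text.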

Different tasks should use different contexts. This is the principle behind sub-agents and the Research → Plan → Implement workflow.

┌──────────────────────────────────────┐
│      Main Agent (Orchestrator)       │
│   Clean context: plan + decisions    │
│                                      │
│    ┌──────────────────────┐          │
│    │ Research Sub-Agent   │          │
│    │ Reads 50 files       │          │
│    │ Returns 500 tokens   │          │
│    └──────────────────────┘          │
│                                      │
│    ┌──────────────────────┐          │
│    │ Implementation Agent │          │
│    │ Clean context + plan │          │
│    │ Worktree isolation   │          │
│    └──────────────────────┘          │
└──────────────────────────────────────┘

A research sub-agent might consume 50,000 tokens exploring files but return only a 1,000-token summary. The main agent’s context stays clean.
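The pattern can be sketched as follows. `runResearchSubAgent` is a hypothetical stand-in for whatever your agent framework provides; the point is the shape of the return value:

```typescript
// Sub-agent isolation: exploration happens in a separate context,
// and only a short summary crosses back into the main agent.
interface ResearchResult {
  summary: string;        // what the main agent actually sees (~1,000 tokens)
  tokensConsumed: number; // spent inside the sub-agent's own context, then discarded
}

// Hypothetical sub-agent call; a real one would spawn an agent with a clean context.
function runResearchSubAgent(question: string): ResearchResult {
  // Imagine 50 file reads happening here, inside the isolated context.
  return { summary: `Findings for: ${question}`, tokensConsumed: 50_000 };
}

// The orchestrator appends only the summary to its own context.
const { summary } = runResearchSubAgent("How does the auth middleware work?");
```

The 50,000 tokens of exploration never touch the orchestrator; only the distilled summary does.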

Don’t wait for auto-compaction at 95%. Trigger a context compaction with specific instructions at natural breakpoints. Use your tool’s compaction command or start a fresh session, passing explicit preservation instructions:

Compact context. Preserve: the API design decisions, the list of files modified,
and the test commands. Discard: file exploration, build logs, failed approaches.

The Frequent Intentional Compaction (FIC) methodology targets 40-60% context utilization:

  1. Complete a phase (research, planning, or implementation)
  2. Compact with phase-specific instructions
  3. Begin the next phase with clean context
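In tooling terms, the trigger condition is trivial. A sketch (the 60% ceiling comes from the FIC target band above):

```typescript
// Decide whether to compact at a phase boundary, per the FIC 40-60% target.
function shouldCompactAtPhaseEnd(tokensUsed: number, contextWindow: number): boolean {
  return tokensUsed / contextWindow > 0.6;
}

shouldCompactAtPhaseEnd(130_000, 200_000); // true: 65% used, compact before the next phase
shouldCompactAtPhaseEnd(80_000, 200_000);  // false: 40% used, room for another phase
```

The decision is cheap; what matters is pairing it with phase-specific preservation instructions rather than letting a generic auto-compaction decide what survives.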

Critical information should exist outside the context window:

  • Agent configuration files — loaded automatically each session (see Tool Configuration Reference for filename conventions)
  • Scratchpad files — NOTES.md, PLAN.md for active work
  • Memory system — persists across compaction boundaries
  • Git commits — the ultimate persistence mechanism
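A scratchpad can be as simple as appending to a file the agent re-reads after compaction. A minimal sketch in Node.js (the NOTES.md filename follows the convention above; the helper name is made up):

```typescript
import { appendFileSync, readFileSync } from "node:fs";

// Append a decision to the scratchpad; it survives compaction because it
// lives on disk, not in the context window.
function noteDecision(path: string, decision: string): void {
  appendFileSync(path, `- ${decision}\n`);
}

noteDecision("NOTES.md", "Use cursor-based pagination, default page size 20");
// After compaction, the agent reloads it: readFileSync("NOTES.md", "utf8")
```

Git commits serve the same role at a coarser granularity: a well-written commit message is context the agent can recover with a single `git log`.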

Rather than pre-loading everything, let agents discover context at runtime:

Look at how existing widgets work on the home page.
HotDogWidget.php is a good example. Follow that pattern.

The agent reads only what’s relevant, guided by your pointer.

Track these signals during sessions:

  1. Context utilization % — configure a status line to show this continuously
  2. Instruction adherence — is the AI agent following your agent configuration file rules?
  3. Recall accuracy — does the AI agent remember decisions from earlier in the session?
  4. Response quality — are outputs getting less precise or more generic?

If any of these degrade, it’s time to compact your context or start a fresh session.

For tool-specific instructions on configuring status lines, compaction commands, and memory systems, see the Tool Configuration Reference.

  • Context is a finite resource with diminishing returns — treat it as precious
  • Target 40-60% utilization for complex reasoning tasks
  • Use sub-agents for exploration to keep the main context clean
  • Compact proactively between phases, not just when forced
  • Let agents discover context dynamically rather than pre-loading everything
  • Persist critical information outside the context window