
Context Engineering

From Prompt Engineering to Context Engineering


Prompt engineering is writing a good prompt. Context engineering is the discipline of curating and maintaining the optimal set of tokens during LLM inference — across system instructions, tools, external data, and message history.

The distinction matters because agentic AI doesn’t operate on single prompts. It operates across extended interactions where the full context state — everything the model can see — determines output quality.

“Find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.”

— Anthropic Engineering

Every token in the context window depletes a finite attention resource. In transformer architectures, each token must attend to every other token (n² pairwise relationships). As context grows, these relationships stretch thin.
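The quadratic cost is easy to see with a rough calculation (a simplified sketch that ignores optimizations such as windowed or sparse attention):

```typescript
// Pairwise attention relationships grow quadratically with context length.
function attentionPairs(tokens: number): number {
  return tokens * tokens;
}

// Doubling the context quadruples the attention work:
attentionPairs(1_000); // 1,000,000 pairwise relationships
attentionPairs(2_000); // 4,000,000
attentionPairs(4_000); // 16,000,000
```

This is why trimming even a few thousand low-signal tokens has an outsized effect: the savings compound across every other token in the window.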

The practical impact:

| Context Utilization | Typical Behavior |
| --- | --- |
| 0-40% | Optimal reasoning, strong instruction following |
| 40-60% | Good performance, slight recall degradation |
| 60-80% | Noticeable degradation, may miss earlier instructions |
| 80-95% | Significant quality loss, “forgetting” earlier context |
| 95%+ | Auto-compaction triggers, potential information loss |
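These thresholds can be turned into a simple guard in agent tooling. A minimal sketch (the band boundaries are the ones from the table above, not a universal constant):

```typescript
// Classify context utilization into the behavior bands described above.
function utilizationBand(percent: number): string {
  if (percent < 40) return "optimal";
  if (percent < 60) return "good";
  if (percent < 80) return "degrading";
  if (percent < 95) return "significant loss";
  return "auto-compaction";
}

utilizationBand(35); // "optimal"
utilizationBand(72); // "degrading"
```

A status line or pre-flight check built on a function like this lets you act on degradation before it shows up in output quality.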

As an agentic session progresses, the context accumulates:

  • File contents from exploration (often the largest consumer)
  • Failed approaches and their error messages
  • Stale information about files that have since been modified
  • Tool outputs that served their purpose but remain in context

This is context rot — the progressive contamination of context with low-signal tokens that dilute the high-signal ones.

The best token is one you never load. Before adding anything to context, ask: Does the agent need this to complete the task?

Anti-pattern:

Here's the entire auth module: [2000 lines of code]
Now add rate limiting.

Engineered approach:

Add rate limiting to the auth endpoints.
Look at src/middleware/auth.ts for the current auth flow.
Follow the pattern in src/middleware/cors.ts as a reference.

The AI agent loads only what it needs, when it needs it.

Use clear delimiters, headers, and semantic organization. Models process structured information more reliably than flat text.

## Current Behavior
The /api/users endpoint returns all users without pagination.
## Desired Behavior
Add cursor-based pagination with a default page size of 20.
## Constraints
- Must be backwards-compatible (unpaginated requests still work)
- Use the same pagination pattern as /api/orders (see src/api/orders.ts)
## Verification
- Existing tests must pass
- Add tests for: first page, middle page, last page, empty results
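A structure like the one above is easy to generate programmatically. A minimal sketch (the helper name and section titles are just illustrative, taken from this example):

```typescript
// Build a structured prompt from named sections, using markdown headers as delimiters.
function buildPrompt(sections: Record<string, string>): string {
  return Object.entries(sections)
    .map(([title, body]) => `## ${title}\n${body.trim()}`)
    .join("\n\n");
}

const prompt = buildPrompt({
  "Current Behavior": "The /api/users endpoint returns all users without pagination.",
  "Desired Behavior": "Add cursor-based pagination with a default page size of 20.",
});
```

The payoff is consistency: every task the agent receives has the same predictable shape, which models follow more reliably than free-form text.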

Different tasks should use different contexts. This is the principle behind sub-agents and the Research → Plan → Implement workflow.

┌──────────────────────────────────────┐
│      Main Agent (Orchestrator)       │
│   Clean context: plan + decisions    │
│                                      │
│    ┌──────────────────────┐          │
│    │ Research Sub-Agent   │          │
│    │ Reads 50 files       │          │
│    │ Returns 500 tokens   │          │
│    └──────────────────────┘          │
│                                      │
│    ┌──────────────────────┐          │
│    │ Implementation Agent │          │
│    │ Clean context + plan │          │
│    │ Worktree isolation   │          │
│    └──────────────────────┘          │
└──────────────────────────────────────┘

A research sub-agent might consume 50,000 tokens exploring files but return only a 1,000-token summary. The main agent’s context stays clean.
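The pattern can be sketched as follows. `runResearchSubAgent` is a hypothetical stand-in for whatever your agent framework provides; the point is the shape of the return value:

```typescript
// Sub-agent isolation: exploration happens in a separate context,
// and only a short summary crosses back into the main agent.
interface ResearchResult {
  summary: string;        // what the main agent actually sees (~1,000 tokens)
  tokensConsumed: number; // spent inside the sub-agent's own context, then discarded
}

// Hypothetical sub-agent call; a real one would spawn an agent with a clean context.
function runResearchSubAgent(question: string): ResearchResult {
  // Imagine 50 file reads happening here, inside the isolated context.
  return { summary: `Findings for: ${question}`, tokensConsumed: 50_000 };
}

// The orchestrator appends only the summary to its own context.
const { summary } = runResearchSubAgent("How does the auth middleware work?");
```

The 50,000 tokens of exploration never touch the orchestrator; only the distilled summary does.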

Don’t wait for auto-compaction at 95%. Trigger a context compaction with specific instructions at natural breakpoints. Use your tool’s compaction command or start a fresh session, passing explicit preservation instructions:

Compact context. Preserve: the API design decisions, the list of files modified,
and the test commands. Discard: file exploration, build logs, failed approaches.

The Frequent Intentional Compaction (FIC) methodology targets 40-60% context utilization:

  1. Complete a phase (research, planning, or implementation)
  2. Compact with phase-specific instructions
  3. Begin the next phase with clean context
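In tooling terms, the trigger condition is trivial. A sketch (the 60% ceiling comes from the FIC target band above):

```typescript
// Decide whether to compact at a phase boundary, per the FIC 40-60% target.
function shouldCompactAtPhaseEnd(tokensUsed: number, contextWindow: number): boolean {
  return tokensUsed / contextWindow > 0.6;
}

shouldCompactAtPhaseEnd(130_000, 200_000); // true: 65% used, compact before the next phase
shouldCompactAtPhaseEnd(80_000, 200_000);  // false: 40% used, room for another phase
```

The decision is cheap; what matters is pairing it with phase-specific preservation instructions rather than letting a generic auto-compaction decide what survives.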

Critical information should exist outside the context window:

  • Agent configuration files — loaded automatically each session (see Tool Configuration Reference for filename conventions)
  • Scratchpad files — NOTES.md, PLAN.md for active work
  • Memory system — persists across compaction boundaries
  • Git commits — the ultimate persistence mechanism
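A scratchpad can be as simple as appending to a file the agent re-reads after compaction. A minimal sketch in Node.js (the NOTES.md filename follows the convention above; the helper name is made up):

```typescript
import { appendFileSync, readFileSync } from "node:fs";

// Append a decision to the scratchpad; it survives compaction because it
// lives on disk, not in the context window.
function noteDecision(path: string, decision: string): void {
  appendFileSync(path, `- ${decision}\n`);
}

noteDecision("NOTES.md", "Use cursor-based pagination, default page size 20");
// After compaction, the agent reloads it: readFileSync("NOTES.md", "utf8")
```

Git commits serve the same role at a coarser granularity: a well-written commit message is context the agent can recover with a single `git log`.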

Rather than pre-loading everything, let agents discover context at runtime:

Look at how existing widgets work on the home page.
HotDogWidget.php is a good example. Follow that pattern.

The agent reads only what’s relevant, guided by your pointer.

Track these signals during sessions:

  1. Context utilization % — configure a status line to show this continuously
  2. Instruction adherence — is the AI agent following your agent configuration file rules?
  3. Recall accuracy — does the AI agent remember decisions from earlier in the session?
  4. Response quality — are outputs getting less precise or more generic?

If any of these degrade, it’s time to compact your context or start a fresh session.

For tool-specific instructions on configuring status lines, compaction commands, and memory systems, see the Tool Configuration Reference.

  • Context is a finite resource with diminishing returns — treat it as precious
  • Target 40-60% utilization for complex reasoning tasks
  • Use sub-agents for exploration to keep the main context clean
  • Compact proactively between phases, not just when forced
  • Let agents discover context dynamically rather than pre-loading everything
  • Persist critical information outside the context window