Test-Driven Agentic Development

TDD is the most natural quality methodology for AI agents. Tests serve as unambiguous specifications, provide instant feedback, and create safety nets for iteration.

Test-Driven Agentic Development (TDAD) combines traditional TDD with agent-specific optimizations. Research shows:

  • 70% reduction in test-level regressions
  • 33% improvement in resolution rate
  • Regression paradox: Naive “use TDD” prompts actually increase regressions by 9.94%

The key insight: tell agents which tests to run, not how to do TDD.

  1. Break down requirements into test cases

    The rate limiter needs to handle these scenarios:
    Basic functionality:
    - [ ] Allow 100 requests per minute from a single client
    - [ ] Return 429 on the 101st request
    - [ ] Include Retry-After header in 429 responses
    Edge cases:
    - [ ] Multiple clients have independent limits
    - [ ] Limits reset after the window expires
    - [ ] Concurrent requests near the limit are handled correctly
    Error handling:
    - [ ] Redis unavailable: fail open (allow request)
    - [ ] Invalid API key: skip rate limiting, return 401
    Write tests for each scenario. Group them in this order.
  2. Write tests first, confirm they fail

    Write the test file at src/__tests__/rateLimit.test.ts.
    Cover all scenarios listed above.
    Run the tests and confirm they all fail (red phase).
  3. Implement one group at a time

    Implement the basic functionality of the rate limiter.
    Target: make the "Basic functionality" tests pass.
    Don't worry about edge cases yet — just the basics.
    Run the basic functionality tests after implementing.
  4. Progress through all groups

    Now implement edge case handling. Make the edge case tests pass.
    Then implement error handling. Make those tests pass.
    After each group, verify no previous tests broke.
  5. Refactor while green

    All tests pass. Refactor the rate limiter for:
    - Clarity: better variable names, extracted functions
    - Performance: minimize Redis roundtrips
    Keep ALL tests green throughout refactoring.
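The five steps above converge on something like the following. This is a hedged sketch: it uses an in-memory Map where the prompts above assume Redis, and the names (`RateLimiter`, `check`, `RateLimitResult`) are illustrative, not from the source.

```typescript
// Hypothetical in-memory fixed-window rate limiter sketch.
// A real implementation would back the counters with Redis.
interface RateLimitResult {
  allowed: boolean;
  retryAfterSeconds?: number; // populated for 429 responses (Retry-After)
}

class RateLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private limit = 100,        // requests per window
    private windowMs = 60_000,  // one minute
  ) {}

  check(clientId: string, now = Date.now()): RateLimitResult {
    const entry = this.counts.get(clientId);
    // New client, or window expired: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(clientId, { count: 1, windowStart: now });
      return { allowed: true };
    }
    if (entry.count < this.limit) {
      entry.count++;
      return { allowed: true };
    }
    // Over the limit: report how long until the window resets.
    const retryAfterSeconds = Math.ceil(
      (entry.windowStart + this.windowMs - now) / 1000,
    );
    return { allowed: false, retryAfterSeconds };
  }
}
```

The fail-open behavior from the error-handling group would sit around the Redis call: a try/catch that returns `{ allowed: true }` when the store is unreachable.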
A reusable prompt template for any feature:

Generate a TDD plan for implementing [feature]:
## Test Groups
### Group 1: Core Behavior
- [ ] Write test: [behavior 1]
- [ ] Write test: [behavior 2]
- [ ] Implement to pass
### Group 2: Edge Cases
- [ ] Write test: [edge case 1]
- [ ] Write test: [edge case 2]
- [ ] Implement to pass
### Group 3: Error Handling
- [ ] Write test: [error scenario 1]
- [ ] Implement to pass
### Final
- [ ] Run full suite
- [ ] Refactor while keeping tests green

Instead of “use TDD,” tell the agent which tests matter:

Implement the payment webhook handler.
Before making changes, run these existing tests to establish baseline:
pnpm vitest run src/__tests__/webhooks.test.ts
pnpm vitest run src/__tests__/payments.test.ts
After implementation, ALL of these must still pass,
plus new tests covering:
- Valid Stripe webhook signature verification
- Idempotent processing of duplicate events
- Graceful handling of unknown event types
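A sketch of two behaviors the new tests target: idempotent processing of duplicate events and graceful handling of unknown event types. The `WebhookHandler` class, event shape, and return values here are all hypothetical; real signature verification would use Stripe's SDK and is omitted from this sketch.

```typescript
// Hypothetical webhook handler sketch. Dedupes by event id and
// acknowledges (rather than rejects) event types it doesn't know.
type WebhookEvent = { id: string; type: string; data: unknown };

class WebhookHandler {
  private processed = new Set<string>();

  handle(event: WebhookEvent): "processed" | "duplicate" | "ignored" {
    // Idempotency: redelivery of the same event id is a no-op.
    if (this.processed.has(event.id)) return "duplicate";
    this.processed.add(event.id);
    switch (event.type) {
      case "payment_intent.succeeded":
        // ...record the payment...
        return "processed";
      default:
        // Unknown event types are acknowledged, not treated as errors.
        return "ignored";
    }
  }
}
```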

Pair TDD with automated code health checks:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "pnpm tsc --noEmit 2>&1 | head -20" }
        ]
      }
    ]
  }
}

This creates a layered verification system:

  1. After each edit: Type checking (via hooks)
  2. After each test group: Targeted test run
  3. After completion: Full test suite
  4. Before commit: Lint + coverage check
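The fail-fast ordering of these layers can be sketched as a small runner. The `Layer` type and `runLayers` function are illustrative; in practice each `run` would shell out to tsc, vitest, or the linter.

```typescript
// Sketch: run cheap checks first, stop at the first failure so the agent
// gets the earliest, most specific signal.
type Layer = { name: string; run: () => boolean };

function runLayers(layers: Layer[]): { passed: string[]; failedAt?: string } {
  const passed: string[] = [];
  for (const layer of layers) {
    if (!layer.run()) return { passed, failedAt: layer.name }; // fail fast
    passed.push(layer.name);
  }
  return { passed };
}
```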
Key takeaways:

  • TDD is a natural fit for coding agents — tests are unambiguous specs
  • Tell agents which tests to run, not how to do TDD
  • Write tests in groups by functionality domain
  • Always confirm tests fail before implementing
  • Refactor as a separate step with all tests green
  • Layer verification: types → unit tests → integration → E2E