Test-Driven Agentic Development

TDD is the most natural quality methodology for AI agents. Tests serve as unambiguous specifications, provide instant feedback, and create safety nets for iteration.

Test-Driven Agentic Development (TDAD) combines traditional TDD with agent-specific optimizations. Research shows:

  • 70% reduction in test-level regressions
  • 33% improvement in resolution rate
  • Regression paradox: Naive “use TDD” prompts actually increase regressions by 9.94%

The key insight: tell agents which tests to run, not how to do TDD.

  1. Break down requirements into test cases

    The rate limiter needs to handle these scenarios:
    Basic functionality:
    - [ ] Allow 100 requests per minute from a single client
    - [ ] Return 429 on the 101st request
    - [ ] Include Retry-After header in 429 responses
    Edge cases:
    - [ ] Multiple clients have independent limits
    - [ ] Limits reset after the window expires
    - [ ] Concurrent requests near the limit are handled correctly
    Error handling:
    - [ ] Redis unavailable: fail open (allow request)
    - [ ] Invalid API key: skip rate limiting, return 401
    Write tests for each scenario. Group them in this order.
  2. Write tests first, confirm they fail

    Write the test file at src/__tests__/rateLimit.test.ts.
    Cover all scenarios listed above.
    Run the tests and confirm they all fail (red phase).
  3. Implement one group at a time

    Implement the basic functionality of the rate limiter.
    Target: make the "Basic functionality" tests pass.
    Don't worry about edge cases yet — just the basics.
    Run the basic functionality tests after implementing.
  4. Progress through all groups

    Now implement edge case handling. Make the edge case tests pass.
    Then implement error handling. Make those tests pass.
    After each group, verify no previous tests broke.
  5. Refactor while green

    All tests pass. Refactor the rate limiter for:
    - Clarity: better variable names, extracted functions
    - Performance: minimize Redis roundtrips
    Keep ALL tests green throughout refactoring.
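The five steps above converge on something like the following. This is a hedged sketch: it uses an in-memory Map where the prompts above assume Redis, and the names (`RateLimiter`, `check`, `RateLimitResult`) are illustrative, not from the source.

```typescript
// Hypothetical in-memory fixed-window rate limiter sketch.
// A real implementation would back the counters with Redis.
interface RateLimitResult {
  allowed: boolean;
  retryAfterSeconds?: number; // populated for 429 responses (Retry-After)
}

class RateLimiter {
  private counts = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private limit = 100,        // requests per window
    private windowMs = 60_000,  // one minute
  ) {}

  check(clientId: string, now = Date.now()): RateLimitResult {
    const entry = this.counts.get(clientId);
    // New client, or window expired: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(clientId, { count: 1, windowStart: now });
      return { allowed: true };
    }
    if (entry.count < this.limit) {
      entry.count++;
      return { allowed: true };
    }
    // Over the limit: report how long until the window resets.
    const retryAfterSeconds = Math.ceil(
      (entry.windowStart + this.windowMs - now) / 1000,
    );
    return { allowed: false, retryAfterSeconds };
  }
}
```

The fail-open behavior from the error-handling group would sit around the Redis call: a try/catch that returns `{ allowed: true }` when the store is unreachable.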
A reusable prompt template for any feature:

Generate a TDD plan for implementing [feature]:
## Test Groups
### Group 1: Core Behavior
- [ ] Write test: [behavior 1]
- [ ] Write test: [behavior 2]
- [ ] Implement to pass
### Group 2: Edge Cases
- [ ] Write test: [edge case 1]
- [ ] Write test: [edge case 2]
- [ ] Implement to pass
### Group 3: Error Handling
- [ ] Write test: [error scenario 1]
- [ ] Implement to pass
### Final
- [ ] Run full suite
- [ ] Refactor while keeping tests green

Instead of “use TDD,” tell the agent which tests matter:

Implement the payment webhook handler.
Before making changes, run these existing tests to establish baseline:
pnpm vitest run src/__tests__/webhooks.test.ts
pnpm vitest run src/__tests__/payments.test.ts
After implementation, ALL of these must still pass,
plus new tests covering:
- Valid Stripe webhook signature verification
- Idempotent processing of duplicate events
- Graceful handling of unknown event types
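A sketch of two behaviors the new tests target: idempotent processing of duplicate events and graceful handling of unknown event types. The `WebhookHandler` class, event shape, and return values here are all hypothetical; real signature verification would use Stripe's SDK and is omitted from this sketch.

```typescript
// Hypothetical webhook handler sketch. Dedupes by event id and
// acknowledges (rather than rejects) event types it doesn't know.
type WebhookEvent = { id: string; type: string; data: unknown };

class WebhookHandler {
  private processed = new Set<string>();

  handle(event: WebhookEvent): "processed" | "duplicate" | "ignored" {
    // Idempotency: redelivery of the same event id is a no-op.
    if (this.processed.has(event.id)) return "duplicate";
    this.processed.add(event.id);
    switch (event.type) {
      case "payment_intent.succeeded":
        // ...record the payment...
        return "processed";
      default:
        // Unknown event types are acknowledged, not treated as errors.
        return "ignored";
    }
  }
}
```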

Pair TDD with automated code health checks:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "pnpm tsc --noEmit 2>&1 | head -20" }
        ]
      }
    ]
  }
}

This creates a layered verification system:

  1. After each edit: Type checking (via hooks)
  2. After each test group: Targeted test run
  3. After completion: Full test suite
  4. Before commit: Lint + coverage check
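The fail-fast ordering of these layers can be sketched as a small runner. The `Layer` type and `runLayers` function are illustrative; in practice each `run` would shell out to tsc, vitest, or the linter.

```typescript
// Sketch: run cheap checks first, stop at the first failure so the agent
// gets the earliest, most specific signal.
type Layer = { name: string; run: () => boolean };

function runLayers(layers: Layer[]): { passed: string[]; failedAt?: string } {
  const passed: string[] = [];
  for (const layer of layers) {
    if (!layer.run()) return { passed, failedAt: layer.name }; // fail fast
    passed.push(layer.name);
  }
  return { passed };
}
```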
Key takeaways:

  • TDD is a natural fit for coding agents — tests are unambiguous specs
  • Tell agents which tests to run, not how to do TDD
  • Write tests in groups by functionality domain
  • Always confirm tests fail before implementing
  • Refactor as a separate step with all tests green
  • Layer verification: types → unit tests → integration → E2E