Test-Driven Agentic Development
TDD is the most natural quality methodology for AI agents. Tests serve as unambiguous specifications, provide instant feedback, and create safety nets for iteration.
The TDAD Framework
Test-Driven Agentic Development (TDAD) combines traditional TDD with agent-specific optimizations. Research shows:
- 70% reduction in test-level regressions
- 33% improvement in resolution rate
- Regression paradox: Naive “use TDD” prompts actually increase regressions by 9.94%
The key insight: tell agents which tests to run, not how to do TDD.
The Agentic TDD Workflow
1. **Break down requirements into test cases**

   ```text
   The rate limiter needs to handle these scenarios:

   Basic functionality:
   - [ ] Allow 100 requests per minute from a single client
   - [ ] Return 429 on the 101st request
   - [ ] Include Retry-After header in 429 responses

   Edge cases:
   - [ ] Multiple clients have independent limits
   - [ ] Limits reset after the window expires
   - [ ] Concurrent requests near the limit are handled correctly

   Error handling:
   - [ ] Redis unavailable: fail open (allow request)
   - [ ] Invalid API key: skip rate limiting, return 401

   Write tests for each scenario. Group them in this order.
   ```
2. **Write tests first, confirm they fail**

   ```text
   Write the test file at src/__tests__/rateLimit.test.ts.
   Cover all scenarios listed above.
   Run the tests and confirm they all fail (red phase).
   ```
3. **Implement one group at a time**

   ```text
   Implement the basic functionality of the rate limiter.
   Target: make the "Basic functionality" tests pass.
   Don't worry about edge cases yet — just the basics.
   Run the basic functionality tests after implementing.
   ```
4. **Progress through all groups**

   ```text
   Now implement edge case handling. Make the edge case tests pass.
   Then implement error handling. Make those tests pass.
   After each group, verify no previous tests broke.
   ```
5. **Refactor while green**

   ```text
   All tests pass. Refactor the rate limiter for:
   - Clarity: better variable names, extracted functions
   - Performance: minimize Redis roundtrips
   Keep ALL tests green throughout refactoring.
   ```
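The red phase in step 2 can be sketched in plain TypeScript. This is an illustrative stand-in, not the article's code: the `RateLimiter` interface and the stub are assumptions, and in the project these checks would live in Vitest `describe`/`it` blocks.

```typescript
import assert from "node:assert";

// Hypothetical interface the tests will drive (names are assumptions).
interface RateLimiter {
  check(clientId: string): { allowed: boolean; retryAfter?: number };
}

// Red phase: the implementation does not exist yet, so every test fails.
const limiter: RateLimiter = {
  check() {
    throw new Error("not implemented");
  },
};

// One "Basic functionality" check, written before any implementation.
function allows100Requests(): "green" | "red" {
  try {
    for (let i = 0; i < 100; i++) {
      assert.equal(limiter.check("client-a").allowed, true);
    }
    return "green";
  } catch {
    return "red"; // expected before the implementation exists
  }
}

console.log(allows100Requests()); // prints "red"
```

Confirming the red result before writing any implementation is what makes a later green result meaningful.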
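The "Basic functionality" target in step 3 can be illustrated with a minimal in-memory sketch: a fixed-window counter. The article's limiter is presumably Redis-backed, so the names and structure here are assumptions, not the real implementation.

```typescript
// Hypothetical fixed-window rate limiter covering the basic scenarios:
// 100 requests per minute per client, 429 plus Retry-After past the limit.
const WINDOW_MS = 60_000;
const LIMIT = 100;

type CheckResult = { allowed: boolean; status?: number; retryAfter?: number };
type Window = { start: number; count: number };
const windows = new Map<string, Window>();

function check(clientId: string, now: number = Date.now()): CheckResult {
  const w = windows.get(clientId);
  if (!w || now - w.start >= WINDOW_MS) {
    // New client or expired window: start counting again.
    windows.set(clientId, { start: now, count: 1 });
    return { allowed: true };
  }
  if (w.count >= LIMIT) {
    // 429 path: tell the client when the window resets.
    const retryAfter = Math.ceil((w.start + WINDOW_MS - now) / 1000);
    return { allowed: false, status: 429, retryAfter };
  }
  w.count++;
  return { allowed: true };
}

// 100 requests pass, the 101st is rejected with Retry-After set.
for (let i = 0; i < 100; i++) check("client-a", 0);
console.log(check("client-a", 0)); // { allowed: false, status: 429, retryAfter: 60 }
```

Per-client windows in the map also satisfy the "independent limits" edge case, though concurrency and window reset still need their own tests.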
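The error-handling group in step 4 includes the "Redis unavailable: fail open" rule, which can be sketched as a wrapper: if the limit check itself errors, allow the request rather than blocking traffic. The function name and shape are illustrative assumptions.

```typescript
// Hypothetical fail-open wrapper around a backend-dependent limit check.
async function checkWithFailOpen(
  check: () => Promise<boolean>,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    // Fail open: the backend is down, so let the request through
    // instead of turning a Redis outage into a full API outage.
    return true;
  }
}

// Backend down: the request is still allowed.
checkWithFailOpen(async () => {
  throw new Error("ECONNREFUSED");
}).then((allowed) => console.log(allowed)); // prints true
```

Failing open trades rate-limit enforcement for availability during an outage; a test pinning this behavior keeps the trade-off from being reversed by accident.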
Prompt Patterns for Agentic TDD
The Test Plan Prompt
Generate a TDD plan for implementing [feature]:
```text
## Test Groups

### Group 1: Core Behavior
- [ ] Write test: [behavior 1]
- [ ] Write test: [behavior 2]
- [ ] Implement to pass

### Group 2: Edge Cases
- [ ] Write test: [edge case 1]
- [ ] Write test: [edge case 2]
- [ ] Implement to pass

### Group 3: Error Handling
- [ ] Write test: [error scenario 1]
- [ ] Implement to pass

### Final
- [ ] Run full suite
- [ ] Refactor while keeping tests green
```

The Contextual Test Target Prompt
Instead of “use TDD,” tell the agent which tests matter:
```text
Implement the payment webhook handler.

Before making changes, run these existing tests to establish a baseline:
pnpm vitest run src/__tests__/webhooks.test.ts
pnpm vitest run src/__tests__/payments.test.ts

After implementation, ALL of these must still pass, plus new tests covering:
- Valid Stripe webhook signature verification
- Idempotent processing of duplicate events
- Graceful handling of unknown event types
```

Integration with Code Health
Pair TDD with automated code health checks:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "command": "pnpm tsc --noEmit 2>&1 | head -20"
      }
    ]
  }
}
```

This creates a layered verification system:
- After each edit: Type checking (via hooks)
- After each test group: Targeted test run
- After completion: Full test suite
- Before commit: Lint + coverage check
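These layers can be wired up as named project scripts so the agent (and CI) can invoke each one explicitly. This is a sketch assuming pnpm, Vitest, and ESLint; the script names and paths are illustrative, not from the article:

```json
{
  "scripts": {
    "check:types": "tsc --noEmit",
    "test:unit": "vitest run src/__tests__",
    "test:all": "vitest run --coverage",
    "check:all": "pnpm check:types && pnpm test:all && eslint ."
  }
}
```

Named scripts also make the contextual test target prompt simpler: the agent can be told to run `pnpm test:unit` rather than a raw command line.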
Key Takeaways
- TDD is a natural fit for coding agents — tests are unambiguous specs
- Tell agents which tests to run, not how to do TDD
- Write tests in groups by functionality domain
- Always confirm tests fail before implementing
- Refactor as a separate step with all tests green
- Layer verification: types → unit tests → integration → E2E