Atlas survey

Tooling Stack for the AI-Assisted TDD Workshop

Layer-by-layer stack recommendation: AI assistant (Copilot/Cursor/Cline), test framework (Vitest/pytest), infra (Codespaces devcontainer), and optional quality-demo tools — with copy-paste TDD prompts.


Decision: Use VS Code + GitHub Copilot Pro ($10/mo) if the audience already has Copilot — lowest friction, most familiarity. Switch to Cursor Pro ($20/mo) for the cleanest AI-native TDD loop and YOLO mode. For a zero-cost fallback: VS Code + Continue.dev or Cline + shared API key. Language: TypeScript + Vitest (fastest watch-mode feedback, zero TypeScript config). Infrastructure: GitHub Codespaces + .devcontainer so participants click one button and have a working environment in 90 seconds.

Layer 1 — AI Coding Assistant

The AI tool is the centerpiece of every exercise. Five realistic options for an expert audience, ranked by workshop friction. Plan for a mix: 56% of enterprise developers use Copilot; 75% at small companies use Claude Code — your audience will have both.[1]

| Tool | Type | TDD-relevant features | Price | Workshop fit |
| --- | --- | --- | --- | --- |
| GitHub Copilot | IDE extension (VS Code, JetBrains, Visual Studio) | /tests command scaffolds suites from open file;[3] custom TDD-red / TDD-green / TDD-refactor agents with automated handoffs;[4] detects existing test conventions (Jest describe/it, pytest fixtures, JUnit) | Free (2k completions/mo) · Pro $10/mo · Pro+ $39/mo[2] | Best default: most participants already have it |
| Cursor | AI-native VS Code fork (imports VS Code extensions & keybindings) | Composer for multi-file edits; YOLO mode auto-runs tests, reads failures, and iterates without manual intervention;[6] @file references pin test files as specs | Free (limited) · Pro $20/mo · Business $40/user/mo[5] | Best AI-native TDD loop; ≈30 min install overhead for non-users |
| Continue.dev ⭐ 33.5k | Open-source VS Code / JetBrains agent (BYOM) | Bring-your-own model (Anthropic, OpenAI, Ollama local); shared config via version-controlled files, so every participant gets identical AI behaviour out of the repo[8] | Free (BYOK) | Best for "use your own model" segment; mild setup overhead for API keys |
| Cline ⭐ 62.7k | Autonomous VS Code agent (BYOK) | Runs dev servers and test commands inside the terminal, reads failures, iterates across files; Plan Mode narrates intent before acting, which is useful for live demos[9] | Free (BYOK) | High-impact demos; requires Claude/OpenAI API key per participant |
| Aider ⭐ 45.7k | Terminal agent (git-native, BYOK) | Writes tests → runs pytest/vitest → auto-commits on green; entire TDD loop visible as git history; no IDE required[10] | Free (BYOK) | Best for backend-only exercises or "show git as the TDD ledger" segment |

→ Exercises must work with either Copilot or Cursor. Write prompts as plain text instructions, not tool-specific slash commands, so any agent can execute them.

Layer 2 — Testing Framework

| Language | Recommended | Alternative | Rationale |
| --- | --- | --- | --- |
| TypeScript / JS | Vitest ⭐ 16.6k[12] | Jest ⭐ 45.4k for existing codebases[13] | 2–5× faster cold start; watch-mode reruns in <300ms vs Jest's 3–10s; native ESM; TypeScript zero-config via esbuild, so participants spend time on TDD, not config.[11] Jest-compatible API means AI tools output valid Vitest code without prompting. |
| Python | pytest + hypothesis | unittest | De facto standard; every major AI tool defaults to pytest idioms; hypothesis enables a property-based testing demo in one exercise slot |
| Browser / E2E | Playwright ⭐ 90.1k[14] | Cypress (JS only) | Multi-browser; AI copilots generate Playwright selectors natively; codegen command creates a test by recording clicks, which suits an "AI-generated E2E from user story" demo |

→ Run the main exercises in TypeScript + Vitest. Offer a Python track if the audience skews backend-heavy. Avoid mixing languages mid-exercise — it dilutes the TDD-loop focus.

Layer 3 — Workshop Infrastructure

"Works on my machine" is the primary failure mode of virtual workshops. Use GitHub Codespaces backed by a .devcontainer: each participant clicks one button and has a running environment in ≈90 seconds; if it breaks, they delete and re-create in 2 minutes.[15] This approach was validated at Simon Willison's NICAR 2026 coding-agents workshop, where YOLO-mode aliases were pre-wired into the container so participants skipped all permission dialogs.[16]

// .devcontainer/devcontainer.json: minimal workshop starter
{
  "image": "mcr.microsoft.com/devcontainers/typescript-node:22",
  "postCreateCommand": "npm install",
  "customizations": {
    "vscode": {
      "extensions": [
        "GitHub.copilot", // swap for Continue.continue if BYOK day
        "vitest.explorer",
        "hbenl.vscode-test-explorer",
        "eamodio.gitlens"
      ],
      "settings": {
        "vitest.enable": true,
        "editor.formatOnSave": true
      }
    }
  }
}

Codespaces free tier covers 120 core-hours/month per GitHub account — enough for a 4-hour workshop for most individual participants. For Copilot, participants bring their own subscription; the extension authenticates automatically inside the container.
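
The free-tier budget is easy to sanity-check per participant. The sketch below assumes core-hours are counted as machine cores multiplied by hours of runtime; the 120 core-hour allowance is the figure quoted above, while the machine sizes and the prior-usage parameter are illustrative assumptions.

```typescript
// Free-tier allowance quoted in the text above (core-hours per month).
const FREE_CORE_HOURS_PER_MONTH = 120;

// Core-hours consumed by one codespace: cores × hours it stays running.
function coreHoursUsed(machineCores: number, hoursRunning: number): number {
  return machineCores * hoursRunning;
}

// Remaining allowance after the workshop; a negative value means the
// free tier would be exceeded.
function remainingAfterWorkshop(
  machineCores: number,
  workshopHours: number,
  alreadyUsedThisMonth = 0, // hypothetical: usage before the workshop
): number {
  return (
    FREE_CORE_HOURS_PER_MONTH -
    alreadyUsedThisMonth -
    coreHoursUsed(machineCores, workshopHours)
  );
}

console.log(coreHoursUsed(2, 4)); // 8
console.log(remainingAfterWorkshop(4, 4, 100)); // 4
```

Under these assumptions a 4-hour session on a 2-core machine uses 8 core-hours, so the risk is mainly participants who have already spent most of their monthly allowance elsewhere.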

Layer 4 — Optional Quality-Demo Tools

Each of these fits a 5-minute slot that makes a high-signal point without needing to be part of every exercise:

| Tool | What it demonstrates | Cost |
| --- | --- | --- |
| Qodo (formerly CodiumAI), VS Code + JetBrains | AI generates behaviourally meaningful tests, not just coverage-padding ones; v2.0 multi-agent PR review scored 60.1% F1 (9 points ahead of next tool)[17] | Free 250 credits/mo; Teams $30–38/user/mo[18] |
| mutmut ⭐ 1.3k (Python mutation testing) | Running mutmut run against AI-generated tests reveals the "pass but miss bugs" problem concretely; makes visible why oracle-designed tests fail[19] | Free / open source |
| VS Code Copilot TDD Custom Agents | Three agents (TDD-red / TDD-green / TDD-refactor) wired as .github/agents/ files with automated phase handoffs; shows how to codify the discipline into the project repo itself[4] | Free (with any Copilot plan) |
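
The "pass but miss bugs" problem that mutation testing exposes fits in a few lines. The sketch below is a hand-rolled illustration of the idea in TypeScript, not mutmut itself (which targets Python); isAdult, the mutant, and both test suites are hypothetical examples.

```typescript
type Predicate = (age: number) => boolean;

// Original implementation under test (hypothetical example).
const isAdult: Predicate = (age) => age >= 18;

// A mutant: the kind of small change a mutation tool makes (>= becomes >).
const isAdultMutant: Predicate = (age) => age > 18;

// A test suite is modelled as a function that returns true when all checks pass.
type Suite = (impl: Predicate) => boolean;

// Weak suite: checks only values far from the boundary.
const weakSuite: Suite = (impl) => impl(30) === true && impl(5) === false;

// Strong suite: also pins the boundary value.
const strongSuite: Suite = (impl) => weakSuite(impl) && impl(18) === true;

// A mutant is "killed" when the suite passes on the original
// but fails on the mutant.
function killsMutant(suite: Suite, original: Predicate, mutant: Predicate): boolean {
  return suite(original) && !suite(mutant);
}

console.log(killsMutant(weakSuite, isAdult, isAdultMutant)); // false: mutant survives
console.log(killsMutant(strongSuite, isAdult, isAdultMutant)); // true: mutant is killed
```

Both suites are green against the original code; only the surviving mutant shows that the weak suite never exercised the boundary.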

TDD Prompt Patterns — Put These on Exercise Sheets

These two prompts prevent the most common AI failure mode: the model rewrites the failing test to pass rather than fixing the implementation.[7]

Phase 1: Test authoring (Red)
Write ONLY test cases for [function name and spec]. DO NOT write any implementation code. The tests must fail when run right now. Commit when all tests are red.

Phase 2: Implementation (Green)
Write the implementation for [function name] that passes ALL tests in [test file]. DO NOT modify the test file. Write the minimal code that makes every test pass.
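
To keep exercise sheets consistent, both prompts can be generated from one template. This is a minimal sketch: redPrompt, greenPrompt, and the example function name and file path are hypothetical, and the wording simply mirrors the two prompts above.

```typescript
// Phase 1 (Red): tests only, no implementation.
function redPrompt(functionNameAndSpec: string): string {
  return [
    `Write ONLY test cases for ${functionNameAndSpec}.`,
    "DO NOT write any implementation code.",
    "The tests must fail when run right now.",
    "Commit when all tests are red.",
  ].join(" ");
}

// Phase 2 (Green): implementation only; the test file is read-only.
function greenPrompt(functionName: string, testFile: string): string {
  return [
    `Write the implementation for ${functionName} that passes ALL tests in ${testFile}.`,
    "DO NOT modify the test file.",
    "Write the minimal code that makes every test pass.",
  ].join(" ");
}

// Hypothetical exercise: a slugify function and its test file.
console.log(redPrompt("slugify(title: string): string, which lowercases and hyphenates words"));
console.log(greenPrompt("slugify", "src/slugify.test.ts"));
```

Because the output is plain text, the same sheet should work in whichever assistant a participant brings.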

Two-session pattern: start a fresh chat between Phase 1 and Phase 2. This prevents the model from using the test-authoring context to "cheat" by writing an implementation before the red phase is committed. Git checkpoint after red is mandatory — it's the proof that tests were genuinely failing.[7]
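
The red checkpoint also makes the "do not modify the test file" rule mechanically checkable after the green phase. The sketch below assumes the list of changed paths comes from something like git diff --name-only <red-commit> HEAD, and that test files follow the *.test.ts / *.spec.ts naming convention; modifiedTestFiles is a hypothetical helper.

```typescript
// Matches common test-file names such as foo.test.ts or foo.spec.tsx
// (an assumed naming convention).
const TEST_FILE_PATTERN = /\.(test|spec)\.[cm]?[jt]sx?$/;

// Returns the changed paths that are test files. An empty result means the
// implementation phase left the red-phase tests untouched.
function modifiedTestFiles(changedPaths: string[]): string[] {
  return changedPaths
    .map((path) => path.trim())
    .filter((path) => path.length > 0 && TEST_FILE_PATTERN.test(path));
}

// Example: paths as they might appear in the diff between the red commit and HEAD.
const changed = ["src/slugify.ts", "src/slugify.test.ts", "README.md"];
console.log(modifiedTestFiles(changed)); // only src/slugify.test.ts is flagged
```

A non-empty result is a prompt for a conversation rather than proof of cheating, since a participant may have fixed a genuinely wrong test.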

⚠ Verify tests are genuinely red before issuing the implementation prompt. Vacuously-passing tests (no assertion, expect(true).toBe(true)) make the entire exercise pointless and are a common AI mistake when the spec is underspecified.
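
A rough lint can catch the worst vacuous tests before participants move on to the green phase. The heuristic below is a sketch only: it scans source text with regular expressions, assumes Jest/Vitest-style expect(...) assertions, and will miss cases that a real parser would catch; findVacuousSignals is a hypothetical helper.

```typescript
// Literal compared with the same literal, e.g. expect(true).toBe(true)
// or expect(1).toEqual(1).
const TAUTOLOGY =
  /expect\(\s*(true|false|\d+)\s*\)\s*\.\s*(?:toBe|toEqual|toStrictEqual)\(\s*\1\s*\)/;

// Returns human-readable warnings; an empty array means no signal was found.
function findVacuousSignals(testSource: string): string[] {
  const signals: string[] = [];
  if (!/\bexpect\s*\(/.test(testSource)) {
    signals.push("no expect() call found");
  }
  if (TAUTOLOGY.test(testSource)) {
    signals.push("tautological assertion (literal compared with itself)");
  }
  return signals;
}

const vacuous = 'it("works", () => { expect(true).toBe(true); });';
const meaningful = 'it("adds", () => { expect(add(2, 3)).toBe(5); });';

console.log(findVacuousSignals(vacuous)); // one warning: tautological assertion
console.log(findVacuousSignals(meaningful)); // no warnings
```

This does not replace running the suite and seeing it fail; it only flags tests that could never fail in the first place.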
