Tooling Stack for the AI-Assisted TDD Workshop

Decision: Use VS Code + GitHub Copilot Pro ($10/mo) if the audience already has Copilot — lowest friction, most familiarity. Switch to Cursor Pro ($20/mo) for the cleanest AI-native TDD loop and YOLO mode. For a zero-cost fallback: VS Code + Continue.dev or Cline + shared API key. Language: TypeScript + Vitest (fastest watch-mode feedback, zero TypeScript config). Infrastructure: GitHub Codespaces + .devcontainer so participants click one button and have a working environment in 90 seconds.

Layer 1 — AI Coding Assistant

The AI tool is the centerpiece of every exercise. The table below lists five realistic options for an expert audience, ranked by workshop friction. Plan for a mix: 56% of enterprise developers use Copilot and 75% of developers at small companies use Claude Code, so your audience will likely include both.^[1]

Tool	Type	TDD-relevant features	Price	Workshop fit
GitHub Copilot	IDE extension (VS Code, JetBrains, Visual Studio)	`/tests` command scaffolds suites from open file;^[3] custom TDD-red / TDD-green / TDD-refactor agents with automated handoffs;^[4] detects existing test conventions (Jest describe/it, pytest fixtures, JUnit)	Free (2k completions/mo) · Pro $10/mo · Pro+ $39/mo^[2]	Best default — most participants already have it
Cursor	AI-native VS Code fork (imports VS Code extensions & keybindings)	Composer for multi-file edits; YOLO mode auto-runs tests, reads failures, and iterates without manual intervention;^[6] `@file` references pin test files as specs	Free (limited) · Pro $20/mo · Business $40/user/mo^[5]	Best AI-native TDD loop; ≈30 min install overhead for non-users
Continue.dev ⭐ 33.5k	Open-source VS Code / JetBrains agent (BYOM)	Bring-your-own model (Anthropic, OpenAI, Ollama local); shared config via version-controlled files — every participant gets identical AI behaviour out of the repo^[8]	Free (BYOK)	Best for "use your own model" segment; mild setup overhead for API keys
Cline ⭐ 62.7k	Autonomous VS Code agent (BYOK)	Runs dev servers and test commands inside the terminal, reads failures, iterates across files; Plan Mode narrates intent before acting — useful for live demos^[9]	Free (BYOK)	High-impact demos; requires Claude/OpenAI API key per participant
Aider ⭐ 45.7k	Terminal agent (git-native, BYOK)	Writes tests → runs pytest/vitest → auto-commits on green; entire TDD loop visible as git history; no IDE required^[10]	Free (BYOK)	Best for backend-only exercises or "show git as the TDD ledger" segment

→ Exercises must work with either Copilot or Cursor. Write prompts as plain text instructions, not tool-specific slash commands, so any agent can execute them.
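The "run tests, read failures, iterate" behaviour described for the agentic tools can be pictured as a small bounded loop. The sketch below is illustrative only: `TestResult`, `RunTests`, `ProposeFix`, and `tddLoop` are hypothetical names, not part of Cursor, Cline, Aider, or Copilot, and real agents differ in how they run tests and apply edits.

```typescript
// Hypothetical shapes for illustration; no real tool exposes these types.
type TestResult = { passed: boolean; failures: string[] };
type RunTests = () => TestResult;
type ProposeFix = (failures: string[]) => void;

// Run the tests, hand any failures to the agent, and repeat until the suite
// is green or the fix budget is spent. The cap keeps a live demo from looping
// forever when the agent cannot converge.
function tddLoop(
  runTests: RunTests,
  proposeFix: ProposeFix,
  maxFixRounds = 5
): { green: boolean; fixRounds: number } {
  for (let round = 0; round < maxFixRounds; round++) {
    const result = runTests();
    if (result.passed) {
      return { green: true, fixRounds: round };
    }
    proposeFix(result.failures);
  }
  // One last run to see whether the final fix turned the suite green.
  return { green: runTests().passed, fixRounds: maxFixRounds };
}
```

Framing the loop this way on an exercise sheet keeps the instructions tool-neutral: participants only need to know which command plays the role of `runTests` in their setup.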

Layer 2 — Testing Framework

Language	Recommended	Alternative	Rationale
TypeScript / JS	Vitest ⭐ 16.6k^[12]	Jest ⭐ 45.4k for existing codebases^[13]	2–5× faster cold start; watch-mode reruns in <300ms vs Jest's 3–10s; native ESM; TypeScript zero-config via esbuild — participants spend time on TDD, not config.^[11] Jest-compatible API means AI tools output valid Vitest code without prompting.
Python	pytest + hypothesis	unittest	De facto standard; every major AI tool defaults to pytest idioms; hypothesis enables property-based testing demo in one exercise slot
Browser / E2E	Playwright ⭐ 90.1k^[14]	Cypress (JS only)	Multi-browser; AI copilots generate Playwright selectors natively; `codegen` command creates a test by recording clicks — good for "AI-generated E2E from user story" demo

→ Run the main exercises in TypeScript + Vitest. Offer a Python track if the audience skews backend-heavy. Avoid mixing languages mid-exercise — it dilutes the TDD-loop focus.

Layer 3 — Workshop Infrastructure

"Works on my machine" is the primary failure mode of virtual workshops. Use GitHub Codespaces backed by a .devcontainer: each participant clicks one button and has a running environment in ≈90 seconds; if it breaks, they delete and re-create in 2 minutes.^[15] This approach was validated at Simon Willison's NICAR 2026 coding-agents workshop, where YOLO-mode aliases were pre-wired into the container so participants skipped all permission dialogs.^[16]

// .devcontainer/devcontainer.json: minimal workshop starter
{
  "image": "mcr.microsoft.com/devcontainers/typescript-node:22",
  "postCreateCommand": "npm install",
  "customizations": {
    "vscode": {
      "extensions": [
        "GitHub.copilot", // swap for Continue.continue if BYOK day
        "vitest.explorer",
        "hbenl.vscode-test-explorer",
        "eamodio.gitlens"
      ],
      "settings": {
        "vitest.enable": true,
        "editor.formatOnSave": true
      }
    }
  }
}

Codespaces free tier covers 120 core-hours/month per GitHub account — enough for a 4-hour workshop for most individual participants. For Copilot, participants bring their own subscription; the extension authenticates automatically inside the container.
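The "enough for most participants" claim is simple arithmetic: core-hours consumed equal machine cores multiplied by hours running. A minimal sketch, assuming participants use a 2-core machine (an assumption; larger machine types burn the quota proportionally faster) and using hypothetical helper names:

```typescript
// Free quota figure taken from the text above.
const FREE_CORE_HOURS_PER_MONTH = 120;

// Core-hours are billed as cores multiplied by wall-clock hours.
function coreHoursUsed(cores: number, hours: number): number {
  return cores * hours;
}

// Quota left after the workshop; a negative result means the participant
// would exceed the free tier partway through.
function remainingAfterWorkshop(
  cores: number,
  workshopHours: number,
  alreadyUsedThisMonth = 0
): number {
  return (
    FREE_CORE_HOURS_PER_MONTH -
    alreadyUsedThisMonth -
    coreHoursUsed(cores, workshopHours)
  );
}

const freshAccount = remainingAfterWorkshop(2, 4); // 120 - 8 = 112
const heavyUser = remainingAfterWorkshop(2, 4, 115); // 120 - 115 - 8 = -3
```

The second case is why the claim holds for most participants rather than all of them: anyone who has already used most of the monthly quota should check their remaining balance before the session.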

Layer 4 — Optional Quality-Demo Tools

Each of these fits a 5-minute slot that makes a high-signal point without needing to be part of every exercise:

Tool	What it demonstrates	Cost
Qodo (formerly CodiumAI) — VS Code + JetBrains	AI generates behaviourally meaningful tests, not just coverage-padding ones; v2.0 multi-agent PR review scored 60.1% F1 (9 points ahead of next tool)^[17]	Free 250 credits/mo; Teams $30–38/user/mo^[18]
mutmut ⭐ 1.3k (Python mutation testing)	Runs `mutmut run` against AI-generated tests → reveals the "pass but miss bugs" problem concretely; makes visible why oracle-designed tests fail^[19]	Free / open source
VS Code Copilot TDD Custom Agents	Three agents (TDD-red / TDD-green / TDD-refactor) wired as `.github/agents/` files with automated phase handoffs — shows how to codify the discipline into the project repo itself^[4]	Free (with any Copilot plan)
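The "pass but miss bugs" point from the mutation-testing row can also be shown without any tooling. The sketch below is a hand-rolled illustration in TypeScript, not mutmut and not its API: `total`, `mutant`, and both suites are hypothetical, and the mutant is written by hand rather than generated.

```typescript
type Impl = (price: number, qty: number) => number;
type Check = (impl: Impl) => boolean; // true means the check passes

// Code under test.
const total: Impl = (price, qty) => price * qty;

// Hand-written mutant: `*` swapped for `+`, the kind of small change a
// mutation-testing tool applies automatically.
const mutant: Impl = (price, qty) => price + qty;

// A weak suite: 2 * 2 and 2 + 2 are both 4, so it cannot tell the two apart.
const weakSuite: Check[] = [(impl) => impl(2, 2) === 4];

// A stronger suite adds inputs where multiplication and addition diverge.
const strongSuite: Check[] = [
  (impl) => impl(2, 2) === 4,
  (impl) => impl(3, 5) === 15,
  (impl) => impl(10, 0) === 0,
];

function passes(suite: Check[], impl: Impl): boolean {
  return suite.every((check) => check(impl));
}

// A mutant "survives" when the suite still passes against the broken code.
function mutantSurvives(suite: Check[], broken: Impl): boolean {
  return passes(suite, broken);
}
```

Both suites pass against `total`, but only the weak one lets the mutant survive, which is the gap a mutation score makes visible.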

TDD Prompt Patterns — Put These on Exercise Sheets

These two prompts prevent the most common AI failure mode: the model rewrites the failing test to pass rather than fixing the implementation.^[7]

Phase 1 — Test authoring (Red)

Write ONLY test cases for [function name and spec]. DO NOT write any implementation code. The tests must fail when run right now. Commit when all tests are red.

Phase 2 — Implementation (Green)

Write the implementation for [function name] that passes ALL tests in [test file]. DO NOT modify the test file. Write the minimal code that makes every test pass.

Two-session pattern: start a fresh chat between Phase 1 and Phase 2. This prevents the model from using the test-authoring context to "cheat" by writing an implementation before the red phase is committed. Git checkpoint after red is mandatory — it's the proof that tests were genuinely failing.^[7]
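The red checkpoint also makes the "DO NOT modify the test file" rule checkable: list the files changed since the red commit (for example with `git diff --name-only <red-commit>`) and confirm that none of them is a test file. A minimal sketch, assuming test files follow the `*.test.*` / `*.spec.*` naming convention; `modifiedTestFiles` and `greenPhaseIsClean` are hypothetical helper names.

```typescript
// Matches conventional test-file names such as cart.test.ts or cart.spec.tsx.
// Assumption: the workshop repo follows this naming convention.
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;

// Returns the changed paths that are test files.
function modifiedTestFiles(changedPaths: string[]): string[] {
  return changedPaths
    .map((path) => path.trim())
    .filter((path) => path.length > 0 && TEST_FILE.test(path));
}

// Takes the raw, newline-separated output of a name-only diff and reports
// whether the green phase left every test file untouched.
function greenPhaseIsClean(diffOutput: string): boolean {
  return modifiedTestFiles(diffOutput.split("\n")).length === 0;
}
```

If the check fails, the participant can reset the test file to the red commit and re-issue the Phase 2 prompt.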

⚠ Verify tests are genuinely red before issuing the implementation prompt. Vacuously passing tests (no assertion, or a tautology such as `expect(true).toBe(true)`) make the entire exercise pointless and are a common AI mistake when the spec is underspecified.
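A quick scan can catch the two vacuous patterns named above before anyone runs the suite. The sketch below is a regex heuristic, not a parser: `findVacuousTests` is a hypothetical helper, it treats everything between one `it(`/`test(` call and the next as that test's body, and it will miss forms such as `it.each` or assertions hidden in helper functions.

```typescript
type Finding = { test: string; reason: string };

// An expect() whose argument is a bare literal can never fail meaningfully.
const LITERAL_EXPECT = /\bexpect\(\s*(?:true|false|\d+)\s*\)/;

function findVacuousTests(source: string): Finding[] {
  // Locate each it("...") / test("...") call and remember where it starts.
  const testStart = /\b(?:it|test)\(\s*(['"`])(.*?)\1/g;
  const starts: { title: string; index: number }[] = [];
  let match: RegExpExecArray | null;
  while ((match = testStart.exec(source)) !== null) {
    starts.push({ title: match[2], index: match.index });
  }

  const findings: Finding[] = [];
  starts.forEach((start, i) => {
    // Heuristic body: text up to the next test call, or the end of the file.
    const end = i + 1 < starts.length ? starts[i + 1].index : source.length;
    const body = source.slice(start.index, end);
    if (!/\bexpect\(/.test(body)) {
      findings.push({ test: start.title, reason: "no assertion" });
    } else if (LITERAL_EXPECT.test(body)) {
      findings.push({ test: start.title, reason: "asserts on a literal" });
    }
  });
  return findings;
}
```

An empty result does not prove the tests are meaningful; running the suite and seeing it fail remains the real check.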