Decision: Use VS Code + GitHub Copilot Pro ($10/mo) if the audience already has Copilot: it is the lowest-friction, most familiar option. Switch to Cursor Pro ($20/mo) for the cleanest AI-native TDD loop and YOLO mode. For a zero-cost fallback, use VS Code with Continue.dev or Cline, plus a shared API key. Language: TypeScript + Vitest (fastest watch-mode feedback, zero-config TypeScript support). Infrastructure: GitHub Codespaces + .devcontainer, so participants click one button and have a working environment in about 90 seconds.
Layer 1 — AI Coding Assistant
The AI tool is the centerpiece of every exercise. Five realistic options for an expert audience, ranked by workshop friction. Plan for a mix: 56% of enterprise developers use Copilot; 75% at small companies use Claude Code — your audience will have both.[1]
| Tool | Type | TDD-relevant features | Price | Workshop fit |
|---|---|---|---|---|
| GitHub Copilot | IDE extension (VS Code, JetBrains, Visual Studio) | /tests command scaffolds suites from the open file;[3] custom TDD-red / TDD-green / TDD-refactor agents with automated handoffs;[4] detects existing test conventions (Jest describe/it, pytest fixtures, JUnit) | Free (2k completions/mo) · Pro $10/mo · Pro+ $39/mo[2] | Best default; most participants already have it |
| Cursor | AI-native VS Code fork (imports VS Code extensions & keybindings) | Composer for multi-file edits; YOLO mode auto-runs tests, reads failures, and iterates without manual intervention;[6] @file references pin test files as specs | Free (limited) · Pro $20/mo · Business $40/user/mo[5] | Best AI-native TDD loop; ≈30 min install overhead for non-users |
| Continue.dev ⭐ 33.5k | Open-source VS Code / JetBrains agent (BYOM) | Bring-your-own model (Anthropic, OpenAI, Ollama local); shared config via version-controlled files — every participant gets identical AI behaviour out of the repo[8] | Free (BYOK) | Best for "use your own model" segment; mild setup overhead for API keys |
| Cline ⭐ 62.7k | Autonomous VS Code agent (BYOK) | Runs dev servers and test commands inside the terminal, reads failures, iterates across files; Plan Mode narrates intent before acting — useful for live demos[9] | Free (BYOK) | High-impact demos; requires Claude/OpenAI API key per participant |
| Aider ⭐ 45.7k | Terminal agent (git-native, BYOK) | Writes tests → runs pytest/vitest → auto-commits on green; entire TDD loop visible as git history; no IDE required[10] | Free (BYOK) | Best for backend-only exercises or "show git as the TDD ledger" segment |
→ Exercises must work with either Copilot or Cursor. Write prompts as plain text instructions, not tool-specific slash commands, so any agent can execute them.
Layer 2 — Testing Framework
| Language | Recommended | Alternative | Rationale |
|---|---|---|---|
| TypeScript / JS | Vitest ⭐ 16.6k[12] | Jest ⭐ 45.4k for existing codebases[13] | 2–5× faster cold start; watch-mode reruns in <300ms vs Jest's 3–10s; native ESM; TypeScript zero-config via esbuild — participants spend time on TDD, not config.[11] Jest-compatible API means AI tools output valid Vitest code without prompting. |
| Python | pytest + hypothesis | unittest | De facto standard; every major AI tool defaults to pytest idioms; hypothesis enables property-based testing demo in one exercise slot |
| Browser / E2E | Playwright ⭐ 90.1k[14] | Cypress (JS only) | Multi-browser; AI copilots generate Playwright selectors natively; codegen command creates a test by recording clicks — good for "AI-generated E2E from user story" demo |
→ Run the main exercises in TypeScript + Vitest. Offer a Python track if the audience skews backend-heavy. Avoid mixing languages mid-exercise — it dilutes the TDD-loop focus.
Layer 3 — Workshop Infrastructure
"Works on my machine" is the primary failure mode of virtual workshops. Use GitHub Codespaces backed by a .devcontainer: each participant clicks one button and has a running environment in ≈90 seconds; if it breaks, they delete and re-create in 2 minutes.[15] This approach was validated at Simon Willison's NICAR 2026 coding-agents workshop, where YOLO-mode aliases were pre-wired into the container so participants skipped all permission dialogs.[16]
Codespaces free tier covers 120 core-hours/month per GitHub account — enough for a 4-hour workshop for most individual participants. For Copilot, participants bring their own subscription; the extension authenticates automatically inside the container.
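As a rough illustration of the two points above, the sketch below models a minimal devcontainer definition and checks a workshop against the 120 core-hour allowance. The container name, image tag, extension list, and `postCreateCommand` are assumptions chosen for illustration, not a tested workshop setup. The budget check assumes Codespaces meters usage as wall-clock hours multiplied by machine cores, and that each participant runs the minimum machine size requested in `hostRequirements`.

```typescript
// Subset of devcontainer.json properties used in this sketch.
interface DevcontainerConfig {
  name: string;
  image: string;
  hostRequirements: { cpus: number };
  postCreateCommand: string;
  customizations: { vscode: { extensions: string[] } };
}

// Hypothetical workshop container; every value here is an assumption.
const workshopDevcontainer: DevcontainerConfig = {
  name: "tdd-workshop",
  image: "mcr.microsoft.com/devcontainers/typescript-node:20",
  hostRequirements: { cpus: 2 },
  postCreateCommand: "npm ci",
  customizations: {
    vscode: { extensions: ["GitHub.copilot", "vitest.explorer"] },
  },
};

// Serialise the object to the text that would live in .devcontainer/devcontainer.json.
function renderDevcontainerJson(config: DevcontainerConfig): string {
  return JSON.stringify(config, null, 2) + "\n";
}

// Free-tier allowance quoted in the text above.
const FREE_CORE_HOURS_PER_MONTH = 120;

// Core-hours consumed: wall-clock hours multiplied by machine cores.
function coreHoursUsed(workshopHours: number, machineCores: number): number {
  return workshopHours * machineCores;
}

// True when the workshop fits in what is left of the monthly allowance.
function fitsFreeTier(
  workshopHours: number,
  machineCores: number,
  alreadyUsedCoreHours: number = 0,
): boolean {
  const needed = coreHoursUsed(workshopHours, machineCores);
  return alreadyUsedCoreHours + needed <= FREE_CORE_HOURS_PER_MONTH;
}
```

Under these assumptions a 4-hour session on a 2-core machine uses 8 core-hours, so only participants who have already consumed most of their monthly allowance would run out mid-workshop.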
Layer 4 — Optional Quality-Demo Tools
Each of these fits a 5-minute slot that makes a high-signal point without needing to be part of every exercise:
| Tool | What it demonstrates | Cost |
|---|---|---|
| Qodo (formerly CodiumAI) — VS Code + JetBrains | AI generates behaviourally meaningful tests, not just coverage-padding ones; v2.0 multi-agent PR review scored 60.1% F1 (9 points ahead of next tool)[17] | Free 250 credits/mo; Teams $30–38/user/mo[18] |
| mutmut ⭐ 1.3k (Python mutation testing) | Runs mutmut run against AI-generated tests → reveals the "pass but miss bugs" problem concretely; makes visible why oracle-designed tests fail[19] | Free / open source |
| VS Code Copilot TDD Custom Agents | Three agents (TDD-red / TDD-green / TDD-refactor) wired as .github/agents/ files with automated phase handoffs; shows how to codify the discipline into the project repo itself[4] | Free (with any Copilot plan) |
TDD Prompt Patterns — Put These on Exercise Sheets
These two prompts prevent the most common AI failure mode: the model rewrites the failing test to pass rather than fixing the implementation.[7]
Two-session pattern: start a fresh chat between Phase 1 and Phase 2. This prevents the model from using the test-authoring context to "cheat" by writing an implementation before the red phase is committed. Git checkpoint after red is mandatory — it's the proof that tests were genuinely failing.[7]
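One way to make the red checkpoint mechanical is a small gate that inspects the outcomes of the newly written tests before the commit. The sketch below is only an illustration: `TestOutcome` is a hypothetical summary shape, not the output format of Vitest or any other runner, and the rule it applies (every new test must fail before any implementation exists) is an assumption about how strictly the checkpoint should be enforced.

```typescript
// Hypothetical per-test summary; map your runner's real output onto this shape.
interface TestOutcome {
  name: string;
  status: "passed" | "failed" | "skipped";
}

interface RedCheck {
  ok: boolean;
  reason: string;
}

// Decide whether the newly written tests form a genuine red phase.
function checkRedPhase(newTests: TestOutcome[]): RedCheck {
  if (newTests.length === 0) {
    return { ok: false, reason: "no new tests were run" };
  }
  // A new test that passes before the implementation exists is suspicious.
  const passing = newTests.filter((t) => t.status === "passed").map((t) => t.name);
  if (passing.length > 0) {
    return { ok: false, reason: `already passing: ${passing.join(", ")}` };
  }
  const failing = newTests.filter((t) => t.status === "failed");
  if (failing.length === 0) {
    return { ok: false, reason: "all new tests were skipped" };
  }
  return { ok: true, reason: `${failing.length} test(s) failing; commit the red checkpoint` };
}
```

A facilitator could run a check like this before participants commit the red phase and open the fresh Phase 2 chat.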
⚠ Verify tests are genuinely red before issuing the implementation prompt. Vacuously-passing tests (no assertion, expect(true).toBe(true)) make the entire exercise pointless and are a common AI mistake when the spec is underspecified.
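The vacuous patterns named in the warning can be caught with a crude text scan before the implementation prompt is issued. The sketch below is a regex heuristic under stated assumptions, not a parser: it only counts `expect(` calls and flags literal-against-same-literal assertions such as `expect(true).toBe(true)`, so it will miss subtler cases (for example, assertions on a mocked value). The function and type names are hypothetical.

```typescript
interface VacuityReport {
  expectCount: number;    // total expect( calls found in the source
  tautologyCount: number; // expect(<literal>).toBe(<same literal>) style calls
  vacuous: boolean;       // true when no assertion can actually fail
}

// Any call to expect(...).
const EXPECT_CALL = /\bexpect\s*\(/g;

// A literal compared with itself, e.g. expect(true).toBe(true) or expect(1).toEqual(1).
const TAUTOLOGY =
  /\bexpect\s*\(\s*(true|false|null|\d+)\s*\)\s*\.\s*(?:toBe|toEqual|toStrictEqual)\s*\(\s*\1\s*\)/g;

// Scan the text of a test file and report whether it asserts anything meaningful.
function checkVacuity(testSource: string): VacuityReport {
  const expectCount = (testSource.match(EXPECT_CALL) ?? []).length;
  const tautologyCount = (testSource.match(TAUTOLOGY) ?? []).length;
  // Vacuous if there are no assertions, or every assertion is a tautology.
  const vacuous = expectCount === 0 || tautologyCount === expectCount;
  return { expectCount, tautologyCount, vacuous };
}
```

A file that mixes one tautology with one real assertion is not reported as vacuous, but its non-zero `tautologyCount` is still worth a manual look.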