
AI-Assisted TDD Workshop Playbook

expedition · 79 citations · Sonnet 4.6 · 2026-06-03 · facilitator ops dashboard

⏱ 90 min session · 🧪 60–70% hands-on · 👥 3 facilitation roles · ⚡ 4 sub-research areas · 📚 79 citations
critical constraint — embed in exercise sheets, .github/copilot-instructions.md, and opening frame
"You may not modify the test file."

Without this rule, expert developers find the cheat (editing the tests until they pass) in under 10 minutes and dismiss the entire loop. The rule redirects AI optimization from "make it green by any means" to "write a correct implementation." Procedure alone does not help: adding standard TDD procedural instructions without a dependency context map raised the regression rate from 6.08% to 9.94%, worse than giving no TDD instruction at all. [6]
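The constraint can also be checked mechanically instead of relying on the prompt alone. The following is a minimal sketch, assuming the test files can be fingerprinted before the AI starts work; `fingerprint` and `modified_tests` are hypothetical helper names, not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each test file path to the SHA-256 of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def modified_tests(before, paths):
    """Return the test files that changed or disappeared since `before` was taken."""
    changed = []
    for p in paths:
        path = Path(p)
        # A deleted test file counts as modified: its digest becomes None.
        current = hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else None
        if before.get(str(p)) != current:
            changed.append(str(p))
    return sorted(changed)
```

A facilitator could call `fingerprint` when the exercise starts and `modified_tests` before accepting a green run; a non-empty result means the tests were edited or removed.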

session timeline — 90 minutes
00:00 Check-in
00:10 Context Frame
00:20 Exercise Block 1 — 35 min
00:55 Break (5 min)
01:00 Exercise Block 2 — 30 min
01:30 Debrief
01:40 Takeaways
Setup / Frame (20 min) · Hands-on exercises (65 min) · Break (non-negotiable) · Synthesis (20 min)
opening frame evidence — name the gap before the first exercise
+55.8% · GitHub Copilot RCT speedup · bounded toy task
−19% · METR RCT: experienced devs on their own codebases, Cursor Pro
+20% · What those same devs estimated · the self-perception gap
92% · Copilot /tests failure-or-empty rate · with no seed tests
9.94% · Regression rate: TDD instructions without context map (up from 6.08%)
1.82% · Regression rate: TDD instructions plus dependency context map
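The regression figures are easier to compare as relative changes. This is a small arithmetic sketch, assuming 6.08% is the rate with no TDD instruction (as the "up from 6.08%" note implies) and that all three rates come from the same measurement; `relative_change` is a hypothetical helper.

```python
def relative_change(baseline, observed):
    """Relative change of `observed` versus `baseline`, in percent."""
    return (observed - baseline) / baseline * 100.0


NO_TDD_INSTRUCTION = 6.08   # assumed baseline regression rate, in percent
TDD_ONLY = 9.94             # TDD instructions without context map
TDD_WITH_MAP = 1.82         # TDD instructions plus dependency context map

print(f"TDD instructions only: {relative_change(NO_TDD_INSTRUCTION, TDD_ONLY):+.1f}%")
print(f"TDD + context map:     {relative_change(NO_TDD_INSTRUCTION, TDD_WITH_MAP):+.1f}%")
```

Under those assumptions, procedural instructions alone correspond to roughly a 63% relative increase in regressions, and adding the context map to roughly a 70% relative reduction.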
exercise catalog — warm-up + main exercise
String Calculator · warm-up · 15–20 min
  • Greenfield TDD · run twice (baseline, then with AI) to show the delta
Tetris Skeleton · medium · 45–60 min
  • Pre-written failing tests · implement with AI
Goose Game ⭐ 3 · medium · 45–60 min
  • Kotlin · prompts/ log → concrete debrief material
Gilded Rose · med-hard · 40–60 min
  • Legacy refactoring · characterization tests before touching logic
Trip Service · hard · 45–60 min
  • Dependency breaking · AI alone cannot reliably solve seam design
EXACT Mini-project · expert · 60–90 min
  • Example Mapping → AI-TDD synthesis · all three autonomy levels
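For the String Calculator warm-up, the first few cycles might look like the sketch below. It assumes the common form of the kata (comma- or newline-separated integers, empty string sums to zero) and uses pytest-style test functions in Python for illustration; the workshop repo may use a different language and test framework.

```python
def add(numbers: str) -> int:
    """Sum comma- or newline-separated integers; an empty string sums to 0."""
    if not numbers:
        return 0
    return sum(int(token) for token in numbers.replace("\n", ",").split(","))


# In the exercise these tests exist first and participants may not edit them.
def test_empty_string_is_zero():
    assert add("") == 0


def test_single_number():
    assert add("7") == 7


def test_two_numbers():
    assert add("1,2") == 3


def test_newline_separator():
    assert add("1\n2,3") == 6
```

Running the same four tests once by hand and once with the AI gives the baseline-versus-AI delta the warm-up is meant to show.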
⚠ Anti-pattern demo · 10–15 min

Vibe-code a feature without tests → add a second feature → watch the architecture degrade live. AI agents never spontaneously suggest refactoring without test constraints. [7] Show the vibe-coded diff alongside a TDD diff. Expert devs internalize it without argument.
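The Gilded Rose entry in the catalog above calls for characterization tests before touching logic. A minimal sketch of that idea follows, assuming a tiny stand-in for the legacy code: `legacy_update` is hypothetical and deliberately much smaller than the real kata, and `characterize` simply records current outputs over a grid of inputs.

```python
from itertools import product


def legacy_update(name, sell_in, quality):
    """Hypothetical stand-in for tangled legacy logic (not the real kata code)."""
    if name == "Aged Brie":
        if quality < 50:
            quality = quality + 1
    else:
        if quality > 0:
            quality = quality - 1
    sell_in = sell_in - 1
    if sell_in < 0:
        if name != "Aged Brie":
            if quality > 0:
                quality = quality - 1
    return sell_in, quality


def characterize(update):
    """Record what `update` returns for every input in a small grid."""
    names = ["Aged Brie", "Elixir"]
    grid = product(names, range(-1, 3), range(0, 52))
    return {case: update(*case) for case in grid}


# Step 1: pin down current behaviour before touching the logic.
GOLDEN = characterize(legacy_update)


def refactored_update(name, sell_in, quality):
    """Candidate rewrite; it must reproduce the recorded behaviour exactly."""
    sell_in -= 1
    if name == "Aged Brie":
        return sell_in, (quality + 1 if quality < 50 else quality)
    decay = 2 if sell_in < 0 else 1
    return sell_in, max(quality - decay, 0)


# Step 2: the characterization test. Any behavioural drift fails here.
assert characterize(refactored_update) == GOLDEN
```

The recorded outputs act as the fixed test file: the AI may rewrite `refactored_update` freely, but the golden record stays untouched.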

facilitation roles — never let the Lead touch Zoom controls
🎙 LEAD required
  • Delivers content, runs exercises, timeboxes discussions
  • Names AI limits explicitly in context frame
  • Calls the break at 00:55 — hard stop
  • Never also operates Zoom/Meet controls
🎛 PRODUCER required
  • Manages polls, breakouts, recordings, visible countdown timer
  • Watches chat; voices questions to Lead without interrupting
  • Posts exercise instructions in chat (verbal-only gets missed)
  • Maintains parking lot board (Miro / FigJam)
🛠 HELPER(S) 1 per room of 4–6
  • Joins each breakout room first 3 min; confirms exercise loaded
  • Demos on own screen — never takes over participant's keyboard
  • Only needs to know one exercise; comfort beats completeness
✓ what works
  • Peer credibility framing
  • Explicit sandbox safety
  • First win in < 5 min
  • Mixed-sceptic rooms
  • Silent brainstorm before open floor
✗ what fails
  • "This is the new standard"
  • Skipping AI limits discussion
  • All-sceptics or all-enthusiasts rooms
  • Lecture-mode > 15 min
pre-workshop logistics — complete before session day
Environment — GitHub Codespaces
  • Commit .devcontainer/devcontainer.json and enable a Codespaces prebuild for the repo
  • Language runtime + test framework (e.g. Node 22 + Vitest)
  • AI extension pre-authenticated inside the container
  • Skeleton repo: failing tests present, implementation stubs empty
  • Reference solution on separate branch (unblocks without spoiling)
  • CI on every push — instant green/red signal
  • ⚠ Free tier: 60 hrs/month — provide credit vouchers
API Keys — Per-Participant
  • Hard budget cap: $2–5 per key
  • Expiry: session day + 24 hrs
  • Model allowlist: workshop model only
  • Email distribution link 48 hrs out with curl test snippet
  • Claims window: opens 1 hr before start
  • Shared key = security risk for >10 participants
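The key limits and dates above can be derived from a single session start time. The sketch below assumes "session day + 24 hrs" means 24 hours after the start; `KeyPolicy`, `workshop_key_policy`, the model name, and the example date are all hypothetical and not tied to any provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class KeyPolicy:
    """Hypothetical per-participant key policy; not a real provider object."""
    budget_usd: float
    allowed_models: tuple
    email_link_at: datetime
    claims_open_at: datetime
    expires_at: datetime


def workshop_key_policy(session_start, model, budget_usd=5.0):
    """Derive the key limits and schedule from the session start time."""
    if not 2.0 <= budget_usd <= 5.0:
        raise ValueError("budget cap should stay within the $2-5 range")
    return KeyPolicy(
        budget_usd=budget_usd,
        allowed_models=(model,),  # allowlist: the workshop model only
        email_link_at=session_start - timedelta(hours=48),
        claims_open_at=session_start - timedelta(hours=1),
        # Assumption: "session day + 24 hrs" is read as 24 hours after the start.
        expires_at=session_start + timedelta(hours=24),
    )


# Hypothetical session date and model name, for illustration only.
policy = workshop_key_policy(datetime(2026, 9, 1, 9, 0), "workshop-model")
print(policy.email_link_at, policy.claims_open_at, policy.expires_at)
```

Keeping the schedule in one place makes it harder for the email, the claims window, and the expiry to drift apart when the session is rescheduled.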
Participant Pre-Check (72 hrs out)
  • GitHub account; Codespace opens from workshop link
  • AI extension authenticated; "hello world" generation passes
  • API key claimed; curl snippet returns valid response
  • Zoom desktop client installed (browser breaks breakout screen share)
  • Hard gate — not optional prep
🏠 Breakout Room Setup
  • Pre-assign groups — never random (experts resent the kindergarten feel)
  • 2–3 devs per room; 4–5 maximum before collaboration degrades
  • Mix skill levels; moderate cognitive diversity
  • Helper assigned to each room before session starts
  • Brief all three roles together 10 min before session
frameworks — teach alongside exercises

TDAID — Test-Driven AI Development

Extends classic red-green-refactor with a Plan phase before Red (AI generates implementation roadmap) and a Validate phase after Refactor (human reviews the diff to catch "cheat" tests).

Plan → Red → Green → Refactor → Validate

EXACT — Example-guided AI-Collaborative TDD

Prepends Example Mapping before the first test. Three autonomy levels — let participants choose and debrief the difference:

A · AI runs until end of feature · speed mode
B · AI runs until end of each RGR cycle · ★ default
C · AI runs until end of each phase · max oversight
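The three levels differ only in where the human checkpoint falls. A minimal sketch of that difference, assuming a feature is built in a known number of red-green-refactor cycles; the `Autonomy` enum and `review_points` function are hypothetical names used for illustration.

```python
from enum import Enum


class Autonomy(Enum):
    A = "feature"  # speed mode: review once, at the end of the feature
    B = "cycle"    # default: review at the end of each RGR cycle
    C = "phase"    # max oversight: review at the end of each phase


PHASES = ("red", "green", "refactor")


def review_points(level, cycles):
    """List the (cycle, phase) positions after which the human reviews."""
    steps = [(c, p) for c in range(1, cycles + 1) for p in PHASES]
    if level is Autonomy.C:
        return steps
    if level is Autonomy.B:
        return [step for step in steps if step[1] == "refactor"]
    return steps[-1:]  # level A: only the final step of the feature


for level in Autonomy:
    print(level.name, len(review_points(level, cycles=3)))
```

For a three-cycle feature this gives one checkpoint at level A, three at level B, and nine at level C, which is a concrete number to bring into the debrief.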
failure mode pre-mortem — 10 predictable failures
⚙ Environment setup in live session
  • Impact: Loses 20–30 min; derails all subsequent timings
  • Prevention: Codespaces prebuild + mandatory pre-check 24 hrs before [9]
🔑 AI API key failure on day
  • Impact: Blocks all exercises; kills workshop credibility
  • Prevention: Pre-provision with expiry; day-before curl test required to claim key [10]
📺 Demo-heavy, hands-on-light
  • Impact: Expert disengagement within 15 min
  • Prevention: Hard rule: ≤7 min explanation before participants touch code; 60–70% of session must be hands-on [14]
🗣 Dominant expert hijacking discussion
  • Impact: Others disengage; session follows one rabbit hole
  • Prevention: Parking lot + timebox; round-robin debrief format; silent brainstorm before open floor [13]
❓ Exercise too ambiguous
  • Impact: Participants stuck; helpers overwhelmed; pacing collapses
  • Prevention: Test every exercise solo end-to-end before the session; embed "if stuck" hints as code comments in repo stub
🛠 Tool sprawl
  • Impact: Cognitive overload; participants lose their place
  • Prevention: One primary tool per task; introduce tools sequentially; avoid simultaneous Zoom + Miro + Slack + IDE [14]
👤 No helper in breakout rooms
  • Impact: Stuck participants wait silently; frustration builds
  • Prevention: 1 helper per room of 4–6, briefed on exercise goals, arrives in room for first 3 min [11]
🚧 Expert resistance to AI tooling
  • Impact: Overt scepticism infects room culture
  • Prevention: Address AI limits explicitly in context frame; peer-champion framing; concrete first win in < 5 min [12]
⏰ Overrun debrief, no synthesis time
  • Impact: Participants leave with open loops
  • Prevention: Hard 10-min closing slot in run-of-show; parking lot absorbs overflow; written recap within 24 hrs
☕ No break in 90-min session
  • Impact: Focus degrades in last 30 min; diminishing returns
  • Prevention: 5-min break at 00:55, non-negotiable even under time pressure