Three Archetypes
Seven tools and dozens of variants all fit into three execution models.[2] Most developers end up combining two of them.
CLI-First
Terminal-native. Flexible, scriptable, model-agnostic. You drive the editor; the agent drives the shell.
IDE-Native
AI baked into every editing surface — autocomplete, inline chat, multi-file composer, background agents.
Cloud Engineering
Fully autonomous execution in isolated sandboxes. You define the goal; the agent plans, writes the code, and opens the pull request.
Commercial Tools at a Glance
| Tool | Backing model(s) | Context window | Agentic standout | Price/mo (Pro / top tier) |
|---|---|---|---|---|
| Claude Code ⭐ 131k | Opus 4.8 (88.6% SWE-bench Verified) | 1M | Dynamic Workflows: parallel sub-agents; MCP-native | $17–20 / $100–200 Max |
| Cursor | Composer 2.5 + multi-model | 200K | "Build in Parallel" — up to 8 async sub-agents; Supermaven 72% autocomplete acceptance | $20 / $200 Ultra |
| GitHub Copilot | Opus 4.8, GPT-5.5, Gemini 3.5 | 32K–128K | Issues→PR cloud agent; inline completions unlimited; 15M users | $10 / $100 Max |
| Windsurf / Devin Desktop | SWE-1.6, Opus 4.8, GPT-5.5 | 200K | Cascade cross-file agent; Codemaps (AI-annotated visual nav); Devin cloud delegate | $20 |
| Kiro | Claude Sonnet + Amazon Nova | 200K | Spec-driven (requirements.md → design.md → tasks.md); parallel tasks finish ~4× faster | $20 / $200 Max |
| OpenAI Codex | GPT-5.5 | 128K | Cloud sandboxes, no local setup; multi-agent macOS/Windows desktop app | $20 (via ChatGPT) |
| Google Antigravity 2.0 | Gemini 3.5 Flash (289 tok/s) | 1M | Agents drive editor + terminal + browser; scheduled background tasks; SDK for custom agents | $19.99 |
Capability Deep Dives
Context window — the biggest differentiator
Claude Code and Google Antigravity both support 1M-token windows, enough to load an entire mid-sized codebase in one shot.[5] Copilot's 32K–128K range confines its agent mode to smaller, focused tasks — fine for single-file edits, a ceiling for cross-repo architecture work.
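Whether a codebase "fits" is easy to estimate before choosing a tool. The sketch below is a rough heuristic, not any vendor's tokenizer: it assumes about 4 characters per token (a common rule of thumb; real tokenizers vary by model and language), and the function names and the `SOURCE_SUFFIXES` filter are hypothetical. The window sizes come from the table above.

```python
from pathlib import Path

# Assumption: ~4 characters per token. Real tokenizers differ by model and language.
CHARS_PER_TOKEN = 4

# Context sizes from the comparison table (tokens).
CONTEXT_WINDOWS = {
    "Claude Code / Antigravity": 1_000_000,
    "Cursor / Windsurf / Kiro": 200_000,
    "Copilot (upper bound)": 128_000,
    "Copilot (lower bound)": 32_000,
}

# Hypothetical filter for which files count as "source".
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".go", ".rs", ".java", ".md"}


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str) -> int:
    """Sum estimated tokens over source files under root, skipping .git."""
    total = 0
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        if path.suffix in SOURCE_SUFFIXES:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def windows_that_fit(tokens: int) -> list[str]:
    """Names of the context windows that could hold `tokens` in one prompt."""
    return [name for name, size in CONTEXT_WINDOWS.items() if tokens <= size]
```

Calling `windows_that_fit(estimate_repo_tokens("path/to/repo"))` lists the candidate tools. By this heuristic a 1M-token window holds roughly 4 MB of source text and a 128K window about 512 KB; in practice the agent also needs headroom for instructions and its own output.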
Kiro: spec before code
Amazon launched Kiro on May 7, 2026 as a ground-up replacement for Amazon Q Developer.[6] Its workflow produces three artefacts before writing a line of code: requirements.md (user stories + EARS acceptance criteria), design.md (architecture + data models), and tasks.md (an atomic checklist). A new Requirements Analysis feature uses formal methods to check that the requirements contain no contradictions.[17] Parallel Task Execution cuts implementation time for large features by ~75%, i.e. roughly 4× faster.[6] Kiro routes between Claude Sonnet (reasoning-heavy specs) and Amazon Nova (high-throughput code generation) via Bedrock.
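EARS (Easy Approach to Requirements Syntax) restricts acceptance criteria to a few fixed sentence templates, which is what makes them machine-checkable. A minimal sketch of a syntax-only linter for the standard EARS patterns follows. It is not Kiro's implementation, and unlike Kiro's Requirements Analysis it does not look for contradictions; `classify_ears` and `lint` are hypothetical names. Matching is case-insensitive, so both `When ..., the system shall ...` and all-caps keyword styles pass.

```python
import re
from typing import List, Optional

# Standard EARS templates, each anchored on its leading keyword.
# The prefixes (when / while / if / where / the) are distinct, so patterns never overlap.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^when\s+.+?,?\s+the\s+.+?\s+shall\s+.+$", re.I)),
    ("state-driven", re.compile(r"^while\s+.+?,?\s+the\s+.+?\s+shall\s+.+$", re.I)),
    ("unwanted", re.compile(r"^if\s+.+?,?\s+then\s+the\s+.+?\s+shall\s+.+$", re.I)),
    ("optional", re.compile(r"^where\s+.+?,?\s+the\s+.+?\s+shall\s+.+$", re.I)),
    ("ubiquitous", re.compile(r"^the\s+.+?\s+shall\s+.+$", re.I)),
]


def classify_ears(criterion: str) -> Optional[str]:
    """Return the EARS pattern a criterion follows, or None if it matches none."""
    text = " ".join(criterion.split())  # collapse line breaks and repeated spaces
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None


def lint(criteria: List[str]) -> List[str]:
    """Return the criteria that fit no EARS template."""
    return [c for c in criteria if classify_ears(c) is None]
```

For example, `lint(["The API shall return JSON", "Users like speed"])` flags only the second line, which has no trigger, system, or `shall` clause to test against.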
Windsurf → Devin Desktop
Cognition acquired Windsurf in December 2025 and integrated its SWE-1 model family. By Cognition's own claim, SWE-1.6 is 13× faster than Claude Sonnet 4.5, and it improved SWE-bench Pro by 10%+ over SWE-1.5.[11] Codemaps (AI-annotated visual code navigation) are a feature neither Cursor nor Claude Code has shipped. Windsurf was rebranded as Devin Desktop in June 2026, making the cloud Devin agent the default surface.
Benchmarks — What the Numbers Actually Mean
| Model / Agent | SWE-bench Verified | SWE-bench Pro (clean) | Note |
|---|---|---|---|
| Claude Mythos Preview | 93.9% | — | Preview model, not publicly available |
| Claude Opus 4.8 | 88.6% | 69.2% | Best on clean benchmark |
| GPT-5.3 / Codex | 85% | — | OpenAI stopped publishing Verified |
| Claude Opus 4.5 | 80.9% | 45.9% | 35-pt contamination gap |
| Augment Code | 70.6% (self-reported) | — | Highest on AI code review benchmark[13] |
Architecture matters as much as the model: in February 2026 testing, three agent frameworks running the same model differed by 17 solved issues out of 731 problems, about 2.3 percentage points.[5] A Verified score alone tells you very little; look for Pro scores and architecture details.
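The numbers in this section reduce to two pieces of arithmetic: the Verified-minus-Pro gap in percentage points, and converting a difference in solved-issue counts into points. A minimal sketch using the table's figures; the function names are hypothetical.

```python
# Scores from the benchmark table (percent of problems solved).
SCORES = {
    "Claude Opus 4.8": {"verified": 88.6, "pro": 69.2},
    "Claude Opus 4.5": {"verified": 80.9, "pro": 45.9},
}


def verified_pro_gap(verified: float, pro: float) -> float:
    """Percentage-point drop from SWE-bench Verified to the clean Pro set."""
    return round(verified - pro, 1)


def spread_in_points(issue_spread: int, total_problems: int) -> float:
    """Convert a difference in solved-issue counts into percentage points."""
    return 100 * issue_spread / total_problems


for model, s in SCORES.items():
    print(f"{model}: {verified_pro_gap(s['verified'], s['pro'])} pt Verified-to-Pro gap")

# Same model, three frameworks: 17 issues apart on 731 problems.
print(f"Framework spread: {spread_in_points(17, 731):.1f} pt")
```

Opus 4.8 drops 19.4 points from Verified to Pro against 35.0 for Opus 4.5, and the framework spread works out to about 2.3 points, putting architecture effects in the same units as the score gaps above.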
Open-Source Alternatives
Cline ⭐ 63k
VS Code extension that can inspect and edit files, run terminal commands, and use a browser, asking permission at each step. Best bring-your-own-model (BYOM) support.
Gemini CLI ⭐ 105k
Apache 2.0 TypeScript CLI. Free tier: 1000 req/day, 1M context. Google Search grounding built in.[7]
Aider ⭐ 46k
Terminal pair-programmer with git-aware diffs. Supports 100+ models. Shows token usage and cost as you go.
For teams wanting no vendor lock-in, Gemini CLI is the strongest free option (1M context, Google Search grounding, open source).[8] OpenAI has also open-sourced its Codex CLI (github.com/openai/codex ⭐ 90k[9]). Claude Code itself has a public GitHub repository (⭐ 131k[10]), but the Claude models it runs on are closed-weight.
The MCP Layer — Interoperability in Practice
The Model Context Protocol is now the universal plugin bus for coding agents.[12] As of March 2026: 97M monthly SDK downloads, 10K+ public servers, and 41% of surveyed engineering orgs running it in production. In December 2025 Anthropic donated MCP to the Linux Foundation, under a new foundation it co-founded with Block and OpenAI; Google, Microsoft, and AWS are all backing it.
All major tools support MCP: Claude Code (native), Cursor, VS Code/Copilot, Windsurf, Kiro, and Gemini CLI. 500+ public servers cover databases, file storage, project management (Jira, Asana), messaging (Slack), and CI/CD. In practice, one shared set of MCP server definitions gives every tool in your stack the same access to your private repos, test runners, and internal APIs.
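Several of these tools read the same JSON shape for local servers: a top-level `mcpServers` object mapping each server name to the `command` that launches it, its `args`, and an optional `env`. Claude Code's project-level `.mcp.json` and Cursor's `.cursor/mcp.json` both use this shape; other tools differ (VS Code's `.vscode/mcp.json`, for instance, uses a top-level `servers` key), so treat the target list as an assumption to check against each tool's docs. The sketch renders one set of definitions into both files. `@modelcontextprotocol/server-filesystem` is the reference filesystem server; the `internal-api` entry, its module name, and its URL are hypothetical.

```python
import json
from pathlib import Path

# One set of server definitions shared by every tool that reads "mcpServers".
SERVERS = {
    "filesystem": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "./src"],
    },
    "internal-api": {  # hypothetical in-house server
        "command": "python",
        "args": ["-m", "acme_mcp_server"],
        "env": {"ACME_API_URL": "https://api.internal.example"},
    },
}

# Assumed config locations: Claude Code (project scope) and Cursor (project scope).
TARGETS = [Path(".mcp.json"), Path(".cursor/mcp.json")]


def render_config(servers: dict) -> str:
    """Serialize server definitions in the shared mcpServers format."""
    return json.dumps({"mcpServers": servers}, indent=2)


def write_configs(servers: dict, targets: list[Path]) -> None:
    """Write the same config to every target path, creating parent dirs."""
    text = render_config(servers)
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text + "\n", encoding="utf-8")
```

Running `write_configs(SERVERS, TARGETS)` from the repository root keeps both tools pointed at identical servers; committing the generator rather than the generated files avoids the two copies drifting apart.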
Memory files (CLAUDE.md, AGENTS.md, GEMINI.md) have become the standard cross-session context mechanism: they encode project conventions that agents read at the start of every session, the practical replacement for prompt preambles.[2]
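One plausible way a tool could gather these files is to walk from the working directory up to the repository root and collect any memory files it finds, outermost first, so that more specific conventions come last in the assembled context. The sketch below illustrates that idea only; the search order, file precedence, and concatenation format are assumptions, and each real tool documents its own lookup rules. All function names are hypothetical.

```python
from pathlib import Path

MEMORY_FILES = ("AGENTS.md", "CLAUDE.md", "GEMINI.md")


def find_repo_root(start: Path) -> Path:
    """Nearest directory at or above start containing .git; falls back to start."""
    for directory in (start, *start.parents):
        if (directory / ".git").exists():
            return directory
    return start


def collect_memory_files(start: Path) -> list[Path]:
    """Memory files from the repo root down to start, outermost first."""
    start = start.resolve()
    root = find_repo_root(start)
    chain = []
    for directory in (start, *start.parents):
        chain.append(directory)
        if directory == root:
            break
    found = []
    for directory in reversed(chain):  # root first, working directory last
        for name in MEMORY_FILES:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    return found


def build_preamble(start: Path) -> str:
    """Concatenate memory files, each tagged with its path as an HTML comment."""
    parts = [
        f"<!-- {path} -->\n{path.read_text(encoding='utf-8')}"
        for path in collect_memory_files(start)
    ]
    return "\n\n".join(parts)
```

Putting the repo-wide file first and the directory-specific file last mirrors how a later instruction usually overrides an earlier one in a prompt.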
Pick Your Stack
| Use case | Pick | Why |
|---|---|---|
| Daily IDE, lowest friction | GitHub Copilot Pro | $10/mo, unlimited inline completions, works in every editor, GitHub-native |
| Daily IDE, best experience | Cursor Pro | Composer 2.5, parallel agents, 72% autocomplete acceptance rate |
| Complex multi-file / architecture | Claude Code Max | 1M context, Opus 4.8 reasoning, strongest on SWE-bench Pro |
| Spec-driven team workflow | Kiro Pro | Enforces requirements → design → tasks before code; parallel tasks cut implementation time ~75% |
| Free / open-source | Gemini CLI | 1M context, 1000 req/day free, Apache 2.0, Google Search grounding |
| Autonomous delegation | Devin Desktop / Devin Cloud | Sandboxed execution, auto-PR, SWE-1.6 proprietary model |
| Budget-constrained team (10 devs) | GitHub Copilot Business | $2,280/yr vs $3,840 for Cursor Teams Standard |
Source: [1]
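The 10-developer budget comparison is plain seat arithmetic. In the sketch below, the per-seat prices ($19 and $32 per user per month) are back-calculated from the annual totals in the table rather than quoted from either vendor's price list, so substitute current list prices before relying on the result. The function names are hypothetical.

```python
def annual_cost(seats: int, price_per_seat_month: float) -> float:
    """Yearly cost of a per-seat monthly plan."""
    return seats * price_per_seat_month * 12


def implied_seat_price(annual_total: float, seats: int) -> float:
    """Back out the monthly per-seat price from an annual total."""
    return annual_total / (seats * 12)


# Back-calculated from the table: $2,280/yr and $3,840/yr for 10 developers.
copilot_business = implied_seat_price(2_280, 10)  # 19.0
cursor_teams = implied_seat_price(3_840, 10)      # 32.0

savings = annual_cost(10, cursor_teams) - annual_cost(10, copilot_business)
print(f"Copilot Business saves ${savings:,.0f}/yr for 10 seats")
```

The difference is $1,560 a year, roughly 41% less than the Cursor figure; at a different team size the gap scales linearly with seat count.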
The most productive pattern in 2026: Cursor or Copilot for day-to-day editing (80% of work) + Claude Code for sessions demanding deep codebase understanding. MCP wires them to the same context.[18]