- Most control, production-grade: LangGraph ⭐ 26.9k — 94% multi-step accuracy at $0.08/task, LangGraph Studio, first-class checkpointing & interrupt(). Cost: 1–2 week learning curve.[1]
- Fastest prototype: CrewAI ⭐ 52.7k — role-based crews, running in 3–5 days. Hit the branching ceiling → migrate to LangGraph.[3]
- Lowest cognitive overhead / voice: OpenAI Agents SDK ⭐ 33.7k — working agent in 2–3 days, Realtime API for voice; sequential handoffs only.[6]
- TypeScript / web stack: Mastra ⭐ 24.7k — only mature TS-native framework, 3,300+ models, time-travel debug, first-class HiL suspend/resume.[23]
- .NET / Azure / M365 enterprise: MS Agent Framework ⭐ 9.9k — GA April 2, 2026, replaces Semantic Kernel + AutoGen, 7 LLM providers, first-class MCP + A2A.[5]
- AWS-native / minimal ceremony: Strands ⭐ ~6.7k — 3 primitives, IAM/VPC-native; steering handlers beat prompt-only 100% vs 82.5%.[21]
- Multimodal (text+audio+video+image) or cross-org A2A: Google ADK ⭐ 20k — only framework with full in-loop multimodal; A2A protocol native.[8]
- Avoid starting new projects on: AutoGen (archived by Microsoft late 2025) — move to AG2 or MS Agent Framework.[22]
Full Feature Matrix
| Framework | School | License | Lang | Checkpoint | HiL | MCP | Parallel | Code Exec | Voice / Multimodal | Multi-LLM | Learning Curve | Studio / Debugger | Observability | Enterprise |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LangGraph ⭐ 26.9k[9] | Graph-explicit | MIT | Py + TS | ✓ First-class | ✓ interrupt() | ⚠ Adapter | ✓ Reducer-based | ⚠ Custom | ✗ | ✓ Any LLM | 1–2 weeks | ✓ LG Studio | LangSmith | RBAC + audit |
| Google ADK ⭐ 20k[13] | Graph-explicit | Apache 2.0 | Py | ✓ Pluggable backends | ✓ | ✓ Native (v2.0) | ✓ Agent tree | ⚠ | ✓ Text+audio+video+image | ⚠ Gemini-first | Moderate | ⚠ OTEL/MLflow | OTEL + MLflow | A2A native |
| CrewAI ⭐ 52.7k[10] | Role-declarative | Apache 2.0 | Py | ✗ | ⚠ Callbacks | ✓ DSL + adapters | ⚠ Limited | ✗ | ✗ | ✓ Per-agent LLM | 3–5 days | ✓ AMP editor | AMP dashboard | ⚠ Enterprise AMP |
| AutoGen ⭐ 58.7k | ⚠ ARCHIVED by Microsoft (late 2025) — maintenance-only. Migrate to AG2 (community fork) or MS Agent Framework.[22] | |||||||||||||
| AG2 ⭐ 4.6k[12] (AutoGen fork) | Conv-emergent | Apache 2.0 | Py | ⚠ Event replay | ⚠ GroupChat | ✓ v0.12 native | ⚠ Limited | ✓ Docker/local | ✗ | ✓ | Moderate | ✗ | ⚠ Custom | ✗ |
| OpenAI Agents SDK ⭐ 33.7k[11] | Handoff-centric | MIT | Py + TS | ⚠ Session (2026) | ⚠ Approval callbacks | ⚠ v0.7+ | ⚠ Sequential only | ⚠ Sandbox | ✓ Realtime API | ⚠ OpenAI-first | 2–3 days | ⚠ Dashboard | Native tracing | ⚠ Basic guardrails |
| Strands Agents ⭐ ~6.7k[20] | Model-driven | Apache 2.0 | Py | ⚠ Dev-managed | ⚠ Custom tools | ✓ First-class | ⚠ 4 patterns | ✓ AgentCore | ✗ | ✓ Bedrock/Anthropic/OpenAI/Ollama | Low (3 primitives) | ⚠ OTEL | AWS CloudTrail + OTEL | AWS IAM/VPC |
| MS Agent Framework ⭐ 9.9k[14] v1.0 Apr 2026 | Platform-integrated | MIT | Py + .NET | ✓ All 5 patterns | ✓ pause/resume | ✓ First-class | ✓ Concurrent fan-out | ⚠ | ✗ | ✓ 7 providers | Moderate | ⚠ | ⚠ OTEL-based | Azure RBAC + M365 |
| Smolagents ⭐ 27.7k[15] | Code-as-action | Apache 2.0 | Py | ⚠ Basic | ✗ | ✓ | ⚠ Limited | ✓ Native (code agent) | ✗ | ✓ Local LLMs | Very low (40 lines) | ✗ | ⚠ Basic | ✗ |
| Pydantic AI ⭐ 17.5k[16] | Type-safe pipeline | MIT | Py | ⚠ Basic | ⚠ Basic | ✓ | ⚠ Limited | ✗ | ✗ | ✓ Any provider | Low (FastAPI-style) | ✗ | ⚠ Basic | ✗ |
| Mastra ⭐ 24.7k[17] | TS-native | Apache 2.0 | TS only | ✓ Time-travel | ✓ Suspend/resume | ✓ First-class | ✓ | ✗ | ✗ | ✓ 3,300+ models | Moderate | ✓ Mastra Studio | Built-in evals + tracing | ⚠ Growing |
| LlamaIndex ⭐ 49.9k[18] | Data-first | MIT | Py | ✓ Workflows | ✓ | ⚠ | ✓ | ✗ | ✗ | ✓ Multiple | Moderate | ⚠ LlamaTrace | LlamaCloud + LlamaTrace | ⚠ LlamaCloud paid |
⚠ = partial/limited · HiL = Human-in-Loop · Sources: [2][6]
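The matrix can also be filtered mechanically. The following is a minimal sketch, assuming a hand-transcribed subset of three columns (Checkpoint, HiL, MCP) with ✓ scored as 2, ⚠ as 1, and ✗ as 0. The `MATRIX` dict and `shortlist` helper are illustrative names, not part of any framework.

```python
# Hand-transcribed subset of the feature matrix above (illustrative only).
# Scores: 2 = full support (✓), 1 = partial/limited (⚠), 0 = none (✗).
MATRIX = {
    "LangGraph":          {"checkpoint": 2, "hil": 2, "mcp": 1},
    "Google ADK":         {"checkpoint": 2, "hil": 2, "mcp": 2},
    "CrewAI":             {"checkpoint": 0, "hil": 1, "mcp": 2},
    "AG2":                {"checkpoint": 1, "hil": 1, "mcp": 2},
    "OpenAI Agents SDK":  {"checkpoint": 1, "hil": 1, "mcp": 1},
    "Strands Agents":     {"checkpoint": 1, "hil": 1, "mcp": 2},
    "MS Agent Framework": {"checkpoint": 2, "hil": 2, "mcp": 2},
    "Smolagents":         {"checkpoint": 1, "hil": 0, "mcp": 2},
    "Pydantic AI":        {"checkpoint": 1, "hil": 1, "mcp": 2},
    "Mastra":             {"checkpoint": 2, "hil": 2, "mcp": 2},
    "LlamaIndex":         {"checkpoint": 2, "hil": 2, "mcp": 1},
}


def shortlist(**minimums):
    """Return frameworks whose score meets every requested minimum."""
    return [
        name
        for name, features in MATRIX.items()
        if all(features[column] >= need for column, need in minimums.items())
    ]


# Frameworks with full checkpointing, HiL, and MCP support in the matrix.
full_support = shortlist(checkpoint=2, hil=2, mcp=2)
print(full_support)
```

Relaxing a requirement (for example `mcp=1`) widens the shortlist, which is a quick way to see which single column is excluding a framework.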
Performance Benchmarks
| Framework | Multi-Step Accuracy | Cost / Task | Notes |
|---|---|---|---|
| LangGraph | 94% | $0.08 | Stateful caching cuts LLM calls 40–50% on repeat workflows[6] |
| AG2 | 91% | $0.45 | Highest accuracy on code write+review+debug; highest token cost[1] |
| OpenAI Agents SDK | 90% | $0.11 | [2] |
| Strands Agents | 89% (→ 100% with steering handlers) | $0.10 | Steering handlers vs prompt-only: 100% vs 82.5% on same tasks[21] |
| CrewAI | 87% | $0.12 | Up to 3× token overhead vs LangGraph on simple tasks[6] |
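Accuracy and cost can be folded into one number: expected cost per successful task. The sketch below assumes failed tasks are simply retried at the same cost and that attempts are independent, so the expected number of attempts is 1 / accuracy. That retry model is an assumption for illustration, not something the cited benchmarks state. The figures are copied from the table above.

```python
# (accuracy, cost per task in USD), copied from the benchmark table above.
BENCHMARKS = {
    "LangGraph": (0.94, 0.08),
    "AG2": (0.91, 0.45),
    "OpenAI Agents SDK": (0.90, 0.11),
    "Strands Agents": (0.89, 0.10),
    "CrewAI": (0.87, 0.12),
}


def cost_per_success(accuracy, cost_per_task):
    """Expected spend to get one correct result, assuming independent retries."""
    return cost_per_task / accuracy


# Cheapest expected cost first.
ranked = sorted(BENCHMARKS, key=lambda name: cost_per_success(*BENCHMARKS[name]))

for name in ranked:
    print(f"{name}: ${cost_per_success(*BENCHMARKS[name]):.3f} per successful task")
```

Under this model the ordering matches the raw cost column, because the accuracy spread (87% to 94%) is small relative to the cost spread ($0.08 to $0.45).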
Critical Axes: Where Frameworks Diverge Most
Checkpointing & State Persistence
| Framework | Tier | Detail |
|---|---|---|
| LangGraph | Best-in-class | Checkpointer API: time-travel, rewind, replay, cross-session resume. Most-cited reason for production adoption.[2] |
| Mastra | Best-in-class (TS) | Workflow state persistence with time-travel debugging — unique in the TypeScript ecosystem.[19] |
| MS Agent Framework | Strong | Streaming + checkpointing + pause/resume built into all 5 orchestration patterns.[4] |
| Google ADK | Good | Session state with pluggable backends; unified graph engine adds deterministic checkpointing.[8] |
| LlamaIndex | Good | LlamaIndex Workflows include checkpointing and human-in-loop options for high-stakes processes.[18] |
| AG2 | Partial | Event replay preserves conversation history but is not a true workflow checkpoint. |
| Strands Agents | Developer-managed | No built-in checkpointing; developer responsibility to persist state. Consistent with minimal-ceremony philosophy.[1] |
| CrewAI | Weak | No built-in checkpoint/resume. Most-cited production limitation: typical workaround is rebuilding the flow in LangGraph once checkpointing is needed.[3] |
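For the "developer-managed" and "weak" tiers, the missing piece is roughly the loop below. This is a framework-agnostic sketch using only the Python standard library. `run_workflow`, the step functions, and the JSON file layout are hypothetical and do not reproduce any framework's checkpointer API.

```python
import json
import os
import tempfile


def run_workflow(steps, state, path):
    """Run steps in order, saving state after each one.

    If a checkpoint file already exists at `path`, resume from the step
    after the last one that completed instead of starting over.
    """
    start = 0
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        start, state = saved["next_step"], saved["state"]
    for i in range(start, len(steps)):
        state = steps[i](state)
        with open(path, "w") as f:
            json.dump({"next_step": i + 1, "state": state}, f)
    return state


calls = []  # records which steps actually executed


def fetch(state):
    calls.append("fetch")
    return {**state, "docs": 3}


def summarize(state):
    calls.append("summarize")
    if len(calls) == 2:  # simulate a crash on the first attempt only
        raise RuntimeError("simulated crash")
    return {**state, "summary": f"{state['docs']} docs summarized"}


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_workflow([fetch, summarize], {}, path)
except RuntimeError:
    pass  # the "process" died after fetch was checkpointed

# Second run resumes at summarize; fetch is not executed again.
result = run_workflow([fetch, summarize], {}, path)
print(calls, result)
```

The best-in-class tier adds what this sketch lacks: a history of checkpoints rather than only the latest one, which is what makes rewind and replay possible.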
MCP (Model Context Protocol) Support
| Framework | Level | Detail |
|---|---|---|
| Strands Agents | Architecture-level | Built around MCP; semantic search scales tool inventories to thousands of APIs.[21] |
| MS Agent Framework | "Infrastructure, not a checkbox" | Dynamic tool discovery from any MCP server without code changes; MCP server and host in one SDK.[4] |
| Mastra | First-class | MCP tool sharing built-in; Zod schemas flow end-to-end through MCP tool calls.[19] |
| Google ADK | Native (ADK v2.0) | MCP added natively in ADK v2.0, alongside A2A (agent-to-agent) protocol.[8] |
| AG2 | Native (v0.12) | Built-in MCP support since v0.12.[1] |
| CrewAI | DSL + adapters | First-class in the CrewAI DSL with a growing community adapter ecosystem.[3] |
| OpenAI Agents SDK | Added v0.7 (2025) | MCP added mid-2025; not a first-class design primitive.[2] |
| LangGraph | Via LangChain adapter | Works, but inherits LangChain adapter layer — not a first-class graph primitive.[2] |
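Whatever the support level, the wire format is the same. A minimal sketch of the messages a client sends, assuming the JSON-RPC 2.0 framing and the `tools/list` and `tools/call` method names from the MCP specification. The helper `mcp_request`, the tool name `search_docs`, and its arguments are hypothetical, and transport (stdio or HTTP) is omitted.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict for an MCP server."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discover the tools a server exposes.
list_tools = mcp_request("tools/list")

# Invoke one tool by name; "search_docs" is a made-up tool for illustration.
call_tool = mcp_request(
    "tools/call",
    {"name": "search_docs", "arguments": {"query": "checkpointing"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

"First-class" support in the table generally means the framework hides this exchange behind its own tool abstraction, while adapter-level support means an extra translation layer sits between the two.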
2026 Consolidation Events
| Date | Event | Impact |
|---|---|---|
| Late 2025 | AutoGen placed in maintenance mode by Microsoft | Do not start new projects. Community forked as AG2 to preserve v0.2 API.[22] |
| Feb 2026 | Google reframes ADK as an agent execution framework (not a toolkit) | Adds unified graph-based engine, GitHub/Jira/MongoDB connectors, OTEL via MLflow, Kotlin for Android.[8] |
| Apr 2, 2026 | MS Agent Framework v1.0 GA — merges Semantic Kernel + AutoGen | Single unified SDK for .NET and Python; 5 orchestration patterns, 7 LLM providers, first-class MCP + A2A + AG-UI.[5] |
| Apr 2026 | OpenAI Agents SDK promoted from Swarm experiment to production toolkit | TypeScript SDK reaches Python parity; native sandboxing added; voice (Realtime) becomes first-class.[6] |
| 2026 (ongoing) | CrewAI v1.14+ adds A2A protocol support | Enables cross-framework agent interop; Enterprise AMP adds RBAC and real-time monitoring.[1] |
Pick Your Path
LangGraph ⭐ 26.9k — directed graph, first-class checkpointing, best observability (LangSmith). Budget 1–2 weeks to climb the learning curve.[3]
CrewAI ⭐ 52.7k — role-based crews, stakeholder-legible abstractions, running in 3–5 days. When branching logic becomes complex, migrate to LangGraph (~1–2 weeks rebuild).[3]
OpenAI Agents SDK ⭐ 33.7k — working agent in 2–3 days, Realtime API for voice, 100+ LLMs. Sequential handoffs only — plan topology before scaling.[6]
Mastra ⭐ 24.7k — only mature TS-native agent framework, 3,300+ models, time-travel debug, full HiL suspend/resume, Mastra Studio.[23]
Strands Agents ⭐ ~6.7k — IAM/VPC/Bedrock-native, 3 primitives, minimal boilerplate. Steering handlers outperform prompt-only: 100% vs 82.5%.[21]
MS Agent Framework ⭐ 9.9k — GA April 2026, replaces Semantic Kernel + AutoGen, C#/.NET depth, 7 LLM providers, first-class MCP + A2A.[5]
Google ADK ⭐ 20k — only framework with text+audio+video+image in-loop; A2A protocol native for cross-organization agent interop.[8]
LlamaIndex ⭐ 49.9k — built ground-up for retrieval; agent capabilities layer on the strongest indexing foundation in the ecosystem.[18]
Smolagents ⭐ 27.7k — 1,000-line core, action primitive is generated Python code, 40-line ReAct agent vs 120 in LangGraph. No production enterprise features.[15]
Pydantic AI ⭐ 17.5k — FastAPI-style DI, schema-validated outputs with auto self-correction. Not a general-purpose framework — pair with LangGraph or Mastra for orchestration.[16]