All Contenders at a Glance
| Framework | GitHub ⭐ | Philosophy | Core Primitive | Sweet Spot |
|---|---|---|---|---|
| LangGraph [2] | 26.9k | Control-explicit | Graph / Node / Edge | Production stateful, regulated industries |
| Google ADK [12] | 20k | Control-explicit | Agent Tree / Graph Engine | Gemini ecosystem, A2A interop |
| CrewAI [4] | 52.7k | Role-declarative | Crew / Agent / Task | Fast prototyping, non-engineer stakeholders |
| AutoGen [5] (archived) | 58.7k | Conversation-emergent | GroupChat / ConversableAgent | Legacy systems only |
| AG2 [6] (community fork) | 4.6k | Conversation-emergent | GroupChat / Event Bus | Code-execution-heavy, research workflows |
| OpenAI Agents SDK [7] | 33.7k | Handoff-centric | Agent / Handoff / Guardrail | GPT-first, voice agents, sequential delegation |
| Strands Agents [8] | ~6.7k | Model-driven minimalism | Model + Tools + Prompt | AWS-native, minimal ceremony |
| Semantic Kernel [14] | 28k | Platform-integrated | Plugin / Planner / Kernel | .NET / Azure enterprise stacks |
| Smolagents [16] | 27.7k | Code-as-action | CodeAgent / ToolCallingAgent | Local LLMs, coding-heavy tasks |
| Pydantic AI [18] | 17.5k | Type-safe pipeline | Agent / Model / RunContext | Output-validation-critical apps |
| Mastra [20] | 24.7k | TypeScript-native | Agent / Workflow / Tool | Web-stack TypeScript teams |
| LlamaIndex [22] | 49.9k | Data-first | Index / QueryEngine / Agent | RAG-grounded agents, private data corpora |
The Seven Schools
Control-Explicit — You Write the Graph, the Framework Runs It
Both frameworks in this school share a conviction: orchestration logic is too important to leave to a language model. You define nodes (computation steps), edges (state transitions), and conditional branches explicitly. The runtime executes exactly what you specified. Payoff: determinism, auditability, rewind. Cost: upfront graph design.
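To make the node / edge / conditional-branch vocabulary concrete, here is a minimal framework-free sketch in plain Python. It is not LangGraph's or ADK's API; the names `run_graph`, `NODES`, `EDGES`, and the two stub nodes are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]

def draft(state: State) -> State:
    # Node: produce a draft and count the attempt.
    attempts = state.get("attempts", 0) + 1
    return {**state, "draft": f"draft #{attempts} for {state['question']}", "attempts": attempts}

def review(state: State) -> State:
    # Node: approve only from the second attempt on, so the loop edge is exercised.
    return {**state, "approved": state["attempts"] >= 2}

def route_after_review(state: State) -> str:
    # Conditional edge: the branch is ordinary code, not a model decision.
    return "END" if state["approved"] else "draft"

NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}
EDGES: Dict[str, Callable[[State], str]] = {
    "draft": lambda state: "review",   # unconditional edge
    "review": route_after_review,      # conditional edge
}

def run_graph(entry: str, state: State, max_steps: int = 10) -> Tuple[State, List[str]]:
    # The runtime only follows the declared edges; the trace is the audit log.
    trace: List[str] = []
    current = entry
    for _ in range(max_steps):
        if current == "END":
            return state, trace
        trace.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    raise RuntimeError("step budget exhausted before reaching END")

final_state, trace = run_graph("draft", {"question": "refund policy"})
print(trace)
```

Because every transition is declared up front, the recorded trace is enough to replay or audit a run, which is the payoff this school trades upfront design effort for.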
LangGraph
Directed graph with typed state and reducer-based concurrent-update resolution. interrupt() makes human-in-the-loop a first-class primitive. LangGraph Studio provides visual step-through and rewind/replay.[3]
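Reducer-based resolution is easier to see in code. The sketch below is a plain-Python illustration of the idea, not LangGraph's implementation: `REDUCERS` and `merge_updates` are hypothetical names, and the rule that un-reduced keys reject conflicting writes is an assumption made for the example.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical per-key reducers: how to combine the current value with an update.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,     # lists are concatenated
    "token_count": operator.add,  # integers are summed
}

def merge_updates(state: Dict[str, Any], updates: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Fold the updates from nodes that ran in the same step into one new state.
    merged = dict(state)
    written = set()
    for update in updates:
        for key, value in update.items():
            if key in REDUCERS:
                merged[key] = REDUCERS[key](merged[key], value) if key in merged else value
            elif key in written:
                # Two concurrent writers and no reducer: refuse to pick a winner.
                raise ValueError(f"conflicting writes to {key!r} without a reducer")
            else:
                merged[key] = value
            written.add(key)
    return merged

state = {"messages": ["user: hi"], "token_count": 3}
parallel_updates = [
    {"messages": ["search: 2 hits"], "token_count": 5},
    {"messages": ["calc: 42"], "token_count": 2, "status": "done"},
]
new_state = merge_updates(state, parallel_updates)
print(new_state)
```

The design point is that concurrent branches never overwrite each other silently: each key either declares how writes combine or fails loudly.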
400+ enterprise deployments: Klarna, Uber, LinkedIn, BlackRock, JPMorgan.[3] 34.5M monthly PyPI downloads.[3] Stateful caching cuts LLM calls 40–50% on repeat workflows.[3]
Benchmark: 94% multi-step accuracy, $0.08/task.[1]
Google ADK
Engineering-first, code-first, "low floor / high ceiling." Hierarchical agent trees where specialists delegate to sub-agents. Google reframed it as an agent execution framework (not a toolkit) in Feb 2026,[13] adding a graph-based engine with a dial from dynamic model-led reasoning to strict deterministic workflows.
ADK v2.0: A2A (agent-to-agent) protocol native, collaborative workflow APIs, Kotlin for Android, OpenTelemetry via MLflow.[13]
Role-Declarative — Define the Crew, Let the Framework Coordinate
You declare agents with roles, goals, and backstories; you assign tasks with expected outputs. The framework handles coordination. The mental model mirrors how humans describe team structure — fast to explain to non-engineers, fast to iterate on.
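As a rough sketch of what "declare the crew" means, the following plain-Python example models roles, goals, backstories, and tasks as data and lets a tiny sequential coordinator do the rest. It is not CrewAI's API: `Agent`, `Task`, `build_prompt`, `run_crew`, and the `fake_llm` stub are hypothetical stand-ins, and sequential hand-over of the previous output is an assumed coordination strategy.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str

@dataclass
class Task:
    description: str
    expected_output: str
    agent: Agent

def build_prompt(task: Task, context: str) -> str:
    # The declaration is turned into a prompt; the author never writes control flow.
    a = task.agent
    return (
        f"You are {a.role}. {a.backstory}\n"
        f"Goal: {a.goal}\n"
        f"Task: {task.description}\n"
        f"Expected output: {task.expected_output}\n"
        f"Context: {context}"
    )

def run_crew(tasks: List[Task], llm: Callable[[str], str]) -> List[str]:
    # Sequential coordination: each task sees the previous task's output.
    outputs: List[str] = []
    context = ""
    for task in tasks:
        result = llm(build_prompt(task, context))
        outputs.append(result)
        context = result
    return outputs

prompts_seen: List[str] = []

def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model call; it records the prompt it received.
    prompts_seen.append(prompt)
    return f"output {len(prompts_seen)}"

researcher = Agent("Researcher", "Collect facts", "Reads primary sources.")
writer = Agent("Writer", "Summarize clearly", "Writes for executives.")
outputs = run_crew(
    [
        Task("Find three facts", "A bullet list", researcher),
        Task("Write a summary", "One paragraph", writer),
    ],
    fake_llm,
)
print(outputs)
```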
CrewAI
Version 1.14+ with A2A protocol support. Different agents in the same crew can use different LLMs. Enterprise platform (AMP) adds a visual workflow editor, RBAC, real-time monitoring, and team collaboration.[1]
10M+ agents/month;[3] ~60% of Fortune 500 exploring it.[3] Setup: 2–4 hours from install to running crew.[3]
Benchmark: 87% multi-step accuracy, $0.12/task; up to 3× token overhead vs LangGraph on simple tasks.[1][3]
Conversation-Emergent — Agents Debate to Consensus
Rather than pre-defining coordination topology, agents pass messages in a loop and coordination emerges from the dialogue. This school pioneered code-execution agents: one agent writes code, a critic reviews it, a tester runs it. Microsoft placed AutoGen in maintenance mode in late 2025 after a v0.4 rewrite diverged from the community; the community forked as AG2 to preserve the v0.2 API.
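The write-and-review loop can be sketched without any framework. In this illustrative example the two "agents" are deterministic stubs rather than LLM calls, the critic doubles as the tester, and `group_chat` with its round-robin speaker order and `APPROVE` stop word is an assumption, not AutoGen's or AG2's actual protocol.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (speaker, content)

def writer(history: List[Message]) -> str:
    # Stub: the first draft has a bug; any critic feedback triggers the fix.
    feedback = [content for speaker, content in history if speaker == "critic"]
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"

def critic(history: List[Message]) -> str:
    # Stub reviewer and tester: runs the latest draft against one check.
    last_code = [content for speaker, content in history if speaker == "writer"][-1]
    namespace: dict = {}
    exec(last_code, namespace)  # trusted stub code only; never do this with real model output
    return "APPROVE" if namespace["add"](2, 3) == 5 else "REVISE: add(2, 3) should be 5"

def group_chat(agents: Dict[str, Callable[[List[Message]], str]], max_rounds: int = 6) -> List[Message]:
    # No topology is declared: agents take turns until someone approves.
    history: List[Message] = []
    names = list(agents)
    for turn in range(max_rounds):
        name = names[turn % len(names)]
        reply = agents[name](history)
        history.append((name, reply))
        if reply.startswith("APPROVE"):
            break
    return history

history = group_chat({"writer": writer, "critic": critic})
for speaker, content in history:
    print(f"{speaker}: {content}")
```

The number of turns is not known in advance, which is where both the flexibility and the token cost of this school come from.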
AutoGen (archived)
Still the most-starred framework in this landscape (58.7k), but placed in maintenance mode in late 2025.[19] Microsoft's v0.4+ rewrite is a separate project with different primitives. Stars reflect legacy adoption, not 2026 momentum.
AG2 (community fork)
Community continuation of AutoGen v0.2 with event-driven async execution and MemoryStream pub/sub. Standardised Discover → Plan → Execute → Verify lifecycle. Strongest pattern for write+review+debug loops.[1]
Benchmark: 91% multi-step accuracy, $0.45/task. That is the highest cost of the three benchmarked frameworks, with accuracy between LangGraph's 94% and CrewAI's 87%.[1]
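The three benchmark pairs quoted so far (LangGraph, CrewAI, AG2) are easier to compare as a single number. One rough way, assuming every attempt costs the same and a failed attempt is simply paid for and discarded, is cost per successful task: cost divided by accuracy. That assumption is ours, not the benchmark's, and `cost_per_success` is a hypothetical helper.

```python
# (accuracy, cost per attempted task in USD), as quoted in this section.
BENCHMARKS = {
    "LangGraph": (0.94, 0.08),
    "CrewAI": (0.87, 0.12),
    "AG2": (0.91, 0.45),
}

def cost_per_success(accuracy: float, cost_per_task: float) -> float:
    # Expected spend per successful task under the flat-cost assumption above.
    return cost_per_task / accuracy

ranked = sorted(
    ((name, cost_per_success(acc, cost)) for name, (acc, cost) in BENCHMARKS.items()),
    key=lambda pair: pair[1],
)
for name, usd in ranked:
    print(f"{name}: ${usd:.3f} per successful task")
```

Under that assumption the figures come out at roughly $0.085, $0.138, and $0.495 respectively, so the ordering by raw cost per task is unchanged once accuracy is factored in.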
Handoff-Centric — Explicit Sequential Delegation
Five primitives: Agents, Handoffs, Guardrails, Sessions, Tracing. Control transfers explicitly from agent to agent via handoff calls. The topology is less expressive than a graph, but the learning curve is minimal: onboarding is measured in hours, not weeks.[3]
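A handoff is essentially a return value that names the next agent. The sketch below illustrates that idea in plain Python under stated assumptions: it is not the OpenAI Agents SDK's API, the `Agent` dataclass and `run` loop are hypothetical, and the triage and billing handlers are keyword stubs rather than model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Agent:
    name: str
    # Returns (reply, name of the agent to hand off to, or None to finish).
    handle: Callable[[str], Tuple[str, Optional[str]]]
    handoffs: List[str] = field(default_factory=list)  # allowed targets

def triage(message: str) -> Tuple[str, Optional[str]]:
    if "refund" in message.lower():
        return "Routing to billing.", "billing"
    return "How can I help?", None

def billing(message: str) -> Tuple[str, Optional[str]]:
    return "Refund issued.", None

def run(agents: Dict[str, Agent], start: str, message: str, max_handoffs: int = 5) -> List[Tuple[str, str]]:
    # Exactly one agent is in control at a time; control moves only via a handoff.
    current = agents[start]
    trace: List[Tuple[str, str]] = []
    for _ in range(max_handoffs + 1):
        reply, target = current.handle(message)
        trace.append((current.name, reply))
        if target is None:
            return trace
        if target not in current.handoffs:
            raise ValueError(f"{current.name} may not hand off to {target}")
        current = agents[target]
    raise RuntimeError("too many handoffs")

agents = {
    "triage": Agent("triage", triage, handoffs=["billing"]),
    "billing": Agent("billing", billing),
}
trace = run(agents, "triage", "I want a refund")
print(trace)
```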
OpenAI Agents SDK
Evolved from the experimental Swarm project into a production-grade toolkit in April 2026. Ships native sandboxing and GPT-4 Realtime integration for voice agents. The TypeScript SDK reached Python parity in 2026. Supports 100+ LLMs via the Chat Completions API despite its OpenAI-centric design.[3]
Model-Driven Minimalism — The LLM Is the Orchestrator
Where the control-explicit school says "you define the graph," this school says "the model is smart enough to decide." Three primitives: Model, Tools, Prompt. The LLM autonomously decides which tools to call and in what order. Explicit steering hooks, rather than prompt instructions, provide guardrails where needed; in the cited comparison, hooks outperform prompt-only constraints.[9]
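The difference between steering by hook and steering by prompt is that a hook runs as code on every proposed tool call. Here is a minimal sketch of that mechanism, assuming a scripted stub in place of the model; it is not the Strands API, and `run_agent`, `no_destructive_calls`, and the two toy tools are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional

Call = Dict[str, Any]  # {"tool": name, "args": {...}}

def scripted_model(history: List[dict]) -> Optional[Call]:
    # Stub standing in for an LLM choosing its own next tool call.
    if not history:
        return {"tool": "delete_file", "args": {"path": "/data/report.csv"}}
    if history[-1]["status"] == "blocked":
        return {"tool": "read_file", "args": {"path": "/data/report.csv"}}
    return None  # the model decides it is done

TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}

def no_destructive_calls(call: Call) -> Optional[str]:
    # Steering hook: return a reason to block the call, or None to allow it.
    if call["tool"].startswith("delete"):
        return "destructive tools are disabled; read instead"
    return None

def run_agent(model, tools, hooks, max_steps: int = 5) -> List[dict]:
    history: List[dict] = []
    for _ in range(max_steps):
        call = model(history)
        if call is None:
            break
        reasons = [reason for reason in (hook(call) for hook in hooks) if reason]
        if reasons:
            # The block is enforced in code and fed back so the model can re-plan.
            history.append({"call": call, "status": "blocked", "detail": reasons[0]})
            continue
        result = tools[call["tool"]](**call["args"])
        history.append({"call": call, "status": "ok", "detail": result})
    return history

history = run_agent(scripted_model, TOOLS, [no_destructive_calls])
for step in history:
    print(step["call"]["tool"], step["status"], step["detail"])
```

A prompt can only ask the model not to delete files; the hook makes the deletion impossible while still leaving tool selection to the model.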
Strands Agents
Released by AWS under Apache 2.0;[10] backed by Anthropic, Meta, Accenture, PwC.[10] Production origin: powers AWS Transform for .NET modernization at scale.[11]
Semantic search scales tool inventories to thousands of APIs. Steering handlers vs prompt-only: 100% vs 82.5% task accuracy.[9] Four coordination patterns: hierarchy, swarms, graphs, meta-agents.[9]
Platform-Integrated — Framework as Microsoft Ecosystem Glue
Semantic Kernel is less a general-purpose agent framework than a deep integration layer for the Microsoft stack. It is the only framework here with first-class C# support, a rarity in a Python-dominated ecosystem.[15]
Semantic Kernel
Plugin-based planner with native Azure OpenAI, Azure AI Foundry, and Microsoft 365 integration. Supports Python, C#, and Java; C# is first-class.[15] Most mature governance features (RBAC, audit logging) alongside CrewAI Enterprise.[1]
Niche Schools — One Constraint, Done Best
These frameworks are not competing for the general-purpose slot. Each solves one constraint better than anything else in the landscape.
| Framework | GitHub ⭐ | Niche Philosophy | When to Pick It |
|---|---|---|---|
| Smolagents [16] | 27.7k | Code-as-action: the action primitive is generated Python — agents write and execute code rather than calling structured tool functions. 40 lines for a ReAct agent vs 120 in LangGraph.[17] | Local LLMs, code-generation-heavy tasks, minimal framework overhead |
| Pydantic AI [18] | 17.5k | Type-safe pipeline: FastAPI-style ergonomics — strict types, dependency injection, schema-validated structured outputs throughout the agent lifecycle.[19] | Apps where output format failures are unacceptable; teams already fluent in FastAPI patterns |
| Mastra [20] | 24.7k | TypeScript-native: Zod schemas flow end-to-end (tool input → structured output → workflow state → API response). Clean separation: agents decide, workflows orchestrate.[21] | Web-stack TypeScript teams; Next.js + AI agent integration without touching Python |
| LlamaIndex [22] | 49.9k | Data-first: built from the ground up to connect LLMs to external data. Agent capabilities are layered on top of the strongest retrieval/indexing foundation in the ecosystem. | Agents whose primary job is reasoning over private/indexed data corpora |
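
Of the niches in the table, code-as-action is the least obvious, so here is a small illustration. The "generated" snippet is hard-coded where a model's output would be, `run_code_action` and the `search` tool are hypothetical, and emptying `__builtins__` is only a gesture at isolation, not a real sandbox and not how Smolagents executes code.

```python
from typing import Any, Callable, Dict, List

def search(query: str) -> List[int]:
    # Stub tool with canned results.
    return {"llama": [3, 5], "alpaca": [4]}.get(query, [])

TOOLS: Dict[str, Callable[..., Any]] = {"search": search, "sum": sum, "len": len}

def run_code_action(code: str, tools: Dict[str, Callable[..., Any]]) -> Any:
    # Only the listed tools are visible to the snippet; builtins are removed.
    # This is NOT a security boundary; real systems need a proper sandbox.
    namespace: Dict[str, Any] = {"__builtins__": {}, **tools}
    exec(code, namespace)
    return namespace.get("result")

# One generated action composes two tool calls and some arithmetic.
generated = """
hits = search("llama") + search("alpaca")
result = sum(hits) / len(hits)
"""
answer = run_code_action(generated, TOOLS)
print(answer)
```

With structured tool calls the same work would take several model round trips (two searches, then a calculation); as code it is a single action, which is the efficiency argument behind this school.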
The Convergence Layer
All six primary frameworks now ship MCP (Model Context Protocol), streaming, persistence, and observability as table-stakes features.[1] ReAct (Reason + Act) is the default execution loop across all of them. The meaningful differences live above this layer — in the coordination model, the control surface the developer touches, and the ecosystem fit.
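For readers who have not seen it spelled out, the ReAct loop alternates model output (a thought plus an action) with tool output (an observation) until the model emits a final answer. The sketch below is a framework-free illustration: the `Action: tool[input]` text format, the `react` function, and the scripted stub model are assumptions for the example, not any listed framework's implementation.

```python
import re
from typing import Callable, Dict, Tuple

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def multiply(arg: str) -> str:
    # Toy tool: expects "a,b" and returns the product as text.
    a, b = (int(part) for part in arg.split(","))
    return str(a * b)

def scripted_llm(transcript: str) -> str:
    # Stub standing in for a model: act first, then answer from the observation.
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: multiply[6,7]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The tool returned {observation}.\nFinal Answer: {observation}"

def react(question: str, llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Tuple[str, str]:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)              # Reason
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip(), transcript
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError("model produced neither an action nor a final answer")
        tool, arg = match.groups()
        transcript += f"\nObservation: {tools[tool](arg)}"  # Act, then observe
    raise RuntimeError("step budget exhausted")

answer, transcript = react("What is 6 times 7?", scripted_llm, {"multiply": multiply})
print(transcript)
```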
RBAC, audit logging, and enterprise SSO have moved from premium features to defaults across the major platforms in 2026. The notable outlier: Strands delegates governance to AWS infrastructure (IAM, VPC, CloudTrail) rather than embedding it in the framework — a philosophically consistent choice for its model-driven, minimal-ceremony school.