Atlas survey

The Contenders & Their Philosophies: 2026 Agent Harness Landscape

Seven design philosophies, ranging from control-explicit graphs to model-driven minimalism, define the 2026 agent harness landscape, and harness choice causes measurable performance variance on identical models.

23 sources ~5 min read #183 ai-agents · agent-frameworks · langgraph · crewai · strands-agents · openai-agents · google-adk · semantic-kernel · 2026
Decision: Seven schools of thought, not one right answer — each optimises for a different bottleneck. Control-explicit (LangGraph, Google ADK) wins on auditability; role-declarative (CrewAI) wins on time-to-prototype; model-driven (Strands) bets the LLM is smart enough to be the orchestrator. AutoGen is in maintenance mode — use AG2 only if already locked into the v0.2 API. And the scaffold is not neutral: identical models score measurably differently across different harnesses.[23]

All Contenders at a Glance

Framework | Stars | Philosophy | Core Primitive | Sweet Spot
--- | --- | --- | --- | ---
LangGraph [2] | 26.9k | Control-explicit | Graph / Node / Edge | Production stateful, regulated industries
Google ADK [12] | 20k | Control-explicit | Agent Tree / Graph Engine | Gemini ecosystem, A2A interop
CrewAI [4] | 52.7k | Role-declarative | Crew / Agent / Task | Fast prototyping, non-engineer stakeholders
AutoGen [5] (archived) | 58.7k | Conversation-emergent | GroupChat / ConversableAgent | Legacy systems only
AG2 [6] (community fork) | 4.6k | Conversation-emergent | GroupChat / Event Bus | Code-execution-heavy, research workflows
OpenAI Agents SDK [7] | 33.7k | Handoff-centric | Agent / Handoff / Guardrail | GPT-first, voice agents, sequential delegation
Strands Agents [8] | ~6.7k | Model-driven minimalism | Model + Tools + Prompt | AWS-native, minimal ceremony
Semantic Kernel [14] | 28k | Platform-integrated | Plugin / Planner / Kernel | .NET / Azure enterprise stacks
Smolagents [16] | 27.7k | Code-as-action | CodeAgent / ToolCallingAgent | Local LLMs, coding-heavy tasks
Pydantic AI [18] | 17.5k | Type-safe pipeline | Agent / Model / RunContext | Output-validation-critical apps
Mastra [20] | 24.7k | TypeScript-native | Agent / Workflow / Tool | Web-stack TypeScript teams
LlamaIndex [22] | 49.9k | Data-first | Index / QueryEngine / Agent | RAG-grounded agents, private data corpora

The Seven Schools

School 1 of 7

Control-Explicit — You Write the Graph, the Framework Runs It

Both frameworks in this school share a conviction: orchestration logic is too important to leave to a language model. You define nodes (computation steps), edges (state transitions), and conditional branches explicitly. The runtime executes exactly what you specified. Payoff: determinism, auditability, rewind. Cost: upfront graph design.
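
To make the node / edge / conditional-branch vocabulary concrete, here is a minimal sketch in plain Python. It is not LangGraph's or ADK's API: the names `NODES`, `EDGES`, `run`, and the two example nodes are hypothetical, chosen only to show that the runtime follows the wiring the developer wrote and nothing else.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]

def draft(state: State) -> State:
    # Node: produce a new draft and count the attempt.
    attempts = state["attempts"] + 1
    return {**state, "draft": f"v{attempts}", "attempts": attempts}

def review(state: State) -> State:
    # Node: approve from the second attempt onward (stand-in for a real check).
    return {**state, "approved": state["attempts"] >= 2}

def route_after_review(state: State) -> str:
    # Conditional edge: loop back to "draft" until the review passes.
    return "END" if state["approved"] else "draft"

# The graph is plain data: which nodes exist and where each one leads.
NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}
EDGES: Dict[str, Callable[[State], str]] = {
    "draft": lambda state: "review",   # unconditional edge
    "review": route_after_review,      # conditional edge
}

def run(entry: str, state: State, max_steps: int = 10) -> Tuple[State, List[str]]:
    # Execute exactly the declared graph and record every node visited.
    trace: List[str] = []
    current = entry
    for _ in range(max_steps):
        if current == "END":
            break
        trace.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return state, trace

final_state, trace = run("draft", {"attempts": 0})
print(trace)
```

The recorded `trace` is where the auditability payoff comes from in this sketch: every step taken is one the developer declared, in an order that can be replayed.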

LangGraph

⭐ 26.9k [2]

Directed graph with typed state and reducer-based concurrent-update resolution. interrupt() makes human-in-the-loop a first-class primitive. LangGraph Studio provides visual step-through and rewind/replay.[3]
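
Reducer-based resolution can be illustrated without any framework. The sketch below assumes two parallel branches that each return a partial state update; a per-key reducer decides how those updates combine. The `REDUCERS` table and `merge` function are hypothetical names for illustration, not LangGraph's actual interface.

```python
import operator
from typing import Any, Callable, Dict, List

# Each state key may declare how concurrent writes are combined.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,      # lists are concatenated
    "total_tokens": operator.add,  # integers are summed
}

def merge(state: Dict[str, Any], updates: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Apply each branch's update; keys without a reducer fall back to last-write-wins.
    merged = dict(state)
    for update in updates:
        for key, value in update.items():
            if key in REDUCERS and key in merged:
                merged[key] = REDUCERS[key](merged[key], value)
            else:
                merged[key] = value
    return merged

state = {"messages": ["user: hi"], "total_tokens": 5, "status": "running"}
branch_a = {"messages": ["search: 3 hits"], "total_tokens": 40}
branch_b = {"messages": ["calc: 42"], "total_tokens": 12, "status": "done"}

merged = merge(state, [branch_a, branch_b])
print(merged)
```

Without a reducer on `messages`, the second branch would silently overwrite the first branch's output; declaring the reducer makes the combination rule explicit.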

400+ enterprise deployments: Klarna, Uber, LinkedIn, BlackRock, JPMorgan.[3] 34.5M monthly PyPI downloads.[3] Stateful caching cuts LLM calls 40–50% on repeat workflows.[3]

Benchmark: 94% multi-step accuracy, $0.08/task.[1]

✓ Maximum control
✓ Best observability (LangSmith)
✗ Steepest learning curve (1–2 weeks)
✗ Graph mental model unfamiliar to imperative teams

Google ADK

⭐ 20k [12]

Engineering-first, code-first, "low floor / high ceiling." Hierarchical agent trees where specialists delegate to sub-agents. Google reframed it as an agent execution framework (not a toolkit) in Feb 2026,[13] adding a graph-based engine with a dial from dynamic model-led reasoning to strict deterministic workflows.

ADK v2.0: A2A (agent-to-agent) protocol native, collaborative workflow APIs, Kotlin for Android, OpenTelemetry via MLflow.[13]

✓ A2A protocol native
✓ Deepest Gemini ecosystem integration
✗ Weaker multi-model story than LangGraph
✗ Smaller community

School 2 of 7

Role-Declarative — Define the Crew, Let the Framework Coordinate

You declare agents with roles, goals, and backstories; you assign tasks with expected outputs. The framework handles coordination. The mental model mirrors how humans describe team structure — fast to explain to non-engineers, fast to iterate on.
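
The declarative shape described above can be sketched with dataclasses. This is an illustration under stated assumptions, not CrewAI's API: the `Agent`, `Task`, and `Crew` classes here are hypothetical, each agent's `respond` callable stands in for an LLM call, and coordination is assumed to be simple sequential hand-over of the previous output.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str
    respond: Callable[[str], str]  # stand-in for an LLM call

@dataclass
class Task:
    description: str
    expected_output: str
    agent: Agent

@dataclass
class Crew:
    tasks: List[Task]

    def run(self) -> List[str]:
        # The "framework" part: run tasks in order, passing each result forward.
        outputs: List[str] = []
        context = ""
        for task in self.tasks:
            prompt = f"{task.description}\nContext: {context}"
            result = task.agent.respond(prompt)
            outputs.append(f"{task.agent.role}: {result}")
            context = result
        return outputs

researcher = Agent("Researcher", "Find sources", "Former librarian",
                   respond=lambda prompt: "3 sources found")
writer = Agent("Writer", "Summarise findings", "Technical journalist",
               respond=lambda prompt: "summary of: " + prompt.split("Context: ")[-1])

crew = Crew(tasks=[
    Task("Find sources on agent harnesses", "A list of sources", researcher),
    Task("Write a summary", "One paragraph", writer),
])
outputs = crew.run()
print(outputs)
```

Note that the user code contains no control flow at all; the ordering lives inside `Crew.run`, which is the trade-off this school makes.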

CrewAI

⭐ 52.7k [4]

Version 1.14+ with A2A protocol support. Different agents in the same crew can use different LLMs. Enterprise platform (AMP) adds a visual workflow editor, RBAC, real-time monitoring, and team collaboration.[1]

10M+ agents/month;[3] ~60% of Fortune 500 exploring it.[3] Setup: 2–4 hours from install to running crew.[3]

Benchmark: 87% multi-step accuracy, $0.12/task; up to 3× token overhead vs LangGraph on simple tasks.[1][3]

✓ Fastest path to a working prototype
✓ Stakeholder-friendly abstractions
✗ Ceiling on complex conditional logic
✗ No built-in checkpoint / resume

School 3 of 7

Conversation-Emergent — Agents Debate to Consensus

Rather than pre-defining coordination topology, agents pass messages in a loop and coordination emerges from the dialogue. This school pioneered code-execution agents: one agent writes code, a critic reviews it, a tester runs it. Microsoft placed AutoGen in maintenance mode in late 2025 after a v0.4 rewrite diverged from the community; the community forked as AG2 to preserve the v0.2 API.
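
A minimal sketch of the message-loop idea, assuming a round-robin speaker order and a literal "APPROVE" reply as the stop signal. Both agents are scripted stand-ins for LLM calls, and `group_chat` is a hypothetical helper, not the AutoGen or AG2 API.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (speaker, text)

def coder(history: List[Message]) -> str:
    # Scripted stand-in: ship a buggy version first, fix it once the critic objects.
    flagged = any(speaker == "critic" and "bug" in text for speaker, text in history)
    return "def add(a, b): return a + b" if flagged else "def add(a, b): return a - b"

def critic(history: List[Message]) -> str:
    # Scripted stand-in: approve only code that actually adds.
    latest_code = history[-1][1]
    return "APPROVE" if "a + b" in latest_code else "bug: add() subtracts"

def group_chat(agents: Dict[str, Callable[[List[Message]], str]],
               task: str, max_rounds: int = 6) -> List[Message]:
    # No topology is declared: agents take turns and see the whole transcript.
    history: List[Message] = [("user", task)]
    names = list(agents)
    for turn in range(max_rounds):
        speaker = names[turn % len(names)]
        reply = agents[speaker](history)
        history.append((speaker, reply))
        if reply == "APPROVE":
            break
    return history

transcript = group_chat({"coder": coder, "critic": critic}, "Write add(a, b)")
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Every extra round replays the full transcript to each agent, which is one plausible reason this style tends to spend more tokens than a fixed pipeline.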

AutoGen (archived)

⭐ 58.7k [5]

Still the most-starred framework in this landscape but placed in maintenance mode late 2025.[19] Microsoft's v0.4+ rewrite is a separate project with different primitives. Stars reflect legacy adoption, not 2026 momentum.

✗ Maintenance-only
✗ Do not start new projects on archived branch

AG2 (community fork)

⭐ 4.6k [6]

Community continuation of AutoGen v0.2 with event-driven async execution and MemoryStream pub/sub. Standardised Discover → Plan → Execute → Verify lifecycle. Strongest pattern for write+review+debug loops.[1]

Benchmark: 91% multi-step accuracy, $0.45/task — highest quality and highest cost.[1]
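
One way to compare the three benchmark figures quoted in this survey (LangGraph, CrewAI, AG2) is cost per successful task. The sketch below assumes a failed task produces nothing usable and is simply retried at the same price and success rate, so the expected spend per success is cost divided by accuracy. That retry model is an assumption made here for illustration, not part of the cited benchmark.

```python
# (multi-step accuracy, cost per task in USD) as quoted in this survey
benchmarks = {
    "LangGraph": (0.94, 0.08),
    "CrewAI": (0.87, 0.12),
    "AG2": (0.91, 0.45),
}

def cost_per_success(accuracy: float, cost: float) -> float:
    # Expected spend until one success, assuming independent retries.
    return cost / accuracy

results = {name: cost_per_success(acc, cost) for name, (acc, cost) in benchmarks.items()}
for name, value in results.items():
    print(f"{name}: ${value:.3f} per successful task")
```

Under this assumption the ranking by cost is unchanged, but the gap between frameworks widens slightly once accuracy is factored in.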

✓ Best for code write+review+debug loops
✓ AutoGen v0.2 API continuity
✗ Highest token cost of any framework
✗ Small community post-fork (4.6k stars)

School 4 of 7

Handoff-Centric — Explicit Sequential Delegation

Five primitives: Agents, Handoffs, Guardrails, Sessions, Tracing. Control transfers explicitly from agent to agent via handoff calls. Less expressive topology than a graph, but minimal learning curve — onboarding measured in hours not weeks.[3]
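
The handoff mechanic can be sketched in a few lines of plain Python. This is not the OpenAI Agents SDK API: `Handoff`, `run`, `guardrail`, and the two agents are hypothetical stand-ins, with agents modelled as functions that return either a final answer or a request to transfer control.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

@dataclass
class Handoff:
    target: str  # name of the agent that should take over

AgentFn = Callable[[str], Union[str, Handoff]]

def triage(message: str) -> Union[str, Handoff]:
    # Route refund requests to billing; answer everything else directly.
    if "refund" in message.lower():
        return Handoff("billing")
    return "Triage: please describe the issue in more detail."

def billing(message: str) -> Union[str, Handoff]:
    return "Billing: refund request logged."

def guardrail(message: str) -> bool:
    # Input guardrail: reject empty or whitespace-only input.
    return bool(message.strip())

def run(agents: Dict[str, AgentFn], start: str, message: str,
        max_handoffs: int = 3) -> Tuple[str, List[str]]:
    if not guardrail(message):
        return "Rejected by guardrail.", []
    trace = [start]  # minimal tracing: which agents held control, in order
    current = start
    for _ in range(max_handoffs + 1):
        result = agents[current](message)
        if isinstance(result, Handoff):
            current = result.target
            trace.append(current)
            continue
        return result, trace
    raise RuntimeError("too many handoffs")

AGENTS: Dict[str, AgentFn] = {"triage": triage, "billing": billing}
answer, trace = run(AGENTS, "triage", "I want a refund")
print(answer, trace)
```

Only one agent holds control at a time, which is what keeps the model simple and also what limits it to sequential topologies.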

OpenAI Agents SDK

⭐ 33.7k [7]

Evolved from the experimental Swarm project into a production-grade toolkit in April 2026. Native sandboxing, GPT-4 Realtime integration for voice agents. TypeScript SDK reached Python parity in 2026. Supports 100+ LLMs via Chat Completions API despite OpenAI-centric design.[3]

✓ Simplest API of all six major frameworks
✓ First-class voice agent support (Realtime API)
✗ Sequential handoffs limit complex topologies
✗ OpenAI-centric design philosophy

School 5 of 7

Model-Driven Minimalism — The LLM Is the Orchestrator

Where the control-explicit school says "you define the graph," this school says "the model is smart enough to decide." Three primitives: Model, Tools, Prompt. The LLM autonomously decides which tools to call and in what order. Explicit steering hooks, rather than prompts, provide guardrails where needed; hooks outperform prompt-only constraints empirically.[9]
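
The difference between a steering hook and a prompt-only constraint is easier to see in code. In the sketch below the "model" is a scripted function standing in for an LLM, and `steering_hook` and `run_agent` are hypothetical names, not the Strands API. The point is that the hook is enforced by the loop itself, so a tool call it rejects never executes, whatever the model decided.

```python
from typing import Callable, Dict, List, Optional, Tuple

ToolCall = Tuple[str, str]  # (tool name, argument)

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lambda arg: f"order {arg}: shipped",
    "delete_order": lambda arg: f"order {arg}: deleted",
}

def scripted_model(prompt: str, observations: List[str]) -> Optional[ToolCall]:
    # Stand-in for an LLM choosing the next tool; returns None when it is done.
    plan = [("delete_order", "42"), ("lookup_order", "42")]
    return plan[len(observations)] if len(observations) < len(plan) else None

def steering_hook(call: ToolCall) -> Optional[str]:
    # Return a refusal message to block the call, or None to allow it.
    name, _ = call
    if name.startswith("delete_"):
        return f"blocked: {name} requires human approval"
    return None

def run_agent(model: Callable[[str, List[str]], Optional[ToolCall]], prompt: str,
              hook: Callable[[ToolCall], Optional[str]], max_steps: int = 5) -> List[str]:
    observations: List[str] = []
    for _ in range(max_steps):
        call = model(prompt, observations)  # the model decides what happens next
        if call is None:
            break
        veto = hook(call)
        if veto is not None:
            observations.append(veto)  # the refusal is fed back to the model
        else:
            name, arg = call
            observations.append(TOOLS[name](arg))
    return observations

observations = run_agent(scripted_model, "Where is order 42?", steering_hook)
print(observations)
```

A prompt that says "never delete orders" relies on the model complying; the hook does not.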

Strands Agents

⭐ ~6.7k [8]

Released by AWS under Apache 2.0;[10] backed by Anthropic, Meta, Accenture, PwC.[10] Production origin: powers AWS Transform for .NET modernization at scale.[11]

Semantic search scales tool inventories to thousands of APIs. Steering handlers vs prompt-only: 100% vs 82.5% task accuracy.[9] Four coordination patterns: hierarchy, swarms, graphs, meta-agents.[9]
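
The idea behind semantic tool search is to rank tool descriptions against the request and expose only the top few to the model, rather than listing every tool in the prompt. The sketch below uses bag-of-words cosine similarity as a deliberately crude stand-in for real embeddings; the tool names, descriptions, and `top_tools` helper are all hypothetical.

```python
import math
from collections import Counter
from typing import List

TOOL_DESCRIPTIONS = {
    "create_invoice": "create a new invoice for a customer order",
    "get_weather": "get the current weather forecast for a city",
    "refund_payment": "refund a customer payment for an order",
}

def vectorize(text: str) -> Counter:
    # Word counts stand in for an embedding vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_tools(query: str, k: int = 2) -> List[str]:
    # Rank every tool by similarity to the query and keep the best k.
    q = vectorize(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(q, vectorize(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]

selected = top_tools("refund the payment for this order", k=1)
print(selected)
```

With thousands of tools, the same ranking step would typically run against a vector index, but the contract stays the same: query in, short list of candidate tools out.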

✓ Minimal boilerplate (3 primitives)
✓ AWS-native (IAM, VPC, Bedrock, Lambda, AgentCore)
✗ Less control than graph-based frameworks
✗ Youngest framework — smallest community

School 6 of 7

Platform-Integrated — Framework as Microsoft Ecosystem Glue

Semantic Kernel is less a general-purpose agent framework and more a deep integration layer for the Microsoft stack. It represents the only framework here with genuine first-class C# support — a rarity in a Python-dominated ecosystem.[15]

Semantic Kernel

⭐ 28k [14]

Plugin-based planner with native Azure OpenAI, Azure AI Foundry, and Microsoft 365 integration. Supports Python, C#, and Java; C# is first-class.[15] Most mature governance features (RBAC, audit logging) alongside CrewAI Enterprise.[1]

✓ Only framework with true C# / .NET depth
✓ Tightest Azure / M365 integration
✗ Over-engineered for non-Microsoft stacks
✗ Smaller Python community vs LangGraph/CrewAI

School 7 of 7

Niche Schools — One Constraint, Done Best

These frameworks are not competing for the general-purpose slot. Each solves one constraint better than anything else in the landscape.

Framework | Stars | Niche Philosophy | When to Pick It
--- | --- | --- | ---
Smolagents [16] | 27.7k | Code-as-action: the action primitive is generated Python; agents write and execute code rather than calling structured tool functions. 40 lines for a ReAct agent vs 120 in LangGraph.[17] | Local LLMs, code-generation-heavy tasks, minimal framework overhead
Pydantic AI [18] | 17.5k | Type-safe pipeline: FastAPI-style ergonomics, with strict types, dependency injection, and schema-validated structured outputs throughout the agent lifecycle.[19] | Apps where output format failures are unacceptable; teams already fluent in FastAPI patterns
Mastra [20] | 24.7k | TypeScript-native: Zod schemas flow end-to-end (tool input → structured output → workflow state → API response). Clean separation: agents decide, workflows orchestrate.[21] | Web-stack TypeScript teams; Next.js + AI agent integration without touching Python
LlamaIndex [22] | 49.9k | Data-first: built from the ground up to connect LLMs to external data. Agent capabilities are layered on top of the strongest retrieval/indexing foundation in the ecosystem. | Agents whose primary job is reasoning over private/indexed data corpora
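
The type-safe school's core move, validating model output against a schema and re-prompting on failure, can be sketched with the standard library alone. This does not use Pydantic AI or Zod: `Invoice`, `parse_invoice`, `run_validated`, and the scripted model are hypothetical, and the retry-with-feedback policy is an assumption made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class Invoice:
    customer: str
    total_cents: int

def parse_invoice(raw: str) -> Invoice:
    # Reject anything that does not match the schema exactly.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    customer = data.get("customer")
    total = data.get("total_cents")
    if not isinstance(customer, str) or not customer:
        raise ValueError("customer must be a non-empty string")
    if isinstance(total, bool) or not isinstance(total, int) or total < 0:
        raise ValueError("total_cents must be a non-negative integer")
    return Invoice(customer, total)

def run_validated(model: Callable[[str], str], prompt: str, max_retries: int = 2) -> Invoice:
    feedback = ""
    for _ in range(max_retries + 1):
        raw = model(prompt + feedback)
        try:
            return parse_invoice(raw)
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            feedback = f"\nPrevious output was invalid: {err}. Return valid JSON."
    raise RuntimeError("model never produced a valid Invoice")

def scripted_model(prompt: str) -> str:
    # Stand-in for an LLM: wrong type first, corrected after validation feedback.
    if "invalid" in prompt:
        return '{"customer": "Acme", "total_cents": 1250}'
    return '{"customer": "Acme", "total_cents": "12.50"}'

invoice = run_validated(scripted_model, "Extract the invoice as JSON.")
print(invoice)
```

Downstream code receives either a well-typed `Invoice` or an exception, never a half-valid dictionary.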

The Convergence Layer

All six primary frameworks now ship MCP (Model Context Protocol), streaming, persistence, and observability as table-stakes features.[1] ReAct (Reason + Act) is the default execution loop across all of them. The meaningful differences live above this layer — in the coordination model, the control surface the developer touches, and the ecosystem fit.

RBAC, audit logging, and enterprise SSO have moved from premium features to defaults across the major platforms in 2026. The notable outlier: Strands delegates governance to AWS infrastructure (IAM, VPC, CloudTrail) rather than embedding it in the framework — a philosophically consistent choice for its model-driven, minimal-ceremony school.

⚠ Harness selection has performance consequences, not just ergonomic ones. The same Claude Opus 4 model scores 64.9% inside one agent scaffolding and 57.6% inside another on the same benchmark task.[23] Framework selection is not a neutral infrastructure choice.
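
For scale, the gap between the two scaffold scores quoted above works out as follows. The calculation uses only the two figures from the text; the variable names are illustrative.

```python
score_a = 64.9  # percent, first scaffold
score_b = 57.6  # percent, second scaffold

absolute_gap = score_a - score_b            # percentage points
relative_gap = absolute_gap / score_b * 100  # percent improvement over the lower score

print(f"{absolute_gap:.1f} points absolute, {relative_gap:.1f}% relative")
```

That is a 7.3-point absolute difference, or roughly a 12.7% relative improvement over the lower score, from changing nothing but the harness.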
