Survey · 23 cites · expedition · April 2026

Deep-research agents in 2026 — what to steal for Scout.

"Deep research" is now table stakes across ChatGPT, Claude, Gemini, Perplexity and Grok[1]. Differentiation is speed, depth and citation accuracy — not whether the feature exists. Scout's interesting comparison class is OSS agents and Claude-Code-native skills.

FILED 2026-04-21 · FORMAT MD · DEPTH EXPEDITION · 23 CITATIONS · 15 TOOLS

TL;DR

Closed commercial deep-research products (OpenAI, Perplexity, Gemini, Claude, Grok) are poor fits for a Scout-shaped tool — no on-disk artifact, no steering hints. The interesting comparison class is OSS agents and Claude-Code-native skills. Top targets to port from: 199-biotech's claude-deep-research-skill (disk-persisted citations, multi-persona critique), GPT-Researcher (planner / executor / publisher split), open_deep_research (MCP-native backend), STORM (perspective-guided Q&A).

199-biotech / claude-deep-research-skill

github.com/199-biotechnologies
⭐ 509
Claude Skill MIT
8-phase pipeline: scope → plan → retrieve → triangulate → outline → synthesize → critique/refine → package. Disk-persisted citations survive context compaction.[4]
Depth mechanism
8 phases · critique loop-back · auto-continue past 18k words
Citation rigor
Disk-persisted · DOI/URL hallucination check
VERY HIGH Same substrate as Scout — directly portable.
Steal: Disk-persisted citations + multi-persona critique + validate → fix → retry (max 3 cycles).
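
A minimal sketch of that bounded validate → fix → retry loop, assuming citations live in a small ledger; `Citation`, `validate`, and the `fix` callback are illustrative names, not the skill's actual API:

```python
import re
from dataclasses import dataclass

@dataclass
class Citation:
    claim: str
    url: str
    doi: str | None = None

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def validate(c: Citation) -> list[str]:
    """Cheap structural checks; a real pass would also fetch the URL."""
    problems = []
    if not c.url.startswith(("http://", "https://")):
        problems.append("malformed url")
    if c.doi and not DOI_RE.match(c.doi):
        problems.append("malformed doi")
    return problems

def validate_fix_retry(citations: list[Citation], fix, max_cycles: int = 3) -> list[Citation]:
    for _ in range(max_cycles):
        broken = [c for c in citations if validate(c)]
        if not broken:
            return citations
        for c in broken:
            fix(c, validate(c))  # e.g. re-search the claim, rewrite the url/doi
    # past max_cycles, drop anything still failing rather than ship a bad cite
    return [c for c in citations if not validate(c)]
```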

GPT-Researcher

github.com/assafelovic
⭐ 26.6k
OSS Apache-2.0
The closest conceptual sibling to Scout. Three-role split: planner / executors / publisher. 2026 additions: tree-shaped Deep Research mode, ~5 min runs at ~$0.40 on o3-mini.[2]
Depth mechanism
Planner+executor+publisher · tree DR mode · ~5 min
Citation rigor
Inline · 20+ sources per run
HIGH Closest analogue — production-ready since May 2023.
Steal: Three-role separation. Final writer sees only the evidence bundle — not the noisy search trajectory.

open_deep_research

github.com/langchain-ai
⭐ 11.2k
OSS MIT
MCP-native, model-agnostic via init_chat_model(). LangGraph-based. #6 on Deep Research Bench with 0.4943 RACE on GPT-5.[3]
Depth mechanism
LangGraph · any MCP server pluggable
Bench score
RACE 0.4943 · #6 on Deep Research Bench
HIGH Most "configurable" of the OSS options — closest to a reference impl.
Steal: MCP-server-as-tool. Swap Tavily / SearXNG / Exa as config, not code.

Weizhena / Deep-Research-skills

github.com/Weizhena
⭐ 483
Claude Skill MIT
Two phases: outline generation (user can expand it), then deep investigation per item in parallel. HITL checkpoints — approve outline before spending tokens.[18]
Depth mechanism
Outline → parallel deep investigations
HITL
Approve outline before token spend
HIGH HITL pattern translates to Scout's "self-review the outline first."
Steal: Approve-outline-before-investigation. Scout has expedition plans — tighten the gate.
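
A sketch of the approve-before-spend gate, assuming Scout exposes separate plan and investigate steps; all four callables are hypothetical stand-ins:

```python
def run(topic, generate_outline, revise_outline, investigate, review):
    outline = generate_outline(topic)   # cheap: one planning call
    feedback = review(outline)          # human approval, or Scout's self-review
    while feedback is not None:         # loop until the reviewer signs off
        outline = revise_outline(outline, feedback)
        feedback = review(outline)
    # the token-heavy parallel phase starts only once the outline is approved
    return [investigate(item) for item in outline]
```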

Claude Managed Agents

platform.claude.com · 2026-04-01
BETA
API commercial
Hosted agent harness behind the managed-agents-2026-04-01 header — sandbox, Bash, file ops, web search/fetch, MCP servers. The Environment / Session / Events model maps cleanly onto "one research run = one Session."[12]
Depth mechanism
Harness-defined · Environment/Session/Events
Citation rigor
Depends on system prompt
HIGH Hosting target if Scout outgrows GitHub Actions.
Steal: Migration path — Environment / Session / Events maps 1:1 onto research-per-run.

STORM / Co-STORM

github.com/stanford-oval
⭐ 28.1k
OSS MIT
Different shape: simulates conversations between writer personas with different perspectives and a topic-expert LLM grounded in web sources, then builds the outline from the transcript. +10% absolute coverage and +25% organization vs outline-then-RAG.[14]
Depth mechanism
Persona-guided Q&A · Co-STORM HITL turns
Coverage gain
+10% absolute · +25% organization
MEDIUM Long-form only · authors warn: not publication-ready.
Steal: Persona-guided Q&A — basis for a future scout-researcher-perspectives specialist.

Perplexity Sonar Deep Research

perplexity.ai · API
94.3% cite
SaaS · API commercial
The only one of the "big five" with a production API. Fastest commercial option (2–3 min runs). ~$0.41 per typical query.[8]
Pricing
$2/$8 per M tok + $5/1k searches
Citation rigor
94.3% Sonar Pro vs ~87% GPT-5.2 DR[6]
MEDIUM API-driven — but opaque. No on-disk artifact.
Steal: Optional drop-in; delegate the search+synth step from Scout if cost permits.

local-deep-researcher

github.com/langchain-ai
offline
OSS MIT
Fully local: any Ollama- or LMStudio-hosted model, SearXNG search, nothing leaves the machine. Loop: query → search → summarize → reflect for gaps → next query, for N cycles.[17]
Depth mechanism
Reflect-and-requery loop · user-set cycles
Citation rigor
Inline markdown sources
MEDIUM Reference for Scout's offline mode if that's ever on the table.
Steal: Reflect-and-requery loop. Cleaner than today's ad-hoc "did I miss anything" check.

smolagents · Open Deep Research

github.com/huggingface
⭐ 26.8k
OSS Apache-2.0
Agents emit Python code instead of JSON tool calls — ~30% fewer steps. 55.15% on GAIA vs OpenAI DR's 67.36%. Known context-window blow-ups; demo unstable.[16]
Depth mechanism
Code-agent · multimodal state handling
GAIA score
55.15% · −12pt vs OpenAI DR
LOW Proof-of-concept. Production gap is real — browser tooling and vision.
Steal: Code-emission idea, not the agent itself. Less JSON ceremony.
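
The pattern in miniature (not smolagents' actual implementation): expose tools as plain Python functions and execute the model's emitted code, so one step can chain calls that JSON tool-calling would split across turns. `web_search` here is a stub:

```python
def web_search(q: str) -> list[str]:
    return [f"result for {q!r}"]    # stub; a real tool hits a search backend

def run_step(emitted_code: str):
    namespace = {"web_search": web_search, "results": None}
    exec(emitted_code, namespace)   # sandbox this in anything real
    return namespace["results"]

# one emitted step, two chained searches, no JSON round-trip between them
step = """
results = [r for q in ("deep research agents", "citation accuracy")
           for r in web_search(q)][:10]
"""
print(run_step(step))
```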

OpenAI Deep Research

openai.com · ChatGPT Pro
26.6% HLE
SaaS commercial
o3-tuned agent. Longest runs (15–25 min) and the most essay-like reports. 26.6% on Humanity's Last Exam at launch — frontier in its cohort.[7]
Run length
15–25 min[6]
Citation rigor
87% cited accuracy
LOW Closed · no on-disk artifact · no steering hints.
Steal: Nothing portable. Reference frontier numbers only.

Claude Research / Advanced Research

anthropic.com · Claude Pro/Max
45 min
SaaS commercial
"Advanced Research" runs up to 45 min across hundreds of sources autonomously. Anthropic renamed the Claude Code SDK to Claude Agent SDK — internal use was dominated by research, video, note-taking, not just coding.[11][21]
Run length
Up to 45 min · hundreds of sources
Citation rigor
Closed metric
LOW Not programmable as artifact. But: Claude-Code-as-research-platform is the sanctioned pattern.
Steal: Validation that Scout's substrate choice is right. No code to lift.

Gemini Deep Research

google.com · AI Pro
$19.99/mo
SaaS commercial
Differentiates on Workspace: pulls from Gmail, Drive, and the public web simultaneously and drops multi-page reports back into Docs.[10]
Pricing
Google AI Pro $19.99/mo[9]
Differentiator
Gmail/Drive/Docs round-trip
LOW Workspace-locked. Useless for a Scout-style standalone artifact.
Steal: Nothing portable.

Grok DeepSearch

x.ai · X Premium
X-native
SaaS commercial
Real-time X / web synthesis. Direct X-timeline access is the one thing competitors can't match — only interesting when the topic is breaking news or social sentiment.[13]
Differentiator
Real-time X/Twitter timeline access
API
None relevant to Scout
LOW X-centric · breaking-news only.
Steal: Nothing portable.

Elicit

elicit.com · academic
99.4% extr.
SaaS commercial
Best-in-class for academic literature. 138M papers + 545K clinical trials. 80% time saved on abstract screening, with quote-level rationale per decision. Systematic Review reports cap at 80 papers.[19]
Coverage
PubMed · ClinicalTrials.gov · 138M papers
Cap
SR reports max 80 papers
LOW Academic-literature-only — not Scout's brief.
Steal: Per-claim rationale + source quote. Graded quality scoring instead of binary in/out.

FutureHouse · Crow / Falcon / Owl / Phoenix

futurehouse.org · science
science
SaaS · API commercial
Scientific-discovery platform built on Claude.[23] Task-specialized: Crow extracts genes/markers, Falcon does background research, Owl checks whether a hypothesis was already investigated, Phoenix designs chemistry.[20]
Pattern
One agent per epistemic move
Caveat
Phoenix not as deeply benchmarked
LOW Science-only. Pattern is transferable; product isn't.
Steal: One agent per epistemic move with named roles — Scout already does this implicitly via Explore sub-agents.

// BENCHMARKS — what "good" means in 2026

DR Bench · top score
0.5613 RACE
Cellcog Max (proprietary, Mar 2026)[3]
DR Bench · top OSS
0.5492 RACE
TrajectoryKit on GPT-OSS / GPT-5.4 (MIT)[3]
GAIA · OpenAI DR
67.36%
Frontier · vs smolagents 55.15%[16]
HLE · OpenAI DR launch
26.6%
Highest in cohort at launch[7]

// PRODUCTION-READY vs EXPERIMENTAL

Production-ready today

  • OpenAI Deep Research, Perplexity Sonar DR, Gemini DR, Claude Research, Grok DeepSearch[1]
  • GPT-Researcher[2], open_deep_research[3], local-deep-researcher[17]
  • Elicit Systematic Review[19], FutureHouse Crow / Falcon / Owl[20]
  • 199-biotech skill[4], Weizhena skill[18]

Experimental / proof-of-concept

  • smolagents Open Deep Research — context-window blow-ups, unstable demo[16]
  • STORM — "cannot produce publication-ready articles" per authors[5]
  • FutureHouse Phoenix — "not as deeply benchmarked, may make more mistakes"[20]
  • Claude Managed Agents — public beta, April 2026[12]

// IDEAS TO STEAL — RANKED

Cross-cutting takeaways across all 15 tools surveyed
FROM · 199-biotech

Disk-persisted citations

Survive context compaction. Single biggest fragility in a long Scout run — fix it once, lift the ceiling on every research depth.[4]
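
A sketch of the ledger itself, assuming one append-only JSONL file per run; the path and record shape are illustrative, not the skill's actual format:

```python
import json
from pathlib import Path

LEDGER = Path("run/citations.jsonl")

def record(claim: str, url: str, quote: str) -> None:
    LEDGER.parent.mkdir(parents=True, exist_ok=True)
    with LEDGER.open("a") as f:   # append-only, so it survives crashes too
        f.write(json.dumps({"claim": claim, "url": url, "quote": quote}) + "\n")

def reload() -> list[dict]:
    # after compaction, re-read the ledger instead of trusting whatever
    # citations survived in the truncated transcript
    if not LEDGER.exists():
        return []
    return [json.loads(line) for line in LEDGER.read_text().splitlines() if line]
```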

FROM · local-deep-researcher

Reflect-and-requery

After initial synthesis, explicitly list knowledge gaps and fire a second round of searches at them. Cleaner than today's ad-hoc check.[17]
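
As a sketch, with `search`, `summarize`, and `list_gaps` standing in for LLM-backed calls Scout already has equivalents of:

```python
def research(topic: str, search, summarize, list_gaps, cycles: int = 3) -> str:
    summary, query = "", topic
    for _ in range(cycles):
        summary = summarize(summary, search(query))  # fold new evidence in
        gaps = list_gaps(topic, summary)             # explicit gap list, not vibes
        if not gaps:
            break
        query = gaps[0]                              # aim the next round at a gap
    return summary
```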

FROM · 199-biotech

Multi-persona critique

Skeptical Practitioner / Adversarial Reviewer / Implementation Engineer pass before final write. Pairs with the existing "every claim has ≥1 URL" check.[4]
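
A sketch of the pass; the persona names follow the skill's framing, while `critique` is a hypothetical one-LLM-call helper that returns a list of issues:

```python
PERSONAS = {
    "Skeptical Practitioner": "Would this survive contact with a real workload?",
    "Adversarial Reviewer": "Which claims overreach their source or lack a URL?",
    "Implementation Engineer": "What is missing to actually build this?",
}

def critique_pass(draft: str, critique) -> list[str]:
    issues: list[str] = []
    for persona, stance in PERSONAS.items():
        issues += critique(draft, persona=persona, stance=stance)
    return issues   # feed into a revise step before the final write
```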

FROM · STORM

Perspective-guided outline

Before planning sub-questions, enumerate stakeholder perspectives implied by the topic and ensure the outline covers each.[14]
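
A sketch of the coverage check, loosely after STORM; `enumerate_perspectives` and `covers` stand in for LLM calls:

```python
def perspective_gaps(topic, outline, enumerate_perspectives, covers):
    perspectives = enumerate_perspectives(topic)   # e.g. user, operator, auditor
    return [p for p in perspectives
            if not any(covers(section, p) for section in outline)]

# a non-empty result means: extend the outline before deep investigation starts
```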

FROM · Elicit

Per-source credibility tag

Not just "cite the URL" but a one-token tag — official-docs / peer-reviewed / vendor-blog / forum-consensus. Formalize the existing labels.[19]

FROM · GPT-Researcher

Publisher role split

Final writer sees the evidence bundle, not the search trajectory. In Scout terms: synthesize into a citation ledger first, write the markdown second.[2]
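
In sketch form, reusing the JSONL ledger shape above; `write_report` stands in for Scout's final synthesis call:

```python
def publish(ledger: list[dict], write_report) -> str:
    evidence = [{"claim": e["claim"], "url": e["url"], "quote": e["quote"]}
                for e in ledger]   # the distilled evidence bundle, nothing else
    # the noisy trajectory (queries, dead ends, tool errors) never reaches
    # the writer's context, so it cannot leak into the prose
    return write_report(evidence)
```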

FROM · 199-biotech

Auto-continuation past 18k words

Recursive sub-agents keep going past Scout's expedition-mode context wall. Lifts the depth ceiling.[4]
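
A sketch, assuming Scout gains a `spawn_subagent` primitive; each round gets a fresh context holding only the outline and the tail of the draft:

```python
def write_long(outline, spawn_subagent, max_rounds: int = 8) -> str:
    draft = ""
    for _ in range(max_rounds):                 # hard stop, belt and braces
        tail = " ".join(draft.split()[-500:])   # only the tail crosses over
        chunk = spawn_subagent(outline, tail)   # fresh context each round
        draft += chunk
        if chunk.endswith("<END>"):             # the writer signals completion
            return draft.removesuffix("<END>")
    return draft
```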

FROM · 199-biotech

Fetch today's date first

Trivial. Prevents stale-year queries when the runtime clock disagrees with training data.[4]
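
The whole trick, as a sketch: resolve the date at runtime and pin it in the system prompt:

```python
from datetime import datetime, timezone

today = datetime.now(timezone.utc).date().isoformat()
system_prompt = f"Today's date is {today}. Scope recency-sensitive queries to it."
```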

FROM · GPT-Researcher + open_deep_research

MCP-first search backend

Both standardized on MCP as the pluggable search interface. The choice of Tavily / SearXNG / Exa becomes config, not code.[2][3]
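
A sketch of backend-as-config; the registry shape is illustrative (not open_deep_research's), and the launch commands are placeholders for whichever MCP search servers actually get deployed:

```python
import os

SEARCH_BACKENDS = {               # name -> MCP server launch command
    "tavily":  ["npx", "-y", "tavily-mcp"],       # placeholder commands;
    "searxng": ["uvx", "mcp-searxng"],            # pin the real package names
    "exa":     ["npx", "-y", "exa-mcp-server"],   # in deployment config
}

def backend_command() -> list[str]:
    # swapping Tavily / SearXNG / Exa is an env-var change, not a code change
    return SEARCH_BACKENDS[os.environ.get("SCOUT_SEARCH_BACKEND", "tavily")]
```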

FROM · Claude Managed Agents

Future hosting target

If Scout outgrows GitHub Actions, Environment / Session / Events maps 1:1 onto research-per-run.[12]
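
A sketch of the Scout-side mapping only; these dataclasses are stand-ins for how a run could be modelled, not the Managed Agents API:

```python
from dataclasses import dataclass, field

@dataclass
class Environment:       # tool surface: sandbox, search, file ops
    tools: list[str]

@dataclass
class Session:           # one research run, start to packaged report
    topic: str
    env: Environment
    events: list[dict] = field(default_factory=list)   # searches, writes, critiques

def log(session: Session, kind: str, **payload) -> None:
    session.events.append({"kind": kind, **payload})    # replayable audit trail
```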