Why Two Dimensions?
Most comparison tables collapse project health and feature fit into a single feature list, hiding the risk dimension. A tool can be feature-rich but abandoned; another can be minimal but rock-solid. Evaluating them on the same axis produces misleading rankings. High star counts are particularly unreliable, since they reflect passive appreciation and are distorted by marketing spikes.[2] Research in 2026 identified approximately 4.5 million suspected fake stars on GitHub, further undermining stars as a health proxy.[3]
Axis 1 — Health Metrics
CHAOSS Starter Model
The CHAOSS project[4] (Linux Foundation) publishes implementation-agnostic metrics for measuring open-source community health, maintained in github.com/chaoss/metrics ⭐ 182.[5] Its Starter Project Health model covers four foundational metrics designed to be quick to collect without specialist tooling:[6]
| Metric | Category | What It Signals | Red flag if… |
|---|---|---|---|
| Time to First Response | Responsiveness | Median time from issue/PR open to first maintainer reply | >14 days median |
| Change Request Closure Ratio | Efficiency | Ratio of closed PRs to total PRs in a rolling 90-day window | <50% closed |
| Contributor Absence Factor | Sustainability | Fewest people whose combined commits account for 50% of all commits (bus factor) | = 1 (single-author) |
| Release Frequency | Delivery | Cadence of point releases and bug fixes | Last release >6 months ago |
GitHub's OSPO recommends additionally tracking open vs. closed issue counts and PR merge ratios over time to detect growing maintainer backlog.[7]
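The red-flag column above can be checked mechanically once the raw numbers are collected. The following is a minimal sketch, not a CHAOSS reference implementation: the function names and input shapes are hypothetical, and "6 months" is approximated as 183 days.

```python
from statistics import median


def contributor_absence_factor(commits_by_author: dict[str, int], share: float = 0.5) -> int:
    """Smallest number of authors whose commits together reach `share` of all commits."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    running = 0
    # Walk authors from most to least active until the share is covered.
    for rank, count in enumerate(sorted(commits_by_author.values(), reverse=True), start=1):
        running += count
        if running >= share * total:
            return rank
    return len(commits_by_author)


def starter_red_flags(
    first_response_days: list[float],
    closed_prs: int,
    total_prs: int,
    commits_by_author: dict[str, int],
    days_since_release: int,
) -> list[str]:
    """Return the names of the starter metrics that trip a red-flag threshold."""
    flags = []
    if first_response_days and median(first_response_days) > 14:
        flags.append("time_to_first_response")
    if total_prs and closed_prs / total_prs < 0.5:
        flags.append("change_request_closure_ratio")
    if contributor_absence_factor(commits_by_author) == 1:
        flags.append("contributor_absence_factor")
    if days_since_release > 183:  # assumption: 6 months ~ 183 days
        flags.append("release_frequency")
    return flags
```

Inputs are expected to cover the same rolling window (for example the last 90 days of PRs); how they are gathered, whether via Augur, GrimoireLab, or a hosting API, is left open here.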
Security Health: OpenSSF Scorecard & Criticality Score
OpenSSF Scorecard ⭐ 5.5k runs 20 automated checks, each scored 0–10, including: Branch-Protection, CI-Tests, Code-Review, Contributors, Dependency-Update-Tool, Fuzzing, License, Maintained, Pinned-Dependencies, SAST, SBOM, Security-Policy, Signed-Releases, Token-Permissions, and Vulnerabilities.[8] Each check carries a risk weight; an aggregate score below 5/10 is a yellow flag, and below 3/10 a red flag. A weekly scan of the 1 million most-critical open-source projects is published as a BigQuery public dataset.[9]
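The yellow/red thresholds can be applied to a Scorecard result in a few lines. This is a sketch under assumptions: the result is taken to be a dict with a `checks` list of `name`/`score` entries, modelled loosely on Scorecard's JSON output, and a score of -1 is treated as "inconclusive" and skipped. The helper names are hypothetical.

```python
def scorecard_flag(aggregate: float) -> str:
    """Map an aggregate Scorecard score (0-10) to the flag levels used in this guide."""
    if aggregate < 3:
        return "red"
    if aggregate < 5:
        return "yellow"
    return "ok"


def weakest_checks(result: dict, limit: int = 3) -> list[str]:
    """Names of the lowest-scoring checks, ignoring inconclusive ones (assumed score -1)."""
    scored = [c for c in result.get("checks", []) if c.get("score", -1) >= 0]
    scored.sort(key=lambda c: c["score"])
    return [c["name"] for c in scored[:limit]]
```

Listing the weakest checks alongside the flag helps separate a low score caused by one fixable gap (say, no Security-Policy) from one caused by broad neglect.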
The companion OpenSSF Criticality Score ⭐ 1.4k rates a project's ecosystem importance on a scale from 0 to 1, factoring in dependent count, commit frequency, and contributor breadth. It is useful for prioritising which projects to health-check first.[10]
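To show how several raw signals can be squeezed into a 0-to-1 score, here is a sketch of a weighted, log-scaled aggregation of the kind the Criticality Score project describes. The signal names, weights, and thresholds are hypothetical placeholders, not the project's actual configuration.

```python
import math


def criticality(signals: dict[str, float], params: dict[str, tuple[float, float]]) -> float:
    """Weighted log-scaled score in [0, 1].

    `params` maps a signal name to (weight, threshold); a signal at or above
    its threshold contributes its full weight.
    """
    total_weight = sum(weight for weight, _ in params.values())
    acc = 0.0
    for name, (weight, threshold) in params.items():
        if threshold <= 0:
            raise ValueError(f"threshold for {name!r} must be positive")
        value = max(signals.get(name, 0.0), 0.0)
        # log scaling dampens very large counts; max() caps each term at 1.
        acc += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return acc / total_weight


# Hypothetical weights and thresholds, for illustration only.
EXAMPLE_PARAMS = {
    "dependents": (2.0, 1000.0),
    "commit_frequency": (1.0, 500.0),
    "contributors": (1.0, 100.0),
}
```

Because each term saturates at its threshold, a project cannot compensate for having no dependents by having an extreme commit count.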
Axis 2 — Feature Rubric
Score each dimension 1–5; multiply by weight before summing. Set weights before scoring to avoid post-hoc rationalisation.[11]
| Dimension | Weight | What to Assess |
|---|---|---|
| Reliability | High ×3 | Published SLA, 12-month uptime history, incident post-mortems, graceful degradation under load |
| Security | High ×3 | SOC 2 / ISO 27001 / HIPAA, SSO/MFA, audit logs, encryption standards, data residency controls |
| Cost Predictability | High ×3 | Transparent pricing page, overage behaviour, billing alerts, whether the free tier creates expensive habits at scale |
| Observability | Medium ×2 | Exposed metrics/health endpoints, actionable error messages, usage dashboards, programmatic API access to operational data |
| Lock-in & Exit | Medium ×2 | Data export in standard formats, open API standards, published migration docs, community-maintained alternatives |
| Team Fit | Medium ×2 | Onboarding time, documentation quality, community responsiveness, hiring pool alignment, CI/CD and workflow integration |
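The arithmetic behind the rubric is small enough to script, which also forces the weights to be fixed before any candidate is scored. A minimal sketch, using the weights from the table above; the function name and validation rules are illustrative choices.

```python
# Weights taken from the rubric table: High = 3, Medium = 2.
WEIGHTS = {
    "Reliability": 3,
    "Security": 3,
    "Cost Predictability": 3,
    "Observability": 2,
    "Lock-in & Exit": 2,
    "Team Fit": 2,
}

# Best possible total: every dimension scored 5.
MAX_TOTAL = 5 * sum(WEIGHTS.values())


def weighted_total(scores: dict[str, int]) -> int:
    """Sum of score x weight over all rubric dimensions (scores must be 1-5)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    for dimension in WEIGHTS:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} score must be between 1 and 5")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
```

For example, scores of 4, 5, 3, 4, 2, 5 in table order give 3 × (4 + 5 + 3) + 2 × (4 + 2 + 5) = 58 out of a possible 75.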
For AI-era tools, classify candidates as human-facing, agent-facing, or both before scoring — agent-mode tools need an additional eval on task-completion rate, tool-call accuracy, and latency, not just inline-suggestion quality.[11]
Health Measurement Tooling (Open Source)
| Tool | Type | Data Sources | Strength | Limitation |
|---|---|---|---|---|
| Augur ⭐ 693[12] | Python lib + REST API | GitHub, GitLab | Scales to 10k+ repos; raw SQL for custom research | No built-in visualisation; GitHub / GitLab only |
| GrimoireLab[13] | Modular self-hosted platform | Git, GitHub, GitLab, Gerrit, Slack, IRC, Discourse, JIRA, Redmine, Jenkins, DockerHub | Multi-source; best-in-class Kibana visualisations; contributor identity dedup | Self-hosted setup complexity |
| OSS Compass[14] | SaaS (hosted) | GitHub, Gitee | Zero setup; GrimoireLab backend; CHAOSS-aligned metrics models | Hosted in China; Gitee-centric emphasis |
| OSSInsight ⭐ 2.4k[15] | SaaS + natural-language query | GitHub (10B+ events) | NL queries over GitHub history; rankings; trend charts; no setup | GitHub only; no GitLab / Gitee |
| OpenSSF Scorecard ⭐ 5.5k[9] | CLI + REST API | GitHub | Security-focused; 20 automated checks; BigQuery public dataset for top-1M repos | Security only — no community or activity metrics |
Applied: AI Coding Tools (June 2026)
These are commercial tools, so CHAOSS metrics don't apply directly. Health is proxied by release cadence, company backing, open-ecosystem signals, and pricing model stability. Five tools are compared across the market's most differentiating capabilities.[16]
Feature Matrix
| Feature | Claude Code | GitHub Copilot | Cursor | Kiro (AWS) | Google Antigravity |
|---|---|---|---|---|---|
| Agentic / autonomous mode | ✓ | ✓ | ✓ | ✓ | ✓ |
| Terminal / CLI integration | ✓ native | ✓ limited | ✓ | ✓ | ✓ |
| Background / parallel agents | ✓ | ✓ | ✓ | ✗ | ✓ scheduled |
| MCP support | ✓ | ✓ | ✓ | partial | ✗ |
| Multi-model choice | ✗ Anthropic only | ✓ OpenAI/Anthropic/Google/xAI | ✓ | ✗ Amazon only | ✗ Google only |
| Built-in browser | ✗ | ✗ | ✗ | ✗ | ✓ Chromium |
| Spec-driven development | ✗ | ✗ | ✗ | ✓ first-class | ✗ |
| Hooks / event automation | ✓ | ✗ | ✗ | ✓ event-driven | ✓ scheduled |
| IDE support breadth | VS Code + JetBrains | VS Code, JetBrains, Neovim, Xcode, Eclipse, Zed… | VS Code fork only | VS Code + JetBrains | limited |
| Security certification | HIPAA-ready | IP indemnity | SOC 2 Type 2 | AWS compliance stack | Google-backed |
Health Proxy Matrix
| Tool | Backing | Public Repo / Stars | Release Cadence | Pricing Stability | Exit Path |
|---|---|---|---|---|---|
| Claude Code | Anthropic | anthropics/claude-code ⭐ 131k[19] | Weekly | Usage caps; plan tiers volatile | SDK-based; some portability via MCP |
| GitHub Copilot | Microsoft | Proprietary | Continuous | Flat monthly; free tier stable | Multi-IDE; config portable |
| Cursor | Anysphere | Proprietary | Bi-weekly | Tiered; historically volatile | IDE fork lock-in |
| Kiro | Amazon / AWS | Proprietary | Monthly | AWS enterprise support | Spec files are portable YAML |
| Google Antigravity | Google DeepMind | Proprietary | Rapid / continuous | Early-stage; pricing unstable | Deep Google ecosystem coupling |
Building Your Own Matrix
- Classify tool type first. Open-source → run the full CHAOSS Starter Model plus OpenSSF Scorecard.[6] Commercial → use the Health Proxy Matrix above. Agent tool → add task-completion rate, tool-call accuracy, and latency to the feature rubric.[11]
- Assign weights before scoring. High/Medium/Low per feature dimension; locks in priorities before candidates bias your judgement.
- Apply the red-flag rule before totalling. A tool that fails a critical dimension (Reliability, Security, Cost Predictability) is disqualified — don't let other high scores rescue it.[1]
- Run OpenSSF Scorecard on every open-source candidate. Aggregate score <5/10 is a yellow flag; <3/10 is a red flag.[9]
- Triangulate health with OSSInsight for trend data — commit velocity, contributor growth, fork rate — to distinguish an active project from a stale one with legacy stars.[15]
- Check the OpenSSF Criticality Score for any project you plan to deeply integrate — low-criticality projects (<0.5) have thinner dependency safety nets.[10]
- Re-evaluate annually. Health degrades silently; a once-healthy project can hit bus-factor = 1 within 12 months. Stars don't decay; health metrics do.[2]
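The red-flag rule in the checklist is easiest to enforce as a gate that runs before any totals are compared. A sketch, assuming 1-5 rubric scores per candidate; the cut-off for "fails" is not defined in this guide, so `fail_at_or_below=2` is a hypothetical default, and the candidate names in any usage are placeholders.

```python
CRITICAL_DIMENSIONS = ("Reliability", "Security", "Cost Predictability")


def apply_red_flag_rule(
    candidates: dict[str, dict[str, int]],
    fail_at_or_below: int = 2,  # assumption: scores of 1 or 2 count as a failure
) -> tuple[list[str], dict[str, list[str]]]:
    """Split candidates into survivors and disqualified tools.

    A candidate is disqualified if any critical dimension is missing or scores
    at or below the cut-off, regardless of how well it does elsewhere.
    """
    survivors: list[str] = []
    disqualified: dict[str, list[str]] = {}
    for name, scores in candidates.items():
        failed = [
            dim for dim in CRITICAL_DIMENSIONS
            if scores.get(dim, 0) <= fail_at_or_below
        ]
        if failed:
            disqualified[name] = failed
        else:
            survivors.append(name)
    return survivors, disqualified
```

Only the survivors go on to weighted totalling, so a strong Team Fit or Observability score can never offset a failed Security score.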
Reference: OpenSSF Scorecard project page.[20]