DECISION.SCORECARD · candidates: 4 · axes: 4 · scale: 1-5 · weights Σ = 100% · weights locked

Four axes. Four candidates.
One locked-before-scoring verdict.

Verdict

Skills wins by 10 points (460 vs 450). That gap is 2% of the 500-point maximum, which falls inside the tie band, so the tie-breaker hierarchy applies. Both finalists ship runnable live demos, so rung 1 does not separate them. Plugins, as the packaging layer, nests Skills, MCP servers, and hooks, so the honest read of the matrix is to ship the combined session (this expedition's actual call), with Skills as the headline primitive and MCP as the depth chapter.

Harness scored last (300) because session 2 already covered it: the series-continuity penalty (2 out of 5) was the dominant signal. The framework worked, giving a clear quantitative justification for "don't re-teach last month."

§01

The matrix

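Scale: 1-5. Each cell shows raw score → weighted points (score × weight).

| Axis              | Weight | MCP (protocol) | Skills (markdown unit) | Plugins (packaging) | Harness (subagents+hooks) |
|-------------------|--------|----------------|------------------------|---------------------|---------------------------|
| Audience fit      | × 35%  | 5 → 175        | 5 → 175                | 4 → 140             | 3 → 105                   |
| Series continuity | × 25%  | 4 → 100        | 5 → 125                | 4 → 100             | 2 → 50                    |
| Speaker readiness | × 25%  | 4 → 100        | 4 → 100                | 3 → 75              | 4 → 100                   |
| Demo viability    | × 15%  | 5 → 75         | 4 → 60                 | 3 → 45              | 3 → 45                    |
| Weighted total    |        | 450 (90%)      | 460 (92%) ★ TOP        | 360 (72%)           | 300 (60%)                 |

Axis notes:

Audience fit: Topics land when matched to the group's current pain point, not the presenter's interest [2].
Series continuity: Third-in-a-series sessions earn attendance from arc, not novelty; bridge cleanly from AI/security [3].
Speaker readiness: Authentic, lived material beats researched material; pick what the presenter has actually shipped [4].
Demo viability: A working demo is the highest-retention element of a 60–90 min virtual session. Tie-breaker rung 1.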
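The totals can be reproduced with a short sketch. The names `WEIGHTS`, `SCORES`, and `weighted_total` are illustrative and not part of any scorecard tool; weights are expressed in percentage points, so the maximum total is 5 × 100 = 500.

```python
# Weights in percentage points, copied from the matrix.
WEIGHTS = {
    "audience_fit": 35,
    "series_continuity": 25,
    "speaker_readiness": 25,
    "demo_viability": 15,
}

# Raw 1-5 scores per candidate, copied from the matrix.
SCORES = {
    "MCP":     {"audience_fit": 5, "series_continuity": 4, "speaker_readiness": 4, "demo_viability": 5},
    "Skills":  {"audience_fit": 5, "series_continuity": 5, "speaker_readiness": 4, "demo_viability": 4},
    "Plugins": {"audience_fit": 4, "series_continuity": 4, "speaker_readiness": 3, "demo_viability": 3},
    "Harness": {"audience_fit": 3, "series_continuity": 2, "speaker_readiness": 4, "demo_viability": 3},
}

# Highest possible total: a 5 on every axis.
MAX_TOTAL = 5 * sum(WEIGHTS.values())


def weighted_total(scores):
    """Multiply each raw score by its axis weight and sum the products."""
    return sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS)


totals = {name: weighted_total(s) for name, s in SCORES.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for name in ranking:
    share = 100 * totals[name] / MAX_TOTAL
    print(f"{name:8s} {totals[name]:4d}  {share:.0f}%")
```

Keeping weights as integers avoids floating-point noise in the sums and makes the 500-point ceiling explicit.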
§02

Weighted totals at a glance

Bar chart, scale 0 to 500 max:

Skills (survey)   460 ★
MCP (survey)      450
Plugins (survey)  360
Harness (recon)   300
§03 · Rejected framework

Why not RICE?

RICE = (Reach × Impact × Confidence) / Effort. Intercom built it to compare product-feature ideas, where reach is measured in "customers per quarter" and effort in "person-months". Two of its four factors, Reach and Effort, degenerate for a fixed-audience, fixed-duration session.

Reach: degenerate. Same N attendees regardless of topic.
Impact: partial, but already captured by the audience-fit axis.
Confidence: partial, captured by the speaker-readiness axis.
Effort: degenerate. Bounded to the 90-min slot.

The score still computes, but it stops discriminating between candidates. Weighted scoring is the more flexible choice when criteria are domain-specific [7] [6].
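To see why, hold Reach and Effort constant, as a fixed-audience, fixed-slot session does: the RICE ranking then depends only on Impact × Confidence. The sketch below is illustrative only. The topics, their Impact and Confidence values, the attendee count, and the effort figure are all made-up assumptions, not data from this scorecard.

```python
def rice(reach, impact, confidence, effort):
    """Standard RICE formula: (Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort


# Hypothetical topics mapped to (impact, confidence). Illustrative values only.
topics = {"A": (3.0, 0.8), "B": (2.0, 1.0), "C": (1.0, 0.5)}

REACH = 40    # assumed: the same attendees show up regardless of topic
EFFORT = 1.5  # assumed: every topic fills the same 90-minute slot (in hours)

# Rank by full RICE score.
by_rice = sorted(topics, key=lambda t: rice(REACH, *topics[t], EFFORT), reverse=True)

# Rank by Impact x Confidence alone.
by_impact_confidence = sorted(topics, key=lambda t: topics[t][0] * topics[t][1], reverse=True)

# Constant Reach and Effort only rescale every score by the same factor,
# so the two rankings are identical.
print(by_rice == by_impact_confidence)  # True
```

Because Reach / Effort is the same positive constant for every topic, it cannot change the order; only the two "partial" factors carry information, and the weighted matrix already captures both.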

§04 · When totals are close

Tie-breaker hierarchy

Apply rungs in order. First rung that yields a clear winner ends the tie.

  1. Runnable live demo > slides-only. The 8-12 min active block is the session's retention spike.
  2. Internal speaker with lived experience > external expert presenting researched material [2].
  3. Topic the group has asked about > topic the organizer thinks they should care about.
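
The rung-by-rung rule can be sketched as a loop that stops at the first attribute on which the two finalists differ. The `Finalist` fields and the candidates `X` and `Y` are hypothetical, introduced only for illustration; the scorecard itself records only that both real finalists have a runnable demo.

```python
from dataclasses import dataclass


@dataclass
class Finalist:
    name: str
    has_live_demo: bool              # rung 1
    internal_lived_experience: bool  # rung 2
    group_requested: bool            # rung 3


# Rungs in priority order, as attribute names on Finalist.
RUNGS = ["has_live_demo", "internal_lived_experience", "group_requested"]


def break_tie(a, b):
    """Return (winner, rung_number), or (None, None) if no rung separates them."""
    for number, rung in enumerate(RUNGS, start=1):
        a_has, b_has = getattr(a, rung), getattr(b, rung)
        if a_has != b_has:
            # First rung with a clear winner ends the tie.
            return (a if a_has else b), number
    return None, None


# Hypothetical finalists: both have a demo, so rung 1 does not decide.
x = Finalist("X", has_live_demo=True, internal_lived_experience=True, group_requested=False)
y = Finalist("Y", has_live_demo=True, internal_lived_experience=False, group_requested=True)

winner, rung = break_tie(x, y)
print(winner.name, rung)  # X 2
```

Note that Y's advantage on rung 3 never comes into play: the hierarchy is strictly ordered, so a lower rung cannot override a higher one.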
§05

The 30-minute process

total budget: ≤30 min
STEP 01
≈ 5 MIN

List 3–5 candidates

Stop at five. The matrix breaks down at high candidate counts — every additional column doubles the discussion overhead.

STEP 02
≈ 8 MIN

Lock weights BEFORE seeing candidates

Agree with co-owners before the candidate list is visible. Prevents gaming scores toward a predetermined pick [3].

STEP 03
≈ 12 MIN

Score 1–5, compute, tie-break

Multiply, sum, rank. Apply the tie-breaker hierarchy if results land within ~5% of each other.
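
A minimal sketch of the closeness check. The text only says "~5%", so two things here are assumptions: the gap is measured against the maximum possible total (500 for a four-axis, 1-5 matrix), and the threshold is taken as exactly 5%. The function name `within_tie_band` is illustrative.

```python
def within_tie_band(top, runner_up, max_total=500, band=0.05):
    """True if the gap between two totals is at most `band` of the maximum total."""
    return abs(top - runner_up) / max_total <= band


print(within_tie_band(460, 450))  # True: 10 / 500 = 2%
print(within_tie_band(460, 360))  # False: 100 / 500 = 20%
```

Measuring against the fixed maximum rather than the leader's score keeps the band width the same for every pair of candidates.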

STEP 04
≈ 5 MIN

Sanity-check continuity

Does the winner connect to the prior session in one sentence? If not, re-score continuity — it was probably under-weighted.

§06

Discipline rules

break one → score is theatre
Weights must sum to 100%

Mechanical check before scoring. If they don't, fix before reading candidates — never adjust after.
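
The mechanical check can be as small as the sketch below. `check_weights` is a hypothetical helper, and weights are assumed to be whole percentage points so the comparison with 100 is exact.

```python
def check_weights(weights):
    """Raise ValueError unless the weights (in percentage points) sum to 100."""
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")
    return True


# The four axis weights from this scorecard pass the check.
check_weights({
    "audience_fit": 35,
    "series_continuity": 25,
    "speaker_readiness": 25,
    "demo_viability": 15,
})
```

Raising an error, rather than silently renormalizing, forces the fix to happen before any candidate is scored.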

Pick ≤4 axes

Eight or more axes turn scoring into a multi-hour session and dilute each weight's signal below the noise floor.

Never change criteria mid-evaluation

If a new axis seems vital after scoring two candidates, you've found a flaw in step 1 — restart, don't patch [3].

§07

Citations

7 sources · weighted-scoring lineage + session-design