Rubric version history

How the rubric evolved

The Context Management Index did not arrive fully formed. v1.0 is the seventh version, the product of six months of internal research iteration. Every major change was driven by external research rather than internal opinion, drawing on IBM's 47% accuracy study, Anthropic's attention paper, the AVRS framework, Augment Code's context engine case study, and 120+ internally scored teams.

This page is the full development log. v0.1 was intentionally simple, and we are not hiding that. Showing the full process is the credibility anchor, and it gives teams context when explaining the rubric to stakeholders.

  • 7 versions to v1.0
  • 6 months development span
  • Dimensions grew from 3 to 8
  • 120+ teams calibrated v1.0

Version timeline

Version | Date    | Status   | Dimensions | Headline
v1.0    | 2026-04 | Public   | 8 | First public release. Data-calibrated weights + standardized criteria + full public methodology.
v0.9    | 2026-03 | Internal | 8 | Added Measurement & Feedback Loops as the 8th dimension.
v0.8    | 2026-02 | Internal | 7 | Added Context Window Optimization; split out Tool Configuration.
v0.7    | 2026-01 | Internal | 6 | Split Team Context Sharing out; introduced team-size-based weighting.
v0.5    | 2025-12 | Internal | 5 | Added Memory & Persistence as a dedicated dimension; introduced the Tool Swap Test.
v0.3    | 2025-11 | Internal | 4 | Added Documentation-as-Context after the IBM 47% accuracy study.
v0.1    | 2025-10 | Internal | 3 | First draft: a binary checklist for context files.

Detailed history


v1.0 (changes from v0.9)
  • Reweighted: all dimensions, recalibrated against 120+ teams.
  • Renamed: "Measurement & Feedback", shortened from "Measurement & Feedback Loops" for table fit.
Why it changed
  • Correlation analysis of 120+ scored teams from v0.8 and v0.9 internal runs allowed weights to be recalibrated against actual outcome predictors.
  • Multiple regression of dimension scores against self-reported AI productivity outcomes revealed which dimensions actually predict outcomes at each team size (a simplified sketch of this step follows the list).
  • A stable, public version was needed to anchor the leaderboard launch.
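To make the recalibration step concrete, here is a minimal sketch of the idea, not the actual analysis code. It assumes a table with one row per scored team, eight dimension-score columns, a self-reported outcome column, and a team_size label; the column names, the use of scikit-learn, and the clip-and-normalize step are all illustrative assumptions.

```python
# Hedged sketch of the weight-recalibration idea, not the rubric's published
# method. Assumes a DataFrame with one row per scored team: eight dimension
# scores, a self-reported productivity outcome, and a team_size label.
import pandas as pd
from sklearn.linear_model import LinearRegression

DIMENSIONS = [
    "project_context_files", "memory_persistence", "documentation_as_context",
    "team_context_sharing", "context_window_optimization",
    "code_organization_for_ai", "tool_configuration", "measurement_feedback",
]

def fit_weights(teams: pd.DataFrame, team_size: str) -> dict[str, float]:
    """Regress outcomes on dimension scores for one team-size bucket,
    then normalize the clipped coefficients so they sum to 1.0."""
    bucket = teams[teams["team_size"] == team_size]
    model = LinearRegression().fit(bucket[DIMENSIONS], bucket["reported_outcome"])
    coefs = model.coef_.clip(min=0.0)   # ignore negative predictors
    return dict(zip(DIMENSIONS, coefs / coefs.sum()))
```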
Weights by team size
Dimension                   | Indie | Small | Medium | Large
Project Context Files       | 0.20  | 0.18  | 0.14   | 0.10
Memory & Persistence        | 0.20  | 0.14  | 0.10   | 0.08
Documentation-as-Context    | 0.08  | 0.12  | 0.14   | 0.14
Team Context Sharing        | 0.02  | 0.10  | 0.14   | 0.16
Context Window Optimization | 0.10  | 0.08  | 0.10   | 0.10
Code Organization for AI    | 0.10  | 0.14  | 0.12   | 0.10
Tool Configuration          | 0.20  | 0.14  | 0.10   | 0.10
Measurement & Feedback      | 0.10  | 0.10  | 0.16   | 0.22
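Read alongside the table, a composite v1.0 score is a weighted sum of the eight dimension scores, with each team-size column of weights summing to 1.00 so the composite stays on the per-dimension scale. The sketch below is illustrative only: the dimension names and weights come from the table above, while the scoring scale and the compute_cmi helper are assumptions.

```python
# Illustrative only: dimension names and weights are copied from the table
# above; the compute_cmi helper is an assumption, not published scoring code.
V1_WEIGHTS = {
    # dimension: (indie, small, medium, large); each column sums to 1.00
    "Project Context Files":       (0.20, 0.18, 0.14, 0.10),
    "Memory & Persistence":        (0.20, 0.14, 0.10, 0.08),
    "Documentation-as-Context":    (0.08, 0.12, 0.14, 0.14),
    "Team Context Sharing":        (0.02, 0.10, 0.14, 0.16),
    "Context Window Optimization": (0.10, 0.08, 0.10, 0.10),
    "Code Organization for AI":    (0.10, 0.14, 0.12, 0.10),
    "Tool Configuration":          (0.20, 0.14, 0.10, 0.10),
    "Measurement & Feedback":      (0.10, 0.10, 0.16, 0.22),
}

TEAM_SIZES = ("indie", "small", "medium", "large")

def compute_cmi(dimension_scores: dict[str, float], team_size: str) -> float:
    """Weighted sum of per-dimension scores for one team-size column."""
    col = TEAM_SIZES.index(team_size)
    return sum(
        dimension_scores[dim] * weights[col]
        for dim, weights in V1_WEIGHTS.items()
    )

# Sanity check: with uniform dimension scores, the composite equals that score,
# because every weight column sums to 1.00.
assert abs(compute_cmi({d: 3.0 for d in V1_WEIGHTS}, "small") - 3.0) < 1e-9
```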
Research sources
  • 120+ team correlation analysis
  • All prior version sources retained
Known limitations
  • The 120-team sample is large enough to detect major patterns, but not large enough to support two-decimal precision in the weights.
  • No industry vertical analysis; fintech vs. game studio may have different optimal weights.
  • US/Europe heavy sample; APAC patterns may differ.
  • Self-reported outcomes carry bias; v1.2+ aims for objective signals (PR cycle time, turnover ratio).

Why this history matters

The rubric is not a marketing asset. It is a research artifact, and showing the full development process is the credibility anchor:

  • v0.1 was embarrassingly simple. We're not hiding that.
  • Every major change was driven by external research, not internal decisions: IBM's study, Anthropic's attention paper, AVRS, Augment Code's engine case study.
  • We got things wrong and fixed them. We collapsed team sharing into project context files in v0.5 and split it out in v0.7. We buried tool configuration and then separated it in v0.8.
  • The field is still moving. v1.0 is not the final version. v1.1 is targeted for the next quarter based on the next round of scoring data and emerging research.
For the leaderboard

A score under v1.0 is auditable and means something specific. Future versions never retroactively change past scores. Users can opt to re-score under any version.

For tool vendors

No tool gets scored, ever. If any vendor ships a feature that genuinely changes what good practice looks like, the rubric updates, regardless of who built it.