The rubric is not a marketing asset. It is a research artifact, and showing the full development process is the credibility anchor:
- v0.1 was embarrassingly simple. We're not hiding that.
- Every major change was driven by external research, not internal decisions. IBM's study, Anthropic's attention paper, AVRS, Augment Code's engine case study.
- We got things wrong and fixed them. We collapsed team sharing into project context files in v0.5 and split it out in v0.7. We buried tool configuration and then separated it in v0.8.
- The field is still moving. v1.0 is not the final version. v1.1 is targeted for the next quarter based on the next round of scoring data and emerging research.
**For the leaderboard**
A score under v1.0 is auditable and means something specific. Future versions never retroactively change past scores. Users can opt to re-score under any version.
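The policy above (scores pinned to a rubric version, past records immutable, re-scoring opt-in and additive) can be sketched as an append-only record store. This is a minimal illustration, not the project's real schema; every name here is hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a published score is never mutated
class Score:
    subject: str          # hypothetical field names, not the real schema
    rubric_version: str   # e.g. "1.0" -- the version the score is auditable under
    value: int
    scored_on: date

class Leaderboard:
    """Append-only history: re-scoring adds a record, never rewrites one."""

    def __init__(self) -> None:
        self.records: list[Score] = []

    def publish(self, score: Score) -> None:
        self.records.append(score)

    def rescore(self, old: Score, new_version: str, new_value: int) -> Score:
        # Opt-in re-scoring under another rubric version: the old record
        # stays in the history so the original score remains auditable.
        new = Score(old.subject, new_version, new_value, date.today())
        self.publish(new)
        return new
```

The `frozen=True` dataclass and the append-only list are the whole point: a v1.0 score keeps meaning "v1.0" forever, and a v1.1 re-score sits alongside it rather than replacing it.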
**For tool vendors**
No tool gets scored, ever. If a vendor ships a feature that genuinely changes what good practice looks like, the rubric updates, regardless of who built it.