Skill retro collector
One row per observed failure mode across every run of a skill. The value shows up across many runs — a pattern that's invisible in any single retrospective becomes obvious when you filter thousands of observations by category.
run_id + row number
Schema
| Column | Meaning |
|---|---|
| observation_id | Sequential, unique within the logbook. |
| run_id | Which skill run produced this observation. |
| skill_name | Denormalized for cross-skill analysis. |
| timestamp | When observed. Provenance column. |
| phase | Which phase of the skill: discover, plan, act, verify. |
| observation | One sentence describing what went wrong or could improve. |
| severity | nit, low, medium, high, blocking. |
| category | Clustering label: scorer-collapse, context-loss, tool-misuse, etc. |
| supersedes | If this row corrects an earlier one, its observation_id. |
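As a rough illustration, one row of the schema above could be modeled as a typed record. This is a sketch only: the `Observation` class name, the integer type for `run_id`, and the sample values are assumptions, not part of the logbook definition. `category` is left unvalidated because the list of labels above is open-ended.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Allowed values taken from the schema table.
PHASES = ("discover", "plan", "act", "verify")
SEVERITIES = ("nit", "low", "medium", "high", "blocking")


@dataclass(frozen=True)
class Observation:
    observation_id: int               # sequential, unique within the logbook
    run_id: int                       # assumed integer so runs sort naturally
    skill_name: str
    timestamp: datetime
    phase: str
    observation: str
    severity: str
    category: str                     # free-form clustering label
    supersedes: Optional[int] = None  # observation_id of the row this corrects

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the schema.
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


# Hypothetical row, invented purely for illustration.
row = Observation(
    observation_id=1,
    run_id=1,
    skill_name="example-skill",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
    phase="verify",
    observation="Scorer returned the same value for every candidate.",
    severity="high",
    category="scorer-collapse",
)
```

Freezing the dataclass mirrors the append-only intent: a row is never edited after it is written.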
Sample data
Common queries
# The pattern invisible in any single retro
filter category=scorer-collapse | count_by phase
# The 30 most recent high-severity observations, newest run first
filter severity in (high, blocking) | sort run_id --desc | head 30
# Correction rows: each points at the row it supersedes, so the audit trail is preserved
filter supersedes != null
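For readers without the query tool, the three queries above can be approximated in plain Python. This is a sketch, assuming rows are dictionaries keyed by the schema's column names and that `run_id` is sortable; the function names and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical rows, invented for illustration only.
rows = [
    {"observation_id": 1, "run_id": 1, "phase": "verify", "severity": "high",
     "category": "scorer-collapse", "supersedes": None},
    {"observation_id": 2, "run_id": 2, "phase": "verify", "severity": "low",
     "category": "scorer-collapse", "supersedes": None},
    {"observation_id": 3, "run_id": 2, "phase": "act", "severity": "blocking",
     "category": "scorer-collapse", "supersedes": None},
    {"observation_id": 4, "run_id": 3, "phase": "plan", "severity": "high",
     "category": "context-loss", "supersedes": None},
    {"observation_id": 5, "run_id": 2, "phase": "verify", "severity": "medium",
     "category": "scorer-collapse", "supersedes": 2},
]


def count_by_phase(rows, category):
    """Count observations per phase within one category."""
    return Counter(r["phase"] for r in rows if r["category"] == category)


def recent_high_severity(rows, limit=30):
    """Return up to `limit` high or blocking rows, newest run first."""
    hits = [r for r in rows if r["severity"] in ("high", "blocking")]
    return sorted(hits, key=lambda r: r["run_id"], reverse=True)[:limit]


def correction_rows(rows):
    """Return rows that correct an earlier row via `supersedes`."""
    return [r for r in rows if r["supersedes"] is not None]
```

Note that `recent_high_severity` limits the number of rows, not the number of runs; restricting to the last N runs would need a separate filter on `run_id`.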
Actions
Trigger skill-improvement agent. If five or more unresolved observations share a category, the retro triggers a focused improvement session on that failure mode. The retro is the work queue; the improvement agent works through it and patches results back.
Generate report. Quarterly summary for skill authors: top-5 categories by severity-weighted count, trend over time.
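The two actions above could be sketched as follows. Several things here are assumptions: the schema has no "resolved" column, so the caller supplies the ids it considers resolved; the numeric severity weights are invented, since the text names the levels but not how they are weighted; and the function names are hypothetical.

```python
from collections import Counter

TRIGGER_THRESHOLD = 5  # "five or more" unresolved observations per category

# Assumed weights; only the ordering nit < ... < blocking comes from the schema.
SEVERITY_WEIGHT = {"nit": 1, "low": 2, "medium": 3, "high": 5, "blocking": 8}


def categories_to_improve(rows, resolved_ids=frozenset()):
    """Return categories with enough unresolved rows to trigger a session."""
    counts = Counter(
        r["category"] for r in rows if r["observation_id"] not in resolved_ids
    )
    return sorted(c for c, n in counts.items() if n >= TRIGGER_THRESHOLD)


def top_categories(rows, k=5):
    """Return the top-k categories by severity-weighted count."""
    totals = Counter()
    for r in rows:
        totals[r["category"]] += SEVERITY_WEIGHT[r["severity"]]
    return totals.most_common(k)
```

Neither function filters out superseded rows; a caller that wants each observation counted once would drop them first.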
Retros are audit-sensitive — the record of what was observed should survive revision. Mistakes and clarifications get appended as new rows that reference the original via supersedes; the original stays visible. For refinement-heavy logbooks like backlog shaping, patch in place. For evidence-like logbooks, append and reference.
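The append-and-reference rule can be illustrated with a small sketch. It assumes the logbook is an in-memory list of dictionaries keyed by the schema's column names; `append_correction` and `current_rows` are hypothetical helpers, not part of any existing tool.

```python
def append_correction(logbook, original_id, **changes):
    """Append a corrected copy of a row; the original is left untouched."""
    original = None
    for r in logbook:
        if r["observation_id"] == original_id:
            original = r
            break
    if original is None:
        raise KeyError(f"no observation with id {original_id}")
    # observation_id is sequential, so the next id is one past the maximum.
    new_id = max(r["observation_id"] for r in logbook) + 1
    corrected = {**original, **changes,
                 "observation_id": new_id, "supersedes": original_id}
    logbook.append(corrected)
    return corrected


def current_rows(logbook):
    """Return rows that no later row supersedes."""
    superseded = {r["supersedes"] for r in logbook if r["supersedes"] is not None}
    return [r for r in logbook if r["observation_id"] not in superseded]
```

The full logbook keeps every row for audit purposes, while `current_rows` gives the view with corrections applied.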