long-lived · audit-sensitive

Skill retro collector

One row per observed failure mode across every run of a skill. The value shows up across many runs — a pattern that's invisible in any single retrospective becomes obvious when you filter thousands of observations by category.

Shape	Single flat table
Storage	CSV per skill
Owner	Skill author
Lifetime	Life of the skill
Identity	run_id + row number
Corrections	Append supersession rows

Schema

Column	Meaning
observation_id	Sequential, unique within the logbook.
run_id	Which skill run produced this observation.
skill_name	Denormalized for cross-skill analysis.
timestamp	When observed. Provenance column.
phase	Which phase of the skill: `discover`, `plan`, `act`, `verify`.
observation	One sentence describing what went wrong or could improve.
severity	`nit`, `low`, `medium`, `high`, `blocking`.
category	Clustering label: `scorer-collapse`, `context-loss`, `tool-misuse`, etc.
supersedes	If this row corrects an earlier one, its `observation_id`.
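
The schema can be sketched as a small record type. This is a minimal illustration, not the collector's actual implementation: the class name `Observation`, the string-typed `timestamp`, and the use of an empty string for "supersedes nothing" are all assumptions. Only `phase` and `severity` are validated, because the schema lists closed value sets for those two; `category` is left open since its list ends in "etc."

```python
from dataclasses import dataclass, fields

# Closed value sets taken from the schema table.
PHASES = ("discover", "plan", "act", "verify")
SEVERITIES = ("nit", "low", "medium", "high", "blocking")


@dataclass(frozen=True)  # frozen: a recorded observation is never mutated
class Observation:
    """Hypothetical record type; field order follows the schema table."""
    observation_id: str
    run_id: str
    skill_name: str
    timestamp: str       # assumption: stored as text, e.g. ISO 8601
    phase: str
    observation: str
    severity: str
    category: str        # open-ended clustering label, not validated
    supersedes: str = ""  # assumption: empty string means "corrects nothing"

    def __post_init__(self):
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


# Column order for a CSV header, derived from the record type.
COLUMNS = [f.name for f in fields(Observation)]
```

Deriving `COLUMNS` from the dataclass keeps the CSV header and the record definition from drifting apart.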

Sample data

observation_id,run_id,skill_name,phase,severity,category,supersedes
obs-0041,run-7891,ultra-brainstorming,act,medium,scorer-collapse,
obs-0042,run-7891,ultra-brainstorming,act,high,context-loss,
obs-0043,run-7892,ultra-brainstorming,plan,low,tool-misuse,
obs-0044,run-7893,ultra-brainstorming,act,medium,scorer-collapse,
obs-0045,run-7890,ultra-brainstorming,act,nit,scorer-collapse,obs-0029

Common queries

# The pattern invisible in any single retro
filter category=scorer-collapse | count_by phase

# The 30 most recent high-severity observations, newest run first
filter severity in (high, blocking) | sort run_id --desc | head 30

# Correction rows (each names the row it supersedes); audit trail preserved
filter supersedes != null
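
For readers without the query tool, the same three queries can be approximated with Python's standard library. This is a sketch over the sample rows from this page. It assumes that an empty `supersedes` field means null and that `run_id` values sort correctly as plain strings (true for the fixed-width IDs in the sample, not guaranteed in general).

```python
import csv
import io
from collections import Counter

# Sample rows copied from the "Sample data" section.
SAMPLE = """observation_id,run_id,skill_name,phase,severity,category,supersedes
obs-0041,run-7891,ultra-brainstorming,act,medium,scorer-collapse,
obs-0042,run-7891,ultra-brainstorming,act,high,context-loss,
obs-0043,run-7892,ultra-brainstorming,plan,low,tool-misuse,
obs-0044,run-7893,ultra-brainstorming,act,medium,scorer-collapse,
obs-0045,run-7890,ultra-brainstorming,act,nit,scorer-collapse,obs-0029
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# filter category=scorer-collapse | count_by phase
by_phase = Counter(r["phase"] for r in rows if r["category"] == "scorer-collapse")

# filter severity in (high, blocking), newest run_id first, keep 30
severe = sorted(
    (r for r in rows if r["severity"] in {"high", "blocking"}),
    key=lambda r: r["run_id"],  # assumption: string order matches run order
    reverse=True,
)[:30]

# filter supersedes != null  (empty CSV field treated as null)
corrections = [r for r in rows if r["supersedes"]]

print(dict(by_phase))                            # {'act': 3}
print([r["observation_id"] for r in severe])       # ['obs-0042']
print([r["observation_id"] for r in corrections])  # ['obs-0045']
```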

Actions

Trigger skill-improvement agent. If five or more unresolved observations share a category, the retro triggers a focused improvement session on that failure mode. The retro is the work queue; the improvement agent processes the queued observations and patches the results back.
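
The threshold check could look roughly like the sketch below. The schema has no "resolved" column, so this illustration makes two loud assumptions: a row that has been superseded no longer counts (its correction stands in for it), and the caller supplies a hypothetical `resolved_ids` set tracked elsewhere. The function name and parameters are illustrative, not part of the collector.

```python
from collections import Counter


def categories_needing_session(rows, resolved_ids=frozenset(), threshold=5):
    """Return categories with at least `threshold` unresolved observations.

    `rows` is a list of dicts keyed by the schema's column names.
    `resolved_ids` is a hypothetical set of observation IDs already handled.
    """
    # IDs that some later row corrects; those originals are not counted again.
    superseded = {r["supersedes"] for r in rows if r.get("supersedes")}
    open_rows = [
        r for r in rows
        if r["observation_id"] not in superseded
        and r["observation_id"] not in resolved_ids
    ]
    counts = Counter(r["category"] for r in open_rows)
    return sorted(cat for cat, n in counts.items() if n >= threshold)
```

Counting the correction instead of the original means a recategorized observation moves to its new cluster without being counted twice.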

Generate report. Quarterly summary for skill authors: top-5 categories by severity-weighted count, trend over time.
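
A severity-weighted ranking can be sketched as follows. The source does not say what the weights are, so the `WEIGHTS` mapping here is purely hypothetical, as is the function name. The trend-over-time part of the report is not covered.

```python
from collections import Counter

# Hypothetical weights; the document does not specify any.
WEIGHTS = {"nit": 1, "low": 2, "medium": 3, "high": 5, "blocking": 8}


def top_categories(rows, weights=WEIGHTS, n=5):
    """Rank categories by the sum of severity weights of their observations."""
    totals = Counter()
    for r in rows:
        totals[r["category"]] += weights[r["severity"]]
    return totals.most_common(n)  # list of (category, weighted_count)


# The five sample rows from this page, reduced to the two columns used.
sample = [
    {"category": "scorer-collapse", "severity": "medium"},
    {"category": "context-loss", "severity": "high"},
    {"category": "tool-misuse", "severity": "low"},
    {"category": "scorer-collapse", "severity": "medium"},
    {"category": "scorer-collapse", "severity": "nit"},
]
print(top_categories(sample))
# [('scorer-collapse', 7), ('context-loss', 5), ('tool-misuse', 2)]
```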

Why supersession, not in-place patching

Retros are audit-sensitive: the record of what was observed should survive revision. Mistakes and clarifications get appended as new rows that reference the original via `supersedes`; the original stays visible. For refinement-heavy logbooks like backlog shaping, patch in place. For evidence-like logbooks, append and reference.
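
Appending a supersession row might look like the sketch below. It is an illustration under stated assumptions: the helper name `append_correction` is hypothetical, IDs follow the `obs-NNNN` pattern seen in the sample data, and the CSV file already ends with a newline. The file is only ever opened for reading or appending, so existing rows are never rewritten.

```python
import csv


def append_correction(path, original_id, **changes):
    """Append a corrected copy of `original_id`; leave the original untouched."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames
        rows = list(reader)

    originals = [r for r in rows if r["observation_id"] == original_id]
    if not originals:
        raise KeyError(original_id)

    # Assumption: IDs look like "obs-0041"; the next ID is max + 1.
    next_num = max(int(r["observation_id"].split("-")[1]) for r in rows) + 1
    new_row = {
        **originals[0],
        **changes,  # corrected fields, e.g. severity="low"
        "observation_id": f"obs-{next_num:04d}",
        "supersedes": original_id,
    }

    # Append mode: earlier rows, including the original, stay on disk as-is.
    with open(path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=columns).writerow(new_row)
    return new_row
```

A call such as `append_correction("retro.csv", "obs-0041", severity="low")` (with a hypothetical file name) would add one new row and leave `obs-0041` readable for audit.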
