When to use a logbook

Not every piece of agent work needs structure. A quick four-part test tells you whether you have a logbook problem — or whether a document, chat state, or a tracker is the better home.

The four-part test

You likely have a logbook problem when at least three of these are true. The strongest cases have all four.

Writes and reads are separated

A future reader — a tool, a later session, another contributor — needs to find something specific in the record without scanning it end-to-end. If rereading would work, a doc or a transcript is simpler.

Stable entry contract

The outer shape is predictable enough that tools can query the logbook without understanding every entry in full. A stable envelope with a polymorphic payload counts.
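A stable envelope with a polymorphic payload might look like the sketch below. The field names (`entry_id`, `kind`, `status`, `created_at`, `payload`) and the sample entries are hypothetical, chosen only for illustration; the point is that a tool can filter on the envelope without knowing what each payload contains.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entry:
    # Envelope: every entry carries these fields, whatever its kind.
    entry_id: str
    kind: str
    status: str
    created_at: str  # ISO 8601 timestamp
    # Payload: its shape depends on `kind`; query tools never need to parse it.
    payload: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample entries with differently shaped payloads.
entries = [
    Entry("e1", "finding", "open", "2024-01-01T10:00:00Z",
          {"file": "handler.py", "severity": "high"}),
    Entry("e2", "decision", "done", "2024-01-02T09:30:00Z",
          {"chosen": "retry with backoff"}),
    Entry("e3", "finding", "done", "2024-01-03T14:15:00Z",
          {"file": "utils.py", "severity": "low"}),
]


def open_entries(log: list[Entry]) -> list[str]:
    """Return the ids of open entries using envelope fields only."""
    return [e.entry_id for e in log if e.status == "open"]


print(open_entries(entries))
```

The query touches only `status`, so adding a new `kind` with a new payload shape would not break it.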

Tool-queried, not reread

Common questions are answered by filtering, sorting, traversing, or aggregating — not by loading the whole thing into context.

Outlives the session

The state needs to be readable by a different actor or the same actor in a different session. If everything gets consumed and discarded in one conversation, it's scratch state.

If fewer than three are true, you probably don't need a logbook. Write a document, use chat state, or pass a single-use artifact.
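The test above can be written down as a small checklist. This is a minimal sketch, not a prescribed tool: the class name, field names, and verdict strings are assumptions made for illustration, and the thresholds simply mirror the "at least three, ideally four" rule.

```python
from dataclasses import dataclass


@dataclass
class FourPartTest:
    # One boolean per criterion from the four-part test.
    writes_and_reads_separated: bool
    stable_entry_contract: bool
    tool_queried: bool
    outlives_session: bool

    def score(self) -> int:
        """Count how many of the four criteria hold."""
        return sum([
            self.writes_and_reads_separated,
            self.stable_entry_contract,
            self.tool_queried,
            self.outlives_session,
        ])

    def verdict(self) -> str:
        n = self.score()
        if n == 4:
            return "strong logbook case"
        if n == 3:
            return "likely a logbook problem"
        return "probably not a logbook"


# Hypothetical example: a one-conversation scratchpad fails most criteria.
scratchpad = FourPartTest(False, True, False, False)
print(scratchpad.score(), scratchpad.verdict())
```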

Common categories of work that pass the test

If your work fits one of these patterns, the four-part test is usually satisfied.

Category	What it is	Example
Draft / staging workflows	Agent proposes; human or another agent reviews before commit.	Backlog shaping, retro feedback, tickets-before-Jira
Deep / multi-phase work	Structured state as the working surface for long, multi-step work.	deep-code-review, eval workspaces
Background-agent supervision	Long-running work where state, not chat, is the shared surface.	Progress visualisation
Cross-session tracking	Debug attempts, design history, what's been ruled out.	`what-i-already-tried` (external)
Multi-agent coordination	Several agents (or runs) read+write the same state.	ideation operator runs

Heuristics that usually satisfy the test

The "deep" heuristic

When you reach for a workflow labeled deep (deep research, deep planning, deep review), check whether the underlying pattern is present: multi-phase work, structured intermediates, cross-session reuse, queryable state. When it is, the four criteria above are usually satisfied and a logbook is probably the right layer. When the workflow is single-pass or single-reader despite the label, a logbook is not the right layer.

The background-agent heuristic

Agents running in the background — long-running, asynchronous, or multi-instance — produce state that a supervisor wants to check without interrupting the work or reading the full transcript. "What's the agent doing right now? Which runs are stuck? What did it finish overnight?" are logbook queries.

A minimal shape — one row per meaningful step:

phase	step	status	next_intent
implement	add validation	done	run tests
test	run pytest	running
test	fix import error	blocked	ask user about path
implement	refactor handler	todo

A supervisor queries status=blocked to find where to intervene, without reading the full transcript. The schema is yours to shape; phase/step/status is just a starting skeleton.
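As a sketch of that supervisor query, the snippet below loads the four rows from the table above as tab-separated text and filters on `status`. Storing the logbook as TSV and reading it with Python's `csv` module is an assumption for illustration; any store that can filter on a column would serve.

```python
import csv
import io
from collections import Counter

# The progress table from above, as tab-separated text.
PROGRESS_TSV = (
    "phase\tstep\tstatus\tnext_intent\n"
    "implement\tadd validation\tdone\trun tests\n"
    "test\trun pytest\trunning\t\n"
    "test\tfix import error\tblocked\task user about path\n"
    "implement\trefactor handler\ttodo\t\n"
)

# restval="" fills in any missing trailing cell with an empty string.
rows = list(csv.DictReader(io.StringIO(PROGRESS_TSV), delimiter="\t", restval=""))

# "Where do I need to intervene?"
blocked = [r for r in rows if r["status"] == "blocked"]
for r in blocked:
    print(f'{r["phase"]}/{r["step"]}: {r["next_intent"]}')

# "How is the run going overall?"
status_counts = Counter(r["status"] for r in rows)
print(dict(status_counts))
```

Both questions are answered from the rows alone; the transcript is never loaded.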

The hidden-logbook signal

Many things that look like documents are logbooks in disguise. A multi-step plan is really rows: section_id, title, content, status. Inline comments on it are a second table joined by section_id. If a document is really a list of structured entries with structured annotations, and filtering or aggregating them would be valuable, extract the structure and give it a schema.
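To make the plan-plus-comments example concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sections` columns follow the text above; the `comments` columns (`comment_id`, `body`, `resolved`) and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sections (
    section_id TEXT PRIMARY KEY,
    title      TEXT,
    content    TEXT,
    status     TEXT
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    section_id TEXT REFERENCES sections(section_id),
    body       TEXT,
    resolved   INTEGER  -- 0 = open, 1 = resolved
);
""")

# Hypothetical plan sections and inline comments.
conn.executemany("INSERT INTO sections VALUES (?, ?, ?, ?)", [
    ("s1", "Scope", "Limit to the import path.", "approved"),
    ("s2", "Rollout", "Ship behind a flag.", "draft"),
    ("s3", "Testing", "Add regression tests.", "draft"),
])
conn.executemany(
    "INSERT INTO comments (section_id, body, resolved) VALUES (?, ?, ?)", [
        ("s2", "Which flag?", 0),
        ("s2", "Needs an owner.", 0),
        ("s3", "Looks fine.", 1),
    ])

# Which draft sections still have unresolved comments, and how many?
open_by_section = conn.execute("""
    SELECT s.section_id, s.title, COUNT(c.comment_id) AS open_comments
    FROM sections AS s
    JOIN comments AS c ON c.section_id = s.section_id
    WHERE s.status = 'draft' AND c.resolved = 0
    GROUP BY s.section_id, s.title
    ORDER BY s.section_id
""").fetchall()
print(open_by_section)
```

Answering the same question from a prose document would mean rereading every section and every comment thread.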

The trade

A logbook converts repeated interpretation cost into up-front schema cost. When the same state gets touched across sessions and contributors often enough, the schema pays for itself. When it doesn't (the state is short-lived, single-reader, or its shape is still in flux), the overhead isn't worth it.

When not to reach for a logbook

Short-lived, single-session work — keep it in chat state.
One-off handoff between two agents — use a typed artifact (JSON, YAML).
Prose where the reasoning is the product — use a document.
Committed tasks with workflow semantics — use a tracker.
Append-only event streams for replay — use a log.
Semantic recall for one agent's next turn — use a memory store.

The ten-second test

Six months from now, you should be able to decide in under ten seconds whether a new problem is a doc problem, a logbook problem, or a tracker problem — and if it is a logbook problem, whether to start with a flat file, a spreadsheet, or a database. If you are still deliberating, the boundaries above need sharpening.
