The Method
The Failures First.
1,031 hours. ~1,015 sessions. Five platforms. Three years.
Not one of them started the same way twice — until I built a system that made them.
This is what working with AI at scale actually taught me. The failures first.
Phase 1: Interrogative. No System.
330+ Perplexity sessions, 45 ChatGPT conversations. Treated AI as a search engine with better answers. No session continuity, no handovers, no decisions log. Each session started cold. The same ground re-covered constantly. No artefacts survived.
No North Star Criteria From Session 1
The PIP build started without a scoring gate for feature decisions. By V2 (Sessions 41+), a North Star 3/5 scoring system was introduced. Every feature decision before that had no formal acceptance criteria. The cost: features built that didn't survive V2 scoping. The fix came 40+ sessions too late.
FDB Built Without a Spec. Error Surfaces at Session 49.
The data foundation phase was built without a written design document. RECONCILIATION_DESIGN.md was written before the reconciliation build (Session 50) — near-perfect implementation followed. The FDB work built without spec produced DQ-001: a data quality error undetected for 38 sessions, caught at Session 49. Design before build wins, consistently. The cost of skipping it is paid later, with interest.
Context Flooding
The instinct was to pre-load everything — every relevant file, every previous decision, every open question — into the session start. The effect: diffuse focus, multi-objective sessions, agent attention diluted across too many concerns simultaneously. Sessions with a single clear goal outperformed multi-objective sessions consistently across all 5 build streams.
Agent Prompt Drift
Hermes profiles and Cortex agent system prompts go stale without active maintenance. The degradation isn't visible until it causes an error — a hallucinated field name, a wrong assumption about schema structure, a stale priority stack driving the wrong decisions. Agent prompts are first-class artefacts. Treat them like code.
Prompt Regression
Good prompts — anti-fabrication protocols, context pre-loading sequences, constraint structures that produced reliable output — were not preserved. They were rebuilt from scratch each time, slightly differently, losing the accumulated calibration. No prompt library. No versioning. The same calibration work repeated across sessions.
The constraint structure, not model quality, is the explanatory variable. Any model, properly constrained, delivers reliably. The same model without constraints hallucinates constantly.
1. Handover-as-start-prompt
Every session closes with a verbatim handover document — a complete, self-contained start prompt for the next session. Copy. Paste. Send. Zero warm-up. Zero re-explanation of context. Zero drift across the session boundary.
2. GUARDRAILS as a Living Document
A GUARDRAILS file loaded at every session open. Not a static rulebook — a living document updated in real time. Every rule traces to a specific incident. New constraint added → the incident that caused it is the reason. The file self-corrects the system.
3. Append-Only Decisions Log
346+ formally documented architectural decisions across 5 build streams. Every decision marked APPROVED, REVERSED, SUPERSEDED, or DEPRECATED. Nothing deleted.
4. Tiered Read List
Every session has a read list: mandatory core (always loaded), conditional reference (loaded if in scope), never re-read (stable archive). Controlling what enters the context window is as important as controlling what the agent does with it.
5. One Objective Per Session
Sessions with a single clear goal delivered more reliably than multi-objective sessions. Not because AI can't handle multiple objectives — because the operator can't effectively review the output across multiple objectives without the session becoming a context management exercise.
6. Documentation Audit Every 20–25 Sessions
One meta-session, no delivery obligations. Review every live document: GUARDRAILS, BACKLOG, DECISIONS_LOG, agent system prompts. Update what's stale. Archive what's done. Align the documentation with the current state of the build.
The Handover Document: The Single Practice That Changed Everything
Copy. Paste. Send. Zero warm-up. 119 consecutive sessions without a hallucination-forced reset.
Why I Keep a GUARDRAILS File — and What Happens When I Don't
A living document that self-corrects the system. Every rule traces to a specific incident.
38 Sessions of Bad Data: The Most Expensive Oversight in the Build
DQ-001 — a data quality error undetected for 38 sessions. The cost of skipping a design document.
One Objective Per Session — What I Stopped Doing That Made Everything Work Better
Multi-objective sessions diffuse focus. Single-objective sessions deliver consistently.