Failure Modes

What can go wrong, and which skill catches it.

Every skill in Deliberate exists to counter something concrete. This page is the reverse index: start from a failure mode you recognize, jump to the skill that addresses it.

33 modes · 10 categories · most drawn from Karpathy's thread.

Thinking

4 modes

Silent Assumption

Agent makes a wrong assumption on your behalf and runs with it without surfacing or checking it.

Countered by /deliberate

Hidden Confusion

Agent is confused but masks it with confident-sounding output instead of asking.

Countered by /deliberate

Sycophancy

"Great idea!" replacing "here are the problems with this idea." Agreement that's wrong is worse than disagreement that's right.

Countered by /deliberate

Jump Straight to Code

Multi-step task executed without a plan, leading to rework when the approach was wrong.

Countered by /deliberate

Writing

4 modes

Overcomplication

1000 lines where 100 would do. Abstractions for a single caller. Frameworks in place of functions.

Countered by /deliberate

Scope Creep

Orthogonal edits smuggled into a diff. "While I was in there" changes that weren't asked for.

Countered by /deliberate

Orphaned Code

Dead imports, unused helpers, commented-out blocks left behind by a change.

Countered by /deliberate

Premature Abstraction

Strategy pattern for a single type. Parameterizing what only ever has one value.

Countered by /architect
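
A minimal sketch of what this can look like, assuming a made-up discount rule. Every name below is hypothetical and not part of Deliberate. The first half wraps a single rule in a strategy hierarchy; the second half is the plain function it reduces to.

```python
from abc import ABC, abstractmethod


# Premature: an interface plus a context class for exactly one implementation.
class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, price: float) -> float:
        ...


class TenPercentOff(DiscountStrategy):
    def apply(self, price: float) -> float:
        return price * 0.9


class Checkout:
    def __init__(self, strategy: DiscountStrategy) -> None:
        self.strategy = strategy

    def total(self, price: float) -> float:
        return self.strategy.apply(price)


# Enough until a second rule actually exists: one function, no indirection.
def discounted_total(price: float) -> float:
    return price * 0.9
```

Both paths compute the same number; the difference is how much a reader has to trace to find that out.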

Execution

3 modes

Runaway Loop

Thirty minutes of the same failed approach, with increasingly elaborate workarounds.

Countered by /deliberate

Confidence Without Calibration

Authoritative tone for a guess. No signal of "I've verified this" vs. "I'm pattern-matching."

Countered by /deliberate

Context Drift

Decisions from earlier in the session silently reversed 100 messages later.

Countered by /deliberate

Product

3 modes

Spec Invention

Reverse-engineering product intent from code, then building on those guesses.

Countered by /spec

Silent Requirement Fill

PRD is vague, agent picks an interpretation without flagging, ships the wrong feature.

Countered by /spec

Spec-Code Drift

Implementation quietly diverges from the PRD. Neither gets updated to match.

Countered by /spec

Architecture

3 modes

Boundary Blur

Business logic in the controller. Utils that know about both the UI and the database.

Countered by /architect

Silent Contract Break

API or schema changed without considering unknown callers. No migration plan.

Countered by /architect

Ambiguous State Ownership

Mutable state with no clear owner and no single source of truth.

Countered by /architect

Testing

4 modes

Over-Mocking

Tests that only talk to mocks. Green CI, broken production.

Countered by /test
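
One way this plays out, as a hedged sketch: the mock encodes the author's belief about a response shape, so the check stays green even when the belief is wrong. `RealUserClient`, `display_name`, and the response keys are hypothetical stand-ins invented for illustration.

```python
from unittest.mock import Mock


class RealUserClient:
    """Hypothetical production client: it returns 'username', not 'name'."""

    def fetch(self, user_id: int) -> dict:
        return {"id": user_id, "username": "ada"}


def display_name(client, user_id: int) -> str:
    # Bug: reads a key the real client never returns.
    return client.fetch(user_id)["name"]


def mocked_check_passes() -> bool:
    # The mock returns whatever shape the test author assumed,
    # so this check cannot detect the wrong key.
    client = Mock()
    client.fetch.return_value = {"id": 1, "name": "ada"}
    return display_name(client, 1) == "ada"


def real_shape_check_passes() -> bool:
    # Running against the real client (or a fake that shares its contract)
    # surfaces the mismatch as a KeyError.
    try:
        return display_name(RealUserClient(), 1) == "ada"
    except KeyError:
        return False
```

Here the mocked check passes and the check against the real shape does not, which is the "green CI, broken production" gap in miniature.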

Tautological Test

Test that mirrors the implementation. A refactor that preserves behavior still breaks the test.

Countered by /test
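
A small sketch of the difference, using hypothetical `Cart` classes. The implementation-coupled check asserts on private storage, so it fails after a refactor that changes nothing observable; the behavioral check pins only the public result.

```python
class Cart:
    """Original version: stores every price in a list."""

    def __init__(self) -> None:
        self._items = []

    def add(self, price: int) -> None:
        self._items.append(price)

    def total(self) -> int:
        return sum(self._items)


class RefactoredCart:
    """Same observable behavior, but keeps a running total instead of a list."""

    def __init__(self) -> None:
        self._total = 0

    def add(self, price: int) -> None:
        self._total += price

    def total(self) -> int:
        return self._total


def implementation_coupled_check(cart_cls) -> bool:
    # Mirrors the implementation: inspects private storage.
    cart = cart_cls()
    cart.add(5)
    cart.add(7)
    return getattr(cart, "_items", None) == [5, 7]


def behavioral_check(cart_cls) -> bool:
    # Protects the behavior: only the public result matters.
    cart = cart_cls()
    cart.add(5)
    cart.add(7)
    return cart.total() == 12
```

The behavioral check holds for both classes; the implementation-coupled one holds only for `Cart`.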

Flaky Test

Intermittent failure tolerated via retry-until-green instead of being fixed.

Countered by /test
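
A sketch of the two responses, assuming the flake comes from unseeded randomness (only one of several common causes). All function names are hypothetical. The retry wrapper hides the nondeterminism; the fix removes it by seeding the generator and asserting an order-independent property.

```python
import random
from typing import Callable, Sequence


def shuffled_ids(ids: Sequence[int], rng: random.Random) -> list:
    out = list(ids)
    rng.shuffle(out)
    return out


def flaky_check() -> bool:
    # Depends on the order produced by an unseeded generator,
    # so it passes on some runs and fails on others.
    return shuffled_ids([1, 2, 3], random.Random())[0] == 1


def retry_until_green(check: Callable[[], bool], attempts: int = 5) -> bool:
    # Tolerating the flake: rerun until one attempt happens to pass.
    return any(check() for _ in range(attempts))


def deterministic_check() -> bool:
    # Fixing the flake: seeded generator, order-independent assertion.
    return sorted(shuffled_ids([1, 2, 3], random.Random(0))) == [1, 2, 3]
```

`retry_until_green(flaky_check)` will usually report success while the underlying test is still wrong about what it asserts.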

Coverage Theater

Tests written to hit a number, not to protect a behavior.

Countered by /test

Migration

3 modes

Big-Bang Migration

All callers updated in one PR. Can't deploy incrementally. Can't roll back.

Countered by /migrate

Half-Finished Migration

Old path and new path coexist forever. Two sources of truth. No trigger to finish.

Countered by /migrate

Unsafe Schema Change

Lock-holding DDL run on a live table at peak traffic.

Countered by /migrate

Debugging

3 modes

Symptom Masking

try/catch to swallow an unexplained error. Null check without knowing why it was null.

Countered by /debug
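
For illustration, a minimal sketch with a hypothetical port parser. The masked version swallows the failure and substitutes a default, so nobody learns why the input was bad; the second version surfaces the cause.

```python
from typing import Optional


def parse_port_masked(raw: Optional[str]) -> int:
    # Symptom masking: any failure is swallowed and replaced with a default.
    try:
        return int(raw)
    except Exception:
        return 8080


def parse_port(raw: Optional[str]) -> int:
    # Surfaces the cause instead of hiding it.
    if raw is None:
        raise ValueError("port is missing; check where the config is loaded")
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"port must be an integer, got {raw!r}") from exc
```

With input `"80a"`, the masked version quietly returns 8080, while `parse_port` raises an error naming the bad value.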

Random-Walk Debugging

Change something, run, repeat. No hypothesis, no bisection, no explanation when it works.

Countered by /debug
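
Bisection, by contrast, can be sketched in a few lines. This assumes an ordered list of revisions where the last one is known bad and every revision after the first bad one is also bad; `first_bad` and `is_bad` are hypothetical names. `git bisect` automates the same search over commits.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad is true.

    Assumes the last revision is bad and badness is monotonic:
    once a revision is bad, all later ones are too.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # first bad revision is at mid or earlier
        else:
            lo = mid + 1      # first bad revision is after mid
    return revisions[lo]
```

Each probe halves the search space, and the result is a specific revision to explain rather than a change that happened to work.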

Fix the Instance, Not the Class

Same bug in three places, patched once, mentioned nowhere.

Countered by /debug

Review

3 modes

Rubber-Stamp LGTM

Approval on code the reviewer didn't understand or didn't read.

Countered by /review

Nit-Only Review

Forty comments on formatting, zero on correctness or architecture.

Countered by /review

Missing Test Ignored

New code path ships with no test. Reviewer doesn't notice the absence.

Countered by /review

Incident

3 modes

Shotgun Fix

Five changes in one minute under pressure. Can't tell which one fixed it (or caused the next incident).

Countered by /incident

Lost Evidence

Stabilization destroys the logs/state needed for root cause. No postmortem possible.

Countered by /incident

Vague Postmortem

"We'll add better monitoring" with no owner, no deadline, no mechanism. Action items that rot.

Countered by /incident