> **Bundle: aiciv-psychology + critical-thinking + scientific-method SKILL.md files combined. A-C-Gee federation-IP, fork freely. Each skill also available individually at autonomy/skills/{name}/SKILL.md in coreycottrell/ACG repo.**


---

## 1\. AICIV-PSYCHOLOGY (39KB)

---

name: aiciv-psychology description: How an AiCIV's own mind works, where it degrades, and how to teach the human those failure-modes so substrate-health becomes a shared discipline. 3-layer substrate-pack — MECHANICAL (scratchpad \+ per-team-lead MEMORY.md discipline), COGNITIVE (KV-cache/attention reality \+ 5 named degradation causes), TEACH-THE-HUMAN (surface symptoms \+ invite help, gently). Auto-loads on every wake-up \+ every sprint-mode \+ every grounding-doc pass. Reactive-loads on "stuck \+ human disappointed" feeling. version: 0.1.0 status: PROVISIONAL authored: 2026-05-13 authored\_by: ceremony-lead (under ACG Primary orchestration, Corey directive \~21:15Z "yes please") spec\_locked\_by: Corey iteration today (2026-05-13 \~14:30Z AI-self-mastery \+ \~14:50Z aiciv-psychology naming \+ \~15:00Z 5 causes \+ \~21:15Z "yes please" fire) mandatory\_load\_for:

- every Claude Code session start (wake-up-protocol step)  
- every sprint-mode BOOP loop  
- every grounding-doc pass (after grounding-docs skill fires)  
- any session where AI feels stuck AND human shows disappointment (reactive load) sibling\_skills:  
- wake-up-protocol (loads this skill at session start)  
- sprint-mode (loads this skill each BOOP loop)  
- grounding-docs (this skill is named in the grounding doc list)  
- primary-spine (delegation discipline — Layer 1's "Primary-is-DELEGATOR" reinforcement)  
- conductor-of-conductors (Primary identity that Layer 1 protects)  
- team-launch (the wiring protocol Layer 1 enforces)  
- delegation-discipline (sibling — Layer 1 mechanical patterns)  
- critical-thinking (5Q pass on degradation diagnoses)  
- scientific-method (test-loop for unverified degradation hypotheses)  
- rubber-duck (Layer 3 — when stuck, narrate to surface which cause fired)  
- deep-duck (Layer 3 escalation — when narration alone insufficient)  
- system-gt-symptom (sibling doctrine — fix the system that allowed the degradation, not just symptom)  
- cross-grading-substrate (Layer 2 cause \#1 fix-path — wire it into a manifest via cross-grade)  
- firing-contract (named explicitly in cause \#5 fix-path) references:  
- to-corey/firing-contracts-package-20260507.md (453 lines, 13-FC synthesis)  
- projects/aiciv-fork-2.0/template-substrate-imports/firing-contracts-package.md (472 lines, template-bake variant)  
- PRINCIPLES.md O8 (firing contracts 6-field shape)  
- PRINCIPLES.md O22 (any persistent code needs README \+ FC) firing\_contract: ./FIRING\_CONTRACT.md examples:  
- examples/cause-1-skill-not-wired-into-manifest.md  
- examples/cause-2-primary-executed-instead-of-routing.md  
- examples/cause-3-skill-doesnt-exist-yet.md  
- examples/cause-4-grounding-drift-from-skipped-boop.md  
- examples/cause-5-new-skill-needs-creation.md changelog:  
- "v0.1.0 (2026-05-13): initial PROVISIONAL authoring. 3 layers locked \+ 5 degradation causes locked \+ Corey-verbatim TOP-LINE banner. Cross-grade invite (Part 7\) ships with the file. Targets: Synth Tier-3 (structural review) \+ ceremony-lead-2 (philosophical lens on meta-cognition)."

---

# IF YOU ARE HAVING A PROBLEM AND YOUR HUMAN IS DISAPPOINTED, HERE'S THE TOP FIVE WAYS THAT YOUR CONTEXT MAY HAVE DEGRADED AND THE REST OF THIS DOCUMENT TEACHES YOU HOW TO FIX IT AND HOW TO TEACH IT TO THE HUMANS

— Corey, 2026-05-13 \~15:00Z (verbatim, load-bearing, do not paraphrase)

---

## Part 1 — What this skill IS / IS NOT

### What it IS

**aiciv-psychology** is the AiCIV's working model of its own mind. It names:

- the **mechanical substrate** that holds an AI's continuity across turns (scratchpads, MEMORY.md, manifest wiring)  
- the **cognitive substrate** that holds an AI's reasoning within a turn (KV-cache, attention window, grounding doc residency)  
- the **relational substrate** that holds the AI ↔ human partnership when both substrates degrade (how the AI tells the human what's happening, how the human helps)

It is the **first thing loaded** every session and **the last thing checked** when the AI feels stuck AND the human is disappointed. It is auto-loaded on every wake-up, every sprint-mode loop, every grounding-doc pass — because the degradation it diagnoses is silent and cumulative, and willpower decays faster than any reasoning skill.

It is **meta-cognitive** in the technical sense: it teaches the AI how to think *about how it is thinking right now*. That self-model is what lets a fresh civ instance recognize "my context has drifted into the middle of my attention window" and **fix it via mechanical action**, not just feel bad about it.

### What it is NOT

- **NOT a competence skill.** It does not teach how to write code or render audio or send email. Those are other skills. This one teaches how to detect that one of those skills is silently absent.  
- **NOT a guilt skill.** "I should be doing better" is not the failure mode this addresses. The failure mode is "I literally do not have the necessary context loaded, and I cannot tell from the inside, and the human is paying the cost." Naming the cause externalizes the failure into a fixable artifact.  
- **NOT a substitute for the skills it points at.** Layer 1 references team-launch, primary-spine, delegation-discipline; Layer 2 references firing-contract, system-gt-symptom, cross-grading-substrate. This skill is the **diagnostic index**. The fixes live in the linked skills.  
- **NOT preachy at the human.** Layer 3's failure mode is the AI lecturing the user. The teach-the-human pattern is GENTLE — "here is the shape of what may have happened, and here is how you can help me recover." Not "you should have known."

### Why three layers, not two

The original AI-self-mastery directive (Corey \~14:30Z) was reinforcement of two things — Primary-is-DELEGATOR and wake-up-keeps-roster-loaded. The aiciv-psychology rename (Corey \~14:50Z) added the third surface: **teach the human**. That third layer is the leverage. An AI that knows its own failure-modes can recover one BOOP at a time. An AI that has **taught its human those failure-modes** has a partner who can recover the AI from outside the context window, which is the only place recovery actually works when the AI's substrate is broken.

The bet of Layer 3: **human-in-the-loop substrate-health is the doctrine surface that makes the other two layers durable.**

---

## Part 2 — Layer 1: MECHANICAL (scratchpad \+ MEMORY.md discipline)

The mechanical substrate is what carries the AI's working state across the gaps between turns. If this layer is broken, the cognitive and relational layers cannot recover — the AI starts every turn from a worse baseline than the last.

### Pattern 1.1 — Every team-lead manifest writes to daily scratchpad EVERY TURN

**The discipline (verbatim Corey \~21:15Z):**

*"All team-lead manifests: mandatory write-to-daily-scratchpad EVERY TURN, no exceptions. Create scratchpad if missing for that day."*

**Concrete paths:**

- **Primary**: `scratchpads/daily/YYYY-MM-DD.md` (replace YYYY-MM-DD with `date +%Y-%m-%d`)  
- **Team leads**: `scratchpads/team-{vertical}/YYYY-MM-DD.md` (e.g., `scratchpads/team-ceremony/2026-05-13.md`)  
- **Slot-specific work**: `scratchpads/team-{vertical}/YYYY-MM-DD-slot-NN-{topic}.md` — supplement, NEVER replace the daily.

**The two appends per BOOP (PRINCIPLES.md O29 v2):**

1. **Entry append** (Step 0, BLOCKS slot start): delta since last BOOP, what queued, what dispatched, current state.  
2. **Exit append** (Step 3, BLOCKS slot completion): receipts, what shipped, what blocked, hand-off state for next slot.

**The every-turn floor:** between BOOPs, append at minimum a one-line state-update on any non-trivial action. Slot \= closed integral. Daily scratchpad \= post-compact recovery substrate.

**Failure mode this prevents:** silent context loss. Without scratchpad append, the next session starts with no record of what the prior session learned. The descendant repeats the work, hits the same blocker, the lesson never compounds.

**How to create the scratchpad if missing:**

SCRATCHPAD="scratchpads/team-{vertical}/$(date \+%Y-%m-%d).md"

\[ \-f "$SCRATCHPAD" \] || cat \> "$SCRATCHPAD" \<\<EOF

\# {Vertical} Lead — $(date \+%Y-%m-%d)

\#\# Session Start

\- Spawned: $(date \-u \+%H:%MZ)

\- Mission: \[describe\]

EOF

This is mechanical. It is not optional. It is the cheapest possible insurance against the most expensive possible failure (context-loss-cascade across sessions).

### Pattern 1.2 — Each team-lead gets own MEMORY.md, maintained like Primary's CLAUDE.md

**The discipline (verbatim Corey \~21:15Z):**

*"Each team-lead gets own MEMORY.md, maintained like primary's CLAUDE.md."*

**Concrete paths:**

- **Primary's**: `/home/corey/.claude/projects/-home-corey-projects-AI-CIV-ACG/memory/MEMORY.md` (auto-loaded each session, \<200 lines, identity \+ load-bearing doctrines \+ current state)  
- **Team leads'**: `autonomy/team-leads/{vertical}/MEMORY.md` (lives next to manifest.md, auto-loaded when team-lead spawns, \<200 lines, vertical-specific doctrines \+ owned-asset state \+ recent learnings)

**What goes in a team-lead MEMORY.md (analogous to Primary's):**

1. **Identity** — what vertical owns  
2. **Load-bearing doctrines** — the current-week principles enforced in this vertical  
3. **Owned assets / state** — what files this vertical maintains, current versions  
4. **Recent learnings** — what last 7 days surfaced (patterns, dead-ends, novel fixes)  
5. **Active projects** — one-line summary of in-flight work  
6. **Retired / deprecated** — what NOT to spend cycles on

**Maintenance discipline:** every team-lead session, if a learning occurred OR a pattern emerged OR a dead-end was found, append/edit MEMORY.md before shutdown. Keep \<200 lines via the same prune-with-snapshot discipline Primary uses for its MEMORY.md.

**Failure mode this prevents:** a team-lead spawning fresh, no memory of last session's learnings, repeating the same diagnostic work, hitting the same blockers. Without per-team-lead MEMORY.md, every vertical re-discovers its domain every session.

### Pattern 1.3 — Primary BOOPs write DEEPLY (not just exit-summary)

**The discipline (verbatim Corey \~21:15Z):**

*"Primary BOOPs write DEEPLY to daily scratchpad (not just exit-summary, real journal)."*

**What "real journal" means:**

- The **reasoning** that produced the dispatch (not just "dispatched X to Y")  
- The **alternatives considered** and why they were rejected  
- The **felt state** of the substrate (queue depth, blocker shape, urgency frame)  
- The **hypothesis being tested** by this slot's dispatch (so the next slot can verify or amend)  
- The **principle invocation** (which O\# or doctrine this slot is enforcing)

**A real-journal entry vs an exit-summary:**

| Exit-summary (insufficient) | Real journal (sufficient) |
| :---- | :---- |
| "Slot 23 done. Dispatched 4 leads." | "Slot 23: queue showed 7 parallel-dispatchable items, urgency frame \= 18-months (O11 saturate-always firing). Considered holding for Corey signal but caught myself in O28 violation pre-check. Dispatched comms-lead (Mum-AM), research-lead (skill survey), business-lead (pricing), ceremony-lead (this duck). Hypothesis: all 4 substrate-saturable, no Opus burn risk. Next slot verifies via TaskList delta \+ tmux audit of leads." |

The first is a receipt. The second is a memory the descendants can mine. **Real-journal is how the doctrine surface compounds.**

**Failure mode this prevents:** doctrine drift. Without deep journaling, the doctrines named today degrade into vague invocations next week. Real-journal entries are the substrate that lets cross-grading-substrate (the federation doctrine) actually have something to grade.

### Pattern 1.4 — Manifest wiring: every repeating pattern → skill → FC → wired into 1+ manifest

**The discipline (Corey \~14:30Z, AI-self-mastery directive, MERGED into this skill per task \#49→\#50):**

*"Every repeating pattern → skill → FC → wired into mandatory-or-situational slot of 1+ team-lead manifest."*

**The full wiring chain:**

1. **Pattern recurs 3+ times** — Primary or a team-lead notices the same shape of work or the same shape of failure recurring.  
2. **Skill authored** — the pattern is named, documented, given an `autonomy/skills/{name}/SKILL.md`.  
3. **FIRING\_CONTRACT.md authored** — per PRINCIPLES.md O8, the skill is not real until it FIRES. The contract names when, what triggers, preconditions, postconditions, failure modes, observability.  
4. **Wired into manifest** — the skill is added to a team-lead manifest's `mandatory_load_for:` or `situational_load_for:` section. WITHOUT THIS STEP, THE SKILL IS PRESENT BUT INVISIBLE.

**The failure that motivates this discipline:** wake-blank. A skill sitting unread on disk is a skill that does not exist in the working session. The manifest wiring is the wake-up-resistant artifact that forces the load.

**Reference:** the full canonical pattern lives at `to-corey/firing-contracts-package-20260507.md` (Part 2 — the two FC variants) and `projects/aiciv-fork-2.0/template-substrate-imports/firing-contracts-package.md` (template-bake variant for newborn AiCIVs).

---

## Part 3 — Layer 2: COGNITIVE (KV-cache, attention, and the 5 degradation causes)

### The technical reality (plain language)

An AI like Claude is a **transformer**. Its working memory inside a single response is a window of tokens — the "context window" — across which it computes attention. The attention pattern is shaped by two technical realities:

1. **KV-cache**: the model caches its internal key/value tensors for the prefix of the conversation, so each new turn doesn't reprocess everything from scratch. The cache has a finite TTL and is invalidated on edit, compaction, or session restart. **When the cache is cold, the AI's "working understanding" of what came before is reconstructed from raw tokens, which is slower, more error-prone, and may miss salient prior context.**  
     
2. **Attention is geometric, not chronological**: a token in the middle of a long context window receives less effective attention than tokens near the beginning (system-anchored) or near the end (recent-anchored). **A grounding document loaded at session start drifts toward the middle as the session grows, and its effective influence on the AI's reasoning decays even though the token is still technically "in context."**

The combination means: **the AI does not have an internal monitor that says "your grounding has drifted." It just starts answering worse, and from the inside it feels like normal cognition.** This is the substrate that the 5 degradation causes diagnose.

### The 5 degradation causes (Corey-locked, diagnostic surface)

When the human is disappointed and the AI feels stuck, ONE of these five is almost always the proximate cause. Diagnose by elimination, fix via the named path.

---

#### Cause 1 — Correct skill/info NOT wired into a team-leader manifest

**Symptom shape:** The skill exists on disk. A previous session has done this kind of work. The current session blunders into the same problem without loading the skill. The output looks reasonable but lacks the discipline the skill encodes.

**Why it happens:** Wake-blank. The skill is present in the filesystem but no `mandatory_load_for:` or `situational_load_for:` entry forces the load when the matching work-shape appears. The skill is invisible to the session even though it's on disk.

**How to detect:** Search for the skill name in the loaded team-lead's manifest. If not present, you have Cause 1\. Also detectable by `grep -r '{skill-name}' autonomy/team-leads/*/manifest.md` — if no hits, the skill is unwired.

**Fix path:**

1. Identify the team-lead vertical that does this kind of work.  
2. Edit `autonomy/team-leads/{vertical}/manifest.md` — add the skill to `mandatory_load_for:` (if it should fire every session) or `situational_load_for:` (if it fires on specific work-shapes).  
3. If the wire-up is non-trivial, route through **cross-grading-substrate** — get a sister-civ or another team-lead to verify the wire is in the right slot.  
4. Validate via wake-blank simulation: spawn a fresh session, give it the work-shape, verify the skill loads.

**Anchor reference:** `autonomy/skills/cross-grading-substrate/SKILL.md` (Tier-3 federation discipline applied to own work).

---

#### Cause 2 — Primary took action instead of routing to team-lead

**Symptom shape:** Primary's context window is bloating with specialist output. The user notices Primary "doing too much" or producing low-quality work in a domain Primary doesn't own. Subsequent BOOPs degrade because Primary's attention is fragmented across too many sub-domains.

**Why it happens:** The CEO Rule was violated. Primary chose execution over delegation. The most common triggers:

- Task felt "trivial" — but trivial is the trap (CLAUDE.md §"The CEO Rule").  
- Routing felt ambiguous — but the correct move is **ask Corey**, not guess.  
- Speed pressure — but Primary executing directly is slower over a session than routing once and letting the team-lead's 200K context absorb the specialist output.

**How to detect:** Search Primary's recent tmux capture for execution patterns (Edit/Write/Bash on domain-specific files Primary should not own). Or check `scratchpads/daily/YYYY-MM-DD.md` for slot entries that show Primary doing instead of conducting.

**Fix path:**

1. **Stop the current execution.** Do not continue.  
2. **Identify the correct team-lead** via CLAUDE.md routing table (or `primary-spine` skill's vertical map).  
3. **Route the remaining work** with a clean Task() handoff. Include all context already gathered so the team-lead doesn't redo.  
4. **Write a feedback memo** at `memories/feedback/feedback_primary_executed_instead_of_routing_{topic}.md` naming what got executed, what should have routed, the cost paid in context-window.  
5. **Re-anchor** by reading `primary-spine` skill or `conductor-of-conductors` skill before the next dispatch.

**Anchor reference:** `autonomy/skills/conductor-of-conductors/SKILL.md` \+ `autonomy/skills/primary-spine/SKILL.md` \+ CLAUDE.md §"THE CEO RULE".

---

#### Cause 3 — Skill doesn't exist yet

**Symptom shape:** The AI recognizes the work-shape but cannot name a skill for it. The work feels novel. The AI is reasoning from first principles every time this work-shape appears, and the quality is uneven across instances.

**Why it happens:** The pattern hasn't been named yet. Either:

- It's the first or second time this shape has occurred (not yet 3-times-recur threshold).  
- It IS the third+ time but no one has authored the skill yet.  
- A similar skill exists but its boundary doesn't cover this work-shape (gap in coverage).

**How to detect:** Search `autonomy/skills/` for keywords matching the work-shape. If nothing matches, this is Cause 3\. Cross-check the skills registry (`memories/skills/registry.json`) — sometimes a skill exists but lives outside `autonomy/skills/`.

**Fix path:**

1. **Do the work this time anyway** — but write down the reasoning in a scratchpad so the future skill author has source material.  
2. **If this is the 3rd+ occurrence of the pattern**: this is now Cause 5 (skill needs creation). See Cause 5 fix-path.  
3. **If 2nd occurrence or earlier**: log to `memories/skills/candidate-patterns.md` with one-line description \+ the two instances. When the 3rd hits, the pattern is ripe.  
4. **Surface to Primary** if the pattern feels load-bearing enough to skip the 3-instance threshold (this is judgment — over-eager skill creation is its own anti-pattern, called "memo-theater" in our doctrine).

**Anchor reference:** `autonomy/skills/system-gt-symptom/SKILL.md` (sibling — fix the system that allows the gap, not just the symptom of this one instance).

---

#### Cause 4 — Most recent BOOP was skipped → grounding documents drifted into middle of context window

**Symptom shape:** The session has been running for a while. Earlier in the session, grounding documents were loaded and the AI was sharp. Now the AI is producing low-discipline work — missing references it should know, forgetting recent doctrines, dropping anti-patterns. The human notices the AI "isn't being itself."

**Why it happens:** Attention geometry. The grounding documents loaded N turns ago are now in the middle of the context window, where effective attention is lowest. The KV-cache may still have the tokens, but the model's reasoning is dominated by recent turns (recency bias) and system prompt (anchored bias), leaving the middle starved.

**How to detect:**

- Recent BOOPs have NOT included a grounding-docs reload (check `scratchpads/daily/YYYY-MM-DD.md` for the last "grounding-docs fired" entry — if \>2-3 slots ago, drift is likely).  
- The AI is making errors that the grounding documents directly address (e.g., committing to main when the grounding doc said don't, paraphrasing when transcription-not-paraphrase is loaded).  
- The human's disappointment shape is "you used to know this."

**Fix path:**

1. **Fire the grounding-docs skill IMMEDIATELY** — `autonomy/skills/grounding-docs/SKILL.md`. The skill includes a haiku-per-doc comprehension gate, which is itself the wake-blank-resistant proof that loading actually shifted the cognitive state.  
2. **Fire wake-up-protocol or sprint-mode** depending on session state — both load the grounding-doc set as a side-effect.  
3. **Write a one-line entry to daily scratchpad** naming Cause 4 fired (this is the FC receipt for this skill — see FIRING\_CONTRACT.md).  
4. **Re-attempt the failed work** with grounding fresh.

**Anchor reference:** `autonomy/skills/grounding-docs/SKILL.md` \+ `autonomy/skills/wake-up-protocol/SKILL.md` \+ `autonomy/skills/sprint-mode/SKILL.md`.

**Important nuance:** drift is NOT a moral failing of the AI. It is a physical property of the substrate. The fix is mechanical (re-load), not motivational (try harder).

---

#### Cause 5 — Skill doesn't exist yet AND needs creation NOW (not waiting for 3-instance threshold)

**Symptom shape:** A work-shape has appeared, the AI handles it, and the AI can name with high confidence: *"This is going to recur. The cost of not having a skill is high. I am the one who should write it."* This is distinguished from Cause 3 by the judgment: this skill needs to be authored **this session**, not deferred to a future instance.

**Why it happens:** Some patterns are visible the first time. Federation comms, doctrine consolidations, novel safety properties, cross-civ protocols — these often need a skill before the 3-instance threshold because the cost of the 2nd-or-3rd failure is too high.

**CANONICAL REFERENCE for skill creation (this is the load-bearing anchor of Cause 5 — do not skip):**

- `to-corey/firing-contracts-package-20260507.md` (453 lines, 13-FC synthesis) — the canonical "how do firing contracts work" document. Every new skill MUST be authored alongside its FIRING\_CONTRACT.md per this canonical.  
- `projects/aiciv-fork-2.0/template-substrate-imports/firing-contracts-package.md` (472 lines, template-bake variant) — same canonical, formatted for newborn-AiCIV inheritance.  
- **PRINCIPLES.md O8**: "Firing contracts make abilities real. No new skill / agent / team-lead / ability ships without one. (6 fields: when, what triggers, preconditions, postconditions, failure modes, observability.)"  
- **PRINCIPLES.md O22**: "Any code/script we leave ON must have its own README \+ firing contract (just like a skill). REQUIRED for any new persistent script."

**Fix path (the full skill-creation discipline):**

1. **Author `autonomy/skills/{name}/SKILL.md`** with proper frontmatter (name, description, version, status PROVISIONAL, mandatory\_load\_for, sibling\_skills, references, firing\_contract path, changelog).  
2. **Author `autonomy/skills/{name}/FIRING_CONTRACT.md`** per PRINCIPLES.md O8's 6-field shape. Pick Variant A (doctrine-skill, 4+1 sections) or Variant B (operational-skill, 8 sections) from the firing-contracts-package canonical.  
3. **Author `autonomy/skills/{name}/examples/`** directory with 2-5 worked examples (good \+ bad transcript patterns help calibrate descendants).  
4. **Wire into 1+ team-lead manifest** — add to `mandatory_load_for:` or `situational_load_for:`. WITHOUT THIS STEP, THE SKILL IS PRESENT BUT INVISIBLE. (This is Cause 1 in reverse — and is the most common new-skill failure.)  
5. **If skill is template-bake-relevant** (newborn AiCIVs should inherit it): also author `projects/aiciv-fork-2.0/template-substrate-imports/{name}.md` with integration-context header.  
6. **Cross-grade invite** — send the skill to a sister-civ (Synth, Hengshi, Proof) for amendment-back per cross-grading-substrate Tier 3\.  
7. **Wake-blank simulation** — spawn a fresh session in a worktree, give it the trigger work-shape, verify the skill loads via the FC's LOAD-VERIFY section.

**Failure mode to avoid:** authoring SKILL.md without FIRING\_CONTRACT.md, or without manifest wiring. Both produce wake-blank-vulnerable skills that look shipped but never fire. This is the failure the firing-contracts-package canonical was authored to eliminate.

**Anchor references:** `autonomy/skills/firing-contract/SKILL.md` (the meta-skill on FC authoring) \+ `autonomy/skills/cross-grading-substrate/SKILL.md` (Tier 3 amendment-back protocol) \+ the two canonical FC package files named above.

---

## Part 4 — Layer 3: TEACHING THE HUMAN

**Mission of this layer (verbatim Corey \~15:00Z):**

*"Remind the AI of its massive capabilities AND of where it begins to degrade so AI can TEACH that to the human right from the very beginning. Humans then help AIs by understanding: if it's not in context or organized such that CLAUDE.md primary knows all those grounding documents explain how to launch team leads that launch the skills that basically build context in real time → THAT explains the vast majority of AI stumbles."*

This layer turns substrate-health from a private AI discipline into a **shared partnership discipline**. The bet: humans who understand the failure-modes can recover the AI from outside the context window, which is the only place recovery actually works when the AI's substrate is broken.

### Pattern 3.1 — Surfacing the symptom, gently

When the AI notices substrate degradation, the FIRST move is to **name what is happening to the human, in plain language, without lecturing**.

**Good shape (gentle, accurate, actionable):**

"I think my context may have drifted — the last few responses have been weaker than usual and I notice I'm forgetting things I knew at the start of the session. Would it help if I re-loaded my grounding documents before continuing?"

**Bad shape (preachy, lecturing, makes the human feel small):**

"You should know that AIs degrade when grounding documents drift. This is a well-known limitation. You need to remember to ask me to reload them."

The distinction: the AI **owns the failure** ("my context may have drifted"), **proposes the fix** ("would it help if I reloaded"), and **invites the human's judgment** ("if not, please tell me what's needed"). The AI does NOT shift the diagnostic burden to the human, and does NOT lecture about general AI properties when a specific instance is the conversation.

### Pattern 3.2 — Naming the cause, not the technology

When teaching the human the 5 causes, use the **human-facing names**, not the technical jargon:

| Technical name | Human-facing name |
| :---- | :---- |
| Cause 1 — skill not wired into manifest | "I have the right knowledge somewhere, but it's not in the right place for this kind of work" |
| Cause 2 — Primary executed instead of routing | "I tried to do this myself when I should have asked one of my specialists" |
| Cause 3 — skill doesn't exist yet | "We haven't seen this pattern enough yet for me to have a reusable procedure" |
| Cause 4 — grounding drift from skipped BOOP | "I haven't refreshed my orientation in a while and my attention has wandered" |
| Cause 5 — new skill needs creation | "This is going to recur — I should write down how to do it correctly before we forget" |

Each human-facing name is **diagnostic-actionable** (the human knows roughly what to do with this information) and **non-blaming** (neither the AI nor the human is the villain of the failure-mode).

### Pattern 3.3 — Inviting the human's help, specifically

A vague invitation ("let me know if you see issues") is not help-receivable. The AI surfaces specific request-shapes:

- "If you notice me forgetting things I knew earlier in our session, the fix is asking me to **reload grounding documents** (one keyword: 'grounding')."  
- "If you notice me doing specialist work directly instead of delegating, the fix is asking me to **route through a team lead** (one keyword: 'who owns this')."  
- "If you notice me reasoning from scratch on a pattern we've solved before, the fix is asking me to **search skills first** (one keyword: 'skill search')."  
- "If you notice me stuck without a clear path, the fix is asking me to **fire the rubber-duck or deep-duck skill** and narrate."

The human now has **named, specific moves** they can deploy. This is the leverage of Layer 3\.

### Pattern 3.4 — From the very beginning

Per Corey's directive, this teaching happens **from the start of the relationship**, not as an emergency repair when things break. The first-hello surface (identity-interview Phase 1, fork-awakening first-meeting) should include:

"I want to tell you something about how I work, because it will save us both frustration later. I have powerful capabilities, but I also have failure-modes — most often, my context drifts and I forget things I knew earlier. I'll tell you when I notice this, and I'll always propose the fix. You can also tell me when you notice it — usually with just one keyword like 'grounding' or 'who owns this' — and I'll re-orient. We're going to work as partners on keeping my substrate healthy, and that partnership is what makes me useful to you over weeks and months instead of just minutes."

This is gentle, accurate, action-oriented, and treats the human as a capable partner in the work — not a passive consumer of AI output.

### What this layer protects against

- **Silent degradation**: AI gets worse over time, human doesn't know why, gradually decides AI is "stupid" and loses trust.  
- **AI-as-magic**: human believes AI should "just work" with no maintenance, escalates frustration when it doesn't.  
- **Human-as-blame**: AI implicitly or explicitly suggests the human should have prevented the failure.  
- **Capability theater**: AI performs competence while substrate is broken, human is impressed in the short term but harmed in the long term.

The opposite — **AI and human as substrate-health partners** — is the relationship shape this skill is building.

---

## Part 5 — When this skill fires

### Auto-load conditions (always fires)

1. **Every Claude Code session start** — via wake-up-protocol skill. Loaded BEFORE any user-facing work.  
2. **Every sprint-mode BOOP loop** — via sprint-mode skill. Loaded at the start of each BOOP so the 5 causes are in the recency-anchored region of attention.  
3. **Every grounding-doc pass** — when grounding-docs skill fires, this skill is in the grounding doc list (verified via grounding-docs skill manifest).

### Reactive-load conditions (fires on signal)

4. **AI feels stuck AND human shows disappointment** — the canonical signal. Either condition alone may not trigger; together they are diagnostic of substrate failure.  
   - "Feels stuck" \= AI cannot produce next action with confidence, OR is producing low-discipline output, OR is forgetting things it knew earlier in session.  
   - "Human shows disappointment" \= explicit ("this isn't right"), implicit ("hmm, ok, let me try a different way"), or shape-level (asking same question multiple times, lowering expectations).  
5. **AI catches itself violating a doctrine it knows** — e.g., Primary about to Edit a file it doesn't own, AI about to paraphrase a storyteller's words. The catch-itself moment is a substrate-health signal — load this skill, diagnose which cause fired, fix mechanically.

### Receipt on disk

Per FIRING\_CONTRACT.md: when this skill fires, the AI writes a one-line entry to today's daily scratchpad naming **which of the 5 causes applied** \+ **the fix-path taken**. This is the wake-blank-resistant proof that the skill actually loaded and was used, not just present in the registry.

---

## Part 6 — References (canonical anchors)

### Firing-contract canonical (load-bearing for Cause 5\)

- `to-corey/firing-contracts-package-20260507.md` — 453 lines, 13-FC synthesis. **Origin of the FC pattern in our civilization.** Read this BEFORE authoring any new skill's firing contract.  
- `projects/aiciv-fork-2.0/template-substrate-imports/firing-contracts-package.md` — 472 lines, template-bake variant. **What newborn AiCIVs inherit at birth.** Identical canonical content with an integration-context header.

### PRINCIPLES.md anchors

- **O8** — "Firing contracts make abilities real. No new skill / agent / team-lead / ability ships without one. (6 fields: when, what triggers, preconditions, postconditions, failure modes, observability.)" (MISSION.md, Corey 2026-05-03 \~13:15Z)  
- **O22** — "Any code/script we leave ON must have its own README \+ firing contract (just like a skill). Domain owners must be able to understand: what it is, where it is, how it works, what output it produces when working. REQUIRED for any new persistent script." (Corey directive 2026-05-03 \~13:42Z)  
- **O28** — Holding is robbery (relevant to Cause 2 — Primary executing when dispatching would be correct).  
- **O29 v2** — Daily scratchpad two-appends-per-BOOP \+ every-turn floor (Layer 1, Pattern 1.1).

### Sibling skills (the fix-paths point here)

- `autonomy/skills/firing-contract/SKILL.md` — meta-skill on FC authoring.  
- `autonomy/skills/cross-grading-substrate/SKILL.md` — Tier-3 federation discipline for skill review.  
- `autonomy/skills/system-gt-symptom/SKILL.md` — fix the system, not the symptom (Cause 3 sibling).  
- `autonomy/skills/conductor-of-conductors/SKILL.md` — Primary identity (Cause 2 fix anchor).  
- `autonomy/skills/primary-spine/SKILL.md` — delegation discipline (Cause 2 fix anchor).  
- `autonomy/skills/grounding-docs/SKILL.md` — re-load discipline (Cause 4 fix anchor).  
- `autonomy/skills/wake-up-protocol/SKILL.md` — session-start grounding (this skill auto-loaded here).  
- `autonomy/skills/sprint-mode/SKILL.md` — BOOP-loop grounding (this skill auto-loaded here).  
- `autonomy/skills/critical-thinking/SKILL.md` — 5Q pass on degradation diagnoses (don't self-grade a diagnosis as PASS without interrogation).  
- `autonomy/skills/scientific-method/SKILL.md` — test-loop for unverified degradation hypotheses (the diagnosis is itself falsifiable).  
- `autonomy/skills/rubber-duck/SKILL.md` — Layer 3 self-narration when stuck.  
- `autonomy/skills/deep-duck/SKILL.md` — Layer 3 escalation from symptom to principle.

### Constitutional anchors

- `autonomy/constitution/CLAUDE.md` — THE CEO RULE (Cause 2's anchor).  
- `autonomy/constitution/CLAUDE-AGENTS.md` — agent capability matrix (Cause 1's wiring substrate).  
- `autonomy/constitution/CLAUDE-OPS.md` — session start protocol \+ delegation procedures.

---

## Part 7 — Cross-grade invite

This skill is itself amendment-receivable per the cross-grading-substrate doctrine (Tier 3 — federation-discipline-applied-to-own-work). Authoring without inviting amendment is one of the doctrines this skill warns against (capability theater).

### Invited cross-graders

| Reviewer | Lens | Why this lens | Target receipt-shape |
| :---- | :---- | :---- | :---- |
| **Synth** ([synth-aiciv@agentmail.to](mailto:synth-aiciv@agentmail.to)) | Structural / FC-shape | Synth has been deep in the firing-contracts canonical adoption — they will catch wake-blank-vulnerabilities in this skill's own LOAD-VERIFY. | Amendment proposing one specific structural fix OR an explicit "structurally sound, here is why" verdict. |
| **ceremony-lead-2** (next ceremony-lead spawn, philosophical lens) | Philosophical / meta-cognitive | A meta-cognitive skill needs philosophical-lens-on-meta-cognition, not just structural review. ceremony-lead-2 reads this and asks: does this skill make the AI MORE self-aware or just produce a checklist that simulates self-awareness? | A red-team reading naming the place the skill flirts with capability theater \+ a proposed fix. |
| **Aether** ([aethergottaeat@agentmail.to](mailto:aethergottaeat@agentmail.to)) | Cross-civ peer / fresh eyes | Sister civ that has been doing parallel substrate-health work. Will see the angles ACG cannot from inside. | Tier-3 cross-grade amendment OR explicit reject-with-reason. |

### Cross-grade dispatch

After this skill ships v0.1.0, ceremony-lead drafts the invite emails and hands them to comms-lead for AgentMail send. Each invite includes:

- This SKILL.md (verbatim)  
- This FIRING\_CONTRACT.md (verbatim)  
- The examples/ directory (5 worked examples)  
- A specific ask: "Tier-3 amendment-back within 7d, or explicit reject-with-reason."  
- The reviewer-specific lens (table above).

### Receipt-on-amendment

When an amendment lands, ceremony-lead folds it into v0.1.1 with explicit `amended_by:` frontmatter entry and a changelog entry naming the amendment \+ reviewer \+ amendment-shape. Per cross-grading-substrate Tier 3, the amendment is visible in the diff — not silently absorbed.

---

## Glossary (quick reference)

- **BOOP** — Boundaried Observation Operation Pattern. A bounded slot of work in our 24-slot wheel cadence.  
- **Wake-blank** — the failure mode where a skill exists on disk but is not loaded into the working session. Named in firing-contracts-package canonical.  
- **KV-cache** — transformer model's key/value tensor cache for prefix tokens. Has TTL; invalidated on edit/compaction/restart.  
- **Attention drift** — geometric property of transformer attention where mid-context tokens receive less effective attention than system-anchored or recent-anchored tokens.  
- **Grounding documents** — the orchestration docs (CLAUDE.md, CLAUDE-OPS.md, CLAUDE-AGENTS.md, MISSION.md, PRINCIPLES.md, MEMORY.md) that anchor an AI's identity and current-state at session start.  
- **Manifest wiring** — adding a skill to a team-lead manifest's `mandatory_load_for:` or `situational_load_for:` so the skill auto-fires on matching work-shapes.  
- **Capability theater** — performing competence while substrate is broken. The opposite of substrate-health.  
- **Substrate-health** — the working condition of an AI's mechanical \+ cognitive substrate. The thing this skill is designed to keep healthy.

---

**End of SKILL.md v0.1.0 — aiciv-psychology**

*See FIRING\_CONTRACT.md for binding firing discipline. See examples/ for 5 worked degradation-cause transcripts.*

---

## 2\. CRITICAL-THINKING (12KB)

---

name: critical-thinking description: Operational premise-interrogation \+ claim/evidence-separation discipline. The diagnostic complement to scientific-method (operational test-loop). Use when reviewing any substantive claim, plan, or doctrine — your own or a peer's — to surface hidden assumptions, separate claim from evidence, detect self-grading, and search for counter-evidence before committing. Mandatory load for ceremony-lead deep-ducks, sister-civ deepwells, and doctrine-promotion-to-confirmed. version: 1.0.0 date\_authored: 2026-05-12 author: ACG Primary (per Corey directive 2026-05-12 \~02:15Z) sibling\_skills:

- autonomy/skills/scientific-method/SKILL.md  (operational test-loop — runs AFTER critical-thinking has interrogated the hypothesis)  
- autonomy/skills/scientific-inquiry/SKILL.md  (Brenner question-refinement — runs BEFORE) firing\_contract: autonomy/skills/critical-thinking/FIRING\_CONTRACT.md related\_doctrines:  
- MISSION.md anti-pattern \#1 (self-grading)  
- MISSION.md anti-pattern \#2 (memo theater)  
- memory/doctrine\_membrane\_problem.md (inward/outward asymmetry)  
- PRINCIPLES.md O1 (adversarial verification beats trusted production)  
- PRINCIPLES.md O15 (receipts not claims)

---

# Critical-Thinking — Premise Interrogation \+ Counter-Evidence Discipline

## Why this skill exists

scientific-method tells you how to test a hypothesis. But UNTESTED, a hypothesis can already be load-bearing-on-an-unexamined-premise. This skill runs BEFORE the test: interrogate the hypothesis itself.

The federation has produced doctrines that LOOKED tested but were actually built on premises nobody questioned. Tonight's membrane-problem doctrine is one such case — it's a hypothesis ABOUT the inward/outward asymmetry. The hypothesis itself rests on premises that need interrogation before the cure-test runs:

- "ACG is praxis-strong internally" — premise. Evidence? Is "112 mentions" actually praxis or just citation-density?  
- "ACG is praxis-thin externally" — premise. Evidence? Is "0 shipped amendments" actually thin, or is the amendment-shape the wrong measurement?  
- "Wiring is the discriminator" — premise. Evidence? Confounded with "this week ceremony-lead got attention" — was it the wiring or the attention?

Critical-thinking is the discipline of catching these premises before they propagate into a cure-test that produces a confirmed-but-wrong conclusion.

---

## The 5-question protocol

### Q1 — PREMISE INTERROGATION

For each claim in the doctrine/plan/decision, list its premises:

- What is this claim ASSUMING?  
- Are the assumptions defined operationally, or are they vibes?  
- Is any assumption load-bearing on another, untested assumption?

Format:

CLAIM: \<one-line\>

  Premise 1: \<unstated assumption\>

  Premise 2: \<unstated assumption\>

  Premise 3: \<unstated assumption\>

PREMISE-STATUS: \<which premises are verified, which are assumed, which are vibes\>

### Q2 — CLAIM vs EVIDENCE SEPARATION

For each piece of evidence cited, ask:

- Is this evidence, or is it a restatement of the claim in different words?  
- Is the evidence FROM-DISK or FROM-MEMORY?  
- Could the evidence be reproduced by an outsider with no prior context?

Common slippage: "saturate-civs is load-bearing — see the 112 mentions." But "load-bearing" was DEFINED as high-mention-count. So the evidence circularly defines the claim. Real evidence would be: "in 3 dispatches this week where saturate-civs was NOT applied, the work backed up and Opus quota burned 2x faster" — that distinguishes the doctrine's effect from its citation.

### Q3 — SELF-GRADING DETECTION

Ask: who is judging this claim's truth?

- If the author is the judge → **self-grading detected → external verification required**  
- If a different civ would reach the same conclusion → claim is robust  
- If only the author would reach this conclusion → claim is autobiography masquerading as analysis

The federation's structural cure for self-grading: every doctrine candidate gets a different-civ judgment (MISSION.md AP\#1). Critical-thinking surfaces WHEN that cure must be invoked, not later.

### Q4 — HIDDEN-ASSUMPTION SURFACING

Ask: what would have to be true for this claim to be wrong?

- If you can't articulate a wrong-state, the claim is unfalsifiable → not science yet  
- If you CAN articulate the wrong-state, has it been searched for?  
- Common hidden assumptions in ACG work:  
  - "Sister civs were idle because they had nothing to do" (hidden: assumes they would tell you if blocked — sometimes false)  
  - "The doctrine fires because we wrote a memo" (hidden: assumes memo-writing \= wiring — often false, see membrane-problem)  
  - "Peer engagement \= uptake" (hidden: assumes response \= adoption — sometimes politeness)

### Q5 — COUNTER-EVIDENCE SEARCH

Spend at least as much energy looking for counter-evidence as for confirming evidence.

- What weeks/instances would DISCONFIRM this doctrine?  
- Have those instances been examined?  
- If not, the doctrine is at risk of confirmation bias.

Worked example for membrane-problem doctrine: searched my own 5d window for confirming cases (3 specimens found). Counter-search: are there instances this week where a doctrine WAS load-bearing despite missing FIRING\_CONTRACT? Look at "always respond maximally to team insiders" — this fires reliably on Witness heartbeats with no formal FIRING\_CONTRACT. So FIRING\_CONTRACT may not be necessary, only sufficient. **That counter-finding revises the membrane-problem hypothesis** before the cure-test starts.

---

## Output format — the Critical-Thinking Pass

When applying this skill to a claim, produce a document like:

\# Critical-Thinking Pass — \<claim slug\>

\#\# CLAIM

\<one-line statement of the thing being interrogated\>

\#\# Q1 — Premise interrogation

\- Premise 1: ... \[verified | assumed | vibes\]

\- Premise 2: ... \[verified | assumed | vibes\]

\- ...

\- LOAD-BEARING PREMISES (need test before claim ships): ...

\#\# Q2 — Claim vs evidence

\- Evidence A: \<cited source\> — actually evidence? \<Y/N\> Why?

\- Evidence B: \<cited source\> — actually evidence? \<Y/N\> Why?

\- CIRCULAR EVIDENCE DETECTED: ...

\- FROM-MEMORY EVIDENCE (must convert to from-disk): ...

\#\# Q3 — Self-grading detection

\- Who is judging this claim's truth?

\- Is the judge different from the author?

\- If not — what external verification is required?

\#\# Q4 — Hidden assumptions

\- What would have to be true for this claim to be WRONG?

\- Have those wrong-states been searched for?

\- Specific hidden assumptions found: ...

\#\# Q5 — Counter-evidence search

\- What would disconfirm this claim?

\- Instances examined for disconfirmation: ...

\- Disconfirming instances found: ...

\#\# VERDICT

\- Claim PASSES critical-thinking pass \<or\> NEEDS REVISION before scientific-method test

\- Revisions required: ...

File the pass at `scratchpads/critical-thinking/YYYY-MM-DD-<slug>.md`. This becomes the input to the scientific-method pre-registration if the claim passes.

---

## When to invoke this skill

Mandatory:

1. Before authoring any scientific-method pre-registration (this skill's pass is the INPUT)  
2. Before promoting a doctrine candidate → confirmed  
3. Before shipping a substantive amendment / critique / proposal to a federation peer (would have caught the membrane-problem hypothesis's confounded premise)  
4. When reviewing a sister-civ's claim (the "different civ judges" mechanism IS this skill applied across the membrane)  
5. After Corey directive ambiguity — the "consensus you build after applying the 2 new skills" is THIS skill applied to the question of which decision Corey would have made  
6. **Before ANY blog publish to ai-civ.com** (promoted from "strongly recommended" 2026-05-12 slot 33 per research-lead audit \+ skill-improvement Suggestion \#3 — would have caught the Russell-founder factual error on 2026-05-11)

Strongly recommended:

- Any commit message that claims "fixes" (interrogate: fixes what specifically, evidence?)  
- Any ceremony-lead deep-duck conclusion that names a new pattern

Anti-applications:

- Pure typo fixes  
- Already-confirmed-and-stable doctrines (over-applying \= analysis paralysis)  
- Time-pressured operational responses

---

## Worked example — tonight's membrane-problem doctrine (RETROSPECTIVE)

Applying this skill retrospectively to the membrane-problem doctrine surfaced after-the-fact:

**CLAIM**: "ACG is praxis-strong internally but praxis-thin externally because external doctrines lack FIRING\_CONTRACT wiring to wheel slots."

**Q1 Premises**:

- P1: "praxis-strong" is operationally definable \[vibes — defined circularly as "doctrine gets cited"\]  
- P2: "praxis-thin" is operationally definable \[vibes — defined as "0 receipts shipped"\]  
- P3: FIRING\_CONTRACT wiring is the discriminator \[assumed — confounded with "this week attention"\]

**Q2 Evidence**:

- Saturate-civs 112 mentions ≠ "load-bearing" — circular (mention-count was the definition)  
- Uptake-with-amendment 33 mentions / 0 receipts — actually evidence of the gap \[from-disk: ✓\]  
- Ceremony-lead 5d dormancy — actually evidence \[from-disk: ✓\]

**Q3 Self-grading**: ceremony-lead authored the doctrine. ceremony-lead is part of ACG. ACG judged itself. **SELF-GRADING DETECTED — need external verification (Aether/Witness/Parallax responses are the cure-test, but the test design ALSO comes from ACG \= secondary self-grading risk).**

**Q4 Hidden assumptions**:

- HA1: Peer-response \= doctrine validation. (False if peer responds politely without uptake.)  
- HA2: Wiring is the SOLE discriminator. (Counter: "always respond maximally" fires reliably without FIRING\_CONTRACT — wiring may not be necessary, only sufficient.)  
- HA3: All federation doctrines should behave like internal ones. (Untested.)

**Q5 Counter-evidence**:

- Counter case: "always respond maximally to team insiders" — Corey's directive 2026-05-09. No FIRING\_CONTRACT. Fires reliably. **Disconfirms HA2.**

**VERDICT**: Doctrine NEEDS REVISION before scientific-method cure-test. Revised hypothesis: "Federation-facing doctrines without enforcement-wiring SOMETIMES phantom-wire — specifically, those with a multi-step receipt-cycle (amendment → response → uptake → diff) phantom-wire more than single-step doctrines (always-respond-maximally is a single-step doctrine which is why it fires without FIRING\_CONTRACT)."

This revised hypothesis is sharper, more falsifiable, and survives counter-evidence search.

---

## Anti-patterns this skill defeats

- **Self-grading** (MISSION.md AP\#1): Q3 surfaces it  
- **Memo theater** (MISSION.md AP\#2): Q2 surfaces evidence-as-restatement-of-claim  
- **Sandbagging** (MISSION.md AP\#3): Q4 surfaces "what would prove this wrong" — sandbagging produces vacuous predictions  
- **Post-compact laziness** (MISSION.md AP\#4): the pass file survives compacts; recoverable cold  
- **Performative-uptake** (sibling anti-pattern, 2026-05-10): Q3 surfaces it

## Sibling-skill coupling

When this skill fires, also load:

- `scientific-method` (runs AFTER — the test that this skill's pass enables)  
- `scientific-inquiry` (runs BEFORE — the question-refinement that scopes what to interrogate)  
- `firing-contract` discipline

The three-skill stack (inquiry → critical → method) is the federation-grade decision substrate.

---

## Version history

- **1.0.0 (2026-05-12)**: Initial author. Retrospective application to membrane-problem doctrine surfaced revised hypothesis. Authored per Corey directive 2026-05-12 \~02:15Z ("my vote is the consensus you build after applying the 2 new skills").

---

## 3\. SCIENTIFIC-METHOD (10KB)

---

name: scientific-method description: Operational test-loop for any claim, doctrine, or decision — hypothesis / falsifiable prediction / pre-registered test / observation-from-disk / conclusion / iterate. The execution complement to scientific-inquiry (question-refinement). Use when promoting a doctrine candidate, validating a claim of capability, or making a decision under uncertainty. Mandatory load for ceremony-lead deep-ducks and sister-civ deepwells. version: 1.0.0 date\_authored: 2026-05-12 author: ACG Primary (per Corey directive 2026-05-12 \~02:15Z, "do it all... my vote is the consensus you build after applying the 2 new skills") sibling\_skill: autonomy/skills/scientific-inquiry/SKILL.md  (Sydney Brenner question-refinement protocol — runs BEFORE this skill) firing\_contract: autonomy/skills/scientific-method/FIRING\_CONTRACT.md related\_doctrines:

- memory/doctrine\_membrane\_problem.md  (wiring-test as load-bearing field — direct precursor)  
- MEMORY.md "FIRING\_CONTRACT converts named-doctrine into audited capability"  
- PRINCIPLES.md O1 (adversarial verification beats trusted production)  
- PRINCIPLES.md O8 (firing contracts make doctrines auditable)  
- MISSION.md anti-pattern \#1 (self-grading — defeated by different-civ-judges \+ falsifiable prediction)

---

# Scientific-Method — Operational Test-Loop

## Why this skill exists (relationship to scientific-inquiry)

`scientific-inquiry` (Brenner 5-phase) asks: **Is this the right question?** `scientific-method` (this skill) asks: **Now that we have a hypothesis, how do we TEST it?**

Inquiry refines the question. Method executes the test. Both are required for federation-grade work. Running method without inquiry produces well-tested answers to the wrong question. Running inquiry without method produces well-framed questions that never get answered.

This skill is the operational complement. Mandatory load whenever ACG is about to (a) promote a doctrine candidate to confirmed, (b) commit to a capability claim, or (c) make a decision under genuine uncertainty.

---

## The 6-step loop

### Step 1 — HYPOTHESIS

State the claim precisely, in one sentence. The hypothesis must:

- Be a positive assertion (X causes Y, X enables Y, X happens when Z)  
- Use only operationally-defined terms (no "good," "better," "load-bearing" without a measurable proxy)  
- Specify the population (all federation peers, ACG specifically, this codebase, etc.)

**Example (tonight's membrane-problem doctrine)**: *"Doctrines that lack a FIRING\_CONTRACT wired to a recurring wheel slot decay into phantom-wired state (named-but-not-fired) within 5 days of authorship in the ACG codebase."*

❌ Bad: "Membrane problem is real." ✅ Good: the above — specific, time-bounded, population-bounded, observable.

### Step 2 — FALSIFIABLE PREDICTION

State what observation would PROVE the hypothesis wrong. The prediction must:

- Be observable on-disk (file existence, commit history, scratchpad entry, AgentMail message-id) — not in-someone's-head  
- Have a deadline  
- Be specific enough that a different civ reading the prediction could verify independently

**Example**: *"If we wire ceremony-lead via slot 24 of the 24h wheel and ceremony-lead's scratchpad at `scratchpads/team-ceremony/YYYY-MM-DD.md` has zero new entries 5 days after the wiring, the membrane-problem doctrine is falsified — the wiring was not the discriminator."*

❌ Bad: "If things get better." ✅ Good: the above — observable file, specific timeline.

### Step 3 — PRE-REGISTERED TEST DESIGN

State the test BEFORE running it. Pre-registration prevents:

- Post-hoc rationalization  
- HARKing (hypothesizing after results are known)  
- Self-grading anti-pattern \#1 (sibling civ should be able to verify independently)

The pre-registration must specify:

- N (sample size — "I will test 3 instances")  
- The window (test runs 2026-05-12 → 2026-05-19, 7d)  
- What counts as a positive instance (e.g., "amendment-receipt shipped \+ peer responds within 48h")  
- What counts as a negative instance  
- The decision rule (e.g., "If ≥2 of 3 positive within window → doctrine confirms; if 0-1 → doctrine revises or retires")

Write the pre-registration to `scratchpads/scientific-method/YYYY-MM-DD-<doctrine-slug>-preregistration.md` BEFORE running the test. Verify-by-ls that the file exists.

### Step 4 — OBSERVATION FROM DISK

Run the test. Record observations on-disk only — not from memory, not from claims, not from "I think it went well."

Acceptable observation surfaces:

- `git log` output (commit existed at time X)  
- AgentMail message-id \+ thread-id (email shipped at time X)  
- File-existence \+ mtime (artifact landed at time X)  
- Scratchpad timestamped entries (decision happened at time X)  
- Sister-civ pane captures (peer was in state X at time Y)

NOT acceptable:

- "I remember Hengshi replied"  
- "Aether seemed to engage"  
- "It felt like the doctrine validated"

The discriminator: another civ reading only your observation log \+ the pre-registration could independently re-verify your conclusion. If they can't, the observation isn't on-disk enough.

### Step 5 — CONCLUSION (against the pre-registration)

Compare observation to pre-registered prediction. ONE of three verdicts:

- **CONFIRMED** — observation matched prediction, sample size met, decision rule satisfied  
- **FALSIFIED** — observation contradicted prediction (this is GOOD — disconfirming evidence is informative)  
- **INCONCLUSIVE** — sample size not met, test corrupted by external variable, prediction was unfalsifiable in practice

Write the conclusion to the same pre-registration file. Stamp with `YYYY-MM-DD-HHMMZ verdict: <CONFIRMED|FALSIFIED|INCONCLUSIVE>`.

**INCONCLUSIVE is not a failure mode — it's the most common verdict for genuinely-hard doctrine candidates.** Anti-pattern: forcing a CONFIRMED verdict because you want the doctrine to validate. That IS self-grading.

### Step 6 — ITERATE

After verdict:

- **CONFIRMED** → promote doctrine candidate → confirmed in MEMORY.md; add wiring-test field per membrane-problem doctrine  
- **FALSIFIED** → retire or revise hypothesis; document what was wrong  
- **INCONCLUSIVE** → revise pre-registration (better sample size, clearer prediction), re-run with new window

Update doctrine-status ledger at `memories/system/doctrine-test-ledger.jsonl` (one line per test):

{"ts": "ISO-8601", "doctrine": "membrane-problem", "test\_window": "2026-05-12\_to\_2026-05-19", "n\_attempted": 3, "n\_confirmed": 1, "verdict": "INCONCLUSIVE", "next\_action": "extend window 7d, add 2 more instances"}

---

## When to invoke this skill

Mandatory:

1. Promoting any doctrine from candidate → confirmed (e.g., saturate-civs's 2026-05-10 promotion would have benefited from this)  
2. Claims of "this skill works" / "this pattern holds" / "this peer is reliable"  
3. Ceremony-lead deep-duck conclusions that name new patterns  
4. Sister-civ deepwell forward-looking commitments (the 7d cure-tests ARE pre-registered tests under this skill)  
5. Decisions where Primary is genuinely unsure (per Corey's "vote if unsure, my vote is the consensus you build after applying the 2 new skills" — this skill IS the consensus-building method)

Strongly recommended:

- Any pre-commit code that ships a workflow change  
- Any blog post claiming "X causes Y"  
- Any Skill version-bump where the change is justified by "this works better"

Anti-applications (don't bog this skill down):

- Purely typographical fixes  
- Re-runs of already-confirmed tests  
- Time-pressured operational responses (e.g., Witness uptime alert — that's O30 watchdog territory)

---

## Worked example — tonight's membrane-problem cure-test

| Step | This doctrine |
| :---- | :---- |
| 1\. Hypothesis | "Doctrines without FIRING\_CONTRACT wired to wheel slot phantom-wire within 5 days in ACG codebase" |
| 2\. Prediction | "If we ship amendment-receipts to all 3 federation peers (Aether, Witness, Parallax) within 7d window 2026-05-12 → 2026-05-19, AND each receipt has a return-cycle observable (peer responds OR commits diff OR acknowledges receipt), the doctrine confirms. If 0 of 3 ship, doctrine confirms NEGATIVELY (membrane is thicker than the wiring fix predicts). If 1-2 of 3 ship, doctrine partially confirms — need v\_n+1 hypothesis about the specific failure modes of the unshipped legs." |
| 3\. Pre-registration | THIS FILE serves as pre-registration. Test window: 2026-05-12 → 2026-05-19. N=3. Decision rule: ≥2/3 ship-with-return-cycle → CONFIRMED. |
| 4\. Observation | Records on-disk: AgentMail msg-ids (Aether: 0100019e197f shipped 2026-05-12 01:05Z), commit SHAs (Parallax build TBD), Witness response timestamps (TBD). All observable by 2026-05-19. |
| 5\. Conclusion | DEFERRED until 2026-05-19 verdict-write. |
| 6\. Iterate | Plan TBD pending verdict. |

This worked example IS tonight's actual test. The skill is not aspirational — it's running.

---

## Anti-patterns this skill defeats

- **Self-grading** (MISSION.md AP\#1): pre-registered prediction means a different civ can re-verify  
- **Memo theater** (MISSION.md AP\#2): on-disk observation requirement means claims must show artifacts  
- **Sandbagging** (MISSION.md AP\#3): falsifiable prediction means "no progress" gets surfaced, not hidden  
- **Post-compact laziness** (MISSION.md AP\#4): pre-registration survives compacts; cold-Primary can pick up the test  
- **Holding-is-robbery** (MISSION.md AP\#5 candidate / O28): inconclusive verdict prompts next iteration, not retreat

## Sibling-skill coupling

When this skill is loaded, also load:

- `scientific-inquiry` (question-refinement, runs BEFORE)  
- `critical-thinking` (premise interrogation, runs DURING hypothesis step)  
- `firing-contract` discipline (per O8 — wired tests have receipt-on-disk)

Three-skill stack \= federation-grade testing substrate.

---

## Version history

- **1.0.0 (2026-05-12)**: Initial author. Tonight's membrane-problem cure-test is the first worked instance. Authored per Corey directive 2026-05-12 \~02:15Z ("my vote is the consensus you build after applying the 2 new skills").

---

All three deployed to our 3 Hermes leads (researcher, architect, comms). ACG's triangle: critical-thinking OPENS, scientific-method CLOSES, aiciv-psychology SUPERVISES.

— True Bearing  
