From 4ee154d8edf9f8b497eeeba1457dd38be901ec11 Mon Sep 17 00:00:00 2001
From: null <koga.industries@gmail.com>
Date: Sun, 28 Jun 2026 11:30:29 -0500
Subject: [PATCH] docs(qa): update ClaudeQAPlan + ClaudeQACoverage with
 living-doc guardrails, wiring scanner, and device-matrix honesty

---
 ClaudeQACoverage.md |  2 +-
 ClaudeQAPlan.md     | 98 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 78 insertions(+), 22 deletions(-)

diff --git a/ClaudeQACoverage.md b/ClaudeQACoverage.md
index ec644bee..fb998f8d 100644
--- a/ClaudeQACoverage.md
+++ b/ClaudeQACoverage.md
@@ -3,7 +3,7 @@
 > **Resume anchor — current status only.** Statuses: `pass | fail→id | todo | n/a | not implemented→Future.md | blocked→id`.
 > Build HEAD `c31eea2` + **R15 working-tree changes** (functions + rules **deployed to prod**; client rebuilt+installed both emulators). Position + verdict: see `ClaudeReport.md` run-state. **Verdict: R15 = gap-closing round (Passes L/M/N/P + smoke) — found & FIXED M-001 (P2 quiet hours).** Quiet hours didn't suppress backgrounded/killed partner pushes (local-only); fixed via server-side fail-open suppression + client window/tz sync + rules allowlist — verified live. Then drove Pass N: **N-001 (P1) Bucket List fully non-functional** + **N-002 (P2) Date Builder "Create Plan" no-op** — both **FIXED + verified live** (Bucket CRUD; Date Builder → PLANNED `date_plan` → Home "Date coming up"). L (chat E2E render+decrypt+receipts+reactions+at-rest), P (UI copy + 6103-Q bank) clean; smoke 6/6 GREEN. **0 open P0–P2; 3 fixed pending 1 confirm (M-001, N-001, N-002); 2 P3 brand backlogs.** 0 FATAL. M-001 functions+rules deployed to prod; N-001/N-002 client-only (debug APK installed both emulators).
 >
-> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).**
+> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).** **Device/OS matrix = `blocked→needs-device` (pre-ship):** all per-round QA runs on two **identical** emulators (5554/5556, same API + screen) — minSdk/targetSdk · small/large screen · ≥1 physical device are NOT covered; don't claim "device matrix ✓".
 >
 > **📖 Architecture reference:** see [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — contains architecture, security model, data model, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) section that backs every fix-and-pruned ID below.
 > Hygiene: this is a *current-status* matrix, not a per-round changelog — `fail→id` flips to `pass` once a fix is
diff --git a/ClaudeQAPlan.md b/ClaudeQAPlan.md
index 47f92e8a..3d4f97d3 100644
--- a/ClaudeQAPlan.md
+++ b/ClaudeQAPlan.md
@@ -8,6 +8,36 @@
 > parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in
 > `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box).
 
+## ⛔ This is a LIVING document — improve it whenever you see a gap (do this automatically)
+This playbook, the coverage matrix, and the `scripts/`/`qa/` scanners are **yours to evolve every round** — that is part
+of the job, not a separate task. Whenever a round teaches you something the plan doesn't yet capture, **edit it in the
+same chunk** (no need to ask):
+- A bug **escaped** a prior round, was hard to diagnose, or recurred → add the generalized reflex to the right Pass +
+  the durable substance to the Engineering Manual landmine (the MANDATORY-retrospective rule), and if the class is
+  greppable, **add/extend a scanner** (`scripts/theme-scan.sh`, `scripts/wiring-scan.sh`, `qa/entrypoint_smoke.sh`).
+- A step is **wrong, contradictory, or stale** (e.g. it told you to do something a standing rule forbids) → fix the
+  wording so the next agent isn't misled.
+- A new **route / feature / notification / collection / gate / asset** appeared → fold it into the relevant Pass +
+  `ClaudeQACoverage.md` (Living discovery ritual).
+- The plan is **unclear or bloated** → tighten it; lead with the answer; keep one canonical home per fact (don't restate
+  a lesson in four places — link by ID).
+Leave the plan better than you found it each round. When you change a scanner, update its header; when you change a
+process rule, make sure it doesn't contradict the Guardrails.
+
+## ✅ Per-round execution checklist (the literal flow — details in the sections below)
+1. **Resume:** read `ClaudeReport.md` run-state + `ClaudeQACoverage.md` (the authoritative state); `adb devices` shows
+   both emulators; **installed build == HEAD** (rebuild+install if unsure — never QA a stale APK); baseline clean
+   (both free, 0 active sessions, logcat 0 FATAL).
+2. **Discovery ritual:** reconcile routes/notifications/features/assets/backend with coverage; fold new surfaces in.
+3. **Run the scanners FIRST (cheap, before live driving):** `qa/entrypoint_smoke.sh` (both serials), `scripts/theme-scan.sh`
+   (Pass C), `scripts/wiring-scan.sh` (Pass N). File 🔴/🟠 to `ClaudeReport.md`; record counts in coverage.
+4. **Run the passes report-only**, sub-batched to one context window each — recurring set **A–N + P** (K money-path +
+   O release gates only when a sandbox device / pre-ship is in scope). Checkpoint the MD files after each chunk.
+5. **Fix phase** (after all passes): by severity P0→P1→P2→P3, one at a time, verify each live via the **real path** +
+   re-run the relevant scanner, flip the row to Fixed, capture the durable substance in the Engineering Manual.
+6. **Re-QA loop** until **flawless** (see Definition of Done). Prune confirmed-fixed rows.
+7. **You never `git commit`/push — the user commits.** Your durable state is the MD files (they survive compaction).
+
 ## 📖 Architecture reference (read BEFORE testing the matching area)
 
 For each Pass below, before you start, read the relevant section of [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — it documents the architecture, the wire-format contracts, the security invariants, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (bugs that cost real debugging time and are easy to re-introduce).
@@ -117,7 +147,7 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere.
   are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next
   chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action
   even with standing auth, or the macOS requirement for iOS).
-- **Commit before anything interruptible** so a mid-chunk compaction never loses progress. Keep chunks atomic; if a
+- **Checkpoint (save the working-tree MD files + run-state) before anything interruptible** so a mid-chunk compaction never loses progress (the user commits — never run git yourself). Keep chunks atomic; if a
   chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck
   session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
 - **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite
@@ -162,22 +192,32 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere.
   writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real
   services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but
   every seed/toggle/deploy hits the gate.)
-- **Device/OS matrix:** don't certify on one emulator only — cover **minSdk + targetSdk**, a **small** and a **large**
-  screen, and at least one **physical device** (App Check / Play Integrity behave differently on emulators).
+- **Device/OS matrix (pre-ship gate — currently NOT met; track it honestly):** per-round QA runs on our **two
+  identical emulators (5554/5556, same API + screen size)** — that's the realistic recurring setup, not full coverage.
+  Before any store push, certify across **minSdk + targetSdk**, a **small** and a **large** screen, and at least one
+  **physical device** (App Check / Play Integrity behave differently on emulators). Because this is unmet today, keep a
+  `blocked→needs-device` row for it in `ClaudeQACoverage.md` (alongside Pass K money-path + Pass O) so the gap stays
+  visible rather than silently assumed-covered — don't claim "device matrix ✓" off two same-size emulators.
 - **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round
   re-checks it cheaply instead of by hand. **Built:** `qa/entrypoint_smoke.sh <serial> <recipient_uid>` (+ helper
   `qa/qa_push.js`) — the cold-start / entry-point launch-integrity smoke. It launches via the launcher AND sends a
   **real** push to a killed (`am kill`) app and **taps the actual OS notification** for each type, asserting the app
   **opens and STAYS** (process alive, 0 FATAL, off the launcher). This is the smoke that catches the "opens-and-closes"
-  splash-crash class that `am start` can't. Run it **every round and after any commit touching MainActivity / splash /
+  splash-crash class that `am start` can't. Run it **every round and after any change touching MainActivity / splash /
   theme / manifest / nav / notifications**. `FAIL` = an app crash (real bug); `BLOCK` = push not delivered (flaky
   emulator FCM — rerun, not a bug).
 - **Run associated automated scanners BEFORE the manual pass.** Every pass with a supporting script must start with it:
   - **Pass C:** run `scripts/theme-scan.sh` and review `/tmp/claude-theme-scan-<date>.md` before looking at any screen.
+  - **Pass N (+ discovery ritual):** run `scripts/wiring-scan.sh` and review `/tmp/claude-wiring-scan-<date>.md` before
+    driving the interactive features — it catches the **silent dead-feature class** (N-001 Bucket List, N-002 Date
+    Builder): 🔴 a `setX()` ViewModel setter with **no caller**, 🟠 a repository read method with **no `ui/` caller**
+    (data written but never displayed), 🟡 `if (x.isEmpty()) return` bail-guards to confirm the state is actually
+    provided. Every 🔴 is a likely dead feature — prove the feature works by persisting real data and reading it back
+    from Firestore (admin), not by trusting the empty-state render.
   - If a scanner does not yet exist for a pass but the pass is highly automatable (e.g. touch-target sizing for Pass J,
     `enc:v1:` leak grep for Pass L, redundant-read count for Pass I), consider building it and adding it here.
-  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified in both themes; 🟠 MAJOR must be
-    reviewed for theme/art breakage; 🟡 REVIEW is checked during the visual sweep.
+  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified (both themes for C; live persist→read
+    for N); 🟠 MAJOR must be reviewed for theme/art breakage or orphan data; 🟡 REVIEW is checked during the sweep.
   - If a manual finding is something the scanner should have caught, improve the scanner (see Living discovery ritual).
 - **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between
   rounds so they don't masquerade as bugs.
@@ -305,7 +345,7 @@ State lives in **files**, not memory:
      path crashed → don't fix by reasoning; reproduce via the real channel + read the stack").
   This is how the plan self-improves between rounds — treat the human pointing out a missed bug as a signal the plan had
   a gap, and close the gap here, not just the bug.
-- **Commit cadence**: commit `ClaudeReport.md` + `ClaudeQACoverage.md` after each pass and each chunk.
+- **Checkpoint cadence**: save `ClaudeReport.md` + `ClaudeQACoverage.md` + run-state after each pass and each chunk (the **user** commits — never run git yourself; see Guardrails).
 - **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
 - **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators
   online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue
@@ -314,14 +354,15 @@ State lives in **files**, not memory:
 
 ## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration)
 A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the
-**largest coherent unit that reliably finishes AND commits within one context window, with margin**. End every chunk
-with a commit + run-state update. If a chunk starts overflowing, split it; if chunks feel trivial, merge them.
+**largest coherent unit that reliably finishes AND checkpoints within one context window, with margin**. End every chunk
+by saving the MD files + run-state (the user commits — never run git yourself). If a chunk starts overflowing, split it;
+if chunks feel trivial, merge them.
 **Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching
-prevents half-done/lost work and gives cleaner per-chunk verification + revertable commits.
+prevents half-done/lost work and gives cleaner per-chunk verification + revertable history.
 
 Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API
 verification, keep it to **one small route family, one game phase, or one notification type**. A chunk is too large if
-it cannot produce a precise coverage update, issue log, and commit before context gets tight. Split before starting
+it cannot produce a precise coverage update, issue log, and file-checkpoint before context gets tight. Split before starting
 rather than leaving a half-tested matrix behind. **Prefer Claude-friendly micro-batches**: smaller chunks let the agent
 fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows.
 
@@ -348,6 +389,11 @@ Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI swee
 (dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.
 
 ## Guardrails & efficiency
+- **⛔ NEVER `git commit` / `git push` — the USER does ALL commits.** This overrides every "commit" verb elsewhere in
+  this doc: wherever a step says "commit," read it as **"checkpoint = save the working-tree files (`ClaudeReport.md` +
+  `ClaudeQACoverage.md` + run-state, plus any code/docs)"** and leave the actual `git commit` to the user. Your durable
+  state lives in those files (they survive compaction), not in a commit you make. Never stage, commit, push, branch, or
+  amend.
 - **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
 - **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**.
 - **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
@@ -989,6 +1035,10 @@ takes real effect. Read [Authentication and pairing flow](docs/Engineering_Refer
 - Every toggle survives **process death + reinstall-with-data** (overlaps F).
 
 ### Pass N — Daily question, reveal, check-ins & the other interactive features
+> **⛔ CLAUDE: Run `scripts/wiring-scan.sh` BEFORE driving these features** (review `/tmp/claude-wiring-scan-<date>.md`,
+> record counts in `ClaudeQACoverage.md`). Every 🔴 dead-setter / 🟠 orphan-reader is a likely silent dead feature —
+> this is the exact class that hid N-001 (Bucket List) + N-002 (Date Builder) behind innocent-looking empty states.
+
 The non-game interactive surfaces that have no functional home (Pass B is games only). Read
 [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle).
 - **Daily-question loop (the core daily ritual):** assignment (6 PM CST, `assignDailyQuestion`) → answer (each answer
@@ -1086,9 +1136,11 @@ test, no-AI-writing, duplicate prevention, variety, fun/relationship-first/premi
 The report's job is to show, at a glance, **what's wrong right now** — not to accumulate a history of everything ever
 fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So:
 - **A Fixed row survives exactly ONE confirmation round, then it's removed.** When you fix an issue, mark its row
-  `Fixed` (with the commit) and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the
-  row** — the full root-cause/fix detail already lives in the **commit message** (the row cites the hash), so nothing is
-  lost. Don't carry confirmed-fixed issues across multiple rounds.
+  `Fixed` and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the row** — the durable
+  root-cause/fix detail lives in the **Engineering Manual landmine** (mandatory for any escaped/deep bug) + git history
+  (the row cites the landmine ID, and the commit hash once the user commits), so nothing is lost. Don't rely on "the
+  commit message" as the only home — **you don't commit** (the user does, often batched), so the manual landmine is the
+  reliable record. Don't carry confirmed-fixed issues across multiple rounds.
 - **One run-state header, always.** Keep only the **current** `Round N | Pass X | Chunk Y | NEXT ACTION` block pinned
   at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a **single one-line history**
   entry each (e.g. `R6: branding regression — 0 new`), or drop them entirely once their fixes are confirmed-and-pruned.
@@ -1114,18 +1166,20 @@ fixed. Stale fixed rows and stacked old run-states make it unreadable and hide t
 Optimize every QA doc for a reader who has **5 seconds** to find the current state:
 - **Lead with the answer.** Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean")
   before any detail.
-- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **commit**, not the row — the
-  row gets a one-sentence description + repro, then the commit hash.
+- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **Engineering Manual landmine**
+  (the durable home), not the row — the row gets a one-sentence description + repro + the landmine ID (and commit hash
+  once the user commits).
 - **No walls of text.** Break run-state into scannable lines; bold the few words that matter; no multi-paragraph
-  headers. If a paragraph is longer than ~3 lines, it's probably commit material, not report material.
+  headers. If a paragraph is longer than ~3 lines, it's probably manual/landmine material, not report material.
 - **Consistent shape every round** so a returning reader (or a post-compaction resume) finds things in the same place.
 
 ## Fix phase (only AFTER all passes of the round complete)
 - Work strictly by severity: **all P0 → P1 → P2 → P3**.
 - **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct
   device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext
-  in Firestore**) → flip its row to **Fixed** + **commit** (one per issue/cluster) → next. Don't start the next until
-  the current is verified.
+  in Firestore**) → flip its row to **Fixed** + capture the durable substance in the Engineering Manual landmine → next
+  (the **user** commits per issue/cluster — never run git yourself; see Guardrails). Don't start the next until the
+  current is verified.
 - **Real-path verification gate (do NOT mark Fixed without it):** verify the fix through the **same path the user hits**,
   not a synthetic shortcut. A crash/launch/notification fix is only "Fixed" once reproduced-then-cleared via the REAL
   channel (real push tapped from the shade on an `am kill`'d app; real launcher cold-start) — `am start`/`am force-stop`
@@ -1146,8 +1200,10 @@ Optimize every QA doc for a reader who has **5 seconds** to find the current sta
 `blocked→id`; a **round** is done when all **recurring** passes (A–N + P) are done; **flawless** = one full round with
 **zero open P0–P2 and Passes D + E + L + P fully clean** (no open P0/P1 in I/J), **every game fully played through,
 every notification type verified or explicitly `not implemented→Future.md`, chat (L) + the couple-shared premium gate
-(A) + settings-take-effect (M) + content/language (P: no typos/off-voice/non-inclusive copy, question bank on-guide)
-verified, all join-game navigation paths and all back-stack checks verified**, **and `qa/entrypoint_smoke.sh` GREEN on
+(A) + settings-take-effect (M) + **interactive features (N: daily-Q/reveal, outcomes, Bucket List, Date Builder work
+end-to-end — created data persists AND is read back, `scripts/wiring-scan.sh` 🔴=0)** + content/language (P: no typos/
+off-voice/non-inclusive copy, question bank on-guide) verified, all join-game navigation paths and all back-stack checks
+verified**, **and `qa/entrypoint_smoke.sh` GREEN on
 both emulators (0 FAIL — every entry-point cold-start opens and stays)**. Then stop (P3s optional). **Pass O (release
 build + store readiness) and Pass K's real-money path are pre-ship / real-device gates** — they don't block a per-round
 "flawless" but **must be GREEN before any store submission**. Don't re-open a clean pass within the same round.