docs(qa): update ClaudeQAPlan + ClaudeQACoverage with living-doc guardrails, wiring scanner, and device-matrix honesty

2026-06-28 11:30:29 -05:00 · 2026-06-28 11:30:29 -05:00 · 4ee154d8ed
parent 896691fee3
commit 4ee154d8ed
2 changed files with 78 additions and 22 deletions
--- a/ClaudeQACoverage.md
+++ b/ClaudeQACoverage.md
@ -3,7 +3,7 @@
 > **Resume anchor — current status only.** Statuses: `pass | fail→id | todo | n/a | not implemented→Future.md | blocked→id`.
 > Build HEAD `c31eea2` + **R15 working-tree changes** (functions + rules **deployed to prod**; client rebuilt+installed both emulators). Position + verdict: see `ClaudeReport.md` run-state. **Verdict: R15 = gap-closing round (Passes L/M/N/P + smoke) — found & FIXED M-001 (P2 quiet hours).** Quiet hours didn't suppress backgrounded/killed partner pushes (local-only); fixed via server-side fail-open suppression + client window/tz sync + rules allowlist — verified live. Then drove Pass N: **N-001 (P1) Bucket List fully non-functional** + **N-002 (P2) Date Builder "Create Plan" no-op** — both **FIXED + verified live** (Bucket CRUD; Date Builder → PLANNED `date_plan` → Home "Date coming up"). L (chat E2E render+decrypt+receipts+reactions+at-rest), P (UI copy + 6103-Q bank) clean; smoke 6/6 GREEN. **0 open P0–P2; 3 fixed pending 1 confirm (M-001, N-001, N-002); 2 P3 brand backlogs.** 0 FATAL. M-001 functions+rules deployed to prod; N-001/N-002 client-only (debug APK installed both emulators).
 >
-> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).**
+> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).** **Device/OS matrix = `blocked→needs-device` (pre-ship):** all per-round QA runs on two **identical** emulators (5554/5556, same API + screen) — minSdk/targetSdk · small/large screen · ≥1 physical device are NOT covered; don't claim "device matrix ✓".
 >
 > **📖 Architecture reference:** see [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — contains architecture, security model, data model, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) section that backs every fix-and-pruned ID below.
 > Hygiene: this is a *current-status* matrix, not a per-round changelog — `fail→id` flips to `pass` once a fix is
--- a/ClaudeQAPlan.md
+++ b/ClaudeQAPlan.md
@ -8,6 +8,36 @@
 > parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in
 > `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box).
 ## ⛔ This is a LIVING document — improve it whenever you see a gap (do this automatically)
 This playbook, the coverage matrix, and the `scripts/`/`qa/` scanners are **yours to evolve every round** — that is part
 of the job, not a separate task. Whenever a round teaches you something the plan doesn't yet capture, **edit it in the
 same chunk** (no need to ask):
 - A bug **escaped** a prior round, was hard to diagnose, or recurred → add the generalized reflex to the right Pass +
  the durable substance to the Engineering Manual landmine (the MANDATORY-retrospective rule), and if the class is
  greppable, **add/extend a scanner** (`scripts/theme-scan.sh`, `scripts/wiring-scan.sh`, `qa/entrypoint_smoke.sh`).
 - A step is **wrong, contradictory, or stale** (e.g. it told you to do something a standing rule forbids) → fix the
  wording so the next agent isn't misled.
 - A new **route / feature / notification / collection / gate / asset** appeared → fold it into the relevant Pass +
  `ClaudeQACoverage.md` (Living discovery ritual).
 - The plan is **unclear or bloated** → tighten it; lead with the answer; keep one canonical home per fact (don't restate
  a lesson in four places — link by ID).
 Leave the plan better than you found it each round. When you change a scanner, update its header; when you change a
 process rule, make sure it doesn't contradict the Guardrails.
 ## ✅ Per-round execution checklist (the literal flow — details in the sections below)
 1. **Resume:** read `ClaudeReport.md` run-state + `ClaudeQACoverage.md` (the authoritative state); `adb devices` shows
   both emulators; **installed build == HEAD** (rebuild+install if unsure — never QA a stale APK); baseline clean
   (both free, 0 active sessions, logcat 0 FATAL).
 2. **Discovery ritual:** reconcile routes/notifications/features/assets/backend with coverage; fold new surfaces in.
 3. **Run the scanners FIRST (cheap, before live driving):** `qa/entrypoint_smoke.sh` (both serials), `scripts/theme-scan.sh`
   (Pass C), `scripts/wiring-scan.sh` (Pass N). File 🔴/🟠 to `ClaudeReport.md`; record counts in coverage.
 4. **Run the passes report-only**, sub-batched to one context window each — recurring set **A–N + P** (K money-path +
   O release gates only when a sandbox device / pre-ship is in scope). Checkpoint the MD files after each chunk.
 5. **Fix phase** (after all passes): by severity P0→P1→P2→P3, one at a time, verify each live via the **real path** +
   re-run the relevant scanner, flip the row to Fixed, capture the durable substance in the Engineering Manual.
 6. **Re-QA loop** until **flawless** (see Definition of Done). Prune confirmed-fixed rows.
 7. **You never `git commit`/push — the user commits.** Your durable state is the MD files (they survive compaction).
 ## 📖 Architecture reference (read BEFORE testing the matching area)
 For each Pass below, before you start, read the relevant section of [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — it documents the architecture, the wire-format contracts, the security invariants, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (bugs that cost real debugging time and are easy to re-introduce).
@ -117,7 +147,7 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere.
  are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next
  chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action
  even with standing auth, or the macOS requirement for iOS).
- **Commit before anything interruptible** so a mid-chunk compaction never loses progress. Keep chunks atomic; if a
+- **Checkpoint (save the working-tree MD files + run-state) before anything interruptible** so a mid-chunk compaction never loses progress (the user commits — never run git yourself). Keep chunks atomic; if a
  chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck
  session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
 - **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite
@ -162,22 +192,32 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere.
  writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real
  services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but
  every seed/toggle/deploy hits the gate.)
- **Device/OS matrix:** don't certify on one emulator only — cover **minSdk + targetSdk**, a **small** and a **large**
+- **Device/OS matrix (pre-ship gate — currently NOT met; track it honestly):** per-round QA runs on our **two
-  screen, and at least one **physical device** (App Check / Play Integrity behave differently on emulators).
+  identical emulators (5554/5556, same API + screen size)** — that's the realistic recurring setup, not full coverage.
  Before any store push, certify across **minSdk + targetSdk**, a **small** and a **large** screen, and at least one
  **physical device** (App Check / Play Integrity behave differently on emulators). Because this is unmet today, keep a
  `blocked→needs-device` row for it in `ClaudeQACoverage.md` (alongside Pass K money-path + Pass O) so the gap stays
  visible rather than silently assumed-covered — don't claim "device matrix ✓" off two same-size emulators.
 - **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round
  re-checks it cheaply instead of by hand. **Built:** `qa/entrypoint_smoke.sh <serial> <recipient_uid>` (+ helper
  `qa/qa_push.js`) — the cold-start / entry-point launch-integrity smoke. It launches via the launcher AND sends a
  **real** push to a killed (`am kill`) app and **taps the actual OS notification** for each type, asserting the app
  **opens and STAYS** (process alive, 0 FATAL, off the launcher). This is the smoke that catches the "opens-and-closes"
-  splash-crash class that `am start` can't. Run it **every round and after any commit touching MainActivity / splash /
+  splash-crash class that `am start` can't. Run it **every round and after any change touching MainActivity / splash /
  theme / manifest / nav / notifications**. `FAIL` = an app crash (real bug); `BLOCK` = push not delivered (flaky
  emulator FCM — rerun, not a bug).
 - **Run associated automated scanners BEFORE the manual pass.** Every pass with a supporting script must start with it:
  - **Pass C:** run `scripts/theme-scan.sh` and review `/tmp/claude-theme-scan-<date>.md` before looking at any screen.
  - **Pass N (+ discovery ritual):** run `scripts/wiring-scan.sh` and review `/tmp/claude-wiring-scan-<date>.md` before
    driving the interactive features — it catches the **silent dead-feature class** (N-001 Bucket List, N-002 Date
    Builder): 🔴 a `setX()` ViewModel setter with **no caller**, 🟠 a repository read method with **no `ui/` caller**
    (data written but never displayed), 🟡 `if (x.isEmpty()) return` bail-guards to confirm the state is actually
    provided. Every 🔴 is a likely dead feature — prove the feature works by persisting real data and reading it back
    from Firestore (admin), not by trusting the empty-state render.
  - If a scanner does not yet exist for a pass but the pass is highly automatable (e.g. touch-target sizing for Pass J,
    `enc:v1:` leak grep for Pass L, redundant-read count for Pass I), consider building it and adding it here.
-  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified in both themes; 🟠 MAJOR must be
+  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified (both themes for C; live persist→read
-    reviewed for theme/art breakage; 🟡 REVIEW is checked during the visual sweep.
+    for N); 🟠 MAJOR must be reviewed for theme/art breakage or orphan data; 🟡 REVIEW is checked during the sweep.
  - If a manual finding is something the scanner should have caught, improve the scanner (see Living discovery ritual).
 - **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between
  rounds so they don't masquerade as bugs.
@ -305,7 +345,7 @@ State lives in **files**, not memory:
     path crashed → don't fix by reasoning; reproduce via the real channel + read the stack").
  This is how the plan self-improves between rounds — treat the human pointing out a missed bug as a signal the plan had
  a gap, and close the gap here, not just the bug.
- **Commit cadence**: commit `ClaudeReport.md` + `ClaudeQACoverage.md` after each pass and each chunk.
+- **Checkpoint cadence**: save `ClaudeReport.md` + `ClaudeQACoverage.md` + run-state after each pass and each chunk (the **user** commits — never run git yourself; see Guardrails).
 - **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
 - **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators
  online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue
@ -314,14 +354,15 @@ State lives in **files**, not memory:
 ## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration)
 A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the
-**largest coherent unit that reliably finishes AND commits within one context window, with margin**. End every chunk
+**largest coherent unit that reliably finishes AND checkpoints within one context window, with margin**. End every chunk
-with a commit + run-state update. If a chunk starts overflowing, split it; if chunks feel trivial, merge them.
+by saving the MD files + run-state (the user commits — never run git yourself). If a chunk starts overflowing, split it;
 if chunks feel trivial, merge them.
 **Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching
-prevents half-done/lost work and gives cleaner per-chunk verification + revertable commits.
+prevents half-done/lost work and gives cleaner per-chunk verification + revertable history.
 Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API
 verification, keep it to **one small route family, one game phase, or one notification type**. A chunk is too large if
-it cannot produce a precise coverage update, issue log, and commit before context gets tight. Split before starting
+it cannot produce a precise coverage update, issue log, and file-checkpoint before context gets tight. Split before starting
 rather than leaving a half-tested matrix behind. **Prefer Claude-friendly micro-batches**: smaller chunks let the agent
 fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows.
@ -348,6 +389,11 @@ Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI swee
 (dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.
 ## Guardrails & efficiency
 - **⛔ NEVER `git commit` / `git push` — the USER does ALL commits.** This overrides every "commit" verb elsewhere in
  this doc: wherever a step says "commit," read it as **"checkpoint = save the working-tree files (`ClaudeReport.md` +
  `ClaudeQACoverage.md` + run-state, plus any code/docs)"** and leave the actual `git commit` to the user. Your durable
  state lives in those files (they survive compaction), not in a commit you make. Never stage, commit, push, branch, or
  amend.
 - **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
 - **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**.
 - **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
@ -989,6 +1035,10 @@ takes real effect. Read [Authentication and pairing flow](docs/Engineering_Refer
 - Every toggle survives **process death + reinstall-with-data** (overlaps F).
 ### Pass N — Daily question, reveal, check-ins & the other interactive features
 > **⛔ CLAUDE: Run `scripts/wiring-scan.sh` BEFORE driving these features** (review `/tmp/claude-wiring-scan-<date>.md`,
 > record counts in `ClaudeQACoverage.md`). Every 🔴 dead-setter / 🟠 orphan-reader is a likely silent dead feature —
 > this is the exact class that hid N-001 (Bucket List) + N-002 (Date Builder) behind innocent-looking empty states.
 The non-game interactive surfaces that have no functional home (Pass B is games only). Read
 [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle).
 - **Daily-question loop (the core daily ritual):** assignment (6 PM CST, `assignDailyQuestion`) → answer (each answer
@ -1086,9 +1136,11 @@ test, no-AI-writing, duplicate prevention, variety, fun/relationship-first/premi
 The report's job is to show, at a glance, **what's wrong right now** — not to accumulate a history of everything ever
 fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So:
 - **A Fixed row survives exactly ONE confirmation round, then it's removed.** When you fix an issue, mark its row
-  `Fixed` (with the commit) and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the
+  `Fixed` and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the row** — the durable
-  row** — the full root-cause/fix detail already lives in the **commit message** (the row cites the hash), so nothing is
+  root-cause/fix detail lives in the **Engineering Manual landmine** (mandatory for any escaped/deep bug) + git history
-  lost. Don't carry confirmed-fixed issues across multiple rounds.
+  (the row cites the landmine ID, and the commit hash once the user commits), so nothing is lost. Don't rely on "the
  commit message" as the only home — **you don't commit** (the user does, often batched), so the manual landmine is the
  reliable record. Don't carry confirmed-fixed issues across multiple rounds.
 - **One run-state header, always.** Keep only the **current** `Round N | Pass X | Chunk Y | NEXT ACTION` block pinned
  at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a **single one-line history**
  entry each (e.g. `R6: branding regression — 0 new`), or drop them entirely once their fixes are confirmed-and-pruned.
@ -1114,18 +1166,20 @@ fixed. Stale fixed rows and stacked old run-states make it unreadable and hide t
 Optimize every QA doc for a reader who has **5 seconds** to find the current state:
 - **Lead with the answer.** Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean")
  before any detail.
- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **commit**, not the row — the
+- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **Engineering Manual landmine**
-  row gets a one-sentence description + repro, then the commit hash.
+  (the durable home), not the row — the row gets a one-sentence description + repro + the landmine ID (and commit hash
  once the user commits).
 - **No walls of text.** Break run-state into scannable lines; bold the few words that matter; no multi-paragraph
-  headers. If a paragraph is longer than ~3 lines, it's probably commit material, not report material.
+  headers. If a paragraph is longer than ~3 lines, it's probably manual/landmine material, not report material.
 - **Consistent shape every round** so a returning reader (or a post-compaction resume) finds things in the same place.
 ## Fix phase (only AFTER all passes of the round complete)
 - Work strictly by severity: **all P0 → P1 → P2 → P3**.
 - **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct
  device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext
-  in Firestore**) → flip its row to **Fixed** + **commit** (one per issue/cluster) → next. Don't start the next until
+  in Firestore**) → flip its row to **Fixed** + capture the durable substance in the Engineering Manual landmine → next
-  the current is verified.
+  (the **user** commits per issue/cluster — never run git yourself; see Guardrails). Don't start the next until the
  current is verified.
 - **Real-path verification gate (do NOT mark Fixed without it):** verify the fix through the **same path the user hits**,
  not a synthetic shortcut. A crash/launch/notification fix is only "Fixed" once reproduced-then-cleared via the REAL
  channel (real push tapped from the shade on an `am kill`'d app; real launcher cold-start) — `am start`/`am force-stop`
@ -1146,8 +1200,10 @@ Optimize every QA doc for a reader who has **5 seconds** to find the current sta
 `blocked→id`; a **round** is done when all **recurring** passes (A–N + P) are done; **flawless** = one full round with
 **zero open P0–P2 and Passes D + E + L + P fully clean** (no open P0/P1 in I/J), **every game fully played through,
 every notification type verified or explicitly `not implemented→Future.md`, chat (L) + the couple-shared premium gate
-(A) + settings-take-effect (M) + content/language (P: no typos/off-voice/non-inclusive copy, question bank on-guide)
+(A) + settings-take-effect (M) + **interactive features (N: daily-Q/reveal, outcomes, Bucket List, Date Builder work
-verified, all join-game navigation paths and all back-stack checks verified**, **and `qa/entrypoint_smoke.sh` GREEN on
+end-to-end — created data persists AND is read back, `scripts/wiring-scan.sh` 🔴=0)** + content/language (P: no typos/
 off-voice/non-inclusive copy, question bank on-guide) verified, all join-game navigation paths and all back-stack checks
 verified**, **and `qa/entrypoint_smoke.sh` GREEN on
 both emulators (0 FAIL — every entry-point cold-start opens and stays)**. Then stop (P3s optional). **Pass O (release
 build + store readiness) and Pass K's real-money path are pre-ship / real-device gates** — they don't block a per-round
 "flawless" but **must be GREEN before any store submission**. Don't re-open a clean pass within the same round.