From 4ee154d8edf9f8b497eeeba1457dd38be901ec11 Mon Sep 17 00:00:00 2001 From: null Date: Sun, 28 Jun 2026 11:30:29 -0500 Subject: [PATCH] docs(qa): update ClaudeQAPlan + ClaudeQACoverage with living-doc guardrails, wiring scanner, and device-matrix honesty --- ClaudeQACoverage.md | 2 +- ClaudeQAPlan.md | 98 +++++++++++++++++++++++++++++++++++---------- 2 files changed, 78 insertions(+), 22 deletions(-) diff --git a/ClaudeQACoverage.md b/ClaudeQACoverage.md index ec644bee..fb998f8d 100644 --- a/ClaudeQACoverage.md +++ b/ClaudeQACoverage.md @@ -3,7 +3,7 @@ > **Resume anchor — current status only.** Statuses: `pass | fail→id | todo | n/a | not implemented→Future.md | blocked→id`. > Build HEAD `c31eea2` + **R15 working-tree changes** (functions + rules **deployed to prod**; client rebuilt+installed both emulators). Position + verdict: see `ClaudeReport.md` run-state. **Verdict: R15 = gap-closing round (Passes L/M/N/P + smoke) — found & FIXED M-001 (P2 quiet hours).** Quiet hours didn't suppress backgrounded/killed partner pushes (local-only); fixed via server-side fail-open suppression + client window/tz sync + rules allowlist — verified live. Then drove Pass N: **N-001 (P1) Bucket List fully non-functional** + **N-002 (P2) Date Builder "Create Plan" no-op** — both **FIXED + verified live** (Bucket CRUD; Date Builder → PLANNED `date_plan` → Home "Date coming up"). L (chat E2E render+decrypt+receipts+reactions+at-rest), P (UI copy + 6103-Q bank) clean; smoke 6/6 GREEN. **0 open P0–P2; 3 fixed pending 1 confirm (M-001, N-001, N-002); 2 P3 brand backlogs.** 0 FATAL. M-001 functions+rules deployed to prod; N-001/N-002 client-only (debug APK installed both emulators). > -> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).** +> **Scope expanded (plan review):** the playbook now has first-class passes **K–O** (billing money-path · messaging/chat E2E · functional settings · daily-Q/outcomes/interactive · release-build/store-readiness). These surface **coverage GAPS, not defects** — the recurring defect bar is clean, but **K (real purchase/restore/cancel path), L (full chat), M (settings take-effect), N (outcomes/Bucket-List/Date-Builder), O (minified release + App Check + store)** are `todo`/`partial`/`blocked→needs-device`. **Next-priority work = close these (start L + M on-emulator; K + O need a real device / pre-ship).** **Device/OS matrix = `blocked→needs-device` (pre-ship):** all per-round QA runs on two **identical** emulators (5554/5556, same API + screen) — minSdk/targetSdk · small/large screen · ≥1 physical device are NOT covered; don't claim "device matrix ✓". > > **📖 Architecture reference:** see [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — contains architecture, security model, data model, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) section that backs every fix-and-pruned ID below. > Hygiene: this is a *current-status* matrix, not a per-round changelog — `fail→id` flips to `pass` once a fix is diff --git a/ClaudeQAPlan.md b/ClaudeQAPlan.md index 47f92e8a..3d4f97d3 100644 --- a/ClaudeQAPlan.md +++ b/ClaudeQAPlan.md @@ -8,6 +8,36 @@ > parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in > `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box). +## ⛔ This is a LIVING document — improve it whenever you see a gap (do this automatically) +This playbook, the coverage matrix, and the `scripts/`/`qa/` scanners are **yours to evolve every round** — that is part +of the job, not a separate task. Whenever a round teaches you something the plan doesn't yet capture, **edit it in the +same chunk** (no need to ask): +- A bug **escaped** a prior round, was hard to diagnose, or recurred → add the generalized reflex to the right Pass + + the durable substance to the Engineering Manual landmine (the MANDATORY-retrospective rule), and if the class is + greppable, **add/extend a scanner** (`scripts/theme-scan.sh`, `scripts/wiring-scan.sh`, `qa/entrypoint_smoke.sh`). +- A step is **wrong, contradictory, or stale** (e.g. it told you to do something a standing rule forbids) → fix the + wording so the next agent isn't misled. +- A new **route / feature / notification / collection / gate / asset** appeared → fold it into the relevant Pass + + `ClaudeQACoverage.md` (Living discovery ritual). +- The plan is **unclear or bloated** → tighten it; lead with the answer; keep one canonical home per fact (don't restate + a lesson in four places — link by ID). +Leave the plan better than you found it each round. When you change a scanner, update its header; when you change a +process rule, make sure it doesn't contradict the Guardrails. + +## ✅ Per-round execution checklist (the literal flow — details in the sections below) +1. **Resume:** read `ClaudeReport.md` run-state + `ClaudeQACoverage.md` (the authoritative state); `adb devices` shows + both emulators; **installed build == HEAD** (rebuild+install if unsure — never QA a stale APK); baseline clean + (both free, 0 active sessions, logcat 0 FATAL). +2. **Discovery ritual:** reconcile routes/notifications/features/assets/backend with coverage; fold new surfaces in. +3. **Run the scanners FIRST (cheap, before live driving):** `qa/entrypoint_smoke.sh` (both serials), `scripts/theme-scan.sh` + (Pass C), `scripts/wiring-scan.sh` (Pass N). File 🔴/🟠 to `ClaudeReport.md`; record counts in coverage. +4. **Run the passes report-only**, sub-batched to one context window each — recurring set **A–N + P** (K money-path + + O release gates only when a sandbox device / pre-ship is in scope). Checkpoint the MD files after each chunk. +5. **Fix phase** (after all passes): by severity P0→P1→P2→P3, one at a time, verify each live via the **real path** + + re-run the relevant scanner, flip the row to Fixed, capture the durable substance in the Engineering Manual. +6. **Re-QA loop** until **flawless** (see Definition of Done). Prune confirmed-fixed rows. +7. **You never `git commit`/push — the user commits.** Your durable state is the MD files (they survive compaction). + ## 📖 Architecture reference (read BEFORE testing the matching area) For each Pass below, before you start, read the relevant section of [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — it documents the architecture, the wire-format contracts, the security invariants, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (bugs that cost real debugging time and are easy to re-introduce). @@ -117,7 +147,7 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere. are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action even with standing auth, or the macOS requirement for iOS). -- **Commit before anything interruptible** so a mid-chunk compaction never loses progress. Keep chunks atomic; if a +- **Checkpoint (save the working-tree MD files + run-state) before anything interruptible** so a mid-chunk compaction never loses progress (the user commits — never run git yourself). Keep chunks atomic; if a chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare. - **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite @@ -162,22 +192,32 @@ confirms + enumerates this; the fix phase applies couple-shared everywhere. writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but every seed/toggle/deploy hits the gate.) -- **Device/OS matrix:** don't certify on one emulator only — cover **minSdk + targetSdk**, a **small** and a **large** - screen, and at least one **physical device** (App Check / Play Integrity behave differently on emulators). +- **Device/OS matrix (pre-ship gate — currently NOT met; track it honestly):** per-round QA runs on our **two + identical emulators (5554/5556, same API + screen size)** — that's the realistic recurring setup, not full coverage. + Before any store push, certify across **minSdk + targetSdk**, a **small** and a **large** screen, and at least one + **physical device** (App Check / Play Integrity behave differently on emulators). Because this is unmet today, keep a + `blocked→needs-device` row for it in `ClaudeQACoverage.md` (alongside Pass K money-path + Pass O) so the gap stays + visible rather than silently assumed-covered — don't claim "device matrix ✓" off two same-size emulators. - **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round re-checks it cheaply instead of by hand. **Built:** `qa/entrypoint_smoke.sh ` (+ helper `qa/qa_push.js`) — the cold-start / entry-point launch-integrity smoke. It launches via the launcher AND sends a **real** push to a killed (`am kill`) app and **taps the actual OS notification** for each type, asserting the app **opens and STAYS** (process alive, 0 FATAL, off the launcher). This is the smoke that catches the "opens-and-closes" - splash-crash class that `am start` can't. Run it **every round and after any commit touching MainActivity / splash / + splash-crash class that `am start` can't. Run it **every round and after any change touching MainActivity / splash / theme / manifest / nav / notifications**. `FAIL` = an app crash (real bug); `BLOCK` = push not delivered (flaky emulator FCM — rerun, not a bug). - **Run associated automated scanners BEFORE the manual pass.** Every pass with a supporting script must start with it: - **Pass C:** run `scripts/theme-scan.sh` and review `/tmp/claude-theme-scan-.md` before looking at any screen. + - **Pass N (+ discovery ritual):** run `scripts/wiring-scan.sh` and review `/tmp/claude-wiring-scan-.md` before + driving the interactive features — it catches the **silent dead-feature class** (N-001 Bucket List, N-002 Date + Builder): 🔴 a `setX()` ViewModel setter with **no caller**, 🟠 a repository read method with **no `ui/` caller** + (data written but never displayed), 🟡 `if (x.isEmpty()) return` bail-guards to confirm the state is actually + provided. Every 🔴 is a likely dead feature — prove the feature works by persisting real data and reading it back + from Firestore (admin), not by trusting the empty-state render. - If a scanner does not yet exist for a pass but the pass is highly automatable (e.g. touch-target sizing for Pass J, `enc:v1:` leak grep for Pass L, redundant-read count for Pass I), consider building it and adding it here. - - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified in both themes; 🟠 MAJOR must be - reviewed for theme/art breakage; 🟡 REVIEW is checked during the visual sweep. + - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified (both themes for C; live persist→read + for N); 🟠 MAJOR must be reviewed for theme/art breakage or orphan data; 🟡 REVIEW is checked during the sweep. - If a manual finding is something the scanner should have caught, improve the scanner (see Living discovery ritual). - **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between rounds so they don't masquerade as bugs. @@ -305,7 +345,7 @@ State lives in **files**, not memory: path crashed → don't fix by reasoning; reproduce via the real channel + read the stack"). This is how the plan self-improves between rounds — treat the human pointing out a missed bug as a signal the plan had a gap, and close the gap here, not just the bug. -- **Commit cadence**: commit `ClaudeReport.md` + `ClaudeQACoverage.md` after each pass and each chunk. +- **Checkpoint cadence**: save `ClaudeReport.md` + `ClaudeQACoverage.md` + run-state after each pass and each chunk (the **user** commits — never run git yourself; see Guardrails). - **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each. - **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue @@ -314,14 +354,15 @@ State lives in **files**, not memory: ## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration) A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the -**largest coherent unit that reliably finishes AND commits within one context window, with margin**. End every chunk -with a commit + run-state update. If a chunk starts overflowing, split it; if chunks feel trivial, merge them. +**largest coherent unit that reliably finishes AND checkpoints within one context window, with margin**. End every chunk +by saving the MD files + run-state (the user commits — never run git yourself). If a chunk starts overflowing, split it; +if chunks feel trivial, merge them. **Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching -prevents half-done/lost work and gives cleaner per-chunk verification + revertable commits. +prevents half-done/lost work and gives cleaner per-chunk verification + revertable history. Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API verification, keep it to **one small route family, one game phase, or one notification type**. A chunk is too large if -it cannot produce a precise coverage update, issue log, and commit before context gets tight. Split before starting +it cannot produce a precise coverage update, issue log, and file-checkpoint before context gets tight. Split before starting rather than leaving a half-tested matrix behind. **Prefer Claude-friendly micro-batches**: smaller chunks let the agent fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows. @@ -348,6 +389,11 @@ Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI swee (dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus. ## Guardrails & efficiency +- **⛔ NEVER `git commit` / `git push` — the USER does ALL commits.** This overrides every "commit" verb elsewhere in + this doc: wherever a step says "commit," read it as **"checkpoint = save the working-tree files (`ClaudeReport.md` + + `ClaudeQACoverage.md` + run-state, plus any code/docs)"** and leave the actual `git commit` to the user. Your durable + state lives in those files (they survive compaction), not in a commit you make. Never stage, commit, push, branch, or + amend. - **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up. - **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**. - **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it). @@ -989,6 +1035,10 @@ takes real effect. Read [Authentication and pairing flow](docs/Engineering_Refer - Every toggle survives **process death + reinstall-with-data** (overlaps F). ### Pass N — Daily question, reveal, check-ins & the other interactive features +> **⛔ CLAUDE: Run `scripts/wiring-scan.sh` BEFORE driving these features** (review `/tmp/claude-wiring-scan-.md`, +> record counts in `ClaudeQACoverage.md`). Every 🔴 dead-setter / 🟠 orphan-reader is a likely silent dead feature — +> this is the exact class that hid N-001 (Bucket List) + N-002 (Date Builder) behind innocent-looking empty states. + The non-game interactive surfaces that have no functional home (Pass B is games only). Read [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle). - **Daily-question loop (the core daily ritual):** assignment (6 PM CST, `assignDailyQuestion`) → answer (each answer @@ -1086,9 +1136,11 @@ test, no-AI-writing, duplicate prevention, variety, fun/relationship-first/premi The report's job is to show, at a glance, **what's wrong right now** — not to accumulate a history of everything ever fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So: - **A Fixed row survives exactly ONE confirmation round, then it's removed.** When you fix an issue, mark its row - `Fixed` (with the commit) and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the - row** — the full root-cause/fix detail already lives in the **commit message** (the row cites the hash), so nothing is - lost. Don't carry confirmed-fixed issues across multiple rounds. + `Fixed` and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the row** — the durable + root-cause/fix detail lives in the **Engineering Manual landmine** (mandatory for any escaped/deep bug) + git history + (the row cites the landmine ID, and the commit hash once the user commits), so nothing is lost. Don't rely on "the + commit message" as the only home — **you don't commit** (the user does, often batched), so the manual landmine is the + reliable record. Don't carry confirmed-fixed issues across multiple rounds. - **One run-state header, always.** Keep only the **current** `Round N | Pass X | Chunk Y | NEXT ACTION` block pinned at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a **single one-line history** entry each (e.g. `R6: branding regression — 0 new`), or drop them entirely once their fixes are confirmed-and-pruned. @@ -1114,18 +1166,20 @@ fixed. Stale fixed rows and stacked old run-states make it unreadable and hide t Optimize every QA doc for a reader who has **5 seconds** to find the current state: - **Lead with the answer.** Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean") before any detail. -- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **commit**, not the row — the - row gets a one-sentence description + repro, then the commit hash. +- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **Engineering Manual landmine** + (the durable home), not the row — the row gets a one-sentence description + repro + the landmine ID (and commit hash + once the user commits). - **No walls of text.** Break run-state into scannable lines; bold the few words that matter; no multi-paragraph - headers. If a paragraph is longer than ~3 lines, it's probably commit material, not report material. + headers. If a paragraph is longer than ~3 lines, it's probably manual/landmine material, not report material. - **Consistent shape every round** so a returning reader (or a post-compaction resume) finds things in the same place. ## Fix phase (only AFTER all passes of the round complete) - Work strictly by severity: **all P0 → P1 → P2 → P3**. - **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext - in Firestore**) → flip its row to **Fixed** + **commit** (one per issue/cluster) → next. Don't start the next until - the current is verified. + in Firestore**) → flip its row to **Fixed** + capture the durable substance in the Engineering Manual landmine → next + (the **user** commits per issue/cluster — never run git yourself; see Guardrails). Don't start the next until the + current is verified. - **Real-path verification gate (do NOT mark Fixed without it):** verify the fix through the **same path the user hits**, not a synthetic shortcut. A crash/launch/notification fix is only "Fixed" once reproduced-then-cleared via the REAL channel (real push tapped from the shade on an `am kill`'d app; real launcher cold-start) — `am start`/`am force-stop` @@ -1146,8 +1200,10 @@ Optimize every QA doc for a reader who has **5 seconds** to find the current sta `blocked→id`; a **round** is done when all **recurring** passes (A–N + P) are done; **flawless** = one full round with **zero open P0–P2 and Passes D + E + L + P fully clean** (no open P0/P1 in I/J), **every game fully played through, every notification type verified or explicitly `not implemented→Future.md`, chat (L) + the couple-shared premium gate -(A) + settings-take-effect (M) + content/language (P: no typos/off-voice/non-inclusive copy, question bank on-guide) -verified, all join-game navigation paths and all back-stack checks verified**, **and `qa/entrypoint_smoke.sh` GREEN on +(A) + settings-take-effect (M) + **interactive features (N: daily-Q/reveal, outcomes, Bucket List, Date Builder work +end-to-end — created data persists AND is read back, `scripts/wiring-scan.sh` 🔴=0)** + content/language (P: no typos/ +off-voice/non-inclusive copy, question bank on-guide) verified, all join-game navigation paths and all back-stack checks +verified**, **and `qa/entrypoint_smoke.sh` GREEN on both emulators (0 FAIL — every entry-point cold-start opens and stays)**. Then stop (P3s optional). **Pass O (release build + store readiness) and Pass K's real-money path are pre-ship / real-device gates** — they don't block a per-round "flawless" but **must be GREEN before any store submission**. Don't re-open a clean pass within the same round.