# Claude QA Playbook — Full-App QA → Fix → Re-QA until flawless

> Reusable QA plan for the Closer app. Run report-only first, fix everything, then re-QA until a clean round.
> Progress/state is tracked in **ClaudeReport.md** (issues) + **ClaudeQACoverage.md** (coverage matrix), which are
> the authoritative source of truth. See the Continuity section before resuming.
>
> **Program roadmap:** **Part 1** = Android QA (this doc) → **Part 2** = build the iOS app to Android's current
> parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in
> `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box).

## ⛔ This is a LIVING document — improve it whenever you see a gap (do this automatically)

This playbook, the coverage matrix, and the `scripts/`/`qa/` scanners are **yours to evolve every round** — that is part of the job, not a separate task. Whenever a round teaches you something the plan doesn't yet capture, **edit it in the same chunk** (no need to ask):

- A bug **escaped** a prior round, was hard to diagnose, or recurred → add the generalized reflex to the right Pass + the durable substance to the Engineering Manual landmine (the MANDATORY-retrospective rule), and if the class is greppable, **add/extend a scanner** (`scripts/theme-scan.sh`, `scripts/wiring-scan.sh`, `qa/entrypoint_smoke.sh`).
- A step is **wrong, contradictory, or stale** (e.g. it told you to do something a standing rule forbids) → fix the wording so the next agent isn't misled.
- A new **route / feature / notification / collection / gate / asset** appeared → fold it into the relevant Pass + `ClaudeQACoverage.md` (Living discovery ritual).
- The plan is **unclear or bloated** → tighten it; lead with the answer; keep one canonical home per fact (don't restate a lesson in four places — link by ID).

Leave the plan better than you found it each round.
When you change a scanner, update its header; when you change a process rule, make sure it doesn't contradict the Guardrails.

## ✅ Per-round execution checklist (the literal flow — details in the sections below)

1. **Resume:** read `ClaudeReport.md` run-state + `ClaudeQACoverage.md` (the authoritative state); `adb devices` shows both emulators; **installed build == HEAD** (rebuild+install if unsure — never QA a stale APK); baseline clean (both free, 0 active sessions, logcat 0 FATAL).
2. **Discovery ritual:** reconcile routes/notifications/features/assets/backend with coverage; fold new surfaces in.
3. **Run the cheap gates FIRST (before live driving):** (a) the **automated test suites** — `./gradlew testDebugUnitTest` + `cd functions && npm test` (they cover the fragile logic: encryption format, rate limiter, quiet hours, streak, entitlement math) — **a red suite is a P0/P1 regression gate, stop and fix before QA'ing a build**; (b) the **scanners** — `qa/entrypoint_smoke.sh` (both serials), `scripts/theme-scan.sh` (Pass C), `scripts/wiring-scan.sh` (Pass N), `scripts/painter-xml-scan.sh` (crash guard — `painterResource()` on a non-`<vector>` XML drawable throws on render; caught O-ONBOARD-001 class — exit≠0 is a P0 gate); (c) the **instrumented render smoke** (when an emulator is attached) — `./gradlew :app:connectedDebugAndroidTest` runs `FirstRunRenderSmokeTest` (first-run composables paint in light+dark; the on-device net for the "composes fine, crashes on first paint" class — a red run is a P0 gate). **⛔ RUN THIS ON A THROWAWAY (5558), NEVER on the 5554/5556 fixtures — `connectedDebugAndroidTest` UNINSTALLS the app-under-test on completion, which WIPES its data (auth + couple keys + App-Check debug token), same effect as the forbidden `pm clear`; R25 wiped QA(5554) this way and had to surface a user-gated re-auth to recover it.** (d) optional **monkey fuzz** `adb shell monkey -p app.closer --throttle 300 --pct-touch 90 -v 5000` (any crash = bug).
   File 🔴/🟠 to `ClaudeReport.md`; record counts in coverage.
4. **Run the passes report-only**, sub-batched to one context window each — recurring set **A–N + P** (K money-path + O release gates only when a sandbox device / pre-ship is in scope). Checkpoint the MD files after each chunk.
5. **Fix phase** (after all passes): by severity P0→P1→P2→P3, one at a time, verify each live via the **real path** + re-run the relevant scanner, flip the row to Fixed, capture the durable substance in the Engineering Manual.
6. **Re-QA loop** until **flawless** (see Definition of Done). Prune confirmed-fixed rows.
7. **You never `git commit`/push — the user commits.** Your durable state is the MD files (they survive compaction).

## 📖 Architecture reference (read BEFORE testing the matching area)

For each Pass below, before you start, read the relevant section of [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — it documents the architecture, the wire-format contracts, the security invariants, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (bugs that cost real debugging time and are easy to re-introduce).

**This is bidirectional — the manual is a LIVING document, not a read-only reference.** Read it before; **write back to it after.** Whenever a round fixes a bug, changes a contract/flow/gate, or finds the manual stale or missing something, update the manual in the same chunk (see *Where every finding goes*, the *Docs update rule*, and the *MANDATORY retrospective* — all now route durable engineering truth here). Treat it as part of every fix, same as `ClaudeReport.md`/`ClaudeQACoverage.md`.

| Pass | Manual section to read first |
|---|---|
| A — Couple-shared premium | [Premium-gated features and gate pattern](docs/Engineering_Reference_Manual.md#premium-gated-features-and-gate-pattern) · [Billing](docs/Engineering_Reference_Manual.md#billing) |
| B — Games lifecycle | [Game session push semantics (idempotent flag-claim)](docs/Engineering_Reference_Manual.md#game-session-push-semantics-idempotent-flag-claim) · [Foreground game-alert banner](docs/Engineering_Reference_Manual.md#foreground-game-alert-banner-r10) · [F-RACE-001](docs/Engineering_Reference_Manual.md#f-race-001-duplicate-game-start-push-on-rapid-partner-update) |
| C — Visual (light+dark) | [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle) · [C-NAV-001](docs/Engineering_Reference_Manual.md#c-nav-001-back-from-home-resurfaces-onboarding-auth) · [Back-stack gotchas](docs/Engineering_Reference_Manual.md#back-stack-gotchas-c-nav-002-c-nav-003) · [C-HOME-001](docs/Engineering_Reference_Manual.md#home-duplicate-pending-action-card-c-home-001) |
| D — Security & encryption | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Firestore security rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) · [Encryption versions](docs/Engineering_Reference_Manual.md#encryption-versions) |
| E — Notifications | [Notifications](docs/Engineering_Reference_Manual.md#notifications) · [Notification deep-link routing](docs/Engineering_Reference_Manual.md#notification-deep-link-routing) · [E-GAME-001](docs/Engineering_Reference_Manual.md#e-game-001-notification-deep-link-landed-in-stale-finished-game) · [E-GAME-002](docs/Engineering_Reference_Manual.md#e-game-002-game-start-push-easy-to-miss-when-app-is-foreground) |
| F — Resilience | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Known limitation: single-device keys](docs/Engineering_Reference_Manual.md#known-limitation-single-device-keys) |
| G — Account creation / fake-account | [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow) · [Rate limiting on accept](docs/Engineering_Reference_Manual.md#rate-limiting-on-accept) |
| H — Branding & artwork | `ClaudeBrandingReview.md` (this repo) · `docs/brand/visual-identity.md` |
| I — Performance | [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) · [Where to look first](docs/Engineering_Reference_Manual.md#where-to-look-first) |
| J — Accessibility | [CloserTheme](docs/Engineering_Reference_Manual.md#ios-specific-notes) · [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) |
| K — Billing & subscription lifecycle | [Billing](docs/Engineering_Reference_Manual.md#billing) · [Premium-gated features and gate pattern](docs/Engineering_Reference_Manual.md#premium-gated-features-and-gate-pattern) |
| L — Messaging & chat (E2E) | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Notifications](docs/Engineering_Reference_Manual.md#notifications) |
| M — Settings & account management | [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow) · [Notifications](docs/Engineering_Reference_Manual.md#notifications) |
| N — Daily question & interactive features | [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle) |
| O — Release build & store readiness | [Firestore security rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) · [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) |
| P — Content, copy & language | `docs/brand/visual-identity.md` (Store voice) · `seed/questions/QUESTION_CONTENT_GUIDE.md` (v3) |

**If you find a bug that LOOKS like it might be a re-introduction of a known
landmine** (above table or [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes)), stop and verify the fix is still in place before filing a new ID — it may be a regression on a known issue, not a new bug.

## Where every finding goes (route it here — exactly one home each)

| What you found | Where it goes | Form |
|---|---|---|
| **A bug** — broken / incorrect / crashing / insecure, premium bypass, wrong-or-missing notification, dead-end nav | **`ClaudeReport.md`** | Table row: stable ID (`A-001`, `E-003`…) + severity (P0–P3) + repro + status |
| **An idea / improvement** — works but could be better, confusing copy, missing affordance, rough-but-not-broken flow, "it'd be great if…", feature idea | **`Future.md`** `## QA` | Short title + what prompted it + suggested improvement |
| **New artwork to create** — illustrations, glyphs, image-gen prompts | **`ClaudeBrandingReview.md`** | House-style prompt + placement |
| **What got tested + its status** (pass / fail / todo / deferred) | **`ClaudeQACoverage.md`** | Coverage cell (the resume anchor) |
| **Automated scanner findings** | **`ClaudeReport.md`** (CRITICAL/MAJOR that break themes/functionality) **+** `ClaudeQACoverage.md` (execution counts + filing status) | ID + file:line + pattern + fix suggestion |
| **Durable engineering knowledge** — a fixed bug's root cause + how it's easy to re-introduce, a new architecture fact / data path / wire-format contract / security invariant / gate pattern, or anything the manual is now stale/missing about | **[`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md)** (esp. [Known landmines and recent fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes)) | New landmine entry (ID + cause + the guard) and/or an updated architecture/gate/flow section |

- A branding **defect** (mis-colored, clipped, off-brand, low-contrast art) is a **bug → `ClaudeReport.md`**, not a brand idea — only *new art to create* goes to `ClaudeBrandingReview.md`.
- **WRONG LANGUAGE IS A BUG (not a Future.md idea).** A typo, grammar/punctuation error, off-brand or cold/salesy voice, non-inclusive/assumptive wording, leaked placeholder/dev/raw-error text, copy that doesn't match behavior, or a broken/duplicate/off-guide question → **`ClaudeReport.md`** (see **Pass P**). Only genuinely-working copy that *could be warmer/clearer* (a rewording for delight) goes to `Future.md`. "Confusing copy" that actually misleads the user is a bug.
- **ONE canonical home per fact; everywhere else is a pointer (ID/anchor), never a paraphrase.** This is the rule that keeps the five docs from duplicating each other (and wasting tokens re-stating the same lesson). Route by *purpose*: the **defect** (repro/severity/status) → `ClaudeReport.md` (transient — prunes to an ID after one confirm); the **substance** (root cause / why it's fragile / how to not re-introduce it) → the **Engineering Reference Manual** (permanent, engineer-facing); the **reflex** (how to FIND the class next round) → this `ClaudeQAPlan.md` Pass (generalized, citing the ID); **coverage status** → `ClaudeQACoverage.md`; **cross-session ops not in the repo** (accounts, tooling, auth) → `memory/`. State a fact in its home once; elsewhere cite the ID. Don't restate a fix in four docs.
- **The Engineering Reference Manual is a LIVING document — read it before a pass, write back to it after.** When a round teaches the codebase something durable (a fixed bug's re-introduction risk, a new/changed architecture fact, data path, contract, gate, flow, collection/Function/route, or the manual disagreeing with reality), update the manual in the **same chunk**. **A fix is not complete until its durable substance is in the manual** (see the MANDATORY-retrospective rule). The report row and the Pass reflex just reference the manual's landmine ID — they don't re-tell it.
- Logging an idea in `Future.md` is **never** a substitute for filing a real defect: if it's broken, it gets an ID in `ClaudeReport.md` too.
- Bug lifecycle: filed in `ClaudeReport.md` → fixed → kept **one** confirmation round → pruned to the archived-ID line (detail lives in git). `Future.md` ideas sit in the backlog until built. (See **Report hygiene** under Reporting.)

## Context

Drive the real app on both emulators, verify each thing live, report, fix, re-verify. Core QA dimensions (cornerstones):

1. **Couple-shared premium** — if EITHER partner is premium, **all** premium features unlock for **both**.
2. **Games** — each starts, plays, **joins, resumes**, finishes, **and reopens results** correctly on both devices.
3. **Full visual pass, light + dark** — every screen, text readable, nothing clipped/invisible.
4. **Security & encryption (cornerstone)** — every private field is ciphertext at rest, rules hold against non-members, keys/recovery are sound. Findings here default to P0.
5. **Notifications** — the **full suite**: every type delivers to the right partner (foreground/background/killed), deep-links correctly, opens the right destination on **both clients**, covers all **game/join-game** flows, handles stale notifications, and leaks no private content.
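
Cornerstone 4 can be spot-checked mechanically once an admin read has been dumped to text. The following is a minimal sketch, not a script that exists in the repo. It assumes ciphertext values carry the `enc:v1:` prefix that this playbook mentions for Pass L, and it assumes a hypothetical tab-separated `field<TAB>value` dump on stdin; the name `find_plaintext_fields` is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: input is "field<TAB>value" per line, and a
# correctly encrypted value starts with the assumed prefix "enc:v1:".
# find_plaintext_fields is a hypothetical helper name.
find_plaintext_fields() {
  # index() returns 1 only when the prefix sits at the very start of the value,
  # so any other result means the value is not stored as ciphertext.
  awk -F '\t' 'NF >= 2 && index($2, "enc:v1:") != 1 { print $1 }'
}
```

Any field name it prints would be a candidate finding for Pass D; an empty result only says the dumped fields look encrypted, so the raw stored bytes still need a manual look.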

These five are the original cornerstones; the playbook has since grown to cover the rest of the app as first-class passes (see **QA passes** below): **K Billing & subscription lifecycle** (the real purchase/restore/cancel/expiry money path, not just the admin entitlement toggle), **L Messaging & chat** (E2E send/receive/react/media, both clients), **M Settings & account management** (every toggle persists + takes effect; biometric lock, quiet hours, unpair/delete), **N Daily-question/reveal/check-ins + Bucket List/Date Builder/Activity** (the interactive non-game features), **O Release build & store readiness** (the **minified release** build, signing/AAB, App Check, i18n, deep/App-Links, Play Data-Safety — everything else runs on the debug APK), and **P Content, copy & language** (typos/grammar, brand voice, inclusive language, and the **question-bank** content — *wrong language is a bug, not a Future.md idea*). Plus the existing **F resilience**, **G account/abuse**, **H branding**, **I performance**, **J accessibility**. **Pass letters are stable IDs — never renumber** (issue IDs and coverage rows reference them; note D/E/G are not in strict alphabetical position for that reason).

Scope decisions: **exhaustive** visual pass (all ~50 screens, both modes); **full scope incl. pre-pairing** flows (fresh throwaway account); **couple-shared everywhere** — per-user gates are bugs, fixed by routing through `core/billing/CouplePremiumChecker.kt`; **full notification suite** — every type, game + join-game pushes, deep-links, stale-notification handling, and all in-app paths into joining/resuming/results, verified on **both clients**.

**Early known signal:** only chat uses `CouplePremiumChecker`; games/packs/dates/wheel gate on the user's own `EntitlementChecker.isPremium()` — so premium almost certainly does NOT unlock for the free partner there. Pass A confirms + enumerates this; the fix phase applies couple-shared everywhere.
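
The per-user gates described above are greppable, which makes Pass A's enumeration cheaper. The following is a minimal sketch, not one of the repo's scanners. It assumes Kotlin sources sit under a directory you pass in, and it treats a file as a suspect when it calls `isPremium()` without mentioning `CouplePremiumChecker` anywhere; the name `find_per_user_gates` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: list Kotlin files that call isPremium() but never reference
# CouplePremiumChecker. find_per_user_gates is a hypothetical helper name and
# the file-level match is a heuristic, not proof of a per-user gate.
find_per_user_gates() {
  local src_dir="$1"
  # -r recurse, -l print matching file names only, -F fixed-string match.
  grep -rlF --include='*.kt' 'isPremium()' "$src_dir" 2>/dev/null |
    while IFS= read -r file; do
      # Keep the file only if it does not route through the couple-level checker.
      grep -qF 'CouplePremiumChecker' "$file" || printf '%s\n' "$file"
    done | sort
}
```

Calling `find_per_user_gates app/src/main` (path assumed) would give a starting list for Pass A; each hit still has to be confirmed live with one premium and one free partner, because a file can reference the couple checker and still gate on the wrong value.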

## Execution mode — run to completion (autonomous; do NOT stop)

- **Do not stop to check in or ask for approval.** Run all passes (A–P — recurring set A–N + P each round; K's real-money path and O's release/store gates run when a sandbox device / pre-ship is in scope) → the fix phase → re-QA rounds **continuously until a flawless round** (zero open P0–P2, Passes D + E clean, every game fully played through, all notification routes verified, navigation/back-stack verified). Don't hand control back early.
- **Unblock yourself:** if anything **blocks progress** (a stale/blocking session, a crash, a build break, a missing prerequisite state, a broken nav path that prevents reaching a screen), **fix it immediately and continue** — even though passes are otherwise report-only. Blocking issues are fixed inline so the run can proceed; non-blocking findings are still logged and fixed in the fix phase.
- **"Once executed, complete it":** never declare done before the Definition of Done is met — keep cycling fix → re-QA until flawless, then stop.
- **Context limits ≠ stopping — do NOT hand back to the user when context fills.** The harness auto-summarizes a long conversation and continues in the next window; you continue **without the user**. (You cannot self-invoke `/compact` — and you don't need to; auto-compaction handles it.) The **committed `ClaudeReport.md` run-state + `ClaudeQACoverage.md` are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action even with standing auth, or the macOS requirement for iOS).
- **Checkpoint (save the working-tree MD files + run-state) before anything interruptible** so a mid-chunk compaction never loses progress (the user commits — never run git yourself).
  Keep chunks atomic; if a chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
- **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite deliberate design — the log captures it). Never halt the run to ask.
- **Only true stop = a gated action you cannot perform.** Production deploys, admin Firestore writes/seeds, and entitlement toggles still need per-occurrence authorization (the classifier enforces this regardless of this doc). If one is genuinely required to proceed and is denied, do **all** other work first, then surface only that single blocker — don't halt the whole run for it.

## Methodology (every pass)

- **EVIDENCE OVER ASSUMPTION — read the logs, never assume, always verify (the #1 rule).** Every conclusion — `pass`, `fail`, `fixed`, "it works", "the notification didn't open" — must be backed by **observed evidence**, never by what the UI *appears* to do or by reasoning about the code. Concretely:
  - **Read `logcat` on EVERY action, not only when something looks wrong.** `logcat -c` before a tap/flow, then after, scan for `FATAL EXCEPTION`/ANR/`PERMISSION_DENIED`/exceptions. **Absence of a visible symptom ≠ success** — a screen that "looks fine" can be masking a swallowed exception, a denied read, or a crash on another device.
  - **Verify with ground truth, not appearance:** confirm persisted state via **admin reads** (Firestore), confirm delivery via `notification_queue`/`dumpsys notification`, confirm routing via the landed screen + back stack, confirm encryption via the raw stored bytes. "Looked right" is not verified.
  - **Don't theorize a root cause — reproduce it and read the stack.** If behavior is "didn't work / closed / flashed", pull the crash log FIRST (this session's bug was misdiagnosed by reasoning until the live stack named the splash NPE).
  - **Don't trust a synthetic pass** (`am start`, admin write, direct call) for launch/notification/permission paths — verify through the **real** channel (see Reproduction fidelity). A green that didn't exercise the user's path is not green.
- Devices: **5554 (QA)**, **5556 (Sam)**, paired; one **fresh throwaway account** for pre-pairing flows.
- Drive via adb tap/swipe; resolve coords from `uiautomator dump` bounds; downscale screenshots to read; scan `logcat` for `FATAL EXCEPTION`/ANR on each screen.
- Premium toggled via `scratchpad/set_premium.js` (admin, **user-authorized each time**).
- Theme toggled via **Settings → Appearance (Light/Dark)** (`MainActivity` `ThemeMode`).
- **REPORT-ONLY during passes — never fix mid-pass.**
- **THINK AS A CONSUMER — approach everything from different angles.** Beyond "does it work", constantly ask *"is this what a real person would expect / want here? is this delightful, confusing, or annoying?"* Come at each flow from multiple angles (first-time user, returning user, the partner who didn't start it, someone tapping fast, someone reading carefully, the skeptic, the impatient). Vary inputs, depths, orders, and entry points (don't repeat one happy path). A thing can be bug-free yet still *worse than it should be* — notice that too.
- **CAPTURE IMPROVEMENT / FEATURE IDEAS → `Future.md` (section `## QA`).** Bugs (broken/incorrect behavior) go to `ClaudeReport.md` as always. But anything that *works yet could be better* — confusing copy, a missing affordance, a rough-but-not-broken flow, an "it'd be great if…" feature idea — append it to **`Future.md` under `## QA`** with a short title, what prompted it, and the suggested improvement.
  This is an idea backlog, **not** the bug log; logging here is never a substitute for filing an actual defect in `ClaudeReport.md`.
- **Environment (senior-QA rec):** prefer the **Firebase Local Emulator Suite or a dedicated staging project** over production — isolates test data, makes seeding / entitlement toggles / D3 negative tests **free** (no gated prod writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but every seed/toggle/deploy hits the gate.)
- **Device/OS matrix (pre-ship gate — currently NOT met; track it honestly):** per-round QA runs on our **two identical emulators (5554/5556, same API + screen size)** — that's the realistic recurring setup, not full coverage. Before any store push, certify across **minSdk + targetSdk**, a **small** and a **large** screen, and at least one **physical device** (App Check / Play Integrity behave differently on emulators). Because this is unmet today, keep a `blocked→needs-device` row for it in `ClaudeQACoverage.md` (alongside Pass K money-path + Pass O) so the gap stays visible rather than silently assumed-covered — don't claim "device matrix ✓" off two same-size emulators.
- **⛔ First-run / cold-path lane (fixture blind-spot — learned from O-ONBOARD-001, a P0 we shipped past).** The two recurring emulators (5554/5556) are **paired + signed-in + onboarding-complete** — a stable fixture that is great for the A–N passes but **structurally cannot reach the entire first-run surface**: onboarding (every slide + **Skip**), sign-up, login, the auth-screen logo, pairing/invite-accept, recovery-on-new-device, and day-1 empty states. A bug anywhere in that region is **invisible no matter how thorough the passes are** — O-ONBOARD-001 (every fresh install crashed on the last onboarding slide) sat undetected precisely because the fixtures skip it.
  **So: run a fresh-install lane on a THROWAWAY device** (e.g. `emulator-5558`, or a fresh AVD — **never `pm clear` 5554/5556**, it breaks the App Check debug token): install the build, walk onboarding all the way through (+ Skip) → sign-up → login → pairing → first daily Q, asserting 0 FATAL and each screen renders. **Trigger it every time you touch onboarding / auth / pairing / branding / launcher or any `res/drawable` asset, and always before a store push.** Keep a `first-run` row in `ClaudeQACoverage.md` so this blind-spot stays visible instead of assumed-covered.
- **Render-level coverage gap (the other half of why O-ONBOARD-001 slipped).** Our cheap gates are all static or logic-level — unit/functions tests, `theme-scan`, `wiring-scan`, `painter-xml-scan` — **none of them actually render a composable.** `painterResource` on a non-`<vector>` XML drawable compiled fine, passed all 205 unit tests, and only threw on first paint. There is a whole class of "composes fine, crashes on render" bugs (resource resolution, `LocalContext` casts, bad `painterResource`). **R20 added the first on-device net for it:** `app/src/androidTest/.../ui/FirstRunRenderSmokeTest.kt` renders the first-run crash composables (`CtaSlide` + `AuthLogoMark`, light+dark — the exact O-ONBOARD-001 sites) via a Compose `createComposeRule()` and asserts they paint; proven to FAIL on the reintroduced bug. Run it with `./gradlew :app:connectedDebugAndroidTest` (needs a connected emulator; filter a class with `-Pandroid.testInstrumentationRunnerArguments.class=…`). It currently covers **only the first-run leaf composables** — most routes still have no render test, so the **fresh-install lane above remains the net for the rest** until the smoke grows (see `Future.md` — extend toward sign-in→pair→daily-Q→game with a Hilt test runner, and/or a Roborazzi/Paparazzi screenshot suite). Treat both the fresh-install lane and (when an emulator is attached) `connectedDebugAndroidTest` as part of the render-crash net.
- **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round re-checks it cheaply instead of by hand. **Built:** `qa/entrypoint_smoke.sh ` (+ helper `qa/qa_push.js`) — the cold-start / entry-point launch-integrity smoke. It launches via the launcher AND sends a **real** push to a killed (`am kill`) app and **taps the actual OS notification** for each type, asserting the app **opens and STAYS** (process alive, 0 FATAL, off the launcher). This is the smoke that catches the "opens-and-closes" splash-crash class that `am start` can't. Run it **every round and after any change touching MainActivity / splash / theme / manifest / nav / notifications**. `FAIL` = an app crash (real bug); `BLOCK` = push not delivered (flaky emulator FCM — rerun, not a bug).
- **Run the project's OWN test suites every round (they are the cheapest, most deterministic regression net).** Before the scanners and live driving, run `./gradlew testDebugUnitTest` (19 unit-test classes — `FieldEncryptorTest`, `SealedAnswerEncryptorTest`, `NotificationRateLimiterTest`, `QuietHoursManagerTest`, `StreakCalculatorTest`, `ChallengeStateMachineTest`, `PartnerNotificationManagerTest`, `HomePriorityEngineTest`, `DateMatchRepositoryImplTest`, `CloserBrandCopyTest`, …) and `cd functions && npm test` (`entitlementLogic.test.ts`). **A failing test is a regression bug (P0/P1) — file it and do not QA a build with a red suite.** A fix that breaks a test isn't "Fixed" (see Fix phase). These guard the exact invariants this QA chases (ciphertext format, rate limiting, quiet-hours suppression, entitlement math), so a green run is a precondition, not a bonus. (**Instrumented coverage (R20):** the first on-device test now exists — `FirstRunRenderSmokeTest` (a Compose render smoke of the first-run screens); run it when an emulator is attached with `./gradlew :app:connectedDebugAndroidTest`.
  It's still **first-run-only** — broader UI/nav/DB-DataStore behavior remains uncovered, so the live passes + scanners are still the main UI-behavior net; grow the suite per `Future.md`.)
- **Stress / monkey fuzz (cheap random-crash net the manual nav-fuzz misses):** once per build run `adb shell monkey -p app.closer --throttle 300 --pct-touch 90 -v 5000` on each emulator with `logcat` capturing — any `FATAL EXCEPTION`/ANR it triggers is a bug (file it with the monkey seed). This complements Pass C's *targeted* nav fuzzing with broad random input.
- **Run associated automated scanners BEFORE the manual pass.** Every pass with a supporting script must start with it:
  - **Pass C:** run `scripts/theme-scan.sh` and review `/tmp/claude-theme-scan-.md` before looking at any screen.
  - **Pass N (+ discovery ritual):** run `scripts/wiring-scan.sh` and review `/tmp/claude-wiring-scan-.md` before driving the interactive features — it catches the **silent dead-feature class** (N-001 Bucket List, N-002 Date Builder): 🔴 a `setX()` ViewModel setter with **no caller**, 🟠 a repository read method with **no `ui/` caller** (data written but never displayed), 🟡 `if (x.isEmpty()) return` bail-guards to confirm the state is actually provided. Every 🔴 is a likely dead feature — prove the feature works by persisting real data and reading it back from Firestore (admin), not by trusting the empty-state render.
  - If a scanner does not yet exist for a pass but the pass is highly automatable (e.g. touch-target sizing for Pass J, `enc:v1:` leak grep for Pass L, redundant-read count for Pass I), consider building it and adding it here.
  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified (both themes for C; live persist→read for N); 🟠 MAJOR must be reviewed for theme/art breakage or orphan data; 🟡 REVIEW is checked during the sweep.
  - If a manual finding is something the scanner should have caught, improve the scanner (see Living discovery ritual).
- **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between rounds so they don't masquerade as bugs.
- **Evidence standard:** every filed bug must be reproducible from text alone: build/commit, device, account, theme, app/process state, screen/route, exact tap/input sequence, expected result, actual result, and whether logcat showed a crash/ANR/permission denial. Screenshots/videos are helpful but never the only evidence because session artifacts may not survive compaction.
- **Flake policy:** if something fails once and then passes, do not dismiss it. Repeat from a clean state, vary timing (rapid tap / slow network / background-resume), inspect logs, and file it as intermittent if it cannot be made fully deterministic. Intermittent routing, notification, encryption, duplicate-write, or crash behavior is still a bug.
- **Reproduction fidelity (how we catch DEEP bugs) — the test harness must exercise the SAME path as the user.** A synthetic shortcut (`am start` extras, admin writes, calling a function directly, `am force-stop`) can **pass while the real path crashes** — the splash-handover NPE only fires on a real notification cold-start, and an app stopped with `am force-stop` can't even receive FCM. So for launch / notification / permission / IPC / deep-link behavior, reproduce through the **real OS mechanism** (real push tapped from the shade, real launcher cold-start, real permission dialog). Record **which angle** proved it in `ClaudeQACoverage.md`; "synthetic/UI-shortcut only" is **not** a pass for these paths.
- **Symptom→inspection reflexes (apply before theorizing a root cause):** (1) "opens-and-closes / flashes / silently fails" ⇒ it's a **crash until the stack says otherwise** — `logcat -c` then capture `FATAL EXCEPTION` from the live repro **before** proposing a cause (don't fix by reasoning, like the routing red-herring on this very bug).
(2) **Many features break at once ⇒ inspect the SHARED code path** (launch/`onCreate`/splash/auth/key-load), not each feature. (3) "worked before, broken now" ⇒ **diff & history-check before you fix**: `git blame`/`git log -L`/`git diff` the failing line to the introducing commit (**incl. other agents' commits — Codex/kimi/Ripley co-edit this repo**), and search the Engineering Manual landmines + the report's archived-ID line for a prior fix of the same symptom — a match means **regression, not a new bug** (full procedure: the Fix-phase **Regression triage** step). (4) Treat cosmetic/branding/theme/manifest/splash commits as **capable of deep crashes** — re-run the cold-start + notification smoke after them. ## Living discovery ritual (before each round, and whenever reality disagrees with the docs) The app is allowed to grow; the QA plan must keep up. Before a pass or chunk, quickly inventory the current code/app surface and reconcile it with `ClaudeQACoverage.md`: - **Routes/screens:** inspect `core/navigation/AppRoute.kt`, navigation graph call sites, Settings sub-pages, dialogs, bottom tabs, deep links, and any new composables reachable by buttons/cards. - **Notifications:** inspect notification type enums/classes, Cloud Function triggers, Android intent/deep-link handling, notification channels/actions, FCM token registration, and Android runtime notification permission paths. - **Features/gates:** grep for premium checks, permission requests, media pickers, billing/paywall entry points, destructive actions, account/couple lifecycle actions, and admin/server-only writes. - **Assets/content:** inventory new drawables, `drawable-night*` variants, pack art, empty states, strings, feature flags, remote config, and any debug-only screens that should not ship. - **Backend/rules:** inspect Firestore rules, indexes/queries, Functions triggers/callables, Storage paths, scheduled jobs, and migrations for new data shapes or access paths. 
- **⛔ Reverted-then-reinstated code (this is exactly how Date Memories/Reflection slipped R25):** diff the working tree against the coverage matrix — `git status --short` staged **additions** (`A `) and the recent `git log` for `Revert`/`re-add` churn. A feature that was reverted and later re-added is **new to QA even if the commits look old**; re-enter it into the relevant passes. Cross-check that every `AppRoute`/notification type/Function trigger present in code has a coverage row. - **Docs update rule:** if the inventory finds a page, feature, notification, asset, state, backend path, or edge case missing from the playbook/coverage, update `ClaudeQAPlan.md` and `ClaudeQACoverage.md` before marking the chunk done. - **Scanner update rule:** if a manual finding is a pattern an existing scanner *should* have caught (e.g. a hardcoded surface color the theme scanner missed, a route the smoke should have exercised), improve that script and document the change in its header. If no scanner exists for a repeated failure mode, consider writing one and adding it to **Methodology**. If it is product polish, also add it to `Future.md`; if it needs new artwork, add it to `ClaudeBrandingReview.md`. **And if the discovery is a durable engineering fact (new route/collection/Function/flag/contract, a changed wire format, a renamed file, a gate/flow that the manual describes wrongly or omits), update [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) in the same chunk** — the discovery ritual is exactly when the manual drifts out of date, so reconcile it then, not "later". ## Multi-angle attack mandate (go DEEPER than "does the happy path work") A capability can pass via the UI yet fail when hit directly. 
Probe each meaningful capability (read/write a private field, gate a premium feature, deliver/route a notification, start/finish a game, pair/unpair, create an account) from as many **independent angles** as apply — not just the in-app happy path: - **Real UI** (play-as-user) — the baseline angle. - **Crafted intent / deep-link** — fire the exact intent a notification/link carries (bypasses UI nav) to test routing in isolation; also send **malformed/missing extras** → must route gracefully or no-op, never crash. - **Raw API against the DEPLOYED backend** — hit Firestore/Storage/Functions REST **directly** with a real token, as a **member AND a non-member**, to exercise rules + App Check from OUTSIDE the app. A non-member (or no-App-Check) request must be **DENIED** — App Check `403` or rules `PERMISSION_DENIED`. The member request characterizes which layer enforces. **Any unauthorized `200` returning couple data = P0.** - **Admin inspection (ground truth)** — read the RAW stored docs/objects (admin bypasses rules) to assert what is actually persisted: ciphertext only, no plaintext, no raw keys/invite-seeds, no private content in pushes. - **Concurrency / race** — two partners (or two rapid taps) hit the same thing at once. - **Killed / cold state** — kill with **`am kill `**, NOT `am force-stop`: a force-stopped app is in Android's *stopped* state and is **excluded from FCM broadcasts** (`GCM broadcast …result=CANCELLED`), so the push never arrives and you get a false "no notification". Then deliver a **real** push and **tap the actual OS notification** (one at a time — clear the shade first; tapping a *grouped summary* launches with no extras and falsely lands on Home). `am start … --es type …` is **not** equivalent to a real notification tap (different launch path — see the crash-triage note in Pass E). Also cold-start straight onto a deep link. 
- **Malformed / abusive input** — oversized, empty, rapid-fire, injection-ish, forged FCM payloads, replayed/expired tokens & invite codes. - **Offline / flaky** — drop network mid-action → graceful failure, recover on reconnect. Record **which angles** were tried per area in `ClaudeQACoverage.md`. For security- or data-sensitive capabilities, "UI happy path only" is **not** a `pass`. **D3/Pass G negative access MUST be executed live via the raw-API angle each round — never deferred to "only 2 emulators."** (Mint a token for a non-member UID via admin → exchange for an ID token via the Identity Toolkit REST `signInWithCustomToken` → use it as Bearer against the Firestore REST API.) ## Continuity & resumability (this effort WILL span many context windows — don't lose state) State lives in **files**, not memory: - **`ClaudeReport.md`** = the issue log (committed). Each issue row is **self-contained in text** (repro + expected + actual) — screenshots are session-only and won't survive a compaction; never rely on a screenshot path alone. - **`ClaudeQACoverage.md`** = the coverage matrix: every screen×mode, feature×premium-state, game×lifecycle, notification×{foreground,background,killed}, each `todo | pass | fail→id | not implemented→Future.md | blocked→id`. The resume anchor. - **`Future.md`** (`## QA`) = the non-bug improvement/idea backlog; **`ClaudeBrandingReview.md`** = the branding/artwork review + image-prompt backlog. Both committed alongside the report/coverage. - **Persistent memory** (`memory/`): QA methodology + exact commands; emulator↔account↔coupleId mapping; `scratchpad/set_premium.js` + admin tooling; the couple-shared-premium-everywhere goal + the per-user-gate gap. - **Run-state header** pinned at the TOP of `ClaudeReport.md`, always current: `Round N | Pass X | Chunk Y | NEXT ACTION: …` — first thing to read, last thing to update before stopping. 
- **Stable issue IDs**: `A-001 / B-002 / C-… / D-… / E-…` (pass-letter + number); coverage references the ID for every `fail`. Never renumber or reuse. - **Source of truth**: the two MD files are authoritative; the TodoWrite list is scratch for the current chunk only. Update the MD files + run-state header *before* ending a session. - **Living playbook rule:** when QA discovers any new app surface or recurring lesson — a new page/route, feature, setting, game state, notification type/action/channel, entry point, background/killed-state behavior, asset/art placement, repeatable bug class, missed edge case, fragile route, confusing state, image/layout failure mode, security angle, or anything else that should be checked every future round — update **this `ClaudeQAPlan.md`** in the relevant pass before ending the chunk. Also add the matching row/cell to `ClaudeQACoverage.md` if it needs recurring verification. **And update [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) when the discovery is durable engineering truth** (a new architecture fact, data path, contract, gate, flow, or a fixed bug's re-introduction risk) — the QA plan captures *what to re-test*, the manual captures *what the system is and why it's fragile*; both are living and both get updated. Do this even after the immediate bug is filed/fixed so the lesson or newly discovered surface is not lost to memory or git history. - **Learn from every ESCAPED or DEEP bug — MANDATORY retrospective (do this automatically, not only when asked).** Any bug that (a) **escaped a prior round**, (b) needed **non-obvious diagnosis** (a crash, an "opens-and-closes", a "didn't work", an intermittent, a wrong-root-cause first guess), or (c) **recurred** triggers a short retrospective the moment it's fixed — the fix is **not complete** until all four are done: 1. **Add the guard that would have caught it** — a new `qa/` smoke check, a coverage row, or a concrete pass step (e.g. 
the cold-start bug → `qa/entrypoint_smoke.sh`). If an existing smoke missed it, extend the smoke. 2. **Capture the lesson in its ONE canonical home, then link by ID elsewhere — never paraphrase it twice.** Split by purpose: the **reflex** (how to *find* this class next round) goes in the relevant Pass of **this doc**, written *generalized* and citing the bug ID as an example (do NOT re-narrate the bug here); the **substance** (root cause + where it lives now + re-introduction risk + the guard) goes in [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) → [Known landmines and recent fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (and update the matching architecture/gate/flow section if the fix changed it). The manual is the next engineer's first read; a landmine that isn't in it will be re-introduced. **Do NOT copy the fix into `memory/`** — per the memory rules, memory holds only cross-session facts NOT in the repo (emulator↔account map, admin tooling/commands, standing auth, never-commit); past fixes belong to the manual, so memory just points to the landmine ID if needed. 3. **Name the missing state/angle/entry-point** that let it hide and add it to the multi-angle / state matrices so it's exercised every round (e.g. "real notification tap on an `am kill`'d app", not just `am start`). 4. **Note any wrong turn in diagnosis** so the misstep isn't repeated (e.g. "synthetic test passed while the real path crashed → don't fix by reasoning; reproduce via the real channel + read the stack"). This is how the plan self-improves between rounds — treat the human pointing out a missed bug as a signal the plan had a gap, and close the gap here, not just the bug. - **Checkpoint cadence**: save `ClaudeReport.md` + `ClaudeQACoverage.md` + run-state after each pass and each chunk (the **user** commits — never run git yourself; see Guardrails). 
- **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each. - **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue at the first `todo` / unverified-fix; (5) if a prior chunk left an active/stuck game session, recover it via in-app "End their game" (log if needed), then redo that chunk. ## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration) A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the **largest coherent unit that reliably finishes AND checkpoints within one context window, with margin**. End every chunk by saving the MD files + run-state (the user commits — never run git yourself). If a chunk starts overflowing, split it; if chunks feel trivial, merge them. **Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching prevents half-done/lost work and gives cleaner per-chunk verification + revertable history. Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API verification, keep it to **one small route family, one game phase, or one notification type**. A chunk is too large if it cannot produce a precise coverage update, issue log, and file-checkpoint before context gets tight. Split before starting rather than leaving a half-tested matrix behind. **Prefer Claude-friendly micro-batches**: smaller chunks let the agent fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows. 
| Pass | Chunk granularity | ~chunks |
|---|---|---|
| A Premium | one gated-feature family per chunk if live toggles are needed; otherwise free-state sweep → couple-shared verify | 2–4 |
| B Games | **one game per chunk max**; split complex games into lifecycle/playthrough chunk + join/resume/results/notification-entry chunk | 7–14 |
| C Visual | **one small route family per chunk** (both themes, ~2–3 screens/states, screenshots reviewed + nav/back + image-fit + all CTAs for that family) — never "all screens" or a broad tab at once | 16–25 |
| D Security | one security assertion group per chunk: D1 at-rest · D2 rules static · D3 live negative raw API · D4 keys/recovery · D5/D6 leaks · D7 migration | ~6 |
| E Notifications | **one notification type per chunk** with the full contract below; split a type into direction/state subchunks if needed, but do not mark the type pass until both clients + source screens + fg/bg/killed + stale/malformed + payload/back-stack are covered | 16–30 |
| F Resilience | **one dimension per chunk** (concurrency · lifecycle/process-death · network · time · account-lifecycle) | ~5 |
| G Account creation | **one creation/abuse dimension per chunk** (happy/validation · duplicate/conflict · fake-account abuse · lifecycle) | ~4 |
| H Branding | **one small route family per chunk** (~2–3 screens/states) consumer brand walk + ready-to-paste art prompts + existing-image integration verdict | 8–14 |
| I Performance | **one route-group per chunk** — gfxinfo/jank + read-count instrumentation (build the route smoke checklist) | ~3 |
| J Accessibility | **one a11y setting per chunk** (font scale · TalkBack · contrast · targets · keyboard · reduce-motion) | ~5 |
| K Billing | **one money-path per chunk** (purchase · restore · plan-switch · cancel→expiry-relock · refund · webhook auth) — needs a real device/sandbox | ~6 |
| L Messaging | **one chat dimension per chunk** (send-types both dirs · reactions/receipts/typing · failed-send/offline · media perms · inbox/entry-points · delete/moderation) | ~6 |
| M Settings | **one settings group per chunk** (appearance · notif toggles · quiet hours · biometric lock · edit profile · unpair/delete · security/recovery) | ~6 |
| N Interactive features | **one feature per chunk** (daily-question loop · outcomes/check-ins · Bucket List · Date Builder · Activity feed) | ~5 |
| O Release/store | **one gate per chunk** (minified release smoke · signing/AAB · App Check (staging) · deep/App-Links · permissions/manifest · i18n · Data-Safety/store) — pre-ship, not per-round | ~6 |
| P Content/language | **one surface per chunk** (UI microcopy of a route family · voice/tone sweep · inclusive-language sweep · question-bank by category/depth · legal/store copy) | ~5 |

Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI sweeps; **montage** screenshots (dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.

## Guardrails & efficiency

- **⛔ NEVER `git commit` / `git push` — the USER does ALL commits.** This overrides every "commit" verb elsewhere in this doc: wherever a step says "commit," read it as **"checkpoint = save the working-tree files (`ClaudeReport.md` + `ClaudeQACoverage.md` + run-state, plus any code/docs)"** and leave the actual `git commit` to the user. Your durable state lives in those files (they survive compaction), not in a commit you make. Never stage, commit, push, branch, or amend.
- **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
- **Never run `connectedDebugAndroidTest` (or any instrumented test) on the 5554/5556 fixtures** — AGP UNINSTALLS the app-under-test after the run, wiping its data (auth + couple keys + App-Check token) exactly like `pm clear`. Run the render smoke on a **throwaway (5558)**. (R25 wiped QA/5554 this way → needed a user-gated re-auth to recover.)
- **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**. - **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it). - **Pass C parallelism:** set **5554 = Dark, 5556 = Light** to capture both themes at once. - Never log decrypted message/answer content. ## Severity scale (label every issue) - **P0 Critical** — crash/ANR, data loss, encryption/security leak, feature fully broken, premium bypass. - **P1 Major** — feature partly broken, premium not unlocking for partner, wrong/missing notification, dead-end nav. - **P2 Minor** — readability/contrast, clipping/overflow/truncation, theme not adapting, inconsistent styling, wrong/double-back navigation. - **P3 Polish** — spacing/alignment/copy nits. ## QA passes (Round 1 = baseline) ### Pass A — Couple-shared premium (target: either partner premium → both unlock) Test each gated feature in 3 states: **neither** premium → locked + paywall; **partner-only** premium → BOTH unlock; **self** premium → unlock. Toggle Sam premium, confirm QA (free) unlocks; toggle off. Features: Play-hub games (Desire Sync + any premium-badged), Connection Challenges, Memory Lane; Question Packs; Spin the Wheel / Category Picker / Wheel History (+ any premium wheel categories); Date Match / Plan Date / Date Builder; chat media + reactions + any premium chat tools (regression — already couple-shared); Subscription/Settings reflects entitlement. Gated files (for the fix): `ui/play/PlayHubViewModel`, `ui/desiresync/DesireSyncScreen`, `ui/wheel/{CategoryPicker,SpinWheel,WheelHistory}*`, `ui/questions/QuestionPackLibrary*`, `ui/dates/{DateMatch,DateMatches}Screen`, `ui/memorylane/MemoryLaneScreen`, `ui/challenges/ConnectionChallengesScreen`. Also: **any VM/screen calling `EntitlementChecker.isPremium()` directly** (grep for it) is a candidate gate. 
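The grep for direct checker calls mentioned above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not one of the repo's scanners: the function name `list_direct_entitlement_callers` and the default source root `app/src/main/java` are hypothetical, and it treats "a Kotlin file that references `EntitlementChecker` and contains a `.isPremium(` call" as a candidate gate, which may over- or under-match depending on how the checker is injected.

```bash
# Hypothetical helper (not part of scripts/ or qa/): list Kotlin files that
# reference EntitlementChecker AND contain a `.isPremium(` call site.
# Each printed file is a candidate gate to exercise in all three premium states.
list_direct_entitlement_callers() {
  local src_root="${1:-app/src/main/java}"   # assumed default source root
  local file
  # -r recurse, -l file names only, -F fixed string (no regex interpretation)
  grep -rlF --include='*.kt' 'EntitlementChecker' "$src_root" 2>/dev/null |
    while IFS= read -r file; do
      # `.isPremium(` matches call sites, not the `fun isPremium(` declaration
      if grep -qF '.isPremium(' "$file"; then
        printf '%s\n' "$file"
      fi
    done | sort
}
```

Run it from the project root as `list_direct_entitlement_callers <source-root>` and compare the output with the gated-files list above; a file that appears here but not in that list is a gate the fix may have missed.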
- **ENFORCEMENT, not just a checker-usage grep (mandatory — RETROSPECTIVE from A-201, R12).** A feature can carry an `isPremium` **content flag** + a cosmetic `PremiumBadge` with **NO gate at all** — that's exactly how Date Match shipped a premium **bypass** (free users could view/like/match ★Premium date ideas; `getDateIdeas()` returned `DateIdeaSeed.all`, no `CouplePremiumChecker`, badge only). Prior rounds missed it because the audit grepped for `CouplePremiumChecker` *usages* and found the gated features, never noticing the feature that had **no** checker. So every round: (1) **grep for `isPremium` / `PremiumBadge` / premium content flags** (`DateIdea.isPremium`, `category.access=="premium"`, `challenge.isPremium`, …) and for **each** confirm a real enforcement path exists — a `CouplePremiumChecker` filter OR a paywall-on-interaction — **not just a badge**; (2) **actually TRY TO USE the premium content as a free user** (like/open/play it), don't just confirm the lock renders — "badge shows" ≠ "gated". A badge with no enforcement = **premium bypass** (P1+). Inspection lesson: *"shows a Premium badge" is a display fact, not a gate; prove the gate by using the content while free.*

### Pass B — Games lifecycle (MANDATORY: play each game ONE complete time through ALL different play styles of the game)

Games: This or That, How Well Do You Know Me, Desire Sync, Connection Challenges, Memory Lane, Spin the Wheel, + Date Match.

- **PLAY AS THE USER (mandatory mindset for this pass):** drive every game **the way a real user would** — reach it through the actual in-app navigation a person would tap (Play hub → the game's card → its buttons), **not** via deep-links, admin pokes, forced state, or any shortcut a user doesn't have. **Expect what the user expects:** if a tap/button/flow doesn't do the obvious thing, or a screen doesn't behave the way a normal user would assume, **that itself is a finding** — log it.
- **When something doesn't work: REPORT FIRST, then a minimal workaround (in that order).** Do **not** silently engineer around breakage by taking extra steps the user wouldn't take. The moment the natural user path fails: (1) **log the issue** in `ClaudeReport.md` with severity + the exact user action that failed and what was expected; (2) **only then** apply the smallest workaround needed to keep the pass moving. The workaround **never replaces** the report — a flow that needs a workaround to proceed is, by definition, broken and must be filed to fix. If a workaround is impossible, mark the game `fail→` (blocked) and continue with the next. - **A launch/crash check is NOT sufficient. Each game MUST be played one full way through, end-to-end, on BOTH devices** — start → answer/interact through **every** step/round/question on each device → reach the **finish/reveal/results** screen → confirm the result renders correctly for both partners. Verify each intermediate screen and interaction works (selections register, progress advances, both-answered gating, reveal/scoring/summary correct). Premium games (Desire Sync, Memory Lane) need a premium toggle to play. - The session lifecycle is exercised by the real playthrough: `status` active→completed; reveal/results correct on both. - **GAME JOIN PATHS (mandatory — the second partner must JOIN, not just co-play):** the starter begins from real in-app nav; the joiner then enters from **every** user-facing entry point — notification tap, Play-hub active state, Home active-game card, Today prompt, waiting-room/resume screen, in-app foreground banner, game history/replay, and (after the natural paths) deep-link/crafted intent + cold-start from a push. A game isn't complete unless **both** partners can **start, join, resume, finish, reopen results, and recover from a stale/ended session** — with no duplicate sessions, wrong routes, stuck waiting screens, broken back nav, or premium-gate mistakes. 
- **FIRST-FINISHER → WAITING-PARTNER NOTIFICATION (mandatory state — async games):** explicitly exercise the asymmetric state where **one partner finishes their part and the OTHER is idle/away**. The waiting partner MUST get a "your turn to play" nudge (`partner_completed_part` via `onGamePartFinished`) the moment the first finishes — async games (this_or_that / wheel / how_well / desire_sync) only flip to `completed` (→ `partner_finished_game`) once BOTH answer, so without the first-finish nudge the waiting partner is told nothing. Verify the **idle partner** (on Home, or backgrounded/killed) actually receives + can tap into the game. (This state was missed for a long time precisely because QA always played both sides through; "one finishes, the other never played" is its own required angle.) - **VARY THE STYLE OF PLAY (don't just repeat the happy path):** across runs, deliberately exercise *different* ways a real couple would play each game, because different inputs hit different code paths: - **Different DEPTHS and QUESTION COUNTS — cover the matrix, don't settle for one combo:** play each game across **every depth/mood** (Light, Everyday, Deep, All-topics/shuffle) AND **every round length / number of questions** (5 / 10 / 15), in *different pairings* across runs (e.g. Light×5, Deep×15, Everyday×10, All×5) — short *and* long sessions, shallow *and* deep content. Different depths surface different question sets, tones, and edge content (e.g. Deep/Desire-Sync sensitive prompts); different counts stress pacing, progress, and the both-answered gate. Also exercise **each distinct answer type** (A/B, Yes/No, True/False, 1–5 scale, multi-select, free-text). - **Different answer *patterns* that change the result** — all-match vs all-mismatch vs partial; both-yes vs both-no vs split (so reveals show "shared", "all private", "0 matches", "perfect/zero score" — verify each renders right). 
- **Different turn orders / who-starts** — partner A starts vs partner B starts; the guesser opens before vs after the subject finishes; both open simultaneously (race); one device much slower than the other. - **Different exit/resume styles** — finish normally; quit mid-game; background mid-game then resume; cold-kill mid-game then reopen; "End their game"; re-open a completed session for the replay/results; play two games back-to-back, and a *different* game type immediately after. - **⛔ VERIFY QUIT/ABANDON ACTUALLY ENDS THE SESSION (server-side, by admin read — RETROSPECTIVE from B-ABANDON-001).** "Quit" / "End their game" navigating away is **not** proof the session ended — the abandon write is best-effort and **swallowed** (`runCatching{…}.onFailure{ Log.d }`), so a `PERMISSION_DENIED` looks like success in the UI. After any quit/abandon, confirm the session is actually `completed` via an admin read (**0 active sessions**) AND that a *new/ different* game can then be started; watch `logcat` for `PERMISSION_DENIED` on the `sessions/{id}` doc during the quit. A session that "won't clear" between rounds is a **bug to root-cause, not a test-data nuisance** — B-ABANDON-001 (the full-`saveSession` `doc.set()` dropping server-only flags → rule rejects the removed `affectedKeys`) hid for several rounds precisely because it was dismissed as cleanup difficulty. See the manual's [B-ABANDON-001 landmine](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes). - **Edge inputs** — submit with nothing selected (should be blocked), rapid double-taps on answer/confirm/next, spamming the start button, tapping during the reveal animation, switching tabs mid-game, receiving/tapping a notification mid-game. None should crash, duplicate, or desync. - Edges: re-open a completed session, leave mid-game (resume), no stuck session, no crash, logcat clean. - Game start/finish pushes (`onGameSessionUpdate`) exercised here; full delivery/deep-link audit in **Pass E**. 
- **Media permissions** (CAMERA, RECORD_AUDIO): granted works, denied degrades gracefully.
- **Done = every game has one verified complete playthrough** (a launch-only "opens, no crash" row is `partial`, not `pass`). Coverage row format: `game × starter × join-entry × premium-state × depth/count × lifecycle-edge × result`; only `pass` when start/join/play/finish/reopen/recover are all verified.

### ⛔ Pass C — Visual pass, light + dark, ALL screens (MANDATORY: run scan BEFORE sweep)

> **⛔ CLAUDE: Run the automated theme scan (below, Automated Tier 1) before starting the visual sweep.
> Read the output at `/tmp/claude-theme-scan-.md` and file findings to ClaudeReport.md first.
> The sweep must verify every flagged screen in BOTH themes.**

Every route in `core/navigation/AppRoute.kt` (~50), in **both** modes: text contrast/readability (no invisible/low-contrast), no clipping/overflow/ellipsis breakage, icons visible, backgrounds adapt, controls legible.

Groups: auth/onboarding/pairing (fresh acct); Home (solo + paired); Play + every game; Today + reveal/history; Messages (inbox + conversation); Packs; Dates (Match/Builder/Matches/Bucket List); Wheel (picker/session/complete/history); Settings + all sub-pages (Account, Notifications, Appearance, Privacy, Subscription, Relationship, Security, Delete Account); Paywall; Your Progress/Activity; Recovery.

- **Images must belong to the screen:** during the UI sweep, visually inspect every illustration, glyph, banner, empty-state image, pack art, celebration asset, and dark/light variant in context. It should feel intentionally integrated with the page hierarchy, copy, spacing, and action area — not like a forgotten placeholder dropped into an empty slot. Check crop, scale, padding, alignment, corner radius, background/tile treatment, theme variant, **edge treatment**, loading/fallback state, and whether the image competes with or clarifies the primary task.
If it is broken, clipped, low-contrast, off-brand, stale, or placeholder-looking, file a bug in `ClaudeReport.md`; if the screen works but would benefit from new/better art, log the prompt need in `ClaudeBrandingReview.md`.
- **SOFT EDGES — art must fade into the screen, not show a hard tile edge (mandatory):** every displayed illustration should **blend/feather softly into the background**, not sit as a hard-edged rounded rectangle/card with a visible boundary or border line. Inspect each illustration's edges against the screen on **both themes** — a crisp tile edge, outline/border, or a pale block floating on the surface is a finding (C-ART-EDGE-001). (**Fixed R11:** `BrandIllustration` now feathers its 4 edges to transparent via `Modifier.graphicsLayer{compositingStrategy=Offscreen}` + `drawWithContent` `BlendMode.DstIn` linear gradients — `clip`+`border` removed — and `EmptyState` routes its illustration through `BrandIllustration`, so all tiled art melts into the surface. Recurring check: verify it still holds and that any NEW art helper / direct `painterResource` tile also feathers.) Fix pattern (if it regresses): feather the edges to transparent, or a vignette matching the surface, or ship transparent-edged art — applied in the shared `BrandIllustration`/`EmptyState` helpers so it's consistent everywhere.
- **⛔ CLAUDE — RUN THE AUTOMATED THEME SCAN FIRST (MANDATORY, BEFORE THE VISUAL SWEEP):** Do NOT start the manual visual sweep until the automated scan has completed and you have reviewed its results. The scanner is `scripts/theme-scan.sh`. Run it from the project root and save the report:

  ```bash
  cd /home/kaspa/.openclaw/Projects/relationship-app
  # Write the report to a dated file, then read it before the sweep
  ./scripts/theme-scan.sh > /tmp/claude-theme-scan-$(date +%Y%m%d).md
  cat /tmp/claude-theme-scan-$(date +%Y%m%d).md
  ```

  The script reports findings by severity and ends with a `## Summary` section showing the exact counts.
Record those counts in `ClaudeQACoverage.md` under Pass C **before** starting the visual sweep. - **🔴 CRITICAL** — container/surface/background set to a hardcoded color. Will produce visible light/dark mismatches. Example: `Surface(color = Color.White)` inside a dialog in dark mode. - **🟠 MAJOR** — component color overrides or direct `painterResource` that bypasses `BrandIllustration`. Likely to break theme adaptation or decoupled-theme art. - **🟡 REVIEW** — hardcoded text/icon/border/gradient colors that may be correct on a branded container but must be verified in both themes. **⛔ CLAUDE: You are explicitly allowed to improve `scripts/theme-scan.sh` and this Pass C methodology whenever you discover a new light/dark failure mode.** Examples: new Compose patterns that evade the current grep, a new color token that should be checked, better false-positive filtering, or converting the output to JSON/CSV. Keep the script runnable from the project root and update the script header with what changed. Do not remove existing patterns unless they are provably wrong. **After running the scan:** read the report, file all CRITICAL and MAJOR findings to `ClaudeReport.md` as Pass C theme defects, then proceed to the manual visual sweep. Any screen flagged CRITICAL/MAJOR must be verified in BOTH themes during the sweep. If you fix hardcoded colors during the QA round, re-run the scan to confirm they are gone. **Tier 2 — Theme definition validation:** `scripts/theme-scan.sh` also validates that `darkColors` in `Theme.kt` has every required Material3 slot explicitly defined. If a slot is missing, log it to `ClaudeReport.md` as a P2 theme defect. 
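To make the Tier 2 idea concrete, here is a minimal sketch of how a missing-slot check could work. It is not the implementation inside `scripts/theme-scan.sh`: the function names `extract_dark_block` and `missing_dark_slots` are hypothetical, and it assumes `darkColors` is assigned from one constructor-style call whose slots are written as named arguments (`slot = ...`). The list of required slots is supplied by the caller.

```bash
# Hypothetical sketch: print the call that is assigned to `darkColors`,
# from its opening line through the matching closing parenthesis.
extract_dark_block() {
  awk '
    !inside && /darkColors[^=]*=.*\(/ { inside = 1 }
    inside {
      print
      opens  = gsub(/\(/, "(")    # gsub returns how many "(" were on the line
      closes = gsub(/\)/, ")")
      depth += opens - closes
      if (depth <= 0) exit         # the opening call has been closed
    }
  ' "$1"
}

# Print every requested slot that is NOT set as `slot = ...` inside that block.
# Usage: missing_dark_slots <theme-file> <slot> [<slot> ...]
missing_dark_slots() {
  local theme_file="$1"
  shift
  local block slot
  block="$(extract_dark_block "$theme_file")"
  for slot in "$@"; do
    # Require a non-letter before the name so `primary` does not match `onPrimary`
    if ! printf '%s\n' "$block" | grep -Eq "(^|[^A-Za-z])${slot}[[:space:]]*="; then
      printf '%s\n' "$slot"
    fi
  done
}
```

A call such as `missing_dark_slots path/to/Theme.kt primary onPrimary background onBackground surface onSurface` (placeholder path, slot names taken from Material3's `ColorScheme` parameters) prints one line per undefined slot; empty output means every listed slot is set in the dark scheme. Scoping the search to the `darkColors` block matters because a slot defined only in the light scheme would otherwise hide the gap.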
**Tier 3 — Compose screenshot diff suite (endgame, not yet implemented):** The true "catch everything" solution is an automated screenshot comparison pipeline that renders every route in light mode, renders the same route in dark mode, and pixel-diffs them — flagging any screen where the dark version has white backgrounds, invisible text, or wrong-variant art. This catches compositional and gradient-based mismatches that static analysis cannot. When implemented, use `Paparazzi`, `Shot`, or `Roborazzi` with a custom `darkTheme = true` test parameter for each route. Log this to `Future.md` as "Tier 3: Compose screenshot diff for visual regression".
- **THEME-VARIANT ART must follow the IN-APP theme, not just the system (mandatory — RUN THE DECOUPLED STATE):** the app has its own theme toggle (Settings → Appearance → Light/Dark/Device) that swaps Compose colors but does **not** change the Android config `uiMode`, while `-night` drawables (`drawable-night-nodpi/`) and `painterResource` resolve off the **system** `uiMode`. So art can mismatch the UI when the two disagree. **Test the decoupled state explicitly, every round:** force system light then set the app to **Dark**, and force system dark then set the app to **Light**, and on every screen that has a dark art variant confirm the illustration matches the **in-app** theme (no bright/light tile on a dark screen, no dark tile on a light screen). Commands: `adb -s <serial> shell cmd uimode night no` (system light) / `… night yes` (system dark); then toggle the in-app theme in Appearance. Screens with `-night` variants to check: Security (privacy_recovery), Memory Lane, Bucket List, Answer History, Date Match (empty + success), Connection Challenges header, Pairing success, Messages empty, Past Games, Quiet-hours, Account-deletion, + any new `illustration_*` added to `drawable-night-nodpi/`.
**Restore `cmd uimode night auto` after.** Light art on a dark screen (or vice-versa) when the in-app theme is switched = bug (P2 theme-not-adapting; see C-DARKART-001). (**Fixed R11:** `CloserTheme` provides `LocalAppInDarkTheme`; `BrandIllustration` loads each drawable through `context.createConfigurationContext(cfg)` whose `UI_MODE_NIGHT_*` is set from `LocalAppInDarkTheme`, so the `-night` variant follows the IN-APP theme, not the system. Verified live R11 both decoupled directions. Recurring check: re-run the decoupled state and confirm it still holds, including any newly added `-night` art.) Fix pattern (if it regresses): drive the resource `uiMode` from the in-app theme as above, or `AppCompatDelegate.setDefaultNightMode`/config override, so `painterResource` picks `-night` per the app's own setting. - **EVERY image needs BOTH a light AND a dark variant matching the theme (mandatory — audit every image-bearing page).** It is not enough that the `-night` mechanism works — the dark asset must **exist**. Go page by page through every screen that shows an illustration / hero / banner / empty-state / pack art and confirm there is a real **dark variant** in `drawable-night-nodpi/` (and a light one in `drawable-nodpi/`) that **matches the in-app theme** — a light/pink image shown on a dark screen (because only the light asset exists) is a **bug**, even with feathered edges. Cross-check each page against the **Image theme-variant coverage** table in `ClaudeBrandingReview.md`; a missing variant is filed as a bug in `ClaudeReport.md` **and** the dark/light asset to create is logged in `ClaudeBrandingReview.md` as a prompt to be made. (2026-06-27 audit found all `illustration_couple_*` heroes, `daily_question`, `partner_activation`, `tonight_partner_prompt`, `together_empty`, and all 10 `pack_art_*` are **light-only** — these still need dark variants.) Only genuinely theme-agnostic transparent/celebration art is exempt, and only after you **verify** it reads on both. 
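
The page-by-page variant audit above can be pre-seeded with a quick filesystem check. The following is a minimal sketch, not part of the existing scanners: it assumes the standard Android resource root `app/src/main/res` (pass a different path if the project differs), and the function name `missing_night_variants` is hypothetical. It prints every resource name found in `drawable-nodpi/` that has no same-named file in `drawable-night-nodpi/`.

```bash
# Sketch: list light-only art (resource names with no dark counterpart).
# Assumption: res root defaults to app/src/main/res; override with $1.
missing_night_variants() {
  local res_dir="${1:-app/src/main/res}"
  local light_dir="$res_dir/drawable-nodpi"
  local night_dir="$res_dir/drawable-night-nodpi"
  local f name stem
  for f in "$light_dir"/*; do
    [ -f "$f" ] || continue              # skip when the glob matches nothing
    name="$(basename "$f")"
    stem="${name%.*}"                    # Android matches resources by name, not extension
    # Any file with the same stem in the night folder counts as the dark variant.
    ls "$night_dir/$stem".* >/dev/null 2>&1 || printf '%s\n' "$stem"
  done
}

# Usage (from the project root):
#   missing_night_variants
#   missing_night_variants path/to/res
```

Each name it prints is only a candidate: genuinely theme-agnostic art is exempt once verified on both themes, and a file that exists can still be the wrong variant, so the visual check remains the verdict.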
- **EVERY icon/glyph must be a CUSTOM Closer glyph — no generic Material icons, no generic hearts (mandatory).** On every screen, inspect each icon: any generic Material icon (`Icons.Filled.*`/`Icons.AutoMirrored.*`/`Icons.Default.*` — ArrowBack, Favorite/FavoriteBorder, Person, Lock, Star, PlayArrow, Check, Close, Send, …) is a **placeholder, not brand** → a finding. File it as a brand defect in `ClaudeReport.md` and log the **custom `glyph_*` to make** in `ClaudeBrandingReview.md` (see its **Icon/glyph audit**). Reflex grep to find them: `grep -rE "Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)\." app/src/main/java/app/closer/` — every hit is a generic icon that needs a bespoke Closer glyph (`ImageVector.vectorResource(R.drawable.glyph_*)` + `Icon(tint=…)`). (2026-06-27 audit: ~60 distinct Material icons across ~201 call sites still to replace.) - **States, not just happy path:** empty / loading / error / not-paired / locked-premium / signed-out / stale-or-deleted-target / populated-with-many where they exist; many need data setup (seeding is user-gated) — note unreachable states in coverage rather than skipping silently. - **Text/data stress:** test long names, long relationship labels, long question/answer text, emoji, multiline content, empty optional fields, many list items, and both partners having similar names. Verify no clipping, overlap, confusing attribution, broken sorting, or hidden actions. - **Readability at scale:** default font size + spot-check largest system font scale on text-heavy screens. (The full accessibility sweep — large-font on every primary flow, TalkBack labels, touch targets, keyboard, reduce-motion — is **Pass J**; per-route performance/jank is **Pass I**.) 
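
To turn the reflex grep for generic Material icons into a replacement backlog, it can help to rank the hits by call-site count. This is a hedged sketch built on the same pattern as the grep above; the function name `material_icon_counts` is hypothetical, and the default path is the one the playbook already uses.

```bash
# Sketch: rank generic Material icons by number of call sites.
# Prints "<count> <icon reference>" lines, most-used first.
material_icon_counts() {
  local src_dir="${1:-app/src/main/java/app/closer}"
  # The optional middle group covers nested styles such as Icons.AutoMirrored.Filled.X.
  grep -rhoE 'Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)(\.(Filled|Outlined|Rounded))?\.[A-Za-z0-9_]+' "$src_dir" \
    | sort | uniq -c | sort -rn
}

# Usage (from the project root):
#   material_icon_counts | head -20
```

The number of output lines approximates the count of distinct icons, and the sum of the first column approximates the call sites, which gives a comparable figure for the audit totals logged in `ClaudeBrandingReview.md`.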
- **Orientation / form-factor (the app is NOT portrait-locked — `AndroidManifest.xml` declares no `screenOrientation`, so it DOES rotate to landscape).** Don't only check "rotation doesn't lose state" (that's Pass F) — verify the **landscape layout actually renders correctly** on the text-heavy / game / paywall / dialog screens (`adb shell settings put system accelerometer_rotation 1` then rotate, or use the emulator rotate control): no clipped/cut-off content, no broken scrolling, dialogs and bottom CTAs still reachable. Spot-check a **large-screen / tablet** AVD too. If landscape is not a supported experience, the correct fix is to **lock portrait in the manifest** — file that as the finding (see the app-finding note) rather than shipping an unverified landscape layout. - **Navigation from every entry point:** reach each screen from **all** the places that link to it and confirm it opens correctly each time — e.g. a conversation from the inbox AND from "Discuss" AND from a notification; a game from the Play hub AND from a notification; Paywall from each gated feature; Settings sub-pages; reveal from Today AND from history AND from `partner_answered`. A screen that works from one entry but breaks/duplicates from another = bug. - **Every link, CTA, and mission must prove its destination:** actively hunt for dead buttons, wrong targets, generic Home fallbacks, no-op taps, stale routes, and confusing affordances. Example class: a Reveal card saying **"Tiny Mission: Send one flirty text"** must open the relevant Messages/conversation flow, not do nothing. For every button/card/chip/row, record the expected destination before tapping, then verify the actual destination, state, payload, and back stack. Broken/no-op/wrong-destination CTA = bug (usually P2; P1 if it blocks a core flow). - **All routes into a game / join-game state (verify each opens the correct game + session + partner-state + mode + premium/couple-entitlement + back stack):** Play-hub cards (incl. 
premium-gated), active-session banners, Home/Today game prompts, game history, replay/results, waiting screens, notification-opened screens, in-app banners, "join/resume/continue/view results/end (their) game", deep-link/crafted intent, and bottom-tab return into an active game. Wrong/duplicate destination, double-back, stale-session join, dead-end, or a route that bypasses the premium/couple check = bug. - **TAKE EVERY AVENUE (exhaustive nav fuzzing — actively hunt for nav bugs, don't just walk the happy path):** treat navigation as something to *break*. On every screen, **tap every interactive element** — each button, card, row, icon, chip, link, tab, header back-arrow, system back, and any "see all / history / edit / manage" affordance — and follow where it goes. Then try the *combinations and sequences* a curious user hits: - **Every order:** switch bottom tabs in many orders, mid-flow (open a game, jump to Messages, come back); enter a deep screen then tab away then back; open A→B→C then back-back-back. - **Rapid / repeated input:** double- and triple-tap navigation targets (especially "open game", "Play now", "Create/Start session", notification taps) to surface double-push/duplicate-screen/stale-route bugs (cf. B-004). - **Interrupt mid-navigation:** background/rotate/lock during a transition; tap a notification while already on that screen, on a different screen, and while logged-out/unpaired; cold-start straight onto a deep link. - **Dead-ends & traps:** from *every* screen confirm there's always a way out (back/close/home) — no screen that strands the user, needs two backs, exits the app unexpectedly, loops, or lands blank. Re-check the asymmetric-game waiting screens, replay/results screens, and paywall specifically. - Log **every** wrong/duplicate/dead destination with the exact tap sequence to reproduce. Wrong/double-back or dead-end = **P2** (P1 if it traps the user or loses their progress). 
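
For the rapid / repeated input checks above, a small helper can fire several taps at the same coordinates in one `adb shell` call. This is a sketch only: `rapid_tap` is a hypothetical name, the coordinates are whatever you read off the screen under test, and the `ADB` variable is an assumption added so the adb binary can be swapped out.

```bash
# Sketch: send N taps to the same point in a single adb shell invocation.
# Args: <serial> <x> <y> [count, default 2]
rapid_tap() {
  local serial="$1" x="$2" y="$3" count="${4:-2}" cmd="" i=0
  while [ "$i" -lt "$count" ]; do
    cmd="${cmd}input tap $x $y; "        # queue one tap per iteration
    i=$((i + 1))
  done
  # One round trip to the device keeps the gap between taps as small as adb allows.
  "${ADB:-adb}" -s "$serial" shell "$cmd"
}

# Usage:
#   rapid_tap emulator-5554 540 1200      # double-tap
#   rapid_tap emulator-5554 540 1200 3    # triple-tap
```

Each `input tap` still starts its own process on the device, so the taps are probably slower than a real finger double-tap. Treat a clean run as weak evidence and repeat the check by hand on the targets that matter most.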
- **Back-stack / "double back":** from every entry point, **system back AND the in-app back arrow** return to the correct previous screen — no dead-ends, no exiting the app unexpectedly, and **no screen that requires pressing back twice** (duplicate/stacked destinations on the back stack = bug). Bottom-tab reselection and deep-link/notification entries must land with a sane back stack (back → Home, not off the app or a blank screen). Wrong/double back or a dead-end = **P2** (P1 if it traps the user).
- **UI consistency / polish defects:** compare each screen against sibling patterns in the same area and across the app. Headers, labels, status chips, partner names, connected-state copy, spacing, card treatments, and button hierarchy should feel intentional and consistent. Awkward or out-of-place UI such as a Settings relationship row where **"Connected with ..."** looks visually odd, cramped, misaligned, or unlike the rest of Settings is a finding: file as a bug if it looks broken/inconsistent; log to `Future.md` only if it is purely a product/content improvement.

### Pass D — Security & encryption (cornerstone; findings default to P0)

> Read first: manual's [E2EE model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) ·
> [Firestore rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) ·
> [Encryption versions](docs/Engineering_Reference_Manual.md#encryption-versions). The cornerstone: every private field
> is ciphertext at rest, rules hold against non-members, keys/recovery are sound. **D3 (live negative raw-API) is
> MANDATORY every round** — never deferred to "only 2 emulators" (mint a non-member token via admin → Identity Toolkit
> `signInWithCustomToken` → Firestore REST).
Run all of D1–D7: - **D1 At-rest coverage:** admin-read RAW docs/objects, assert ciphertext for every private type — chat text + `lastMessagePreview` (`enc:v1:`), chat media bytes (Tink `01 69 59 51 f0…`), answers (`sealed:v1:`/`enc:v1:`), date plans + `date_swipes`, **date reflections** (`date_reflections/{dateId}/answers/{uid}/secure/payload` = `enc:v1:`; the `date_history` metadata doc is intentionally plaintext title/category/timestamp — no private words), Memory Lane capsules, Bucket List. Also: **wrappedCoupleKey** + recovery material never plaintext; **invite code (KDF seed) never stored raw**; **no push payload carries private content**. - **Date-reflection "private until both" gate + edit-seal (R25):** before you reflect, a D3 raw-API read of your partner's `secure/payload` must be **denied**; after you reflect, it's allowed. And the author's edit-before-reveal (secure `update`) must be **denied once the partner has reflected** (the seal holds) — verify both live via the raw-API angle, not just the UI. - **D2 Rules audit (static):** member-only reads, author/server-only writes, ciphertext enforced on every private field, immutability, **no premium self-grant**, entitlements write:false; re-audit conversations/typing/reactions + entitlement partner-read; **no catch-all** `match /{document=**}`; list/query not enumerable; `get()`-rules don't over-expose; **no legacy plaintext/downgrade path** (`coupleEncryptionEnabled` holds; no disabled-encryption branch). - **D3 Negative access tests (EXECUTE LIVE via raw API — do not defer):** a **non-member** account is *denied* reading messages/answers/dates/entitlements/sessions/capsules, writing plaintext to encrypted fields, self-granting premium, and any cross-couple access. 
Run it the **raw-API angle**: mint a non-member ID token (admin custom token → Identity Toolkit `signInWithCustomToken` REST) and issue Firestore REST GET/PATCH against the couple's docs — expect App Check `403` or rules `PERMISSION_DENIED` on every attempt. Also issue the **same** reads with a **member** token to characterize the enforcement layer (App Check vs rules). Any unauthorized `200` with couple data = **P0**. - **D4 Key exchange / management / recovery (E2EE crux):** couple key client-generated, only leaves device **wrapped** (KDF from invite seed; server holds only `wrappedCoupleKey`+`kdfSalt`/`kdfParams`+`encryptedRecoveryPhrase`); **KDF strength**; Tink AEAD = AES-GCM/256 with **AAD=coupleId**, no weak/custom crypto/nonce reuse; keybox/sealed/commitment integrity; **recovery-wrap server-blind**; **unpair revokes decrypt**; invites CSPRNG + single-use + expiry. - **NEW-DEVICE / LOST-PHONE RECOVERY — drive it end-to-end, don't just verify the phrase is revealed (the make-or-break data-continuity path for an E2E app).** The keys are single-device (a known limitation); the recovery phrase is the only bridge. Infra: `crypto/RecoveryKeyManager.kt`, `data/local/RecoveryPhraseStore.kt`, `ui/pairing/RecoveryViewModel.kt`, `crypto/CoupleKeyStore.kt`. Exercise the **full flow on a fresh install / second device**: sign in → enter the recovery phrase → the couple key is rebuilt → **prior `enc:v1:`/`sealed:v1:` messages and answers actually DECRYPT and render** (not just new ones). Then the failure paths: a **wrong/typo'd phrase** fails gracefully (clear error, no crash, no corruption); a user who **lost the phrase** is told honestly what is/isn't recoverable; and throughout, the **partner's** device keeps working (one side recovering must never break the other). Confirm the server stayed blind (only `wrappedCoupleKey`/`encryptedRecoveryPhrase` ever transit — verify via admin read). Without this, "I got a new phone" silently loses the relationship history. 
(Also exercised from the account-lifecycle angle in Pass F and the Settings → Security flow in Pass M.) - **CONVERSATION BACKUP + FULL PARTNER-ASSISTED RESTORE (R24) — server-blind + the OOB-code gate.** Send messages → a backup accrues (`couples/{id}/backup/manifest` + `.../chunks/{seq}` — admin-read shows ONLY `enc:v1:` payloads; snapshot blob at Storage `users/{uid}/backups/{id}` is ciphertext). **Self-restore:** on a device with the couple key, "restore" repopulates the local cache; admin confirms the server held only ciphertext. **Full partner-assist (no phrase) — the headline:** simulate device loss WITHOUT `pm clear` (clear only `couple_crypto_secure` + `user_key_secure` + `conversation_cache.db` via `run-as`) → recipient A "Ask your partner to restore" → shows a 6-digit code → partner B gets `restore_requested` push → B **types the code** → A's key + content restore, **never entering the phrase**. Admin confirms only `keybox:v1:`/ciphertext on the server. **Negative (rules):** non-member read of backup/restore docs **403**; partner writing a keybox to a non-partner request **403**; creating a restore_request for another uid **403**; post-unpair fulfil **403**. **OOB-code binding:** a mismatched code is **rejected** (B's device refuses to wrap); a swapped pubkey yields a different code. Files: `data/backup/*`, `crypto/CoupleKeyTransfer.kt`, `data/remote/FirestoreBackupDataSource.kt`, `functions/src/backup/onRestoreRequested.ts`. Unit coverage: `CoupleKeyTransferTest` + `BackupCodecTest`. See the Eng Ref Manual **R24-BACKUP** landmine. - **D5 App Check / Functions / secrets:** App Check enforced; callables validate auth+membership; webhook authenticity; admin-only writes rejected from clients; service-account JSONs never committed; no plaintext/secrets in logcat; temp files deleted. 
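
The D3 raw-API angle described above can be sketched with `curl`. This is a minimal illustration, not the project's tooling: the function names and verdict labels are hypothetical, the API key, project ID, and document path are placeholders you must supply, `jq` is assumed to be installed, and the custom token is assumed to have been minted already through the admin SDK.

```bash
# Sketch: exchange an admin-minted custom token for an ID token (Identity Toolkit REST).
# Args: <web-api-key> <custom-token>
d3_id_token() {
  local api_key="$1" custom_token="$2"
  curl -sS -X POST \
    "https://identitytoolkit.googleapis.com/v1/accounts:signInWithCustomToken?key=${api_key}" \
    -H 'Content-Type: application/json' \
    -d "{\"token\":\"${custom_token}\",\"returnSecureToken\":true}" \
    | jq -r '.idToken'
}

# Sketch: GET one document over Firestore REST and print only the HTTP status code.
# Args: <project-id> <document-path> <id-token>
d3_get_status() {
  local project_id="$1" doc_path="$2" id_token="$3"
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${id_token}" \
    "https://firestore.googleapis.com/v1/projects/${project_id}/databases/(default)/documents/${doc_path}"
}

# Sketch: map the status of a NON-MEMBER read to a verdict (labels are illustrative).
d3_verdict() {
  case "$1" in
    401|403) echo "DENIED" ;;                 # App Check or rules refused the read
    200)     echo "P0-UNAUTHORIZED-READ" ;;   # couple data came back to a non-member
    *)       echo "INCONCLUSIVE" ;;           # e.g. wrong path; fix the probe and re-run
  esac
}

# Usage:
#   token="$(d3_id_token "$API_KEY" "$CUSTOM_TOKEN")"
#   d3_verdict "$(d3_get_status "$PROJECT_ID" "couples/$COUPLE_ID" "$token")"
```

A status code alone does not show which layer refused the request, which is why the same reads are repeated with a member token. An `INCONCLUSIVE` result should never be recorded as a pass.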
- **D6 Leak vectors:** no private content in analytics/crash; `allowBackup=false` + backup rules exclude sensitive data; deep links re-check membership; clipboard user-initiated; consider `FLAG_SECURE`; repo scan for committed secrets.
- **D7 Encryption migration:** test the `encryptionVersion` paths (0 plaintext → 1 migrating → 2 strict) on a legacy couple — migration completes without exposing plaintext or losing/garbling old content, and a half-migrated couple is safe (no mixed read failures, no downgrade). This is the riskiest data path for existing users.

### Pass G — Account creation, validation & fake-account abuse (MANDATORY — both the happy path AND the attacks)

Cover **every account-creation avenue a real user takes** and **every fake/abusive creation attempt an attacker would try.** Use throwaway test accounts (sign-out → fresh sign-up; never `pm clear`). Report-first like every pass.
- **Real creation flows (happy path + validation):** sign-up (email/password and any social/anonymous path), profile creation, and pairing — both **create-invite** and **accept-invite** sides. Verify field validation (invalid/empty email, weak/short password, mismatched confirm, name length/emoji/unicode), the **error copy is friendly** (no raw SDK/Firebase error leaking — cf. A-OBS), loading/disabled states, and that a brand-new unpaired account lands on the correct "create or accept invite" home (not a broken/blank or paired view).
- **Duplicate / conflicting creation:** sign up with an **already-registered email** (clear "already in use", no crash, offer sign-in); create a second account while one is signed in; re-run onboarding after completing it; accept an invite while **already paired** (must be rejected cleanly); two devices accepting the **same invite** (single-use — the second must fail gracefully).
- **Fake / malicious creation attempts (security — expect DENY, never crash or leak):** create an account that is **NOT a member** of the test couple and attempt every cross-couple action (read messages/answers/dates/entitlements, write to the couple, self-grant `premium`/`hasPremium`, join/hijack pairing with a guessed/expired/reused invite code) — all must be **denied by rules** (this is the live execution of **D3**). Probe **invite-code abuse**: replay a used code, use an expired code, brute-force/guess attempts (CSPRNG entropy + single-use + expiry must hold). Probe **App Check**: a request without a valid token is rejected. Confirm a malformed/forged sign-up can't bypass profile or membership requirements. **Any successful unauthorized create/read/write = P0.**
- **Account lifecycle around creation:** sign-out → sign-in (state restores, no stale couple); **delete account** then re-create with the same email (clean slate, partner notified/unpaired); an unpaired/just-created account tapping a stale notification or deep link is handled gracefully (no crash, sane landing).
- **Done = every creation avenue exercised** (happy + duplicate + malicious) with each attack **denied** and each happy path validated end-to-end; findings filed with exact repro.

### Pass E — Full notification suite, deep-links & join-game navigation (every type, both clients, every app state)

Run the **complete** suite across **both clients** (QA→Sam AND Sam→QA). Each type verified end-to-end: **trigger fires → delivered to the right partner (never self/non-member/ex-partner) → correct channel + copy with no private content → tap opens exactly the right item (loaded, not generic Home/dead-end) → sane back stack → privacy/authz re-checked on open**. No duplicates; rate limiter (20/day, 100/week) doesn't drop legit ones.
- **Notification chunk contract (small chunks, complete coverage):** each chunk owns **one notification type** (or one explicit subchunk of that type, e.g.
`chat_message QA→Sam foreground/source-screen sweep`, then `chat_message Sam→QA background+killed+stale`). Before starting, write the chunk's matrix in `ClaudeQACoverage.md`; after finishing, mark each cell `pass | fail→id | blocked→id | not implemented→Future.md`. A notification type is not complete until all applicable cells below are covered: - **Directions:** QA→Sam and Sam→QA; sender must not receive their own push unless intentionally designed. - **Process states:** foreground, background/warm, killed/cold-start, force-stopped if deliverable, screen locked, and resumed after rotation/process recreation when relevant. - **Current screens:** Home, Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation, Settings/sub-settings, Paywall, unrelated deep screen, logged-out, unpaired, and stale prior-partner context. - **Entry surfaces:** foreground in-app banner/head, Android system tray tap, any push action button, crafted deep-link/intent matching the payload, repeated/double tap, and tap after the target has changed. - **Targets:** fresh target, already-open target, completed target, stale/expired/deleted target, unauthorized target, wrong couple/session/item ID, malformed/missing extras, and no-network-on-open. - **Assertions:** correct recipient, correct channel/priority/copy, no private payload/log content, exact destination, membership/auth/entitlement re-check, no duplicate route/session, sane back stack, logcat clean, and coverage/docs updated before the chunk ends. - **Notification tap crash triage (mandatory):** never conclude "the notification didn't open" from UI behavior alone. Before each notification/deep-link tap, clear or timestamp logcat; after the tap, inspect both devices for `FATAL EXCEPTION`, ANR, ActivityTaskManager errors, `RuntimeException`, navigation/deep-link exceptions, `PERMISSION_DENIED`, and swallowed repository/decryption errors. 
If the app returns Home, stays put, flashes, restarts, or silently fails, classify whether it was wrong routing, missing extras, stale data, permission denial, or a crash. Any notification tap that crashes (example class: tapping a game notification to open **Spin the Wheel**) is a filed bug with stack trace + exact payload/session/game type, not a vague "didn't open" note. - **Test the REAL launch path, not a synthetic one.** `adb am start … --es type=…` does **not** reproduce a real notification tap: the OS notification tap launches the activity through the **SysUILaunch splash handover** (`reportSplashscreenViewShown` → `handOverSplashScreenView`), which `am start` skips. A whole bug class (e.g. the **splash-exit `provider.iconView` NPE** — the handover delivers a splash view with **no icon**, `SplashScreenView: Icon: view: null`, on notification cold-starts only) crashes onCreate → "Force finishing activity" → the app **opens-and-closes**, yet `am start` AND the normal launcher icon both pass. Verdict: for cold-start/notification routing, a synthetic-intent pass is **not** a pass — confirm with a real push tapped from the shade on an `am kill`'d app. - **"Opens and closes / flashes / returns to launcher" ⇒ assume a crash; pull the stack FIRST.** `logcat -c` before the tap, then grep `FATAL EXCEPTION|AndroidRuntime|Force finishing|getIconView`. A real repro + the stack trace beats code-reasoning every time (this bug was misdiagnosed as deep-link routing until the live stack named `MainActivity.kt` + `SplashScreenViewProvider.getIconView`). Confirm crashes reach **Crashlytics** so field cold-start crashes surface. - **Many notification types "broken" at once ⇒ suspect the SHARED entry path (splash/`onCreate`/launch), not each handler.** When chat AND every game's results push all fail identically, the bug is in what they share (the cold-start path), not per-type routing. 
Re-run a **cold-start smoke after ANY change to** `MainActivity` / splash / theme / manifest / launchMode / branding-"loading state" commits — these cosmetic-looking changes broke the launch. - **For "worked before, broken now": `git blame` / `git log -L` the crashing line/function** to pin the introducing commit, then re-test that exact path on it. - **Both-client × app-state matrix (per type):** QA→Sam and Sam→QA, each in **foreground / background / killed (cold-start)**, plus **already on the target screen**, **on a different screen**, **logged out**, **unpaired**, with a **stale/expired/completed/deleted target**, and **both users opening around the same time**. Not a `pass` unless it works from both clients in every state that applies. - **Current-screen/source-screen matrix (per type):** do not test notifications only from Home or only from a clean launch. For each notification type, vary where the receiving client is when the notification arrives/taps: **Home, Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation, Settings/sub-settings, Paywall, an unrelated deep screen, app backgrounded from each major tab, and app fully closed/killed**. Foreground banners, system-tray taps, warm-start `onNewIntent`, and cold-start launch must all route to the exact target. A tap that lands on generic Home, stays on the old screen, opens the wrong tab, loses extras, duplicates the destination, or needs a second tap is a bug. - **Permission/token health:** cover Android `POST_NOTIFICATIONS` granted, denied, "don't ask again"/system-disabled, and re-enabled states; Settings notification toggles; sign-out/sign-in token refresh; same account on two devices; partner/account switch; stale token cleanup; app reinstall/update; and notification channel migration. Denied/system disabled notifications should fail gracefully with in-app state still correct, never with lost data or broken routing after permission is restored. 
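
The "pull the stack FIRST" reflex from the crash-triage bullets above can be made mechanical. The following is a sketch under stated assumptions: the function names and verdict labels are hypothetical, and the input is a logcat dump you captured yourself after the tap (for example with `adb -s <serial> logcat -d > file`, following a `logcat -c` before the tap). The grep pattern is the one the playbook already lists.

```bash
# Sketch: print crash-marker lines (with line numbers) from a saved logcat dump.
# Arg: <path-to-logcat-dump>
crash_lines() {
  grep -nE 'FATAL EXCEPTION|AndroidRuntime|Force finishing|getIconView' "$1"
}

# Sketch: one-word verdict for the coverage matrix (labels are illustrative).
tap_verdict() {
  if crash_lines "$1" >/dev/null; then
    echo "CRASH-MARKER-FOUND"    # read the matching lines before filing
  else
    echo "CLEAN"
  fi
}

# Usage:
#   adb -s <serial> logcat -c            # before the tap
#   ...tap the real notification from the shade...
#   adb -s <serial> logcat -d > /tmp/tap.log
#   tap_verdict /tmp/tap.log && crash_lines /tmp/tap.log
```

The `AndroidRuntime` tag may also appear on benign debug lines, for instance when `adb shell` tools start up, so a `CRASH-MARKER-FOUND` verdict means "inspect the matches", not "file immediately". A `CLEAN` verdict only covers the patterns listed; ANRs and `PERMISSION_DENIED` still need their own look.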
- **Doze / battery-optimization / background-restriction delivery (real-device gate — emulators NEVER enter these states, so per-round emulator passes systematically miss the #1 real-world "notifications don't work" cause).** Scheduling is entirely server-side (no client `WorkManager`), so the only thing standing between a fired push and the user is the OS power state. On a **physical device**, verify each push type still delivers when the recipient device is dozing / the app is battery-optimized or "Restricted": `adb shell dumpsys deviceidle force-idle` (then send a real partner action + a scheduled push), app set to **Optimized** then **Restricted** in battery settings, and App Standby buckets. Assert: high-priority FCM (partner actions) wakes the device and delivers; lower-priority/data-only pushes degrade *predictably* (document which); scheduled pushes (daily question, capsule unlock, reminders) still arrive within the expected window. Because our recurring setup is two emulators, keep this as a `blocked→needs-device` row in `ClaudeQACoverage.md` (with the device-matrix gate) rather than silently assuming delivery — and run it before any store push. OEM battery-killers (Xiaomi/Samsung/etc.) are even more aggressive; note them for the device matrix. 
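
For the physical-device gate above, the power-state setup and teardown can be wrapped so the device is never left in forced idle. This is a minimal sketch: the function names are hypothetical, the `ADB` override variable is an assumption added for substitution, and the default package `app.closer` is taken from the monkey command earlier in this playbook.

```bash
# Sketch: put a physical device into forced Doze.
# Arg: <serial>
force_doze() {
  local serial="$1"
  "${ADB:-adb}" -s "$serial" shell dumpsys battery unplug        # simulate being off the charger
  "${ADB:-adb}" -s "$serial" shell dumpsys deviceidle force-idle
}

# Sketch: undo force_doze so later chunks start from a normal power state.
restore_power_state() {
  local serial="$1"
  "${ADB:-adb}" -s "$serial" shell dumpsys deviceidle unforce
  "${ADB:-adb}" -s "$serial" shell dumpsys battery reset
}

# Sketch: move the app into an App Standby bucket.
# Args: <serial> <active|working_set|frequent|rare|restricted> [package]
set_standby_bucket() {
  local serial="$1" bucket="$2" pkg="${3:-app.closer}"
  "${ADB:-adb}" -s "$serial" shell am set-standby-bucket "$pkg" "$bucket"
}

# Usage:
#   force_doze <serial>; ...send a real partner action...; restore_power_state <serial>
#   set_standby_bucket <serial> rare
```

Forced idle is an approximation of how a device dozes overnight, and OEM battery managers are not covered by it, so record which method produced each result in the `blocked→needs-device` row.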
- **Six assertions per notification:** (1) trigger fires correctly — right event, not early, not twice, sender doesn't get their own (unless intended), retry/idempotency doesn't duplicate; (2) delivered to the right person — correct token, old tokens unused after sign-out/account-switch; (3) copy + channel correct — friendly, right channel/ priority, no raw Firebase error/raw IDs, no private content in text/payload/logs/analytics/crash; (4) tap opens the exact destination — specific conversation/session/capsule/match/question/settings/pairing, never blank, never a crash on missing/stale/malformed/unauthorized data, no duplicate/stacked copies, completed→results/replay, expired/deleted→ graceful fallback; (5) back stack sane — back returns sensibly (Home/prev context), no double-back, no unexpected exit/loop/blank; (6) deep-link re-checks auth + couple membership + pairing + entitlement + target ownership + session status + existence — a non-member/logged-out/stale/unpaired open must NOT reach private content and must fail gracefully. - **`qa/qa_push.js` is faithful to the PUSH, not the TRIGGER — assertion #1 needs ≥1 real in-app action per round.** `qa_push.js` sends the FCM via admin (`messaging().send`), so it faithfully reproduces delivery + channel/copy + cold-start launch + tap-routing (use it for the bulk of the type×state matrix and the `entrypoint_smoke.sh` smoke). But it **bypasses the Cloud Function** — no `onMessageWritten`/`onGameSessionUpdate`/`onAnswerWritten`/`createDateMatch` actually ran. So a `qa_push.js`-only round can **never** satisfy assertion #1 (**trigger fires correctly**): a broken or un-deployed trigger (Firestore-path change, deploy regression, rules change) is **invisible** to synthetic pushes. Each round, drive **≥1 real in-app partner action** (send a chat, finish a game, answer the daily Q) and confirm the matching push lands on the partner. 
(UI-automation tip: the chat composer's send button is the rightmost control in the composer row, content-desc `Send`; if `uiautomator` taps mis-fire, verify the action via admin read — the new message/answer doc exists `enc:v1:` — rather than claiming the trigger from a synthetic push.) - **Inventory (type → Cloud-Function trigger → recipient → destination)** — verify each; mark any unimplemented type `not implemented→Future.md` (don't count as pass): `chat_message`(onMessageWritten → partner → conversation; foreground→chat-head bubble) · `partner_started_game`/`partner_finished_game`(onGameSessionUpdate → partner → game/join · results/reveal) · `partner_completed_part`(**onGamePartFinished** → waiting partner → game; fired when the FIRST player finishes an async game so the partner is told "your turn" — async games complete only when BOTH answer, so without this the waiting partner got nothing between first-finish and both-finish) · `join_game`/`game_invite` & `partner_joined_game` (if present → partner/starter → join screen · waiting-room update) · `partner_answered`(onAnswerWritten → partner → reveal) · `game_abandoned`/`game_ended` (if present → partner → safe ended state, not a stuck session) · `daily_question`(assignDailyQuestion)/`daily_question_reminder`/`daily_reminder`(dailyQuestionReminder → Today) · `date_match`(createDateMatch → match) · `date_plan_update` (if present → date plan/builder/match) · `partner_joined`+`invite_created`(acceptInviteCallable → pairing/home) · `partner_left`(onCoupleLeave)/`partner_deleted_account`(onUserDelete → home/relationship settings) · `memory_capsule_unlocked`(scheduled → capsule) & `memory_capsule_created` (if present → Memory Lane/locked capsule) · `challenge_day_ready`(→ Connection Challenges) & `challenge_day_completed` (if present → challenge progress) · `outcome_reminder`(scheduledOutcomesReminder) · `reengagement`(reengagement/gameRetention) · `gentle_reminder`(sendGentleReminderCallable) · 
`thinking_of_you`(**sendThinkingOfYouCallable** ← partner-bubble sheet "💜 Thinking of you" → partner → Home; generic copy, **no name**; rate-limited 10/rolling-24h; **quiet hours suppresses the push but still writes the in-app/Together record**; tapping the push → Home, not a dead-end) · `date_reflection_partner`/`date_reflection_ready`(**onDateReflectionWritten** → partner → the date reflection; "your turn" when one reflects, "ready to reveal" when both; gated `notifPartnerAnswered`+quiet hours) · `date_reflection_opened`(**onDateReflectionRevealed** → partner → "opened your reflection", after both reveal) · `date_logged`(**onDateHistoryCreated** → partner → reflect on the just-logged date) ·   ⚠️ **date deep-link regression guard (R25 — Changes 2 & 3):** for every `date_*` type, tapping must open the **exact date's reflection** in **BOTH background (OS tray) and foreground** — background nearly broke because `MainActivity` dropped `date_id` from the payload (fell back to DATE_MEMORIES); and the in-app **Together-feed** row for these types must route to **DATE_MEMORIES**, not DATE_MATCHES (the old `"date" in type` substring bug). Test the feed row AND the OS notification, not just one. · `restore_requested`(**onRestoreRequested** → partner → the restore-consent screen; high-signal help request, NOT suppressed by the routine partner-activity toggle, only quiet hours) · `spki`(key identity/confirm → security/key screen) · `subscription_entitlement_changed` & `security_recovery` (if present). 
- **Game-notification suite (per game):** A starts from Play hub → B gets the start/join push (if supported) → B taps and lands on the correct join/waiting/active screen → B can join from there → A sees B joined/answered → both finish → finish push opens the exact results/reveal → re-opening the push after completion opens replay/results (not a dead active session) → if A ends/quits, B is notified or shown a graceful ended state → a **stale** game push routes to results/history or a clear expired-session message → simultaneous start/join yields **one** session, neither stuck → premium gate holds (neither-premium push must NOT bypass paywall; either-premium unlocks for both). For each game type, including **Spin the Wheel**, notification taps must be paired with logcat review so crashes are caught even if the visible symptom looks like a no-op or generic Home fallback.
- **Join-game navigation suite:** every entry that leads to joining/resuming a game opens the correct game + session + partner-state + mode + entitlement + back stack — Play-hub card, active-game banner/card, Home active-game card, Today game prompt, notification tap, in-app foreground banner, game history/replay, partner waiting screen, results/reveal, "End their game"/stuck-session recovery, deep-link/crafted intent, cold-start from push, bottom-tab return into an active game, any push action buttons, and any "join/resume/continue/view results/play again". No wrong game type, no accidental stale-session join, no duplicate session on double-tap, back returns correctly.
- **Payload security (P0 on any hit):** inspect raw payload + logs — no plaintext message/answer/capsule/date-plan/bucket-list/swipe content, no raw invite code/seed, no recovery phrase, no wrapped/decrypted key material, no email/name unless intentionally public; payload carries only the minimum routing metadata. Any private content = P0.
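The payload-security check above can be partly mechanized once a raw push `data` map has been captured. This is a rough Python sketch under stated assumptions: `ALLOWED_KEYS`, both regexes and the function name are hypothetical, and the real payload schema is whatever the Cloud Functions emit. It is meant for the `data` map only, since the generic copy in a `notification` block is prose by design.

```python
"""Sketch: flag push data fields that go beyond routing metadata.

Assumptions: ALLOWED_KEYS and the two regexes are illustrative; the real
payload schema is defined by the Cloud Functions, not by this script.
"""
import re

# Hypothetical allowlist of routing-only keys.
ALLOWED_KEYS = {"type", "couple_id", "session_id", "date_id", "game_type"}

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Opaque IDs have no spaces; three or more words in a row looks like prose.
PROSE_RE = re.compile(r"\S+\s+\S+\s+\S+")


def scan_payload(data: dict) -> list:
    """Return human-readable findings; an empty list means nothing flagged."""
    findings = []
    for key, value in data.items():
        text = str(value)
        if key not in ALLOWED_KEYS:
            findings.append(f"unexpected key: {key}")
        if EMAIL_RE.search(text):
            findings.append(f"email-like value in: {key}")
        if PROSE_RE.search(text):
            findings.append(f"free-text value in: {key}")
    return findings
```

An empty result is not a pass on its own; it only narrows what still needs a manual read of the raw payload and logs.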
- **Malformed / stale intents:** fire crafted deep-links with missing/unknown type, missing/wrong target or couple ID, wrong game type, expired/completed/deleted target, unauthorized couple/session, malformed params, duplicate/rapid taps, a push for another user/previous partner, while logged-out/unpaired, while on the target screen, and during a different active game → never crash/leak, always a graceful fallback + sane back stack.
- **Scheduled/time-based:** trigger manually (invoke callable/function or seed the due condition — user-gated).
- **Foundations:** FCM token registration on sign-in (`TokenRegistrar`) + `onNewToken` + token cleanup on sign-out/account-switch; POST_NOTIFICATIONS prompt + denied path; channels (`di/NotificationModule`); deep-link routing (`MainActivity.deepLinkRouteFromIntent` → `AppNavigation`); foreground/background split (`core/notifications/AppMessagingService`); no duplicate local+remote notification.
- **Coverage:** record per row `type × trigger × recipient × app-state × destination × back-stack × privacy × both-client` in ClaudeQACoverage.md; only `pass` when delivery + routing + back-stack + privacy + both-client are all verified. Missed delivery or wrong deep-link = P1; private content in any payload = P0.

### Pass F — Resilience, concurrency, lifecycle & time (cross-cutting; a 2-user realtime app needs these)

- **Concurrency / realtime races (two partners at once):** both answer the daily question simultaneously; both start/join the same game; both swipe a date / react at once; one quits while the other submits; both tap a notification at once; partner acts while you're mid-flow. No lost writes, no stuck state, no duplicate sessions, reveal still correct. (This is where a couples app breaks.)
- **Lifecycle / process death:** background mid-flow + return; force-kill the app and relaunch (Android may kill the process) — state/auth/draft restore sanely; deep-link/notification after process death still loads (verified for chat — extend to all). Rotation/config-change doesn't lose Compose state. Low-memory.
- **Deterministic state-restoration ("Don't keep activities" — do NOT rely only on `am kill`).** `am kill` is non-deterministic; enable **Developer options → Don't keep activities** (`adb shell settings put global always_finish_activities 1`) so every Activity is destroyed on *every* backgrounding (the process itself may survive, so this exercises saved-state restore deterministically; keep the force-kill check above for true process death), then walk each primary flow (sign-up, pairing, a game mid-answer, an unsent message draft, capsule/Date Builder in progress, paywall) and background→return at each step. Assert **no lost** form input, scroll position, draft, in-progress game state, or nav back-stack — i.e. `rememberSaveable`/`SavedStateHandle` actually persist it. Restore with `adb shell settings put global always_finish_activities 0` after.
- **Interruptions mid-flow (the OS or another app steals focus):** incoming phone call, alarm, another app taking the foreground, screen-off/on, **split-screen / multi-window**, and picture-in-picture during a game/answer/message-compose → returning resumes cleanly with no lost state, no crash, no duplicate submit, and audio/camera (voice note, photo) releases + re-acquires sanely.
- **Cold-start launch integrity from EVERY entry point (Pass F OWNS this — it's the shared path no other pass owned, and where the splash-crash hid):** the app must **open AND stay** (no crash, no "opens-and-closes", lands off the launcher) when cold-started from: the **launcher icon**, **each notification type tapped from a killed (`am kill`) app**, a **deep link**, and any widget/quick-action. This is the `MainActivity`/splash/`onCreate`/auth-bootstrap path; a crash here (e.g. splash-exit `iconView` NPE) breaks **all** notifications at once. **Run `qa/entrypoint_smoke.sh` here every round and after any MainActivity/splash/theme/manifest/nav/notification change.** Reproduce via the REAL push tapped from the shade (not `am start`); "opens-and-closes" ⇒ pull the FATAL stack (see Pass E crash-triage).
- **Network resilience:** offline / flaky / airplane mid-action across answers, games, dates (not just chat media) — graceful failure + retry/queue, no crash, no silent data loss, recovery on reconnect.
- **Idempotency / rapid input:** double-tap send/submit, rapid nav, double-start, double-join, repeated paywall-unlock taps — guarded (no double-send, no duplicate session, no crash).
- **Time-dependent behavior:** daily-question rollover (6 PM CST assignment), streak day-boundary + repair window, capsule unlock times, reminder schedules, challenge-day availability, timezone change — test across a date change (manipulate device clock / trigger functions).
- **Account/couple lifecycle:** brand-new (empty) account; unpaired state; pair → unpair → re-pair; partner leaves mid-session; account deletion cascade; same account on two devices; stale notifications after unpair/delete are graceful; invite accepted while already paired is rejected cleanly. No orphaned/broken state.
- **Install/update/migration lifecycle:** fresh install, update over an existing signed-in install, app data retained, Room/DataStore/SharedPreferences migrations, notification channel migration, cached encryption/key material, pending deep links/notifications across update, and version-skew between partners if one device updates first. No sign-out loops, stale build routing, lost local state, broken permissions, or migration crashes. **When local state is lost but Firestore is intact (fresh device / cleared data), already-answered content must reconcile back rather than re-prompt** — see the **R23-DQ-001** daily-question reconcile check in Pass N (a re-answer offered against an immutable `secure/payload` is silent data loss).
- **Crash reporting:** confirm crashes/ANRs are actually captured (Crashlytics) so field issues surface.

### Pass H — Branding & artwork (every screen: could it carry more of the brand? where would art help?)

**Branding review is a MANDATORY part of QA every round** (not an optional polish pass) — its findings + the assets to create are logged in `ClaudeBrandingReview.md`. A consumer-mindset pass focused on **brand presence and delight** AND two hard brand standards. Walk **every screen and surface** and ask: *does this feel like Closer (private, warm, equal, intentional — a ritual for two)? Could brand color, the heart mark, a brand message, or an illustration make it warmer or clearer without clutter?* Output is **artwork descriptions written as ready-to-paste ChatGPT image-generation prompts** — the user generates the images; we only describe them.

- **MANDATE 1 — every image has a light AND a dark variant (theme-matched).** Cross-check every image-bearing page against the **Image theme-variant coverage** table in `ClaudeBrandingReview.md`; a light-only image shown on dark (or vice-versa) is a **bug → `ClaudeReport.md`** and the missing variant is a **prompt to make → `ClaudeBrandingReview.md`**. (Shares the per-page audit with Pass C; H owns producing the prompts + tracking the coverage table.)
- **MANDATE 2 — every icon/glyph is a CUSTOM Closer glyph (no generic Material icons / generic hearts).** Audit all icons in use (`grep -rE "Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)\."`); each generic icon is a brand defect → `ClaudeReport.md` + a **custom `glyph_*` to make → `ClaudeBrandingReview.md`** (the **Icon/glyph audit** table). The bar for ship: **zero generic Material icons** — every icon is bespoke and on-brand.
- **Existing art integration check:** judge the art as part of the whole page, not as a standalone asset. Confirm each image supports the screen's job, aligns with the surrounding typography/actions, has enough breathing room, and uses the right light/dark treatment. Art that looks generic, unfinished, randomly placed, or visually disconnected is a finding even if the bitmap itself is technically valid.
- **Soft edges (art melts into the surface):** illustrations should **fade/feather into the screen background**, not read as a hard-edged tile/card with a crisp boundary or outline. Confirm edge treatment on both themes; a hard tile edge is a finding (C-ART-EDGE-001). Generated art should carry **transparent/feathered edges** (no baked-in rounded-rect block); if rendered, the shared helper should fade the edges to the surface. Record the desired edge treatment in each prompt.
- **First, lock the house style (do this once per round, refresh if the art evolved):** read `docs/brand/visual-identity.md` + `docs/brand/asset-system.md` AND open 2–3 existing illustrations (`illustration_couple_onboarding`, `illustration_reveal_celebration`, `pack_art_*`) to capture the *actual* look. New screens/features since the last brand review must be folded in. Keep the canonical **house-style prompt prefix** + palette in the branding deliverable (`ClaudeBrandingReview.md`) so every prompt reuses it and **all generated art matches the existing artwork.**
- **House style (must hold for every prompt):** flat 2D pastel vector illustration; soft rounded shapes, no harsh outlines, gentle gradients; palette aubergine `#24122F` / deep purple `#56306F` / lavender `#B98AF4` / soft pink `#F7C8E4` / soft lavender `#D9B8FF` / blush white `#FFF8FC`; motifs = two-equal-halves heart, paired/sealed cards, floating hearts + petals, candle/mug/lavender-sprig warmth, moon/quiet-hours, calendar/date-card, capsule; mood = warm, quiet, equal, intentional. Couple figures balanced + inclusive, faces simple. **Never** show readable answer/prompt/message text, invite codes, emails, dating-app clichés, stock photos, alarm/urgency/surveillance imagery.
- **Per screen, decide the brand opportunity** (pick the lightest that fits — don't over-decorate):
  - none needed (already on-brand, or a dense list/form where art would clutter) — say so;
  - **color/typographic** brand touch (palette, heart mark, a rotating privacy message);
  - **small glyph** (brand glyph for a relationship concept — describe it for the glyph set);
  - **hero/empty-state/celebration illustration** (the high-value case → write the full ChatGPT prompt).
- **Each artwork item records:** screen/route · placement (hero / empty / header / card / celebration) · why it helps · filename to match the existing scheme (`illustration_*`, `pack_art_*`, `glyph_*`, `particle_*`) · **the ChatGPT prompt** (house-style prefix + the specific scene) · aspect ratio/size + light/dark behavior. Cross-check the brand doc's "Needed additions" / empty-state list and **mark which already have assets vs still need art** (e.g. Android may still lack illustrations that iOS has).
- **Prioritize** the screens a user feels most: onboarding/pairing, Home, paywall/subscription, reveal/celebration, empty states (no messages/dates/capsules/history), Memory Lane, Connection Challenges, date match, quiet-hours.
- Branding *defects* (mis-colored, clipped, off-brand, low-contrast art) are bugs → `ClaudeReport.md`. Pure "works but could be warmer / a feature idea" → `Future.md` `## QA`. New art to create → `ClaudeBrandingReview.md`.

### Pass I — Performance & route efficiency (jank, redundant reads, caching) [Future.md P14]

Before store polish, profile **every top route** and **every high-cardinality list** for jank, repeated Firestore reads, missing cache use, and slow navigation. Drive each route as a user and instrument reads/frames.
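One cheap way to instrument frames while driving a route is to scan the captured logcat text for Choreographer's skipped-frame warnings. The Python sketch below assumes the log has already been saved to a string; the 30-frame threshold and both function names are hypothetical and should be tuned to whatever the team treats as a reportable stall.

```python
"""Sketch: pull main-thread stall warnings out of captured logcat text.

Assumptions: the 30-frame threshold and the function names are illustrative.
"""
import re

# Matches lines such as "I Choreographer: Skipped 42 frames!  The application ..."
SKIPPED_RE = re.compile(r"Choreographer.*?Skipped (\d+) frames")


def skipped_frames(logcat_text: str) -> list:
    """Return the skipped-frame count of every Choreographer warning, in order."""
    return [int(m.group(1)) for m in SKIPPED_RE.finditer(logcat_text)]


def jank_findings(logcat_text: str, threshold: int = 30) -> list:
    """Keep only the stalls at or above the threshold."""
    return [n for n in skipped_frames(logcat_text) if n >= threshold]
```

Recording the per-route result of `jank_findings` alongside the read count gives the route smoke-test checklist a number to compare round over round.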
- **Frame / jank:** scroll every long list (Messages inbox + conversation, Answer History, Question Packs, Past Games, Wheel History, Bucket List, Date deck, Activity/Progress) and open every top route while watching `adb shell dumpsys gfxinfo app.closer framestats` (or Perfetto / Studio Profiler) — flag dropped/janky frames, slow first frame, and `Choreographer: Skipped N frames` / main-thread stalls in logcat. Transitions/animations stay smooth (~60fps).
- **Redundant Firestore / network reads:** count listeners/gets per screen. Switching bottom tabs and returning must **not** refetch unchanged data; opening a screen twice must not double-read; **snapshot listeners detach on leave** (no leaked/stacked listeners — a 2-user realtime app accumulates these fast). Watch for N+1 reads on lists — e.g. **DateMemories** derives each row's reflection state with per-row `hasReflected` gets; confirm they're cached per `dateId` and not re-fetched for every history-snapshot tick (R25 improvement).
- **Memory leaks (beyond listener leaks):** add **LeakCanary** in the debug build (or take heap dumps) and navigate in→out of every heavy screen (conversation with media, game, image viewer, Memory Lane) repeatedly — flag retained Activities/Composables/bitmaps/Contexts. A leak that grows per navigation = bug (P2; **P1** if it OOMs).
- **StrictMode in debug (catch main-thread I/O + leaked closeables cheaply):** enable a `StrictMode` thread + VM policy in the debug `Application` (`detectDiskReads/Writes/Network`, `detectLeakedClosableObjects`); any violation logged on a primary flow is a finding (disk/network on the main thread → jank/ANR risk).
- **Caching / lazy-load:** static question/category data is cached locally (Room) and not re-fetched each entry; large lists use lazy paging (`LazyColumn`/paging, not load-all); images cached (Coil); offline reads serve from cache.
- **Latency:** measure cold-start-to-interactive (splash→loader→Home) and tab/route transition latency; flag anything perceptibly slow (>~300ms).
- **Deliverable:** a reusable **route smoke-test checklist** (every top route × {load time · jank · read count}), captured as a runnable script so each round re-checks cheaply.
- **Remediation when found:** lazy-load/page large lists; cache local question/category data; dedupe + scope snapshot listeners; skip redundant fetches on tab switches; add skeleton/loading states (cf. Future.md P8) over blocking spinners.
- Findings: real jank/leak/redundant-read = bug → `ClaudeReport.md` (P2; **P1** if it ANRs or leaks listeners, **P0** if it drops data); "could be smoother / add skeletons" → `Future.md` `## QA`.

### Pass J — Accessibility (font scale · contrast · screen reader · targets · keyboard · reduce-motion) [Future.md P15]

Every **primary flow** must be usable with accessibility settings on. Enable each setting and walk the core flows (auth, onboarding, pairing, Home, a full game, daily question + reveal, Messages, Paywall, Settings) end to end. This is the deep home for a11y; the Pass C contrast/font spot-checks feed into it.

- **⛔ Keyboard / IME overlap (run `scripts/ime-scan.sh` FIRST — it must PASS):** the app is edge-to-edge (`adjustResize` doesn't resize the window), so a text-input screen missing `imePadding()`/`safeDrawingPadding()` lets the soft keyboard **cover the fields** — the exact "you can't type in Date Reflection" bug (R25). The scanner flags any text-input file lacking IME handling (allowlisting components whose host handles it); a MISSING hit is a bug. Then **live-verify per input screen**: focus each field with the keyboard open and confirm the focused field stays visible and typable (don't assume — the daily flow is choice-only, so it never exercises this). Input screens: auth (login/signup/forgot), onboarding/profile, pairing/invite/recovery, Messages conversation, Bucket List, Date Builder, **Date Reflection**, Change/Delete/Edit in Settings, Wheel.
- **Free-text length + truncation policy (R25 UI review):** every free-text input is bounded **at entry in its ViewModel** — the caps are centralized in `ui/components/TextInputLimits.kt` (`MESSAGE` 2000 · `DISCUSSION_MESSAGE` 500 · `WRITTEN_ANSWER` 2000; the conversation / discussion / question-detail / question-thread / wheel VMs alias those, and chat/discussion/wheel/written-answer also `.trim()` on send). Content is bounded on the way IN, **never truncated at display** — so the rule is **ellipsize chrome (TopAppBar titles, one-line labels/rows, pills, counts), never content or errors.** A `maxLines`+`TextOverflow.Ellipsis` on a message/answer bubble, a question, or an error string is a bug (it silently hides what a partner wrote). The shared written-answer field surfaces a character counter only within `TextInputLimits.COUNTER_THRESHOLD` of the cap.
- **Font scaling:** `adb shell settings put system font_scale 1.3` (then 1.5, 2.0) — every primary flow stays usable: **no clipped/overlapping text, no cut-off or hidden buttons/actions** (scroll where needed). **Acceptance: all primary flows usable at increased font scale without clipped buttons or hidden actions.** Restore `font_scale 1.0` after.
- **Screen reader (TalkBack):** every interactive element has a meaningful semantics/`contentDescription` (icon-buttons especially: back, send, like, close, the brand-mark loader, game option cards); decorative images are silenced (`clearAndSetSemantics {}` / null desc); reading order is logical; no unlabeled "Button"; custom controls (spin wheel, date swipe deck, answer cards) are operable + announced; no focus traps.
- **Contrast:** body text + essential icons meet WCAG AA (4.5:1 body / 3:1 large) in **both** themes — measure, don't eyeball; re-check the known dim spots (game answer text, muted captions, the C-DS-001 area).
- **Don't rely on color alone (color-blind / WCAG 1.4.1):** any state conveyed by color must also carry a non-color cue (icon, label, shape, position). Audit the **match/mismatch** rendering (e.g. `AnswerRevealScreen`), status chips, selected/disabled states, and any red=bad/green=good signal — they must be distinguishable in grayscale / with a color-blindness simulation (`adb shell settings put secure accessibility_display_daltonizer_enabled 1`). Color-only status = bug.
- **Touch targets:** interactive targets ≥ **48dp** (icon buttons, chips, nav, close/back, reaction buttons, swipe-deck actions). Flag anything smaller.
- **Keyboard / external input:** with a hardware keyboard, forms (sign-up, message, capsule, profile) tab in a sane order, IME/Enter actions work, focus is visible, no traps.
- **Reduce-motion:** with "Remove animations" (`adb shell settings put global animator_duration_scale 0`), the loader, celebration particles, reveals, splash handoff, and transitions degrade gracefully and **no motion-gated content becomes unreachable** (the loader/particles already honor this — verify everywhere). Restore to `1` after.
- **Remediation:** add semantics labels, raise touch targets, fix contrast tokens, guard motion behind the reduce-motion flag.
- Findings: missing label / clipped-at-large-font / sub-48dp / failing contrast = bug → `ClaudeReport.md` (**P2**; **P1** if it blocks a primary flow for assistive-tech users); polish → `Future.md` `## QA`.
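For the "measure, don't eyeball" contrast check in this pass, the WCAG 2.x ratio is simple enough to compute from sampled hex colors. The Python sketch below follows the WCAG relative-luminance definition; the function names are hypothetical, and the colors used in the usage note are house-palette values taken as illustrations, not confirmed text/background token pairs from the app.

```python
"""Sketch: WCAG 2.x contrast ratio between two sRGB hex colors."""


def _channel(value: int) -> float:
    """Linearize one 0-255 sRGB channel per the WCAG 2.x definition."""
    c = value / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA needs 4.5:1 for body text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

As a usage example, `passes_aa("#FFF8FC", "#24122F")` (blush white on aubergine) holds comfortably, while `passes_aa("#B98AF4", "#FFF8FC")` (lavender on blush white) does not, which is the kind of pairing worth sampling from real screenshots in both themes.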
### Pass K — Billing & subscription lifecycle (the REAL money path, not the admin toggle)

**Pass A tests the GATE (couple-shared unlock via an admin entitlement toggle); Pass K tests how the entitlement is actually earned, kept, and lost.** This is the revenue path and it is almost entirely unexercised by the admin toggle. Read the manual's [Billing](docs/Engineering_Reference_Manual.md#billing) section first. **Needs real services** (Google Play Billing sandbox + a Play **license tester** + RevenueCat) — emulators can't do real IAP, so run on a **physical device** with a sandbox account, or mark each money-path row `blocked→needs-device` (the admin toggle is **not** a substitute for these).

- **Purchase, end to end:** Paywall → select a plan → Play billing sheet → buy as a sandbox tester → RevenueCat → `revenueCatWebhook`/`syncEntitlement` → `users/{uid}/entitlements/premium` flips active → features unlock for **both** partners (couple-shared, Pass A) → `onEntitlementChanged` fires the partner push → the one-time **Premium-unlock modal** (`PremiumUnlockOverlay`) shows once for **each** partner.
- **Restore purchases:** Paywall "Restore" on a reinstall / second device / after sign-out→in → entitlement restored, no double-charge, features unlock.
- **Plan switching:** monthly ↔ annual upgrade / downgrade / crossgrade → correct proration + entitlement continuity.
- **Trial / intro pricing** (if configured); **price + currency are displayed from the store, localized, never hardcoded**; plan list + benefits render; offline/SDK-error paywall is friendly (A-OBS), Continue hidden until plans load.
- **Cancel → expiry → RE-LOCK:** cancel keeps access until period end (`expiresAt`); at expiry, `CouplePremiumChecker` reports inactive → premium features **re-lock for BOTH** and the premium-unlock "celebrated" flag re-arms. Test the `expiresAt` boundary (admin: set it just-past) — the couple-shared checker must treat a lapsed entitlement as inactive.
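The `expiresAt` boundary in the cancel-to-expiry check is easy to get subtly wrong, so it helps to pin the expected truth table before driving it. The Python sketch below is a model of the rule, not `CouplePremiumChecker` itself: the `Entitlement` shape, the function names, treating `now == expires_at` as already lapsed, and treating a missing expiry as live are all assumptions to confirm against the real checker.

```python
"""Sketch: couple-shared premium with an expiry boundary.

Assumptions: the Entitlement shape, the function names, treating
now == expires_at as lapsed, and treating a missing expiry as live are
illustrative; they are not read from CouplePremiumChecker.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Entitlement:
    active: bool
    expires_at: Optional[datetime]  # None = no expiry recorded (assumed live)


def is_live(ent: Optional[Entitlement], now: datetime) -> bool:
    """The active flag alone is not enough; a past expiry means lapsed."""
    if ent is None or not ent.active:
        return False
    return ent.expires_at is None or now < ent.expires_at


def couple_has_premium(
    a: Optional[Entitlement], b: Optional[Entitlement], now: datetime
) -> bool:
    """Either partner's live entitlement unlocks premium for both."""
    return is_live(a, now) or is_live(b, now)
```

The case the admin "just-past" test targets is an entitlement whose `active` flag is still true but whose expiry is one second behind `now`: `is_live` must return false, and `couple_has_premium` must return false when the partner has no entitlement of their own.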
- **Billing retry / grace period / account hold / pause** (Play states) → entitlement + UI reflect the state; no hard crash, clear messaging.
- **Refund / revocation:** RevenueCat `CANCELLATION`/`EXPIRATION`/refund webhook → entitlement removed promptly → re-lock.
- **Security (overlaps D3/D5):** server-only entitlement writes (client self-grant → 403); **webhook authenticity** (forged/replayed RevenueCat webhook rejected); no client-trusted entitlement; receipt validated server-side.
- **Error/abuse:** cancel the billing sheet mid-flow, kill network mid-purchase, double-tap buy, rapid unlock taps → no false unlock, no duplicate purchase, retry recovers.
- **Settings → Subscription** reflects the live status; "Manage subscription" deep-links to Play.
- Done = purchase + restore + switch + cancel→expiry-relock + refund all verified on a real device (or each explicitly `blocked→needs-device` with the admin-toggle gate covered in Pass A).

### Pass L — Messaging & chat (E2E, both clients, the whole feature)

Chat is a core couple feature with no functional home until now (Pass C covers its visuals, Pass E its `chat_message` push). Drive the **main couple conversation AND the per-question "Discuss" threads** QA↔Sam, both directions. Read the manual's [E2EE model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model).

- **Send every type, both directions:** text, emoji, **image (gallery + camera)**, **voice note** → arrives on the partner's device **decrypted**, correct attribution/timestamp/ordering, day separators.
- **E2E at rest (overlaps D1):** every sent item is ciphertext (`enc:v1:` / Tink media bytes), `lastMessagePreview` encrypted, decrypts only on member devices; raw-API read by a non-member = 403.
- **Interactions:** reactions (add / change / remove), read receipts ("Seen"), typing indicator, message ordering under rapid exchange.
- **Failed send & offline:** airplane mid-send → failed-message row → **retry / dismiss** (the 48dp controls), offline queue flushes on reconnect, **no duplicate on retry** (idempotency, overlaps F); double/triple-tap send guarded.
- **Delete / moderation:** delete a message (own / both) + deleted-message rendering; block/report a partner if such a flow exists.
- **Media:** gallery + camera + mic permission granted **and denied** → graceful; premium-gated media is couple-shared (Pass A); oversized image handled; image viewer opens/zoom/back.
- **Inbox:** conversation list, unread badge, decrypted last-message preview, recency sort; open a conversation from inbox **and** from "Discuss" **and** from a notification (Pass C/E) — all reach the same thread with a sane back stack.
- **Foreground chat-head bubble** for an incoming message while the app is open but that thread isn't on screen (`MessageBubbleOverlay`) → tap opens it (Pass E overlap).
- **Realtime + perf (overlaps I):** snapshot listener detaches on leaving the conversation; long-history scroll pages, no jank/leak. **Quiet hours** suppress the chat push (Pass M). Long/emoji/multiline/RTL text renders without clipping.

### Pass M — Settings & account management (functional: settings PERSIST and TAKE EFFECT)

Pass C checks Settings **looks** right; Pass M checks each control **does** something, persists across relaunch, and takes real effect. Read [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow).

- **Appearance theme** (Light / Dark / Device) → applies app-wide immediately, **persists across process death + relaunch**, and the decoupled-art behavior holds (Pass C).
- **Notification toggles** (daily reminder · partner answered · chat · streak) → toggling one **OFF actually suppresses that push** (verify by triggering it), ON re-enables; survives relaunch.
- **Quiet hours** → set a window covering "now" → partner-triggered pushes are suppressed/deferred during it and deliver outside it; the partner-action vs promotional rate-limit split holds. **MUST test with the recipient BACKGROUNDED/KILLED, not just foreground (RETROSPECTIVE — M-001):** a partner push carries a `notification` block the OS renders directly when the app isn't foreground, so any client-side `QuietHoursManager.isInQuietHours` check (which only runs in `onMessageReceived`, foreground-only) is bypassed exactly when quiet hours matters. Verdict bar: with QH on + recipient backgrounded, send a real partner action (chat/answer/game) → assert **0** notification in the shade AND the Cloud Function log says it suppressed (`recipientInQuietHours`); then QH off → same action delivers. Generalize: **any "don't notify when X" setting (quiet hours, snooze, DND, per-type opt-out) must be enforced server-side where the push is SENT** — verify the setting reaches Firestore and the sender honors it, not just the client. (Reminder: the `users/{uid}` update rule is a **field allowlist** — a newly-synced pref field is silently denied until added to it; confirm the write actually lands via an admin read, not just the UI toggle.)
- **Biometric app-lock** → enable → background-return / cold-start prompts for biometric; correct unlock proceeds, cancel keeps it locked, disable removes the lock. (Security-relevant: no bypass.)
- **Edit profile** → name, sex/gender (inclusive options), photo upload → persists, reflects on the **partner's** side, ciphertext/storage correct at rest.
- **Relationship / unpair** → unpair returns **both** to the unpaired state, **revokes decrypt** (D4), notifies the partner (`partner_left`), makes couple data inaccessible; re-pair works cleanly.
- **Delete account** → confirmation → account + couple data cascade (`onUserDelete`), partner unpaired + notified, re-create with the same email is a clean slate (overlaps G).
- **Security** → recovery-phrase reveal for **both** accepter and inviter (C-SEC-001), server-blind (D4); regenerate if supported; **and the full new-device recovery flow — enter the phrase on a fresh install → existing history decrypts** (canonical steps + failure paths in **D4**). **Subscription** → "Manage subscription" → Play (Pass K). **Privacy & Terms / data export** links open (the export *contents* are verified in Pass O).
- **Analytics / funnel-event correctness (not just the leak check in D6).** The app ships a real analytics tracker (`core/analytics/FirebaseAnalyticsTracker.kt`, wired via `di/ObservabilityModule.kt`); D6 only asserts *no private content* leaks into it — nobody verifies the events actually **fire correctly**, so the business funnel can silently break. Enable Firebase **DebugView** (`adb shell setprop debug.firebase.analytics.app app.closer`) and confirm the key lifecycle events fire **once, at the right moment, with correct params**: signup, pair, paywall_view, purchase/restore (Pass K), game_complete, daily_answer/reveal. Also confirm analytics honor any **consent / opt-out** (privacy): if a toggle or first-run consent exists, opting out must actually stop collection. Wrong/missing/duplicated events = bug; still no private content in any event (D6).
- Every toggle survives **process death + reinstall-with-data** (overlaps F).

**⛔ Notification Enforcement Matrix (the gap that let the dead Daily/Streak toggles ship — RETROSPECTIVE).** Crashes/visuals probing isn't enough; trace every toggle end-to-end and prove `off ⇒ suppressed`. Run `scripts/wiring-scan.sh` first — its Tier-4 check flags any `notif*` field mirrored to `users/{uid}` that **no** Cloud Function reads (a dead toggle). Then fill this matrix in `ClaudeQACoverage.md`:

| Toggle / setting | Local store key | `users/{uid}` field | Function(s) that READ it | off ⇒ suppressed? |
|---|---|---|---|---|
| Partner answered | `partner_answered` | `notifPartnerAnswered` | `onAnswerWritten` | … |
| New chat message | `chat_message` | `notifChatMessage` | `onMessageWritten` | … |
| Daily question | `daily_reminder` | `notifDailyReminder` | `dailyQuestionReminder` | … |
| Shared-rhythm (streak) | `streak_reminder` | `notifStreakReminder` | `streakReminder` | … |
| Tips & nudges (promo) | `promotional_notifications` | `notifPromotional` | `reengagement` + `gameRetention`(challenge) | … |
| Quiet hours window | `quiet_hours_*` | `quietHoursEnabled`/`*StartMinutes`/`*EndMinutes`/`timezone` | `recipientInQuietHours` (ALL senders) | … |

Matrix rules: (1) a toggle with **no `users/{uid}` mirror** or **no function reader** is a DEAD setting — file it, don't pass it. (2) **Scheduled/cron senders are in scope** — do NOT blanket-defer them to `needs-device`: audit by code (does the sender read the pref + `recipientInQuietHours`?) and invoke manually where possible (Functions shell / temporary schedule). Senders to cover: `dailyQuestionReminder`, `streakReminder`, `reengagement`, `gameRetention` (capsule + challenge), `scheduledOutcomesReminder`. (3) prove `off` live: flip off → trigger → assert **0** push + **0** `notification_queue` for that user; on → delivers.

**Standard-settings completeness checklist (presence, not just correctness).** A missing standard control is its own defect class — audit that each EXISTS:

- [ ] **OS-notification-permission-off awareness** — when `areNotificationsEnabled()` is false, a banner + "Open system settings" deep-link (`Settings.ACTION_APP_NOTIFICATION_SETTINGS`), re-checked on `ON_RESUME` — else every toggle is silently dead.
- [ ] **Promotional / marketing opt-out** — a toggle for non-essential nudges, enforced server-side (`notifPromotional`).
- [ ] **Customizable quiet hours** — user-settable Start/End (not a hardcoded window), mirrored + server-enforced.
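Because the quiet-hours window is stored as start/end minutes and usually wraps past midnight, the server-side decision is worth modelling before asserting `off ⇒ suppressed`. The Python sketch below is a model under assumptions, not `recipientInQuietHours` itself: the function names are hypothetical, `minute_of_day` is assumed to be already converted to the recipient's stored `timezone`, and an equal start and end is treated as an empty window.

```python
"""Sketch: quiet-hours decision on minutes-of-day (0..1439).

Assumptions: function names are illustrative; minute_of_day is already in
the recipient's timezone; start == end is treated as an empty window.
"""


def in_quiet_window(minute_of_day: int, start_minutes: int, end_minutes: int) -> bool:
    """True when the minute falls in [start, end), wrapping past midnight."""
    if start_minutes == end_minutes:
        return False  # assumption: a zero-length window suppresses nothing
    if start_minutes < end_minutes:
        # Same-day window, e.g. 13:00-15:00.
        return start_minutes <= minute_of_day < end_minutes
    # Overnight window, e.g. 22:00-07:00.
    return minute_of_day >= start_minutes or minute_of_day < end_minutes


def should_suppress(enabled: bool, minute_of_day: int, start: int, end: int) -> bool:
    """A disabled setting never suppresses, whatever the window says."""
    return enabled and in_quiet_window(minute_of_day, start, end)
```

The boundary minutes are the ones to probe live: for a 22:00 to 07:00 window, 06:59 should suppress and 07:00 should deliver, on every sender listed in the matrix.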
- [ ] **Sign out** ✓ · **Delete account** ✓ · **Subscription** (Pass K) · **Security** (app-lock + recovery) · **Appearance/theme**.
- [ ] **Export my data** (GDPR — SECURITY.md P2) and a **Help/Support** surface (contact · FAQ · report-a-bug · app version) — currently GAPS; flag in Future.md, don't silently pass "Settings looks complete".

### Pass N — Daily question, reveal, check-ins & the other interactive features
> **⛔ CLAUDE: Run `scripts/wiring-scan.sh` BEFORE driving these features** (review `/tmp/claude-wiring-scan-.md`,
> record counts in `ClaudeQACoverage.md`). Every 🔴 dead-setter / 🟠 orphan-reader is a likely silent dead feature —
> this is the exact class that hid N-001 (Bucket List) + N-002 (Date Builder) behind innocent-looking empty states.

The non-game interactive surfaces that have no functional home (Pass B is games only). Read [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle).
- **Date Memories / Reflection (NEW — added R25; reverted-then-reinstated, so it slipped earlier rounds):** log a date (Date Match → mark done → `date_history` row) → Home "Reflect on your date" nudge (fires for **any** recent un-reflected date, not just the latest) → DateMemories timeline → tap a date → DateReflectionScreen. Drive the full loop on **both** devices: type all **4** fields (favorite / surprised / appreciated / free-form notes) — **confirm the keyboard does not cover the fields (Pass J / ime-scan)** — save → AWAITING_PARTNER (with **Edit** affordance: edits allowed only until the partner reflects) → partner reflects → both flip to the side-by-side REVEAL. Negative/edge: neither can read the other early (Pass D gate); a **blank/deep-linked bad `dateId`** shows an error, not a malformed write; a **locked vault** (key unavailable) shows "Locked", not blank dashes; DateMemories **read failure** shows an error state (not an infinite spinner); long-press a memory → **Remove** (confirm) deletes it. Notifications in Pass E.
- **Daily-question loop (the core daily ritual):** assignment (6 PM CST, `assignDailyQuestion`) → answer (each answer type) → **both-answered gate** (neither sees the other's answer until both submit) → **mutual reveal** → per-question **Discuss** thread (Pass L) → **Answer History** → **streak** increment + milestone celebration (`streak_milestone`) → reveal `isRevealed` retry (the `onAnswerRevealed` push). Verify the premium daily-question fallback (`DailyQuestionResolver` per-user) does **not** desync the couple's shared daily Q.
- **R23-DQ-001 — fresh-device / cleared-local-DB reconcile (data-loss guard):** after answering, simulate the Room↔Firestore desync (new device, reinstall with data cleared, or a wiped local answer store) so Firestore holds the answer but local Room/prefs don't. Home must **not** show a stale "your turn", and opening the daily question must show the **submitted/reveal** state (NOT an editable re-answer form) — because the `secure/payload` doc is immutable (`allow update:false`), so a re-answer would be silently rejected and a *changed* pick lost. The guard is `reconcileLocalAnswerFromFirestore` (Room-first; rebuilds from the read-gated couple-key payload) wired into `DailyQuestionViewModel` + `HomeViewModel`; covered by `ReconcileLocalAnswerTest`.
- **Relationship check-ins / Your Progress (outcomes):** baseline check-in (gated to show once), 30/60/90-day follow-ups, slider inputs persist (`submitOutcomeCallable`), the progress view renders patterns/milestones, `scheduledOutcomesReminder` fires, "No baseline yet" → check-in dialog (C-DARK-UI-002 area). Submit + Skip both work.
- **Partner bubble → quick-actions sheet (R22):** tapping the Home partner avatar opens the bottom sheet (NOT the old dead-end into "Together").
  Verify the glance (avatar + name + "💜 N nights · together since {Mon yyyy}"; streak clause hidden when 0); **Message** → inbox, **Together** → feed, **Your relationship** → relationship settings, header → partner page; **💜 Thinking of you** sends the nudge (Pass E `thinking_of_you`) → "Sent 💜" + in-flight disable + friendly rate-limit/error message on failure; **unpaired account → the bubble still opens the invite flow** (never an empty sheet); a missing/locked partner name (E2EE key absent) shows **"Your partner"**, never ciphertext/🔒.
- **"Together" feed is actionable (R22):** rows deep-link by type (message→inbox, game→Play, capsule→Memory Lane, challenge→Challenges, date→Date Matches, answer/reveal→Today); affection/reminder rows (`thinking_of_you`/`gentle_reminder`/`streak`) have no deeper target and stay non-tappable; opening the feed clears the unread dot; sent `thinking_of_you` nudges show up here.
- **Date Memories & Replay (R22) — the private→reveal loop on real dates:** on a mutual match, **"We did this"** → logs `date_history` (idempotent; admin shows PLAINTEXT title/category + completedAt) → opens the reflection. Both partners answer the 3 prompts privately; **admin read of the partner's `date_reflections/.../secure/payload` is DENIED until both have reflected** (the privacy gate, same proof as the daily question) → then both **reveal side-by-side** (real-time, no refresh). The **Date memories** timeline (entry on Date Matches) lists completed dates newest-first with the reflection chip (Reflect / Waiting / View); empty state shows `illustration_date_memories_empty`. Locked-key → placeholder, never ciphertext. **Home nudge:** while a completed date has no reflection from you, Home surfaces a **"Reflect on your date with [partner] 💭"** card (`glyph_date_replay`) → opens the Replay timeline; it clears once you've reflected. Notifications covered in Pass E (`date_reflection_*` / `date_logged`).
- **Bucket List:** add / check-complete / edit / delete an item; empty state; both-device sync; at rest encrypted (D1); premium state if applicable (A).
- **Plan a Date / Date Builder:** build a plan (shape/steps) → save → **persists + the partner sees it**; date plan + `date_swipes` ciphertext at rest (D1); submit-outcome path.
- **ACTUALLY PERSIST + verify via admin read — an empty list can be a DEAD feature, not an empty one (RETROSPECTIVE — N-001/N-002).** For every interactive feature, create real data through the UI and confirm it **lands in Firestore** (admin read) AND **renders back**; don't accept the empty/initial state as "works." Bucket List looked like an empty list but was fully non-functional (`coupleId` never set → every op silently `return`ed); Date Builder's "Create Plan" silently no-ops (`dateIdeaId` never wired) and writes to a collection no screen reads. Reflex: any VM that gates on `if (someId.isEmpty()) return` and expects the screen to call `setX(...)` is suspect — `grep` for the `setX` caller; if none, it's dead. Also confirm there's a **display surface** for whatever a "save/create" writes (a save into an unread collection is an incomplete feature, not a working one).
- **Activity / Together feed:** shared activity entries render + sort, unread count, navigation in/out.
- Each feature: empty / loading / error / not-paired states, two-device realtime sync, no stuck/orphaned state.

### Pass O — Release build, store readiness & pre-launch security
**Everything above runs on the DEBUG build; the shippable artifact is the minified RELEASE build — test THAT.** This is a pre-ship gate, not a per-round pass (run it before any store push and after build-config / dependency / keep-rule changes).
- **Release/minified build (R8 + resource-shrink):** build the **release** APK/AAB and run `qa/entrypoint_smoke.sh` + a representative slice of A–N on it.
  R8 can strip/obfuscate classes that **Firebase/Firestore/Tink/RevenueCat/Gson/kotlinx-serialization/Compose** need via reflection → crashes that never appear in debug. Verify keep-rules; 0 FATAL on launch + each core flow; **upload the ProGuard mapping to Crashlytics** so release crashes deobfuscate.
- **Signing & packaging:** release signing config + upload key; build the **App Bundle**; install the signed AAB via bundletool / Play internal-app-sharing and smoke it; 64-bit + target-SDK compliance.
- **App Check enforcement (pre-launch — currently OFF in dev per standing instruction; do NOT enable it on the dev project):** in **staging**, enable enforcement on Firestore + Functions → a valid-token app works, a raw/no-token request → **403** (extends D3/D5 beyond rules-only); confirm Play Integrity on a real device vs the debug provider.
- **Deep links / Android App Links:** `closer://` **and** any `https` App Links (`assetlinks.json`) open the correct screen with auth/membership re-checked (overlaps E).
- **Permissions & manifest:** the manifest declares only what's used; runtime prompts (POST_NOTIFICATIONS, camera, mic, Android-13+ photo picker / `READ_MEDIA_IMAGES`) appear and degrade gracefully when denied; `allowBackup=false` holds (D6); and the **`screenOrientation` decision is explicit** — today the manifest sets none, so the app rotates to an unverified landscape layout (Pass C). Either lock portrait or certify landscape; don't ship it undecided.
- **Age gate / content rating / maturity (the app has adult/intimacy content — Desire Sync — and currently NO age gate).** Confirm an appropriate **18+ / age-appropriate gate** exists where required, the Play **content/maturity rating questionnaire** matches the actual content, and any IAP/intimacy content complies with store policy. A missing age gate on adult content is a **store-rejection + legal risk** — file it (see the app-finding note).
- **Localization & formats (i18n):** strings are externalized (no hardcoded user-facing text), the longest translations don't clip (overlaps C/J), **RTL** mirrors correctly, dates/numbers/**subscription prices+currency** format per locale (overlaps K). Even if English-only today, confirm there's no layout that assumes English length.
- **Play Store readiness:** the **Data Safety** form matches the actual data flows + E2E encryption; privacy-policy URL live; version code/name bumped; store listing/screenshots are the brand pass (H); min/target-SDK **device matrix** (Methodology) covered.
- **Data-rights compliance (GDPR/CCPA — verify the CONTENTS, not just that a link opens).** Pass M confirms the export / privacy links resolve; here, confirm **right-to-access** actually returns the user's real data (and that E2E content is handled correctly — exported decrypted to the owner, or documented as unrecoverable), and **right-to-erasure** (delete account, Pass M/G) genuinely cascades server-side (`onUserDelete`). A privacy policy that claims flows the app doesn't do (or omits ones it does) is a finding.

### Pass P — Content, copy & language quality (voice, grammar, inclusivity, the question bank)
**Wrong language is a BUG, not a "nice-to-have."** Typos, grammar/punctuation errors, off-brand or cold/salesy voice, non-inclusive or assumptive wording, leaked placeholder/dev text, raw SDK/Firebase/RevenueCat errors shown to users, copy that doesn't match behavior, and broken/duplicate/low-quality questions are all **defects → `ClaudeReport.md`**. Only genuinely-working copy that *could be warmer/clearer* goes to `Future.md`.

**Read first:** `docs/brand/visual-identity.md` (**Store voice**) and `seed/questions/QUESTION_CONTENT_GUIDE.md` (**v3** — readability test, no-AI-writing, duplicate prevention, variety, fun/relationship-first/premium rules). This is a recurring pass.
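
The duplicate-prevention rule in the guide lends itself to a cheap pre-filter before the live, in-context read. The following is a minimal sketch, assuming the questions are available as plain strings. The function names, the word-set Jaccard measure, and the `0.8` threshold are illustrative assumptions, not what `seed/validate_question_variety.py` or the guide actually implements.

```python
from __future__ import annotations

import re
from itertools import combinations


def normalize(text: str) -> frozenset[str]:
    """Lowercase the text, drop punctuation, and return its set of words."""
    return frozenset(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Word-set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(
    questions: list[str], threshold: float = 0.8  # threshold is an assumption
) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every pair whose overlap meets the threshold."""
    tokens = [normalize(q) for q in questions]
    hits = []
    for i, j in combinations(range(len(questions)), 2):
        score = jaccard(tokens[i], tokens[j])
        if score >= threshold:
            hits.append((i, j, round(score, 2)))
    return hits
```

A word-set measure only catches surface-level repeats (punctuation, casing, one swapped word). Paraphrased duplicates still need the human read this pass calls for, so treat any hit list as a starting point, not a verdict.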
- **UI microcopy audit (every screen + state):** read ALL visible text — titles, labels, button verbs, helper text, empty states, dialogs/confirmations, toasts, loading + **error** copy, and notification text — for: typos, grammar, punctuation, capitalization/casing consistency, consistent terminology (feature names, "partner"/the partner's name, "couple"), and copy that **matches the actual action/state** (a button says what it does; "Day N of 7" matches the real state; correct names/counts/attribution). A label that misstates its destination or effect is a bug (overlaps C's CTA check).
- **No raw / placeholder / dev text ever reaches a user:** no Lorem, "TODO", debug strings, untranslated keys, or raw exception/Firebase/RevenueCat error text surfaced to users (the A-OBS class) — always friendly copy.
- **Brand voice & tone (against `visual-identity.md` Store voice):** copy is **warm, quiet, equal, calm, specific** — a private ritual for two. **Off-voice = finding:** cold/clinical, salesy/hype, urgent/alarmist, guilt- or streak-shaming, competitive, surveillance-y, or "we'll FIX your relationship" promises. Scrutinize paywall, notification, and streak copy.
- **Inclusive & non-assumptive language:** no heteronormative or relationship-structure assumptions, no assumption about who initiates or about bodies/ability/culture; gender-neutral where the design calls for it (the de-gendering effort — `seed/degender_*.py`); sensitive topics (Desire Sync, intimacy) phrased with care + a consent framing, never crude or clinical. Assumptive/exclusionary wording = bug.
- **Question-bank content QA (against `QUESTION_CONTENT_GUIDE.md` v3):** spot-check questions across **every category, depth, and answer type** as the user SEES them (live-rendered, not just the DB) for: passes the **readability test**; no **AI-writing tells**; **no duplicates / near-duplicates**; sensible **variety** + emotional mix; **answer options complete + mutually exclusive + sensible** (no overlapping, joke-only, or placeholder options); the **answer type fits** the prompt; **fun-rule / relationship-first** tone; and **no broken/empty/garbled/offensive/unsafe** prompts (cf. `seed/fix_depth5_grammar.py`, `seed/validate_question_variety.py`). A shipped question that's broken, duplicated, off-guide, or unsafe is a bug.
- **Legal / store / monetary copy accuracy:** paywall benefit claims are truthful (no over-promise); subscription terms + renewal/price wording present and accurate (overlaps K); Privacy & Terms links resolve; store-voice rules hold.
- **Localization correctness (overlaps O):** no clipped/awkward strings from concatenation, correct pluralization, dates/numbers/currency per locale, RTL grammar intact — even if English-only today, flag any English-length/grammar assumption.
- **Method:** harvest strings from `res/values/strings.xml` + in-code literals AND read them **in context on-device** (a string can be fine in isolation yet wrong/cramped/ambiguous in place). Routing: incorrect/off-voice/non-inclusive/placeholder/inaccurate/unsafe = **bug → `ClaudeReport.md`** (P2 default; **P1** if it misleads, blocks, leaks a raw error, or is offensive/unsafe; P3 for pure nits); "could be warmer/clearer" → `Future.md`; new copy/voice work that's out of scope → note it.

## Reporting → ClaudeReport.md (living QA report)
- Header: date, build, devices, round number + run-state header.
- One section per pass (A–P), each a table: **ID | Area | Screen/Route | Mode | Severity | Description | Repro | Evidence | Suggested fix | Status**.
- Summary: counts by severity. Report only during passes — no fixes recorded until the fix phase.

### Report hygiene — keep it CLEAN, lean, and never dangling (the report is a *current-state* doc, not an archive)
The report's job is to show, at a glance, **what's wrong right now** — not to accumulate a history of everything ever fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So:
- **A Fixed row survives exactly ONE confirmation round, then it's removed.** When you fix an issue, mark its row `Fixed` and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the row** — the durable root-cause/fix detail lives in the **Engineering Manual landmine** (mandatory for any escaped/deep bug) + git history (the row cites the landmine ID, and the commit hash once the user commits), so nothing is lost. Don't rely on "the commit message" as the only home — **you don't commit** (the user does, often batched), so the manual landmine is the reliable record. Don't carry confirmed-fixed issues across multiple rounds.
- **Make the archived-ID line a usable duplicate-fix lookup, not bare IDs.** When you prune a row, attach a **2–4 word tag** to its archived ID (e.g. `C-PW-001 dark paywall pills`) so the Fix-phase **Regression triage** can search it by symptom. **Any fix whose class could plausibly recur gets at least a one-line Engineering Manual landmine entry** — not only the "escaped/deep" bugs the MANDATORY-retrospective requires — so a future regression check never lands on an ID with no description. (This is why a separate fix-history file is unnecessary: the manual landmines + this tagged archived line + git already are the fix history.)
- **One run-state header, always.** Keep only the **current** `Round N | Pass X | Chunk Y | NEXT ACTION` block pinned at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a **single one-line history** entry each (e.g.
  `R6: branding regression — 0 new`), or drop them entirely once their fixes are confirmed-and-pruned.
- **Open issues first; resolved issues compact.** Order every pass section **open (P0→P3) on top**; keep a short `Resolved & confirmed (archived — detail in git)` line listing only the **IDs** of older fixed-and-verified issues (not their tables). The big per-issue tables exist only for **currently-open** and **fixed-this-round-pending-confirm** issues.
- **Severity board reflects NOW.** One board, current counts; `Open` is the number that actually matters. When `Open` hits 0 at every level, the report should be **short** — current run-state, a 0/0 board, the archived-ID line, and the operational constants (devices/accounts, standing-auth, playbook pointers). If it's long while everything is fixed, it needs pruning.

### Coverage-matrix hygiene (`ClaudeQACoverage.md` — a *current-status* matrix, not a per-round changelog)
- **Flip, don't stack.** When a fix is confirmed, change that row's `fail→id` to `pass` and move the ID to an archived line — never leave a confirmed-fixed `fail→id` dangling, and never keep a contradicting "still owed" note next to a completed row.
- **One status per cell, current.** Each screen/feature/game/notification shows its **latest** status only; collapse prior rounds' narration into a single one-line **round history**. Keep an at-a-glance pass-status table at the top.
- **Keep the resume signal sharp.** What a returning session needs is *what's left* — surface `todo`/`deferred`/`blocked` items plainly; don't bury them under superseded prose.

### Extremely-easy-to-read mandate (applies to ClaudeReport.md, ClaudeQACoverage.md, and Future.md)
Optimize every QA doc for a reader who has **5 seconds** to find the current state:
- **Lead with the answer.** Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean") before any detail.
- **Tables over prose** for issues; **short rows**.
  Put long root-cause analysis in the **Engineering Manual landmine** (the durable home), not the row — the row gets a one-sentence description + repro + the landmine ID (and commit hash once the user commits).
- **No walls of text.** Break run-state into scannable lines; bold the few words that matter; no multi-paragraph headers. If a paragraph is longer than ~3 lines, it's probably manual/landmine material, not report material.
- **Consistent shape every round** so a returning reader (or a post-compaction resume) finds things in the same place.

## Fix phase (only AFTER all passes of the round complete)
- Work strictly by severity: **all P0 → P1 → P2 → P3**.
- **⛔ Regression triage — DIFF & history-check BEFORE you write a fix (every bug, not just crashes — don't fix blind).** First answer *"is this NEW, or did we break/relapse something that worked?"* — fixing without this risks re-fixing a known issue a different way (divergent fixes) or masking the real regression:
  1. **Have we fixed this before? (duplicate-fix / regression check.)** Search the **fix history** for the same symptom/area/ID — the canonical home is the Engineering Manual [Known landmines and recent fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (root cause + the guard that should hold it) plus `ClaudeReport.md`'s `Resolved & confirmed` archived-ID line. **A match ⇒ this is a REGRESSION, not a new bug:** re-open under the **original ID**, and fix *why the guard lapsed* (a scanner/test/pass-step that was supposed to catch it) — do **not** re-implement a fresh fix from scratch.
  2. **What changed? (diff before you fix.)** `git log` / `git diff` / `git blame` / `git log -L` the failing area to pin the introducing change — **including OTHER agents' recent commits** (this repo is co-edited by Codex / kimi / Ripley, so "what changed" is frequently not your own work; `git log --since` / `git log ` across authors).
     Read that commit's diff and fix the **actual cause it introduced**, not the surface symptom. ("worked before, broken now" ⇒ always bisect to the change first.)
  3. Only after you know **new-vs-regression** and **what introduced it** do you design the fix.
- **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext in Firestore**, **`./gradlew testDebugUnitTest` + functions `npm test` still green** — a fix that reds a test isn't Fixed) → flip its row to **Fixed** + capture the durable substance in the Engineering Manual landmine → next (the **user** commits per issue/cluster — never run git yourself; see Guardrails). Don't start the next until the current is verified.
- **Real-path verification gate (do NOT mark Fixed without it):** verify the fix through the **same path the user hits**, not a synthetic shortcut. A crash/launch/notification fix is only "Fixed" once reproduced-then-cleared via the REAL channel (real push tapped from the shade on an `am kill`'d app; real launcher cold-start) — `am start`/`am force-stop` passes don't count. For any cold-start/notification/launch fix, the gate is **`qa/entrypoint_smoke.sh` green**. (This session's miss: a routing "fix" was declared on `am start` evidence while the real bug was a splash crash on the FCM cold-start. Don't repeat it.)
- **Couple-shared premium fix**: replace direct `isPremium()` gates with `CouplePremiumChecker.coupleHasPremium(partnerId)` in every gated VM/screen (partner-entitlement read rule deployed). **High regression risk** — re-verify each feature in BOTH self-premium and free states.
- **Re-run associated scanners after fixing.** If the fix touches UI colors/surfaces (Pass C), re-run `scripts/theme-scan.sh` and confirm the relevant CRITICAL count dropped.
  If the fix touches launch/splash/notifications, re-run `qa/entrypoint_smoke.sh`. Update the coverage matrix with the new counts; a "Fixed" row is only valid when the scanner (and the live visual sweep) both agree.
- Gated actions (entitlement toggles, deploys) are **user-authorized per occurrence**.
- **New issues found while fixing** are logged (new ID), not silently fixed beyond scope — next re-QA round catches them.

**Definition of done:** a **pass** is done when every coverage row is `pass`/`fail→id`/`not implemented→Future.md`/`blocked→id`; a **round** is done when all **recurring** passes (A–N + P) are done; **flawless** = one full round with **zero open P0–P2 and Passes D + E + L + P fully clean** (no open P0/P1 in I/J), **every game fully played through, every notification type verified or explicitly `not implemented→Future.md`, chat (L) + the couple-shared premium gate (A) + settings-take-effect (M) + interactive features (N: daily-Q/reveal, outcomes, Bucket List, Date Builder work end-to-end — created data persists AND is read back, `scripts/wiring-scan.sh` 🔴=0) + content/language (P: no typos/off-voice/non-inclusive copy, question bank on-guide) verified, all join-game navigation paths and all back-stack checks verified**, **the unit + functions test suites GREEN (`./gradlew testDebugUnitTest` + functions `npm test`)**, **and `qa/entrypoint_smoke.sh` GREEN on both emulators (0 FAIL — every entry-point cold-start opens and stays)**. Then stop (P3s optional). **Pass O (release build + store readiness) and Pass K's real-money path are pre-ship / real-device gates** — they don't block a per-round "flawless" but **must be GREEN before any store submission**. Don't re-open a clean pass within the same round.

## Re-QA loop (until flawless)
After the fix phase, re-run Passes A–N + P (regression + confirm fixes; Pass K money-path when a sandbox device is available, Pass O when prepping a release).
Repeat **fix → re-QA** rounds until a full round meets the **flawless** bar in the Definition of done: zero open P0–P2 and Passes D + E + L + P fully clean.
- **Prune on confirmation (Report hygiene):** the moment a re-QA round re-verifies a `Fixed` issue, **delete its row** from `ClaudeReport.md` (move its ID to the compact `Resolved & confirmed (archived — detail in git)` line) and collapse that finished round's run-state header. A fixed issue lives in the report for **one** confirmation round only — never let confirmed-fixed rows or old run-states accumulate. See **Report hygiene** under Reporting.
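
The prune-on-confirmation rule can be sketched as a small pure function. This is an illustration under assumptions: the `Row` shape, the `confirmed` flag, and the example tags for `N-001` and `N-002` are hypothetical. Only the rule itself comes from this playbook: a re-verified `Fixed` row leaves the table, and its ID plus a short tag joins the archived line.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One issue row in the report (hypothetical shape, for illustration)."""
    issue_id: str
    status: str              # "Open" or "Fixed"
    tag: str                 # 2-4 word symptom tag used on the archived line
    confirmed: bool = False  # True once a later re-QA round re-verified the fix


def prune_confirmed(rows: list[Row]) -> tuple[list[Row], list[str]]:
    """Split rows into (rows to keep, tagged IDs to append to the archived line).

    A row is pruned only when it is both Fixed and re-verified. Open rows and
    Fixed rows still awaiting their one confirmation round stay in the table.
    """
    kept: list[Row] = []
    archived: list[str] = []
    for row in rows:
        if row.status == "Fixed" and row.confirmed:
            archived.append(f"{row.issue_id} {row.tag}")
        else:
            kept.append(row)
    return kept, archived
```

Keeping the function free of file I/O makes the rule easy to check by hand: the caller decides how the kept rows and the archived entries are written back into `ClaudeReport.md`.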