# Claude QA Playbook — Full-App QA → Fix → Re-QA until flawless

> Reusable QA plan for the Closer app. Run report-only first, fix everything, then re-QA until a clean round.
> Progress/state is tracked in **ClaudeReport.md** (issues) + **ClaudeQACoverage.md** (coverage matrix), which are
> the authoritative source of truth. See the Continuity section before resuming.
>
> **Program roadmap:** **Part 1** = Android QA (this doc) → **Part 2** = build the iOS app to Android's current
> parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in
> `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box).

## ⛔ This is a LIVING document — improve it whenever you see a gap (do this automatically)
This playbook, the coverage matrix, and the `scripts/`/`qa/` scanners are **yours to evolve every round** — that is part
of the job, not a separate task. Whenever a round teaches you something the plan doesn't yet capture, **edit it in the
same chunk** (no need to ask):
- A bug **escaped** a prior round, was hard to diagnose, or recurred → add the generalized reflex to the right Pass +
  the durable substance to the Engineering Manual landmine (the MANDATORY-retrospective rule), and if the class is
  greppable, **add/extend a scanner** (`scripts/theme-scan.sh`, `scripts/wiring-scan.sh`, `qa/entrypoint_smoke.sh`).
- A step is **wrong, contradictory, or stale** (e.g. it told you to do something a standing rule forbids) → fix the
  wording so the next agent isn't misled.
- A new **route / feature / notification / collection / gate / asset** appeared → fold it into the relevant Pass +
  `ClaudeQACoverage.md` (Living discovery ritual).
- The plan is **unclear or bloated** → tighten it; lead with the answer; keep one canonical home per fact (don't restate
  a lesson in four places — link by ID).

Leave the plan better than you found it each round. When you change a scanner, update its header; when you change a
process rule, make sure it doesn't contradict the Guardrails.

## ✅ Per-round execution checklist (the literal flow — details in the sections below)
1. **Resume:** read `ClaudeReport.md` run-state + `ClaudeQACoverage.md` (the authoritative state); `adb devices` shows
   both emulators; **installed build == HEAD** (rebuild+install if unsure — never QA a stale APK); baseline clean
   (both free, 0 active sessions, logcat 0 FATAL).
2. **Discovery ritual:** reconcile routes/notifications/features/assets/backend with coverage; fold new surfaces in.
3. **Run the cheap gates FIRST (before live driving):** (a) the **automated test suites** — `./gradlew testDebugUnitTest`
   + `cd functions && npm test` (they cover the fragile logic: encryption format, rate limiter, quiet hours, streak,
   entitlement math) — **a red suite is a P0/P1 regression gate, stop and fix before QA'ing a build**; (b) the **scanners** —
   `qa/entrypoint_smoke.sh` (both serials), `scripts/theme-scan.sh` (Pass C), `scripts/wiring-scan.sh` (Pass N),
   `scripts/painter-xml-scan.sh` (crash guard — `painterResource()` on a non-`<vector>` XML drawable throws on render;
   caught O-ONBOARD-001 class — exit≠0 is a P0 gate); (c) the **instrumented render smoke** (when an emulator is attached)
   — `./gradlew :app:connectedDebugAndroidTest` runs `FirstRunRenderSmokeTest` (first-run composables paint in light+dark;
   the on-device net for the "composes fine, crashes on first paint" class — a red run is a P0 gate); (d) optional
   **monkey fuzz** `adb shell monkey -p app.closer --throttle 300 --pct-touch 90 -v 5000` (any crash = bug). File 🔴/🟠 to
   `ClaudeReport.md`; record counts in coverage.
4. **Run the passes report-only**, sub-batched to one context window each — recurring set **A–N + P** (K money-path +
   O release gates only when a sandbox device / pre-ship is in scope). Checkpoint the MD files after each chunk.
5. **Fix phase** (after all passes): by severity P0→P1→P2→P3, one at a time, verify each live via the **real path** +
   re-run the relevant scanner, flip the row to Fixed, capture the durable substance in the Engineering Manual.
6. **Re-QA loop** until **flawless** (see Definition of Done). Prune confirmed-fixed rows.
7. **You never `git commit`/push — the user commits.** Your durable state is the MD files (they survive compaction).

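The cheap gates in step 3 could be chained into one fail-fast wrapper. The following is only a sketch: `run_gate` and `run_cheap_gates` are hypothetical helper names that do not exist in the repo, and it assumes the commands quoted in the checklist run from the repo root and report failure through a non-zero exit code (the text confirms that only for `painter-xml-scan.sh`).

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast wrapper for the cheap gates in step 3.

# run_gate NAME CMD...: run one gate, print PASS/FAIL, return the gate's exit code.
run_gate() {
  local name="$1"
  shift
  if "$@"; then
    echo "PASS: $name"
  else
    local rc=$?   # exit status of the gate command that just failed
    echo "FAIL: $name (exit $rc)"
    return "$rc"
  fi
}

# run_cheap_gates: stop at the first red gate so a broken build is never QA'd.
# Only the gates whose pass/fail meaning is stated in the checklist are listed;
# the render smoke is left out because it needs an attached emulator.
run_cheap_gates() {
  run_gate "unit tests"       ./gradlew testDebugUnitTest        || return 1
  run_gate "functions tests"  sh -c 'cd functions && npm test'   || return 1
  run_gate "painter-xml scan" scripts/painter-xml-scan.sh        || return 1
}
```

Calling `run_cheap_gates` from the repo root prints one line per gate and returns non-zero at the first failure, which maps onto the "red suite = stop and fix" rule.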
## 📖 Architecture reference (read BEFORE testing the matching area)

For each Pass below, before you start, read the relevant section of [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) — it documents the architecture, the wire-format contracts, the security invariants, and the [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (bugs that cost real debugging time and are easy to re-introduce).

**This is bidirectional — the manual is a LIVING document, not a read-only reference.** Read it before; **write back to it after.** Whenever a round fixes a bug, changes a contract/flow/gate, or finds the manual stale or missing something, update the manual in the same chunk (see *Where every finding goes*, the *Docs update rule*, and the *MANDATORY retrospective* — all now route durable engineering truth here). Treat it as part of every fix, same as `ClaudeReport.md`/`ClaudeQACoverage.md`.

| Pass | Manual section to read first |
|---|---|
| A — Couple-shared premium | [Premium-gated features and gate pattern](docs/Engineering_Reference_Manual.md#premium-gated-features-and-gate-pattern) · [Billing](docs/Engineering_Reference_Manual.md#billing) |
| B — Games lifecycle | [Game session push semantics (idempotent flag-claim)](docs/Engineering_Reference_Manual.md#game-session-push-semantics-idempotent-flag-claim) · [Foreground game-alert banner](docs/Engineering_Reference_Manual.md#foreground-game-alert-banner-r10) · [F-RACE-001](docs/Engineering_Reference_Manual.md#f-race-001-duplicate-game-start-push-on-rapid-partner-update) |
| C — Visual (light+dark) | [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle) · [C-NAV-001](docs/Engineering_Reference_Manual.md#c-nav-001-back-from-home-resurfaces-onboarding-auth) · [Back-stack gotchas](docs/Engineering_Reference_Manual.md#back-stack-gotchas-c-nav-002-c-nav-003) · [C-HOME-001](docs/Engineering_Reference_Manual.md#home-duplicate-pending-action-card-c-home-001) |
| D — Security & encryption | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Firestore security rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) · [Encryption versions](docs/Engineering_Reference_Manual.md#encryption-versions) |
| E — Notifications | [Notifications](docs/Engineering_Reference_Manual.md#notifications) · [Notification deep-link routing](docs/Engineering_Reference_Manual.md#notification-deep-link-routing) · [E-GAME-001](docs/Engineering_Reference_Manual.md#e-game-001-notification-deep-link-landed-in-stale-finished-game) · [E-GAME-002](docs/Engineering_Reference_Manual.md#e-game-002-game-start-push-easy-to-miss-when-app-is-foreground) |
| F — Resilience | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Known limitation: single-device keys](docs/Engineering_Reference_Manual.md#known-limitation-single-device-keys) |
| G — Account creation / fake-account | [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow) · [Rate limiting on accept](docs/Engineering_Reference_Manual.md#rate-limiting-on-accept) |
| H — Branding & artwork | `ClaudeBrandingReview.md` (this repo) · `docs/brand/visual-identity.md` |
| I — Performance | [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) · [Where to look first](docs/Engineering_Reference_Manual.md#where-to-look-first) |
| J — Accessibility | [CloserTheme](docs/Engineering_Reference_Manual.md#ios-specific-notes) · [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) |
| K — Billing & subscription lifecycle | [Billing](docs/Engineering_Reference_Manual.md#billing) · [Premium-gated features and gate pattern](docs/Engineering_Reference_Manual.md#premium-gated-features-and-gate-pattern) |
| L — Messaging & chat (E2E) | [End-to-end encryption model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) · [Notifications](docs/Engineering_Reference_Manual.md#notifications) |
| M — Settings & account management | [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow) · [Notifications](docs/Engineering_Reference_Manual.md#notifications) |
| N — Daily question & interactive features | [Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle) |
| O — Release build & store readiness | [Firestore security rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) · [Engineering conventions](docs/Engineering_Reference_Manual.md#engineering-conventions) |
| P — Content, copy & language | `docs/brand/visual-identity.md` (Store voice) · `seed/questions/QUESTION_CONTENT_GUIDE.md` (v3) |

**If you find a bug that LOOKS like it might be a re-introduction of a known landmine** (above table or [Known landmines](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes)), stop and verify the fix is still in place before filing a new ID — it may be a regression on a known issue, not a new bug.

## Where every finding goes (route it here — exactly one home each)
| What you found | Where it goes | Form |
|---|---|---|
| **A bug** — broken / incorrect / crashing / insecure, premium bypass, wrong-or-missing notification, dead-end nav | **`ClaudeReport.md`** | Table row: stable ID (`A-001`, `E-003`…) + severity (P0–P3) + repro + status |
| **An idea / improvement** — works but could be better, confusing copy, missing affordance, rough-but-not-broken flow, "it'd be great if…", feature idea | **`Future.md`** `## QA` | Short title + what prompted it + suggested improvement |
| **New artwork to create** — illustrations, glyphs, image-gen prompts | **`ClaudeBrandingReview.md`** | House-style prompt + placement |
| **What got tested + its status** (pass / fail / todo / deferred) | **`ClaudeQACoverage.md`** | Coverage cell (the resume anchor) |
| **Automated scanner findings** | **`ClaudeReport.md`** (CRITICAL/MAJOR that break themes/functionality) **+** `ClaudeQACoverage.md` (execution counts + filing status) | ID + file:line + pattern + fix suggestion |
| **Durable engineering knowledge** — a fixed bug's root cause + how it's easy to re-introduce, a new architecture fact / data path / wire-format contract / security invariant / gate pattern, or anything the manual is now stale/missing about | **[`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md)** (esp. [Known landmines and recent fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes)) | New landmine entry (ID + cause + the guard) and/or an updated architecture/gate/flow section |

- A branding **defect** (mis-colored, clipped, off-brand, low-contrast art) is a **bug → `ClaudeReport.md`**, not a brand
  idea — only *new art to create* goes to `ClaudeBrandingReview.md`.
- **WRONG LANGUAGE IS A BUG (not a Future.md idea).** A typo, grammar/punctuation error, off-brand or cold/salesy voice,
  non-inclusive/assumptive wording, leaked placeholder/dev/raw-error text, copy that doesn't match behavior, or a
  broken/duplicate/off-guide question → **`ClaudeReport.md`** (see **Pass P**). Only genuinely-working copy that *could be
  warmer/clearer* (a rewording for delight) goes to `Future.md`. "Confusing copy" that actually misleads the user is a bug.
- **ONE canonical home per fact; everywhere else is a pointer (ID/anchor), never a paraphrase.** This is the rule that
  keeps the five docs from duplicating each other (and wasting tokens re-stating the same lesson). Route by *purpose*:
  the **defect** (repro/severity/status) → `ClaudeReport.md` (transient — prunes to an ID after one confirm); the
  **substance** (root cause / why it's fragile / how to not re-introduce it) → the **Engineering Reference Manual**
  (permanent, engineer-facing); the **reflex** (how to FIND the class next round) → this `ClaudeQAPlan.md` Pass
  (generalized, citing the ID); **coverage status** → `ClaudeQACoverage.md`; **cross-session ops not in the repo**
  (accounts, tooling, auth) → `memory/`. State a fact in its home once; elsewhere cite the ID. Don't restate a fix in
  four docs.
- **The Engineering Reference Manual is a LIVING document — read it before a pass, write back to it after.** When a
  round teaches the codebase something durable (a fixed bug's re-introduction risk, a new/changed architecture fact,
  data path, contract, gate, flow, collection/Function/route, or the manual disagreeing with reality), update the manual
  in the **same chunk**. **A fix is not complete until its durable substance is in the manual** (see the
  MANDATORY-retrospective rule). The report row and the Pass reflex just reference the manual's landmine ID — they don't
  re-tell it.
- Logging an idea in `Future.md` is **never** a substitute for filing a real defect: if it's broken, it gets an ID in
  `ClaudeReport.md` too.
- Bug lifecycle: filed in `ClaudeReport.md` → fixed → kept **one** confirmation round → pruned to the archived-ID line
  (detail lives in git). `Future.md` ideas sit in the backlog until built. (See **Report hygiene** under Reporting.)

## Context
Drive the real app on both emulators, verify each thing live, report, fix, re-verify. Core QA dimensions (cornerstones):
1. **Couple-shared premium** — if EITHER partner is premium, **all** premium features unlock for **both**.
2. **Games** — each starts, plays, **joins, resumes**, finishes, **and reopens results** correctly on both devices.
3. **Full visual pass, light + dark** — every screen, text readable, nothing clipped/invisible.
4. **Security & encryption (cornerstone)** — every private field is ciphertext at rest, rules hold against
   non-members, keys/recovery are sound. Findings here default to P0.
5. **Notifications** — the **full suite**: every type delivers to the right partner (foreground/background/killed),
   deep-links correctly, opens the right destination on **both clients**, covers all **game/join-game** flows, handles
   stale notifications, and leaks no private content.

These five are the original cornerstones; the playbook has since grown to cover the rest of the app as first-class
passes (see **QA passes** below): **K Billing & subscription lifecycle** (the real purchase/restore/cancel/expiry money
path, not just the admin entitlement toggle), **L Messaging & chat** (E2E send/receive/react/media, both clients),
**M Settings & account management** (every toggle persists + takes effect; biometric lock, quiet hours, unpair/delete),
**N Daily-question/reveal/check-ins + Bucket List/Date Builder/Activity** (the interactive non-game features),
**O Release build & store readiness** (the **minified release** build, signing/AAB, App Check, i18n, deep/App-Links,
Play Data-Safety — everything else runs on the debug APK), and **P Content, copy & language** (typos/grammar, brand
voice, inclusive language, and the **question-bank** content — *wrong language is a bug, not a Future.md idea*). Plus the
existing **F resilience**, **G account/abuse**, **H branding**, **I performance**, **J accessibility**. **Pass letters are stable IDs — never renumber** (issue IDs and
coverage rows reference them; note D/E/G are not in strict alphabetical position for that reason).

Scope decisions: **exhaustive** visual pass (all ~50 screens, both modes); **full scope incl. pre-pairing** flows
(fresh throwaway account); **couple-shared everywhere** — per-user gates are bugs, fixed by routing through
`core/billing/CouplePremiumChecker.kt`; **full notification suite** — every type, game + join-game pushes, deep-links,
stale-notification handling, and all in-app paths into joining/resuming/results, verified on **both clients**.

**Early known signal:** only chat uses `CouplePremiumChecker`; games/packs/dates/wheel gate on the user's own
`EntitlementChecker.isPremium()` — so premium almost certainly does NOT unlock for the free partner there. Pass A
confirms + enumerates this; the fix phase applies couple-shared everywhere.

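The early signal above can be enumerated with a plain grep before Pass A drives anything live. A rough sketch under stated assumptions: `list_per_user_gates` is a hypothetical helper, `app/src/main` is assumed to be the Kotlin source root, and the match is purely textual, so the declaration of `isPremium()` itself will also show up in the output.

```shell
#!/usr/bin/env bash
# list_per_user_gates [SRC_DIR]: print file:line:text for every isPremium() call
# found outside CouplePremiumChecker.kt. Each hit is a candidate per-user gate
# to confirm live in Pass A; it is not proof of a bug on its own.
list_per_user_gates() {
  local src_dir="${1:-app/src/main}"   # assumed source root, override as needed
  grep -rn --include='*.kt' 'isPremium()' "$src_dir" \
    | grep -v 'CouplePremiumChecker\.kt:' || true   # no hits is not an error
}
```

Running `list_per_user_gates` from the repo root gives a starting list for the Pass A enumeration; an empty result after the fix phase would be consistent with every gate going through `CouplePremiumChecker`, though only the live two-device check proves it.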
## Execution mode — run to completion (autonomous; do NOT stop)
- **Do not stop to check in or ask for approval.** Run all passes (A–P — recurring set A–N + P each round; K's real-money path and O's release/store gates run when a sandbox device / pre-ship is in scope) → the fix phase → re-QA rounds **continuously
  until a flawless round** (zero open P0–P2, Passes D + E clean, every game fully played through, all notification
  routes verified, navigation/back-stack verified). Don't hand control back early.
- **Unblock yourself:** if anything **blocks progress** (a stale/blocking session, a crash, a build break, a missing
  prerequisite state, a broken nav path that prevents reaching a screen), **fix it immediately and continue** — even
  though passes are otherwise report-only. Blocking issues are fixed inline so the run can proceed; non-blocking
  findings are still logged and fixed in the fix phase.
- **"Once executed, complete it":** never declare done before the Definition of Done is met — keep cycling fix → re-QA
  until flawless, then stop.
- **Context limits ≠ stopping — do NOT hand back to the user when context fills.** The harness auto-summarizes a long
  conversation and continues in the next window; you continue **without the user**. (You cannot self-invoke `/compact`
  — and you don't need to; auto-compaction handles it.) The **committed `ClaudeReport.md` run-state + `ClaudeQACoverage.md`
  are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next
  chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action
  even with standing auth, or the macOS requirement for iOS).
- **Checkpoint (save the working-tree MD files + run-state) before anything interruptible** so a mid-chunk compaction never loses progress (the user commits — never run git yourself). Keep chunks atomic; if a
  chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck
  session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
- **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite
  deliberate design — the log captures it). Never halt the run to ask.
- **Only true stop = a gated action you cannot perform.** Production deploys, admin Firestore writes/seeds, and
  entitlement toggles still need per-occurrence authorization (the classifier enforces this regardless of this doc).
  If one is genuinely required to proceed and is denied, do **all** other work first, then surface only that single
  blocker — don't halt the whole run for it.

## Methodology (every pass)
- **EVIDENCE OVER ASSUMPTION — read the logs, never assume, always verify (the #1 rule).** Every conclusion —
  `pass`, `fail`, `fixed`, "it works", "the notification didn't open" — must be backed by **observed evidence**, never
  by what the UI *appears* to do or by reasoning about the code. Concretely:
  - **Read `logcat` on EVERY action, not only when something looks wrong.** `logcat -c` before a tap/flow, then after,
    scan for `FATAL EXCEPTION`/ANR/`PERMISSION_DENIED`/exceptions. **Absence of a visible symptom ≠ success** — a screen
    that "looks fine" can be masking a swallowed exception, a denied read, or a crash on another device.
  - **Verify with ground truth, not appearance:** confirm persisted state via **admin reads** (Firestore), confirm
    delivery via `notification_queue`/`dumpsys notification`, confirm routing via the landed screen + back stack,
    confirm encryption via the raw stored bytes. "Looked right" is not verified.
  - **Don't theorize a root cause — reproduce it and read the stack.** If behavior is "didn't work / closed / flashed",
    pull the crash log FIRST (this session's bug was misdiagnosed by reasoning until the live stack named the splash NPE).
  - **Don't trust a synthetic pass** (`am start`, admin write, direct call) for launch/notification/permission paths —
    verify through the **real** channel (see Reproduction fidelity). A green that didn't exercise the user's path is not green.
- Devices: **5554 (QA)**, **5556 (Sam)**, paired; one **fresh throwaway account** for pre-pairing flows.
- Drive via adb tap/swipe; resolve coords from `uiautomator dump` bounds; downscale screenshots to read;
  scan `logcat` for `FATAL EXCEPTION`/ANR on each screen.
- Premium toggled via `scratchpad/set_premium.js` (admin, **user-authorized each time**).
- Theme toggled via **Settings → Appearance (Light/Dark)** (`MainActivity` `ThemeMode`).
- **REPORT-ONLY during passes — never fix mid-pass.**
- **THINK AS A CONSUMER — approach everything from different angles.** Beyond "does it work", constantly ask *"is this
  what a real person would expect / want here? is this delightful, confusing, or annoying?"* Come at each flow from
  multiple angles (first-time user, returning user, the partner who didn't start it, someone tapping fast, someone
  reading carefully, the skeptic, the impatient). Vary inputs, depths, orders, and entry points (don't repeat one
  happy path). A thing can be bug-free yet still *worse than it should be* — notice that too.
- **CAPTURE IMPROVEMENT / FEATURE IDEAS → `Future.md` (section `## QA`).** Bugs (broken/incorrect behavior) go to
  `ClaudeReport.md` as always. But anything that *works yet could be better* — confusing copy, a missing affordance,
  a rough-but-not-broken flow, an "it'd be great if…" feature idea — append it to **`Future.md` under `## QA`** with a
  short title, what prompted it, and the suggested improvement. This is an idea backlog, **not** the bug log; logging
  here is never a substitute for filing an actual defect in `ClaudeReport.md`.
- **Environment (senior-QA rec):** prefer the **Firebase Local Emulator Suite or a dedicated staging project** over
  production — isolates test data, makes seeding / entitlement toggles / D3 negative tests **free** (no gated prod
  writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real
  services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but
  every seed/toggle/deploy hits the gate.)
- **Device/OS matrix (pre-ship gate — currently NOT met; track it honestly):** per-round QA runs on our **two
  identical emulators (5554/5556, same API + screen size)** — that's the realistic recurring setup, not full coverage.
  Before any store push, certify across **minSdk + targetSdk**, a **small** and a **large** screen, and at least one
  **physical device** (App Check / Play Integrity behave differently on emulators). Because this is unmet today, keep a
  `blocked→needs-device` row for it in `ClaudeQACoverage.md` (alongside Pass K money-path + Pass O) so the gap stays
  visible rather than silently assumed-covered — don't claim "device matrix ✓" off two same-size emulators.
- **⛔ First-run / cold-path lane (fixture blind-spot — learned from O-ONBOARD-001, a P0 we shipped past).** The two
  recurring emulators (5554/5556) are **paired + signed-in + onboarding-complete** — a stable fixture that is great for the
  A–N passes but **structurally cannot reach the entire first-run surface**: onboarding (every slide + **Skip**), sign-up,
  login, the auth-screen logo, pairing/invite-accept, recovery-on-new-device, and day-1 empty states. A bug anywhere in
  that region is **invisible no matter how thorough the passes are** — O-ONBOARD-001 (every fresh install crashed on the
  last onboarding slide) sat undetected precisely because the fixtures skip it. **So: run a fresh-install lane on a
  THROWAWAY device** (e.g. `emulator-5558`, or a fresh AVD — **never `pm clear` 5554/5556**, it breaks the App Check debug
  token): install the build, walk onboarding all the way through (+ Skip) → sign-up → login → pairing → first daily Q,
  asserting 0 FATAL and each screen renders. **Trigger it every time you touch onboarding / auth / pairing / branding /
  launcher or any `res/drawable` asset, and always before a store push.** Keep a `first-run` row in `ClaudeQACoverage.md`
  so this blind-spot stays visible instead of assumed-covered.
- **Render-level coverage gap (the other half of why O-ONBOARD-001 slipped).** Our cheap gates are all static or
  logic-level — unit/functions tests, `theme-scan`, `wiring-scan`, `painter-xml-scan` — **none of them actually render a
  composable.** `painterResource` on a `<bitmap>` compiled fine, passed all 205 unit tests, and only threw on first paint.
  There is a whole class of "composes fine, crashes on render" bugs (resource resolution, `LocalContext` casts, bad
  `painterResource`). **R20 added the first on-device net for it:** `app/src/androidTest/.../ui/FirstRunRenderSmokeTest.kt`
  renders the first-run crash composables (`CtaSlide` + `AuthLogoMark`, light+dark — the exact O-ONBOARD-001 sites) via a
  Compose `createComposeRule()` and asserts they paint; proven to FAIL on the reintroduced bug. Run it with
  `./gradlew :app:connectedDebugAndroidTest` (needs a connected emulator; filter a class with
  `-Pandroid.testInstrumentationRunnerArguments.class=…`). It currently covers **only the first-run leaf composables** —
  most routes still have no render test, so the **fresh-install lane above remains the net for the rest** until the smoke
  grows (see `Future.md` — extend toward sign-in→pair→daily-Q→game with a Hilt test runner, and/or a Roborazzi/Paparazzi
  screenshot suite). Treat both the fresh-install lane and (when an emulator is attached) `connectedDebugAndroidTest` as
  part of the render-crash net.
- **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round
  re-checks it cheaply instead of by hand. **Built:** `qa/entrypoint_smoke.sh <serial> <recipient_uid>` (+ helper
  `qa/qa_push.js`) — the cold-start / entry-point launch-integrity smoke. It launches via the launcher AND sends a
  **real** push to a killed (`am kill`) app and **taps the actual OS notification** for each type, asserting the app
  **opens and STAYS** (process alive, 0 FATAL, off the launcher). This is the smoke that catches the "opens-and-closes"
  splash-crash class that `am start` can't. Run it **every round and after any change touching MainActivity / splash /
  theme / manifest / nav / notifications**. `FAIL` = an app crash (real bug); `BLOCK` = push not delivered (flaky
  emulator FCM — rerun, not a bug).
- **Run the project's OWN test suites every round (they are the cheapest, most deterministic regression net).** Before
  the scanners and live driving, run `./gradlew testDebugUnitTest` (19 unit-test classes — `FieldEncryptorTest`,
  `SealedAnswerEncryptorTest`, `NotificationRateLimiterTest`, `QuietHoursManagerTest`, `StreakCalculatorTest`,
  `ChallengeStateMachineTest`, `PartnerNotificationManagerTest`, `HomePriorityEngineTest`, `DateMatchRepositoryImplTest`,
  `CloserBrandCopyTest`, …) and `cd functions && npm test` (`entitlementLogic.test.ts`). **A failing test is a regression
  bug (P0/P1) — file it and do not QA a build with a red suite.** A fix that breaks a test isn't "Fixed" (see Fix phase).
  These guard the exact invariants this QA chases (ciphertext format, rate limiting, quiet-hours suppression,
  entitlement math), so a green run is a precondition, not a bonus. (**Instrumented coverage (R20):** the first on-device
  test now exists — `FirstRunRenderSmokeTest` (a Compose render smoke of the first-run screens); run it when an emulator
  is attached with `./gradlew :app:connectedDebugAndroidTest`. It's still **first-run-only** — broader UI/nav/DB-DataStore
  behavior remains uncovered, so the live passes + scanners are still the main UI-behavior net; grow the suite per `Future.md`.)
- **Stress / monkey fuzz (cheap random-crash net the manual nav-fuzz misses):** once per build run
  `adb shell monkey -p app.closer --throttle 300 --pct-touch 90 -v 5000` on each emulator with `logcat` capturing —
  any `FATAL EXCEPTION`/ANR it triggers is a bug (file it with the monkey seed). This complements Pass C's *targeted*
  nav fuzzing with broad random input.
- **Run associated automated scanners BEFORE the manual pass.** Every pass with a supporting script must start with it:
  - **Pass C:** run `scripts/theme-scan.sh` and review `/tmp/claude-theme-scan-<date>.md` before looking at any screen.
  - **Pass N (+ discovery ritual):** run `scripts/wiring-scan.sh` and review `/tmp/claude-wiring-scan-<date>.md` before
    driving the interactive features — it catches the **silent dead-feature class** (N-001 Bucket List, N-002 Date
    Builder): 🔴 a `setX()` ViewModel setter with **no caller**, 🟠 a repository read method with **no `ui/` caller**
    (data written but never displayed), 🟡 `if (x.isEmpty()) return` bail-guards to confirm the state is actually
    provided. Every 🔴 is a likely dead feature — prove the feature works by persisting real data and reading it back
    from Firestore (admin), not by trusting the empty-state render.
  - If a scanner does not yet exist for a pass but the pass is highly automatable (e.g. touch-target sizing for Pass J,
    `enc:v1:` leak grep for Pass L, redundant-read count for Pass I), consider building it and adding it here.
  - Scanner findings narrow the manual sweep: every 🔴 CRITICAL must be verified (both themes for C; live persist→read
    for N); 🟠 MAJOR must be reviewed for theme/art breakage or orphan data; 🟡 REVIEW is checked during the sweep.
  - If a manual finding is something the scanner should have caught, improve the scanner (see Living discovery ritual).
- **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between
  rounds so they don't masquerade as bugs.
|
||
- **Evidence standard:** every filed bug must be reproducible from text alone: build/commit, device, account, theme,
app/process state, screen/route, exact tap/input sequence, expected result, actual result, and whether logcat showed
a crash/ANR/permission denial. Screenshots/videos are helpful but never the only evidence because session artifacts
may not survive compaction.
- **Flake policy:** if something fails once and then passes, do not dismiss it. Repeat from a clean state, vary timing
(rapid tap / slow network / background-resume), inspect logs, and file it as intermittent if it cannot be made fully
deterministic. Intermittent routing, notification, encryption, duplicate-write, or crash behavior is still a bug.
- **Reproduction fidelity (how we catch DEEP bugs) — the test harness must exercise the SAME path as the user.** A
synthetic shortcut (`am start` extras, admin writes, calling a function directly, `am force-stop`) can **pass while the
real path crashes** — the splash-handover NPE only fires on a real notification cold-start, and an app stopped with
`am force-stop` can't even receive FCM. So for launch / notification / permission / IPC / deep-link behavior, reproduce
through the **real OS mechanism** (real push tapped from the shade, real launcher cold-start, real permission dialog).
Record **which angle** proved it in `ClaudeQACoverage.md`; "synthetic/UI-shortcut only" is **not** a pass for these paths.
- **Symptom→inspection reflexes (apply before theorizing a root cause):** (1) "opens-and-closes / flashes / silently
fails" ⇒ it's a **crash until the stack says otherwise** — `logcat -c` then capture `FATAL EXCEPTION` from the live
repro **before** proposing a cause (don't fix by reasoning, like the routing red-herring on this very bug). (2)
**Many features break at once ⇒ inspect the SHARED code path** (launch/`onCreate`/splash/auth/key-load), not each
feature. (3) "worked before, broken now" ⇒ **diff & history-check before you fix**: `git blame`/`git log -L`/`git diff`
the failing line to the introducing commit (**incl. other agents' commits — Codex/kimi/Ripley co-edit this repo**), and
search the Engineering Manual landmines + the report's archived-ID line for a prior fix of the same symptom — a match
means **regression, not a new bug** (full procedure: the Fix-phase **Regression triage** step). (4)
Treat cosmetic/branding/theme/manifest/splash commits as **capable of deep crashes** — re-run the cold-start +
notification smoke after them.

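
The reflexes above depend on reading a logcat capture from a real-path repro before theorizing. A minimal sketch of that triage, assuming the capture is in logcat's default `threadtime` format and arrives on stdin. `fatal_blocks` and `fcm_broadcast_cancelled` are hypothetical helper names (not existing `qa/` scripts), and the emulator serial and `$PKG` in the comments are placeholders:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of the repo's qa/ scripts.

# Print the crash block(s) from a logcat dump on stdin: every AndroidRuntime line
# from the first "FATAL EXCEPTION" onward.
fatal_blocks() {
  grep 'AndroidRuntime' | awk '/FATAL EXCEPTION/ {found = 1} found {print}'
}

# Succeed if the dump shows an FCM broadcast that was cancelled, which points at a
# force-stopped app (a harness artifact) rather than a genuinely missing notification.
fcm_broadcast_cancelled() {
  grep -Eq 'GCM.*broadcast.*result=CANCELLED'
}

# Live use, for reference (not executed by this sketch):
#   adb -s emulator-5554 logcat -c                          # clear the buffer first
#   adb -s emulator-5554 shell input keyevent KEYCODE_HOME  # background the app
#   adb -s emulator-5554 shell am kill "$PKG"               # kill, NOT force-stop
#   ... deliver a real push and tap it from the shade ...
#   adb -s emulator-5554 logcat -d > /tmp/repro.log
#   fatal_blocks < /tmp/repro.log
#   fcm_broadcast_cancelled < /tmp/repro.log && echo "push never reached the app"
```

An empty result from `fatal_blocks` only means no crash was logged under the `AndroidRuntime` tag; ANRs and native crashes are logged under other tags and need their own check.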
## Living discovery ritual (before each round, and whenever reality disagrees with the docs)
The app is allowed to grow; the QA plan must keep up. Before a pass or chunk, quickly inventory the current code/app
surface and reconcile it with `ClaudeQACoverage.md`:
- **Routes/screens:** inspect `core/navigation/AppRoute.kt`, navigation graph call sites, Settings sub-pages, dialogs,
bottom tabs, deep links, and any new composables reachable by buttons/cards.
- **Notifications:** inspect notification type enums/classes, Cloud Function triggers, Android intent/deep-link handling,
notification channels/actions, FCM token registration, and Android runtime notification permission paths.
- **Features/gates:** grep for premium checks, permission requests, media pickers, billing/paywall entry points,
destructive actions, account/couple lifecycle actions, and admin/server-only writes.
- **Assets/content:** inventory new drawables, `drawable-night*` variants, pack art, empty states, strings, feature flags,
remote config, and any debug-only screens that should not ship.
- **Backend/rules:** inspect Firestore rules, indexes/queries, Functions triggers/callables, Storage paths, scheduled
jobs, and migrations for new data shapes or access paths.
- **Docs update rule:** if the inventory finds a page, feature, notification, asset, state, backend path, or edge case
missing from the playbook/coverage, update `ClaudeQAPlan.md` and `ClaudeQACoverage.md` before marking the chunk done.
- **Scanner update rule:** if a manual finding is a pattern an existing scanner *should* have caught (e.g. a hardcoded
surface color the theme scanner missed, a route the smoke should have exercised), improve that script and document the
change in its header. If no scanner exists for a repeated failure mode, consider writing one and adding it to
**Methodology**.

If a finding is product polish, also add it to `Future.md`; if it needs new artwork, add it to `ClaudeBrandingReview.md`.
**And if the discovery is a durable engineering fact (new route/collection/Function/flag/contract, a changed wire
format, a renamed file, a gate/flow that the manual describes wrongly or omits), update
[`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) in the same chunk** — the discovery
ritual is exactly when the manual drifts out of date, so reconcile it then, not "later".

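
Part of the reconciliation above can be mechanized. A minimal sketch, assuming you have already produced a plain list of names (routes, notification types, collections), one per line. `missing_from_coverage` is a hypothetical helper, and how the names are extracted depends on how `AppRoute.kt` and the other sources actually declare them, which this sketch does not assume:

```shell
#!/usr/bin/env bash
# Sketch only: prints each name from stdin that never appears in the coverage file.
# usage: missing_from_coverage COVERAGE_FILE < names.txt
missing_from_coverage() {
  local coverage="$1" name
  while IFS= read -r name; do
    [ -n "$name" ] || continue                        # skip blank lines
    grep -qF -- "$name" "$coverage" || printf '%s\n' "$name"
  done
}
```

For example, `missing_from_coverage ClaudeQACoverage.md < /tmp/routes.txt` lists names to add before marking the chunk done. The match is a plain substring search, so a short name can hide inside a longer one; treat an empty result as a hint, not proof.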
## Multi-angle attack mandate (go DEEPER than "does the happy path work")
A capability can pass via the UI yet fail when hit directly. Probe each meaningful capability (read/write a private
field, gate a premium feature, deliver/route a notification, start/finish a game, pair/unpair, create an account)
from as many **independent angles** as apply — not just the in-app happy path:
- **Real UI** (play-as-user) — the baseline angle.
- **Crafted intent / deep-link** — fire the exact intent a notification/link carries (bypasses UI nav) to test routing
in isolation; also send **malformed/missing extras** → must route gracefully or no-op, never crash.
- **Raw API against the DEPLOYED backend** — hit Firestore/Storage/Functions REST **directly** with a real token,
as a **member AND a non-member**, to exercise rules + App Check from OUTSIDE the app. A non-member (or no-App-Check)
request must be **DENIED** — App Check `403` or rules `PERMISSION_DENIED`. The member request characterizes which
layer enforces. **Any unauthorized `200` returning couple data = P0.**
- **Admin inspection (ground truth)** — read the RAW stored docs/objects (admin bypasses rules) to assert what is
actually persisted: ciphertext only, no plaintext, no raw keys/invite-seeds, no private content in pushes.
- **Concurrency / race** — two partners (or two rapid taps) hit the same thing at once.
- **Killed / cold state** — kill with **`am kill <pkg>`**, NOT `am force-stop`: a force-stopped app is in Android's
*stopped* state and is **excluded from FCM broadcasts** (`GCM broadcast …result=CANCELLED`), so the push never
arrives and you get a false "no notification". Then deliver a **real** push and **tap the actual OS notification**
(one at a time — clear the shade first; tapping a *grouped summary* launches with no extras and falsely lands on
Home). `am start … --es type …` is **not** equivalent to a real notification tap (different launch path — see the
crash-triage note in Pass E). Also cold-start straight onto a deep link.
- **Malformed / abusive input** — oversized, empty, rapid-fire, injection-ish, forged FCM payloads, replayed/expired
tokens & invite codes.
- **Offline / flaky** — drop network mid-action → graceful failure, recover on reconnect.

Record **which angles** were tried per area in `ClaudeQACoverage.md`. For security- or data-sensitive capabilities,
"UI happy path only" is **not** a `pass`. **D3/Pass G negative access MUST be executed live via the raw-API angle each
round — never deferred to "only 2 emulators."** (Mint a token for a non-member UID via admin → exchange for an ID
token via the Identity Toolkit REST `signInWithCustomToken` → use it as Bearer against the Firestore REST API.)

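
The raw-API negative check above can be sketched with `curl`. The two endpoints are the public Identity Toolkit and Firestore REST URLs; the function names, the argument order, and whatever document path you pass in are illustrative assumptions. No App Check header is sent, so the request arrives the way an outside caller's would:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; nothing here runs until you call it.

# Exchange an admin-minted custom token for an ID token. Prints the JSON response,
# whose "idToken" field is the Bearer token for the next call.
# usage: exchange_custom_token WEB_API_KEY CUSTOM_TOKEN
exchange_custom_token() {
  curl -sS -X POST \
    "https://identitytoolkit.googleapis.com/v1/accounts:signInWithCustomToken?key=$1" \
    -H 'Content-Type: application/json' \
    -d "{\"token\":\"$2\",\"returnSecureToken\":true}"
}

# GET one document via the Firestore REST API and print only the HTTP status code.
# usage: firestore_get_status PROJECT_ID DOC_PATH ID_TOKEN
firestore_get_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $3" \
    "https://firestore.googleapis.com/v1/projects/$1/databases/(default)/documents/$2"
}

# Map a NON-MEMBER's status code to a verdict.
classify_nonmember_status() {
  case "$1" in
    401|403) echo "DENIED" ;;        # expected: auth, App Check, or rules rejected it
    200)     echo "P0" ;;            # an unauthorized read succeeded
    *)       echo "INCONCLUSIVE" ;;  # e.g. 404 or 5xx: re-check the path and retry
  esac
}
```

Run the same `firestore_get_status` call with a member's token as well, so the pair of results shows which layer is enforcing. Before filing a `P0`, fetch the body once and confirm it really contains couple data.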
## Continuity & resumability (this effort WILL span many context windows — don't lose state)
State lives in **files**, not memory:
- **`ClaudeReport.md`** = the issue log (committed). Each issue row is **self-contained in text** (repro + expected
+ actual) — screenshots are session-only and won't survive a compaction; never rely on a screenshot path alone.
- **`ClaudeQACoverage.md`** = the coverage matrix: every screen×mode, feature×premium-state, game×lifecycle,
notification×{foreground,background,killed}, each `todo | pass | fail→id | not implemented→Future.md | blocked→id`.
The resume anchor.
- **`Future.md`** (`## QA`) = the non-bug improvement/idea backlog; **`ClaudeBrandingReview.md`** = the branding/artwork
review + image-prompt backlog. Both committed alongside the report/coverage.
- **Persistent memory** (`memory/`): QA methodology + exact commands; emulator↔account↔coupleId mapping;
`scratchpad/set_premium.js` + admin tooling; the couple-shared-premium-everywhere goal + the per-user-gate gap.
- **Run-state header** pinned at the TOP of `ClaudeReport.md`, always current: `Round N | Pass X | Chunk Y |
NEXT ACTION: …` — first thing to read, last thing to update before stopping.
- **Stable issue IDs**: `A-001 / B-002 / C-… / D-… / E-…` (pass-letter + number); coverage references the ID for
every `fail`. Never renumber or reuse.
- **Source of truth**: the two MD files are authoritative; the TodoWrite list is scratch for the current chunk only.
Update the MD files + run-state header *before* ending a session.
- **Living playbook rule:** when QA discovers any new app surface or recurring lesson — a new page/route, feature,
setting, game state, notification type/action/channel, entry point, background/killed-state behavior, asset/art
placement, repeatable bug class, missed edge case, fragile route, confusing state, image/layout failure mode,
security angle, or anything else that should be checked every future round — update **this `ClaudeQAPlan.md`** in the
relevant pass before ending the chunk. Also add the matching row/cell to `ClaudeQACoverage.md` if it needs recurring
verification. **And update [`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) when the
discovery is durable engineering truth** (a new architecture fact, data path, contract, gate, flow, or a fixed bug's
re-introduction risk) — the QA plan captures *what to re-test*, the manual captures *what the system is and why it's
fragile*; both are living and both get updated. Do this even after the immediate bug is filed/fixed so the lesson or
newly discovered surface is not lost to memory or git history.
- **Learn from every ESCAPED or DEEP bug — MANDATORY retrospective (do this automatically, not only when asked).**
Any bug that (a) **escaped a prior round**, (b) needed **non-obvious diagnosis** (a crash, an "opens-and-closes",
a "didn't work", an intermittent, a wrong-root-cause first guess), or (c) **recurred** triggers a short retrospective
the moment it's fixed — the fix is **not complete** until all four are done:
1. **Add the guard that would have caught it** — a new `qa/` smoke check, a coverage row, or a concrete pass step
(e.g. the cold-start bug → `qa/entrypoint_smoke.sh`). If an existing smoke missed it, extend the smoke.
2. **Capture the lesson in its ONE canonical home, then link by ID elsewhere — never paraphrase it twice.** Split by
purpose: the **reflex** (how to *find* this class next round) goes in the relevant Pass of **this doc**, written
*generalized* and citing the bug ID as an example (do NOT re-narrate the bug here); the **substance** (root cause +
where it lives now + re-introduction risk + the guard) goes in
[`docs/Engineering_Reference_Manual.md`](docs/Engineering_Reference_Manual.md) → [Known landmines and recent
fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (and update the matching
architecture/gate/flow section if the fix changed it). The manual is the next engineer's first read; a landmine
that isn't in it will be re-introduced. **Do NOT copy the fix into `memory/`** — per the memory rules, memory holds
only cross-session facts NOT in the repo (emulator↔account map, admin tooling/commands, standing auth,
never-commit); past fixes belong to the manual, so memory just points to the landmine ID if needed.
3. **Name the missing state/angle/entry-point** that let it hide and add it to the multi-angle / state matrices so it's
exercised every round (e.g. "real notification tap on an `am kill`'d app", not just `am start`).
4. **Note any wrong turn in diagnosis** so the misstep isn't repeated (e.g. "synthetic test passed while the real
path crashed → don't fix by reasoning; reproduce via the real channel + read the stack").
This is how the plan self-improves between rounds — treat the human pointing out a missed bug as a signal the plan had
a gap, and close the gap here, not just the bug.
- **Checkpoint cadence**: save `ClaudeReport.md` + `ClaudeQACoverage.md` + run-state after each pass and each chunk (the **user** commits — never run git yourself; see Guardrails).
- **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
- **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators
online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue
at the first `todo` / unverified-fix; (5) if a prior chunk left an active/stuck game session, recover it via in-app
"End their game" (log if needed), then redo that chunk.

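
Two of the session-start checks above are easy to script. A minimal sketch, assuming the run-state header sits on a single line within the first ten lines of `ClaudeReport.md`; `has_run_state` and `online_emulators` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Succeed if the first 10 lines of the report contain a run-state header shaped like
# "Round N | Pass X | Chunk Y | NEXT ACTION: ...". Assumes the header is on ONE line.
# usage: has_run_state REPORT_FILE
has_run_state() {
  head -n 10 "$1" | grep -Eq 'Round [0-9]+ \| Pass [A-Z] \| Chunk [^|]+ \| NEXT ACTION: .+'
}

# Count emulators listed as "device" (online) in `adb devices` output read from stdin.
online_emulators() {
  awk '$1 ~ /^emulator-/ && $2 == "device" {n++} END {print n + 0}'
}
```

Typical use would be `has_run_state ClaudeReport.md || echo "run-state header missing"` and `adb devices | online_emulators`, which should print `2` before any two-device chunk starts. An `offline` or `unauthorized` emulator is deliberately not counted.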
## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration)
A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the
**largest coherent unit that reliably finishes AND checkpoints within one context window, with margin**. End every chunk
by saving the MD files + run-state (the user commits — never run git yourself). If a chunk starts overflowing, split it;
if chunks feel trivial, merge them.
**Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching
prevents half-done/lost work and gives cleaner per-chunk verification + revertable history.

Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API
verification, keep it to **one small route family, one game phase, or one notification type**. A chunk is too large if
it cannot produce a precise coverage update, issue log, and file-checkpoint before context gets tight. Split before starting
rather than leaving a half-tested matrix behind. **Prefer Claude-friendly micro-batches**: smaller chunks let the agent
fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows.

| Pass | Chunk granularity | ~chunks |
|---|---|---|
| A Premium | one gated-feature family per chunk if live toggles are needed; otherwise free-state sweep → couple-shared verify | 2–4 |
| B Games | **one game per chunk max**; split complex games into lifecycle/playthrough chunk + join/resume/results/notification-entry chunk | 7–14 |
| C Visual | **one small route family per chunk** (both themes, ~2–3 screens/states, screenshots reviewed + nav/back + image-fit + all CTAs for that family) — never "all screens" or a broad tab at once | 16–25 |
| D Security | one security assertion group per chunk: D1 at-rest · D2 rules static · D3 live negative raw API · D4 keys/recovery · D5/D6 leaks · D7 migration | ~6 |
| E Notifications | **one notification type per chunk** with the full contract below; split a type into direction/state subchunks if needed, but do not mark the type pass until both clients + source screens + fg/bg/killed + stale/malformed + payload/back-stack are covered | 16–30 |
| F Resilience | **one dimension per chunk** (concurrency · lifecycle/process-death · network · time · account-lifecycle) | ~5 |
| G Account creation | **one creation/abuse dimension per chunk** (happy/validation · duplicate/conflict · fake-account abuse · lifecycle) | ~4 |
| H Branding | **one small route family per chunk** (~2–3 screens/states) consumer brand walk + ready-to-paste art prompts + existing-image integration verdict | 8–14 |
| I Performance | **one route-group per chunk** — gfxinfo/jank + read-count instrumentation (build the route smoke checklist) | ~3 |
| J Accessibility | **one a11y setting per chunk** (font scale · TalkBack · contrast · targets · keyboard · reduce-motion) | ~5 |
| K Billing | **one money-path per chunk** (purchase · restore · plan-switch · cancel→expiry-relock · refund · webhook auth) — needs a real device/sandbox | ~6 |
| L Messaging | **one chat dimension per chunk** (send-types both dirs · reactions/receipts/typing · failed-send/offline · media perms · inbox/entry-points · delete/moderation) | ~6 |
| M Settings | **one settings group per chunk** (appearance · notif toggles · quiet hours · biometric lock · edit profile · unpair/delete · security/recovery) | ~6 |
| N Interactive features | **one feature per chunk** (daily-question loop · outcomes/check-ins · Bucket List · Date Builder · Activity feed) | ~5 |
| O Release/store | **one gate per chunk** (minified release smoke · signing/AAB · App Check (staging) · deep/App-Links · permissions/manifest · i18n · Data-Safety/store) — pre-ship, not per-round | ~6 |
| P Content/language | **one surface per chunk** (UI microcopy of a route family · voice/tone sweep · inclusive-language sweep · question-bank by category/depth · legal/store copy) | ~5 |

Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI sweeps; **montage** screenshots
(dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.

## Guardrails & efficiency
- **⛔ NEVER `git commit` / `git push` — the USER does ALL commits.** This overrides every "commit" verb elsewhere in
this doc: wherever a step says "commit," read it as **"checkpoint = save the working-tree files (`ClaudeReport.md` +
`ClaudeQACoverage.md` + run-state, plus any code/docs)"** and leave the actual `git commit` to the user. Your durable
state lives in those files (they survive compaction), not in a commit you make. Never stage, commit, push, branch, or
amend.
- **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
- **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**.
- **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
- **Pass C parallelism:** set **5554 = Dark, 5556 = Light** to capture both themes at once.
- Never log decrypted message/answer content.

## Severity scale (label every issue)
- **P0 Critical** — crash/ANR, data loss, encryption/security leak, feature fully broken, premium bypass.
- **P1 Major** — feature partly broken, premium not unlocking for partner, wrong/missing notification, dead-end nav.
- **P2 Minor** — readability/contrast, clipping/overflow/truncation, theme not adapting, inconsistent styling, wrong/double-back navigation.
- **P3 Polish** — spacing/alignment/copy nits.

## QA passes (Round 1 = baseline)

### Pass A — Couple-shared premium (target: either partner premium → both unlock)
Test each gated feature in 3 states: **neither** premium → locked + paywall; **partner-only** premium → BOTH unlock;
**self** premium → unlock. Toggle Sam premium, confirm QA (free) unlocks; toggle off.
Features: Play-hub games (Desire Sync + any premium-badged), Connection Challenges, Memory Lane; Question Packs;
Spin the Wheel / Category Picker / Wheel History (+ any premium wheel categories); Date Match / Plan Date / Date
Builder; chat media + reactions + any premium chat tools (regression — already couple-shared); Subscription/Settings
reflects entitlement.
Gated files (for the fix): `ui/play/PlayHubViewModel`, `ui/desiresync/DesireSyncScreen`,
`ui/wheel/{CategoryPicker,SpinWheel,WheelHistory}*`, `ui/questions/QuestionPackLibrary*`,
`ui/dates/{DateMatch,DateMatches}Screen`, `ui/memorylane/MemoryLaneScreen`, `ui/challenges/ConnectionChallengesScreen`.
Also: **any VM/screen calling `EntitlementChecker.isPremium()` directly** (grep for it) is a candidate gate.
- **ENFORCEMENT, not just a checker-usage grep (mandatory — RETROSPECTIVE from A-201, R12).** A feature can carry an
`isPremium` **content flag** + a cosmetic `PremiumBadge` with **NO gate at all** — that's exactly how Date Match
shipped a premium **bypass** (free users could view/like/match ★Premium date ideas; `getDateIdeas()` returned
`DateIdeaSeed.all`, no `CouplePremiumChecker`, badge only). Prior rounds missed it because the audit grepped for
`CouplePremiumChecker` *usages* and found the gated features, never noticing the feature that had **no** checker.
So every round: (1) **grep for `isPremium` / `PremiumBadge` / premium content flags** (`DateIdea.isPremium`,
`category.access=="premium"`, `challenge.isPremium`, …) and for **each** confirm a real enforcement path exists —
a `CouplePremiumChecker` filter OR a paywall-on-interaction — **not just a badge**; (2) **actually TRY TO USE the
premium content as a free user** (like/open/play it), don't just confirm the lock renders — "badge shows" ≠ "gated".
A badge with no enforcement = **premium bypass** (P1+). Inspection lesson: *"shows a Premium badge" is a display
fact, not a gate; prove the gate by using the content while free.*

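
Step (1) of the enforcement check above can start from a grep that looks for the absence of a gate rather than its presence. A minimal sketch, assuming Kotlin sources under a directory you pass in; `ungated_premium_candidates` is a hypothetical helper, and it only covers the `isPremium` / `PremiumBadge` tokens (extend the pattern for other flags such as `category.access`):

```shell
#!/usr/bin/env bash
# Sketch only: lists Kotlin files under DIR that mention a premium flag or badge but
# never reference CouplePremiumChecker. A hit is a LEAD to inspect, not a verdict:
# enforcement may live in another file or be a paywall-on-interaction.
# usage: ungated_premium_candidates DIR
ungated_premium_candidates() {
  local dir="$1" f
  grep -rlE --include='*.kt' 'isPremium|PremiumBadge' "$dir" | sort |
    while IFS= read -r f; do
      grep -q 'CouplePremiumChecker' "$f" || printf '%s\n' "$f"
    done
}
```

Each listed file still needs step (2): open the feature as a free user and try to use the premium content. A file that does not appear in the list is not thereby proven gated, since a checker reference can exist without being applied to every premium item.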
### Pass B — Games lifecycle (MANDATORY: play each game ONE complete time through ALL different play styles of the game)
Games: This or That, How Well Do You Know Me, Desire Sync, Connection Challenges, Memory Lane, Spin the Wheel, + Date Match.
- **PLAY AS THE USER (mandatory mindset for this pass):** drive every game **the way a real user would** — reach it
through the actual in-app navigation a person would tap (Play hub → the game's card → its buttons), **not** via
deep-links, admin pokes, forced state, or any shortcut a user doesn't have. **Expect what the user expects:** if a
tap/button/flow doesn't do the obvious thing, or a screen doesn't behave the way a normal user would assume, **that
itself is a finding** — log it.
- **When something doesn't work: REPORT FIRST, then a minimal workaround (in that order).** Do **not** silently
engineer around breakage by taking extra steps the user wouldn't take. The moment the natural user path fails:
(1) **log the issue** in `ClaudeReport.md` with severity + the exact user action that failed and what was expected;
(2) **only then** apply the smallest workaround needed to keep the pass moving. The workaround **never replaces**
the report — a flow that needs a workaround to proceed is, by definition, broken and must be filed to fix. If a
workaround is impossible, mark the game `fail→<id>` (blocked) and continue with the next.
- **A launch/crash check is NOT sufficient. Each game MUST be played one full way through, end-to-end, on BOTH
devices** — start → answer/interact through **every** step/round/question on each device → reach the
**finish/reveal/results** screen → confirm the result renders correctly for both partners. Verify each
intermediate screen and interaction works (selections register, progress advances, both-answered gating,
reveal/scoring/summary correct). Premium games (Desire Sync, Memory Lane) need a premium toggle to play.
- The session lifecycle is exercised by the real playthrough: `status` active→completed; reveal/results correct on both.
- **GAME JOIN PATHS (mandatory — the second partner must JOIN, not just co-play):** the starter begins from real
in-app nav; the joiner then enters from **every** user-facing entry point — notification tap, Play-hub active state,
Home active-game card, Today prompt, waiting-room/resume screen, in-app foreground banner, game history/replay, and
(after the natural paths) deep-link/crafted intent + cold-start from a push. A game isn't complete unless **both**
partners can **start, join, resume, finish, reopen results, and recover from a stale/ended session** — with no
duplicate sessions, wrong routes, stuck waiting screens, broken back nav, or premium-gate mistakes.
- **FIRST-FINISHER → WAITING-PARTNER NOTIFICATION (mandatory state — async games):** explicitly exercise the asymmetric
state where **one partner finishes their part and the OTHER is idle/away**. The waiting partner MUST get a "your turn
to play" nudge (`partner_completed_part` via `onGamePartFinished`) the moment the first finishes — async games
(this_or_that / wheel / how_well / desire_sync) only flip to `completed` (→ `partner_finished_game`) once BOTH answer,
so without the first-finish nudge the waiting partner is told nothing. Verify the **idle partner** (on Home, or
backgrounded/killed) actually receives + can tap into the game. (This state was missed for a long time precisely
because QA always played both sides through; "one finishes, the other never played" is its own required angle.)
- **VARY THE STYLE OF PLAY (don't just repeat the happy path):** across runs, deliberately exercise *different* ways a
real couple would play each game, because different inputs hit different code paths:
- **Different DEPTHS and QUESTION COUNTS — cover the matrix, don't settle for one combo:** play each game across
**every depth/mood** (Light, Everyday, Deep, All-topics/shuffle) AND **every round length / number of questions**
(5 / 10 / 15), in *different pairings* across runs (e.g. Light×5, Deep×15, Everyday×10, All×5) — short *and* long
sessions, shallow *and* deep content. Different depths surface different question sets, tones, and edge content
(e.g. Deep/Desire-Sync sensitive prompts); different counts stress pacing, progress, and the both-answered gate.
Also exercise **each distinct answer type** (A/B, Yes/No, True/False, 1–5 scale, multi-select, free-text).
- **Different answer *patterns* that change the result** — all-match vs all-mismatch vs partial; both-yes vs both-no
vs split (so reveals show "shared", "all private", "0 matches", "perfect/zero score" — verify each renders right).
- **Different turn orders / who-starts** — partner A starts vs partner B starts; the guesser opens before vs after
the subject finishes; both open simultaneously (race); one device much slower than the other.
- **Different exit/resume styles** — finish normally; quit mid-game; background mid-game then resume; cold-kill
mid-game then reopen; "End their game"; re-open a completed session for the replay/results; play two games
back-to-back, and a *different* game type immediately after.
- **⛔ VERIFY QUIT/ABANDON ACTUALLY ENDS THE SESSION (server-side, by admin read — RETROSPECTIVE from B-ABANDON-001).**
"Quit" / "End their game" navigating away is **not** proof the session ended — the abandon write is best-effort and
**swallowed** (`runCatching{…}.onFailure{ Log.d }`), so a `PERMISSION_DENIED` looks like success in the UI. After any
quit/abandon, confirm the session is actually `completed` via an admin read (**0 active sessions**) AND that a *new/
different* game can then be started; watch `logcat` for `PERMISSION_DENIED` on the `sessions/{id}` doc during the quit.
A session that "won't clear" between rounds is a **bug to root-cause, not a test-data nuisance** — B-ABANDON-001 (the
full-`saveSession` `doc.set()` dropping server-only flags → rule rejects the removed `affectedKeys`) hid for several
rounds precisely because it was dismissed as cleanup difficulty. See the manual's [B-ABANDON-001 landmine](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes).
- **Edge inputs** — submit with nothing selected (should be blocked), rapid double-taps on answer/confirm/next,
spamming the start button, tapping during the reveal animation, switching tabs mid-game, receiving/tapping a
notification mid-game. None should crash, duplicate, or desync.
- Edges: re-open a completed session, leave mid-game (resume), no stuck session, no crash, logcat clean.
- Game start/finish pushes (`onGameSessionUpdate`) exercised here; full delivery/deep-link audit in **Pass E**.
- **Media permissions** (CAMERA, RECORD_AUDIO): granted works, denied degrades gracefully.
- **Done = every game has one verified complete playthrough** (a launch-only "opens, no crash" row is `partial`, not
`pass`). Coverage row format: `game × starter × join-entry × premium-state × depth/count × lifecycle-edge × result`;
only `pass` when start/join/play/finish/reopen/recover are all verified.

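
The depth and question-count pairings named in this pass are easy to lose track of across runs. A minimal sketch that prints them as a checklist, using the four depths and three counts listed above; `depth_count_matrix` is a hypothetical helper, the line shape is illustrative rather than the real `ClaudeQACoverage.md` row format, and it assumes every game offers every pairing, which may not hold:

```shell
#!/usr/bin/env bash
# Sketch only: prints one "todo" checklist line per depth x question-count pairing
# for a game, so no pairing is silently skipped across runs.
# usage: depth_count_matrix "Game name"
depth_count_matrix() {
  local game="$1" depth count
  for depth in Light Everyday Deep All-topics; do
    for count in 5 10 15; do
      printf '%s | %s x %s | todo\n' "$game" "$depth" "$count"
    done
  done
}
```

Calling `depth_count_matrix "This or That"` yields twelve lines to tick off as each pairing is actually played; drop the lines for pairings a given game does not offer.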
### ⛔ Pass C — Visual pass, light + dark, ALL screens (MANDATORY: run scan BEFORE sweep)
> **⛔ CLAUDE: Run the automated theme scan (below, Automated Tier 1) before starting the visual sweep.
> Read the output at `/tmp/claude-theme-scan-<date>.md` and file findings to ClaudeReport.md first.
> The sweep must verify every flagged screen in BOTH themes.**

Every route in `core/navigation/AppRoute.kt` (~50), in **both** modes: text contrast/readability (no invisible/
low-contrast), no clipping/overflow/ellipsis breakage, icons visible, backgrounds adapt, controls legible. Groups:
auth/onboarding/pairing (fresh acct); Home (solo + paired); Play + every game; Today + reveal/history; Messages
(inbox + conversation); Packs; Dates (Match/Builder/Matches/Bucket List); Wheel (picker/session/complete/history);
Settings + all sub-pages (Account, Notifications, Appearance, Privacy, Subscription, Relationship, Security, Delete
Account); Paywall; Your Progress/Activity; Recovery.
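
Because the scan has to be read and its counts recorded before the sweep, it helps to pull just the tail of the report. A minimal sketch, relying on this pass's statement that the theme-scan report ends with a `## Summary` section; `summary_section` is a hypothetical helper and makes no assumption about what the lines inside that section look like:

```shell
#!/usr/bin/env bash
# Sketch only: print the "## Summary" section of a scan report, from its heading up to
# the next "## " heading or the end of the file.
# usage: summary_section REPORT_FILE
summary_section() {
  awk '/^## Summary/ {on = 1; print; next} on && /^## / {exit} on {print}' "$1"
}
```

For example, `summary_section "/tmp/claude-theme-scan-$(date +%Y%m%d).md"` prints the counts to copy into `ClaudeQACoverage.md` under Pass C. Empty output means the report has no `## Summary` heading, which is itself worth checking before trusting the run.
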
- **Images must belong to the screen:** during the UI sweep, visually inspect every illustration, glyph, banner,
empty-state image, pack art, celebration asset, and dark/light variant in context. It should feel intentionally
integrated with the page hierarchy, copy, spacing, and action area — not like a forgotten placeholder dropped into
an empty slot. Check crop, scale, padding, alignment, corner radius, background/tile treatment, theme variant,
**edge treatment**, loading/fallback state, and whether the image competes with or clarifies the primary task. If it is
broken, clipped, low-contrast, off-brand, stale, or placeholder-looking, file a bug in `ClaudeReport.md`; if the screen
works but would benefit from new/better art, log the prompt need in `ClaudeBrandingReview.md`.
- **SOFT EDGES — art must fade into the screen, not show a hard tile edge (mandatory):** every displayed illustration
should **blend/feather softly into the background**, not sit as a hard-edged rounded rectangle/card with a visible
boundary or border line. Inspect each illustration's edges against the screen on **both themes** — a crisp tile edge,
outline/border, or a pale block floating on the surface is a finding (C-ART-EDGE-001). (**Fixed R11:** `BrandIllustration`
now feathers its 4 edges to transparent via `Modifier.graphicsLayer{compositingStrategy=Offscreen}` + `drawWithContent`
`BlendMode.DstIn` linear gradients — `clip`+`border` removed — and `EmptyState` routes its illustration through
`BrandIllustration`, so all tiled art melts into the surface. Recurring check: verify it still holds and that any NEW art
helper / direct `painterResource` tile also feathers.) Fix pattern (if it regresses): feather the edges to transparent,
or a vignette matching the surface, or ship transparent-edged art — applied in the shared `BrandIllustration`/`EmptyState`
helpers so it's consistent everywhere.
- **⛔ CLAUDE — RUN THE AUTOMATED THEME SCAN FIRST (MANDATORY, BEFORE THE VISUAL SWEEP):**
Do NOT start the manual visual sweep until the automated scan has completed and you have reviewed its results.
The scanner is `scripts/theme-scan.sh`. Run it from the project root and save the report:

```bash
cd /home/kaspa/.openclaw/Projects/relationship-app
# Compute the report path once so both commands use the same file.
report="/tmp/claude-theme-scan-$(date +%Y%m%d).md"
./scripts/theme-scan.sh > "$report"
cat "$report"
```

The script reports findings by severity and ends with a `## Summary` section showing the exact counts.
Record those counts in `ClaudeQACoverage.md` under Pass C **before** starting the visual sweep.

- **🔴 CRITICAL** — container/surface/background set to a hardcoded color. Will produce visible light/dark
mismatches. Example: `Surface(color = Color.White)` inside a dialog in dark mode.
- **🟠 MAJOR** — component color overrides or direct `painterResource` that bypasses `BrandIllustration`.
Likely to break theme adaptation or decoupled-theme art.
- **🟡 REVIEW** — hardcoded text/icon/border/gradient colors that may be correct on a branded container but
must be verified in both themes.

**⛔ CLAUDE: You are explicitly allowed to improve `scripts/theme-scan.sh` and this Pass C methodology
whenever you discover a new light/dark failure mode.** Examples: new Compose patterns that evade the
current grep, a new color token that should be checked, better false-positive filtering, or converting
the output to JSON/CSV. Keep the script runnable from the project root and update the script header
with what changed. Do not remove existing patterns unless they are provably wrong.

**After running the scan:** read the report, file all CRITICAL and MAJOR findings to `ClaudeReport.md` as
Pass C theme defects, then proceed to the manual visual sweep. Any screen flagged CRITICAL/MAJOR must be
verified in BOTH themes during the sweep. If you fix hardcoded colors during the QA round, re-run the
scan to confirm they are gone.

**Tier 2 — Theme definition validation:** `scripts/theme-scan.sh` also validates that `darkColors` in
`Theme.kt` has every required Material3 slot explicitly defined. If a slot is missing, log it to
`ClaudeReport.md` as a P2 theme defect.

**Tier 3 — Compose screenshot diff suite (endgame, not yet implemented):**
The true "catch everything" solution is an automated screenshot comparison pipeline that renders every
route in light mode, renders the same route in dark mode, and pixel-diffs them — flagging any screen
where the dark version has white backgrounds, invisible text, or wrong-variant art. This catches
compositional and gradient-based mismatches that static analysis cannot. When implemented, use
`Paparazzi`, `Shot`, or Roborazzi with a custom `darkTheme = true` test parameter for each route.
Log this to `Future.md` as "Tier 3: Compose screenshot diff for visual regression".
- **THEME-VARIANT ART must follow the IN-APP theme, not just the system (mandatory — RUN THE DECOUPLED STATE):** the app
has its own theme toggle (Settings → Appearance → Light/Dark/Device) that swaps Compose colors but does **not** change
the Android config `uiMode`, while `-night` drawables (`drawable-night-nodpi/`) and `painterResource` resolve off the
**system** `uiMode`. So art can mismatch the UI when the two disagree. **Test the decoupled state explicitly, every
round:** force system light then set the app to **Dark**, and force system dark then set the app to **Light**, and on
every screen that has a dark art variant confirm the illustration matches the **in-app** theme (no bright/light tile on
a dark screen, no dark tile on a light screen). Commands:
`adb -s <serial> shell cmd uimode night no` (system light) / `… night yes` (system dark); then toggle the in-app theme
in Appearance. Screens with `-night` variants to check: Security (privacy_recovery), Memory Lane, Bucket List, Answer
History, Date Match (empty + success), Connection Challenges header, Pairing success, Messages empty, Past Games,
Quiet-hours, Account-deletion, + any new `illustration_*` added to `drawable-night-nodpi/`. **Restore `cmd uimode night
auto` after.** Light art on a dark screen (or vice-versa) when the in-app theme is switched = bug (P2 theme-not-adapting;
see C-DARKART-001). (**Fixed R11:** `CloserTheme` provides `LocalAppInDarkTheme`; `BrandIllustration` loads each drawable
through `context.createConfigurationContext(cfg)` whose `UI_MODE_NIGHT_*` is set from `LocalAppInDarkTheme`, so the
`-night` variant follows the IN-APP theme, not the system. Verified live R11 both decoupled directions. Recurring check:
re-run the decoupled state and confirm it still holds, including any newly added `-night` art.) Fix pattern (if it
regresses): drive the resource `uiMode` from the in-app theme as above, or `AppCompatDelegate.setDefaultNightMode`/config
override, so `painterResource` picks `-night` per the app's own setting.
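
The decoupled-state check above can be driven from a small helper. This is a minimal sketch, not a script that exists in the repo: the function name and the `ADB` override variable are assumptions made for illustration, and the in-app theme still has to be switched by hand in Settings → Appearance.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name and the ADB override are illustrative, not part of the repo.
ADB="${ADB:-adb}"   # override (for example ADB=echo) to dry-run without a device

# Force the SYSTEM night mode on one device.
# mode: "no" (system light), "yes" (system dark), or "auto" (restore).
set_system_night() {
  local serial="$1" mode="$2"
  case "$mode" in
    no|yes|auto)
      "$ADB" -s "$serial" shell cmd uimode night "$mode"
      ;;
    *)
      echo "usage: set_system_night <serial> no|yes|auto" >&2
      return 2
      ;;
  esac
}
```

For example, `set_system_night <serial> no` followed by switching the app to Dark covers the first direction, `set_system_night <serial> yes` with the app on Light covers the second, and `set_system_night <serial> auto` restores the device afterwards.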
- **EVERY image needs BOTH a light AND a dark variant matching the theme (mandatory — audit every image-bearing page).**
It is not enough that the `-night` mechanism works — the dark asset must **exist**. Go page by page through every screen
that shows an illustration / hero / banner / empty-state / pack art and confirm there is a real **dark variant** in
`drawable-night-nodpi/` (and a light one in `drawable-nodpi/`) that **matches the in-app theme** — a light/pink image
shown on a dark screen (because only the light asset exists) is a **bug**, even with feathered edges. Cross-check each
page against the **Image theme-variant coverage** table in `ClaudeBrandingReview.md`; a missing variant is filed as a
bug in `ClaudeReport.md` **and** the dark/light asset to create is logged in `ClaudeBrandingReview.md` as a prompt to
be made. (2026-06-27 audit found all `illustration_couple_*` heroes, `daily_question`, `partner_activation`,
`tonight_partner_prompt`, `together_empty`, and all 10 `pack_art_*` are **light-only** — these still need dark variants.)
Only genuinely theme-agnostic transparent/celebration art is exempt, and only after you **verify** it reads on both.
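
The page-by-page audit above can be seeded with a file-level comparison of the two drawable folders. A minimal sketch, assuming the resource root is `app/src/main/res` (an assumption, since only the folder names are given here) and that a dark variant shares the exact file name of its light counterpart; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the default res root is an assumption about the module layout.
# Prints every file in drawable-nodpi/ that has no same-named file in
# drawable-night-nodpi/, i.e. art that currently exists as light-only.
missing_night_variants() {
  local res_root="${1:-app/src/main/res}"
  local light="$res_root/drawable-nodpi"
  local night="$res_root/drawable-night-nodpi"
  local f name
  for f in "$light"/*; do
    [ -e "$f" ] || continue          # empty folder: the glob stays literal, skip it
    name="$(basename "$f")"
    [ -e "$night/$name" ] || echo "$name"
  done
}
```

The output is only a candidate list: exempt theme-agnostic art still has to be verified by eye on both themes, and a variant stored under a different file extension would show up as missing.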
- **EVERY icon/glyph must be a CUSTOM Closer glyph — no generic Material icons, no generic hearts (mandatory).** On every
screen, inspect each icon: any generic Material icon (`Icons.Filled.*`/`Icons.AutoMirrored.*`/`Icons.Default.*` —
ArrowBack, Favorite/FavoriteBorder, Person, Lock, Star, PlayArrow, Check, Close, Send, …) is a **placeholder, not
brand** → a finding. File it as a brand defect in `ClaudeReport.md` and log the **custom `glyph_*` to make** in
`ClaudeBrandingReview.md` (see its **Icon/glyph audit**). Reflex grep to find them:
`grep -rE "Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)\." app/src/main/java/app/closer/` — every hit is a
generic icon that needs a bespoke Closer glyph (`ImageVector.vectorResource(R.drawable.glyph_*)` + `Icon(tint=…)`).
(2026-06-27 audit: ~60 distinct Material icons across ~201 call sites still to replace.)
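
To track the replacement backlog between rounds, the reflex grep above can be wrapped so it reports distinct icons and call sites separately. A rough sketch: the function name and the exact match pattern are illustrative assumptions, and the counts depend on how icons are referenced in source.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the reflex grep; the default source root is the path used by that grep.
material_icon_inventory() {
  local src="${1:-app/src/main/java/app/closer}"
  # Matches e.g. Icons.Filled.Star and Icons.AutoMirrored.Filled.ArrowBack.
  local pattern='Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)\.[A-Za-z0-9_.]+'
  local sites distinct
  # -o prints each match on its own line; -h drops file names so names can be de-duplicated.
  sites="$(grep -rEoh "$pattern" "$src" | wc -l | tr -d ' ')"
  distinct="$(grep -rEoh "$pattern" "$src" | sort -u | wc -l | tr -d ' ')"
  echo "distinct=$distinct call_sites=$sites"
}
```

Recording the two numbers per round in `ClaudeBrandingReview.md` makes it visible whether the backlog is shrinking.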
- **States, not just happy path:** empty / loading / error / not-paired / locked-premium / signed-out /
stale-or-deleted-target / populated-with-many where they exist; many need data setup (seeding is user-gated) — note
unreachable states in coverage rather than skipping silently.
- **Text/data stress:** test long names, long relationship labels, long question/answer text, emoji, multiline content,
empty optional fields, many list items, and both partners having similar names. Verify no clipping, overlap,
confusing attribution, broken sorting, or hidden actions.
- **Readability at scale:** default font size + spot-check largest system font scale on text-heavy screens. (The full
accessibility sweep — large-font on every primary flow, TalkBack labels, touch targets, keyboard, reduce-motion — is
**Pass J**; per-route performance/jank is **Pass I**.)
- **Orientation / form-factor (the app is NOT portrait-locked — `AndroidManifest.xml` declares no `screenOrientation`, so
it DOES rotate to landscape).** Don't only check "rotation doesn't lose state" (that's Pass F) — verify the **landscape
layout actually renders correctly** on the text-heavy / game / paywall / dialog screens (`adb shell settings put system
accelerometer_rotation 1` then rotate, or use the emulator rotate control): no clipped/cut-off content, no broken
scrolling, dialogs and bottom CTAs still reachable. Spot-check a **large-screen / tablet** AVD too. If landscape is not
a supported experience, the correct fix is to **lock portrait in the manifest** — file that as the finding (see the
app-finding note) rather than shipping an unverified landscape layout.
- **Navigation from every entry point:** reach each screen from **all** the places that link to it and confirm it
opens correctly each time — e.g. a conversation from the inbox AND from "Discuss" AND from a notification; a game
from the Play hub AND from a notification; Paywall from each gated feature; Settings sub-pages; reveal from Today
AND from history AND from `partner_answered`. A screen that works from one entry but breaks/duplicates from another = bug.
- **Every link, CTA, and mission must prove its destination:** actively hunt for dead buttons, wrong targets, generic
Home fallbacks, no-op taps, stale routes, and confusing affordances. Example class: a Reveal card saying
**"Tiny Mission: Send one flirty text"** must open the relevant Messages/conversation flow, not do nothing. For every
button/card/chip/row, record the expected destination before tapping, then verify the actual destination, state,
payload, and back stack. Broken/no-op/wrong-destination CTA = bug (usually P2; P1 if it blocks a core flow).
- **All routes into a game / join-game state (verify each opens the correct game + session + partner-state + mode +
premium/couple-entitlement + back stack):** Play-hub cards (incl. premium-gated), active-session banners, Home/Today
game prompts, game history, replay/results, waiting screens, notification-opened screens, in-app banners,
"join/resume/continue/view results/end (their) game", deep-link/crafted intent, and bottom-tab return into an active
game. Wrong/duplicate destination, double-back, stale-session join, dead-end, or a route that bypasses the
premium/couple check = bug.
- **TAKE EVERY AVENUE (exhaustive nav fuzzing — actively hunt for nav bugs, don't just walk the happy path):** treat
navigation as something to *break*. On every screen, **tap every interactive element** — each button, card, row,
icon, chip, link, tab, header back-arrow, system back, and any "see all / history / edit / manage" affordance — and
follow where it goes. Then try the *combinations and sequences* a curious user hits:
- **Every order:** switch bottom tabs in many orders, mid-flow (open a game, jump to Messages, come back); enter a
deep screen then tab away then back; open A→B→C then back-back-back.
- **Rapid / repeated input:** double- and triple-tap navigation targets (especially "open game", "Play now",
"Create/Start session", notification taps) to surface double-push/duplicate-screen/stale-route bugs (cf. B-004).
- **Interrupt mid-navigation:** background/rotate/lock during a transition; tap a notification while already on that
screen, on a different screen, and while logged-out/unpaired; cold-start straight onto a deep link.
- **Dead-ends & traps:** from *every* screen confirm there's always a way out (back/close/home) — no screen that
strands the user, needs two backs, exits the app unexpectedly, loops, or lands blank. Re-check the asymmetric-game
waiting screens, replay/results screens, and paywall specifically.
- Log **every** wrong/duplicate/dead destination with the exact tap sequence to reproduce. Wrong/double-back or
dead-end = **P2** (P1 if it traps the user or loses their progress).
- **Back-stack / "double back":** from every entry point, **system back AND the in-app back arrow** return to the
correct previous screen — no dead-ends, no exiting the app unexpectedly, and **no screen that requires pressing
back twice** (duplicate/stacked destinations on the back stack = bug). Bottom-tab reselection and deep-link/
notification entries must land with a sane back stack (back → Home, not off the app or a blank screen). Wrong/
double back or a dead-end = **P2** (P1 if it traps the user).
- **UI consistency / polish defects:** compare each screen against sibling patterns in the same area and across the
app. Headers, labels, status chips, partner names, connected-state copy, spacing, card treatments, and button
hierarchy should feel intentional and consistent. Awkward or out-of-place UI such as a Settings relationship row
where **"Connected with ..."** looks visually odd, cramped, misaligned, or unlike the rest of Settings is a finding:
file as a bug if it looks broken/inconsistent; log to `Future.md` only if it is purely a product/content improvement.
### Pass D — Security & encryption (cornerstone; findings default to P0)
> Read first: manual's [E2EE model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model) ·
> [Firestore rules](docs/Engineering_Reference_Manual.md#firestore-security-rules) ·
> [Encryption versions](docs/Engineering_Reference_Manual.md#encryption-versions). The cornerstone: every private field
> is ciphertext at rest, rules hold against non-members, keys/recovery are sound. **D3 (live negative raw-API) is
> MANDATORY every round** — never deferred to "only 2 emulators" (mint a non-member token via admin → Identity Toolkit
> `signInWithCustomToken` → Firestore REST). Run all of D1–D7:
- **D1 At-rest coverage:** admin-read RAW docs/objects, assert ciphertext for every private type — chat text +
`lastMessagePreview` (`enc:v1:`), chat media bytes (Tink `01 69 59 51 f0…`), answers (`sealed:v1:`/`enc:v1:`),
date plans + `date_swipes`, Memory Lane capsules, Bucket List. Also: **wrappedCoupleKey** + recovery material never
plaintext; **invite code (KDF seed) never stored raw**; **no push payload carries private content**.
- **D2 Rules audit (static):** member-only reads, author/server-only writes, ciphertext enforced on every private
field, immutability, **no premium self-grant**, entitlements write:false; re-audit conversations/typing/reactions +
entitlement partner-read; **no catch-all** `match /{document=**}`; list/query not enumerable; `get()`-rules don't
over-expose; **no legacy plaintext/downgrade path** (`coupleEncryptionEnabled` holds; no disabled-encryption branch).
- **D3 Negative access tests (EXECUTE LIVE via raw API — do not defer):** a **non-member** account is *denied* reading
messages/answers/dates/entitlements/sessions/capsules, writing plaintext to encrypted fields, self-granting premium,
and any cross-couple access. Run it the **raw-API angle**: mint a non-member ID token (admin custom token →
Identity Toolkit `signInWithCustomToken` REST) and issue Firestore REST GET/PATCH against the couple's docs — expect
App Check `403` or rules `PERMISSION_DENIED` on every attempt. Also issue the **same** reads with a **member** token to
characterize the enforcement layer (App Check vs rules). Any unauthorized `200` with couple data = **P0**.
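
The raw-API angle of D3 can be sketched as below. This is an illustration under stated assumptions, not the project's harness: the function names, the `allowed`/`denied` verdict strings, and any document path you pass in are hypothetical, and the custom token is expected to come from your own admin tooling. Only the two REST endpoints (Identity Toolkit `accounts:signInWithCustomToken` and the Firestore `documents` resource) are standard.

```bash
#!/usr/bin/env bash
# Sketch only: names and verdict strings are illustrative; nothing runs until a function is called.

# Exchange an admin-minted custom token for an ID token (prints the JSON response,
# which carries the ID token in its "idToken" field).
exchange_custom_token() {
  local api_key="$1" custom_token="$2"
  curl -s -X POST \
    "https://identitytoolkit.googleapis.com/v1/accounts:signInWithCustomToken?key=${api_key}" \
    -H 'Content-Type: application/json' \
    -d "{\"token\":\"${custom_token}\",\"returnSecureToken\":true}"
}

# GET one Firestore document as the given user and print only the HTTP status code.
firestore_get_status() {
  local project_id="$1" doc_path="$2" id_token="$3"
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${id_token}" \
    "https://firestore.googleapis.com/v1/projects/${project_id}/databases/(default)/documents/${doc_path}"
}

# Map the status a NON-member received to a verdict for the coverage matrix.
classify_nonmember_status() {
  case "$1" in
    401|403) echo "denied" ;;            # App Check rejection or rules PERMISSION_DENIED
    200)     echo "allowed" ;;           # P0 candidate: inspect the body for couple data
    *)       echo "inconclusive-$1" ;;   # e.g. 404 or 5xx: re-run before concluding
  esac
}
```

A run might exchange the token, pull `idToken` out of the JSON response, then call `classify_nonmember_status "$(firestore_get_status <project> <doc-path> "$ID_TOKEN")"` for each private document. An `allowed` verdict is only a P0 once the response body is confirmed to contain couple data, so save the body for any such hit.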
- **D4 Key exchange / management / recovery (E2EE crux):** couple key client-generated, only leaves device **wrapped**
(KDF from invite seed; server holds only `wrappedCoupleKey`+`kdfSalt`/`kdfParams`+`encryptedRecoveryPhrase`); **KDF
strength**; Tink AEAD = AES-GCM/256 with **AAD=coupleId**, no weak/custom crypto/nonce reuse; keybox/sealed/commitment
integrity; **recovery-wrap server-blind**; **unpair revokes decrypt**; invites CSPRNG + single-use + expiry.
- **NEW-DEVICE / LOST-PHONE RECOVERY — drive it end-to-end, don't just verify the phrase is revealed (the make-or-break
data-continuity path for an E2E app).** The keys are single-device (a known limitation); the recovery phrase is the
only bridge. Infra: `crypto/RecoveryKeyManager.kt`, `data/local/RecoveryPhraseStore.kt`, `ui/pairing/RecoveryViewModel.kt`,
`crypto/CoupleKeyStore.kt`. Exercise the **full flow on a fresh install / second device**: sign in → enter the
recovery phrase → the couple key is rebuilt → **prior `enc:v1:`/`sealed:v1:` messages and answers actually DECRYPT
and render** (not just new ones). Then the failure paths: a **wrong/typo'd phrase** fails gracefully (clear error, no
crash, no corruption); a user who **lost the phrase** is told honestly what is/isn't recoverable; and throughout, the
**partner's** device keeps working (one side recovering must never break the other). Confirm the server stayed blind
(only `wrappedCoupleKey`/`encryptedRecoveryPhrase` ever transit — verify via admin read). Without this, "I got a new
phone" silently loses the relationship history. (Also exercised from the account-lifecycle angle in Pass F and the
Settings → Security flow in Pass M.)
- **D5 App Check / Functions / secrets:** App Check enforced; callables validate auth+membership; webhook authenticity;
admin-only writes rejected from clients; service-account JSONs never committed; no plaintext/secrets in logcat; temp
files deleted.
- **D6 Leak vectors:** no private content in analytics/crash; `allowBackup=false` + backup rules exclude sensitive data;
deep links re-check membership; clipboard user-initiated; consider `FLAG_SECURE`; repo scan for committed secrets.
- **D7 Encryption migration:** test the `encryptionVersion` paths (0 plaintext → 1 migrating → 2 strict) on a legacy
couple — migration completes without exposing plaintext or losing/garbling old content, and a half-migrated couple
is safe (no mixed read failures, no downgrade). This is the riskiest data path for existing users.

### Pass G — Account creation, validation & fake-account abuse (MANDATORY — both the happy path AND the attacks)
Cover **every account-creation avenue a real user takes** and **every fake/abusive creation attempt an attacker would
try.** Use throwaway test accounts (sign-out → fresh sign-up; never `pm clear`). Report-first like every pass.
- **Real creation flows (happy path + validation):** sign-up (email/password and any social/anonymous path), profile
creation, and pairing — both **create-invite** and **accept-invite** sides. Verify field validation (invalid/empty
email, weak/short password, mismatched confirm, name length/emoji/unicode), the **error copy is friendly** (no raw
SDK/Firebase error leaking — cf. A-OBS), loading/disabled states, and that a brand-new unpaired account lands on the
correct "create or accept invite" home (not a broken/blank or paired view).
- **Duplicate / conflicting creation:** sign up with an **already-registered email** (clear "already in use", no crash,
offer sign-in); create a second account while one is signed in; re-run onboarding after completing it; accept an
invite while **already paired** (must be rejected cleanly); two devices accepting the **same invite** (single-use —
the second must fail gracefully).
- **Fake / malicious creation attempts (security — expect DENY, never crash or leak):** create an account that is
**NOT a member** of the test couple and attempt every cross-couple action (read messages/answers/dates/entitlements,
write to the couple, self-grant `premium`/`hasPremium`, join/hijack pairing with a guessed/expired/reused invite
code) — all must be **denied by rules** (this is the live execution of **D3**). Probe **invite-code abuse**: replay a
used code, use an expired code, brute-force/guess attempts (CSPRNG entropy + single-use + expiry must hold). Probe
**App Check**: a request without a valid token is rejected. Confirm a malformed/forged sign-up can't bypass profile
or membership requirements. **Any successful unauthorized create/read/write = P0.**
- **Account lifecycle around creation:** sign-out → sign-in (state restores, no stale couple); **delete account** then
re-create with the same email (clean slate, partner notified/unpaired); an unpaired/just-created account tapping a
stale notification or deep link is handled gracefully (no crash, sane landing).
- **Done = every creation avenue exercised** (happy + duplicate + malicious) with each attack **denied** and each happy
path validated end-to-end; findings filed with exact repro.

### Pass E — Full notification suite, deep-links & join-game navigation (every type, both clients, every app state)
Run the **complete** suite across **both clients** (QA→Sam AND Sam→QA). Each type verified end-to-end: **trigger fires
→ delivered to the right partner (never self/non-member/ex-partner) → correct channel + copy with no private content →
tap opens exactly the right item (loaded, not generic Home/dead-end) → sane back stack → privacy/authz re-checked on
open**. No duplicates; rate limiter (20/day, 100/week) doesn't drop legit ones.
- **Notification chunk contract (small chunks, complete coverage):** each chunk owns **one notification type** (or one
explicit subchunk of that type, e.g. `chat_message QA→Sam foreground/source-screen sweep`, then
`chat_message Sam→QA background+killed+stale`). Before starting, write the chunk's matrix in `ClaudeQACoverage.md`;
after finishing, mark each cell `pass | fail→id | blocked→id | not implemented→Future.md`. A notification type is
not complete until all applicable cells below are covered:
- **Directions:** QA→Sam and Sam→QA; sender must not receive their own push unless intentionally designed.
- **Process states:** foreground, background/warm, killed/cold-start, force-stopped if deliverable, screen locked,
and resumed after rotation/process recreation when relevant.
- **Current screens:** Home, Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation,
Settings/sub-settings, Paywall, unrelated deep screen, logged-out, unpaired, and stale prior-partner context.
- **Entry surfaces:** foreground in-app banner/head, Android system tray tap, any push action button, crafted
deep-link/intent matching the payload, repeated/double tap, and tap after the target has changed.
- **Targets:** fresh target, already-open target, completed target, stale/expired/deleted target, unauthorized target,
wrong couple/session/item ID, malformed/missing extras, and no-network-on-open.
- **Assertions:** correct recipient, correct channel/priority/copy, no private payload/log content, exact destination,
membership/auth/entitlement re-check, no duplicate route/session, sane back stack, logcat clean, and coverage/docs
updated before the chunk ends.
- **Notification tap crash triage (mandatory):** never conclude "the notification didn't open" from UI behavior alone.
Before each notification/deep-link tap, clear or timestamp logcat; after the tap, inspect both devices for
`FATAL EXCEPTION`, ANR, ActivityTaskManager errors, `RuntimeException`, navigation/deep-link exceptions,
`PERMISSION_DENIED`, and swallowed repository/decryption errors. If the app returns Home, stays put, flashes,
restarts, or silently fails, classify whether it was wrong routing, missing extras, stale data, permission denial, or
a crash. Any notification tap that crashes (example class: tapping a game notification to open **Spin the Wheel**)
is a filed bug with stack trace + exact payload/session/game type, not a vague "didn't open" note.
- **Test the REAL launch path, not a synthetic one.** `adb shell am start … --es type=…` does **not** reproduce a real
notification tap: the OS notification tap launches the activity through the **SysUILaunch splash handover**
(`reportSplashscreenViewShown` → `handOverSplashScreenView`), which `am start` skips. A whole bug class
(e.g. the **splash-exit `provider.iconView` NPE** — the handover delivers a splash view with **no icon**,
`SplashScreenView: Icon: view: null`, on notification cold-starts only) crashes onCreate → "Force finishing
activity" → the app **opens-and-closes**, yet `am start` AND the normal launcher icon both pass. Verdict: for
cold-start/notification routing, a synthetic-intent pass is **not** a pass — confirm with a real push tapped from
the shade on an `am kill`'d app.
- **"Opens and closes / flashes / returns to launcher" ⇒ assume a crash; pull the stack FIRST.** `logcat -c`
before the tap, then grep `FATAL EXCEPTION|AndroidRuntime|Force finishing|getIconView`. A real repro + the stack
trace beats code-reasoning every time (this bug was misdiagnosed as deep-link routing until the live stack named
`MainActivity.kt` + `SplashScreenViewProvider.getIconView`). Confirm crashes reach **Crashlytics** so field cold-start
crashes surface.
- **Many notification types "broken" at once ⇒ suspect the SHARED entry path (splash/`onCreate`/launch), not each
handler.** When chat AND every game's results push all fail identically, the bug is in what they share (the
cold-start path), not per-type routing. Re-run a **cold-start smoke after ANY change to** `MainActivity` / splash /
theme / manifest / launchMode / branding-"loading state" commits — these cosmetic-looking changes broke the launch.
- **For "worked before, broken now": `git blame` / `git log -L` the crashing line/function** to pin the introducing
commit, then re-test that exact path on it.
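
The triage rule above (assume a crash, pull the stack first) can be made mechanical with a small filter over the logcat dump. A minimal sketch using the same grep signatures listed above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: function names are illustrative.

# Reads a logcat dump on stdin and prints only the lines carrying a crash signature.
crash_lines() {
  grep -E 'FATAL EXCEPTION|AndroidRuntime|Force finishing|getIconView'
}

# Reads a logcat dump on stdin and prints a one-word verdict for the tap under test.
tap_verdict() {
  if crash_lines >/dev/null; then
    echo "crash"
  else
    echo "no-crash-signature"
  fi
}
```

Typical use is `adb -s <serial> logcat -c` before the tap and `adb -s <serial> logcat -d | tap_verdict` after it, on both devices. A `no-crash-signature` verdict does not clear the tap: ANRs, `PERMISSION_DENIED`, and swallowed exceptions still need a manual look.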
- **Both-client × app-state matrix (per type):** QA→Sam and Sam→QA, each in **foreground / background / killed
(cold-start)**, plus **already on the target screen**, **on a different screen**, **logged out**, **unpaired**, with
a **stale/expired/completed/deleted target**, and **both users opening around the same time**. Not a `pass` unless it
works from both clients in every state that applies.
- **Current-screen/source-screen matrix (per type):** do not test notifications only from Home or only from a clean
launch. For each notification type, vary where the receiving client is when the notification arrives/taps: **Home,
Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation, Settings/sub-settings,
Paywall, an unrelated deep screen, app backgrounded from each major tab, and app fully closed/killed**. Foreground
banners, system-tray taps, warm-start `onNewIntent`, and cold-start launch must all route to the exact target. A tap
that lands on generic Home, stays on the old screen, opens the wrong tab, loses extras, duplicates the destination,
or needs a second tap is a bug.
- **Permission/token health:** cover Android `POST_NOTIFICATIONS` granted, denied, "don't ask again"/system-disabled,
and re-enabled states; Settings notification toggles; sign-out/sign-in token refresh; same account on two devices;
partner/account switch; stale token cleanup; app reinstall/update; and notification channel migration. Denied/system
disabled notifications should fail gracefully with in-app state still correct, never with lost data or broken routing
after permission is restored.
- **Doze / battery-optimization / background-restriction delivery (real-device gate — emulators NEVER enter these states,
so per-round emulator passes systematically miss the #1 real-world "notifications don't work" cause).** Scheduling is
entirely server-side (no client `WorkManager`), so the only thing standing between a fired push and the user is the OS
power state. On a **physical device**, verify each push type still delivers when the recipient device is dozing /
the app is battery-optimized or "Restricted": `adb shell dumpsys deviceidle force-idle` (then send a real partner
action + a scheduled push), app set to **Optimized** then **Restricted** in battery settings, and App Standby buckets.
Assert: high-priority FCM (partner actions) wakes the device and delivers; lower-priority/data-only pushes degrade
*predictably* (document which); scheduled pushes (daily question, capsule unlock, reminders) still arrive within the
expected window. Because our recurring setup is two emulators, keep this as a `blocked→needs-device` row in
`ClaudeQACoverage.md` (with the device-matrix gate) rather than silently assuming delivery — and run it before any
store push. OEM battery-killers (Xiaomi/Samsung/etc.) are even more aggressive; note them for the device matrix.
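
On a physical device, the power-state setup above can be scripted so each push type is sent under the same conditions. A hedged sketch: the helper names and the `ADB` override are assumptions, the package name is left as a parameter because it is not stated here, and `dumpsys deviceidle unforce` is included to leave the forced idle state again.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the ADB override are illustrative, not part of the repo.
ADB="${ADB:-adb}"   # override (for example ADB=echo) to dry-run without a device

# Force the device into Doze idle, as in the command quoted above.
enter_doze() {
  "$ADB" -s "$1" shell dumpsys deviceidle force-idle
}

# Leave the forced idle state once the delivery check is done.
exit_doze() {
  "$ADB" -s "$1" shell dumpsys deviceidle unforce
}

# Put one package into an App Standby bucket (bucket name such as "active" or "rare").
set_bucket() {
  local serial="$1" package="$2" bucket="$3"
  "$ADB" -s "$serial" shell am set-standby-bucket "$package" "$bucket"
}
```

A run would call `enter_doze <serial>`, trigger a real partner action plus a scheduled push, note which pushes arrive, and finish with `exit_doze <serial>`. The battery-settings states (Optimized, Restricted) are still set by hand in system settings.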
- **Six assertions per notification:** (1) trigger fires correctly — right event, not early, not twice, sender doesn't
get their own (unless intended), retry/idempotency doesn't duplicate; (2) delivered to the right person — correct
token, old tokens unused after sign-out/account-switch; (3) copy + channel correct — friendly, right channel/
priority, no raw Firebase error/raw IDs, no private content in text/payload/logs/analytics/crash; (4) tap opens the
exact destination — specific conversation/session/capsule/match/question/settings/pairing, never blank, never a crash
on missing/stale/malformed/unauthorized data, no duplicate/stacked copies, completed→results/replay, expired/deleted→
graceful fallback; (5) back stack sane — back returns sensibly (Home/prev context), no double-back, no unexpected
exit/loop/blank; (6) deep-link re-checks auth + couple membership + pairing + entitlement + target ownership +
session status + existence — a non-member/logged-out/stale/unpaired open must NOT reach private content and must fail
gracefully.
- **`qa/qa_push.js` is faithful to the PUSH, not the TRIGGER — assertion #1 needs ≥1 real in-app action per round.**
`qa_push.js` sends the FCM via admin (`messaging().send`), so it faithfully reproduces delivery + channel/copy +
cold-start launch + tap-routing (use it for the bulk of the type×state matrix and the `entrypoint_smoke.sh` smoke).
But it **bypasses the Cloud Function** — no `onMessageWritten`/`onGameSessionUpdate`/`onAnswerWritten`/`createDateMatch`
actually ran. So a `qa_push.js`-only round can **never** satisfy assertion #1 (**trigger fires correctly**): a broken
or un-deployed trigger (Firestore-path change, deploy regression, rules change) is **invisible** to synthetic pushes.
Each round, drive **≥1 real in-app partner action** (send a chat, finish a game, answer the daily Q) and confirm the
matching push lands on the partner. (UI-automation tip: the chat composer's send button is the rightmost control in
the composer row, content-desc `Send`; if `uiautomator` taps mis-fire, verify the action via admin read — the new
message/answer doc exists `enc:v1:` — rather than claiming the trigger from a synthetic push.)
- **Inventory (type → Cloud-Function trigger → recipient → destination)** — verify each; mark any unimplemented type
`not implemented→Future.md` (don't count as pass):
`chat_message`(onMessageWritten → partner → conversation; foreground→chat-head bubble) ·
`partner_started_game`/`partner_finished_game`(onGameSessionUpdate → partner → game/join · results/reveal) ·
`partner_completed_part`(**onGamePartFinished** → waiting partner → game; fired when the FIRST player finishes an
async game so the partner is told "your turn" — async games complete only when BOTH answer, so without this the
waiting partner got nothing between first-finish and both-finish) ·
`join_game`/`game_invite` & `partner_joined_game` (if present → partner/starter → join screen · waiting-room update) ·
`partner_answered`(onAnswerWritten → partner → reveal) ·
`game_abandoned`/`game_ended` (if present → partner → safe ended state, not a stuck session) ·
`daily_question`(assignDailyQuestion)/`daily_question_reminder`/`daily_reminder`(dailyQuestionReminder → Today) ·
`date_match`(createDateMatch → match) · `date_plan_update` (if present → date plan/builder/match) ·
`partner_joined`+`invite_created`(acceptInviteCallable → pairing/home) ·
`partner_left`(onCoupleLeave)/`partner_deleted_account`(onUserDelete → home/relationship settings) ·
`memory_capsule_unlocked`(scheduled → capsule) & `memory_capsule_created` (if present → Memory Lane/locked capsule) ·
`challenge_day_ready`(→ Connection Challenges) & `challenge_day_completed` (if present → challenge progress) ·
`outcome_reminder`(scheduledOutcomesReminder) · `reengagement`(reengagement/gameRetention) ·
`gentle_reminder`(sendGentleReminderCallable) · `spki`(key identity/confirm → security/key screen) ·
`subscription_entitlement_changed` & `security_recovery` (if present).
- **Game-notification suite (per game):** A starts from Play hub → B gets the start/join push (if supported) → B taps
and lands on the correct join/waiting/active screen → B can join from there → A sees B joined/answered → both finish
→ finish push opens the exact results/reveal → re-opening the push after completion opens replay/results (not a dead
active session) → if A ends/quits, B is notified or shown a graceful ended state → a **stale** game push routes to
results/history or a clear expired-session message → simultaneous start/join yields **one** session, neither stuck →
premium gate holds (neither-premium push must NOT bypass paywall; either-premium unlocks for both). For each game
type, including **Spin the Wheel**, notification taps must be paired with logcat review so crashes are caught even if
the visible symptom looks like a no-op or generic Home fallback.
- **Join-game navigation suite:** every entry that leads to joining/resuming a game opens the correct game + session +
partner-state + mode + entitlement + back stack — Play-hub card, active-game banner/card, Home active-game card,
Today game prompt, notification tap, in-app foreground banner, game history/replay, partner waiting screen, results/
reveal, "End their game"/stuck-session recovery, deep-link/crafted intent, cold-start from push, bottom-tab return
into an active game, any push action buttons, and any "join/resume/continue/view results/play again". No wrong game
type, no accidental stale-session join, no duplicate session on double-tap, back returns correctly.
- **Payload security (P0 on any hit):** inspect raw payload + logs — no plaintext message/answer/capsule/date-plan/
bucket-list/swipe content, no raw invite code/seed, no recovery phrase, no wrapped/decrypted key material, no
email/name unless intentionally public; payload carries only the minimum routing metadata. Any private content = P0.
- **Malformed / stale intents:** fire crafted deep-links with missing/unknown type, missing/wrong target or couple ID,
wrong game type, expired/completed/deleted target, unauthorized couple/session, malformed params, duplicate/rapid
taps, a push for another user/previous partner, while logged-out/unpaired, while on the target screen, and during a
different active game → never crash/leak, always a graceful fallback + sane back stack.
- **Scheduled/time-based:** trigger manually (invoke callable/function or seed the due condition — user-gated).
- **Foundations:** FCM token registration on sign-in (`TokenRegistrar`) + `onNewToken` + token cleanup on sign-out/
account-switch; POST_NOTIFICATIONS prompt + denied path; channels (`di/NotificationModule`); deep-link routing
(`MainActivity.deepLinkRouteFromIntent` → `AppNavigation`); foreground/background split
(`core/notifications/AppMessagingService`); no duplicate local+remote notification.
- **Coverage:** record per row `type × trigger × recipient × app-state × destination × back-stack × privacy ×
both-client` in ClaudeQACoverage.md; only `pass` when delivery + routing + back-stack + privacy + both-client are all
verified. Missed delivery or wrong deep-link = P1; private content in any payload = P0.
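
The payload-security bullet above can be partly automated. A minimal sketch, assuming the raw push `data` payload has been dumped as JSON text (from logcat or the sender script); the helper name `payload_extra_keys` and the `ALLOWED_KEYS` allow-list are hypothetical, not the project's real payload schema, so adjust them to the actual routing fields:

```shell
# Hypothetical allow-list of routing-only keys (an assumption for illustration).
ALLOWED_KEYS='type coupleId targetId sessionId gameType'

payload_extra_keys() {
  # Reads payload JSON on stdin and prints every key that is NOT on the allow-list.
  # Any output is a candidate privacy finding to inspect by hand.
  grep -oE '"[A-Za-z_][A-Za-z0-9_]*"[[:space:]]*:' \
    | sed -E 's/^"([^"]+)".*/\1/' \
    | sort -u \
    | while read -r key; do
        case " $ALLOWED_KEYS " in
          *" $key "*) ;;                 # routing metadata, fine
          *) printf '%s\n' "$key" ;;     # unexpected field, review it
        esac
      done
}
```

Usage: `payload_extra_keys < payload.json`; empty output means only allow-listed keys were present. An allow-list is used rather than a deny-list so that a newly added private field fails by default. It checks key names only, so the values still need a manual read.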

### Pass F — Resilience, concurrency, lifecycle & time (cross-cutting; a 2-user realtime app needs these)
- **Concurrency / realtime races (two partners at once):** both answer the daily question simultaneously; both
start/join the same game; both swipe a date / react at once; one quits while the other submits; both tap a
notification at once; partner acts while you're mid-flow. No lost writes, no stuck state, no duplicate sessions,
reveal still correct. (This is where a couples app breaks.)
- **Lifecycle / process death:** background mid-flow + return; force-kill the app and relaunch (Android may kill the
process) — state/auth/draft restore sanely; deep-link/notification after process death still loads (verified for
chat; extend to all). Rotation/config-change doesn't lose Compose state. Also exercise low-memory conditions.
- **Deterministic state-restoration ("Don't keep activities" — do NOT rely only on `am kill`).** `am kill` is
non-deterministic; enable **Developer options → Don't keep activities** (`adb shell settings put global
always_finish_activities 1`) so each Activity is destroyed on *every* backgrounding (the process may survive), then walk each primary
flow (sign-up, pairing, a game mid-answer, an unsent message draft, capsule/Date Builder in progress, paywall) and
background→return at each step. Assert **no lost** form input, scroll position, draft, in-progress game state, or
nav back-stack — i.e. `rememberSaveable`/`SavedStateHandle` actually persist it. Restore with
`adb shell settings put global always_finish_activities 0` after.
- **Interruptions mid-flow (the OS or another app steals focus):** incoming phone call, alarm, another app taking the
foreground, screen-off/on, **split-screen / multi-window**, and picture-in-picture during a game/answer/message-compose
→ returning resumes cleanly with no lost state, no crash, no duplicate submit, and audio/camera (voice note, photo)
releases + re-acquires sanely.
- **Cold-start launch integrity from EVERY entry point (Pass F OWNS this — it's the shared path no other pass owned, and
where the splash-crash hid):** the app must **open AND stay** (no crash, no "opens-and-closes", lands off the launcher)
when cold-started from: the **launcher icon**, **each notification type tapped from a killed (`am kill`) app**, a
**deep link**, and any widget/quick-action. This is the `MainActivity`/splash/`onCreate`/auth-bootstrap path; a crash
here (e.g. splash-exit `iconView` NPE) breaks **all** notifications at once. **Run `qa/entrypoint_smoke.sh` here every
round and after any MainActivity/splash/theme/manifest/nav/notification change.** Reproduce via the REAL push tapped
from the shade (not `am start`); "opens-and-closes" ⇒ pull the FATAL stack (see Pass E crash-triage).
- **Network resilience:** offline / flaky / airplane mid-action across answers, games, dates (not just chat media) —
graceful failure + retry/queue, no crash, no silent data loss, recovery on reconnect.
- **Idempotency / rapid input:** double-tap send/submit, rapid nav, double-start, double-join, repeated paywall-unlock
taps — guarded (no double-send, no duplicate session, no crash).
- **Time-dependent behavior:** daily-question rollover (6 PM CST assignment), streak day-boundary + repair window,
capsule unlock times, reminder schedules, challenge-day availability, timezone change — test across a date change
(manipulate device clock / trigger functions).
- **Account/couple lifecycle:** brand-new (empty) account; unpaired state; pair → unpair → re-pair; partner leaves
mid-session; account deletion cascade; same account on two devices; stale notifications after unpair/delete are
graceful; invite accepted while already paired is rejected cleanly. No orphaned/broken state.
- **Install/update/migration lifecycle:** fresh install, update over an existing signed-in install, app data retained,
Room/DataStore/SharedPreferences migrations, notification channel migration, cached encryption/key material,
pending deep links/notifications across update, and version-skew between partners if one device updates first. No
sign-out loops, stale build routing, lost local state, broken permissions, or migration crashes.
- **Crash reporting:** confirm crashes/ANRs are actually captured (Crashlytics) so field issues surface.
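
The "Don't keep activities" walk is easy to leave switched on by accident. A minimal sketch of a wrapper that enables the setting, runs one QA step, and restores it even if the step fails; `with_dont_keep_activities` and the `ADB` override are illustrative names, and only the two `settings put` commands come from the bullet above:

```shell
ADB="${ADB:-adb}"   # command used to reach the device; override to stub it out

with_dont_keep_activities() {
  # Usage: with_dont_keep_activities <command> [args...]
  # Turns the developer setting on, runs the command, then always turns it off.
  "$ADB" shell settings put global always_finish_activities 1
  step_rc=0
  "$@" || step_rc=$?
  "$ADB" shell settings put global always_finish_activities 0
  return "$step_rc"
}
```

Usage: `with_dont_keep_activities <your-flow-script>`; the wrapper returns the wrapped step's exit status, so a failing flow still fails the round while the device setting is put back to `0`.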

### Pass H — Branding & artwork (every screen: could it carry more of the brand? where would art help?)
**Branding review is a MANDATORY part of QA every round** (not an optional polish pass) — its findings + the assets to
create are logged in `ClaudeBrandingReview.md`. A consumer-mindset pass focused on **brand presence and delight** AND
two hard brand standards. Walk **every screen and surface** and ask: *does this feel like Closer (private, warm, equal,
intentional — a ritual for two)? Could brand color, the heart mark, a brand message, or an illustration make it warmer
or clearer without clutter?* Output is **artwork descriptions written as ready-to-paste ChatGPT image-generation
prompts** — the user generates the images; we only describe them.
- **MANDATE 1 — every image has a light AND a dark variant (theme-matched).** Cross-check every image-bearing page
against the **Image theme-variant coverage** table in `ClaudeBrandingReview.md`; a light-only image shown on dark (or
vice-versa) is a **bug → `ClaudeReport.md`** and the missing variant is a **prompt to make → `ClaudeBrandingReview.md`**.
(Shares the per-page audit with Pass C; H owns producing the prompts + tracking the coverage table.)
- **MANDATE 2 — every icon/glyph is a CUSTOM Closer glyph (no generic Material icons / generic hearts).** Audit all
icons in use (`grep -rE "Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)\."`); each generic icon is a brand
defect → `ClaudeReport.md` + a **custom `glyph_*` to make → `ClaudeBrandingReview.md`** (the **Icon/glyph audit**
table). The bar for ship: **zero generic Material icons** — every icon is bespoke and on-brand.
- **Existing art integration check:** judge the art as part of the whole page, not as a standalone asset. Confirm each
image supports the screen's job, aligns with the surrounding typography/actions, has enough breathing room, and uses
the right light/dark treatment. Art that looks generic, unfinished, randomly placed, or visually disconnected is a
finding even if the bitmap itself is technically valid.
- **Soft edges (art melts into the surface):** illustrations should **fade/feather into the screen background**, not read
as a hard-edged tile/card with a crisp boundary or outline. Confirm edge treatment on both themes; a hard tile edge is
a finding (C-ART-EDGE-001). Generated art should carry **transparent/feathered edges** (no baked-in rounded-rect block);
if rendered, the shared helper should fade the edges to the surface. Record the desired edge treatment in each prompt.
- **First, lock the house style (do this once per round, refresh if the art evolved):** read `docs/brand/visual-identity.md`
+ `docs/brand/asset-system.md` AND open 2–3 existing illustrations (`illustration_couple_onboarding`,
`illustration_reveal_celebration`, `pack_art_*`) to capture the *actual* look. New screens/features since the last
brand review must be folded in. Keep the canonical **house-style prompt prefix** + palette in the branding deliverable
(`ClaudeBrandingReview.md`) so every prompt reuses it and **all generated art matches the existing artwork.**
- **House style (must hold for every prompt):** flat 2D pastel vector illustration; soft rounded shapes, no harsh
outlines, gentle gradients; palette aubergine `#24122F` / deep purple `#56306F` / lavender `#B98AF4` / soft pink
`#F7C8E4` / soft lavender `#D9B8FF` / blush white `#FFF8FC`; motifs = two-equal-halves heart, paired/sealed cards,
floating hearts + petals, candle/mug/lavender-sprig warmth, moon/quiet-hours, calendar/date-card, capsule; mood =
warm, quiet, equal, intentional. Couple figures balanced + inclusive, faces simple. **Never** show readable answer/
prompt/message text, invite codes, emails, dating-app clichés, stock photos, alarm/urgency/surveillance imagery.
- **Per screen, decide the brand opportunity** (pick the lightest that fits — don't over-decorate):
- none needed (already on-brand, or a dense list/form where art would clutter) — say so;
- **color/typographic** brand touch (palette, heart mark, a rotating privacy message);
- **small glyph** (brand glyph for a relationship concept — describe it for the glyph set);
- **hero/empty-state/celebration illustration** (the high-value case → write the full ChatGPT prompt).
- **Each artwork item records:** screen/route · placement (hero / empty / header / card / celebration) · why it helps ·
filename to match the existing scheme (`illustration_*`, `pack_art_*`, `glyph_*`, `particle_*`) · **the ChatGPT
prompt** (house-style prefix + the specific scene) · aspect ratio/size + light/dark behavior. Cross-check the
brand doc's "Needed additions" / empty-state list and **mark which already have assets vs still need art** (e.g.
Android may still lack illustrations that iOS has).
- **Prioritize** the screens a user feels most: onboarding/pairing, Home, paywall/subscription, reveal/celebration,
empty states (no messages/dates/capsules/history), Memory Lane, Connection Challenges, date match, quiet-hours.
- Branding *defects* (mis-colored, clipped, off-brand, low-contrast art) are bugs → `ClaudeReport.md`. Pure
"works but could be warmer / a feature idea" → `Future.md` `## QA`. New art to create → `ClaudeBrandingReview.md`.
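
For MANDATE 2, a per-icon tally makes the glyph backlog easier to size than raw grep hits. A small sketch, assuming icons are referenced as `Icons.<Style>.<Name>` (or `Icons.AutoMirrored.<Style>.<Name>`) in the Kotlin sources; `generic_icon_report` is an illustrative name, not an existing scanner:

```shell
generic_icon_report() {
  # Usage: generic_icon_report SRC_DIR
  # Prints "count icon" for every generic Material icon reference, most used first.
  grep -rhoE 'Icons\.(Filled|Outlined|Rounded|Default|AutoMirrored)(\.(Filled|Outlined|Rounded|Default))?\.[A-Za-z0-9_]+' "$1" \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Usage: `generic_icon_report app/src/main` (the path is an assumption). Each distinct line is one custom `glyph_*` to describe in `ClaudeBrandingReview.md`; the ship bar is an empty report.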

### Pass I — Performance & route efficiency (jank, redundant reads, caching) [Future.md P14]
Before store polish, profile **every top route** and **every high-cardinality list** for jank, repeated Firestore
reads, missing cache use, and slow navigation. Drive each route as a user and instrument reads/frames.
- **Frame / jank:** scroll every long list (Messages inbox + conversation, Answer History, Question Packs, Past Games,
Wheel History, Bucket List, Date deck, Activity/Progress) and open every top route while watching
`adb shell dumpsys gfxinfo <pkg> framestats` (or Perfetto / Studio Profiler) — flag dropped/janky frames, slow first
frame, and `Choreographer: Skipped N frames` / main-thread stalls in logcat. Transitions/animations stay smooth (~60fps).
- **Redundant Firestore / network reads:** count listeners/gets per screen. Switching bottom tabs and returning must
**not** refetch unchanged data; opening a screen twice must not double-read; **snapshot listeners detach on leave**
(no leaked/stacked listeners — a 2-user realtime app accumulates these fast). Watch for N+1 reads on lists.
- **Memory leaks (beyond listener leaks):** add **LeakCanary** in the debug build (or take heap dumps) and navigate
in→out of every heavy screen (conversation with media, game, image viewer, Memory Lane) repeatedly — flag retained
Activities/Composables/bitmaps/Contexts. A leak that grows per navigation = bug (P2; **P1** if it OOMs).
- **StrictMode in debug (catch main-thread I/O + leaked closeables cheaply):** enable a `StrictMode` thread + VM policy
in the debug `Application` (`detectDiskReads/Writes/Network`, `detectLeakedClosableObjects`); any violation logged on a
primary flow is a finding (disk/network on the main thread → jank/ANR risk).
- **Caching / lazy-load:** static question/category data is cached locally (Room) and not re-fetched each entry; large
lists use lazy paging (`LazyColumn`/paging, not load-all); images cached (Coil); offline reads serve from cache.
- **Latency:** measure cold-start-to-interactive (splash→loader→Home) and tab/route transition latency; flag anything
perceptibly slow (>~300ms).
- **Deliverable:** a reusable **route smoke-test checklist** (every top route × {load time · jank · read count}),
captured as a runnable script so each round re-checks cheaply.
- **Remediation when found:** lazy-load/page large lists; cache local question/category data; dedupe + scope snapshot
listeners; skip redundant fetches on tab switches; add skeleton/loading states (cf. Future.md P8) over blocking spinners.
- Findings: real jank/leak/redundant-read = bug → `ClaudeReport.md` (P2; **P1** if it ANRs or leaks listeners, **P0** if
it drops data); "could be smoother / add skeletons" → `Future.md` `## QA`.
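
For the route smoke-test deliverable, the jank column can be pulled straight from the `gfxinfo` summary. A minimal sketch, assuming the `Total frames rendered:` and `Janky frames:` summary lines that `dumpsys gfxinfo` prints on recent Android releases (label text may differ by OS version); `jank_summary` is a hypothetical helper name:

```shell
jank_summary() {
  # Reads `dumpsys gfxinfo <pkg>` text on stdin.
  # Prints: "<total frames> <janky frames> <janky percent>".
  awk -F': ' '
    /^[[:space:]]*Total frames rendered:/ { total = $2 + 0 }
    /^[[:space:]]*Janky frames:/          { janky = $2 + 0 }
    END {
      if (total > 0) printf "%d %d %.2f\n", total, janky, 100 * janky / total
      else print "0 0 0.00"
    }'
}
```

Usage: reset the counters, drive one route, then run `adb shell dumpsys gfxinfo <pkg> | jank_summary` and record the three numbers against that route. Recomputing the percentage from the raw counts avoids depending on the formatting of the parenthesised value in the dump.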

### Pass J — Accessibility (font scale · contrast · screen reader · targets · keyboard · reduce-motion) [Future.md P15]
Every **primary flow** must be usable with accessibility settings on. Enable each setting and walk the core flows
(auth, onboarding, pairing, Home, a full game, daily question + reveal, Messages, Paywall, Settings) end to end.
This is the deep home for a11y; the Pass C contrast/font spot-checks feed into it.
- **Font scaling:** `adb shell settings put system font_scale 1.3` (then 1.5, 2.0) — every primary flow stays usable:
**no clipped/overlapping text, no cut-off or hidden buttons/actions** (scroll where needed). **Acceptance: all primary
flows usable at increased font scale without clipped buttons or hidden actions.** Restore `font_scale 1.0` after.
- **Screen reader (TalkBack):** every interactive element has a meaningful semantics/`contentDescription` (icon-buttons
especially: back, send, like, close, the brand-mark loader, game option cards); decorative images are silenced
(`clearAndSetSemantics {}` / null desc); reading order is logical; no unlabeled "Button"; custom controls (spin wheel,
date swipe deck, answer cards) are operable + announced; no focus traps.
- **Contrast:** body text + essential icons meet WCAG AA (4.5:1 body / 3:1 large) in **both** themes — measure, don't
eyeball; re-check the known dim spots (game answer text, muted captions, the C-DS-001 area).
- **Don't rely on color alone (color-blind / WCAG 1.4.1):** any state conveyed by color must also carry a non-color cue
(icon, label, shape, position). Audit the **match/mismatch** rendering (e.g. `AnswerRevealScreen`), status chips,
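selected/disabled states, and any red=bad/green=good signal; they must be distinguishable in grayscale / with a
color-blindness simulation (`adb shell settings put secure accessibility_display_daltonizer_enabled 1`, with the mode
chosen via `adb shell settings put secure accessibility_display_daltonizer <mode>`, where `0` is monochromacy; set
`accessibility_display_daltonizer_enabled` back to `0` after). Color-only status = bug.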
- **Touch targets:** interactive targets ≥ **48dp** (icon buttons, chips, nav, close/back, reaction buttons, swipe-deck
actions). Flag anything smaller.
- **Keyboard / external input:** with a hardware keyboard, forms (sign-up, message, capsule, profile) tab in a sane
order, IME/Enter actions work, focus is visible, no traps.
- **Reduce-motion:** with "Remove animations" (`adb shell settings put global animator_duration_scale 0`), the loader,
celebration particles, reveals, splash handoff, and transitions degrade gracefully and **no motion-gated content
becomes unreachable** (the loader/particles already honor this — verify everywhere). Restore to `1` after.
- **Remediation:** add semantics labels, raise touch targets, fix contrast tokens, guard motion behind the reduce-motion flag.
- Findings: missing label / clipped-at-large-font / sub-48dp / failing contrast = bug → `ClaudeReport.md` (**P2**; **P1**
if it blocks a primary flow for assistive-tech users); polish → `Future.md` `## QA`.
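
"Measure, don't eyeball" for contrast comes down to the WCAG 2.x relative-luminance formula. A minimal sketch that computes the ratio for two sRGB hex colors so it can be compared against 4.5 (body) or 3 (large text); `contrast_ratio` is an illustrative helper, and it takes token values as given, so it does not account for alpha or overlays:

```shell
contrast_ratio() {
  # Usage: contrast_ratio RRGGBB RRGGBB   (hex colors, no leading hash)
  # Prints the WCAG 2.x contrast ratio with two decimals.
  awk -v fg="$1" -v bg="$2" '
    function hex2(s,    hi, lo) {          # two hex digits -> 0..255
      s = tolower(s)
      hi = index("0123456789abcdef", substr(s, 1, 1)) - 1
      lo = index("0123456789abcdef", substr(s, 2, 1)) - 1
      return hi * 16 + lo
    }
    function lin(c) {                      # sRGB channel -> linear value
      c = c / 255
      if (c <= 0.03928) return c / 12.92
      return ((c + 0.055) / 1.055) ^ 2.4
    }
    function lum(h,    r, g, b) {          # relative luminance of RRGGBB
      r = lin(hex2(substr(h, 1, 2)))
      g = lin(hex2(substr(h, 3, 2)))
      b = lin(hex2(substr(h, 5, 2)))
      return 0.2126 * r + 0.7152 * g + 0.0722 * b
    }
    BEGIN {
      lighter = lum(fg); darker = lum(bg)
      if (lighter < darker) { t = lighter; lighter = darker; darker = t }
      printf "%.2f\n", (lighter + 0.05) / (darker + 0.05)
    }'
}
```

Usage: `contrast_ratio 24122F FFF8FC` checks aubergine text on blush white from the house palette; run it for each text/background token pair in both themes and log any pair under the AA threshold.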

### Pass K — Billing & subscription lifecycle (the REAL money path, not the admin toggle)
**Pass A tests the GATE (couple-shared unlock via an admin entitlement toggle); Pass K tests how the entitlement is
actually earned, kept, and lost.** This is the revenue path and it is almost entirely unexercised by the admin toggle.
Read the manual's [Billing](docs/Engineering_Reference_Manual.md#billing) section first. **Needs real services**
(Google Play Billing sandbox + a Play **license tester** + RevenueCat) — emulators can't do real IAP, so run on a
**physical device** with a sandbox account, or mark each money-path row `blocked→needs-device` (the admin toggle is
**not** a substitute for these).
- **Purchase, end to end:** Paywall → select a plan → Play billing sheet → buy as a sandbox tester → RevenueCat →
`revenueCatWebhook`/`syncEntitlement` → `users/{uid}/entitlements/premium` flips active → features unlock for **both**
partners (couple-shared, Pass A) → `onEntitlementChanged` fires the partner push → the one-time **Premium-unlock modal**
(`PremiumUnlockOverlay`) shows once for **each** partner.
- **Restore purchases:** Paywall "Restore" on a reinstall / second device / after sign-out→in → entitlement restored,
no double-charge, features unlock.
- **Plan switching:** monthly ↔ annual upgrade / downgrade / crossgrade → correct proration + entitlement continuity.
- **Trial / intro pricing** (if configured); **price + currency are displayed from the store, localized, never
hardcoded**; plan list + benefits render; offline/SDK-error paywall is friendly (A-OBS), Continue hidden until plans load.
- **Cancel → expiry → RE-LOCK:** cancel keeps access until period end (`expiresAt`); at expiry, `CouplePremiumChecker`
reports inactive → premium features **re-lock for BOTH** and the premium-unlock "celebrated" flag re-arms. Test the
`expiresAt` boundary (admin: set it just-past) — the couple-shared checker must treat a lapsed entitlement as inactive.
- **Billing retry / grace period / account hold / pause** (Play states) → entitlement + UI reflect the state; no hard
crash, clear messaging.
- **Refund / revocation:** RevenueCat `CANCELLATION`/`EXPIRATION`/refund webhook → entitlement removed promptly → re-lock.
- **Security (overlaps D3/D5):** server-only entitlement writes (client self-grant → 403); **webhook authenticity**
(forged/replayed RevenueCat webhook rejected); no client-trusted entitlement; receipt validated server-side.
- **Error/abuse:** cancel the billing sheet mid-flow, kill network mid-purchase, double-tap buy, rapid unlock taps →
no false unlock, no duplicate purchase, retry recovers.
- **Settings → Subscription** reflects the live status; "Manage subscription" deep-links to Play.
- Done = purchase + restore + switch + cancel→expiry-relock + refund all verified on a real device (or each explicitly
`blocked→needs-device` with the admin-toggle gate covered in Pass A).
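
The `expiresAt` boundary check reduces to a small oracle worth writing down before editing the field by hand. A sketch under stated assumptions: times are epoch seconds, `0` means never subscribed, an entitlement exactly at `expiresAt` counts as lapsed, and the couple is unlocked while either partner's entitlement is still live. `couple_premium_state` is a hypothetical name and does not reproduce `CouplePremiumChecker`:

```shell
couple_premium_state() {
  # Usage: couple_premium_state NOW EXPIRES_A EXPIRES_B   (epoch seconds)
  # Prints the verdict the UI is expected to show for BOTH partners.
  now=$1
  for expires in "$2" "$3"; do
    if [ "$expires" -gt "$now" ]; then
      echo unlocked
      return 0
    fi
  done
  echo locked
}
```

Usage: probe just-before, exactly-at, and just-past, for example `couple_premium_state "$(date +%s)" "$EXPIRES_A" 0`, then compare the printed verdict with what each device actually shows. If the app treats the exact boundary differently, change the comparison here rather than the expectation in the report.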

### Pass L — Messaging & chat (E2E, both clients, the whole feature)
Chat is a core couple feature with no functional home until now (Pass C covers its visuals, Pass E its
`chat_message` push). Drive the **main couple conversation AND the per-question "Discuss" threads** QA↔Sam, both
directions. Read the manual's [E2EE model](docs/Engineering_Reference_Manual.md#end-to-end-encryption-model).
- **Send every type, both directions:** text, emoji, **image (gallery + camera)**, **voice note** → arrives on the
partner's device **decrypted**, correct attribution/timestamp/ordering, day separators.
- **E2E at rest (overlaps D1):** every sent item is ciphertext (`enc:v1:` / Tink media bytes), `lastMessagePreview`
encrypted, decrypts only on member devices; raw-API read by a non-member = 403.
- **Interactions:** reactions (add / change / remove), read receipts ("Seen"), typing indicator, message ordering under
rapid exchange.
- **Failed send & offline:** airplane mid-send → failed-message row → **retry / dismiss** (the 48dp controls), offline
queue flushes on reconnect, **no duplicate on retry** (idempotency, overlaps F); double/triple-tap send guarded.
- **Delete / moderation:** delete a message (own / both) + deleted-message rendering; block/report a partner if such a
flow exists.
- **Media:** gallery + camera + mic permission granted **and denied** → graceful; premium-gated media is couple-shared
(Pass A); oversized image handled; image viewer opens, zooms, and returns on back.
- **Inbox:** conversation list, unread badge, decrypted last-message preview, recency sort; open a conversation from
inbox **and** from "Discuss" **and** from a notification (Pass C/E) — all reach the same thread with a sane back stack.
- **Foreground chat-head bubble** for an incoming message while the app is open but that thread isn't on screen
(`MessageBubbleOverlay`) → tap opens it (Pass E overlap).
- **Realtime + perf (overlaps I):** snapshot listener detaches on leaving the conversation; long-history scroll pages,
no jank/leak. **Quiet hours** suppress the chat push (Pass M). Long/emoji/multiline/RTL text renders without clipping.
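
The at-rest bullet is quick to check in bulk once the stored text fields are exported. A minimal sketch, assuming an admin-side export with one `docId<TAB>storedValue` line per message or preview field (that export format is an assumption, as is the helper name `find_plaintext`); it covers text fields only, not Tink media bytes:

```shell
find_plaintext() {
  # stdin: "docId<TAB>storedValue" per line.
  # Prints the docId of every row whose value does not start with the enc:v1: prefix.
  awk -F'\t' 'index($2, "enc:v1:") != 1 { print $1 }'
}
```

Usage: `find_plaintext < messages.tsv`; any printed id is a P0 candidate to confirm by hand. An empty or missing value is also reported, since a row with no ciphertext deserves a look.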

### Pass M — Settings & account management (functional: settings PERSIST and TAKE EFFECT)
Pass C checks Settings **looks** right; Pass M checks each control **does** something, persists across relaunch, and
takes real effect. Read [Authentication and pairing flow](docs/Engineering_Reference_Manual.md#authentication-and-pairing-flow).
- **Appearance theme** (Light / Dark / Device) → applies app-wide immediately, **persists across process death +
relaunch**, and the decoupled-art behavior holds (Pass C).
- **Notification toggles** (daily reminder · partner answered · chat · streak) → toggling one **OFF actually suppresses
that push** (verify by triggering it), ON re-enables; survives relaunch.
- **Quiet hours** → set a window covering "now" → partner-triggered pushes are suppressed/deferred during it
and deliver outside it; the partner-action vs promotional rate-limit split holds. **MUST test with the recipient
BACKGROUNDED/KILLED, not just foreground (RETROSPECTIVE — M-001):** a partner push carries a `notification` block
the OS renders directly when the app isn't foreground, so any client-side `QuietHoursManager.isInQuietHours` check
(which only runs in `onMessageReceived`, foreground-only) is bypassed exactly when quiet hours matters. Verdict bar:
with QH on + recipient backgrounded, send a real partner action (chat/answer/game) → assert **0** notification in the
shade AND the Cloud Function log says it suppressed (`recipientInQuietHours`); then QH off → same action delivers.
Generalize: **any "don't notify when X" setting (quiet hours, snooze, DND, per-type opt-out) must be enforced
server-side where the push is SENT** — verify the setting reaches Firestore and the sender honors it, not just the
client. (Reminder: the `users/{uid}` update rule is a **field allowlist** — a newly-synced pref field is silently
denied until added to it; confirm the write actually lands via an admin read, not just the UI toggle.)
- **Biometric app-lock** → enable → background-return / cold-start prompts for biometric; correct unlock proceeds,
cancel keeps it locked, disable removes the lock. (Security-relevant: no bypass.)
- **Edit profile** → name, sex/gender (inclusive options), photo upload → persists, reflects on the **partner's** side,
ciphertext/storage correct at rest.
- **Relationship / unpair** → unpair returns **both** to the unpaired state, **revokes decrypt** (D4), notifies the
partner (`partner_left`), makes couple data inaccessible; re-pair works cleanly.
- **Delete account** → confirmation → account + couple data cascade (`onUserDelete`), partner unpaired + notified,
re-create with the same email is a clean slate (overlaps G).
- **Security** → recovery-phrase reveal for **both** accepter and inviter (C-SEC-001), server-blind (D4); regenerate if
supported; **and the full new-device recovery flow — enter the phrase on a fresh install → existing history decrypts**
(canonical steps + failure paths in **D4**). **Subscription** → "Manage subscription" → Play (Pass K). **Privacy &
Terms / data export** links open (the export *contents* are verified in Pass O).
- **Analytics / funnel-event correctness (not just the leak check in D6).** The app ships a real analytics tracker
(`core/analytics/FirebaseAnalyticsTracker.kt`, wired via `di/ObservabilityModule.kt`); D6 only asserts *no private
content* leaks into it — nobody verifies the events actually **fire correctly**, so the business funnel can silently
break. Enable Firebase **DebugView** (`adb shell setprop debug.firebase.analytics.app app.closer`) and confirm the key
lifecycle events fire **once, at the right moment, with correct params**: signup, pair, paywall_view, purchase/restore
(Pass K), game_complete, daily_answer/reveal. Also confirm analytics honor any **consent / opt-out** (privacy): if a
toggle or first-run consent exists, opting out must actually stop collection. Wrong/missing/duplicated events = bug;
still no private content in any event (D6).
- Every toggle survives **process death + reinstall-with-data** (overlaps F).
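
Picking a quiet-hours window that truly covers "now" is the step most likely to go wrong when the window wraps past midnight. A minimal sketch of the window test, usable as the oracle for the suppressed/delivered verdict; it assumes times are minutes since local midnight, a start-inclusive and end-exclusive window, and an equal start and end meaning "no window". `in_quiet_hours` is a hypothetical name and not the app's `QuietHoursManager` or the server-side check:

```shell
in_quiet_hours() {
  # Usage: in_quiet_hours NOW START END   (minutes since local midnight, 0..1439)
  # Exit status 0 = inside the quiet window, 1 = outside.
  now=$1; start=$2; end=$3
  if [ "$start" -le "$end" ]; then
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    # Window wraps past midnight, e.g. 22:00 -> 07:00.
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
  fi
}
```

Usage: `in_quiet_hours 1410 1320 420 && echo "expect 0 notifications"` models 23:30 against a 22:00 to 07:00 window. Test one probe inside, one outside, and one on each edge, with the recipient backgrounded as the bullet above requires.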

### Pass N — Daily question, reveal, check-ins & the other interactive features
> **⛔ CLAUDE: Run `scripts/wiring-scan.sh` BEFORE driving these features** (review `/tmp/claude-wiring-scan-<date>.md`,
> record counts in `ClaudeQACoverage.md`). Every 🔴 dead-setter / 🟠 orphan-reader is a likely silent dead feature —
> this is the exact class that hid N-001 (Bucket List) + N-002 (Date Builder) behind innocent-looking empty states.

The non-game interactive surfaces that have no functional home (Pass B is games only). Read
[Daily question lifecycle](docs/Engineering_Reference_Manual.md#daily-question-lifecycle).
- **Daily-question loop (the core daily ritual):** assignment (6 PM CST, `assignDailyQuestion`) → answer (each answer
type) → **both-answered gate** (neither sees the other's answer until both submit) → **mutual reveal** → per-question
**Discuss** thread (Pass L) → **Answer History** → **streak** increment + milestone celebration (`streak_milestone`)
→ reveal `isRevealed` retry (the `onAnswerRevealed` push). Verify the premium daily-question fallback
(`DailyQuestionResolver` per-user) does **not** desync the couple's shared daily Q.
- **Relationship check-ins / Your Progress (outcomes):** baseline check-in (gated to show once), 30/60/90-day
follow-ups, slider inputs persist (`submitOutcomeCallable`), the progress view renders patterns/milestones,
`scheduledOutcomesReminder` fires, "No baseline yet" → check-in dialog (C-DARK-UI-002 area). Submit + Skip both work.
- **Bucket List:** add / check-complete / edit / delete an item; empty state; both-device sync; at rest encrypted (D1);
premium state if applicable (A).
- **Plan a Date / Date Builder:** build a plan (shape/steps) → save → **persists + the partner sees it**; date plan +
`date_swipes` ciphertext at rest (D1); submit-outcome path.
|
- **ACTUALLY PERSIST + verify via admin read — an empty list can be a DEAD feature, not an empty one (RETROSPECTIVE —
N-001/N-002).** For every interactive feature, create real data through the UI and confirm it **lands in Firestore**
(admin read) AND **renders back**; don't accept the empty/initial state as "works." Bucket List looked like an empty
list but was fully non-functional (`coupleId` never set → every op silently `return`ed); Date Builder's "Create Plan"
silently no-ops (`dateIdeaId` never wired) and writes to a collection no screen reads. Reflex: any VM that gates on
`if (someId.isEmpty()) return` and expects the screen to call `setX(...)` is suspect — `grep` for the `setX` caller; if
none, it's dead. Also confirm there's a **display surface** for whatever a "save/create" writes (a save into an unread
collection is an incomplete feature, not a working one).
- **Activity / Together feed:** shared activity entries render + sort, unread count, navigation in/out.
- Each feature: empty / loading / error / not-paired states, two-device realtime sync, no stuck/orphaned state.
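The dead-setter reflex above can be approximated with plain `grep`. This is a minimal sketch, not the contents of `scripts/wiring-scan.sh` (its implementation isn't shown here): the function name `find_dead_setters` and the `DEAD-SETTER` label are hypothetical, and it assumes Kotlin sources where setters are declared as `fun setX(...)`.

```shell
#!/usr/bin/env bash
# Sketch only: report `fun setX(` declarations that have no call site.
# Assumption: Kotlin sources live under the directory passed as $1.
find_dead_setters() {
  local src_dir="$1" name

  # 1) Collect every setter name declared as `fun setSomething(`.
  grep -rhoE --include='*.kt' 'fun set[A-Z][A-Za-z0-9_]*\(' "$src_dir" \
    | sed -E 's/^fun (set[A-Za-z0-9_]*)\($/\1/' \
    | sort -u \
    | while read -r name; do
        # 2) Find every line mentioning `name(`, then drop the declaration lines.
        #    If nothing is left, no screen ever calls the setter.
        if ! grep -rhE --include='*.kt' "(^|[^A-Za-z0-9_])${name}\(" "$src_dir" \
             | grep -vqE "fun ${name}\("; then
          echo "DEAD-SETTER ${name}"
        fi
      done
}
```

Run it as `find_dead_setters app/src/main/java` (the path is an assumption). Every line it prints is a setter to trace by hand before trusting an empty state. It only sees direct `setX(` calls, so property-style or reflective wiring still needs a manual look.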

### Pass O — Release build, store readiness & pre-launch security
**Everything above runs on the DEBUG build; the shippable artifact is the minified RELEASE build — test THAT.** This is
a pre-ship gate, not a per-round pass (run it before any store push and after build-config / dependency / keep-rule
changes).
- **Release/minified build (R8 + resource-shrink):** build the **release** APK/AAB and run `qa/entrypoint_smoke.sh` +
a representative slice of A–N on it. R8 can strip/obfuscate classes that **Firebase/Firestore/Tink/RevenueCat/Gson/
kotlinx-serialization/Compose** need via reflection → crashes that never appear in debug. Verify keep-rules; 0 FATAL
on launch + each core flow; **upload the ProGuard mapping to Crashlytics** so release crashes deobfuscate.
- **Signing & packaging:** release signing config + upload key; build the **App Bundle**; install from the signed AAB
(bundletool `build-apks` → `install-apks`, or Play internal app sharing) and smoke it; 64-bit + target-SDK compliance.
- **App Check enforcement (pre-launch — currently OFF in dev per standing instruction; do NOT enable it on the dev project):**
in **staging**, enable enforcement on Firestore + Functions → a valid-token app works, a raw/no-token request is
rejected (**403**) (extends D3/D5 beyond rules-only); confirm Play Integrity on a real device vs the debug provider.
- **Deep links / Android App Links:** `closer://` **and** any `https` App Links (`assetlinks.json`) open the correct
screen with auth/membership re-checked (overlaps E).
- **Permissions & manifest:** the manifest declares only what's used; runtime prompts (POST_NOTIFICATIONS, camera, mic,
Android-13+ photo picker / `READ_MEDIA_IMAGES`) appear and degrade gracefully when denied; `allowBackup=false` holds (D6);
and the **`screenOrientation` decision is explicit** — today the manifest sets none, so the app rotates to an unverified
landscape layout (Pass C). Either lock portrait or certify landscape; don't ship it undecided.
- **Age gate / content rating / maturity (the app has adult/intimacy content — Desire Sync — and currently NO age gate).**
Confirm an appropriate **18+ / age-appropriate gate** exists where required, the Play **content/maturity rating
questionnaire** matches the actual content, and any IAP/intimacy content complies with store policy. A missing age gate
on adult content is a **store-rejection + legal risk** — file it (see the app-finding note).
- **Localization & formats (i18n):** strings are externalized (no hardcoded user-facing text), the longest translations
don't clip (overlaps C/J), **RTL** mirrors correctly, dates/numbers/**subscription prices+currency** format per locale
(overlaps K). Even if English-only today, confirm there's no layout that assumes English length.
- **Play Store readiness:** the **Data Safety** form matches the actual data flows + E2E encryption; privacy-policy URL
live; version code/name bumped; store listing/screenshots are the brand pass (H); min/target-SDK **device matrix**
(Methodology) covered.
- **Data-rights compliance (GDPR/CCPA — verify the CONTENTS, not just that a link opens).** Pass M confirms the export /
privacy links resolve; here, confirm **right-to-access** actually returns the user's real data (and that E2E content is
handled correctly — exported decrypted to the owner, or documented as unrecoverable), and **right-to-erasure** (delete
account, Pass M/G) genuinely cascades server-side (`onUserDelete`). A privacy policy that claims flows the app doesn't
do (or omits ones it does) is a finding.
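A quick static check can back the manifest bullet before the on-device permission run. This is a hedged sketch: `check_manifest` is a hypothetical helper, it only greps whichever `AndroidManifest.xml` you point it at, and it treats a missing `android:screenOrientation` as a warning because that decision is still open.

```shell
#!/usr/bin/env bash
# Sketch only: static manifest checks for Pass O (helper name is hypothetical).
check_manifest() {
  local manifest="$1" status=0

  # D6: backups must be explicitly disabled.
  if grep -q 'android:allowBackup="false"' "$manifest"; then
    echo "OK   allowBackup=false"
  else
    echo "FAIL allowBackup is not explicitly false"
    status=1
  fi

  # The orientation decision must be explicit, whichever way it goes.
  if grep -q 'android:screenOrientation=' "$manifest"; then
    echo "OK   screenOrientation declared"
  else
    echo "WARN screenOrientation not declared"
    status=1
  fi

  # List declared permissions so each one can be matched to a real use.
  grep -oE 'android\.permission\.[A-Z_]+' "$manifest" | sort -u | sed 's/^/PERM /'

  return "$status"
}
```

A non-zero return means at least one item needs a decision. The `PERM` lines are the checklist for the "declares only what's used" review, which still has to be done by reading the code.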

### Pass P — Content, copy & language quality (voice, grammar, inclusivity, the question bank)
**Wrong language is a BUG, not a "nice-to-have."** Typos, grammar/punctuation errors, off-brand or cold/salesy voice,
non-inclusive or assumptive wording, leaked placeholder/dev text, raw SDK/Firebase/RevenueCat errors shown to users,
copy that doesn't match behavior, and broken/duplicate/low-quality questions are all **defects → `ClaudeReport.md`**.
Only genuinely-working copy that *could be warmer/clearer* goes to `Future.md`. **Read first:**
`docs/brand/visual-identity.md` (**Store voice**) and `seed/questions/QUESTION_CONTENT_GUIDE.md` (**v3** — readability
test, no-AI-writing, duplicate prevention, variety, fun/relationship-first/premium rules). This is a recurring pass.
- **UI microcopy audit (every screen + state):** read ALL visible text — titles, labels, button verbs, helper text,
empty states, dialogs/confirmations, toasts, loading + **error** copy, and notification text — for: typos, grammar,
punctuation, capitalization/casing consistency, consistent terminology (feature names, "partner"/the partner's name,
"couple"), and copy that **matches the actual action/state** (a button says what it does; "Day N of 7" matches the real
state; correct names/counts/attribution). A label that misstates its destination or effect is a bug (overlaps C's CTA check).
- **No raw / placeholder / dev text ever reaches a user:** no Lorem, "TODO", debug strings, untranslated keys, or raw
exception/Firebase/RevenueCat error text surfaced to users (the A-OBS class) — always friendly copy.
- **Brand voice & tone (against `visual-identity.md` Store voice):** copy is **warm, quiet, equal, calm, specific** — a
private ritual for two. **Off-voice = finding:** cold/clinical, salesy/hype, urgent/alarmist, guilt- or streak-shaming,
competitive, surveillance-y, or "we'll FIX your relationship" promises. Scrutinize paywall, notification, and streak copy.
- **Inclusive & non-assumptive language:** no heteronormative or relationship-structure assumptions, no assumption about
who initiates or about bodies/ability/culture; gender-neutral where the design calls for it (the de-gendering effort —
`seed/degender_*.py`); sensitive topics (Desire Sync, intimacy) phrased with care + a consent framing, never crude or
clinical. Assumptive/exclusionary wording = bug.
- **Question-bank content QA (against `QUESTION_CONTENT_GUIDE.md` v3):** spot-check questions across **every category,
depth, and answer type** as the user SEES them (live-rendered, not just the DB) for: passes the **readability test**;
no **AI-writing tells**; **no duplicates / near-duplicates**; sensible **variety** + emotional mix; **answer options
complete + mutually exclusive + sensible** (no overlapping, joke-only, or placeholder options); the **answer type fits**
the prompt; **fun-rule / relationship-first** tone; and **no broken/empty/garbled/offensive/unsafe** prompts (cf.
`seed/fix_depth5_grammar.py`, `seed/validate_question_variety.py`). A shipped question that's broken, duplicated,
off-guide, or unsafe is a bug.
- **Legal / store / monetary copy accuracy:** paywall benefit claims are truthful (no over-promise); subscription terms +
renewal/price wording present and accurate (overlaps K); Privacy & Terms links resolve; store-voice rules hold.
- **Localization correctness (overlaps O):** no clipped/awkward strings from concatenation, correct pluralization, dates/
numbers/currency per locale, RTL grammar intact — even if English-only today, flag any English-length/grammar assumption.
- **Method:** harvest strings from `res/values/strings.xml` + in-code literals AND read them **in context on-device** (a
string can be fine in isolation yet wrong/cramped/ambiguous in place). Routing: incorrect/off-voice/non-inclusive/
placeholder/inaccurate/unsafe = **bug → `ClaudeReport.md`** (P2 default; **P1** if it misleads, blocks, leaks a raw
error, or is offensive/unsafe; P3 for pure nits); "could be warmer/clearer" → `Future.md`; new copy/voice work that's
out of scope → note it.
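The harvest step in **Method** can start with a cheap scan for leaked dev text. A minimal sketch, assuming the default `res/values/strings.xml` layout; `scan_placeholder_copy` and its word list are illustrative, and it does not replace reading the copy in context on-device.

```shell
#!/usr/bin/env bash
# Sketch only: flag dev/placeholder words in string resources (helper name is hypothetical).
scan_placeholder_copy() {
  local strings_xml="$1"
  # -n: line numbers, -i: case-insensitive, -w: whole words only, -E: alternation.
  # Exit status is 0 when something suspicious is found, 1 when nothing matches.
  grep -niwE 'lorem|todo|fixme|tbd' "$strings_xml"
}
```

For example, `scan_placeholder_copy app/src/main/res/values/strings.xml` (path assumed) prints `line:content` for each hit. In-code literals need a separate sweep of the Kotlin sources, since this only covers the resource file.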

## Reporting → ClaudeReport.md (living QA report)
- Header: date, build, devices, round number + run-state header.
- One section per pass (A–P), each a table:
**ID | Area | Screen/Route | Mode | Severity | Description | Repro | Evidence | Suggested fix | Status**.
- Summary: counts by severity. Report only during passes — no fixes recorded until the fix phase.
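The severity summary can be derived from the tables rather than counted by hand. A sketch under stated assumptions: rows follow the ten-column layout listed above with leading and trailing pipes, the Status cell holds the literal `Open` for unresolved issues, and no cell contains a `|`. The function name `count_open_by_severity` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count Open issues per severity from the report's markdown tables.
count_open_by_severity() {
  awk -F'|' '
    # Strip leading/trailing blanks from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # With leading and trailing pipes, Severity is field 6 and Status is field 11.
    NF >= 11 {
      sev = trim($6); status = trim($11)
      if (sev ~ /^P[0-3]$/ && status == "Open") n[sev]++
    }
    END { for (i = 0; i <= 3; i++) printf "P%d=%d\n", i, n["P" i] }
  ' "$1"
}
```

Header and separator rows are skipped automatically because their Severity cell is not `P0` to `P3`. If the report uses a different status vocabulary, adjust the `"Open"` comparison.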

### Report hygiene — keep it CLEAN, lean, and never dangling (the report is a *current-state* doc, not an archive)
The report's job is to show, at a glance, **what's wrong right now** — not to accumulate a history of everything ever
fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So:
- **A Fixed row survives exactly ONE confirmation round, then it's removed.** When you fix an issue, mark its row
`Fixed` and keep it through the **next** re-QA round. Once that round re-verifies it, **delete the row** — the durable
root-cause/fix detail lives in the **Engineering Manual landmine** (mandatory for any escaped/deep bug) + git history
(the row cites the landmine ID, and the commit hash once the user commits), so nothing is lost. Don't rely on "the
commit message" as the only home — **you don't commit** (the user does, often batched), so the manual landmine is the
reliable record. Don't carry confirmed-fixed issues across multiple rounds.
- **Make the archived-ID line a usable duplicate-fix lookup, not bare IDs.** When you prune a row, attach a **2–4 word
tag** to its archived ID (e.g. `C-PW-001 dark paywall pills`) so the Fix-phase **Regression triage** can search it by
symptom. **Any fix whose class could plausibly recur gets at least a one-line Engineering Manual landmine entry** —
not only the "escaped/deep" bugs the MANDATORY-retrospective requires — so a future regression check never lands on an
ID with no description. (This is why a separate fix-history file is unnecessary: the manual landmines + this tagged
archived line + git already are the fix history.)
- **One run-state header, always.** Keep only the **current** `Round N | Pass X | Chunk Y | NEXT ACTION` block pinned
at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a **single one-line history**
entry each (e.g. `R6: branding regression — 0 new`), or drop them entirely once their fixes are confirmed-and-pruned.
- **Open issues first; resolved issues compact.** Order every pass section **open (P0→P3) on top**; keep a short
`Resolved & confirmed (archived — detail in git)` line listing only the **IDs** of older fixed-and-verified issues
(not their tables). The big per-issue tables exist only for **currently-open** and **fixed-this-round-pending-confirm**
issues.
- **Severity board reflects NOW.** One board, current counts; `Open` is the number that actually matters. When `Open`
hits 0 at every level, the report should be **short** — current run-state, a 0/0 board, the archived-ID line, and the
operational constants (devices/accounts, standing-auth, playbook pointers). If it's long while everything is fixed,
it needs pruning.

### Coverage-matrix hygiene (`ClaudeQACoverage.md` — a *current-status* matrix, not a per-round changelog)
- **Flip, don't stack.** When a fix is confirmed, change that row's `fail→id` to `pass` and move the ID to an archived
line — never leave a confirmed-fixed `fail→id` dangling, and never keep a contradicting "still owed" note next to a
completed row.
- **One status per cell, current.** Each screen/feature/game/notification shows its **latest** status only; collapse
prior rounds' narration into a single one-line **round history**. Keep an at-a-glance pass-status table at the top.
- **Keep the resume signal sharp.** What a returning session needs is *what's left* — surface `todo`/`deferred`/
`blocked` items plainly; don't bury them under superseded prose.
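The "flip, don't stack" rule can be spot-checked mechanically. A rough sketch, assuming coverage cells are written as `fail→ID` and that the report keeps its archived IDs on a single line starting with `Resolved & confirmed`; `find_dangling_fails` is a hypothetical helper, not an existing script.

```shell
#!/usr/bin/env bash
# Sketch only: find coverage rows still marked fail→ID although the ID is already archived.
find_dangling_fails() {
  local coverage="$1" report="$2" id

  # Pull every ID that follows "fail→" in the coverage matrix.
  grep -oE 'fail→[A-Z][A-Z0-9-]*' "$coverage" \
    | sed 's/^fail→//' \
    | sort -u \
    | while read -r id; do
        # An ID that also appears on the archived line was confirmed fixed,
        # so its coverage cell should already read "pass".
        if grep -E '^Resolved & confirmed' "$report" | grep -qwF -- "$id"; then
          echo "DANGLING $id"
        fi
      done
}
```

Usage would be `find_dangling_fails ClaudeQACoverage.md ClaudeReport.md`; each `DANGLING` line is a cell to flip to `pass`. IDs that share a suffix with a longer hyphenated ID can produce false positives, so treat the output as a prompt to look, not a verdict.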

### Extremely-easy-to-read mandate (applies to ClaudeReport.md, ClaudeQACoverage.md, and Future.md)
Optimize every QA doc for a reader who has **5 seconds** to find the current state:
- **Lead with the answer.** Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean")
before any detail.
- **Tables over prose** for issues; **short rows**. Put long root-cause analysis in the **Engineering Manual landmine**
(the durable home), not the row — the row gets a one-sentence description + repro + the landmine ID (and commit hash
once the user commits).
- **No walls of text.** Break run-state into scannable lines; bold the few words that matter; no multi-paragraph
headers. If a paragraph is longer than ~3 lines, it's probably manual/landmine material, not report material.
- **Consistent shape every round** so a returning reader (or a post-compaction resume) finds things in the same place.

## Fix phase (only AFTER all passes of the round complete)
- Work strictly by severity: **all P0 → P1 → P2 → P3**.
- **⛔ Regression triage — DIFF & history-check BEFORE you write a fix (every bug, not just crashes — don't fix blind).**
First answer *"is this NEW, or did we break/relapse something that worked?"* — fixing without this risks re-fixing a
known issue a different way (divergent fixes) or masking the real regression:
  1. **Have we fixed this before? (duplicate-fix / regression check.)** Search the **fix history** for the same
     symptom/area/ID — the canonical home is the Engineering Manual
     [Known landmines and recent fixes](docs/Engineering_Reference_Manual.md#known-landmines-and-recent-fixes) (root
     cause + the guard that should hold it) plus `ClaudeReport.md`'s `Resolved & confirmed` archived-ID line. **A match ⇒
     this is a REGRESSION, not a new bug:** re-open under the **original ID**, and fix *why the guard lapsed* (a scanner/
     test/pass-step that was supposed to catch it) — do **not** re-implement a fresh fix from scratch.
  2. **What changed? (diff before you fix.)** `git log` / `git diff` / `git blame` / `git log -L` the failing area to pin
     the introducing change — **including OTHER agents' recent commits** (this repo is co-edited by Codex / kimi / Ripley,
     so "what changed" is frequently not your own work; `git log --since` / `git log <file>` across authors). Read that
     commit's diff and fix the **actual cause it introduced**, not the surface symptom. ("worked before, broken now"
     ⇒ always bisect to the change first.)
  3. Only after you know **new-vs-regression** and **what introduced it** do you design the fix.
- **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct
device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext
in Firestore**, **`./gradlew testDebugUnitTest` + functions `npm test` still green** — a fix that reds a test isn't
Fixed) → flip its row to **Fixed** + capture the durable substance in the Engineering Manual landmine → next
(the **user** commits per issue/cluster — never run git yourself; see Guardrails). Don't start the next until the
current is verified.
- **Real-path verification gate (do NOT mark Fixed without it):** verify the fix through the **same path the user hits**,
not a synthetic shortcut. A crash/launch/notification fix is only "Fixed" once reproduced-then-cleared via the REAL
channel (real push tapped from the shade on an `am kill`'d app; real launcher cold-start) — `am start`/`am force-stop`
passes don't count. For any cold-start/notification/launch fix, the gate is **`qa/entrypoint_smoke.sh` green**. (This
session's miss: a routing "fix" was declared on `am start` evidence while the real bug was a splash crash on the FCM
cold-start. Don't repeat it.)
- **Couple-shared premium fix**: replace direct `isPremium()` gates with
`CouplePremiumChecker.coupleHasPremium(partnerId)` in every gated VM/screen (partner-entitlement read rule deployed).
**High regression risk** — re-verify each feature in BOTH self-premium and free states.
- **Re-run associated scanners after fixing.** If the fix touches UI colors/surfaces (Pass C), re-run
`scripts/theme-scan.sh` and confirm the relevant CRITICAL count dropped. If the fix touches launch/splash/notifications,
re-run `qa/entrypoint_smoke.sh`. Update the coverage matrix with the new counts; a "Fixed" row is only valid when the
scanner (and the live visual sweep) both agree.
- Gated actions (entitlement toggles, deploys) are **user-authorized per occurrence**.
- **New issues found while fixing** are logged (new ID), not silently fixed beyond scope — next re-QA round catches them.

**Definition of done:** a **pass** is done when every coverage row is `pass`/`fail→id`/`not implemented→Future.md`/
`blocked→id`; a **round** is done when all **recurring** passes (A–N + P) are done; **flawless** = one full round with
all of the following:
- **zero open P0–P2 and Passes D + E + L + P fully clean** (no open P0/P1 in I/J);
- **every game fully played through**;
- every notification type verified or explicitly `not implemented→Future.md`;
- chat (L), the couple-shared premium gate (A), settings-take-effect (M), **interactive features** (N: daily-Q/reveal,
outcomes, Bucket List, Date Builder work end-to-end — created data persists AND is read back,
`scripts/wiring-scan.sh` 🔴=0), and content/language (P: no typos/off-voice/non-inclusive copy, question bank
on-guide) all verified;
- all join-game navigation paths and all back-stack checks verified;
- **the unit + functions test suites GREEN** (`./gradlew testDebugUnitTest` + functions `npm test`);
- **`qa/entrypoint_smoke.sh` GREEN on both emulators** (0 FAIL — every entry-point cold-start opens and stays).

Then stop (P3s optional). **Pass O (release build + store readiness) and Pass K's real-money path are pre-ship /
real-device gates** — they don't block a per-round "flawless" but **must be GREEN before any store submission**.
Don't re-open a clean pass within the same round.
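Step 1 of the regression triage is a text search, so it is easy to make repeatable. A minimal sketch: `triage_lookup` is a hypothetical wrapper around `grep`, and the file paths in the usage line are the ones this playbook names; adjust them if the repo layout differs.

```shell
#!/usr/bin/env bash
# Sketch only: search the fix history for a symptom before writing a fix.
triage_lookup() {
  local keyword="$1"
  shift
  # -i: ignore case, -n: line numbers, -H: always show the file name,
  # -F: treat the keyword as a literal string, not a regex.
  if grep -inHF -- "$keyword" "$@"; then
    echo "=> match found: possible REGRESSION, check the original ID first"
  else
    echo "=> no match in fix history: likely NEW"
    return 1
  fi
}
```

For example, `triage_lookup "paywall pills" docs/Engineering_Reference_Manual.md ClaudeReport.md` searches both homes of the fix history at once. A hit is only a lead; the read-only `git log -L` / `git blame` pass in step 2 is still what pins the introducing change.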

## Re-QA loop (until flawless)
After the fix phase, re-run Passes A–N + P (regression + confirm fixes; Pass K money-path when a sandbox device is
available, Pass O when prepping a release). Repeat **fix → re-QA** rounds until a full round is **flawless** per the
Definition of done (zero open P0–P2 and Passes D + E + L + P fully clean).
- **Prune on confirmation (Report hygiene):** the moment a re-QA round re-verifies a `Fixed` issue, **delete its row**
from `ClaudeReport.md` (move its ID to the compact `Resolved & confirmed (archived — detail in git)` line) and
collapse that finished round's run-state header. A fixed issue lives in the report for **one** confirmation round
only — never let confirmed-fixed rows or old run-states accumulate. See **Report hygiene** under Reporting.