# Claude QA Playbook — Full-App QA → Fix → Re-QA until flawless
> Reusable QA plan for the Closer app. Run report-only first, fix everything, then re-QA until a clean round.
> Progress/state is tracked in **ClaudeReport.md** (issues) + **ClaudeQACoverage.md** (coverage matrix), which are
> the authoritative source of truth. See the Continuity section before resuming.
>
> **Program roadmap:** **Part 1** = Android QA (this doc) → **Part 2** = build the iOS app to Android's current
> parity → **Part 3** = run these same passes on iOS + a cross-platform (Android↔iOS) pass. **Parts 2 & 3 live in
> `ClaudeiOSPlan.md`** (note: iOS build/run/QA requires macOS — not possible from this Linux box).
## Context
Drive the real app on both emulators, verify each thing live, report, fix, re-verify. Five QA dimensions:
1. **Couple-shared premium** — if EITHER partner is premium, **all** premium features unlock for **both**.
2. **Games** — each starts, plays, finishes correctly on both devices.
3. **Full visual pass, light + dark** — every screen, text readable, nothing clipped/invisible.
4. **Security & encryption (cornerstone)** — every private field is ciphertext at rest, rules hold against
non-members, keys/recovery are sound. Findings here default to P0.
5. **Notifications** — all 17 types deliver to the right partner (foreground/background/killed), deep-link
correctly, and leak no private content.
Scope decisions: **exhaustive** visual pass (all ~50 screens, both modes); **full scope incl. pre-pairing** flows
(fresh throwaway account); **couple-shared everywhere** — per-user gates are bugs, fixed by routing through
`core/billing/CouplePremiumChecker.kt`.
**Early known signal:** only chat uses `CouplePremiumChecker`; games/packs/dates/wheel gate on the user's own
`EntitlementChecker.isPremium()` — so premium almost certainly does NOT unlock for the free partner there. Pass A
confirms + enumerates this; the fix phase applies couple-shared everywhere.
## Execution mode — run to completion (autonomous; do NOT stop)
- **Do not stop to check in or ask for approval.** Run every pass → the fix phase → re-QA rounds **continuously
until a flawless round** (zero open P0-P2, Passes D + E clean, every game fully played through, navigation/back-stack
verified). Don't hand control back early.
- **Unblock yourself:** if anything **blocks progress** (a stale/blocking session, a crash, a build break, a missing
prerequisite state, a broken nav path that prevents reaching a screen), **fix it immediately and continue** — even
though passes are otherwise report-only. Blocking issues are fixed inline so the run can proceed; non-blocking
findings are still logged and fixed in the fix phase.
- **"Once executed, complete it":** never declare done before the Definition of Done is met — keep cycling fix → re-QA
until flawless, then stop.
- **Context limits ≠ stopping — do NOT hand back to the user when context fills.** The harness auto-summarizes a long
conversation and continues in the next window; you continue **without the user**. (You cannot self-invoke `/compact`
— and you don't need to; auto-compaction handles it.) The **committed `ClaudeReport.md` run-state + `ClaudeQACoverage.md`
are the authoritative state** and survive any compaction — after a summary, **re-read them and continue at the next
chunk**. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action
even with standing auth, or the macOS requirement for iOS).
- **Commit before anything interruptible** so a mid-chunk compaction never loses progress. Keep chunks atomic; if a
chunk is cut off mid-way (e.g., a game session left active), the **session-start ritual recovers it** (clear the stuck
session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
- **Don't pause for "by-design vs bug":** log the ambiguous finding and keep going (don't unilaterally rewrite
deliberate design — the log captures it). Never halt the run to ask.
- **Only true stop = a gated action you cannot perform.** Production deploys, admin Firestore writes/seeds, and
entitlement toggles still need per-occurrence authorization (the classifier enforces this regardless of this doc).
If one is genuinely required to proceed and is denied, do **all** other work first, then surface only that single
blocker — don't halt the whole run for it.
## Methodology (every pass)
- Devices: **5554 (QA)**, **5556 (Sam)**, paired; one **fresh throwaway account** for pre-pairing flows.
- Drive via adb tap/swipe; resolve coords from `uiautomator dump` bounds; downscale screenshots to read;
scan `logcat` for `FATAL EXCEPTION`/ANR on each screen.
- Premium toggled via `scratchpad/set_premium.js` (admin, **user-authorized each time**).
- Theme toggled via **Settings → Appearance (Light/Dark)** (`MainActivity` `ThemeMode`).
- **REPORT-ONLY during passes — never fix mid-pass.**
- **THINK AS A CONSUMER — approach everything from different angles.** Beyond "does it work", constantly ask *"is this
what a real person would expect / want here? is this delightful, confusing, or annoying?"* Come at each flow from
multiple angles (first-time user, returning user, the partner who didn't start it, someone tapping fast, someone
reading carefully, the skeptic, the impatient). Vary inputs, depths, orders, and entry points (don't repeat one
happy path). A thing can be bug-free yet still *worse than it should be* — notice that too.
- **CAPTURE IMPROVEMENT / FEATURE IDEAS → `Future.md` (section `## QA`).** Bugs (broken/incorrect behavior) go to
`ClaudeReport.md` as always. But anything that *works yet could be better* — confusing copy, a missing affordance,
a rough-but-not-broken flow, a "it'd be great if…" feature idea — append it to **`Future.md` under `## QA`** with a
short title, what prompted it, and the suggested improvement. This is an idea backlog, **not** the bug log; logging
here is never a substitute for filing an actual defect in `ClaudeReport.md`.
- **Environment (senior-QA rec):** prefer the **Firebase Local Emulator Suite or a dedicated staging project** over
production — isolates test data, makes seeding / entitlement toggles / D3 negative tests **free** (no gated prod
writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real
services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but
every seed/toggle/deploy hits the gate.)
- **Device/OS matrix:** don't certify on one emulator only — cover **minSdk + targetSdk**, a **small** and a **large**
screen, and at least one **physical device** (App Check / Play Integrity behave differently on emulators).
- **Automate the regression smoke:** capture the smoke checklist as a runnable script (adb/Maestro) so every round
re-checks it cheaply instead of by hand.
- **Test-data hygiene:** keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between
rounds so they don't masquerade as bugs.
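
The adb loop described above (dump the UI, resolve bounds, tap, scan logcat) can be sketched as a few helpers. This is a minimal sketch, not the project's tooling: the function names are hypothetical, it assumes the dump came from `adb shell uiautomator dump` and was pulled to the host, and it treats `ANR in` as the logcat marker for an ANR.

```python
import re
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# uiautomator writes bounds as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")
CRASH_MARKERS = ("FATAL EXCEPTION", "ANR in")


def center_of(bounds: str) -> Tuple[int, int]:
    """Return the center point of a uiautomator bounds string."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unrecognized bounds: {bounds!r}")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


def find_tap_point(dump_xml: str, text: str) -> Optional[Tuple[int, int]]:
    """Find the first node whose text attribute equals `text`, else None."""
    for node in ET.fromstring(dump_xml).iter("node"):
        if node.get("text") == text:
            return center_of(node.get("bounds", ""))
    return None


def crash_lines(logcat_text: str) -> List[str]:
    """Return the logcat lines that contain a crash or ANR marker."""
    return [
        line
        for line in logcat_text.splitlines()
        if any(marker in line for marker in CRASH_MARKERS)
    ]


def tap(serial: str, x: int, y: int) -> None:
    """Send a tap to one device, e.g. serial "emulator-5554"."""
    subprocess.run(
        ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)],
        check=True,
    )
```

Resolving coordinates from the dump on every step, rather than hardcoding them, keeps the same script usable across the small-screen and large-screen devices in the matrix. `find_tap_point` returns `None` when the label is absent, which is itself worth logging as a finding.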
## Multi-angle attack mandate (go DEEPER than "does the happy path work")
A capability can pass via the UI yet fail when hit directly. Probe each meaningful capability (read/write a private
field, gate a premium feature, deliver/route a notification, start/finish a game, pair/unpair, create an account)
from as many **independent angles** as apply — not just the in-app happy path:
- **Real UI** (play-as-user) — the baseline angle.
- **Crafted intent / deep-link** — fire the exact intent a notification/link carries (bypasses UI nav) to test routing
in isolation; also send **malformed/missing extras** → must route gracefully or no-op, never crash.
- **Raw API against the DEPLOYED backend** — hit Firestore/Storage/Functions REST **directly** with a real token,
as a **member AND a non-member**, to exercise rules + App Check from OUTSIDE the app. A non-member (or no-App-Check)
request must be **DENIED** — App Check `403` or rules `PERMISSION_DENIED`. The member request characterizes which
layer enforces. **Any unauthorized `200` returning couple data = P0.**
- **Admin inspection (ground truth)** — read the RAW stored docs/objects (admin bypasses rules) to assert what is
actually persisted: ciphertext only, no plaintext, no raw keys/invite-seeds, no private content in pushes.
- **Concurrency / race** — two partners (or two rapid taps) hit the same thing at once.
- **Killed / cold state** — force-stop, then deliver + tap a notification; cold-start straight onto a deep link.
- **Malformed / abusive input** — oversized, empty, rapid-fire, injection-ish, forged FCM payloads, replayed/expired
tokens & invite codes.
- **Offline / flaky** — drop network mid-action → graceful failure, recover on reconnect.
Record **which angles** were tried per area in `ClaudeQACoverage.md`. For security- or data-sensitive capabilities,
"UI happy path only" is **not** a `pass`. **D3/Pass G negative access MUST be executed live via the raw-API angle each
round — never deferred to "only 2 emulators."** (Mint a token for a non-member UID via admin → exchange for an ID
token via the Identity Toolkit REST `signInWithCustomToken` → use it as Bearer against the Firestore REST API.)
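
The raw-API angle can be sketched as two requests plus a verdict. This is a hedged sketch under stated assumptions: the API key, project ID, and document path are placeholders, the helper names are hypothetical, and the custom token is assumed to have been minted elsewhere via admin. The two endpoints are the public Identity Toolkit and Firestore REST URLs.

```python
import json
import urllib.error
import urllib.request

IDENTITY_URL = (
    "https://identitytoolkit.googleapis.com/v1/"
    "accounts:signInWithCustomToken?key={api_key}"
)
FIRESTORE_URL = (
    "https://firestore.googleapis.com/v1/projects/{project}"
    "/databases/(default)/documents/{path}"
)


def exchange_request(api_key: str, custom_token: str) -> urllib.request.Request:
    """Build the request that trades a custom token for an ID token."""
    body = json.dumps({"token": custom_token, "returnSecureToken": True})
    return urllib.request.Request(
        IDENTITY_URL.format(api_key=api_key),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_request(project: str, doc_path: str, id_token: str) -> urllib.request.Request:
    """Build a Firestore REST GET for one document, authenticated as the user."""
    return urllib.request.Request(
        FIRESTORE_URL.format(project=project, path=doc_path),
        headers={"Authorization": f"Bearer {id_token}"},
        method="GET",
    )


def http_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def classify(status: int, is_member: bool) -> str:
    """Map a status code to the verdict recorded in the coverage matrix."""
    if status in (401, 403):
        return "denied"
    if status == 200:
        return "allowed" if is_member else "P0"
    return "inconclusive"
```

Running the same `read_request` once with a non-member token and once with a member token gives the pair of results the text asks for: the first must classify as `denied`, and the second shows which layer is doing the enforcing.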
## Continuity & resumability (this effort WILL span many context windows — don't lose state)
State lives in **files**, not memory:
- **`ClaudeReport.md`** = the issue log (committed). Each issue row is **self-contained in text** (repro + expected
+ actual) — screenshots are session-only and won't survive a compaction; never rely on a screenshot path alone.
- **`ClaudeQACoverage.md`** = the coverage matrix: every screen×mode, feature×premium-state, game×lifecycle,
notification×{foreground,background,killed}, each `todo | pass | fail(→issue id)`. The resume anchor.
- **Persistent memory** (`memory/`): QA methodology + exact commands; emulator↔account↔coupleId mapping;
`scratchpad/set_premium.js` + admin tooling; the couple-shared-premium-everywhere goal + the per-user-gate gap.
- **Run-state header** pinned at the TOP of `ClaudeReport.md`, always current: `Round N | Pass X | Chunk Y |
NEXT ACTION: …` — first thing to read, last thing to update before stopping.
- **Stable issue IDs**: `A-001 / B-002 / C-… / D-… / E-…` (pass-letter + number); coverage references the ID for
every `fail`. Never renumber or reuse.
- **Source of truth**: the two MD files are authoritative; the TodoWrite list is scratch for the current chunk only.
Update the MD files + run-state header *before* ending a session.
- **Commit cadence**: commit `ClaudeReport.md` + `ClaudeQACoverage.md` after each pass and each chunk.
- **Chunking**: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
- **Session-start ritual**: (1) read run-state header + both MD files; (2) `adb devices` shows **both** emulators
online; (3) **installed build == current HEAD** (rebuild+reinstall if unsure — never QA a stale APK); (4) continue
at the first `todo` / unverified-fix.
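
Step (1) of the session-start ritual can be made mechanical by parsing the run-state header and the coverage cells. The sketch below is illustrative only: it assumes the header sits on a single line in the `Round N | Pass X | Chunk Y | NEXT ACTION: …` shape, that a failing cell is written as `fail(→ID)` or `fail→ID`, and all names are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

RUN_STATE_RE = re.compile(
    r"Round\s+(?P<round>\d+)\s*\|\s*Pass\s+(?P<pass_>[A-H])\s*\|\s*"
    r"Chunk\s+(?P<chunk>[^|]+?)\s*\|\s*NEXT ACTION:\s*(?P<next_action>.+)"
)
FAIL_RE = re.compile(r"fail\s*\(?→\s*(?P<issue>[A-H]-\d+)\)?")
PLAIN_STATUSES = ("todo", "pass", "partial")


class RunState(NamedTuple):
    round_number: int
    pass_letter: str
    chunk: str
    next_action: str


def parse_run_state(line: str) -> RunState:
    """Parse the pinned run-state header from the top of ClaudeReport.md."""
    match = RUN_STATE_RE.search(line)
    if match is None:
        raise ValueError(f"not a run-state header: {line!r}")
    return RunState(
        int(match["round"]),
        match["pass_"],
        match["chunk"],
        match["next_action"].strip(),
    )


def cell_status(cell: str) -> Tuple[str, Optional[str]]:
    """Split a coverage cell into (status, issue_id or None)."""
    cell = cell.strip()
    fail = FAIL_RE.fullmatch(cell)
    if fail is not None:
        return "fail", fail["issue"]
    if cell in PLAIN_STATUSES:
        return cell, None
    raise ValueError(f"unknown coverage cell: {cell!r}")
```

Raising on an unparseable header or cell is deliberate: a resume that cannot read its own state should stop and repair the file rather than guess where to continue.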
## Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration)
A pass is a **category**, not a unit of work. Execute each pass as **sub-batches (chunks)**, where a chunk = the
**largest coherent unit that reliably finishes AND commits within one context window, with margin**. End every chunk
with a commit + run-state update. If a chunk starts overflowing, split it; if chunks feel trivial, merge them.
**Why:** in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching
prevents half-done/lost work and gives cleaner per-chunk verification + revertable commits.
| Pass | Chunk granularity | ~chunks |
|---|---|---|
| A Premium | free-state gating sweep; then couple-shared verify (mostly code + a few live taps) | 1-2 |
| B Games | **one game per chunk** — full two-device playthrough + edges + commit | 7 |
| C Visual | **one screen-group per chunk** (both themes, ~6-10 screens, montage-reviewed + nav/back for that group); never "all screens" at once (heaviest, image-bound) | 6-8 |
| D Security | D1 at-rest · D2 rules + D3 negative · D4 keys/recovery · D5-D7 appcheck/secrets/leaks/migration | ~4 |
| E Notifications | **3-5 types per chunk** × {foreground/background/killed} + tap-to-open | ~4 |
| F Resilience | **one dimension per chunk** (concurrency · lifecycle/process-death · network · time · account-lifecycle) | ~5 |
Context-cost tips: prefer **code/admin-read audits** (cheap) before live UI sweeps; **montage** screenshots
(dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.
## Guardrails & efficiency
- **Never `pm clear` / wipe app data** — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
- **Never run `seed/build_db.py`.** Admin seeds/writes, entitlement toggles, and any deploys are **user-authorized per occurrence**.
- **By-design vs bug:** if a finding may be intended behavior, **log it and keep going** (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
- **Pass C parallelism:** set **5554 = Dark, 5556 = Light** to capture both themes at once.
- Never log decrypted message/answer content.
## Severity scale (label every issue)
- **P0 Critical** — crash/ANR, data loss, encryption/security leak, feature fully broken, premium bypass.
- **P1 Major** — feature partly broken, premium not unlocking for partner, wrong/missing notification, dead-end nav.
- **P2 Minor** — readability/contrast, clipping/overflow/truncation, theme not adapting, inconsistent styling.
- **P3 Polish** — spacing/alignment/copy nits.
## QA passes (Round 1 = baseline)
### Pass A — Couple-shared premium (target: either partner premium → both unlock)
Test each gated feature in 3 states: **neither** premium → locked + paywall; **partner-only** premium → BOTH unlock;
**self** premium → unlock. Toggle Sam premium, confirm QA (free) unlocks; toggle off.
Features: Play-hub games (Desire Sync + any premium-badged), Connection Challenges, Memory Lane; Question Packs;
Spin the Wheel / Category Picker / Wheel History; Date Match / Plan Date / Date Builder; chat media + reactions
(regression — already couple-shared); Subscription/Settings reflects entitlement.
Gated files (for the fix): `ui/play/PlayHubViewModel`, `ui/desiresync/DesireSyncScreen`,
`ui/wheel/{CategoryPicker,SpinWheel,WheelHistory}*`, `ui/questions/QuestionPackLibrary*`,
`ui/dates/{DateMatch,DateMatches}Screen`, `ui/memorylane/MemoryLaneScreen`, `ui/challenges/ConnectionChallengesScreen`.
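
The three test states reduce to one rule: a feature is unlocked when either partner is premium. The sketch below is a test oracle for recording expected results, not the app's `CouplePremiumChecker` implementation; the function and state names are hypothetical.

```python
from typing import Dict, Tuple

# state name -> (self is premium, partner is premium)
STATES: Dict[str, Tuple[bool, bool]] = {
    "neither": (False, False),
    "partner-only": (False, True),
    "self": (True, False),
}


def expected_unlocked(state: str) -> bool:
    """Couple-shared rule: either partner being premium unlocks both."""
    self_premium, partner_premium = STATES[state]
    return self_premium or partner_premium


def expected_outcome(state: str) -> str:
    """Human-readable expectation for the coverage matrix."""
    return "unlocked" if expected_unlocked(state) else "locked + paywall"


def is_gating_bug(state: str, observed_unlocked: bool) -> bool:
    """True when what the device shows disagrees with the couple-shared rule."""
    return observed_unlocked != expected_unlocked(state)
```

A per-user gate shows up as `is_gating_bug("partner-only", False)` returning `True`, which is the failure the early known signal predicts for games, packs, dates, and the wheel.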
### Pass B — Games lifecycle (MANDATORY: play each game ONE complete time through)
Games: This or That, How Well Do You Know Me, Desire Sync, Connection Challenges, Memory Lane, Spin the Wheel, + Date Match.
- **PLAY AS THE USER (mandatory mindset for this pass):** drive every game **the way a real user would** — reach it
through the actual in-app navigation a person would tap (Play hub → the game's card → its buttons), **not** via
deep-links, admin pokes, forced state, or any shortcut a user doesn't have. **Expect what the user expects:** if a
tap/button/flow doesn't do the obvious thing, or a screen doesn't behave the way a normal user would assume, **that
itself is a finding** — log it.
- **When something doesn't work: REPORT FIRST, then a minimal workaround (in that order).** Do **not** silently
engineer around breakage by taking extra steps the user wouldn't take. The moment the natural user path fails:
(1) **log the issue** in `ClaudeReport.md` with severity + the exact user action that failed and what was expected;
(2) **only then** apply the smallest workaround needed to keep the pass moving. The workaround **never replaces**
the report — a flow that needs a workaround to proceed is, by definition, broken and must be filed to fix. If a
workaround is impossible, mark the game `fail→<id>` (blocked) and continue with the next.
- **A launch/crash check is NOT sufficient. Each game MUST be played one full way through, end-to-end, on BOTH
devices** — start → answer/interact through **every** step/round/question on each device → reach the
**finish/reveal/results** screen → confirm the result renders correctly for both partners. Verify each
intermediate screen and interaction works (selections register, progress advances, both-answered gating,
reveal/scoring/summary correct). Premium games (Desire Sync, Memory Lane) need a premium toggle to play.
- The session lifecycle is exercised by the real playthrough: `status` active→completed; reveal/results correct on both.
- **VARY THE STYLE OF PLAY (don't just repeat the happy path):** across runs, deliberately exercise *different* ways a
real couple would play each game, because different inputs hit different code paths:
- **Different DEPTHS and QUESTION COUNTS — cover the matrix, don't settle for one combo:** play each game across
**every depth/mood** (Light, Everyday, Deep, All-topics/shuffle) AND **every round length / number of questions**
(5 / 10 / 15), in *different pairings* across runs (e.g. Light×5, Deep×15, Everyday×10, All×5) — short *and* long
sessions, shallow *and* deep content. Different depths surface different question sets, tones, and edge content
(e.g. Deep/Desire-Sync sensitive prompts); different counts stress pacing, progress, and the both-answered gate.
Also exercise **each distinct answer type** (A/B, Yes/No, True/False, 1-5 scale, multi-select, free-text).
- **Different answer *patterns* that change the result** — all-match vs all-mismatch vs partial; both-yes vs both-no
vs split (so reveals show "shared", "all private", "0 matches", "perfect/zero score" — verify each renders right).
- **Different turn orders / who-starts** — partner A starts vs partner B starts; the guesser opens before vs after
the subject finishes; both open simultaneously (race); one device much slower than the other.
- **Different exit/resume styles** — finish normally; quit mid-game; background mid-game then resume; cold-kill
mid-game then reopen; "End their game"; re-open a completed session for the replay/results; play two games
back-to-back, and a *different* game type immediately after.
- **Edge inputs** — submit with nothing selected (should be blocked), rapid double-taps on answer/confirm/next,
spamming the start button, tapping during the reveal animation. None should crash, duplicate, or desync.
- Edges: re-open a completed session, leave mid-game (resume), no stuck session, no crash, logcat clean.
- Game start/finish pushes (`onGameSessionUpdate`) exercised here; full delivery/deep-link audit in **Pass E**.
- **Media permissions** (CAMERA, RECORD_AUDIO): granted works, denied degrades gracefully.
- **Done = every game has one verified complete playthrough** (a launch-only "opens, no crash" row is `partial`, not `pass`).
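
One way to cover the depth × question-count matrix without replaying every combination in a single run is to rotate the counts against the depths from run to run. This is a sketch of that scheduling idea only; the depth labels and counts come from the text above, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

DEPTHS = ("Light", "Everyday", "Deep", "All-topics")
COUNTS = (5, 10, 15)


def full_matrix() -> List[Tuple[str, int]]:
    """Every depth paired with every question count (12 combinations)."""
    return list(product(DEPTHS, COUNTS))


def pairings_for_run(run_index: int) -> List[Tuple[str, int]]:
    """One pairing per depth, shifting the count by one position each run."""
    return [
        (depth, COUNTS[(position + run_index) % len(COUNTS)])
        for position, depth in enumerate(DEPTHS)
    ]
```

Run 0 yields Light×5, Everyday×10, Deep×15, All-topics×5, and three consecutive runs together cover all 12 combinations for a game.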
### Pass C — Visual pass, light + dark, ALL screens
Every route in `core/navigation/AppRoute.kt` (~50), in **both** modes: text contrast/readability (no invisible/
low-contrast), no clipping/overflow/ellipsis breakage, icons visible, backgrounds adapt, controls legible. Groups:
auth/onboarding/pairing (fresh acct); Home (solo + paired); Play + every game; Today + reveal/history; Messages
(inbox + conversation); Packs; Dates (Match/Builder/Matches/Bucket List); Wheel (picker/session/complete/history);
Settings + all sub-pages (Account, Notifications, Appearance, Privacy, Subscription, Relationship, Security, Delete
Account); Paywall; Your Progress/Activity; Recovery.
- **Probe:** `ui/theme/Theme.kt` hardcoded brand colors + chat's custom `closerBackgroundBrush` — verify dark mode
truly adapts; grep screens for hardcoded `Color(0x...)`.
- **States, not just happy path:** empty / loading / error / not-paired where they exist; many need data setup
(seeding is user-gated) — note unreachable states in coverage rather than skipping silently.
- **Readability at scale:** default font size + spot-check largest system font scale on text-heavy screens.
- **Navigation from every entry point:** reach each screen from **all** the places that link to it and confirm it
opens correctly each time — e.g. a conversation from the inbox AND from "Discuss" AND from a notification; a game
from the Play hub AND from a notification; Paywall from each gated feature; Settings sub-pages; reveal from Today
AND from history AND from `partner_answered`. A screen that works from one entry but breaks/duplicates from another = bug.
- **TAKE EVERY AVENUE (exhaustive nav fuzzing — actively hunt for nav bugs, don't just walk the happy path):** treat
navigation as something to *break*. On every screen, **tap every interactive element** — each button, card, row,
icon, chip, link, tab, header back-arrow, system back, and any "see all / history / edit / manage" affordance — and
follow where it goes. Then try the *combinations and sequences* a curious user hits:
- **Every order:** switch bottom tabs in many orders, mid-flow (open a game, jump to Messages, come back); enter a
deep screen then tab away then back; open A→B→C then back-back-back.
- **Rapid / repeated input:** double- and triple-tap navigation targets (especially "open game", "Play now",
"Create/Start session", notification taps) to surface double-push/duplicate-screen/stale-route bugs (cf. B-004).
- **Interrupt mid-navigation:** background/rotate/lock during a transition; tap a notification while already on that
screen, on a different screen, and while logged-out/unpaired; cold-start straight onto a deep link.
- **Dead-ends & traps:** from *every* screen confirm there's always a way out (back/close/home) — no screen that
strands the user, needs two backs, exits the app unexpectedly, loops, or lands blank. Re-check the asymmetric-game
waiting screens, replay/results screens, and paywall specifically.
- Log **every** wrong/duplicate/dead destination with the exact tap sequence to reproduce. Wrong/double-back or
dead-end = **P2** (P1 if it traps the user or loses their progress).
- **Back-stack / "double back":** from every entry point, **system back AND the in-app back arrow** return to the
correct previous screen — no dead-ends, no exiting the app unexpectedly, and **no screen that requires pressing
back twice** (duplicate/stacked destinations on the back stack = bug). Bottom-tab reselection and deep-link/
notification entries must land with a sane back stack (back → Home, not off the app or a blank screen). Wrong/
double back or a dead-end = **P2** (P1 if it traps the user).
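
The hardcoded-color probe can be run as a cheap code audit before the live visual sweep. A minimal sketch, assuming Kotlin sources with Compose-style `Color(0x…)` literals; the helper names are hypothetical and the pattern will not catch colors built any other way.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Matches literals such as Color(0xFF24122F) or Color( 0xffB98AF4 ).
HARDCODED_COLOR_RE = re.compile(r"\bColor\(\s*0[xX][0-9A-Fa-f]{6,8}\s*\)")


def find_hardcoded_colors(source: str) -> List[Tuple[int, str]]:
    """Return (line number, literal) for each hardcoded color in one file."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED_COLOR_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every .kt file under `root`; keep only files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.kt")):
        hits = find_hardcoded_colors(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Hits inside `ui/theme/Theme.kt` are expected, since that is where the palette is defined; hits inside individual screens are the ones to check in dark mode first.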
### Pass D — Security & encryption (cornerstone; findings default to P0)
- **D1 At-rest coverage:** admin-read RAW docs/objects, assert ciphertext for every private type: chat text +
`lastMessagePreview` (`enc:v1:`), chat media bytes (Tink `01 69 59 51 f0…`), answers (`sealed:v1:`/`enc:v1:`),
date plans + `date_swipes`, Memory Lane capsules, Bucket List. Also: **wrappedCoupleKey** + recovery material never
plaintext; **invite code (KDF seed) never stored raw**; **no push payload carries private content**.
- **D2 Rules audit (static):** member-only reads, author/server-only writes, ciphertext enforced on every private
field, immutability, **no premium self-grant**, entitlements write:false; re-audit conversations/typing/reactions
+ entitlement partner-read; **no catch-all** `match /{document=**}`; list/query not enumerable; `get()`-rules don't
over-expose; **no legacy plaintext/downgrade path** (`coupleEncryptionEnabled` holds; no disabled-encryption branch).
- **D3 Negative access tests (EXECUTE LIVE via raw API — do not defer):** a **non-member** account is *denied* reading
messages/answers/dates/entitlements/sessions/capsules, writing plaintext to encrypted fields, self-granting premium,
and any cross-couple access. Run it the **raw-API angle**: mint a non-member ID token (admin custom token →
Identity Toolkit `signInWithCustomToken` REST) and issue Firestore REST GET/PATCH against the couple's docs — expect
App Check `403` or rules `PERMISSION_DENIED` on every attempt. Also issue the **same** reads with a **member** token to
characterize the enforcement layer (App Check vs rules). Any unauthorized `200` with couple data = **P0**.
- **D4 Key exchange / management / recovery (E2EE crux):** couple key client-generated, only leaves device **wrapped**
(KDF from invite seed; server holds only `wrappedCoupleKey`+`kdfSalt`/`kdfParams`+`encryptedRecoveryPhrase`); **KDF
strength**; Tink AEAD = AES-GCM/256 with **AAD=coupleId**, no weak/custom crypto/nonce reuse; keybox/sealed/commitment
integrity; **recovery-wrap server-blind**; **unpair revokes decrypt**; invites CSPRNG + single-use + expiry.
- **D5 App Check / Functions / secrets:** App Check enforced; callables validate auth+membership; webhook authenticity;
admin-only writes rejected from clients; service-account JSONs never committed; no plaintext/secrets in logcat; temp
files deleted.
- **D6 Leak vectors:** no private content in analytics/crash; `allowBackup=false` + backup rules exclude sensitive data;
deep links re-check membership; clipboard user-initiated; consider `FLAG_SECURE`; repo scan for committed secrets.
- **D7 Encryption migration:** test the `encryptionVersion` paths (0 plaintext → 1 migrating → 2 strict) on a legacy
couple — migration completes without exposing plaintext or losing/garbling old content, and a half-migrated couple
is safe (no mixed read failures, no downgrade). This is the riskiest data path for existing users.
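
The D1 at-rest assertions can be expressed as prefix checks over the raw admin-read data. This is a hedged sketch: the text prefixes and the media byte prefix are taken verbatim from D1, the field names in any call are up to the caller, and the helper names are hypothetical. A Tink ciphertext prefix is a version byte followed by a 4-byte key ID, so if key IDs differ between couples only the leading `01` would be stable.

```python
from typing import Dict, Iterable, List

TEXT_PREFIXES = ("enc:v1:", "sealed:v1:")
TINK_MEDIA_PREFIX = bytes.fromhex("01695951f0")


def looks_encrypted_text(value: object) -> bool:
    """True when a stored string carries one of the ciphertext prefixes."""
    return isinstance(value, str) and value.startswith(TEXT_PREFIXES)


def looks_encrypted_media(blob: bytes) -> bool:
    """True when stored media bytes start with the expected Tink prefix."""
    return blob[: len(TINK_MEDIA_PREFIX)] == TINK_MEDIA_PREFIX


def plaintext_fields(doc: Dict[str, object], private_fields: Iterable[str]) -> List[str]:
    """Return the private fields present in `doc` that are not ciphertext."""
    return sorted(
        field
        for field in private_fields
        if field in doc and not looks_encrypted_text(doc[field])
    )
```

A prefix check proves only that the value is labeled as ciphertext, so it complements rather than replaces the D4 review of how the ciphertext is produced. Report the offending field names only, never the values, in line with the guardrail against logging private content.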
### Pass G — Account creation, validation & fake-account abuse (MANDATORY — both the happy path AND the attacks)
Cover **every account-creation avenue a real user takes** and **every fake/abusive creation attempt an attacker would
try.** Use throwaway test accounts (sign-out → fresh sign-up; never `pm clear`). Report-first like every pass.
- **Real creation flows (happy path + validation):** sign-up (email/password and any social/anonymous path), profile
creation, and pairing — both **create-invite** and **accept-invite** sides. Verify field validation (invalid/empty
email, weak/short password, mismatched confirm, name length/emoji/unicode), the **error copy is friendly** (no raw
SDK/Firebase error leaking — cf. A-OBS), loading/disabled states, and that a brand-new unpaired account lands on the
correct "create or accept invite" home (not a broken/blank or paired view).
- **Duplicate / conflicting creation:** sign up with an **already-registered email** (clear "already in use", no crash,
offer sign-in); create a second account while one is signed in; re-run onboarding after completing it; accept an
invite while **already paired** (must be rejected cleanly); two devices accepting the **same invite** (single-use —
the second must fail gracefully).
- **Fake / malicious creation attempts (security — expect DENY, never crash or leak):** create an account that is
**NOT a member** of the test couple and attempt every cross-couple action (read messages/answers/dates/entitlements,
write to the couple, self-grant `premium`/`hasPremium`, join/hijack pairing with a guessed/expired/reused invite
code) — all must be **denied by rules** (this is the live execution of **D3**). Probe **invite-code abuse**: replay a
used code, use an expired code, brute-force/guess attempts (CSPRNG entropy + single-use + expiry must hold). Probe
**App Check**: a request without a valid token is rejected. Confirm a malformed/forged sign-up can't bypass profile
or membership requirements. **Any successful unauthorized create/read/write = P0.**
- **Account lifecycle around creation:** sign-out → sign-in (state restores, no stale couple); **delete account** then
re-create with the same email (clean slate, partner notified/unpaired); an unpaired/just-created account tapping a
stale notification or deep link is handled gracefully (no crash, sane landing).
- **Done = every creation avenue exercised** (happy + duplicate + malicious) with each attack **denied** and each happy
path validated end-to-end; findings filed with exact repro.
### Pass E — Notifications (every type delivers, deep-links, leaks nothing)
For each: trigger fires → delivered to the **right partner (never self)** → in **foreground/background/killed**
correct channel + copy with **no private content** → **tap opens exactly the right item** (loaded, not generic Home/
dead-end) → no duplicates → rate limiter (20/day, 100/week) doesn't drop legit ones.
Inventory (type → trigger → destination), all 17: `chat_message`(onMessageWritten→conversation, foreground→chat-head
bubble), `partner_started_game`/`partner_finished_game`(onGameSessionUpdate→game/results), `partner_answered`
(onAnswerWritten→reveal), `daily_question`(assignDailyQuestion)/`daily_question_reminder`/`daily_reminder`
(dailyQuestionReminder→Today), `date_match`(createDateMatch→match), `partner_joined`+`invite_created`
(acceptInviteCallable→pairing/home), `partner_left`(onCoupleLeave)/`partner_deleted_account`(onUserDelete→home/
relationship settings), `memory_capsule_unlocked`(scheduled→capsule), `challenge_day_ready`(→Connection Challenges),
`outcome_reminder`(scheduledOutcomesReminder), `reengagement`(reengagement/gameRetention), `gentle_reminder`
(sendGentleReminderCallable), `spki`(identify + confirm handled).
- **Tap-to-open:** every notification opens the **specific item** from foreground/background/killed; tapping in-app
doesn't stack/duplicate; logged-out/unpaired tap is graceful. Wrong/dead destination = P1.
- **Scheduled/time-based:** trigger manually (invoke callable/function or seed due condition — user-gated).
- **Foundations:** FCM token registration on sign-in (`TokenRegistrar`) + `onNewToken`; POST_NOTIFICATIONS prompt +
denied path; channels (`di/NotificationModule`); deep-link routing (`MainActivity.deepLinkRouteFromIntent` →
`AppNavigation`); foreground/background split (`core/notifications/AppMessagingService`).
- Build a delivery matrix (type × {foreground,background,killed}) in ClaudeQACoverage.md. Missed delivery or wrong
deep-link = P1; private content in any payload = P0.
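
The delivery matrix can be generated rather than typed by hand, so no type × state cell is forgotten. A minimal sketch with hypothetical helper names; the cell values follow the `todo | pass | fail(→issue id)` convention, and the list of types is passed in by the caller.

```python
from typing import Dict, Iterable, List, Tuple

APP_STATES = ("foreground", "background", "killed")
Matrix = Dict[str, Dict[str, str]]


def delivery_matrix(types: Iterable[str]) -> Matrix:
    """Start every (type, app state) cell at 'todo'."""
    return {t: {state: "todo" for state in APP_STATES} for t in types}


def render_markdown(matrix: Matrix) -> str:
    """Render the matrix as a Markdown table for ClaudeQACoverage.md."""
    lines = [
        "| type | " + " | ".join(APP_STATES) + " |",
        "|---|" + "---|" * len(APP_STATES),
    ]
    for notification_type, row in matrix.items():
        cells = " | ".join(row[state] for state in APP_STATES)
        lines.append(f"| `{notification_type}` | {cells} |")
    return "\n".join(lines)


def remaining(matrix: Matrix) -> List[Tuple[str, str]]:
    """List the (type, app state) cells that are not yet 'pass'."""
    return [
        (notification_type, state)
        for notification_type, row in matrix.items()
        for state in APP_STATES
        if row[state] != "pass"
    ]
```

An empty `remaining(...)` result is a convenient check that Pass E is clean for the round.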
### Pass F — Resilience, concurrency, lifecycle & time (cross-cutting; a 2-user realtime app needs these)
- **Concurrency / realtime races (two partners at once):** both answer the daily question simultaneously; both start
a game / swipe a date / react at the same time; partner acts while you're mid-flow. No lost writes, no stuck state,
no duplicate sessions, reveal still correct. (This is where a couples app breaks.)
- **Lifecycle / process death:** background mid-flow + return; force-kill the app and relaunch (Android may kill the
process) — state/auth/draft restore sanely; deep-link/notification after process death still loads (verified for
chat — extend to all). Rotation/config-change doesn't lose Compose state. Low-memory.
- **Network resilience:** offline / flaky / airplane mid-action across answers, games, dates (not just chat media) —
graceful failure + retry/queue, no crash, no silent data loss, recovery on reconnect.
- **Idempotency / rapid input:** double-tap send/submit, rapid nav, double-start — guarded (no double-send, no crash).
- **Time-dependent behavior:** daily-question rollover (6 PM CST assignment), streak day-boundary + repair window,
capsule unlock times, reminder schedules — test across a date change (manipulate device clock / trigger functions).
- **Account/couple lifecycle:** brand-new (empty) account; unpaired state; pair → unpair → re-pair; partner leaves
mid-session; account deletion cascade; same account on two devices. No orphaned/broken state.
- **Crash reporting:** confirm crashes/ANRs are actually captured (Crashlytics) so field issues surface.
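
For the time-dependent checks, it helps to compute the exact instants on either side of the daily-question rollover before touching the device clock. The sketch below rests on two loud assumptions: "6 PM CST" is treated as a fixed UTC-6 offset with no daylight saving, and the question assigned at 18:00 on a given day is labeled with that day's date. Neither is confirmed by the source, and the names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple

CST = timezone(timedelta(hours=-6), "CST")  # assumption: fixed offset, no DST
ROLLOVER_HOUR = 18  # 6 PM local assignment time


def question_day(instant: datetime) -> date:
    """Date label of the daily question active at an aware `instant`."""
    local = instant.astimezone(CST)
    if local.hour >= ROLLOVER_HOUR:
        return local.date()
    return local.date() - timedelta(days=1)


def boundary_probes(day: date) -> Tuple[datetime, datetime]:
    """Return the instants one minute before and exactly at the rollover."""
    rollover = datetime(day.year, day.month, day.day, ROLLOVER_HOUR, tzinfo=CST)
    return rollover - timedelta(minutes=1), rollover
```

Testing at both probes, on both devices, is what exposes off-by-one-day bugs in streaks and reveals. If the backend schedules in a DST-aware zone instead, the fixed offset here would need to be replaced accordingly.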
### Pass H — Branding & artwork (every screen: could it carry more of the brand? where would art help?)
A consumer-mindset pass focused on **brand presence and delight**, not defects. Walk **every screen and surface** and
ask: *does this feel like Closer (private, warm, equal, intentional — a ritual for two)? Could brand color, the heart
mark, a brand message, or an illustration make it warmer or clearer without clutter?* Output is **artwork descriptions
written as ready-to-paste ChatGPT image-generation prompts** — the user generates the images; we only describe them.
- **First, lock the house style (do this once per round, refresh if the art evolved):** read `docs/brand/visual-identity.md`
+ `docs/brand/asset-system.md` AND open 2–3 existing illustrations (`illustration_couple_onboarding`,
`illustration_reveal_celebration`, `pack_art_*`) to capture the *actual* look. New screens/features since the last
brand review must be folded in. Keep the canonical **house-style prompt prefix** + palette in the branding deliverable
(`ClaudeBrandingReview.md`) so every prompt reuses it and **all generated art matches the existing artwork.**
- **House style (must hold for every prompt):** flat 2D pastel vector illustration; soft rounded shapes, no harsh
outlines, gentle gradients; palette aubergine `#24122F` / deep purple `#56306F` / lavender `#B98AF4` / soft pink
`#F7C8E4` / soft lavender `#D9B8FF` / blush white `#FFF8FC`; motifs = two-equal-halves heart, paired/sealed cards,
floating hearts + petals, candle/mug/lavender-sprig warmth, moon/quiet-hours, calendar/date-card, capsule; mood =
warm, quiet, equal, intentional. Couple figures balanced + inclusive, faces simple. **Never** show readable answer/
prompt/message text, invite codes, emails, dating-app clichés, stock photos, alarm/urgency/surveillance imagery.
- **Per screen, decide the brand opportunity** (pick the lightest that fits — don't over-decorate):
- none needed (already on-brand, or a dense list/form where art would clutter) — say so;
- **color/typographic** brand touch (palette, heart mark, a rotating privacy message);
- **small glyph** (brand glyph for a relationship concept — describe it for the glyph set);
- **hero/empty-state/celebration illustration** (the high-value case → write the full ChatGPT prompt).
- **Each artwork item records:** screen/route · placement (hero / empty / header / card / celebration) · why it helps ·
filename to match the existing scheme (`illustration_*`, `pack_art_*`, `glyph_*`, `particle_*`) · **the ChatGPT
prompt** (house-style prefix + the specific scene) · aspect ratio/size + light/dark behavior. Cross-check the
brand doc's "Needed additions" / empty-state list and **mark which already have assets vs still need art** (e.g.
Android may still lack illustrations that iOS has).
- **Prioritize** the screens a user feels most: onboarding/pairing, Home, paywall/subscription, reveal/celebration,
empty states (no messages/dates/capsules/history), Memory Lane, Connection Challenges, date match, quiet-hours.
- Branding *defects* (mis-colored, clipped, off-brand, low-contrast art) are bugs → `ClaudeReport.md`. Pure
"works but could be warmer / a feature idea" → `Future.md` `## QA`. New art to create → `ClaudeBrandingReview.md`.
## Reporting → ClaudeReport.md (living QA report)
- Header: date, build, devices, round number + run-state header.
- One section per pass (A/B/C/D/E/F), each a table: **ID | Area | Screen/Route | Mode | Severity | Description | Repro
| Evidence | Suggested fix | Status**.
- Summary: counts by severity. Report only during passes — no fixes recorded until the fix phase.
## Fix phase (only AFTER all passes of the round complete)
- Work strictly by severity: **all P0 → P1 → P2 → P3**.
- **One issue at a time**: implement → `./gradlew :app:assembleDebug` → install both → verify THAT fix live (correct
device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, **content still ciphertext
in Firestore**) → flip its row to **Fixed** + **commit** (one per issue/cluster) → next. Don't start the next until
the current is verified.
- **Couple-shared premium fix**: replace direct `isPremium()` gates with
`CouplePremiumChecker.coupleHasPremium(partnerId)` in every gated VM/screen (partner-entitlement read rule deployed).
**High regression risk** — re-verify each feature in BOTH self-premium and free states.
- Gated actions (entitlement toggles, deploys) are **user-authorized per occurrence**.
- **New issues found while fixing** are logged (new ID), not silently fixed beyond scope — next re-QA round catches them.
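
The couple-shared rule behind the premium fix can be sketched as below. This is an illustration only: `EntitlementSource` and `CouplePremiumSketch` are hypothetical stand-ins, and the real `CouplePremiumChecker` in `core/billing` may have a different signature (for example suspending or Flow-based).

```kotlin
// Hypothetical stand-in for the per-user entitlement lookup.
fun interface EntitlementSource {
    fun isPremium(userId: String): Boolean
}

class CouplePremiumSketch(private val entitlements: EntitlementSource) {
    // Couple-shared rule: premium unlocks for both if EITHER partner is premium.
    // partnerId is null while the user is unpaired, so only their own
    // entitlement counts in that state.
    fun coupleHasPremium(selfId: String, partnerId: String?): Boolean =
        entitlements.isPremium(selfId) ||
            (partnerId != null && entitlements.isPremium(partnerId))
}
```

Regression checks for each gated feature then cover four states: self premium, partner premium only, neither premium, and unpaired.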
**Definition of done:** a **pass** is done when every coverage row is `pass`/`fail→id`; a **round** is done when all
five passes are done; **flawless** = one full round with **zero open P0–P2 and Passes D + E fully clean**. Then stop
(P3s optional). Don't re-open a clean pass within the same round.
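
The "flawless" check and the fix-phase ordering can be expressed as a small predicate. This sketch assumes "fully clean" for Passes D and E means no open issue of any severity (including P3); `Issue`, `isFlawless`, and `fixOrder` are hypothetical names modeling a `ClaudeReport.md` row, not existing code.

```kotlin
enum class Severity { P0, P1, P2, P3 }

// Hypothetical minimal view of one ClaudeReport.md row.
data class Issue(
    val id: String,
    val pass: Char,          // 'A'..'F'
    val severity: Severity,
    val isOpen: Boolean
)

// Flawless = zero open P0-P2 anywhere, and (assumption) no open issue
// of any severity in Pass D or Pass E.
fun isFlawless(issues: List<Issue>): Boolean {
    val openIssues = issues.filter { it.isOpen }
    val noOpenP0toP2 = openIssues.none { it.severity != Severity.P3 }
    val passesDAndEClean = openIssues.none { it.pass == 'D' || it.pass == 'E' }
    return noOpenP0toP2 && passesDAndEClean
}

// Fix-phase order: all P0, then P1, P2, P3 (enum declaration order).
// sortedBy is stable, so report order is kept within a severity.
fun fixOrder(issues: List<Issue>): List<Issue> =
    issues.filter { it.isOpen }.sortedBy { it.severity }
```

Under this reading, an open P3 in Pass C does not block a flawless round, while an open P3 in Pass D does.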
## Re-QA loop (until flawless)
After the fix phase, re-run Pass A/B/C/D/E/F (regression + confirm fixes). Repeat **fix → re-QA** rounds until a full
round yields zero P0–P2 and Passes D+E fully clean.