Claude QA Playbook — Full-App QA → Fix → Re-QA until flawless

Reusable QA plan for the Closer app. Run report-only first, fix everything, then re-QA until a clean round. Progress/state is tracked in ClaudeReport.md (issues) + ClaudeQACoverage.md (coverage matrix), which are the authoritative source of truth. See the Continuity section before resuming.

Program roadmap: Part 1 = Android QA (this doc) → Part 2 = build the iOS app to Android's current parity → Part 3 = run these same passes on iOS + a cross-platform (Android↔iOS) pass. Parts 2 & 3 live in ClaudeiOSPlan.md (note: iOS build/run/QA requires macOS — not possible from this Linux box).

Where every finding goes (route it here — exactly one home each)

What you found	Where it goes	Form
A bug — broken / incorrect / crashing / insecure, premium bypass, wrong-or-missing notification, dead-end nav	`ClaudeReport.md`	Table row: stable ID (`A-001`, `E-003`…) + severity (P0–P3) + repro + status
An idea / improvement — works but could be better, confusing copy, missing affordance, rough-but-not-broken flow, "it'd be great if…", feature idea	`Future.md` `## QA`	Short title + what prompted it + suggested improvement
New artwork to create — illustrations, glyphs, image-gen prompts	`ClaudeBrandingReview.md`	House-style prompt + placement
What got tested + its status (pass / fail / todo / deferred)	`ClaudeQACoverage.md`	Coverage cell (the resume anchor)

A branding defect (mis-colored, clipped, off-brand, low-contrast art) is a bug → ClaudeReport.md, not a brand idea — only new art to create goes to ClaudeBrandingReview.md.
Logging an idea in Future.md is never a substitute for filing a real defect: if it's broken, it gets an ID in ClaudeReport.md too.
Bug lifecycle: filed in ClaudeReport.md → fixed → kept one confirmation round → pruned to the archived-ID line (detail lives in git). Future.md ideas sit in the backlog until built. (See Report hygiene under Reporting.)

Context

Drive the real app on both emulators, verify each thing live, report, fix, re-verify. Five QA dimensions:

Couple-shared premium — if EITHER partner is premium, all premium features unlock for both.
Games — each starts, plays, joins, resumes, finishes, and reopens results correctly on both devices.
Full visual pass, light + dark — every screen, text readable, nothing clipped/invisible.
Security & encryption (cornerstone) — every private field is ciphertext at rest, rules hold against non-members, keys/recovery are sound. Findings here default to P0.
Notifications — the full suite: every type delivers to the right partner (foreground/background/killed), deep-links correctly, opens the right destination on both clients, covers all game/join-game flows, handles stale notifications, and leaks no private content.

Scope decisions: exhaustive visual pass (all ~50 screens, both modes); full scope incl. pre-pairing flows (fresh throwaway account); couple-shared everywhere — per-user gates are bugs, fixed by routing through core/billing/CouplePremiumChecker.kt; full notification suite — every type, game + join-game pushes, deep-links, stale-notification handling, and all in-app paths into joining/resuming/results, verified on both clients.

Early known signal: only chat uses CouplePremiumChecker; games/packs/dates/wheel gate on the user's own EntitlementChecker.isPremium() — so premium almost certainly does NOT unlock for the free partner there. Pass A confirms + enumerates this; the fix phase applies couple-shared everywhere.

Execution mode — run to completion (autonomous; do NOT stop)

Do not stop to check in or ask for approval. Run all passes (A–J) → the fix phase → re-QA rounds continuously until a flawless round (zero open P0–P2, Passes D + E clean, every game fully played through, all notification routes verified, navigation/back-stack verified). Don't hand control back early.
Unblock yourself: if anything blocks progress (a stale/blocking session, a crash, a build break, a missing prerequisite state, a broken nav path that prevents reaching a screen), fix it immediately and continue — even though passes are otherwise report-only. Blocking issues are fixed inline so the run can proceed; non-blocking findings are still logged and fixed in the fix phase.
"Once executed, complete it": never declare done before the Definition of Done is met — keep cycling fix → re-QA until flawless, then stop.
Context limits ≠ stopping — do NOT hand back to the user when context fills. The harness auto-summarizes a long conversation and continues in the next window; you continue without the user. (You cannot self-invoke /compact — and you don't need to; auto-compaction handles it.) The committed ClaudeReport.md run-state + ClaudeQACoverage.md are the authoritative state and survive any compaction — after a summary, re-read them and continue at the next chunk. Never pause a run merely because context is getting long; only stop for a true blocker (a denied gated action even with standing auth, or the macOS requirement for iOS).
Commit before anything interruptible so a mid-chunk compaction never loses progress. Keep chunks atomic; if a chunk is cut off mid-way (e.g., a game session left active), the session-start ritual recovers it (clear the stuck session via in-app "End their game", then redo that chunk). Right-sized chunks (see Batch sizing) make this rare.
Don't pause for "by-design vs bug": log the ambiguous finding and keep going (don't unilaterally rewrite deliberate design — the log captures it). Never halt the run to ask.
Only true stop = a gated action you cannot perform. Production deploys, admin Firestore writes/seeds, and entitlement toggles still need per-occurrence authorization (the classifier enforces this regardless of this doc). If one is genuinely required to proceed and is denied, do all other work first, then surface only that single blocker — don't halt the whole run for it.

Methodology (every pass)

Devices: 5554 (QA), 5556 (Sam), paired; one fresh throwaway account for pre-pairing flows.
Drive via adb tap/swipe; resolve coords from uiautomator dump bounds; downscale screenshots to read; scan logcat for FATAL EXCEPTION/ANR on each screen.
Premium toggled via scratchpad/set_premium.js (admin, user-authorized each time).
Theme toggled via Settings → Appearance (Light/Dark) (MainActivity ThemeMode).
REPORT-ONLY during passes — never fix mid-pass.
THINK AS A CONSUMER — approach everything from different angles. Beyond "does it work", constantly ask "is this what a real person would expect / want here? is this delightful, confusing, or annoying?" Come at each flow from multiple angles (first-time user, returning user, the partner who didn't start it, someone tapping fast, someone reading carefully, the skeptic, the impatient). Vary inputs, depths, orders, and entry points (don't repeat one happy path). A thing can be bug-free yet still worse than it should be — notice that too.
CAPTURE IMPROVEMENT / FEATURE IDEAS → Future.md (section ## QA). Bugs (broken/incorrect behavior) go to ClaudeReport.md as always. But anything that works yet could be better — confusing copy, a missing affordance, a rough-but-not-broken flow, a "it'd be great if…" feature idea — append it to Future.md under ## QA with a short title, what prompted it, and the suggested improvement. This is an idea backlog, not the bug log; logging here is never a substitute for filing an actual defect in ClaudeReport.md.
Environment (senior-QA rec): prefer the Firebase Local Emulator Suite or a dedicated staging project over production — isolates test data, makes seeding / entitlement toggles / D3 negative tests free (no gated prod writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but every seed/toggle/deploy hits the gate.)
Device/OS matrix: don't certify on one emulator only — cover minSdk + targetSdk, a small and a large screen, and at least one physical device (App Check / Play Integrity behave differently on emulators).
Automate the regression smoke: capture the smoke checklist as a runnable script (adb/Maestro) so every round re-checks it cheaply instead of by hand.
Test-data hygiene: keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between rounds so they don't masquerade as bugs.
Evidence standard: every filed bug must be reproducible from text alone: build/commit, device, account, theme, app/process state, screen/route, exact tap/input sequence, expected result, actual result, and whether logcat showed a crash/ANR/permission denial. Screenshots/videos are helpful but never the only evidence because session artifacts may not survive compaction.
Flake policy: if something fails once and then passes, do not dismiss it. Repeat from a clean state, vary timing (rapid tap / slow network / background-resume), inspect logs, and file it as intermittent if it cannot be made fully deterministic. Intermittent routing, notification, encryption, duplicate-write, or crash behavior is still a bug.
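
The adb driving loop above (resolve tap coordinates from uiautomator dump bounds, then scan logcat for crashes) could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: it assumes the emulators appear under the standard serials `emulator-5554` / `emulator-5556`, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import List, Tuple

# uiautomator dump writes node bounds as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")
# Markers named in the methodology; "ANR in" is how ActivityManager logs an ANR.
CRASH_MARKERS = ("FATAL EXCEPTION", "ANR in")


def center_of(bounds: str) -> Tuple[int, int]:
    """Return the centre point of a uiautomator bounds string."""
    match = BOUNDS_RE.fullmatch(bounds.strip())
    if match is None:
        raise ValueError(f"unrecognised bounds: {bounds!r}")
    left, top, right, bottom = map(int, match.groups())
    return (left + right) // 2, (top + bottom) // 2


def tap_command(serial: str, bounds: str) -> List[str]:
    """Build the adb command that taps the centre of the given bounds."""
    x, y = center_of(bounds)
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


def crash_lines(logcat_text: str) -> List[str]:
    """Return every logcat line that contains a crash or ANR marker."""
    return [
        line
        for line in logcat_text.splitlines()
        if any(marker in line for marker in CRASH_MARKERS)
    ]


def tap(serial: str, bounds: str) -> None:
    """Run the tap against a connected device (requires adb on PATH)."""
    subprocess.run(tap_command(serial, bounds), check=True)
```

Keeping the command builder separate from the function that runs it means the same helpers can back the automated regression smoke without a device attached.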

Living discovery ritual (before each round, and whenever reality disagrees with the docs)

The app is allowed to grow; the QA plan must keep up. Before a pass or chunk, quickly inventory the current code/app surface and reconcile it with ClaudeQACoverage.md:

Routes/screens: inspect core/navigation/AppRoute.kt, navigation graph call sites, Settings sub-pages, dialogs, bottom tabs, deep links, and any new composables reachable by buttons/cards.
Notifications: inspect notification type enums/classes, Cloud Function triggers, Android intent/deep-link handling, notification channels/actions, FCM token registration, and Android runtime notification permission paths.
Features/gates: grep for premium checks, permission requests, media pickers, billing/paywall entry points, destructive actions, account/couple lifecycle actions, and admin/server-only writes.
Assets/content: inventory new drawables, drawable-night* variants, pack art, empty states, strings, feature flags, remote config, and any debug-only screens that should not ship.
Backend/rules: inspect Firestore rules, indexes/queries, Functions triggers/callables, Storage paths, scheduled jobs, and migrations for new data shapes or access paths.
Docs update rule: if the inventory finds a page, feature, notification, asset, state, backend path, or edge case missing from the playbook/coverage, update ClaudeQAPlan.md and ClaudeQACoverage.md before marking the chunk done. If it is product polish, also add it to Future.md; if it needs new artwork, add it to ClaudeBrandingReview.md.

Multi-angle attack mandate (go DEEPER than "does the happy path work")

A capability can pass via the UI yet fail when hit directly. Probe each meaningful capability (read/write a private field, gate a premium feature, deliver/route a notification, start/finish a game, pair/unpair, create an account) from as many independent angles as apply — not just the in-app happy path:

Real UI (play-as-user) — the baseline angle.
Crafted intent / deep-link — fire the exact intent a notification/link carries (bypasses UI nav) to test routing in isolation; also send malformed/missing extras → must route gracefully or no-op, never crash.
Raw API against the DEPLOYED backend — hit Firestore/Storage/Functions REST directly with a real token, as a member AND a non-member, to exercise rules + App Check from OUTSIDE the app. A non-member (or no-App-Check) request must be DENIED — App Check 403 or rules PERMISSION_DENIED. The member request characterizes which layer enforces. Any unauthorized 200 returning couple data = P0.
Admin inspection (ground truth) — read the RAW stored docs/objects (admin bypasses rules) to assert what is actually persisted: ciphertext only, no plaintext, no raw keys/invite-seeds, no private content in pushes.
Concurrency / race — two partners (or two rapid taps) hit the same thing at once.
Killed / cold state — force-stop, then deliver + tap a notification; cold-start straight onto a deep link.
Malformed / abusive input — oversized, empty, rapid-fire, injection-ish, forged FCM payloads, replayed/expired tokens & invite codes.
Offline / flaky — drop network mid-action → graceful failure, recover on reconnect.
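
For the crafted intent / deep-link angle, one way to fire a link without touching the UI is `adb shell am start`. The sketch below only builds the command. It assumes notifications route through an `android.intent.action.VIEW` deep link; the `closer://` scheme, the package name, and the `sessionId` extra are hypothetical placeholders, not values taken from the app.

```python
import shlex
from typing import Dict, List, Optional


def deep_link_command(
    serial: str,
    uri: str,
    package: str,
    extras: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an `am start` command that opens a deep link on one device.

    adb re-parses its arguments in the device shell, so the URI and extras
    are quoted to stop characters such as '&' from splitting the command.
    """
    cmd = [
        "adb", "-s", serial, "shell", "am", "start", "-W",
        "-a", "android.intent.action.VIEW",
        "-d", shlex.quote(uri),
    ]
    for key, value in (extras or {}).items():
        # --es attaches a string extra; an empty value exercises the
        # missing/malformed-extras case.
        cmd += ["--es", shlex.quote(key), shlex.quote(value)]
    cmd.append(package)
    return cmd
```

Running the same builder with an empty or garbage extra, then checking logcat, covers the "route gracefully or no-op, never crash" requirement for malformed input.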

Record which angles were tried per area in ClaudeQACoverage.md. For security- or data-sensitive capabilities, "UI happy path only" is not a pass. D3/Pass G negative access MUST be executed live via the raw-API angle each round — never deferred to "only 2 emulators." (Mint a token for a non-member UID via admin → exchange for an ID token via the Identity Toolkit REST signInWithCustomToken → use it as Bearer against the Firestore REST API.)
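
The raw-API negative test described above might look like the sketch below. The two REST endpoints are the public Identity Toolkit and Firestore ones; everything else is an assumption for illustration: the project ID, the API key, the document path, and the idea that the custom token was already minted by admin tooling. The request deliberately carries no App Check token.

```python
import json
import urllib.error
import urllib.request

IDENTITY_URL = (
    "https://identitytoolkit.googleapis.com/v1/"
    "accounts:signInWithCustomToken?key={api_key}"
)
FIRESTORE_URL = (
    "https://firestore.googleapis.com/v1/projects/{project}"
    "/databases/(default)/documents/{path}"
)


def firestore_doc_url(project: str, doc_path: str) -> str:
    """Return the Firestore REST URL for a single document."""
    return FIRESTORE_URL.format(project=project, path=doc_path)


def exchange_custom_token(custom_token: str, api_key: str) -> str:
    """Exchange an admin-minted custom token for an ID token."""
    body = json.dumps({"token": custom_token, "returnSecureToken": True}).encode()
    request = urllib.request.Request(
        IDENTITY_URL.format(api_key=api_key),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["idToken"]


def get_status(project: str, doc_path: str, id_token: str) -> int:
    """GET one document as the token's user and return the HTTP status."""
    request = urllib.request.Request(
        firestore_doc_url(project, doc_path),
        headers={"Authorization": f"Bearer {id_token}"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def classify(status: int, is_member: bool) -> str:
    """Map an HTTP status onto the playbook's verdicts."""
    if is_member:
        return "member allowed" if status == 200 else "member denied"
    if status == 200:
        return "P0: unauthorized read returned data"
    if status in (401, 403):
        return "pass: denied"
    return "inconclusive"
```

Comparing the member and non-member results is what characterizes the enforcing layer: if the member request succeeds without an App Check token, the non-member denial came from the rules rather than from App Check.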

Continuity & resumability (this effort WILL span many context windows — don't lose state)

State lives in files, not memory:

ClaudeReport.md = the issue log (committed). Each issue row is self-contained in text (repro + expected + actual); screenshots are session-only and won't survive a compaction, so never rely on a screenshot path alone.
ClaudeQACoverage.md = the coverage matrix: every screen×mode, feature×premium-state, game×lifecycle, notification×{foreground,background,killed}, each todo | pass | fail→id | not implemented→Future.md | blocked→id. The resume anchor.
Future.md (## QA) = the non-bug improvement/idea backlog; ClaudeBrandingReview.md = the branding/artwork review + image-prompt backlog. Both committed alongside the report/coverage.
Persistent memory (memory/): QA methodology + exact commands; emulator↔account↔coupleId mapping; scratchpad/set_premium.js + admin tooling; the couple-shared-premium-everywhere goal + the per-user-gate gap.
Run-state header pinned at the TOP of ClaudeReport.md, always current: Round N | Pass X | Chunk Y | NEXT ACTION: … — first thing to read, last thing to update before stopping.
Stable issue IDs: A-001 / B-002 / C-… / D-… / E-… (pass-letter + number); coverage references the ID for every fail. Never renumber or reuse.
Source of truth: the two MD files are authoritative; the TodoWrite list is scratch for the current chunk only. Update the MD files + run-state header before ending a session.
Living playbook rule: when QA discovers any new app surface or recurring lesson — a new page/route, feature, setting, game state, notification type/action/channel, entry point, background/killed-state behavior, asset/art placement, repeatable bug class, missed edge case, fragile route, confusing state, image/layout failure mode, security angle, or anything else that should be checked every future round — update this ClaudeQAPlan.md in the relevant pass before ending the chunk. Also add the matching row/cell to ClaudeQACoverage.md if it needs recurring verification. Do this even after the immediate bug is filed/fixed so the lesson or newly discovered surface is not lost to memory or git history.
Commit cadence: commit ClaudeReport.md + ClaudeQACoverage.md after each pass and each chunk.
Chunking: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
Session-start ritual: (1) read run-state header + both MD files; (2) adb devices shows both emulators online; (3) installed build == current HEAD (rebuild+reinstall if unsure — never QA a stale APK); (4) continue at the first todo / unverified-fix; (5) if a prior chunk left an active/stuck game session, recover it via in-app "End their game" (log if needed), then redo that chunk.
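
Because the run-state header and coverage cells are the resume anchor, it can help to validate them mechanically at session start. The sketch below parses the header format and the cell states listed above. It assumes pass letters run A to J and issue numbers have at least three digits (as in `A-001`); the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

ARROW = "\u2192"  # the arrow used in coverage cells such as fail->id

RUN_STATE_RE = re.compile(
    r"Round\s+(?P<round>\d+)\s*\|\s*Pass\s+(?P<pass_letter>[A-J])\s*\|"
    r"\s*Chunk\s+(?P<chunk>[^|]+?)\s*\|\s*NEXT ACTION:\s*(?P<next>.+)"
)
ISSUE_ID_RE = re.compile(r"[A-J]-\d{3,}")


def parse_run_state(line: str) -> Dict[str, object]:
    """Parse 'Round N | Pass X | Chunk Y | NEXT ACTION: ...' into a dict."""
    match = RUN_STATE_RE.search(line)
    if match is None:
        raise ValueError(f"not a run-state header: {line!r}")
    return {
        "round": int(match["round"]),
        "pass": match["pass_letter"],
        "chunk": match["chunk"],
        "next_action": match["next"].strip(),
    }


def parse_cell(cell: str) -> Tuple[str, Optional[str]]:
    """Validate one coverage cell and return (state, reference)."""
    text = cell.strip()
    if text in ("todo", "pass"):
        return text, None
    state, separator, reference = text.partition(ARROW)
    state, reference = state.strip(), reference.strip()
    if separator and state in ("fail", "blocked") and ISSUE_ID_RE.fullmatch(reference):
        return state, reference
    if separator and state == "not implemented" and reference == "Future.md":
        return state, reference
    raise ValueError(f"invalid coverage cell: {cell!r}")
```

A cell that says `fail` with no issue ID is rejected, which enforces the rule that coverage references the ID for every fail.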

Batch sizing — sub-batch each pass to ONE context window (Round-1 calibration)

A pass is a category, not a unit of work. Execute each pass as sub-batches (chunks), where a chunk = the largest coherent unit that reliably finishes AND commits within one context window, with margin. End every chunk with a commit + run-state update. If a chunk starts overflowing, split it; if chunks feel trivial, merge them. Why: in Round 1, A & D fit as single batches, but B/C/E were too large → got cut off → deferred. Sub-batching prevents half-done/lost work and gives cleaner per-chunk verification + revertable commits.

Default small: if a chunk requires two-device live driving, screenshots/montage review, logcat checks, or admin/API verification, keep it to one small route family, one game phase, or one notification type. A chunk is too large if it cannot produce a precise coverage update, issue log, and commit before context gets tight. Split before starting rather than leaving a half-tested matrix behind. Prefer Claude-friendly micro-batches: smaller chunks let the agent fully inspect screenshots, tap every CTA, vary app states, update files accurately, and avoid shallow "covered" rows.

Pass	Chunk granularity	~chunks
A Premium	one gated-feature family per chunk if live toggles are needed; otherwise free-state sweep → couple-shared verify	2–4
B Games	one game per chunk max; split complex games into lifecycle/playthrough chunk + join/resume/results/notification-entry chunk	7–14
C Visual	one small route family per chunk (both themes, ~2–3 screens/states, screenshots reviewed + nav/back + image-fit + all CTAs for that family) — never "all screens" or a broad tab at once	16–25
D Security	one security assertion group per chunk: D1 at-rest · D2 rules static · D3 live negative raw API · D4 keys/recovery · D5/D6 leaks · D7 migration	~6
E Notifications	one notification type per chunk with the full contract below; split a type into direction/state subchunks if needed, but do not mark the type pass until both clients + source screens + fg/bg/killed + stale/malformed + payload/back-stack are covered	16–30
F Resilience	one dimension per chunk (concurrency · lifecycle/process-death · network · time · account-lifecycle)	~5
G Account creation	one creation/abuse dimension per chunk (happy/validation · duplicate/conflict · fake-account abuse · lifecycle)	~4
H Branding	one small route family per chunk (~2–3 screens/states) consumer brand walk + ready-to-paste art prompts + existing-image integration verdict	8–14
I Performance	one route-group per chunk — gfxinfo/jank + read-count instrumentation (build the route smoke checklist)	~3
J Accessibility	one a11y setting per chunk (font scale · TalkBack · contrast · targets · keyboard · reduce-motion)	~5

Context-cost tips: prefer code/admin-read audits (cheap) before live UI sweeps; montage screenshots (dark|light pairs) to review many at once; keep one chunk = one TodoWrite focus.

Guardrails & efficiency

Never pm clear / wipe app data — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
Never run seed/build_db.py. Admin seeds/writes, entitlement toggles, and any deploys are user-authorized per occurrence.
By-design vs bug: if a finding may be intended behavior, log it and keep going (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
Pass C parallelism: set 5554 = Dark, 5556 = Light to capture both themes at once.
Never log decrypted message/answer content.

Severity scale (label every issue)

P0 Critical — crash/ANR, data loss, encryption/security leak, feature fully broken, premium bypass.
P1 Major — feature partly broken, premium not unlocking for partner, wrong/missing notification, dead-end nav.
P2 Minor — readability/contrast, clipping/overflow/truncation, theme not adapting, inconsistent styling, wrong/double-back navigation.
P3 Polish — spacing/alignment/copy nits.

QA passes (Round 1 = baseline)

Pass A — Couple-shared premium (target: either partner premium → both unlock)

Test each gated feature in 3 states: neither premium → locked + paywall; partner-only premium → BOTH unlock; self premium → unlock. Toggle Sam premium, confirm QA (free) unlocks; toggle off.
Features: Play-hub games (Desire Sync + any premium-badged), Connection Challenges, Memory Lane; Question Packs; Spin the Wheel / Category Picker / Wheel History (+ any premium wheel categories); Date Match / Plan Date / Date Builder; chat media + reactions + any premium chat tools (regression: already couple-shared); Subscription/Settings reflects entitlement.
Gated files (for the fix): ui/play/PlayHubViewModel, ui/desiresync/DesireSyncScreen, ui/wheel/{CategoryPicker,SpinWheel,WheelHistory}*, ui/questions/QuestionPackLibrary*, ui/dates/{DateMatch,DateMatches}Screen, ui/memorylane/MemoryLaneScreen, ui/challenges/ConnectionChallengesScreen.
Also: any VM/screen calling EntitlementChecker.isPremium() directly (grep for it) is a candidate gate.
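
The couple-shared rule reduces to a logical OR over the two partners' entitlements, which gives a simple oracle to compare against what each device actually shows. This is an illustrative sketch of the expected outcomes only, not the logic inside CouplePremiumChecker; the function names are hypothetical. The three states listed above plus the implied "both premium" case make four rows.

```python
from itertools import product
from typing import Dict, List


def expected_unlocked(self_premium: bool, partner_premium: bool) -> bool:
    """Couple-shared premium: either partner's entitlement unlocks both."""
    return self_premium or partner_premium


def premium_state_matrix() -> List[Dict[str, object]]:
    """Enumerate every entitlement combination with its expected UI state."""
    rows = []
    for self_premium, partner_premium in product((False, True), repeat=2):
        unlocked = expected_unlocked(self_premium, partner_premium)
        rows.append(
            {
                "self_premium": self_premium,
                "partner_premium": partner_premium,
                "expected": "unlocked" if unlocked else "locked + paywall",
            }
        )
    return rows
```

The row that matters most for this pass is partner-only premium: a per-user gate shows "locked + paywall" there, while the oracle expects "unlocked".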

Pass B — Games lifecycle (MANDATORY: play each game ONE complete time through)

Games: This or That, How Well Do You Know Me, Desire Sync, Connection Challenges, Memory Lane, Spin the Wheel, + Date Match.

PLAY AS THE USER (mandatory mindset for this pass): drive every game the way a real user would — reach it through the actual in-app navigation a person would tap (Play hub → the game's card → its buttons), not via deep-links, admin pokes, forced state, or any shortcut a user doesn't have. Expect what the user expects: if a tap/button/flow doesn't do the obvious thing, or a screen doesn't behave the way a normal user would assume, that itself is a finding — log it.
When something doesn't work: REPORT FIRST, then a minimal workaround (in that order). Do not silently engineer around breakage by taking extra steps the user wouldn't take. The moment the natural user path fails: (1) log the issue in ClaudeReport.md with severity + the exact user action that failed and what was expected; (2) only then apply the smallest workaround needed to keep the pass moving. The workaround never replaces the report — a flow that needs a workaround to proceed is, by definition, broken and must be filed to fix. If a workaround is impossible, mark the game fail→<id> (blocked) and continue with the next.
A launch/crash check is NOT sufficient. Each game MUST be played one full way through, end-to-end, on BOTH devices — start → answer/interact through every step/round/question on each device → reach the finish/reveal/results screen → confirm the result renders correctly for both partners. Verify each intermediate screen and interaction works (selections register, progress advances, both-answered gating, reveal/scoring/summary correct). Premium games (Desire Sync, Memory Lane) need a premium toggle to play.
The session lifecycle is exercised by the real playthrough: status active→completed; reveal/results correct on both.
GAME JOIN PATHS (mandatory — the second partner must JOIN, not just co-play): the starter begins from real in-app nav; the joiner then enters from every user-facing entry point — notification tap, Play-hub active state, Home active-game card, Today prompt, waiting-room/resume screen, in-app foreground banner, game history/replay, and (after the natural paths) deep-link/crafted intent + cold-start from a push. A game isn't complete unless both partners can start, join, resume, finish, reopen results, and recover from a stale/ended session — with no duplicate sessions, wrong routes, stuck waiting screens, broken back nav, or premium-gate mistakes.
VARY THE STYLE OF PLAY (don't just repeat the happy path): across runs, deliberately exercise different ways a real couple would play each game, because different inputs hit different code paths:
- Different DEPTHS and QUESTION COUNTS — cover the matrix, don't settle for one combo: play each game across every depth/mood (Light, Everyday, Deep, All-topics/shuffle) AND every round length / number of questions (5 / 10 / 15), in different pairings across runs (e.g. Light×5, Deep×15, Everyday×10, All×5) — short and long sessions, shallow and deep content. Different depths surface different question sets, tones, and edge content (e.g. Deep/Desire-Sync sensitive prompts); different counts stress pacing, progress, and the both-answered gate. Also exercise each distinct answer type (A/B, Yes/No, True/False, 1–5 scale, multi-select, free-text).
- Different answer patterns that change the result — all-match vs all-mismatch vs partial; both-yes vs both-no vs split (so reveals show "shared", "all private", "0 matches", "perfect/zero score" — verify each renders right).
- Different turn orders / who-starts — partner A starts vs partner B starts; the guesser opens before vs after the subject finishes; both open simultaneously (race); one device much slower than the other.
- Different exit/resume styles — finish normally; quit mid-game; background mid-game then resume; cold-kill mid-game then reopen; "End their game"; re-open a completed session for the replay/results; play two games back-to-back, and a different game type immediately after.
- Edge inputs — submit with nothing selected (should be blocked), rapid double-taps on answer/confirm/next, spamming the start button, tapping during the reveal animation, switching tabs mid-game, receiving/tapping a notification mid-game. None should crash, duplicate, or desync.
Edges: re-open a completed session, leave mid-game (resume), no stuck session, no crash, logcat clean.
Game start/finish pushes (onGameSessionUpdate) exercised here; full delivery/deep-link audit in Pass E.
Media permissions (CAMERA, RECORD_AUDIO): granted works, denied degrades gracefully.
Done = every game has one verified complete playthrough (a launch-only "opens, no crash" row is partial, not pass). Coverage row format: game × starter × join-entry × premium-state × depth/count × lifecycle-edge × result; only pass when start/join/play/finish/reopen/recover are all verified.
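
To cover the depth × question-count matrix without repeating one combination, the pairings can be generated rather than picked by hand. The sketch below is one possible scheduling approach, assuming every game exposes the four depths and three counts named above (some games may not); the function names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Sequence, Tuple

DEPTHS = ("Light", "Everyday", "Deep", "All-topics")
COUNTS = (5, 10, 15)


def diagonal_pairings() -> List[Tuple[str, int]]:
    """Return all depth/count pairs, ordered so neighbours differ in both.

    Stepping both indices together visits every pair exactly once only
    because len(DEPTHS) and len(COUNTS) share no common factor (4 and 3).
    """
    total = len(DEPTHS) * len(COUNTS)
    return [(DEPTHS[k % len(DEPTHS)], COUNTS[k % len(COUNTS)]) for k in range(total)]


def plan(games: Sequence[str], runs_per_game: int) -> Dict[str, List[Tuple[str, int]]]:
    """Hand out pairings to games in order, wrapping around when exhausted."""
    pairings = cycle(diagonal_pairings())
    return {game: [next(pairings) for _ in range(runs_per_game)] for game in games}
```

The first four pairings produced are Light×5, Everyday×10, Deep×15, and All-topics×5, so even a short round mixes shallow and deep content with short and long sessions.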

Pass C — Visual pass, light + dark, ALL screens

Every route in core/navigation/AppRoute.kt (~50), in both modes: text contrast/readability (no invisible/low-contrast text), no clipping/overflow/ellipsis breakage, icons visible, backgrounds adapt, controls legible. Groups: auth/onboarding/pairing (fresh acct); Home (solo + paired); Play + every game; Today + reveal/history; Messages (inbox + conversation); Packs; Dates (Match/Builder/Matches/Bucket List); Wheel (picker/session/complete/history); Settings + all sub-pages (Account, Notifications, Appearance, Privacy, Subscription, Relationship, Security, Delete Account); Paywall; Your Progress/Activity; Recovery.

Images must belong to the screen: during the UI sweep, visually inspect every illustration, glyph, banner, empty-state image, pack art, celebration asset, and dark/light variant in context. It should feel intentionally integrated with the page hierarchy, copy, spacing, and action area — not like a forgotten placeholder dropped into an empty slot. Check crop, scale, padding, alignment, corner radius, background/tile treatment, theme variant, loading/fallback state, and whether the image competes with or clarifies the primary task. If it is broken, clipped, low-contrast, off-brand, stale, or placeholder-looking, file a bug in ClaudeReport.md; if the screen works but would benefit from new/better art, log the prompt need in ClaudeBrandingReview.md.
Probe: ui/theme/Theme.kt hardcoded brand colors + chat's custom closerBackgroundBrush — verify dark mode truly adapts; grep screens for hardcoded Color(0x...).
States, not just happy path: empty / loading / error / not-paired / locked-premium / signed-out / stale-or-deleted-target / populated-with-many where they exist; many need data setup (seeding is user-gated) — note unreachable states in coverage rather than skipping silently.
Text/data stress: test long names, long relationship labels, long question/answer text, emoji, multiline content, empty optional fields, many list items, and both partners having similar names. Verify no clipping, overlap, confusing attribution, broken sorting, or hidden actions.
Readability at scale: default font size + spot-check largest system font scale on text-heavy screens. (The full accessibility sweep — large-font on every primary flow, TalkBack labels, touch targets, keyboard, reduce-motion — is Pass J; per-route performance/jank is Pass I.)
Navigation from every entry point: reach each screen from all the places that link to it and confirm it opens correctly each time — e.g. a conversation from the inbox AND from "Discuss" AND from a notification; a game from the Play hub AND from a notification; Paywall from each gated feature; Settings sub-pages; reveal from Today AND from history AND from partner_answered. A screen that works from one entry but breaks/duplicates from another = bug.
Every link, CTA, and mission must prove its destination: actively hunt for dead buttons, wrong targets, generic Home fallbacks, no-op taps, stale routes, and confusing affordances. Example class: a Reveal card saying "Tiny Mission: Send one flirty text" must open the relevant Messages/conversation flow, not do nothing. For every button/card/chip/row, record the expected destination before tapping, then verify the actual destination, state, payload, and back stack. Broken/no-op/wrong-destination CTA = bug (usually P2; P1 if it blocks a core flow).
All routes into a game / join-game state (verify each opens the correct game + session + partner-state + mode + premium/couple-entitlement + back stack): Play-hub cards (incl. premium-gated), active-session banners, Home/Today game prompts, game history, replay/results, waiting screens, notification-opened screens, in-app banners, "join/resume/continue/view results/end (their) game", deep-link/crafted intent, and bottom-tab return into an active game. Wrong/duplicate destination, double-back, stale-session join, dead-end, or a route that bypasses the premium/couple check = bug.
TAKE EVERY AVENUE (exhaustive nav fuzzing — actively hunt for nav bugs, don't just walk the happy path): treat navigation as something to break. On every screen, tap every interactive element — each button, card, row, icon, chip, link, tab, header back-arrow, system back, and any "see all / history / edit / manage" affordance — and follow where it goes. Then try the combinations and sequences a curious user hits:
- Every order: switch bottom tabs in many orders, mid-flow (open a game, jump to Messages, come back); enter a deep screen then tab away then back; open A→B→C then back-back-back.
- Rapid / repeated input: double- and triple-tap navigation targets (especially "open game", "Play now", "Create/Start session", notification taps) to surface double-push/duplicate-screen/stale-route bugs (cf. B-004).
- Interrupt mid-navigation: background/rotate/lock during a transition; tap a notification while already on that screen, on a different screen, and while logged-out/unpaired; cold-start straight onto a deep link.
- Dead-ends & traps: from every screen confirm there's always a way out (back/close/home) — no screen that strands the user, needs two backs, exits the app unexpectedly, loops, or lands blank. Re-check the asymmetric-game waiting screens, replay/results screens, and paywall specifically.
- Log every wrong/duplicate/dead destination with the exact tap sequence to reproduce. Wrong/double-back or dead-end = P2 (P1 if it traps the user or loses their progress).
Back-stack / "double back": from every entry point, system back AND the in-app back arrow return to the correct previous screen: no dead-ends, no exiting the app unexpectedly, and no screen that requires pressing back twice (duplicate/stacked destinations on the back stack = bug). Bottom-tab reselection and deep-link/notification entries must land with a sane back stack (back → Home, not off the app or a blank screen). Wrong/double back or a dead-end = P2 (P1 if it traps the user).
UI consistency / polish defects: compare each screen against sibling patterns in the same area and across the app. Headers, labels, status chips, partner names, connected-state copy, spacing, card treatments, and button hierarchy should feel intentional and consistent. Awkward or out-of-place UI such as a Settings relationship row where "Connected with ..." looks visually odd, cramped, misaligned, or unlike the rest of Settings is a finding: file as a bug if it looks broken/inconsistent; log to Future.md only if it is purely a product/content improvement.
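
The "grep screens for hardcoded Color(0x...)" probe from this pass can be run as a small script instead of by hand. This is a minimal sketch that assumes colors are written as Compose-style hex literals such as `Color(0xFF112233)`; the function names and the root path are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Matches literals like Color(0xFF112233) or Color(0x112233).
HARDCODED_COLOR_RE = re.compile(r"\bColor\(\s*0x[0-9A-Fa-f]{6,8}\s*\)")


def find_hardcoded_colors(source: str) -> List[Tuple[int, str]]:
    """Return (line number, literal) for each hardcoded color in the text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED_COLOR_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def scan_tree(root: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every .kt file under root and keep only files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.kt")):
        hits = find_hardcoded_colors(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

A hit is a candidate to inspect in both themes, not automatically a bug: a literal defined once as a theme token is different from one painted directly onto a screen.
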
D1 At-rest coverage: admin-read RAW docs/objects, assert ciphertext for every private type — chat text + lastMessagePreview (enc:v1:), chat media bytes (Tink 01 69 59 51 f0…), answers (sealed:v1:/enc:v1:), date plans + date_swipes, Memory Lane capsules, Bucket List. Also: wrappedCoupleKey + recovery material never plaintext; invite code (KDF seed) never stored raw; no push payload carries private content.
D2 Rules audit (static): member-only reads, author/server-only writes, ciphertext enforced on every private field, immutability, no premium self-grant, entitlements write:false; re-audit conversations/typing/reactions + entitlement partner-read; no catch-all match /{document=**}; list/query not enumerable; get()-rules don't over-expose; no legacy plaintext/downgrade path (coupleEncryptionEnabled holds; no disabled-encryption branch).
D3 Negative access tests (EXECUTE LIVE via raw API; do not defer): a non-member account is denied reading messages/answers/dates/entitlements/sessions/capsules, writing plaintext to encrypted fields, self-granting premium, and any cross-couple access. Run it from the raw-API angle: mint a non-member ID token (admin custom token → Identity Toolkit signInWithCustomToken REST) and issue Firestore REST GET/PATCH against the couple's docs; expect App Check 403 or rules PERMISSION_DENIED on every attempt. Also issue the same reads with a member token to characterize the enforcement layer (App Check vs rules). Any unauthorized 200 with couple data = P0.
D4 Key exchange / management / recovery (E2EE crux): couple key client-generated, only leaves device wrapped (KDF from invite seed; server holds only wrappedCoupleKey+kdfSalt/kdfParams+encryptedRecoveryPhrase); KDF strength; Tink AEAD = AES-GCM/256 with AAD=coupleId, no weak/custom crypto/nonce reuse; keybox/sealed/commitment integrity; recovery-wrap server-blind; unpair revokes decrypt; invites CSPRNG + single-use + expiry.
D5 App Check / Functions / secrets: App Check enforced; callables validate auth+membership; webhook authenticity; admin-only writes rejected from clients; service-account JSONs never committed; no plaintext/secrets in logcat; temp files deleted.
D6 Leak vectors: no private content in analytics/crash; allowBackup=false + backup rules exclude sensitive data; deep links re-check membership; clipboard user-initiated; consider FLAG_SECURE; repo scan for committed secrets.
D7 Encryption migration: test the encryptionVersion paths (0 plaintext → 1 migrating → 2 strict) on a legacy couple — migration completes without exposing plaintext or losing/garbling old content, and a half-migrated couple is safe (no mixed read failures, no downgrade). This is the riskiest data path for existing users.
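
A minimal sketch of the D1 at-rest check, assuming the raw docs have already been fetched with admin credentials and are available as plain Python dicts. The `enc:v1:` / `sealed:v1:` prefixes and the leading Tink bytes come from D1; the helper names and the sample field name `text` are hypothetical. A prefix match only shows the value is ciphertext-shaped, so it complements (not replaces) a manual look at the raw doc.

```python
# Ciphertext shapes listed in D1. Only the bytes D1 spells out are used;
# the rest of the Tink header is deliberately not guessed.
TEXT_PREFIXES = ("enc:v1:", "sealed:v1:")
TINK_HEADER = bytes.fromhex("01695951f0")


def looks_encrypted(value):
    """Return True if a raw stored value matches a known ciphertext shape."""
    if isinstance(value, str):
        return value.startswith(TEXT_PREFIXES)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).startswith(TINK_HEADER)
    return False


def plaintext_fields(doc, private_fields):
    """List private fields present in a raw doc that are not ciphertext-shaped."""
    return [f for f in private_fields if f in doc and not looks_encrypted(doc[f])]
```

Any non-empty result from `plaintext_fields` is worth a manual look before filing; this sketch accepts either text prefix on any field, so tighten it per type if a field must use one specific scheme.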

Pass G — Account creation, validation & fake-account abuse (MANDATORY — both the happy path AND the attacks)

Cover every account-creation avenue a real user takes and every fake/abusive creation attempt an attacker would try. Use throwaway test accounts (sign-out → fresh sign-up; never pm clear). Report-first like every pass.

Real creation flows (happy path + validation): sign-up (email/password and any social/anonymous path), profile creation, and pairing — both create-invite and accept-invite sides. Verify field validation (invalid/empty email, weak/short password, mismatched confirm, name length/emoji/unicode), the error copy is friendly (no raw SDK/Firebase error leaking — cf. A-OBS), loading/disabled states, and that a brand-new unpaired account lands on the correct "create or accept invite" home (not a broken/blank or paired view).
Duplicate / conflicting creation: sign up with an already-registered email (clear "already in use", no crash, offer sign-in); create a second account while one is signed in; re-run onboarding after completing it; accept an invite while already paired (must be rejected cleanly); two devices accepting the same invite (single-use — the second must fail gracefully).
Fake / malicious creation attempts (security — expect DENY, never crash or leak): create an account that is NOT a member of the test couple and attempt every cross-couple action (read messages/answers/dates/entitlements, write to the couple, self-grant premium/hasPremium, join/hijack pairing with a guessed/expired/reused invite code) — all must be denied by rules (this is the live execution of D3). Probe invite-code abuse: replay a used code, use an expired code, brute-force/guess attempts (CSPRNG entropy + single-use + expiry must hold). Probe App Check: a request without a valid token is rejected. Confirm a malformed/forged sign-up can't bypass profile or membership requirements. Any successful unauthorized create/read/write = P0.
Account lifecycle around creation: sign-out → sign-in (state restores, no stale couple); delete account then re-create with the same email (clean slate, partner notified/unpaired); an unpaired/just-created account tapping a stale notification or deep link is handled gracefully (no crash, sane landing).
Done = every creation avenue exercised (happy + duplicate + malicious) with each attack denied and each happy path validated end-to-end; findings filed with exact repro.
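
A small sketch for scoring each non-member REST attempt from D3 / Pass G, assuming the HTTP status and response body of each call have been captured. It relies on the standard Google API error envelope (`error.status`); the function name and the three labels are hypothetical. A `P0-candidate` still needs a human to confirm the body really contains couple data.

```python
import json


def classify_attempt(status, body):
    """Classify one unauthorized request as 'denied', 'P0-candidate', or 'review'."""
    try:
        payload = json.loads(body) if body else None
    except json.JSONDecodeError:
        payload = None

    error = payload.get("error") if isinstance(payload, dict) else None
    error_status = error.get("status") if isinstance(error, dict) else None

    # Expected outcomes: App Check 403 or a rules PERMISSION_DENIED.
    if status == 403 or error_status == "PERMISSION_DENIED":
        return "denied"
    # A 2xx that returns any content on an unauthorized call must be inspected.
    if 200 <= status < 300 and payload:
        return "P0-candidate"
    return "review"
```

Anything that lands in `review` (timeouts, 5xx, empty 200s) is not a pass; re-run it until it resolves to a clear deny.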

Run the complete suite across both clients (QA→Sam AND Sam→QA). Each type verified end-to-end: trigger fires → delivered to the right partner (never self/non-member/ex-partner) → correct channel + copy with no private content → tap opens exactly the right item (loaded, not generic Home/dead-end) → sane back stack → privacy/authz re-checked on open. No duplicates; rate limiter (20/day, 100/week) doesn't drop legit ones.

Notification chunk contract (small chunks, complete coverage): each chunk owns one notification type (or one explicit subchunk of that type, e.g. chat_message QA→Sam foreground/source-screen sweep, then chat_message Sam→QA background+killed+stale). Before starting, write the chunk's matrix in ClaudeQACoverage.md; after finishing, mark each cell pass | fail→id | blocked→id | not implemented→Future.md. A notification type is not complete until all applicable cells below are covered:
- Directions: QA→Sam and Sam→QA; sender must not receive their own push unless intentionally designed.
- Process states: foreground, background/warm, killed/cold-start, force-stopped if deliverable, screen locked, and resumed after rotation/process recreation when relevant.
- Current screens: Home, Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation, Settings/sub-settings, Paywall, unrelated deep screen, logged-out, unpaired, and stale prior-partner context.
- Entry surfaces: foreground in-app banner/head, Android system tray tap, any push action button, crafted deep-link/intent matching the payload, repeated/double tap, and tap after the target has changed.
- Targets: fresh target, already-open target, completed target, stale/expired/deleted target, unauthorized target, wrong couple/session/item ID, malformed/missing extras, and no-network-on-open.
- Assertions: correct recipient, correct channel/priority/copy, no private payload/log content, exact destination, membership/auth/entitlement re-check, no duplicate route/session, sane back stack, logcat clean, and coverage/docs updated before the chunk ends.
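
One way to write a chunk's matrix before starting is to generate every cell up front so none is skipped. The sketch below is illustrative only: it uses a subset of the dimensions listed above, the function names are hypothetical, and the real cell layout in ClaudeQACoverage.md may differ.

```python
from itertools import product

# Subsets of the dimensions above; extend per chunk as needed.
DIRECTIONS = ["QA→Sam", "Sam→QA"]
PROCESS_STATES = ["foreground", "background", "killed", "locked"]
ENTRY_SURFACES = ["in-app banner", "system tray", "deep-link intent"]


def chunk_matrix(notification_type, directions=DIRECTIONS,
                 states=PROCESS_STATES, surfaces=ENTRY_SURFACES):
    """Return one 'todo' cell per combination for a single notification type."""
    return [
        {"type": notification_type, "direction": d, "state": s,
         "surface": e, "status": "todo"}
        for d, s, e in product(directions, states, surfaces)
    ]


def to_markdown(rows):
    """Render the cells as a Markdown table ready to paste into the coverage doc."""
    header = "| type | direction | state | surface | status |"
    divider = "|---|---|---|---|---|"
    body = [
        f"| {r['type']} | {r['direction']} | {r['state']} | {r['surface']} | {r['status']} |"
        for r in rows
    ]
    return "\n".join([header, divider, *body])
```

For example, `to_markdown(chunk_matrix("chat_message"))` yields a table of todo cells; flip each to pass, fail→id, blocked→id, or not implemented→Future.md as the chunk proceeds.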
Notification tap crash triage (mandatory): never conclude "the notification didn't open" from UI behavior alone. Before each notification/deep-link tap, clear or timestamp logcat; after the tap, inspect both devices for FATAL EXCEPTION, ANR, ActivityTaskManager errors, RuntimeException, navigation/deep-link exceptions, PERMISSION_DENIED, and swallowed repository/decryption errors. If the app returns Home, stays put, flashes, restarts, or silently fails, classify whether it was wrong routing, missing extras, stale data, permission denial, or a crash. Any notification tap that crashes (example class: tapping a game notification to open Spin the Wheel) is a filed bug with stack trace + exact payload/session/game type, not a vague "didn't open" note.
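
A rough sketch of the post-tap logcat scan, assuming logcat was cleared before the tap (for example with `adb logcat -c`) and dumped to text afterward. The signal list mirrors the triage paragraph above; the regexes and function name are illustrative and will miss app-specific messages such as swallowed decryption errors, so add those patterns locally.

```python
import re

# One pattern per signal named in the triage paragraph.
SIGNALS = {
    "fatal_exception": re.compile(r"FATAL EXCEPTION"),
    "anr": re.compile(r"\bANR\b"),
    "activity_task_manager_error": re.compile(r"\bE[ /]ActivityTaskManager"),
    "runtime_exception": re.compile(r"RuntimeException"),
    "permission_denied": re.compile(r"PERMISSION_DENIED"),
}


def triage(logcat_lines):
    """Group logcat lines by the crash or denial signal they match."""
    hits = {}
    for line in logcat_lines:
        for label, pattern in SIGNALS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.rstrip())
    return hits
```

Run it on both devices' dumps after every tap. An empty result plus a wrong landing points toward routing, missing extras, or stale data rather than a crash.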
Both-client × app-state matrix (per type): QA→Sam and Sam→QA, each in foreground / background / killed (cold-start), plus already on the target screen, on a different screen, logged out, unpaired, with a stale/expired/completed/deleted target, and both users opening around the same time. Not a pass unless it works from both clients in every state that applies.
Current-screen/source-screen matrix (per type): do not test notifications only from Home or only from a clean launch. For each notification type, vary where the receiving client is when the notification arrives/taps: Home, Play hub, active game/waiting/results, Today/reveal, Messages inbox, exact conversation, Settings/sub-settings, Paywall, an unrelated deep screen, app backgrounded from each major tab, and app fully closed/killed. Foreground banners, system-tray taps, warm-start onNewIntent, and cold-start launch must all route to the exact target. A tap that lands on generic Home, stays on the old screen, opens the wrong tab, loses extras, duplicates the destination, or needs a second tap is a bug.
Permission/token health: cover Android POST_NOTIFICATIONS granted, denied, "don't ask again"/system-disabled, and re-enabled states; Settings notification toggles; sign-out/sign-in token refresh; same account on two devices; partner/account switch; stale token cleanup; app reinstall/update; and notification channel migration. Denied/system disabled notifications should fail gracefully with in-app state still correct, never with lost data or broken routing after permission is restored.
Six assertions per notification: (1) trigger fires correctly: right event, not early, not twice, sender doesn't get their own (unless intended), retry/idempotency doesn't duplicate; (2) delivered to the right person: correct token, old tokens unused after sign-out/account-switch; (3) copy + channel correct: friendly, right channel/priority, no raw Firebase error/raw IDs, no private content in text/payload/logs/analytics/crash; (4) tap opens the exact destination: specific conversation/session/capsule/match/question/settings/pairing, never blank, never a crash on missing/stale/malformed/unauthorized data, no duplicate/stacked copies, completed→results/replay, expired/deleted→graceful fallback; (5) back stack sane: back returns sensibly (Home/prev context), no double-back, no unexpected exit/loop/blank; (6) deep-link re-checks auth + couple membership + pairing + entitlement + target ownership + session status + existence: a non-member/logged-out/stale/unpaired open must NOT reach private content and must fail gracefully.
Inventory (type → Cloud-Function trigger → recipient → destination) — verify each; mark any unimplemented type not implemented→Future.md (don't count as pass): chat_message(onMessageWritten → partner → conversation; foreground→chat-head bubble) · partner_started_game/partner_finished_game(onGameSessionUpdate → partner → game/join · results/reveal) · join_game/game_invite & partner_joined_game (if present → partner/starter → join screen · waiting-room update) · partner_answered(onAnswerWritten → partner → reveal) · game_abandoned/game_ended (if present → partner → safe ended state, not a stuck session) · daily_question(assignDailyQuestion)/daily_question_reminder/daily_reminder(dailyQuestionReminder → Today) · date_match(createDateMatch → match) · date_plan_update (if present → date plan/builder/match) · partner_joined+invite_created(acceptInviteCallable → pairing/home) · partner_left(onCoupleLeave)/partner_deleted_account(onUserDelete → home/relationship settings) · memory_capsule_unlocked(scheduled → capsule) & memory_capsule_created (if present → Memory Lane/locked capsule) · challenge_day_ready(→ Connection Challenges) & challenge_day_completed (if present → challenge progress) · outcome_reminder(scheduledOutcomesReminder) · reengagement(reengagement/gameRetention) · gentle_reminder(sendGentleReminderCallable) · spki(key identity/confirm → security/key screen) · subscription_entitlement_changed & security_recovery (if present).
Game-notification suite (per game): A starts from Play hub → B gets the start/join push (if supported) → B taps and lands on the correct join/waiting/active screen → B can join from there → A sees B joined/answered → both finish → finish push opens the exact results/reveal → re-opening the push after completion opens replay/results (not a dead active session) → if A ends/quits, B is notified or shown a graceful ended state → a stale game push routes to results/history or a clear expired-session message → simultaneous start/join yields one session, neither stuck → premium gate holds (neither-premium push must NOT bypass paywall; either-premium unlocks for both). For each game type, including Spin the Wheel, notification taps must be paired with logcat review so crashes are caught even if the visible symptom looks like a no-op or generic Home fallback.
Join-game navigation suite: every entry that leads to joining/resuming a game opens the correct game + session + partner-state + mode + entitlement + back stack: Play-hub card, active-game banner/card, Home active-game card, Today game prompt, notification tap, in-app foreground banner, game history/replay, partner waiting screen, results/reveal, "End their game"/stuck-session recovery, deep-link/crafted intent, cold-start from push, bottom-tab return into an active game, any push action buttons, and any "join/resume/continue/view results/play again". No wrong game type, no accidental stale-session join, no duplicate session on double-tap, back returns correctly.
Payload security (P0 on any hit): inspect raw payload + logs: no plaintext message/answer/capsule/date-plan/bucket-list/swipe content, no raw invite code/seed, no recovery phrase, no wrapped/decrypted key material, no email/name unless intentionally public; payload carries only the minimum routing metadata. Any private content = P0.
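
A hedged sketch of a first-pass payload screen, assuming the push data payload has been captured as a dict. The allow-list of routing keys is hypothetical (the real key names are not stated here), as are the length and whitespace heuristics. It only narrows where to look; the raw payload and logs still need a manual read.

```python
# Hypothetical routing-only fields; replace with the app's real key names.
ALLOWED_KEYS = {"type", "coupleId", "sessionId", "targetId"}


def payload_findings(data, allowed=ALLOWED_KEYS, max_len=64):
    """Flag keys outside the routing allow-list and values that look like free text."""
    findings = []
    for key, value in data.items():
        text = str(value)
        if key not in allowed:
            findings.append(f"unexpected key: {key}")
        elif len(text) > max_len or any(ch.isspace() for ch in text):
            # Routing IDs are assumed to be short and free of whitespace.
            findings.append(f"suspicious value in: {key}")
    return findings
```

An empty list means nothing tripped the heuristic, not that the payload is proven clean.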
Malformed / stale intents: fire crafted deep-links with missing/unknown type, missing/wrong target or couple ID, wrong game type, expired/completed/deleted target, unauthorized couple/session, malformed params, duplicate/rapid taps, a push for another user/previous partner, while logged-out/unpaired, while on the target screen, and during a different active game → never crash/leak, always a graceful fallback + sane back stack.
Scheduled/time-based: trigger manually (invoke callable/function or seed the due condition — user-gated).
Foundations: FCM token registration on sign-in (TokenRegistrar) + onNewToken + token cleanup on sign-out/account-switch; POST_NOTIFICATIONS prompt + denied path; channels (di/NotificationModule); deep-link routing (MainActivity.deepLinkRouteFromIntent → AppNavigation); foreground/background split (core/notifications/AppMessagingService); no duplicate local+remote notification.
Coverage: record per row type × trigger × recipient × app-state × destination × back-stack × privacy × both-client in ClaudeQACoverage.md; only pass when delivery + routing + back-stack + privacy + both-client are all verified. Missed delivery or wrong deep-link = P1; private content in any payload = P0.

Pass F — Resilience, concurrency, lifecycle & time (cross-cutting; a 2-user realtime app needs these)

Concurrency / realtime races (two partners at once): both answer the daily question simultaneously; both start/join the same game; both swipe a date / react at once; one quits while the other submits; both tap a notification at once; partner acts while you're mid-flow. No lost writes, no stuck state, no duplicate sessions, reveal still correct. (This is where a couples app breaks.)
Lifecycle / process death: background mid-flow + return; force-kill the app and relaunch (Android may kill the process) — state/auth/draft restore sanely; deep-link/notification after process death still loads (verified for chat — extend to all). Rotation/config-change doesn't lose Compose state. Low-memory.
Network resilience: offline / flaky / airplane mid-action across answers, games, dates (not just chat media) — graceful failure + retry/queue, no crash, no silent data loss, recovery on reconnect.
Idempotency / rapid input: double-tap send/submit, rapid nav, double-start, double-join, repeated paywall-unlock taps — guarded (no double-send, no duplicate session, no crash).
Time-dependent behavior: daily-question rollover (6 PM CST assignment), streak day-boundary + repair window, capsule unlock times, reminder schedules, challenge-day availability, timezone change — test across a date change (manipulate device clock / trigger functions).
Account/couple lifecycle: brand-new (empty) account; unpaired state; pair → unpair → re-pair; partner leaves mid-session; account deletion cascade; same account on two devices; stale notifications after unpair/delete are graceful; invite accepted while already paired is rejected cleanly. No orphaned/broken state.
Install/update/migration lifecycle: fresh install, update over an existing signed-in install, app data retained, Room/DataStore/SharedPreferences migrations, notification channel migration, cached encryption/key material, pending deep links/notifications across update, and version-skew between partners if one device updates first. No sign-out loops, stale build routing, lost local state, broken permissions, or migration crashes.
Crash reporting: confirm crashes/ANRs are actually captured (Crashlytics) so field issues surface.
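
For the time-dependent checks, it helps to compute which daily question should be active at a given instant before moving the device clock. The sketch below assumes a fixed CST offset (UTC−6) and that a question assigned at 6 PM stays active until the next 6 PM; whether the real function follows daylight saving or labels days this way is not stated here, so treat both as assumptions.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # fixed offset; DST handling is an assumption
ROLLOVER_HOUR = 18  # 6 PM assignment time from the checklist


def question_day(moment):
    """Return the date whose 6 PM assignment is active at `moment` (an aware datetime)."""
    local = moment.astimezone(CST)
    if local.hour < ROLLOVER_HOUR:
        # Before 6 PM the previous day's question is still the active one.
        local -= timedelta(days=1)
    return local.date()
```

Checking one minute before and one minute after the boundary, from both partners' time zones, is the cheapest way to catch an off-by-one day.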

Pass H — Branding & artwork (every screen: could it carry more of the brand? where would art help?)

A consumer-mindset pass focused on brand presence and delight, not defects. Walk every screen and surface and ask: does this feel like Closer (private, warm, equal, intentional — a ritual for two)? Could brand color, the heart mark, a brand message, or an illustration make it warmer or clearer without clutter? Output is artwork descriptions written as ready-to-paste ChatGPT image-generation prompts — the user generates the images; we only describe them.

Existing art integration check: judge the art as part of the whole page, not as a standalone asset. Confirm each image supports the screen's job, aligns with the surrounding typography/actions, has enough breathing room, and uses the right light/dark treatment. Art that looks generic, unfinished, randomly placed, or visually disconnected is a finding even if the bitmap itself is technically valid.
First, lock the house style (do this once per round, refresh if the art evolved): read docs/brand/visual-identity.md + docs/brand/asset-system.md AND open 2–3 existing illustrations (illustration_couple_onboarding, illustration_reveal_celebration, pack_art_*) to capture the actual look. New screens/features since the last brand review must be folded in. Keep the canonical house-style prompt prefix + palette in the branding deliverable (ClaudeBrandingReview.md) so every prompt reuses it and all generated art matches the existing artwork.
House style (must hold for every prompt): flat 2D pastel vector illustration; soft rounded shapes, no harsh outlines, gentle gradients; palette aubergine #24122F / deep purple #56306F / lavender #B98AF4 / soft pink #F7C8E4 / soft lavender #D9B8FF / blush white #FFF8FC; motifs = two-equal-halves heart, paired/sealed cards, floating hearts + petals, candle/mug/lavender-sprig warmth, moon/quiet-hours, calendar/date-card, capsule; mood = warm, quiet, equal, intentional. Couple figures balanced + inclusive, faces simple. Never show readable answer/prompt/message text, invite codes, emails, dating-app clichés, stock photos, alarm/urgency/surveillance imagery.
Per screen, decide the brand opportunity (pick the lightest that fits — don't over-decorate):
- none needed (already on-brand, or a dense list/form where art would clutter) — say so;
- color/typographic brand touch (palette, heart mark, a rotating privacy message);
- small glyph (brand glyph for a relationship concept — describe it for the glyph set);
- hero/empty-state/celebration illustration (the high-value case → write the full ChatGPT prompt).
Each artwork item records: screen/route · placement (hero / empty / header / card / celebration) · why it helps · filename to match the existing scheme (illustration_*, pack_art_*, glyph_*, particle_*) · the ChatGPT prompt (house-style prefix + the specific scene) · aspect ratio/size + light/dark behavior. Cross-check the brand doc's "Needed additions" / empty-state list and mark which already have assets vs still need art (e.g. Android may still lack illustrations that iOS has).
Prioritize the screens a user feels most: onboarding/pairing, Home, paywall/subscription, reveal/celebration, empty states (no messages/dates/capsules/history), Memory Lane, Connection Challenges, date match, quiet-hours.
Branding defects (mis-colored, clipped, off-brand, low-contrast art) are bugs → ClaudeReport.md. Pure "works but could be warmer / a feature idea" → Future.md ## QA. New art to create → ClaudeBrandingReview.md.

Pass I — Performance & route efficiency (jank, redundant reads, caching) [FUTURE.md P14]

Before store polish, profile every top route and every high-cardinality list for jank, repeated Firestore reads, missing cache use, and slow navigation. Drive each route as a user and instrument reads/frames.

Frame / jank: scroll every long list (Messages inbox + conversation, Answer History, Question Packs, Past Games, Wheel History, Bucket List, Date deck, Activity/Progress) and open every top route while watching adb shell dumpsys gfxinfo <pkg> framestats (or Perfetto / Studio Profiler) — flag dropped/janky frames, slow first frame, and Choreographer: Skipped N frames / main-thread stalls in logcat. Transitions/animations stay smooth (~60fps).
Redundant Firestore / network reads: count listeners/gets per screen. Switching bottom tabs and returning must not refetch unchanged data; opening a screen twice must not double-read; snapshot listeners detach on leave (no leaked/stacked listeners — a 2-user realtime app accumulates these fast). Watch for N+1 reads on lists.
Caching / lazy-load: static question/category data is cached locally (Room) and not re-fetched each entry; large lists use lazy paging (LazyColumn/paging, not load-all); images cached (Coil); offline reads serve from cache.
Latency: measure cold-start-to-interactive (splash→loader→Home) and tab/route transition latency; flag anything perceptibly slow (>~300ms).
Deliverable: a reusable route smoke-test checklist (every top route × {load time · jank · read count}), captured as a runnable script so each round re-checks cheaply.
Remediation when found: lazy-load/page large lists; cache local question/category data; dedupe + scope snapshot listeners; skip redundant fetches on tab switches; add skeleton/loading states (cf. FUTURE.md P8) over blocking spinners.
Findings: real jank/leak/redundant-read = bug → ClaudeReport.md (P2; P1 if it ANRs or leaks listeners, P0 if it drops data); "could be smoother / add skeletons" → Future.md ## QA.
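
As a starting point for the runnable route smoke test, the sketch below pulls jank numbers out of captured text. It assumes the `dumpsys gfxinfo` output includes the usual "Total frames rendered" and "Janky frames" summary lines and that logcat carries Choreographer "Skipped N frames" messages; the function names are hypothetical, and read counting is not covered.

```python
import re


def skipped_frames(logcat_lines):
    """Return the frame counts from Choreographer 'Skipped N frames' messages."""
    pattern = re.compile(r"Choreographer.*Skipped (\d+) frames")
    return [int(m.group(1)) for line in logcat_lines if (m := pattern.search(line))]


def jank_summary(gfxinfo_text):
    """Extract total and janky frame counts from a gfxinfo dump, or None if absent."""
    total = re.search(r"Total frames rendered:\s*(\d+)", gfxinfo_text)
    janky = re.search(r"Janky frames:\s*(\d+)", gfxinfo_text)
    if not total or not janky:
        return None
    t, j = int(total.group(1)), int(janky.group(1))
    return {"total": t, "janky": j, "janky_pct": round(100 * j / t, 1) if t else 0.0}
```

Capturing one dump per route and storing the summaries side by side gives each round a cheap before/after comparison.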

Every primary flow must be usable with accessibility settings on. Enable each setting and walk the core flows (auth, onboarding, pairing, Home, a full game, daily question + reveal, Messages, Paywall, Settings) end to end. This is the deep home for a11y; the Pass C contrast/font spot-checks feed into it.

Font scaling: adb shell settings put system font_scale 1.3 (then 1.5, 2.0) — every primary flow stays usable: no clipped/overlapping text, no cut-off or hidden buttons/actions (scroll where needed). Acceptance: all primary flows usable at increased font scale without clipped buttons or hidden actions. Restore font_scale 1.0 after.
Screen reader (TalkBack): every interactive element has a meaningful semantics/contentDescription (icon-buttons especially: back, send, like, close, the brand-mark loader, game option cards); decorative images are silenced (clearAndSetSemantics {} / null desc); reading order is logical; no unlabeled "Button"; custom controls (spin wheel, date swipe deck, answer cards) are operable + announced; no focus traps.
Contrast: body text + essential icons meet WCAG AA (4.5:1 body / 3:1 large) in both themes — measure, don't eyeball; re-check the known dim spots (game answer text, muted captions, the C-DS-001 area).
Touch targets: interactive targets ≥ 48dp (icon buttons, chips, nav, close/back, reaction buttons, swipe-deck actions). Flag anything smaller.
Keyboard / external input: with a hardware keyboard, forms (sign-up, message, capsule, profile) tab in a sane order, IME/Enter actions work, focus is visible, no traps.
Reduce-motion: with "Remove animations" (adb shell settings put global animator_duration_scale 0), the loader, celebration particles, reveals, splash handoff, and transitions degrade gracefully and no motion-gated content becomes unreachable (the loader/particles already honor this — verify everywhere). Restore to 1 after.
Remediation: add semantics labels, raise touch targets, fix contrast tokens, guard motion behind the reduce-motion flag.
Findings: missing label / clipped-at-large-font / sub-48dp / failing contrast = bug → ClaudeReport.md (P2; P1 if it blocks a primary flow for assistive-tech users); polish → Future.md ## QA.
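
To measure contrast rather than eyeball it, the WCAG 2.x contrast-ratio formula can be applied to sampled foreground/background colors. The sketch below implements that published formula; the function names are illustrative, and the colors to test must be sampled from the actual rendered screen in each theme.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a #RRGGBB color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg, bg, large=False):
    """True if the pair meets AA: 4.5:1 for body text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large else 4.5)
```

For example, `meets_aa("#24122F", "#FFF8FC")` checks aubergine text on the blush-white background from the house palette; run the same check on muted captions and game answer text in both themes.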

Reporting → ClaudeReport.md (living QA report)

Header: date, build, devices, round number + run-state header.
One section per pass (A–J), each a table: ID | Area | Screen/Route | Mode | Severity | Description | Repro | Evidence | Suggested fix | Status.
Summary: counts by severity. Report only during passes — no fixes recorded until the fix phase.
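
The severity summary can be derived from the pass tables instead of counted by hand. This is a sketch under assumptions: rows are Markdown table lines with exactly the ten columns above, no cell contains a literal `|`, and a Status starting with "Fixed" means the issue is no longer open. The function names are hypothetical.

```python
from collections import Counter

COLUMNS = ["ID", "Area", "Screen/Route", "Mode", "Severity", "Description",
           "Repro", "Evidence", "Suggested fix", "Status"]


def parse_rows(table_text):
    """Parse issue rows from a Markdown table, skipping header and divider lines."""
    rows = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        is_divider = set(cells[0]) <= set("-: ")
        if len(cells) != len(COLUMNS) or cells[0] == "ID" or is_divider:
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def severity_counts(rows):
    """Count still-open issues per severity (Fixed rows are excluded)."""
    return Counter(
        r["Severity"] for r in rows
        if not r["Status"].lower().startswith("fixed")
    )
```

Running this over every pass section and summing the counters gives the board's Open numbers for the current round.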

Report hygiene — keep it CLEAN, lean, and never dangling (the report is a current-state doc, not an archive)

The report's job is to show, at a glance, what's wrong right now — not to accumulate a history of everything ever fixed. Stale fixed rows and stacked old run-states make it unreadable and hide the real signal. So:

A Fixed row survives exactly ONE confirmation round, then it's removed. When you fix an issue, mark its row Fixed (with the commit) and keep it through the next re-QA round. Once that round re-verifies it, delete the row — the full root-cause/fix detail already lives in the commit message (the row cites the hash), so nothing is lost. Don't carry confirmed-fixed issues across multiple rounds.
One run-state header, always. Keep only the current Round N | Pass X | Chunk Y | NEXT ACTION block pinned at the top. Don't stack prior rounds' headers — collapse finished rounds into at most a single one-line history entry each (e.g. R6: branding regression — 0 new), or drop them entirely once their fixes are confirmed-and-pruned.
Open issues first; resolved issues compact. Order every pass section open (P0→P3) on top; keep a short Resolved & confirmed (archived — detail in git) line listing only the IDs of older fixed-and-verified issues (not their tables). The big per-issue tables exist only for currently-open and fixed-this-round-pending-confirm issues.
Severity board reflects NOW. One board, current counts; Open is the number that actually matters. When Open hits 0 at every level, the report should be short — current run-state, a 0/0 board, the archived-ID line, and the operational constants (devices/accounts, standing-auth, playbook pointers). If it's long while everything is fixed, it needs pruning.

Coverage-matrix hygiene (`ClaudeQACoverage.md` — a current-status matrix, not a per-round changelog)

Flip, don't stack. When a fix is confirmed, change that row's fail→id to pass and move the ID to an archived line — never leave a confirmed-fixed fail→id dangling, and never keep a contradicting "still owed" note next to a completed row.
One status per cell, current. Each screen/feature/game/notification shows its latest status only; collapse prior rounds' narration into a single one-line round history. Keep an at-a-glance pass-status table at the top.
Keep the resume signal sharp. What a returning session needs is what's left: surface todo/deferred/blocked items plainly; don't bury them under superseded prose.

Extremely-easy-to-read mandate (applies to ClaudeReport.md, ClaudeQACoverage.md, and Future.md)

Optimize every QA doc for a reader who has 5 seconds to find the current state:

Lead with the answer. Top of the file = current round + the one-line verdict (e.g. "0 open P0–P3; security clean") before any detail.
Tables over prose for issues; short rows. Put long root-cause analysis in the commit, not the row — the row gets a one-sentence description + repro, then the commit hash.
No walls of text. Break run-state into scannable lines; bold the few words that matter; no multi-paragraph headers. If a paragraph is longer than ~3 lines, it's probably commit material, not report material.
Consistent shape every round so a returning reader (or a post-compaction resume) finds things in the same place.

Fix phase (only AFTER all passes of the round complete)

Work strictly by severity: all P0 → P1 → P2 → P3.
One issue at a time: implement → ./gradlew :app:assembleDebug → install both → verify THAT fix live (correct device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, content still ciphertext in Firestore) → flip its row to Fixed + commit (one per issue/cluster) → next. Don't start the next until the current is verified.
Couple-shared premium fix: replace direct isPremium() gates with CouplePremiumChecker.coupleHasPremium(partnerId) in every gated VM/screen (partner-entitlement read rule deployed). High regression risk — re-verify each feature in BOTH self-premium and free states.
Gated actions (entitlement toggles, deploys) are user-authorized per occurrence.
New issues found while fixing are logged (new ID), not silently fixed beyond scope — next re-QA round catches them.

Definition of done: a pass is done when every coverage row is pass/fail→id/not implemented→Future.md/blocked→id; a round is done when all passes (A–J) are done; flawless = one full round with zero open P0–P2 and Passes D + E fully clean (no open P0/P1 in I/J), every game fully played through, every notification type verified or explicitly not implemented→Future.md, all join-game navigation paths and all back-stack checks verified. Then stop (P3s optional). Don't re-open a clean pass within the same round.
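
The severity part of the flawless gate can be expressed as a quick check over the open issues. This sketch reads "fully clean" for Passes D and E as zero open issues at any severity and takes the pass letter from the ID prefix (as in `A-001`, `E-003`); both readings are assumptions. It does not cover the non-severity conditions (every game played through, notification and back-stack coverage), which still come from ClaudeQACoverage.md.

```python
BLOCKING_SEVERITIES = {"P0", "P1", "P2"}
MUST_BE_CLEAN_PASSES = {"D", "E"}  # assumed to mean zero open issues of any severity


def is_flawless(open_issues):
    """open_issues: iterable of (issue_id, severity) pairs for issues still open."""
    for issue_id, severity in open_issues:
        pass_letter = issue_id.split("-", 1)[0]
        if severity in BLOCKING_SEVERITIES:
            return False
        if pass_letter in MUST_BE_CLEAN_PASSES:
            return False
    return True
```

A `True` here only clears the severity gate; the round is flawless once the coverage conditions hold as well.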

Re-QA loop (until flawless)

After the fix phase, re-run Pass A–J (regression + confirm fixes). Repeat fix → re-QA rounds until a full round yields zero P0–P2 and Passes D+E fully clean.

Prune on confirmation (Report hygiene): the moment a re-QA round re-verifies a Fixed issue, delete its row from ClaudeReport.md (move its ID to the compact Resolved & confirmed (archived — detail in git) line) and collapse that finished round's run-state header. A fixed issue lives in the report for one confirmation round only — never let confirmed-fixed rows or old run-states accumulate. See Report hygiene under Reporting.
