Closer/ClaudeQAPlan.md

Claude QA Playbook — Full-App QA → Fix → Re-QA until flawless

Reusable QA plan for the Closer app. Run report-only first, fix everything, then re-QA until a clean round. Progress/state is tracked in ClaudeReport.md (issues) + ClaudeQACoverage.md (coverage matrix), which are the authoritative source of truth. See the Continuity section before resuming.

Program roadmap: Part 1 = Android QA (this doc) → Part 2 = build the iOS app to Android's current parity → Part 3 = run these same passes on iOS + a cross-platform (Android↔iOS) pass. Parts 2 & 3 live in ClaudeiOSPlan.md (note: iOS build/run/QA requires macOS — not possible from this Linux box).

Context

Drive the real app on both emulators, verify each thing live, report, fix, re-verify. Five QA dimensions:

  1. Couple-shared premium — if EITHER partner is premium, all premium features unlock for both.
  2. Games — each starts, plays, finishes correctly on both devices.
  3. Full visual pass, light + dark — every screen, text readable, nothing clipped/invisible.
  4. Security & encryption (cornerstone) — every private field is ciphertext at rest, rules hold against non-members, keys/recovery are sound. Findings here default to P0.
  5. Notifications — all 17 types deliver to the right partner (foreground/background/killed), deep-link correctly, and leak no private content.

Scope decisions: exhaustive visual pass (all ~50 screens, both modes); full scope incl. pre-pairing flows (fresh throwaway account); couple-shared everywhere — per-user gates are bugs, fixed by routing through core/billing/CouplePremiumChecker.kt.

Early known signal: only chat uses CouplePremiumChecker; games/packs/dates/wheel gate on the user's own EntitlementChecker.isPremium() — so premium almost certainly does NOT unlock for the free partner there. Pass A confirms + enumerates this; the fix phase applies couple-shared everywhere.

Execution mode — run to completion (autonomous; do NOT stop)

  • Do not stop to check in or ask for approval. Run all six passes (A-F) → the fix phase → re-QA rounds continuously until a flawless round (zero open P0-P2, Passes D + E clean, every game fully played through, navigation/back-stack verified). Don't hand control back early.
  • Unblock yourself: if anything blocks progress (a stale/blocking session, a crash, a build break, a missing prerequisite state, a broken nav path that prevents reaching a screen), fix it immediately and continue — even though passes are otherwise report-only. Blocking issues are fixed inline so the run can proceed; non-blocking findings are still logged and fixed in the fix phase.
  • "Once executed, complete it": never declare done before the Definition of Done is met — keep cycling fix → re-QA until flawless, then stop.
  • Context limits ≠ stopping. If a context window fills, that's a checkpoint, not a stop: make sure the run-state header + MD files are current and committed; the resume command continues automatically. Keep the loop alive across sessions until flawless.
  • Don't pause for "by-design vs bug": log the ambiguous finding and keep going (don't unilaterally rewrite deliberate design — the log captures it). Never halt the run to ask.
  • Only true stop = a gated action you cannot perform. Production deploys, admin Firestore writes/seeds, and entitlement toggles still need per-occurrence authorization (the classifier enforces this regardless of this doc). If one is genuinely required to proceed and is denied, do all other work first, then surface only that single blocker — don't halt the whole run for it.

Methodology (every pass)

  • Devices: 5554 (QA), 5556 (Sam), paired; one fresh throwaway account for pre-pairing flows.
  • Drive via adb tap/swipe; resolve coords from uiautomator dump bounds; downscale screenshots to read; scan logcat for FATAL EXCEPTION/ANR on each screen.
  • Premium toggled via scratchpad/set_premium.js (admin, user-authorized each time).
  • Theme toggled via Settings → Appearance (Light/Dark) (MainActivity ThemeMode).
  • REPORT-ONLY during passes — never fix mid-pass.
  • Environment (senior-QA rec): prefer the Firebase Local Emulator Suite or a dedicated staging project over production — isolates test data, makes seeding / entitlement toggles / D3 negative tests free (no gated prod writes), and avoids polluting real users. Caveat: App Check, RevenueCat IAP, and real FCM/APNs push need real services — run those against staging/prod with test accounts. (We've been on prod with test accounts — works, but every seed/toggle/deploy hits the gate.)
  • Device/OS matrix: don't certify on one emulator only — cover minSdk + targetSdk, a small and a large screen, and at least one physical device (App Check / Play Integrity behave differently on emulators).
  • Automate the regression smoke: capture the smoke checklist as a runnable script (adb/Maestro) so every round re-checks it cheaply instead of by hand.
  • Test-data hygiene: keep known test accounts; clean up artifacts (stray messages/reactions/sessions) between rounds so they don't masquerade as bugs.
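
The "resolve coords from uiautomator dump bounds" and "scan logcat" steps above could be scripted roughly as follows. This is a minimal sketch, not the project's tooling: it assumes the dump XML and the logcat text have already been pulled to the host, and the helper names `tap_point_for_text` and `crash_lines` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# uiautomator writes bounds as "[left,top][right,bottom]" in screen pixels.
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

# Markers to scan for on every screen. "ANR in" is the wording ActivityManager
# normally logs; adjust if the device logs something different.
CRASH_MARKERS = ("FATAL EXCEPTION", "ANR in")


def tap_point_for_text(dump_xml, text):
    """Return the (x, y) centre of the first node whose text matches, else None."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("text") != text:
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match:
            left, top, right, bottom = map(int, match.groups())
            return ((left + right) // 2, (top + bottom) // 2)
    return None


def crash_lines(logcat_text):
    """Return every logcat line that contains a crash/ANR marker."""
    return [
        line
        for line in logcat_text.splitlines()
        if any(marker in line for marker in CRASH_MARKERS)
    ]
```

The returned point can be passed to `adb -s emulator-5554 shell input tap X Y` (the `emulator-` serial prefix is an assumption about how the two devices are listed); a non-empty `crash_lines` result means the screen fails its logcat check.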

Continuity & resumability (this effort WILL span many context windows — don't lose state)

State lives in files, not memory:

  • ClaudeReport.md = the issue log (committed). Each issue row is self-contained in text (repro + expected + actual). Screenshots are session-only and won't survive a compaction; never rely on a screenshot path alone.
  • ClaudeQACoverage.md = the coverage matrix: every screen×mode, feature×premium-state, game×lifecycle, notification×{foreground,background,killed}, each todo | pass | fail(→issue id). The resume anchor.
  • Persistent memory (memory/): QA methodology + exact commands; emulator↔account↔coupleId mapping; scratchpad/set_premium.js + admin tooling; the couple-shared-premium-everywhere goal + the per-user-gate gap.
  • Run-state header pinned at the TOP of ClaudeReport.md, always current: Round N | Pass X | Chunk Y | NEXT ACTION: … — first thing to read, last thing to update before stopping.
  • Stable issue IDs: A-001 / B-002 / C-… / D-… / E-… (pass-letter + number); coverage references the ID for every fail. Never renumber or reuse.
  • Source of truth: the two MD files are authoritative; the TodoWrite list is scratch for the current chunk only. Update the MD files + run-state header before ending a session.
  • Commit cadence: commit ClaudeReport.md + ClaudeQACoverage.md after each pass and each chunk.
  • Chunking: run small chunks (Pass C one screen-group; Pass A one feature), checkpoint after each.
  • Session-start ritual: (1) read run-state header + both MD files; (2) adb devices shows both emulators online; (3) installed build == current HEAD (rebuild+reinstall if unsure — never QA a stale APK); (4) continue at the first todo / unverified-fix.
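
As an illustration of the run-state header and stable-ID rules above, a resume script might parse the header and allocate the next issue ID like this. It is a sketch under assumptions: the header is a single line in exactly the `Round N | Pass X | Chunk Y | NEXT ACTION: …` shape, IDs are zero-padded to three digits, and both function names are hypothetical.

```python
import re

HEADER_RE = re.compile(
    r"Round\s+(?P<round>\d+)\s*\|\s*Pass\s+(?P<pass_>[A-F])\s*\|\s*"
    r"Chunk\s+(?P<chunk>[^|]+?)\s*\|\s*NEXT ACTION:\s*(?P<next_action>.+)"
)
ISSUE_ID_RE = re.compile(r"\b([A-F])-(\d+)\b")


def parse_run_state(header_line):
    """Split the pinned run-state header into its four fields."""
    match = HEADER_RE.search(header_line)
    if match is None:
        raise ValueError("run-state header missing or malformed")
    return {
        "round": int(match["round"]),
        "pass": match["pass_"],
        "chunk": match["chunk"],
        "next_action": match["next_action"].strip(),
    }


def next_issue_id(report_text, pass_letter):
    """Allocate the next ID for a pass without renumbering or reusing any."""
    used = [
        int(number)
        for letter, number in ISSUE_ID_RE.findall(report_text)
        if letter == pass_letter
    ]
    return f"{pass_letter}-{max(used, default=0) + 1:03d}"
```

Taking `max + 1` rather than filling gaps keeps a deleted or merged ID from ever being handed out again, which matches the "never renumber or reuse" rule as long as old IDs stay in the report text.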

Guardrails & efficiency

  • Never pm clear / wipe app data — breaks the App Check debug token. Pre-pairing QA: sign-out → fresh sign-up.
  • Never run seed/build_db.py. Admin seeds/writes, entitlement toggles, and any deploys are user-authorized per occurrence.
  • By-design vs bug: if a finding may be intended behavior, log it and keep going (don't stop to ask; don't unilaterally rewrite deliberate design — the log captures it).
  • Pass C parallelism: set 5554 = Dark, 5556 = Light to capture both themes at once.
  • Never log decrypted message/answer content.

Severity scale (label every issue)

  • P0 Critical — crash/ANR, data loss, encryption/security leak, feature fully broken, premium bypass.
  • P1 Major — feature partly broken, premium not unlocking for partner, wrong/missing notification, dead-end nav.
  • P2 Minor — readability/contrast, clipping/overflow/truncation, theme not adapting, inconsistent styling.
  • P3 Polish — spacing/alignment/copy nits.

QA passes (Round 1 = baseline)

Pass A — Couple-shared premium (target: either partner premium → both unlock)

Test each gated feature in 3 states: neither premium → locked + paywall; partner-only premium → BOTH unlock; self premium → unlock. Toggle Sam premium, confirm QA (free) unlocks; toggle off. Features: Play-hub games (Desire Sync + any premium-badged), Connection Challenges, Memory Lane; Question Packs; Spin the Wheel / Category Picker / Wheel History; Date Match / Plan Date / Date Builder; chat media + reactions (regression — already couple-shared); Subscription/Settings reflects entitlement. Gated files (for the fix): ui/play/PlayHubViewModel, ui/desiresync/DesireSyncScreen, ui/wheel/{CategoryPicker,SpinWheel,WheelHistory}*, ui/questions/QuestionPackLibrary*, ui/dates/{DateMatch,DateMatches}Screen, ui/memorylane/MemoryLaneScreen, ui/challenges/ConnectionChallengesScreen.
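
The three-state check above can be written down as an expected-result table before driving the devices. The sketch below is only a Python model of the target rule (either partner premium unlocks both); it is not the Kotlin logic in `CouplePremiumChecker`, and the function and constant names are hypothetical.

```python
# (self_premium, partner_premium) for the three states Pass A exercises.
STATES = {
    "neither premium": (False, False),
    "partner-only premium": (False, True),
    "self premium": (True, False),
}


def couple_has_premium(self_premium, partner_premium):
    """Target rule: premium is shared, so either entitlement unlocks both."""
    return self_premium or partner_premium


def expected_matrix(features):
    """Map (feature, state) to the outcome a passing build must show."""
    matrix = {}
    for feature in features:
        for state, (mine, partners) in STATES.items():
            unlocked = couple_has_premium(mine, partners)
            matrix[(feature, state)] = "unlocked" if unlocked else "locked + paywall"
    return matrix
```

A per-user gate would return "locked + paywall" for the partner-only state, so that cell is the one that distinguishes the known gap from the target behavior.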

Pass B — Games lifecycle (MANDATORY: play each game ONE complete time through)

Games: This or That, How Well Do You Know Me, Desire Sync, Connection Challenges, Memory Lane, Spin the Wheel, + Date Match.

  • A launch/crash check is NOT sufficient. Each game MUST be played one full way through, end-to-end, on BOTH devices — start → answer/interact through every step/round/question on each device → reach the finish/reveal/results screen → confirm the result renders correctly for both partners. Verify each intermediate screen and interaction works (selections register, progress advances, both-answered gating, reveal/scoring/summary correct). Premium games (Desire Sync, Memory Lane) need a premium toggle to play.
  • The session lifecycle is exercised by the real playthrough: status active→completed; reveal/results correct on both.
  • Edges: re-open a completed session, leave mid-game (resume), no stuck session, no crash, logcat clean.
  • Game start/finish pushes (onGameSessionUpdate) exercised here; full delivery/deep-link audit in Pass E.
  • Media permissions (CAMERA, RECORD_AUDIO): granted works, denied degrades gracefully.
  • Done = every game has one verified complete playthrough (a launch-only "opens, no crash" row is partial, not pass).

Pass C — Visual pass, light + dark, ALL screens

Every route in core/navigation/AppRoute.kt (~50), in both modes: text contrast/readability (no invisible/low-contrast text), no clipping/overflow/ellipsis breakage, icons visible, backgrounds adapt, controls legible. Groups: auth/onboarding/pairing (fresh acct); Home (solo + paired); Play + every game; Today + reveal/history; Messages (inbox + conversation); Packs; Dates (Match/Builder/Matches/Bucket List); Wheel (picker/session/complete/history); Settings + all sub-pages (Account, Notifications, Appearance, Privacy, Subscription, Relationship, Security, Delete Account); Paywall; Your Progress/Activity; Recovery.

  • Probe: ui/theme/Theme.kt hardcoded brand colors + chat's custom closerBackgroundBrush — verify dark mode truly adapts; grep screens for hardcoded Color(0x...).
  • States, not just happy path: empty / loading / error / not-paired where they exist; many need data setup (seeding is user-gated) — note unreachable states in coverage rather than skipping silently.
  • Readability at scale: default font size + spot-check largest system font scale on text-heavy screens.
  • Navigation from every entry point: reach each screen from all the places that link to it and confirm it opens correctly each time — e.g. a conversation from the inbox AND from "Discuss" AND from a notification; a game from the Play hub AND from a notification; Paywall from each gated feature; Settings sub-pages; reveal from Today AND from history AND from partner_answered. A screen that works from one entry but breaks/duplicates from another = bug.
  • Back-stack / "double back": from every entry point, system back AND the in-app back arrow return to the correct previous screen: no dead-ends, no exiting the app unexpectedly, and no screen that requires pressing back twice (duplicate/stacked destinations on the back stack = bug). Bottom-tab reselection and deep-link/notification entries must land with a sane back stack (back → Home, not off the app or a blank screen). Wrong/double back or a dead-end = P2 (P1 if it traps the user).
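
The "grep screens for hardcoded Color(0x...)" probe in the list above could be run as a small scan over Kotlin source text. This is a sketch: the regex only catches hex literals passed directly to `Color(...)`, and `find_hardcoded_colors` is a hypothetical helper name.

```python
import re

# Matches Compose colour literals such as Color(0xFF112233) or Color(0x112233).
HARDCODED_COLOR_RE = re.compile(r"\bColor\(\s*0x[0-9A-Fa-f]{6,8}\s*\)")


def find_hardcoded_colors(source):
    """Return (line_number, literal) for every hardcoded colour in the text."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED_COLOR_RE.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits
```

Each hit is only a candidate: a brand color defined once in the theme may be intentional, whereas a literal inside a screen composable is the kind that tends not to adapt in dark mode.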

Pass D — Security & encryption (cornerstone; findings default to P0)

  • D1 At-rest coverage: admin-read RAW docs/objects and assert ciphertext for every private type. Types: chat text + lastMessagePreview (enc:v1:), chat media bytes (Tink 01 69 59 51 f0…), answers (sealed:v1:/enc:v1:), date plans + date_swipes, Memory Lane capsules, Bucket List. Also: wrappedCoupleKey + recovery material never plaintext; invite code (KDF seed) never stored raw; no push payload carries private content.
  • D2 Rules audit (static): member-only reads, author/server-only writes, ciphertext enforced on every private field, immutability, no premium self-grant, entitlements write:false; re-audit conversations/typing/reactions + entitlement partner-read; no catch-all match /{document=**}; list/query not enumerable; get()-rules don't over-expose; no legacy plaintext/downgrade path (coupleEncryptionEnabled holds; no disabled-encryption branch).
  • D3 Negative access tests: a non-member account is denied reading messages/answers/dates/entitlements, writing plaintext to encrypted fields, self-granting premium, cross-couple access (live rules or rules-emulator).
  • D4 Key exchange / management / recovery (E2EE crux): couple key client-generated, only leaves device wrapped (KDF from invite seed; server holds only wrappedCoupleKey+kdfSalt/kdfParams+encryptedRecoveryPhrase); KDF strength; Tink AEAD = AES-GCM/256 with AAD=coupleId, no weak/custom crypto/nonce reuse; keybox/sealed/commitment integrity; recovery-wrap server-blind; unpair revokes decrypt; invites CSPRNG + single-use + expiry.
  • D5 App Check / Functions / secrets: App Check enforced; callables validate auth+membership; webhook authenticity; admin-only writes rejected from clients; service-account JSONs never committed; no plaintext/secrets in logcat; temp files deleted.
  • D6 Leak vectors: no private content in analytics/crash; allowBackup=false + backup rules exclude sensitive data; deep links re-check membership; clipboard user-initiated; consider FLAG_SECURE; repo scan for committed secrets.
  • D7 Encryption migration: test the encryptionVersion paths (0 plaintext → 1 migrating → 2 strict) on a legacy couple — migration completes without exposing plaintext or losing/garbling old content, and a half-migrated couple is safe (no mixed read failures, no downgrade). This is the riskiest data path for existing users.
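
For the D1 at-rest check, the assertion on each raw document could look like the sketch below. It assumes the documents have already been read with admin credentials into plain dicts; the string prefixes and the five leading Tink bytes are the ones quoted in D1, while the field name `text` and both helper names are hypothetical. The Tink prefix embeds a key ID, so it would change if the key is rotated.

```python
# Envelope prefixes quoted in D1 for encrypted string fields.
TEXT_PREFIXES = ("enc:v1:", "sealed:v1:")

# Leading bytes quoted in D1 for Tink-encrypted media (assumed stable per key).
TINK_PREFIX = bytes.fromhex("01695951f0")


def looks_encrypted(value):
    """True if a raw stored value carries one of the expected envelopes."""
    if isinstance(value, str):
        return value.startswith(TEXT_PREFIXES)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).startswith(TINK_PREFIX)
    return False


def plaintext_fields(doc, private_fields):
    """Return the private fields present in doc that are NOT ciphertext."""
    return [
        name
        for name in private_fields
        if name in doc and not looks_encrypted(doc[name])
    ]
```

A prefix check only shows that the envelope is present, not that the payload decrypts; any non-empty `plaintext_fields` result is a P0 under this playbook, and the report should name the field without logging its content.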

Pass E — Notifications (all 17 types)

For each: trigger fires → delivered to the right partner (never self) → in foreground/background/killed → correct channel + copy with no private content → tap opens exactly the right item (loaded, not generic Home/dead-end) → no duplicates → rate limiter (20/day, 100/week) doesn't drop legit ones. Inventory (type → trigger → destination), all 17: chat_message (onMessageWritten → conversation, foreground → chat-head bubble); partner_started_game / partner_finished_game (onGameSessionUpdate → game/results); partner_answered (onAnswerWritten → reveal); daily_question (assignDailyQuestion) / daily_question_reminder / daily_reminder (dailyQuestionReminder → Today); date_match (createDateMatch → match); partner_joined + invite_created (acceptInviteCallable → pairing/home); partner_left (onCoupleLeave) / partner_deleted_account (onUserDelete → home/relationship settings); memory_capsule_unlocked (scheduled → capsule); challenge_day_ready (→ Connection Challenges); outcome_reminder (scheduledOutcomesReminder); reengagement (reengagement/gameRetention); gentle_reminder (sendGentleReminderCallable); spki (identify + confirm handled).

  • Tap-to-open: every notification opens the specific item from foreground/background/killed; tapping in-app doesn't stack/duplicate; logged-out/unpaired tap is graceful. Wrong/dead destination = P1.
  • Scheduled/time-based: trigger manually (invoke callable/function or seed due condition — user-gated).
  • Foundations: FCM token registration on sign-in (TokenRegistrar) + onNewToken; POST_NOTIFICATIONS prompt + denied path; channels (di/NotificationModule); deep-link routing (MainActivity.deepLinkRouteFromIntent → AppNavigation); foreground/background split (core/notifications/AppMessagingService).
  • Build a delivery matrix (type × {foreground,background,killed}) in ClaudeQACoverage.md. Missed delivery or wrong deep-link = P1; private content in any payload = P0.
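
One way to seed the delivery matrix before testing is to generate every type × state cell as `todo` and render it as a Markdown table for ClaudeQACoverage.md. The sketch below uses the 17 named types from the inventory; the table layout and the function names are assumptions, not the coverage file's actual format.

```python
from itertools import product

NOTIFICATION_TYPES = [
    "chat_message", "partner_started_game", "partner_finished_game",
    "partner_answered", "daily_question", "daily_question_reminder",
    "daily_reminder", "date_match", "partner_joined", "invite_created",
    "partner_left", "partner_deleted_account", "memory_capsule_unlocked",
    "challenge_day_ready", "outcome_reminder", "reengagement",
    "gentle_reminder",
]
APP_STATES = ["foreground", "background", "killed"]


def delivery_matrix():
    """Every (type, app state) cell starts as 'todo'."""
    return {cell: "todo" for cell in product(NOTIFICATION_TYPES, APP_STATES)}


def to_markdown(matrix):
    """Render the matrix as one Markdown table row per notification type."""
    lines = [
        "| Type | " + " | ".join(APP_STATES) + " |",
        "|---" * (len(APP_STATES) + 1) + "|",
    ]
    for kind in NOTIFICATION_TYPES:
        cells = [matrix[(kind, state)] for state in APP_STATES]
        lines.append(f"| {kind} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

As cells are verified, each value would be overwritten with `pass` or `fail(→E-…)` so the matrix doubles as the resume anchor for Pass E.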

Pass F — Resilience, concurrency, lifecycle & time (cross-cutting; a 2-user realtime app needs these)

  • Concurrency / realtime races (two partners at once): both answer the daily question simultaneously; both start a game / swipe a date / react at the same time; partner acts while you're mid-flow. No lost writes, no stuck state, no duplicate sessions, reveal still correct. (This is where a couples app breaks.)
  • Lifecycle / process death: background mid-flow + return; force-kill the app and relaunch (Android may kill the process) — state/auth/draft restore sanely; deep-link/notification after process death still loads (verified for chat — extend to all). Rotation/config-change doesn't lose Compose state. Low-memory.
  • Network resilience: offline / flaky / airplane mid-action across answers, games, dates (not just chat media) — graceful failure + retry/queue, no crash, no silent data loss, recovery on reconnect.
  • Idempotency / rapid input: double-tap send/submit, rapid nav, double-start — guarded (no double-send, no crash).
  • Time-dependent behavior: daily-question rollover (6 PM CST assignment), streak day-boundary + repair window, capsule unlock times, reminder schedules — test across a date change (manipulate device clock / trigger functions).
  • Account/couple lifecycle: brand-new (empty) account; unpaired state; pair → unpair → re-pair; partner leaves mid-session; account deletion cascade; same account on two devices. No orphaned/broken state.
  • Crash reporting: confirm crashes/ANRs are actually captured (Crashlytics) so field issues surface.

Reporting → ClaudeReport.md (living QA report)

  • Header: date, build, devices, round number + run-state header.
  • One section per pass (A/B/C/D/E/F), each a table: ID | Area | Screen/Route | Mode | Severity | Description | Repro | Evidence | Suggested fix | Status.
  • Summary: counts by severity. Report only during passes — no fixes recorded until the fix phase.
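
The severity summary can be derived from the pass tables rather than counted by hand. This sketch assumes each table is a pipe-delimited Markdown table with the ten columns listed above and that no cell contains a literal `|`; the function names and the `Open` status used in the usage example are hypothetical.

```python
from collections import Counter

COLUMNS = [
    "ID", "Area", "Screen/Route", "Mode", "Severity",
    "Description", "Repro", "Evidence", "Suggested fix", "Status",
]


def parse_issue_rows(table_md):
    """Turn Markdown table lines into dicts, skipping header and separator rows."""
    rows = []
    for line in table_md.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != len(COLUMNS):
            continue
        first = cells[0]
        if first == "ID" or set(first) <= set("-:"):
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def severity_summary(rows):
    """Count issues per severity label, always reporting P0 through P3."""
    counts = Counter(row["Severity"] for row in rows)
    return {label: counts.get(label, 0) for label in ("P0", "P1", "P2", "P3")}
```

For example, a table holding one P1 row and two P2 rows yields `{"P0": 0, "P1": 1, "P2": 2, "P3": 0}`.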

Fix phase (only AFTER all passes of the round complete)

  • Work strictly by severity: all P0 → P1 → P2 → P3.
  • One issue at a time: implement → ./gradlew :app:assembleDebug → install both → verify THAT fix live (correct device/theme) + regression smoke (launch/no-crash, send text, inbox loads, a game opens, content still ciphertext in Firestore) → flip its row to Fixed + commit (one per issue/cluster) → next. Don't start the next until the current is verified.
  • Couple-shared premium fix: replace direct isPremium() gates with CouplePremiumChecker.coupleHasPremium(partnerId) in every gated VM/screen (partner-entitlement read rule deployed). High regression risk — re-verify each feature in BOTH self-premium and free states.
  • Gated actions (entitlement toggles, deploys) are user-authorized per occurrence.
  • New issues found while fixing are logged (new ID), not silently fixed beyond scope — next re-QA round catches them.

Definition of done: a pass is done when every coverage row is pass/fail→id; a round is done when all six passes (A-F) are done; flawless = one full round with zero open P0-P2 and Passes D + E fully clean. Then stop (P3s optional). Don't re-open a clean pass within the same round.
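
The stop condition can be expressed as a single predicate over the issue log and the coverage matrix. This is a sketch under stated assumptions: issues are dicts with `id`, `severity` and `status`; an issue counts as closed only when its status is `Fixed`; "Passes D + E fully clean" is read as no open D- or E-prefixed issue of any severity; and `is_flawless` is a hypothetical name.

```python
BLOCKING = {"P0", "P1", "P2"}


def is_flawless(issues, coverage):
    """True when the round meets the Definition of Done.

    issues:   list of dicts with "id", "severity", "status"
    coverage: dict mapping a coverage row name to "todo", "pass" or "fail(...)"
    """
    open_issues = [issue for issue in issues if issue["status"] != "Fixed"]
    blocking_open = [i for i in open_issues if i["severity"] in BLOCKING]
    d_or_e_open = [i for i in open_issues if i["id"][0] in ("D", "E")]
    unfinished_rows = [row for row, state in coverage.items() if state == "todo"]
    return not blocking_open and not d_or_e_open and not unfinished_rows
```

An open P3 outside Passes D and E does not block the stop, which matches "P3s optional"; an open P3 inside D or E does, under the reading of "fully clean" assumed here.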

Re-QA loop (until flawless)

After the fix phase, re-run Pass A/B/C/D/E/F (regression + confirm fixes). Repeat fix → re-QA rounds until a full round yields zero P0-P2 and Passes D+E fully clean.