
BillTracker — Master QA Plan (living document)

Version target: v0.41.x · Executor: Claude (active) · Last updated: 2026-07-02 (Cycle 1: 13 findings fixed & archived, 0 open — incl. a broken "Send test push")

This is a living, operational QA document, not a static spec. Claude runs it, in batches, actively hunting for bugs/errors/rough edges, fixing them, and archiving each fixed finding to HISTORY.md. Update this document whenever a better approach, a new risk area, or a missed surface is discovered.

The prime directive: don't just confirm the happy path — try to break the product. Every batch should end with the tree green, the Findings Log up to date, and any fixes archived to HISTORY.md.


Table of contents

  0. Execution model: find, then fix, then repeat
  1. Batch plan & progress tracker
  2. Active Findings Log
  3. Archiving fixed findings to HISTORY.md
  4. Environment & setup
  5. Test data strategy
  6. Cross-cutting checks (every page)
  7. Batch playbooks (detailed checklists)
  8. Appendices

0. Execution model — find, then fix, then repeat

Separate finding from fixing. During a QA pass we hunt and log; we do not fix as we go (except show-stoppers, see below). Only after the whole plan has run do we enter a dedicated fix phase and fix every logged finding. Then we run the entire QA plan again from the top. Repeat until a full pass logs zero findings. Two nested loops:

  OUTER — QA CYCLE (repeat until a full pass finds zero findings)
  ┌──────────────────────────────────────────────────────────────────────┐
  │  PHASE 1 · FIND      Run every batch B0→B15 in find-only mode.        │
  │                      Probe hard, LOG everything to the Findings Log.  │
  │                      Do NOT fix (except show-stoppers).               │
  │            ↓                                                          │
  │  PHASE 2 · FIX       QA pass done. Now fix EVERY logged finding —     │
  │                      all of them (S1→IMP). Root-cause, with tests.    │
  │            ↓                                                          │
  │  PHASE 3 · VERIFY    Re-run each fix's repro; `npm run ci` green.     │
  │            ↓                                                          │
  │  PHASE 4 · ARCHIVE   Move every fixed finding to HISTORY.md (§3).     │
  │            ↓                                                          │
  │  PHASE 5 · RE-RUN    Start a new cycle at PHASE 1. If that full pass  │
  │                      logs zero findings → QA is clean, STOP.          │
  └──────────────────────────────────────────────────────────────────────┘

  INNER — per batch during PHASE 1 (find-only)
  PICK next ⬜ batch → SET UP (app, data state, role, console open) →
  PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
  mark batch status in §1 → next batch.  (No fixing here.)

Show-stopper exception. A show-stopper is a finding that blocks continued QA — the app won't boot, you can't log in, or a page crashes so hard you can't test the rest of it. Only these get fixed immediately (mid-pass), because you can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix; then continue the find pass. Everything else is logged and left for Phase 2 — no matter how tempting or trivial.

Discipline (for best results)

  • Phase 1 is log-only. Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
  • Keep each find batch tight and focused — one batch per session — so probing stays thorough.
  • Phase 2 fixes everything, not just S1/S2. Root-cause over surface patch; add/extend a test in tests/ or client/**/*.test.* for every logic bug so it can't silently return.
  • Never leave the repo red at the end of Phase 3 — npm run ci must be green before archiving.
  • Touch product behavior? Run the /verify skill on the affected flow before archiving.
  • The exit is empirical: you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you think it's clean. Log the cycle result in the Cycle Log each time.
  • Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.

1. Batch plan & progress tracker

Batches are ordered foundation-first (baseline & auth before features; features before cross-cutting; regression last). Update Status and Findings every run.

Status key: ⬜ Not started · 🔄 In progress · Done (green, findings archived) · 🔁 Needs recheck

| # | Batch | Primary surface | Data state | Status | Open / Fixed |
|---|---|---|---|---|---|
| B0 | Baseline, tooling & coverage recon | npm run ci/check, app boots, console clean, re-scan routes/pages/API vs plan & update it, control census | any | 🔄 | 0 / 1 |
| B-UI | Design-system primitives | each client/components/ui/* × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard | any | 🔄 | 0 / 0 |
| B1 | Auth & authorization | login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation | multi + single user | | 0 / 0 |
| B2 | Tracker (core) | / buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift | seeded + adversarial | | 0 / 0 |
| B3 | Bills & schedules | /bills CRUD, custom schedules, reorder, merchant rules, historical import | adversarial | | 0 / 0 |
| B4 | Subscriptions & Categories | /subscriptions, catalog, /categories, groups, reorder | seeded | | 0 / 0 |
| B5 | Reporting reconciliation | /summary, /calendar, /analytics, /health cross-check totals | seeded + large | 🔄 | 0 / 3 |
| B6 | Spending | /spending YNAB view, averages, cover-overspending, safe-to-spend | seeded + edge months | 🔄 | 0 / 1 |
| B7 | Debt planning (math) | /snowball, /payoff APR/amortization vs hand-calc | edge (APR=0, $0 debt) | 🔄 | 0 / 2 |
| B8 | Banking & bank sync | /bank-transactions, SimpleFIN sync, matching, merchant/store, advisory filter | seeded txns | | 0 / 0 |
| B9 | Data lifecycle | /data import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip | empty + seeded | 🔄 | 0 / 1 |
| B10 | Notifications & workers | email + ntfy/Gotify/Discord/Telegram, reminders, cron workers | seeded | 🔄 | 0 / 1 |
| B11 | Admin panel | users, login mode, auth methods, backups, cleanup, status, onboarding | admin | 🔄 | 0 / 0 |
| B12 | Settings, Profile & global UI | /settings, /profile, static pages, command palette, sidebar/nav | any | 🔄 | 0 / 0 |
| B13 | API / backend direct | all /api/*: auth, CSRF, validation, rate limits, error shape, IDOR, cents | via HTTP client | 🔄 | 0 / 1 |
| B14 | Non-functional | a11y, performance, PWA/offline, XSS/secrets, timezone/DST | large + adversarial | 🔄 | 0 / 3 |
| B15 | Regression & sign-off | full smoke on production build, exit criteria | seeded | 🔄 | 0 / 0 |

After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new cycle from B0 against the next build/version.

1.1 QA Cycle Log

One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A cycle is only "clean" when its find pass logged zero findings. Keep going until you get a clean cycle.

| Cycle | Started | Build / commit | Findings logged | Fixed / archived | Result |
|---|---|---|---|---|---|
| 1 | 2026-07-02 | bdbf231→(dev) | 13 | 13 → all fixed, verified & archived (…, +B10-01 broken "Send test push") | 0 open. Post seed-fix reconciliation caught the occurrence-gating family: Summary (S2), Analytics, and SimpleFIN bank-tracking all counted non-monthly bills every month; all fixed via resolveDueDate and guarded (probe reconciliation + tests/summaryBankTracking.test.js). Probed B0/B1/B3/B4/B5/B6/B7/B8/B9/B13/B14; solid: auth/isolation, CSRF, payment/date validation, recurrence, matching/dedup, subscription+spending math, XSS, calendar gating. A full re-run (B0→B15) is still required to declare the cycle clean per exit criteria. |

Result key: 🔄 in progress · 🔁 findings fixed, re-run required · clean (zero findings — QA complete)


2. Active Findings Log

This is the live log. Record every finding here the moment it's found — before fixing. Keep only Open / Fixing / Fixed rows here. Once a finding is Fixed + verified + archived to HISTORY.md, delete its row from this table (its permanent record is the changelog entry).

Finding ID: QA-B{batch}-{nn} (e.g. QA-B2-01). Severity: S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see Appendix A). Status: 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.

| ID | Sev | Area (file:line) | Summary | Status | Notes / repro |
|---|---|---|---|---|---|
| (none) | | | | | All Cycle 1 findings fixed, verified & archived to HISTORY.md v0.41.0 |
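
The ID and severity conventions above can be checked mechanically before a row goes into the log. A minimal sketch, assuming batch ids are numeric (B0 to B15; the B-UI batch would need its own rule) and that the sample rows and their severities are purely illustrative. `parseFindingId` and `sortFindings` are hypothetical helpers, not part of the codebase.

```javascript
// Severity order from the plan: S1 is most urgent, IMP least.
const SEVERITY_ORDER = ['S1', 'S2', 'S3', 'S4', 'IMP'];

// Parse "QA-B7-03" into { batch: 7, seq: 3 }, or return null if malformed.
// Assumption: numeric batches only; "B-UI" ids are not handled here.
function parseFindingId(id) {
  const match = /^QA-B(\d{1,2})-(\d{2})$/.exec(id);
  if (!match) return null;
  return { batch: Number(match[1]), seq: Number(match[2]) };
}

// Sort by severity first, then batch, then sequence number.
// Assumes every id has already passed parseFindingId.
function sortFindings(findings) {
  return [...findings].sort((a, b) => {
    const bySeverity =
      SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity);
    if (bySeverity !== 0) return bySeverity;
    const pa = parseFindingId(a.id);
    const pb = parseFindingId(b.id);
    return pa.batch - pb.batch || pa.seq - pb.seq;
  });
}

// Illustrative rows only; these severities are made up for the example.
const sorted = sortFindings([
  { id: 'QA-B14-03', severity: 'S3' },
  { id: 'QA-B7-03', severity: 'S2' },
  { id: 'QA-B2-01', severity: 'S2' },
]);
console.log(sorted.map((f) => f.id).join(', '));
```

Sorting this way gives Phase 2 a stable fix order without changing the rule that every finding gets fixed.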

Finding template (paste a new row above; keep the full write-up here until archived):

ID: QA-B?-??
Severity: S1 / S2 / S3 / S4 / IMP
Environment: browser / viewport / theme / role / auth mode / data state
Area: file:line (if known)
Steps to reproduce:
  1.
  2.
Expected:
Actual:
Evidence: console / network / DB row / screenshot
Fix: (what changed, commit) — Verified by: (repro re-run + ci)

Log console errors, failed network requests, and unhandled rejections as findings even if the UI looks fine.

All Cycle 1 write-ups have been archived to HISTORY.md v0.41.0 (see §3).


3. Archiving fixed findings to HISTORY.md

HISTORY.md is the project changelog (version-organized, emoji section headers). When a finding is Fixed and verified, write a concise entry there, then remove the row from the Active Findings Log.

Where: under the current in-progress version heading (e.g. ## v0.41.x). If a QA cycle produces several fixes, group them under a ### 🐛 QA Fixes (bug fixes) or ### 🧹 QA (polish/improvements) section, matching the existing changelog voice.

Entry format (match the terse, specific style already in HISTORY.md):

### 🐛 QA Fixes

- **[Area] Short title** — What was wrong and the user-visible impact, then the
  fix. Reference the file/function and any migration or test added.
  (was QA-B7-03)

Rules

  • One bullet per finding; include the old QA-B?-?? id in parentheses for traceability.
  • If a fix added/changed a test, say which (tests/… or client/…test.*).
  • Don't archive until the fix is verified (repro gone + npm run ci green).
  • IMP items that were implemented are archived the same way; IMP items merely noted stay in the Findings Log (or graduate to FUTURE.md/roadmap.md if deferred).

4. Environment & setup

4.1 Running the app

| Mode | Command | URL |
|---|---|---|
| Dev (API + UI, hot reload) | npm run dev | UI http://localhost:5173 (proxies API → :3000) |
| API only | npm run dev:api | http://localhost:3000 |
| Production build | npm run build then npm start | http://localhost:3000 |
| Docker | docker-compose up | per compose config |

  • Backend: Node/Express on PORT (default 3000). Frontend dev: Vite on 5173.
  • Data: SQLite at db/bills.db (WAL). Back it up before destructive tests (backups/ or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
  • Configure a dedicated test .env from .env.example. Never point tests at production data or a live SimpleFIN account with real credentials.
  • Test commands: npm run ci (check + all tests + build), npm run check (syntax + build), npm run test (server), npm run test:client (vitest).

4.2 Test matrix

Full functional pass across reasonable combinations; smoke (B15) across all.

| Dimension | Values |
|---|---|
| Browser | Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser) |
| Viewport | Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414 |
| Theme | Light, Dark, system-follow |
| Role | user, admin, default admin (first-run) |
| Auth mode | Multi-user, single-user |
| Density | Normal + compact desktop |
| Network | Online, Slow 3G, offline (PWA shell) |
| Data state | Empty, seeded demo, large/stress, adversarial |

4.3 Accounts to prepare

  • admin, user, a second user (data-isolation), a single-user-mode instance (separate DB).
  • Demo reference: guest / guest123 (do not run destructive flows on any shared demo server).

4.4 Automated E2E harness (Playwright)

Manual passes prove a button works once; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.

| Command | What it does |
|---|---|
| npm run test:e2e | run the E2E suite headless (boots the app via webServer) |
| npm run test:e2e:ui | Playwright UI mode: watch/debug interactively |
| npm run test:e2e:update | re-baseline visual-regression screenshots (review the diff before committing) |
| npm run smoke:prod | B15 production-build smoke: builds, boots node server.js (dist/), drives the real artifact so the split vendor chunks are validated at runtime |

  • Setup (one-time): npm install then npx playwright install chromium. Config: playwright.config.js; specs in e2e/.
  • Scope: the suite is a thin critical-path smoke, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
  • Don't point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.

5. Test data strategy

  • Empty: brand-new account. Every page must render a sensible empty state — no crash, no NaN, no blank white screen.
  • Seeded: use Data → Seed Demo Data for a realistic mid-size dataset.
  • Large/stress: 500+ bills, 5,000+ transactions, 24+ months history — exercises virtualization (@tanstack/react-virtual), charts, query perf.
  • Adversarial (deliberately try to break it):
    • Amounts: 0, 0.01, negative, 9,999,999.99, fractional cents.
    • Text: emoji, RTL, <script> XSS probe, 1,000-char strings, leading/trailing spaces, SQL-ish input.
    • Dates: 1st/14th/15th/31st boundaries; 28/29/30/31-day months; Feb 29; month/year crossing; inactive ranges; skipped months; overrides.
    • Transactions: duplicate amount+date, same-day merchant repeats, refunds/negatives.
    • Debt: APR 0%, very high APR, $0 balance, absurd inputs.
    • Non-UTC system timezone + a DST boundary date.
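
The date boundaries listed above can be generated instead of typed by hand. A minimal sketch, assuming a nominal due day past the end of a short month is clamped to the last day. This plan does not state whether the app clamps or skips, so `clampDueDay` is a hypothetical illustration, as are the other helper names.

```javascript
// Days in a month (month is 1-12). Day 0 of the following month is the
// last day of this one; UTC is used so the host timezone cannot interfere.
function daysInMonth(year, month) {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

// Assumption: a due day of 31 in a 30-day month falls on the 30th.
function clampDueDay(year, month, dueDay) {
  return Math.min(dueDay, daysInMonth(year, month));
}

// Boundary dates worth seeding for one year:
// the 1st, 14th, 15th and the last day of every month.
function boundaryDates(year) {
  const dates = [];
  for (let month = 1; month <= 12; month += 1) {
    for (const day of [1, 14, 15, daysInMonth(year, month)]) {
      const mm = String(month).padStart(2, '0');
      const dd = String(day).padStart(2, '0');
      dates.push(`${year}-${mm}-${dd}`);
    }
  }
  return dates;
}

console.log(clampDueDay(2027, 2, 31)); // 28
console.log(clampDueDay(2028, 2, 31)); // 29 (2028 is a leap year)
console.log(boundaryDates(2028).length); // 48
```

Seeding one bill per generated date covers the 28/29/30/31-day months and Feb 29 in a single pass.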

6. Cross-cutting checks (every page)

Run on every page during its batch — don't assume a shared component behaves the same everywhere.

Navigation & routing — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → NotFoundPage · active nav highlighted · simplefinOnly (Banking) gated · Ctrl+K palette finds & opens it.

Buttons & interactions — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · rapid repeated toggling (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then navigate away mid-flight doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).

Forms & validation — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · paste into every field (incl. "$1,234.56" into currency) · browser/password-manager autofill on login & forms · IME/composition (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).

Number inputs (you have ~45 type="number" fields, the highest-risk control type): scroll-wheel over a focused field must not silently change the value · spinner up/down buttons step correctly and respect min/max · reject e / + / exponent notation and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no NaN submitted · cents fields never accept >2 decimals.
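
A reference parser makes the expected outcome of each number-input probe explicit. This is a sketch under stated assumptions, not the app's actual validation: `parseAmountToCents` is a hypothetical name, negatives are rejected outright, and a comma is accepted only as a thousands separator, so a locale-style "1,5" is refused and never read as 15.

```javascript
// Plain digits or properly grouped thousands, then at most two decimals.
// No sign, no exponent, no second decimal point.
const AMOUNT_PATTERN = /^(\d{1,3}(,\d{3})+|\d+)(\.\d{1,2})?$/;

// Returns integer cents, or null when the text must be rejected.
// Never returns NaN, so an empty field cannot leak NaN into a request.
function parseAmountToCents(text) {
  const cleaned = text.trim().replace(/^\$/, '');
  if (!AMOUNT_PATTERN.test(cleaned)) return null;
  const [whole, fraction = ''] = cleaned.replace(/,/g, '').split('.');
  return Number(whole) * 100 + Number(fraction.padEnd(2, '0'));
}

const probes = ['$1,234.56', '007', '1.5', '', '1e3', '1.2.3', '0.001', '1,5', '+5'];
for (const probe of probes) {
  console.log(JSON.stringify(probe), '=>', parseAmountToCents(probe));
}
```

Each probe in the list maps to one check in the paragraph above: paste with currency formatting, leading zeros, empty field, exponent, multiple decimals, more than two decimals, and decimal comma.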

Per-control state matrix — for each control on the page, verify every applicable state renders and behaves in both light and dark: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).

Note — "sliders": this app has no <input type=range> sliders. The SlidersHorizontal glyph is just the Bills filter-panel button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.

States — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, ErrorBoundary shows a fallback not a white page.

Visual & responsive — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.

Data integrity — money 2-decimals, no float artifacts (9.999999) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).
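
The float-artifact check is easiest to reason about with a small worked case. The sketch below contrasts float dollars with integer cents and adds a reconciliation helper. `formatCents` and `totalsAgree` are hypothetical names, and the page totals are made-up sample values.

```javascript
// Summing dollars as floats drifts; summing integer cents does not.
const floatTotal = 0.1 + 0.2; // 0.30000000000000004, not 0.3
const centsTotal = 10 + 20; // exactly 30

// Render integer cents as a 2-decimal string without float division.
function formatCents(cents) {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(cents);
  return `${sign}${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, '0')}`;
}

// Every page must report the same integer-cent total for the same month.
function totalsAgree(totalsByPage) {
  const values = Object.values(totalsByPage);
  return values.every((v) => Number.isInteger(v) && v === values[0]);
}

console.log(floatTotal === 0.3); // false
console.log(formatCents(centsTotal)); // "0.30"
console.log(totalsAgree({ tracker: 125000, summary: 125000, analytics: 125000 })); // true
```

A non-integer total fails `totalsAgree` even when the pages happen to match, which flags a float leak as well as a disagreement.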


7. Batch playbooks (detailed checklists)

Each batch below is the detailed script for the matching row in §1. Apply §6 throughout.

B0 — Baseline, tooling & coverage recon

Run FIRST in every cycle. This is where the plan re-syncs with reality — new pages, routes, endpoints, or features added since the last cycle get discovered and folded in before testing, so coverage never silently rots.

Tooling baseline

  • npm run ci — record any failing server/client test or build error as a finding (S1/S2).
  • npm run check — server syntax + build clean.
  • App boots via npm run dev and production npm start; note startup warnings.
  • Load the app; browser console + server logs clean on first load and first navigation.
  • Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.

Coverage recon — enumerate the actual product and diff it against this plan. Run these, then compare the output to the batch playbooks (§7) and the route map:

  • Client routes: `grep -nE "<Route" client/App.jsx`. Every path present here must appear in a batch playbook and Appendix C.
  • Pages: `ls client/pages/`. Every page has an owning batch.
  • Sidebar / nav entries: `grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx`. New nav links (incl. conditional ones like simplefinOnly) are covered.
  • API route mounts: `grep -nE "app.use\('/api" server.js`. Every mounted route group is in B13's list and mapped in Appendix C.
  • Services & components: `ls services/` and `ls client/components/**/`. New service/component families have a home in a playbook.
  • UI primitives: `ls client/components/ui/`. Every shared primitive is covered by the B-UI playbook; a new primitive gets a row there.
  • Interactive-control census (makes "every button tested" provable): for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: Appendix E). A control that isn't on a checklist hasn't been tested; the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components` and `grep -rn "onClick=" client/pages/<Page>.jsx`.
  • Feature flags / conditional surfaces — search for Only, enabled, featureFlag, env gates that hide/show pages; ensure each state is tested.
  • What changed since last cycle — skim git log/HISTORY.md since the previous cycle's commit (see Cycle Log) for new features/pages.

Update the plan (do this now, not later) — for anything the recon surfaced that isn't already covered:

  • Add it to the relevant batch playbook (or create a new batch and a row in the §1 table).
  • Add/adjust its entry in Appendix C.
  • Note the plan update in the Cycle Log row for this cycle.
  • If a whole surface is missing from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.

B-UI — Design-system primitives

Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once. Drive them wherever they're already mounted (or a scratch page); run each against the per-control state matrix × light/dark × keyboard-only. One finding row per primitive.

| Primitive (client/components/ui/) | Must verify |
|---|---|
| button.jsx | every variant (default/destructive/outline/ghost/link) + size; disabled truly blocks click; loading state; focus ring; Enter/Space activate |
| input.jsx | text/number/password/date/search/file types; placeholder; disabled/read-only; error styling; paste/autofill; number-input rules above |
| select.jsx (Radix) | opens by mouse and keyboard; type-ahead; long lists scroll; onChange fires in Firefox+Safari; disabled options; value persists; Esc closes |
| checkbox.jsx / switch.jsx | toggles by click and Space; indeterminate (if used); disabled; label click toggles; controlled value round-trips |
| dialog.jsx / alert-dialog.jsx / confirm-dialog.jsx / input-dialog.jsx | open/close; focus trap + restore; Esc closes; overlay click behaves; Cancel actually cancels (no side effect); Confirm fires once; scroll-lock releases |
| dropdown-menu.jsx | keyboard arrow nav; Esc; submenu; disabled items; click-outside closes; no clipping at viewport edge |
| tabs.jsx | arrow-key nav; active state; content swaps; deep-link/refresh keeps tab (if applicable) |
| tooltip.jsx | hover and keyboard-focus show it; dismiss on blur; touch behavior; not a11y-only info trap |
| table.jsx | header/zebra/hover; horizontal scroll on narrow viewport (no page h-scroll); empty state |
| collapsible.jsx | expand/collapse animation; state persists; keyboard operable |
| sonner.jsx (toast) | success/error/loading; stack + dismiss; auto-dismiss timing; doesn't cover primary actions; announced to SR |
| save-status.jsx | idle/saving/saved/error transitions reflect real autosave (useAutoSave.test.jsx) |
| Skeleton.jsx | matches final layout (no jump); no infinite skeleton on error |
| badge.jsx / card.jsx / separator.jsx / label.jsx | contrast in dark mode; label htmlFor focuses its control; no overflow on long text |
| theme-toggle.jsx | light↔dark↔system; applied before first paint (no flash); persists across reload |

  • Every primitive above passes its row in light and dark, keyboard-only, at mobile width.
  • Axe scan (see B14) on a page densely using primitives → zero critical violations.

B1 — Auth & authorization

  • Password: valid login → correct landing (Tracker for user, /admin for default admin); wrong password → clear error, no user-enumeration timing/message difference; logout clears session; expired session redirects and preserves state.from; session persists across refresh.
  • Rate limiting: repeated failed logins throttled (loginLimiter/loginUsernameLimiter), clear message, resets.
  • TOTP: enroll (QR + secret), code accepted, backup codes work once, login prompts for TOTP, wrong code rejected+throttled, disable requires re-auth.
  • WebAuthn: register/login/remove passkey in Chrome, Firefox, Safari; password fallback works.
  • OIDC/Authentik: SSO flow creates/links account; admin config errors surface cleanly; oidcLimiter throttles.
  • Roles/guards: user blocked from /admin*, /status (redirect) and admin APIs (403); default admin forced to /admin; single-user bypass correct but admin surfaces still protected; unauth API → 401.
  • Data isolation (critical): user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans — test by ID enumeration on the API.
  • CSRF: state-changing request without a valid token → rejected.

B2 — Tracker (/)

  • Month nav (prev/next/jump), current month highlighted, data reloads per month.
  • Bills land in the correct 1–14 / 15–31 bucket by due date; pin-due sorting works.
  • Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
  • Skip excludes from totals for that month only; unskip restores.
  • Per-month amount override persists, doesn't affect base bill or other months.
  • Notes cell add/edit/clear persists per month.
  • Inactive/date-range bill doesn't show or count outside its range.
  • Balance/starting-amount cards period-aware + editable; income bills / safe-to-spend correct.
  • Overdue command center: accurate list/count, pay/skip actions work.
  • Cash flow card, drift insight, payment ledger (add/edit/delete reconciles), autopay suggestion apply/dismiss.
  • Editable cells autosave; Esc cancels; invalid input handled. Mobile rows equal desktop actions. Compact mode intact.
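
The bucket rule in the list above has one boundary worth pinning down: the 14th and the 15th must land on opposite sides. A minimal sketch of the expected behaviour follows; `bucketForDueDay` is a hypothetical helper written for this plan, not the Tracker's real implementation.

```javascript
// Which half-month bucket a due day belongs to.
// Out-of-range or non-integer days are treated as bugs, not silently bucketed.
function bucketForDueDay(day) {
  if (!Number.isInteger(day) || day < 1 || day > 31) {
    throw new RangeError(`due day out of range: ${day}`);
  }
  return day <= 14 ? '1-14' : '15-31';
}

for (const day of [1, 14, 15, 31]) {
  console.log(day, '=>', bucketForDueDay(day));
}
```

Probing with the four days above is enough to catch an off-by-one at the split.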

B3 — Bills (/bills)

  • Create with all fields (name, amount, due date, category, schedule, account, autopay, active range).
  • Edit propagates to Tracker/Summary/Calendar/Analytics; delete confirms + handles orphan payments/history.
  • Custom schedules (weekly/biweekly/monthly/quarterly/annual/custom): next-due & occurrences correct across month/year boundaries.
  • Drag reorder persists (cross-check billReorder.test.js); search/filter panel filters + clears; large-list virtualization smooth.
  • Merchant rules: create/matches/edit/delete; historical import dialog attributes month-crossing payments correctly.
  • BillModal open/close, validation, cancel discards unsaved changes.
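
For the schedule checks above, an independent oracle helps when reconciling month totals: a quarterly or annual bill should count only in the months it actually falls due. The sketch assumes a simplified schedule shape (`intervalMonths` plus an anchor year and month) that is invented for illustration. It is not the app's resolveDueDate, and it does not cover weekly or biweekly schedules.

```javascript
// Assumption: schedule = { intervalMonths, anchorYear, anchorMonth }, months 1-12.
// A bill occurs in the anchor month and every intervalMonths after it.
function occursInMonth(schedule, year, month) {
  const elapsed = (year - schedule.anchorYear) * 12 + (month - schedule.anchorMonth);
  return elapsed >= 0 && elapsed % schedule.intervalMonths === 0;
}

// Sum only the bills that fall due in the given month (amounts in cents).
function monthTotalCents(bills, year, month) {
  return bills
    .filter((bill) => occursInMonth(bill.schedule, year, month))
    .reduce((sum, bill) => sum + bill.amountCents, 0);
}

// Illustrative data only.
const bills = [
  { name: 'Rent', amountCents: 120000, schedule: { intervalMonths: 1, anchorYear: 2026, anchorMonth: 1 } },
  { name: 'Water', amountCents: 9000, schedule: { intervalMonths: 3, anchorYear: 2026, anchorMonth: 2 } },
  { name: 'Insurance', amountCents: 60000, schedule: { intervalMonths: 12, anchorYear: 2026, anchorMonth: 3 } },
];

console.log(monthTotalCents(bills, 2026, 3)); // 180000 (rent + insurance)
console.log(monthTotalCents(bills, 2026, 4)); // 120000 (rent only)
console.log(monthTotalCents(bills, 2026, 5)); // 129000 (rent + water)
```

Comparing these oracle totals with what Summary and Analytics display for the same months is a direct probe for non-monthly bills being counted every month.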

B4 — Subscriptions & Categories

  • Subscriptions: add/edit/delete, active/cancelled, renewal & annual→monthly normalization; totals feed Tracker/Summary/Analytics.
  • Catalog: browse/search, add-from-catalog pre-fills.
  • Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (categoryGroups/categoryReorder tests); colors/icons consistent on Tracker/Spending/Analytics.
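
The annual→monthly normalization is a good place for a hand-calculated expectation. The sketch below assumes simple division with rounding to the nearest cent. The rounding policy and the `monthlyEquivalentCents` name are assumptions, so treat any mismatch with the app as a question to investigate, not automatically as a bug.

```javascript
// Billing cycles expressed in months. Only the cycles shown are handled.
const MONTHS_PER_CYCLE = { monthly: 1, quarterly: 3, annual: 12 };

// Convert a per-cycle amount in cents to its monthly equivalent.
function monthlyEquivalentCents(amountCents, cycle) {
  const months = MONTHS_PER_CYCLE[cycle];
  if (!months) throw new Error(`unknown cycle: ${cycle}`);
  return Math.round(amountCents / months);
}

console.log(monthlyEquivalentCents(12000, 'annual')); // 1000
console.log(monthlyEquivalentCents(11999, 'annual')); // 1000 (12 x 1000 != 11999)
console.log(monthlyEquivalentCents(1000, 'quarterly')); // 333
```

The second case shows why totals built from normalized monthly figures can differ from the annual price by a cent or more; check which of the two each page is supposed to display.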

B5 — Reporting reconciliation

  • Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
  • Calendar plots bills/payments on correct days (timezone: a bill due on the 1st must not render on the 31st); day totals correct.
  • Analytics charts render with data AND empty (no broken SVG/NaN axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
  • Health indicators compute from real data, no crash on empty; recommendations sane.
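
The calendar off-by-one named above usually comes from parsing a date-only string as a UTC instant and then rendering it in local time. The sketch shows the mechanism and one common way around it. `parseDateOnly` is a hypothetical helper, and whether the app stores due dates as "YYYY-MM-DD" strings is an assumption.

```javascript
// Parse "YYYY-MM-DD" as a local calendar date, never as a UTC instant.
// Note: this does not reject impossible dates such as 2026-02-31.
function parseDateOnly(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) throw new Error(`not a date-only string: ${text}`);
  const [, year, month, day] = match.map(Number);
  return new Date(year, month - 1, day);
}

// A date-only ISO string is parsed as UTC midnight. In a timezone behind UTC,
// getDate() on this value reports the previous day (Dec 31 for Jan 1).
const asUtcInstant = new Date('2026-01-01');
console.log(asUtcInstant.getUTCDate()); // 1
console.log(asUtcInstant.getDate()); // 1 or 31, depending on the host timezone

// The local constructor keeps the calendar day stable in every timezone.
const asLocalDate = parseDateOnly('2026-01-01');
console.log(asLocalDate.getDate()); // 1
```

To reproduce the bug deliberately, run the app or the test with the process timezone set behind UTC and look at a bill due on the 1st.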

B6 — Spending (/spending)

  • Category-group view assigned/spent/available math correct; 3-month averages correct.
  • Cover-overspending reallocates funds correctly and is reversible.
  • Safe-to-spend matches Tracker (safeToSpend.test.js); month nav; empty/partial months handled.

B7 — Debt planning (/snowball, /payoff)

  • Add debts (balance/APR/min); snowball vs avalanche ordering correct.
  • Projection + amortization vs a hand-calculated example; APR=0 and already-paid debts correct.
  • Extra-payment/budget updates payoff date + total interest; chart renders; plan history saves/restores; status banner accurate.
  • Edge: single debt, many debts, $0 debt, negative/absurd inputs rejected.
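
A small independent calculator gives the hand-calculated reference the amortization check calls for. This is a sketch under assumptions: interest accrues monthly at APR/12 and is rounded to the nearest cent each month, which may not match the app's convention. `orderDebts` and `payoff` are hypothetical helpers, and APR is passed in basis points (1200 = 12%) to keep the arithmetic in integers.

```javascript
// Snowball pays the smallest balance first; avalanche pays the highest APR first.
function orderDebts(debts, strategy) {
  const compare = strategy === 'avalanche'
    ? (a, b) => b.aprBps - a.aprBps
    : (a, b) => a.balanceCents - b.balanceCents;
  return [...debts].sort(compare);
}

// Simulate a fixed monthly payment until the balance reaches zero.
function payoff(balanceCents, aprBps, paymentCents) {
  let balance = balanceCents;
  let months = 0;
  let interestCents = 0;
  while (balance > 0) {
    // Monthly rate = aprBps / 10000 / 12, hence the 120000 divisor.
    const interest = Math.round((balance * aprBps) / 120000);
    if (paymentCents <= interest) {
      throw new Error('payment never covers interest');
    }
    balance += interest;
    interestCents += interest;
    balance -= Math.min(paymentCents, balance);
    months += 1;
  }
  return { months, interestCents };
}

console.log(payoff(100000, 0, 10000)); // { months: 10, interestCents: 0 }
console.log(payoff(100000, 1200, 10000)); // { months: 11, interestCents: 5898 }
console.log(payoff(0, 1200, 10000)); // { months: 0, interestCents: 0 }
```

The guard against a payment that does not exceed the monthly interest doubles as the "absurd inputs rejected" probe: a zero or negative payment throws where it would otherwise loop forever.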

B8 — Banking (/bank-transactions)

  • Ledger loads/virtualizes/filters (date/account/amount/merchant/status).
  • Transaction matching (match/unmatch), auto-match review approve/reject, no double-match (transactionMatchService.test.js).
  • Merchant/store matching rules + confidence/duplicates; advisory non-bill filter flags/hides with override.
  • Matched payments reflect on Tracker/ledger without double-counting; category picker persists.

B9 — Data lifecycle (/data)

  • Imports: spreadsheet (XLSX/CSV) map/preview/commit, malformed rejected, dup/partial handled; transaction CSV (csvTransactionImportService.test.js) dedupe + parsing; SQLite user import version-checked + confirms overwrite; seed demo data safe; import history lists + rollback.
  • Exports: download SQLite round-trips (export → fresh account → import → matches); Excel export opens uncorrupted; ICS calendar feed valid in a client AND properly token-gated (route mounts before auth — verify not open).
  • Backups: manual + scheduled restorable on a scratch instance; permissions not world-readable; old backups pruned (backupAndCleanup.test.js).

B10 — Notifications & workers

  • Each channel (email/SMTP, ntfy, Gotify, Discord, Telegram): test message delivers; bad token/URL → clear error, logged, no secret leak.
  • Reminders fire at configured lead time for upcoming/overdue; no duplicates; paid/skipped excluded; respects per-user prefs.
  • Workers: dailyWorker, bankSyncWorker (interval + guardrails), backupScheduler run on schedule; errors caught/logged, don't crash server, next run unblocked.

B11 — Admin panel (/admin)

  • Onboarding wizard completes without a broken state.
  • Users table: add/edit-role/reset-pw/disable/delete; cannot remove the last admin.
  • Login mode switch single↔multi verified live, no lockout; auth-methods enable/disable + bad config surfaced.
  • Email notif config + test send; bank sync admin (configure/manual/auto/status/revoke).
  • Backups create/list/download/restore/delete; cleanup panel previews impact + confirms (counts match backupAndCleanup.test.js).
  • Privacy admin edits reflect on public /privacy; system status metrics/versions/jobs accurate (statusService.test.js); admin actions rate-limited + audited (auditService — spot-check log).

B12 — Settings, Profile & global UI

  • Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
  • Profile: change password (current required, invalidates sessions), manage 2FA/passkeys, sessions revoke (profileRoute.test.js).
  • Static: About (public + admin, version shown), Privacy, Release Notes (dialog once per user, dismiss persists), Roadmap (admin), NotFound friendly + way home.
  • Global: command palette (Ctrl+K) search/keyboard/Esc, hidden for default admin; sidebar collapse/expand + mobile overlay (check overflow issue in docs/UI_IMPROVEMENTS.md); toasts stack/dismiss; page transitions no flash/double-fetch; theme applied before first paint.

B13 — API / backend direct

Route groups: auth, auth/oidc, admin, tracker, bills, subscriptions, payments, data-sources, transactions, matches, categories, settings, user, calendar, summary, monthly-starting-amounts, analytics, spending, snowball, notifications, status, about, about-admin, privacy, version, profile, export, import/imports.

  • Auth: unauth → 401, wrong role → 403, right role → 200.
  • CSRF: state-changing without valid token rejected; with token succeeds (middleware/csrf.js).
  • Validation: bad/missing body → structured 4xx (middleware/errorFormatter.js, utils/apiError.js), never a raw 500 stack.
  • IDOR/isolation: other user's resource by id → 403/404, no leak.
  • Rate limits: login/admin/export/import/OIDC limiters trigger + reset (middleware/rateLimiter.js).
  • Money in integer cents end-to-end (per docs/cents-migration-plan.md); API and DB agree; no float drift.
  • Idempotency: repeated create doesn't duplicate; concurrent edits resolve sanely.
  • Consistent error JSON + correct status codes; security headers present (middleware/securityHeaders.js); public routes (about/privacy/version/calendar feed) leak nothing sensitive.

B14 — Non-functional

  • a11y (manual): keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix aria-*); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only.
  • a11y (automated): run axe-core on every page (@axe-core/playwright, or jest-axe for component-level) — zero critical/serious violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once.
  • Visual regression: capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright toHaveScreenshot); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed.
  • Performance: initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query staleTime).
  • PWA/offline: installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
  • Security spot-checks: XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom MarkdownText renderer — https-only link hrefs, no dangerouslySetInnerHTML anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies HttpOnly/Secure/SameSite; encryptionService protects at-rest secrets, keys not committed. (Depth: SECURITY_AUDIT.md.)
  • Resilience: kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
  • Fault injection (systematic): with a request-interception harness (Playwright page.route, or DevTools network overrides), force each page's API calls to 401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON and confirm the UI shows a recoverable error (toast or ErrorBoundary fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally — each page handles failure differently.
  • Timezone/locale: non-UTC tz + DST boundary — due dates and calendar stay correct.
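
The fault-injection bullet above could be driven from a single fault table, sketched below. `FAULTS` and `applyFault` are hypothetical names, and the `{ error: ... }` response bodies are an assumption about the error JSON shape, not the API's confirmed contract. The only Playwright calls relied on are `route.fulfill` and `route.abort`.

```javascript
// Hypothetical fault table: one entry per failure mode named in the checklist.
const FAULTS = [
  { name: "401 mid-session", status: 401, body: { error: "Unauthorized" } },
  { name: "403", status: 403, body: { error: "Forbidden" } },
  { name: "429", status: 429, body: { error: "Too many requests" } },
  { name: "500", status: 500, body: { error: "Internal error" } },
  { name: "network timeout", abort: "timedout" },
  { name: "malformed JSON", status: 200, rawBody: "{not json" },
];

// Apply one fault to an intercepted request. `route` is expected to be a
// Playwright Route; any object with fulfill() and abort() works the same way.
function applyFault(route, fault) {
  if (fault.abort) return route.abort(fault.abort);
  return route.fulfill({
    status: fault.status,
    contentType: "application/json",
    // rawBody lets a fault send deliberately broken JSON.
    body: fault.rawBody ?? JSON.stringify(fault.body),
  });
}
```

In a Playwright spec this would be wired up once per page and per fault, for example `await page.route("**/api/summary", (route) => applyFault(route, fault))` followed by `page.goto("/summary")` and an assertion on that page's own error UI. Generating one test per (page, fault) pair keeps the "per page, not once globally" requirement visible in the test report.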

B15 — Regression & sign-off

Run on the production build (npm start), not dev:

  • npm run ci green. Log in as user and admin.
  • npm run test:e2e green (Playwright smoke + axe + visual-regression baselines match, §4.4).
  • Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
  • Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
  • Snowball: add debt → projection. Data: seed → export → import round-trip (scratch DB).
  • Admin: open panel, users, system status, run a backup. Banking loads + matches (if SimpleFIN configured).
  • Notifications: one test message on configured channel. Toggle dark mode; mobile viewport; Ctrl+K navigates.
  • Bogus URL → 404; logout → login redirect. Console clean throughout.
  • Confirm exit criteria.

8. Appendices

Appendix A — Severity definitions

| Level | Definition |
| --- | --- |
| S1 Critical | Data loss/corruption, security hole, crash/blank page, wrong money math, cannot log in/save. |
| S2 Major | Feature broken/unusable, wrong results, broken navigation, unhandled error. |
| S3 Minor | Works but wrong edge behavior, confusing UX, missing validation message. |
| S4 Cosmetic | Visual/copy/alignment/dark-mode-contrast, non-blocking. |
| IMP Improvement | Not a bug; enhancement or polish idea. |

Appendix B — Exit / sign-off criteria

A cycle is release-ready when:

  • All batches B0→B15 run on the primary matrix (Chrome desktop + mobile, light + dark, user + admin).
  • B15 smoke green on the production build.
  • Zero open S1/S2 in the Findings Log; S3/S4/IMP triaged.
  • npm run ci green; no new console errors.
  • Data export→import round-trip verified with no loss.
  • Auth/authorization + data-isolation all pass.
  • Money and date/period correctness verified vs hand-calculated examples.
  • All fixes for the cycle archived to HISTORY.md; cycle summary recorded (date, build/commit, environment).
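
The Findings Log criterion above (zero open S1/S2; S3/S4/IMP triaged) is mechanical enough to check in code. The sketch below assumes a hypothetical finding shape `{ id, severity, status, triaged }`; the real Findings Log format may differ, and the other exit criteria still need their own checks.

```javascript
// Hypothetical gate for the Findings Log criterion only.
// Assumed finding shape: { id, severity, status, triaged }.
function releaseBlockers(findings) {
  const open = findings.filter((f) => f.status === "open");
  // Any open S1 or S2 blocks sign-off outright.
  const blockers = open.filter((f) => f.severity === "S1" || f.severity === "S2");
  // Lower severities may stay open, but only once they have been triaged.
  const untriaged = open.filter(
    (f) => ["S3", "S4", "IMP"].includes(f.severity) && !f.triaged
  );
  return {
    blockers,
    untriaged,
    ready: blockers.length === 0 && untriaged.length === 0,
  };
}
```

Returning the offending findings rather than a bare boolean makes the cycle summary easier to write: the same call lists what still stands between the cycle and sign-off.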

Appendix C — Page ↔ route ↔ API quick map

| Page | Route | Primary API |
| --- | --- | --- |
| Tracker | / | /api/tracker, /api/bills, /api/payments, /api/monthly-starting-amounts |
| Calendar | /calendar | /api/calendar |
| Summary | /summary | /api/summary |
| Bills | /bills | /api/bills, /api/categories, /api/matches |
| Subscriptions / Catalog | /subscriptions, /subscriptions/catalog | /api/subscriptions |
| Categories | /categories | /api/categories |
| Health | /health | /api/analytics, /api/summary |
| Analytics | /analytics | /api/analytics |
| Spending | /spending | /api/spending |
| Banking | /bank-transactions | /api/transactions, /api/matches, /api/data-sources |
| Snowball / Payoff | /snowball, /payoff | /api/snowball |
| Settings | /settings | /api/settings, /api/notifications |
| Profile | /profile | /api/profile, /api/user |
| Data | /data | /api/import, /api/export, /api/data-sources |
| Admin | /admin, /admin/status | /api/admin, /api/status, /api/about-admin |
| About / Privacy / Release Notes / Roadmap | /about, /privacy, /release-notes, /roadmap | /api/about, /api/privacy, /api/version |

Appendix D — Reference docs

SECURITY_AUDIT.md (security depth) · docs/UI_IMPROVEMENTS.md (known UI issues) · docs/cents-migration-plan.md (money-as-cents) · docs/SIMPLEFIN_CONSUMER_GUARDRAILS.md (sync limits) · docs/CSRF-SPA-Setup.md, docs/RATE_LIMITING_ENHANCEMENT.md (security middleware) · REVIEW.md, DEVELOPMENT_LOG.md, roadmap.md, FUTURE.md (context/known gaps) · HISTORY.md (changelog / fix archive) · playwright.config.js + e2e/ (automated E2E/visual/a11y harness, §4.4).

Appendix E — Per-page control census

The completeness ledger behind "every button, textbox, slider is right." Fill one table per page during B0 and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx + grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx.
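
Where a scripted starting list is more convenient than grep output, the same two patterns can be applied in Node. `listControls` is a hypothetical helper, not part of the repo; it is a regex scan rather than a JSX parser, so treat its output as a seed for the census and review it by hand.

```javascript
// Hypothetical helper mirroring the two grep patterns above.
const CONTROL_PATTERNS = [
  /type=["'][a-z]+["']/g,
  /onClick=|<Button|<Select|<Switch|<Checkbox/g,
];

// Return one { line, control } entry per match, with 1-based line numbers.
function listControls(source) {
  const hits = [];
  source.split("\n").forEach((text, index) => {
    for (const pattern of CONTROL_PATTERNS) {
      for (const match of text.matchAll(pattern)) {
        hits.push({ line: index + 1, control: match[0] });
      }
    }
  });
  return hits;
}

const sample = '<Button onClick={pay}>Pay</Button>\n<input type="number" />';
console.log(listControls(sample).map((hit) => `${hit.line}: ${hit.control}`));
```

Feeding it the text of a page file (for instance via `fs.readFileSync`) gives one candidate row per hit for that page's census table.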

Template (copy per page):

| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |
| --- | --- | --- | --- | --- | --- |
| e.g. Quick-pay button | button | marks bill paid, updates balance cards, undo available | default ✓ · disabled-while-saving ✓ | Enter | ✓ / finding id |
| e.g. Amount input | number | per-month override, cents only, no wheel-scroll change | default ✓ · error-on-letters ✓ | Tab/Esc | ✓ / finding id |

Pages to census (from client/pages/, keep in sync with Appendix C): Tracker, Calendar, Summary, Bills, Subscriptions, SubscriptionCatalog, Categories, Health, Analytics, Spending, Snowball, Payoff, BankTransactions, Data, Settings, Profile, Admin, Status, About, Privacy, ReleaseNotes, Roadmap, Login, NotFound — plus the shared Sidebar/command-palette/header chrome once.