BillTracker — Master QA Plan (living document)

Version target: v0.41.x · Executor: Claude (active) · Last updated: 2026-07-02

This is a living, operational QA document, not a static spec. Claude runs it, in batches, actively hunting for bugs/errors/rough edges, fixing them, and archiving each fixed finding to HISTORY.md. Update this document whenever a better approach, a new risk area, or a missed surface is discovered.

The prime directive: don't just confirm the happy path — try to break the product. Every batch should end with the tree green, the Findings Log up to date, and any fixes archived to HISTORY.md.

Execution model — find, then fix, then repeat
Batch plan & progress tracker
Active Findings Log
Archiving fixed findings to HISTORY.md
Environment & setup
Test data strategy
Cross-cutting checks (every page)
Batch playbooks (detailed checklists)
Appendices

0. Execution model — find, then fix, then repeat

Separate finding from fixing. During a QA pass we hunt and log — we do not fix as we go (except show-stoppers, see below). Only after the whole plan has run do we enter a dedicated fix phase and fix every logged finding. Then we run the entire QA plan again from the top. Repeat until a full pass finds zero errors. Two nested loops:

  OUTER — QA CYCLE (repeat until a full pass finds zero findings)
  ┌──────────────────────────────────────────────────────────────────────┐
  │  PHASE 1 · FIND      Run every batch B0→B15 in find-only mode.        │
  │                      Probe hard, LOG everything to the Findings Log.  │
  │                      Do NOT fix (except show-stoppers).               │
  │            ↓                                                          │
  │  PHASE 2 · FIX       QA pass done. Now fix EVERY logged finding —     │
  │                      all of them (S1→IMP). Root-cause, with tests.    │
  │            ↓                                                          │
  │  PHASE 3 · VERIFY    Re-run each fix's repro; `npm run ci` green.     │
  │            ↓                                                          │
  │  PHASE 4 · ARCHIVE   Move every fixed finding to HISTORY.md (§3).     │
  │            ↓                                                          │
  │  PHASE 5 · RE-RUN    Start a new cycle at PHASE 1. If that full pass  │
  │                      logs zero findings → QA is clean, STOP.          │
  └──────────────────────────────────────────────────────────────────────┘

  INNER — per batch during PHASE 1 (find-only)
  PICK next ⬜ batch → SET UP (app, data state, role, console open) →
  PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
  mark batch status in §1 → next batch.  (No fixing here.)

Show-stopper exception. A show-stopper is a finding that blocks continued QA — the app won't boot, you can't log in, or a page crashes so hard you can't test the rest of it. Only these get fixed immediately (mid-pass), because you can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix; then continue the find pass. Everything else is logged and left for Phase 2 — no matter how tempting or trivial.

Discipline (for best results)

Phase 1 is log-only. Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
Keep each find batch tight and focused — one batch per session — so probing stays thorough.
Phase 2 fixes everything, not just S1/S2. Root-cause over surface patch; add/extend a test in tests/ or client/**/*.test.* for every logic bug so it can't silently return.
Never leave the repo red at the end of Phase 3 — npm run ci must be green before archiving.
Touch product behavior? Run the /verify skill on the affected flow before archiving.
The exit is empirical: you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you think it's clean. Log the cycle result in the Cycle Log each time.
Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.

1. Batch plan & progress tracker

Batches are ordered foundation-first (baseline & auth before features; features before cross-cutting; regression last). Update Status and Findings every run.

Status key: ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck

#	Batch	Primary surface	Data state	Status	Open / Fixed
B0	Baseline, tooling & coverage recon	`npm run ci`/`check`, app boots, console clean, re-scan routes/pages/API vs plan & update it, control census	any	🔄	0 / 1
B-UI	Design-system primitives	each `client/components/ui/*` × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard	any	⬜	0 / 0
B1	Auth & authorization	login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation	multi + single user	⬜	0 / 0
B2	Tracker (core)	`/` buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift	seeded + adversarial	⬜	0 / 0
B3	Bills & schedules	`/bills` CRUD, custom schedules, reorder, merchant rules, historical import	adversarial	⬜	0 / 0
B4	Subscriptions & Categories	`/subscriptions`, catalog, `/categories`, groups, reorder	seeded	⬜	0 / 0
B5	Reporting reconciliation	`/summary`, `/calendar`, `/analytics`, `/health` cross-check totals	seeded + large	⬜	0 / 0
B6	Spending	`/spending` YNAB view, averages, cover-overspending, safe-to-spend	seeded + edge months	🔄	0 / 1
B7	Debt planning (math)	`/snowball`, `/payoff` APR/amortization vs hand-calc	edge (APR=0, $0 debt)	🔄	0 / 2
B8	Banking & bank sync	`/bank-transactions`, SimpleFIN sync, matching, merchant/store, advisory filter	seeded txns	⬜	0 / 0
B9	Data lifecycle	`/data` import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip	empty + seeded	🔄	0 / 1
B10	Notifications & workers	email + ntfy/Gotify/Discord/Telegram, reminders, cron workers	seeded	⬜	0 / 0
B11	Admin panel	users, login mode, auth methods, backups, cleanup, status, onboarding	admin	⬜	0 / 0
B12	Settings, Profile & global UI	`/settings`, `/profile`, static pages, command palette, sidebar/nav	any	⬜	0 / 0
B13	API / backend direct	all `/api/*`: auth, CSRF, validation, rate limits, error shape, IDOR, cents	via HTTP client	🔄	0 / 1
B14	Non-functional	a11y, performance, PWA/offline, XSS/secrets, timezone/DST	large + adversarial	🔄	0 / 3
B15	Regression & sign-off	full smoke on production build, exit criteria	seeded	⬜	0 / 0

After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new cycle from B0 against the next build/version.

1.1 QA Cycle Log

One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A cycle is only "clean" when its find pass logged zero findings. Keep going until you get a clean cycle.

Cycle	Started	Build / commit	Findings logged	Fixed / archived	Result
1	2026-07-02	`bdbf231`→`98c8fab` (dev)	9	9 → all fixed & archived (B9-01, B13-01, B6-01, B7-01, B7-02, B14-01, B14-02, B14-03, B0-01)	🔁 all findings fixed — 0 open; re-run required for a clean pass. Probed B0/B1/B3/B4/B6/B7/B8/B9/B13/B14. Solid: auth-isolation, CSRF, payment/date validation, recurrence (quarterly/annual gating, Feb-31, leap year), transaction matching/dedup, subscription+spending math, XSS. Fixed: seed 100× cents (S2), bill-amount validation, both money-rounding/format bugs, all a11y (8/8 axe), bundle split, unused-dep + dead-code removal.

Result key: 🔄 in progress · 🔁 findings fixed, re-run required · ✅ clean (zero findings — QA complete)

2. Active Findings Log

This is the live log. Record every finding here the moment it's found — before fixing. Keep only Open / Fixing / Fixed rows here. Once a finding is Fixed + verified + archived to HISTORY.md, delete its row from this table (its permanent record is the changelog entry).

Finding ID: QA-B{batch}-{nn} (e.g. QA-B2-01). Severity: S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see Appendix A). Status: 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.

ID	Sev	Area (`file:line`)	Summary	Status	Notes / repro
(none — all Cycle 1 findings fixed & archived to `HISTORY.md` v0.41.0)

Finding template (paste a new row above; keep the full write-up here until archived):

ID: QA-B?-??
Severity: S1 / S2 / S3 / S4 / IMP
Environment: browser / viewport / theme / role / auth mode / data state
Area: file:line (if known)
Steps to reproduce:
  1.
  2.
Expected:
Actual:
Evidence: console / network / DB row / screenshot
Fix: (what changed, commit) — Verified by: (repro re-run + ci)

Log console errors, failed network requests, and unhandled rejections as findings even if the UI looks fine.

All Cycle 1 write-ups have been archived to HISTORY.md v0.41.0 (see §3).

3. Archiving fixed findings to HISTORY.md

HISTORY.md is the project changelog (version-organized, emoji section headers). When a finding is Fixed and verified, write a concise entry there, then remove the row from the Active Findings Log.

Where: under the current in-progress version heading (e.g. ## v0.41.x). If a QA cycle produces several fixes, group them under a ### 🐛 QA Fixes (bug fixes) or ### 🧹 QA (polish/improvements) section, matching the existing changelog voice.

Entry format (match the terse, specific style already in HISTORY.md):

### 🐛 QA Fixes

- **[Area] Short title** — What was wrong and the user-visible impact, then the
  fix. Reference the file/function and any migration or test added.
  (was QA-B7-03)

Rules

One bullet per finding; include the old QA-B?-?? id in parentheses for traceability.
If a fix added/changed a test, say which (tests/… or client/…test.*).
Don't archive until the fix is verified (repro gone + npm run ci green).
IMP items that were implemented are archived the same way; IMP items merely noted stay in the Findings Log (or graduate to FUTURE.md/roadmap.md if deferred).

4. Environment & setup

4.1 Running the app

Mode	Command	URL
Dev (API + UI, hot reload)	`npm run dev`	UI `http://localhost:5173` (proxies API → `:3000`)
API only	`npm run dev:api`	`http://localhost:3000`
Production build	`npm run build` then `npm start`	`http://localhost:3000`
Docker	`docker-compose up`	per compose config

Backend: Node/Express on PORT (default 3000). Frontend dev: Vite on 5173.
Data: SQLite at db/bills.db (WAL). Back it up before destructive tests (backups/ or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
Configure a dedicated test .env from .env.example. Never point tests at production data or a live SimpleFIN account with real credentials.
Test commands: npm run ci (check + all tests + build), npm run check (syntax + build), npm run test (server), npm run test:client (vitest).

4.2 Test matrix

Run the full functional pass across a reasonable subset of these combinations; run the B15 smoke across all of them.

Dimension	Values
Browser	Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser)
Viewport	Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414
Theme	Light, Dark, system-follow
Role	`user`, `admin`, default admin (first-run)
Auth mode	Multi-user, single-user
Density	Normal + compact desktop
Network	Online, Slow 3G, offline (PWA shell)
Data state	Empty, seeded demo, large/stress, adversarial

4.3 Accounts to prepare

admin, user, a second user (data-isolation), a single-user-mode instance (separate DB).
Demo reference: guest / guest123 (do not run destructive flows on any shared demo server).

4.4 Automated E2E harness (Playwright)

Manual passes prove a button works once; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (batch B14) are wired so they re-run every cycle for free.

Command	What it does
`npm run test:e2e`	run the E2E suite headless (boots the app via `webServer`)
`npm run test:e2e:ui`	Playwright UI mode — watch/debug interactively
`npm run test:e2e:update`	re-baseline visual-regression screenshots (review the diff before committing)

Setup (one-time): npm install then npx playwright install chromium. Config: playwright.config.js; specs in e2e/.
Scope: the suite is a thin critical-path smoke, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
Don't point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.

5. Test data strategy

Empty: brand-new account. Every page must render a sensible empty state — no crash, no NaN, no blank white screen.
Seeded: use Data → Seed Demo Data for a realistic mid-size dataset.
Large/stress: 500+ bills, 5,000+ transactions, 24+ months history — exercises virtualization (@tanstack/react-virtual), charts, query perf.
Adversarial (deliberately try to break it):
- Amounts: 0, 0.01, negative, 9,999,999.99, fractional cents.
- Text: emoji, RTL, <script> XSS probe, 1,000-char strings, leading/trailing spaces, SQL-ish input.
- Dates: 1st/14th/15th/31st boundaries; 28/29/30/31-day months; Feb 29; month/year crossing; inactive ranges; skipped months; overrides.
- Transactions: duplicate amount+date, same-day merchant repeats, refunds/negatives.
- Debt: APR 0%, very high APR, $0 balance, absurd inputs.
- Non-UTC system timezone + a DST boundary date.
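
The adversarial inputs above can be kept as one reusable fixture so that every batch probes with the same values. This is a minimal sketch: the object name, field names, and concrete sample strings are illustrative assumptions, not taken from the BillTracker codebase.

```javascript
// Hypothetical fixture of adversarial inputs (names and samples are illustrative).
const adversarialInputs = {
  // Amounts as they would be typed into a currency field.
  amounts: ["0", "0.01", "-25.00", "9999999.99", "10.005"],
  text: {
    emoji: "Rent 🏠💸",
    rtl: "فاتورة الكهرباء", // Arabic, right-to-left
    xssProbe: "<script>alert('xss')</script>",
    longString: "x".repeat(1000),
    padded: "   Netflix   ",
    sqlish: "Robert'); DROP TABLE bills;--",
  },
  // Day-of-month boundaries plus a leap day.
  dates: ["2026-01-01", "2026-01-14", "2026-01-15", "2026-01-31", "2024-02-29"],
};

// Flatten the text samples into [label, value] pairs for table-driven probing.
function textCases(inputs) {
  return Object.entries(inputs.text);
}
```

Iterating over `textCases(adversarialInputs)` gives one labeled probe per text class, which makes it easy to note in a finding exactly which input broke a field.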

6. Cross-cutting checks (every page)

Run on every page during its batch — don't assume a shared component behaves the same everywhere.

Navigation & routing — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → NotFoundPage · active nav highlighted · simplefinOnly (Banking) gated · Ctrl+K palette finds & opens it.

Buttons & interactions — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · rapid repeated toggling (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then navigate away mid-flight doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).

Forms & validation — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · paste into every field (incl. "$1,234.56" into currency) · browser/password-manager autofill on login & forms · IME/composition (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).

Number inputs (you have ~45 type="number" fields — the highest-risk control type) — scroll-wheel over a focused field must not silently change the value · spinner up/down buttons step correctly and respect min/max · reject e, +, exponent notation, and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no NaN submitted · cents fields never accept >2 decimals.
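
As a reference for what "correct" looks like when probing these fields, here is a sketch of a strict parser that turns a typed amount into integer cents. The function name is hypothetical and the acceptance rules are assumptions drawn from the checks above: this version rejects signs, exponents, thousands separators, and more than two decimals, and it tolerates leading zeros. A locale decimal comma or a pasted "$1,234.56" would need a separate normalization step before reaching it.

```javascript
// Hypothetical helper, not taken from the BillTracker codebase.
// Returns integer cents, or null when the input must be rejected.
function parseCurrencyToCents(raw) {
  if (typeof raw !== "string") return null;
  const text = raw.trim();
  // Digits, then optionally a dot followed by one or two digits.
  // No sign, no exponent, no commas, no second decimal point.
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(text);
  if (!match) return null; // covers "", "1e3", "+5", "-5", "1.2.3", "1.234"
  const dollars = Number(match[1]);
  // Pad "5" to "50" so that "12.5" means 12 dollars and 50 cents.
  const cents = Number((match[2] || "").padEnd(2, "0"));
  const total = dollars * 100 + cents;
  return Number.isSafeInteger(total) ? total : null;
}
```

Returning `null` rather than `NaN` for an empty or invalid field keeps the "no NaN submitted" rule easy to assert in a test.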

Per-control state matrix — for each control on the page, verify every applicable state renders and behaves in both light and dark: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).

Note — "sliders": this app has no <input type=range> sliders. The SlidersHorizontal glyph is just the Bills filter-panel button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.

States — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, ErrorBoundary shows a fallback not a white page.

Visual & responsive — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.

Data integrity — money 2-decimals, no float artifacts (9.999999) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).
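
The float-artifact check is easiest to reason about with a concrete case. The sketch below shows why summing dollar floats drifts and how integer cents avoid it; `sumCents` and `formatCents` are hypothetical helper names, not functions from the codebase.

```javascript
// Binary floats cannot represent most decimal fractions exactly.
const floatTotal = 0.1 + 0.2; // 0.30000000000000004, not 0.3

// Sum integer cents instead; integer addition is exact within the safe range.
function sumCents(amounts) {
  return amounts.reduce((total, cents) => total + cents, 0);
}

// Render integer cents as a two-decimal string without float division.
function formatCents(cents) {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const dollars = Math.floor(abs / 100);
  const remainder = String(abs % 100).padStart(2, "0");
  return `${sign}${dollars}.${remainder}`;
}
```

When cross-checking a total between Tracker, Summary, and Analytics, comparing the integer cents values rather than the formatted strings rules out formatting differences as the cause of a mismatch.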

7. Batch playbooks (detailed checklists)

Each batch below is the detailed script for the matching row in §1. Apply §6 throughout.

B0 — Baseline, tooling & coverage recon

Run FIRST in every cycle. This is where the plan re-syncs with reality — new pages, routes, endpoints, or features added since the last cycle get discovered and folded in before testing, so coverage never silently rots.

Tooling baseline

npm run ci — record any failing server/client test or build error as a finding (S1/S2).
npm run check — server syntax + build clean.
App boots via npm run dev and production npm start; note startup warnings.
Load the app; browser console + server logs clean on first load and first navigation.
Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.

Coverage recon — enumerate the actual product and diff it against this plan. Run these, then compare the output to the batch playbooks (§7) and the route map:

Client routes — grep -nE "<Route" client/App.jsx — every path present here must appear in a batch playbook and Appendix C.
Pages — ls client/pages/ — every page has an owning batch.
Sidebar / nav entries — grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx — new nav links (incl. conditional ones like simplefinOnly) are covered.
API route mounts — grep -nE "app.use\('/api" server.js — every mounted route group is in B13's list and mapped in Appendix C.
Services & components — ls services/ and ls client/components/**/ — new service/component families have a home in a playbook.
UI primitives — ls client/components/ui/ — every shared primitive is covered by the B-UI playbook; a new primitive gets a row there.
Interactive-control census (makes "every button tested" provable) — for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: Appendix E). A control that isn't on a checklist hasn't been tested — the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components and grep -rn "onClick=" client/pages/<Page>.jsx.
Feature flags / conditional surfaces — search for Only, enabled, featureFlag, env gates that hide/show pages; ensure each state is tested.
What changed since last cycle — skim git log/HISTORY.md since the previous cycle's commit (see Cycle Log) for new features/pages.
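
The route part of the recon can be turned into a mechanical diff. The sketch below mirrors the `<Route` grep: it extracts literal `path` values and reports any that no playbook covers. The function names, the sample component names, and the `/reports` route are illustrative assumptions; in practice the source string would be read from client/App.jsx and the covered list maintained alongside Appendix C.

```javascript
// Pull literal path="..." values from lines that contain a <Route element.
// Routes with computed paths (path={...}) are not detected by this sketch.
function extractRoutePaths(jsxSource) {
  const paths = [];
  for (const line of jsxSource.split("\n")) {
    if (!line.includes("<Route")) continue;
    const match = /path=["']([^"']+)["']/.exec(line);
    if (match) paths.push(match[1]);
  }
  return paths;
}

// Return the routes that no playbook claims yet.
function uncoveredRoutes(routePaths, coveredPaths) {
  const covered = new Set(coveredPaths);
  return routePaths.filter((path) => !covered.has(path));
}

// Illustrative input only; component names and "/reports" are made up.
const sampleSource = [
  '<Route path="/bills" element={<BillsPage />} />',
  '<Route path="/spending" element={<SpendingPage />} />',
  '<Route path="/reports" element={<ReportsPage />} />',
].join("\n");

const missing = uncoveredRoutes(extractRoutePaths(sampleSource), ["/bills", "/spending"]);
```

Any entry in `missing` is a route that needs an owning batch before testing starts.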

Update the plan (do this now, not later) — for anything the recon surfaced that isn't already covered:

Add it to the relevant batch playbook (or create a new batch and a row in the §1 table).
Add/adjust its entry in Appendix C.
Note the plan update in the Cycle Log row for this cycle.
If a whole surface is missing from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.

B-UI — Design-system primitives

Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once. Drive them wherever they're already mounted (or a scratch page); run each against the per-control state matrix × light/dark × keyboard-only. One finding row per primitive.

Primitive (`client/components/ui/`)	Must verify
`button.jsx`	every variant (default/destructive/outline/ghost/link) + size; disabled truly blocks click; loading state; focus ring; Enter/Space activate
`input.jsx`	text/number/password/date/search/file types; placeholder; disabled/read-only; error styling; paste/autofill; number-input rules above
`select.jsx` (Radix)	opens by mouse and keyboard; type-ahead; long lists scroll; onChange fires in Firefox+Safari; disabled options; value persists; Esc closes
`checkbox.jsx` / `switch.jsx`	toggles by click and Space; indeterminate (if used); disabled; label click toggles; controlled value round-trips
`dialog.jsx` / `alert-dialog.jsx` / `confirm-dialog.jsx` / `input-dialog.jsx`	open/close; focus trap + restore; Esc closes; overlay click behaves; Cancel actually cancels (no side effect); Confirm fires once; scroll-lock releases
`dropdown-menu.jsx`	keyboard arrow nav; Esc; submenu; disabled items; click-outside closes; no clipping at viewport edge
`tabs.jsx`	arrow-key nav; active state; content swaps; deep-link/refresh keeps tab (if applicable)
`tooltip.jsx`	hover and keyboard-focus show it; dismiss on blur; touch behavior; essential info is never available only in the tooltip (a11y trap)
`table.jsx`	header/zebra/hover; horizontal scroll on narrow viewport (no page h-scroll); empty state
`collapsible.jsx`	expand/collapse animation; state persists; keyboard operable
`sonner.jsx` (toast)	success/error/loading; stack + dismiss; auto-dismiss timing; doesn't cover primary actions; announced to SR
`save-status.jsx`	idle/saving/saved/error transitions reflect real autosave (`useAutoSave.test.jsx`)
`Skeleton.jsx`	matches final layout (no jump); no infinite skeleton on error
`badge.jsx` / `card.jsx` / `separator.jsx` / `label.jsx`	contrast in dark mode; label `htmlFor` focuses its control; no overflow on long text
`theme-toggle.jsx`	light↔dark↔system; applied before first paint (no flash); persists across reload

Every primitive above passes its row in light and dark, keyboard-only, at mobile width.
Axe scan (see B14) on a page densely using primitives → zero critical violations.

B1 — Auth & authorization

Password: valid login → correct landing (Tracker for user, /admin for default admin); wrong password → clear error, no user-enumeration timing/message difference; logout clears session; expired session redirects and preserves state.from; session persists across refresh.
Rate limiting: repeated failed logins throttled (loginLimiter/loginUsernameLimiter), clear message, resets.
TOTP: enroll (QR + secret), code accepted, backup codes work once, login prompts for TOTP, wrong code rejected+throttled, disable requires re-auth.
WebAuthn: register/login/remove passkey in Chrome, Firefox, Safari; password fallback works.
OIDC/Authentik: SSO flow creates/links account; admin config errors surface cleanly; oidcLimiter throttles.
Roles/guards: user blocked from /admin*, /status (redirect) and admin APIs (403); default admin forced to /admin; single-user bypass correct but admin surfaces still protected; unauth API → 401.
Data isolation (critical): user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans — test by ID enumeration on the API.
CSRF: state-changing request without a valid token → rejected.

B2 — Tracker (`/`)

Month nav (prev/next/jump), current month highlighted, data reloads per month.
Bills land in correct 1–14 / 15–31 bucket by due date; pin-due sorting works.
Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
Skip excludes from totals for that month only; unskip restores.
Per-month amount override persists, doesn't affect base bill or other months.
Notes cell add/edit/clear persists per month.
Inactive/date-range bill doesn't show or count outside its range.
Balance/starting-amount cards period-aware + editable; income − bills / safe-to-spend correct.
Overdue command center: accurate list/count, pay/skip actions work.
Cash flow card, drift insight, payment ledger (add/edit/delete reconciles), autopay suggestion apply/dismiss.
Editable cells autosave; Esc cancels; invalid input handled. Mobile rows expose the same actions as desktop. Compact mode intact.
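
For the bucket check, a small oracle helps spot a bill sitting in the wrong half of the month. This sketch assumes the bucket is decided purely by the due day-of-month and that bills carry a `dueDay` field; both the field name and the function names are hypothetical.

```javascript
// Assumes the bucket is decided purely by the due day-of-month (1 to 31).
function bucketForDueDay(dueDay) {
  if (!Number.isInteger(dueDay) || dueDay < 1 || dueDay > 31) {
    throw new RangeError(`Invalid due day: ${dueDay}`);
  }
  return dueDay <= 14 ? "1-14" : "15-31";
}

// Group bills (objects with a hypothetical `dueDay` field) by bucket.
function groupByBucket(bills) {
  const buckets = { "1-14": [], "15-31": [] };
  for (const bill of bills) {
    buckets[bucketForDueDay(bill.dueDay)].push(bill);
  }
  return buckets;
}
```

The boundary days 14 and 15 are the ones worth probing first, since an off-by-one there moves a bill between buckets.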

B3 — Bills (`/bills`)

Create with all fields (name, amount, due date, category, schedule, account, autopay, active range).
Edit propagates to Tracker/Summary/Calendar/Analytics; delete confirms + handles orphan payments/history.
Custom schedules (weekly/biweekly/monthly/quarterly/annual/custom): next-due & occurrences correct across month/year boundaries.
Drag reorder persists (cross-check billReorder.test.js); search/filter panel filters + clears; large-list virtualization smooth.
Merchant rules: create/match/edit/delete; historical import dialog attributes month-crossing payments correctly.
BillModal open/close, validation, cancel discards unsaved changes.
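
When checking next-due dates across month boundaries, it helps to have an independent expectation for short months. The sketch below assumes a bill due on a day that does not exist in a given month is clamped to that month's last day; that clamping rule is an assumption about intended behavior, and the function names are hypothetical.

```javascript
// Day 0 of the next month is the last day of the requested month.
function lastDayOfMonth(year, monthIndex) {
  return new Date(year, monthIndex + 1, 0).getDate();
}

// monthIndex is 0-based (0 = January), matching the JavaScript Date API.
// Assumption: a due day past the end of the month clamps to the last day.
function dueDateInMonth(year, monthIndex, dueDay) {
  const day = Math.min(dueDay, lastDayOfMonth(year, monthIndex));
  const mm = String(monthIndex + 1).padStart(2, "0");
  const dd = String(day).padStart(2, "0");
  return `${year}-${mm}-${dd}`;
}
```

Under this rule, a bill due on the 31st lands on 2026-02-28 in February 2026, on 2024-02-29 in the 2024 leap year, and on 2026-04-30 in April. If the product instead skips or rolls over such occurrences, adjust the oracle rather than the test data.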

B4 — Subscriptions & Categories

Subscriptions: add/edit/delete, active/cancelled, renewal & annual→monthly normalization; totals feed Tracker/Summary/Analytics.
Catalog: browse/search, add-from-catalog pre-fills.
Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (categoryGroups/categoryReorder tests); colors/icons consistent on Tracker/Spending/Analytics.
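
The annual→monthly normalization can be checked against a simple calculation in integer cents. This is a sketch under stated assumptions: the cadence names, the function name, and the round-to-nearest-cent rule are all hypothetical, so confirm the product's actual rounding before treating a one-cent difference as a finding.

```javascript
// Billing periods per year for each supported cadence (assumed set).
const PERIODS_PER_YEAR = { monthly: 12, quarterly: 4, annual: 1 };

// Convert a per-period charge in cents to its monthly equivalent in cents.
// Assumption: round to the nearest cent.
function monthlyEquivalentCents(amountCents, cadence) {
  const perYear = PERIODS_PER_YEAR[cadence];
  if (!perYear) throw new Error(`Unsupported cadence: ${cadence}`);
  return Math.round((amountCents * perYear) / 12);
}
```

Note that rounding means twelve monthly equivalents need not sum back to the annual price: a $99.99 annual plan normalizes to $8.33 per month, and 12 × $8.33 is $99.96. That gap is expected and is worth keeping in mind when reconciling subscription totals across pages.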

B5 — Reporting reconciliation

Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
Calendar plots bills/payments on correct days (timezone: a bill due on the 1st must not render on the 31st); day totals correct.
Analytics charts render with data AND empty (no broken SVG/NaN axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
Health indicators compute from real data, no crash on empty; recommendations sane.
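
The Calendar timezone check targets a specific JavaScript behavior: a date-only string such as "2026-07-01" passed to `new Date(...)` is parsed as UTC midnight, so in any timezone behind UTC its local calendar day is June 30. The sketch below shows one common way to avoid that by building the date from local components. The helper name is hypothetical, and whether BillTracker parses dates this way is not stated in this plan.

```javascript
// Parse "YYYY-MM-DD" as a local calendar date instead of UTC midnight.
// Out-of-range days (for example "2026-02-31") are not rejected here;
// the Date constructor rolls them over into the next month.
function parseDateOnly(isoDate) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got: ${isoDate}`);
  const [, year, month, day] = match.map(Number);
  return new Date(year, month - 1, day);
}
```

With this approach `parseDateOnly("2026-07-01").getDate()` is 1 regardless of the system timezone, which is the property the Calendar check is looking for. To reproduce the bug class during QA, run the app with a timezone behind UTC and look at bills due on the 1st.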

B6 — Spending (`/spending`)

Category-group view assigned/spent/available math correct; 3-month averages correct.
Cover-overspending reallocates funds correctly and is reversible.
Safe-to-spend matches Tracker (safeToSpend.test.js); month nav; empty/partial months handled.

B7 — Debt planning (`/snowball`, `/payoff`)

Add debts (balance/APR/min); snowball vs avalanche ordering correct.
Projection + amortization vs a hand-calculated example; APR=0 and already-paid debts correct.
Extra-payment/budget updates payoff date + total interest; chart renders; plan history saves/restores; status banner accurate.
Edge: single debt, many debts, $0 debt, negative/absurd inputs rejected.
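
A small independent model makes the hand-calculation step repeatable. The sketch below orders debts by the standard definitions (snowball: smallest balance first; avalanche: highest APR first) and simulates a fixed monthly payment on a single debt. The function names, the field names, and the interest convention (monthly accrual at APR/12, rounded to the cent) are assumptions; the product may compound or round differently, so treat small differences as a prompt to check the convention first.

```javascript
// Order debts for payoff. Returns a new array; the input is not mutated.
function orderDebts(debts, strategy) {
  const sorted = [...debts];
  if (strategy === "snowball") {
    sorted.sort((a, b) => a.balanceCents - b.balanceCents);
  } else if (strategy === "avalanche") {
    sorted.sort((a, b) => b.aprPercent - a.aprPercent);
  } else {
    throw new Error(`Unknown strategy: ${strategy}`);
  }
  return sorted;
}

// Simulate a fixed monthly payment on one debt.
// Assumption: interest accrues monthly at APR/12 and is rounded to the cent.
// Returns null when the payment never clears the balance.
function simulatePayoff(balanceCents, aprPercent, paymentCents) {
  let balance = balanceCents;
  let months = 0;
  let interestCents = 0;
  while (balance > 0) {
    const interest = Math.round((balance * aprPercent) / 100 / 12);
    if (paymentCents <= interest) return null; // balance would never shrink
    balance += interest;
    interestCents += interest;
    balance -= Math.min(paymentCents, balance); // final payment may be partial
    months += 1;
  }
  return { months, interestCents };
}
```

Worked by hand under the same assumptions: $1,000.00 at 12% APR with $500.00 per month accrues $10.00, then $5.10, then $0.15 of interest, so the debt clears in 3 months with $15.25 total interest. At APR 0 the same balance with $100.00 per month clears in 10 months with no interest, and a $0 balance returns 0 months.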

B8 — Banking (`/bank-transactions`)

Ledger loads/virtualizes/filters (date/account/amount/merchant/status).
Transaction matching (match/unmatch), auto-match review approve/reject, no double-match (transactionMatchService.test.js).
Merchant/store matching rules + confidence/duplicates; advisory non-bill filter flags/hides with override.
Matched payments reflect on Tracker/ledger without double-counting; category picker persists.

B9 — Data lifecycle (`/data`)

Imports: spreadsheet (XLSX/CSV) map/preview/commit, malformed rejected, dup/partial handled; transaction CSV (csvTransactionImportService.test.js) dedupe + parsing; SQLite user import version-checked + confirms overwrite; seed demo data safe; import history lists + rollback.
Exports: download SQLite round-trips (export → fresh account → import → matches); Excel export opens uncorrupted; ICS calendar feed valid in a client AND properly token-gated (route mounts before auth — verify not open).
Backups: manual + scheduled restorable on a scratch instance; permissions not world-readable; old backups pruned (backupAndCleanup.test.js).

B10 — Notifications & workers

Each channel (email/SMTP, ntfy, Gotify, Discord, Telegram): test message delivers; bad token/URL → clear error, logged, no secret leak.
Reminders fire at configured lead time for upcoming/overdue; no duplicates; paid/skipped excluded; respects per-user prefs.
Workers: dailyWorker, bankSyncWorker (interval + guardrails), backupScheduler run on schedule; errors caught/logged, don't crash server, next run unblocked.

B11 — Admin panel (`/admin`)

Onboarding wizard completes without a broken state.
Users table: add/edit-role/reset-pw/disable/delete; cannot remove the last admin.
Login mode switch single↔multi verified live, no lockout; auth-methods enable/disable + bad config surfaced.
Email notif config + test send; bank sync admin (configure/manual/auto/status/revoke).
Backups create/list/download/restore/delete; cleanup panel previews impact + confirms (counts match backupAndCleanup.test.js).
Privacy admin edits reflect on public /privacy; system status metrics/versions/jobs accurate (statusService.test.js); admin actions rate-limited + audited (auditService — spot-check log).

B12 — Settings, Profile & global UI

Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
Profile: change password (current required, invalidates sessions), manage 2FA/passkeys, sessions revoke (profileRoute.test.js).
Static: About (public + admin, version shown), Privacy, Release Notes (dialog once per user, dismiss persists), Roadmap (admin), NotFound friendly + way home.
Global: command palette (Ctrl+K) search/keyboard/Esc, hidden for default admin; sidebar collapse/expand + mobile overlay (check overflow issue in docs/UI_IMPROVEMENTS.md); toasts stack/dismiss; page transitions no flash/double-fetch; theme applied before first paint.

B13 — API / backend direct

Route groups: auth, auth/oidc, admin, tracker, bills, subscriptions, payments, data-sources, transactions, matches, categories, settings, user, calendar, summary, monthly-starting-amounts, analytics, spending, snowball, notifications, status, about, about-admin, privacy, version, profile, export, import/imports.

Auth: unauth → 401, wrong role → 403, right role → 200.
CSRF: state-changing without valid token rejected; with token succeeds (middleware/csrf.js).
Validation: bad/missing body → structured 4xx (middleware/errorFormatter.js, utils/apiError.js), never a raw 500 stack.
IDOR/isolation: other user's resource by id → 403/404, no leak.
Rate limits: login/admin/export/import/OIDC limiters trigger + reset (middleware/rateLimiter.js).
Money in integer cents end-to-end (per docs/cents-migration-plan.md); API and DB agree; no float drift.
Idempotency: repeated create doesn't duplicate; concurrent edits resolve sanely.
Consistent error JSON + correct status codes; security headers present (middleware/securityHeaders.js); public routes (about/privacy/version/calendar feed) leak nothing sensitive.

B14 — Non-functional

a11y (manual): keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix aria-*); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only.
a11y (automated): run axe-core on every page (@axe-core/playwright, or jest-axe for component-level) — zero critical/serious violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once.
Visual regression: capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright toHaveScreenshot); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed.
Performance: initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query staleTime).
PWA/offline: installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
Security spot-checks: XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom MarkdownText renderer — https-only link hrefs, no dangerouslySetInnerHTML anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies HttpOnly/Secure/SameSite; encryptionService protects at-rest secrets, keys not committed. (Depth: SECURITY_AUDIT.md.)
Resilience: kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
Fault injection (systematic): with a request-interception harness (Playwright page.route, or DevTools network overrides), force each page's API calls to 401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON and confirm the UI shows a recoverable error (toast or ErrorBoundary fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally — each page handles failure differently.
Timezone/locale: non-UTC tz + DST boundary — due dates and calendar stay correct.
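
The fault-injection item above lists six failure modes to force on each page. A minimal sketch of that matrix follows, assuming a Playwright-style `route` object that exposes `fulfill()` and `abort()`. The names `FAULTS`, `applyFault`, and `buildFaultCases` are hypothetical helpers, not existing BillTracker code, and the error body shape is illustrative only.

```javascript
// Fault matrix from the checklist item above. All names here are hypothetical
// helpers for illustration, not part of the BillTracker codebase.
const FAULTS = [
  { name: '401 mid-session', kind: 'status', status: 401 },
  { name: '403', kind: 'status', status: 403 },
  { name: '429', kind: 'status', status: 429 },
  { name: '500', kind: 'status', status: 500 },
  { name: 'network timeout', kind: 'abort', errorCode: 'timedout' },
  // Truncated on purpose so the client's JSON parsing fails.
  { name: 'malformed JSON', kind: 'body', status: 200, body: '{"bills": [' },
];

// Apply one fault to an intercepted request. `route` is assumed to behave like
// a Playwright Route: abort(errorCode) and fulfill({ status, contentType, body }).
function applyFault(route, fault) {
  if (fault.kind === 'abort') {
    return route.abort(fault.errorCode);
  }
  return route.fulfill({
    status: fault.status,
    contentType: 'application/json',
    body: fault.kind === 'body' ? fault.body : JSON.stringify({ error: fault.name }),
  });
}

// Cross every page with every fault so no page/fault pair is skipped.
function buildFaultCases(pages) {
  return pages.flatMap((page) =>
    FAULTS.map((fault) => ({
      ...page,
      fault,
      title: `${page.route} survives ${fault.name}`,
    }))
  );
}
```

In a Playwright spec, each generated case would typically register `page.route('**' + api + '**', (route) => applyFault(route, fault))` and then assert that a toast or ErrorBoundary fallback is visible. For the "401 mid-session" case the route would need to be registered after the page has loaded and a refetch triggered, so the session starts out valid.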

B15 — Regression & sign-off

Run on the production build (npm start), not dev:

npm run ci green. Log in as user and admin.
npm run test:e2e green (Playwright smoke + axe + visual-regression baselines match, §4.4).
Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
Snowball: add debt → projection. Data: seed → export → import round-trip (scratch DB).
Admin: open panel, users, system status, run a backup. Banking loads + matches (if SimpleFIN configured).
Notifications: one test message on configured channel. Toggle dark mode; mobile viewport; Ctrl+K navigates.
Bogus URL → 404; logout → login redirect. Console clean throughout.
Confirm exit criteria.

8. Appendices

Appendix A — Severity definitions

| Level | Definition |
| --- | --- |
| S1 – Critical | Data loss/corruption, security hole, crash/blank page, wrong money math, cannot log in/save. |
| S2 – Major | Feature broken/unusable, wrong results, broken navigation, unhandled error. |
| S3 – Minor | Works but wrong edge behavior, confusing UX, missing validation message. |
| S4 – Cosmetic | Visual/copy/alignment/dark-mode-contrast, non-blocking. |
| IMP – Improvement | Not a bug; enhancement or polish idea. |

Appendix B — Exit / sign-off criteria

A cycle is release-ready when:

All batches B0–B15 ✅ on the primary matrix (Chrome desktop + mobile, light + dark, user + admin).
B15 smoke green on the production build.
Zero open S1/S2 in the Findings Log; S3/S4/IMP triaged.
npm run ci green; no new console errors.
Data export→import round-trip verified with no loss.
Auth/authorization + data-isolation all pass.
Money and date/period correctness verified vs hand-calculated examples.
All fixes for the cycle archived to HISTORY.md; cycle summary recorded (date, build/commit, environment).
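
The Findings Log criterion above (zero open S1/S2; S3/S4/IMP triaged) is mechanical enough to check in code. The sketch below assumes a hypothetical finding shape of `{ id, severity, status, triaged }`; the real Findings Log lives in this document, so the field names and the `releaseBlockers` helper are illustrative only. It covers that one criterion, not the rest of the list.

```javascript
// Severities that block release while open, per Appendix A and the criteria above.
const BLOCKING = new Set(['S1', 'S2']);

// Hypothetical finding shape (an assumption, not the project's actual format):
//   { id: string, severity: 'S1'|'S2'|'S3'|'S4'|'IMP', status: 'open'|'fixed', triaged: boolean }
// Returns a list of human-readable reasons the cycle is not release-ready.
function releaseBlockers(findings) {
  const blockers = [];
  for (const finding of findings) {
    if (finding.status !== 'open') continue; // fixed findings never block
    if (BLOCKING.has(finding.severity)) {
      blockers.push(`${finding.id}: open ${finding.severity}`);
    } else if (!finding.triaged) {
      blockers.push(`${finding.id}: ${finding.severity} not triaged`);
    }
  }
  return blockers;
}
```

An empty result means this criterion passes; any entries name the findings that still need a fix or a triage decision.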

Appendix C — Page ↔ route ↔ API quick map

| Page | Route | Primary API |
| --- | --- | --- |
| Tracker | `/` | `/api/tracker`, `/api/bills`, `/api/payments`, `/api/monthly-starting-amounts` |
| Calendar | `/calendar` | `/api/calendar` |
| Summary | `/summary` | `/api/summary` |
| Bills | `/bills` | `/api/bills`, `/api/categories`, `/api/matches` |
| Subscriptions / Catalog | `/subscriptions`, `/subscriptions/catalog` | `/api/subscriptions` |
| Categories | `/categories` | `/api/categories` |
| Health | `/health` | `/api/analytics`, `/api/summary` |
| Analytics | `/analytics` | `/api/analytics` |
| Spending | `/spending` | `/api/spending` |
| Banking | `/bank-transactions` | `/api/transactions`, `/api/matches`, `/api/data-sources` |
| Snowball / Payoff | `/snowball`, `/payoff` | `/api/snowball` |
| Settings | `/settings` | `/api/settings`, `/api/notifications` |
| Profile | `/profile` | `/api/profile`, `/api/user` |
| Data | `/data` | `/api/import`, `/api/export`, `/api/data-sources` |
| Admin | `/admin`, `/admin/status` | `/api/admin`, `/api/status`, `/api/about-admin` |
| About / Privacy / Release Notes / Roadmap | `/about`, `/privacy`, `/release-notes`, `/roadmap` | `/api/about`, `/api/privacy`, `/api/version` |

Appendix D — Reference docs

SECURITY_AUDIT.md (security depth) · docs/UI_IMPROVEMENTS.md (known UI issues) · docs/cents-migration-plan.md (money-as-cents) · docs/SIMPLEFIN_CONSUMER_GUARDRAILS.md (sync limits) · docs/CSRF-SPA-Setup.md, docs/RATE_LIMITING_ENHANCEMENT.md (security middleware) · REVIEW.md, DEVELOPMENT_LOG.md, roadmap.md, FUTURE.md (context/known gaps) · HISTORY.md (changelog / fix archive) · playwright.config.js + e2e/ (automated E2E/visual/a11y harness, §4.4).

Appendix E — Per-page control census

The completeness ledger behind "every button, textbox, slider is right." Fill one table per page during B0 and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx` plus `grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx`.
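
Where a script is more convenient than grep, the same two patterns can be folded into one scan over a page's source text. This is a rough sketch: `censusControls` and `CONTROL_PATTERN` are hypothetical names, and like the grep it only produces a starting list. Controls rendered through other components or props will not match and still have to be added by hand.

```javascript
// Same patterns as the two grep commands above, combined into one regex.
const CONTROL_PATTERN = /type=["'][a-z]+["']|onClick=|<Button|<Select|<Switch|<Checkbox/g;

// Scan JSX source text and return one row per matched control hint,
// with a 1-based line number so it can be traced back to the file.
function censusControls(source) {
  const rows = [];
  source.split('\n').forEach((text, index) => {
    for (const match of text.matchAll(CONTROL_PATTERN)) {
      rows.push({ line: index + 1, match: match[0], text: text.trim() });
    }
  });
  return rows;
}
```

Under Node, the source would be read first, for example with `fs.readFileSync` on a file under `client/pages/`, and each returned row becomes a candidate line in that page's census table.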

Template (copy per page):

| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |
| --- | --- | --- | --- | --- | --- |
| e.g. Quick-pay button | button | marks bill paid, updates balance cards, undo available | default ✓ · disabled-while-saving ✓ | Enter ✓ | ✅ / finding id |
| e.g. Amount input | number | per-month override, cents only, no wheel-scroll change | default ✓ · error-on-letters ✓ | Tab/Esc ✓ | ✅ / finding id |

Pages to census (from client/pages/, keep in sync with Appendix C): Tracker, Calendar, Summary, Bills, Subscriptions, SubscriptionCatalog, Categories, Health, Analytics, Spending, Snowball, Payoff, BankTransactions, Data, Settings, Profile, Admin, Status, About, Privacy, ReleaseNotes, Roadmap, Login, NotFound — plus the shared Sidebar/command-palette/header chrome once.
