BillTracker — Master QA Plan (living document)
Version target: v0.41.x · Executor: Claude (active) · Last updated: 2026-07-02
This is a living, operational QA document, not a static spec. Claude runs it,
in batches, actively hunting for bugs/errors/rough edges, fixing them, and
archiving each fixed finding to HISTORY.md. Update this document whenever a
better approach, a new risk area, or a missed surface is discovered.
The prime directive: don't just confirm the happy path — try to break the product. Every batch should end with the tree green, the Findings Log up to date, and any fixes archived to
HISTORY.md.
Table of contents
- Execution model — find, then fix, then repeat
- Batch plan & progress tracker
- Active Findings Log
- Archiving fixed findings to HISTORY.md
- Environment & setup
- Test data strategy
- Cross-cutting checks (every page)
- Batch playbooks (detailed checklists)
- Appendices
0. Execution model — find, then fix, then repeat
Separate finding from fixing. During a QA pass we hunt and log — we do not fix as we go (except show-stoppers, see below). Only after the whole plan has run do we enter a dedicated fix phase and fix every logged finding. Then we run the entire QA plan again from the top. Repeat until a full pass finds zero errors. Two nested loops:
OUTER — QA CYCLE (repeat until a full pass finds zero findings)
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE 1 · FIND Run every batch B0→B15 in find-only mode. │
│ Probe hard, LOG everything to the Findings Log. │
│ Do NOT fix (except show-stoppers). │
│ ↓ │
│ PHASE 2 · FIX QA pass done. Now fix EVERY logged finding — │
│ all of them (S1→IMP). Root-cause, with tests. │
│ ↓ │
│ PHASE 3 · VERIFY Re-run each fix's repro; `npm run ci` green. │
│ ↓ │
│ PHASE 4 · ARCHIVE Move every fixed finding to HISTORY.md (§3). │
│ ↓ │
│ PHASE 5 · RE-RUN Start a new cycle at PHASE 1. If that full pass │
│ logs zero findings → QA is clean, STOP. │
└──────────────────────────────────────────────────────────────────────┘
INNER — per batch during PHASE 1 (find-only)
PICK next ⬜ batch → SET UP (app, data state, role, console open) →
PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
mark batch status in §1 → next batch. (No fixing here.)
Show-stopper exception. A show-stopper is a finding that blocks continued QA — the app won't boot, you can't log in, or a page crashes so hard you can't test the rest of it. Only these get fixed immediately (mid-pass), because you can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix; then continue the find pass. Everything else is logged and left for Phase 2 — no matter how tempting or trivial.
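The outer loop's exit condition can be sketched in a few lines. This is illustrative only: `runQaCycles`, `findPass`, and `fixAndArchive` are hypothetical names standing in for the manual phases, and `maxCycles` is an assumed safety cap that the plan itself does not define.

```javascript
// Hypothetical helpers: findPass(cycle) returns the findings logged by one
// full B0→B15 find-only pass; fixAndArchive(findings) stands in for Phases 2-4.
function runQaCycles(findPass, fixAndArchive, maxCycles = 10) {
  const cycleLog = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const findings = findPass(cycle);            // Phase 1: find, do not fix
    cycleLog.push({ cycle, logged: findings.length });
    if (findings.length === 0) {
      return { clean: true, cycleLog };          // a full pass with zero findings
    }
    fixAndArchive(findings);                     // Phases 2-4: fix, verify, archive
  }
  return { clean: false, cycleLog };             // cap reached without a clean pass
}

// Example: two findings in cycle 1, one in cycle 2, none in cycle 3.
const simulated = [['QA-B2-01', 'QA-B7-01'], ['QA-B2-02'], []];
const outcome = runQaCycles((cycle) => simulated[cycle - 1], () => {});
console.log(outcome.clean, outcome.cycleLog.length); // true 3
```

The point of the sketch is that fixing findings never ends the loop by itself; only a subsequent find pass that logs nothing does.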
Discipline (for best results)
- Phase 1 is log-only. Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
- Keep each find batch tight and focused — one batch per session — so probing stays thorough.
- Phase 2 fixes everything, not just S1/S2. Root-cause over surface patch; add/extend a test in `tests/` or `client/**/*.test.*` for every logic bug so it can't silently return.
- Never leave the repo red at the end of Phase 3: `npm run ci` must be green before archiving.
- Touch product behavior? Run the `/verify` skill on the affected flow before archiving.
- The exit is empirical: you're done only when an entire find pass (B0→B15) turns up zero new findings, not when you think it's clean. Log the cycle result in the Cycle Log each time.
- Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.
1. Batch plan & progress tracker
Batches are ordered foundation-first (baseline & auth before features; features before cross-cutting; regression last). Update Status and Findings every run.
Status key: ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck
| # | Batch | Primary surface | Data state | Status | Open / Fixed |
|---|---|---|---|---|---|
| B0 | Baseline, tooling & coverage recon | `npm run ci`/`check`, app boots, console clean, re-scan routes/pages/API vs plan & update it, control census | any | 🔄 | 0 / 1 |
| B-UI | Design-system primitives | each `client/components/ui/*` × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard | any | ⬜ | 0 / 0 |
| B1 | Auth & authorization | login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation | multi + single user | ⬜ | 0 / 0 |
| B2 | Tracker (core) | `/` buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift | seeded + adversarial | ⬜ | 0 / 0 |
| B3 | Bills & schedules | `/bills` CRUD, custom schedules, reorder, merchant rules, historical import | adversarial | ⬜ | 0 / 0 |
| B4 | Subscriptions & Categories | `/subscriptions`, catalog, `/categories`, groups, reorder | seeded | ⬜ | 0 / 0 |
| B5 | Reporting reconciliation | `/summary`, `/calendar`, `/analytics`, `/health` cross-check totals | seeded + large | ⬜ | 0 / 0 |
| B6 | Spending | `/spending` YNAB view, averages, cover-overspending, safe-to-spend | seeded + edge months | 🔄 | 0 / 1 |
| B7 | Debt planning (math) | `/snowball`, `/payoff` APR/amortization vs hand-calc | edge (APR=0, $0 debt) | 🔄 | 0 / 2 |
| B8 | Banking & bank sync | `/bank-transactions`, SimpleFIN sync, matching, merchant/store, advisory filter | seeded txns | ⬜ | 0 / 0 |
| B9 | Data lifecycle | `/data` import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip | empty + seeded | 🔄 | 0 / 1 |
| B10 | Notifications & workers | email + ntfy/Gotify/Discord/Telegram, reminders, cron workers | seeded | ⬜ | 0 / 0 |
| B11 | Admin panel | users, login mode, auth methods, backups, cleanup, status, onboarding | admin | ⬜ | 0 / 0 |
| B12 | Settings, Profile & global UI | `/settings`, `/profile`, static pages, command palette, sidebar/nav | any | ⬜ | 0 / 0 |
| B13 | API / backend direct | all `/api/*`: auth, CSRF, validation, rate limits, error shape, IDOR, cents | via HTTP client | 🔄 | 0 / 1 |
| B14 | Non-functional | a11y, performance, PWA/offline, XSS/secrets, timezone/DST | large + adversarial | 🔄 | 0 / 3 |
| B15 | Regression & sign-off | full smoke on production build, exit criteria | seeded | ⬜ | 0 / 0 |
After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new cycle from B0 against the next build/version.
1.1 QA Cycle Log
One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A cycle is only "clean" when its find pass logged zero findings. Keep going until you get a clean cycle.
| Cycle | Started | Build / commit | Findings logged | Fixed / archived | Result |
|---|---|---|---|---|---|
| 1 | 2026-07-02 | `bdbf231→98c8fab` (dev) | 9 | 9 → all fixed & archived (B9-01, B13-01, B6-01, B7-01, B7-02, B14-01, B14-02, B14-03, B0-01) | 🔁 all findings fixed, 0 open; re-run required for a clean pass. Probed B0/B1/B3/B4/B6/B7/B8/B9/B13/B14. Solid: auth-isolation, CSRF, payment/date validation, recurrence (quarterly/annual gating, Feb-31, leap year), transaction matching/dedup, subscription+spending math, XSS. Fixed: seed 100× cents (S2), bill-amount validation, both money-rounding/format bugs, all a11y (8/8 axe), bundle split, unused-dep + dead-code removal. |
Result key: 🔄 in progress · 🔁 findings fixed, re-run required · ✅ clean (zero findings — QA complete)
2. Active Findings Log
This is the live log. Record every finding here the moment it's found — before
fixing. Keep only Open / Fixing / Fixed rows here. Once a finding is
Fixed + verified + archived to HISTORY.md, delete its row from this table
(its permanent record is the changelog entry).
Finding ID: QA-B{batch}-{nn} (e.g. QA-B2-01).
Severity: S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see Appendix A).
Status: 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.
| ID | Sev | Area (file:line) | Summary | Status | Notes / repro |
|---|---|---|---|---|---|
| (none: all Cycle 1 findings fixed & archived to HISTORY.md v0.41.0) | | | | | |
Finding template (paste a new row above; keep the full write-up here until archived):
ID: QA-B?-??
Severity: S1 / S2 / S3 / S4 / IMP
Environment: browser / viewport / theme / role / auth mode / data state
Area: file:line (if known)
Steps to reproduce:
1.
2.
Expected:
Actual:
Evidence: console / network / DB row / screenshot
Fix: (what changed, commit) — Verified by: (repro re-run + ci)
Log console errors, failed network requests, and unhandled rejections as findings even if the UI looks fine.
All Cycle 1 write-ups have been archived to HISTORY.md v0.41.0 (see §3).
3. Archiving fixed findings to HISTORY.md
HISTORY.md is the project changelog (version-organized, emoji section headers).
When a finding is Fixed and verified, write a concise entry there, then remove
the row from the Active Findings Log.
Where: under the current in-progress version heading (e.g. ## v0.41.x). If a
QA cycle produces several fixes, group them under a ### 🐛 QA Fixes (bug fixes)
or ### 🧹 QA (polish/improvements) section, matching the existing changelog voice.
Entry format (match the terse, specific style already in HISTORY.md):
### 🐛 QA Fixes
- **[Area] Short title** — What was wrong and the user-visible impact, then the
fix. Reference the file/function and any migration or test added.
(was QA-B7-03)
Rules
- One bullet per finding; include the old `QA-B?-??` id in parentheses for traceability.
- If a fix added/changed a test, say which (`tests/…` or `client/…test.*`).
- Don't archive until the fix is verified (repro gone + `npm run ci` green).
- IMP items that were implemented are archived the same way; IMP items merely noted stay in the Findings Log (or graduate to `FUTURE.md`/`roadmap.md` if deferred).
4. Environment & setup
4.1 Running the app
| Mode | Command | URL |
|---|---|---|
| Dev (API + UI, hot reload) | `npm run dev` | UI http://localhost:5173 (proxies API → :3000) |
| API only | `npm run dev:api` | http://localhost:3000 |
| Production build | `npm run build` then `npm start` | http://localhost:3000 |
| Docker | `docker-compose up` | per compose config |
- Backend: Node/Express on `PORT` (default `3000`). Frontend dev: Vite on `5173`.
- Data: SQLite at `db/bills.db` (WAL). Back it up before destructive tests (`backups/` or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
- Configure a dedicated test `.env` from `.env.example`. Never point tests at production data or a live SimpleFIN account with real credentials.
- Test commands: `npm run ci` (check + all tests + build), `npm run check` (syntax + build), `npm run test` (server), `npm run test:client` (vitest).
4.2 Test matrix
Full functional pass across reasonable combinations; smoke (B15) across all.
| Dimension | Values |
|---|---|
| Browser | Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser) |
| Viewport | Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414 |
| Theme | Light, Dark, system-follow |
| Role | user, admin, default admin (first-run) |
| Auth mode | Multi-user, single-user |
| Density | Normal + compact desktop |
| Network | Online, Slow 3G, offline (PWA shell) |
| Data state | Empty, seeded demo, large/stress, adversarial |
4.3 Accounts to prepare
- `admin`, `user`, a second `user` (data-isolation), a single-user-mode instance (separate DB).
- Demo reference: `guest / guest123` (do not run destructive flows on any shared demo server).
4.4 Automated E2E harness (Playwright)
Manual passes prove a button works once; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.
| Command | What it does |
|---|---|
| `npm run test:e2e` | run the E2E suite headless (boots the app via `webServer`) |
| `npm run test:e2e:ui` | Playwright UI mode: watch/debug interactively |
| `npm run test:e2e:update` | re-baseline visual-regression screenshots (review the diff before committing) |
- Setup (one-time): `npm install` then `npx playwright install chromium`. Config: `playwright.config.js`; specs in `e2e/`.
- Scope: the suite is a thin critical-path smoke, not a replacement for the manual playbooks. It locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
- Don't point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.
5. Test data strategy
- Empty: brand-new account. Every page must render a sensible empty state: no crash, no `NaN`, no blank white screen.
- Seeded: use Data → Seed Demo Data for a realistic mid-size dataset.
- Large/stress: 500+ bills, 5,000+ transactions, 24+ months history; exercises virtualization (`@tanstack/react-virtual`), charts, query perf.
- Adversarial (deliberately try to break it):
  - Amounts: `0`, `0.01`, negative, `9,999,999.99`, fractional cents.
  - Text: emoji, RTL, `<script>` XSS probe, 1,000-char strings, leading/trailing spaces, SQL-ish input.
  - Dates: 1st/14th/15th/31st boundaries; 28/29/30/31-day months; Feb 29; month/year crossing; inactive ranges; skipped months; overrides.
  - Transactions: duplicate amount+date, same-day merchant repeats, refunds/negatives.
  - Debt: APR `0%`, very high APR, `$0` balance, absurd inputs.
  - Non-UTC system timezone + a DST boundary date.
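To make the adversarial amounts concrete, here is a minimal sketch of the behavior a strict currency field might be expected to show. `parseAmountToCents` is a hypothetical helper, not BillTracker's actual parser, and it assumes a field that rejects negatives (a refund field would need a different rule).

```javascript
// Sketch only: hypothetical helper, not the app's real parser.
// Accepts strings such as "1,234.56" or "$0.01"; returns integer cents,
// or null when the input should be rejected.
function parseAmountToCents(input) {
  if (typeof input !== 'string') return null;
  // Strip surrounding spaces, one leading "$", and thousands separators.
  // Separator placement is not validated here (assumption for brevity).
  const cleaned = input.trim().replace(/^\$/, '').replace(/,/g, '');
  // Digits with at most two decimals: no sign, exponent, or second decimal point.
  if (!/^\d+(\.\d{1,2})?$/.test(cleaned)) return null;
  const [whole, frac = ''] = cleaned.split('.');
  return Number(whole) * 100 + Number(frac.padEnd(2, '0'));
}

const probes = ['0', '0.01', '-5', '9,999,999.99', '1.005', '1e3', '1.2.3', '', ' $1,234.56 '];
for (const probe of probes) {
  console.log(JSON.stringify(probe), '→', parseAmountToCents(probe));
}
```

Working on the string rather than on `parseFloat` output is a deliberate choice in this sketch: it lets fractional cents and exponent notation be rejected before any floating-point value exists.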
6. Cross-cutting checks (every page)
Run on every page during its batch — don't assume a shared component behaves the same everywhere.
Navigation & routing — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → NotFoundPage · active nav highlighted · simplefinOnly (Banking) gated · Ctrl+K palette finds & opens it.
Buttons & interactions — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · rapid repeated toggling (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then navigate away mid-flight doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).
Forms & validation — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · paste into every field (incl. "$1,234.56" into currency) · browser/password-manager autofill on login & forms · IME/composition (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).
Number inputs (you have ~45 type="number" fields — the highest-risk control type) — scroll-wheel over a focused field must not silently change the value · spinner up/down buttons step correctly and respect min/max · reject/e/+/exponent and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no NaN submitted · cents fields never accept >2 decimals.
Per-control state matrix — for each control on the page, verify every applicable state renders and behaves in both light and dark: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).
Note on "sliders": this app has no `<input type=range>` sliders. The `SlidersHorizontal` glyph is just the Bills filter-panel button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.
States — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, ErrorBoundary shows a fallback not a white page.
Visual & responsive — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.
Data integrity — money 2-decimals, no float artifacts (9.999999) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).
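The "no float artifacts" check is easiest to reason about with a tiny example. The sketch below contrasts summing dollars as floats with summing integer cents; `formatCents` is a hypothetical display helper, not a function from the codebase.

```javascript
// Summing dollars as floats drifts; summing integer cents does not.
const floatTotal = [0.1, 0.2].reduce((a, b) => a + b, 0);
console.log(floatTotal === 0.3);   // false: the sum is 0.30000000000000004

const centsTotal = [10, 20].reduce((a, b) => a + b, 0);
console.log(centsTotal === 30);    // true

// Hypothetical display helper: convert to a string only at the UI edge.
function formatCents(cents) {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(cents);
  return `${sign}$${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, '0')}`;
}

console.log(formatCents(centsTotal)); // $0.30
```

When a value such as `9.999999` shows up on a page, the likely cause is arithmetic done on dollars somewhere between the stored cents and the display.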
7. Batch playbooks (detailed checklists)
Each batch below is the detailed script for the matching row in §1. Apply §6 throughout.
B0 — Baseline, tooling & coverage recon
Run FIRST in every cycle. This is where the plan re-syncs with reality — new pages, routes, endpoints, or features added since the last cycle get discovered and folded in before testing, so coverage never silently rots.
Tooling baseline
- `npm run ci`: record any failing server/client test or build error as a finding (S1/S2).
- `npm run check`: server syntax + build clean.
- App boots via `npm run dev` and production `npm start`; note startup warnings.
- Load the app; browser console + server logs clean on first load and first navigation.
- Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.
Coverage recon — enumerate the actual product and diff it against this plan. Run these, then compare the output to the batch playbooks (§7) and the route map:
- Client routes: `grep -nE "<Route" client/App.jsx`. Every path present here must appear in a batch playbook and Appendix C.
- Pages: `ls client/pages/`. Every page has an owning batch.
- Sidebar / nav entries: `grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx`. New nav links (incl. conditional ones like `simplefinOnly`) are covered.
- API route mounts: `grep -nE "app.use\('/api" server.js`. Every mounted route group is in B13's list and mapped in Appendix C.
- Services & components: `ls services/` and `ls client/components/**/`. New service/component families have a home in a playbook.
- UI primitives: `ls client/components/ui/`. Every shared primitive is covered by the B-UI playbook; a new primitive gets a row there.
- Interactive-control census (makes "every button tested" provable): for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: Appendix E). A control that isn't on a checklist hasn't been tested; the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components` and `grep -rn "onClick=" client/pages/<Page>.jsx`.
- Feature flags / conditional surfaces: search for `Only`, `enabled`, `featureFlag`, env gates that hide/show pages; ensure each state is tested.
- What changed since last cycle: skim `git log`/`HISTORY.md` since the previous cycle's commit (see Cycle Log) for new features/pages.
Update the plan (do this now, not later) — for anything the recon surfaced that isn't already covered:
- Add it to the relevant batch playbook (or create a new batch and a row in the §1 table).
- Add/adjust its entry in Appendix C.
- Note the plan update in the Cycle Log row for this cycle.
- If a whole surface is missing from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.
B-UI — Design-system primitives
Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once. Drive them wherever they're already mounted (or a scratch page); run each against the per-control state matrix × light/dark × keyboard-only. One finding row per primitive.
| Primitive (`client/components/ui/`) | Must verify |
|---|---|
| `button.jsx` | every variant (default/destructive/outline/ghost/link) + size; disabled truly blocks click; loading state; focus ring; Enter/Space activate |
| `input.jsx` | text/number/password/date/search/file types; placeholder; disabled/read-only; error styling; paste/autofill; number-input rules above |
| `select.jsx` (Radix) | opens by mouse and keyboard; type-ahead; long lists scroll; onChange fires in Firefox+Safari; disabled options; value persists; Esc closes |
| `checkbox.jsx` / `switch.jsx` | toggles by click and Space; indeterminate (if used); disabled; label click toggles; controlled value round-trips |
| `dialog.jsx` / `alert-dialog.jsx` / `confirm-dialog.jsx` / `input-dialog.jsx` | open/close; focus trap + restore; Esc closes; overlay click behaves; Cancel actually cancels (no side effect); Confirm fires once; scroll-lock releases |
| `dropdown-menu.jsx` | keyboard arrow nav; Esc; submenu; disabled items; click-outside closes; no clipping at viewport edge |
| `tabs.jsx` | arrow-key nav; active state; content swaps; deep-link/refresh keeps tab (if applicable) |
| `tooltip.jsx` | hover and keyboard-focus show it; dismiss on blur; touch behavior; not a11y-only info trap |
| `table.jsx` | header/zebra/hover; horizontal scroll on narrow viewport (no page h-scroll); empty state |
| `collapsible.jsx` | expand/collapse animation; state persists; keyboard operable |
| `sonner.jsx` (toast) | success/error/loading; stack + dismiss; auto-dismiss timing; doesn't cover primary actions; announced to SR |
| `save-status.jsx` | idle/saving/saved/error transitions reflect real autosave (`useAutoSave.test.jsx`) |
| `Skeleton.jsx` | matches final layout (no jump); no infinite skeleton on error |
| `badge.jsx` / `card.jsx` / `separator.jsx` / `label.jsx` | contrast in dark mode; label `htmlFor` focuses its control; no overflow on long text |
| `theme-toggle.jsx` | light↔dark↔system; applied before first paint (no flash); persists across reload |
- Every primitive above passes its row in light and dark, keyboard-only, at mobile width.
- Axe scan (see B14) on a page densely using primitives → zero critical violations.
B1 — Auth & authorization
- Password: valid login → correct landing (Tracker for `user`, `/admin` for default admin); wrong password → clear error, no user-enumeration timing/message difference; logout clears session; expired session redirects and preserves `state.from`; session persists across refresh.
- Rate limiting: repeated failed logins throttled (`loginLimiter`/`loginUsernameLimiter`), clear message, resets.
- TOTP: enroll (QR + secret), code accepted, backup codes work once, login prompts for TOTP, wrong code rejected+throttled, disable requires re-auth.
- WebAuthn: register/login/remove passkey in Chrome, Firefox, Safari; password fallback works.
- OIDC/Authentik: SSO flow creates/links account; admin config errors surface cleanly; `oidcLimiter` throttles.
- Roles/guards: `user` blocked from `/admin*`, `/status` (redirect) and admin APIs (403); default admin forced to `/admin`; single-user bypass correct but admin surfaces still protected; unauth API → 401.
- Data isolation (critical): user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans. Test by ID enumeration on the API.
- CSRF: state-changing request without a valid token → rejected.
B2 — Tracker (/)
- Month nav (prev/next/jump), current month highlighted, data reloads per month.
- Bills land in correct `1–14`/`15–31` bucket by due date; pin-due sorting works.
- Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
- Skip excludes from totals for that month only; unskip restores.
- Per-month amount override persists, doesn't affect base bill or other months.
- Notes cell add/edit/clear persists per month.
- Inactive/date-range bill doesn't show or count outside its range.
- Balance/starting-amount cards period-aware + editable; income − bills / safe-to-spend correct.
- Overdue command center: accurate list/count, pay/skip actions work.
- Cash flow card, drift insight, payment ledger (add/edit/delete reconciles), autopay suggestion apply/dismiss.
- Editable cells autosave; Esc cancels; invalid input handled. Mobile rows equal desktop actions. Compact mode intact.
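The bucket rule is worth pinning down at its boundaries (14 vs 15). A minimal sketch, assuming the split is purely by due day of the month; `bucketForDueDay` is a hypothetical name, not the Tracker's actual function.

```javascript
// Hypothetical helper: which Tracker bucket a due day (1-31) belongs to.
function bucketForDueDay(day) {
  if (!Number.isInteger(day) || day < 1 || day > 31) {
    throw new RangeError(`invalid due day: ${day}`);
  }
  return day <= 14 ? '1–14' : '15–31';
}

// Boundary probes worth checking by hand in the UI as well.
for (const day of [1, 14, 15, 31]) {
  console.log(day, '→', bucketForDueDay(day));
}
```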
B3 — Bills (/bills)
- Create with all fields (name, amount, due date, category, schedule, account, autopay, active range).
- Edit propagates to Tracker/Summary/Calendar/Analytics; delete confirms + handles orphan payments/history.
- Custom schedules (weekly/biweekly/monthly/quarterly/annual/custom): next-due & occurrences correct across month/year boundaries.
- Drag reorder persists (cross-check `billReorder.test.js`); search/filter panel filters + clears; large-list virtualization smooth.
- Merchant rules: create/matches/edit/delete; historical import dialog attributes month-crossing payments correctly.
- BillModal open/close, validation, cancel discards unsaved changes.
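For the month-boundary schedule checks, it helps to have an independent oracle for what a monthly bill due on the 29th-31st should do in shorter months. The sketch below assumes a clamp-to-last-day rule; that rule is an assumption for illustration, so confirm the app's intended behavior before treating a mismatch as a finding. `daysInMonth` and `dueDateFor` are hypothetical names.

```javascript
// Days in a month (month is 1-12). Day 0 of the following month is the
// last day of this one; UTC is used so the local timezone cannot interfere.
function daysInMonth(year, month) {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

// Assumed rule: a due day past the end of the month clamps to the last day.
function dueDateFor(year, month, dueDay) {
  const day = Math.min(dueDay, daysInMonth(year, month));
  const mm = String(month).padStart(2, '0');
  const dd = String(day).padStart(2, '0');
  return `${year}-${mm}-${dd}`;
}

console.log(dueDateFor(2026, 2, 31)); // 2026-02-28
console.log(dueDateFor(2024, 2, 31)); // 2024-02-29 (leap year)
console.log(dueDateFor(2026, 4, 31)); // 2026-04-30
```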
B4 — Subscriptions & Categories
- Subscriptions: add/edit/delete, active/cancelled, renewal & annual→monthly normalization; totals feed Tracker/Summary/Analytics.
- Catalog: browse/search, add-from-catalog pre-fills.
- Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (`categoryGroups`/`categoryReorder` tests); colors/icons consistent on Tracker/Spending/Analytics.
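Annual→monthly normalization is a place where rounding can make pages disagree by a few cents. A minimal sketch, assuming round-to-nearest on integer cents; `annualToMonthlyCents` is a hypothetical helper and the app's real rounding rule may differ.

```javascript
// Hypothetical: normalize an annual price (integer cents) to a monthly figure.
function annualToMonthlyCents(annualCents) {
  return Math.round(annualCents / 12);
}

const annual = 9999;                               // $99.99 per year
const monthly = annualToMonthlyCents(annual);      // 833 cents
console.log(monthly, monthly * 12, annual - monthly * 12); // 833 9996 3
```

The 3-cent gap between twelve monthly figures and the annual total is expected under this rule; what matters for reconciliation is that Tracker, Summary, and Analytics all use the same figure.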
B5 — Reporting reconciliation
- Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
- Calendar plots bills/payments on correct days (timezone: a bill due on the 1st must not render on the 31st); day totals correct.
- Analytics charts render with data AND empty (no broken SVG/`NaN` axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
- Health indicators compute from real data, no crash on empty; recommendations sane.
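The "1st renders on the 31st" calendar bug usually comes from parsing a date-only string into a `Date` and then reading it with local-time getters. The sketch below shows the mechanism and one way to sidestep it; `dayOfMonth` is a hypothetical helper, and whether the app stores due dates as `YYYY-MM-DD` strings is an assumption.

```javascript
// Read the day straight from a 'YYYY-MM-DD' string, with no Date involved.
function dayOfMonth(isoDate) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!m) throw new TypeError(`not a date-only string: ${isoDate}`);
  return Number(m[3]);
}

// A date-only ISO string is parsed as UTC midnight. In a timezone behind UTC,
// the local getter getDate() therefore reports the previous day (Jan 31 here),
// while getUTCDate() reports 1 regardless of the machine's timezone.
const parsed = new Date('2026-02-01');
console.log(dayOfMonth('2026-02-01'), parsed.getUTCDate()); // 1 1
```

To reproduce the failure during B5 or B14, run the browser with a timezone west of UTC and check bills due on the 1st.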
B6 — Spending (/spending)
- Category-group view assigned/spent/available math correct; 3-month averages correct.
- Cover-overspending reallocates funds correctly and is reversible.
- Safe-to-spend matches Tracker (`safeToSpend.test.js`); month nav; empty/partial months handled.
B7 — Debt planning (/snowball, /payoff)
- Add debts (balance/APR/min); snowball vs avalanche ordering correct.
- Projection + amortization vs a hand-calculated example; APR=0 and already-paid debts correct.
- Extra-payment/budget updates payoff date + total interest; chart renders; plan history saves/restores; status banner accurate.
- Edge: single debt, many debts, `$0` debt, negative/absurd inputs rejected.
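A small independent calculator makes the "vs hand-calc" comparison repeatable. The sketch below is an oracle under stated assumptions, not the app's algorithm: interest accrues monthly at APR/12, is rounded to the nearest cent, and is added before the payment is applied. `orderDebts` and `payoff` are hypothetical names.

```javascript
// Snowball pays the smallest balance first; avalanche pays the highest APR first.
function orderDebts(debts, strategy) {
  const byBalance = (a, b) => a.balanceCents - b.balanceCents;
  const byApr = (a, b) => b.aprPercent - a.aprPercent;
  return [...debts].sort(strategy === 'snowball' ? byBalance : byApr);
}

// Month-by-month payoff of a single debt with a fixed payment (integer cents).
function payoff(balanceCents, aprPercent, paymentCents) {
  if (balanceCents < 0 || aprPercent < 0) throw new RangeError('negative input');
  const monthlyRate = aprPercent / 100 / 12;
  let balance = balanceCents;
  let months = 0;
  let interestTotal = 0;
  while (balance > 0) {
    const interest = Math.round(balance * monthlyRate);
    // Guard against an endless loop when the payment cannot cover interest.
    if (paymentCents <= interest) throw new RangeError('payment never reduces the balance');
    balance += interest;
    interestTotal += interest;
    balance -= Math.min(paymentCents, balance);
    months++;
  }
  return { months, interestTotal };
}

// Hand-check: $1,000 at 12% APR paying $500/month.
// Month 1 interest $10.00, month 2 $5.10, month 3 $0.15; paid off in 3 months.
console.log(payoff(100000, 12, 50000)); // { months: 3, interestTotal: 1525 }
console.log(payoff(100000, 0, 25000));  // { months: 4, interestTotal: 0 }
```

If the app's projection differs from this oracle by cents, compare the rounding and accrual order first; those two choices account for most small discrepancies.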
B8 — Banking (/bank-transactions)
- Ledger loads/virtualizes/filters (date/account/amount/merchant/status).
- Transaction matching (match/unmatch), auto-match review approve/reject, no double-match (`transactionMatchService.test.js`).
- Merchant/store matching rules + confidence/duplicates; advisory non-bill filter flags/hides with override.
- Matched payments reflect on Tracker/ledger without double-counting; category picker persists.
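The "no double-match" invariant can be stated as: a transaction is consumed by at most one bill, even when several bills share the same amount and date. The sketch below illustrates the invariant only. `matchOnce` is a hypothetical greedy matcher on exact amount and date, not the logic in `transactionMatchService`, and the field names are assumptions.

```javascript
// Hypothetical matcher: pair each bill with at most one unused transaction
// that has the same amount and date. Used transactions are never reused.
function matchOnce(bills, txns) {
  const used = new Set();
  const matches = [];
  for (const bill of bills) {
    const txn = txns.find(
      (t) => !used.has(t.id) && t.amountCents === bill.amountCents && t.date === bill.dueDate
    );
    if (txn) {
      used.add(txn.id);
      matches.push({ billId: bill.id, txnId: txn.id });
    }
  }
  return matches;
}

// Two bills with identical amount and date, but only one bank transaction:
// exactly one bill may claim it.
const bills = [
  { id: 'bill-1', amountCents: 1599, dueDate: '2026-07-01' },
  { id: 'bill-2', amountCents: 1599, dueDate: '2026-07-01' },
];
const txns = [{ id: 'txn-1', amountCents: 1599, date: '2026-07-01' }];
console.log(matchOnce(bills, txns).length); // 1
```

The same duplicate amount+date fixture from the adversarial data set is the natural input for probing the real matcher.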
B9 — Data lifecycle (/data)
- Imports: spreadsheet (XLSX/CSV) map/preview/commit, malformed rejected, dup/partial handled; transaction CSV (`csvTransactionImportService.test.js`) dedupe + parsing; SQLite user import version-checked + confirms overwrite; seed demo data safe; import history lists + rollback.
- Exports: download SQLite round-trips (export → fresh account → import → matches); Excel export opens uncorrupted; ICS calendar feed valid in a client AND properly token-gated (route mounts before auth; verify it is not open).
- Backups: manual + scheduled restorable on a scratch instance; permissions not world-readable; old backups pruned (`backupAndCleanup.test.js`).
B10 — Notifications & workers
- Each channel (email/SMTP, ntfy, Gotify, Discord, Telegram): test message delivers; bad token/URL → clear error, logged, no secret leak.
- Reminders fire at configured lead time for upcoming/overdue; no duplicates; paid/skipped excluded; respects per-user prefs.
- Workers: `dailyWorker`, `bankSyncWorker` (interval + guardrails), `backupScheduler` run on schedule; errors caught/logged, don't crash server, next run unblocked.
B11 — Admin panel (/admin)
- Onboarding wizard completes without a broken state.
- Users table: add/edit-role/reset-pw/disable/delete; cannot remove the last admin.
- Login mode switch single↔multi verified live, no lockout; auth-methods enable/disable + bad config surfaced.
- Email notif config + test send; bank sync admin (configure/manual/auto/status/revoke).
- Backups create/list/download/restore/delete; cleanup panel previews impact + confirms (counts match `backupAndCleanup.test.js`).
- Privacy admin edits reflect on public `/privacy`; system status metrics/versions/jobs accurate (`statusService.test.js`); admin actions rate-limited + audited (`auditService`; spot-check the log).
B12 — Settings, Profile & global UI
- Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
- Profile: change password (current required, invalidates sessions), manage 2FA/passkeys, sessions revoke (`profileRoute.test.js`).
- Static: About (public + admin, version shown), Privacy, Release Notes (dialog once per `user`, dismiss persists), Roadmap (admin), NotFound friendly + way home.
- Global: command palette (`Ctrl+K`) search/keyboard/Esc, hidden for default admin; sidebar collapse/expand + mobile overlay (check overflow issue in `docs/UI_IMPROVEMENTS.md`); toasts stack/dismiss; page transitions no flash/double-fetch; theme applied before first paint.
B13 — API / backend direct
Route groups: auth, auth/oidc, admin, tracker, bills, subscriptions, payments, data-sources, transactions, matches, categories, settings, user, calendar, summary, monthly-starting-amounts, analytics, spending, snowball, notifications, status, about, about-admin, privacy, version, profile, export, import/imports.
- Auth: unauth → 401, wrong role → 403, right role → 200.
- CSRF: state-changing without valid token rejected; with token succeeds (`middleware/csrf.js`).
- Validation: bad/missing body → structured 4xx (`middleware/errorFormatter.js`, `utils/apiError.js`), never a raw 500 stack.
- IDOR/isolation: other user's resource by id → 403/404, no leak.
- Rate limits: login/admin/export/import/OIDC limiters trigger + reset (`middleware/rateLimiter.js`).
- Money in integer cents end-to-end (per `docs/cents-migration-plan.md`); API and DB agree; no float drift.
- Idempotency: repeated create doesn't duplicate; concurrent edits resolve sanely.
- Consistent error JSON + correct status codes; security headers present (`middleware/securityHeaders.js`); public routes (about/privacy/version/calendar feed) leak nothing sensitive.
B14 — Non-functional
- a11y (manual): keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix
aria-*); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only. - a11y (automated): run axe-core on every page (
@axe-core/playwright, orjest-axefor component-level) — zero critical/serious violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once. - Visual regression: capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright
toHaveScreenshot); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed. - Performance: initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query
staleTime). - PWA/offline: installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
- Security spot-checks: XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom `MarkdownText` renderer: https-only link hrefs, no `dangerouslySetInnerHTML` anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies `HttpOnly`/`Secure`/`SameSite`; `encryptionService` protects at-rest secrets, keys not committed. (Depth: `SECURITY_AUDIT.md`.)
- Resilience: kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
- Fault injection (systematic): with a request-interception harness (Playwright `page.route`, or DevTools network overrides), force each page's API calls to 401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON and confirm the UI shows a recoverable error (toast or `ErrorBoundary` fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally; each page handles failure differently.
- Timezone/locale: with a non-UTC tz and across a DST boundary, due dates and calendar stay correct.
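The fault-injection item above asks for every page to be crossed with every failure mode. One way to keep that exhaustive is to generate the page × fault matrix from data and let a single route handler apply each fault. The following is a minimal sketch: the fault names, the `{"error": ...}` body shape, and the helpers `buildFaultMatrix` and `applyFault` are illustrative assumptions, not BillTracker code. Only `route.fulfill` and `route.abort` are real Playwright methods.

```javascript
// Sketch only. Fault names, the error body shape, and the helper names are
// illustrative assumptions, not BillTracker code.
const FAULTS = {
  'session-expired-401': { action: 'fulfill', status: 401, body: '{"error":"unauthorized"}' },
  'forbidden-403': { action: 'fulfill', status: 403, body: '{"error":"forbidden"}' },
  'rate-limited-429': { action: 'fulfill', status: 429, body: '{"error":"too many requests"}' },
  'server-error-500': { action: 'fulfill', status: 500, body: '{"error":"internal"}' },
  'network-timeout': { action: 'abort', errorCode: 'timedout' },
  // Status 200 with a truncated body exercises the client's JSON parsing path.
  'malformed-json': { action: 'fulfill', status: 200, body: '{"bills": [' },
};

// One case per page x fault, so no page is only covered "once globally".
function buildFaultMatrix(pages, faults = FAULTS) {
  return pages.flatMap((page) =>
    Object.entries(faults).map(([name, fault]) => ({ page, name, fault })));
}

// `route` is a Playwright Route; fulfill() and abort() are its real methods.
function applyFault(route, fault) {
  if (fault.action === 'abort') return route.abort(fault.errorCode);
  return route.fulfill({
    status: fault.status,
    contentType: 'application/json',
    body: fault.body,
  });
}

const matrix = buildFaultMatrix(['/', '/bills', '/calendar']);
console.log(matrix.length); // 18 (3 pages x 6 faults)

// Usage inside a Playwright spec (not executed here):
//   for (const c of buildFaultMatrix(pages)) {
//     test(`${c.page} survives ${c.name}`, async ({ page }) => {
//       await page.goto(c.page);                    // load healthy first
//       await page.route('**/api/**', (route) => applyFault(route, c.fault));
//       // trigger a refetch or save, then assert a toast or fallback is visible
//     });
//   }
```

Installing the route after `page.goto` is what makes the 401 case "mid-session"; installing it before navigation would instead test a cold load against a failing API, which is a different scenario worth its own row.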
B15 — Regression & sign-off
Run on the production build (`npm start`), not dev:
- `npm run ci` green. Log in as `user` and `admin`.
- `npm run test:e2e` green (Playwright smoke + axe + visual-regression baselines match, §4.4).
- Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
- Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
- Snowball: add debt → projection. Data: seed → export → import round-trip (scratch DB).
- Admin: open panel, users, system status, run a backup. Banking loads + matches (if SimpleFIN configured).
- Notifications: one test message on configured channel. Toggle dark mode; mobile viewport; `Ctrl+K` navigates.
- Bogus URL → 404; logout → login redirect. Console clean throughout.
- Confirm exit criteria.
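The export → import round-trip in the list above is easier to sign off when the comparison is mechanical rather than eyeballed. The sketch below assumes an export is a plain object of `{ tableName: [records] }`, that every record carries a stable `id` preserved across import, and that amounts are stored as integer cents in a field called `amountCents`. All three are assumptions about the data shape, and `diffRoundTrip` is a hypothetical helper.

```javascript
// Sketch only: assumes { tableName: [records] } exports with stable `id`s.
// The real export format may differ.

// Serialize with sorted keys so property order cannot cause a false mismatch.
function canonical(value) {
  if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`;
  if (value && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Returns a human-readable problem list; an empty list means no loss or drift.
function diffRoundTrip(before, after) {
  const problems = [];
  const tables = new Set([...Object.keys(before), ...Object.keys(after)]);
  for (const table of tables) {
    const a = new Map((before[table] || []).map((r) => [r.id, canonical(r)]));
    const b = new Map((after[table] || []).map((r) => [r.id, canonical(r)]));
    for (const [id, row] of a) {
      if (!b.has(id)) problems.push(`${table}#${id}: lost on import`);
      else if (b.get(id) !== row) problems.push(`${table}#${id}: changed`);
    }
    for (const id of b.keys()) {
      if (!a.has(id)) problems.push(`${table}#${id}: unexpected new row`);
    }
  }
  return problems;
}

// Made-up sample data: row order and key order differ, one amount drifted.
const exported = {
  bills: [
    { id: 1, name: 'Rent', amountCents: 120000 },
    { id: 2, name: 'Power', amountCents: 8450 },
  ],
};
const reimported = {
  bills: [
    { amountCents: 8450, name: 'Power', id: 2 },
    { id: 1, name: 'Rent', amountCents: 120000.5 },
  ],
};
console.log(diffRoundTrip(exported, reimported)); // [ 'bills#1: changed' ]
```

If the importer reassigns ids, key the maps on a natural key (for example name plus due day) instead of `id`; otherwise every row would be reported as both lost and new.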
8. Appendices
Appendix A — Severity definitions
| Level | Definition |
|---|---|
| S1 – Critical | Data loss/corruption, security hole, crash/blank page, wrong money math, cannot log in/save. |
| S2 – Major | Feature broken/unusable, wrong results, broken navigation, unhandled error. |
| S3 – Minor | Works but wrong edge behavior, confusing UX, missing validation message. |
| S4 – Cosmetic | Visual/copy/alignment/dark-mode-contrast, non-blocking. |
| IMP – Improvement | Not a bug; enhancement or polish idea. |
Appendix B — Exit / sign-off criteria
A cycle is release-ready when:
- All batches B0–B15 ✅ on the primary matrix (Chrome desktop + mobile, light + dark, `user`+`admin`).
- B15 smoke green on the production build.
- Zero open S1/S2 in the Findings Log; S3/S4/IMP triaged.
- `npm run ci` green; no new console errors.
- Data export→import round-trip verified with no loss.
- Auth/authorization + data-isolation all pass.
- Money and date/period correctness verified vs hand-calculated examples.
- All fixes for the cycle archived to `HISTORY.md`; cycle summary recorded (date, build/commit, environment).
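The Findings Log criterion above (zero open S1/S2, everything else triaged) is the one item in this list that reduces to a pure check over the log. A minimal sketch follows, assuming each finding is recorded as `{ id, severity, status, triaged }`. That record shape, the helper name `findingsGate`, and the sample ids are hypothetical. The other criteria (CI green, round-trip, auth) still need their own evidence.

```javascript
// Sketch only: the finding shape is a hypothetical stand-in for Findings Log rows.
const BLOCKING = new Set(['S1', 'S2']);
const TRIAGE_ONLY = new Set(['S3', 'S4', 'IMP']);

// Applies the Appendix A severities to the Appendix B rule:
// no open S1/S2, and every open S3/S4/IMP has been triaged.
function findingsGate(findings) {
  const open = findings.filter((f) => f.status === 'open');
  const blockers = open.filter((f) => BLOCKING.has(f.severity)).map((f) => f.id);
  const untriaged = open
    .filter((f) => TRIAGE_ONLY.has(f.severity) && !f.triaged)
    .map((f) => f.id);
  return { ready: blockers.length === 0 && untriaged.length === 0, blockers, untriaged };
}

// Made-up sample log for illustration.
const sampleLog = [
  { id: 'EX-01', severity: 'S2', status: 'fixed', triaged: true },
  { id: 'EX-02', severity: 'S3', status: 'open', triaged: true },
  { id: 'EX-03', severity: 'S4', status: 'open', triaged: false },
];
console.log(findingsGate(sampleLog));
// { ready: false, blockers: [], untriaged: [ 'EX-03' ] }
```

Returning the offending ids rather than a bare boolean keeps the cycle summary honest: the sign-off note can list exactly which findings held the release.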
Appendix C — Page ↔ route ↔ API quick map
| Page | Route | Primary API |
|---|---|---|
| Tracker | `/` | `/api/tracker`, `/api/bills`, `/api/payments`, `/api/monthly-starting-amounts` |
| Calendar | `/calendar` | `/api/calendar` |
| Summary | `/summary` | `/api/summary` |
| Bills | `/bills` | `/api/bills`, `/api/categories`, `/api/matches` |
| Subscriptions / Catalog | `/subscriptions`, `/subscriptions/catalog` | `/api/subscriptions` |
| Categories | `/categories` | `/api/categories` |
| Health | `/health` | `/api/analytics`, `/api/summary` |
| Analytics | `/analytics` | `/api/analytics` |
| Spending | `/spending` | `/api/spending` |
| Banking | `/bank-transactions` | `/api/transactions`, `/api/matches`, `/api/data-sources` |
| Snowball / Payoff | `/snowball`, `/payoff` | `/api/snowball` |
| Settings | `/settings` | `/api/settings`, `/api/notifications` |
| Profile | `/profile` | `/api/profile`, `/api/user` |
| Data | `/data` | `/api/import`, `/api/export`, `/api/data-sources` |
| Admin | `/admin`, `/admin/status` | `/api/admin`, `/api/status`, `/api/about-admin` |
| About / Privacy / Release Notes / Roadmap | `/about`, `/privacy`, `/release-notes`, `/roadmap` | `/api/about`, `/api/privacy`, `/api/version` |
Appendix D — Reference docs
`SECURITY_AUDIT.md` (security depth) · `docs/UI_IMPROVEMENTS.md` (known UI issues) · `docs/cents-migration-plan.md` (money-as-cents) · `docs/SIMPLEFIN_CONSUMER_GUARDRAILS.md` (sync limits) · `docs/CSRF-SPA-Setup.md`, `docs/RATE_LIMITING_ENHANCEMENT.md` (security middleware) · `REVIEW.md`, `DEVELOPMENT_LOG.md`, `roadmap.md`, `FUTURE.md` (context/known gaps) · `HISTORY.md` (changelog / fix archive) · `playwright.config.js` + `e2e/` (automated E2E/visual/a11y harness, §4.4).
Appendix E — Per-page control census
The completeness ledger behind "every button, textbox, slider is right." Fill one table per page during B0 and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx` + `grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx`.
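The same two patterns can be applied from Node when the starting list should come out as structured rows rather than grep output. This is a rough sketch: `censusCandidates` is a hypothetical helper, and it takes the page source as a string so that reading the file (for example with `fs.readFileSync`) stays the caller's choice. Unlike `grep -n`, it emits one row per match, so a line with both `<Button` and `onClick=` appears twice.

```javascript
// Sketch only: mirrors the two grep patterns; censusCandidates is hypothetical.
const CONTROL_PATTERN = /type=["'][a-z]+["']|onClick=|<Button|<Select|<Switch|<Checkbox/g;

// Returns one { line, match } row per control-like token in the source.
function censusCandidates(source) {
  const rows = [];
  source.split('\n').forEach((text, index) => {
    for (const match of text.matchAll(CONTROL_PATTERN)) {
      rows.push({ line: index + 1, match: match[0] });
    }
  });
  return rows;
}

// Made-up JSX fragment for illustration.
const sample = [
  '<form>',
  '  <input type="number" name="amount" />',
  '  <Button onClick={save}>Quick-pay</Button>',
  '</form>',
].join('\n');

console.log(censusCandidates(sample));
// [
//   { line: 2, match: 'type="number"' },
//   { line: 3, match: '<Button' },
//   { line: 3, match: 'onClick=' }
// ]
```

Like the grep commands, this only seeds the census. Controls rendered by shared components or built dynamically will not match and still have to be added by hand.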
Template (copy per page):
| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |
|---|---|---|---|---|---|
| e.g. Quick-pay button | button | marks bill paid, updates balance cards, undo available | default ✓ · disabled-while-saving ✓ | Enter ✓ | ✅ / finding id |
| e.g. Amount input | number | per-month override, cents only, no wheel-scroll change | default ✓ · error-on-letters ✓ | Tab/Esc ✓ | ✅ / finding id |
Pages to census (from client/pages/, keep in sync with Appendix C): Tracker, Calendar, Summary, Bills, Subscriptions, SubscriptionCatalog, Categories, Health, Analytics, Spending, Snowball, Payoff, BankTransactions, Data, Settings, Profile, Admin, Status, About, Privacy, ReleaseNotes, Roadmap, Login, NotFound — plus the shared Sidebar/command-palette/header chrome once.