# BillTracker — Master QA Plan (living document)
**Version target:** v0.41.x · **Executor:** Claude (active) · **Last updated:** 2026-07-02 (Cycle 1: 14 findings fixed & archived, 0 open — incl. broken "Send test push" + email XSS)

This is a **living, operational** QA document, not a static spec. Claude runs it,
in **batches**, actively hunting for bugs/errors/rough edges, **fixing** them, and
**archiving** each fixed finding to `HISTORY.md`. Update this document whenever a
better approach, a new risk area, or a missed surface is discovered.
> **The prime directive:** don't just confirm the happy path — try to *break*
> the product. Every batch should end with the tree green, the Findings Log
> up to date, and any fixes archived to `HISTORY.md`.
---
## Table of contents
1. [Execution model — find, then fix, then repeat](#0-execution-model--find-then-fix-then-repeat)
2. [Batch plan & progress tracker](#1-batch-plan--progress-tracker)
3. [Active Findings Log](#2-active-findings-log)
4. [Archiving fixed findings to HISTORY.md](#3-archiving-fixed-findings-to-historymd)
5. [Environment & setup](#4-environment--setup)
6. [Test data strategy](#5-test-data-strategy)
7. [Cross-cutting checks (every page)](#6-cross-cutting-checks-every-page)
8. [Batch playbooks (detailed checklists)](#7-batch-playbooks-detailed-checklists)
9. [Appendices](#8-appendices)
---
## 0. Execution model — find, then fix, then repeat
**Separate finding from fixing.** During a QA pass we *hunt and log* — we do **not**
fix as we go (except show-stoppers, see below). Only after the whole plan has run
do we enter a dedicated **fix phase** and fix **every** logged finding. Then we run
the **entire** QA plan again from the top. Repeat until a full pass finds **zero**
errors. Two nested loops:
```
OUTER — QA CYCLE (repeat until a full pass finds zero findings)
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE 1 · FIND Run every batch B0→B15 in find-only mode. │
│ Probe hard, LOG everything to the Findings Log. │
│ Do NOT fix (except show-stoppers). │
│ ↓ │
│ PHASE 2 · FIX QA pass done. Now fix EVERY logged finding — │
│ all of them (S1→IMP). Root-cause, with tests. │
│ ↓ │
│ PHASE 3 · VERIFY Re-run each fix's repro; `npm run ci` green. │
│ ↓ │
│ PHASE 4 · ARCHIVE Move every fixed finding to HISTORY.md (§3). │
│ ↓ │
│ PHASE 5 · RE-RUN Start a new cycle at PHASE 1. If that full pass │
│ logs zero findings → QA is clean, STOP. │
└──────────────────────────────────────────────────────────────────────┘
INNER — per batch during PHASE 1 (find-only)
PICK next ⬜ batch → SET UP (app, data state, role, console open) →
PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
mark batch status in §1 → next batch. (No fixing here.)
```
**Show-stopper exception.** A *show-stopper* is a finding that **blocks continued
QA** — the app won't boot, you can't log in, or a page crashes so hard you can't
test the rest of it. Only these get fixed immediately (mid-pass), because you
can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix;
then continue the find pass. **Everything else is logged and left for Phase 2**
no matter how tempting or trivial.
**Discipline (for best results)**
- **Phase 1 is log-only.** Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
- Keep each find batch tight and focused — one batch per session — so probing stays thorough.
- **Phase 2 fixes everything**, not just S1/S2. Root-cause over surface patch; add/extend a test in `tests/` or `client/**/*.test.*` for every logic bug so it can't silently return.
- Never leave the repo red at the end of Phase 3 — `npm run ci` must be green before archiving.
- Touch product behavior? Run the `/verify` skill on the affected flow before archiving.
- **The exit is empirical:** you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you *think* it's clean. Log the cycle result in the [Cycle Log](#11-qa-cycle-log) each time.
- Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.
---
## 1. Batch plan & progress tracker
Batches are ordered **foundation-first** (baseline & auth before features; features
before cross-cutting; regression last). Update **Status** and **Findings** every run.
**Status key:** ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck
| # | Batch | Primary surface | Data state | Status | Open / Fixed |
|---|-------|-----------------|-----------|--------|--------------|
| B0 | Baseline, tooling & **coverage recon** | `npm run ci`/`check`, app boots, console clean, **re-scan routes/pages/API vs plan & update it**, **control census** | any | 🔄 | 0 / 1 |
| B-UI | **Design-system primitives** | each `client/components/ui/*` × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard | any | 🔄 | 0 / 0 |
| B1 | Auth & authorization | login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation | multi + single user | ⬜ | 0 / 0 |
| B2 | Tracker (core) | `/` buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift | seeded + adversarial | ⬜ | 0 / 0 |
| B3 | Bills & schedules | `/bills` CRUD, custom schedules, reorder, merchant rules, historical import | adversarial | ⬜ | 0 / 0 |
| B4 | Subscriptions & Categories | `/subscriptions`, catalog, `/categories`, groups, reorder | seeded | ⬜ | 0 / 0 |
| B5 | Reporting reconciliation | `/summary`, `/calendar`, `/analytics`, `/health` cross-check totals | seeded + large | 🔄 | 0 / 3 |
| B6 | Spending | `/spending` YNAB view, averages, cover-overspending, safe-to-spend | seeded + edge months | 🔄 | 0 / 1 |
| B7 | Debt planning (math) | `/snowball`, `/payoff` APR/amortization vs hand-calc | edge (APR=0, $0 debt) | 🔄 | 0 / 2 |
| B8 | Banking & bank sync | `/bank-transactions`, SimpleFIN sync, matching, merchant/store, advisory filter | seeded txns | ⬜ | 0 / 0 |
| B9 | Data lifecycle | `/data` import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip | empty + seeded | 🔄 | 0 / 1 |
| B10 | Notifications & workers | email + ntfy/Gotify/Discord/Telegram, reminders, cron workers | seeded | 🔄 | 0 / 1 |
| B11 | Admin panel | users, login mode, auth methods, backups, cleanup, status, onboarding | admin | 🔄 | 0 / 0 |
| B12 | Settings, Profile & global UI | `/settings`, `/profile`, static pages, command palette, sidebar/nav | any | 🔄 | 0 / 0 |
| B13 | API / backend direct | all `/api/*`: auth, CSRF, validation, rate limits, error shape, IDOR, cents | via HTTP client | 🔄 | 0 / 1 |
| B14 | Non-functional | a11y, performance, PWA/offline, XSS/secrets, timezone/DST | large + adversarial | 🔄 | 0 / 4 |
| B15 | Regression & sign-off | full smoke on **production build**, exit criteria | seeded | 🔄 | 0 / 0 |
> After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new
> cycle from B0 against the next build/version.
### 1.1 QA Cycle Log
One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A
cycle is only "clean" when its **find pass logged zero findings**. Keep going
until you get a clean cycle.
| Cycle | Started | Build / commit | Findings logged | Fixed / archived | Result |
|-------|---------|----------------|-----------------|------------------|--------|
| 1 | 2026-07-02 | `bdbf231`→(dev) | 13 | **13 → all fixed, verified & archived** (…, +B10-01 broken "Send test push") | ✅ **0 open.** Post seed-fix reconciliation caught the **occurrence-gating family** — Summary (S2), Analytics, and SimpleFIN bank-tracking all counted non-monthly bills every month; all fixed via `resolveDueDate` and guarded (probe reconciliation + `tests/summaryBankTracking.test.js`). Probed B0/B1/B3/B4/B5/B6/B7/B8/B9/B13/B14; solid: auth/isolation, CSRF, payment/date validation, recurrence, matching/dedup, subscription+spending math, XSS, calendar gating. **A full re-run (B0→B15) is still required to declare the cycle clean per exit criteria.** |
**Result key:** 🔄 in progress · 🔁 findings fixed, re-run required · ✅ clean (zero findings — QA complete)

---
## 2. Active Findings Log
**This is the live log.** Record every finding here the moment it's found — before
fixing. Keep only **Open / Fixing / Fixed** rows here. Once a finding is
**Fixed + verified + archived to `HISTORY.md`**, delete its row from this table
(its permanent record is the changelog entry).
**Finding ID:** `QA-B{batch}-{nn}` (e.g. `QA-B2-01`).

**Severity:** S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see [Appendix A](#appendix-a--severity-definitions)).

**Status:** 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.

| ID | Sev | Area (`file:line`) | Summary | Status | Notes / repro |
|----|-----|--------------------|---------|--------|---------------|
| _(none — all Cycle 1 findings fixed, verified & archived to `HISTORY.md` v0.41.0)_ | | | | | |
**Finding template** (paste a new row above; keep the full write-up here until archived):
```
ID: QA-B?-??
Severity: S1 / S2 / S3 / S4 / IMP
Environment: browser / viewport / theme / role / auth mode / data state
Area: file:line (if known)
Steps to reproduce:
1.
2.
Expected:
Actual:
Evidence: console / network / DB row / screenshot
Fix: (what changed, commit) — Verified by: (repro re-run + ci)
```
Log console errors, failed network requests, and unhandled rejections as findings
**even if the UI looks fine**.
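
The `QA-B{batch}-{nn}` ID format above is easy to check mechanically. The following is a minimal sketch, not part of the BillTracker codebase: `parseFindingId`, `formatFindingId`, and `nextFindingId` are hypothetical helper names, and it assumes numeric batch numbers only (the plan does not say how a `B-UI` finding would be numbered, so such IDs are treated as unrecognized here).

```javascript
// Hypothetical helpers for the QA-B{batch}-{nn} finding-ID format.
// Assumption: batch is a 1-2 digit number and nn is exactly two digits.
const FINDING_ID = /^QA-B(\d{1,2})-(\d{2})$/;

// Returns { batch, seq } for a well-formed ID, or null otherwise.
function parseFindingId(id) {
  const match = FINDING_ID.exec(id);
  if (!match) return null;
  return { batch: Number(match[1]), seq: Number(match[2]) };
}

// Builds an ID with a zero-padded two-digit sequence number.
function formatFindingId(batch, seq) {
  return `QA-B${batch}-${String(seq).padStart(2, "0")}`;
}

// Picks the next free sequence number for a batch, given the IDs already logged.
function nextFindingId(batch, existingIds) {
  const used = existingIds
    .map(parseFindingId)
    .filter((f) => f !== null && f.batch === batch)
    .map((f) => f.seq);
  return formatFindingId(batch, Math.max(0, ...used) + 1);
}

// Example: the next B7 finding after QA-B7-01 and QA-B7-03 is QA-B7-04.
console.log(nextFindingId(7, ["QA-B7-01", "QA-B7-03", "QA-B2-09"]));
```
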
_All Cycle 1 write-ups have been archived to `HISTORY.md` v0.41.0 (see §3)._

---
## 3. Archiving fixed findings to HISTORY.md
`HISTORY.md` is the project changelog (version-organized, emoji section headers).
When a finding is Fixed **and verified**, write a concise entry there, then remove
the row from the Active Findings Log.
**Where:** under the current in-progress version heading (e.g. `## v0.41.x`). If a
QA cycle produces several fixes, group them under a `### 🐛 QA Fixes` (bug fixes)
or `### 🧹 QA` (polish/improvements) section, matching the existing changelog voice.
**Entry format** (match the terse, specific style already in `HISTORY.md`):
```markdown
### 🐛 QA Fixes
- **[Area] Short title** — What was wrong and the user-visible impact, then the
fix. Reference the file/function and any migration or test added.
(was QA-B7-03)
```
**Rules**
- One bullet per finding; include the old `QA-B?-??` id in parentheses for traceability.
- If a fix added/changed a test, say which (`tests/…` or `client/…test.*`).
- Don't archive until the fix is verified (repro gone + `npm run ci` green).
- IMP items that were implemented are archived the same way; IMP items merely *noted* stay in the Findings Log (or graduate to `FUTURE.md`/`roadmap.md` if deferred).
---
## 4. Environment & setup
### 4.1 Running the app
| Mode | Command | URL |
|------|---------|-----|
| Dev (API + UI, hot reload) | `npm run dev` | UI `http://localhost:5173` (proxies API → `:3000`) |
| API only | `npm run dev:api` | `http://localhost:3000` |
| Production build | `npm run build` then `npm start` | `http://localhost:3000` |
| Docker | `docker-compose up` | per compose config |
- Backend: Node/Express on `PORT` (default `3000`). Frontend dev: Vite on `5173`.
- Data: SQLite at `db/bills.db` (WAL). **Back it up before destructive tests** (`backups/` or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
- Configure a dedicated **test** `.env` from `.env.example`. Never point tests at production data or a live SimpleFIN account with real credentials.
- Test commands: `npm run ci` (check + all tests + build), `npm run check` (syntax + build), `npm run test` (server), `npm run test:client` (vitest).
### 4.2 Test matrix
Full functional pass across reasonable combinations; smoke (B15) across all.
| Dimension | Values |
|-----------|--------|
| Browser | Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser) |
| Viewport | Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414 |
| Theme | Light, Dark, system-follow |
| Role | `user`, `admin`, default admin (first-run) |
| Auth mode | Multi-user, single-user |
| Density | Normal + compact desktop |
| Network | Online, Slow 3G, offline (PWA shell) |
| Data state | Empty, seeded demo, large/stress, adversarial |
### 4.3 Accounts to prepare
- `admin`, `user`, a **second** `user` (data-isolation), a single-user-mode instance (separate DB).
- Demo reference: `guest / guest123` (do not run destructive flows on any shared demo server).
### 4.4 Automated E2E harness (Playwright)
Manual passes prove a button works **once**; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.
| Command | What it does |
|---------|--------------|
| `npm run test:e2e` | run the E2E suite headless (boots the app via `webServer`) |
| `npm run test:e2e:ui` | Playwright UI mode — watch/debug interactively |
| `npm run test:e2e:update` | re-baseline visual-regression screenshots (review the diff before committing) |
| `npm run smoke:prod` | **B15 production-build smoke** — builds, boots `node server.js` (dist/), drives the real artifact so the split vendor chunks are validated at runtime |
- **Setup (one-time):** `npm install` then `npx playwright install chromium`. Config: `playwright.config.js`; specs in `e2e/`.
- **Scope:** the suite is a **thin critical-path smoke**, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
- **Don't** point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.
---
## 5. Test data strategy
- **Empty:** brand-new account. Every page must render a sensible empty state — no crash, no `NaN`, no blank white screen.
- **Seeded:** use **Data → Seed Demo Data** for a realistic mid-size dataset.
- **Large/stress:** 500+ bills, 5,000+ transactions, 24+ months history — exercises virtualization (`@tanstack/react-virtual`), charts, query perf.
- **Adversarial (deliberately try to break it):**
- Amounts: `0`, `0.01`, negative, `9,999,999.99`, fractional cents.
- Text: emoji, RTL, `<script>` XSS probe, 1,000-char strings, leading/trailing spaces, SQL-ish input.
- Dates: 1st/14th/15th/31st boundaries; 28/29/30/31-day months; Feb 29; month/year crossing; inactive ranges; skipped months; overrides.
- Transactions: duplicate amount+date, same-day merchant repeats, refunds/negatives.
- Debt: APR `0%`, very high APR, `$0` balance, absurd inputs.
- Non-UTC system timezone + a DST boundary date.
---
## 6. Cross-cutting checks (every page)
Run on **every** page during its batch — don't assume a shared component behaves the same everywhere.
**Navigation & routing** — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → `NotFoundPage` · active nav highlighted · `simplefinOnly` (Banking) gated · `Ctrl+K` palette finds & opens it.
**Buttons & interactions** — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · **rapid repeated toggling** (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then **navigate away mid-flight** doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).
**Forms & validation** — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · **paste** into every field (incl. `"$1,234.56"` into currency) · **browser/password-manager autofill** on login & forms · **IME/composition** (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).
**Number inputs (you have ~45 `type="number"` fields — the highest-risk control type)** — scroll-wheel over a focused field must **not** silently change the value · spinner up/down buttons step correctly and respect min/max · reject `e`/`+`/exponent notation and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no `NaN` submitted · cents fields never accept >2 decimals.
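
As a reference oracle for the currency checks above (pasting `"$1,234.56"`, rejecting exponents, multiple decimals, and more than two decimal places), a strict string-to-cents parser can be sketched as follows. `parseCurrencyToCents` is a hypothetical name, not the app's real parser, and two choices here are assumptions: negatives are rejected outright, and only en-US style grouping (comma thousands, dot decimal) is accepted, so a decimal-comma value such as `1,50` returns `null` rather than being misread.

```javascript
// Hypothetical strict parser: returns integer cents, or null for anything ambiguous.
// Accepts plain digits or correctly grouped thousands, with an optional 1-2 digit fraction.
const CURRENCY = /^(\d{1,3}(,\d{3})+|\d+)(\.\d{1,2})?$/;

function parseCurrencyToCents(input) {
  if (typeof input !== "string") return null;
  // Strip surrounding whitespace and one leading dollar sign.
  const cleaned = input.trim().replace(/^\$/, "");
  if (!CURRENCY.test(cleaned)) return null; // rejects "", "1e3", "+5", "-5", "1.2.3", "1.234", "1,50"
  const [whole, frac = ""] = cleaned.replace(/,/g, "").split(".");
  // Integer arithmetic only, so no float artifacts can enter the value.
  return Number(whole) * 100 + Number(frac.padEnd(2, "0"));
}

console.log(parseCurrencyToCents("$1,234.56")); // 123456
console.log(parseCurrencyToCents("1e3"));       // null
```
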
**Per-control state matrix** — for each control on the page, verify every applicable state renders and behaves in **both light and dark**: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).
> **Note — "sliders":** this app has **no `<input type=range>` sliders.** The `SlidersHorizontal` glyph is just the Bills **filter-panel** button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.
**States** — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, `ErrorBoundary` shows a fallback not a white page.
**Visual & responsive** — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.
**Data integrity** — money 2-decimals, no float artifacts (`9.999999`) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).

---
## 7. Batch playbooks (detailed checklists)
Each batch below is the detailed script for the matching row in [§1](#1-batch-plan--progress-tracker). Apply [§6](#6-cross-cutting-checks-every-page) throughout.
### B0 — Baseline, tooling & coverage recon
**Run FIRST in every cycle.** This is where the plan re-syncs with reality — new
pages, routes, endpoints, or features added since the last cycle get discovered
and folded in **before** testing, so coverage never silently rots.
**Tooling baseline**
- [ ] `npm run ci` — record any failing server/client test or build error as a finding (S1/S2).
- [ ] `npm run check` — server syntax + build clean.
- [ ] App boots via `npm run dev` **and** production `npm start`; note startup warnings.
- [ ] Load the app; browser console + server logs clean on first load and first navigation.
- [ ] Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.
**Coverage recon — enumerate the *actual* product and diff it against this plan.**
Run these, then compare the output to the batch playbooks (§7) and the [route map](#appendix-c--page--route--api-quick-map):
- [ ] **Client routes:** `grep -nE "<Route" client/App.jsx` — every path present here must appear in a batch playbook and Appendix C.
- [ ] **Pages:** `ls client/pages/` — every page has an owning batch.
- [ ] **Sidebar / nav entries:** `grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx` — new nav links (incl. conditional ones like `simplefinOnly`) are covered.
- [ ] **API route mounts:** `grep -nE "app.use\('/api" server.js` — every mounted route group is in B13's list and mapped in Appendix C.
- [ ] **Services & components:** `ls services/` and `ls client/components/**/` — new service/component families have a home in a playbook.
- [ ] **UI primitives:** `ls client/components/ui/` — every shared primitive is covered by the [B-UI](#b-ui--design-system-primitives) playbook; a new primitive gets a row there.
- [ ] **Interactive-control census (makes "every button tested" *provable*)** — for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: [Appendix E](#appendix-e--per-page-control-census)). A control that isn't on a checklist hasn't been tested — the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components` and `grep -rn "onClick=" client/pages/<Page>.jsx`.
- [ ] **Feature flags / conditional surfaces** — search for `Only`, `enabled`, `featureFlag`, env gates that hide/show pages; ensure each state is tested.
- [ ] **What changed since last cycle** — skim `git log`/`HISTORY.md` since the previous cycle's commit (see [Cycle Log](#11-qa-cycle-log)) for new features/pages.
**Update the plan (do this now, not later)** — for anything the recon surfaced that isn't already covered:
- [ ] Add it to the relevant batch playbook (or create a new batch and a row in the [§1 table](#1-batch-plan--progress-tracker)).
- [ ] Add/adjust its entry in [Appendix C](#appendix-c--page--route--api-quick-map).
- [ ] Note the plan update in the [Cycle Log](#11-qa-cycle-log) row for this cycle.
- [ ] If a whole surface is *missing* from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.
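
The recon above boils down to a two-way set difference: surfaces the product has that the plan lacks, and surfaces the plan lists that no longer exist. A minimal sketch follows; `diffCoverage` is a hypothetical helper, and the route lists in the example are illustrative (in practice they would come from the `grep`/`ls` output and from Appendix C).

```javascript
// Hypothetical helper: compares what the product actually exposes against what the plan covers.
function diffCoverage(actual, planned) {
  const actualSet = new Set(actual);
  const plannedSet = new Set(planned);
  return {
    // In the product but not in any playbook: add to a batch before testing.
    uncovered: [...actualSet].filter((item) => !plannedSet.has(item)).sort(),
    // In the plan but gone from the product: reconcile the plan, don't test ghosts.
    ghosts: [...plannedSet].filter((item) => !actualSet.has(item)).sort(),
  };
}

// Illustrative input: "/reports" and "/old-page" are made-up paths for the example.
const result = diffCoverage(
  ["/", "/bills", "/spending", "/reports"],
  ["/", "/bills", "/spending", "/old-page"]
);
console.log(result); // { uncovered: [ '/reports' ], ghosts: [ '/old-page' ] }
```

Both lists being empty is the condition for starting B1 with confidence that coverage has not drifted.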
### B-UI — Design-system primitives
**Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once.** Drive them wherever they're already mounted (or a scratch page); run each against the [per-control state matrix](#6-cross-cutting-checks-every-page) × light/dark × keyboard-only. One finding row per primitive.
| Primitive (`client/components/ui/`) | Must verify |
|---|---|
| `button.jsx` | every variant (default/destructive/outline/ghost/link) + size; **disabled truly blocks click**; loading state; focus ring; Enter/Space activate |
| `input.jsx` | text/number/password/date/search/file types; placeholder; disabled/read-only; error styling; paste/autofill; number-input rules above |
| `select.jsx` (Radix) | opens by mouse **and** keyboard; type-ahead; long lists scroll; onChange fires in **Firefox+Safari**; disabled options; value persists; Esc closes |
| `checkbox.jsx` / `switch.jsx` | toggles by click **and** Space; indeterminate (if used); disabled; label click toggles; controlled value round-trips |
| `dialog.jsx` / `alert-dialog.jsx` / `confirm-dialog.jsx` / `input-dialog.jsx` | open/close; **focus trap + restore**; Esc closes; overlay click behaves; **Cancel actually cancels (no side effect)**; Confirm fires once; scroll-lock releases |
| `dropdown-menu.jsx` | keyboard arrow nav; Esc; submenu; disabled items; click-outside closes; no clipping at viewport edge |
| `tabs.jsx` | arrow-key nav; active state; content swaps; deep-link/refresh keeps tab (if applicable) |
| `tooltip.jsx` | hover **and** keyboard-focus show it; dismiss on blur; touch behavior; never the only place essential info lives (a11y trap) |
| `table.jsx` | header/zebra/hover; horizontal scroll on narrow viewport (no page h-scroll); empty state |
| `collapsible.jsx` | expand/collapse animation; state persists; keyboard operable |
| `sonner.jsx` (toast) | success/error/loading; **stack + dismiss**; auto-dismiss timing; doesn't cover primary actions; announced to SR |
| `save-status.jsx` | idle/saving/saved/error transitions reflect real autosave (`useAutoSave.test.jsx`) |
| `Skeleton.jsx` | matches final layout (no jump); no infinite skeleton on error |
| `badge.jsx` / `card.jsx` / `separator.jsx` / `label.jsx` | contrast in dark mode; label `htmlFor` focuses its control; no overflow on long text |
| `theme-toggle.jsx` | light↔dark↔system; applied **before first paint** (no flash); persists across reload |
- [ ] Every primitive above passes its row in light **and** dark, keyboard-only, at mobile width.
- [ ] Axe scan (see B14) on a page densely using primitives → zero critical violations.
### B1 — Auth & authorization
- [ ] **Password:** valid login → correct landing (Tracker for `user`, `/admin` for default admin); wrong password → clear error, no user-enumeration timing/message difference; logout clears session; expired session redirects and preserves `state.from`; session persists across refresh.
- [ ] **Rate limiting:** repeated failed logins throttled (`loginLimiter`/`loginUsernameLimiter`), clear message, resets.
- [ ] **TOTP:** enroll (QR + secret), code accepted, backup codes work once, login prompts for TOTP, wrong code rejected+throttled, disable requires re-auth.
- [ ] **WebAuthn:** register/login/remove passkey in Chrome, Firefox, Safari; password fallback works.
- [ ] **OIDC/Authentik:** SSO flow creates/links account; admin config errors surface cleanly; `oidcLimiter` throttles.
- [ ] **Roles/guards:** `user` blocked from `/admin*`, `/status` (redirect) and admin APIs (403); default admin forced to `/admin`; single-user bypass correct but admin surfaces still protected; unauth API → 401.
- [ ] **Data isolation (critical):** user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans — test by ID enumeration on the API.
- [ ] **CSRF:** state-changing request without a valid token → rejected.
### B2 — Tracker (`/`)
- [ ] Month nav (prev/next/jump), current month highlighted, data reloads per month.
- [ ] Bills land in correct `1–14` / `15–31` bucket by due date; pin-due sorting works.
- [ ] Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
- [ ] Skip excludes from totals for that month only; unskip restores.
- [ ] Per-month amount override persists, doesn't affect base bill or other months.
- [ ] Notes cell add/edit/clear persists per month.
- [ ] Inactive/date-range bill doesn't show or count outside its range.
- [ ] Balance/starting-amount cards period-aware + editable; income bills / safe-to-spend correct.
- [ ] Overdue command center: accurate list/count, pay/skip actions work.
- [ ] Cash flow card, drift insight, payment ledger (add/edit/delete reconciles), autopay suggestion apply/dismiss.
- [ ] Editable cells autosave; Esc cancels; invalid input handled. Mobile rows equal desktop actions. Compact mode intact.
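
For the bucket check, a tiny oracle makes the boundary explicit. This is a sketch under one assumption: that the two Tracker buckets split by day of month between the 14th and the 15th, which is inferred from the boundary dates listed in §5 rather than from the app's code. `bucketForDueDay` is a hypothetical name.

```javascript
// Hypothetical oracle for Tracker bucket placement.
// Assumption: days 1-14 go to the first bucket, days 15-31 to the second.
function bucketForDueDay(day) {
  if (!Number.isInteger(day) || day < 1 || day > 31) {
    throw new RangeError(`due day must be an integer from 1 to 31, got ${day}`);
  }
  return day <= 14 ? "first-half" : "second-half";
}

// The boundary pair worth probing in the UI:
console.log(bucketForDueDay(14)); // first-half
console.log(bucketForDueDay(15)); // second-half
```

Compare the oracle against the UI for days 1, 14, 15, and 31, since an off-by-one at the boundary is the likeliest defect.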
### B3 — Bills (`/bills`)
- [ ] Create with all fields (name, amount, due date, category, schedule, account, autopay, active range).
- [ ] Edit propagates to Tracker/Summary/Calendar/Analytics; delete confirms + handles orphan payments/history.
- [ ] Custom schedules (weekly/biweekly/monthly/quarterly/annual/custom): next-due & occurrences correct across month/year boundaries.
- [ ] Drag reorder persists (cross-check `billReorder.test.js`); search/filter panel filters + clears; large-list virtualization smooth.
- [ ] Merchant rules: create/matches/edit/delete; historical import dialog attributes month-crossing payments correctly.
- [ ] BillModal open/close, validation, cancel discards unsaved changes.
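
When hand-checking schedule occurrences, two small oracles cover the common failure modes: a non-monthly bill showing up every month, and a due day that does not exist in a short month. The sketch below is illustrative only. It is not the project's `resolveDueDate`; `occursInMonth` and `clampDueDay` are hypothetical names, the schedule shape (`startYear`, `startMonth`, `intervalMonths`) is an assumption, and weekly/biweekly schedules are out of scope.

```javascript
// Hypothetical oracle: does a month-interval schedule have an occurrence in (year, month)?
// Assumption: month is 1-12 and the schedule repeats every intervalMonths from its start month.
function occursInMonth(schedule, year, month) {
  const elapsed = (year - schedule.startYear) * 12 + (month - schedule.startMonth);
  return elapsed >= 0 && elapsed % schedule.intervalMonths === 0;
}

// Clamps a due day to the last day of the month (e.g. the 31st in February).
function clampDueDay(year, month, dueDay) {
  // Day 0 of the following month is the last day of this month.
  const lastDay = new Date(year, month, 0).getDate();
  return Math.min(dueDay, lastDay);
}

const quarterly = { startYear: 2026, startMonth: 1, intervalMonths: 3 };
console.log(occursInMonth(quarterly, 2026, 4)); // true
console.log(occursInMonth(quarterly, 2026, 5)); // false
console.log(clampDueDay(2024, 2, 31));          // 29 (leap year)
```

Whether the app clamps to month end or rolls over into the next month is a product decision to confirm; the oracle only encodes the clamping interpretation.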
### B4 — Subscriptions & Categories
- [ ] Subscriptions: add/edit/delete, active/cancelled, renewal & annual→monthly normalization; totals feed Tracker/Summary/Analytics.
- [ ] Catalog: browse/search, add-from-catalog pre-fills.
- [ ] Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (`categoryGroups`/`categoryReorder` tests); colors/icons consistent on Tracker/Spending/Analytics.
### B5 — Reporting reconciliation
- [ ] Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
- [ ] Calendar plots bills/payments on correct days (**timezone**: a bill due on the 1st must not render on the 31st); day totals correct.
- [ ] Analytics charts render with data AND empty (no broken SVG/`NaN` axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
- [ ] Health indicators compute from real data, no crash on empty; recommendations sane.
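
The calendar timezone item usually traces back to one JavaScript behavior: a date-only ISO string passed to `new Date()` is parsed as UTC midnight, so reading it back with local getters in a timezone behind UTC yields the previous day. The sketch below shows a local-time parse that avoids this; `parseLocalDate` is a hypothetical helper, and whether the app stores due dates as `YYYY-MM-DD` strings is an assumption.

```javascript
// Hypothetical helper: parses "YYYY-MM-DD" as a local calendar date, not UTC midnight.
function parseLocalDate(iso) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(iso);
  if (!match) throw new TypeError(`expected YYYY-MM-DD, got ${iso}`);
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  // The multi-argument Date constructor uses local time.
  const date = new Date(year, month - 1, day);
  // Reject overflow such as 2025-02-31, which Date would silently roll into March.
  if (date.getMonth() !== month - 1 || date.getDate() !== day) {
    throw new RangeError(`not a real calendar date: ${iso}`);
  }
  return date;
}

// Stays on the 1st of August in every timezone.
const due = parseLocalDate("2026-08-01");
console.log(due.getDate(), due.getMonth() + 1); // 1 8

// By contrast, new Date("2026-08-01").getDate() is 31 when the system timezone is behind UTC.
```

To reproduce the bug class during B5, run the app with a non-UTC system timezone (for example by setting the `TZ` environment variable) and check a bill due on the 1st.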
### B6 — Spending (`/spending`)
- [ ] Category-group view assigned/spent/available math correct; 3-month averages correct.
- [ ] Cover-overspending reallocates funds correctly and is reversible.
- [ ] Safe-to-spend matches Tracker (`safeToSpend.test.js`); month nav; empty/partial months handled.
### B7 — Debt planning (`/snowball`, `/payoff`)
- [ ] Add debts (balance/APR/min); snowball vs avalanche ordering correct.
- [ ] Projection + amortization vs a **hand-calculated** example; APR=0 and already-paid debts correct.
- [ ] Extra-payment/budget updates payoff date + total interest; chart renders; plan history saves/restores; status banner accurate.
- [ ] Edge: single debt, many debts, `$0` debt, negative/absurd inputs rejected.
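
For the hand-calculated comparison, it helps to have an independent month-by-month simulation to check the UI against. The sketch below is one possible oracle, not the app's algorithm: `payoffSchedule` is a hypothetical name, and it assumes monthly compounding at APR/12, interest rounded to the nearest cent each month, and the payment applied after interest accrues. If the app uses a different convention, small differences in total interest are expected and should be understood before logging a finding.

```javascript
// Hypothetical payoff oracle working in integer cents.
// Assumptions: monthly rate = APR / 12, interest rounded per month, payment applied after interest.
function payoffSchedule(balanceCents, aprPercent, paymentCents) {
  let balance = balanceCents;
  let months = 0;
  let totalInterestCents = 0;
  while (balance > 0) {
    const interest = Math.round((balance * aprPercent) / 1200);
    // A payment that does not exceed the interest never reduces the balance.
    if (paymentCents <= interest) return null;
    balance += interest;
    totalInterestCents += interest;
    balance -= Math.min(paymentCents, balance); // final month pays only what is left
    months += 1;
  }
  return { months, totalInterestCents };
}

// $1,000.00 at 12% APR paying $100.00/month: 11 months, $58.98 interest under these assumptions.
console.log(payoffSchedule(100000, 12, 10000));
// APR 0%: 10 months, no interest.
console.log(payoffSchedule(100000, 0, 10000));
// $0 balance: already paid, 0 months.
console.log(payoffSchedule(0, 12, 10000));
```

A `null` result marks inputs that can never pay off (payment at or below the monthly interest), which is the case the UI should reject or flag rather than loop on.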
### B8 — Banking (`/bank-transactions`)
- [ ] Ledger loads/virtualizes/filters (date/account/amount/merchant/status).
- [ ] Transaction matching (match/unmatch), auto-match review approve/reject, no double-match (`transactionMatchService.test.js`).
- [ ] Merchant/store matching rules + confidence/duplicates; advisory non-bill filter flags/hides with override.
- [ ] Matched payments reflect on Tracker/ledger without double-counting; category picker persists.
### B9 — Data lifecycle (`/data`)
- [ ] Imports: spreadsheet (XLSX/CSV) map/preview/commit, malformed rejected, dup/partial handled; transaction CSV (`csvTransactionImportService.test.js`) dedupe + parsing; SQLite user import version-checked + confirms overwrite; seed demo data safe; import history lists + rollback.
- [ ] Exports: download SQLite **round-trips** (export → fresh account → import → matches); Excel export opens uncorrupted; ICS calendar feed valid in a client AND properly **token-gated** (route mounts before auth — verify not open).
- [ ] Backups: manual + scheduled restorable on a scratch instance; permissions not world-readable; old backups pruned (`backupAndCleanup.test.js`).
### B10 — Notifications & workers
- [ ] Each channel (email/SMTP, ntfy, Gotify, Discord, Telegram): test message delivers; bad token/URL → clear error, logged, no secret leak.
- [ ] Reminders fire at configured lead time for upcoming/overdue; no duplicates; paid/skipped excluded; respects per-user prefs.
- [ ] Workers: `dailyWorker`, `bankSyncWorker` (interval + guardrails), `backupScheduler` run on schedule; errors caught/logged, don't crash server, next run unblocked.
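
The reminder rules above (lead time, overdue included, paid/skipped excluded) can be expressed as a small filter to compare against what the worker actually sends. This is a sketch under assumptions: `billsToRemind` and `dayNumber` are hypothetical names, the bill shape (`id`, `dueDate` as `YYYY-MM-DD`, `status` of `unpaid`/`paid`/`skipped`) is invented for illustration, and duplicate suppression and per-user preferences are not modeled.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Converts "YYYY-MM-DD" to a whole day count, computed in UTC so DST cannot skew the difference.
function dayNumber(iso) {
  const [year, month, day] = iso.split("-").map(Number);
  return Date.UTC(year, month - 1, day) / MS_PER_DAY;
}

// Hypothetical oracle: IDs of bills that should trigger a reminder today.
// Unpaid bills due within leadDays qualify, as do unpaid bills already overdue.
function billsToRemind(bills, todayIso, leadDays) {
  const today = dayNumber(todayIso);
  return bills
    .filter((bill) => bill.status === "unpaid")
    .filter((bill) => dayNumber(bill.dueDate) - today <= leadDays)
    .map((bill) => bill.id);
}

const sample = [
  { id: "a", dueDate: "2026-07-05", status: "unpaid" },  // due in 3 days
  { id: "b", dueDate: "2026-07-05", status: "paid" },    // excluded: paid
  { id: "c", dueDate: "2026-07-20", status: "unpaid" },  // excluded: beyond lead time
  { id: "d", dueDate: "2026-06-28", status: "unpaid" },  // overdue
  { id: "e", dueDate: "2026-07-03", status: "skipped" }, // excluded: skipped
];
console.log(billsToRemind(sample, "2026-07-02", 3)); // [ 'a', 'd' ]
```
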
### B11 — Admin panel (`/admin`)
- [ ] Onboarding wizard completes without a broken state.
- [ ] Users table: add/edit-role/reset-pw/disable/delete; **cannot remove the last admin**.
- [ ] Login mode switch single↔multi verified live, no lockout; auth-methods enable/disable + bad config surfaced.
- [ ] Email notif config + test send; bank sync admin (configure/manual/auto/status/revoke).
- [ ] Backups create/list/download/restore/delete; cleanup panel previews impact + confirms (counts match `backupAndCleanup.test.js`).
- [ ] Privacy admin edits reflect on public `/privacy`; system status metrics/versions/jobs accurate (`statusService.test.js`); admin actions rate-limited + audited (`auditService` — spot-check log).
### B12 — Settings, Profile & global UI
- [ ] Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
- [ ] Profile: change password (current required, invalidates sessions), manage 2FA/passkeys, sessions revoke (`profileRoute.test.js`).
- [ ] Static: About (public + admin, version shown), Privacy, Release Notes (dialog once per `user`, dismiss persists), Roadmap (admin), NotFound friendly + way home.
- [ ] Global: command palette (`Ctrl+K`) search/keyboard/Esc, hidden for default admin; sidebar collapse/expand + mobile overlay (check overflow issue in `docs/UI_IMPROVEMENTS.md`); toasts stack/dismiss; page transitions no flash/double-fetch; theme applied before first paint.
### B13 — API / backend direct
Route groups: `auth`, `auth/oidc`, `admin`, `tracker`, `bills`, `subscriptions`, `payments`, `data-sources`, `transactions`, `matches`, `categories`, `settings`, `user`, `calendar`, `summary`, `monthly-starting-amounts`, `analytics`, `spending`, `snowball`, `notifications`, `status`, `about`, `about-admin`, `privacy`, `version`, `profile`, `export`, `import`/`imports`.
- [ ] Auth: unauth → 401, wrong role → 403, right role → 200.
- [ ] CSRF: state-changing without valid token rejected; with token succeeds (`middleware/csrf.js`).
- [ ] Validation: bad/missing body → structured 4xx (`middleware/errorFormatter.js`, `utils/apiError.js`), never a raw 500 stack.
- [ ] IDOR/isolation: other user's resource by id → 403/404, no leak.
- [ ] Rate limits: login/admin/export/import/OIDC limiters trigger + reset (`middleware/rateLimiter.js`).
- [ ] Money in **integer cents** end-to-end (per `docs/cents-migration-plan.md`); API and DB agree; no float drift.
- [ ] Idempotency: repeated create doesn't duplicate; concurrent edits resolve sanely.
- [ ] Consistent error JSON + correct status codes; security headers present (`middleware/securityHeaders.js`); public routes (`about`/`privacy`/`version`/calendar feed) leak nothing sensitive.
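
The integer-cents check above exists because binary floats cannot represent most decimal amounts exactly. A short illustration of the drift and of a float-free parse, assuming hypothetical helpers `toCents` and `formatCents` (not taken from the codebase):

```javascript
// Floats drift: this is why amounts should never be stored or summed as dollars.
const floatSum = 0.1 + 0.2; // 0.30000000000000004, not 0.3

// Hypothetical helper: parse a user-entered decimal string into integer cents
// without any floating-point multiplication. Rejects more than two decimals.
function toCents(input) {
  const match = /^(-?)(\d+)(?:\.(\d{1,2}))?$/.exec(String(input).trim());
  if (!match) throw new RangeError(`Invalid amount: ${input}`);
  const [, sign, whole, frac = ''] = match;
  const cents = Number(whole) * 100 + Number(frac.padEnd(2, '0'));
  return sign === '-' && cents !== 0 ? -cents : cents;
}

// Hypothetical helper: format integer cents back to a display string.
function formatCents(cents) {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(cents);
  return `${sign}${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, '0')}`;
}
```

When probing the API, send amounts such as `0.1`, `0.2`, and `19.99`, then compare the stored integer and the displayed total against hand-calculated cents.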
### B14 — Non-functional
- [ ] **a11y (manual):** keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix `aria-*`); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only.
- [ ] **a11y (automated):** run **axe-core** on every page (`@axe-core/playwright`, or `jest-axe` for component-level) — **zero critical/serious** violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once.
- [ ] **Visual regression:** capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright `toHaveScreenshot`); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed.
- [ ] **Performance:** initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query `staleTime`).
- [ ] **PWA/offline:** installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
- [ ] **Security spot-checks:** XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom `MarkdownText` renderer — https-only link hrefs, **no** `dangerouslySetInnerHTML` anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies `HttpOnly`/`Secure`/`SameSite`; `encryptionService` protects at-rest secrets, keys not committed. (Depth: `SECURITY_AUDIT.md`.)
- [ ] **Resilience:** kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
- [ ] **Fault injection (systematic):** with a request-interception harness (Playwright `page.route`, or DevTools network overrides), force each page's API calls to **401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON** and confirm the UI shows a recoverable error (toast or `ErrorBoundary` fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally — each page handles failure differently.
- [ ] **Timezone/locale:** non-UTC tz + DST boundary — due dates and calendar stay correct.
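
The fault-injection item above can be driven from one small helper. A minimal sketch, assuming a Playwright `page` is passed in: `page.route`, `route.fulfill`, and `route.abort` are Playwright APIs, while the fault names, response bodies, and the `**/api/**` pattern are assumptions for illustration.

```javascript
// Fault matrix from the checklist. Names and bodies are illustrative only.
const FAULTS = {
  unauthorized: { status: 401, body: JSON.stringify({ error: 'Unauthorized' }) },
  forbidden: { status: 403, body: JSON.stringify({ error: 'Forbidden' }) },
  rateLimited: { status: 429, body: JSON.stringify({ error: 'Too many requests' }) },
  serverError: { status: 500, body: JSON.stringify({ error: 'Internal error' }) },
  malformedJson: { status: 200, body: '{"not json' }, // deliberately unparseable
  timeout: { abort: 'timedout' }, // simulated network timeout
};

// Registers a route handler that forces every matching request into the given fault.
async function applyFault(page, faultName, urlPattern = '**/api/**') {
  const fault = FAULTS[faultName];
  if (!fault) throw new Error(`Unknown fault: ${faultName}`);
  await page.route(urlPattern, (route) =>
    fault.abort
      ? route.abort(fault.abort)
      : route.fulfill({
          status: fault.status,
          contentType: 'application/json',
          body: fault.body,
        })
  );
}

// Usage inside a Playwright test, once per page and per fault:
//   await applyFault(page, 'serverError');
//   await page.goto('/bills');
//   // then assert on the page's error toast or fallback (selector depends on the UI)
```

For the "401 mid-session" case, load the page normally first and call `applyFault` afterwards, so that only later requests fail.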
### B15 — Regression & sign-off
Run on the **production build** (`npm start`), not dev:
- [ ] `npm run ci` green. Log in as `user` and `admin`.
- [ ] `npm run test:e2e` green (Playwright smoke + axe + visual-regression baselines match, §4.4).
- [ ] Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
- [ ] Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
- [ ] Snowball: add debt → projection. Data: seed → export → import round-trip (scratch DB).
- [ ] Admin: open panel, users, system status, run a backup. Banking loads + matches (if SimpleFIN configured).
- [ ] Notifications: one test message on configured channel. Toggle dark mode; mobile viewport; `Ctrl+K` navigates.
- [ ] Bogus URL → 404; logout → login redirect. Console clean throughout.
- [ ] Confirm [exit criteria](#appendix-b--exit--sign-off-criteria).
---
## 8. Appendices
### Appendix A — Severity definitions
| Level | Definition |
|-------|------------|
| **S1 Critical** | Data loss/corruption, security hole, crash/blank page, wrong money math, cannot log in/save. |
| **S2 Major** | Feature broken/unusable, wrong results, broken navigation, unhandled error. |
| **S3 Minor** | Works but wrong edge behavior, confusing UX, missing validation message. |
| **S4 Cosmetic** | Visual/copy/alignment/dark-mode-contrast, non-blocking. |
| **IMP Improvement** | Not a bug; enhancement or polish idea. |
### Appendix B — Exit / sign-off criteria
A cycle is release-ready when:
- [ ] All batches B0–B15 ✅ on the primary matrix (Chrome desktop + mobile, light + dark, `user` + `admin`).
- [ ] B15 smoke green on the **production build**.
- [ ] **Zero open S1/S2** in the Findings Log; S3/S4/IMP triaged.
- [ ] `npm run ci` green; no new console errors.
- [ ] Data export→import round-trip verified with no loss.
- [ ] Auth/authorization + data-isolation all pass.
- [ ] Money and date/period correctness verified vs hand-calculated examples.
- [ ] All fixes for the cycle archived to `HISTORY.md`; cycle summary recorded (date, build/commit, environment).
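
The Findings Log criterion above can be checked mechanically. A minimal sketch, assuming a hypothetical entry shape of `{ id, severity, status }`, with severities from Appendix A and assumed statuses `'open'`, `'triaged'`, and `'fixed'` (the real log is a Markdown table, so its rows would need parsing first):

```javascript
// Hypothetical Findings Log entry: { id, severity, status }.
// severity: 'S1' | 'S2' | 'S3' | 'S4' | 'IMP'; status values are assumptions.
function releaseBlockers(findings) {
  // Any open S1/S2 blocks sign-off outright.
  return findings.filter(
    (f) => f.status === 'open' && (f.severity === 'S1' || f.severity === 'S2')
  );
}

function needsTriage(findings) {
  // Lower severities may stay unfixed, but must not be left untriaged.
  return findings.filter(
    (f) => f.status === 'open' && ['S3', 'S4', 'IMP'].includes(f.severity)
  );
}

function findingsGatePasses(findings) {
  return releaseBlockers(findings).length === 0 && needsTriage(findings).length === 0;
}
```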
### Appendix C — Page ↔ route ↔ API quick map
| Page | Route | Primary API |
|------|-------|-------------|
| Tracker | `/` | `/api/tracker`, `/api/bills`, `/api/payments`, `/api/monthly-starting-amounts` |
| Calendar | `/calendar` | `/api/calendar` |
| Summary | `/summary` | `/api/summary` |
| Bills | `/bills` | `/api/bills`, `/api/categories`, `/api/matches` |
| Subscriptions / Catalog | `/subscriptions`, `/subscriptions/catalog` | `/api/subscriptions` |
| Categories | `/categories` | `/api/categories` |
| Health | `/health` | `/api/analytics`, `/api/summary` |
| Analytics | `/analytics` | `/api/analytics` |
| Spending | `/spending` | `/api/spending` |
| Banking | `/bank-transactions` | `/api/transactions`, `/api/matches`, `/api/data-sources` |
| Snowball / Payoff | `/snowball`, `/payoff` | `/api/snowball` |
| Settings | `/settings` | `/api/settings`, `/api/notifications` |
| Profile | `/profile` | `/api/profile`, `/api/user` |
| Data | `/data` | `/api/import`, `/api/export`, `/api/data-sources` |
| Admin | `/admin`, `/admin/status` | `/api/admin`, `/api/status`, `/api/about-admin` |
| About / Privacy / Release Notes / Roadmap | `/about`, `/privacy`, `/release-notes`, `/roadmap` | `/api/about`, `/api/privacy`, `/api/version` |
### Appendix D — Reference docs
`SECURITY_AUDIT.md` (security depth) · `docs/UI_IMPROVEMENTS.md` (known UI issues) · `docs/cents-migration-plan.md` (money-as-cents) · `docs/SIMPLEFIN_CONSUMER_GUARDRAILS.md` (sync limits) · `docs/CSRF-SPA-Setup.md`, `docs/RATE_LIMITING_ENHANCEMENT.md` (security middleware) · `REVIEW.md`, `DEVELOPMENT_LOG.md`, `roadmap.md`, `FUTURE.md` (context/known gaps) · `HISTORY.md` (changelog / fix archive) · `playwright.config.js` + `e2e/` (automated E2E/visual/a11y harness, §4.4).
### Appendix E — Per-page control census
The completeness ledger behind "every button, textbox, slider is right." Fill one table **per page** during [B0](#b0--baseline-tooling--coverage-recon) and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx` + `grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx`.
**Template** (copy per page):
| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |
|---------|------|-----------------|-------------------------------------------------------|----------|--------|
| *e.g.* Quick-pay button | button | marks bill paid, updates balance cards, undo available | default ✓ · disabled-while-saving ✓ | Enter ✓ | ✅ / finding id |
| *e.g.* Amount input | number | per-month override, cents only, no wheel-scroll change | default ✓ · error-on-letters ✓ | Tab/Esc ✓ | ✅ / finding id |
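
To seed the template above without hand-copying grep output, the same patterns can be applied in a small script. A minimal sketch, assuming a hypothetical `censusRows` helper that takes JSX source text and returns one empty six-column row per matching line (a starting list only; controls rendered by shared components still need to be added by hand):

```javascript
// Hypothetical helper mirroring the two grep patterns: it scans JSX source text
// and returns one Markdown row per matching line, ready to paste into the template.
const CONTROL_PATTERN = /type=["'][a-z]+["']|onClick=|<Button|<Select|<Switch|<Checkbox/;

function censusRows(source) {
  return source
    .split('\n')
    .map((text, i) => ({ line: i + 1, text: text.trim() }))
    .filter(({ text }) => CONTROL_PATTERN.test(text))
    .map(({ line, text }) => {
      const safe = text.replace(/\|/g, '\\|'); // escape pipes so the table stays intact
      return `| L${line}: \`${safe}\` | | | | | |`;
    });
}

// Usage (the path follows the client/pages/<Page>.jsx convention named above):
//   const fs = require('fs');
//   console.log(censusRows(fs.readFileSync('client/pages/Tracker.jsx', 'utf8')).join('\n'));
```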
**Pages to census** (from `client/pages/`, keep in sync with [Appendix C](#appendix-c--page--route--api-quick-map)): Tracker, Calendar, Summary, Bills, Subscriptions, SubscriptionCatalog, Categories, Health, Analytics, Spending, Snowball, Payoff, BankTransactions, Data, Settings, Profile, Admin, Status, About, Privacy, ReleaseNotes, Roadmap, Login, NotFound — plus the shared **Sidebar/command-palette/header** chrome once.