BillTracker/docs/QA_PLAN.md

# BillTracker — Master QA Plan (living document)

**Version target:** v0.41.x · **Executor:** Claude (active) · **Last updated:** 2026-07-02 (Cycle 1: 14 findings fixed & archived, 0 open — incl. broken "Send test push" + email XSS)

This is a **living, operational** QA document, not a static spec. Claude runs it,
in **batches**, actively hunting for bugs/errors/rough edges, **fixing** them, and
**archiving** each fixed finding to `HISTORY.md`. Update this document whenever a
better approach, a new risk area, or a missed surface is discovered.

> **The prime directive:** don't just confirm the happy path — try to *break*
> the product. Every batch should end with the tree green, the Findings Log
> up to date, and any fixes archived to `HISTORY.md`.

---

## Table of contents

1. [Execution model — find, then fix, then repeat](#0-execution-model--find-then-fix-then-repeat)
2. [Batch plan & progress tracker](#1-batch-plan--progress-tracker)
3. [Active Findings Log](#2-active-findings-log)
4. [Archiving fixed findings to HISTORY.md](#3-archiving-fixed-findings-to-historymd)
5. [Environment & setup](#4-environment--setup)
6. [Test data strategy](#5-test-data-strategy)
7. [Cross-cutting checks (every page)](#6-cross-cutting-checks-every-page)
8. [Batch playbooks (detailed checklists)](#7-batch-playbooks-detailed-checklists)
9. [Appendices](#8-appendices)

---

## 0. Execution model — find, then fix, then repeat

**Separate finding from fixing.** During a QA pass we *hunt and log* — we do **not**
fix as we go (except show-stoppers, see below). Only after the whole plan has run
do we enter a dedicated **fix phase** and fix **every** logged finding. Then we run
the **entire** QA plan again from the top. Repeat until a full pass finds **zero**
errors. Two nested loops:

```
  OUTER — QA CYCLE (repeat until a full pass finds zero findings)
  ┌──────────────────────────────────────────────────────────────────────┐
  │  PHASE 1 · FIND      Run every batch B0→B15 in find-only mode.        │
  │                      Probe hard, LOG everything to the Findings Log.  │
  │                      Do NOT fix (except show-stoppers).               │
  │            ↓                                                          │
  │  PHASE 2 · FIX       QA pass done. Now fix EVERY logged finding —     │
  │                      all of them (S1→IMP). Root-cause, with tests.    │
  │            ↓                                                          │
  │  PHASE 3 · VERIFY    Re-run each fix's repro; `npm run ci` green.     │
  │            ↓                                                          │
  │  PHASE 4 · ARCHIVE   Move every fixed finding to HISTORY.md (§3).     │
  │            ↓                                                          │
  │  PHASE 5 · RE-RUN    Start a new cycle at PHASE 1. If that full pass  │
  │                      logs zero findings → QA is clean, STOP.          │
  └──────────────────────────────────────────────────────────────────────┘

  INNER — per batch during PHASE 1 (find-only)
  PICK next ⬜ batch → SET UP (app, data state, role, console open) →
  PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
  mark batch status in §1 → next batch.  (No fixing here.)
```

**Show-stopper exception.** A *show-stopper* is a finding that **blocks continued
QA** — the app won't boot, you can't log in, or a page crashes so hard you can't
test the rest of it. Only these get fixed immediately (mid-pass), because you
can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix;
then continue the find pass. **Everything else is logged and left for Phase 2** —
no matter how tempting or trivial.

**Discipline (for best results)**
- **Phase 1 is log-only.** Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
- Keep each find batch tight and focused — one batch per session — so probing stays thorough.
- **Phase 2 fixes everything**, not just S1/S2. Root-cause over surface patch; add/extend a test in `tests/` or `client/**/*.test.*` for every logic bug so it can't silently return.
- Never leave the repo red at the end of Phase 3 — `npm run ci` must be green before archiving.
- Touch product behavior? Run the `/verify` skill on the affected flow before archiving.
- **The exit is empirical:** you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you *think* it's clean. Log the cycle result in the [Cycle Log](#11-qa-cycle-log) each time.
- Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.

---

## 1. Batch plan & progress tracker

Batches are ordered **foundation-first** (baseline & auth before features; features
before cross-cutting; regression last). Update **Status** and **Findings** every run.

**Status key:** ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck

| # | Batch | Primary surface | Data state | Status | Open / Fixed |
|---|-------|-----------------|-----------|--------|--------------|
| B0 | Baseline, tooling & **coverage recon** | `npm run ci`/`check`, app boots, console clean, **re-scan routes/pages/API vs plan & update it**, **control census** | any | 🔄 | 0 / 1 |
| B-UI | **Design-system primitives** | each `client/components/ui/*` × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard | any | 🔄 | 0 / 0 |
| B1 | Auth & authorization | login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation | multi + single user | ⬜ | 0 / 0 |
| B2 | Tracker (core) | `/` buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift | seeded + adversarial | ⬜ | 0 / 0 |
| B3 | Bills & schedules | `/bills` CRUD, custom schedules, reorder, merchant rules, historical import | adversarial | ⬜ | 0 / 0 |
| B4 | Subscriptions & Categories | `/subscriptions`, catalog, `/categories`, groups, reorder | seeded | ⬜ | 0 / 0 |
| B5 | Reporting reconciliation | `/summary`, `/calendar`, `/analytics`, `/health` cross-check totals | seeded + large | 🔄 | 0 / 3 |
| B6 | Spending | `/spending` YNAB view, averages, cover-overspending, safe-to-spend | seeded + edge months | 🔄 | 0 / 1 |
| B7 | Debt planning (math) | `/snowball`, `/payoff` APR/amortization vs hand-calc | edge (APR=0, $0 debt) | 🔄 | 0 / 2 |
| B8 | Banking & bank sync | `/bank-transactions`, SimpleFIN sync, matching, merchant/store, advisory filter | seeded txns | ⬜ | 0 / 0 |
| B9 | Data lifecycle | `/data` import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip | empty + seeded | 🔄 | 0 / 1 |
| B10 | Notifications & workers | email + ntfy/Gotify/Discord/Telegram, reminders, cron workers | seeded | 🔄 | 0 / 1 |
| B11 | Admin panel | users, login mode, auth methods, backups, cleanup, status, onboarding | admin | 🔄 | 0 / 0 |
| B12 | Settings, Profile & global UI | `/settings`, `/profile`, static pages, command palette, sidebar/nav | any | 🔄 | 0 / 0 |
| B13 | API / backend direct | all `/api/*`: auth, CSRF, validation, rate limits, error shape, IDOR, cents | via HTTP client | 🔄 | 0 / 1 |
| B14 | Non-functional | a11y, performance, PWA/offline, XSS/secrets, timezone/DST | large + adversarial | 🔄 | 0 / 4 |
| B15 | Regression & sign-off | full smoke on **production build**, exit criteria | seeded | 🔄 | 0 / 0 |

> After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new
> cycle from B0 against the next build/version.

### 1.1 QA Cycle Log

One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A
cycle is only "clean" when its **find pass logged zero findings**. Keep going
until you get a clean cycle.

| Cycle | Started | Build / commit | Findings logged | Fixed / archived | Result |
|-------|---------|----------------|-----------------|------------------|--------|
| 1 | 2026-07-02 | `bdbf231`→(dev) | 13 | **13 → all fixed, verified & archived** (…, +B10-01 broken "Send test push") | ✅ **0 open.** Post seed-fix reconciliation caught the **occurrence-gating family** — Summary (S2), Analytics, and SimpleFIN bank-tracking all counted non-monthly bills every month; all fixed via `resolveDueDate` and guarded (probe reconciliation + `tests/summaryBankTracking.test.js`). Probed B0/B1/B3/B4/B5/B6/B7/B8/B9/B13/B14; solid: auth/isolation, CSRF, payment/date validation, recurrence, matching/dedup, subscription+spending math, XSS, calendar gating. **A full re-run (B0→B15) is still required to declare the cycle clean per exit criteria.** |

**Result key:** 🔄 in progress · 🔁 findings fixed, re-run required · ✅ clean (zero findings — QA complete)

---

## 2. Active Findings Log

**This is the live log.** Record every finding here the moment it's found — before
fixing. Keep only **Open / Fixing / Fixed** rows here. Once a finding is
**Fixed + verified + archived to `HISTORY.md`**, delete its row from this table
(its permanent record is the changelog entry).

**Finding ID:** `QA-B{batch}-{nn}` (e.g. `QA-B2-01`).
**Severity:** S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see [Appendix A](#appendix-a--severity-definitions)).
**Status:** 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.

| ID | Sev | Area (`file:line`) | Summary | Status | Notes / repro |
|----|-----|--------------------|---------|--------|---------------|
| _(none — all Cycle 1 findings fixed, verified & archived to `HISTORY.md` v0.41.0)_ | | | | | |

**Finding template** (paste a new row above; keep the full write-up here until archived):

```
ID: QA-B?-??
Severity: S1 / S2 / S3 / S4 / IMP
Environment: browser / viewport / theme / role / auth mode / data state
Area: file:line (if known)
Steps to reproduce:
  1.
  2.
Expected:
Actual:
Evidence: console / network / DB row / screenshot
Fix: (what changed, commit) — Verified by: (repro re-run + ci)
```

Log console errors, failed network requests, and unhandled rejections as findings
**even if the UI looks fine**.

_All Cycle 1 write-ups have been archived to `HISTORY.md` v0.41.0 (see §3)._

---

## 3. Archiving fixed findings to HISTORY.md

`HISTORY.md` is the project changelog (version-organized, emoji section headers).
When a finding is Fixed **and verified**, write a concise entry there, then remove
the row from the Active Findings Log.

**Where:** under the current in-progress version heading (e.g. `## v0.41.x`). If a
QA cycle produces several fixes, group them under a `### 🐛 QA Fixes` (bug fixes)
or `### 🧹 QA` (polish/improvements) section, matching the existing changelog voice.

**Entry format** (match the terse, specific style already in `HISTORY.md`):

```markdown
### 🐛 QA Fixes

- **[Area] Short title** — What was wrong and the user-visible impact, then the
  fix. Reference the file/function and any migration or test added.
  (was QA-B7-03)
```

**Rules**
- One bullet per finding; include the old `QA-B?-??` id in parentheses for traceability.
- If a fix added/changed a test, say which (`tests/…` or `client/…test.*`).
- Don't archive until the fix is verified (repro gone + `npm run ci` green).
- IMP items that were implemented are archived the same way; IMP items merely *noted* stay in the Findings Log (or graduate to `FUTURE.md`/`roadmap.md` if deferred).

---

## 4. Environment & setup

### 4.1 Running the app

| Mode | Command | URL |
|------|---------|-----|
| Dev (API + UI, hot reload) | `npm run dev` | UI `http://localhost:5173` (proxies API → `:3000`) |
| API only | `npm run dev:api` | `http://localhost:3000` |
| Production build | `npm run build` then `npm start` | `http://localhost:3000` |
| Docker | `docker-compose up` | per compose config |

- Backend: Node/Express on `PORT` (default `3000`). Frontend dev: Vite on `5173`.
- Data: SQLite at `db/bills.db` (WAL). **Back it up before destructive tests** (`backups/` or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
- Configure a dedicated **test** `.env` from `.env.example`. Never point tests at production data or a live SimpleFIN account with real credentials.
- Test commands: `npm run ci` (check + all tests + build), `npm run check` (syntax + build), `npm run test` (server), `npm run test:client` (vitest).

### 4.2 Test matrix

Full functional pass across reasonable combinations; smoke (B15) across all.

| Dimension | Values |
|-----------|--------|
| Browser | Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser) |
| Viewport | Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414 |
| Theme | Light, Dark, system-follow |
| Role | `user`, `admin`, default admin (first-run) |
| Auth mode | Multi-user, single-user |
| Density | Normal + compact desktop |
| Network | Online, Slow 3G, offline (PWA shell) |
| Data state | Empty, seeded demo, large/stress, adversarial |

### 4.3 Accounts to prepare
- `admin`, `user`, a **second** `user` (data-isolation), a single-user-mode instance (separate DB).
- Demo reference: `guest / guest123` (do not run destructive flows on any shared demo server).

### 4.4 Automated E2E harness (Playwright)

Manual passes prove a button works **once**; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.

| Command | What it does |
|---------|--------------|
| `npm run test:e2e` | run the E2E suite headless (boots the app via `webServer`) |
| `npm run test:e2e:ui` | Playwright UI mode — watch/debug interactively |
| `npm run test:e2e:update` | re-baseline visual-regression screenshots (review the diff before committing) |
| `npm run smoke:prod` | **B15 production-build smoke** — builds, boots `node server.js` (dist/), drives the real artifact so the split vendor chunks are validated at runtime |

- **Setup (one-time):** `npm install` then `npx playwright install chromium`. Config: `playwright.config.js`; specs in `e2e/`.
- **Scope:** the suite is a **thin critical-path smoke**, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
- **Don't** point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.

---

## 5. Test data strategy

- **Empty:** brand-new account. Every page must render a sensible empty state — no crash, no `NaN`, no blank white screen.
- **Seeded:** use **Data → Seed Demo Data** for a realistic mid-size dataset.
- **Large/stress:** 500+ bills, 5,000+ transactions, 24+ months history — exercises virtualization (`@tanstack/react-virtual`), charts, query perf.
- **Adversarial (deliberately try to break it):**
  - Amounts: `0`, `0.01`, negative, `9,999,999.99`, fractional cents.
  - Text: emoji, RTL, `<script>` XSS probe, 1,000-char strings, leading/trailing spaces, SQL-ish input.
  - Dates: 1st/14th/15th/31st boundaries; 28/29/30/31-day months; Feb 29; month/year crossing; inactive ranges; skipped months; overrides.
  - Transactions: duplicate amount+date, same-day merchant repeats, refunds/negatives.
  - Debt: APR `0%`, very high APR, `$0` balance, absurd inputs.
  - Non-UTC system timezone + a DST boundary date.

---

## 6. Cross-cutting checks (every page)

Run on **every** page during its batch — don't assume a shared component behaves the same everywhere.

**Navigation & routing** — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → `NotFoundPage` · active nav highlighted · `simplefinOnly` (Banking) gated · `Ctrl+K` palette finds & opens it.

**Buttons & interactions** — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · **rapid repeated toggling** (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then **navigate away mid-flight** doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).

**Forms & validation** — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · **paste** into every field (incl. `"$1,234.56"` into currency) · **browser/password-manager autofill** on login & forms · **IME/composition** (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).

**Number inputs (you have ~45 `type="number"` fields — the highest-risk control type)** — scroll-wheel over a focused field must **not** silently change the value · spinner up/down buttons step correctly and respect min/max · reject/`e`/`+`/exponent and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no `NaN` submitted · cents fields never accept >2 decimals.

**Per-control state matrix** — for each control on the page, verify every applicable state renders and behaves in **both light and dark**: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).

> **Note — "sliders":** this app has **no `<input type=range>` sliders.** The `SlidersHorizontal` glyph is just the Bills **filter-panel** button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.

**States** — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, `ErrorBoundary` shows a fallback not a white page.

**Visual & responsive** — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.

**Data integrity** — money 2-decimals, no float artifacts (`9.999999`) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).

---

## 7. Batch playbooks (detailed checklists)

Each batch below is the detailed script for the matching row in [§1](#1-batch-plan--progress-tracker). Apply [§6](#6-cross-cutting-checks-every-page) throughout.

### B0 — Baseline, tooling & coverage recon
**Run FIRST in every cycle.** This is where the plan re-syncs with reality — new
pages, routes, endpoints, or features added since the last cycle get discovered
and folded in **before** testing, so coverage never silently rots.

**Tooling baseline**
- [ ] `npm run ci` — record any failing server/client test or build error as a finding (S1/S2).
- [ ] `npm run check` — server syntax + build clean.
- [ ] App boots via `npm run dev` **and** production `npm start`; note startup warnings.
- [ ] Load the app; browser console + server logs clean on first load and first navigation.
- [ ] Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.

**Coverage recon — enumerate the *actual* product and diff it against this plan.**
Run these, then compare the output to the batch playbooks (§7) and the [route map](#appendix-c--page--route--api-quick-map):
- [ ] **Client routes** — `grep -nE "<Route" client/App.jsx` — every path present here must appear in a batch playbook and Appendix C.
- [ ] **Pages** — `ls client/pages/` — every page has an owning batch.
- [ ] **Sidebar / nav entries** — `grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx` — new nav links (incl. conditional ones like `simplefinOnly`) are covered.
- [ ] **API route mounts** — `grep -nE "app.use\('/api" server.js` — every mounted route group is in B13's list and mapped in Appendix C.
- [ ] **Services & components** — `ls services/` and `ls client/components/**/` — new service/component families have a home in a playbook.
- [ ] **UI primitives** — `ls client/components/ui/` — every shared primitive is covered by the [B-UI](#b-ui--design-system-primitives) playbook; a new primitive gets a row there.
- [ ] **Interactive-control census (makes "every button tested" *provable*)** — for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: [Appendix E](#appendix-e--per-page-control-census)). A control that isn't on a checklist hasn't been tested — the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components` and `grep -rn "onClick=" client/pages/<Page>.jsx`.
- [ ] **Feature flags / conditional surfaces** — search for `Only`, `enabled`, `featureFlag`, env gates that hide/show pages; ensure each state is tested.
- [ ] **What changed since last cycle** — skim `git log`/`HISTORY.md` since the previous cycle's commit (see [Cycle Log](#11-qa-cycle-log)) for new features/pages.

**Update the plan (do this now, not later)** — for anything the recon surfaced that isn't already covered:
- [ ] Add it to the relevant batch playbook (or create a new batch and a row in the [§1 table](#1-batch-plan--progress-tracker)).
- [ ] Add/adjust its entry in [Appendix C](#appendix-c--page--route--api-quick-map).
- [ ] Note the plan update in the [Cycle Log](#11-qa-cycle-log) row for this cycle.
- [ ] If a whole surface is *missing* from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.

### B-UI — Design-system primitives
**Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once.** Drive them wherever they're already mounted (or a scratch page); run each against the [per-control state matrix](#6-cross-cutting-checks-every-page) × light/dark × keyboard-only. One finding row per primitive.

| Primitive (`client/components/ui/`) | Must verify |
|---|---|
| `button.jsx` | every variant (default/destructive/outline/ghost/link) + size; **disabled truly blocks click**; loading state; focus ring; Enter/Space activate |
| `input.jsx` | text/number/password/date/search/file types; placeholder; disabled/read-only; error styling; paste/autofill; number-input rules above |
| `select.jsx` (Radix) | opens by mouse **and** keyboard; type-ahead; long lists scroll; onChange fires in **Firefox+Safari**; disabled options; value persists; Esc closes |
| `checkbox.jsx` / `switch.jsx` | toggles by click **and** Space; indeterminate (if used); disabled; label click toggles; controlled value round-trips |
| `dialog.jsx` / `alert-dialog.jsx` / `confirm-dialog.jsx` / `input-dialog.jsx` | open/close; **focus trap + restore**; Esc closes; overlay click behaves; **Cancel actually cancels (no side effect)**; Confirm fires once; scroll-lock releases |
| `dropdown-menu.jsx` | keyboard arrow nav; Esc; submenu; disabled items; click-outside closes; no clipping at viewport edge |
| `tabs.jsx` | arrow-key nav; active state; content swaps; deep-link/refresh keeps tab (if applicable) |
| `tooltip.jsx` | hover **and** keyboard-focus show it; dismiss on blur; touch behavior; not a11y-only info trap |
| `table.jsx` | header/zebra/hover; horizontal scroll on narrow viewport (no page h-scroll); empty state |
| `collapsible.jsx` | expand/collapse animation; state persists; keyboard operable |
| `sonner.jsx` (toast) | success/error/loading; **stack + dismiss**; auto-dismiss timing; doesn't cover primary actions; announced to SR |
| `save-status.jsx` | idle/saving/saved/error transitions reflect real autosave (`useAutoSave.test.jsx`) |
| `Skeleton.jsx` | matches final layout (no jump); no infinite skeleton on error |
| `badge.jsx` / `card.jsx` / `separator.jsx` / `label.jsx` | contrast in dark mode; label `htmlFor` focuses its control; no overflow on long text |
| `theme-toggle.jsx` | light↔dark↔system; applied **before first paint** (no flash); persists across reload |

- [ ] Every primitive above passes its row in light **and** dark, keyboard-only, at mobile width.
- [ ] Axe scan (see B14) on a page densely using primitives → zero critical violations.

### B1 — Auth & authorization
- [ ] **Password:** valid login → correct landing (Tracker for `user`, `/admin` for default admin); wrong password → clear error, no user-enumeration timing/message difference; logout clears session; expired session redirects and preserves `state.from`; session persists across refresh.
- [ ] **Rate limiting:** repeated failed logins throttled (`loginLimiter`/`loginUsernameLimiter`), clear message, resets.
- [ ] **TOTP:** enroll (QR + secret), code accepted, backup codes work once, login prompts for TOTP, wrong code rejected+throttled, disable requires re-auth.
- [ ] **WebAuthn:** register/login/remove passkey in Chrome, Firefox, Safari; password fallback works.
- [ ] **OIDC/Authentik:** SSO flow creates/links account; admin config errors surface cleanly; `oidcLimiter` throttles.
- [ ] **Roles/guards:** `user` blocked from `/admin*`, `/status` (redirect) and admin APIs (403); default admin forced to `/admin`; single-user bypass correct but admin surfaces still protected; unauth API → 401.
- [ ] **Data isolation (critical):** user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans — test by ID enumeration on the API.
- [ ] **CSRF:** state-changing request without a valid token → rejected.

### B2 — Tracker (`/`)
- [ ] Month nav (prev/next/jump), current month highlighted, data reloads per month.
- [ ] Bills land in correct `1–14` / `15–31` bucket by due date; pin-due sorting works.
- [ ] Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
- [ ] Skip excludes from totals for that month only; unskip restores.
- [ ] Per-month amount override persists, doesn't affect base bill or other months.
- [ ] Notes cell add/edit/clear persists per month.
- [ ] Inactive/date-range bill doesn't show or count outside its range.
- [ ] Balance/starting-amount cards period-aware + editable; income − bills / safe-to-spend correct.
- [ ] Overdue command center: accurate list/count, pay/skip actions work.
- [ ] Cash flow card, drift insight, payment ledger (add/edit/delete reconciles), autopay suggestion apply/dismiss.
- [ ] Editable cells autosave; Esc cancels; invalid input handled. Mobile rows equal desktop actions. Compact mode intact.

### B3 — Bills (`/bills`)
- [ ] Create with all fields (name, amount, due date, category, schedule, account, autopay, active range).
- [ ] Edit propagates to Tracker/Summary/Calendar/Analytics; delete confirms + handles orphan payments/history.
- [ ] Custom schedules (weekly/biweekly/monthly/quarterly/annual/custom): next-due & occurrences correct across month/year boundaries.
- [ ] Drag reorder persists (cross-check `billReorder.test.js`); search/filter panel filters + clears; large-list virtualization smooth.
- [ ] Merchant rules: create/matches/edit/delete; historical import dialog attributes month-crossing payments correctly.
- [ ] BillModal open/close, validation, cancel discards unsaved changes.

### B4 — Subscriptions & Categories
- [ ] Subscriptions: add/edit/delete, active/cancelled, renewal & annual→monthly normalization; totals feed Tracker/Summary/Analytics.
- [ ] Catalog: browse/search, add-from-catalog pre-fills.
- [ ] Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (`categoryGroups`/`categoryReorder` tests); colors/icons consistent on Tracker/Spending/Analytics.

### B5 — Reporting reconciliation
- [ ] Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
- [ ] Calendar plots bills/payments on correct days (**timezone**: a bill due on the 1st must not render on the 31st); day totals correct.
- [ ] Analytics charts render with data AND empty (no broken SVG/`NaN` axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
- [ ] Health indicators compute from real data, no crash on empty; recommendations sane.

### B6 — Spending (`/spending`)
- [ ] Category-group view assigned/spent/available math correct; 3-month averages correct.
- [ ] Cover-overspending reallocates funds correctly and is reversible.
- [ ] Safe-to-spend matches Tracker (`safeToSpend.test.js`); month nav; empty/partial months handled.

### B7 — Debt planning (`/snowball`, `/payoff`)
- [ ] Add debts (balance/APR/min); snowball vs avalanche ordering correct.
- [ ] Projection + amortization vs a **hand-calculated** example; APR=0 and already-paid debts correct.
- [ ] Extra-payment/budget updates payoff date + total interest; chart renders; plan history saves/restores; status banner accurate.
- [ ] Edge: single debt, many debts, `$0` debt, negative/absurd inputs rejected.

### B8 — Banking (`/bank-transactions`)
- [ ] Ledger loads/virtualizes/filters (date/account/amount/merchant/status).
- [ ] Transaction matching (match/unmatch), auto-match review approve/reject, no double-match (`transactionMatchService.test.js`).
- [ ] Merchant/store matching rules + confidence/duplicates; advisory non-bill filter flags/hides with override.
- [ ] Matched payments reflect on Tracker/ledger without double-counting; category picker persists.

### B9 — Data lifecycle (`/data`)
- [ ] Imports: spreadsheet (XLSX/CSV) map/preview/commit, malformed rejected, dup/partial handled; transaction CSV (`csvTransactionImportService.test.js`) dedupe + parsing; SQLite user import version-checked + confirms overwrite; seed demo data safe; import history lists + rollback.
- [ ] Exports: download SQLite **round-trips** (export → fresh account → import → matches); Excel export opens uncorrupted; ICS calendar feed valid in a client AND properly **token-gated** (route mounts before auth — verify not open).
- [ ] Backups: manual + scheduled restorable on a scratch instance; permissions not world-readable; old backups pruned (`backupAndCleanup.test.js`).

### B10 — Notifications & workers
- [ ] Each channel (email/SMTP, ntfy, Gotify, Discord, Telegram): test message delivers; bad token/URL → clear error, logged, no secret leak.
- [ ] Reminders fire at configured lead time for upcoming/overdue; no duplicates; paid/skipped excluded; respects per-user prefs.
- [ ] Workers: `dailyWorker`, `bankSyncWorker` (interval + guardrails), `backupScheduler` run on schedule; errors caught/logged, don't crash server, next run unblocked.

### B11 — Admin panel (`/admin`)
- [ ] Onboarding wizard completes without a broken state.
- [ ] Users table: add/edit-role/reset-pw/disable/delete; **cannot remove the last admin**.
- [ ] Login mode switch single↔multi verified live, no lockout; auth-methods enable/disable + bad config surfaced.
- [ ] Email notif config + test send; bank sync admin (configure/manual/auto/status/revoke).
- [ ] Backups create/list/download/restore/delete; cleanup panel previews impact + confirms (counts match `backupAndCleanup.test.js`).
- [ ] Privacy admin edits reflect on public `/privacy`; system status metrics/versions/jobs accurate (`statusService.test.js`); admin actions rate-limited + audited (`auditService` — spot-check log).

### B12 — Settings, Profile & global UI
- [ ] Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
- [ ] Profile: change password (current required, invalidates sessions), manage 2FA/passkeys, sessions revoke (`profileRoute.test.js`).
- [ ] Static: About (public + admin, version shown), Privacy, Release Notes (dialog once per `user`, dismiss persists), Roadmap (admin), NotFound friendly + way home.
- [ ] Global: command palette (`Ctrl+K`) search/keyboard/Esc, hidden for default admin; sidebar collapse/expand + mobile overlay (check overflow issue in `docs/UI_IMPROVEMENTS.md`); toasts stack/dismiss; page transitions no flash/double-fetch; theme applied before first paint.

### B13 — API / backend direct
Route groups: `auth`, `auth/oidc`, `admin`, `tracker`, `bills`, `subscriptions`, `payments`, `data-sources`, `transactions`, `matches`, `categories`, `settings`, `user`, `calendar`, `summary`, `monthly-starting-amounts`, `analytics`, `spending`, `snowball`, `notifications`, `status`, `about`, `about-admin`, `privacy`, `version`, `profile`, `export`, `import`/`imports`.
- [ ] Auth: unauth → 401, wrong role → 403, right role → 200.
- [ ] CSRF: state-changing without valid token rejected; with token succeeds (`middleware/csrf.js`).
- [ ] Validation: bad/missing body → structured 4xx (`middleware/errorFormatter.js`, `utils/apiError.js`), never a raw 500 stack.
- [ ] IDOR/isolation: other user's resource by id → 403/404, no leak.
- [ ] Rate limits: login/admin/export/import/OIDC limiters trigger + reset (`middleware/rateLimiter.js`).
- [ ] Money in **integer cents** end-to-end (per `docs/cents-migration-plan.md`); API and DB agree; no float drift.
- [ ] Idempotency: repeated create doesn't duplicate; concurrent edits resolve sanely.
- [ ] Consistent error JSON + correct status codes; security headers present (`middleware/securityHeaders.js`); public routes (`about`/`privacy`/`version`/calendar feed) leak nothing sensitive.

### B14 — Non-functional
- [ ] **a11y (manual):** keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix `aria-*`); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only.
- [ ] **a11y (automated):** run **axe-core** on every page (`@axe-core/playwright`, or `jest-axe` for component-level) — **zero critical/serious** violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once.
- [ ] **Visual regression:** capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright `toHaveScreenshot`); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed.
- [ ] **Performance:** initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query `staleTime`).
- [ ] **PWA/offline:** installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
- [ ] **Security spot-checks:** XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom `MarkdownText` renderer — https-only link hrefs, **no** `dangerouslySetInnerHTML` anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies `HttpOnly`/`Secure`/`SameSite`; `encryptionService` protects at-rest secrets, keys not committed. (Depth: `SECURITY_AUDIT.md`.)
- [ ] **Resilience:** kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
- [ ] **Fault injection (systematic):** with a request-interception harness (Playwright `page.route`, or DevTools network overrides), force each page's API calls to **401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON** and confirm the UI shows a recoverable error (toast or `ErrorBoundary` fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally — each page handles failure differently.
- [ ] **Timezone/locale:** non-UTC tz + DST boundary — due dates and calendar stay correct.

### B15 — Regression & sign-off
Run on the **production build** (`npm start`), not dev:
- [ ] `npm run ci` green. Log in as `user` and `admin`.
- [ ] `npm run test:e2e` green (Playwright smoke + axe + visual-regression baselines match, §4.4).
- [ ] Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
- [ ] Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
- [ ] Snowball: add debt → projection. Data: seed → export → import round-trip (scratch DB).
- [ ] Admin: open panel, users, system status, run a backup. Banking loads + matches (if SimpleFIN configured).
- [ ] Notifications: one test message on configured channel. Toggle dark mode; mobile viewport; `Ctrl+K` navigates.
- [ ] Bogus URL → 404; logout → login redirect. Console clean throughout.
- [ ] Confirm [exit criteria](#appendix-b--exit--sign-off-criteria).

---

## 8. Appendices

### Appendix A — Severity definitions

| Level | Definition |
|-------|------------|
| **S1 – Critical** | Data loss/corruption, security hole, crash/blank page, wrong money math, cannot log in/save. |
| **S2 – Major** | Feature broken/unusable, wrong results, broken navigation, unhandled error. |
| **S3 – Minor** | Works but wrong edge behavior, confusing UX, missing validation message. |
| **S4 – Cosmetic** | Visual/copy/alignment/dark-mode-contrast, non-blocking. |
| **IMP – Improvement** | Not a bug; enhancement or polish idea. |

### Appendix B — Exit / sign-off criteria

A cycle is release-ready when:
- [ ] All batches B0–B15 ✅ on the primary matrix (Chrome desktop + mobile, light + dark, `user` + `admin`).
- [ ] B15 smoke green on the **production build**.
- [ ] **Zero open S1/S2** in the Findings Log; S3/S4/IMP triaged.
- [ ] `npm run ci` green; no new console errors.
- [ ] Data export→import round-trip verified with no loss.
- [ ] Auth/authorization + data-isolation all pass.
- [ ] Money and date/period correctness verified vs hand-calculated examples.
- [ ] All fixes for the cycle archived to `HISTORY.md`; cycle summary recorded (date, build/commit, environment).

### Appendix C — Page ↔ route ↔ API quick map

| Page | Route | Primary API |
|------|-------|-------------|
| Tracker | `/` | `/api/tracker`, `/api/bills`, `/api/payments`, `/api/monthly-starting-amounts` |
| Calendar | `/calendar` | `/api/calendar` |
| Summary | `/summary` | `/api/summary` |
| Bills | `/bills` | `/api/bills`, `/api/categories`, `/api/matches` |
| Subscriptions / Catalog | `/subscriptions`, `/subscriptions/catalog` | `/api/subscriptions` |
| Categories | `/categories` | `/api/categories` |
| Health | `/health` | `/api/analytics`, `/api/summary` |
| Analytics | `/analytics` | `/api/analytics` |
| Spending | `/spending` | `/api/spending` |
| Banking | `/bank-transactions` | `/api/transactions`, `/api/matches`, `/api/data-sources` |
| Snowball / Payoff | `/snowball`, `/payoff` | `/api/snowball` |
| Settings | `/settings` | `/api/settings`, `/api/notifications` |
| Profile | `/profile` | `/api/profile`, `/api/user` |
| Data | `/data` | `/api/import`, `/api/export`, `/api/data-sources` |
| Admin | `/admin`, `/admin/status` | `/api/admin`, `/api/status`, `/api/about-admin` |
| About / Privacy / Release Notes / Roadmap | `/about`, `/privacy`, `/release-notes`, `/roadmap` | `/api/about`, `/api/privacy`, `/api/version` |

### Appendix D — Reference docs
`SECURITY_AUDIT.md` (security depth) · `docs/UI_IMPROVEMENTS.md` (known UI issues) · `docs/cents-migration-plan.md` (money-as-cents) · `docs/SIMPLEFIN_CONSUMER_GUARDRAILS.md` (sync limits) · `docs/CSRF-SPA-Setup.md`, `docs/RATE_LIMITING_ENHANCEMENT.md` (security middleware) · `REVIEW.md`, `DEVELOPMENT_LOG.md`, `roadmap.md`, `FUTURE.md` (context/known gaps) · `HISTORY.md` (changelog / fix archive) · `playwright.config.js` + `e2e/` (automated E2E/visual/a11y harness, §4.4).

### Appendix E — Per-page control census

The completeness ledger behind "every button, textbox, slider is right." Fill one table **per page** during [B0](#b0--baseline-tooling--coverage-recon) and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx` + `grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx`.

**Template** (copy per page):

| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |
|---------|------|-----------------|-------------------------------------------------------|----------|--------|
| *e.g.* Quick-pay button | button | marks bill paid, updates balance cards, undo available | default ✓ · disabled-while-saving ✓ | Enter ✓ | ✅ / finding id |
| *e.g.* Amount input | number | per-month override, cents only, no wheel-scroll change | default ✓ · error-on-letters ✓ | Tab/Esc ✓ | ✅ / finding id |

**Pages to census** (from `client/pages/`, keep in sync with [Appendix C](#appendix-c--page--route--api-quick-map)): Tracker, Calendar, Summary, Bills, Subscriptions, SubscriptionCatalog, Categories, Health, Analytics, Spending, Snowball, Payoff, BankTransactions, Data, Settings, Profile, Admin, Status, About, Privacy, ReleaseNotes, Roadmap, Login, NotFound — plus the shared **Sidebar/command-palette/header** chrome once.
</content>