PICK next ⬜ batch → SET UP (app, data state, role, console open) →
PROBE (actively break it, §5 adversarial inputs) → LOG every finding to §2 →
mark batch status in §1 → next batch. (No fixing here.)
```
**Show-stopper exception.** A *show-stopper* is a finding that **blocks continued
QA** — the app won't boot, you can't log in, or a page crashes so hard you can't
test the rest of it. Only these get fixed immediately (mid-pass), because you
can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix;
then continue the find pass. **Everything else is logged and left for Phase 2** —
no matter how tempting or trivial.
**Discipline (for best results)**
- **Phase 1 is log-only.** Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
- Keep each find batch tight and focused — one batch per session — so probing stays thorough.
- **Phase 2 fixes everything**, not just S1/S2. Root-cause over surface patch; add/extend a test in `tests/` or `client/**/*.test.*` for every logic bug so it can't silently return.
- Never leave the repo red at the end of Phase 3 — `npm run ci` must be green before archiving.
- Touch product behavior? Run the `/verify` skill on the affected flow before archiving.
- **The exit is empirical:** you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you *think* it's clean. Log the cycle result in the [Cycle Log](#11-qa-cycle-log) each time.
- Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.
---
## 1. Batch plan & progress tracker
Batches are ordered **foundation-first** (baseline & auth before features; features
before cross-cutting; regression last). Update **Status** and **Findings** every run.
**Status key:** ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck
| # | Batch | Primary surface | Data state | Status | Open / Fixed |
- nested-interactive → /categories (8): each row is <div role="button" tabindex=0
aria-expanded> AND draggable AND contains nested edit/delete/drag buttons.
- nested-interactive → /snowball (1): a <button class="w-full text-left"
aria-controls aria-expanded> accordion trigger wraps other interactive controls.
Fix (deferred): don't make the whole row/trigger a button — use a dedicated
expand/collapse control and keep the row a plain container, so interactive
children aren't nested inside an interactive ancestor. Re-verify drag + keyboard
after. Guard: e2e/a11y.authed.spec.js (currently red on these two pages by design).
```
---
## 3. Archiving fixed findings to HISTORY.md
`HISTORY.md` is the project changelog (version-organized, emoji section headers).
When a finding is Fixed **and verified**, write a concise entry there, then remove
the row from the Active Findings Log.
**Where:** under the current in-progress version heading (e.g. `## v0.41.x`). If a
QA cycle produces several fixes, group them under a `### 🐛 QA Fixes` (bug fixes)
or `### 🧹 QA` (polish/improvements) section, matching the existing changelog voice.
**Entry format** (match the terse, specific style already in `HISTORY.md`):
```markdown
### 🐛 QA Fixes
- **[Area] Short title** — What was wrong and the user-visible impact, then the
fix. Reference the file/function and any migration or test added.
(was QA-B7-03)
```
**Rules**
- One bullet per finding; include the old `QA-B?-??` id in parentheses for traceability.
- If a fix added/changed a test, say which (`tests/…` or `client/…test.*`).
- Don't archive until the fix is verified (repro gone + `npm run ci` green).
- IMP items that were implemented are archived the same way; IMP items merely *noted* stay in the Findings Log (or graduate to `FUTURE.md`/`roadmap.md` if deferred).
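
The traceability rule above can be checked mechanically. The following is a minimal sketch, assuming ids follow the `QA-B7-03` / `QA-B14-03` shape seen in this plan; `bulletsMissingQaId` is a hypothetical helper, not something that exists in the repo.

```javascript
// Sketch only: flags changelog bullets that lack a "(was QA-B?-??)" id.
// The id pattern is inferred from the examples QA-B7-03 and QA-B14-03.
const QA_ID = /\(was QA-B[\w-]+-\d+\)/;

function bulletsMissingQaId(markdown) {
  const bullets = [];
  let current = null;
  for (const line of markdown.split('\n')) {
    if (line.startsWith('- ')) {
      // A new bullet starts; wrapped continuation lines are appended to it.
      current = [line];
      bullets.push(current);
    } else if (current && line.trim() !== '' && !line.startsWith('#')) {
      current.push(line.trim());
    } else {
      // Blank line or heading ends the current bullet.
      current = null;
    }
  }
  return bullets.map((b) => b.join(' ')).filter((text) => !QA_ID.test(text));
}
```

Run it over only the QA section of `HISTORY.md` rather than the whole file, since ordinary changelog bullets are not expected to carry a QA id.
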
---
## 4. Environment & setup
### 4.1 Running the app
| Mode | Command | URL |
|------|---------|-----|
| Dev (API + UI, hot reload) | `npm run dev` | UI `http://localhost:5173` (proxies API → `:3000`) |
| API only | `npm run dev:api` | `http://localhost:3000` |
| Production build | `npm run build` then `npm start` | `http://localhost:3000` |
| Docker | `docker-compose up` | per compose config |
- Backend: Node/Express on `PORT` (default `3000`). Frontend dev: Vite on `5173`.
- Data: SQLite at `db/bills.db` (WAL). **Back it up before destructive tests** (`backups/` or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
- Configure a dedicated **test** `.env` from `.env.example`. Never point tests at production data or a live SimpleFIN account with real credentials.
- Test commands: `npm run ci` (check + all tests + build), `npm run check` (syntax + build), `npm run test` (server), `npm run test:client` (vitest).
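
For the backup step above, a manual copy can be scripted. This is a sketch under stated assumptions: the app is stopped (or the DB idle) while copying, and the timestamped `qa-…` folder layout is illustrative rather than the project's actual backup scheme. `backupDb` is a hypothetical name.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch: copy the SQLite file plus its WAL sidecars into a timestamped folder.
// In WAL mode, recent writes can still live in bills.db-wal, so copying only
// the main file may miss them.
function backupDb(dbPath = 'db/bills.db', backupRoot = 'backups', now = new Date()) {
  if (!fs.existsSync(dbPath)) {
    throw new Error(`No database found at ${dbPath}`);
  }
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  const destDir = path.join(backupRoot, `qa-${stamp}`);
  fs.mkdirSync(destDir, { recursive: true });

  const copied = [];
  for (const suffix of ['', '-wal', '-shm']) {
    const src = dbPath + suffix;
    if (fs.existsSync(src)) {
      fs.copyFileSync(src, path.join(destDir, path.basename(src)));
      copied.push(path.basename(src));
    }
  }
  return { destDir, copied };
}
```

Restoring is the reverse copy with the app stopped; for B9/B11 restore tests, point the function at the scratch DB path instead of the default.
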
### 4.2 Test matrix
Full functional pass across reasonable combinations; smoke (B15) across all.
| Data state | Empty, seeded demo, large/stress, adversarial |
### 4.3 Accounts to prepare
- `admin`, `user`, a **second** `user` (data-isolation), a single-user-mode instance (separate DB).
- Demo reference: `guest / guest123` (do not run destructive flows on any shared demo server).
### 4.4 Automated E2E harness (Playwright)
Manual passes prove a button works **once**; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.
| Command | What it does |
|---------|--------------|
| `npm run test:e2e` | run the E2E suite headless (boots the app via `webServer`) |
| `npm run test:e2e:update` | re-baseline visual-regression screenshots (review the diff before committing) |
- **Setup (one-time):** `npm install` then `npx playwright install chromium`. Config: `playwright.config.js`; specs in `e2e/`.
- **Scope:** the suite is a **thin critical-path smoke**, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
- **Don't** point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.
---
## 5. Test data strategy
- **Empty:** brand-new account. Every page must render a sensible empty state — no crash, no `NaN`, no blank white screen.
- **Seeded:** use **Data → Seed Demo Data** for a realistic mid-size dataset.
- Debt: APR `0%`, very high APR, `$0` balance, absurd inputs.
- Non-UTC system timezone + a DST boundary date.
---
## 6. Cross-cutting checks (every page)
Run on **every** page during its batch — don't assume a shared component behaves the same everywhere.
**Navigation & routing** — reachable from nav and by direct URL (deep link) + after hard refresh · back/forward restores state, no stuck spinners · unknown sub-paths → `NotFoundPage` · active nav highlighted · `simplefinOnly` (Banking) gated · `Ctrl+K` palette finds & opens it.
**Buttons & interactions** — every button/link/icon/dropdown/tab/toggle/menu does something or is disabled with a reason · no dead controls · double-click doesn't duplicate records · **rapid repeated toggling** (spam a switch / pay-skip) resolves to one correct state, no stuck spinner · action started then **navigate away mid-flight** doesn't corrupt or throw · destructive actions confirm + cancel · primary action keyboard-reachable (Tab/Enter/Esc).
**Forms & validation** — required fields enforced · numeric/currency reject letters, handle 0/negative/decimal · errors don't wipe entered data · **paste** into every field (incl. `"$1,234.56"` into currency) · **browser/password-manager autofill** on login & forms · **IME/composition** (emoji, CJK) in text fields commits correctly · success shows toast (sonner) and the view updates without manual refresh (React Query invalidation).
**Number inputs (you have ~45 `type="number"` fields — the highest-risk control type)** — scroll-wheel over a focused field must **not** silently change the value · spinner up/down buttons step correctly and respect min/max · reject `e`/`+`/exponent notation and multiple decimals · locale decimal comma vs dot · leading zeros · empty field ⇒ no `NaN` submitted · cents fields never accept >2 decimals.
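
As a reference point while probing these fields, here is a minimal sketch of strict currency parsing. `parseCurrencyToCents` is a hypothetical name, and the accepted formats are assumptions about desired behavior, not a description of what the app currently does. Negative amounts are rejected in this sketch.

```javascript
// Digits (optionally with well-formed thousands commas) and at most two decimals.
const CURRENCY = /^\$?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?$/;

// Returns integer cents, or null when the input should be rejected.
function parseCurrencyToCents(raw) {
  const match = CURRENCY.exec(String(raw ?? '').trim());
  if (!match) return null; // covers "", "1e5", "+5", "1.2.3", "1.234", "1,50"
  const whole = Number(match[1].replace(/,/g, ''));
  const frac = Number((match[2] ?? '').padEnd(2, '0'));
  return whole * 100 + frac;
}

console.log(parseCurrencyToCents('$1,234.56')); // 123456
console.log(parseCurrencyToCents('1e5'));       // null
console.log(parseCurrencyToCents(''));          // null
```

Returning `null` instead of `NaN` makes the "empty field" case explicit for the caller. A locale that uses a decimal comma would need its own pattern; this sketch treats `1,50` as invalid rather than guessing.
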
**Per-control state matrix** — for each control on the page, verify every applicable state renders and behaves in **both light and dark**: default · hover · keyboard-focus (visible ring) · active/pressed · disabled (and truly non-interactive) · loading/in-flight · error/invalid · read-only · filled-to-overflow (1,000-char string / max-digit number wraps or truncates, no layout break).
> **Note — "sliders":** this app has **no `<input type=range>` sliders.** The `SlidersHorizontal` glyph is just the Bills **filter-panel** button; the closest real thing to a slider is a number stepper. Test those two surfaces where a slider would otherwise be expected.
**States** — loading skeleton/spinner, no layout jump · helpful empty state · error state (4xx/5xx/offline) recovers, `ErrorBoundary` shows a fallback not a white page.
**Visual & responsive** — correct at desktop/tablet/mobile, no overflow/h-scroll · dark mode contrast, no white flash · compact mode readable · long strings/big numbers wrap/truncate.
**Data integrity** — money 2-decimals, no float artifacts (`9.999999`) · dates in expected tz, period boundaries correct · values agree across pages (a bill total on Tracker == Summary == Analytics).
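
The float-artifact check is easier to reason about with a concrete case. The sketch below assumes money is held as integer cents and formatted only for display; the function names are illustrative and not taken from the codebase.

```javascript
// Sum integer cents; refuse anything that is not a whole number of cents.
function sumCents(amountsInCents) {
  return amountsInCents.reduce((total, cents) => {
    if (!Number.isInteger(cents)) {
      throw new TypeError(`Not integer cents: ${cents}`);
    }
    return total + cents;
  }, 0);
}

// Format integer cents as a fixed two-decimal string.
function formatCents(cents) {
  const sign = cents < 0 ? '-' : '';
  const abs = Math.abs(cents);
  return `${sign}${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, '0')}`;
}

console.log(0.1 + 0.2);                       // 0.30000000000000004 (float artifact)
console.log(formatCents(sumCents([10, 20]))); // 0.30
```

When a total differs between Tracker, Summary, and Analytics by a cent or less, float summation on one of the pages is a likely suspect worth logging.
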
---
## 7. Batch playbooks (detailed checklists)
Each batch below is the detailed script for the matching row in [§1](#1-batch-plan--progress-tracker). Apply [§6](#6-cross-cutting-checks-every-page) throughout.
### B0 — Baseline, tooling & coverage recon
**Run FIRST in every cycle.** This is where the plan re-syncs with reality — new
pages, routes, endpoints, or features added since the last cycle get discovered
and folded in **before** testing, so coverage never silently rots.
**Tooling baseline**
- [ ] `npm run ci` — record any failing server/client test or build error as a finding (S1/S2).
- [ ] `npm run check` — server syntax + build clean.
- [ ] App boots via `npm run dev` **and** production `npm start`; note startup warnings.
- [ ] Load the app; browser console + server logs clean on first load and first navigation.
- [ ] Confirm which auth mode / seed state the DB is in; snapshot a backup before proceeding.
**Coverage recon — enumerate the *actual* product and diff it against this plan.**
Run these, then compare the output to the batch playbooks (§7) and the [route map](#appendix-c--page--route--api-quick-map):
- [ ] **Client routes** — `grep -nE "<Route" client/App.jsx` — every path present here must appear in a batch playbook and Appendix C.
- [ ] **Pages** — `ls client/pages/` — every page has an owning batch.
- [ ] **Sidebar / nav entries** — `grep -nE "to:|label:|Only" client/components/layout/Sidebar.jsx` — new nav links (incl. conditional ones like `simplefinOnly`) are covered.
- [ ] **API route mounts** — `grep -nE "app.use\('/api" server.js` — every mounted route group is in B13's list and mapped in Appendix C.
- [ ] **Services & components** — `ls services/` and `ls client/components/**/` — new service/component families have a home in a playbook.
- [ ] **UI primitives** — `ls client/components/ui/` — every shared primitive is covered by the [B-UI](#b-ui--design-system-primitives) playbook; a new primitive gets a row there.
- [ ] **Interactive-control census (makes "every button tested" *provable*)** — for each page, enumerate every button, link, toggle/switch, checkbox, select, text/number/date/file input, tab, menu, and filter control, and record it in a per-page control checklist (template: [Appendix E](#appendix-e--per-page-control-census)). A control that isn't on a checklist hasn't been tested — the census is the completeness guarantee the batch playbooks alone don't give you. Quick starting inventory: `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages client/components` and `grep -rn "onClick=" client/pages/<Page>.jsx`.
- [ ] **Feature flags / conditional surfaces** — search for `Only`, `enabled`, `featureFlag`, env gates that hide/show pages; ensure each state is tested.
- [ ] **What changed since last cycle** — skim `git log`/`HISTORY.md` since the previous cycle's commit (see [Cycle Log](#11-qa-cycle-log)) for new features/pages.
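
The grep-based starting inventory for the interactive-control census can also be tallied per file. This is a rough sketch that mirrors the two grep patterns above; `censusFromSource` is a hypothetical helper, and it counts text matches rather than parsing JSX, so treat its output as a starting list only.

```javascript
// Sketch: tally type="..." attributes and onClick handlers in one source string.
function censusFromSource(source) {
  const types = {};
  for (const match of source.matchAll(/type=["']([a-z]+)["']/g)) {
    types[match[1]] = (types[match[1]] ?? 0) + 1;
  }
  const onClick = (source.match(/onClick=/g) ?? []).length;
  return { types, onClick };
}

// Example with an inline snippet; in practice pass the file contents,
// e.g. fs.readFileSync(pagePath, 'utf8') for each page under client/pages/.
const sample = '<input type="number" /><input type="text" /><button onClick={save} />';
console.log(censusFromSource(sample));
```

Controls rendered through shared primitives (`<Button>`, `<Select>`, `<Switch>`) will not show up as `type="..."` matches, so the manual walk of each page is still required.
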
**Update the plan (do this now, not later)** — for anything the recon surfaced that isn't already covered:
- [ ] Add it to the relevant batch playbook (or create a new batch and a row in the [§1 table](#1-batch-plan--progress-tracker)).
- [ ] Add/adjust its entry in [Appendix C](#appendix-c--page--route--api-quick-map).
- [ ] Note the plan update in the [Cycle Log](#11-qa-cycle-log) row for this cycle.
- [ ] If a whole surface is *missing* from the product that the plan expected (page removed/renamed), reconcile the plan too — don't test ghosts.
### B-UI — Design-system primitives
**Test each shared control once, thoroughly, in isolation — a bug here breaks every page at once.** Drive them wherever they're already mounted (or a scratch page); run each against the [per-control state matrix](#6-cross-cutting-checks-every-page) × light/dark × keyboard-only. One finding row per primitive.
| Primitive (`client/components/ui/`) | Must verify |
- [ ] **Roles/guards:** `user` blocked from `/admin*`, `/status` (redirect) and admin APIs (403); default admin forced to `/admin`; single-user bypass correct but admin surfaces still protected; unauth API → 401.
- [ ] **Data isolation (critical):** user A cannot read/modify user B's bills, payments, transactions, categories, snowball plans — test by ID enumeration on the API.
- [ ] **CSRF:** state-changing request without a valid token → rejected.
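
The ID-enumeration check for data isolation can be sketched as a small probe. Everything named here is an assumption for illustration: `request` stands for any fetch-like function already authenticated as user A, and the paths you pass in must be replaced with real resource URLs for records owned by user B.

```javascript
// Sketch: as user A, request resources that belong to user B and expect a denial.
async function probeIsolation(request, paths) {
  const leaks = [];
  for (const path of paths) {
    const response = await request(path);
    // 403 and 404 are both acceptable denials; anything else is a finding.
    if (![403, 404].includes(response.status)) {
      leaks.push({ path, status: response.status });
    }
  }
  return leaks;
}
```

An empty result means every probed ID was denied. A `401` in the result usually means the probe itself was not logged in, so fix the setup before logging it as a leak. Repeat the probe with write methods (update/delete), since read isolation does not imply write isolation.
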
### B2 — Tracker (`/`)
- [ ] Month nav (prev/next/jump), current month highlighted, data reloads per month.
- [ ] Bills land in correct `1–14` / `15–31` bucket by due date; pin-due sorting works.
- [ ] Quick pay marks paid + updates balance cards/progress; undo works; no double-count.
- [ ] Skip excludes from totals for that month only; unskip restores.
- [ ] Per-month amount override persists, doesn't affect base bill or other months.
- [ ] Notes cell add/edit/clear persists per month.
- [ ] Inactive/date-range bill doesn't show or count outside its range.
- [ ] Categories: create/edit/delete (in-use handled: reassign/prevent); groups create/assign/reorder (`categoryGroups`/`categoryReorder` tests); colors/icons consistent on Tracker/Spending/Analytics.
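
For the Tracker bucket check, the expected rule can be written down as a tiny oracle to compare against what the page shows. This is a sketch of the rule as stated in the checklist; the function name and return labels are illustrative, not the app's implementation.

```javascript
// Sketch: which half-month bucket a due day belongs to.
function bucketForDueDay(day) {
  if (!Number.isInteger(day) || day < 1 || day > 31) {
    throw new RangeError(`Invalid due day: ${day}`);
  }
  return day <= 14 ? '1-14' : '15-31';
}

console.log(bucketForDueDay(14)); // 1-14
console.log(bucketForDueDay(15)); // 15-31
```

Days 14 and 15 are the boundary values to seed when probing, along with 31 in a month that has fewer days.
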
### B5 — Reporting reconciliation
- [ ] Summary totals (paid/unpaid/overdue/remaining) reconcile with Tracker for the same month; income breakdown modal matches.
- [ ] Calendar plots bills/payments on correct days (**timezone**: a bill due on the 1st must not render on the 31st); day totals correct.
- [ ] Analytics charts render with data AND empty (no broken SVG/`NaN` axes); period selectors update all charts; figures reconcile with Summary/Tracker; large dataset perf OK.
- [ ] Health indicators compute from real data, no crash on empty; recommendations sane.
- [ ] Privacy admin edits reflect on public `/privacy`; system status metrics/versions/jobs accurate (`statusService.test.js`); admin actions rate-limited + audited (`auditService` — spot-check log).
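
The Calendar timezone check above targets a common JavaScript pitfall: a date-only string such as `2024-03-01` passed to `new Date()` is parsed as UTC midnight, so in timezones behind UTC its local day is the previous one. The sketch below shows one way to avoid that; `parseLocalDate` is a hypothetical helper, and whether the app stores due dates as date-only strings is an assumption.

```javascript
// Sketch: parse a YYYY-MM-DD string as a local calendar day.
function parseLocalDate(isoDay) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDay);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got: ${isoDay}`);
  const [, year, month, day] = match.map(Number);
  // Building from parts uses local time, so getDate() matches the input day.
  return new Date(year, month - 1, day);
}

console.log(parseLocalDate('2024-03-01').getDate()); // 1
```

To reproduce the bug class during a pass, run the browser or server with a timezone behind UTC and look at bills due on the 1st.
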
### B12 — Settings, Profile & global UI
- [ ] Settings: theme (light/dark/system) persists; notification prefs save + reflect in B10; display/density/period/search-panel prefs persist; invalid rejected.
- [ ] **a11y (manual):** keyboard-only reach/operate every control, visible focus, skip-link works; screen-reader labels/roles (Radix `aria-*`); WCAG-AA contrast light+dark; modals trap+restore focus, Esc closes; errors announced not color-only.
- [ ] **a11y (automated):** run **axe-core** on every page (`@axe-core/playwright`, or `jest-axe` for component-level) — **zero critical/serious** violations; triage moderate. Wire it into the E2E suite so it re-runs every cycle, not just once.
- [ ] **Visual regression:** capture a baseline screenshot per page × {desktop, mobile} × {light, dark} (Playwright `toHaveScreenshot`); diff against baseline each cycle. Every non-trivial pixel diff is either an intended change (update the baseline in the same commit) or a finding — never ignore it. This is what makes "every page looks right" repeatable instead of eyeballed.
- [ ] **Performance:** initial load + lazy route splitting OK on Slow 3G; large lists responsive; no memory leak over 10+ navigations; no duplicate/excess requests (React Query `staleTime`).
- [ ] **PWA/offline:** installs; manifest/icon correct; offline shell loads with graceful messaging; SW updates without stale-cache breakage.
- [ ] **Security spot-checks:** XSS in bill names/notes/category names/imported data escaped everywhere (defense = React auto-escaping + the restrictive custom `MarkdownText` renderer — https-only link hrefs, **no** `dangerouslySetInnerHTML` anywhere; NOT rehype-sanitize, which is unused, see QA-B14-03); no secrets (SimpleFIN token, SMTP creds, OIDC secret) in bundle/responses/logs; cookies `HttpOnly`/`Secure`/`SameSite`; `encryptionService` protects at-rest secrets, keys not committed. (Depth: `SECURITY_AUDIT.md`.)
- [ ] **Resilience:** kill API mid-session → recoverable errors, no data loss on next save; locked/corrupt SQLite surfaces clearly; SimpleFIN/SMTP/push down → graceful degrade; two-tab concurrent edits don't silently clobber.
- [ ] **Fault injection (systematic):** with a request-interception harness (Playwright `page.route`, or DevTools network overrides), force each page's API calls to **401 mid-session / 403 / 429 / 500 / network-timeout / malformed-JSON** and confirm the UI shows a recoverable error (toast or `ErrorBoundary` fallback), never a white screen, stuck spinner, or silent success. Do this per page, not once globally — each page handles failure differently.
- [ ] **Timezone/locale:** non-UTC tz + DST boundary — due dates and calendar stay correct.
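
The systematic fault-injection item can be driven from a small table of faults. This is a sketch, not the suite's actual harness: `route.fulfill` and `route.abort` are Playwright `Route` methods, while the `**/api/**` glob, the fault names, and `injectFault` itself are assumptions to adapt to the real spec files.

```javascript
// Build a handler that answers every matched request with a JSON error.
const jsonError = (status) => (route) =>
  route.fulfill({
    status,
    contentType: 'application/json',
    body: JSON.stringify({ error: `injected ${status}` }),
  });

const FAULTS = {
  unauthorized: jsonError(401),
  forbidden: jsonError(403),
  rateLimited: jsonError(429),
  serverError: jsonError(500),
  // A 200 whose body is not valid JSON.
  malformedJson: (route) =>
    route.fulfill({ status: 200, contentType: 'application/json', body: '{"broken":' }),
  // Fails the request with a timed-out network error (it does not actually hang).
  timeout: (route) => route.abort('timedout'),
};

// Register one fault on a Playwright page for all URLs matching urlGlob.
async function injectFault(page, faultName, urlGlob = '**/api/**') {
  const handler = FAULTS[faultName];
  if (!handler) throw new Error(`Unknown fault: ${faultName}`);
  await page.route(urlGlob, handler);
}
```

In a spec, call `await injectFault(page, 'serverError')` after the page has loaded normally to simulate a mid-session failure, then trigger the action under test and assert on the toast or fallback. Looping over `Object.keys(FAULTS)` per page gives the per-page coverage the checklist asks for.
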
### B15 — Regression & sign-off
Run on the **production build** (`npm start`), not dev:
- [ ] `npm run ci` green. Log in as `user` and `admin`.
- [ ] `npm run test:e2e` green (Playwright smoke + axe + visual-regression baselines match, §4.4).
- [ ] Tracker: create bill → quick-pay → skip another → add note; reflected on Summary/Calendar/Analytics.
- [ ] Create a category + subscription → appear on Tracker/Spending; Spending safe-to-spend correct.
The completeness ledger behind "every button, textbox, slider is right." Fill one table **per page** during [B0](#b0--baseline-tooling--coverage-recon) and check every control off during that page's batch. A control not listed here is a control not tested. Build the starting list with `grep -rnoE "type=[\"'][a-z]+[\"']" client/pages/<Page>.jsx` + `grep -n "onClick=\|<Button\|<Select\|<Switch\|<Checkbox" client/pages/<Page>.jsx`.
**Template** (copy per page):
| Control | Type | Expected action | States checked (default/focus/disabled/error/loading) | Keyboard | Result |