# BillTracker — Master QA Plan (living document)

**Version target:** v0.41.x · **Executor:** Claude (active) · **Last updated:** 2026-07-02

This is a **living, operational** QA document, not a static spec. Claude runs it, in **batches**, actively hunting for bugs/errors/rough edges, **fixing** them, and **archiving** each fixed finding to `HISTORY.md`. Update this document whenever a better approach, a new risk area, or a missed surface is discovered.

> **The prime directive:** don't just confirm the happy path — try to *break*
> the product. Every batch should end with the tree green, the Findings Log
> up to date, and any fixes archived to `HISTORY.md`.

---

## Table of contents

0. [Execution model — find, then fix, then repeat](#0-execution-model--find-then-fix-then-repeat)
1. [Batch plan & progress tracker](#1-batch-plan--progress-tracker)
2. [Active Findings Log](#2-active-findings-log)
3. [Archiving fixed findings to HISTORY.md](#3-archiving-fixed-findings-to-historymd)
4. [Environment & setup](#4-environment--setup)
5. [Test data strategy](#5-test-data-strategy)
6. [Cross-cutting checks (every page)](#6-cross-cutting-checks-every-page)
7. [Batch playbooks (detailed checklists)](#7-batch-playbooks-detailed-checklists)
8. [Appendices](#8-appendices)

---

## 0. Execution model — find, then fix, then repeat

**Separate finding from fixing.** During a QA pass we *hunt and log* — we do **not** fix as we go (except show-stoppers, see below). Only after the whole plan has run do we enter a dedicated **fix phase** and fix **every** logged finding. Then we run the **entire** QA plan again from the top. Repeat until a full pass finds **zero** errors.

Two nested loops:

```
OUTER — QA CYCLE (repeat until a full pass finds zero findings)
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE 1 · FIND     Run every batch B0→B15 in find-only mode.         │
│                    Probe hard, LOG everything to the Findings Log.   │
│                    Do NOT fix (except show-stoppers).                │
│          ↓                                                           │
│ PHASE 2 · FIX      QA pass done. Now fix EVERY logged finding —      │
│                    all of them (S1→IMP). Root-cause, with tests.     │
│          ↓                                                           │
│ PHASE 3 · VERIFY   Re-run each fix's repro; `npm run ci` green.      │
│          ↓                                                           │
│ PHASE 4 · ARCHIVE  Move every fixed finding to HISTORY.md (§3).      │
│          ↓                                                           │
│ PHASE 5 · RE-RUN   Start a new cycle at PHASE 1. If that full pass   │
│                    logs zero findings → QA is clean, STOP.           │
└──────────────────────────────────────────────────────────────────────┘

INNER — per batch during PHASE 1 (find-only)
  PICK next ⬜ batch → SET UP (app, data state, role, console open)
    → PROBE (actively break it, §5 adversarial inputs)
    → LOG every finding to §2 → mark batch status in §1 → next batch.
    (No fixing here.)
```

**Show-stopper exception.** A *show-stopper* is a finding that **blocks continued QA** — the app won't boot, you can't log in, or a page crashes so hard you can't test the rest of it. Only these get fixed immediately (mid-pass), because you can't proceed otherwise. Log it, fix it, verify, and note it was a mid-pass fix; then continue the find pass. **Everything else is logged and left for Phase 2** — no matter how tempting or trivial.

**Discipline (for best results)**

- **Phase 1 is log-only.** Resist fixing. A clean, complete inventory of findings beats a scattered fix-as-you-go pass and produces better batching.
- Keep each find batch tight and focused — one batch per session — so probing stays thorough.
- **Phase 2 fixes everything**, not just S1/S2. Root-cause over surface patch; add/extend a test in `tests/` or `client/**/*.test.*` for every logic bug so it can't silently return.
- Never leave the repo red at the end of Phase 3 — `npm run ci` must be green before archiving.
- Touch product behavior? Run the `/verify` skill on the affected flow before archiving.
- **The exit is empirical:** you're done only when an entire find pass (B0→B15) turns up zero new findings — not when you *think* it's clean.
  Log the cycle result in the [Cycle Log](#11-qa-cycle-log) each time.
- Improve THIS plan whenever a pass reveals a missed surface, a better repro, or a batch that should be reordered/split.

---

## 1. Batch plan & progress tracker

Batches are ordered **foundation-first** (baseline & auth before features; features before cross-cutting; regression last). Update **Status** and **Findings** every run.

**Status key:** ⬜ Not started · 🔄 In progress · ✅ Done (green, findings archived) · 🔁 Needs recheck

| # | Batch | Primary surface | Data state | Status | Open / Fixed |
|---|-------|-----------------|-----------|--------|--------------|
| B0 | Baseline, tooling & **coverage recon** | `npm run ci`/`check`, app boots, console clean, **re-scan routes/pages/API vs plan & update it**, **control census** | any | 🔄 | 0 / 1 |
| B-UI | **Design-system primitives** | each `client/components/ui/*` × state matrix (default/hover/focus/active/disabled/loading/error/read-only) × light/dark × keyboard | any | ⬜ | 0 / 0 |
| B1 | Auth & authorization | login (pw/OIDC/TOTP/WebAuthn), roles, single-user, CSRF, data isolation | multi + single user | ⬜ | 0 / 0 |
| B2 | Tracker (core) | `/` buckets, pay/skip/notes/overrides, balance cards, overdue, ledger, drift | seeded + adversarial | ⬜ | 0 / 0 |
| B3 | Bills & schedules | `/bills` CRUD, custom schedules, reorder, merchant rules, historical import | adversarial | ⬜ | 0 / 0 |
| B4 | Subscriptions & Categories | `/subscriptions`, catalog, `/categories`, groups, reorder | seeded | ⬜ | 0 / 0 |
| B5 | Reporting reconciliation | `/summary`, `/calendar`, `/analytics`, `/health` cross-check totals | seeded + large | ⬜ | 0 / 0 |
| B6 | Spending | `/spending` YNAB view, averages, cover-overspending, safe-to-spend | seeded + edge months | 🔄 | 0 / 1 |
| B7 | Debt planning (math) | `/snowball`, `/payoff` APR/amortization vs hand-calc | edge (APR=0, $0 debt) | 🔄 | 1 / 1 |
| B8 | Banking & bank sync | `/bank-transactions`, SimpleFIN sync, matching, merchant/store, advisory filter | seeded txns | ⬜ | 0 / 0 |
| B9 | Data lifecycle | `/data` import (XLSX/CSV/SQLite), export, ICS feed, backups round-trip | empty + seeded | 🔄 | 0 / 1 |
| B10 | Notifications & workers | email + ntfy/Gotify/Discord/Telegram, reminders, cron workers | seeded | ⬜ | 0 / 0 |
| B11 | Admin panel | users, login mode, auth methods, backups, cleanup, status, onboarding | admin | ⬜ | 0 / 0 |
| B12 | Settings, Profile & global UI | `/settings`, `/profile`, static pages, command palette, sidebar/nav | any | ⬜ | 0 / 0 |
| B13 | API / backend direct | all `/api/*`: auth, CSRF, validation, rate limits, error shape, IDOR, cents | via HTTP client | 🔄 | 0 / 1 |
| B14 | Non-functional | a11y, performance, PWA/offline, XSS/secrets, timezone/DST | large + adversarial | 🔄 | 0 / 3 |
| B15 | Regression & sign-off | full smoke on **production build**, exit criteria | seeded | ⬜ | 0 / 0 |

> After B15, if any batch is 🔁 or has open S1/S2, loop back. Then start a new
> cycle from B0 against the next build/version.

### 1.1 QA Cycle Log

One row per full QA cycle (Phase 1 find → Phase 2 fix → … → Phase 5 re-run). A cycle is only "clean" when its **find pass logged zero findings**. Keep going until you get a clean cycle.

| Cycle | Started | Build / commit | Findings logged | Fixed / archived | Result |
|-------|---------|----------------|-----------------|------------------|--------|
| 1 | 2026-07-02 | `bdbf231` (dev) | 9 (find pass ongoing) | 8 → HISTORY v0.41.0 (B9-01, B13-01, B6-01, B14-01, B14-02, B14-03, B0-01, B7-02) | 🔄 in progress — B0/B1/B3/B4/B6/B7/B8/B9/B13/B14 probed. Solid: auth-isolation, CSRF, payment/date validation, **recurrence (quarterly/annual gating, Feb-31 clamp, leap year)**, **transaction matching/dedup**, subscription+spending math, XSS. **Fixed: seed 100× cents (S2), bill-amount validation, negative-money format, all a11y (button-name/svg/aria/nested-interactive — 8/8 pages pass axe), vendor-bundle split, unused-dep + dead-code removal.** Open: 1 (B7-01 rounding S3 [float-inherent, deferred]) |

**Result key:** 🔄 in progress · 🔁 findings fixed, re-run required · ✅ clean (zero findings — QA complete)

---

## 2. Active Findings Log

**This is the live log.** Record every finding here the moment it's found — before fixing. Keep only **Open / Fixing / Fixed** rows here. Once a finding is **Fixed + verified + archived to `HISTORY.md`**, delete its row from this table (its permanent record is the changelog entry).

**Finding ID:** `QA-B{batch}-{nn}` (e.g. `QA-B2-01`).
**Severity:** S1 Critical · S2 Major · S3 Minor · S4 Cosmetic · IMP Improvement (see [Appendix A](#appendix-a--severity-definitions)).
**Status:** 🔴 Open → 🟡 Fixing → 🟢 Fixed (verified, awaiting archive) → then remove on 📦 Archive.

| ID | Sev | Area (`file:line`) | Summary | Status | Notes / repro |
|----|-----|--------------------|---------|--------|---------------|
| QA-B7-01 | S3 | `utils/money.js:29` | `toCents` mis-rounds fractional cents: `toCents(1.005)` → 100 (`$1.00`) not 101 | 🔴 Open | see write-up (deferred — float-inherent) |

**Finding template** (paste a new row above; keep the full write-up here until archived):

```
ID:            QA-B?-??
Severity:      S1 / S2 / S3 / S4 / IMP
Environment:   browser / viewport / theme / role / auth mode / data state
Area:          file:line (if known)
Steps to reproduce:
  1.
  2.
Expected:
Actual:
Evidence:      console / network / DB row / screenshot
Fix:           (what changed, commit) — Verified by: (repro re-run + ci)
```

Log console errors, failed network requests, and unhandled rejections as findings **even if the UI looks fine**.
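The open QA-B7-01 row above can be reproduced in plain Node, without the app. The sketch below is illustrative only: `toCentsFloat` is a hypothetical stand-in for the `Math.round(n * 100)` behavior the finding describes (it is not copied from `utils/money.js`), and `toCentsDecimal` is one possible shape for a string-based rounding, assuming plain decimal inputs and half-away-from-zero rounding.

```javascript
// Hypothetical stand-in for the behavior described in QA-B7-01.
function toCentsFloat(n) {
  return Math.round(n * 100); // 1.005 * 100 is 100.49999999999999 in IEEE-754
}

// Hypothetical alternative: round on the decimal digits of the number's
// shortest string form instead of on the float product.
// Assumption: inputs print as plain decimals (no exponent notation).
function toCentsDecimal(n) {
  const s = String(n);
  if (!/^-?\d+(\.\d+)?$/.test(s)) {
    throw new RangeError(`unsupported amount: ${s}`);
  }
  const negative = s.startsWith('-');
  const [whole, frac = ''] = (negative ? s.slice(1) : s).split('.');
  const digits = (frac + '000').slice(0, 3); // two cent digits + one rounding digit
  let cents = Number(whole) * 100 + Number(digits.slice(0, 2));
  if (digits[2] >= '5') cents += 1; // half rounds away from zero
  return negative && cents !== 0 ? -cents : cents;
}

console.log(toCentsFloat(1.005));   // 100
console.log(toCentsDecimal(1.005)); // 101
```

Note that the sketch rounds negative halves away from zero (`-1.005` becomes `-101`), which differs from `Math.round`; whether that matches the product's intended convention is an open question for the fix phase.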
### Cycle 1 — logged write-ups

```
ID:          QA-B7-01
Severity:    S3 (minor — wrong edge behavior in money core that advertises exactness)
Environment: server-side money math
Area:        utils/money.js:29 (toCents → Math.round(n * 100))
Steps to reproduce:
  1. toCents(1.005) → 100 (i.e. $1.00), not 101 ($1.01).
  2. Round-trip fromCents(toCents(1.005)) → 1 (a cent silently lost).
Expected:    "cent-exact" per the file's own docstring — 1.005 → 101.
Actual:      the float product 1.005*100 is 100.4999…, just under the .5
             boundary, so Math.round returns 100.
Evidence:    node probe. Other 3-decimal inputs also affected (values near .xx5).
Impact:      bounded to sub-cent, and only when a 3+ decimal dollar value reaches
             the boundary (proration/interest), so low severity — but it
             contradicts the exactness guarantee and is the plan's named
             "fractional cents" adversarial case.
Fix (deferred): round on a string/scaled-integer basis, or add epsilon before round.
```

---

## 3. Archiving fixed findings to HISTORY.md

`HISTORY.md` is the project changelog (version-organized, emoji section headers). When a finding is Fixed **and verified**, write a concise entry there, then remove the row from the Active Findings Log.

**Where:** under the current in-progress version heading (e.g. `## v0.41.x`). If a QA cycle produces several fixes, group them under a `### 🐛 QA Fixes` (bug fixes) or `### 🧹 QA` (polish/improvements) section, matching the existing changelog voice.

**Entry format** (match the terse, specific style already in `HISTORY.md`):

```markdown
### 🐛 QA Fixes
- **[Area] Short title** — What was wrong and the user-visible impact, then the
  fix. Reference the file/function and any migration or test added. (was QA-B7-03)
```

**Rules**

- One bullet per finding; include the old `QA-B?-??` id in parentheses for traceability.
- If a fix added/changed a test, say which (`tests/…` or `client/…test.*`).
- Don't archive until the fix is verified (repro gone + `npm run ci` green).
- IMP items that were implemented are archived the same way; IMP items merely *noted* stay in the Findings Log (or graduate to `FUTURE.md`/`roadmap.md` if deferred).

---

## 4. Environment & setup

### 4.1 Running the app

| Mode | Command | URL |
|------|---------|-----|
| Dev (API + UI, hot reload) | `npm run dev` | UI `http://localhost:5173` (proxies API → `:3000`) |
| API only | `npm run dev:api` | `http://localhost:3000` |
| Production build | `npm run build` then `npm start` | `http://localhost:3000` |
| Docker | `docker-compose up` | per compose config |

- Backend: Node/Express on `PORT` (default `3000`). Frontend dev: Vite on `5173`.
- Data: SQLite at `db/bills.db` (WAL). **Back it up before destructive tests** (`backups/` or a manual copy). Prefer a scratch DB for B9/B11 restore tests.
- Configure a dedicated **test** `.env` from `.env.example`. Never point tests at production data or a live SimpleFIN account with real credentials.
- Test commands: `npm run ci` (check + all tests + build), `npm run check` (syntax + build), `npm run test` (server), `npm run test:client` (vitest).

### 4.2 Test matrix

Full functional pass across reasonable combinations; smoke (B15) across all.

| Dimension | Values |
|-----------|--------|
| Browser | Chrome/Chromium, Firefox, Safari (WebAuthn differs per browser) |
| Viewport | Desktop ≥1280, tablet ~768, mobile ~375 (iPhone SE), ~414 |
| Theme | Light, Dark, system-follow |
| Role | `user`, `admin`, default admin (first-run) |
| Auth mode | Multi-user, single-user |
| Density | Normal + compact desktop |
| Network | Online, Slow 3G, offline (PWA shell) |
| Data state | Empty, seeded demo, large/stress, adversarial |

### 4.3 Accounts to prepare

- `admin`, `user`, a **second** `user` (data-isolation), a single-user-mode instance (separate DB).
- Demo reference: `guest / guest123` (do not run destructive flows on any shared demo server).
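
The §4.2 matrix multiplies out quickly, which is why the full functional pass samples "reasonable combinations" rather than all of them. As a rough sketch, the dimensions can be enumerated like this; the value labels are hypothetical shorthand for the table entries, and treating compact density as a free dimension overcounts slightly, since it only applies to desktop.

```javascript
// Hypothetical shorthand labels for the §4.2 table values.
const matrix = {
  browser: ['chromium', 'firefox', 'safari'],
  viewport: ['desktop-1280', 'tablet-768', 'mobile-375', 'mobile-414'],
  theme: ['light', 'dark', 'system'],
  role: ['user', 'admin', 'default-admin'],
  authMode: ['multi-user', 'single-user'],
  density: ['normal', 'compact'], // compact is desktop-only in practice
  network: ['online', 'slow-3g', 'offline'],
  dataState: ['empty', 'seeded', 'large', 'adversarial'],
};

// Lazily yield every combination (cartesian product) of the dimensions.
function* combinations(dims) {
  const keys = Object.keys(dims);
  function* walk(i, acc) {
    if (i === keys.length) {
      yield { ...acc };
      return;
    }
    for (const value of dims[keys[i]]) {
      yield* walk(i + 1, { ...acc, [keys[i]]: value });
    }
  }
  yield* walk(0, {});
}

// Upper bound on the number of distinct environments.
const total = Object.values(matrix).reduce((n, values) => n * values.length, 1);
console.log(total); // 5184
```

With an upper bound of 5,184 environments, picking a handful of representative rows per batch (for example, one per browser and one per data state) is the practical reading of "reasonable combinations".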
### 4.4 Automated E2E harness (Playwright)

Manual passes prove a button works **once**; they don't stop it regressing next cycle. The Playwright suite is the regression net — it drives real clicks in a real browser, and it's where visual-regression, axe-a11y, and fault-injection (§B14) are wired so they re-run every cycle for free.

| Command | What it does |
|---------|--------------|
| `npm run test:e2e` | run the E2E suite headless (boots the app via `webServer`) |
| `npm run test:e2e:ui` | Playwright UI mode — watch/debug interactively |
| `npm run test:e2e:update` | re-baseline visual-regression screenshots (review the diff before committing) |

- **Setup (one-time):** `npm install` then `npx playwright install chromium`. Config: `playwright.config.js`; specs in `e2e/`.
- **Scope:** the suite is a **thin critical-path smoke**, not a replacement for the manual playbooks — it locks the happy paths (login → pay bill → skip → note → reconcile), the primitive state matrix, per-page axe scans, and page screenshots. Grow it whenever a manual pass finds a UI regression that a click-test could have caught.
- **Don't** point it at production data or a live SimpleFIN account — it runs against a scratch DB with seeded demo data.

---

## 5. Test data strategy

- **Empty:** brand-new account. Every page must render a sensible empty state — no crash, no `NaN`, no blank white screen.
- **Seeded:** use **Data → Seed Demo Data** for a realistic mid-size dataset.
- **Large/stress:** 500+ bills, 5,000+ transactions, 24+ months history — exercises virtualization (`@tanstack/react-virtual`), charts, query perf.
- **Adversarial (deliberately try to break it):**
  - Amounts: `0`, `0.01`, negative, `9,999,999.99`, fractional cents.
  - Text: emoji, RTL, `