Phase 4: Separate Limit Types — Stop Mixing Tokens, Messages, and Cost #39

Closed

opened 2026-05-21 00:49:48 -05:00 by null · 0 comments

References: docs/remaining-usage-accuracy-review-plan.md, Phase 4

Summary

The pipeline sums input + output into total_tokens and compares that sum against tokenLimit, and in some cases against messageLimit. This mixes incompatible units. The reference dashboard's local time-to-limit estimate is driven by output tokens for model windows, so sessions with large input or cache-heavy traffic make remaining usage look worse than it is.
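A small worked example of the distortion described above. The numbers and the names `mixed_percent` and `output_percent` are invented for illustration; they are not taken from the pipeline.

```python
# Hypothetical session numbers, chosen only to illustrate the unit mismatch.
input_tokens = 180_000        # input, including cache-heavy traffic
output_tokens = 20_000
output_token_limit = 100_000  # assumed output-token window for the model

# Behaviour described in the summary: input + output compared to an output-token limit.
total_tokens = input_tokens + output_tokens
mixed_percent = 100 * total_tokens / output_token_limit

# Matching units: only output tokens count against an output-token limit.
output_percent = 100 * output_tokens / output_token_limit

print(f"mixed: {mixed_percent:.0f}%  matching: {output_percent:.0f}%")
```

With these made-up numbers the mixed calculation reports the window as already exceeded, while the output-only calculation shows most of it still available.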

Acceptance Criteria

- Message counts are never compared to token totals
- Input/cache tokens do not inflate an output-token remaining estimate
- Each limit type computes percent only from matching units
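One possible way to make these criteria hard to violate is to tag every number with its unit and refuse mismatched comparisons. This is only a sketch: `Unit`, `Quantity`, and `percent_used` are hypothetical names, not existing pipeline code.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    OUTPUT_TOKENS = "output_tokens"
    TOTAL_TOKENS = "total_tokens"
    MESSAGES = "messages"
    COST_USD = "cost_usd"
    REQUESTS = "requests"


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: Unit


def percent_used(used: Quantity, limit: Quantity) -> float:
    """Return the percent of the limit consumed; refuse mismatched units."""
    if used.unit is not limit.unit:
        raise ValueError(f"unit mismatch: {used.unit.value} vs {limit.unit.value}")
    if limit.value <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used.value / limit.value
```

Under this approach, comparing a message count against a token limit raises an error instead of producing a misleading percent.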

Work

1. Replace the generic token_limit with typed limits: output_token_limit, total_token_limit, message_limit, cost_limit_usd, request_limit
2. Compute each percent only from matching units
3. For Claude-style local estimates, compute output-token model windows separately for Opus/Sonnet where applicable
4. Keep the old fields in API responses for backwards compatibility, marked with a deprecation notice
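The first two steps could look roughly like the sketch below. The limit field names come from the list above; the `Usage` record, the `percents` helper, and the keys of the returned dict are assumptions made for illustration, and the real pipeline's shapes may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedLimits:
    # Field names from the Work list; None means no such limit applies.
    output_token_limit: Optional[int] = None
    total_token_limit: Optional[int] = None
    message_limit: Optional[int] = None
    cost_limit_usd: Optional[float] = None
    request_limit: Optional[int] = None


@dataclass(frozen=True)
class Usage:
    # Hypothetical usage record for one window.
    input_tokens: int = 0
    output_tokens: int = 0
    messages: int = 0
    cost_usd: float = 0.0
    requests: int = 0


def percents(usage: Usage, limits: TypedLimits) -> dict[str, float]:
    """One percent per limit type, each computed from matching units only."""
    pairs = {
        "output_tokens": (usage.output_tokens, limits.output_token_limit),
        "total_tokens": (usage.input_tokens + usage.output_tokens, limits.total_token_limit),
        "messages": (usage.messages, limits.message_limit),
        "cost_usd": (usage.cost_usd, limits.cost_limit_usd),
        "requests": (usage.requests, limits.request_limit),
    }
    # Limits that are unset (None) or zero are skipped rather than guessed.
    return {name: 100.0 * used / limit for name, (used, limit) in pairs.items() if limit}
```

For step 3, one option is to call `percents` once per model, passing that model's own `Usage` and `TypedLimits`, so Opus and Sonnet windows never share a denominator.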

Do Not

- Do not compare message counts against token totals
- Do not compare input+output totals against output-token limits
- Do not fold cost, requests, messages, and tokens into one percent
- Do not remove old fields abruptly if generated clients or dashboard cards still consume them
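For the last point, a response builder might keep the old field next to the typed values and flag it as deprecated. This is a sketch under assumptions: `build_response`, `percent_by_limit_type`, and `deprecated_fields` are hypothetical names, and the actual API schema is not shown in this issue.

```python
from typing import Optional


def build_response(percents_by_type: dict, legacy_token_limit: Optional[int]) -> dict:
    """Return typed percents alongside the old field, flagged as deprecated."""
    response = {"percent_by_limit_type": dict(percents_by_type)}
    if legacy_token_limit is not None:
        # Old field kept unchanged so existing consumers keep working.
        response["token_limit"] = legacy_token_limit
        # Hypothetical marker that lets clients discover what will go away.
        response["deprecated_fields"] = ["token_limit"]
    return response
```

Generated clients and dashboard cards that still read `token_limit` continue to work, while new consumers can move to the per-type percents.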

Priority: Medium
Phase: 4
