Phase 4: Separate Limit Types — Stop Mixing Tokens, Messages, and Cost #39

Closed
opened 2026-05-21 00:49:48 -05:00 by null · 0 comments

References: remaining-usage-accuracy-review-plan.md (docs/remaining-usage-accuracy-review-plan.md), Phase 4

Summary

The pipeline computes total_tokens as input + output and compares it against tokenLimit, or even against messageLimit, which mixes incompatible units. The reference dashboard's local time-to-limit estimate for model windows is driven by output tokens only, so large input-heavy or cache-heavy sessions make remaining usage look worse than it is.
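
To make the distortion concrete, here is a small worked example. The token counts and the limit are hypothetical numbers chosen only for illustration, and `percent_used` is a helper name invented for this sketch, not a function from the pipeline.

```python
def percent_used(used: int, limit: int) -> float:
    """Percent of a limit consumed; both arguments must be in the same unit."""
    return 100 * used / limit


# Hypothetical session, chosen only to show the distortion.
input_tokens = 180_000        # prompt and cache tokens
output_tokens = 20_000
output_token_limit = 100_000  # assumed output-token window

# Mixed units: input + output measured against an output-token limit.
mixed = percent_used(input_tokens + output_tokens, output_token_limit)

# Matching units: output tokens measured against the output-token limit.
matched = percent_used(output_tokens, output_token_limit)

print(f"mixed: {mixed:.0f}%  matched: {matched:.0f}%")
```

With these numbers the mixed calculation reports the window as twice exhausted, while the matching-unit calculation shows 20% used.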

Acceptance Criteria

  • Message counts are never compared to token totals
  • Input/cache tokens do not inflate an output-token remaining estimate
  • Each limit type computes percent only from matching units
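
One way to enforce these criteria mechanically is to tag every value with its unit and refuse to compute a percent across units. The following is a minimal sketch under that assumption; `Unit`, `Quantity`, and `percent_used` are hypothetical names, not existing pipeline types.

```python
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    OUTPUT_TOKENS = "output_tokens"
    TOTAL_TOKENS = "total_tokens"
    MESSAGES = "messages"
    COST_USD = "cost_usd"
    REQUESTS = "requests"


@dataclass(frozen=True)
class Quantity:
    """A number that always travels with its unit."""
    value: float
    unit: Unit


def percent_used(used: Quantity, limit: Quantity) -> float:
    """Return percent consumed, rejecting any cross-unit comparison."""
    if used.unit is not limit.unit:
        raise ValueError(
            f"unit mismatch: {used.unit.value} vs {limit.unit.value}"
        )
    if limit.value <= 0:
        raise ValueError("limit must be positive")
    return 100 * used.value / limit.value
```

Under this design, comparing a message count to a token limit fails loudly instead of producing a misleading percent.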

Work

  1. Replace generic token_limit with typed limits: output_token_limit, total_token_limit, message_limit, cost_limit_usd, request_limit
  2. Compute each percent only from matching units
  3. For Claude-style local estimates, compute output-token model windows separately for Opus/Sonnet where applicable
  4. Keep old fields in API responses for backwards compatibility (deprecation notice)
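
The work items above could be sketched roughly as follows. The limit field names come from item 1; the `TypedLimits` and `Usage` classes, both functions, and the model keys are assumptions made for illustration. The sketch treats total tokens as input + output, matching the Summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypedLimits:
    # Field names from work item 1; None means "no limit of this type".
    output_token_limit: Optional[int] = None
    total_token_limit: Optional[int] = None
    message_limit: Optional[int] = None
    cost_limit_usd: Optional[float] = None
    request_limit: Optional[int] = None


@dataclass
class Usage:
    # Hypothetical usage record; one counter per unit.
    input_tokens: int = 0
    output_tokens: int = 0
    messages: int = 0
    cost_usd: float = 0.0
    requests: int = 0


def percent_by_limit(usage: Usage, limits: TypedLimits) -> dict[str, float]:
    """One percent per configured limit, each from its own matching unit."""
    pairs = {
        "output_token_limit": (usage.output_tokens, limits.output_token_limit),
        "total_token_limit": (usage.input_tokens + usage.output_tokens,
                              limits.total_token_limit),
        "message_limit": (usage.messages, limits.message_limit),
        "cost_limit_usd": (usage.cost_usd, limits.cost_limit_usd),
        "request_limit": (usage.requests, limits.request_limit),
    }
    # Unset (None) or zero limits are skipped rather than folded together.
    return {name: 100 * used / limit
            for name, (used, limit) in pairs.items() if limit}


def output_percent_by_model(output_tokens: dict[str, int],
                            output_limits: dict[str, int]) -> dict[str, float]:
    """Item 3: a separate output-token window per model family."""
    return {model: 100 * output_tokens.get(model, 0) / limit
            for model, limit in output_limits.items() if limit}
```

A caller would then render one gauge per returned key, for example `output_percent_by_model({"opus": 5_000}, {"opus": 50_000, "sonnet": 200_000})`, where the limits are placeholder values rather than real plan limits.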

Do Not

  • Do not compare message counts against token totals
  • Do not compare input+output totals against output-token limits
  • Do not fold cost, requests, messages, and tokens into one percent
  • Do not remove old fields abruptly if generated clients or dashboard cards still consume them
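
For the last point, a response builder might keep the legacy field alongside the typed ones while flagging it as deprecated. This is only a sketch: `build_limits_response` and the `deprecated_fields` key are hypothetical, and mapping the legacy `token_limit` to `total_token_limit` is an assumption about what existing clients expect.

```python
def build_limits_response(typed_limits: dict) -> dict:
    """Return typed limit fields plus the legacy field, marked deprecated."""
    response = dict(typed_limits)  # copy so the caller's dict is untouched

    # Assumption: the old generic token_limit was closest in meaning to the
    # total-token limit, so it mirrors that value during the transition.
    response["token_limit"] = typed_limits.get("total_token_limit")

    # Hypothetical deprecation notice that clients can surface or log.
    response["deprecated_fields"] = ["token_limit"]
    return response
```

Generated clients and dashboard cards that still read `token_limit` keep working, while new consumers can switch to the typed fields.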

Priority: Medium
Phase: 4

null added the api, backend, frontend, phase:4, priority:medium, usage-accuracy labels 2026-05-21 00:53:41 -05:00
null closed this issue 2026-05-21 01:41:21 -05:00