Phase 2-C: Backend Codex CLI / OpenAI GPT Session Source #43

Closed

opened 2026-05-24 16:55:17 -05:00 by null · 0 comments

Source plan: /home/kaspa/.claude/plans/with-our-backend-created-precious-owl.md

Feature context: Feature 2: Tool Use Analytics

Scope

Phase 2-C — Backend: Codex CLI / OpenAI GPT Session Source

Summary

Add a Codex/OpenAI session source that mirrors the Claude Code backend contract as closely as possible: list local Codex CLI sessions, read a session conversation, and aggregate tool analytics for Codex/GPT activity.

Problem

The Claude Code UI now exposes sessions, messages, and tool analytics, but the data model is still Claude-specific. Users of Codex CLI or OpenAI/GPT-backed agent workflows need the same operational visibility without being sent to a second, unrelated experience.

Affected area

Backend service: Codex/OpenAI session reader
API: provider/session source endpoints
Shared schemas: normalized session, message, tool-use, and analytics contracts

Affected files

backend/app/services/codex_session_reader.py — new reader for local Codex CLI history and OpenAI/GPT session sources
backend/app/services/agent_session_sources.py — optional provider registry/shared normalization helpers
backend/app/api/codex_sessions.py — new Codex/OpenAI API routes, or add provider-aware routes if the app standardizes under one endpoint family
backend/app/schemas/agent_sessions.py — shared provider-neutral schemas if extracting from claude_code.py
backend/app/schemas/claude_code.py — only touch if extending existing schemas is safer than introducing shared schemas
backend/tests/test_codex_session_reader.py — parser/fixture coverage
backend/tests/test_codex_sessions_api.py — API behavior coverage

Affected routes or endpoints

Preferred provider-specific routes:

GET /api/v1/codex/sessions
GET /api/v1/codex/sessions/{session_id}
GET /api/v1/codex/sessions/{session_id}/messages
GET /api/v1/codex/analytics/tools?days=7|30|90

Optional provider-neutral routes if the implementation chooses to generalize first:

GET /api/v1/agent-sessions/sources
GET /api/v1/agent-sessions/{source}/sessions
GET /api/v1/agent-sessions/{source}/sessions/{session_id}/messages
GET /api/v1/agent-sessions/{source}/analytics/tools?days=7|30|90
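The analytics routes above accept only three window sizes (days=7|30|90). A minimal, framework-agnostic sketch of that check follows. The helper name parse_days_window and the choice to raise ValueError are assumptions for illustration; the issue does not say how invalid values should be rejected.

```python
ALLOWED_WINDOWS = (7, 30, 90)  # the only values the routes advertise


def parse_days_window(raw: str) -> int:
    """Return the window as an int, or raise ValueError for anything else.

    Hypothetical helper: a real route would likely map the ValueError
    to a 4xx response in whatever framework the backend uses.
    """
    try:
        days = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"days must be one of {ALLOWED_WINDOWS}, got {raw!r}") from None
    if days not in ALLOWED_WINDOWS:
        raise ValueError(f"days must be one of {ALLOWED_WINDOWS}, got {days}")
    return days
```

Keeping the allowed set in one constant lets the Codex and provider-neutral route families share the same validation.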

Data-source requirements

Codex CLI:
- Discover local Codex CLI history/session files from ~/.codex when available.
- Add an env override such as CODEX_SESSIONS_PATH so tests and nonstandard installs do not depend on a hard-coded path.
- Detect and gracefully report when Codex CLI is installed but no readable session history exists.
- Parse local session records into the same normalized concepts used by Claude:
  - session id
  - title
  - project/workspace directory
  - model(s)
  - token usage if present
  - message count
  - first/last message timestamps
  - active/completed status
  - tool calls and tool results if present
OpenAI/GPT API:
- Do not attempt to scrape ChatGPT web history.
- Only surface OpenAI/GPT sessions if Pipeline has a local/owned event source: gateway logs, stored API traces, an explicit import file, or a future collector.
- Represent unavailable OpenAI/GPT history as a clear source status, not as an error.
- Preserve provider labels so users can distinguish Codex CLI, OpenAI API, and future GPT sources.
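The Codex CLI discovery rules above (default ~/.codex, CODEX_SESSIONS_PATH override, graceful reporting when no history exists) could be sketched roughly as follows. This is only an illustration: the status values ("available", "empty", "unavailable"), the SourceStatus class, and the assumption that session history is stored as *.jsonl files are all hypothetical, since the issue does not specify the on-disk format.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class SourceStatus:
    source: str
    source_status: str  # assumed values: "available" | "empty" | "unavailable"
    source_path: str
    detail: str


def resolve_codex_root(env: Mapping[str, str], home: Path) -> Path:
    """Prefer the CODEX_SESSIONS_PATH override, else fall back to ~/.codex."""
    override = env.get("CODEX_SESSIONS_PATH")
    if override:
        return Path(override).expanduser()
    return home / ".codex"


def probe_codex_source(env: Mapping[str, str], home: Path) -> SourceStatus:
    """Report availability as a status value instead of raising."""
    root = resolve_codex_root(env, home)
    if not root.is_dir():
        return SourceStatus("codex_cli", "unavailable", str(root),
                            "Codex directory not found")
    # ASSUMPTION: session history lives in *.jsonl files somewhere under root.
    if not any(p.is_file() for p in root.rglob("*.jsonl")):
        return SourceStatus("codex_cli", "empty", str(root),
                            "Codex directory exists but holds no readable session history")
    return SourceStatus("codex_cli", "available", str(root), "session files found")


if __name__ == "__main__":
    print(probe_codex_source(os.environ, Path.home()))
```

Passing env and home as arguments rather than reading them inside the function keeps the probe testable against fixture directories.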

Normalized backend contract

Keep the frontend shape aligned with Claude Code wherever possible:
- session_id
- source (claude_code, codex_cli, openai_api)
- provider_label
- project_dir
- cwd
- title
- models
- tokens
- cost_usd
- billing_source
- message_count
- first_message_at
- last_message_at
- is_active
- entrypoints
- git_branch
Messages should map to the same UI concepts:
- user text blocks
- assistant text blocks
- reasoning/thinking blocks if the source exposes them
- tool calls
- tool results
- token usage per assistant turn when available
Tool analytics should return the same structure as Claude analytics:
- tool_counts
- top_files_read
- top_files_written
- top_commands
- session_count
- date_range_days
Include source metadata in responses when helpful:
- source
- source_status
- source_path
- last_scanned_at
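One way to pin down the session fields listed above is a provider-neutral dataclass. The sketch below uses the field names from this issue; the types, defaults, and the AgentSession class name are assumptions, because the existing Claude schema is not shown here.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

KNOWN_SOURCES = ("claude_code", "codex_cli", "openai_api")


@dataclass
class AgentSession:
    # Field names come from the issue; types and defaults are assumed.
    session_id: str
    source: str
    provider_label: str
    project_dir: Optional[str] = None
    cwd: Optional[str] = None
    title: Optional[str] = None
    models: List[str] = field(default_factory=list)
    tokens: Dict[str, int] = field(default_factory=dict)
    cost_usd: Optional[float] = None
    billing_source: Optional[str] = None
    message_count: int = 0
    first_message_at: Optional[str] = None  # assumed ISO 8601 string
    last_message_at: Optional[str] = None   # assumed ISO 8601 string
    is_active: bool = False
    entrypoints: List[str] = field(default_factory=list)
    git_branch: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown sources early so provider labels stay trustworthy.
        if self.source not in KNOWN_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


if __name__ == "__main__":
    session = AgentSession(session_id="abc", source="codex_cli",
                           provider_label="Codex CLI")
    print(asdict(session))
```

Optional fields default to empty values so a source that lacks, say, token usage or a git branch still serializes to the same shape.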

Security and privacy requirements

Treat Codex/OpenAI session logs as sensitive local data.
Reuse the same organization/member auth requirements as Claude Code endpoints.
Redact secrets from tool inputs, commands, env values, headers, URLs, and API request bodies.
Never return raw credential files such as ~/.codex/auth.json.
Do not read outside discovered/explicitly configured session roots.
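Two of the requirements above, secret redaction and staying inside the configured session root, lend themselves to a short sketch. The regular expressions are deliberately simple and will not catch every secret format; the function names and the [REDACTED] placeholder are assumptions, not an existing API in this codebase.

```python
import re
from pathlib import Path

# Characters allowed in a secret value: stop at whitespace, quotes, and '&'.
_VALUE = r"[^\s'\"&]+"

_SECRET_PATTERNS = [
    # HTTP header form: "Authorization: Bearer <token>"
    re.compile(r"(?i)\b(authorization\s*:\s*bearer\s+)" + _VALUE),
    # KEY=value form in env assignments and URL query strings
    re.compile(r"(?i)\b(\w*(?:api_key|token|secret|password)\s*=\s*)" + _VALUE),
]

BLOCKED_NAMES = {"auth.json"}  # credential file named in the issue


def redact(text: str) -> str:
    """Replace secret values with a placeholder, keeping the key visible."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(lambda m: m.group(1) + "[REDACTED]", text)
    return text


def is_readable_session_path(candidate: Path, root: Path) -> bool:
    """True only for non-credential paths that resolve inside root."""
    if candidate.name in BLOCKED_NAMES:
        return False
    resolved = candidate.resolve()
    root_resolved = root.resolve()
    return resolved == root_resolved or root_resolved in resolved.parents
```

Resolving both paths before comparing them means ".." segments and symlinks cannot be used to step outside the session root.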

Expected behavior

Codex CLI sessions appear through a backend API with the same shape as Claude sessions.
Codex/GPT messages can be rendered by the existing frontend message components with minimal branching.
Codex/GPT tool analytics can be rendered by the same analytics components used for Claude.
If the Codex history format is missing, unsupported, or unavailable, the API returns a graceful empty/source-unavailable response.
OpenAI/GPT API history is only shown when Pipeline has an owned event source; otherwise, it is listed as unavailable with setup guidance.

Steps to reproduce (acceptance criteria)

1. Configure a fixture Codex sessions path via CODEX_SESSIONS_PATH
2. Call GET /api/v1/codex/sessions
3. Response includes normalized sessions with provider/source metadata
4. Call GET /api/v1/codex/sessions/{session_id}/messages
5. Response includes normalized user/assistant turns and tool calls when fixture data contains them
6. Call GET /api/v1/codex/analytics/tools?days=30
7. Response shape matches Claude analytics and reflects fixture tool usage
8. Missing Codex history returns an empty/source-unavailable response, not a server error
9. Tests cover parsing, unavailable source behavior, redaction, pagination, and analytics aggregation
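The analytics criteria above could be exercised against fixture data with an aggregation function along these lines. The output keys match the contract in this issue, but everything else is an assumption: the normalized tool-call fields (tool, kind, target), the {"name", "count"} entry shape, and the expectation that the caller has already filtered sessions to the requested window.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping


def _top(counter: Counter, top_n: int) -> List[Dict[str, Any]]:
    # ASSUMPTION: entries are {"name", "count"} objects; the Claude shape may differ.
    return [{"name": name, "count": count} for name, count in counter.most_common(top_n)]


def aggregate_tool_analytics(sessions: Iterable[Mapping[str, Any]],
                             days: int, top_n: int = 10) -> Dict[str, Any]:
    """Aggregate normalized tool calls from sessions already limited to the window."""
    tool_counts: Counter = Counter()
    files_read: Counter = Counter()
    files_written: Counter = Counter()
    commands: Counter = Counter()
    session_count = 0

    for session in sessions:
        session_count += 1
        for call in session.get("tool_calls", []):
            tool_counts[call["tool"]] += 1
            target = call.get("target")
            if not target:
                continue
            kind = call.get("kind")  # hypothetical field: "read" | "write" | "command"
            if kind == "read":
                files_read[target] += 1
            elif kind == "write":
                files_written[target] += 1
            elif kind == "command":
                commands[target] += 1

    return {
        "tool_counts": dict(tool_counts),
        "top_files_read": _top(files_read, top_n),
        "top_files_written": _top(files_written, top_n),
        "top_commands": _top(commands, top_n),
        "session_count": session_count,
        "date_range_days": days,
    }
```

An empty session list yields zero counts and empty lists, which matches the requirement that missing history produce an empty response rather than a server error.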

New opportunity to add

Add GET /api/v1/agent-sessions/sources to return source cards for the frontend:
- Claude Code: available/unavailable, session count, last activity
- Codex CLI: available/unavailable, session count, last activity
- OpenAI API: available/unavailable, reason/setup hint
This would let the UI show a polished provider switcher with honest availability instead of hiding missing sources.
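A source card for this endpoint might be built as sketched below. The card field names (available, session_count, last_activity_at, reason) and the build_source_card helper are assumptions derived from the bullet list above, not a settled schema. Timestamps are assumed to be ISO 8601 strings in a single timezone, so that string comparison orders them correctly.

```python
from typing import Any, Dict, List, Mapping, Optional

PROVIDER_LABELS = {
    "claude_code": "Claude Code",
    "codex_cli": "Codex CLI",
    "openai_api": "OpenAI API",
}


def build_source_card(source: str,
                      sessions: Optional[List[Mapping[str, Any]]],
                      unavailable_reason: Optional[str] = None) -> Dict[str, Any]:
    """Build one card; sessions=None means the source is unavailable."""
    card: Dict[str, Any] = {
        "source": source,
        "provider_label": PROVIDER_LABELS[source],
    }
    if sessions is None:
        card.update(
            available=False,
            session_count=0,
            last_activity_at=None,
            reason=unavailable_reason or "source not configured",
        )
        return card
    # ASSUMPTION: last_message_at values are comparable ISO 8601 strings.
    stamps = [s["last_message_at"] for s in sessions if s.get("last_message_at")]
    card.update(
        available=True,
        session_count=len(sessions),
        last_activity_at=max(stamps, default=None),
        reason=None,
    )
    return card
```

Returning a card for unavailable sources, with a reason attached, is what lets the UI list them honestly instead of hiding them.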
