Codex Logs

Codex’s web app integrates with Cleanlab’s real-time hallucination detection, evaluations, and interventions to provide a source of truth for all interactions and context between Codex and your AI application.

Within the Logs section of the Codex Web App, you can filter, sort, and view every response evaluated by Codex, including its labels, evaluation scores, and any custom metadata you’ve logged.

Codex Logs Interface

What information is logged?

Every log record contains three general categories of information:

1. Default Metadata of your AI application’s inputs and outputs

With every interaction between Codex and your AI application, Codex logs the context used for Cleanlab’s real-time hallucination detection and evaluations. This includes:

  • Date/Timestamp
  • The user’s query and system prompt
  • The retrieved documents used in the LLM’s context
  • The AI application’s response
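Concretely, one logged interaction might look like the following record. This is a hypothetical sketch for illustration only; the field names are assumptions, not Codex's actual log schema.

```python
from datetime import datetime, timezone

# Hypothetical shape of one logged interaction. Field names are
# illustrative and do not reflect Codex's actual schema.
log_record = {
    "timestamp": datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    "query": "How do I reset my password?",
    "system_prompt": "You are a helpful support assistant.",
    # Retrieved documents that were placed in the LLM's context
    "context": [
        {"id": "doc-123", "snippet": "To reset your password, visit..."},
    ],
    "response": "You can reset your password from the account settings page.",
}
```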

2. Codex’s Real-Time Evaluation Labels and Scores

Each of your AI application’s responses evaluated by Codex includes:

  • Information about each evaluation run (e.g., trustworthiness, helpfulness, instruction-adherence)
  • The scores from those evaluations
  • The primary issue identified when a response is flagged as bad
  • Labels for all evaluations, including standard and custom evals
  • [Coming soon] Good/bad response classification
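To make the evaluation fields concrete, here is a minimal sketch of how per-response scores and labels could be represented, and how a "primary issue" could be derived from them. The structure, score range (assumed 0–1, higher is better), and labels are illustrative assumptions, not Codex's actual data model.

```python
# Hypothetical evaluation results for one response. Evaluation names,
# labels, and the [0, 1] score range are illustrative assumptions.
evaluations = {
    "trustworthiness": {"score": 0.31, "label": "bad"},
    "helpfulness": {"score": 0.88, "label": "good"},
    "instruction_adherence": {"score": 0.95, "label": "good"},
}

# One plausible way to surface a primary issue: the lowest-scoring
# evaluation among those labeled bad.
failed = {name: ev for name, ev in evaluations.items() if ev["label"] == "bad"}
primary_issue = min(failed, key=lambda name: failed[name]["score"]) if failed else None
```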

3. Custom Metadata

You can configure additional metadata to supplement the AI application’s interaction information. This metadata is available in Codex’s Logging interface for SMEs to reference when providing answers or for metadata-based filtering in Analytics. Examples include:

  • User’s geographic location
  • Entry point of the user query
  • Chat history
  • User feedback (e.g., thumbs up/down)
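The examples above could be logged as a simple key-value payload attached to each interaction, as in this hypothetical sketch (the keys shown are examples, not required fields):

```python
# Hypothetical custom-metadata payload logged alongside an interaction.
# Keys mirror the examples above; none of them are required fields.
custom_metadata = {
    "user_location": "DE",                      # user's geographic location
    "entry_point": "help_center_widget",        # where the query originated
    "chat_history": [                           # prior turns in the conversation
        {"role": "user", "content": "My login code never arrived."},
    ],
    "user_feedback": "thumbs_down",             # e.g., thumbs up/down
}
```

Fields logged this way would then be visible to SMEs in the Logs metadata view and usable for metadata-based filtering in Analytics.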

4. [Coming soon] Remediation Information

When Codex uses an SME-approved answer to mitigate a “detected-bad” response, the Logs capture this interaction. You’ll see:

  • Whether the interaction had a Codex “hit” at query time
  • The Codex answer provided to the AI application instead of the original bad response
  • Whether the entry is actively “covered” by a Codex Answer (i.e., if Codex would intervene if this query were repeated)
  • All associated metadata (default, evaluation, and custom)

How should my SMEs use this Logs interface?

The Logs interface serves as SMEs’ primary workspace for improving your AI application. Here, you can systematically address problematic responses by providing high-quality answers that Codex will use to improve future interactions.

Entry Point for Providing Answers

The Logs view is your starting point for improving the AI application. By default, logs are:

  • Grouped into similar queries (described more in Grouping)
  • Sorted by Highest Impact, helping you focus on the most critical issues first (described more in Ranking)

Filtering and Sorting

Leverage powerful filtering and sorting capabilities to efficiently address issues:

  • Filter by Evaluation Failure: Focus on specific problem types (e.g., hallucinations, search failures, instruction-adherence)
  • Sort by Evaluation Score: Prioritize the most critical cases

Viewing Logged Metadata

Access metadata in both the Editor experience and through the Metadata button for any interaction. Use the metadata view to:

  • Review source documents used in the response
  • Examine the AI application’s original response
  • Check user context and feedback
  • Reference any custom metadata relevant to the user’s question