Codex Logs
Codex Logs provides comprehensive visibility into your AI application’s performance by recording every potential failure detected by Cleanlab’s real-time evaluation system.
The Logs section of your Codex Project lists every detection made by Cleanlab AI, prioritized by business impact. Each log entry includes detailed evaluation scores, failure classifications, and metadata so your team can identify and address the most critical issues first.
What information is logged?
Every log record contains the following categories of information:
1. Core Evaluation Fields
Core fields needed for Codex’s evaluation system:
- Query and Response (for all evaluations)
- Retrieved Context (for context-based evaluations)
- System Prompt and Instructions
- Date/Timestamp of the interaction
2. Computed Evaluation Results
Automatically computed by Codex for each response:
- Detection Status: Whether the response was detected as “Good” or “Bad”
- Primary Issue Type (e.g., Search Failure, Hallucination, Unhelpful)
- Individual Evaluation Scores, including:
  - Trustworthiness
  - Helpfulness
  - Context Sufficiency
  - Groundedness
  - Other custom evaluation scores
3. Remediation Status
- Remediation Usage: Whether a remediation was triggered to protect this response
- Addressed Status: Whether the detected failure (log) has been addressed by an existing remediation
4. Custom Metadata
Optional metadata provided in your API calls to enrich logging:
- User’s geographic location
- Entry point of the user query
- Chat history
- User feedback
- Any additional JSON fields for your specific use case
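The four categories above can be pictured as a single record. The following is a minimal sketch only: the class name `LogRecord`, every field name, and the sample values are hypothetical and chosen for illustration; they are not Codex's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    """Hypothetical shape of one log entry (illustrative, not Codex's schema)."""

    # 1. Core evaluation fields
    query: str
    response: str
    timestamp: datetime
    context: Optional[str] = None        # only present for context-based evaluations
    system_prompt: Optional[str] = None

    # 2. Computed evaluation results
    is_bad: bool = False                 # detection status: "Good" (False) or "Bad" (True)
    primary_issue: Optional[str] = None  # e.g. "Hallucination", "Search Failure"
    eval_scores: Dict[str, float] = field(default_factory=dict)

    # 3. Remediation status
    remediation_used: bool = False
    addressed: bool = False

    # 4. Custom metadata (any JSON-serializable fields)
    metadata: Dict[str, Any] = field(default_factory=dict)


# Invented sample values, for illustration only.
record = LogRecord(
    query="What is your refund window?",
    response="Refunds are available for 90 days.",
    timestamp=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
    context="Our policy allows refunds within 30 days of purchase.",
    is_bad=True,
    primary_issue="Hallucination",
    eval_scores={"trustworthiness": 0.21, "groundedness": 0.18},
    metadata={"entry_point": "help-widget", "user_feedback": "thumbs_down"},
)
```

Keeping custom metadata in a free-form dictionary mirrors the "any additional JSON fields" item: the fixed fields stay typed, while project-specific fields can vary.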
Logs View Capabilities
The Logs interface includes powerful features for managing AI application safety:
Intelligent Grouping:
- Similar queries are automatically grouped together, making it easier to identify and address recurring failure patterns
- Groups help subject matter experts (SMEs) efficiently handle multiple instances of the same underlying issue
- Impact scores are aggregated across groups to highlight the most critical failure patterns
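To make the idea of grouping and aggregated impact concrete, here is a rough sketch. It assumes that "similar" can be approximated by normalizing the query text and that aggregation means summing per-log impact scores; Codex's actual grouping method and impact formula are not described here, so both are stand-ins. The function names and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical (query, impact score) pairs; values are invented for illustration.
logs = [
    ("How do I reset my password?", 0.5),
    ("how do i reset my password", 0.25),
    ("What is the refund policy?", 0.625),
]


def group_key(query: str) -> str:
    """Crude stand-in for query similarity: lowercase and drop punctuation."""
    kept = (ch for ch in query.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def aggregate_impact(entries):
    """Sum impact per group and return groups ordered by highest total impact."""
    totals = defaultdict(float)
    for query, impact in entries:
        totals[group_key(query)] += impact
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = aggregate_impact(logs)
for key, total in ranked:
    print(f"{total:.3f}  {key}")
```

In this toy data, the two password queries fall into one group whose combined impact outranks the single refund query, even though the refund query has the highest individual score. That is the effect group-level aggregation is meant to surface.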
Prioritization and Quick Access:
- Logs are sorted by highest impact to surface critical issues first
- Quick Remediate option available directly from any unaddressed log
- One-click access to full context and metadata for each failure
Advanced Filtering:
- Filter by failure type (hallucinations, search failures, etc.)
- Sort by evaluation scores and impact metrics
- Custom metadata filters for your specific use case
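The filters above are applied in the Logs interface itself. For readers who think in code, the equivalent logic could be sketched as follows, assuming logs are available as plain dictionaries; the keys `issue_type`, `scores`, and `metadata`, the helper `filter_logs`, and the sample rows are all hypothetical and do not represent a Codex API.

```python
# Hypothetical log rows; keys and values are invented for illustration.
logs = [
    {"id": 1, "issue_type": "Hallucination",
     "scores": {"trustworthiness": 0.4}, "metadata": {"region": "EU"}},
    {"id": 2, "issue_type": "Search Failure",
     "scores": {"trustworthiness": 0.7}, "metadata": {"region": "EU"}},
    {"id": 3, "issue_type": "Hallucination",
     "scores": {"trustworthiness": 0.1}, "metadata": {"region": "EU"}},
    {"id": 4, "issue_type": "Hallucination",
     "scores": {"trustworthiness": 0.2}, "metadata": {"region": "US"}},
]


def filter_logs(rows, issue_type=None, metadata=None):
    """Keep rows matching the failure type and every given metadata key/value."""
    wanted = metadata or {}
    return [
        row for row in rows
        if (issue_type is None or row["issue_type"] == issue_type)
        and all(row["metadata"].get(k) == v for k, v in wanted.items())
    ]


# Filter by failure type and a custom metadata field, then sort so the
# lowest trustworthiness score (the least trustworthy response) comes first.
eu_hallucinations = filter_logs(logs, issue_type="Hallucination",
                                metadata={"region": "EU"})
by_trust = sorted(eu_hallucinations,
                  key=lambda row: row["scores"]["trustworthiness"])
```

Combining a failure-type filter with a metadata filter narrows the list to one slice of traffic, and sorting by a single evaluation score then orders that slice for review.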
Rich Metadata Access:
- View source documents and original responses
- Access user context and feedback
- Track custom metadata for each interaction
Quick Remediate:
- Create remediations directly from any unaddressed log entry
- Fix issues while having the complete failure context readily available
- Automatically protect against similar future failures
SME Workflow
The Logs interface is the primary workspace where SMEs improve your AI application. For a complete guide on using Codex to systematically address and prevent AI failures, see our Using Codex as SME tutorial.