Codex Logs

Codex Logs provides comprehensive visibility into your AI application’s performance by recording every potential failure detected by Cleanlab’s real-time evaluation system.

The Logs section of your Codex Project lists every detection made by Cleanlab AI, prioritized by business impact. Each log entry includes detailed evaluation scores, failure classifications, and metadata to help your team efficiently identify and address the most critical issues first.

Codex Logs Interface

What information is logged?

Every log record contains the following categories of information:

1. Core Evaluation Fields

Core fields needed for Codex’s evaluation system:

  • Query and Response (for all evaluations)
  • Retrieved Context (for context-based evaluations)
  • System Prompt and Instructions
  • Date/Timestamp of the interaction

2. Computed Evaluation Results

Automatically computed by Codex for each response:

  • Detection Status: Whether the response was detected as “Good” or “Bad”
  • Primary Issue Type (e.g., Search Failure, Hallucination, Unhelpful)
  • Individual Evaluation Scores, including:
    • Trustworthiness
    • Helpfulness
    • Context Sufficiency
    • Groundedness
    • Other Custom evaluation scores

3. Remediation Status

  • Remediation Usage: Whether a remediation was triggered to protect this response
  • Addressed Status: Indicates whether each detected failure (log) has been addressed by an existing remediation

4. Custom Metadata

Optional metadata provided in your API calls to enrich logging:

  • User’s geographic location
  • Entry point of the user query
  • Chat history
  • User feedback
  • Any additional JSON fields for your specific use case
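As a rough illustration of how the four categories above fit together, the sketch below models one log entry as a Python dataclass. This is not the Codex schema or API: the class name `LogRecord`, every field name, and all sample values are hypothetical and chosen only to mirror the lists above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class LogRecord:
    """Hypothetical shape of one log entry; names are illustrative, not Codex's schema."""

    # 1. Core evaluation fields
    query: str
    response: str
    timestamp: datetime
    context: Optional[str] = None        # only used by context-based evaluations
    system_prompt: Optional[str] = None

    # 2. Computed evaluation results
    is_bad: bool = False                 # detection status: "Bad" when True
    primary_issue: Optional[str] = None  # e.g. "Hallucination"
    scores: dict[str, float] = field(default_factory=dict)

    # 3. Remediation status
    remediation_used: bool = False
    addressed: bool = False

    # 4. Custom metadata (any JSON-serializable fields)
    metadata: dict[str, Any] = field(default_factory=dict)


# Made-up example values, for illustration only.
record = LogRecord(
    query="How do I reset my password?",
    response="You can reset it from the billing page.",
    timestamp=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc),
    context="Passwords are reset from Settings > Security.",
    is_bad=True,
    primary_issue="Hallucination",
    scores={"trustworthiness": 0.21, "groundedness": 0.18},
    metadata={"location": "DE", "entry_point": "mobile_app"},
)

# Serialize to JSON; default=str converts the datetime to a string.
record_json = json.dumps(asdict(record), default=str, indent=2)
```

Keeping custom metadata in its own dictionary, separate from the computed fields, reflects the distinction drawn above between what Codex computes and what your application supplies.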

Logs View Capabilities

The Logs interface includes powerful features for managing AI application safety:

Intelligent Grouping:

  • Similar queries are automatically grouped together, making it easier to identify and address recurring failure patterns
  • Groups help subject-matter experts (SMEs) efficiently handle multiple instances of the same underlying issue
  • Impact scores are aggregated across groups to highlight the most critical failure patterns

Prioritization and Quick Access:

  • Logs are sorted by highest impact to surface critical issues first
  • A Quick Remediate option is available directly from any unaddressed log
  • One-click access to full context and metadata for each failure
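The grouping and prioritization behavior described above can be pictured with a small in-memory sketch. It assumes that each log already carries a group label and a numeric impact score, and that a group's impact is the sum of its members' scores; the source does not say how Codex actually forms groups or aggregates impact, so the `group` and `impact` keys and the `prioritize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical logs; the "group" and "impact" keys are illustrative assumptions.
logs = [
    {"group": "password reset", "impact": 0.5},
    {"group": "refund policy", "impact": 1.0},
    {"group": "password reset", "impact": 0.75},
    {"group": "shipping time", "impact": 0.25},
]


def prioritize(logs):
    """Sum impact per group (an assumed aggregation) and sort highest first."""
    totals = defaultdict(float)
    for log in logs:
        totals[log["group"]] += log["impact"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = prioritize(logs)
# "password reset" ranks first: its two logs together outweigh the single
# higher-impact "refund policy" log.
```

The point of the sketch is that a recurring moderate failure can outrank a one-off severe one once impact is aggregated per group.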

Advanced Filtering:

  • Filter by failure type (hallucinations, search failures, etc.)
  • Sort by evaluation scores and impact metrics
  • Custom metadata filters for your specific use case
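To make the three filtering options concrete, here is a minimal sketch that applies them to a list of dictionaries. It only illustrates the logic; it is not how the Logs interface is implemented, and the `filter_and_sort` helper and the `issue`, `scores`, and `metadata` keys are hypothetical.

```python
# Hypothetical logs with made-up values, for illustration only.
logs = [
    {"issue": "Hallucination", "scores": {"trustworthiness": 0.2}, "metadata": {"entry_point": "web"}},
    {"issue": "Search Failure", "scores": {"trustworthiness": 0.5}, "metadata": {"entry_point": "web"}},
    {"issue": "Hallucination", "scores": {"trustworthiness": 0.1}, "metadata": {"entry_point": "mobile"}},
    {"issue": "Hallucination", "scores": {"trustworthiness": 0.4}, "metadata": {"entry_point": "web"}},
]


def filter_and_sort(logs, issue=None, metadata=None, sort_by=None):
    """Keep logs matching the failure type and metadata, then sort by a score (ascending)."""
    metadata = metadata or {}
    selected = [
        log
        for log in logs
        if (issue is None or log["issue"] == issue)
        and all(log["metadata"].get(key) == value for key, value in metadata.items())
    ]
    if sort_by is not None:
        # Lowest score first, so the least trustworthy responses surface at the top.
        selected.sort(key=lambda log: log["scores"][sort_by])
    return selected


web_hallucinations = filter_and_sort(
    logs,
    issue="Hallucination",
    metadata={"entry_point": "web"},
    sort_by="trustworthiness",
)
```

Here `web_hallucinations` contains the two web-originated hallucination logs, ordered from lowest to highest trustworthiness score.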

Rich Metadata Access:

  • View source documents and original responses
  • Access user context and feedback
  • Track custom metadata for each interaction

Quick Remediate:

  • Create remediations directly from any unaddressed log entry
  • Fix issues while having the complete failure context readily available
  • Automatically protect against similar future failures

Quick Remediate Interface

SME Workflow

The Logs interface serves as SMEs’ primary workspace for improving your AI application. For a complete guide on using Codex to systematically address and prevent AI failures, see our Using Codex as SME tutorial.