Codex Logs

Codex’s web app integrates with Cleanlab’s real-time hallucination detection, evaluations, and interventions to provide a source of truth for all interactions and context between Codex and your AI application.

Within the Logs section of the Codex Web App, you can filter, sort, and view every response evaluated by Codex, including its labels, evaluation scores, and any custom metadata you’ve logged.

Codex Logs Interface

What information is logged?

Every log record contains three general categories of information:

1. Default Metadata of your AI application’s inputs and outputs

With every interaction between Codex and your AI application, Codex logs the context used for Cleanlab’s real-time hallucination detection and evaluations. This includes:

  • Date/Timestamp
  • The user’s query and system prompt
  • The retrieved documents used in the LLM’s context
  • The AI application’s response
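Concretely, one logged interaction might look like the following record. This is a hypothetical sketch for illustration only; the field names are assumptions, not Codex's actual log schema.

```python
from datetime import datetime, timezone

# Hypothetical shape of one logged interaction. Field names are
# illustrative and do not reflect Codex's actual schema.
log_record = {
    "timestamp": datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    "query": "How do I reset my password?",
    "system_prompt": "You are a helpful support assistant.",
    # Retrieved documents that were placed in the LLM's context
    "context": [
        {"id": "doc-123", "snippet": "To reset your password, visit..."},
    ],
    "response": "You can reset your password from the account settings page.",
}
```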

2. Codex’s Real-Time Evaluation Labels and Scores

Each of your AI application’s responses evaluated by Codex includes:

  • Information about each evaluation run (e.g., trustworthiness, helpfulness, instruction-adherence)
  • The scores from those evaluations
  • The primary issue identified when a response is flagged as bad
  • Labels for all evaluations, including standard and custom evals
  • [Coming soon] Good/bad response classification
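To make the evaluation fields concrete, here is a minimal sketch of how per-response scores and labels could be represented, and how a "primary issue" could be derived from them. The structure, score range (assumed 0–1, higher is better), and labels are illustrative assumptions, not Codex's actual data model.

```python
# Hypothetical evaluation results for one response. Evaluation names,
# labels, and the [0, 1] score range are illustrative assumptions.
evaluations = {
    "trustworthiness": {"score": 0.31, "label": "bad"},
    "helpfulness": {"score": 0.88, "label": "good"},
    "instruction_adherence": {"score": 0.95, "label": "good"},
}

# One plausible way to surface a primary issue: the lowest-scoring
# evaluation among those labeled bad.
failed = {name: ev for name, ev in evaluations.items() if ev["label"] == "bad"}
primary_issue = min(failed, key=lambda name: failed[name]["score"]) if failed else None
```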

3. Custom Metadata

You can configure additional metadata to supplement the AI application’s interaction information. This metadata is available in Codex’s Logging interface for SMEs to reference when providing answers or for metadata-based filtering in Analytics. Examples include:

  • User’s geographic location
  • Entry point of the user query
  • Chat history
  • User feedback (e.g., thumbs up/down)
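The examples above could be logged as a simple key-value payload attached to each interaction, as in this hypothetical sketch (the keys shown are examples, not required fields):

```python
# Hypothetical custom-metadata payload logged alongside an interaction.
# Keys mirror the examples above; none of them are required fields.
custom_metadata = {
    "user_location": "DE",                      # user's geographic location
    "entry_point": "help_center_widget",        # where the query originated
    "chat_history": [                           # prior turns in the conversation
        {"role": "user", "content": "My login code never arrived."},
    ],
    "user_feedback": "thumbs_down",             # e.g., thumbs up/down
}
```

Fields logged this way would then be visible to SMEs in the Logs metadata view and usable for metadata-based filtering in Analytics.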

4. [Coming soon] Remediation Information

When Codex uses an SME-approved answer to mitigate a “detected-bad” response, the Logs capture this interaction. You’ll see:

  • Whether the interaction had a Codex “hit” at query time
  • The Codex answer provided to the AI application instead of the original bad response
  • Whether the entry is actively “covered” by a Codex Answer (i.e., if Codex would intervene if this query were repeated)
  • All associated metadata (default, evaluation, and custom)

How should my SMEs use this Logs interface?

The Logs interface serves as SMEs’ primary workspace for improving your AI application. Here, you can systematically address problematic responses by providing high-quality answers that Codex will use to improve future interactions.

Entry Point for Providing Answers

The Logs view is your starting point for improving the AI application. By default, logs are:

  • Grouped into similar queries (described more in Grouping)
  • Sorted by Highest Impact, helping you focus on the most critical issues first (described more in Ranking)

Filtering and Sorting

Leverage powerful filtering and sorting capabilities to efficiently address issues:

  • Filter by Evaluation Failure: Focus on specific problem types (e.g., hallucinations, search failures, instruction-adherence)
  • Sort by Evaluation Score: Prioritize the most critical cases

Viewing Logged Metadata

Access metadata in both the Editor experience and through the Metadata button for any interaction. Use the metadata view to:

  • Review source documents used in the response
  • Examine the AI application’s original response
  • Check user context and feedback
  • Reference any custom metadata relevant to the user’s question