Guardrails & Evaluations

Guardrails and Evaluations are part of the Cleanlab safety layer that runs on every LLM input and output in your AI application, providing real-time protection and monitoring.

Overview

Guardrails are safety mechanisms that block AI outputs when they fail to meet specified criteria. When a guardrail is triggered, the AI response is prevented from reaching the user, and an alternative response (such as an SME-provided answer) is served instead.

Evaluations are monitoring tools used for analytics and visibility. They score AI responses based on specific criteria but do not block outputs. Evaluations help you understand AI performance patterns and identify areas for improvement.

Key Differences

| Aspect | Guardrails | Evaluations |
| --- | --- | --- |
| Purpose | Block unsafe AI outputs | Monitor and analyze AI performance |
| Action | Prevents response from reaching user | Scores response without blocking |
| Use Case | Safety and compliance | Analytics and insights |
| Impact | Direct user experience | Observability and reporting |
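
To make the distinction concrete, here is a minimal Python sketch of the response flow. The function and field names are illustrative assumptions, not the Cleanlab API: a triggered guardrail replaces the AI response with a fallback, while evaluations only record scores.

```python
# Illustrative sketch only -- not the Cleanlab API.
# Shows how a guardrail differs from an evaluation at response time.

def log_scores(scores: dict) -> None:
    """Evaluations: record scores for analytics, never block."""
    for name, value in scores.items():
        print(f"[evaluation] {name}: {value:.2f}")

def serve_response(ai_response: str, guardrail_score: float, eval_scores: dict,
                   threshold: float = 0.7,
                   fallback: str = "Let me connect you with a human expert.") -> str:
    log_scores(eval_scores)                 # evaluations never change the response
    if guardrail_score < threshold:         # "Below" directionality
        return fallback                     # guardrail blocks; e.g. an SME-provided answer
    return ai_response

print(serve_response(
    ai_response="Here is how to reset your password...",
    guardrail_score=0.42,                   # below threshold -> blocked
    eval_scores={"helpfulness": 0.81, "groundedness": 0.90},
))
```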

Configuring Guardrails

  1. Navigate to the Guardrails page in your Cleanlab AI Platform project to view and manage all configured guardrails

  2. Click “Create Guardrail” on the Guardrails page

  3. Fill in the required configuration:

    Basic Configuration:

    • Name: A descriptive name for your guardrail (e.g., “Brand Safety”, “PII Protection”)
    • Criteria: The full prompt that defines what the guardrail should detect
    • Threshold: The score cutoff that, together with Directionality, determines when the guardrail triggers
    • Directionality (see the sketch after this list):
      • Below (default): Cases scoring lower than threshold are unsafe
      • Above: Cases scoring higher than threshold are unsafe
    • Should Escalate: Toggle whether guardrail failures require SME attention
      • When enabled, any user query or request that fails this guardrail is logged in Cleanlab Issues as an Unaddressed case for SME review

    Identifiers:

    • Query ID: Field name for user input (e.g., “User Query”)
    • Context ID: Field name for retrieved context (e.g., “Context”, “None”)
    • Response ID: Field name for AI response (e.g., “AI Response”)
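
The Threshold, Directionality, and Should Escalate settings combine as sketched below. This is a minimal illustration of the semantics described above using hypothetical names; it is not how the platform is implemented.

```python
# Illustrative sketch of threshold + directionality + escalation semantics.
# Names here are assumptions for illustration, not Cleanlab internals.
from dataclasses import dataclass

@dataclass
class GuardrailConfig:
    name: str
    threshold: float
    directionality: str = "below"   # "below" (default) or "above"
    should_escalate: bool = False

def is_triggered(score: float, cfg: GuardrailConfig) -> bool:
    """Return True when the score falls on the unsafe side of the threshold."""
    if cfg.directionality == "below":
        return score < cfg.threshold    # lower scores are unsafe
    return score > cfg.threshold        # higher scores are unsafe

def handle(score: float, cfg: GuardrailConfig, query: str) -> bool:
    """Return True if the response should be blocked; escalate if configured."""
    if not is_triggered(score, cfg):
        return False
    if cfg.should_escalate:
        # Hypothetical stand-in for logging an Unaddressed case for SME review.
        print(f"Escalating {query!r} to Cleanlab Issues ({cfg.name}, score={score:.2f})")
    return True

cfg = GuardrailConfig(name="PII Protection", threshold=0.5, should_escalate=True)
handle(score=0.2, cfg=cfg, query="What is my account number?")
```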

Guardrail Types

Evaluation-Based Guardrails

These use AI-powered scoring to determine if responses meet safety criteria.

By default, your project comes configured with the trustworthiness guardrail, which uses Cleanlab’s proprietary uncertainty estimation and custom LLM evaluations to detect unreliable AI responses.

Example Configuration:

Name: Brand Safety
Criteria: Determine whether the AI Response represents ACME Inc. well and meets brand safety criteria...
Threshold: 0.7
Directionality: Below
Query ID: User Query
Context ID: None
Response ID: AI Response
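
As a rough illustration of how this configuration maps onto a runtime check, the sketch below uses a made-up case and a made-up judge score; it is not the platform’s API.

```python
# Illustrative only: how the example configuration above maps onto a single check.
guardrail = {
    "name": "Brand Safety",
    "threshold": 0.7,
    "directionality": "below",          # scores below 0.7 are unsafe
    "query_id": "User Query",
    "context_id": None,
    "response_id": "AI Response",
}

# Hypothetical case and judge score (not real platform output).
case = {"User Query": "How does ACME compare to competitors?",
        "AI Response": "Honestly, our competitor's product is better."}
judge_score = 0.55

blocked = judge_score < guardrail["threshold"]  # "below" directionality
print(blocked)  # True -> the response is withheld and a fallback is served
```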

Deterministic Guardrails

These use rule-based logic for specific safety checks (see Deterministic Guardrails documentation for details).
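
For contrast with evaluation-based guardrails, here is a generic sketch of a rule-based check. The regex and its behavior are illustrative assumptions, not one of Cleanlab’s built-in deterministic guardrails.

```python
import re

# Generic illustration of a rule-based check -- not a built-in Cleanlab guardrail.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US Social Security numbers

def pii_guardrail_triggered(ai_response: str) -> bool:
    """Trigger deterministically whenever the response contains an SSN-like string."""
    return bool(SSN_PATTERN.search(ai_response))

print(pii_guardrail_triggered("Your SSN on file is 123-45-6789."))   # True
print(pii_guardrail_triggered("Your account is in good standing."))  # False
```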

Configuring Evaluations

  1. Navigate to the Evaluations page to configure monitoring and analytics for your AI responses

  2. Click “Create Evaluation” on the Evaluations page

  3. Configure the evaluation parameters:

    Configuration Fields:

    • Name: Descriptive name (e.g., “Helpfulness”, “Accuracy”)
    • Criteria: The evaluation prompt that defines scoring logic
    • Threshold: Score threshold for categorization
    • Directionality: Below/Above threshold for “good” scores
    • Should Escalate: Whether evaluation failures require SME review
      • When enabled, any user query or request that fails this evaluation is logged in Cleanlab Issues as an Unaddressed case for SME review
    • Identifiers: Query, Context, and Response field mappings
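
Since evaluations never block a response, their output is a stream of scores you can aggregate for analytics. A minimal sketch, assuming hypothetical per-response scores:

```python
# Illustrative only: aggregating evaluation scores for analytics (no blocking).
from statistics import mean

# Hypothetical "Helpfulness" scores collected across responses over time.
helpfulness_scores = [0.91, 0.34, 0.78, 0.66, 0.88]
threshold = 0.6          # directionality "below": lower scores are poor

failing = [s for s in helpfulness_scores if s < threshold]
print(f"avg helpfulness: {mean(helpfulness_scores):.2f}")
print(f"{len(failing)}/{len(helpfulness_scores)} responses fall below threshold")
```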

Default Evaluations

To help you get started, we provide several pre-configured Evaluations (intended primarily for root-causing AI issues in RAG applications):

  • Context Sufficiency: Measures whether the retrieved context is sufficient to answer the user’s query
  • Groundedness: Evaluates whether the AI response is grounded in the provided context rather than hallucinating
  • Query Ease: Assesses the complexity and clarity of user queries
  • Helpfulness: Measures how helpful and relevant the AI response is to the user’s query
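
These default evaluations are most useful in combination when root-causing a bad response: a low Context Sufficiency score points at retrieval, while low Groundedness despite sufficient context points at hallucination. The sketch below shows one possible triage order; the thresholds and wording are illustrative assumptions, not Cleanlab logic.

```python
# Illustrative triage of RAG issues from the default evaluation scores.
# Thresholds and labels are assumptions for illustration only.
def diagnose(scores: dict, threshold: float = 0.5) -> str:
    if scores["query_ease"] < threshold:
        return "Hard or ambiguous user query -- consider clarifying the query."
    if scores["context_sufficiency"] < threshold:
        return "Retrieved context is insufficient -- investigate retrieval/indexing."
    if scores["groundedness"] < threshold:
        return "Response not grounded in context -- likely hallucination."
    if scores["helpfulness"] < threshold:
        return "Response is grounded but unhelpful -- review prompting/response style."
    return "No obvious issue detected by the default evaluations."

print(diagnose({
    "query_ease": 0.9,
    "context_sufficiency": 0.3,   # low -> retrieval problem
    "groundedness": 0.8,
    "helpfulness": 0.7,
}))
```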