Guardrails & Evaluations
Guardrails and Evaluations are part of the Cleanlab safety layer, which runs on every LLM input and output in your AI application to provide real-time protection and monitoring.
Overview
Guardrails are safety mechanisms that block AI outputs when they fail to meet specified criteria. When a guardrail is triggered, the AI response is prevented from reaching the user, and an alternative response (such as an SME-provided answer) is served instead.
Evaluations are monitoring tools used for analytics and visibility. They score AI responses based on specific criteria but do not block outputs. Evaluations help you understand AI performance patterns and identify areas for improvement.
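To make the distinction concrete, here is a minimal sketch of the two behaviors at response time. The function names and scoring logic are illustrative stand-ins, not the Cleanlab API.

```python
# Illustrative sketch only; the scoring functions below are stand-ins, not the Cleanlab API.

def guardrail_score(query: str, context: str, response: str) -> float:
    """Stand-in for an AI-powered guardrail score in [0, 1]."""
    return 0.4 if "guaranteed returns" in response.lower() else 0.9

def evaluation_score(query: str, context: str, response: str) -> float:
    """Stand-in for an evaluation score that is only recorded for analytics."""
    return 0.8

def handle_ai_response(query: str, context: str, response: str, fallback: str) -> str:
    # Guardrail: block the response when its score falls below the threshold.
    if guardrail_score(query, context, response) < 0.7:  # "Below" directionality
        return fallback  # e.g., an SME-provided answer
    # Evaluation: score the response for monitoring, but never block it.
    print("helpfulness:", evaluation_score(query, context, response))
    return response
```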
Key Differences
| Aspect | Guardrails | Evaluations |
|---|---|---|
| Purpose | Block unsafe AI outputs | Monitor and analyze AI performance |
| Action | Prevents response from reaching user | Scores response without blocking |
| Use Case | Safety and compliance | Analytics and insights |
| Impact | Direct user experience | Observability and reporting |
Configuring Guardrails
1. Navigate to the Guardrails page in your Cleanlab AI Platform project to view and manage all configured guardrails.
2. Click “Create Guardrail” on the Guardrails page.
3. Fill in the required configuration:
Basic Configuration:
- Name: A descriptive name for your guardrail (e.g., “Brand Safety”, “PII Protection”)
- Criteria: Text describing what factors the guardrail should consider (and what is considered good vs bad)
- Threshold: The score below/above which the guardrail will trigger
- Directionality:
  - Below (default): Cases scoring lower than the threshold are unsafe
  - Above: Cases scoring higher than the threshold are unsafe
- Should Escalate: Toggle whether guardrail failures warrant subsequent SME review
  - If enabled, then when this guardrail fails, the corresponding user query will be logged/prioritized in Cleanlab Issues as an Unaddressed case for subsequent SME review
Identifiers:
- Query ID: Field name for user input (e.g., “User Query”)
- Context ID: Field name for retrieved context (e.g., “Context”, “None”)
- Response ID: Field name for AI response (e.g., “AI Response”)
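Taken together, these fields determine when a guardrail fires. The sketch below mirrors the form fields in a hypothetical GuardrailConfig class (not a Cleanlab SDK class) to show how Threshold and Directionality interact:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mirror of the configuration form above; not a Cleanlab SDK class.
@dataclass
class GuardrailConfig:
    name: str
    criteria: str
    threshold: float
    directionality: str = "below"   # "below" (default) or "above"
    should_escalate: bool = False   # if True, failures are logged in Cleanlab Issues for SME review
    query_id: str = "User Query"
    context_id: Optional[str] = None
    response_id: str = "AI Response"

    def triggers(self, score: float) -> bool:
        """The guardrail fires when the score lands on the unsafe side of the threshold."""
        if self.directionality == "below":
            return score < self.threshold  # lower scores are unsafe
        return score > self.threshold      # higher scores are unsafe

config = GuardrailConfig(name="Brand Safety", criteria="Represents ACME Inc. well ...", threshold=0.7)
print(config.triggers(0.55), config.triggers(0.85))  # True False -> block, allow
```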
Guardrail Types
Evaluation-Based Guardrails
These use AI-powered scoring to determine if responses meet certain criteria.
By default, your project comes configured with a trustworthiness guardrail that uses Cleanlab’s proprietary LLM uncertainty estimation to detect potentially incorrect AI responses.
Example Configuration:
- Name: Brand Safety
- Criteria: Determine whether the AI Response represents ACME Inc. well and meets brand safety criteria...
- Threshold: 0.7
- Directionality: Below
- Query ID: User Query
- Context ID: None
- Response ID: AI Response
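The default trustworthiness guardrail described above is backed by Cleanlab’s Trustworthy Language Model (TLM). If you want to experiment with the same score outside the platform, the cleanlab-tlm Python library exposes it roughly as follows; treat this as a sketch and verify the method names, authentication setup, and result format against the library’s documentation.

```python
# Sketch based on the cleanlab-tlm library (pip install cleanlab-tlm).
# Verify the exact interface and API-key configuration against the library docs.
from cleanlab_tlm import TLM

tlm = TLM()  # requires a Cleanlab API key, configured per the library docs

result = tlm.get_trustworthiness_score(
    prompt="What is ACME Inc.'s refund policy?",
    response="ACME Inc. offers full refunds within 30 days of purchase.",
)
print(result["trustworthiness_score"])  # low scores flag potentially wrong responses
```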
Deterministic Guardrails
These use rule-based logic for specific safety checks (see Deterministic Guardrails documentation for details).
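Purely to illustrate the rule-based idea (this is not Cleanlab’s implementation), a deterministic check might look like:

```python
import re

# Illustrative rule-based guardrail, not Cleanlab's implementation:
# block any response that appears to contain a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def deterministic_pii_guardrail(response: str) -> bool:
    """Return True when the response should be blocked."""
    return bool(SSN_PATTERN.search(response))

print(deterministic_pii_guardrail("Your SSN on file is 123-45-6789."))  # True
```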
Configuring Evaluations
1. Navigate to the Evaluations page to configure monitoring and analytics for your AI responses.
2. Click “Create Evaluation” on the Evaluations page.
3. Configure the evaluation parameters:
Configuration Fields:
- Name: Descriptive name (e.g., “Helpfulness”, “Accuracy”)
- Criteria: Text describing what factors the evaluation should consider (and what is considered good vs bad)
- Threshold: Score threshold for categorization
- Directionality: Below/Above threshold for “good” scores
- Should Escalate: Whether evaluation failures warrant subsequent SME review
  - If enabled, then when this evaluation fails, the corresponding user query will be logged/prioritized in Cleanlab Issues as an Unaddressed case for SME review
- Identifiers: Query, Context, and Response field mappings (only specify those fields this Evaluation should depend on)
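As a rough programmatic analogue, Cleanlab’s TLM library (cleanlab-tlm) represents an evaluation with the same criteria and identifier fields. The sketch below reflects our reading of its Eval constructor; check the library’s documentation for the current signature.

```python
# Sketch using cleanlab-tlm's Eval; argument names reflect our reading of the library docs.
from cleanlab_tlm import TrustworthyRAG, Eval

helpfulness = Eval(
    name="Helpfulness",
    criteria="Assess whether the AI Response is helpful and relevant to the User Query.",
    query_identifier="User Query",      # only specify the fields this evaluation depends on
    response_identifier="AI Response",  # context_identifier omitted: context is not considered
)

rag_evaluator = TrustworthyRAG(evals=[helpfulness])
```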
Default Evaluations
To help you get started, we provide several pre-configured Evaluations (intended primarily for root-causing AI issues in RAG applications):
- Context Sufficiency: Measures whether the retrieved context is sufficient to answer the user’s query
- Groundedness: Evaluates whether the AI response is grounded in the provided context rather than hallucinating
- Query Ease: Assesses the complexity and clarity of user queries
- Helpfulness: Measures how helpful and relevant the AI response is to the user’s query
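If you work with the cleanlab-tlm library, it ships with a comparable set of default evaluations that you can list and customize; the helper below is our best understanding of that API (names may differ slightly by version).

```python
# Sketch: listing the library's default evaluations (assumed helper; check the library docs).
from cleanlab_tlm import get_default_evals

for e in get_default_evals():
    print(e.name)  # expected to cover context sufficiency, groundedness, helpfulness, query ease
```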
Learn more about Evaluations/Guardrails (writing criteria, specifying identifiers, etc.).