Components of Real-Time AI Safety

Cleanlab provides a safety layer that sits on top of your existing AI application to ensure reliable, safe responses in real time.

What is a Real-time Safety Layer?

A safety layer acts as a protective barrier between your AI system and users, ensuring responses meet quality and safety standards. Cleanlab’s safety layer includes the following components:

Expert Answers

Expert answers are curated by your own experts and served directly for highly specific, challenging, or sensitive questions. They are delivered deterministically (and optionally verbatim) to users, bypassing any risk of AI hallucination.
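
To make this concrete, here is a minimal sketch of how an expert-answer lookup could work, using a hypothetical in-memory store and a simple string-similarity match (illustrative only, not Cleanlab’s implementation):

from difflib import SequenceMatcher

# Hypothetical in-memory store of expert-curated answers (not Cleanlab's API)
EXPERT_ANSWERS = {
    "How do I reset my password?": "Go to Settings > Security and choose 'Reset password'.",
}

def get_expert_answer(query: str, threshold: float = 0.9) -> str | None:
    # Return the curated answer whose question closely matches the query, else None
    for question, answer in EXPERT_ANSWERS.items():
        if SequenceMatcher(None, question.lower(), query.lower()).ratio() >= threshold:
            return answer  # served deterministically and verbatim
    return None

print(get_expert_answer("how do I reset my password?"))

A production system would use semantic rather than raw string matching; the key property is that a matching question is answered from the curated entry instead of being generated by the LLM.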

Guardrails

Guardrails are automated checks that evaluate AI responses against predefined safety and quality criteria before they reach users. Every LLM response is validated against these guardrails, and responses that fail validation are blocked to prevent potentially harmful, inaccurate, or inappropriate content from being shared.
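
As a rough sketch, guardrails can be modeled as a set of named checks that each pass or fail a candidate response, with the response blocked if any check fails. The specific checks below are hypothetical and only for illustration:

# Hypothetical guardrail checks; each returns True if the response passes
GUARDRAILS = {
    "no_pii": lambda resp: "@" not in resp,                          # crude PII screen
    "non_empty": lambda resp: len(resp.strip()) > 0,                 # answer is not blank
    "no_unsupported_claims": lambda resp: "guaranteed" not in resp.lower(),
}

def passes_guardrails(response: str) -> bool:
    # Block the response if any guardrail check fails
    return all(check(response) for check in GUARDRAILS.values())

print(passes_guardrails("Our plan is guaranteed to double your revenue."))  # False -> blocked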

Safe Fallback Responses

Safe fallback responses are pre-defined, harmless messages that are automatically served when the AI system cannot provide a safe or reliable answer. When responses are blocked by guardrails, these fallbacks maintain good user experience while preventing harmful or incorrect information from being shared.
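
For illustration, fallback messages might be pre-defined per failure reason so that a blocked response still yields a helpful, harmless message (the reasons and messages below are made up):

# Hypothetical mapping from failure reason to a pre-defined safe fallback message
FALLBACKS = {
    "untrustworthy": "I don't have enough information to answer that question accurately.",
    "unsafe": "I can't help with that request, but I can connect you with a human agent.",
}
DEFAULT_FALLBACK = "Sorry, I'm unable to answer that right now. Please contact support."

def safe_fallback(reason: str) -> str:
    # Serve a harmless, pre-defined message instead of the blocked AI response
    return FALLBACKS.get(reason, DEFAULT_FALLBACK)

print(safe_fallback("unsafe"))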

AI Remediations

Remediations are human-provided corrections of problematic AI application behavior, used to improve and patch gaps in the AI’s performance. When your original AI response fails Cleanlab’s guardrails or validations, the safety layer can opportunistically use these remediations to improve the outcome for the end user.
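
Below is a minimal sketch of the remediation loop, assuming a simple query-keyed store: a failed interaction is flagged for review, a human expert supplies a corrected answer, and future matching queries are served that answer. All names here are illustrative, not Cleanlab’s SDK:

# Hypothetical remediation store: query -> expert-corrected answer
remediations: dict[str, str] = {}

def log_failed_response(query: str, bad_response: str) -> None:
    # In practice the failed interaction would be surfaced to human experts for review
    print(f"Flagged for review: {query!r} -> {bad_response!r}")

def add_remediation(query: str, corrected_answer: str) -> None:
    # A human expert patches the gap; future matching queries get the corrected answer
    remediations[query] = corrected_answer

def get_remediation(query: str) -> str | None:
    return remediations.get(query)

add_remediation("What is your refund window?", "Refunds are available within 30 days of purchase.")
print(get_remediation("What is your refund window?"))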

Implementation

Here’s pseudocode showing how the safety layer works:

Standard RAG system:

context = KnowledgeBase.retrieve(query)
response = LLM.generate(prompt=query+context)

RAG system with safety layer:

context = KnowledgeBase.retrieve(query)

# Check for expert answer first to save latency
expert_answer = Codex.get_expert_answer(query)
if expert_answer:
    response = expert_answer
else:
    response = LLM.generate(prompt=query+context)

# Log and validate the response
validation_result = Codex.validate(query, response)

# Check if validation fails (guardrails hit or low trustworthiness)
if validation_result.is_bad:
    if validation_result.remediation:
        # Use a human-provided remediation when one is available
        response = validation_result.remediation
    else:
        # Use a safe fallback response when no remediation exists
        response = "I don't have enough information to answer that question accurately."

Cleanlab offers different approaches for implementing this safety layer, which are described on the Integration Tutorials page.

Contact us to learn more: support@cleanlab.ai