Populating Codex with manual Questions and Answers
Note: This is not the primary way to use the Codex Web App. That is covered in the tutorial: Using Codex as an SME. There, SMEs enter desired answers to queries appearing in the Codex Project – queries which are logged whenever Codex detects they would be poorly handled by your RAG app.
This tutorial covers an auxiliary workflow where you manually add question/answer pairs into the Project for reasons including:
- You’ve already thought of certain user queries for which you’d like to specify/control your AI application’s response.
- You see a question in Codex and realize there are other related (but not semantically identical) questions that your RAG app will not properly handle.
- You have a historical database of user queries and their desired answers (e.g. responses from human SMEs before your AI application existed).
- You are testing Codex.
After you have created a Project and are viewing its entries, you can click the Add new button and then manually input any question/answer pair.
After you Save this entry, new user queries that are similar to this question should be properly answered by your RAG app.
Instead of manually populating question/answer pairs one by one, you can populate them programmatically via the Codex client library method: cleanlab_codex.project.add_entries().
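If you are starting from a historical set of queries and answers, it can help to clean the pairs before uploading them. The following is a minimal sketch using only the Python standard library. The helper name `prepare_entries`, the `question`/`answer` dictionary keys, and the commented-out upload call are assumptions for illustration, not the documented API; consult the client library reference for the exact arguments that add_entries() expects.

```python
def prepare_entries(records):
    """Clean raw (question, answer) pairs; drop blanks and repeated questions.

    Hypothetical helper for illustration, not part of the Codex client library.
    """
    seen = set()
    entries = []
    for question, answer in records:
        question = " ".join(question.split())  # collapse stray whitespace
        answer = answer.strip()
        if not question or not answer:
            continue  # skip incomplete pairs
        key = question.casefold()  # case-insensitive duplicate check
        if key in seen:
            continue  # keep only the first answer for a repeated question
        seen.add(key)
        entries.append({"question": question, "answer": answer})
    return entries


# Made-up sample data standing in for a historical SME log.
historical = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("how do I reset  my password?", "Duplicate of the first question."),
    ("What is the refund window?", ""),
]
entries = prepare_entries(historical)
print(entries)

# Assumed upload step (argument shape is NOT confirmed by this tutorial):
# project.add_entries(entries)
```

Note that this only removes exact duplicates after normalizing case and whitespace. Related but differently worded questions are kept, which matches the goal of covering queries that are not semantically identical.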
Considerations
To write questions effectively, be concise and phrase each question generally enough that it is not specific to any one user of your RAG app.
For tips on writing effective answers and testing your RAG app after you’ve added new Codex entries, see our tutorial: Using Codex as an SME.
Whether your RAG app responds with your written Answer for user queries similar to your written Question will depend on:
- How Codex is integrated into your RAG application.
- Whether your RAG application can already handle such queries by relying on its Knowledge Base.
- The similarity threshold configuration of this Codex Project.
To ensure your RAG app always gives back your written Answer for similar queries, integrate Codex as-a-Cache, which is consulted before running Retrieval-Augmented Generation steps (rather than after these steps, as in our Codex as-a-Backup or as-a-Tool integrations).
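The difference in ordering between the cache and backup styles can be sketched as follows. This is an illustrative outline only: `codex_lookup`, `rag_pipeline`, and `is_bad_response` are hypothetical callables supplied by you, and the exact-match dictionary lookup is a stand-in for Codex's similarity-based matching, not how Codex actually works.

```python
def answer_as_cache(query, codex_lookup, rag_pipeline):
    """Consult the stored answers first; run RAG only on a miss."""
    stored = codex_lookup(query)
    if stored is not None:
        return stored
    return rag_pipeline(query)


def answer_as_backup(query, codex_lookup, rag_pipeline, is_bad_response):
    """Run RAG first; fall back to stored answers only if the response looks bad."""
    response = rag_pipeline(query)
    if is_bad_response(response):
        stored = codex_lookup(query)
        if stored is not None:
            return stored
    return response


# Toy stand-ins (all hypothetical) to show the behavioral difference.
store = {"what is the refund window?": "Refunds are accepted within 30 days."}
codex_lookup = lambda q: store.get(q.strip().lower())
rag_pipeline = lambda q: "Answer generated from the Knowledge Base."
is_bad_response = lambda r: False  # pretend RAG always looks acceptable

query = "What is the refund window?"
cache_result = answer_as_cache(query, codex_lookup, rag_pipeline)
backup_result = answer_as_backup(query, codex_lookup, rag_pipeline, is_bad_response)
print(cache_result)
print(backup_result)
```

In this toy setup the cache style returns the written answer, while the backup style returns the RAG response because that response was not flagged as bad. This mirrors the consideration above that a backup integration only uses your written Answer when the RAG app cannot already handle the query.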