Enrichment

`module` `cleanlab_studio.studio.enrichment`

Methods for interfacing with Enrichment Projects.

This module is not meant to be imported and used directly. Instead, use Studio.get_enrichment_project() to instantiate an EnrichmentProject object.

Global Variables

ROW_ID_COLUMN_NAME
REGEX_PARAMETER_ERROR_MESSAGE
CLEANLAB_ROW_ID_COLUMN_NAME
CHECK_READY_INTERVAL

`class` `EnrichmentJobStatusEnum`

An enumeration.

`class` `EnrichmentProject`

Represents an Enrichment Project instance, which is bound to a Cleanlab Studio account.

EnrichmentProjects should be instantiated using the Studio.get_enrichment_project() method.

`method` `__init__`

__init__(
    api_key: 'str',
    id: 'str',
    name: 'str',
    created_at: 'Optional[Union[str, datetime]]' = None
) → None

Initialize an EnrichmentProject.

Objects of this class are not meant to be constructed directly. Instead, use Studio.get_enrichment_project().

`property` created_at

(datetime.datetime) When the Enrichment Project was created.

`property` id

(str) ID of the Enrichment Project.

`property` name

(str) Name of the Enrichment Project.

`property` ready

Check if the latest enrichment job is ready or not.

If a preview was run after the last full run, accessing this property raises an error, because the latest job is then a preview.

`property` updated_at

(datetime.datetime) When the Enrichment Project was last updated.

`method` `download_results`

download_results(
    job_id: 'Optional[str]' = None,
    include_original_dataset: 'Optional[bool]' = False
) → EnrichmentResults

Retrieve the results of an enrichment job.

This method fetches the results of a specified enrichment job. If no job_id is provided, it will default to retrieving the results of the latest job.

Args:

job_id (str, optional): The ID of the job to retrieve results from. If not provided, the latest job will be used.
include_original_dataset (bool, optional): If True, the original dataset will be included in the returned results. Defaults to False.

`method` `export_results_as_csv`

export_results_as_csv(job_id: 'Optional[str]' = None) → None

Export the results of a job as a CSV file.

`method` `list_all_jobs`

list_all_jobs() → List[EnrichmentJob]

List all jobs in the project.

`method` `pause`

pause() → None

Pause the latest batch job.

`method` `preview`

preview(
    options: 'EnrichmentOptions',
    new_column_name: 'str',
    indices: 'Optional[List[int]]' = None
) → EnrichmentPreviewResults

Enrich a subset of data for a preview.

Args:

options (EnrichmentOptions): Options for enriching the dataset.
new_column_name (str): The name of the new column to store the prompt results.
indices (List[int], optional): The indices of the rows to enrich, up to 10. If None, three rows in the dataset will be randomly picked.

`method` `resume`

resume() → JSONDict

Resume the latest batch job.

`method` `run`

run(options: 'EnrichmentOptions', new_column_name: 'str') → dict[str, Any]

Enrich the entire dataset using the provided prompt.

This method triggers a remote job that applies TLM to each row of the dataset based on the given prompt. The process will run on a remote server and will block execution until the job is fully completed.

Args:

options (EnrichmentOptions): Options for enriching the dataset.
new_column_name (str): The name of the new column to store the prompt results.
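
A common flow is to preview the prompt on a few rows, then run it on the full dataset and download the results. The sketch below uses only the method names and parameters documented on this page. `enrich_dataset` is a hypothetical helper, not part of the library, and `project` is assumed to be an EnrichmentProject already obtained via Studio.get_enrichment_project() (whose arguments are not documented here).

```python
from typing import Any, List, Optional


def enrich_dataset(
    project: Any,
    options: Any,
    new_column_name: str,
    preview_indices: Optional[List[int]] = None,
) -> Any:
    """Hypothetical helper: preview a few rows, then enrich the full dataset.

    `project` is assumed to expose the preview(), run(), and
    download_results() methods documented on this page.
    """
    # Try the prompt on a small sample first (up to 10 rows per the docs).
    preview = project.preview(
        options=options,
        new_column_name=new_column_name,
        indices=preview_indices,
    )
    print(preview.details())

    # run() is documented as blocking until the remote job completes.
    project.run(options=options, new_column_name=new_column_name)

    # With no job_id, the results of the latest job are returned.
    return project.download_results(include_original_dataset=True)
```

Running the full job after the preview also means the latest job is a full run rather than a preview, which matters for the `ready` property described above.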

`method` `show_trustworthiness_score_history`

show_trustworthiness_score_history() → None

Show the trustworthiness score history of all jobs in the project.

`method` `to_dict`

to_dict() → Dict[str, Any]

Returns a dictionary of EnrichmentProject metadata.

`method` `wait_until_ready`

wait_until_ready() → None

Wait until the latest enrichment job is ready.
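
Conceptually, waiting amounts to polling the `ready` property at a fixed interval. The loop below is a minimal sketch of that idea, not the library's implementation: `poll_until_ready`, `interval_seconds`, and `timeout_seconds` are hypothetical names, and the actual value of CHECK_READY_INTERVAL is not stated on this page.

```python
import time
from typing import Any, Optional


def poll_until_ready(
    project: Any,
    interval_seconds: float = 5.0,
    timeout_seconds: Optional[float] = None,
) -> None:
    """Block until `project.ready` is truthy, checking at a fixed interval.

    Raises TimeoutError if `timeout_seconds` elapses first.
    """
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while not project.ready:
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("enrichment job was not ready before the timeout")
        time.sleep(interval_seconds)
```

A timeout is included here only as a design choice for the sketch; wait_until_ready() as documented takes no arguments.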

`class` `EnrichmentJob`

Represents an Enrichment Job instance.

This class is not meant to be constructed directly. Instead, use the EnrichmentProject methods to create and manage Enrichment Jobs.

`class` `EnrichmentOptions`

Options for enriching a dataset with a Trustworthy Language Model (TLM).

Args:

prompt (str): A string.Template-style template containing the prompt text and the names of the columns to embed.
`Example`: “Is this a numeric value, answer Yes or No only. Value: ${column_name}”
constrain_outputs (List[str], optional): List of all possible output values for the metadata column. If specified, every entry in the metadata column will exactly match one of these values (for less open-ended data enrichment tasks). If None, the metadata column can contain arbitrary values (for more open-ended data enrichment tasks). There may be additional transformations applied to ensure the returned value is one of these. If regex is also specified, then these transformations occur after your regex is applied. If optimize_prompt is True, the prompt will be automatically adjusted to include a statement that the response must match one of the constrain_outputs.
optimize_prompt (bool, default = True): When False, your provided prompt will not be modified in any way. When True, your provided prompt may be automatically adjusted in an effort to produce better results.
For instance, if constrain_outputs is specified, we may automatically append the following statement to your prompt: “Your answer must exactly match one of the following values: constrain_outputs.”
quality_preset (TLMQualityPreset, default = “medium”): The quality preset of the Trustworthy Language Model (TLM) to use for data enrichment.
regex (str | Replacement | List[Replacement], optional): A string, tuple, or list of tuples specifying regular expressions to apply for post-processing the raw LLM outputs. If a string value is passed in, a regex match will be performed and the matched pattern will be returned (if the pattern cannot be matched, None will be returned). Specifically the provided string will be passed into Python’s re.match() method. Pass in a tuple (R1, R2) instead if you wish to perform find and replace operations rather than matching/extraction. R1 should be a string containing the regex pattern to match, and R2 should be a string to replace matches with. Pass in a list of tuples instead if you wish to apply multiple replacements. Replacements will be applied in the order they appear in the list. Note that you cannot pass in a list of strings (chaining of multiple regex processing steps is only allowed for replacement operations).
These tuples specify the desired patterns to match and replace in the raw LLM response. This regex processing is useful in settings where you are unable to prompt the LLM to generate valid outputs 100% of the time, but can easily transform the raw LLM outputs to be valid through regular expressions that extract and replace parts of the raw output string. When this regex is applied, the processed results can be seen in the {new_column_name} column, and the raw outputs (before any regex processing) will be saved in the {new_column_name}_log column of the results dataframe.

`Example 1`: regex = '.*The answer is: (Bird|[Rr]abbit).*' will extract strings that are the words ‘Bird’, ‘Rabbit’ or ‘rabbit’ after the characters “The answer is: ” from the raw response.
`Example 2`: regex = [('True', 'T'), ('False', 'F')] will replace the words True and False with T and F.
`Example 3`: regex = (' Explanation:.*', '') will remove everything after and including the word “Explanation:”.
For instance, the response “True. Explanation: 3+4=7, and 7 is an odd number.” would return “True.” after the regex replacement.
tlm_options (TLMOptions, default = {}): Options for the Trustworthy Language Model (TLM) to use for data enrichment.
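
The prompt templating and regex post-processing described above can be tried locally with the Python standard library before sending anything to TLM. This is a sketch of the documented behavior, not Cleanlab's implementation: `extract` and `apply_replacements` are hypothetical helpers, and returning the first capture group in `extract` is an assumption suggested by Example 1 rather than something this page states outright.

```python
import re
from string import Template
from typing import List, Optional, Tuple

# Prompt templating: ${column_name} is filled in from each row's column value.
prompt = Template("Is this a numeric value, answer Yes or No only. Value: ${column_name}")
filled_prompt = prompt.substitute({"column_name": "42"})


def extract(pattern: str, raw: str) -> Optional[str]:
    """String regex: match with re.match() and return the extracted text.

    Assumption: the first capture group is returned when the pattern has one,
    otherwise the whole match. None is returned when there is no match.
    """
    match = re.match(pattern, raw)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


def apply_replacements(replacements: List[Tuple[str, str]], raw: str) -> str:
    """Tuple regexes: apply each (pattern, replacement) pair in list order."""
    for pattern, replacement in replacements:
        raw = re.sub(pattern, replacement, raw)
    return raw


# Example 1: extraction.
print(extract(".*The answer is: (Bird|[Rr]abbit).*", "Sure. The answer is: rabbit."))  # rabbit

# Example 2: chained replacements.
print(apply_replacements([("True", "T"), ("False", "F")], "True or False"))  # T or F

# Example 3: strip a trailing explanation.
print(apply_replacements(
    [(" Explanation:.*", "")],
    "True. Explanation: 3+4=7, and 7 is an odd number.",
))  # True.
```

Because re.match() anchors at the start of the string, the leading `.*` in Example 1 is what lets the pattern skip any text that precedes “The answer is: ”.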

`class` `EnrichmentResults`

Enrichment result.

`method` `__init__`

__init__(results: 'DataFrame')

`method` `details`

details() → DataFrame

`classmethod` `from_dataframe`

from_dataframe(df: 'DataFrame') → EnrichmentResults

`classmethod` `from_dict`

from_dict(
    json_dict: 'List[JSONDict]',
    include_original_dataset: 'Optional[bool]' = False
) → EnrichmentResults

`method` `join`

join(original_data: 'DataFrame', with_details: 'bool' = False) → DataFrame

`class` `EnrichmentPreviewResults`

Enrichment preview results.

`method` `__init__`

__init__(results: 'DataFrame')

`method` `details`

details() → DataFrame

`classmethod` `from_dataframe`

from_dataframe(df: 'DataFrame') → EnrichmentResults

`classmethod` `from_dict`

from_dict(
    json_dict: 'List[JSONDict]',
    include_original_dataset: 'Optional[bool]' = False
) → EnrichmentPreviewResults

`method` `join`

join(original_data: 'DataFrame', with_details: 'bool' = False) → DataFrame

Join the original data with the enrichment results. The result only contains those rows that were enriched by preview.

Args:

original_data (pd.DataFrame): The original data to join with the enrichment results.
with_details (bool): If with_details is True, the details of the enrichment results will be included in the output DataFrame.
