Synthetic
Warning: The utility methods in utils are not guaranteed to be stable between different versions of the cleanlab-studio API.
module cleanlab_studio.utils.synthetic
Collection of utility functions for the Cleanlab Studio Python API.
function score_synthetic_dataset
score_synthetic_dataset(
    cleanset_df: DataFrame,
    real_or_synth_column: str = 'real_or_synthetic',
    synthetic_class_names: Optional[Tuple[str, str]] = None
) → Dict[str, float]
Computes the issue scores for a dataset consisting of real and synthetic data, to evaluate any overarching issues within the synthetic dataset.
Args:
cleanset_df
: The dataframe containing the dataset to score. It should contain a column, named by real_or_synth_column (default “real_or_synthetic”), that indicates whether each example is real or synthetic. It should also have the cleanset columns provided by Cleanlab Studio.
real_or_synth_column
: The name of the column that indicates whether each example is real or synthetic.
synthetic_class_names
: The class names used in the real-or-synthetic column (i.e., which class corresponds to synthetic examples and which to real examples). The first class name should correspond to the synthetic examples, and the second class name should correspond to the real examples. If None, the default values are (“synthetic”, “real”).
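The ordering of synthetic_class_names (synthetic first, real second) is easy to get backwards, so a small pre-check on the label column may help before scoring. The sketch below is illustrative only: count_real_and_synthetic is a hypothetical helper that is not part of cleanlab-studio, the column name "origin" and the class names "generated" and "original" are made up, and the requirement that both classes be present is an assumption rather than documented behavior. The call to score_synthetic_dataset is left as a comment because it needs a dataframe that already holds the cleanset columns from Cleanlab Studio.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Defaults as documented above: (synthetic class name, real class name).
DEFAULT_CLASS_NAMES: Tuple[str, str] = ("synthetic", "real")


def count_real_and_synthetic(
    labels: Iterable[str],
    synthetic_class_names: Optional[Tuple[str, str]] = None,
) -> Dict[str, int]:
    """Hypothetical pre-check (not part of cleanlab-studio).

    Counts the examples in each class and raises ValueError if the labels
    contain values other than the two expected class names, or if either
    class has no examples (an assumed requirement, not a documented one).
    """
    synthetic_name, real_name = synthetic_class_names or DEFAULT_CLASS_NAMES
    counts = Counter(labels)
    unexpected = set(counts) - {synthetic_name, real_name}
    if unexpected:
        raise ValueError(f"Unexpected labels: {sorted(unexpected)}")
    if counts[synthetic_name] == 0 or counts[real_name] == 0:
        raise ValueError("Both real and synthetic examples are required.")
    return {"synthetic": counts[synthetic_name], "real": counts[real_name]}


# Example: a label column that uses custom class names instead of the defaults.
labels = ["generated", "original", "generated", "original", "original"]
class_names = ("generated", "original")  # synthetic first, real second
counts = count_real_and_synthetic(labels, class_names)
print(counts)

# With the check passed, the documented call would look like this, assuming
# cleanset_df holds the Cleanlab Studio cleanset columns plus a column named
# "origin" containing the labels above:
#
# from cleanlab_studio.utils.synthetic import score_synthetic_dataset
# scores = score_synthetic_dataset(
#     cleanset_df,
#     real_or_synth_column="origin",
#     synthetic_class_names=("generated", "original"),
# )
```

If the label column already uses the values "synthetic" and "real" and is named "real_or_synthetic", both optional arguments can be left at their defaults.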