
Uploading Data from BigFrames to Cleanlab Studio

In this tutorial, you’ll learn how to upload data from bigframes to Cleanlab Studio. You’ll read a BigQuery table into a bigframes DataFrame and then use the Cleanlab Studio Python client to upload it, which makes your bigframes/BigQuery data available in Cleanlab Studio.

This notebook uses the bigframes package along with the cleanlab-studio Python package.

1. Install and import dependencies

You’ll need to install the cleanlab-studio package, along with the bigframes package.

1a. Install the required packages

Install the required packages with pip, then import them:

%pip install -U cleanlab-studio
%pip install bigframes "numpy>=1.26.0"

import bigframes.pandas as bpd
from cleanlab_studio import Studio

1b. Set up BigQuery options and Cleanlab Studio

To make API calls to BigQuery and Cleanlab Studio, you need to set up the BigQuery DataFrames options and create a Cleanlab Studio client.

This tutorial assumes you have already authenticated your Google Cloud account. If you haven’t, you can follow the instructions in the Google Cloud documentation.

In the following block, set the GCP_PROJECT variable and your Cleanlab Studio API key (API_KEY). Change GCP_REGION if your data is not in the US multi-region.

# Set BigQuery DataFrames options
GCP_PROJECT = "<your-gcp-project>"
GCP_REGION = "US"
bpd.options.bigquery.project = GCP_PROJECT
bpd.options.bigquery.location = GCP_REGION

# create a Studio client
# you can find your Cleanlab Studio API key by going to app.cleanlab.ai/account
API_KEY = "<YOUR_API_KEY>"
studio = Studio(API_KEY)
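
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, assuming you export environment variables named GCP_PROJECT, GCP_REGION, and CLEANLAB_API_KEY (these names and the load_settings helper are illustrative assumptions, not something BigQuery or Cleanlab Studio requires), the same settings could be loaded like this:

```python
import os


def load_settings(env=None):
    """Read tutorial settings from environment variables.

    The variable names are assumptions made for this sketch; rename
    them to match your own setup.
    """
    env = os.environ if env is None else env
    settings = {
        "project": env.get("GCP_PROJECT", ""),
        "region": env.get("GCP_REGION", "US"),  # same default as above
        "api_key": env.get("CLEANLAB_API_KEY", ""),
    }
    # Treat empty values and unfilled "<...>" placeholders as missing.
    required = {"GCP_PROJECT": "project", "CLEANLAB_API_KEY": "api_key"}
    missing = [
        name
        for name, key in required.items()
        if not settings[key] or settings[key].startswith("<")
    ]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return settings


# Usage with the objects from this tutorial (left commented so the
# sketch runs without credentials):
# settings = load_settings()
# bpd.options.bigquery.project = settings["project"]
# bpd.options.bigquery.location = settings["region"]
# studio = Studio(settings["api_key"])
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by either service.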

2. Create a DataFrame (from a BigQuery table)

The following code block reads a public BigQuery table into a bigframes DataFrame. This is the DataFrame you will upload to Cleanlab Studio in the next step.

# get the dataset and read into a DataFrame
query_or_table = "bigquery-public-data.ml_datasets.penguins"
df = bpd.read_gbq(query_or_table)
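
As the variable name query_or_table suggests, read_gbq can also take a SQL query instead of a table ID, which is useful when you only want some columns or rows. The sketch below builds such a query as a plain string. The build_select_query helper is hypothetical, and the column names are assumptions about the penguins table's schema, so check them against your own table:

```python
def build_select_query(table, columns=None, limit=None):
    """Build a simple SELECT statement for a fully qualified table ID.

    Hypothetical helper for illustration only. It backtick-quotes
    identifiers but does not sanitize untrusted input.
    """
    column_sql = ", ".join(f"`{c}`" for c in columns) if columns else "*"
    query = f"SELECT {column_sql} FROM `{table}`"
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        query += f" LIMIT {int(limit)}"
    return query


query = build_select_query(
    "bigquery-public-data.ml_datasets.penguins",
    columns=["species", "island", "body_mass_g"],  # assumed column names
    limit=100,
)
print(query)

# Pass the query to read_gbq in place of the table ID:
# df = bpd.read_gbq(query)
```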

3. Upload bigframes DataFrame to Cleanlab Studio

You can use the cleanlab-studio Python package to upload the bigframes DataFrame to Cleanlab Studio.

After the upload completes, you can access the data in Cleanlab Studio by opening the application and finding the dataset on the Dashboard, or by following the link printed by the code below.

# upload the dataset to Cleanlab Studio
dataset_id = studio.upload_from_bigframe(df)

# view the dataset in Cleanlab Studio
print(f"https://app.cleanlab.ai/datasets/{dataset_id}")
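
If you upload several DataFrames, it may help to wrap the two steps above in one function. The following is a minimal sketch: upload_and_link is a hypothetical helper, not part of the cleanlab-studio package, and it takes the upload function as an argument so it can be tried out without credentials.

```python
def upload_and_link(upload, frame, base_url="https://app.cleanlab.ai/datasets"):
    """Upload `frame` with the given callable and return the dataset URL.

    `upload` is any function that takes a DataFrame and returns a
    dataset ID, such as studio.upload_from_bigframe from this tutorial.
    """
    dataset_id = upload(frame)
    if not dataset_id:
        # Guard against printing a link that cannot resolve.
        raise RuntimeError("Upload did not return a dataset ID")
    return f"{base_url}/{dataset_id}"


# Usage with the objects from this tutorial (left commented so the
# sketch runs without credentials):
# print(upload_and_link(studio.upload_from_bigframe, df))
```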

4. Conclusion

In this tutorial, you learned how to upload data from BigQuery to Cleanlab Studio. You configured the BigQuery DataFrames options, read a public BigQuery table into a bigframes DataFrame, and uploaded it using the cleanlab-studio Python package. You can now access your BigQuery data in Cleanlab Studio and use it to create projects. For next steps, check out our Projects guide.