Uploading Data from BigFrames to Cleanlab Studio
In this tutorial, you’ll learn how to upload data from bigframes to Cleanlab Studio. You’ll start by creating a bigframes DataFrame and then use the Python client to upload it. This guide will help you bring your bigframes/BigQuery data into Cleanlab Studio.
This notebook uses the bigframes package along with the cleanlab-studio Python package.
1. Install and import dependencies
You’ll need to install the cleanlab-studio package along with the bigframes package.
1a. Install the required packages
Install the required packages using pip, then import them:
%pip install -U cleanlab-studio
%pip install bigframes "numpy>=1.26.0"
import bigframes.pandas as bpd
from cleanlab_studio import Studio
1b. Set up BigQuery options and Cleanlab Studio
To make API calls to BigQuery and Cleanlab Studio, you need to set up the BigQuery DataFrames options and create a Cleanlab Studio client.
This tutorial assumes you have already authenticated your Google Cloud account. If you haven’t, you can follow the instructions in the Google Cloud documentation.
Make sure you set the GCP_PROJECT variable and your Cleanlab Studio API key in the following block.
# Set BigQuery DataFrames options
GCP_PROJECT = "<your-gcp-project>"
GCP_REGION = "US"
bpd.options.bigquery.project = GCP_PROJECT
bpd.options.bigquery.location = GCP_REGION
# create a Studio client
# you can find your Cleanlab Studio API key by going to app.cleanlab.ai/account
API_KEY = "<YOUR_API_KEY>"
studio = Studio(API_KEY)
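If you would rather not paste the API key into the notebook, one option is to read it from an environment variable. The following is a minimal sketch, not part of the Cleanlab Studio client: the helper load_api_key and the variable name CLEANLAB_STUDIO_API_KEY are hypothetical names chosen for illustration.

```python
import os


def load_api_key(env_var: str = "CLEANLAB_STUDIO_API_KEY") -> str:
    """Return the API key stored in `env_var`; raise if it is unset or empty.

    The default variable name is an assumption made for this sketch,
    not a name the cleanlab-studio package looks for on its own.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Cleanlab Studio API key"
        )
    return api_key


# Usage in the notebook, after exporting the variable in your shell:
# studio = Studio(load_api_key())
```

Failing early with a clear message avoids a less obvious authentication error later, when the upload call is made.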
2. Create a DataFrame (from a BigQuery table)
The following code block creates a DataFrame from a (public) BigQuery table. You can use this DataFrame to upload data to Cleanlab Studio.
# get the dataset and read into a DataFrame
query_or_table = "bigquery-public-data.ml_datasets.penguins"
df = bpd.read_gbq(query_or_table)
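As the variable name query_or_table suggests, read_gbq can also be given a SQL query rather than a table name, which is handy when you only want some columns or rows in Cleanlab Studio. The sketch below builds such a query as a plain string. The helper build_select_query is hypothetical, and the column names are assumed to match the public penguins table; check them against the table schema before relying on them.

```python
from typing import Optional, Sequence


def build_select_query(
    table: str, columns: Sequence[str], limit: Optional[int] = None
) -> str:
    """Build a simple BigQuery SELECT statement over `table`.

    Identifiers are wrapped in backticks; this helper is illustrative
    and does no escaping beyond that.
    """
    if not columns:
        raise ValueError("columns must not be empty")
    cols = ", ".join(f"`{c}`" for c in columns)
    query = f"SELECT {cols} FROM `{table}`"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


# Column names below are assumptions about the penguins table schema.
query_or_table = build_select_query(
    "bigquery-public-data.ml_datasets.penguins",
    ["species", "island", "body_mass_g"],
    limit=100,
)

# In the notebook, pass the query to read_gbq as before:
# df = bpd.read_gbq(query_or_table)
```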
3. Upload bigframes DataFrame to Cleanlab Studio
You can use the cleanlab-studio Python package to upload the bigframes DataFrame to Cleanlab Studio.
After uploading the data, you can access it in Cleanlab Studio by opening the application and finding the dataset on the Dashboard (or clicking the link below).
# upload the dataset to Cleanlab Studio
dataset_id = studio.upload_from_bigframe(df)
# view the dataset in Cleanlab Studio
print(f"https://app.cleanlab.ai/datasets/{dataset_id}")
4. Conclusion
In this tutorial, you learned how to upload data from BigQuery to Cleanlab Studio. You configured the BigQuery DataFrames options, created a bigframes DataFrame from a public BigQuery table, and uploaded it using the cleanlab-studio Python package. You can now access your BigQuery data in Cleanlab Studio and use it to create projects. For next steps, check out our Projects guide.