Vertex AI Training and Serving with TFX and Vertex Pipelines

Jul 31, 2023

bookmark_border

This notebook-based tutorial will create and run a TFX pipeline which trains an ML model using Vertex AI Training service and publishes it to Vertex AI for serving.

This notebook is based on the TFX pipeline we built in Simple TFX Pipeline for Vertex Pipelines Tutorial. If you have not read that tutorial yet, you should read it before proceeding with this notebook.

You can train models on Vertex AI using AutoML, or use custom training. In custom training, you can select many different machine types to power your training jobs, enable distributed training, use hyperparameter tuning, and accelerate with GPUs.

You can also serve prediction requests by deploying the trained model to Vertex AI Models and creating an endpoint.

In this tutorial, we will use Vertex AI Training with custom jobs to train a model in a TFX pipeline. We will also deploy the model to serve prediction request using Vertex AI.

This notebook is intended to be run on Google Colab or on AI Platform Notebooks. If you are not using one of these, you can simply click “Run in Google Colab” button above.

Set up

If you have completed Simple TFX Pipeline for Vertex Pipelines Tutorial, you will have a working GCP project and a GCS bucket and that is all we need for this tutorial. Please read the preliminary tutorial first if you missed it.

Install python packages

We will install required Python packages including TFX and KFP to author ML pipelines and submit jobs to Vertex Pipelines.

# Use the latest version of pip.
pip install --upgrade pip
pip install --upgrade "tfx[kfp]<2"

Uninstall shapely

TODO(b/263441833) This is a temporal solution to avoid an ImportError. Ultimately, it should be handled by supporting a recent version of Bigquery, instead of uninstalling other extra dependencies.

pip uninstall shapely -y

Did you restart the runtime?

If you are using Google Colab, the first time that you run the cell above, you must restart the runtime by clicking above “RESTART RUNTIME” button or using “Runtime > Restart runtime …” menu. This is because of the way that Colab loads packages.

If you are not on Colab, you can restart runtime with following cell.

# docs_infra: no_execute
import sys
if not 'google.colab' in sys.modules:
  # Automatically restart kernel after installs
  import IPython
  app = IPython.Application.instance()
  app.kernel.do_shutdown(True)

Login in to Google for this notebook

If you are running this notebook on Colab, authenticate with your user account:

import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

If you are on AI Platform Notebooks, authenticate with Google Cloud before running the next section, by running

gcloud auth login

in the Terminal window (which you can open via File > New in the menu). You only need to do this once per notebook instance.

Check the package versions.

import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
import kfp
print('KFP version: {}'.format(kfp.__version__))

TensorFlow version: 2.12.1
TFX version: 1.13.0
KFP version: 1.8.22

Set up variables

We will set up some variables used to customize the pipelines below. Following information is required:

GCP Project id. See Identifying your project id.
GCP Region to run pipelines. For more information about the regions that Vertex Pipelines is available in, see the Vertex AI locations guide.
Google Cloud Storage Bucket to store pipeline outputs.

Enter required values in the cell below before running it.

GOOGLE_CLOUD_PROJECT = ''     # <--- ENTER THIS
GOOGLE_CLOUD_REGION = ''      # <--- ENTER THIS
GCS_BUCKET_NAME = ''          # <--- ENTER THIS

if not (GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION and GCS_BUCKET_NAME):
    from absl import logging
    logging.error('Please set all required parameters.')

ERROR:absl:Please set all required parameters.

Set gcloud to use your project.

gcloud config set project {GOOGLE_CLOUD_PROJECT}

ERROR: (gcloud.config.set) argument VALUE: Must be specified.
Usage: gcloud config set SECTION/PROPERTY VALUE [optional flags]
  optional flags may be  --help | --installation

For detailed information on this command and its flags, run:
  gcloud config set --help

PIPELINE_NAME = 'penguin-vertex-training'

# Path to various pipeline artifact.
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(GCS_BUCKET_NAME, PIPELINE_NAME)

# Paths for users' Python module.
MODULE_ROOT = 'gs://{}/pipeline_module/{}'.format(GCS_BUCKET_NAME, PIPELINE_NAME)

# Paths for users' data.
DATA_ROOT = 'gs://{}/data/{}'.format(GCS_BUCKET_NAME, PIPELINE_NAME)

# Name of Vertex AI Endpoint.
ENDPOINT_NAME = 'prediction-' + PIPELINE_NAME

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))

PIPELINE_ROOT: gs:///pipeline_root/penguin-vertex-training

Prepare example data

We will use the same Palmer Penguins dataset as Simple TFX Pipeline Tutorial.

There are four numeric features in this dataset which were already normalized to have range [0,1]. We will build a classification model which predicts the species of penguins.

We need to make our own copy of the dataset. Because TFX ExampleGen reads inputs from a directory, we need to create a directory and copy dataset to it on GCS.

gsutil cp gs://download.tensorflow.org/data/palmer_penguins/penguins_processed.csv {DATA_ROOT}/

InvalidUrlError: Cloud URL scheme should be followed by colon and two slashes: "://". Found: "gs:///data/penguin-vertex-training/".

Take a quick look at the CSV file.

gsutil cat {DATA_ROOT}/penguins_processed.csv | head

You need to login in order to like this post: click here