Note: We recommend running this tutorial in a Colab notebook, with no setup required! Just click “Run in Google Colab”.
This notebook contains examples of how to author and run Python function components within the TFX InteractiveContext and in a locally orchestrated TFX pipeline.
For more context and information, see the Custom Python function components page on the TFX documentation site.
We will first install TFX and import necessary modules. TFX requires Python 3.
import sys
sys.version
'3.9.16 (main, Dec 7 2022, 01:11:51) \n[GCC 9.4.0]'
To avoid upgrading Pip in a system when running locally, check to make sure that we’re running in Colab. Local systems can of course be upgraded separately.
try:
  import colab
  !pip install --upgrade pip
except:
  pass
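The cell above uses a bare `except` to detect whether the notebook is running in Colab. As a minimal sketch of the same check without a bare `except`, the helper below asks the import system whether a module can be found. The name `module_available` is hypothetical and not part of TFX or Colab, and the module name `colab` is taken from the cell above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the module has no spec.
        return False


# Assumption: as in the tutorial's cell, a findable `colab` module is treated
# as the signal that the notebook is running in Colab.
IN_COLAB = module_available("colab")
print("Running in Colab:", IN_COLAB)
```

The pip upgrade could then be guarded by `IN_COLAB` instead of by a `try`/`except` block.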
Note: In Google Colab, because of package updates, the first time you run this cell you must restart the runtime (Runtime > Restart runtime …).
pip install tfx
TODO(b/263441833) This is a temporary solution to avoid an ImportError. Ultimately, it should be handled by supporting a recent version of BigQuery, instead of uninstalling other extra dependencies.
pip uninstall shapely -y
If you are using Google Colab, the first time that you run the cell above, you must restart the runtime (Runtime > Restart runtime …). This is because of the way that Colab loads packages.
We import TFX and check its version.
# Check version
from tfx import v1 as tfx
tfx.__version__
2022-12-29 10:41:24.984930: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-29 10:41:24.985022: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-12-29 10:41:24.985031: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
'1.12.0'
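Importing TFX also loads TensorFlow, which is where the warnings above come from. If you only want to know which version is installed, one option is to query the package metadata without importing anything. This is a sketch using the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tfx")
if version is None:
    print("tfx is not installed in this environment")
else:
    print("tfx", version)
```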
In this section, we will create components from Python functions. We will not be solving any real ML problem; these simple functions are just used to illustrate the Python function component development process.
See Python function based component guide for more documentation.
We begin by writing a function that generates some dummy data. This is written to its own Python module file.
%%writefile my_generator.py
import os
import tensorflow as tf # Used for writing files.
from tfx import v1 as tfx
# Non-public APIs, just for showcase.
from tfx.types.experimental.simple_artifacts import Dataset
@tfx.dsl.components.component
def MyGenerator(data: tfx.dsl.components.OutputArtifact[Dataset]):
  """Create a file with dummy data in the output artifact."""
  with tf.io.gfile.GFile(os.path.join(data.uri, 'data_file.txt'), 'w') as f:
    f.write('Dummy data')
  # Set metadata and ensure that it gets passed to downstream components.
  data.set_string_custom_property('my_custom_field', 'my_custom_value')
Writing my_generator.py
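To see what the generator body does independently of TFX, here is a minimal sketch that runs the same logic against a stand-in object. `FakeArtifact` and `generate` are hypothetical names invented for illustration: they only imitate the `uri` attribute and the `set_string_custom_property` call used above, and are not how TFX implements artifacts.

```python
import os
import tempfile


class FakeArtifact:
    """Hypothetical stand-in for a TFX artifact: a directory plus properties."""

    def __init__(self, uri: str):
        self.uri = uri
        self.custom_properties = {}

    def set_string_custom_property(self, key: str, value: str) -> None:
        self.custom_properties[key] = value


def generate(data: FakeArtifact) -> None:
    """Mirror the body of MyGenerator using only the standard library."""
    with open(os.path.join(data.uri, 'data_file.txt'), 'w') as f:
        f.write('Dummy data')
    # Record metadata on the artifact, as the real component does.
    data.set_string_custom_property('my_custom_field', 'my_custom_value')


# A temporary directory plays the role of the artifact's output location.
artifact = FakeArtifact(tempfile.mkdtemp())
generate(artifact)
print(os.listdir(artifact.uri))
print(artifact.custom_properties)
```

A stand-in like this can be handy for unit-testing the function body before wiring it into a pipeline.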
Next, we write a second component that uses the dummy data produced. We will just calculate a hash of the data and return it.
%%writefile my_consumer.py
import hashlib
import os
import tensorflow as tf
from tfx import v1 as tfx
# Non-public APIs, just for showcase.
from tfx.types.experimental.simple_artifacts import Dataset
from tfx.types.standard_artifacts import String
@tfx.dsl.components.component
def MyConsumer(data: tfx.dsl.components.InputArtifact[Dataset],
               hash: tfx.dsl.components.OutputArtifact[String],
               algorithm: tfx.dsl.components.Parameter[str] = 'sha256'):
  """Reads the contents of data and calculates its hash."""
  with tf.io.gfile.GFile(
      os.path.join(data.uri, 'data_file.txt'), 'r') as f:
    contents = f.read()
  h = hashlib.new(algorithm)
  h.update(tf.compat.as_bytes(contents))
  hash.value = h.hexdigest()
  # Read a custom property from the input artifact and set to the output.
  custom_value = data.get_string_custom_property('my_custom_field')
  hash.set_string_custom_property('input_custom_field', custom_value)
Writing my_consumer.py
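The `algorithm` parameter is passed straight to `hashlib.new`, so any algorithm name that `hashlib` supports should work. The sketch below isolates that hashing step with the standard library only. `hash_contents` is a hypothetical helper, and it assumes the text is encoded as UTF-8, which is the default for `tf.compat.as_bytes`.

```python
import hashlib


def hash_contents(contents: str, algorithm: str = 'sha256') -> str:
    """Hash text the way the MyConsumer body does, minus the TFX plumbing."""
    h = hashlib.new(algorithm)
    # Assumption: UTF-8 encoding, matching the default of tf.compat.as_bytes.
    h.update(contents.encode('utf-8'))
    return h.hexdigest()


# Compare the default algorithm with the one used later in this tutorial.
for name in ('sha256', 'md5'):
    digest = hash_contents('Dummy data', name)
    print(name, len(digest), digest)
```

An unsupported algorithm name makes `hashlib.new` raise a `ValueError`, so a typo in the `algorithm` parameter would surface when the component executes.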
Now, we will demonstrate usage of our new components in the TFX InteractiveContext.
For more information on what you can do with the TFX notebook InteractiveContext, see the in-notebook TFX Keras Component Tutorial.
from my_generator import MyGenerator
from my_consumer import MyConsumer
# Here, we create an InteractiveContext using default parameters. This will
# use a temporary directory with an ephemeral ML Metadata database instance.
# To use your own pipeline root or database, the optional properties
# `pipeline_root` and `metadata_connection_config` may be passed to
# InteractiveContext. Calls to InteractiveContext are no-ops outside of the
# notebook.
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
context = InteractiveContext()
WARNING:absl:InteractiveContext pipeline_root argument not provided: using temporary directory /tmpfs/tmp/tfx-interactive-2022-12-29T10_41_28.548607-qb89cj5f as root for pipeline outputs.
WARNING:absl:InteractiveContext metadata_connection_config not provided: using SQLite ML Metadata database at /tmpfs/tmp/tfx-interactive-2022-12-29T10_41_28.548607-qb89cj5f/metadata.sqlite.
Next, we run our components interactively within the notebook with context.run(). Our consumer component uses the outputs of the generator component.
generator = MyGenerator()
context.run(generator)
consumer = MyConsumer(
    data=generator.outputs['data'],
    algorithm='md5')
context.run(consumer)
After execution, we can inspect the contents of the “hash” output artifact of the consumer component on disk.
tail -v {consumer.outputs['hash'].get()[0].uri}
==> /tmpfs/tmp/tfx-interactive-2022-12-29T10_41_28.548607-qb89cj5f/MyConsumer/hash/2/value <==
0015fe7975d1a2794b59aa12635703f1
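Beyond eyeballing the value, you can recompute the hash and compare it with what was written to disk. The following is a minimal sketch with a hypothetical helper, `value_matches`; the demo files are created in a temporary directory so that the block runs on its own.

```python
import hashlib
import os
import tempfile


def value_matches(data_path: str, value_path: str, algorithm: str) -> bool:
    """Recompute the hash of data_path and compare with the stored value."""
    with open(data_path, 'rb') as f:
        expected = hashlib.new(algorithm, f.read()).hexdigest()
    with open(value_path, 'r') as f:
        stored = f.read().strip()
    return stored == expected


# Self-contained demo with hypothetical files in a temporary directory.
demo_dir = tempfile.mkdtemp()
data_path = os.path.join(demo_dir, 'data_file.txt')
value_path = os.path.join(demo_dir, 'value')
with open(data_path, 'w') as f:
    f.write('Dummy data')
with open(value_path, 'w') as f:
    f.write(hashlib.md5(b'Dummy data').hexdigest())

print(value_matches(data_path, value_path, 'md5'))     # same algorithm
print(value_matches(data_path, value_path, 'sha256'))  # different algorithm
```

In the notebook, `value_path` would be `consumer.outputs['hash'].get()[0].uri`, as used in the `tail` command above. The data file would presumably be `data_file.txt` under the URI of the generator's `data` output.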
That’s it, and you’ve now written and executed your own custom components!
Next, we will author a pipeline using these same components. While using the InteractiveContext within a notebook works well for experimentation, defining a pipeline lets you deploy your pipeline on local or remote runners for production usage.
Here, we will demonstrate usage of the LocalDagRunner running locally on your machine. For production execution, the Airflow or Kubeflow runners may be more suitable.
import os
import tempfile
from tfx import v1 as tfx
# Select a persistent TFX root directory to store your output artifacts.
# For demonstration purposes only, we use a temporary directory.
PIPELINE_ROOT = tempfile.mkdtemp()
# Select a pipeline name so that multiple runs of the same logical pipeline
# can be grouped.
PIPELINE_NAME = "function-based-pipeline"
# We use a ML Metadata configuration that uses a local SQLite database in
# the pipeline root directory. Other backends for ML Metadata are available
# for production usage.
METADATA_CONNECTION_CONFIG = tfx.orchestration.metadata.sqlite_metadata_connection_config(
    os.path.join(PIPELINE_ROOT, 'metadata.sqlite'))
def function_based_pipeline():
  # Here, we construct our generator and consumer components in the same way.
  generator = MyGenerator()
  consumer = MyConsumer(
      data=generator.outputs['data'],
      algorithm='md5')
  return tfx.dsl.Pipeline(
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      components=[generator, consumer],
      metadata_connection_config=METADATA_CONNECTION_CONFIG)
my_pipeline = function_based_pipeline()
tfx.orchestration.LocalDagRunner().run(my_pipeline)
WARNING:absl:ArtifactQuery.property_predicate is not supported.
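Because the pipeline above stores ML Metadata in a local SQLite file, that file can be opened with Python's built-in `sqlite3` module. The sketch below only lists table names, which works for any SQLite database; it makes no assumption about which tables ML Metadata creates. `list_tables` is a hypothetical helper, and the demo database and its `example` table are made up so that the block runs on its own.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list:
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo on a throwaway database; the table name here is made up.
demo_db = os.path.join(tempfile.mkdtemp(), 'demo.sqlite')
conn = sqlite3.connect(demo_db)
conn.execute('CREATE TABLE example (id INTEGER PRIMARY KEY)')
conn.commit()
conn.close()
print(list_tables(demo_db))
```

After the pipeline run, calling `list_tables(os.path.join(PIPELINE_ROOT, 'metadata.sqlite'))` would show whichever tables the metadata store created.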
We can inspect the output artifacts generated by this pipeline execution.
find {PIPELINE_ROOT}
/tmpfs/tmp/tmpno_nry5a
/tmpfs/tmp/tmpno_nry5a/MyConsumer
/tmpfs/tmp/tmpno_nry5a/MyConsumer/.system
/tmpfs/tmp/tmpno_nry5a/MyConsumer/.system/executor_execution
/tmpfs/tmp/tmpno_nry5a/MyConsumer/.system/executor_execution/2
/tmpfs/tmp/tmpno_nry5a/MyConsumer/hash
/tmpfs/tmp/tmpno_nry5a/MyConsumer/hash/2
/tmpfs/tmp/tmpno_nry5a/MyConsumer/hash/2/value
/tmpfs/tmp/tmpno_nry5a/MyGenerator
/tmpfs/tmp/tmpno_nry5a/MyGenerator/.system
/tmpfs/tmp/tmpno_nry5a/MyGenerator/.system/executor_execution
/tmpfs/tmp/tmpno_nry5a/MyGenerator/.system/executor_execution/1
/tmpfs/tmp/tmpno_nry5a/MyGenerator/data
/tmpfs/tmp/tmpno_nry5a/MyGenerator/data/1
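Outside a notebook, where the `find` shell command may not be available, a similar listing can be produced in Python. This is a minimal sketch with a hypothetical helper, `list_tree`; the demo builds a small throwaway tree that imitates part of the layout above.

```python
import os
import tempfile


def list_tree(root: str) -> list:
    """Return every directory and file under root, relative to root, sorted."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(entries)


# Demo on a throwaway tree that imitates part of the layout shown above.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'MyConsumer', 'hash', '2'))
with open(os.path.join(demo_root, 'MyConsumer', 'hash', '2', 'value'), 'w') as f:
    f.write('placeholder')

for entry in list_tree(demo_root):
    print(entry)
```

Calling `list_tree(PIPELINE_ROOT)` after the run should list the same entries as the `find` output, as paths relative to the pipeline root.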