Tutorial: TensorFlow Lite

TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and edge devices.

Key features

  • Optimized for on-device machine learning, by addressing 5 key constraints: latency (there’s no round-trip to a server), privacy (no personal data leaves the device), connectivity (internet connectivity is not required), size (reduced model and binary size) and power consumption (efficient inference and a lack of network connections).
  • Multiple platform support, covering Android and iOS devices, embedded Linux, and microcontrollers.
  • Diverse language support, which includes Java, Swift, Objective-C, C++, and Python.
  • High performance, with hardware acceleration and model optimization.
  • End-to-end examples, for common machine learning tasks such as image classification, object detection, pose estimation, question answering, text classification, etc. on multiple platforms.

Key Point: The TensorFlow Lite binary is ~1MB when all 125+ supported operators are linked (for 32-bit ARM builds), and less than 300KB when using only the operators needed for supporting the common image classification models InceptionV3 and MobileNet.

Development workflow

The following guide walks through each step of the workflow and provides links to further instructions:

Note: Refer to the performance best practices guide for an ideal balance of performance, model size, and accuracy.

1. Generate a TensorFlow Lite model

A TensorFlow Lite model is stored in an efficient, portable format based on FlatBuffers and identified by the .tflite file extension. Compared with TensorFlow’s protocol buffer model format, this format offers reduced size (small code footprint) and faster inference (data is accessed directly, without an extra parsing or unpacking step), which lets TensorFlow Lite execute efficiently on devices with limited compute and memory resources.
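Because the format is FlatBuffers-based, a .tflite buffer can be recognized without parsing it. The sketch below is a heuristic check only, not a validator. It assumes the standard FlatBuffers layout (a little-endian root offset in bytes 0-3, followed by an optional 4-character file identifier in bytes 4-7) and the "TFL3" identifier declared in the TensorFlow Lite schema. The function name looks_like_tflite is hypothetical.

```python
import struct

# File identifier declared in the TensorFlow Lite FlatBuffers schema.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(data: bytes) -> bool:
    """Heuristically check whether a buffer looks like a .tflite FlatBuffer.

    This inspects the 8-byte header only; it does not verify the model.
    """
    if len(data) < 8:
        return False
    # Bytes 0-3: little-endian uint32 offset to the root table.
    (root_offset,) = struct.unpack_from("<I", data, 0)
    # Bytes 4-7: the 4-character file identifier.
    has_identifier = data[4:8] == TFLITE_FILE_IDENTIFIER
    # The root table must start somewhere inside the buffer.
    return has_identifier and root_offset < len(data)
```

A typical use would be to read a file with open(path, "rb").read() and pass the bytes to the function before handing the file to an interpreter.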

A TensorFlow Lite model can optionally include metadata, which contains a human-readable model description and machine-readable data for automatically generating pre- and post-processing pipelines during on-device inference. Refer to Add metadata for more details.

You can generate a TensorFlow Lite model in the following ways:

  • Use an existing TensorFlow Lite model: Refer to TensorFlow Lite Examples to pick an existing model. Models may or may not contain metadata.
  • Create a TensorFlow Lite model: Use the TensorFlow Lite Model Maker to create a model with your own custom dataset. By default, all models contain metadata.
  • Convert a TensorFlow model into a TensorFlow Lite model: Use the TensorFlow Lite Converter to convert a TensorFlow model into a TensorFlow Lite model. During conversion, you can apply optimizations such as quantization to reduce model size and latency with minimal or no loss in accuracy. By default, converted models do not contain metadata.
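To give a feel for why quantization shrinks a model, here is a minimal pure-Python sketch of 8-bit affine quantization, where a real value is approximated as scale * (q - zero_point). It is an illustration under simplifying assumptions (one scale and zero point for a whole list of weights), not the converter's actual implementation. All function names are hypothetical.

```python
def quantize_params(values, qmin=-128, qmax=127):
    """Pick a scale and zero point so the range of `values` maps onto int8."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to 1.0 when every value is zero to avoid dividing by zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers clamped into the int8 range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in quantized]


weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
scale, zero_point = quantize_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
# Largest round-trip error introduced by storing int8 instead of float32.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now fits in one byte instead of the four bytes of a float32, at the cost of a round-trip error that stays within one quantization step (scale).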

2. Run inference

Inference refers to the process of executing a TensorFlow Lite model on-device to make predictions based on input data. You can run inference in the following ways based on the model type:

On Android and iOS devices, you can improve performance using hardware acceleration. On either platform you can use a GPU Delegate. On Android you can also use the NNAPI Delegate (for newer devices) or the Hexagon Delegate (for older devices), and on iOS you can use the Core ML Delegate. To add support for new hardware accelerators, you can define your own delegate.
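To make the two workflow steps concrete, the following sketch converts a model and runs inference with the Python Interpreter API on the CPU. It assumes the tensorflow and numpy packages are installed, and it uses a tiny untrained Keras model purely as a stand-in for a real one. The helper names build_tflite_model and run_inference are hypothetical.

```python
def build_tflite_model() -> bytes:
    """Convert a tiny, untrained Keras model into TensorFlow Lite bytes."""
    import tensorflow as tf  # imported lazily: TensorFlow is a heavy dependency

    # Stand-in model (an assumption for illustration): 4 inputs, 2 outputs.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(2),
    ])
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Request the default optimizations, which include quantization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


def run_inference(model_bytes: bytes, batch):
    """Run one batch through a TensorFlow Lite model held in memory."""
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_content=model_bytes)
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]
    # Match the dtype the model expects for its input tensor.
    data = np.asarray(batch, dtype=input_detail["dtype"])
    interpreter.set_tensor(input_detail["index"], data)
    interpreter.invoke()
    return interpreter.get_tensor(output_detail["index"])
```

Calling run_inference(build_tflite_model(), [[0.1, 0.2, 0.3, 0.4]]) returns an array with one row of two values. The Interpreter constructor also accepts an experimental_delegates argument, which is where a delegate loaded with tf.lite.experimental.load_delegate would be passed; the sketch leaves it out so that it runs on a plain CPU.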