Run inference with TensorFlow Lite in Python

By default, a model that's compiled for the Edge TPU fails when you run it with the TensorFlow Lite API, because TensorFlow Lite doesn't know how to execute the custom operator in the compiled model. To make it work, you need to make a few changes to the code that performs inference. This page shows you how to do that using Python.

If you want to use C++, read Run inference with TensorFlow Lite in C++ instead.

Note: This page is intended for developers with experience using the TensorFlow Lite APIs. If you don't have any experience with TensorFlow and aren't ready to take it on, you can instead use our Edge TPU Python API, which simplifies the code required to perform an inference with image classification and object detection models.

To execute your model on the Edge TPU with the Python TensorFlow Lite API, we've implemented a TensorFlow Lite delegate. A delegate is a TensorFlow Lite mechanism that handles certain operations in the model graph. In this case, our delegate handles the Edge TPU custom operator.
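To make the delegate idea concrete, here is a toy model in plain Python. It is not the TensorFlow Lite delegate API and does not touch the Edge TPU; it only illustrates how a delegate claims the graph operations it supports while the interpreter runs the rest itself. The op name "edgetpu-custom-op", the CPU op set, and the names ToyDelegate and plan_execution() are assumptions chosen for illustration.

```python
# Conceptual toy model only: this is NOT the TensorFlow Lite delegate API.
# It illustrates how a delegate claims the graph ops it supports while the
# interpreter runs the remaining ops with its own built-in kernels.
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed name of the custom op that the Edge TPU Compiler inserts.
EDGETPU_CUSTOM_OP = "edgetpu-custom-op"

# Hypothetical subset of ops the interpreter can run on the CPU by itself.
CPU_BUILTIN_OPS = {"QUANTIZE", "DEQUANTIZE"}


@dataclass(frozen=True)
class ToyDelegate:
    """Hypothetical delegate that claims only the ops it knows how to run."""
    supported_ops: Tuple[str, ...] = (EDGETPU_CUSTOM_OP,)

    def claims(self, op: str) -> bool:
        return op in self.supported_ops


def plan_execution(graph: List[str],
                   delegate: Optional[ToyDelegate] = None) -> List[Tuple[str, str]]:
    """Map each op to 'delegate' or 'cpu'; fail on ops nobody can run."""
    plan = []
    for op in graph:
        if delegate is not None and delegate.claims(op):
            plan.append((op, "delegate"))
        elif op in CPU_BUILTIN_OPS:
            plan.append((op, "cpu"))
        else:
            # Stands in for the error TensorFlow Lite raises when a model
            # contains a custom op with no registered implementation.
            raise ValueError(f"no kernel registered for op {op!r}")
    return plan
```

Calling plan_execution(["QUANTIZE", EDGETPU_CUSTOM_OP, "DEQUANTIZE"]) without a delegate raises ValueError on the custom op, which is the toy equivalent of the failure described at the top of this page. Passing ToyDelegate() routes that op to the delegate and leaves the other two ops on the CPU.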

To use the Edge TPU delegate, follow these steps:

  1. Update to the latest Edge TPU runtime.

    • If you're using the USB Accelerator or M.2/PCIe Accelerator, you can update the Edge TPU runtime as follows (this assumes you have previously set up your device on this host):

      sudo apt-get update
      sudo apt-get install libedgetpu1-std

      # Or, if you prefer the maximum operating frequency:
      # sudo apt-get install libedgetpu1-max
    • If you're using the Dev Board or System-on-Module (with Mendel), update the runtime and other software on the board like this (this assumes you have previously set up your Dev Board):

      sudo apt-get update
      sudo apt-get dist-upgrade
  2. Make sure you have the latest version of the TensorFlow Lite API.

    Open the Python file where you run inference with the TensorFlow Lite Interpreter API (see the example).

    If your code imports the Interpreter class from the tensorflow package, then you must use TensorFlow 1.15 or higher because load_delegate() is not available in older releases (see how to update TensorFlow).

    However, we recommend that you instead use the tflite_runtime package. This is a much smaller package that includes the Interpreter class and load_delegate(). To install the tflite_runtime package, follow the TensorFlow Lite Python quickstart.

  3. Now load the delegate for the Edge TPU when constructing the Interpreter.

    For example, your TensorFlow Lite code should have a line that looks like this:

    interpreter = Interpreter(model_path)

    So change it to this:

    interpreter = Interpreter(model_path,
      experimental_delegates=[load_delegate('libedgetpu.so.1')])

    This requires one additional import at the top of the file:

    # If you're using the tflite_runtime package:
    from tflite_runtime.interpreter import load_delegate

    # Or if you're using the full TensorFlow package:
    from tensorflow.lite.python.interpreter import load_delegate

    Note: The libedgetpu.so.1 delegate library is included with the Edge TPU runtime you installed in step 1.
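Steps 2 and 3 can be combined into one helper. The sketch below prefers the tflite_runtime package and falls back to the full TensorFlow package, then builds the Interpreter with the Edge TPU delegate and allocates its tensors. The helper names edgetpu_library_name() and make_interpreter() are hypothetical, and the per-OS library file names (libedgetpu.so.1 on Linux, libedgetpu.1.dylib on macOS, edgetpu.dll on Windows) are assumptions you should check against your installed runtime.

```python
import platform

# Assumed Edge TPU runtime library file names, keyed by platform.system().
# Verify these against the runtime installed on your machine.
_EDGETPU_SHARED_LIB = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}


def edgetpu_library_name(system=None):
    """Return the Edge TPU runtime library file name for an OS.

    `system` takes platform.system() values ("Linux", "Darwin", "Windows")
    and defaults to the current machine.
    """
    system = system or platform.system()
    try:
        return _EDGETPU_SHARED_LIB[system]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {system}") from None


def make_interpreter(model_path):
    """Create an Interpreter that delegates the Edge TPU custom op."""
    # Import lazily so this module loads even if neither package is installed.
    # Prefer the small tflite_runtime package; fall back to full TensorFlow.
    try:
        from tflite_runtime.interpreter import Interpreter, load_delegate
    except ImportError:
        from tensorflow.lite.python.interpreter import Interpreter, load_delegate
    # load_delegate() raises ValueError if the library cannot be loaded.
    delegate = load_delegate(edgetpu_library_name())
    interpreter = Interpreter(model_path, experimental_delegates=[delegate])
    interpreter.allocate_tensors()
    return interpreter
```

For example, make_interpreter("model_edgetpu.tflite") returns a ready-to-use interpreter, where the file name is a placeholder for your own compiled model. Importing lazily inside the function keeps the choice between the two packages in one place.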

That's it. Your code is now set up so that when you run inference with a model that's compiled for the Edge TPU, TensorFlow Lite delegates the compiled portions of the graph to the Edge TPU, while any remaining operations run on the CPU.
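Once the interpreter is built with the delegate, inference itself uses the standard Interpreter methods; nothing Edge TPU-specific is needed at this point. A minimal sketch, assuming a model with one input and one output tensor and an input array that already matches the model's expected shape and dtype (run_inference() is a hypothetical helper name):

```python
def run_inference(interpreter, input_data):
    """Feed one input tensor, invoke the model, and return the first output.

    Assumes `interpreter` already has its tensors allocated and that
    `input_data` matches get_input_details()[0]["shape"] and ["dtype"].
    """
    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, input_data)
    # invoke() runs the whole graph; delegated ops execute on the Edge TPU.
    interpreter.invoke()
    output_index = interpreter.get_output_details()[0]["index"]
    # get_tensor() returns a copy, so the result stays valid after later calls.
    return interpreter.get_tensor(output_index)
```

To build a suitable input, read the "shape" and "dtype" entries of interpreter.get_input_details()[0]; for a fully quantized Edge TPU model the dtype is usually an 8-bit integer type.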

If you want to run multiple models at once, read how to run multiple models with multiple Edge TPUs.