July 2019 Updates

New on-device transfer learning APIs, post-training quantization support, and a new TF Lite delegate

The Coral Team
July 24, 2019

We've recently released the following updates.

Updated Edge TPU Compiler and runtime

The compiler has been updated to version 2.0. It now supports models built using post-training quantization, but only when using full integer quantization (previously, we required quantization-aware training). It also fixes a few bugs.

You can get the new Edge TPU Compiler as follows:

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

sudo apt-get update

sudo apt-get upgrade edgetpu

And an updated compiler means there's an updated Edge TPU runtime: now version 12. You should always update your runtime version to match the compiler version.

If you're using the Dev Board, you can update the runtime along with other Mendel system software like this:

sudo apt-get update

sudo apt-get dist-upgrade

If you're using the USB Accelerator, see the next section to update the runtime.

Updated Edge TPU Python library

We've updated the Edge TPU Python library to version 2.11.1, which includes some new and updated APIs as described below.

If you're using the Dev Board, you'll get the new Python library when you upgrade the system software as described above.

If you're using the USB Accelerator, you can upgrade the library (and the Edge TPU runtime) the same way you first installed the library:

wget https://dl.google.com/coral/edgetpu_api/edgetpu_api_latest.tar.gz \
  -O edgetpu_api.tar.gz --trust-server-names

tar xzf edgetpu_api.tar.gz

cd edgetpu_api

bash ./install.sh

By the way, you can check your current library version with this command:

python3 -c "import edgetpu; print(edgetpu.__version__)"
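If you'd rather check the version from a script, the comparison can be sketched as follows. This is a minimal sketch, not part of the Edge TPU library: the helper names (parse_version, is_up_to_date, installed_edgetpu_version) are hypothetical, and it assumes the version string is purely numeric and dot-separated, as in "2.11.1". Comparing tuples of integers avoids the trap of comparing strings, where "2.9.0" would sort after "2.11.1".

```python
import importlib

MIN_VERSION = "2.11.1"  # the library version named in this post


def parse_version(text):
    """Turn a dotted version string such as '2.11.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_up_to_date(installed, required=MIN_VERSION):
    """Return True if `installed` is at least `required`."""
    return parse_version(installed) >= parse_version(required)


def installed_edgetpu_version():
    """Return edgetpu.__version__, or None if the library is not installed."""
    try:
        module = importlib.import_module("edgetpu")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = installed_edgetpu_version()
    if version is None:
        print("edgetpu library not found")
    elif is_up_to_date(version):
        print("edgetpu", version, "is up to date")
    else:
        print("edgetpu", version, "is older than", MIN_VERSION)
```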

New on-device backpropagation API

With version 2.11.1 of the Python library, we've added the SoftmaxRegression API, which allows you to perform transfer learning on the last layer of an image classification model. An instance of SoftmaxRegression stands in as the fully-connected layer with a softmax activation function, which performs the final classification for the model. Because this part of the graph runs on the device CPU, we can train the weights of the final layer using stochastic gradient descent (SGD). The rest of the model, which accounts for the vast majority of the computation, still runs on the Edge TPU, so training on-device stays fast.

To learn how to use it, read Retrain a classification model on-device with backpropagation.
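To make the idea concrete, here is a minimal sketch of a softmax layer trained with SGD, in plain Python. It is not the SoftmaxRegression API and does not use the Edge TPU: the class TinySoftmaxRegression, the train helper, and the 2-D "embeddings" are all made up for illustration. In the real workflow, the feature vectors would be the embeddings produced by the rest of the model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinySoftmaxRegression:
    """A fully-connected layer plus softmax, trained with plain SGD."""

    def __init__(self, feature_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # One weight row per class, initialized with small random values.
        self.weights = [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
                        for _ in range(num_classes)]
        self.biases = [0.0] * num_classes

    def predict_proba(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.biases)]
        return softmax(logits)

    def predict(self, features):
        probs = self.predict_proba(features)
        return probs.index(max(probs))

    def train_step(self, features, label, learning_rate):
        """Apply one SGD update from a single (features, label) example."""
        probs = self.predict_proba(features)
        for cls, row in enumerate(self.weights):
            # Gradient of cross-entropy w.r.t. each logit: p - one_hot(label).
            grad = probs[cls] - (1.0 if cls == label else 0.0)
            for i, x in enumerate(features):
                row[i] -= learning_rate * grad * x
            self.biases[cls] -= learning_rate * grad


def train(model, examples, epochs=50, learning_rate=0.5, seed=0):
    """Shuffle the examples each epoch and update on one example at a time."""
    rng = random.Random(seed)
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for features, label in examples:
            model.train_step(features, label, learning_rate)


# Made-up 2-D "embeddings" for two classes (illustration only).
EXAMPLES = [
    ([1.0, 0.1], 0), ([0.9, 0.2], 0), ([0.8, 0.0], 0),
    ([0.1, 1.0], 1), ([0.2, 0.9], 1), ([0.0, 0.8], 1),
]

model = TinySoftmaxRegression(feature_dim=2, num_classes=2)
train(model, EXAMPLES)
print(model.predict([1.0, 0.0]), model.predict([0.0, 1.0]))
```

Only the small weight matrix and bias vector are updated here, which is why training just the last layer is cheap enough to run on a CPU.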

Updated weight imprinting API

Prior to the new backpropagation API, we offered another version of on-device transfer learning called weight imprinting with the ImprintingEngine API. In version 2.11.1, we've completely rebuilt this API to improve the accuracy and allow you to keep previously learned classes. This update requires a new model architecture for the input model, so if you've used the previous version of ImprintingEngine, you'll need to update your base model and make some minor code changes. It should be a simple upgrade for most users, and we believe the payoff is well worth it. The new ImprintingEngine allows you to quickly retrain existing classes or add new ones while leaving other classes alone. You can now even keep the classes from the pre-trained base model.

To learn more, read Retrain a classification model on-device with weight imprinting.
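The general idea behind weight imprinting can be sketched in a few lines of plain Python. This is an illustration of the technique only, not the ImprintingEngine API: the TinyImprinter class and the 3-D "embeddings" are hypothetical, and folding an existing class weight back into the average as one extra sample is a simplifying assumption. Each class weight is the normalized mean of that class's normalized embeddings, and classification picks the class whose weight has the highest cosine similarity to the input.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


class TinyImprinter:
    """Keeps one unit-length weight vector per class."""

    def __init__(self):
        self.class_weights = {}  # label -> unit-length weight vector

    def imprint(self, label, embeddings):
        """Add or update `label` from example embeddings.

        Weights of all other classes are left untouched.
        """
        normalized = [l2_normalize(e) for e in embeddings]
        if label in self.class_weights:
            # Simplification: treat the existing weight as one more sample.
            normalized.append(self.class_weights[label])
        dim = len(normalized[0])
        mean = [sum(vec[i] for vec in normalized) / len(normalized)
                for i in range(dim)]
        self.class_weights[label] = l2_normalize(mean)

    def classify(self, embedding):
        """Return the label whose weight is most similar to `embedding`."""
        query = l2_normalize(embedding)
        scores = {label: sum(w * q for w, q in zip(weights, query))
                  for label, weights in self.class_weights.items()}
        return max(scores, key=scores.get)


# Made-up 3-D "embeddings" (illustration only).
imprinter = TinyImprinter()
imprinter.imprint("cat", [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
imprinter.imprint("dog", [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
cat_before = list(imprinter.class_weights["cat"])

# Adding a new class does not modify the existing ones.
imprinter.imprint("bird", [[0.0, 0.1, 1.0]])
print(imprinter.classify([0.9, 0.1, 0.0]))
```

Because a class needs only a handful of embeddings and no gradient steps, adding or retraining a class this way is very quick.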

New TensorFlow Lite delegate for Edge TPU

Until now, accelerating your model with the Edge TPU required that you write code using either our Edge TPU Python API or C++ API. But now you can accelerate your model on the Edge TPU when using the TensorFlow Lite interpreter API, because we've released a TensorFlow Lite delegate for the Edge TPU.

The TensorFlow Lite Delegate API is an experimental feature in TensorFlow Lite that allows for the TensorFlow Lite interpreter to delegate part or all of graph execution to another executor—in this case, the other executor is the Edge TPU.

To use the Edge TPU delegate, follow these steps:

  1. Update to the latest Edge TPU Python library as described above. (Although you won't need the Edge TPU library, this package is currently the only way to install the Edge TPU delegate, because it's packaged inside the libedgetpu.so file.)
  2. Open a TensorFlow Lite inferencing program, such as the label_image.py example, and add an additional parameter to specify the delegate when constructing the Interpreter. For example, the line currently looks like this:

    interpreter = Interpreter(model_path=args.model_file)

    So change it to this:

    interpreter = Interpreter(model_path=args.model_file,
      experimental_delegates=[load_delegate('libedgetpu.so.1.0')])

    Which requires one additional import at the top:


    # If you're using the tflite_runtime package:
    from tflite_runtime.interpreter import load_delegate

    # Or if you're using the full TensorFlow package:
    from tensorflow.lite.python.interpreter import load_delegate

    Note: If you're using the tensorflow package, beware that load_delegate is not available in the TensorFlow 1.14 release—until there is a new release, you must use a nightly build. However, we recommend that you instead switch to the tflite_runtime package by following the TensorFlow Lite Python quickstart.

That's it. Your code should be all set: when you run inference on a model that's compiled for the Edge TPU, TensorFlow Lite delegates the compiled portions of the graph to the Edge TPU.
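Putting the steps above together, a complete inference call might look like the following sketch. It assumes the tflite_runtime package, a classification model with a single input and a single output of shape [1, N], and input data already prepared as a NumPy array with the model's expected shape and dtype. The function names classify and top_k are hypothetical helpers, not part of any library.

```python
def top_k(scores, k=3):
    """Return the k (index, score) pairs with the highest scores."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify(model_path, input_data, delegate_lib="libedgetpu.so.1.0", k=3):
    """Run one inference through the Edge TPU delegate; return the top-k classes.

    `input_data` must be a NumPy array matching the model's input shape
    and dtype. `model_path` must point to a model compiled for the Edge TPU.
    """
    # Imported here so top_k() can be used without tflite_runtime installed.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    interpreter = Interpreter(
        model_path=model_path,
        experimental_delegates=[load_delegate(delegate_lib)])
    interpreter.allocate_tensors()

    # Assumption: the model has exactly one input and one output tensor.
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    interpreter.set_tensor(input_index, input_data)
    interpreter.invoke()

    # Assumption: the output has shape [1, N]; drop the batch dimension.
    scores = interpreter.get_tensor(output_index)[0]
    return top_k(scores.tolist(), k)
```

The only Edge TPU-specific part is the experimental_delegates argument. The rest is the standard TensorFlow Lite interpreter flow, so existing code should need little else changed.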

New image classification models

Coinciding with the updated Edge TPU runtime comes a new model architecture designed by the Edge TPU and AutoML teams. This model is based upon the EfficientNet architecture to achieve the accuracy of a server-side model in a compact size that's optimized for low latency on the Edge TPU.

The new model is Efficientnet-EdgeTpu, and it's available in three sizes: small (S), medium (M), and large (L). The sizes refer to the model's graph size, which in turn corresponds to the supported input size. Increased size brings increased accuracy, but also increased latency, so the small model is the fastest but least accurate.

To get the pre-trained models and learn how to train with your own data, see the Efficientnet-EdgeTpu GitHub repo.

That's all for now. As always, please send us feedback at coral-support@google.com.