TensorFlow models on the Edge TPU
To provide high-speed neural network performance at a low power cost, the Edge TPU supports a specific set of neural network operations and architectures. This page describes which types of models are compatible with the Edge TPU, how you can create them, and a bit about how to run them.
Compatibility overview
The Edge TPU is capable of executing deep feed-forward neural networks such as convolutional neural networks (CNN). It supports only TensorFlow Lite models that are fully 8-bit quantized and then compiled specifically for the Edge TPU.
If you're not familiar with TensorFlow Lite, it's a lightweight version of TensorFlow designed for mobile and embedded devices. It achieves low-latency inference in a small binary size: both the TensorFlow Lite models and the interpreter kernels are much smaller. You cannot train a model directly with TensorFlow Lite; instead you must convert your model from a TensorFlow file (such as a .pb file) to a TensorFlow Lite file (a .tflite file), using the TensorFlow Lite converter.
TensorFlow supports a model optimization technique called quantization, which is required by the Edge TPU. Quantizing your model means converting all the 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This makes the model smaller and faster. Although these 8-bit representations can be less precise, the inference accuracy of the neural network is not significantly affected.
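To make the fixed-point mapping concrete, here is a minimal sketch of the affine quantization arithmetic that 8-bit quantization is based on. The `scale` and `zero_point` names follow TensorFlow Lite's convention, but the specific values below are made up for illustration:

```python
def quantize(x, scale, zero_point):
    """Map a float32 value to the nearest 8-bit fixed-point value."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))          # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    """Recover an approximate float32 value from its 8-bit code."""
    return (q - zero_point) * scale

# Example: represent values in roughly [-1.0, 1.0] with scale 2/255.
scale, zero_point = 2.0 / 255.0, 128
q = quantize(0.5, scale, zero_point)    # -> 192
x = dequantize(q, scale, zero_point)    # close to 0.5, small rounding error
```

The round trip shows why accuracy is usually preserved: each value moves by at most half a quantization step, which is small relative to the typical range of weights and activations.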
Figure 1 illustrates the basic process to create a model that's compatible with the Edge TPU. Most of the workflow uses standard TensorFlow tools. Once you have a TensorFlow Lite model, you then use our Edge TPU compiler to create a .tflite file that's compatible with the Edge TPU.
However, you don't need to follow this whole process to create a good model for the Edge TPU. Instead, you can leverage existing TensorFlow models that are compatible with the Edge TPU by retraining them with your own dataset. For example, MobileNet is a popular image classification/detection model architecture that's compatible with the Edge TPU. We've created several versions of this model that you can use as a starting point to create your own model that recognizes different objects. To get started, see the section below about how to retrain an existing model with transfer learning.
But if you have designed—or plan to design—your own model from scratch, then you should read the next section about model requirements.
Model requirements
If you want to build your own TensorFlow model that takes full advantage of the Edge TPU at runtime, it must meet the following requirements:
- Tensor parameters are quantized (8-bit fixed-point numbers). You must use either quantization-aware training (recommended) or full integer post-training quantization.
- Tensor sizes are constant at compile-time (no dynamic sizes).
- Model parameters (such as bias tensors) are constant at compile-time.
- Tensors are 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
- The model uses only the operations supported by the Edge TPU (see table 1 below).
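The tensor-dimension rule can be stated as a simple predicate. The helper below is hypothetical (not part of any Coral API) and exists only to illustrate which shapes pass:

```python
def edge_tpu_shape_ok(shape):
    """Illustrative check of the shape rule above: if a tensor has more
    than 3 dimensions, only the 3 innermost may have a size greater
    than 1. Hypothetical helper, not a Coral API."""
    return all(dim == 1 for dim in shape[:-3])

print(edge_tpu_shape_ok((1, 224, 224, 3)))   # True: only the batch dim is outside, and it's 1
print(edge_tpu_shape_ok((2, 224, 224, 3)))   # False: the 4th-innermost dim is 2
print(edge_tpu_shape_ok((128, 128)))         # True: tensors with 3 or fewer dims always pass
```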
If your model does not meet these requirements entirely, it can still compile, but only a portion of the model will execute on the Edge TPU. At the first point in the model graph where an unsupported operation occurs, the compiler partitions the graph into two parts. The first part of the graph that contains only supported operations is compiled into a custom operation that executes on the Edge TPU, and everything else executes on the CPU, as illustrated in figure 2.
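The partitioning behavior can be sketched in a few lines. This toy function splits a linear sequence of ops at the first unsupported one; the real compiler works on the full TFLite graph, and the op subset here is just a sample from table 1:

```python
# Toy subset of the supported operations in table 1 (illustrative only).
SUPPORTED = {"Conv2d", "DepthwiseConv2d", "Relu", "MaxPool2d", "Reshape"}

def partition(ops):
    """Split an op sequence at the first unsupported op, mimicking how
    the compiler maps a leading supported segment to the Edge TPU and
    leaves the rest on the CPU. Illustrative sketch, not the compiler."""
    for i, op in enumerate(ops):
        if op not in SUPPORTED:
            return ops[:i], ops[i:]      # (Edge TPU segment, CPU segment)
    return ops, []                       # fully supported: everything on the Edge TPU

tpu_part, cpu_part = partition(["Conv2d", "Relu", "Cast", "Conv2d"])
# tpu_part == ["Conv2d", "Relu"]; cpu_part == ["Cast", "Conv2d"]
```

Note that even the second "Conv2d" stays on the CPU: once the graph is cut at the first unsupported op, nothing after the cut runs on the Edge TPU.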
If you inspect your compiled model (with a tool such as visualize.py), you'll see that it's still a TensorFlow Lite model, except it now has a custom operation at the beginning of the graph. This custom operation is the only part of your model that is actually compiled; it contains all the operations that run on the Edge TPU. The rest of the graph (beginning with the first unsupported operation) remains the same and runs on the CPU.
If part of your model executes on the CPU, you should expect a significantly degraded inference speed compared to a model that executes entirely on the Edge TPU. We cannot predict how much slower your model will perform in this situation, so you should experiment with different architectures and strive to create a model that is 100% compatible with the Edge TPU. That is, your compiled model should contain only the Edge TPU custom operation.
Table 1. Operations supported by the Edge TPU, with any known limitations

| Operation name | Known limitations |
| --- | --- |
| Logistic | |
| Relu | |
| Relu6 | |
| ReluN1To1 | |
| Tanh | |
| Add | |
| Maximum | |
| Minimum | |
| Mul | |
| Sub | |
| AveragePool2d | No fused activation function. |
| Concatenation | No fused activation function. If any input is a compile-time constant tensor, there must be only 2 inputs, and this constant tensor must be all zeroes (effectively, a zero-padding op). |
| Conv2d | |
| DepthwiseConv2d | Dilated conv kernels are not supported. |
| FullyConnected | Only the default format is supported for fully-connected weights. Output tensor is one-dimensional. |
| L2Normalization | |
| MaxPool2d | No fused activation function. |
| Mean | Supports reduction along the x- and/or y-dimensions only. |
| Pad | Supports padding along the x- and/or y-dimensions only. |
| Reshape | |
| ResizeBilinear | Input/output is a 3-dimensional tensor. Depending on input/output size, this operation may not be mapped to the Edge TPU to avoid loss in precision. |
| ResizeNearestNeighbor | Input/output is a 3-dimensional tensor. Depending on input/output size, this operation may not be mapped to the Edge TPU to avoid loss in precision. |
| Slice | |
| Softmax | Supports only a 1-dimensional input tensor with a max of 16,000 elements. |
| SpaceToDepth | |
| Split | |
| Squeeze | Supported only when the squeezed dimensions are leading 1s (that is, no relayout is needed). For example, an input tensor with [y][x][z] = 1,1,10 or 1,5,10 is OK, but [y][x][z] = 5,1,10 is not supported. |
| StridedSlice | Supported only when all strides are equal to 1 (that is, effectively a Slice op), and with ellipsis-axis-mask == 0 and new-axis-mask == 0. |
Transfer learning
Instead of building your own model to conform to the above requirements and then train it from scratch, you can retrain an existing model that's already compatible with the Edge TPU, using a technique called transfer learning (sometimes also called "fine tuning").
Training a neural network from scratch (when it has no computed weights or biases) can take days' worth of computing time and requires a vast amount of training data. But transfer learning allows you to start with a model that's already trained for a related task and then perform further training to teach the model new classifications using a smaller training dataset. You can do this by retraining the whole model (adjusting the weights across the whole network), but you can also achieve very accurate results by simply removing the final layer that performs classification and training a new layer on top that recognizes your new classes.
Using this process, with sufficient training data and some adjustments to the hyperparameters, you can create a highly accurate TensorFlow model in a single sitting. Once you're happy with the model's performance, simply convert it to TensorFlow Lite and then compile it for the Edge TPU. And because the model architecture doesn't change during transfer learning, you know it will fully compile for the Edge TPU (assuming you start with a compatible model).
If you're already familiar with transfer learning, check out our Edge TPU-compatible models that you can use as a starting point to create your own model. Just click to download "All model files" to get the TensorFlow model and pretrained checkpoints you need to begin transfer learning.
If you're new to this technique and want to quickly see some results, try the following tutorials that simplify the process to retrain a MobileNet model with new classes:
Transfer learning on-device
If you're using an image classification model, you can also perform accelerated transfer learning on the Edge TPU. Our Python API offers two different techniques for on-device transfer learning:
- Weight imprinting on the last layer (ImprintingEngine)
- Backpropagation on the last layer (SoftmaxRegression)
In both cases, you must provide a model that's specially designed to allow training on the last layer. The required model structure is different for each API, but the result is basically the same: the last fully-connected layer where classification occurs is separated from the base of the graph. Then only the base of the graph is compiled for the Edge TPU, which leaves the weights in the last layer accessible for training. More detail about the model architecture is available in the corresponding documents below. For now, let's compare how retraining works for each technique:

Weight imprinting takes the output (the embedding vectors) from the base model, adjusts the activation vectors with L2-normalization, and uses those values to compute new weights in the final layer—it averages the new vectors with those already in the last layer's weights. This allows for effective training of new classes with very few sample images.
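The core arithmetic of weight imprinting can be sketched with NumPy. This is an illustrative reconstruction of the idea described above, not the Coral ImprintingEngine API:

```python
import numpy as np

def imprint_class(weights, embedding, class_id, n_seen):
    """Illustrative weight imprinting (not the Coral ImprintingEngine API):
    L2-normalize a new embedding vector, average it into the existing
    weight vector for its class, and keep the result unit-length."""
    e = embedding / np.linalg.norm(embedding)
    w = (weights[class_id] * n_seen + e) / (n_seen + 1)
    weights[class_id] = w / np.linalg.norm(w)
    return weights

# First sample for class 0: its normalized embedding becomes the class weights.
weights = np.zeros((2, 4))
weights = imprint_class(weights, np.array([3.0, 4.0, 0.0, 0.0]),
                        class_id=0, n_seen=0)
```

Because each new sample is simply averaged in, a class can be learned from a handful of images, and adding a class is just adding a new weight row.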

Backpropagation is an abbreviated version of traditional backpropagation. Instead of backpropagating new weights to all layers in the graph, it updates only the fully-connected layer at the end of the graph with new weights. This is the more traditional training strategy that generally achieves higher accuracy, but it requires more images and multiple training iterations.
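Last-layer-only backpropagation amounts to fitting a softmax classifier on the frozen base model's embeddings. The following NumPy sketch shows that idea with plain gradient descent; it is an illustration of the technique, not the Coral SoftmaxRegression API:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_last_layer(embeddings, labels, num_classes, lr=0.1, steps=200):
    """Fit only the final fully-connected layer on frozen embeddings
    via cross-entropy gradient descent (illustrative sketch, not the
    Coral SoftmaxRegression API)."""
    n, d = embeddings.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        probs = softmax(embeddings @ W + b)
        grad = (probs - onehot) / n        # gradient of mean cross-entropy
        W -= lr * embeddings.T @ grad      # update last-layer weights only
        b -= lr * grad.sum(axis=0)
    return W, b
```

Unlike imprinting, this loops over the data for multiple iterations, which is why it benefits from larger training sets and generally reaches higher accuracy.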
When choosing between these training techniques, you might consider the following factors:

Training sample size: Weight imprinting is more effective if you have a relatively small set of training samples: anywhere from 1 to 200 sample images for each class (as few as 5 can be effective and the API sets a maximum of 200). If you have more samples available for training, you'll likely achieve higher accuracy by using them all with backpropagation.

Training sample variance: Backpropagation is more effective if your dataset includes large intra-class variance. That is, if the images within a given class show the subject in significantly different ways, such as in angle or size, then backpropagation probably works better. But if your application operates in an environment where such variance is low, and your training samples thus also have little intra-class variance, then weight imprinting can work very well.

Adding new classes: Only weight imprinting allows you to add new classes to the model after you've begun training. If you're using backpropagation, adding a new class after you've begun training requires that you restart training for all classes. Additionally, weight imprinting allows you to retain the classes from the pretrained model (those trained before converting the model for the Edge TPU); whereas backpropagation requires all classes to be learned on-device.

Model compatibility: Backpropagation is compatible with more model architectures "out of the box"; you can convert existing, pretrained MobileNet and Inception models into embedding extractors that are compatible with on-device backpropagation. To use weight imprinting, you must use a model with some very specific layers and then train it in a particular manner before using it for on-device training (currently, we offer a version of MobileNet v1 with the proper modifications).
In both cases, the vast majority of the training process is accelerated by the Edge TPU. And when performing inferences with the retrained model, the Edge TPU accelerates everything except the final classification layer, which runs on the CPU. But because this last layer accounts for only a small portion of the model, running this last layer on the CPU should not significantly affect your inference speed.
To learn more about each technique and try some sample code, see the following pages:
- Retrain a classification model on-device with weight imprinting
- Retrain a classification model on-device with backpropagation
Edge TPU runtime and APIs
To execute your model on the Edge TPU, you need the Edge TPU runtime and API library installed on the host system.
If you're using the Coral Dev Board or SoM, the Edge TPU runtime and API library is already provided in the Mendel operating system. If you're using an accessory device such as the Coral USB Accelerator, you must install both onto the host computer—only Debian-based operating systems are currently supported (see the setup instructions).
From your host system, you can perform an inference using one of the following APIs provided with the Edge TPU library:
- Edge TPU C++ API: This is just a small extension (edgetpu.h) for the TensorFlow Lite C++ API, so you'll mostly be using the latter to execute an inference with your model.
- Edge TPU Python API: This is a wrapper for the C++ API that adds several convenience APIs not included in C++, such as ClassificationEngine, which allows you to perform image classification by simply providing a compiled .tflite model and the image you want to classify.