For more numbers like the ones above, see our benchmarks page.
As part of Google Research, our team works with other machine learning teams at Google to help build the next generation of neural networks for low-power devices. We're making constant progress with TensorFlow tools that optimize models for embedded devices, and with new neural network architectures specially designed to deliver fast inference in a small package.
For example, the new EfficientNet-EdgeTPU model provides new levels of performance that balance low latency with high accuracy on the Edge TPU. It comes in three sizes (small, medium, and large), offering increasing levels of accuracy with trade-offs in inference latency.
Flexibility and scalability
We offer the Edge TPU in multiple form factors to suit various prototyping and production environments—from embedded systems deployed in the field, to network systems operating on-premises.
For example, our USB Accelerator simply plugs into a desktop, laptop, or embedded system such as a Raspberry Pi so you can quickly prototype your application. From there, you can scale to production systems by adding our Mini PCIe or M.2 Accelerator to your hardware system.
If you're looking for a fully integrated system, you can get started with our Dev Board—a single-board computer based on NXP's i.MX 8M system-on-chip. Then you can scale to production by connecting our System-on-Module (included on the Dev Board) to your own baseboard.
Our workflow to create models compatible with Coral devices is based on the TensorFlow framework. No additional APIs are required.
You only need a small runtime package, which delegates the execution of your neural network to the Edge TPU.
The Edge TPU supports a variety of model architectures built with TensorFlow, including models built with Keras.
To make your model compatible, you need to convert the trained model into the TensorFlow Lite format and quantize all parameter data (you can use either quantization-aware training or full integer post-training quantization). Then pass the model to our Edge TPU Compiler and it's ready to go.
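In practice, the TensorFlow Lite converter performs the quantization for you. The pure-NumPy sketch below only illustrates the arithmetic behind full integer quantization: each float32 tensor is mapped to 8-bit integers through an affine scale and zero point (the weights here are made-up stand-ins, not from a real model).

```python
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float32 array to int8; returns (q, scale, zero_point)."""
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)  # range must include zero
    scale = (x_max - x_min) / 255.0                       # one step of the int8 grid
    zero_point = int(round(-128 - x_min / scale))         # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for trained parameters

q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)

# Each element is off by at most half a quantization step.
assert np.max(np.abs(weights - restored)) <= scale / 2 + 1e-6
```

Quantizing to 8 bits shrinks the model roughly 4x and lets the Edge TPU run it with integer-only arithmetic; the half-step error bound above is why accuracy typically drops only slightly.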
We have verified many popular model architectures for image classification, object detection, semantic segmentation, pose estimation, and keyphrase detection, with more on the way. You can download several pre-trained models or read more about how to create a model for the Edge TPU.
For applications that run multiple models, you can execute your models concurrently on a single Edge TPU by co-compiling them so they share the Edge TPU scratchpad memory. Or, if you have multiple Edge TPUs in your system, you can increase performance by assigning each model to a specific Edge TPU and running them in parallel.
Learn more about running multiple models.
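With multiple Edge TPUs, the parallelism itself is ordinary host-side threading: each worker owns one model bound to one device and processes its inputs independently. The sketch below shows only that dispatch pattern; `run_on_device` is a placeholder, not a real delegate or interpreter call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_device(device, frames):
    # Placeholder for per-device inference: in a real system each worker would
    # hold an interpreter whose delegate is pinned to one Edge TPU.
    return [f"{device}:{frame}" for frame in frames]

devices = [":0", ":1"]              # one entry per Edge TPU in the system
batches = [["a", "b"], ["c", "d"]]  # one workload per device

# One thread per accelerator; each model/device pair runs independently.
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    results = list(pool.map(run_on_device, devices, batches))

# results == [[":0:a", ":0:b"], [":1:c", ":1:d"]]
```

Because each accelerator has its own worker and its own model, throughput scales with the number of devices rather than serializing through one queue.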
Although the Edge TPU is primarily intended for inference, you can also use it to accelerate transfer learning with a pre-trained model. To simplify this process, we've created a Python API that executes the backbone of your model on the Edge TPU during training, and then calculates and saves new weight parameters for the final layer.
Learn more about on-device retraining.
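This style of transfer learning keeps the backbone frozen and fits only the final layer on the embeddings the backbone produces. The NumPy sketch below imitates that flow; the "backbone" is a fixed random projection standing in for the accelerated feature extractor, and the data and labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed projection standing in for the model whose
# feature extraction would run on the accelerator. Its weights never change.
W_backbone = rng.normal(size=(8, 4))
def embed(x):
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

# Tiny synthetic dataset for two new classes.
x = rng.normal(size=(64, 8))
feats = embed(x)
labels = (feats[:, 0] > feats[:, 1]).astype(int)
onehot = np.eye(2)[labels]

# Train only the final softmax layer by gradient descent on frozen embeddings.
W = np.zeros((4, 2))
for _ in range(500):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * feats.T @ (p - onehot) / len(x)

accuracy = ((feats @ W).argmax(axis=1) == labels).mean()
```

Only `W` (the final layer) is updated; those are the "new weight parameters" that would be saved back into the model, which is why this kind of retraining is cheap enough to run on-device.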