Run multiple models with multiple Edge TPUs

The Edge TPU provides its best inference speeds when running just one model, because the cache space available on the Edge TPU cannot accommodate more than one model at a time. Although you can run multiple models on one Edge TPU, doing so requires the cache to be cleared each time you swap models, thus slowing down the entire pipeline. A solution to this performance bottleneck is to run each model on a different Edge TPU.

For example, you can connect multiple USB Accelerators to your host computer or attach a USB Accelerator to your Dev Board (which already has its own Edge TPU).

When multiple Edge TPUs are available, you can select which Edge TPU to use for each model using either the Edge TPU Python API or the C++ API.

Select an Edge TPU in Python

If you have multiple Edge TPUs connected, the Python API automatically assigns each inference engine (such as ClassificationEngine and DetectionEngine) to a different Edge TPU. So you don't need to write any extra code.

For example, if you have two Edge TPUs and two models, you can run each model on separate Edge TPUs by simply creating the inference engines as usual and then running them:

# These classes come from the Edge TPU Python API
from edgetpu.classification.engine import ClassificationEngine
from edgetpu.detection.engine import DetectionEngine

# Each engine is automatically assigned to a different Edge TPU
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)

If you have just one Edge TPU, then this code still works and both models run on the same Edge TPU.

However, if you have multiple (N) Edge TPUs and N + 1 (or more) models, then you must specify which Edge TPU to use for each additional inference engine. Otherwise, you'll receive an error that says your engine does not map to an Edge TPU device.

For example, if you have two Edge TPUs and three models, you must set the third engine to run on the same Edge TPU as one of the others (you decide which). The following code shows how you can do this for engine_c by specifying the device_path argument to be the same device used by engine_b:

# engine_c is purposely assigned to the same Edge TPU as engine_b
engine_a = ClassificationEngine(classification_model)
engine_b = DetectionEngine(detection_model)
engine_c = DetectionEngine(other_detection_model, engine_b.device_path())

You can also get a list of available Edge TPU device paths from ListEdgeTpuPaths().
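Because ListEdgeTpuPaths() returns plain device-path strings, deciding which model goes to which Edge TPU is ordinary Python. The following sketch shows one way to spread N models across M devices, wrapping around when there are more models than devices. The helper name and the example device paths are hypothetical, not part of the API; in real code you would pass the chosen path as the second argument to each engine's constructor.

```python
from itertools import cycle

def assign_devices(model_paths, device_paths):
    """Pair each model with a device path, reusing devices in
    round-robin order when there are more models than Edge TPUs."""
    if not device_paths:
        raise RuntimeError('No Edge TPU devices found')
    devices = cycle(device_paths)
    return [(model, next(devices)) for model in model_paths]

# Example with two (fake) device paths and three models; the third
# model wraps around to the first device.
pairs = assign_devices(
    ['classify.tflite', 'detect.tflite', 'other_detect.tflite'],
    ['/dev/apex_0', '/dev/apex_1'])
```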

For example code, see two_models_inference.py.

Note: All Edge TPUs connected over USB are treated equally; there's no prioritization when distributing the models. But if you attach a USB Accelerator to a Dev Board, the system always prefers the on-board (PCIe) Edge TPU before using the USB devices.

Select an Edge TPU in C++

Unlike the Python API, the C++ API requires that you always specify which Edge TPU you want to use when you run more than one model. You can do this with the EdgeTpuManager::NewEdgeTpuContext() method, which accepts arguments for the device_type (either kApexPci or kApexUsb) and the device_path, both of which you can query with EdgeTpuManager::EnumerateEdgeTpu().
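The flow described above might look like the following sketch. This is not compiled here and assumes the libedgetpu headers and runtime are installed; check edgetpu.h for the exact record fields and return types before relying on it.

```cpp
#include <iostream>
#include "edgetpu.h"

int main() {
  // Enumerate every Edge TPU visible to the runtime; each record
  // carries the device type (kApexPci or kApexUsb) and its path.
  const auto records =
      edgetpu::EdgeTpuManager::GetSingleton()->EnumerateEdgeTpu();
  if (records.size() < 2) {
    std::cerr << "Need at least two Edge TPUs\n";
    return 1;
  }
  // Open one context per device. Each context is then bound to a
  // separate interpreter/model during interpreter construction.
  auto context_a = edgetpu::EdgeTpuManager::GetSingleton()->NewEdgeTpuContext(
      records[0].type, records[0].path);
  auto context_b = edgetpu::EdgeTpuManager::GetSingleton()->NewEdgeTpuContext(
      records[1].type, records[1].path);
  return (context_a && context_b) ? 0 : 1;
}
```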

For more information, read the Edge TPU C++ API overview and the edgetpu.h file.

Also see the two_models_two_tpus_threaded.cc example.

Performance considerations

As you scale the number of Edge TPUs in your system, consider the following possible performance issues:

  • Python does not support true multi-threaded parallelism for CPU-bound operations (read about the Python global interpreter lock (GIL)). However, all Edge TPU operations in our Python API are I/O-bound, and we have optimized the API to work well in Python's multi-threading environment, which can provide performance improvements. But beware that CPU-bound operations such as image downscaling cannot run in parallel across Python threads, so they will likely become a bottleneck when you run multiple models.

  • When using multiple USB Accelerators, your inference speed will eventually be bottlenecked by the speed of the host's USB bus, especially when running large models.

  • If you connect multiple USB Accelerators through a USB hub, be sure that each USB port can provide at least 500mA when using the default operating frequency or 900mA when using the maximum frequency (refer to the USB Accelerator performance settings). Otherwise, the device might not be able to draw enough power to function properly.

  • If you use an external USB hub, connect the Edge TPU to the primary ports only. Some USB hubs include sub-hubs with secondary ports that are not compatible; our API cannot establish an Edge TPU context on these ports. For example, if you run lsusb -t, you should see ports printed as shown below. The first two usbfs ports work fine, but the last one does not.

    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/7p, 5000M
        |__ Port 3: Dev 36, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 51, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # WORKS
            |__ Port 2: Dev 40, If 0, Class=Hub, Driver=hub/4p, 5000M
                |__ Port 1: Dev 41, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # WORKS
                |__ Port 2: Dev 39, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M  # DOESN'T WORK
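The threading point in the first bullet above can be demonstrated without any Edge TPU hardware. In this minimal sketch, time.sleep() stands in for an I/O-bound inference call, which similarly releases the GIL while it waits on the device: two "inferences" of 0.2 s each overlap across threads and finish in roughly 0.2 s total rather than 0.4 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_inference(seconds=0.2):
    # Releases the GIL while waiting, like a blocking device call.
    time.sleep(seconds)
    return seconds

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_inference, [0.2, 0.2]))
elapsed = time.monotonic() - start
# elapsed stays close to 0.2 s because the two waits overlap.
# CPU-bound steps such as image preprocessing would not overlap
# this way, which is the bottleneck the bullet above describes.
```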