Run inference with TensorFlow Lite in C++

The TensorFlow Lite C++ API doesn't natively know how to execute a model that's compiled for the Edge TPU. To make it work, you need to make a few changes to your code as described on this page, using additional APIs provided by our edgetpu.h file.

If you want to use Python, instead read Run inference with TensorFlow Lite in Python.

Note: This page is intended for developers with experience using the TensorFlow Lite APIs. If you don't have any experience with TensorFlow and aren't ready to take it on, you can instead use our Edge TPU Python API, which simplifies the code required to perform an inference with image classification and object detection models.

For details about the C++ Edge TPU APIs, you should read the edgetpu.h file, but the basic usage requires the following:

  • EdgeTpuContext: This creates an object that's associated with an Edge TPU. Usually, you'll have just one Edge TPU to work with, so you can instantiate this with EdgeTpuManager::OpenDevice(). But it's possible to use multiple Edge TPUs, and this method is overloaded so you can specify which Edge TPU you want to use (see the sketch after this list).

  • kCustomOp and RegisterCustomOp(): You need to pass these to AddCustom() on your tflite::ops::builtin::BuiltinOpResolver so that the tflite::Interpreter understands how to execute the Edge TPU custom op inside your compiled model.
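
For example, if you have more than one Edge TPU attached, you can enumerate the available devices and open a specific one rather than the default. The following is a rough sketch based on the EdgeTpuManager declarations in edgetpu.h; opening the first enumerated device is just an illustration:

    // Enumerate all Edge TPUs visible to the runtime.
    const auto& available_tpus =
        edgetpu::EdgeTpuManager::GetSingleton()->EnumerateEdgeTpu();

    // Open a specific device by its type and path instead of relying on
    // the no-argument OpenDevice() default.
    std::shared_ptr<edgetpu::EdgeTpuContext> edgetpu_context =
        edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice(
            available_tpus[0].type, available_tpus[0].path);
    if (!edgetpu_context) {
      std::cerr << "Failed to open Edge TPU device." << std::endl;
    }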

In general, the code you need to write includes the following pieces:

  1. Load your compiled Edge TPU model as a FlatBufferModel:
    const std::string model_path = "/path/to/model_compiled_for_edgetpu.tflite";
    std::unique_ptr<tflite::FlatBufferModel> model =
        tflite::FlatBufferModel::BuildFromFile(model_path.c_str());
    

    This model is required below in tflite::InterpreterBuilder().

    For details about compiling a model, read TensorFlow models on the Edge TPU.

  2. Create the EdgeTpuContext object:
    std::shared_ptr<edgetpu::EdgeTpuContext> edgetpu_context =
        edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice();
    

    This context is required below in tflite::Interpreter::SetExternalContext().

  3. Specify the Edge TPU custom op when you create the Interpreter object:
    std::unique_ptr<tflite::Interpreter> model_interpreter =
        BuildEdgeTpuInterpreter(*model, edgetpu_context.get());
    
    std::unique_ptr<tflite::Interpreter> BuildEdgeTpuInterpreter(
        const tflite::FlatBufferModel& model,
        edgetpu::EdgeTpuContext* edgetpu_context) {
      tflite::ops::builtin::BuiltinOpResolver resolver;
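      // Register the Edge TPU custom op so the interpreter knows how to
      // execute the Edge TPU node embedded in the compiled model.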
      resolver.AddCustom(edgetpu::kCustomOp, edgetpu::RegisterCustomOp());
      std::unique_ptr<tflite::Interpreter> interpreter;
      if (tflite::InterpreterBuilder(model, resolver)(&interpreter) != kTfLiteOk) {
        std::cerr << "Failed to build interpreter." << std::endl;
      }
      // Bind given context with interpreter.
      interpreter->SetExternalContext(kTfLiteEdgeTpuContext, edgetpu_context);
      interpreter->SetNumThreads(1);
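      // Allocate memory for the model's input and output tensors before use.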
      if (interpreter->AllocateTensors() != kTfLiteOk) {
        std::cerr << "Failed to allocate tensors." << std::endl;
      }
      return interpreter;
    }
    
  4. Then use the Interpreter (the model_interpreter above) to execute inferences using tflite APIs. The main step is to call tflite::Interpreter::Invoke(), though you also need to prepare the input and then interpret the output. For more information, see the TensorFlow Lite documentation.
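
    For example, here is a minimal sketch of that step for a typical uint8-quantized image classification model with a single input tensor and a single output tensor. The RunInference() helper and the preparation of input_data (decoding and resizing an image) are illustrative assumptions, not part of the edgetpu.h API; the sketch also needs the <cstring>, <iostream>, and <vector> headers:

    std::vector<float> RunInference(const std::vector<uint8_t>& input_data,
                                    tflite::Interpreter* interpreter) {
      // Copy the preprocessed input (sized to match the input tensor)
      // into the interpreter's input buffer.
      uint8_t* input = interpreter->typed_input_tensor<uint8_t>(0);
      std::memcpy(input, input_data.data(), input_data.size());

      // Run the model; the Edge TPU custom op executes on the device.
      if (interpreter->Invoke() != kTfLiteOk) {
        std::cerr << "Failed to invoke interpreter." << std::endl;
        return {};
      }

      // Read the quantized output and dequantize it:
      // real_value = scale * (quantized_value - zero_point).
      const TfLiteTensor* out_tensor = interpreter->output_tensor(0);
      const uint8_t* output = interpreter->typed_output_tensor<uint8_t>(0);
      std::vector<float> scores(out_tensor->bytes);
      for (size_t i = 0; i < scores.size(); ++i) {
        scores[i] = out_tensor->params.scale *
                    (output[i] - out_tensor->params.zero_point);
      }
      return scores;
    }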

Also see our example code; for the simplest example using TensorFlow Lite with our edgetpu.h file, see minimal.cc.

And if you want to run multiple models at once, read how to run multiple models with multiple Edge TPUs.
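
For a rough idea of what that involves, the sketch below opens a separate EdgeTpuContext per detected Edge TPU and binds each model to its own device by reusing the BuildEdgeTpuInterpreter() function from step 3. The model_a and model_b names are placeholders for two already-loaded, compiled models; see that page for the full details:

    // Open one context per detected Edge TPU.
    const auto& available_tpus =
        edgetpu::EdgeTpuManager::GetSingleton()->EnumerateEdgeTpu();
    auto context_a = edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice(
        available_tpus[0].type, available_tpus[0].path);
    auto context_b = edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice(
        available_tpus[1].type, available_tpus[1].path);

    // Bind each model to a different Edge TPU via its own interpreter.
    auto interpreter_a = BuildEdgeTpuInterpreter(*model_a, context_a.get());
    auto interpreter_b = BuildEdgeTpuInterpreter(*model_b, context_b.get());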