Retrain an image classification model

This tutorial shows you how to retrain an image classification model to recognize a new set of classes. You'll use a technique called transfer learning to retrain an existing model and then compile it to run on an Edge TPU device—you can use the retrained model with either the Coral Dev Board or the Coral USB Accelerator.

Specifically, this tutorial shows you how to retrain a quantized MobileNet V1 model to recognize different types of flowers (adapted from TensorFlow's docs). You can reuse these procedures with your own image dataset and with a different pre-trained model.

Tip: If you want a shortcut to train an image classification model, try Cloud AutoML Vision. It's a web-based tool that allows you to train a model with your own images, optimize it, and then export it for the Edge TPU.

What is transfer learning?

Training an image classification model from scratch can take several days. Transfer learning instead takes a model already trained for a related task and uses it as the starting point for a new model, which usually takes less than an hour. (This process is sometimes also called "fine-tuning" the model.)

Transfer learning can be done in two ways:

  • Last layers-only retraining: This approach retrains only the last few layers of the model, where the final classification occurs. This is fast and it can be done with a small dataset.
  • Full model retraining: This approach retrains each layer of the neural network using the new dataset. It can result in a model that is more accurate, but it takes more time, and you must retrain using a dataset of significant sample size to avoid overfitting the model.

Transfer learning is most effective when the features learned in the pre-trained model are general, not highly specialized. For example, a pre-trained model that can recognize household objects might be retrained to recognize new office supplies, but a model pre-trained to recognize different dog breeds might not.

The steps below show you how to perform transfer learning using either last-layers-only or full-model retraining. Most of the steps are the same; just watch for the commands that differ depending on the technique you choose.

These instructions do not require deep experience with TensorFlow or convolutional neural networks (CNNs), but such experience will definitely help you build a more accurate model. This tutorial also does not teach you how to design and organize a dataset, or tune the hyperparameters to converge your model to the highest possible accuracy. For any of that, refer to other literature about deep learning strategies.

Total time required to complete this tutorial is about 1 hour. But if you're experienced with TensorFlow and you retrain only the last few layers, you can finish in about 30 minutes.

Prerequisites

You can follow the retraining procedure in this tutorial on any computer that supports Docker. However, once you have the retrained model, you must run it on either a Coral Dev Board or a Coral USB Accelerator, each of which has its own system requirements.

Note: This retraining tutorial is designed to run training on a desktop CPU, not on a GPU or in the cloud. It is possible to perform transfer learning on a GPU or in the cloud, but that requires configuration changes that are beyond the scope of this document. And although it's possible to perform retraining on the Coral Dev Board, it is likely slower than on your desktop because the Edge TPU cannot be used for training.

Set up the Docker container

Docker is a virtualization platform that makes it easy to set up an isolated environment for this tutorial. Our Docker container provides the required environment, which includes TensorFlow, Python, classification scripts, and the pre-trained checkpoints for MobileNet V1 and V2.

To set up your container, follow these steps:

  1. First install Docker on your desktop machine (this link is for Ubuntu; select your appropriate platform from the Docker left navigation).

  2. Create a directory where you want to save the retrained model files. For example:

    CLASSIFY_DIR=${HOME}/edgetpu/classify && mkdir -p $CLASSIFY_DIR
    
  3. Download our Dockerfile and build the image:

    cd $CLASSIFY_DIR

    wget -O Dockerfile "http://storage.googleapis.com/cloud-iot-edge-pretrained-models/docker/classify_docker"
    sudo docker build --tag classify-tutorial - < Dockerfile

  4. Start the Docker container using the new directory as a bind mount:

    sudo docker run --name edgetpu-classify \
    --rm -it --privileged -p 6006:6006 \
    --mount type=bind,src=${CLASSIFY_DIR},dst=/tensorflow/models/research/slim/transfer_learn \
    classify-tutorial
    

Your terminal should now show your command prompt inside the Docker container.

You're ready to start training your model.

Prepare your dataset

In this tutorial, you'll create a flower classifier. So before you begin training, you need to download the flowers dataset and convert it to the TFRecord format. We've prepared the following script to take care of that for you:

# From the Docker /tensorflow/models/research/slim/ directory
./prepare_checkpoint_and_dataset.sh --network_type mobilenet_v1

The network_type can be one of the following: mobilenet_v1, mobilenet_v2, inception_v1, inception_v2, inception_v3, or inception_v4. If you decide to try one of these other model architectures, be sure you use the same model name in the other commands where it's used below.
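Because the same model name has to appear in several later commands, one way to avoid a mismatch is to store it in a shell variable and check it once. The following is a minimal sketch, not part of the tutorial's scripts: the function name is_valid_network_type and the variable NETWORK_TYPE are hypothetical, and the accepted values are simply the ones listed above.

```shell
# Hypothetical helper (not provided by the tutorial's Docker image):
# succeeds only if the argument is one of the network types listed above.
is_valid_network_type() {
  case "$1" in
    mobilenet_v1|mobilenet_v2|inception_v1|inception_v2|inception_v3|inception_v4)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Choose the architecture once and reuse the variable everywhere.
NETWORK_TYPE=mobilenet_v1

if is_valid_network_type "${NETWORK_TYPE}"; then
  echo "Using network type: ${NETWORK_TYPE}"
else
  echo "Unknown network type: ${NETWORK_TYPE}" >&2
fi

# The tutorial commands could then take the variable, for example:
#   ./prepare_checkpoint_and_dataset.sh --network_type "${NETWORK_TYPE}"
```

The variable only exists in the terminal where you set it, so you would need to set it again in any additional terminal you open.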

Retrain your classification model

You can perform transfer learning to retrain just the last few layers of a model, or you can retrain the whole model. However, if you have limited training data, retraining the whole model can lead to overfitting, so you should instead retrain just the last layers. Both methods are shown below.

  1. Start transfer learning in one of the following ways:

    • If you want to retrain only the last few layers of the model, use the following command:

      ./start_training.sh --network_type mobilenet_v1
      
    • If you want to retrain the whole model, use this command:

      ./start_training.sh --network_type mobilenet_v1 --train_whole_model true
      

    It might take 1 to 2 minutes for the training pipeline to start. Once training begins, the terminal continuously prints training progress, with lines like this:

    INFO:tensorflow:Recording summary at step 42.
    INFO:tensorflow:global step 60: loss = 1.1883 (1.347 sec/step)
    INFO:tensorflow:global step 80: loss = 0.8204 (1.363 sec/step)
    

    Depending on your machine and the model architecture (MobileNet generally trains much faster than Inception), it can take 10 to 30 minutes to train the last few layers of MobileNet V1 for 300 steps (based on a 16-core CPU and 60 GB of memory). Training the whole model takes longer.

  2. To monitor training progress, start TensorBoard in a new terminal:

    1. Start bash in a separate terminal to join the same Docker container:

      sudo docker exec -it edgetpu-classify /bin/bash
      
    2. In the new Docker terminal, execute the following command to start TensorBoard. TensorBoard then visualizes the model accuracy throughout training in your local machine's browser at http://localhost:6006/.

      # From the Docker /tensorflow/models/research/slim/ directory
      tensorboard --logdir=./transfer_learn/train/
      
  3. To evaluate the performance using the latest checkpoint, use the run_evaluation.sh script.

    If training is still in progress, you can still run the script, but you need to open another terminal as follows:

    sudo docker exec -it edgetpu-classify /bin/bash
    

    (Or just wait until training completes.) Then run the evaluation script:

    # From the Docker /tensorflow/models/research/slim/ directory
    ./run_evaluation.sh --network_type mobilenet_v1
    

    After some other output, you'll see the accuracy printed like this:

    eval/Accuracy[0.7175]eval/Recall_5[1]
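If you want to record the accuracy between runs, you can pull the number out of that output line with sed. This is only a sketch: the function name extract_accuracy is hypothetical, and it assumes the line has exactly the eval/Accuracy[...] form shown above.

```shell
# Hypothetical helper: read evaluation output on stdin and print only the
# number inside "eval/Accuracy[...]". Lines without that pattern are ignored.
extract_accuracy() {
  sed -n 's|.*eval/Accuracy\[\([0-9.]*\)].*|\1|p'
}

# Demonstration using the sample line from this tutorial.
echo 'eval/Accuracy[0.7175]eval/Recall_5[1]' | extract_accuracy   # prints 0.7175

# With the real script, the output could be piped in the same way, for example:
#   ./run_evaluation.sh --network_type mobilenet_v1 2>&1 | extract_accuracy
```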
    

If the accuracy does not satisfy you, open the start_training.sh file, tweak some of the parameters passed to the train_image_classifier.py script, and then retrain (step 1). (You'll first need to remove all the trained files from ./transfer_learn/train/.)
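One cautious way to remove the trained files before retraining is to empty the training directory while keeping the directory itself. The sketch below uses a hypothetical function named clear_train_dir, which is not part of the tutorial's scripts. It refuses to run on an empty or missing path, so a typo cannot expand into a broader delete.

```shell
# Hypothetical helper: delete everything inside the given directory
# but keep the directory itself.
clear_train_dir() {
  dir="$1"
  if [ -z "${dir}" ] || [ ! -d "${dir}" ]; then
    echo "Not a directory: ${dir}" >&2
    return 1
  fi
  # -mindepth 1 excludes the directory itself; -delete removes contents depth-first.
  find "${dir}" -mindepth 1 -delete
}

# From the Docker /tensorflow/models/research/slim/ directory:
#   clear_train_dir ./transfer_learn/train
```

This deletes all checkpoints from the previous run, so copy any you want to keep somewhere else first.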

Compile the model for the Edge TPU

To run your retrained model on the Edge TPU, you need to convert the new checkpoint file to a frozen graph, convert that to a TensorFlow Lite flatbuffer file, then compile the model for the Edge TPU. We've provided a script to simplify some of this for you, which you can run as follows.

  1. To freeze the graph and convert it to TensorFlow Lite, use the following script and specify the checkpoint number you want to use (this example uses checkpoint 300):

    # From the Docker /tensorflow/models/research/slim/ directory
    ./convert_checkpoint_to_edgetpu_tflite.sh --network_type mobilenet_v1 --checkpoint_num 300
    

    Your converted TensorFlow Lite model is named output_tflite_graph.tflite and is written in the Docker container to /tensorflow/models/research/slim/transfer_learn/models/. Because transfer_learn is the bind-mounted directory, the file is also available on your host filesystem at $CLASSIFY_DIR/models/ ($HOME/edgetpu/classify/models/).

  2. Now open a new terminal (outside the Docker container) and compile the model using the Edge TPU Compiler:

    # Install the compiler:
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    
    echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
    sudo apt update
    sudo apt install edgetpu

    # Change directories to where the new model is:
    cd $HOME/edgetpu/classify/models

    # Compile the model:
    edgetpu_compiler output_tflite_graph.tflite

The compiled file is named output_tflite_graph_edgetpu.tflite and saved in the current directory.
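The conversion step above asks for a checkpoint number. If you are not sure which checkpoints exist, you could list the training directory and pick the highest step. The sketch below rests on an assumption: that checkpoints in ./transfer_learn/train/ follow the usual TensorFlow naming of model.ckpt-<step>.index. The function name latest_checkpoint_num is hypothetical, so check the directory contents yourself before relying on it.

```shell
# Hypothetical helper: print the highest <step> among files named
# model.ckpt-<step>.index in the given directory (assumed naming scheme).
# Prints nothing if no such files exist.
latest_checkpoint_num() {
  ls "$1" 2>/dev/null \
    | sed -n 's/^model\.ckpt-\([0-9][0-9]*\)\.index$/\1/p' \
    | sort -n \
    | tail -n 1
}

# From the Docker /tensorflow/models/research/slim/ directory:
#   CHECKPOINT_NUM="$(latest_checkpoint_num ./transfer_learn/train)"
#   ./convert_checkpoint_to_edgetpu_tflite.sh --network_type mobilenet_v1 --checkpoint_num "${CHECKPOINT_NUM}"
```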

Run the model

You can now use the retrained and compiled model with the Edge TPU Python API.

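If you need a photo of a flower to try with your new flower classifier model, here's an image that's freely available from the Open Images Dataset. (If you are working in a new terminal, first set CLASSIFY_DIR again as you did when setting up the Docker container.)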

wget https://c2.staticflickr.com/9/8374/8519435096_45e27efd0d_o.jpg -O ${CLASSIFY_DIR}/flower.jpg

Then, you can use the classify_image.py sample script with the following steps, depending on which device you're using.

Note: The labels.txt file provided with your retrained model is not compatible with the classify_image.py script by default. To use this script, you need to edit the ${CLASSIFY_DIR}/models/labels.txt file to replace every colon with a space. For example, by default, the first line reads 0:daisy so you must change it to be 0 daisy.
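Rather than editing the file by hand, you can make the colon-to-space replacement described in the note with sed. This is a minimal sketch: the function name convert_labels is hypothetical, and it keeps a backup copy with a .bak suffix so the original format stays available.

```shell
# Hypothetical helper: replace every colon in a labels file with a space,
# keeping the original as <file>.bak.
convert_labels() {
  labels_file="$1"
  cp "${labels_file}" "${labels_file}.bak"
  sed 's/:/ /g' "${labels_file}.bak" > "${labels_file}"
}

# On the host machine, after the model has been converted:
#   convert_labels "${CLASSIFY_DIR}/models/labels.txt"
```

After the conversion, a line such as 0:daisy reads 0 daisy. If the files written by the container are owned by root on your host, you may need to run the command with sudo or change the file ownership first.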

Using the Coral Dev Board

First copy the model, labels, and image to the Dev Board (this assumes you're connected via USB; otherwise, change the address in the command to your board's IP address):

scp ${CLASSIFY_DIR}/models/output_tflite_graph_edgetpu.tflite ${CLASSIFY_DIR}/models/labels.txt ${CLASSIFY_DIR}/flower.jpg mendel@192.168.100.2:~/

Then switch to your Dev Board terminal, navigate to the demo directory, and run the classify_image.py script:

cd /usr/lib/python3/dist-packages/edgetpu/demo/

python3 classify_image.py \
--model ~/output_tflite_graph_edgetpu.tflite \
--label ~/labels.txt \
--input ~/flower.jpg

Using the Coral USB Accelerator

Navigate into the demo directory that you downloaded during device setup and run the classify_image.py script:

cd python-tflite-source/edgetpu/demo/

python3 classify_image.py \
--model ${CLASSIFY_DIR}/models/output_tflite_graph_edgetpu.tflite \
--label ${CLASSIFY_DIR}/models/labels.txt \
--input ${CLASSIFY_DIR}/flower.jpg