Minigo

This project teaches you how to build an AlphaGo Zero implementation called Minigo and run it on the Edge TPU.

Project intro

Project summary

DeepMind's AlphaGo shocked the AI community when it became the first AI to beat a professional go player. Its successor, AlphaGo Zero, played even better using a relatively simple approach. At its heart, AlphaGo Zero is a convolutional neural network (CNN) that parses the game board using an input tensor similar to an image bitmap. Minigo is an independent implementation of the design in the AlphaGo Zero papers, and it uses only open-source tools and libraries.

The fact that the Minigo model uses an architecture similar to other image processing models makes it an excellent fit for the Edge TPU, allowing you to run your own version of the go-playing AI and play it yourself.

Understanding this image processing model and its interaction with tree search algorithms provides an excellent test bed for understanding how neural networks can augment decision making in ambiguous environments. In particular, running Minigo on the Edge TPU is an excellent way to understand the techniques used to fit neural networks into increasingly small spaces. You can then understand the effects of different choices by measuring the strength of the model on an unambiguous, objective metric: How well can it play?

Minigo is a minimalist engine modeled after AlphaGo Zero and based on Brian Lee's MuGo, a pure Python implementation of the first AlphaGo paper, "Mastering the Game of Go with Deep Neural Networks and Tree Search". Minigo adds features and architecture changes present in the more recent AlphaGo Zero paper, "Mastering the Game of Go without Human Knowledge". More recently, this architecture was extended for Chess and Shogi in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm".

While inspired by DeepMind's AlphaGo algorithm, this project is not a DeepMind project, nor is it affiliated with the official AlphaGo project—it is not an official version of AlphaGo. For more information about how these algorithms work, check out the Minigo repository on GitHub.

In this project, we show how you can run Minigo on the Edge TPU accelerator, using a system as small as a Raspberry Pi.

How it works

Minigo's model is given a go board as input, and the model provides the answer to two questions: Who is winning, and what are probably good moves to consider? To determine these answers, the input go board is treated like a 19x19 image with 17 "channels" (instead of the 3 RGB channels found in an ordinary image). Each pixel in the 19x19 matrix represents one position on the board, and the 17 channels encode the board over the last 8 moves plus whose turn it is. The input is given to a stack of "convolutional blocks," which are convolutional layers with batch normalization and activations, joined by residual connections. A network is described by the number of these blocks and their "width," or number of convolutional filters.
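To make the input format concrete, here is a minimal sketch of how a board history could be packed into 17 channels. The function name, the stone encoding (+1 black, -1 white, 0 empty), and the channel order (8 planes for the player to move, 8 for the opponent, 1 for whose turn it is) are assumptions for illustration; Minigo's own feature code may order the planes differently.

```python
def encode_position(history, to_play, board_size=19, history_len=8):
    """Build a board_size x board_size x 17 nested-list tensor.

    history: list of boards, most recent first. Each board is a list of rows
             holding +1 (black stone), -1 (white stone) or 0 (empty).
    to_play: +1 if black moves next, -1 if white moves next.
    """
    channels = 2 * history_len + 1
    planes = [[[0.0] * channels for _ in range(board_size)]
              for _ in range(board_size)]
    for t in range(history_len):
        if t >= len(history):
            break  # Fewer than 8 past boards: remaining planes stay at zero.
        board = history[t]
        for r in range(board_size):
            for c in range(board_size):
                stone = board[r][c]
                if stone == to_play:
                    planes[r][c][t] = 1.0  # stones of the player to move
                elif stone == -to_play:
                    planes[r][c][history_len + t] = 1.0  # opponent's stones
    # The last channel is constant over the board: 1 if black is to play.
    turn = 1.0 if to_play == 1 else 0.0
    for r in range(board_size):
        for c in range(board_size):
            planes[r][c][channels - 1] = turn
    return planes
```

Flattening this structure gives 19 x 19 x 17 = 6137 input values per position.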

At the top of a stack of these blocks, the network splits into a "value head" and a "policy head." The output of the value head is a scalar between -1 and 1, answering the question "which player is likely to win?" And the output of the policy head is a probability distribution answering the question "what moves are good?"
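As a rough sketch of what the two heads produce, the AlphaGo Zero paper describes a softmax over move logits for the policy and a tanh squashing for the value. The helper names below are hypothetical, and we assume Minigo's heads follow the same final steps.

```python
import math


def policy_from_logits(logits):
    """Softmax: turn raw move scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value_from_raw(raw):
    """tanh squashes any real number into the range -1 to 1."""
    return math.tanh(raw)
```

For a 19x19 board the policy has 362 entries (361 points plus a pass), and they always sum to 1.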

The Minigo model is trained via a reinforcement learning loop: Models play games against each other, creating data to train new models on, and so forth.

A detailed description of how the AlphaGo algorithm works is beyond the scope of this reference project. To learn more, we recommend the AlphaGo papers cited above and the Minigo repository on GitHub.

What you'll do in this tutorial

  • Install the Minigo model and software
  • Run the code, displaying the game on a monitor
  • Learn how to retrain the Minigo model (optional)

What you'll need

All you need is either a Coral USB Accelerator (connected to a host Linux computer such as a Raspberry Pi with a keyboard/mouse and monitor) or a Coral Dev Board (with a connected monitor and accessible from a computer over SSH/MDT).

How to build it

Time required:

  • Less than 30 minutes to start playing the Minigo AI

Step 1: Set up your Coral device

If you haven't yet set up your Dev Board or USB Accelerator, get it connected by following the appropriate Get Started guide, then come back here.

Step 2: Verify the Edge TPU libraries are installed

python3 -c 'import edgetpu; print("OK")'

This command should print "OK" and not show any error messages if everything worked.

If not, consult your device setup guide.

Step 3: Install Minigo

Start by downloading the Minigo Git repository:

sudo apt update

sudo apt install git

git clone https://www.github.com/tensorflow/minigo

Then create and activate a Python virtual environment and install the other requirements:

cd minigo

pip3 install virtualenv virtualenvwrapper

python3 -m virtualenv --system-site-packages ./

source ./bin/activate

sh ./minigui/edgetpu/install_requirements.sh

Continuing from inside of the minigo directory, download the Minigo model:

mkdir saved_models

curl http://storage.googleapis.com/minigo-pub/edgetpu-19x19/minigo-v17-2019-04-29-edgetpu.tflite -o saved_models/v17-2019-04-29-edgetpu.tflite

Step 4: Start the Minigo server

Now launch the Minigo Server using the following command. (If you downloaded a model different from the one above, you need to edit the corresponding line in minigui/control/minigo_edgetpu.ctl.)

python3 minigui/serve.py --control minigui/control/minigo_edgetpu.ctl &

The Minigo game server is now running, waiting for you to start a game. The ampersand (&) at the end of the above command makes the program run in the background, so you can still access the terminal prompt, which you'll want for the commands below.

Note: When you're ready to stop the game server, press Enter to bring up the command prompt, then type fg to bring the game process to the foreground. Then you can quit the process by pressing Control+C.

Step 5: Start a game

There are two ways you can see Minigo in action:

5-a: Watch the Minigo AI play itself

If you just want to watch the AI play against itself, start the kiosk mode by opening this URL in a browser on the host computer: http://localhost:5001/kiosk.html.

Tip: We recommend using Chrome or Chromium for the best performance.

If you're using the Coral Dev Board over SSH/MDT, connect a monitor to the HDMI port and then run this command from the Dev Board shell:

sh ./minigui/edgetpu/start_chromium.sh http://localhost:5001/kiosk.html

You should then see the game appear on the monitor as shown in figure 1.

Figure 1. Minigo playing in kiosk mode

The large board on the left shows the current game pieces with opaque pieces and Minigo's "principal variation" with semi-transparent pieces. The principal variation is the sequence of moves Minigo currently considers to be the best for both players, and it may change for a period of time until Minigo selects its move.
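One common way for a tree-search engine to extract a principal variation is to start at the root and repeatedly follow the most-visited child. The sketch below illustrates that idea only; the Node class and its fields are hypothetical and are not taken from the Minigo code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical search-tree node: the move played to reach it,
    how often search visited it, and its child nodes."""
    move: str = ""
    visits: int = 0
    children: list = field(default_factory=list)


def principal_variation(root, max_len=10):
    """Follow the most-visited child at each level to get the best line."""
    line = []
    node = root
    while node.children and len(line) < max_len:
        node = max(node.children, key=lambda child: child.visits)
        if node.visits == 0:
            break  # nothing below here has been explored yet
        line.append(node.move)
    return line
```

Because visit counts keep changing while the search runs, the line returned by a function like this can change from moment to moment, which matches the shifting semi-transparent pieces on the board.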

To the right, there are three smaller boards (only in kiosk mode):

  • Top: The current move sequence that Minigo is considering. If it decides this is better than the sequence on the left, then the board on the left updates its principal variation.
  • Middle: A heat map of where Minigo is focusing its attention for the next move, indicated with dark squares. The darker a square is, the more time Minigo spends investigating variations that start with a move at that point.
  • Bottom: A heat map of "bad moves." That is, the darker an area is colored (in the color of the opponent), the more likely that a play in that area could swing the game in the favor of the opponent. (Except during the first handful of moves, Minigo normally considers most points on the board to be bad moves, as demonstrated in the screenshot above: it's white's turn to play and Minigo thinks any move except the one at point O2 will swing the game in black's favor, which is why the board is tinted black everywhere except that point.)

The graph on the top-right shows Minigo's prediction for the game outcome over time, with black certain to win at the top of the graph and white certain to win at the bottom.

Underneath the graph is a debug log from the Minigo engine. The Minigo UI communicates with the engine using GTP, a text-based protocol that computer go engines have used for almost 20 years.
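To give a feel for GTP, here is a small sketch of the text format as defined by GTP version 2: a command is one line with an optional numeric id, and a response starts with "=" (success) or "?" (failure) and ends with a blank line. The helper names are hypothetical, and the Minigo UI may wrap or extend these messages in its own way.

```python
import re

# "=" or "?", an optional numeric id, then the payload.
_RESPONSE = re.compile(r"^([=?])(\d*)[ \t]*(.*)$", re.DOTALL)


def format_command(name, *args, command_id=None):
    """Build one GTP command line, e.g. 'play black D4'."""
    parts = [] if command_id is None else [str(command_id)]
    parts.append(name)
    parts.extend(str(arg) for arg in args)
    return " ".join(parts) + "\n"


def parse_response(text):
    """Return (ok, command_id, payload) for one GTP response."""
    match = _RESPONSE.match(text.strip())
    if match is None:
        raise ValueError("not a GTP response: %r" % text)
    status, raw_id, payload = match.groups()
    command_id = int(raw_id) if raw_id else None
    return status == "=", command_id, payload.strip()
```

For example, a UI could send format_command("genmove", "white") and read the engine's chosen move from the payload of the reply.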

5-b: Play a game versus Minigo

To play against the Minigo AI, instead navigate to http://localhost:5001/lw_demo.html.

If you're using the Coral Dev Board over MDT/SSH, connect a monitor to the HDMI port and connect a mouse to the USB-A port. Then run this command from the Dev Board shell:

sh ./minigui/edgetpu/start_chromium.sh http://localhost:5001/lw_demo.html

By default, the game is set up for human vs. human gameplay: Notice at the top-right corner, there are two buttons that both say "Human" (see figure 2). These indicate which player (black or white) is controlled by a human.

Figure 2. The buttons to assign players and control gameplay

To play against Minigo, click one of the buttons to change the player to "Minigo." Remember that black plays first, so if you want to play first, click the white button to set white as "Minigo."

To play a piece, use your mouse to click a position on the board.

When it's Minigo's turn, you'll see the principal variation appear on the board (the semi-transparent pieces, as described above) while the AI model assesses the game board options. Once it decides, Minigo places a piece on the board and control returns to you so you can play.

Good luck!

Figure 3. Minigo playing in "lightweight demo" mode, which allows one or two human players

Extending the project

This Minigo demo shows how to run a basic Minigo engine on the Edge TPU. If you want to extend this project, we strongly recommend you familiarize yourself closely with the main Minigo project on GitHub.

To make Minigo work on the Edge TPU, we added a new DualNetworkEdgeTPU class, which mimics the behavior of the old, TensorFlow-based DualNetwork class. With this new class, we use the Edge TPU Python API to drive our quantized Edge TPU model.

The model takes a tensor of shape [1, 19, 19, 17] that represents the current state of the 19x19 go board and its history over the last 8 moves.

The model outputs two tensors:

  • The policy_output is a probability distribution over the 19x19+1 possible actions we can take: one for every possible place we could play a stone plus one for passing (not putting any stone down). The probability tries to predict which action has the highest likelihood of us winning the game in the end.

  • The value_output is a single number and represents an evaluation of how "good" the current board state is (from black's perspective).
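As a rough illustration of how the policy output might be read, the sketch below maps a flat policy index back to a board coordinate. It assumes a row-major layout with the pass action stored last; the function names are hypothetical, and the layout should be checked against the Minigo coordinate code before you rely on it.

```python
BOARD_SIZE = 19


def index_to_move(index, board_size=BOARD_SIZE):
    """Map a flat policy index to (row, col), or None for a pass."""
    points = board_size * board_size
    if index == points:
        return None  # the extra final entry is the pass action
    if not 0 <= index < points:
        raise ValueError("index out of range: %d" % index)
    return divmod(index, board_size)


def top_moves(policy, k=3, board_size=BOARD_SIZE):
    """Return the k most probable actions as (move, probability) pairs."""
    ranked = sorted(range(len(policy)), key=lambda i: policy[i], reverse=True)
    return [(index_to_move(i, board_size), policy[i]) for i in ranked[:k]]
```

With this layout, index 60 is row 3, column 3 (60 = 3 x 19 + 3), and index 361 is the pass.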

The game ends when both players play consecutive pass moves, or the value_output passes beyond a resign threshold (that is, the value_output predicts that the game is hopeless). For details see the code snippet below (from dual_net_edge_tpu.py).

The following function calls the neural network for each board position to be evaluated:

    def run_many(self, positions):
        """Runs inference on a list of positions."""
        # Convert each position into its 19x19x17 feature tensor.
        processed = list(map(features_lib.extract_features, positions))
        probabilities = []
        values = []
        for state in processed:
            assert state.shape == (self.board_size, self.board_size,
                                   17), str(state.shape)
            result = self.engine.RunInference(state.flatten())
            # If needed you can get the raw inference time from the result object.
            # inference_time = result[0] # ms
            # result[1] is the flat output: policy entries first, value last.
            policy_output = result[1][0:self.output_policy_size]
            value_output = result[1][-1]
            probabilities.append(policy_output)
            values.append(value_output)
        return probabilities, values
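The end-of-game rule described earlier (two consecutive passes, or a value beyond the resign threshold) could be sketched as follows. The function name, the move encoding (None for a pass), and the threshold of -0.9 are assumptions for illustration, not values taken from the Minigo code.

```python
def game_over(recent_moves, value_output, to_play, resign_threshold=-0.9):
    """Decide whether the game should end.

    recent_moves: moves played so far, with None standing for a pass.
    value_output: network evaluation from black's perspective, in [-1, 1].
    to_play: +1 if black is to move, -1 if white is to move.
    """
    two_passes = (len(recent_moves) >= 2
                  and recent_moves[-1] is None
                  and recent_moves[-2] is None)
    # Flip the sign so the value is seen from the player to move.
    value_for_player = value_output * to_play
    hopeless = value_for_player < resign_threshold
    return two_passes or hopeless
```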

Now that you have the Minigo code at your fingertips, what can you do with it? Perhaps a go-playing robot, like the original Mechanical Turk purported to be? Or a light-up go board with suggested moves for learning to become a better go player yourself? The possibilities are endless.

Training your own go models

Optionally, you can also train your own Minigo model.

Notice: The following procedures are advanced and require familiarity with TensorFlow tools and the Minigo codebase.

You can find detailed instructions for retraining the Minigo model on the Minigo GitHub repository—however, before following that link, read the rest of this section because you need to modify a few things in order to make the resulting model compatible with the Edge TPU.

To begin, you need to install TensorFlow and several additional dependencies: see the Minigo readme. And if you want to train on a Cloud TPU, you need to set up a Google Cloud project.

Then the steps you must follow are roughly as follows:

  1. Initialize a Minigo model with random weights. (Models and training data can be found in the public Cloud Storage bucket at gs://minigo-pub/.)
  2. Run self-play: play games with the latest model, producing data used for training.
  3. Run training: train a new model with the self-play results from the most recent N generations.
  4. Potentially repeat steps 2 and 3.
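The control flow of these steps can be sketched as a short loop. Everything here is a stand-in: the function name, the callables passed in, and the window size are hypothetical, and the real Minigo pipeline runs these stages as separate scripts.

```python
from collections import deque


def training_loop(init_model, self_play, train, generations, window=3):
    """Alternate self-play and training, keeping the last `window`
    generations of games as the training set."""
    model = init_model()                 # step 1: random weights
    recent = deque(maxlen=window)        # older generations drop off
    for _ in range(generations):
        recent.append(self_play(model))  # step 2: play with latest model
        examples = [ex for games in recent for ex in games]
        model = train(model, examples)   # step 3: train on recent games
    return model


# Toy stand-ins so the loop can be run end to end.
example_counts = []


def toy_init():
    return 0


def toy_self_play(model):
    return [model, model]  # two fake "games" per generation


def toy_train(model, examples):
    example_counts.append(len(examples))
    return model + 1


final_model = training_loop(toy_init, toy_self_play, toy_train, generations=5)
```

With a window of 3, the toy run trains on 2, 4, 6, 6 and 6 examples across its five generations, showing how the oldest games fall out of the training set.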

Importantly, in order for the model to work on the Edge TPU, you need to modify the parameters for the train.py script. In particular, you need to add the following parameters to add training ops to the graph that make the model compatible with the Edge TPU:

  --quantize=true
  --quant_delay=850000

You also need flags that reduce the model's size, such as --trunk_layers, --conv_width, and --fc_width, so it fits entirely in the Edge TPU's memory.
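To see why depth and width matter for size, here is a back-of-the-envelope weight count for the convolutional trunk. It assumes 3x3 kernels and two convolutions per residual block (as in the AlphaGo Zero paper), and it ignores biases, batch normalization, and the two heads, so treat the result as a rough estimate only. The function name is hypothetical.

```python
def trunk_weight_count(trunk_layers, conv_width, input_channels=17,
                       kernel=3, convs_per_block=2):
    """Approximate number of convolution weights in the trunk."""
    first_conv = kernel * kernel * input_channels * conv_width
    per_block = convs_per_block * kernel * kernel * conv_width * conv_width
    return first_conv + trunk_layers * per_block


# Compare two trunk depths at the same width.
small = trunk_weight_count(trunk_layers=9, conv_width=128)
large = trunk_weight_count(trunk_layers=19, conv_width=128)
```

At one byte per quantized weight, these come to roughly 2.7 MB and 5.6 MB respectively, so halving the depth roughly halves the trunk's footprint.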

For example, the following command trains a model from one of Minigo's publicly accessible datasets on a Cloud TPU-enabled virtual machine:

python3 train.py \
  --trunk_layers=9 \
  --conv_width=128 \
  --fc_width=128 \
  --use_tpu=true \
  --tpu_name=${TPU_NAME} \
  --steps_to_train=1040384 \
  --work_dir=gs://${GCS_BUCKET}/minigo/train/ \
  --iterations_per_loop=256 \
  --train_batch_size=64 \
  --summary_steps=256 \
  --lr_boundaries=350000 \
  --lr_boundaries=700000 \
  --lr_rates=0.02 \
  --lr_rates=0.002 \
  --lr_rates=0.0002 \
  --value_cost_weight=1 \
  --quantize=true \
  --quant_delay=850000 \
  $(gsutil ls gs://minigo-pub/v17-19x19/data/golden_chunks/train_*.tfrecord.zz)
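The learning-rate and quantization flags in this command describe a schedule over training steps. The sketch below shows one plausible reading: a piecewise-constant learning rate that drops at each boundary, and quantization switching on once the step count reaches quant_delay. The function names and the exact boundary convention are assumptions, not taken from train.py.

```python
import bisect


def learning_rate(step, boundaries=(350000, 700000),
                  rates=(0.02, 0.002, 0.0002)):
    """Piecewise-constant schedule: the rate changes after each boundary."""
    return rates[bisect.bisect_left(boundaries, step)]


def quantization_active(step, quant_delay=850000):
    """Assume quantized training begins once quant_delay steps have passed."""
    return step >= quant_delay


# Steps left for quantization-aware training in the command above.
steps_to_train = 1040384
quantized_steps = steps_to_train - 850000
```

Under this reading, the last 190384 steps of the run train with quantization enabled, at the lowest learning rate.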

With this information about training specifically for the Edge TPU in hand, read the Minigo GitHub repository documentation about Training Minigo. We suggest you also read more about creating a Cloud Storage bucket and a TPU-enabled virtual machine.

Freeze and quantize the model

To adapt a Minigo model for the Edge TPU, an existing Minigo model must be quantized—this means the weights of the network are converted from floating-point numbers to integer numbers between 0 and 255. The activations of the network are also treated as integers.
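As a minimal sketch of what this conversion means, the standard 8-bit affine scheme maps a float x to an integer q = round(x / scale) + zero_point, clamped to the range 0 to 255. The function names are hypothetical; in practice the scale and zero point are chosen by the training and conversion tools, not by hand.

```python
def choose_params(lo, hi):
    """Pick scale and zero point so that [lo, hi] maps onto 0..255."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate case: every value is zero
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Map floats to integers in 0..255."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats from quantized integers."""
    return [scale * (q - zero_point) for q in qvalues]
```

Values outside the chosen range are clamped, which is one reason the model is trained with quantization in mind rather than converted afterwards without preparation.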

After you train the model, you have a TensorFlow model checkpoint. You then must freeze the weights to prepare the model for quantization with the following command:

# Edit the path below to point to the checkpoint you want to freeze
MODEL_PATH="outputs/models/.../model.ckpt-200000"

BOARD_SIZE=19 python freeze_graph.py \
  --trunk_layers=9 \
  --conv_width=128 \
  --fc_width=128 \
  --quantize=true \
  --quant_delay=180000 \
  --model_path="${MODEL_PATH}" \
  --use_tpu=false

You can now feed this model through the TensorFlow Lite Converter. For example:

${TENSORFLOW_DIR}/lite/toco \
  --input_file="${MODEL_PATH}.pb" \
  --output_file="${MODEL_PATH}.tflite" \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type=QUANTIZED_UINT8 \
  --inference_input_type=QUANTIZED_UINT8 \
  --input_shapes=1,19,19,17 \
  --input_arrays=pos_tensor \
  --output_arrays=policy_output,value_output \
  --allow_nudging_weights_to_use_fast_gemm_kernel=true

This creates a .tflite file (the TensorFlow Lite format).

Compile the model for the Edge TPU

Now feed the resulting .tflite file to the Edge TPU Compiler.

The resulting file is ready to run on the Edge TPU: Simply revise the minigui/control/minigo_edgetpu.ctl file to set the location of your newly compiled model, then run the Minigo demo as before.

About the creators

Andrew Jackson is a keen amateur go player and was present in Korea for the historic games between AlphaGo and Lee Sedol, commentating the matches for the American Go Association. He currently works on on-device and cloud-based machine intelligence at Google in Seattle.

Tom Madams understands the rules of go well enough to write an AI but not to actually play the game with great success. He currently works on several projects at Google in Seattle, including on-device machine intelligence and plasma fusion research.

Seth Troisi mostly sticks to the smaller chess board. He works with neural networks on Android at Google in Seattle, and as a 20% project works on AI in various games.