Week 13: Notes

There was no lecture or tutorial this week.

Our topics this week are game playing, the minimax algorithm and alpha-beta pruning.

game playing algorithms

We will now discuss how to write programs that can play games such as Tic-Tac-Toe, checkers or even chess. All of these are 2-player abstract strategy games, which are games with the following characteristics:

- Two players alternate moves; the game is sequential, not simultaneous.
- The game is deterministic: there are no dice or other elements of chance.
- There is no hidden information: both players can see the complete state of the game.

30 years ago the best human players could still defeat the top computer programs in games such as chess and Go. But this is no longer true. In the 1990s computer programs were developed (notably IBM's Deep Blue) that could defeat the top human chess players. And just in the last few years computer programs (notably Google's AlphaGo) have become stronger than the top human players at Go.

The newest and most powerful game-playing programs are based on neural networks. Those are well beyond the scope of this course; we will instead focus on the classic minimax algorithm.

For any game we wish to play, we will implement a strategy, which is a function that takes the current game state as input and decides which move to play. Every deterministic game with sequential moves and no hidden information has some optimal strategy which will perform at least as well as any other strategy against any opponent. We would like our implementation to play an optimal strategy if possible.

By the way, a game with simultaneous moves may have no optimal strategy. For example, consider the classic game of Rock, Paper, Scissors, in which players choose and present their moves simultaneously. In this game, for any strategy S there is some strategy that will be better than S against certain opponents. For example, if S is "always play Scissors", then against an opponent who always plays Rock, the strategy P "always play Paper" is better than S. In general there is no optimal strategy for this game.

However, games with sequential moves such as Tic-Tac-Toe and chess have optimal strategies. In Tic-Tac-Toe, if the first player plays optimally, they can never lose. If both players play optimally, the game will be a draw. (Most children come to realize this after playing Tic-Tac-Toe for a while.) Not every game has these characteristics. For example, it has been proven that if both players play optimally in Connect Four, the first player will win.

Let's now consider how to write a computer program that may play a game such as Tic-Tac-Toe. For any abstract strategy game, a game tree represents all possible sequences of moves. Each node of a game tree represents a state of a game in progress.

Here is a partial game tree for Tic-Tac-Toe:

In the first move of the game, player 1 (X) has 9 possible moves. So the branching factor of Tic-Tac-Toe at the beginning is 9. In the next move, player 2 (O) has 8 possible moves, so the branching factor decreases to 8 at this point. As the game progresses, the branching factor decreases further. If every game of Tic-Tac-Toe continued until the board was completely full, there would be 9 · 8 · … · 2 · 1 = 9! possible games. However, the number of possible games of Tic-Tac-Toe is less than 9!, because many games will end before the board is full.

The total number of possible games of Tic-Tac-Toe is relatively small, since its average branching factor is fairly small and the game is short. As a result, we can write a program that explores the entire game tree of Tic-Tac-Toe. As we will see below, this allows us to play an optimal strategy. More serious games tend to have much larger game trees. The game tree for chess, for example, is vast: it is believed that there are at least 10^120 possible chess games. And so no program can ever entirely explore chess's game tree.

scores and minimax values

We will assign a numerical score to each outcome in a game. In Tic-Tac-Toe, we will represent a win for player X (who plays first) using the score +1, and a win for O using the score -1. If the game is a draw, we will say that the score is 0. Thus X wishes to maximize the game score, and O wants to minimize it. In fact, for all games we consider we will adopt a similar convention: the first player (called X or "Max") wants to maximize the score and the second player (called O or "Min") wishes to minimize it.

Some games might have only two possible outcomes, if a draw is not possible. For these games, the only possible scores are +1 and -1. On the other hand, some games could have more than three possible outcomes. For example, we can imagine a game in which players can capture each other's pieces and the score at the end is the difference between the numbers of pieces that each player has remaining. In a game like this, a player wishes not only to win, but also to win by as much as possible.

Consider the following game tree for a hypothetical abstract strategy game:

[game tree diagram]

The game has only two moves. First X plays, bringing the game to either state a, b, or c. Now O plays, bringing the game to one of the nine states in the lowermost row. Each of these states is labelled with a numeric score. Once again, X wishes to maximize the final score and O wishes to minimize it. How should X and O choose their moves?

In each of the states a, b, and c it is O's turn to play, and O should choose the move that yields the lowest score. So we may assign a value to each of these states, namely the minimum of the values of all successor states. This is the score of the outcome of the game, assuming that O plays perfectly, i.e. O chooses the move that will minimize the final score:

[game tree diagram, with minimax values shown for nodes a, b, and c]

In the start state it is X's turn to play. X should choose the move that leads to a state of maximal value. So we may likewise assign a value to the start state, namely the maximum of the values of all successor states. This will be the score of the outcome of the game, assuming that both X and O play perfectly.

[game tree diagram, with minimax values shown for all nodes]

The game above is trivial, but we may apply the same analysis to a game tree of arbitrary depth, assigning each node its minimax value, obtained by minimizing whenever it is O's turn to play and maximizing when it is X's turn, starting at the leaves of the game tree and working upward. This process of labelling nodes is the minimax algorithm.

Note the following two important points about minimax values:

Now let's return to Tic-Tac-Toe. We have already noted that with optimal play by both players, Tic-Tac-Toe will be a draw. In other words, the minimax value of the initial state of Tic-Tac-Toe is 0. Here is the partial game tree for Tic-Tac-Toe that we saw before, with each node labelled with its minimax value:

Note the following:

As mentioned above, the game tree for chess is far too large for us to draw (it has more nodes than there are atoms in the universe) and is likewise much too large for us to analyze by computer. But if we could draw such a tree and label its nodes with the minimax algorithm, then the value of the start node would be either +1 (indicating that with the right strategy White can always win), 0 (indicating that best play leads to a draw) or -1 (meaning that Black can always win). It is not known what this value would be, since we don't know whether White or even Black can always win at chess. (It certainly seems very unlikely that Black can always win, but this possibility has never been ruled out.)

implementing minimax in C#

We can implement the minimax algorithm in C# straightforwardly using recursion.

Consider the following simple game, which we will call 21. A counter initially has value 0. Players A and B alternate turns, starting with A. On each player's turn, they may add 1, 2, or 3 to the counter. The first player who reaches the number 21 loses. With optimal play, who will win – A or B?

To answer this question, we will implement the minimax algorithm using a pair of mutually recursive methods maxValue() and minValue(). Here is the implementation:

using static System.Console;
using static System.Math;

class TwentyOne {
    // Compute the minimax value at the given state, assuming that it is
    // A's turn to play.
    static int maxValue(int counter) {
        if (counter >= 21)
            return 1;       // B already hit 21, so A won

        int val = int.MinValue;
        for (int i = 1 ; i <= 3 ; ++i)
            val = Max(val, minValue(counter + i));

        return val;
    }

    // Compute the minimax value at the given state, assuming that it is
    // B's turn to play.
    static int minValue(int counter) {
        if (counter >= 21)
            return -1;      // A already hit 21, so B won

        int val = int.MaxValue;
        for (int i = 1 ; i <= 3 ; ++i)
            val = Min(val, maxValue(counter + i));

        return val;
    }

    static void Main() {
        WriteLine(maxValue(0));
    }
}

maxValue() takes an integer argument counter, representing the current value of the counter, and assumes that it is A's turn to play. If the counter is 21 or more, then B hit 21 on their turn and so the game is over: A has won, and the minimax value is +1. That is the base case. Otherwise, maxValue() considers every possible move for A (i.e. adding 1, 2, or 3). For each of those moves, it adds the move value to the counter, then calls minValue() to compute the minimax value at the following game state, where it is B's turn. maxValue() chooses the maximum possible value of all successor states. minValue() is analogous.

Together, these mutually recursive methods will explore the entire game tree. If we run the main method above, it prints

-1

This means that the minimax value at the root is -1, so B can always win. (In fact B's winning strategy is to always bring the counter to a multiple of 4: whatever A adds, B can respond so that the two moves sum to 4. Eventually B brings the counter to 20, forcing A to reach 21 or more.)

Our program above tells us who can win, but not which moves to make! Let's extend the methods above so that in addition to returning the minimax value at each state, they also return the optimal move at that state. It is not a large change:

    static int maxValue(int state, out int best) {
        best = 0;

        if (state >= 21)
            return 1;       // I won

        int val = int.MinValue;
        for (int i = 1 ; i <= 3 ; ++i) {
            int v = minValue(state + i, out int dummy);
            if (v > val) {
                val = v;
                best = i;
            }
        }

        return val;
    }

    static int minValue(int state, out int best) {
        best = 0;

        if (state >= 21)
            return -1;      // I won

        int val = int.MaxValue;
        for (int i = 1 ; i <= 3 ; ++i) {
            int v = maxValue(state + i, out int dummy);
            if (v < val) {
                val = v;
                best = i;
            }
        }

        return val;
    }

Notice that when each method calls the other, it ignores the best move that is returned (hence the parameter "out int dummy"). However, each time we make a top-level call to one of these methods, we can use the returned best move as the selected move for a computer player. For example, the top-level call

     minValue(10, out int best)

will return the best move for player B, assuming that it is B's turn and the counter is currently 10.

Here is a playable implementation of 21 that uses the code above and lets you play against a computer player.

Our minimax implementation for 21 will run in time that is exponential in N, where N is the target number of points (i.e. N = 21). (Think about why this is true.) So this implementation will be extremely inefficient if N grows larger, e.g. for the value N = 100. However, since the game state is so simple (just a small integer), we could easily modify our implementation to run much more quickly, actually in linear time. Think about how. (Hint: use d______ p__________.) :)

game state objects

For more complicated games where the game state is not just an integer, typically we will instead use an object-oriented approach. There will be a class with a name such as Game that represents the game state. A Game object might hold, for example, an 8 x 8 array representing a chessboard, with array values that show which pieces are where. Typically the game state class will have a move() method that makes a single move, enforcing the rules of the game.

Let's modify our implementation of 21 to use this object-oriented approach:

using static System.Console;

// A game of 21.
class Game {
    int counter = 0;
    public int turn = 1;    // whose turn it is to play (1 or 2)
    public int winner = 0;    // the player who has won, or 0 if nobody has won yet
    
    public Game() { }
    
    public Game clone() {
        Game g = new Game();
        g.counter = counter; g.turn = turn; g.winner = winner;
        return g;
    }
    
    public void move(int i) {
        counter += i;
        turn = 3 - turn;      // it is now the other player's turn
        if (counter >= 21)
            winner = turn;    // the player who just moved hit 21 and lost,
                              // so the other player has won
    }
}

class TwentyOne {
    // Compute the minimax value at the given game state.
    static int minimax(Game game, out int best) {
        best = 0;
        
        if (game.winner > 0)     // game is done
            return game.winner == 1 ? 1 : -1;

        int val = game.turn == 1 ? int.MinValue : int.MaxValue;
        
        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, out int dummy);
            
            if (game.turn == 1 && v > val ||
                game.turn == 2 && v < val) {  // we have a new best move
              val = v;
              best = i;
            }
        }
        
        return val;
    }
    
    static void Main() {
        Game g = new Game();
        WriteLine(minimax(g, out int best));
    }
}

This code may look different from the previous implementation of 21 we saw, but in fact it is algorithmically identical to the previous version. Notice that in this version we have combined the minValue() and maxValue() methods into a single method minimax() that can compute the minimax value for any node, whether it is A's or B's turn to play. In practice, some real-world implementations of minimax use separate mutually recursive functions, and others use a single combined function. (For what it's worth, I personally generally favor the combined approach since it tends to eliminate some duplicate code.)

Notice the important call to game.clone() in the recursive minimax() method above. In this object-oriented version, when we call game.move() it modifies the game state. If we did not clone the game state at each step, then there would be only a single Game object, and as soon as the recursion reached an ending state then the single Game would be done forever! In other words, the clone operation allows us to modify the state as we explore one part of the tree while allowing the state to remain intact for exploring other branches.

For some games that have a significant amount of state, it can be expensive to clone the game state each time the recursive search makes a move. So some game state classes may have an unmove() method that allows the caller to undo a move that was made. If the Game class for 21 above had an unmove() method, then instead of

        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, out int dummy);

we could write

        for (int i = 1 ; i <= 3 ; ++i) {
            game.move(i);
            int v = minimax(game, out int dummy);
            game.unmove(i);   // undo the move

With this approach, a call to minimax() will always preserve the previous game state, just like in the version with clone(). For games with complex state, this approach can be a significant performance win.

As an exercise, you may wish to implement unmove() in the Game class for 21 above. It is not difficult.
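For reference, here is one possible sketch of unmove() for the Game class above. (Try writing it yourself before peeking.) It simply reverses each step of move():

    // Undo a move that added i to the counter, restoring the previous state.
    public void unmove(int i) {
        winner = 0;         // the state before the move was not a won game
        turn = 3 - turn;    // it is the previous player's turn again
        counter -= i;
    }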

depth-limited searching

Without much difficulty, we may extend our minimax implementation in C# above to play games such as Tic-Tac-Toe. The minimax algorithm can explore the entire game tree for Tic-Tac-Toe in milliseconds, so it can easily play a perfect game.

Now let's consider using the minimax algorithm to write a program that plays more complex games such as chess, checkers or Connect Four.

For these games, we cannot hope to explore an entire game tree to determine the minimax value of a given board position. That's because the game trees are simply too large. For example, a Connect Four board has 6 x 7 = 42 squares, each of which may have any of 3 values (X, O, and empty). So there are something like 3^42 possible board positions (actually somewhat fewer, since some positions can never be reached: play stops after one player has won). The number of possible games is greater, since each board position can be reached in many possible ways. The total number of possible games is, however, less than 7^42, since there are at most 7 possible moves at each turn and a game will have at most 42 turns. In any case, these numbers are so large that an exhaustive tree search is not feasible.

We can still, however, use the minimax algorithm to construct a program that may play quite well, even though not perfectly. To do this, a program can search the game tree only to some fixed depth D. To achieve this, the minimax() function can take an additional parameter indicating the current search depth, and increment the value of this parameter each time it recurses. Once it reaches depth D, it can compute a heuristic estimate of the current state's minimax value. The estimate is an informed guess about the outcome of the game, given its current state. At this depth, minimax() will return the heuristic estimate instead of recursing further.

With this approach, we typically use values such as +1000 and -1000 to represent a victory for player 1 or player 2, instead of the values +1 and -1 we used above. Then for any state where the game has not yet ended, our heuristic estimate function should return a value inside this range, i.e. -1000 < x < 1000. A larger value means that we believe that player 1 has a stronger position and is hence more likely to win. Conversely, a lower value means that victory for player 2 is more likely.
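Concretely, here is a minimal sketch of depth-limited search, following the structure of the object-oriented minimax() method for 21 above. The estimate() method is hypothetical: it stands in for whatever heuristic evaluation your game provides, and the depth limit D = 6 is an arbitrary choice.

    const int D = 6;    // maximum search depth (an arbitrary choice)

    static int minimax(Game game, int depth, out int best) {
        best = 0;

        if (game.winner > 0)     // game is done
            return game.winner == 1 ? 1000 : -1000;

        if (depth == D)                // depth limit reached, so
            return game.estimate();    // return a heuristic estimate

        int val = game.turn == 1 ? int.MinValue : int.MaxValue;

        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, depth + 1, out int dummy);

            if (game.turn == 1 && v > val ||
                game.turn == 2 && v < val) {
                val = v;
                best = i;
            }
        }

        return val;
    }

A top-level call then looks like minimax(game, 0, out int best).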

For example, in chess we might add up the values of the pieces on the board, assigning them values such as Queen = 9, Rook = 5, Bishop = 3, Knight = 3, Pawn = 1. In other words, this is a linear combination of features that we can observe in the current board state. This is a common approach. In practice, a real-world chess program will probably also consider pieces' placement on the board in computing its estimate. For example, pieces near the center of the board may be considered to be stronger and hence somewhat more valuable.
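As a sketch, such a material-count estimate might look like the following. The board representation here (an 8 x 8 array of characters, uppercase for player 1's pieces, lowercase for player 2's, '.' for empty) is an assumption made for illustration:

    // A heuristic estimate of a chess position: the difference in material,
    // using the traditional piece values. Positive values favor player 1.
    static int estimate(char[,] board) {
        int val = 0;
        for (int r = 0 ; r < 8 ; ++r)
            for (int c = 0 ; c < 8 ; ++c) {
                int v = char.ToLower(board[r, c]) switch {
                    'q' => 9, 'r' => 5, 'b' => 3, 'n' => 3, 'p' => 1, _ => 0
                };
                val += char.IsUpper(board[r, c]) ? v : -v;
            }
        return val;
    }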

With this technique, even though the minimax algorithm cannot explore the game tree all the way to its end, it will seek out paths that lead to higher values of the heuristic estimate (for player 1) or lower values (for player 2). If the heuristic estimate function accurately reflects a player's advantage in the game, this can lead to very strong play. Most game-playing programs have historically used this technique (though recently newer approaches such as Monte Carlo Tree Search and neural networks have also shown their strength).

extreme value pruning

In general a game tree may be large, with millions or billions of nodes or more. We'd like to be able to implement the minimax algorithm as efficiently as we can so that we can explore as many nodes as possible per second.

We may often be able to prune (cut off) some branches of the game tree from our search because they are unnecessary to explore. For example, consider the following simple game tree. In this game the only possible outcomes are a win for X (score = +1), a win for O (score = -1) or a draw (score = 0). As usual X plays first. In this tree, leaf nodes contain the scores for each outcome and we have filled in the minimax values of other nodes:

[game tree diagram; the pruned nodes are shaded]

Assuming that our minimax search explores child nodes from left to right, the shaded nodes above may be pruned from the search. When we see the -1 at the lower left, we know that the node above it will also have value -1, since we are minimizing and no smaller value is possible. So there is no need to examine the following two nodes (with values 1 and 0). Similarly, after we determine that the second node in the second row has value 1, there is no need to examine the node to its right (or any of its children), since we know that the top node will also have value 1, meaning that the game is a win for X.

Simply put, if we are searching for a move for either player and have already found a move that guarantees a win, we need look no further. This technique of extreme value pruning will work for any game that has only a bounded set of possible scores, i.e. there is a maximum possible score and a minimum possible score.
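As a sketch, extreme value pruning could be added to the object-oriented minimax() method for 21 above with a single check inside the move loop, just after a new best move is recorded (recall that in 21 the only possible scores are +1 and -1):

            if (game.turn == 1 && val == 1 ||
                game.turn == 2 && val == -1)
                break;    // this move guarantees a win, so look no further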

alpha-beta pruning

Alpha-beta pruning is another important technique that can often prune many more nodes from a game search tree. Before we consider it in detail, let's first consider an informal example. Suppose that you are playing chess, and it is your turn to play. You have already thought about a possible move M, and after deep consideration you have happily concluded that M will allow you to gain a bishop with no possible retaliation by your opponent. As you may know, a bishop in chess is generally considered to be worth about 3 pawns, i.e. about 3 points on an informal scale of move strength. You now begin to consider an alternative move N. You quickly see that the opponent has a countermove to N that will result in a neutral outcome, i.e. no immediate gain for either player.

Now, the question is this: will you now spend time deeply considering other possible countermoves that the opponent might make to N? Certainly not. You already know that M is better than N, since M is worth 3 points and at best N will have a neutral outcome for you. So you can prune your search at this point, and think no more about move N. This is the essence of alpha-beta pruning.

With that intuition in mind, now consider the two-move search tree we saw earlier on this page:

[game tree diagram]

As the minimax algorithm runs on this game tree, we first compute the value of node 'a', which is min(14, 4, 6) = 4. The start node is maximizing, so we now know that the start node's value will be at least 4. Now we descend into the second subtree and observe the leaf node whose score is 3. At this point we know that node 'b' will have a value of at most 3, since that node is minimizing. So 'b' cannot possibly affect the start node's value, which is, once again, at least 4. And so we need not examine the remaining children of 'b', i.e. -2 and 12. Instead, we can return immediately, yielding a value of 3 for 'b'. The correct minimax value for 'b' is actually -2 (i.e. the minimum of its children), but it does not matter that we've returned a value that is too large: it will be ignored anyway.

To implement this optimization in general, we can keep track of a value α as we recursively descend the game tree. α is the best score that we have found (so far) that player Max can definitely achieve on the current search path. In other words, Max knows that the minimax value of some parent node on the path will be at least α. In our informal chess example above, α was 3 points as we considered move N.

In the game tree above, α = 4 as we consider node b. We are not interested in minimax values that are less than α, since we know we will not use them. So if we are computing a min-value and see that it will be less than α, we can prune the search at that point. In fact, even if we see that the current node's value will be less than or equal to α, we can prune the search, because we know that the current move will be no better than a move that we have already found.

Analogously, we also keep a value β, which is the best score that we have found (so far) that player Min can achieve on the current search path. In other words, Min knows that the minimax value of some parent node on the path will be at least as good as β, i.e. less than or equal to β. And so we are not interested in minimax values that are greater than or equal to β.

In the game tree above, alpha-beta pruning may seem like a modest optimization. In fact with deeper trees it is very significant, because it reduces the effective branching factor of the game tree, i.e. the number of children that we must consider at each step. In real-world minimax implementations alpha-beta pruning may commonly improve performance by as much as a factor of 10.

Let's see how to implement alpha-beta pruning in C#. Above, we considered a simple counting game called 21. To make it slightly more interesting, let's consider a game that we will call 22 with slightly different rules. Just as before, a counter initially has value 0. Players A and B alternate turns, starting with A. On each player's turn, they may add 1, 2, or 3 to the counter.

In this game, if any player reaches 20, the game is a draw. If a player reaches 21, they win. If they reach 22, they have a double win, which is worth two victories.

With optimal play, who will win – A or B? And will they earn a single or double win? Or will the outcome be a draw?

An implementation of the game is below, including alpha-beta pruning. Study it to understand how it works. Notice that alpha-beta pruning involves only a modest change to the code, with extra parameters 'alpha' and 'beta' to the minimax() function.

One final note. What values should we give α and β initially? We might set α = -∞, β = +∞, indicating that we are interested in any minimax values at all. However usually we can do better than that. Suppose that (as in most games) we know that the possible range of outcomes has a lower and upper bound. In particular, in this game the lowest possible outcome is -2, and the highest is +2. Then we should set α and β to those values, respectively. As you can see, the Main() method below initializes α and β to these values.

Why does this matter? Recall that alpha-beta pruning will prune the search as soon as it finds that a minimax value will be at most α or at least β. So by setting α and β to these values, we get extreme value pruning for free: the search cuts off whenever it finds a game-winning move.

using static System.Console;
using static System.Math;

class Game {
    int counter = 0;
    public int turn = 1;    // whose turn it is to play (1 or 2)
    
    public bool done;       // true if game is over
    
    // game outcome: 0 = draw; +1, +2 = player 1 victory; -1, -2 = player 2 victory
    public int outcome;
    
    public Game() { }
    
    public Game clone() {
        Game g = new Game();
        g.counter = counter; g.turn = turn; g.done = done; g.outcome = outcome;
        return g;
    }
    
    public void move(int i) {
        counter += i;

        if (counter >= 20) {
            done = true;
            outcome = (counter - 20) * (turn == 1 ? 1 : -1);
        }
        
        turn = 3 - turn;    // it is now the other player's turn
    }
}


class TwentyTwo {
    // Compute the minimax value at the given game state.
    static int minimax(Game game, int alpha, int beta, out int bestMove) {
        bestMove = 0;
        
        if (game.done)     // game is done
            return game.outcome;

        int bestVal = game.turn == 1 ? int.MinValue : int.MaxValue;
        
        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, alpha, beta, out int dummy);
            
            if (game.turn == 1 ? v > bestVal : v < bestVal) {  // we have a new best move
                bestVal = v;
                bestMove = i;
                if (game.turn == 1) {
                    if (v >= beta)
                        return v;
                    alpha = Max(alpha, v);
                } else {
                    if (v <= alpha)
                        return v;
                    beta = Min(beta, v);                    
                }
            }
        }
        
        return bestVal;
    }
    
    static void Main() {
        Game g = new Game();
        WriteLine(minimax(g, -2, 2, out int bestMove));
    }
}