Week 12: Notes

depth-limited searching

Last week we studied the minimax algorithm, which can find the minimax value for any position in a game tree.

Without much difficulty, we may use the minimax algorithm to write a computer player for games such as Tic-Tac-Toe. The minimax algorithm can explore the entire game tree for Tic-Tac-Toe in milliseconds, so it can easily play a perfect game.

Now let's consider using the minimax algorithm to write a program that plays more complex games such as chess, checkers or Connect Four.

For these games, we cannot hope to explore an entire game tree to determine the minimax value of a given board position. That's because the game trees are simply too large. For example, a Connect Four board has 6 x 7 = 42 squares, each of which may have any of 3 values (X, O, and empty). So there are something like 342 possible board positions (actually somewhat less, since some positions can never be reached since game play stops after one player has won). The number of possible games is greater, since each board position can be reached in many possible ways. The total number of possible games is, however, less than 742, since there are at most 7 possible moves at each turn and a game will have at most 42 turns. In any case, these numbers are so large that an exhaustive tree search is not feasible.

We can still, however, use the minimax algorithm to construct a program that may play quite well, even though not perfectly. To do this, a program can search the game tree only to some fixed depth D. To achieve this, the minimax() function can take an additional parameter indicating the current search depth, and increment the value of this parameter each time it recurses. Once it reaches depth D, it can compute an heuristic estimate of the current state's minimax value. The estimate is an informed guess about the outcome of the game, given its current state. At this depth, minimax() will return the heuristic estimate instead of recursing further.

With this approach, we typically use values such as +1000 and -1000 to represent a victory for player 1 or player 2, instead of the values +1 and -1 we used above. Then for any state where the game has not yet ended, our heuristic estimate function should return a value inside this range, e.g. -1000 < x < 1000. A larger value means that we believe that player 1 has a stronger position and is hence more likely to win. Conversely, a lower value means that victory for player 2 is likely.

For example, in chess we might add up the values of the pieces on the board, assigning them values such as Queen = 9, Rook = 5, Bishop = 3, Knight = 3, Pawn = 1. In other words, this is a linear combination of features that we can observe in the current board state. This is a common approach. In practice, a real-world chess program will probably also consider pieces' placement on the board in computing its estimate. For example, pieces near the center of the board may be considered to be stronger and hence somewhat more valuable.

With this technique, even though the minimax algorithm cannot explore the game tree until its end, it will seek out paths that lead to higher values of the heuristic estimate (for player 1) or lower values (for player 2). If the heuristic estimate function accurately relects a player's advantage in the game, this can lead to very strong play. Most game-playing programs have historically used this technique (through recently newer approaches such as Monte Carlo Tree Search and neural networks have also shown their strength).

extreme value pruning

In general a game tree may be large, with millions or billions of nodes or more. We'd like to be able to implement the minimax algorithm as efficiently as we can so that we can explore as many nodes as possible per second.

We may often be able to prune (cut off) some branches of the game tree from our search because they are unnecessary to explore. For example, consider the following simple game tree. In this game the only possible outcomes are a win for X (score = +1), a win for O (score =-1) or a draw (score = 0). As usual X plays first. In this tree, leaf nodes contain the scores for each outcome and we have filled in the miniimax values of other nodes:

tree

Assuming that our minimax search explores child nodes from left to right, the shaded nodes above may be pruned from the search. When we see the -1 at the lower left, we know that the node above it will also have value -1, since we are minimizing and no smaller value is possible. So there is no need to examine the following two nodes (with values 1 and 0). Similarly, after we determine that the second node in the second row has value 1, there is no need to examing the node to its right (or any of its children), since we know that the top node will also have value 1, meaning that the game is a win for X.

Simply put, if we are searching for a move for either player and have already found a move that guarantees a win, we need look no further. This technique of extreme value pruning will work for any game has only a bounded set of possible scores, i.e. there is a maximum possible score and a minimum possible score.

alpha-beta pruning

Alpha-beta pruning is another important technique that can often prune many more nodes from a game search tree. Before we consider it in detail, let's first consider an informal example. Suppose that you are playing chess, and it is your turn to play. You have already though about a possible move M, and after deep consideration you have happily concluded that M will allow you to gain a bishop with no possible retaliation by your opponent. As you may know, a bishop in chess is generally considered to be worth about 3 pawns, i.e. about 3 points on an informal scale of move strength. You are now considering other possible moves. You now begin to think about an alternative move N. You quickly see that the opponent has a countermove to N that will result in a neutral outcome, i.e. no immediate gain for either player.

Now, the question is this: will you now spend time deeply considering other possible countermoves that the opponent might make to N? Certainly not. You already know that M is better than N, since M is worth 3 points and at best N will have a neutral outcome for you. So you can prune your search at this point, and think no more about move N. This is the essence of alpha-beta pruning.

With that intuition in mind, now consider the two-move search tree we saw earlier on this page:

tree

As the minimax algorithm runs on this game tree, we first compute the value of node 'a', which is min(14, 4, 6) = 4. The start node is maximizing, so we now know that the start node's value will be at least 4. Now we descend into the second subtree and observe the leaf node whose score is 3. At this point we know that node 'b' will have a value of at most 3, since that node is minimizing. So 'b' cannot possibly affect the start node's value, which is, once again, at least 4. And so we need not examine the remaining children of 'b', i.e. -2 and 12. Instead, we can return immediately, yielding a value of 3 for 'b'. The correct minimax value for 'b' is actually -2 (i.e. the minimum of its children), but it does not matter that we've returned a value that is too large: it will be ignored anyway.

To implement this optimization in general, we can keep track of a value α as we recursively descend the game tree. α is the best score that we have found (so far) that player Max can definitely achieve on the current search path. In other words, Max knows that the minimax value of some parent node on the path will be at least α. In our informal chess example above, α was 3 points as we considered move N.

In the game tree above, α = 4 as we consider node b. We are not interested in minimax values that are less than α, since we know we will not use them. So if are are computing a min-value and see that it will be less than α, we can prune the search at that point. In fact, even if we see that the current node's value will be less than or equal to α, we can prune the search, because we know that the current move will be no better than a move that we have already found.

Analogously, we also keep a value β, which is the best score that we have found (so far) that player Min can achieve on the current search path. In other words, Min knows that the minimax value of some parent node on the path will be at least as good as β, i.e. less than or equal to β. And so we are not interested in minimax values that are greater than or equal to β.

In the game tree above, alpha-beta pruning may seem like a modest optimization. In fact with deeper trees it is very significant, because it reduces the effective branching factor of the game tree, i.e. the number of children that we must consider at each step. In real-world minimax implementations alpha-beta pruning may commonly improve performance by as much as a factor of 10.

Let's see how to implement alpha-beta pruning in C#. In last week's lecture notes, we considered a simple counting game called 21. To make it slightly more interesting, let's consider a game that we will call 22 with slightly different rules. Just as before, a counter initially has value 0. Players A and B alternate turns, starting with A. On each player's turn, they may add 1, 2, or 3 to the counter.

In this game, if any player reaches 20, the game is a draw. If a player reaches 21, they win. If they reach 22, they have a double win, which is worth two victories.

With optimal play, who will win – A or B? And will they earn a single or double win? Or will the outcome be a draw?

An implementation of the game is below, including alpha-beta pruning. Study it to understand how it works. Notice that alpha-beta pruning involves only a modest change to the code, with extra parameters 'alpha' and 'beta' to the minimax() function.

One final note. What values should we give α and β initially? We might set α = -∞, β = +∞, indicating that we are interested in any minimax values at all. However usually we can do better than that. Suppose that (as in most games) we know that the possible range of outcomes has a lower and upper bound. In particular, in this game the lowest possible outcome is -2, and the highest is +2. Then we should set α and β to those values, respectively. As you can see, the Main() method below initializes α and β to these values.

Why does this matter? Recall that alpha-beta pruning will prune the search as soon as it finds that a minimax value will be at most α or at least β. So by setting α and β to these values, we get extreme value pruning for free: the search cuts off whenever it finds a game-winning move.

using static System.Console;
using static System.Math;

class Game {
    int counter = 0;
    public int turn = 1;    // whose turn it is to play (1 or 2)
    
    public bool done;       // true if game is over
    
    // game outcome: 0 = draw; +1, +2 = player 1 victory; -1, -2 = player 2 victory
    public int outcome;
    
    public Game() { }
    
    public Game clone() {
        Game g = new Game();
        g.counter = counter; g.turn = turn; g.done = done; g.outcome = outcome;
        return g;
    }
    
    public void move(int i) {
        counter += i;

        if (counter >= 20) {
            done = true;
            outcome = (counter - 20) * (turn == 1 ? 1 : -1);
        }
        
        turn = 3 - turn;    // it is now the other player's turn
    }
}


class TwentyTwo {
    // Compute the minimax value at the given game state.
    static int minimax(Game game, int alpha, int beta, out int bestMove) {
        bestMove = 0;
        
        if (game.done)     // game is done
            return game.outcome;

        int bestVal = game.turn == 1 ? int.MinValue : int.MaxValue;
        
        for (int i = 1 ; i <= 3 ; ++i) {
            Game g1 = game.clone();
            g1.move(i);
            int v = minimax(g1, alpha, beta, out int dummy);
            
            if (game.turn == 1 ? v > bestVal : v < bestVal) {  // we have a new best move
                bestVal = v;
                bestMove = i;
                if (game.turn == 1) {
                    if (v >= beta)
                        return v;
                    alpha = Max(alpha, v);
                } else {
                    if (v <= alpha)
                        return v;
                    beta = Min(beta, v);                    
                }
            }
        }
        
        return bestVal;
    }
    
    static void Main() {
        Game g = new Game();
        WriteLine(minimax(g, -2, 2, out int bestMove));
    }
}