Week 11: Notes

depth-limited searching

Last week we studied the minimax algorithm, which can find the minimax value for any position in a game tree.

Without much difficulty, we may use minimax to write a computer player for games such as Tic-Tac-Toe. The minimax algorithm can explore the entire game tree for Tic-Tac-Toe in milliseconds, so it can easily play a perfect game.

Now let's consider using minimax to write a program that plays more complex games such as chess, checkers or Connect Four.

For these games, we cannot hope to explore an entire game tree to determine the minimax value of a given board position, because the game trees are simply too large. For example, a Connect Four board has 6 x 7 = 42 squares, each of which may have any of 3 values (X, O, or empty). So there are something like 3^42 possible board positions (actually somewhat fewer, since play stops as soon as one player has won, so some positions can never be reached). The number of possible games is greater still, since each board position can be reached in many possible ways. The total number of possible games is, however, less than 7^42, since there are at most 7 possible moves at each turn and a game lasts at most 42 turns. In any case, these numbers are so large that an exhaustive tree search is not feasible.

But we can still use the minimax algorithm to construct a program that plays quite well, even though not perfectly. To do this, a program can search the game tree only to some fixed depth D. The minimax() function takes an additional parameter indicating the current search depth, and increments it each time it recurses. Once it reaches depth D, instead of recursing further it computes a heuristic estimate of the current state's minimax value: an informed guess about the outcome of the game, given its current state.

With this approach, we typically use values such as +1000 and -1000 to represent a victory for player 1 or player 2, instead of the values +1 and -1 we used above. Then for any state where the game has not yet ended, our heuristic estimate function should return a value strictly inside this range, i.e. -1000 < x < 1000. A larger value means that we believe player 1 has a stronger position and is hence more likely to win; conversely, a lower value means that victory for player 2 is more likely.
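For example, the Connect Four player shown later in these notes uses constants P1_WIN and P2_WIN for these two values, plus an array 'outcome' indexed by the winner of a finished game. Here is a minimal sketch of how they might be declared (the exact declarations are in ai.cs; I am assuming that a winner value of 0 denotes a draw):

    // Values of finished games, indexed by winner:
    // 0 = draw (assumed), 1 = player 1 won, 2 = player 2 won.
    const double P1_WIN = 1000.0, P2_WIN = -1000.0;
    static readonly double[] outcome = { 0.0, P1_WIN, P2_WIN };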

For example, in chess we might add up the values of the pieces on the board, assigning them values such as queen = 9, rook = 5, bishop = 3, knight = 3, pawn = 1. In other words, this is a linear combination of features that we can observe in the current board state. This is a common approach. In practice, a real-world chess program will probably also consider pieces' placement on the board in computing its estimate. For example, pieces near the center of the board may be considered to be stronger and hence somewhat more valuable.
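As a sketch of this idea in C# (not taken from any real chess engine), a material-count evaluation might look like the code below. The board representation is my own assumption: each square holds a char, with upper case for player 1's pieces, lower case for player 2's, and '.' for an empty square.

    // Hypothetical material values for chess pieces.
    static double piece_value(char p) => char.ToUpper(p) switch {
        'Q' => 9, 'R' => 5, 'B' => 3, 'N' => 3, 'P' => 1,
        _ => 0      // kings and empty squares have no material value
    };

    // A material-count evaluation: a linear combination of piece counts.
    static double eval_material(char[,] board) {
        double v = 0;
        foreach (char c in board)       // visit every square
            if (c != '.')
                v += char.IsUpper(c) ? piece_value(c) : -piece_value(c);
        return v;
    }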

With this technique, even though the minimax algorithm cannot explore the game tree until its end, it will seek out paths that lead to higher values of the heuristic estimate (for player 1) or lower values (for player 2). If the heuristic estimate function accurately reflects a player's advantage in the game, this can lead to very strong play. Most game-playing programs have historically used this technique (though recently newer approaches such as Monte Carlo Tree Search and neural networks have also shown their strength).

To illustrate these ideas, here's an implementation of Connect Four in C# using a model-view architecture: connect_four.csproj (the project file), game.cs (the model class), view.cs (a graphical interface in GTK), and ai.cs (two AI players, one of which just moves randomly). The program has a minimax player, whose search depth you can specify on the command line. For example, to play against the minimax player with depth 4, type

$ dotnet run minimax:4

The minimax player has a heuristic evaluation function that works like this. It considers every group of 4 adjacent squares on the board that form a horizontal, vertical, or diagonal line. In each such group it counts how many squares are occupied by each player's stones. If the group contains only stones of player 1, it adds a value to the heuristic estimate; the more of player 1's stones the group contains, the larger the value.

(The group cannot contain 4 stones of the same player, since then the game would be over and the heuristic function would not be called.)

Similarly, if the group contains only stones of player 2, then it subtracts the corresponding value from the heuristic estimate.

In my experience, this sort of evaluation function works well in Connect Four and in similar games in which the goal is to place a number of stones in a row, such as Gomoku.
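In code, such a function might look like the sketch below. This is not the exact evaluation in ai.cs: the weights in group_value are illustrative values I've made up, and all_groups() and g.at() are assumed helpers that enumerate every line of 4 adjacent squares and return a square's contents (0 = empty, 1 or 2 = that player's stone).

    // Sketch of the group-counting heuristic, with made-up weights
    // indexed by the number of stones in a group.
    static readonly double[] group_value = { 0.0, 1.0, 4.0, 8.0 };

    double eval(Game g) {
        double v = 0.0;
        foreach (var group in all_groups()) {   // every line of 4 adjacent squares
            int p1 = 0, p2 = 0;                 // count each player's stones
            foreach ((int x, int y) in group)
                if (g.at(x, y) == 1) p1 += 1;
                else if (g.at(x, y) == 2) p2 += 1;
            if (p2 == 0) v += group_value[p1];       // only player 1's stones, if any
            else if (p1 == 0) v -= group_value[p2];  // only player 2's stones
        }
        return v;
    }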

Here is the recursive minimax function in this Connect Four implementation:

    // Compute the minimax value of game state g, recursing to at most
    // max_depth; return the best move from this state in 'best'.
    double minimax(Game g, int depth, out int best) {
        best = -1;
        if (g.winner is int i)       // the game is over
            return outcome[i];       // return the actual outcome's value

        if (depth >= max_depth)      // we have hit the depth limit
            return eval(g);          // return a heuristic estimate instead

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;

        foreach (int m in g.possible_moves()) {
            g.move(m);               // make the move
            double w = minimax(g, depth + 1, out int _);
            g.unmove(m);             // undo the move
            if (maximizing ? w > v : w < v) {
                v = w;               // best value found so far
                best = m;            // remember the move that achieves it
            }
        }

        return v;
    }

At search depths 1 – 3 the minimax player is fairly weak. However, at depth 4 it is significantly stronger, and I myself cannot beat it easily.

extreme value pruning

In general a game tree may be large, with millions or billions of nodes or more. We'd like to be able to implement the minimax algorithm as efficiently as we can so that we can explore as many nodes as possible per second.

We may often be able to prune (cut off) some branches of the game tree from our search because they are unnecessary to explore. For example, consider the following simple game tree. In this game the only possible outcomes are a win for X (score = +1), a win for O (score = -1) or a draw (score = 0). As usual, X plays first. In this tree, leaf nodes contain the scores for each outcome and we have filled in the minimax values of the other nodes:

[figure: a game tree with leaf scores and minimax values filled in; the prunable nodes are shaded]

Assuming that our minimax search explores child nodes from left to right, the shaded nodes above may be pruned from the search. When we see the -1 at the lower left, we know that the node above it will also have value -1, since we are minimizing and no smaller value is possible. So there is no need to examine the following two nodes (with values 1 and 0). Similarly, after we determine that the second node in the second row has value 1, there is no need to examine the node to its right (or any of its children), since we know that the top node will also have value 1, meaning that the game is a win for X.

Simply put, if we are searching for a move for either player and have already found a move that guarantees a win, we need look no further. This technique of extreme value pruning will work for any game that has only a bounded set of possible scores, i.e. there is a maximum possible score and a minimum possible score.
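In the Connect Four minimax() function above, this is one extra check in the move loop: once the maximizing player has found a move worth P1_WIN = +1000 (or the minimizing player one worth P2_WIN = -1000), no remaining child can do better, so we can return at once. A sketch of the modified loop:

        foreach (int m in g.possible_moves()) {
            g.move(m);
            double w = minimax(g, depth + 1, out int _);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
                // extreme value pruning: a guaranteed win can't be improved on,
                // so don't examine this node's remaining children
                if (v == (maximizing ? P1_WIN : P2_WIN))
                    return v;
            }
        }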

alpha-beta pruning

Alpha-beta pruning is another important technique that can often prune many more nodes from a game search tree. Before we look at it in detail, let's first consider an informal example. Suppose that you are playing chess, and it's your turn to play. You've already thought about a possible move M, and you've happily concluded that M will allow you to gain a bishop with no possible retaliation by your opponent. As you may know, a bishop in chess is generally considered to be worth about 3 pawns, i.e. about 3 points on an informal scale of move strength. You now begin to consider an alternative move N. You quickly see that the opponent has a countermove to N that will result in a neutral outcome, i.e. no immediate gain for either player.

Now, the question is this: will you now spend time considering other possible countermoves that the opponent might make to N? Certainly not. You already know that M is better than N, since M is worth 3 points and at best N will have a neutral outcome for you. So you can prune your search at this point, and think no more about move N. This is the essence of alpha-beta pruning.

With that intuition in mind, now consider this game with only two moves:

[figure: a two-move game tree; the start node's children are 'a', with leaf children 14, 4 and 6, and 'b', with leaf children 3, -2 and 12]

As the minimax algorithm runs on this game tree, we first compute the value of node 'a', which is min(14, 4, 6) = 4. The start node is maximizing, so we now know that the start node's value will be at least 4. Now we descend into the second subtree and observe the leaf node whose score is 3. At this point we know that node 'b' will have a value of at most 3, since that node is minimizing. So 'b' cannot possibly affect the start node's value, which is, once again, at least 4. And so we need not examine the remaining children of 'b', i.e. -2 and 12. Instead, we can return immediately, yielding a value of 3 for 'b'. The correct minimax value for 'b' is actually -2 (i.e. the minimum of its children), but it does not matter that we've returned a value that is too large: it will be ignored anyway.

To implement this optimization in general, we can keep track of a value α as we recursively descend the game tree. α is the best score that we have found (so far) that player Max can definitely achieve on the current search path. In other words, Max knows that the minimax value of some parent node on the path will be at least α. In our informal chess example above, α was 3 points as we considered move N.

In the game tree above, α = 4 as we consider node 'b'. We are not interested in minimax values that are less than α, since we know we will not use them. So if we are computing a min-value and see that it will be less than α, we can prune the search at that point. In fact, even if we see that the current node's value will be less than or equal to α, we can prune the search, because we know that the current move will be no better than a move that we have already found.

Analogously, we also keep a value β, which is the best score that we have found (so far) that player Min can achieve on the current search path. In other words, Min knows that the minimax value of some parent node on the path will be at least as good as β, i.e. less than or equal to β. And so we are not interested in minimax values that are greater than or equal to β.

In the game tree above, alpha-beta pruning may seem like a modest optimization. In fact with deeper trees it is very significant, because it reduces the effective branching factor of the game tree, i.e. the number of children that we must consider at each step. In real-world minimax implementations alpha-beta pruning may often improve performance by as much as a factor of 10.

Let's implement alpha-beta pruning for our Connect Four game. Here is the updated minimax function; the new code related to alpha-beta pruning is marked with comments:

    // Compute the minimax value of g, searching to depth max_depth and
    // pruning with the window [alpha, beta]; return the best move in 'best'.
    double minimax(Game g, int depth, double alpha, double beta, out int best) {
        best = -1;
        if (g.winner is int i)
            return outcome[i];
        
        if (depth >= max_depth)
            return eval(g);

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;
        
        foreach (int m in g.possible_moves()) {
            g.move(m);
            double w = minimax(g, depth + 1, alpha, beta, out int _);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
                // alpha-beta pruning
                if (maximizing) {
                    if (v >= beta)            // Min would never allow this line of play,
                        return v;             // so prune the remaining children
                    alpha = Max(alpha, v);    // best value Max can achieve on this path
                } else {
                    if (v <= alpha)           // Max would never allow this line of play,
                        return v;             // so prune
                    beta = Min(beta, v);      // best value Min can achieve on this path
                }
            }
        }

        return v;
    }

    public int move(Game g) {
        minimax(g, 0, P2_WIN, P1_WIN, out int best);
        return best;
    }

Notice that alpha-beta pruning involves only a modest change to the code, with extra parameters 'alpha' and 'beta' to the minimax() function.

In the top-level function move() we specify the initial values α = P2_WIN = -1000.0 and β = P1_WIN = +1000.0. These are the lowest and highest possible minimax values. Recall that alpha-beta pruning will prune the search as soon as it finds a minimax value v such that v ≤ α or v ≥ β. So by setting α and β to these values, we get extreme value pruning for free: the search cuts off whenever it finds a game-winning move. For this reason, these are better initial values than -∞ and +∞.

negamax

When we write a recursive minimax function, instead of a minimax value we may decide to return a negamax value: a minimax value relative to the player whose turn it is to move. If a state has a positive negamax value, then the current player will win with perfect play. To put it differently, if it is player 1's turn to move, the negamax and minimax values are the same; if it is player 2's turn to move, the negamax value is -m, where m is the minimax value.

So this is not a different algorithm – just a different way of writing the code. It arguably makes the code a bit simpler (though perhaps trickier), since every player wants to maximize their negamax score.

Here is the minimax function for our Tic-Tac-Toe game, rewritten to use negamax. Notice that it must negate the return value from each recursive call, since a state's negamax value for the current player is the negative of its value for the other player.

    // Given a game state, compute and return its negamax value,
    // and also return the best possible move from the state.
    int negamax(Game g, out Move? best) {
        best = null;
        if (g.winner() is int i)
            return g.turn == 1 ? outcome[i] : - outcome[i];

        int v = int.MinValue;
        
        foreach (Move m in g.possible_moves()) {
            Game g1 = g.clone();
            g1.move(m);
            int w = - negamax(g1, out Move _);
            if (w > v) {
                v = w;      // update best known minimax value
                best = m;   // remember this move
            }
        }

        return v;
    }
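A top-level player needs only the move, not the negamax value, so it can simply discard the return value. Here is one possible wrapper (a sketch; the actual player class may differ):

    // Choose a move for the player whose turn it is in g.
    public Move move(Game g) {
        negamax(g, out Move? best);
        return best!;   // some move must exist, since the game is not over
    }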

simple games

We can solve some especially simple games easily using the following ideas:

- If a game's only possible outcomes are a win for one player or the other (with no draws), then every state is either a win or a loss for the player whose turn it is to move.
- So instead of a numeric negamax value, our recursive function can return a boolean: true if the player to move can win.
- A state is a win for the player to move if and only if some legal move leads to a state that is a loss for the opponent.

For example, consider the following simple game. There are a number of stones on a table (say, 20). On each player's turn, they may remove any number of stones that is a perfect square. The player who takes the last stone wins.

Here is an implementation of the game using these ideas:

// Return true if the current player can win when there are n stones
// on the table, along with the best possible move.
bool wins(int n, out int move) {
    move = -1;

    if (n == 0)
        return false;   // I lost

    for (int m = 1; m * m <= n ; ++m)
        if (!wins(n - m * m, out int _)) {
            // Removing m * m stones will make the other player lose.
            move = m * m;
            return true;  // so I can win
        }
    
    move = 1;       // no winning move exists, so arbitrarily take 1 stone
    return false;
}

int play(int n) {
    wins(n, out int move);
    return move;
}

void main() {
    int n = 20;
    while (n > 0) {
        WriteLine($"n = {n}");
        Write($"Your move? ");
        int m = int.Parse(ReadLine()!);
        n -= m;
        if (n == 0)
            break;
        
        WriteLine($"n = {n}");
        int move = play(n);
        WriteLine($"computer plays {move}");
        n -= move;
    }
}

main();
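For instance, here is what a session might look like if you start with 20 stones and choose to take 4, 9, and 1 stones on your turns. (In fact 20 stones is a losing position for the player to move, so with perfect play the computer will always win from this starting point.)

n = 20
Your move? 4
n = 16
computer plays 1
n = 15
Your move? 9
n = 6
computer plays 1
n = 5
Your move? 1
n = 4
computer plays 4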

This is completely equivalent to a program that uses numeric minimax values and a class representing the game state, but it is arguably a simpler implementation.

The program above will play perfectly. However, if we increase the number of starting stones to e.g. 100, it will be extremely slow. The problem is that it searches every possible sequence of moves, and there are exponentially many of those.

However, as the recursive function wins() calls itself, it will compute the negamax value of many game states over and over again. We can make the function far more efficient using dynamic programming. A state is just an integer, so we can use a 1-dimensional array to cache each state's value. Here is one possible implementation that can replace the play() function above:

int play(int n) {
    // If it's possible to win from a position with i stones, move[i]
    // holds the number of stones to take.  Otherwise move[i] = 0.
    int[] move = new int[n + 1];

    for (int i = 1 ; i <= n ; ++i)
        // Compute move[i].
        for (int j = 1 ; j * j <= i ; ++j)
            if (move[i - j * j] == 0) {   // the opponent loses
                move[i] = j * j;          // we can win
                break;
            }
    return move[n] == 0 ? 1 : move[n];
}

We see that we can determine a state N's minimax value (and best move) in O(N^2) time, which is far better than the previous exponential solution.

Introduction to GTK

See the pages on GTK linked from our course home page.