Week 11: Notes

depth-limited searching

Last week we studied the minimax algorithm, which can find the minimax value of any position in a game tree. We also talked about how many games have a game tree that is too big for us to search exhaustively, and discussed playing these games by running minimax to a fixed search depth, with a heuristic evaluation function that we apply at the depth limit.

To illustrate these ideas, here's an implementation of Connect Four in C# using a model-view architecture: connect_four.csproj (the project file), game.cs (the model class), view.cs (a graphical interface in GTK), and ai.cs (two AI players, one of which just moves randomly). The program has a minimax player, whose search depth you can specify on the command line. For example, to play against the minimax player with depth 4, type

$ dotnet run minimax:4

The minimax player has a heuristic evaluation function that works like this. It considers every group of 4 adjacent squares on the board that form a horizontal, vertical, or diagonal line. In each such group of 4, it counts how many of the squares are occupied by each player's stones. If the group contains stones belonging only to player 1, it adds a value to the heuristic estimate that depends on the number of stones in the group, with larger groups worth more.

(The group cannot contain 4 stones of the same player, since then the game would be over and the heuristic function would not be called.)

Similarly, if the group contains only stones of player 2, it subtracts the corresponding value from the heuristic estimate.

In my experience, this sort of evaluation function works well in Connect Four and in similar games where the goal is to place a number of stones in a row, such as Gomoku.
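
Here is a minimal sketch of such a function. The weights 0.1, 0.3 and 1.0 for groups containing 1, 2 or 3 stones are hypothetical, as is the helper groups() that yields the contents of every line of 4 adjacent squares; the actual implementation in ai.cs may differ.

    // A sketch of the heuristic evaluation, with hypothetical weights.
    // Positive values favor player 1, negative values favor player 2.
    double eval(Game g) {
        double[] weight = { 0.0, 0.1, 0.3, 1.0 };  // hypothetical values
        double total = 0.0;
        // groups() is assumed to yield the contents of every horizontal,
        // vertical or diagonal line of 4 adjacent squares on the board
        // (0 = empty, 1 or 2 = a player's stone).
        foreach (int[] group in groups(g)) {
            int p1 = 0, p2 = 0;
            foreach (int square in group)
                if (square == 1) p1 += 1;
                else if (square == 2) p2 += 1;
            if (p1 > 0 && p2 == 0)
                total += weight[p1];   // group contains only player 1's stones
            else if (p2 > 0 && p1 == 0)
                total -= weight[p2];   // group contains only player 2's stones
        }
        return total;
    }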

Here is the recursive minimax function in this Connect 4 implementation:

    // Return the minimax value of the current game state, plus
    // the best move to make in that state.
    (double, int) minimax(Game g, int depth) {
        if (g.winner() is int i)
            return (outcome[i], -1);
        
        if (depth >= max_depth)
            return (eval(g), -1);

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;
        int best = -1;
        
        foreach (int m in g.possible_moves()) {
            g.move(m);
            (double w, _) = minimax(g, depth + 1);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
            }
        }

        return (v, best);
    }
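
The recursion is kicked off by a top-level move() method. Here is a plausible sketch (the actual wrapper in ai.cs may differ):

    // Choose a move by running a depth-limited minimax search
    // from the current position.
    public int move(Game g) => minimax(g, 0).Item2;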

At search depths 1–3 the minimax player is fairly weak. At depth 4, however, it is significantly stronger, and I cannot beat it easily.

alpha-beta pruning

Alpha-beta pruning is an important technique that can often prune many nodes from a game search tree. Before we consider it in detail, let's first look at an informal example. Suppose that you are playing chess, and it is your turn to move. You have already thought about a possible move M, and have happily concluded that M will allow you to gain a bishop with no possible retaliation by your opponent. As you may know, a bishop in chess is generally considered to be worth about 3 pawns, i.e. about 3 points on an informal scale of move strength. You now begin to consider an alternative move N, and you quickly see that the opponent has a countermove to N that will result in a neutral outcome, i.e. no immediate gain for either player.

Now, the question is this: will you now spend time considering other possible countermoves that the opponent might make to N? Certainly not. You already know that M is better than N, since M is worth 3 points and at best N will have a neutral outcome for you. So you can prune your search at this point, and think no more about move N. This is the essence of alpha-beta pruning.

With that intuition in mind, now consider this game with only two moves:

[game tree figure: the maximizing start node has two children, minimizing nodes 'a' and 'b'; node 'a' has leaf values 14, 4 and 6, and node 'b' has leaf values 3, -2 and 12]

As the minimax algorithm runs on this game tree, we first compute the value of node 'a', which is min(14, 4, 6) = 4. The start node is maximizing, so we now know that the start node's value will be at least 4. Now we descend into the second subtree and observe the leaf node whose score is 3. At this point we know that node 'b' will have a value of at most 3, since that node is minimizing. So 'b' cannot possibly affect the start node's value, which is, once again, at least 4. And so we need not examine the remaining children of 'b', i.e. -2 and 12. Instead, we can return immediately, yielding a value of 3 for 'b'. The correct minimax value for 'b' is actually -2 (i.e. the minimum of its children), but it does not matter that we've returned a value that is too large: it will be ignored anyway.

To implement this optimization in general, we can keep track of a value α as we recursively descend the game tree. α is the best score that we have found (so far) that player Max can definitely achieve on the current search path. In other words, Max knows that the minimax value of some parent node on the path will be at least α. In our informal chess example above, α was 3 points as we considered move N.

In the game tree above, α = 4 as we consider node 'b'. We are not interested in minimax values that are less than α, since we know we will not use them. So if we are computing a min-value and see that it will be less than α, we can prune the search at that point. In fact, even if we see that the current node's value will be less than or equal to α, we can prune the search, because we know that the current move will be no better than a move that we have already found.

Analogously, we also keep a value β, which is the best score that we have found (so far) that player Min can achieve on the current search path. In other words, Min knows that the minimax value of some parent node on the path will be at least as good as β, i.e. less than or equal to β. And so we are not interested in minimax values that are greater than or equal to β.

In the game tree above, alpha-beta pruning may seem like a modest optimization. In fact with deeper trees it is very significant, because it reduces the effective branching factor of the game tree, i.e. the number of children that we must consider at each step. In the best case, with a good move ordering, alpha-beta pruning can reduce the effective branching factor to roughly its square root, which allows the search to go about twice as deep in the same amount of time. In real-world minimax implementations alpha-beta pruning may often improve performance by as much as a factor of 10.

Let's implement alpha-beta pruning for our Connect Four game. Here is the updated minimax function; the lines that implement alpha-beta pruning are new:

    (double, int) minimax(Game g, int depth, double alpha, double beta) {
        if (g.winner() is int i)
            return (outcome[i], -1);
        
        if (depth >= max_depth)
            return (eval(g), -1);

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;
        int best = -1;
        
        foreach (int m in g.possible_moves()) {
            g.move(m);
            (double w, _) = minimax(g, depth + 1, alpha, beta);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
                if (maximizing) {
                    // Cutoff: Min will never allow a value of beta or
                    // more, so we can prune the remaining moves.
                    if (v >= beta)
                        return (v, best);
                    alpha = Max(alpha, v);
                } else {
                    // Cutoff: Max will never allow a value of alpha or
                    // less, so we can prune the remaining moves.
                    if (v <= alpha)
                        return (v, best);
                    beta = Min(beta, v);
                }
            }
        }

        return (v, best);
    }

    public int move(Game g) => minimax(g, 0, P2_WIN, P1_WIN).Item2;

Notice that alpha-beta pruning involves only a modest change to the code, with extra parameters 'alpha' and 'beta' to the minimax() function.

In the top-level function move() we specify the initial values α = P2_WIN (= -1000.0) and β = P1_WIN (= +1000.0). These are the lowest and highest possible minimax values. Recall that alpha-beta pruning will prune the search as soon as it finds a minimax value v such that v ≤ α or v ≥ β. So by setting α and β to these values, the search will cut off immediately whenever it finds a game-winning move: for example, if player 1 finds a move with value P1_WIN, then v ≥ β and the remaining moves are pruned. For this reason, these are better initial values than -∞ and +∞.

simple games

We can solve some especially simple games easily using the following ideas. In some games every position is either a win or a loss for the player to move, so we can represent a position's value with a boolean rather than a number. A position is a win for the current player if at least one move leads to a position that is a loss for the opponent; otherwise it is a loss.

For example, consider the following simple game, called subtract a square. There are a number of stones on a table (say, 20). On each player's turn, they may remove any number of stones that is a perfect square. The player who takes the last stone wins.

Let's write a function that determines whether the current player can win from any given position:

    // Return true if n is a winning position for the current player
    // in a game of subtract a square.
    bool wins(int n) {
        if (n == 0)
            return false;  // I already lost

        for (int i = 1; i * i <= n; ++i)
            if (!wins(n - i * i))  // this move goes to a position where the opponent will lose
                return true;  // so I can win

        return false;  // I have no winning moves
    }

This is completely equivalent to a program that uses numeric minimax values and a class representing the game state, but the implementation is simpler. Although this function does not return the best move to make in a given state, we could easily adapt it to do so. That would yield a strategy that plays the game perfectly.
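
For example, here is a sketch of a hypothetical helper best_move() (not part of the program above) that uses wins() to choose a winning move when one exists:

    // Return a winning number of stones to remove from a pile of n,
    // or -1 if every move leads to a position where the opponent wins.
    int best_move(int n) {
        for (int i = 1; i * i <= n; ++i)
            if (!wins(n - i * i))   // the opponent will lose after this move
                return i * i;
        return -1;   // no winning move exists
    }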

However, if we increase the number of starting stones to e.g. 100, the function above will be extremely slow. The problem is that it will search the entire game tree, which contains exponentially many nodes.

As this function calls itself, it considers the same game states over and over again. We can make the function far more efficient using dynamic programming. A state is just an integer, so we can use a 1-dimensional array to cache each state's minimax value. Here is one possible implementation:

    // Return true if n is a winning position for the current player
    // in a game of subtract a square.
    bool wins(int n) {
        bool[] w = new bool[n + 1];
        w[0] = false;  // base case: with no stones left, the current player has lost

        for (int k = 1; k <= n; ++k) {
            // Compute w[k]: k is a win if some move leads to a losing position.
            for (int i = 1; i * i <= k; ++i)
                if (!w[k - i * i]) {
                    w[k] = true;
                    break;
                }
        }

        return w[n];
    }
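
As a quick check, wins(20) returns false: 20 stones is a losing position, so in the game described above the first player will lose if the second player plays perfectly.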

We see that we can determine a state N's minimax value (and hence the best move) in O(N²) time, which is far better than the previous exponential solution.