Week 11: Notes

depth-limited searching

Last week we studied the minimax algorithm, which can find the minimax value of any position in a game tree. We also talked about how many games have a game tree that is too big for us to search exhaustively, and discussed playing these games by running minimax to a fixed search depth, with a heuristic evaluation function that we apply at the depth limit.

To illustrate these ideas, here's an implementation of Connect Four in C# using a model-view architecture: connect_four.csproj (the project file), game.cs (the model class), view.cs (a graphical interface in GTK), and ai.cs (two AI players, one of which just moves randomly). The program has a minimax player, whose search depth you can specify on the command line. For example, to play against the minimax player with depth 4, type

$ dotnet run minimax:4

The minimax player has a heuristic evaluation function that works like this. It considers every group of 4 adjacent squares on the board that form a horizontal, vertical, or diagonal line. In each such group of 4, it counts how many of the squares are occupied by each player's stones. If the group contains stones belonging only to player 1, it adds a value to the heuristic estimate that depends on the number of stones in the group, with larger groups worth more.

(The group cannot contain 4 stones of the same player, since then the game would be over and the heuristic function would not be called.)

Similarly, if the group contains only stones of player 2, it subtracts the corresponding value from the heuristic estimate.

In my experience, this sort of evaluation function works well in Connect Four and in similar games where the goal is to place a number of stones in a row, such as Gomoku.
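
Here is a minimal sketch of such a function. The weights 0.1, 0.3 and 1.0 for groups containing 1, 2 or 3 stones are hypothetical, as is the helper groups() that yields the contents of every line of 4 adjacent squares; the actual implementation in ai.cs may differ.

    // A sketch of the heuristic evaluation, with hypothetical weights.
    // Positive values favor player 1, negative values favor player 2.
    double eval(Game g) {
        double[] weight = { 0.0, 0.1, 0.3, 1.0 };  // hypothetical values
        double total = 0.0;
        // groups() is assumed to yield the contents of every horizontal,
        // vertical or diagonal line of 4 adjacent squares on the board
        // (0 = empty, 1 or 2 = a player's stone).
        foreach (int[] group in groups(g)) {
            int p1 = 0, p2 = 0;
            foreach (int square in group)
                if (square == 1) p1 += 1;
                else if (square == 2) p2 += 1;
            if (p1 > 0 && p2 == 0)
                total += weight[p1];   // group contains only player 1's stones
            else if (p2 > 0 && p1 == 0)
                total -= weight[p2];   // group contains only player 2's stones
        }
        return total;
    }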

Here is the recursive minimax function in this Connect 4 implementation:

    // Return the minimax value of the current game state, plus
    // the best move to make in that state.
    (double, int) minimax(Game g, int depth) {
        if (g.winner() is int i)
            return (outcome[i], -1);
        
        if (depth >= max_depth)
            return (eval(g), -1);

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;
        int best = -1;
        
        foreach (int m in g.possible_moves()) {
            g.move(m);
            (double w, _) = minimax(g, depth + 1);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
            }
        }

        return (v, best);
    }
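
The recursion is kicked off by a top-level move() method. Here is a plausible sketch (the actual wrapper in ai.cs may differ):

    // Choose a move by running a depth-limited minimax search
    // from the current position.
    public int move(Game g) => minimax(g, 0).Item2;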

At search depths 1–3 the minimax player is fairly weak. At depth 4, however, it is significantly stronger, and I cannot beat it easily.

alpha-beta pruning

Alpha-beta pruning is an important technique that can often prune many nodes from a game search tree. Before we consider it in detail, let's first look at an informal example. Suppose that you are playing chess, and it is your turn to move. You have already thought about a possible move M, and have happily concluded that M will allow you to gain a bishop with no possible retaliation by your opponent. As you may know, a bishop in chess is generally considered to be worth about 3 pawns, i.e. about 3 points on an informal scale of move strength. You now begin to consider an alternative move N, and you quickly see that the opponent has a countermove to N that will result in a neutral outcome, i.e. no immediate gain for either player.

Now, the question is this: will you now spend time considering other possible countermoves that the opponent might make to N? Certainly not. You already know that M is better than N, since M is worth 3 points and at best N will have a neutral outcome for you. So you can prune your search at this point, and think no more about move N. This is the essence of alpha-beta pruning.

With that intuition in mind, now consider this game with only two moves:

[game tree figure: the maximizing start node has two children, minimizing nodes 'a' and 'b'; node 'a' has leaf values 14, 4 and 6, and node 'b' has leaf values 3, -2 and 12]

As the minimax algorithm runs on this game tree, we first compute the value of node 'a', which is min(14, 4, 6) = 4. The start node is maximizing, so we now know that the start node's value will be at least 4. Now we descend into the second subtree and observe the leaf node whose score is 3. At this point we know that node 'b' will have a value of at most 3, since that node is minimizing. So 'b' cannot possibly affect the start node's value, which is, once again, at least 4. And so we need not examine the remaining children of 'b', i.e. -2 and 12. Instead, we can return immediately, yielding a value of 3 for 'b'. The correct minimax value for 'b' is actually -2 (i.e. the minimum of its children), but it does not matter that we've returned a value that is too large: it will be ignored anyway.

To implement this optimization in general, we can keep track of a value α as we recursively descend the game tree. α is the best score that we have found (so far) that player Max can definitely achieve on the current search path. In other words, Max knows that the minimax value of some parent node on the path will be at least α. In our informal chess example above, α was 3 points as we considered move N.

In the game tree above, α = 4 as we consider node 'b'. We are not interested in minimax values that are less than α, since we know we will not use them. So if we are computing a min-value and see that it will be less than α, we can prune the search at that point. In fact, even if we see that the current node's value will be less than or equal to α, we can prune the search, because we know that the current move will be no better than a move that we have already found.

Analogously, we also keep a value β, which is the best score that we have found (so far) that player Min can achieve on the current search path. In other words, Min knows that the minimax value of some parent node on the path will be at least as good as β, i.e. less than or equal to β. And so we are not interested in minimax values that are greater than or equal to β.

In the game tree above, alpha-beta pruning may seem like a modest optimization. In fact with deeper trees it is very significant, because it reduces the effective branching factor of the game tree, i.e. the number of children that we must consider at each step. In the best case, with a good move ordering, alpha-beta pruning can reduce the effective branching factor to roughly its square root, which allows the search to go about twice as deep in the same amount of time. In real-world minimax implementations alpha-beta pruning may often improve performance by as much as a factor of 10.

Let's implement alpha-beta pruning for our Connect Four game. Here is the updated minimax function; the lines that implement alpha-beta pruning are new:

    (double, int) minimax(Game g, int depth, double alpha, double beta) {
        if (g.winner() is int i)
            return (outcome[i], -1);
        
        if (depth >= max_depth)
            return (eval(g), -1);

        bool maximizing = g.turn == 1;
        double v = maximizing ? int.MinValue : int.MaxValue;
        int best = -1;
        
        foreach (int m in g.possible_moves()) {
            g.move(m);
            (double w, _) = minimax(g, depth + 1, alpha, beta);
            g.unmove(m);
            if (maximizing ? w > v : w < v) {
                v = w;
                best = m;
                if (maximizing) {
                    // Cutoff: Min will never allow a value of beta or
                    // more, so we can prune the remaining moves.
                    if (v >= beta)
                        return (v, best);
                    alpha = Max(alpha, v);
                } else {
                    // Cutoff: Max will never allow a value of alpha or
                    // less, so we can prune the remaining moves.
                    if (v <= alpha)
                        return (v, best);
                    beta = Min(beta, v);
                }
            }
        }

        return (v, best);
    }

    public int move(Game g) => minimax(g, 0, P2_WIN, P1_WIN).Item2;

Notice that alpha-beta pruning involves only a modest change to the code, with extra parameters 'alpha' and 'beta' to the minimax() function.

In the top-level function move() we specify the initial values α = P2_WIN (= -1000.0) and β = P1_WIN (= +1000.0). These are the lowest and highest possible minimax values. Recall that alpha-beta pruning will prune the search as soon as it finds a minimax value v such that v ≤ α or v ≥ β. So by setting α and β to these values, the search will cut off immediately whenever it finds a game-winning move: for example, if player 1 finds a move with value P1_WIN, then v ≥ β and the remaining moves are pruned. For this reason, these are better initial values than -∞ and +∞.

simple games

We can solve some especially simple games easily using the following ideas. In some games every position is either a win or a loss for the player to move, so we can represent a position's value with a boolean rather than a number. A position is a win for the current player if at least one move leads to a position that is a loss for the opponent; otherwise it is a loss.

For example, consider the following simple game, called subtract a square. There are a number of stones on a table (say, 20). On each player's turn, they may remove any number of stones that is a perfect square. The player who takes the last stone wins.

Let's write a function that determines whether the current player can win from any given position:

    // Return true if n is a winning position for the current player
    // in a game of subtract a square.
    bool wins(int n) {
        if (n == 0)
            return false;  // I already lost

        for (int i = 1; i * i <= n; ++i)
            if (!wins(n - i * i))  // this move goes to a position where the opponent will lose
                return true;  // so I can win

        return false;  // I have no winning moves
    }

This is completely equivalent to a program that uses numeric minimax values and a class representing the game state, but the implementation is simpler. Although this function does not return the best move to make in a given state, we could easily adapt it to do so. That would yield a strategy that plays the game perfectly.
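
For example, here is a sketch of a hypothetical helper best_move() (not part of the program above) that uses wins() to choose a winning move when one exists:

    // Return a winning number of stones to remove from a pile of n,
    // or -1 if every move leads to a position where the opponent wins.
    int best_move(int n) {
        for (int i = 1; i * i <= n; ++i)
            if (!wins(n - i * i))   // the opponent will lose after this move
                return i * i;
        return -1;   // no winning move exists
    }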

However, if we increase the number of starting stones to e.g. 100, the function above will be extremely slow. The problem is that it will search the entire game tree, which contains exponentially many nodes.

As this function calls itself, it considers the same game states over and over again. We can make the function far more efficient using dynamic programming. A state is just an integer, so we can use a 1-dimensional array to cache each state's minimax value. Here is one possible implementation:

    // Return true if n is a winning position for the current player
    // in a game of subtract a square.
    bool wins(int n) {
        bool[] w = new bool[n + 1];
        w[0] = false;  // base case: with no stones left, the current player has lost

        for (int k = 1; k <= n; ++k) {
            // Compute w[k]: k is a win if some move leads to a losing position.
            for (int i = 1; i * i <= k; ++i)
                if (!w[k - i * i]) {
                    w[k] = true;
                    break;
                }
        }

        return w[n];
    }
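
As a quick check, wins(20) returns false: 20 stones is a losing position, so in the game described above the first player will lose if the second player plays perfectly.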

We see that we can determine a state N's minimax value (and hence the best move) in O(N²) time, which is far better than the previous exponential solution.