Week 10: Notes

characteristics of games

If we want to write a computer player for a game, it's first worth considering the various attributes that a game may have.

In this course we will mostly consider how to play 2-player abstract strategy games, which are deterministic, sequential games of perfect information. These games include, for example, Tic-Tac-Toe, Connect Four, checkers, and chess.

history of computer game playing

Researchers have been writing game-playing programs ever since shortly after electronic computers were first invented in the 1940s.

Arthur Samuel developed a program for playing checkers starting around 1950 and continuing into the 1970s, by which time it reached the level of play of a decent human amateur. Starting around 1990, Jonathan Schaeffer and a team at the University of Alberta developed a much stronger checkers program called Chinook. Chinook defeated a top human player (Don Lafferty, ranked #2 in the world) in 1995, so this is arguably the moment when computers became stronger than humans at checkers. In 2007, Schaeffer and his team solved the game of checkers, showing that optimal play by both players will lead to a draw. Solving the game required computations by hundreds of computers over two decades!

For many years top human players were much stronger than computers at chess. However, that began to change in the 1990s. IBM developed a program called Deep Blue that played against the top Russian grandmaster Garry Kasparov in a series of high-profile games in 1996 and 1997. In 1996 Kasparov defeated Deep Blue by a score of 4–2 (3 wins, 2 draws, 1 loss), which was the first time that a top human player lost even a single game to a computer under classical tournament conditions. In 1997 an improved version of Deep Blue defeated Kasparov by a score of 3½–2½ (2 wins, 3 draws, 1 loss). This is pretty much the point at which computers became stronger than top human players at chess.

For many years after that, computer programs that play Go had no chance against strong human players. However, in 2006 an algorithm called Monte Carlo tree search was invented that allowed Go programs to start becoming stronger. In the mid-2010s, Google developed a program called AlphaGo that combined Monte Carlo tree search with deep learning via self-play. In 2016, AlphaGo defeated top human player Lee Sedol in a five-game match by a score of 4–1. Go programs have become even stronger since then.

In recent years researchers have focused on writing programs to play StarCraft, a complex real-time strategy game. DeepMind's program AlphaStar defeated top professional player Grzegorz "MaNa" Komincz in a series of games in 2018; however, arguably AlphaStar had some unfair advantages (e.g. it was able to see the entire map). In a later match with fairer conditions, Komincz won against AlphaStar. Playing StarCraft and other real-time strategy games remains an active area of research.

minimax

In this course we will focus on the classic minimax algorithm, which is commonly used in computer players for games such as checkers and chess. The newest and most powerful game-playing programs are based on Monte Carlo tree search, neural networks, and self-play, which are beyond the scope of this course.

For any game we wish to play, we will implement a strategy, which is a function that takes the current game state as input and decides which move to play. Every deterministic game with sequential moves and no hidden information has some optimal strategy which will perform as well as possible under the assumption that the opponent will use the strongest possible counterstrategy. You may see a more precise definition of these ideas in more advanced classes about game theory.
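
For example, in C# we might model a strategy as a delegate type. This is only a sketch; the code later in these notes uses a Player interface for the same purpose:

// A strategy examines the current game state and chooses a move to play.
// (Game and Move are the model classes introduced below.)
delegate Move Strategy(Game game);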

In Tic-Tac-Toe, if the first player plays optimally, they can never lose. If both players play optimally, the game will be a draw. (Most children realize this after playing Tic-Tac-Toe for a while.) Not every game is a draw with optimal play. For example, it was proven in 1988 that if both players play optimally in Connect Four, the first player will win.

For any abstract strategy game such as Tic-Tac-Toe, a game tree represents all possible sequences of moves. Each node of a game tree represents a state of a game in progress. Here is a partial game tree for Tic-Tac-Toe:

In the first move of the game, player 1 (X) has 9 possible moves. So the branching factor of Tic-Tac-Toe at the beginning is 9, and it decreases as the game progresses. If every game of Tic-Tac-Toe continued until the board was completely full, there would be 9 · 8 · … · 2 · 1 = 9! = 362,880 possible games. However, the number of possible games of Tic-Tac-Toe is less than 9!, because many games will end before the board is full.

The total number of possible games of Tic-Tac-Toe is relatively small, since its average branching factor is fairly small and the game is short. As a result, we can write a program that explores the entire game tree of Tic-Tac-Toe. As we will see, this allows us to play an optimal strategy. More serious games have much larger game trees. For example, it is believed that there are at least 10^120 possible chess games. And so no program can ever entirely explore chess's game tree.

scores and minimax values

We will assign a numerical score to each outcome in a game. In Tic-Tac-Toe, we will represent a win for player X (who plays first) using the score +1, and a win for O using the score -1. If the game is a draw, we will say that the score is 0. Thus X wishes to maximize the game score, and O wants to minimize it. In fact for all games we consider we'll adopt a similar convention: the first player (called X or "Max") wants to maximize the score and the second player (called O or "Min") wants to minimize it.
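
In code, this convention is just a small mapping from outcomes to numbers. Here's a sketch, using the outcome encoding (0 = draw, 1 = first player wins, 2 = second player wins) that the Game class below will use:

// Map a game outcome to a score from the first player's point of view.
int score(int winner) =>
    winner switch { 1 => +1, 2 => -1, _ => 0 };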

Some games might have only two possible outcomes, if a draw is not possible. For these games, the only possible scores are +1 and -1. On the other hand, some games could have more than three possible outcomes. For example, at the end of a poker hand a player will win (or lose) some amount of money, which is the game outcome. In a game like this, a player wishes not only to win, but also to win by as much as possible.

Consider the following game tree for a hypothetical abstract strategy game:

[figure: game tree, with a numeric score at each of the nine lowest states]

The game lasts only two moves. First X plays, bringing the game to one of the states a, b, or c. Then O plays, bringing the game to one of the nine states in the bottom row. Each of these states is labelled with a numeric score. Once again, X wants to maximize the final score and O wishes to minimize it. How should X and O choose their moves?

In each of the states a, b, and c it is O's turn to play, and O should choose the move that yields the lowest score. So we may assign a value to each of these states, namely the minimum of the values of all successor states. This is the score of the outcome of the game, assuming that O plays perfectly, i.e. O chooses the move that will minimize the final score:

[figure: game tree, with values assigned to states a, b, and c]

In the start state it is X's turn to play. X should choose the move that leads to a state of maximal value. So we may likewise assign a value to the start state, namely the maximum of the values of all successor states. This will be the score of the outcome of the game, assuming that both X and O play perfectly.

[figure: game tree, with a value assigned to the start state]

The game above is trivial, but we may apply the same analysis to a game tree of any depth, assigning each node its minimax value, obtained by minimizing whenever it is O's turn to play and maximizing whenever it is X's turn, starting at the leaves of the game tree and working upward. Labelling nodes in this way is the minimax algorithm.
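
We can state this rule as a recurrence. Writing score(s) for the score of a finished game and result(s, m) for the state reached by playing move m in state s:

value(s) = score(s)                                   if s is terminal
value(s) = max over all moves m of value(result(s, m))   if X is to play in s
value(s) = min over all moves m of value(result(s, m))   if O is to play in s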

Note the following two points:

Now let's return to Tic-Tac-Toe. We have already noted that with optimal play by both players, Tic-Tac-Toe will be a draw. In other words, the minimax value of the initial state of Tic-Tac-Toe is 0. Here is the partial game tree for Tic-Tac-Toe that we saw before, with each node labelled with its minimax value:

Note the following:

As mentioned above, the game tree for chess is far too large for us to draw (it has more nodes than there are atoms in the universe) and is likewise much too large for us to analyze by computer. But if we could draw such a tree and label its nodes with the minimax algorithm, then the value of the start node would be either +1 (indicating that with the right strategy White can always win), 0 (indicating that best play leads to a draw), or -1 (meaning that Black can always win). It is not known what this value would be, since we don't know whether White (or even Black) can always win at chess. (It certainly seems very unlikely that Black can always win, but this has never been disproven.)

implementing minimax in C#

We can implement the minimax algorithm in C# straightforwardly using recursion. Consider the game of Tic-Tac-Toe. Here's an implementation in C# using a model-view architecture: tic_tac_toe.csproj (the project file), game.cs (the model classes), view.cs (a graphical interface in GTK), and ai.cs (two AI players, one of which just moves randomly).

The model class Move in game.cs represents a move, which is just an (x, y) position:

class Move {
    public int x, y;

    public Move(int x, int y) {
        this.x = x; this.y = y;
    }
}

The model class Game represents a state of the game. The most important methods in this class are

List<Move> possible_moves()
    Return a list of all possible moves from this state.

bool move(Move m)
    Make a move, returning true if it was legal.

int? winner()
    If the game is over, return its winner (0 = draw, 1 = player 1, 2 = player 2); otherwise return null.

Game clone()
    Return a copy of this Game object.
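
As a quick illustration of how these methods fit together, here's a sketch of a function that plays one complete game between two Player objects and returns the winner. (The Player interface is defined just below; the no-argument Game constructor is an assumption here, and turn is the public field used by the AI code below.)

// Play one game between p1 and p2; return the winner
// (0 = draw, 1 = player 1, 2 = player 2).
int play(Player p1, Player p2) {
    Game g = new Game();
    int? w = g.winner();
    while (w == null) {
        Player p = g.turn == 1 ? p1 : p2;   // the player whose turn it is
        g.move(p.move(g));                  // ask the player for a move, then make it
        w = g.winner();
    }
    return w.Value;
}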

In ai.cs, we first define the Player interface. A Player is just an object with a move() method that can decide how to move in any state:

interface Player {
    Move move(Game g);
}

Next we define a computer player called RandomPlayer, which just moves randomly:

class RandomPlayer : Player {
    Random r = new Random();

    public Move move(Game g) {
        List<Move> moves = g.possible_moves();
        return moves[r.Next(moves.Count)];
    }
}

Finally we define a player that moves using the minimax algorithm:

class MinimaxPlayer : Player {
    // Map a winner (0 = draw, 1 = player 1, 2 = player 2)
    // to a score (0, +1, or -1).
    int[] outcome = new int[] { 0, 1, -1 };

    // Given a game state, compute and return its minimax value,
    // and also return the best possible move from the state.
    int minimax(Game g, out Move? best) {
        best = null;
        if (g.winner() is int i)
            return outcome[i];

        bool maximizing = g.turn == 1;
        int v = maximizing ? int.MinValue : int.MaxValue;
        
        foreach (Move m in g.possible_moves()) {
            Game g1 = g.clone();
            g1.move(m);
            int w = minimax(g1, out Move _);
            if (maximizing && w > v || !maximizing && w < v) {
                v = w;      // update best known minimax value
                best = m;   // remember this move
            }
        }

        return v;
    }

    public Move move(Game g) {
        minimax(g, out Move? best);
        return best!;
    }
}

The method minimax() recursively computes the minimax value for any game state. The base case is when the game is complete, in which case minimax() returns a minimax value based on the game outcome. Otherwise, it calls itself recursively once for every possible move, and computes the maximum or minimum of all minimax values of the resulting states.

Notice the important call to clone() in the minimax() method above. It's necessary because a call to move() modifies the game state. If we did not clone the game state at each step, then there would be only a single Game object, and as soon as the recursion reached an ending state, that single Game would be finished forever! In other words, the clone operation lets us modify the state as we explore one part of the tree while keeping the state intact for exploring other branches.
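
For Tic-Tac-Toe, cloning is cheap, since there is little state to copy. Here's a minimal sketch of what clone() might look like, assuming the Game class stores the board in an int[,] field named board and the current player in a turn field (the field names are assumptions here):

public Game clone() {
    Game g = new Game();
    g.board = (int[,]) board.Clone();   // copy the 3 x 3 board
    g.turn = turn;                      // copy whose turn it is
    return g;
}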

For some games that have a significant amount of state, it can be expensive to clone the game state each time the recursive search makes a move. So some game state classes may instead have an unmove() method that allows the caller to undo a move that was made. If the Game class for Tic-Tac-Toe above had an unmove() method, then instead of

Game g1 = g.clone();
g1.move(m);
int w = minimax(g1, out Move _);

we could write

g.move(m);
int w = minimax(g, out Move _);
g.unmove(m);

With this approach, a call to minimax() will always preserve the previous game state, just like in the version with clone(). For games with complex state, this approach can be a significant performance win, though on the other hand unmove() may not always be so easy to implement.
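
For Tic-Tac-Toe, unmove() would be easy to write, since a move only marks one square and passes the turn. A minimal sketch, under the same assumptions about the Game fields as in the clone() sketch above:

public void unmove(Move m) {
    board[m.x, m.y] = 0;   // clear the marked square (assuming 0 = empty)
    turn = 3 - turn;       // give the turn back to the previous player
}

In a game such as chess, unmove() is harder: a move may capture a piece or change castling or en passant rights, so that information must be saved when the move is made in order to restore it later.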

To play the game interactively versus the minimax opponent, type

$ dotnet run minimax

You'll see a graphical interface like this:

Even though you move first, you won't be able to defeat the minimax player, whose strategy is optimal.

You can run a series of games between the minimax and random players like this:

$ dotnet run -g 10 minimax random
game 0: winner = minimax
game 1: winner = minimax
game 2: winner = minimax
game 3: winner = minimax
game 4: winner = minimax
game 5: winner = minimax
game 6: winner = minimax
game 7: winner = minimax
game 8: winner = minimax
game 9: winner = minimax
minimax won 10, random won 0, 0 draws
$ dotnet run -g 10 random minimax
game 0: winner = minimax
game 1: winner = minimax
game 2: winner = minimax
game 3: winner = minimax
game 4: winner = minimax
game 5: winner = draw
game 6: winner = minimax
game 7: winner = minimax
game 8: winner = minimax
game 9: winner = minimax
random won 0, minimax won 9, 1 draws

It's unsurprising that the minimax player wins almost every game. Occasionally the random player may get lucky and force a draw; this is much more likely when the random player moves first, as in the second series of games above.