Here are notes about topics from the lecture.
For more details about the minimax algorithm, one good source is Russell and Norvig, Artificial Intelligence: A Modern Approach, Third Edition, sections 5.1 – 5.3.
We will now discuss how to write programs that can play games such as Tic Tac Toe, checkers (draughts) or even chess. All of these are 2-player abstract strategy games, which are games with the following characteristics:
There is no element of chance in the game: the same moves will always result in the same outcome.
There is no hidden information: both players have the same information about the game state.
30 years ago the best human players could still defeat the top computer programs in games such as chess and Go. But this is no longer true: in the 1990s computer programs were developed (notably IBM's Deep Blue) that could defeat the top human chess players. And just in the last few years computer programs (notably Google's AlphaGo) have become stronger than the top human players at Go.
As a first example of a game to play, consider the following very simple game, which I call Row. Row is played on a board consisting of a one-dimensional row of N sqaures, which are all initially empty. There are two players, X and O. X moves first. Each player in turn places their symbol in any empty square. The game ends when all squares are full. The winner is the player who has the longest contiguous sequence of their symbol anywhere on the board. If both players have a longest sequence of equal length, the game is a draw.
For example, here is one possible sequence of moves of a game of Row played on a 5-square board:
X wins the game, since he ends up with 2 in a row and O's longest stretch consists of only a single symbol.
We may now ask: when Row is played on a board with N squares, can the first player (or, perhaps, even the second player) always win? This is a very simple game, but the answer to this question in general may not be obvious.
Let's now consider how to write a computer program that may play a game such as Row. For any abstract strategy game, a game tree represents all possible sequences of moves. For example, here is a game tree showing all possible games of Row with N = 3. (I have omitted board states that are redundant up to symmetry. For example, I have not included the state in which X first plays in the rightmost square, becacuse it is symmetric to the state in which X's first move is in the leftmost square.)
Note that X is the winner in two of the outcomes depicted here, and the third outcome (X O X) is a draw.
We will assign a numerical score to each outcome in a game. For the game of Row, if X wins the score is +1; if O wins the score is -1; if the game is a draw, the score is 0. Thus X wishes to maximize the game score, and O wants to minimize it. In fact for all games we consider will adopt a similar convention: the first player (called X or "Max") wants to maximize the score and the second player (called O or "Min") wishes to minimize it.
Some games might have more than three possible scores. For example, we can imagine a game in which players can capture each other's pieces and the score at the end is the difference between the numbers of pieces that each player has remaining. In a game like this, a player wishes not only to win, but also to win by as much as possible.
Consider the following game tree for some abstract strategy game:
The game has two moves. First X plays, bringing the game to either state a, b, or c. Now O plays, bringing the game to one of the nine states in the lowermost row. Each of these states is labelled with a numeric score. Once again, X wishes to maximize the score and O wishes to minimize it. How should X and O choose their moves?
In each of the states a, b, and c it is O's turn to play, and O sbould choose the move that yields the lowest score. So we may assign a value to each of these states, namely the minimum of the scores of all successor states. This is the score of the outcome of the game, assuming that O plays perfectly, i.e. O chooses the move that will minimize the final score.
In the start state it is X's turn to play. X should choose the move that leads to a state of maximal value. So we may likewise assign a value to the start state, namely the maximum of the values of all successor states. This will be the score of the outcome of the game, assuming that both X and O play perfectly.
We may apply the above analysis to a game tree of arbitrary depth, assigning each node its minimax value, obtained by minimizing whenever it is O's turn to play and maximizing when it is X's turn. The process of labelling nodes in this way is the minimax algorithm. The mimimax value of the top node is the score of the final outcome, assuming perfect play by both X and O.
A game tree for chess is far too large for us to draw (it has more nodes than there are atoms in the universe) and is likewise much too large for us to analyze by computer. But if we could draw such a tree and label its nodes with the minimax algorithm, then the value of the start node would be either +1 (indicating that with the right strategy White can always win), 0 (indicating that best play leads to a draw) or -1 (meaning that Black can always win). It is not known what this value would be, since we don't know whether White or even Black can always win at chess. (It certainly seems very unlikely that Black can always win, but it has not been proven that this is not the case.)
As a simple exercise, apply the minimax algorithm to the complete game tree depicted above for Row with N = 3. Label each final board position with either +1 (X wins) or 0 (a draw), then compute the values of the other nodes. This will prove the (rather obvious) fact that X can force a win.
Let us now consider how to implement the minimax algorithm in C#. Our first goal is to compute the minimax value of the start node of a game of Row, for a board of arbitrary size N. This will reveal whether the first player (X) can always win for any particular board size.
We can implement the minimax algorithm using a
recursive method. This method minimax
will take a game
state as its input, and will makes recursive calls to determine the
minimax values of all successor states, e.g. the states following
each possible move that the current player can make. When it is X's
turn to play, minimax
will compute the maximum of the
values of all successors; when O is playing, minimax
will compute the minimum.
mimimax
must call itself recursively
passing an updated board state in which some move have been made.
minimax
could copy the board state, but that would be
computationally expensive, especially since we typically wish to
examine thousands or millions of board states in as short a time as
possible. So instead we will keep only a single copy of board state
in memory. Before each recursive call, minimax
will
update the board state by making a possible move; after the call
returns, minimax will undo the move, returning the board to
the previous state.
Here is a program that implements this minimax
method as just described for the game of Row described above:
The program computes and prints the minimax value of the game tree's start node for every value of N in the range 1 ≤ N ≤ 12. If you run this program, you will see that the first player (X) can win for every odd board size through N = 9, but when N = 11 the game will actually be a draw with perfect play!
Notice that in this program the logic defining the
rules of the game is in the Board
class, and the minimax
algorithm is implemented in a separate Minimax
class. I
generally recommend this kind of class separation in a game-playing
program.
Now let's extend our program so that it can actually play the game, i.e. determine the best move to make at any particular time. Here is an extended implementation including a user interface that lets the user play as either X or O:
Notice that in this updated program we have made only
minimal changes to the minimax
method. It now keeps
track of the best move it has seen in the current board position, and
returns it.
Also note that in this program all user interface
code is in a Game
class, separate from both the Board
and Minimax
classes. Again, I recommend keeping separate
functionality in separate classes in this way.
Now let's consider using the minimax algorithm to write a program that plays a real game such as chess or Connect Four.
For these games, we cannot hope to explore an entire game tree to determine the minimax value of a given board position. That's because the game trees are simply too large. For example, a Connect Four board has 6 x 7 = 42 squares, each of which may have any of 3 values (X, O, and empty). So there are something like 342 possible board positions (actually somewhat less, since some positions can never be reached since game play stops after one player has won). The number of possible games is greater, since each board position can be reached in many possible ways. The total number of possible games is less than 742, since there are at most 7 possible moves at each turn and a game will have at most 42 turns. These numbers are so large that an exhaustive tree search is not feasible.
We can still, however, use the minimax algorithm to construct a program that plays well. Of course, it will not play perfectly – the only way to achieve that would be to search the entire game tree. To do this, a program can search the game tree only to some fixed depth D. Once it reaches a node of depth D, it can compute an heuristic estimate of the current state's minimax value. The estimate is an informed guess about the outcome of the game, given its current state. For example, in chess we might add up the values of the pieces on the board, assigning them values such as Queen = 9, Rook = 5, Bishop = 3, Knight = 3, Pawn = 1. This is a linear combination of features that we can observe in the current board state. This is a common approach.
The heuristic estimate should scaled into the range of possible outcomes of the game. For example, if +1 represents a victory for White and -1 is a victory for Black, then the heuristic estimate itself should fall into the range [-1, +1]. For example, a value of 0.9 would represent that White will win, assuming good play by both players.