Consider the following 2-player game. There are N stones on a table. On each term, a player may take any number of stones that exactly divides N. The player who takes the last stone loses.
For example, suppose that N = 10. The moves might proceed as follows:
Player 1 takes 5 stones, leaving 5.
Player 2 takes 1 stone, leaving 4.
Player 1 takes 2 stones, leaving 2.
Player 2 takes 1 stone, leaving 1.
Player 1 must take the last stone and loses.
a) Write a function that takes a positive integer N and returns true if player 1 can always win the game for the given N.
b) Make your function faster for large N using dynamic programming.
c) Call N "hot" if player 1 can win the game when played with N stones, otherwise "cold". For each value M in the range 1 .. 6, print out the percentage of values in the range 1 .. 10M that are hot.
Consider the following 2-player game. N identical coins are placed in a circle. On each turn, a player may remove 1, 2, or 3 coins that are in adjacent positions. (If a coin has been removed between positions P and Q, then P and Q are not adjacent.) The player who takes the last coin wins.
For example, suppose that N = 10:
OOO O O O O OOO
On the first turn, player 1 removes 3 coins:
... O O O O OOO
Then player 2 removes 3 coins:
... O O O . O..
Now player 1 removes 2 coins:
... O O . . ...
Now player 2 must remove 1 of the 2 remaining coins. After that, player 1 will take the last coin and win.
a) Write a function that takes an integer N, and returns a boolean value that is true if player 1 can always win this game for the given N.
b) Modify your program so that it takes N on the command line, and simulates a game between two players who both use an optimal minimax strategy. Print out each board position in the game, showing all coins in a row. For example, the first board positions in the game depicted above might look like this:
1. OOOOOOOOOO 2. ...OOOOOOO 3. ...O...OOO 4. ...O.....O
c) Implement a state cache so that your computer player does not need to recompute the minimax value of any state that it has already seen. (Sometimes this sort of cache is called a transposition table.) You may wish to preserve the state cache between moves. Does the cache make your player significantly faster?
d) For an extra challenge, modify the program so that for any N it prints out each board position showing the coins in a circle, as depicted above for N = 10.
In the game of Chomp, a chocolate bar has a rectangular shape consisting of M x N squares for some values of M and N. The game is for two players. On each turn, a player chooses some square S to eat, and must remove S along with all squares that are below or to the right of S. The player who removes the last square loses.
For example, here is a possible game starting with a 4 x 5 board:
Image credit: © 2020 Lord
Belbury (CC
BY-SA 4.0)
On the first move, player 1 removes two squares in the lower right. Then player 2 removes 3 squares at the bottom. After that, player 1 removes a large block of squares on the right. Finally, player 2 removes 3 squares. Player 1 must now take the last square and loses.
a) Write a function that takes integers M and N, and returns a boolean value that is true if player 1 can win an M x N game of Chomp.
b) Write a program that lets the user play Chomp against the computer. The compute player should use a minimax strategy. After each move, print a textual representation of the board, e.g.
01234 0 XXXXX 1 XXXXX 2 XXXX 3 X
The user should enter their move as a row-column pair, e.g. "2 3".
c) Make the computer player more interesting by weakening it in some way so that it does not always play perfectly. For example, it could randomly ignore 1 out of every 4 possible moves.
a) Modify the Connect Four program we saw in the lecture so that after any tournament the minimax agent will report
the average number of moves that it considered in any state that it visited (i.e. the branching factor)
the average number of calls to the heuristic evaluation function that occurred while making each move
the average time in milliseconds that the agent took to make each move
b) Run a 100-game tournament in which minimax:4 plays against itself:
$ dotnet run -g 100 minimax:4 minimax:4
Note the values of the statistics above.
c) Modify the agent to use alpha-beta pruning.
d) Run the same 100-game tournament again. How have the values of these statistics changed?
Continuing the previous exercise, modify the Connect Four minimax agent to use a transposition table, which is a cache of minimax values of game states that the agent has already visited in its search. This may avoid duplicate work that could otherwise occur if the agent can reach the same state via two different sequences of moves.
How does the transposition table affect the average time per move, and the average number of calls to the heuristic evaluation function?
Continuing the previous exercise, modify the Connect Four minimax agent so that instead of searching to a fixed depth, it performs a series of searches of increasing depth until a time limit is exceeded, then returns the best move that it has found so far.
Continuing the previous exercise, modify the Connect Four minimax agent so that instead of considering moves in an arbitrary order at each step, it sorts them in order according to how promising they seem to be, and considers the most promising moves first. To see how promising a move M is, you can look up the state S that M leads to in the transposition table to find its minimax value. If you are not at the current depth limit, then S is likely to be in the transposition table since you will probably have seen it on a previous deepening iteration. If S is not in the transposition table, you can run the heuristic evaluation function to find a minimax value for S.
Alpha-beta pruning will generally work best if the agent considers the best possible moves first. Does this move ordering optimization significantly improve the performance of the agent?