Hannah Fry was the presenter of this year’s Royal Institution Christmas Lectures. There were some beautiful demonstrations (in my opinion) on several mathematical subjects that have relevance in the real world. The demonstration of how vaccination is effective using the zombie apocalypse was very thought provoking as was the Christmas present detection machine, which was a great real life (ish) version of the problem of false positives in various kinds of testing.
The one thing that caught my eye from the point of view of programming was Matt Parker’s demonstration of MENACE. MENACE is a computer made of matchboxes that can learn how to play noughts and crosses. There is a matchbox for each position MENACE can face in a game. (MENACE always goes first which means it only needs matchboxes for every other move. Use of symmetry further cuts down the number of matchboxes required.) Each match box contains a number of coloured beads. Each colour represents a different move that can be made from the matchbox’s position. So MENACE makes a move by a human selecting the matchbox for the current position and then selecting bead at random from the box to determine the move.
The learning part comes in as follows:
- If the game is lost, the beads selected from each of the boxes used are thrown away.
- If the game is won, the bead is put back in the box along with three extra beads of the same colour
- If the game is a draw, the bead is put back along with one extra bead of the same colour.
Thus the probability of a move that led to a win or a draw increases and the probability of a move that led to a loss decreases.
My computer simulation of MENACE which I called a “stochastic player” differs from Matthew Scrogg’s version in some important respects:
- The number of matchboxes I track is limited only by the RAM in my computer. Therefore, I decided not to worry about eliminating symmetrical positions.
- I can also track matchboxes for both players of the game. In fact, I can teach my version of MENACE to play noughts and crosses by having it play itself and evaluating the “odd move” boxes the opposite way to the “even move” boxes.
- I thought carefully about how to eliminate the possibility of my MENACE playing an illegal move. In the physical matchbox version, each matchbox starts out with no beads for any of the illegal moves. In the end, I decided I couldn’t be bothered with that and decided just to add an extra rule: If the game ends in an illegal move, the beads representing the illegal move are all removed from the matchbox for the last move. So, at the beginning, my version of MENACE doesn’t even know the basic rules of noughts and crosses.
The code for the game can be found at https://bitbucket.org/jeremy-pereira/noughtsandcrosses/src/master/. This is still a work in progress. At the time of writing, it has no user interface and the observations below were made using unit test cases. The tag blog-917 represents the state of the code at the point this blog was written.
To see how my program faired, I ran a mini tournament of it against itself. The tournament consisted of one hundred rounds of forty games. For each set of forty games, the number of wins for each side, the number of draws and the number of games ending in illegal moves was reported. Also, the number of positions encountered by the player was recorded. In my version of MENACE, the machine starts with no match boxes and creates a matchbox filled with five beads for each move each time it encounters a position it has never seen before.
The above is a typical run of the tournament. The results vary but always have the same general pattern. At first there are a lot of games that end in illegal moves, unsurprisingly. Then one or other (or occasionally both) of noughts or crosses starts winning a bit more. However, this tends to be short lived because the number of draws shoots up until it dominates the results. I assume that learning opportunities are sparse while most games end in illegal moves. Once one side or the other starts winning significant numbers of games, both sides start learning and the player quickly becomes unbeatable – at least by itself.
In terms of the number of positions known, they rise pretty quickly at first, but there seems to be some limit which is approached asymptotically. I do not think this is a hard limit in the sense that all possible positions have been seen in the tournament. For a start, I don’t believe the number of reachable positions in a game of noughts and crosses is anything like as small as nine hundred. This is something I can check if I build a minimax noughts and crosses player. A whole branch of positions might get closed off if the first five descents into that branch results in losses.
Some obvious future avenues include pitting my stochastic player against a minimax solution. Also, I need to build a UI so humans can play and also, it would be good to do a Connect4 player.