Hazard

Minimax November 21, 2016

Several months ago I skimmed through a borrowed copy of Paradigms of Artificial Intelligence Programming by Peter Norvig. It’s nearly 1,000 pages and far too useful to borrow, so I bought a copy of my own. I’ve been thinking about minimax on and off for a few weeks, and with Norvig’s help I finally sat down to implement it.

Minimax gets its name from the idea of “minimizing the maximum loss”; you want the best worst-case scenario. In a zero-sum game, that usually means minimizing your opponent’s best-case scenario. Norvig builds his minimax function atop an implementation of Othello, including in it some game-specific details for the sake of better efficiency. I’m more interested in the generic algorithm than good performance, and I don’t want to get too distracted implementing game rules, so I’m going to implement the simplest game I can think of that still benefits from thinking ahead: single-pile Nim. Matt Parker demonstrates an “artificial intelligence” from the 1960s playing Nim, but we’ll loosen some of Dr. Nim’s constraints to prevent our version from being quite so deterministic.

The Game of Nim

Before we can do a minimax search, however, we have to implement a game. The game is merely preamble to the main topic, so I’ll go through it fairly quickly.

We’ll depend on two core protocols:


(defprotocol IPlayer
  (ident [this])
  (choice [this game]))

(defprotocol IGame
  (players [this])
  (moves [this player])
  (move [this player m])
  (opponent [this player])
  (winner [this]))

In IPlayer, ident is some name for the player, and choice should return what move the player wants to make given a game state.

In IGame, players returns the game’s list of who’s playing, moves is the moves that are available to a player, move should return a new game with its state updated by the move m, opponent returns the ident of the other player, and winner returns the ident of the winning player if there is one.

The rules of Nim are simple: starting with some number of coins, players alternately take 1, 2, or 3 coins; whoever takes the last coin wins. Here’s an implementation of IGame for Nim:


(declare ->Nim)

(defrecord Nim [players coins winner]
  IGame
  (players [this]
    players)
  (moves [this player]
    (filter #(<= % coins) [1 2 3]))
  (move [this player m]
    (assert (integer? m) (str "Must give a number as a move. Gave " (pr-str m) "."))
    (assert (<= 1 m 3) (str "Can only take 1, 2, or 3 coins."))
    (assert (<= m coins) (str "Cannot take more than " coins " coins. Tried to take " m "."))
    (let [left (- coins m)]
      (->Nim players left (when (zero? left) player))))
  (opponent [this player]
    (->> players
      (remove #(= player (ident %)))
      first ident))
  (winner [this]
    winner))

(defn new-nim [coins players]
  (assert (= 2 (count players)) "There must be exactly 2 players.")
  (assert (= 2 (count (distinct (map ident players)))) "Players must have different names.")
  (->Nim players coins nil))

A move is an integer indicating some number of coins to take. moves returns 1, 2, and 3, but no more than the actual number of coins remaining.

To play a game, we’ll alternate between players and update the current state of the game, quitting when there’s a winner.


(defn play [game]
  (loop [game game
         choices []
         players (cycle (players game))]
    (let [player (first players)
          m (choice player game)
          game' (move game (ident player) m)]
      (if (= (ident player) (winner game'))
        {:winner (ident player)
         :plays (conj choices m)}
        (recur game' (conj choices m) (rest players))))))

Here’s a convenience function for creating players:


(defn new-player [ident strategy]
  (reify IPlayer
    (ident [_] ident)
    (choice [this game] (strategy game ident))))

All that’s now needed to play our game is a function which, given a game and a player’s ident, returns a move. The simplest non-trivial function is one that picks moves randomly:


(defn random-strategy [game player]
  (rand-nth (moves game player)))

Let’s play!


(play (new-nim 15
        [(new-player "Player 1" random-strategy)
         (new-player "Player 2" random-strategy)]))

:plays lets us verify that the game is working correctly; all the moves should add up to 15.

Let’s play a bunch of games to see if one player has an advantage.


(def random-players [(new-player "Random 1" random-strategy)
                     (new-player "Random 2" random-strategy)])

(defn play-game [players]
  (play (new-nim (+ 5 (rand-int 20)) players)))

(frequencies (map :winner (repeatedly 1000 #(play-game random-players))))

Playing random strategies against each other seems to be fairly balanced.

Minimax

Now that we have a game to play, we can apply a minimax search to it. Following Norvig, we’ll define special values to indicate a lost game and a won game. There’s no point in having a score worse than losing a game or a score better than winning, so we’ll use the platform’s smallest and largest integers.


(def winning-value js/Number.MAX_SAFE_INTEGER)

(def losing-value js/Number.MIN_SAFE_INTEGER)

Minimax is fundamentally a recursive algorithm: to know which move is best, you need to know what move your opponent could follow with that would be worst for you. To know that, you need to know what your best move following their move will be, and so on until the end of the game. In games more interesting than Nim, it’s not practical to explore the whole tree of possibilities all the way to the end of the game, so we’ll limit the depth of our search to 6 steps down the tree.

Here’s the whole definition of our minimax search:


(defn minimax [game player depth score]
  (if (zero? depth)
    [nil (score game player)]
    (let [ms (moves game player)
          opp (opponent game player)]
      (if (seq ms)
        (loop [best-move nil
               best-rating losing-value
               ms ms]
          (if (seq ms)
            (let [[m & ms] ms
                  game' (move game player m)
                  [_ rating] (minimax game' opp (dec depth) score)
                  rating (- rating)]
              (if (< best-rating rating)
                (recur m rating ms)
                (recur (or best-move m) rating ms)))
            [best-move best-rating]))
        (if (seq (moves game opp))
          (let [[_ rating] (minimax game opp (dec depth) score)]
            [nil (- rating)])
          (condp = (winner game)
            player [nil winning-value]
            opp [nil losing-value]
            [nil 0]))))))

minimax returns a vector of two values: a move and a rating of how good it is. When it hits its recursion limit, it doesn’t return a move, only a rating for the game it was given. Above its max depth, minimax checks if the player has any legal moves. If so, it runs through them all and finds the move that has the best rating. A move’s rating is determined by applying the move to the game, then doing a minimax search from the opponent’s prespective and negating the returned rating. If our opponent loses, we win. If a move is good for them, it’s bad for us. Once it’s checked all the moves, minimax returns the best rating.

There is one holdover from Norvig’s Othello version of minimax, and you can see it when the player has no moves: it’s assumed that the opponent can go again. That doesn’t make any sense in the case of Nim, since the only time a player has no moves is when there are no coins remaining, but I left the logic there in the spirit of writing a general minimax function.

If neither the player nor their opponent has any moves left, minimax checks the winner and returns the appropriate rating.

To call minimax we need a function that scores a move. In Othello, a good scoring metric is the difference between the number of your pieces and the number of your opponent’s pieces, but we don’t have that luxury in Nim. We can only rate moves on whether we won; otherwise the game continues. Minimax works just fine with such a simple scoring metric.


(defn score [game player]
  (if (= player (winner game))
    winning-value
    0))

To use our minimax search in a player strategy, we have to wrap it in function that only takes a game and a player ident. Here’s where we finally take the move that minimax returns.


(defn minimax-strategy [game player]
  (first (minimax game player 6 score)))

Let’s see how it performs against a random strategy.


(def minimax-random-players [(new-player "Random" random-strategy)
                             (new-player "Minimax" minimax-strategy)])

(frequencies (map :winner (repeatedly 20 #(play-game minimax-random-players))))

It’s clearly much better than randomly choosing moves.

Let’s pit minimax against itself.


(def minimax-players [(new-player "Minimax 1" minimax-strategy)
                      (new-player "Minimax 2" minimax-strategy)])

(frequencies (map :winner (repeatedly 20 #(play-game minimax-players))))

You’re welcome to run more than 20 games to see how the strategies balance. In my tests, the first player seems to have an advantage, which is borne out by some logical reasoning that I’ll let Matt Parker explain.

There’s at least one more assumption baked into minimax that prevents it from being more general. In fact, the same assumption is implicit in one of the protocols I defined at the beginning of the post: IGame and minimax only support two-player games.

There are several improvements we can make to minimax, and Norvig outlines several. The biggest improvement comes from pruning branches that yield sub-optimal results for either you or your opponent. If your opponent makes a mistake and plays a branch you haven’t explored, you can rest assured that your lot will improve over what you calculated, since we’ve already minimized the maximum loss. Pruning branches allows minimax to spend it’s time probing deeper on better branches. Norvig calls this an alpha-beta search, and it’s a fairly trivial update to minimax.

Norvig also considers alternative scoring functions. For Othello, he adds a function to score moves based on whether they’re likely to give the opponent a corner or force the opponent into giving you one. Theoretically, a weighted scoring function isn’t necessary for minimax to succeed, but in practice, it is. The search tree of interesting games like Othello grows too rapidly to search exhaustively. Norvig’s weighted scoring function is based on insight into how the game works, and using it in place of a non-weighted scoring function allows minimax to sniff out what branches are promising, even when it can’t directly explore that far into the future. The above implementation of minimax doesn’t support weighing some moves as better as score doesn’t take the move in question.

If you’d like more details, I suggest you find your own copy of Paradigms of Artificial Intelligence Programming.