Combinatorial Games. Episode 2: Tic-Tac-Toe Problems in Leetcode and Solve the Game using Minimax

Jul 12 2020 Tech Blog 23 minutes read (About 3394 words)

This episode extends last one, where Minimax and Alpha Beta Pruning algorithms are introduced. We will solve several tic-tac-toe problems in leetcode, gathering intuition and building blocks for tic-tac-toe game logic, which can be naturally extended to Connect-N game or Gomoku (N=5). Then we solve tic-tac-toe using Minimax and Alpha Beta pruning for small N and analyze their state space. In the following episodes, based on building blocks here, we will implement a Connect-N Open Gym GUI Environment, where we can play against computer visually or compare different computer algorithms. Finally, we demonstrate how to implement a Monte Carlo Tree Search for Connect-N Game.

Episode 1: Minimax and Alpha Beta Pruning in Leetcode
Episode 2: Tic-Tac-Toe Problems in Leetcode and Solve the Game using Minimax
Episode 3: Connect-N (Tic-Tac-Toe, Gomoku) OpenAI Gym GUI Environment
Episode 4: Connect-N (Tic-Tac-Toe, Gomoku) AlphaGo Zero Self-Play MCTS Reinforcement Learning

Leetcode Tic-Tac-Toe Problems

1275. Find Winner on a Tic Tac Toe Game (Easy)

Tic-tac-toe is played by two players A and B on a 3 x 3 grid.
Here are the rules of Tic-Tac-Toe:
Players take turns placing characters into empty squares (" ").
The first player A always places "X" characters, while the second player B always places "O" characters.
"X" and "O" characters are always placed into empty squares, never on filled ones.
The game ends when there are 3 of the same (non-empty) character filling any row, column, or diagonal.
The game also ends if all squares are non-empty.
No more moves can be played if the game is over. Given an array moves where each element is another array of size 2 corresponding to the row and column of the grid where they mark their respective character in the order in which A and B play.
Return the winner of the game if it exists (A or B), in case the game ends in a draw return "Draw", if there are still movements to play return "Pending".
You can assume that moves is valid (It follows the rules of Tic-Tac-Toe), the grid is initially empty and A will play first.

Example 1:
Input: moves = [[0,0],[2,0],[1,1],[2,1],[2,2]]
Output: "A"
Explanation: "A" wins, he always plays first.
"X " "X " "X " "X " "X "
" " -> " " -> " X " -> " X " -> " X "
" " "O " "O " "OO " "OOX"

Example 2:
Input: moves = [[0,0],[1,1],[0,1],[0,2],[1,0],[2,0]]
Output: "B"
Explanation: "B" wins.
"X " "X " "XX " "XXO" "XXO" "XXO"
" " -> " O " -> " O " -> " O " -> "XO " -> "XO "
" " " " " " " " " " "O "

Example 3:
Input: moves = [[0,0],[1,1],[2,0],[1,0],[1,2],[2,1],[0,1],[0,2],[2,2]]
Output: "Draw"
Explanation: The game ends in a draw since there are no moves to make.
"XXO"
"OOX"
"XOX"

Example 4:
Input: moves = [[0,0],[1,1]]
Output: "Pending"
Explanation: The game has not finished yet.
"X "
" O "
" "

The intuitive solution is to permute all 8 possible winning conditions: 3 vertical lines, 3 horizontal lines and 2 diagonal lines. We keep 8 variables representing each winning condition and a simple trick is converting board state to a 3x3 2d array, whose cell has value -1, 1, and 0. In this way, we can traverse the board state exactly once and in the process determine all 8 variables value by summing corresponding cell value. For example, row[0] is for first line winning condition, summed by all 3 cells in first row during board traveral. It indicates win for first player only when it's equal to 3 and win for second player when it's -3.

{linenos

# AC
from typing import List

class Solution:
    def tictactoe(self, moves: List[List[int]]) -> str:
        board = [[0] * 3 for _ in range(3)]
        for idx, xy in enumerate(moves):
            player = 1 if idx % 2 == 0 else -1
            board[xy[0]][xy[1]] = player

        turn = 0
        row, col = [0, 0, 0], [0, 0, 0]
        diag1, diag2 = False, False
        for r in range(3):
            for c in range(3):
                turn += board[r][c]
                row[r] += board[r][c]
                col[c] += board[r][c]
                if r == c:
                    diag1 += board[r][c]
                if r + c == 2:
                    diag2 += board[r][c]

        oWin = any(row[r] == 3 for r in range(3)) or any(col[c] == 3 for c in range(3)) or diag1 == 3 or diag2 == 3
        xWin = any(row[r] == -3 for r in range(3)) or any(col[c] == -3 for c in range(3)) or diag1 == -3 or diag2 == -3

        return "A" if oWin else "B" if xWin else "Draw" if len(moves) == 9 else "Pending"

Below we give another AC solution. Despite more code, it's more efficient than previous one because for a given game state, it does not need to visit each cell on the board. How is it achieved? The problem guarentees each move is valid, so what's sufficent to examine is to check neighbours of the final move and see if any line including final move creates a winning condition. Later we will reuse the code in this solution to create tic-tac-toe game logic.

{linenos

# AC
from typing import List

class Solution:
    def checkWin(self, r: int, c: int) -> bool:
        north = self.getConnectedNum(r, c, -1, 0)
        south = self.getConnectedNum(r, c, 1, 0)

        east = self.getConnectedNum(r, c, 0, 1)
        west = self.getConnectedNum(r, c, 0, -1)

        south_east = self.getConnectedNum(r, c, 1, 1)
        north_west = self.getConnectedNum(r, c, -1, -1)

        north_east = self.getConnectedNum(r, c, -1, 1)
        south_west = self.getConnectedNum(r, c, 1, -1)

        if (north + south + 1 >= 3) or (east + west + 1 >= 3) or \
                (south_east + north_west + 1 >= 3) or (north_east + south_west + 1 >= 3):
            return True
        return False

    def getConnectedNum(self, r: int, c: int, dr: int, dc: int) -> int:
        player = self.board[r][c]
        result = 0
        i = 1
        while True:
            new_r = r + dr * i
            new_c = c + dc * i
            if 0 <= new_r < 3 and 0 <= new_c < 3:
                if self.board[new_r][new_c] == player:
                    result += 1
                else:
                    break
            else:
                break
            i += 1
        return result

    def tictactoe(self, moves: List[List[int]]) -> str:
        self.board = [[0] * 3 for _ in range(3)]
        for idx, xy in enumerate(moves):
            player = 1 if idx % 2 == 0 else -1
            self.board[xy[0]][xy[1]] = player

        # only check last move
        r, c = moves[-1]
        win = self.checkWin(r, c)
        if win:
            return "A" if len(moves) % 2 == 1 else "B"

        return "Draw" if len(moves) == 9 else "Pending"

Leetcode 794. Valid Tic-Tac-Toe State(Medium)

A Tic-Tac-Toe board is given as a string array board. Return True if and only if it is possible to reach this board position during the course of a valid tic-tac-toe game.
The board is a 3 x 3 array, and consists of characters " ", "X", and "O". The " " character represents an empty square.
Here are the rules of Tic-Tac-Toe:
Players take turns placing characters into empty squares (" ").
The first player A always places "X" characters, while the second player B always places "O" characters.
"X" and "O" characters are always placed into empty squares, never on filled ones.
The game ends when there are 3 of the same (non-empty) character filling any row, column, or diagonal.
The game also ends if all squares are non-empty.
No more moves can be played if the game is over.

Example 1:
Input: board = ["O ", " ", " "]
Output: false
Explanation: The first player always plays "X".

Example 2:
Input: board = ["XOX", " X ", " "]
Output: false
Explanation: Players take turns making moves.

Example 3:
Input: board = ["XXX", " ", "OOO"]
Output: false

Example 4:
Input: board = ["XOX", "O O", "XOX"]
Output: true

Note: board is a length-3 array of strings, where each string board[i] has length 3.
Each board[i][j] is a character in the set {" ", "X", "O"}.

Surely, it can be solved using DFS, checking if the state given would be reached from initial state. However, this involves lots of states to search. Could we do better? There are obvious properties we can rely on. For example, the number of X is either equal to the number of O or one more. If we can enumerate a combination of necessary and sufficient conditions of checking its reachability, we can solve it in O(1) time complexity.

{linenos

# AC
from typing import List

class Solution:

    def convertCell(self, c:str):
        return 1 if c == 'X' else -1 if c == 'O' else 0

    def validTicTacToe(self, board: List[str]) -> bool:
        turn = 0
        row, col = [0, 0, 0], [0, 0, 0]
        diag1, diag2 = False, False
        for r in range(3):
            for c in range(3):
                turn += self.convertCell(board[r][c])
                row[r] += self.convertCell(board[r][c])
                col[c] += self.convertCell(board[r][c])
                if r == c:
                    diag1 += self.convertCell(board[r][c])
                if r + c == 2:
                    diag2 += self.convertCell(board[r][c])

        xWin = any(row[r] == 3 for r in range(3)) or any(col[c] == 3 for c in range(3)) or diag1 == 3 or diag2 == 3
        oWin = any(row[r] == -3 for r in range(3)) or any(col[c] == -3 for c in range(3)) or diag1 == -3 or diag2 == -3
        if (xWin and turn == 0) or (oWin and turn == 1):
            return False
        return (turn == 0 or turn == 1) and (not xWin or not oWin)

Leetcode 348. Design Tic-Tac-Toe (Medium, Locked)

Design a Tic-tac-toe game that is played between two players on a n x n grid.
You may assume the following rules:
A move is guaranteed to be valid and is placed on an empty block.
Once a winning condition is reached, no more moves is allowed.
A player who succeeds in placing n of their marks in a horizontal, vertical, or diagonal row wins the game.

Example:
Given n = 3, assume that player 1 is "X" and player 2 is "O" in the board.
TicTacToe toe = new TicTacToe(3);

toe.move(0, 0, 1); -> Returns 0 (no one wins)
|X| | |
| | | | // Player 1 makes a move at (0, 0).
| | | |

toe.move(0, 2, 2); -> Returns 0 (no one wins)
|X| |O|
| | | | // Player 2 makes a move at (0, 2).
| | | |

toe.move(2, 2, 1); -> Returns 0 (no one wins)
|X| |O|
| | | | // Player 1 makes a move at (2, 2).
| | |X|

toe.move(1, 1, 2); -> Returns 0 (no one wins)
|X| |O|
| |O| | // Player 2 makes a move at (1, 1).
| | |X|

toe.move(2, 0, 1); -> Returns 0 (no one wins)
|X| |O|
| |O| | // Player 1 makes a move at (2, 0).
|X| |X|

toe.move(1, 0, 2); -> Returns 0 (no one wins)
|X| |O|
|O|O| | // Player 2 makes a move at (1, 0).
|X| |X|

toe.move(2, 1, 1); -> Returns 1 (player 1 wins)
|X| |O|
|O|O| | // Player 1 makes a move at (2, 1).
|X|X|X|

Follow up:
Could you do better than O(n2) per move() operation?

348 is a locked problem. For each player's move, we can resort to checkWin function in second solution for 1275. We show another solution based on first solution of 1275, where 8 winning condition flags are kept and each move only touches associated several flag variables.

{linenos

# AC
class TicTacToe:

    def __init__(self, n:int):
        """
        Initialize your data structure here.
        :type n: int
        """
        self.row, self.col, self.diag1, self.diag2, self.n = [0] * n, [0] * n, 0, 0, n

    def move(self, row:int, col:int, player:int) -> int:
        """
        Player {player} makes a move at ({row}, {col}).
        @param row The row of the board.
        @param col The column of the board.
        @param player The player, can be either 1 or 2.
        @return The current winning condition, can be either:
                0: No one wins.
                1: Player 1 wins.
                2: Player 2 wins.
        """
        if player == 2:
            player = -1

        self.row[row] += player
        self.col[col] += player
        if row == col:
            self.diag1 += player
        if row + col == self.n - 1:
            self.diag2 += player

        if self.n in [self.row[row], self.col[col], self.diag1, self.diag2]:
            return 1
        if -self.n in [self.row[row], self.col[col], self.diag1, self.diag2]:
            return 2
        return 0

Optimal Strategy of Tic-Tac-Toe

Tic-tac-toe and Gomoku (Connect Five in a Row) share the same rules and are generally considered as M,n,k-game, where board size range to M x N and winning condition changes to k.

ConnectNGame class implements M,n,k-game of MxM board size. It encapsulates the logic of checking each move and also is able to undo last move to facilitate backtrack in game search algorithm later.

ConnectNGame

{linenos

class ConnectNGame:

    PLAYER_A = 1
    PLAYER_B = -1
    AVAILABLE = 0
    RESULT_TIE = 0
    RESULT_A_WIN = 1
    RESULT_B_WIN = -1

    def __init__(self, N:int = 3, board_size:int = 3):
        assert N <= board_size
        self.N = N
        self.board_size = board_size
        self.board = [[ConnectNGame.AVAILABLE] * board_size for _ in range(board_size)]
        self.gameOver = False
        self.gameResult = None
        self.currentPlayer = ConnectNGame.PLAYER_A
        self.remainingPosNum = board_size * board_size
        self.actionStack = []

    def move(self, r: int, c: int) -> int:
        """

        :param r:
        :param c:
        :return: None: game ongoing
        """
        assert self.board[r][c] == ConnectNGame.AVAILABLE
        self.board[r][c] = self.currentPlayer
        self.actionStack.append((r, c))
        self.remainingPosNum -= 1
        if self.checkWin(r, c):
            self.gameOver = True
            self.gameResult = self.currentPlayer
            return self.currentPlayer
        if self.remainingPosNum == 0:
            self.gameOver = True
            self.gameResult = ConnectNGame.RESULT_TIE
            return ConnectNGame.RESULT_TIE
        self.currentPlayer *= -1

    def undo(self):
        if len(self.actionStack) > 0:
            lastAction = self.actionStack.pop()
            r, c = lastAction
            self.board[r][c] = ConnectNGame.AVAILABLE
            self.currentPlayer = ConnectNGame.PLAYER_A if len(self.actionStack) % 2 == 0 else ConnectNGame.PLAYER_B
            self.remainingPosNum += 1
            self.gameOver = False
            self.gameResult = None
        else:
            raise Exception('No lastAction')

    def getAvailablePositions(self) -> List[Tuple[int, int]]:
        return [(i,j) for i in range(self.board_size) for j in range(self.board_size) if self.board[i][j] == ConnectNGame.AVAILABLE]

    def getStatus(self) -> Tuple[Tuple[int, ...]]:
        return tuple([tuple(self.board[i]) for i in range(self.board_size)])

Note that checkWin code is identical to second solution in 1275.

Minimax Strategy

Now we have Connect-N game logic, let's finish its minimax algorithm to solve the game.

Define a generic strategy base class, where action method needs to be overridden. Action method expects ConnectNGame class telling current game state and returns a tuple of 2 elements, the first element is the estimated or exact game result after taking action specified by second element. The second element is of form Tuple[int, int], denoting the position of the move, for instance, (1，1).

{linenos

class Strategy(ABC):

    def __init__(self):
        super().__init__()

    @abstractmethod
    def action(self, game: ConnectNGame) -> Tuple[int, Tuple[int, int]]:
        pass

MinimaxStrategy code is very similar to previous minimax algorithms. The only added piece is the corresponding move returned by action method.

{linenos

class MinimaxStrategy(Strategy):
    def action(self, game: ConnectNGame) -> Tuple[int, Tuple[int, int]]:
        self.game = copy.deepcopy(game)
        result, move = self.minimax()
        return result, move

    def minimax(self) -> Tuple[int, Tuple[int, int]]:
        game = self.game
        bestMove = None
        assert not game.gameOver
        if game.currentPlayer == ConnectNGame.PLAYER_A:
            ret = -math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.minimax()
                game.undo()
                ret = max(ret, result)
                bestMove = move if ret == result else bestMove
                if ret == 1:
                    return 1, move
            return ret, bestMove
        else:
            ret = math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.minimax()
                game.undo()
                ret = min(ret, result)
                bestMove = move if ret == result else bestMove
                if ret == -1:
                    return -1, move
            return ret, bestMove

We plot up to first 2 moves with code above. For first player O, there are possibly 9 positions, where due to symmetry, only 3 kinds of moves, which we call corner, edge and center, respectively. The following graph shows whatever 9 positions the first player takes, the best result is draw. So solution of tic-tac-toe is draw.

Plot first step of 3 kinds of moves one by one below.

Tic-tac-toe Solution and Number of States

An interesting question is the number of game states of tic-tac-toe. A loosely upper bound can be derived by \(3^9=19683\), which includes lots of inreachable states. This article Tic-Tac-Toe (Naughts and Crosses, Cheese and Crackers, etc lists number of states after each move. The total number is 5478.

Moves	Positions	Terminal Positions
0	1
1	9
2	72
3	252
4	756
5	1260	120
6	1520	148
7	1140	444
8	390	168
9	78	78
Total	5478	958

We can verify the number if we change a little of existing code to code below.

{linenos


class CountingMinimaxStrategy(Strategy):
    def action(self, game: ConnectNGame) -> Tuple[int, Tuple[int, int]]:
        self.game = copy.deepcopy(game)
        self.dpMap = {}
        result, move = self.minimax(game.getStatus())
        return result, move

    def minimax(self, gameStatus: Tuple[Tuple[int, ...]]) -> Tuple[int, Tuple[int, int]]:
        # print(f'Current {len(strategy.dpMap)}')

        if gameStatus in self.dpMap:
            return self.dpMap[gameStatus]

        game = self.game
        bestMove = None
        assert not game.gameOver
        if game.currentPlayer == ConnectNGame.PLAYER_A:
            ret = -math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.minimax(game.getStatus())
                    self.dpMap[game.getStatus()] = result, oppMove
                else:
                    self.dpMap[game.getStatus()] = result, move
                game.undo()
                ret = max(ret, result)
                bestMove = move if ret == result else bestMove
            self.dpMap[gameStatus] = ret, bestMove
            return ret, bestMove
        else:
            ret = math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)

                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.minimax(game.getStatus())
                    self.dpMap[game.getStatus()] = result, oppMove
                else:
                    self.dpMap[game.getStatus()] = result, move
                game.undo()
                ret = min(ret, result)
                bestMove = move if ret == result else bestMove
            self.dpMap[gameStatus] = ret, bestMove
            return ret, bestMove


if __name__ == '__main__':
    tic_tac_toe = ConnectNGame(N=3, board_size=3)
    strategy = CountingMinimaxStrategy()
    strategy.action(tic_tac_toe)
    print(f'Game States Number {len(strategy.dpMap)}')

Running the code proves the total number is 5478. Also illustrate some small scale game configuration results.

	3x3	4x4
k=3	5478 （Draw)	6035992 （Win）
k=4		9722011 （Draw）
k=5

According to Wikipedia M,n,k-game, below are results for some game configuration.

	3x3	4x4	5x5	6x6
k=3	Draw	Win	Win	Win
k=4		Draw	Draw	Win
k=5			Draw	Draw

What's worth mentioning is that Gomoku (Connect Five in a Row), of board size MxM >= 15x15 is proved by L. Victor Allis to be Win.

Alpha-Beta Pruning Strategy

Alpha Beta Pruning Strategy is pasted below.

{linenos

class AlphaBetaStrategy(Strategy):
    def action(self, game: ConnectNGame) -> Tuple[int, Tuple[int, int]]:
        self.game = game
        result, move = self.alpha_beta(self.game.getStatus(), -math.inf, math.inf)
        return result, move

    def alpha_beta(self, gameStatus: Tuple[Tuple[int, ...]], alpha:int=None, beta:int=None) -> Tuple[int, Tuple[int, int]]:
        game = self.game
        bestMove = None
        assert not game.gameOver
        if game.currentPlayer == ConnectNGame.PLAYER_A:
            ret = -math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.alpha_beta(game.getStatus(), alpha, beta)
                game.undo()
                alpha = max(alpha, result)
                ret = max(ret, result)
                bestMove = move if ret == result else bestMove
                if alpha >= beta or ret == 1:
                    return ret, move
            return ret, bestMove
        else:
            ret = math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    result, oppMove = self.alpha_beta(game.getStatus(), alpha, beta)
                game.undo()
                beta = min(beta, result)
                ret = min(ret, result)
                bestMove = move if ret == result else bestMove
                if alpha >= beta or ret == -1:
                    return ret, move
            return ret, bestMove

Rewrite alpha beta pruning with DP, where we omit alpha and beta parameters in alpha_beta_dp because lru_cache cannot specify effective parameters. Instead, we keep alpha and beta in a stack variable and maintain the stack according to alpha_bate_dp calling stack.

{linenos

class AlphaBetaDPStrategy(Strategy):
    def action(self, game: ConnectNGame) -> Tuple[int, Tuple[int, int]]:
        self.game = game
        self.alphaBetaStack = [(-math.inf, math.inf)]
        result, move = self.alpha_beta_dp(self.game.getStatus())
        return result, move

    @lru_cache(maxsize=None)
    def alpha_beta_dp(self, gameStatus: Tuple[Tuple[int, ...]]) -> Tuple[int, Tuple[int, int]]:
        alpha, beta = self.alphaBetaStack[-1]
        game = self.game
        bestMove = None
        assert not game.gameOver
        if game.currentPlayer == ConnectNGame.PLAYER_A:
            ret = -math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    self.alphaBetaStack.append((alpha, beta))
                    result, oppMove = self.alpha_beta_dp(game.getStatus())
                    self.alphaBetaStack.pop()
                game.undo()
                alpha = max(alpha, result)
                ret = max(ret, result)
                bestMove = move if ret == result else bestMove
                if alpha >= beta or ret == 1:
                    return ret, move
            return ret, bestMove
        else:
            ret = math.inf
            for pos in game.getAvailablePositions():
                move = pos
                result = game.move(*pos)
                if result is None:
                    assert not game.gameOver
                    self.alphaBetaStack.append((alpha, beta))
                    result, oppMove = self.alpha_beta_dp(game.getStatus())
                    self.alphaBetaStack.pop()
                game.undo()
                beta = min(beta, result)
                ret = min(ret, result)
                bestMove = move if ret == result else bestMove
                if alpha >= beta or ret == -1:
                    return ret, move
            return ret, bestMove

#Algorithm #LeetCode #Python #Gaming #Reinforcement Learning #OpenAI Gym

Combinatorial Games. Episode 2: Tic-Tac-Toe Problems in Leetcode and Solve the Game using Minimax

Leetcode Tic-Tac-Toe Problems

1275. Find Winner on a Tic Tac Toe Game (Easy)

Leetcode 794. Valid Tic-Tac-Toe State(Medium)

Leetcode 348. Design Tic-Tac-Toe (Medium, Locked)

Optimal Strategy of Tic-Tac-Toe

ConnectNGame

Minimax Strategy

Tic-tac-toe Solution and Number of States

Alpha-Beta Pruning Strategy

Comments

Your browser is out-of-date!