Beat the robots — Learning about machine learning with hexapawn

Stefan Seegerer
May 10, 2021

One major force behind the latest developments in artificial intelligence is machine learning. Machine learning deals with algorithms that improve through experience over time. There are different ways a machine can learn. Reinforcement learning is one of them. With reinforcement learning, the computer learns to master a task by interacting with its environment through reward and punishment, trying to maximize its reward. This post uses a mini-chess game called hexapawn to explore how a computer can learn through reinforcement learning.

The game

Hexapawn (or mini chess) originates from an idea by Martin Gardner, who used it to explain machine learning as early as 1962. In this version, you take on the role of the monkeys, while the computer controls the robots.

Each piece moves like a pawn: it can move one square straight forward onto an empty square, and it captures opposing pieces diagonally. A side wins if it manages to

  • move one of its own pieces to the other end of the board,
  • capture all of the opponent’s pieces,
  • or ensure that the opponent is not able to move in the next round.

You make the first move and may move any one of your pieces according to the rules of the game. Then it is the computer’s turn. It compares the current board with the game situations listed on the right and selects the one that matches.

Then the computer randomly draws one of the colored tokens next to the game situation. The color of the token determines which move is made. For example, if a red token is drawn, the robot is moved following the red arrow.

Computer moves robot along the red arrow.
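Drawing a token can be thought of as a weighted random choice: the more tokens of a color lie next to a game situation, the more likely that color is drawn. Here is a small sketch of that idea. The dictionary `box`, its colors and counts, and the function name `draw_token` are hypothetical and not taken from the game's actual code.

```python
import random

# Hypothetical token box for one game situation: color -> number of tokens.
box = {"red": 1, "green": 1, "blue": 1}


def draw_token(box, rng=random):
    """Draw one token at random; colors with more tokens are more likely."""
    colors = [color for color, count in box.items() if count > 0]
    counts = [box[color] for color in colors]
    return rng.choices(colors, weights=counts, k=1)[0]


print(draw_token(box))  # one of "red", "green", "blue"
```

The sketch assumes that at least one token is left in the box. What should happen when a box runs empty is not covered here.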

This procedure is repeated until a winner is determined. Before a new round is played, the computer adjusts its strategy as follows:

  • Computer has won: A token in the color of the last, winning turn is additionally placed on the square of that turn.
  • Human has won: The token that determined the computer player’s last move is removed.
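Expressed as code, the two adjustment rules above might look roughly like this. It is a sketch under assumptions: `boxes` maps each game situation to its token counts, and `computer_moves` lists the (situation, color) pairs the computer played in order. Both names and the data layout are made up for illustration.

```python
def update_boxes(boxes, computer_moves, computer_won):
    """Adjust the token counts after one finished game.

    boxes: situation -> {color: number of tokens}
    computer_moves: list of (situation, color) in the order they were played
    """
    if not computer_moves:
        return
    situation, color = computer_moves[-1]  # only the last move is adjusted
    if computer_won:
        boxes[situation][color] += 1  # reward: add a token of the winning color
    else:
        # punishment: remove the token that determined the last move
        boxes[situation][color] = max(0, boxes[situation][color] - 1)


# Usage with made-up situations "A" and "B":
boxes = {"A": {"red": 1, "green": 1}, "B": {"blue": 1}}
update_boxes(boxes, [("A", "red"), ("B", "blue")], computer_won=False)
print(boxes["B"])  # {'blue': 0}
```

Only the last move of the computer is touched, as in the rules above. Earlier moves of the same game keep their tokens.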

Before we move on, it is your turn: try the game and watch the computer get better with each round played.

This link takes you to hexapawn: https://www.stefanseegerer.de/schlag-das-krokodil/?robots=true

Background

At first, the computer will have little chance of winning, since it chooses its moves randomly (by drawing a token). The more games the computer finishes, the better it gets: it “learns” which moves will help it win and which it should avoid because they ended in defeat in the past. In this way, the computer’s strategy is gradually refined. Since the computer is punished for losing and rewarded for winning, we also speak of reinforcement learning — learning by reward and punishment:

  • Punishment = taking away a token in a move that led to defeat.
  • Reward = adding a token to a move that led to a win.

This procedure “weeds out” the moves that resulted in defeat, so that eventually only “good” moves remain. In practical reinforcement learning systems, a strategy that fails once is usually not eliminated immediately. Instead, the probability of choosing it is reduced. Thus, the AI gradually learns which strategy to apply in which situation.
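The difference between eliminating a move and merely making it less likely shows up when a box holds several tokens per color. The following sketch turns token counts into probabilities. The function name `move_probabilities` and the starting count of three tokens per color are assumptions for illustration, not the setup of the linked game.

```python
from fractions import Fraction


def move_probabilities(box):
    """Turn token counts into the probability of drawing each color."""
    total = sum(box.values())  # assumes the box is not empty
    return {color: Fraction(count, total) for color, count in box.items()}


before = {"red": 3, "green": 3}
after = {"red": 2, "green": 3}  # one red token removed after a defeat

print(move_probabilities(before)["red"])  # 1/2
print(move_probabilities(after)["red"])   # 2/5
```

After one defeat, the red move is still possible, only less likely. With a single token per color, the same removal would rule the move out completely.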

Using reinforcement learning, a computer can learn to win a game knowing only the rules of the game or the possible inputs. If a computer learns to play a video game like Super Mario, it will initially just make random inputs (i.e. button presses). This could result in Mario just standing still for minutes or running into an opponent multiple times. By chance, Mario will eventually hit a mystery box or progress within the level and thus get rewarded. There might also be a delay between an action and the corresponding reward. But over time, the computer will learn that taking a mushroom, or jumping when it reaches an enemy or a gap, increases its reward, while running into an opponent leads to punishment. In this way, the computer gradually improves its strategy, trying to maximize its reward.

In this post, you have experienced that computers can learn from experience and thus move from purely random actions to an efficient game strategy. Reinforcement learning is not limited to games, however. It can also be applied to heating, ventilation, and air conditioning control (finding optimal settings to reduce energy consumption) or to robotics, where robots learn to act and behave similarly to humans. It can also be used in stock trading or to improve chatbots. The underlying idea is the same as in hexapawn.

PS: By the way, this game can also be played unplugged, without a computer.
