NOTE: This article was initially posted on my Substack, at https://andresalvareziglesias.substack.com/
Hi all!
The Tic Magical Line experiment is coming to an end. In the previous articles, we learned how to build a full-stack Django version of the TicTacToe game inside a containerized environment, with the help of Docker.
Our TicTacToe is a (sort of) MMORPG. Each player can battle against other players... but also against the CPU, disguised as a dragon.
Let's build the dragon's brain and play a bit with the mysterious world of AI and Machine Learning...
Articles in this series
- Chapter 1: Let the journey start
- Chapter 2: Create a containerized Django app with Gunicorn and Docker
- Chapter 3: Serve Django static files with NGINX
- Chapter 4: Adding a database to our stack
- Chapter 5: Applications and sites
- Chapter 6: Using the Django ORM
- Chapter 7: Users login, logout and register
- Chapter 8: Implementing the game in Player vs Player
- Chapter 9: Scheduled tasks
CPU player without Machine Learning
TicTacToe is a simple game, and the CPU player logic can be really simple too. We can do something like this:
import random


class DragonPlay:
    def __init__(self, board, type="simple"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            raise Exception("Not implemented yet!")

    def getEmptyPositions(self):
        # The board is a 9-character string; "E" marks an empty cell
        emptyPositions = []
        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)
        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)
This simple agent makes random movements in a very dumb way... but it allows a player to play against the CPU. Very useful for testing the game logic of our Django application so far... but a bit boring in the long run.
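For example, a quick usage sketch (the example values are mine; the board is the same 9-character string used in the rest of the game, with "E" marking empty cells):

# Hypothetical usage of the simple agent
board = "XOEEXEEOE"
play = DragonPlay(board, type="simple")
position = play.chooseMovement()  # Index 0-8 of an empty cell, or -1 if the board is full
print(f"The dragon plays at position {position}")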
We need a smarter dragon...
CPU player with Machine Learning
Make easy things hard, just for fun. Let's create the same CPU player, but using a bit of AI and Machine Learning this time:
import random
from tensorflow.keras.models import load_model
import os

from game.tictactoe.dragonagent import DragonAgent


class DragonPlay:
    def __init__(self, board, type="ai"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            return self.aiMovement()

    def getEmptyPositions(self):
        emptyPositions = []
        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)
        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)

    def aiMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        # Mostly exploit the trained model; a little exploration keeps the
        # retry loop below from getting stuck on an occupied suggestion
        agent = DragonAgent(exploration_rate=0.1)
        if os.path.exists('/game/tictactoe/model/dragon.keras'):
            agent.model = load_model('/game/tictactoe/model/dragon.keras')

        # Ask the agent for moves until it proposes an empty position
        validMove = False
        position = -1
        while not validMove:
            position = agent.start(self.boardToState(self.board))
            if self.board[position] == "E":
                validMove = True

        return position

    def boardToState(self, board):
        # Encode the board for the network: empty = 0, X = 1, O = -1
        state = []
        for cell in board:
            if cell == 'E':
                state.append(0)
            elif cell == 'X':
                state.append(1)
            elif cell == 'O':
                state.append(-1)
        return state
This code loads an agent class and a Machine Learning model. The agent class is a TensorFlow-based agent that uses Q-learning, a reinforcement learning algorithm that learns by playing: after every move, the predicted value of the chosen action is nudged towards the received reward plus the discounted value of the best follow-up move:
import numpy as np
import tensorflow as tf


class DragonAgent:
    def __init__(self, alpha=0.5, discount=0.95, exploration_rate=1.0):
        self.alpha = alpha
        self.discount = discount
        self.exploration_rate = exploration_rate
        self.state = None
        self.action = None

        # A small neural network that maps a board state (9 cells)
        # to the estimated Q-value of each of the 9 possible moves
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(32, input_shape=(9,), activation='relu'),
            tf.keras.layers.Dense(32, activation='relu'),
            tf.keras.layers.Dense(9)
        ])
        self.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=alpha), loss='mse')

    def start(self, state):
        # First move of a game: remember the state and choose an action
        self.state = np.array(state)
        self.action = self.get_action(state)
        return self.action

    def get_action(self, state):
        # Epsilon-greedy policy: explore with a random move, or exploit
        # the move with the highest predicted Q-value
        if np.random.uniform(0, 1) < self.exploration_rate:
            action = np.random.choice(9)
        else:
            q_values = self.model.predict(np.array([state]))
            action = np.argmax(q_values[0])
        return action

    def learn(self, state, action, reward, next_state):
        # Q-learning update: the target is the reward plus the discounted
        # best Q-value of the next state (if the game is not over)
        q_update = reward
        if next_state is not None:
            q_values_next = self.model.predict(np.array([next_state]))
            q_update += self.discount * np.max(q_values_next[0])

        q_values = self.model.predict(np.array([state]))
        q_values[0][action] = q_update
        self.model.fit(np.array([state]), q_values, verbose=0)

        # Explore a little less after each update
        self.exploration_rate *= 0.99

    def step(self, state, reward):
        # Pick the next action, then learn from the previous state/action
        # pair using the reward just received
        action = self.get_action(state)
        self.learn(self.state, self.action, reward, state)
        self.state = np.array(state)
        self.action = action
        return action
It may look a bit confusing; we need to see how this agent is used to really understand it. It will all make sense in the end, believe me :)
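For example, one game from the agent's point of view looks roughly like this (a minimal sketch with made-up states and rewards, using the DragonAgent class above):

agent = DragonAgent(exploration_rate=0.5)

# First move of a game: remember the initial state and choose an action
action = agent.start([0, 0, 0, 0, 0, 0, 0, 0, 0])

# Intermediate moves: pass the new state and the reward for the previous
# action (0 here, the game is still open); the agent learns and plays again
action = agent.step([1, -1, 0, 0, 0, 0, 0, 0, 0], 0)

# Game over: learn from a final reward of +1 (win) or -1 (defeat), with no next state
agent.learn([1, -1, 1, 0, 0, 0, 0, 0, 0], action, 1, None)

This is exactly the cycle that the training script in the next section follows.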
How to train your dragon
In this line, the previous code loaded a pre-trained model:
load_model('/game/tictactoe/model/dragon.keras')
But how can we train this model? We can teach a couple of dragons how to play TicTacToe, rewarding them for each victory and punishing them for each defeat. The dragons then play one game, and another, and another, and another... You get the idea.
How can we implement this? Simple: get a TicTacToe board and a couple of DragonAgent instances, and let the play begin:
from tensorflow.keras.models import load_model
import tensorflow
import os
import random
import sys

from dragonagent import DragonAgent
from tictactoe import TicTacToe


def boardToState(board):
    # Encode the board for the network: empty = 0, X = 1, O = -1
    state = []
    for cell in board:
        if cell == 'E':
            state.append(0)
        elif cell == 'X':
            state.append(1)
        elif cell == 'O':
            state.append(-1)
    return state


def agentPlay(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        if game.freeBoardPositions() > 1:
            position = agent.get_action(boardToState(game.board))
        else:
            position = game.getUniquePossibleMovement()
        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")
    return position


def agentStart(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        position = agent.start(boardToState(game.board))
        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")
    return position


def playGame(prefix, agent, opponent):
    emptyBoard = "EEEEEEEEE"
    game = TicTacToe(emptyBoard)

    # Choose who starts the game
    agentIsO = random.choice([True, False])
    print(f"{prefix} > NOTE: In this game the agent is {'O' if agentIsO else 'X'}")

    agentInitialized = False
    opponentInitialized = False

    while not game.checkGameOver() and not game.noPossibleMove():
        if agentIsO:
            # Give an immediate reward of 1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'O')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'O')
                agentInitialized = True
            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'X')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'X')
                opponentInitialized = True
            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break
        else:
            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'O')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'O')
                opponentInitialized = True
            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break

            # Give an immediate reward of 1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'X')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'X')
                agentInitialized = True
            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

        # If no one wins, give a reward of 0
        agent.step(boardToState(game.board), 0)

    print(f'{prefix} > Game over! Winner: {game.winner}')
    game.dumpBoard()

    if (agentIsO and game.winner == 'O') or (not agentIsO and game.winner == 'X'):
        return 1
    elif game.winner == 'D':
        return 0
    else:
        return -1


# Reopen the trained model if available
agent = DragonAgent()
if os.path.exists('/game/tictactoe/model/dragon.keras'):
    agent.model = load_model('/game/tictactoe/model/dragon.keras')

# The opponent must be more exploratory; set to 1.0 to always choose
# random actions (exploration_rate goes from 0.0 to 1.0)
opponent = DragonAgent(exploration_rate=0.9)

# We can optionally set the number of games from the command line
try:
    numberOfGames = int(sys.argv[1])
except:
    numberOfGames = 10

# Disable Keras training messages (comment out to see them again)
tensorflow.keras.utils.disable_interactive_logging()

# Play each game
wins = 0
draws = 0
loses = 0
for numGame in range(numberOfGames):
    prefix = f"{numGame+1}/{numberOfGames}"
    print(f"Playing game {prefix}...")

    result = playGame(prefix, agent, opponent)
    if result == 1:
        wins += 1
    elif result == 0:
        draws += 1
    else:
        loses += 1

    # Save the trained model after each game
    agent.model.save('/game/tictactoe/model/dragon.keras')

    print(f'{prefix} > Training result until now: {wins} wins, {loses} loses, {draws} draws')
    print()
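To train the dragon we just run this script, optionally passing the number of games from the command line; assuming we saved it as, say, train_dragon.py (the file name is my choice here), python train_dragon.py 500 plays five hundred training games, saving the model after every one of them.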
I'm sure there is a better way of doing this, but remember, we are still learning: start with something that (sort of) works and improve it later 🙂
This piece of code performs any number of AI battles, learning along the way and storing the training results in a model file. Later, we can use this model file in the Tic Magical Line application.
Not very useful... but fun!
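As a final sketch, this is roughly how the trained dragon plugs into the Django side (hypothetical glue code: the import path mirrors the dragonagent one used above, and applyMove is an invented helper; the real wiring depends on how the game views are organized):

from game.tictactoe.dragonplay import DragonPlay

def dragonTurn(game):
    # 'game' would be the model instance whose 'board' field
    # holds the 9-character board string, e.g. "XOEEXEEOE"
    play = DragonPlay(game.board, type="ai")
    position = play.chooseMovement()
    if position >= 0:
        # Hypothetical helper that writes the dragon's mark on the board
        game.applyMove('O', position)
        game.save()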
What we have learned so far
This experiment has been, from beginning to end, an excuse to learn how to build a Django application inside a Dockerized environment. Everything else (the TicTacToe part, the dragons and the machine learning) is just a bit of spice to make the learning more fun.
We have learned that Django is awesome. It is full of functionality, very well organized, and has a ton of plugins and extensions. Very, very useful.
Now we can use this fantastic framework to build more useful applications.
About the list
Among the Python and Docker posts, I will also write about other related topics (always tech and programming topics, I promise... fingers crossed), like:
- Software architecture
- Programming environments
- Linux operating system
- Etc.
If you find some interesting technology, programming language or whatever, please let me know! I'm always open to learning something new!
About the author
I'm Andrés, a full-stack software developer based in Palma, on a personal journey to improve my coding skills. I'm also a self-published fantasy writer with four published novels to my name. Feel free to ask me anything!