Trex Game with CNN
In this article, I build a model that plays the Trex game. I should make clear that this is not a Reinforcement Learning (RL) model; it is a simple CNN that predicts keyboard actions from screenshots of the game.
Contents:
1) Getting Data
2) Train a CNN Model
3) Test the Trained Model in Game
Getting Data
First, I started by collecting data. To do that, I used a few libraries:
import keyboard
import uuid
import time
from PIL import Image
from mss import mss
Using the keyboard library, I record my actions, for example "up", "down", and "right".
Before importing it, the library must be installed:
pip install keyboard
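As a quick sanity check that the library detects key presses (this snippet is only a test of mine, not part of the data collection code):
import keyboard

# block until the up-arrow key is pressed, then report it
keyboard.wait("up")
print("up arrow detected")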
With the mss library, I can capture the game screen. When I run the script, I switch to the game window, and the model trained on the collected data predicts actions from the captured images. mss also lets me crop a specific area of the screen, so the model only sees the region I choose. Like the others, this library must be installed first:
pip install mss
After importing the libraries, I set the coordinates of the game area. The screen contains a lot of useless stuff, so to remove it I determined the coordinates using Paint :)
mon = {"top":370,
"left":700,
"width":200,
"height":145}
Using mss, I can easily crop the screen to the area I want the model to see. So I create an mss instance:
sct = mss()
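Before recording, it is worth checking that mon really covers the game area. A minimal check (my own addition; test_region.png is an arbitrary file name):
from mss import mss
from PIL import Image

# same region as above; grab one frame and save it so the crop can be inspected
mon = {"top": 370, "left": 700, "width": 200, "height": 145}
sct = mss()
img = sct.grab(mon)
Image.frombytes("RGB", img.size, img.rgb).save("test_region.png")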
Now, I will create a function that I will use for recording.
i = 0
def record_screen(record_id, key):
    global i  # i is used both inside and outside of this function
    i += 1
    print(f"{key}, {i}")  # key: the pressed key, i: how many frames have been saved
    img = sct.grab(mon)
    im = Image.frombytes("RGB", img.size, img.rgb)
    im.save(f"data/img/{key}_{record_id}_{i}.png")
After this record function, I need another function for exiting. When I want to stop recording, I will press "esc", which calls the exit function:
is_exit = False
def exit():
    global is_exit
    is_exit = True
keyboard.add_hotkey("esc", exit)
Finally, I can set up the recording loop. I create a record_id and capture the screen whenever one of the keys is pressed:
record_id = uuid.uuid4()
while True:
    if is_exit:
        break
    try:
        if keyboard.is_pressed(keyboard.KEY_UP):
            record_screen(record_id=record_id, key="up")
            time.sleep(0.1)
        elif keyboard.is_pressed(keyboard.KEY_DOWN):
            record_screen(record_id=record_id, key="down")
            time.sleep(0.1)
        elif keyboard.is_pressed("right"):
            record_screen(record_id=record_id, key="right")
            time.sleep(0.1)
    except RuntimeError:
        continue
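One practical detail: record_screen saves into data/img/, so that folder has to exist before recording starts. A minimal way to create it (my addition, not in the original script):
import os

# create the output folder used by record_screen if it does not exist yet
os.makedirs("data/img", exist_ok=True)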
Now I am ready to collect data in the game.
2) Train a CNN Model
In this section, I will train a CNN model using the collected data. The model will predict keyboard actions from images. For example, if the model recognizes a cactus, the action will be "UP"; if it recognizes a bird, the action will be "DOWN" or "UP". First, I import the libraries:
import glob
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from PIL import Image
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
After the libraries, I define the paths of the images:
imgs = glob.glob("data/img/*.png")
I will resize the images to a fixed size:
width = 125
height = 50
Before training, I need to apply some preprocessing steps such as resizing and normalization.
Here, the labels are the keyboard actions "up", "down", and "right". To extract them from the file names:
First, I need the file name, which this line gives me:
filename = os.path.basename(img)
OUTPUT: "down_022f78bc-435f-4978-8524-ff1ea1a40d9a_1.png"
Then I split the file name on "_"; the first part is the label.
label = filename.split("_")[0]
OUTPUT: "down"
You can check the record_screen function in the first section to see how the images were saved.
After determining the labels, I resize and normalize the images:
im = np.array(Image.open(img).convert("L").resize((width, height)))
Here are all the steps above put together:
X = []  # images (frames of the game: cactus, bird, ...)
y = []  # labels ("up", "down", "right")
for img in imgs:
    filename = os.path.basename(img)
    label = filename.split("_")[0]  # up, down, right
    im = np.array(Image.open(img).convert("L").resize((width, height)))
    im = im / 255  # normalization
    X.append(im)
    y.append(label)
To split the data into train and test sets, I must convert it to arrays:
X = np.array(X)
X = X.reshape(X.shape[0], width, height, 1)
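As a quick check (my addition), printing the shapes confirms that the data now matches the CNN input format:
# sanity check on the array shapes (these prints are my addition)
print(X.shape)  # expected: (number_of_images, 125, 50, 1)
print(len(y))   # should equal the number of images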
Now I will apply Label Encoding first and then One-Hot Encoding to the labels.
Label Encoding makes the labels numeric; since LabelEncoder orders the classes alphabetically, this gives:
down --> 0
right --> 1
up --> 2
After Label Encoding, I apply One-Hot Encoding, which turns each label into a binary vector:
0 --> 100
1 --> 010
2 --> 001
To do that, I define a function:
def one_hot_labels(values):
    # Label Encoding -> One-Hot Encoding
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
    onehot_encoder = OneHotEncoder(sparse=False)
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
    return onehot_encoded
# One-Hot Encoding
Y = one_hot_labels(y)
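To make sure the encoding behaves as described, it can be checked on a tiny hand-made label list (this check is my addition):
# sanity check: classes are ordered alphabetically (down, right, up)
print(one_hot_labels(["down", "right", "up", "down"]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]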
Using X and Y, I split the data with a test size of 0.25: 75% for training and 25% for testing.
# train test split
train_X, test_X, train_y, test_y = train_test_split(X, Y , test_size = 0.25, random_state = 2)
After splitting the data, I create the Convolutional Neural Network:
# CNN Model
model = Sequential()
model.add(Conv2D(32, kernel_size = (3,3), activation = "relu", input_shape = (width, height, 1)))
model.add(Conv2D(64, kernel_size = (3,3), activation = "relu"))
model.add(MaxPooling2D(pool_size = (2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation = "relu"))
model.add(Dropout(0.4))
model.add(Dense(3, activation = "softmax"))
model.compile(loss="categorical_crossentropy",
              optimizer="Adam",
              metrics=["acc"])
# training
model.fit(train_X, train_y, epochs = 35, batch_size = 64)
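To inspect the architecture, model.summary() prints the layer output shapes and parameter counts (this call is my addition, not part of the original script):
# print layer output shapes and parameter counts
model.summary()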
After training, I evaluate the train and test scores:
score_train = model.evaluate(train_X, train_y)
print("Training Score: %", score_train[1]*100)
score_test = model.evaluate(test_X, test_y)
print("Test Score: %", score_test[1]*100)
To use the trained model in the game, I need to save it.
# save weights
open("trex_model.json","w").write(model.to_json())
model.save_weights("trex_weight.h5")
And finally, I have a trained CNN model.
3) Test the Trained Model in Game
In this section, I will test my model in the game in real time.
First, I import the libraries:
from keras.models import model_from_json
import numpy as np
from PIL import Image
import keyboard
import time
import os
from mss import mss
For the screen capture, I set up the same region and image size:
mon = {"top":370,
"left":700,
"width":200,
"height":145}
sct = mss()
# size of images
width = 125
height = 50
Now I can load my trained CNN model:
# load model
model = model_from_json(open("trex_model.json", "r").read())
model.load_weights("trex_weight.h5")
I define the labels again; the index predicted by the model selects an action from this list. I do not touch the keyboard, the model does.
#down:0, right:1, up:2
labels = ["Down", "Right", "Up"]
framerate_time = time.time()
counter = 0
i = 0
delay = 0.4
key_down_pressed = False
while True:
    img = sct.grab(mon)
    im = Image.frombytes("RGB", img.size, img.rgb)
    im2 = np.array(im.convert("L").resize((width, height)))
    im2 = im2 / 255  # normalization
    X = np.array([im2])
    X = X.reshape(X.shape[0], width, height, 1)
    r = model.predict(X)
    result = np.argmax(r)
    if result == 0:  # down: 0
        keyboard.press(keyboard.KEY_DOWN)
        key_down_pressed = True
    elif result == 2:  # up: 2
        if key_down_pressed:
            keyboard.release(keyboard.KEY_DOWN)
        time.sleep(delay)
        keyboard.press(keyboard.KEY_UP)
        if i < 1500:
            time.sleep(0.3)
        elif 1500 < i and i < 5000:
            time.sleep(0.2)
        else:
            time.sleep(0.17)
        keyboard.press(keyboard.KEY_DOWN)
        keyboard.release(keyboard.KEY_DOWN)
    counter += 1
    if (time.time() - framerate_time) > 1:
        counter = 0
        framerate_time = time.time()
        if i <= 1500:
            delay -= 0.003
        else:
            delay -= 0.005
        if delay < 0:
            delay = 0
        print("----------------")
        print(f"Down: {r[0][0]}\nRight: {r[0][1]}\nUp: {r[0][2]}")
        i += 1
Now, I need to explain what this while loop does.
First, I grab the screen area defined by mon and convert it to an image. Then I resize and normalize it:
img = sct.grab(mon)
im = Image.frombytes("RGB", img.size, img.rgb)
im2 = np.array(im.convert("L").resize((width, height)))
im2 = im2 / 255 # normalization
After these steps, I turn it into an array and reshape it into the input shape the model expects.
np.argmax() returns the index of the maximum value along an axis; I use it to find the label with the highest probability:
X = np.array([im2])
X = X.reshape(X.shape[0], width, height, 1)
r = model.predict(X)
result = np.argmax(r)
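For example, with a hypothetical prediction vector, the index of the largest probability picks the action:
import numpy as np

# hypothetical prediction: probabilities for Down, Right, Up
r = np.array([[0.10, 0.15, 0.75]])
print(np.argmax(r))  # -> 2, which corresponds to "Up" in the labels list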
If the result is 0, it means DOWN, so the model presses DOWN via the keyboard library, and I set key_down_pressed = True:
if result == 0:  # down: 0
    keyboard.press(keyboard.KEY_DOWN)
    key_down_pressed = True
If the result is 2, it means UP, and again the model presses UP via the keyboard library.
The threshold i < 1500 is an arbitrarily chosen frame count (1500 frames):
elif result == 2:  # up: 2
    if key_down_pressed:
        keyboard.release(keyboard.KEY_DOWN)
    time.sleep(delay)
    keyboard.press(keyboard.KEY_UP)
    if i < 1500:
        time.sleep(0.3)
    elif 1500 < i and i < 5000:
        time.sleep(0.2)
    else:
        time.sleep(0.17)
After that, the dinosaur ducks again, but I must release the key; otherwise the dinosaur stays down:
keyboard.press(keyboard.KEY_DOWN)
keyboard.release(keyboard.KEY_DOWN)
Finally, I adjust the delay over time:
counter += 1
if (time.time() - framerate_time) > 1:
    counter = 0
    framerate_time = time.time()
    if i <= 1500:
        delay -= 0.003  # 3 ms
    else:
        delay -= 0.005  # 5 ms
    if delay < 0:
        delay = 0
    print("----------------")
    print(f"Down: {r[0][0]}\nRight: {r[0][1]}\nUp: {r[0][2]}")
    i += 1
You can find the project here:
https://github.com/ierolsen/Trex-Game-with-CNN