Introduction
AWS DeepRacer is an intriguing concept. It's a miniature autonomous vehicle that offers an introduction to the world of reinforcement learning, a branch of machine learning. You begin by training your own model in a virtual sandbox, tinkering with reward functions and hyperparameters. The real excitement comes with the DeepRacer League - an international competition where your model is tested. A blend of competition and learning, the DeepRacer serves as a unique, hands-on path into AI.
The issue with DeepRacer is the cost: it involves a lot of trial and error, and naturally nobody wants to share too much specific information, as that could make the competition more difficult for them!
Therefore I thought I would try some experiments, training on EC2 instances, which train faster and at a lower cost than the console. I luckily have credits to use, so it comes at no actual cost to me.
Experiments
All of the below were run on the A to Z Speedway track (reInvent2019_wide_cw) in a clockwise direction. World record pace for this track is around 7-8 seconds.
Experiment 1 - Pursuit Function and High Top Speed
Reward Function
```python
def reward_function(params):
    # Reward progress made per step (an incentive to complete the lap in as
    # few steps as possible) plus a bonus that grows with the square of the
    # current speed. Leaving the track, or step zero, earns only a tiny reward.
    if params["all_wheels_on_track"] and params["steps"] > 0:
        reward = ((params["progress"] / params["steps"]) * 100) + (params["speed"] ** 2)
    else:
        reward = 0.01
    return float(reward)
```
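To sanity check the function locally before committing to a training run, you can call it with a hand-built params dictionary. The keys below are standard DeepRacer input parameters; the numbers are made up purely for illustration:

```python
# Illustrative values only: roughly half a lap (50% progress) after 80 steps,
# travelling at 3 m/s with all four wheels on the track.
sample_params = {
    "all_wheels_on_track": True,
    "steps": 80,
    "progress": 50.0,
    "speed": 3.0,
}

# (50 / 80) * 100 + 3**2 = 62.5 + 9.0 = 71.5
print(reward_function(sample_params))  # 71.5
```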
Hyperparameters
| Hyperparameter | Value |
| --- | --- |
| Entropy | 0.01 |
| Gradient descent batch size | 128 |
| Learning rate | 0.0003 |
| Discount factor | 0.995 |
| Loss type | huber |
| Number of experience episodes between each policy-updating iteration | 25 |
| Number of epochs | 10 |
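If you train outside the console (for example with the community DeepRacer-for-Cloud setup on EC2), these values are typically supplied as a small JSON file rather than through the UI. Here's a minimal sketch assuming that kind of setup - the key names follow the community project's conventions, so treat it as illustrative rather than an exact copy of my configuration:

```python
import json

# Illustrative sketch: key names follow DeepRacer-for-Cloud style conventions
# and may not match every setup; the values are the ones from the table above.
hyperparameters = {
    "batch_size": 128,                    # gradient descent batch size
    "beta_entropy": 0.01,                 # entropy
    "discount_factor": 0.995,
    "loss_type": "huber",
    "lr": 0.0003,                         # learning rate
    "num_episodes_between_training": 25,  # episodes per policy update
    "num_epochs": 10,
}

with open("hyperparameters.json", "w") as f:
    json.dump(hyperparameters, f, indent=2)
```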
Action Space
| Setting | Value |
| --- | --- |
| Type | Continuous |
| Speed (m/s) | 1.1 : 4 |
| Steering angle (degrees) | -30 : 30 |
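A continuous action space is fully described by these two ranges. As a rough sketch of how it can be captured in code (the structure here is an assumption for illustration, not a copy of the actual metadata file):

```python
# Illustrative structure only: the bounds below are the ones used in this
# experiment; speed is in m/s and steering angle in degrees.
action_space = {
    "type": "continuous",
    "speed": {"low": 1.1, "high": 4.0},
    "steering_angle": {"low": -30.0, "high": 30.0},
}
```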
Training Time
Ran for 3 hours, but on a large server, so not directly equivalent to 3 hours on the DeepRacer console.
Results
| Metric | Value |
| --- | --- |
| Final Evaluation Fastest Lap (s) | 10.597 |
| Final Evaluation Fastest Lap Off-track Number | 1 |
| Final Evaluation Laps (s) | 10.597, 14.401, 16.068 |
| Final Evaluation Total Off-track | 3 |
Experiment 2 - Pursuit Function and Medium Top Speed
A brand new model, with everything the same as above except that the action space has a lower top speed of 3 m/s, to see if that makes the car more stable and quicker to learn, with less chance of coming off-track.
Action Space
| Setting | Value |
| --- | --- |
| Type | Continuous |
| Speed (m/s) | 1.1 : 3 |
| Steering angle (degrees) | -30 : 30 |
Training Time
Ran for 3 hours again.
Results
| Metric | Value |
| --- | --- |
| Final Evaluation Fastest Lap (s) | 10.000 |
| Final Evaluation Fastest Lap Off-track Number | 0 |
| Final Evaluation Laps (s) | 10.170, 10.000, 11.398 |
| Final Evaluation Total Off-track | 0 |
Experiment 3 - Pushing the Top Speed
A clone of Experiment 2, meaning it is built on top of that model rather than trained from scratch. The configuration was the same as above, but the action space has a slightly higher top speed of 3.5 m/s, to see if that makes the car quicker while hopefully staying stable.
Action Space
| Setting | Value |
| --- | --- |
| Type | Continuous |
| Speed (m/s) | 1.1 : 3.5 |
| Steering angle (degrees) | -30 : 30 |
Training Time
Ran for 1 hour.
Results
| Metric | Value |
| --- | --- |
| Final Evaluation Fastest Lap (s) | 9.257 |
| Final Evaluation Fastest Lap Off-track Number | 0 |
| Final Evaluation Laps (s) | 9.257, 9.730, 10.730 |
| Final Evaluation Total Off-track | 0 |
Conclusion
Training with a maximum of 3 m/s was a much healthier training session - it was learning right until the end, evaluating at 100% completion, and the reward started to level off around 8k, whereas the attempt with a maximum speed of 4 m/s struggled to get more than 5k reward and wasn't managing to finish a lap during training or evaluation.
Overall this isn't too surprising, because the reward function rewards going as fast as possible, so the car will always be trying to travel at its top speed, and if that speed is too high it will spin out a lot. The question is whether training slowly is a trap: the car might be consistent, but can it then be trained to go quicker later on and finish with a strong, fast result? The numbers baked into the neural network might be too low to ever be useful - it has potentially learned bad behaviours!
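To put rough numbers on that speed incentive: the speed term in the reward grows with the square of the speed, so the per-step bonus at each action space ceiling differs noticeably (the speeds below are just the bounds used in these experiments):

```python
# The per-step speed bonus from the reward function above, in isolation.
for speed in (1.1, 3.0, 3.5, 4.0):
    print(f"{speed} m/s -> +{speed ** 2:.2f} reward per step")
# 1.1 m/s -> +1.21, 3.0 m/s -> +9.00, 3.5 m/s -> +12.25, 4.0 m/s -> +16.00
```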
The third experiment suggested those bad behaviours hadn't been baked in, though: after an hour of training Experiment 2 further with a slightly higher top speed, it trained in a healthy way and reduced the lap time without coming off the track during evaluation. When racing on a community circuit, though, it would leave the track (only just) once per three-lap race around two thirds of the time.