
Maksim Bober


Predicting outcomes of Kentucky Derby 2022 race

Context

The Kentucky Derby is a horse race that takes place each year in Louisville, Kentucky, United States. I learned about this race from one of my coworkers, who has been doing a side project analyzing past races to predict the outcome of the upcoming one. I didn't know anything about horse racing and thought it would be fun to do the same project: make predictions, then see whether they turn out to be accurate. I would also learn a lesson about how to profit in the real world using data science.

Predictions

As a result of my analysis, I found two candidates that are likely to finish in the top 3.

Horse | Trainer | Jockey
Charge It | Todd A. Pletcher | Luis Saez
Simplification | Antonio Sano | Jose Ortiz

Why?

Mostly because of what I found after running a community detection algorithm on the social network of jockeys and trainers.
[Image: community clusters in the jockey and trainer network]

There are clear community clusters in the dataset. At my coworker's suggestion, I checked what percentage of wins falls into each cluster, and it turned out that 50% of all wins from 2015 to 2021 fall into the red cluster.
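To make the idea concrete, here is a minimal sketch of the two steps described above: detecting communities in a trainer and jockey network, then measuring each community's share of wins. It uses a simple deterministic label propagation, which is not necessarily the algorithm used for the image above. The records, the names in them, and the helper functions `build_graph`, `label_propagation` and `win_share` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (trainer, jockey, won) records. All names and results are
# invented for illustration; they are not the data behind the post.
RECORDS = [
    ("Trainer A", "Jockey 1", True),
    ("Trainer A", "Jockey 2", False),
    ("Trainer B", "Jockey 1", True),
    ("Trainer B", "Jockey 2", True),
    ("Trainer B", "Jockey 3", False),  # a single pairing that bridges the groups
    ("Trainer C", "Jockey 3", False),
    ("Trainer C", "Jockey 4", True),
    ("Trainer D", "Jockey 3", False),
    ("Trainer D", "Jockey 4", False),
]


def build_graph(records):
    """Weighted adjacency: how often each trainer and jockey were paired."""
    graph = defaultdict(Counter)
    for trainer, jockey, _won in records:
        t, j = ("trainer", trainer), ("jockey", jockey)
        graph[t][j] += 1
        graph[j][t] += 1
    return graph


def label_propagation(graph, max_rounds=100):
    """Each node repeatedly adopts the most common label among its neighbours."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the run deterministic
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            top = max(votes.values())
            # Break ties by taking the smallest label.
            best = min(label for label, count in votes.items() if count == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def win_share(records, labels):
    """Fraction of all wins that falls into each trainer's community."""
    wins = Counter(labels[("trainer", t)] for t, _j, won in records if won)
    total = sum(wins.values())
    return {label: count / total for label, count in wins.items()} if total else {}


labels = label_propagation(build_graph(RECORDS))
shares = win_share(RECORDS, labels)
for label, share in sorted(shares.items()):
    print(label, f"{share:.0%}")
```

With this toy data the network splits into two communities, and the one containing Trainer A and Trainer B accounts for three of the four wins. On real data, a graph library with a tested community detection routine would be a better choice than this hand-rolled version.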

Next I checked whether any trainers from this winning cluster are participating in the current race. Two of them are, with 4 horses between them, so there were 4 choices to choose from. I then narrowed my scope down to two by finding out which jockeys have won before (in my notebook you can see why it's reasonable to make such an assumption).
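The narrowing step above can be sketched as a two-stage filter: keep entries whose trainer belongs to the winning cluster, then keep those whose jockey has won before. The `Entry` class, the `shortlist` function and every name in the sample data are hypothetical placeholders, not the actual 2022 field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    horse: str
    trainer: str
    jockey: str


def shortlist(entries, cluster_trainers, winning_jockeys):
    """Keep entries with a winning-cluster trainer and a previously winning jockey."""
    return [
        e for e in entries
        if e.trainer in cluster_trainers and e.jockey in winning_jockeys
    ]


# Hypothetical field: two cluster trainers with four horses between them,
# plus one entry from a trainer outside the cluster.
ENTRIES = [
    Entry("Horse 1", "Trainer A", "Jockey 1"),
    Entry("Horse 2", "Trainer A", "Jockey 5"),
    Entry("Horse 3", "Trainer A", "Jockey 6"),
    Entry("Horse 4", "Trainer B", "Jockey 2"),
    Entry("Horse 5", "Trainer X", "Jockey 1"),
]
CLUSTER_TRAINERS = {"Trainer A", "Trainer B"}   # assumed output of the clustering
WINNING_JOCKEYS = {"Jockey 1", "Jockey 2"}      # assumed past winners

picks = shortlist(ENTRIES, CLUSTER_TRAINERS, WINNING_JOCKEYS)
print([e.horse for e in picks])
```

In this made-up field the trainer filter leaves four horses and the jockey filter cuts them down to two, mirroring the shape of the reasoning above.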

Why not use ML?

I'm not sure that this problem is truly "random". My intuition is that horse racing runs similarly to an old boy network, so a community analysis might give me a better shot at getting an edge.

Why do I think that it's an old boy network? Well, primarily because of Bob Baffert, who has a history of drugging his horses and has been disqualified from participating in horse competitions until 2023. Coincidentally, he falls into the red cluster that wins 50% of the time. It might be a coincidence, but knowing about the Tour de France doping scandal and doping by Russia at the Olympic Games, I would not be surprised if something like this existed in horse racing too.

Outcomes

So the results of the horse race came in, and none of my predictions made it into the top 3. Simplification came close, finishing 4th, but Charge It came in 8th. So the predictions weren't as good as I thought.

Finished | Horse | Trainer | Jockey
8 | Charge It | Todd A. Pletcher | Luis Saez
4 | Simplification | Antonio Sano | Jose Ortiz

The winners were:

Finished | Horse | Trainer | Jockey
1 | Rich Strike | Eric Reed | Sonny Leon
2 | Epicenter | Steve Asmussen | Joel Rosario
3 | Zandon | Chad C. Brown | Flavien Prat

It is interesting to see that at least one of the trainers from our winning cluster made it into the top 3, so we weren't too far off with our analysis after all.

