Context
The Kentucky Derby is a horse race held each year in Louisville, Kentucky, United States. I learned about it from a coworker who has been running a side project that analyzes past races to predict the outcome of the upcoming one. I didn't know anything about horse racing and thought it would be fun to try the same project: make predictions, then check whether they turned out to be accurate. I also hoped to learn something about how to profit in the real world using data science.
Predictions
My analysis suggests that two candidates are likely to finish in the top 3.
Horse | Trainer | Jockey |
---|---|---|
Charge It | Todd A. Pletcher | Luis Saez |
Simplification | Antonio Sano | Jose Ortiz |
Why?
Mostly because of this.
After running a community detection algorithm on the social network of jockeys and trainers, I found this.
There are clear community clusters in the dataset. At my coworker's suggestion, I checked what percentage of the wins falls into each cluster, and it turned out that 50% of all wins from 2015 to 2021 fall into the red cluster.
Next I checked whether any trainers from this winning cluster are participating in the current race. There are 2 such trainers, but they have 4 horses between them, so there are 4 candidates to choose from. I then narrowed the list down to two by checking which jockeys have won before (my notebook shows why this is a reasonable assumption).
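The two steps above (detect communities, then measure each community's share of wins) can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the pairing data is made up, the names `trainer_A`, `jockey_1` and so on are placeholders, and a simple deterministic label propagation stands in for whichever community detection algorithm was actually used. Crediting a win to the trainer's community is also an assumption.

```python
from collections import defaultdict

# Hypothetical toy data: (trainer, jockey, races_together, wins_together).
# None of these numbers come from real race records.
PAIRINGS = [
    ("trainer_A", "jockey_1", 3, 2),
    ("trainer_A", "jockey_2", 2, 1),
    ("trainer_B", "jockey_1", 2, 0),
    ("trainer_B", "jockey_2", 1, 0),
    ("trainer_B", "jockey_3", 1, 0),
    ("trainer_C", "jockey_3", 2, 1),
    ("trainer_C", "jockey_4", 2, 0),
]


def build_graph(pairings):
    """Undirected weighted graph: edge weight = races run together."""
    graph = defaultdict(dict)
    for trainer, jockey, races, _wins in pairings:
        graph[trainer][jockey] = graph[trainer].get(jockey, 0) + races
        graph[jockey][trainer] = graph[jockey].get(trainer, 0) + races
    return graph


def label_propagation(graph, max_iter=100):
    """Each node repeatedly adopts the label with the most weight among
    its neighbours, until no label changes (or max_iter is reached)."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the run deterministic
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            top = max(scores.values())
            best = sorted(lab for lab, s in scores.items() if s == top)
            # Keep the current label on ties; otherwise take the first best.
            if labels[node] not in best:
                labels[node] = best[0]
                changed = True
        if not changed:
            break
    return labels


def win_share_by_community(pairings, labels):
    """Fraction of all wins credited to each community (by trainer)."""
    wins = defaultdict(int)
    for trainer, _jockey, _races, won in pairings:
        wins[labels[trainer]] += won
    total = sum(wins.values())
    return {community: count / total for community, count in wins.items()}


labels = label_propagation(build_graph(PAIRINGS))
shares = win_share_by_community(PAIRINGS, labels)
for community, share in sorted(shares.items()):
    print(f"community {community}: {share:.0%} of wins")
```

On real data the edge list would come from the historical results, and the community with the largest share plays the role of the "red cluster".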
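The narrowing step is essentially two filters applied in sequence. Here is a small sketch under stated assumptions: the entry list, the cluster membership and the set of previously winning jockeys are all hypothetical placeholders, not the real field.

```python
# Hypothetical inputs; in practice these would come from the cluster
# analysis and from the list of entries for the upcoming race.
WINNING_CLUSTER_TRAINERS = {"trainer_A", "trainer_B"}
PAST_WINNING_JOCKEYS = {"jockey_1", "jockey_3"}

# (horse, trainer, jockey) for the upcoming race
ENTRIES = [
    ("horse_1", "trainer_A", "jockey_1"),
    ("horse_2", "trainer_A", "jockey_2"),
    ("horse_3", "trainer_B", "jockey_3"),
    ("horse_4", "trainer_B", "jockey_4"),
    ("horse_5", "trainer_C", "jockey_1"),
]


def shortlist(entries, cluster_trainers, winning_jockeys):
    """Keep entries whose trainer is in the winning cluster, then keep
    only those whose jockey has won before."""
    in_cluster = [e for e in entries if e[1] in cluster_trainers]
    picks = [e for e in in_cluster if e[2] in winning_jockeys]
    return in_cluster, picks


in_cluster, picks = shortlist(
    ENTRIES, WINNING_CLUSTER_TRAINERS, PAST_WINNING_JOCKEYS
)
print(len(in_cluster), "candidates from the winning cluster")
print("final picks:", [horse for horse, _trainer, _jockey in picks])
```

With this toy field, two trainers contribute four horses, and the jockey filter cuts them down to two, mirroring the shape of the reasoning above.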
Why not use ML?
I'm not sure that this problem is truly "random". My intuition is that horse racing works like an old boy network, so a community analysis might give me a better shot at an edge.
Why do I think it's an old boy network? Primarily because of Bob Baffert, who has a history of drugging his horses and has been barred from horse racing competition until 2023. Coincidentally, he falls into the red cluster that wins 50% of the time. It might be a coincidence, but given the Tour de France doping scandals and Russian doping at the Olympic Games, I would not be surprised if something like this existed here too.
Outcomes
The results of the race are in, and neither of my predictions made it into the top 3. Simplification came close, finishing 4th, but Charge It finished 8th. So the predictions weren't as good as I had thought.
Finished | Horse | Trainer | Jockey |
---|---|---|---|
8 | Charge It | Todd A. Pletcher | Luis Saez |
4 | Simplification | Antonio Sano | Jose Ortiz |
The top three finishers were:
Finished | Horse | Trainer | Jockey |
---|---|---|---|
1 | Rich Strike | Eric Reed | Sonny Leon |
2 | Epicenter | Steve Asmussen | Joel Rosario |
3 | Zandon | Chad C. Brown | Flavien Prat |
It is interesting that at least one of the trainers from our winning cluster made it into the top 3, so the analysis wasn't too far off after all.