Predicting Outcomes of Horse Racing using Machine Learning

Meenakshi Gupta
Latika Singh

Abstract

Machine learning, with its vast framework, is making its way into every aspect of modern society. Betting sports, particularly horse racing, attract attention from a wide spectrum of the research community owing to their value to stakeholders and the amount of money involved. Horse racing prediction is a complex problem because a large number of variables influence the result. The present study aims to contribute to this domain by training machine learning algorithms to predict horse racing outcomes. For this, data from races conducted by the Turf Club of India over the racing seasons from 2017 to 2019 were considered, amounting to over 14,700 races. Six algorithms, including Logistic Regression, Random Forest, Naive Bayes, and k-Nearest Neighbors (k-NN), were used to predict the winning horse for each race. The Synthetic Minority Oversampling Technique (SMOTE) was applied to the imbalanced horse racing data set, and the attributes of the horse race repository were analyzed. The results were compared with those of other sampling methods to evaluate the relative effectiveness of this method. The proposed framework achieves an accuracy of 97.6%, which is substantially higher than that reported in other similar studies. The research can benefit stakeholders as well as researchers in the same area who wish to carry out further analysis and experiments.
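
To illustrate the oversampling idea mentioned in the abstract, the following is a minimal pure-Python sketch of SMOTE-style interpolation. It is not the authors' implementation: the function name `smote`, the parameter `k`, and the two-feature toy data are assumptions made only for this example. Each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric tuples with at least two entries.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


# Toy data (invented): two features per runner; winners are the minority class.
winners = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.95, 0.75)]
losers = [(i / 20, (i * 7 % 20) / 20) for i in range(20)]

# Generate enough synthetic winners to match the majority class size.
new_winners = smote(winners, n_new=len(losers) - len(winners), k=3)
balanced_winners = winners + new_winners
print(len(balanced_winners), len(losers))
```

In a real pipeline, oversampling of this kind would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.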

Article Details

How to Cite
Gupta, M., & Singh, L. (2023). Predicting Outcomes of Horse Racing using Machine Learning. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9), 38–47. https://doi.org/10.17762/ijritcc.v11i9.8119
