# Big Data in Sports Game Predictions and How It is Used in Sports Gambling

Big data in sports is being used more and more as technology advances and this has a very big impact, especially when it comes to sports gambling. Sports gambling has been around for a while and it is gaining popularity with it being legalized in more places across the world. It is a very lucrative industry and the bookmakers use everything they can to make sure the overall odds are in their favor so they can reduce the risk of paying out to the betters and ensure a steady return. Sports statistics and data is more important than ever for bookmakers to come up with the odds they put out to the public. Odds are no longer just determined by expert analyzers for a specific sport. The compilation of odds uses a lot of historical data about team and player performance and looks at the most intricate details in order to ensure accuracy. Bookmakers spend a lot of money to employ the best statisticians and the best algorithms. There are also many companies that solely focus on sports data analysis, who often work with bookmakers around the world. On the other hand, big data for sports game analysis is also used by gamblers to gain a competitive edge. Many different algorithms have been created by researchers and gamblers to try to beat the bookmakers, some more successful than others. Oftentimes these not only involve examining sports data, but also analysing data from different bookmakers odds in order to determine the best bets to place. Overall, big data is very important in this field and this research paper aims to show the various techniques that are used by different stakeholders.

Status: final, Type: Report

Mansukh Kandhari, fa20-523-331, Edit

Keywords: sports, sportsbook, betting, gambling, data analysis, machine learning, punter(British word for gambler)

## 1. Introduction

Big Data in sports has been used for years by various stakeholders in this industry to do everything from predicting game outcomes to injury prevention. It is also becoming very prevalent in the area of sports gambling. Ever since the Supreme court decision in Murphy v. National Collegiate Athletic Association that overturned a ban on sports betting, the majority of states in the US have passed legislation to allow sports gambling 1. In 2019, the global sports betting market was valued at 85.047 US Dollars so this is an already very big industry that is expanding 2. There are various platforms that allow betting in this industry including tangible sports books, casinos, racetracks, and many online and mobile gambling apps. The interesting thing about big data in sports betting is that it is being used on both sides in this market. It is used by bookmakers to create game models and come up with different spreads and odds, but big data analysis is also being used by gamblers to gain a competetive advantage and place more accurate bets. Various prediction models using machine learning and big data analytics have been created and they can sometimes be very accurate. For example, Google correctly predicted 14 out of the 16 matches in the 2014 world cup and Microsoft did even better by correctly predicting 15 out of the 16 matches during that year 3. Many big companies have spent a lot of time gathering lots of data and creating prediction algorithms, inlcuding ESPN’s Football Power index that gives the probility of one team beating another, Analytics Powerhouse 538 that determines scores of games using their ELO method, and Accuscore which runs Mone Carlo Simulations on worldwide sporting events 4. Bookmakers use all their possible tools and algorithms to put out the best odds that will give them a return. They often employ teams of statisticians that use the most advanced prediction models and information from data analysis companies to come up with their odds. If sports data analysis is vastly being used by people other than bookies and prediction models can often be very accurate, one might wonder how people haven’t made millions off sports betting and how bookmakers are still in business? This report analyzes how big data analytics are used by bookmakers to come up with the odds they put out for games while also examining how it is used on the gamblers side. It aims to analysize various prediction models created by sports betters, researchers, and AI companies, and see how they compare to the way big data is used by bookmakers. Besides giving an analysis of how big data is used in this field, it will show if betting guided by prediction models can give a consistent return.

## 2. Background

Many people with an interest in sports betting prediction have created models, some that involve machine learning and AI. A few of these projects have had some very interesting results using different types of data and analysis techniques. Jordan Bailey created an NBA prediction model based on the over under bets 5. For context, bookmakers will set a point total for a game and bettors can bet on whether the actual score will be over or under the point total set by the book. This type of betting is offered for many types of sports. Using NBA box scores for 5 previous seasons and data on historical betting lines created by various bookmakers, Bailey created a logistical regression model that would return a prediction on if the score was over or under a point total set by a bookmaker. Two models were created, one that predicted if a game would be over a set line and one that predicted if it would be under a set line. The datasets for these models were structured in a way where each specific game was represented as the box scores for the 3 previous games for each team, so 6 previous games 5. When creating the model, the first four seasons were used as the training set and the fifth season was used as the testing data to make predictions on 6. In order to determine the significance of results, Bailey set up a “confidence threshold” of 62 percent for the probability his model returned on a game being over or under. The prediction was “confident” if the probability of the prediction was above the set threshold. When testing the models, the over model predicted 88 games confidently and 52 games correctly, with an accuracy of 59.09 percent. The under model predicted 96 games confidently and correctly predicted 52 games with an accuracy of 54.16 percent 5. To simulate how the model would perform on betting with 10,000 dollars, a bet was made every time the model predicted a confident bet for the 2018 NBA season. The accuracy on the bets were 52.52 percent and the total after the simulation was 11,880 dollars.

## 3. How bookmakes Determine odds

It is no secret that bookmakers use a lot of data and apply various statistical techniques to come up with betting odds. The statistical techniques used and the data that bookmakers look at vary from sport to sport, for example, a popular method for modeling soccer uses the Poisson distribution since it can be very accurate but also because it makes it easy to add time decay to the inputs 7. Big data and data accuracy plays such a big part for bookmakers that many companies in the sports betting market have multi million dollar contracts with big leagues like the NFL 8. The NBA also recently extended their contracts with Sportradar and Genius Sports group that will have the rights to distribute official NBA data to licenced sports betting operators in the United States 9. Companies like Sportradar collect and analyze official data and provide services to various bookmakers. The accuracy of data can be very important, a difference of something as little as one yard can make such a big difference; therefore, the industry values the accuracy of data that the leagues itself can provide. Bookmakers employ various mathematicians to analyze historical sports data to come up with odds; however, sports data isn’t the only thing that bookmakers look at when determining how they will make odds for a game. At the end of the day, the gambling industry is a numbers game that thrives on ensuring the probability is in the houses favor. Bookmakers use various techniques involving big data and factors such as public opinion to do so.

Big data is being used more and more in various industries, and in the gambling industry, it isn’t just used to come up with odds. One major way it is used is by gathering data about user demographics 10. When using an online sportsbook, the casino can gather data about a users age, location, gender, excetera which can provide valuable insights that can be used for product development and marketing purposes. By using user demographic data to provide targeted advertising, casinos and bookmakers can attract more betters which will increase revenue. As said by former odds compiler Matthew Trenhaile, “Their [bookmakers] product is entertainment and not the selling of an intellectual contest between punter and bookmaker. It is foolish to think this has ever been the product that bookmakers have sold.They sell an adrenaline rush and anyone who thinks great characters pitting themselves against the punters and taking anyone on in horse racing betting rings is what betting used to be about is kidding himself or herself.” Due to the probability of making money in sports betting, and really every type of gambling, being in the houses favor, online casinos and sportsbooks use big data to increase the number of bets placed by customers; nevertheless, using models to come up with odds is the heart of this industry which makes it the most important way big data analytics is used by bookmakers.

When odds are being made for a sports book, a lot of things are taken into consideration in the process. Bookmakers have teams of statisticians that analyze historical data of the teams in order to come up with game prediction models, often using machine learning based algorithms11. When bookmakers are actually making the odds, these statistical models created from large amounts of data aren’t the only thing they use. When bookmakers are creating odds, their goal isn’t to come up with an accurate game prediction, it is to have the lowest probability of paying out, so they will add a margin in order to statistically ensure a profit regardless of the outcome. Some times public opinion is used by bookmakers to sway their odds. For example, if a team has been on an unexpected winning streak, the bookmakers will often overestimate their odds, even against a team that will statistically do better than them, since people will be more inclined to take that bet resulting in the bookmaker reducing their probability of paying out 12. Furthermore, bookmakers will often “hedge” bets to cover potential losses if an unexpected outcome occurs. For example, if a large amount of people are betting on a team regardless of the odds, the bookmakers will have a large payout if that outcome occurs, so they will start offering more favorable odds on the opposite outcome so they can bring in bets that would cover their losses 11. At the end of the day, when bookmakers set out their odds they will always make sure they are statistically in their favor. Even though bookmakers heavily analyze sports data in order to come up with prediction models, the odds put out don’t exactly reflect the true probability of a game outcome. Game prediction models is used to come up with the most probable event occurring but bookmakers add a margin that is skews the actual probability in order to statistically ensure a profit 13. An example of a coin toss can show how these margins work 13. If one were to bet on a coin toss, there is a 50 percent change of heads winning and a 50 percent change of tails winning so that means neither side is favored and the market of this bet is 100 percent. As a bookmaker is trying to ensure a profit, they will add a margin to the actual game winning probability in order to mitigate risk and ensure that the odds are in their favor. The margins that the bookmakers put on the actual probability is determined by many factors such as public opinion and perception of a team 14. Gamblers are actually able to calculate the margins that the bookmakers put on a bet using a relatively simple formula. This formula varies for the type of sports the gambler is trying to calculate the odds for. In a two way market such as tennis or basketball, a person can figure out the bookmakers margin using the decimal odds places for both sides 13. This formula is 100(1/decimal odds) + 1000(1/other decimal odds). The amount the market percentage is over 100 is the margin the bookmaker has on the better; therefore, the margin percentage the bookmaker has over the gambler can be determined by subtracting 100 from that formula.

## 5. Poisson Model

One of the most popular models for soccer game predictions is the Poisson distribution model. According to former odds compiler Matthew Trenhaile, the Poisson distribution model for soccer prediction can be very accurate and is very useful since it is easy to add time decay to the inputs 7. Refinements can easily be made as a game progresses and goal input changes to easily re calculate the odds, which is useful for bookmakers. The Poisson distribution for soccer game predictions is not only used on a large scale by bookmakers to calculate odds, but it is also often used by even small time bettors to determine how they will bet. A Poisson model for soccer games can even be created on Excel for betters who want to place their bets more accurately. This process works by using historical data of how many goals a team scored and how many goals they let in and comparing it to a leagues average in order to determine the number of goals each team is likely to score in a game. It starts with calculating the average number of goals scored for home games and away games for the whole league and determining a team’s “attack strength” and “defense strength” 15. The attack strength is a team’s average number of goals per game divided by the league average of goals per game. Similarly, a teams defense strength is determined by dividing a teams average number of goals conceded by the leagues average number of goals conceded. The goal expectancy for the home team is calculated by multiplying the team’s attack strength with the away team’s defense strength and the league’s average number of home goals. The goal expectancy for the away team is calculated by multiplying the away teams attack strength with the home teams defense strength and multiplying it by the leagues average number of away goals 15. With this information, one can determine the probability for the range of goal outcomes on both sides using a formula created by the French mathematician Simeon Denis Poisson. The Poisson distribution indicates the probability of a given number of events occurring over a fixed interval, so it can be used to determine the probability of the number of goals scored in a soccer game. The formula for this for soccer prediction is P(x events in interval) = (e-μ) (μx) / x! . This formula determines the probability of the number of goals being scored (x) using Euler’s number (e) and the goal expectancy (μ). With this formula, we can see the probability each team has for scoring a number of goals in a game. Usually the distribution is done for 0-5 goals to see the percentages of each team scoring on the goal interval. This can be used by bookmakers to determine odds and by gamblers to make well educated bets.

## 6. Algorithms and prediction models

Gamblers often use various models, driven by big data, in order to help them place more accurate bets. Many different models have been created in the sports field that use factors such as historical sports data in order to come up with game prediction models. A lot of people who have come up with good prediction models will not share how they work and sometimes offer a subscription service where eager gamblers can pay to receive game picks. When it comes to making the best sports betting algorithms, it isn’t just about the amount of data a person can acquire. Creating algorithms that predict well takes understanding the sport and the meaning behind each type of data. In regards to creating sports prediction algorithms and the data that goes behind it, Micheal Beuoy, an actuary and popular sports data analyst said, “I think it takes discipline combined with a solid understanding of the sport you’re trying to analyze. If you don’t understand the context behind your numbers, no amount of advanced analysis and complicated algorithms is going to help you make sense of them. You need discipline because it is very easy to lock in on a particular theory and only pay attention to the data points that confirm that theory. A good practice is to always set aside a portion of your data before you start analysing. Once you’ve built what you think is your “best” model or theory, you can then test it against this alternative dataset.” Creating sports prediction algorithms requires a lot of different types of analysis and the methods that yield the best results are always changing.

Creating good models requires understanding the sport well and using specific types of data in the algorithms. A creator of an NBA game prediction algorithm who runs a website called Fast Break Bets, which sells a game prediction service, primarily uses NBA game statistics known as efficiency metrics to come up with his model 16. As the creator of the algorithm is profiting off eager gamblers, the exacts of how it works have not been released but the creator explains the type of data he uses to make his algorithm work. The NBA is a game of efficiency since there is a shot clock and possessions are changed very quickly, so the score total of games can greatly vary by how fast or slow paced a team is. The creator of this algorithm uses an offensive and defensive rating that measures how many points a team scores and allows per 100 possessions, since 100 possessions is close to the NBA average of possessions per game 16. The algorithm also uses effective field goal percentage, turnover rate, and rebounding rate with offensive and defensive rating to optimize the efficiency metrics. Another major factor that this creator uses in the algorithm is the NBA season schedule and how often a team plays games in a time span 16. This is due to the fact that players get fatigued playing games close to each other and coaches will therefore limit the amount of time some of the players will be on the court in order to give them a rest. This is important since player statistics can greatly vary from person to person on a team. Using this information of efficiency metrics, player performance, and the frequency of games played, the creator was able to create a prediction model that works well enough for people to pay for his game picks.

In research done by Manuel Silvero, he studied 5 famous algorithms that used Neural Network and Machine learning and concluded that their accuracy varies from 50-70 percent, depending on the sport 17. Purucker in 1996 was one of the first computational sports prediction model and used an Artificial Neural Network with backward propagation 18. It was 61 percent accurate. In 2003, Khan expanded Puruckers work and was more acurate. Data on 208 matches was collected and the elements used were total yardage differential, rushing yardage differential, turnover differential, away team indicator and home team indicator 18. The first 192 matches of the season were used as the training data set for the model. When tested on the remaining games of the season, the models predicted at a 75 percent accuracy. This was compared to predictions created by 8 ESPN sportscasters for the same games and they only predicted 63 percent of those matches correctly.

One of the most accurate models created, in terms of receiving a good return on a bet, was created by Lisandro Kaunitz of the University of Tokyo, and relied on data from odds that bookmakers put out rather than historical game data 12. When it comes to statistical models for sports, big data from historical sports games are often analyzed in order to come up with game predictions and to gain insight on things like team, player, and position performance. In the market of sports betting, these models are used to come up with odds and also by bettors to place bets. Gamblers have came up with different game prediction models in order to beat the books, mainly comprising of historical sports data while sometimes also using historical betting data 12. Kaunitz created a model that mainly focused on analysing data of the odds created by bookmakers, rather than sports team data, to predict good bets to place. The basis of his model relied on a technique bookmakers use to reduce their payout risk, known as hedging. This concept and the way bookmakers use it to reduce their risk of payout is covered in section 3. Kaunitz betting model worked by gathering the odds for a game created by various bookmakers and determining the average odds available. Using statistical analysis of odds offered, Kaunitz was able to determine any outliers from the average odds for a game 12. Using these outliers, Kaunitz could determine if a bet would favor them or not. After various simulations and models, Kaunitz’s and his team took their strategy into the real world, and their bets payed out 47.2 percent of the time. They received an 8.5 percent return and profited $957.50 in 265 bets 12. Due to their impressive returns, bookmakers caught on and started to limit the amount that they could bet ## 7. Conclusion Overall, big data plays a very important role in the sports betting industry and it is used by various stakeholders. Bookmakers use it to come up with odds and gamblers use it for a competitive advantage. Although data analysis is very important on both ends, this research shows that it is very hard to receive a consistent return as a gambler. From 1989 to 2000 for NFl betting, the bookmakers favorite won 66.7 percent of the time and from 2001 and 2012, the bookmakers favorite won 66.9 percent of the time 19. Even though technology has advanced and people use the most sophisticated algorithms to come up with prediction models, the bookmaker seems to have the advantage. This is due to the fact that bookmakers spend tons of money gathering the most accurate data and employ some of the best statisticians and sports analysing firms, but also due to the way they hedge bets and use public opinion to modify odds in order prevent potential losses. Bookmakers adjust for a margin when they are compiling their odds, because just like everything in the gambling industry, the probability is set up so that the house will always win in the long run. Gamblers have created various algorithms in order to make the most educated sports bet. These use historical team data but sometimes also use data from betting odds. Some of the best betting algorithms work by analyzing bookmakers' odds and determining where the odds are significantly different from the expected outcome of the game. As seen with Lisandro Kaunitz from the University of Tokyo, when bookmakers see that gamblers are beating the system they can start to limit a person’s bets. Overall, big data plays a huge role in the sports gambling industry. Even though what happens on the field or the court is often based on chance, there are significant trends that can be seen when statistically analyzing sports data. At the end of the day, big data plays a big role in this industry for bookmakers and gamblers alike. ## 8. References 1. INSIGHT: Sports Betting in States Races on a Year After SCOTUS Overturns Ban," Bloomberg Law, 04-Jun-2019. [Online]. Available: https://news.bloomberglaw.com/us-law-week/insight-sports-betting-in-states-races-on-a-year-after-scotus-overturns-ban↩︎ 2. Research and Markets, “Global Sports Betting Market Worth$85 Billion in 2019 - Industry Assessment and Forecasts Throughout 2020-2025,” GlobeNewswire News Room, 31-Aug-2020. [Online]. Available: https://www.globenewswire.com/news-release/2020/08/31/2086041/0/en/Global-Sports-Betting-Market-Worth-85-Billion-in-2019-Industry-Assessment-and-Forecasts-Throughout-2020-2025.html↩︎

3. R. Delgado, “How Big Data is Changing the Gambling World: Articles: Chief Data Officer,” Articles | Chief Data Officer | Innovation Enterprise, 01-Sep-2016. [Online]. Available: https://channels.theinnovationenterprise.com/articles/how-big-data-is-changing-the-gambling-world. [Accessed: 29-Nov-2020]. ↩︎

4. J. Zalcman, “HOW TO CREATE A SPORTS BETTING ALGORITHM,” Oddsfactory, 16-Nov-2020. [Online]. Available: https://theoddsfactory.com/how-to-create-a-sports-betting-algorithm/. [Accessed: 2020]. ↩︎

5. J. Bailey, “Applying Data Science to Sports Betting,” Medium, 18-Sep-2018. [Online]. Available: https://medium.com/@jxbailey23/applying-data-science-to-sports-betting-1856ac0b2cab. [Accessed: 2020]. ↩︎

6. J. Bailey, “Jordan-Bailey/DSI_Capstone_Project,” DSI_Capstone_Project, 14-Sep-2018. [Online]. Available: https://github.com/Jordan-Bailey/DSI_Capstone_Project/blob/master/Technical_Report.md. [Accessed: 01-Dec-2020]. ↩︎

7. M. Trenhaile, “How Bookmakers Create their Odds, from a Former Odds Compiler,” Medium, 29-Jun-2017. [Online]. Available: https://medium.com/@TrademateSports/how-bookmakers-create-their-odds-from-a-former-odds-compiler-5b36b4937439. [Accessed: Nov-2020]. ↩︎

8. K. J. Brooks, “The new game in town for pro sports leagues: Selling stats,” CBS News, 09-Jan-2020. [Online]. Available: https://www.cbsnews.com/news/nba-nfl-sports-nascar-leagues-selling-stats-to-gambling-companies/. [Accessed: 2020]. ↩︎

9. C. Murphy, “NBA extends data partnerships with Sportradar and Genius Sports Group,” SBC Americas, 29-Oct-2020. [Online]. Available: https://sbcamericas.com/2020/10/29/nba-extends-data-partnerships-with-sportradar-and-genius-sports-group/. [Accessed: 2020]. ↩︎

10. “How Big Data Analytics Are Transforming the Global Gambling Industry,” Analytics Insight, 17-Jan-2020. [Online]. Available: https://www.analyticsinsight.net/how-big-data-analytics-are-transforming-the-global-gambling-industry/. [Accessed: Oct-2020]. ↩︎

11. arXiv, “The Secret Betting Strategy That Beats Online Bookmakers,” MIT Technology Review, 19-Oct-2017. [Online]. Available: https://www.technologyreview.com/2017/10/19/67760/the-secret-betting-strategy-that-beats-online-bookmakers/. [Accessed: 2020]. ↩︎

12. A. Dörr, “How to apply predictive analytics to Premiership football to beat the bookies,” Dataconomy, 19-Mar-2019. [Online]. Available: https://dataconomy.com/2019/03/how-to-apply-predictive-analytics-to-premiership-football-to-beat-the-bookies%EF%BB%BF/. [Accessed: 2020]. ↩︎

13. S. Hubbard, “Betting Margins Explained: How to Calculate Sports Margins,” BettingLounge, 24-Sep-2020. [Online]. Available: https://bettinglounge.co.uk/guides/sports-betting-explained/betting-margins/. [Accessed: 2020]. ↩︎

14. “Sportsbook Profit Margins,” Sports Insights, 18-Sep-2015. [Online]. Available: https://www.sportsinsights.com/betting-tools/sportsbook-profit-margins/. [Accessed: 2020]. ↩︎

15. B. Cronin, “Poisson Distribution: Predict the score in soccer betting,” Pinnacle, 27-Apr-2017. [Online]. Available: https://www.pinnacle.com/en/betting-articles/Soccer/how-to-calculate-poisson-distribution/MD62MLXUMKMXZ6A8. [Accessed: 2020]. ↩︎

16. Stephen, “NBA Betting Model Explained: Sports Betting Picks, Tips, and Blog,” FAST BREAK BETS, 11-Nov-2017. [Online]. Available: https://www.fastbreakbets.com/nba-picks/nba-betting-model-explained/. [Accessed: 05-Dec-2020]. ↩︎

17. M. Silverio, “My findings on using machine learning for sports betting: Do bookmakers always win?,” Medium, 26-Aug-2020. [Online]. Available: https://towardsdatascience.com/my-findings-on-using-machine-learning-for-sports-betting-do-bookmakers-always-win-6bc8684baa8c. [Accessed: 2020]. ↩︎

18. R. P. Bunker and F. Thabtah, “A machine learning framework for sport result prediction,” Applied Computing and Informatics, 19-Sep-2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2210832717301485. [Accessed: 2020]. ↩︎

19. M. Beouy, “BGO - The Casino of the Future,” bgo Online Casino. [Online]. Available: <https://www.bgo.com/casino-of-the-future/the-future-of-sports-betting/. [Accessed: 05-Dec-2020]>. ↩︎