Big Data Applications in the Gaming Industry

Gaming is one of the fastest growing aspects of the modern entertainment industry. It’s a rapidly evolving market, where trends can change in a near instant, meaning that companies need to be ready for near anything when making decisions that may impact development times, targets and milestones. Companies need to be able to see market trends as they happen, not post factum, which frequently means predicting things based off of freshly incoming data. Big data is also used for development of the games themselves, allowing for new experiences and capabilities. It’s a relatively new use for big data, but as AI capabilities in games are developed further this is becoming a very important method of providing more immersive experiences. Last use case that will be talked about, is monetization in games, as big data has also found a use there as well.

Check Report Status Status: final, Type: Report

Aleksandr Linde, fa20-523-340, Edit


Gaming is one of the fastest growing aspects of the modern entertainment industry. It’s a rapidly evolving market, where trends can change in a near instant, meaning that companies need to be ready for near anything when making decisions that may impact development times, targets and milestones. Companies need to be able to see market trends as they happen, not post factum, which frequently means predicting things based off of freshly incoming data. Big data is also used for development of the games themselves, allowing for new experiences and capabilities. It’s a relatively new use for big data, but as AI capabilities in games are developed further this is becoming a very important method of providing more immersive experiences. Last use case that will be talked about, is monetization in games, as big data has also found a use there as well.


Keywords: gaming, big data, product development, computer science, technology, microtransactions, artifician intelligence

1. Introduction

The video game market is one of the fastest growing aspects of the modern entertainment industry, and in 2020 brought in 92 billion USD worldwide out of a total worldwide entertainment market value income of 199.64 billion USD. With global player count reaching 2.7 billion users, more and more people choose to spend some of their leisure time behind a controller or a keyboard 12. This phenomenon isn’t exactly a new thing. Originally getting its start in 1972 with the release of the Magnavox Odyssey, the first home console with replaceable cartridges. The scale of this achievement was hardly recognized, as back then if you wanted to play something, the only option was arcades. Arcades were a social experience but being able to play the exact same titles, if slightly downgraded, at home was a breakthrough. These first gen consoles were quite clunky and by modern standards unimpressive, yet they were the vital first step for birthing what we now know today as the video game market. At that point games had been a thing for a around a decade, but they were primarily limited to a group of computer hobbyists, who would exchange copies of homebrew software amongst themselves. In this format it would be impossible to get any sort of mainstream popularity. With time, home consoles had changed this dramatically. Games become a mainstream phenomenon, which means that suddenly the potential player pool is a lot larger, and as a result we see an explosion in the popularity of gaming.

1.1 Market Expansion & Segmentation

Over the decades, this market has grown into a massive global phenomenon, becoming one of the primary forms of media alongside film, music, and art. Thanks to all of this we have seen 3 distinct market segments emerge.

  1. Personal Computers – (Laptops and desktops)
  2. Consoles – (Nintendo Switch, 3DS, Xbox, Playstation)
  3. Mobile Phones – (Anything with the Google Play store and IOS Appstore)

Each segment has some interesting specifics. Mobile games account for 33% of all app downloads, 74% of all mobile consumer spending and 10% of all raw time spent in apps. By the end of 2019 the amount of people who played mobile games topped 2.4 billion 3. An important reason for this is accessibility, since mobile phones as of today, are the most commonly bought tech item in the world. In many developing nations, mobile phones are a commonplace piece of tech that is owned by the majority of adults 4.As the global population increases in wealth and size, this trend is only set to increase. Meaning that any company that ignores the mobile phone market is loosing out of massive sums of money. The same is true, but for a lesser scale, in personal computers, meaning that as the population grows, the PC market will expand as well. This is less true for consoles but machines from the last console generation sold a combined 221 million units, with a yearly revenue of 48.7 Billion in 2019 5. Far cry from what mobile games make, but still rather significant, spelling good fortunes for the health of the industry in the coming years. As a result of all this, it is only natural that companies will focus more and more of their attention on the developing world for expansion This is where big data can help significantly,

2. Big data in mobile gaming spaces.

A big reason for the integration of big data analytics into mobile games comes from the intelligence edge that it provides you in a competitive market space. Being able to track Key Performance Indicators, or KPI’s for short allows you to rapidly shift strategies to fix growing problems as soon as you see them. For example, customer retention is one of, if not the most important metric in any gaming product, but or mobile gaming this is especially critical since gaming attention spans are getting shorter, and in a market where attention spans are already low from the get-go, hemorrhaging customers can spell doom for any app 6. Thus, an important big data application is the CLTV – customer lifetime analysis, which tracks customer retention and average return per customer on a specific platform. Machine learning in this case factors in a lot of different variables that effect just how long the user will end up using your app before putting it down to go do something else. Reducing the churn, or the rate at which you intake and expel customers is thus a key business consideration when bringing a new gaming product to market 7. This is more important for mobile game customers since the ROI per user on mobile platforms is generally lower, thus maximizing user count rapidly becomes an extremely important tactic to get a self-sustaining userbase that can bring in stable profits 8. But how are games monetized in the first place, and how does big data play into it?

2.1 Mobile game monetizaton

Most mobile games operate on the freemium model, where the base game is free, but you pay for things such as boosts, upgrades and cosmetics 9. Systems such as this still allow you to earn most things in the game normally, yet make it prohibitively difficult to do so, requiring a large time investment. What happens often in this case is that players will spend money to save time and effort that would have normally gone into grinding out these items for free [9]. So why is this effective? Because the initial entry barrier is quite literally nonexistent and the main monetization fees don’t actually cost that much at first, making up mostly 5-10$ purchases10. This doesn’t seem like much, but this eventually reaches into the sunk cost fallacy where users get so ingrained in an ecosystem and simply don’t want to leave. Mobile monetization platform Tapjoy recently used big data analysis to identify 5 different categories of mobile users 11. The one that interests us the most is the whales. Whales are called as such due to how despite making up only 10% of a game’s population, they will usually be responsible for 70% of the cash flow from games. Many developers work specifically to design systems that aren’t exactly fun but work more to trap players like this within the ecosystem. While we may not agree with this rather predatory practice, its another big data application to be cognizant of. It works off the same principle that gambling does, via enticing potential rewards that don’t actually have publicly avaliable information about your actual chance to win. When the positive effect happens, it’s usually minor, but still works like a Skinner box trigger where the user gets the positive feedback that further keeps them in the churn loop 12. Is it scummy? Many players think so but using big data in this case allows us to hyper target these users. Big data analytics generate user reports that show directly what users are most susceptible, how to reach them and most importantly how to stick them into the endless loop where they keep dumping more and more money into a game that many don’t really even love anymore 13 . For companies this is great, since it earns you a customer who is guaranteed to stay and dump money in a marketplace where getting the average user to pay even 5$ for anything is already a feat that many cant handle. Using big data lets developers and companies be 10 steps ahead of potential users. By the time they realize they are addicted its far too late. Similar systems are making their way into our desktop and console spaces as well.

2.2 Profits from Monetization.

In fact, microtransactions that function this way can now be found on every platform and genre, all due to how insanely profitable it is. This year, gaming industry valuations rose 30% because of the absurd amounts of money that get pulled in via microtransactions 14. All of this, possible only due to the massively increasing use of big data analytical tools. Why make a good game, when you can hyper tailor the monetization so that players are guaranteed to stop caring about how good your game actually is when they get sufficiently sucked in enough. This trend is only increasing since its predicted that by 2023, 99% of all game downloads will be free to play with various forms of microtransaction based monetization 14. However, its not all doom and gloom. Big data can also be used for some other really interesting applications when it comes to developing the game itself, and not just the predatory monetization methodology.

3. Big Data and AI Development on In-Game AI Systems

Last summer, the world of online poker had quite the shock when a machine learning algorithm beat 4 seasoned poker vets 15. But poker is actually a fairly simple game, so why is this important? It’s a big deal due to how most games feature AI in one form or another. When one plays strategy games, they can usually immediately tell if they are playing versus an AI or vs an actual player due to how current AI tech still isn’t nearly as good as a regular player. However, in simple games like poker, you can make systems that almost perfectly copy human behavior. All that is required is that they are trained on the users of the game, and then become nearly indistinguishable from a regular player. This opens up massive new possibilities, because soon we will be able to mimic whole players so that even games that are functionally dead due to lower player counts can still have users enjoy content that was made for multiplayer and such. Plus, just having smarter AI for non-player characters and enemies would be a nice touch. Currently most games that have AI opponents function on a system that is called a Finite State Machine 16. Systems like this have a strict instruction set that they can’t really deviate from, nor make new strategies.This causes everythign to feel scripted and dumb, yet this is also a fairly lightweight method of ai control. FSM can be refined into what is called a Monte Carlo Search Tree (MCST) algorithm, where computers will on their own, make decision trees based on the reward value of the tree endpoint. MCST’s can become massive, so in order to cut down on the sheer amount of processing power that is needed, developers will make the AI randomly select a few paths, which will then in turn be implemented as actions. This cut down version adds the randomness that players expect from other players, but also removes a lot of rigidity of traditional AI systems 16. Using machine learning models on real player actions allows MCST models to be really close to how an actual player acts in specific situations. However, training machine learning models on players also has another, really interesting purpose – dual AI systems.

3.1. AI implementation

Some games, feature AI that is divided into two separate algorithms, the director, and the controller. Director AI, has only one objective, make the game experience as enjoyable as possible. It is a macro level passive AI that bases game triggers and events off of player action. For example, causing random noises when it detects player stress via their control inputs17. This means that the system can detect whether it needs to up the ante on what is actually happening in game, introduce new enemies, change environmental effects etc. This molds the experience into a completely unique system that learns from every player that has ever played the game. All of this data goes into crafting emerging experiences that can’t be replicated via a rigid AI system. But this is only one piece of the puzzle as the other component of this system is the controller AI. Controller AI is the sidekick that the director AI uses to help immerse the player. We will explain how it works in detail, but what is important is that the controller AI is ultimately subordinate to the director. In a normal AI system, the algorithm knows everything, can see everything and will pretend it has no idea about what the player is doing, yet is ultimately aware of their actions. This system is the easiest to make, but also can seem to players like the AI is cheating or being exceedingly stupid at times. Let’s use the example of an alien hunting down a player. Normally, the AI would just head to the players location and just see them once they enter detection range. What two tier systems do is limit the information flow to the controller AI from the director AI. The director sees all, but it limits what the controller can visualize. Instead of saying, Go to area, find player in location, attack player, what the controller gets as input is – Go to area and look for player. The director ai can set the area as anything. But what matters is that the end alien cannot actually see everything that’s happening. What it does is learn from the player in previous encounters. Early on in the game, it might just do a cursory pass around an area and leave, while later on, it might have found the player in a specific location and thus will now check for them in places like it. The alien learns as it goes, and the beauty of this system is that if you select a higher difficulty, then the game can just draw upon cloud data to fill in that early learning steps, dropping in smarter, more intelligent enemies much earlier in the game 1617. This is only possible via big data analysis. Any other solution would mean either massive amounts of very rigid code, or very blatant information cheating. Best example of this is Alien: Isolation a 2014 title by Creative Assembly, that impressed the gaming world with just how scary the AI implementation was. In our opinion this is the best example of a two tier system so far. Systems like this are only possible with big data implementation and the proper ML algorithms. As games advance to more and more realistic worlds, this approach, in our opinion is going to become the norm, since it allows for really flexible systems that are engaging to play around with. However, this isn’t the only way developers can make systems fairer and fun for the end user. Another area where big data is gaining lots of traction is level design.

4. Level Design and Balance, An Unlikely Big Data Application

When crafting virtual worlds its always difficult to strike a balance between breadth and scope. A common approach is to hand craft levels in games that do not require that much breadth. This approach has great benefits in terms of the attention to detail and depth of narrative, yet crafting each level by hand takes a while. Big data in this case helps with testing, since previously playtesting had players run the map time after time. Now, you can run hundreds of models in addition to the players, compare their paths, and find the best solution for what would increase player enjoyment. However, the main drawback of this approach is its great cost for the developer. Much moreso than just using a script to autogenerate terrain. Some developers will focus entirely on auto-generation as a means of level design instead, which can provide near infinite possibilities for content (see Minecraft), but has the danger of being exceedingly bland to the end user. Big data analytics are a way of alleviating this via looking at what terrain players prefer more, and thus adjusting level generation parameters accordingly. Thus, removing some of the randomness that such worlds depend on. However, this isn’t the main use of such analytics. A big reason why developers collect massive amounts of data about levels is actually balance issues. This is especially true for Esports titles that dominate the current games markets. Titles where minute advantages in map design get exploited to their fullest extent 18. Using data collected from tens of thousands of matches, we can see what paths are most taken, what are optimal firing angles (most esports titles are shooters) and any detail that can give players a leg up over the competition. This allows maps to be fine tuned to produce the most memorable player experiences, feel hand crafted and also be as balanced as possible.

4.1. Balance

Balance isn’t just a competitive thing however, multiplayer games live and die due to balance, as unbalanced gameplay drives away players, leaving only people who are ok with it, which in turn drives away new players 19. Ultimately this increases player churn, and you end up with a dead game 20. Normally you would just use player feedback, but if you have mountains of raw data, ML and big data can also allow you to really fine tune specific aspects of balance. This isn’t only limited to maps, but also skills, abilities and puzzles. Balance has always been a really fine line between enjoyment and fairness yet it’s the developer’s job to ensure that no one is left out in the cold on purpose. Some genre’s are more balance intensive and thus require more data to make things fair even when taking into randomness into action.

A great example are MOBAs, Multiplayer Online Battle Arenas, where being down even 1 player can cause a team to get crushed in a 4v5 matchup. A single person leaving completely changes the dynamic of the game, and without big data its hard to compensate for events like that, since they are at their core, massively unbalanced. But being able to account for literally any situation, allows developers to craft systems that can handle disbalance better in these cases 21. Anything that includes player vs player is very difficult to balance for as well. A good use for big data in this case would be truly fair matchmaking. Currently most systems of this sort use only things like win rate and total playtime, in order to pair up players. However, if we start watching for minute flags on how well players actually play, we can make systems that are specifically made to balance players as much as possible by looking at the minute details of how they actually play.

5. What the future holds

One thing we can definitely count on is the sheer amount of data that is going to be generated from now on by gaming. With always online experiences being the norm, machines send data reports back to the main system which matches them with other players in the same instance to allow playing in the same world. As more and more players enter the community, we will run into the issue of where an absurdly massive amount of data is being generated, and not all of it is positive. Its going to get harder to get the full picture since now looking over all the data would make the task of sifting through it extremely difficult. What is most exciting is the developments of self-learning AI based on the mountains of this newfound data. Currently unless you use two tier AI systems, there isn’t a way to make AI believable, and having bad AI can potentially ruin a game for most players. With more advancements in reliable AI systems based on potent data analysis, there exists the potential for AI that is near indistinguishable from players 22.

6. Conclusion

Gaming has gone from a very niche and simple pastime to one of the biggest entertainment markets in the world. From mobile phones, to supercharged desktop setups, the field is expansive to the point where there is something for anyone who wants to have blow a couple of hours behind a screen. The speed at which the field is developing is truly astonishing, and a good portion of it is being driven by the previously discussed Big Data use cases. By being able to use all this data for developmental purposes, game companies and publishers can craft truly memorable and interesting experiences.It has been demonstrated that big data analytics is having some very profound effects on the video game markets from pretty much every side. The end goal of this developemnt is systems that can predict nearly any game situation and adjuist parameters accordingly to maximise player enjoyment. It means, more advanced AI systems that feel as lifelike as humanly possible that can populate virtual worlds. It means systems that can properly react to player inputs in order to create dynamic worlds that aren’t just a static canvas that a player is thrust into upon booting up the game. This may seem far off, especially with how a lot of development teams dont exactly use Big Data in the right way, but the best examples are getting quite close in its implementation.


I would like to thank Dr Gregor von Laszewski for helping me despite me taking a long time with quite a few assignments. I want to thank the AI team for their massive help with any issues that arose in during the class period, as well as the excellent avaliablity of their help whenever it was needed. Would like to thank Dr. Geoffery Fox for his very informative and thorough class.

7. Refernces

  1. The Average Gamer: How the Demographics Have Shifted. (n.d.). Retrieved December 08, 2020, from ↩︎

  2. Nakamura, Y. (2019, January 23). Peak Video Game? Retrieved December 08, 2020, from ↩︎

  3. Kaplan, O. (2019, August 22). Mobile gaming is a $68.5 billion global business, and investors are buying in. Retrieved December 08, 2020, from ↩︎

  4. Silver, L., Smith, A., Johnson, C., Jiang, J., Anderson, M., & Rainie, L. (2020, August 25). 1. Use of smartphones and social media is common across most emerging economies. Retrieved December 08, 2020, from ↩︎

  5. Ali, A. (2020, November 10). The State of the Multi-Billion Dollar Console Gaming Market. Retrieved December 08, 2020, from ↩︎

  6. Filippo, A. (2019, December 17). Our attention spans are changing, and so must game design. Retrieved December 08, 2020, from ↩︎

  7. Addepto. (2019, March 07). Benefits of Big Data Analytics in the Mobile Gaming Industry. Retrieved December 08, 2020, from ↩︎

  8. Rands, D., & Rands, K. (2018, January 26). How big data is disrupting the gaming industry. Retrieved December 08, 2020, from ↩︎

  9. Matrofailo, I. (2015, December 21). Retention and LTV as Core Metrics to Measure Mobile Game Performance. Retrieved December 08, 2020, from ↩︎

  10. Batt, S. (2018, October 04). What Is a “Whale” In Mobile Gaming? Retrieved December 08, 2020, from ↩︎

  11. Shaul, B. (2016, March 01). Infographic: ‘Whales’ Account for 70% of In-App Purchase Revenue. Retrieved December 08, 2020, from ↩︎

  12. Perez, D. (2012, January 13). Skinner’s Box and Video Games: How to Create Addictive Games - LevelSkip - Video Games. Retrieved December 08, 2020, from ↩︎

  13. Muench Frederickm (2014, March 18), The New Skinner Box: We and Mobile Analytics, December 7th 2020, ↩︎

  14. Gardner, M. (2020, September 19). Report: Gaming Industry Value To Rise 30%–With Thanks To Microtransactions. Retrieved December 08, 2020, from ↩︎

  15. Gardner, M. (2020, June 11). What’s The Future Of Gaming? Industry Professors Tell Us What To Expect. Retrieved December 08, 2020, from ↩︎

  16. Maass, L. (2019, July 01). Artificial Intelligence in Video Games. Retrieved December 08, 2020, from ↩︎

  17. Burford, G. (2016, April 26). Alien Isolation’s Artificial Intelligence Was Good…Too Good. Retrieved December 08, 2020, from ↩︎

  18. Ozyazgan, E. (2019, December 14). The Data Science Boom in Esports. Retrieved December 08, 2020, from ↩︎

  19. Cormack, L. (2018, June 29). Balancing game data with player data - DR Studios/505 Games. Retrieved December 08, 2020, from ↩︎

  20. Sergeev, A. (2019, July 15). Analytics of Map Design: Use Big Data to Build Levels. Retrieved December 08, 2020, from ↩︎

  21. Site Admin. (2017, March 18). Retrieved December 08, 2020, from ↩︎

  22. Is AI in Video Games the Future of Gaming? (2020, November 21). Retrieved December 08, 2020, from ↩︎