Big Data 2020

This course introduces the students to Cloud Big Data Applications. The notes are prepared for the course taught in 2020.


Class Material

As part of this class, we will be using a variety of sources. To simplify the presentation, we provide them as smaller packages of material, including books, lecture notes, slides, presentations, and code.

Note: We will regularly update the course material, so please always download the newest version. Some browsers cache previously visited pages, so make sure to refresh the page.

We will use the following material:

Course Lectures and Management

Course Lectures. These meeting notes are updated weekly (Web)

Lectures on Particular Topics

Introduction to AI-Driven Digital Transformation

Introduction to AI-Driven Digital Transformation (Web)

Big Data Usecases Survey

This module covers 51 use cases of big data that emerged from a NIST (National Institute of Standards and Technology) study of big data. We cover the NIST Big Data Public Working Group (NBD-PWG) process and summarize the work of its five subgroups: the Definitions and Taxonomies Subgroup, the Reference Architecture Subgroup, the Security and Privacy Subgroup, the Technology Roadmap Subgroup, and the Requirements and Use Case Subgroup. The 51 use cases collected in this process are briefly discussed, together with a classification of their source of parallelism and their high- and low-level computational structure. We describe the key features of this classification.

Introduction to Google Colab

A Gentle Introduction to Google Colab (Web)
A Gentle Introduction to Python on Google Colab (Web)
MNIST Classification on Google Colab (Web)



Physics

Big Data Applications and Analytics: Discovery of Higgs Boson, Part I (Unit 8). Summary of Units 9-11: This section starts by describing the LHC accelerator at CERN and the evidence found by the experiments suggesting the existence of a Higgs Boson. A discussion of the huge number of authors on a paper and remarks on histograms and Feynman diagrams are followed by an accelerator picture gallery. The next unit is devoted to Python experiments that look at histograms of Higgs Boson production with various signal shapes, various backgrounds, and various event totals. Then random variables and some simple principles of statistics are introduced, with an explanation of why they are relevant to physics counting experiments. The unit introduces Gaussian (normal) distributions and explains why they are seen so often in natural phenomena. Several Python illustrations are given. Random numbers, with their generators and seeds, lead to a discussion of Binomial and Poisson distributions, Monte-Carlo methods, and accept-reject methods. The Central Limit Theorem concludes the discussion.
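The accept-reject (Monte-Carlo) method, the role of seeds, and histogramming mentioned above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not code taken from the course units. The function names, the Gaussian target, the range [-4, 4], the bin count, and the seed values are all assumptions chosen for the example.

```python
import math
import random


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal (Gaussian) probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def accept_reject(pdf, lo: float, hi: float, pdf_max: float,
                  n: int, seed: int) -> list[float]:
    """Draw n samples from pdf restricted to [lo, hi] by accept-reject.

    A candidate x is drawn uniformly from [lo, hi] and a height y uniformly
    from [0, pdf_max]; x is kept only if the point (x, y) lies under the
    curve. pdf_max must be an upper bound of pdf on [lo, hi].
    """
    rng = random.Random(seed)  # fixed seed -> the same sequence on every run
    samples = []
    while len(samples) < n:
        x = rng.uniform(lo, hi)
        y = rng.uniform(0.0, pdf_max)
        if y < pdf(x):
            samples.append(x)
    return samples


def histogram(values, lo: float, hi: float, bins: int) -> list[int]:
    """Count values into equal-width bins covering [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding just below hi
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    samples = accept_reject(gaussian_pdf, -4.0, 4.0, gaussian_pdf(0.0),
                            n=10_000, seed=42)
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    print(f"mean={mean:.3f} std={std:.3f}")  # close to 0 and 1
    print(histogram(samples, -4.0, 4.0, 8))
```

Running it twice with the same seed reproduces the same samples and histogram; changing the seed gives a different but statistically equivalent run. The price of accept-reject is wasted draws: the fraction of candidates kept equals the area under the curve divided by the area of the bounding box.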


Sports

Sports is seeing significant growth in analytics, with pervasive statistics shifting to more sophisticated measures. We start with baseball, as the game is built around segments dominated by individuals, and detailed (video/image) achievement measures including PITCHf/x and FIELDf/x are moving the field into the big data arena. There are interesting relationships between the economics of sports and big data analytics. We look at wearables and consumer sports/recreation. The importance of spatial visualization is discussed. We look at other sports: Soccer, Olympics, NFL Football, Basketball, Tennis, and Horse Racing.

Health and Medicine

The health and medicine sector has become a more needed service than ever. With the rise of COVID-19, resource usage, monitoring, research on antivirals, and many other challenging tasks fell on the shoulders of scientists. To face such challenges, AI can become a worthy partner in solving some of these problems efficiently and effectively.

AI in Banking

AI in banking has become a vital component in providing the best services to people. AI helps secure bank transactions, provides suggestions, and offers many other services to clients. Legacy banking systems are also being reinforced with novel AI techniques to migrate business models to new technology.

Transportation Systems

Transportation systems are a vital component of human life. With the dawn of AI, transportation systems are also being reinforced to provide better service for people. Terabytes of data collected in day-to-day transportation activities are analyzed to identify issues and provide a better experience for the user.

Space and Energy

Energy is a term we encounter in everyday life. Conserving energy and using it smartly are vital in managing energy demands, and the role played by AI here has become significant in recent years. Industry leaders such as Bill Gates have made many efforts to provide better solutions for efficient energy consumption. Apart from that, space exploration is also being reinforced with AI. Better communication, remote sensing, and data analysis have become key components in meeting the challenge of unraveling the mysteries of the universe.

Mobility (Industry)

Mobility is a key part of everyday life. From personal cars to space-exploring rockets, there are many areas that can be enhanced by AI. Autonomous vehicles and sensing features provide safety and efficiency. Many car manufacturers have already turned to AI to power their vehicles and provide new features for drivers.

Cloud Computing

Cloud computing is a major component of today's service infrastructures. Artificial intelligence, microservices, storage, virtualization, and parallel computing are some of the key aspects of cloud computing.


Commerce

Commerce is a field that is being reinforced with AI and related technologies to provide better service to clients. Amazon is one of the leading companies in e-commerce. Recommendation engines play a major role in e-commerce.

Complementary Material

  • When working with books, ePubs typically display better than PDFs. For ePub, we recommend using iBooks on macOS and calibre on all other systems.


Piazza

Piazza. The link to the class Piazza for everyone participating in the IU class.

Scientific Writing with Markdown

Scientific Writing with Markdown (ePub) (PDF)

Git Pull Request

Git Pull Request. Here you will learn how to do a simple git pull request, either via the GitHub GUI or the git command-line tools.

Introduction to Linux

This course does not require you to do much Linux. However, if you do need it, we recommend the following as a starting point.

The most elementary Linux features can be learned in 12 hours. This includes bash, an editor, the directory structure, and managing files. Under Windows, we recommend using Git Bash, a terminal with all the commands built in that you would need for elementary work.

Introduction to Linux (ePub) (PDF)

Older Course Material

Older versions of the material are available at

Lecture Notes 2020 (ePub) (PDF)
Big Data Applications (Nov. 2019) (ePub) (PDF)
Big Data Applications (2018) (ePub) (PDF)


You can contribute to the material with useful links and sections that you find. Just make sure that you do not plagiarize when making contributions. Please review our guide on plagiarism.

Computer Needs

This course does not require a sophisticated computer. Most of the work can be done remotely. Even a Raspberry Pi with 4 or 8 GB of memory could be used as a terminal to log into remote computers. This will cost you between $50 and $100, depending on the version and equipment. However, we will not teach you how to use or set up a Pi or any other computer in this class. This is for you to do and find out.

In case you need to buy a new computer for school, make sure the computer is upgradable to 16 GB of main memory. We no longer recommend HDDs; use SSDs instead. Buy fast ones, as not every SSD is the same. Samsung is offering some under the EVO Pro branding. Get as much memory as you can afford. Also, make sure you back up your work regularly, either to online storage such as Google Drive or to an external drive.

Last modified June 16, 2021 : merge content (ecf6e8f7)