Big Data Use Cases Survey

4 Lectures on Big Data Use Cases Survey

This unit has four lectures (slide decks). The survey is 6 years old, but its illustrative scope of Big Data applications is still valid and there is no better alternative. The problems and the use of clouds have not changed. There have been algorithmic advances (deep learning) in some cases. The lectures are

    1. Overview of NIST Process
    2. The 51 Use cases divided into groups
    3. Common features of the 51 Use Cases
    4. 10 Patterns of data – computer – user interaction seen in Big Data Applications

There is an overview of these lectures below. The use case overview slides recorded here are available as Google Slides.

Lecture set 1. Overview of NIST Big Data Public Working Group (NBD-PWG) Process and Results

This is the first of four lectures on Big Data Use Cases. It describes the process by which NIST produced this survey.

Presentation or Google Slides

Use Case 1-1 Introduction to NIST Big Data Public Working Group

The focus of the NBD-PWG is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor-neutral, technology- and infrastructure-agnostic deliverables that enable big data stakeholders to pick and choose the best analytics tools for their processing and visualization requirements on the most suitable computing platforms and clusters, while allowing value-added from big data service providers and the flow of data between the stakeholders in a cohesive and secure manner.

Introduction (13:02)

Use Case 1-2 Definitions and Taxonomies Subgroup

The focus is to gain a better understanding of the principles of Big Data. It is important to develop a consensus-based common language and vocabulary for the terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is critical to identify essential actors with their roles and responsibilities, and to subdivide them into components and sub-components according to how they interact and relate to each other, based on their similarities and differences. For Definitions: compile terms used by all stakeholders regarding the meaning of Big Data from various standards bodies, domain applications, and diversified operational environments. For Taxonomies: identify key actors with their roles and responsibilities from all stakeholders, and categorize them into components and sub-components based on their similarities and differences. In particular, Data Science and Big Data terms are discussed.

Taxonomies (7:42)

Use Case 1-3 Reference Architecture Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus-based approach to orchestrating vendor-neutral, technology- and infrastructure-agnostic analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick and choose technology-agnostic analytics tools for processing and visualization on any computing platform and cluster, while allowing value-added from Big Data service providers and the flow of data between the stakeholders in a cohesive and secure manner. Results include a reference architecture with well-defined components and linkage, as well as several exemplars.

Architecture (10:05)

Use Case 1-4 Security and Privacy Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards. The Top Ten Big Data Security and Privacy Challenges from the CSA (Cloud Security Alliance) BDWG are studied. Specialized use cases include Retail/Marketing, Modern Day Consumerism, Nielsen Homescan, Web Traffic Analysis, Healthcare, Health Information Exchange, Genetic Privacy, Pharma Clinical Trial Data Sharing, Cyber-security, Government, Military and Education.

Security (9:51)

Use Case 1-5 Technology Roadmap Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward, by performing a good gap analysis of the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities through an understanding of what standards are available or under development as part of the recommendations. The tasks are to gather input from NBD subgroups and study the taxonomies for the actors' roles and responsibilities, use cases and requirements, and secure reference architecture; gain an understanding of what standards are available or under development for Big Data; perform a thorough gap analysis and document the findings; identify what possible barriers may delay or prevent the adoption of Big Data; and document vision and recommendations.

Technology (4:14)

Use Case 1-6 Requirements and Use Case Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains. The tasks are to gather use case input from all stakeholders; derive Big Data requirements from each use case; analyze and prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment; develop a set of general patterns capturing the essence of use cases (not done yet); and work with the Reference Architecture subgroup to validate requirements and the reference architecture by explicitly implementing some patterns based on use cases. The progress in gathering use cases (discussed in the next two units) and in systematizing requirements is discussed.

Requirements (27:28)

Use Case 1-7 Recent Updates on the Work of the NIST Big Data Public Working Group

This video is an update on recent work in this area. The first slide of this short lesson discusses a new version of the use case survey that had many improvements, including tags to label key features (as discussed in slide deck 3) and a significant set of merged-in security and privacy fields. These fields came from the security and privacy working group described in lesson 4 of this slide deck. A link for this new use case form is https://bigdatawg.nist.gov/_uploadfiles/M0621_v2_7345181325.pdf

A recent (December 2018) use case form for Astronomy’s Square Kilometer Array, which uses a simplification of the official new form, is at https://docs.google.com/document/d/1CxqCISK4v9LMMmGox-PG1bLeaRcbAI4cDIlmcoRqbDs/edit?usp=sharing

The second (last) slide in the update gives some useful information on the latest work. NIST’s latest published work is at https://bigdatawg.nist.gov/V3_output_docs.php and related activities are described at http://hpc-abds.org/kaleidoscope/

Lecture set 2: 51 Big Data Use Cases from NIST Big Data Public Working Group (NBD-PWG)

Presentation or Google Slides

Use Case 2-1 Government Use Cases

This covers Census 2010 and 2000 - Title 13 Big Data; National Archives and Records Administration Accession NARA, Search, Retrieve, Preservation; Statistical Survey Response Improvement (Adaptive Design) and Non-Traditional Data in Statistical Survey Response Improvement (Adaptive Design).

Government Use Cases (17:43)

Use Case 2-2 Commercial Use Cases

This covers Cloud Eco-System, for Financial Industries (Banking, Securities & Investments, Insurance) transacting business within the United States; Mendeley - An International Network of Research; Netflix Movie Service; Web Search; IaaS (Infrastructure as a Service) Big Data Business Continuity & Disaster Recovery (BC/DR) Within A Cloud Eco-System; Cargo Shipping; Materials Data for Manufacturing and Simulation driven Materials Genomics.

This lesson is divided into three separate videos:

Part 1 (9:31)

Part 2 (19:45)

Part 3 (10:48)

Use Case 2-3 Defense Use Cases

This covers Large Scale Geospatial Analysis and Visualization; Object identification and tracking from Wide Area Large Format Imagery (WALF) or Full Motion Video (FMV) - Persistent Surveillance; and Intelligence Data Processing and Analysis.

Defense Use Cases (15:43)

Use Case 2-4 Healthcare and Life Science Use Cases

This covers Electronic Medical Record (EMR) Data; Pathology Imaging/digital pathology; Computational Bioimaging; Genomic Measurements; Comparative analysis for metagenomes and genomes; Individualized Diabetes Management; Statistical Relational Artificial Intelligence for Health Care; World Population Scale Epidemiological Study; Social Contagion Modeling for Planning, Public Health and Disaster Management and Biodiversity and LifeWatch.

Healthcare and Life Science Use Cases (30:11)

Use Case 2-5 Deep Learning and Social Networks Use Cases

This covers Large-scale Deep Learning; Organizing large-scale, unstructured collections of consumer photos; Truthy: Information diffusion research from Twitter Data; Crowd Sourcing in the Humanities as Source for Big and Dynamic Data; CINET: Cyberinfrastructure for Network (Graph) Science and Analytics and NIST Information Access Division analytic technology performance measurement, evaluations, and standards.

Deep Learning and Social Networks Use Cases (14:19)

Use Case 2-6 Research Ecosystem Use Cases

This covers DataNet Federation Consortium DFC; The ‘Discinnet process’, metadata - big data global experiment; Semantic Graph-search on Scientific Chemical and Text-based Data and Light source beamlines.

Research Ecosystem Use Cases (9:09)

Use Case 2-7 Astronomy and Physics Use Cases

This covers Catalina Real-Time Transient Survey (CRTS): a digital, panoramic, synoptic sky survey; DOE Extreme Data from Cosmological Sky Survey and Simulations; Large Survey Data for Cosmology; Particle Physics: Analysis of LHC Large Hadron Collider Data: Discovery of Higgs particle and Belle II High Energy Physics Experiment.

Astronomy and Physics Use Cases (17:33)

Use Case 2-8 Environment, Earth and Polar Science Use Cases

This covers EISCAT 3D incoherent scatter radar system; ENVRI, Common Operations of Environmental Research Infrastructure; Radar Data Analysis for CReSIS Remote Sensing of Ice Sheets; UAVSAR Data Processing, Data Product Delivery, and Data Services; NASA LARC/GSFC iRODS Federation Testbed; MERRA Analytic Services MERRA/AS; Atmospheric Turbulence - Event Discovery and Predictive Analytics; Climate Studies using the Community Earth System Model at DOE’s NERSC center; DOE-BER Subsurface Biogeochemistry Scientific Focus Area and DOE-BER AmeriFlux and FLUXNET Networks.

Environment, Earth and Polar Science Use Cases (25:29)

Use Case 2-9 Energy Use Case

This covers Consumption forecasting in Smart Grids.

Energy Use Case (4:01)

Lecture set 3: Features of 51 Big Data Use Cases from the NIST Big Data Public Working Group (NBD-PWG)

This unit discusses the categories used to classify the 51 use cases. These categories include concepts used for parallelism and for low- and high-level computational structure. The first lesson is an introduction to all categories, and the further lessons give details of particular categories.

Presentation or Google Slides

Use Case 3-1 Summary of Use Case Classification

This discusses concepts used for parallelism and for low- and high-level computational structure.

Parallelism can be over People (users or subjects), Decision makers; Items such as Images, EMR, Sequences; observations, contents of online store; Sensors – Internet of Things; Events; (Complex) Nodes in a Graph; Simple nodes as in a learning network; Tweets, Blogs, Documents, Web Pages etc.; Files or data to be backed up, moved or assigned metadata; Particles/cells/mesh points.

Low-level computational types include PP (Pleasingly Parallel); MR (MapReduce); MRStat; MRIter (iterative MapReduce); Graph; Fusion; MC (Monte Carlo) and Streaming.

High-level computational types include Classification; S/Q (Search and Query); Index; CF (Collaborative Filtering); ML (Machine Learning); EGO (Large Scale Optimizations); EM (Expectation maximization); GIS; HPC; Agents.

Patterns include Classic Database; NoSQL; Basic processing of data as in backup or metadata; GIS; Host of Sensors processed on demand; Pleasingly parallel processing; HPC assimilated with observational data; Agent-based models; Multi-modal data fusion or Knowledge Management; Crowd Sourcing.

Summary of Use Case Classification (23:39)
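
As a rough illustration of two of the low-level computational types named in this lesson, the following minimal Python sketch contrasts PP (Pleasingly Parallel) with MR (MapReduce). The toy documents and all function names are hypothetical and chosen only for illustration; a real system would distribute the work across many machines rather than run it in one process.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy data: a few short "documents".
documents = ["big data use cases", "big data patterns", "use cases survey"]

# PP (Pleasingly Parallel): each item is processed independently,
# so every call could run on a separate worker with no communication.
def document_length(doc):
    return len(doc.split())

lengths = [document_length(doc) for doc in documents]

# MR (MapReduce): map emits (key, value) pairs, a shuffle step groups
# the values by key, and reduce combines the values for each key.
def map_words(doc):
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(values):
    return reduce(lambda a, b: a + b, values)

pairs = [pair for doc in documents for pair in map_words(doc)]
word_counts = {word: reduce_counts(vals) for word, vals in shuffle(pairs).items()}

print(lengths)
print(word_counts)
```

The difference to notice is the shuffle step: PP needs no exchange of data between items, while MR must bring together all values that share a key before reducing them.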

Use Case 3-2 Database (SQL) Use Case Classification

This discusses the classic (SQL) database approach to data handling, with Search & Query and Index features. Comparisons are made to NoSQL approaches.

Database (SQL) Use Case Classification (11:13)
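
To make the Search & Query and Index features concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table name, column names, and index name are assumptions made up for this example; the row titles are taken from use cases listed earlier on this page.

```python
import sqlite3

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE use_cases (id INTEGER PRIMARY KEY, title TEXT, domain TEXT)"
)
conn.executemany(
    "INSERT INTO use_cases (title, domain) VALUES (?, ?)",
    [
        ("Netflix Movie Service", "Commercial"),
        ("Web Search", "Commercial"),
        ("Consumption forecasting in Smart Grids", "Energy"),
    ],
)

# Index: lets the database locate rows by domain without a full table scan.
conn.execute("CREATE INDEX idx_use_cases_domain ON use_cases (domain)")

# Search & Query: a declarative query; the engine decides how to use the index.
rows = conn.execute(
    "SELECT title FROM use_cases WHERE domain = ? ORDER BY title",
    ("Commercial",),
).fetchall()
commercial_titles = [title for (title,) in rows]

print(commercial_titles)
conn.close()
```

The schema is fixed up front and the query is declarative, which is the usual point of contrast with the NoSQL approaches covered in the next lesson.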

Use Case 3-3 NoSQL Use Case Classification

This discusses NoSQL (compared with SQL in the previous lesson) with HDFS, Hadoop and HBase. The Apache Big Data Stack is introduced, and further details of the comparison with SQL are given.

NoSQL Use Case Classification (11:20)

Use Case 3-4 Other Use Case Classifications

This discusses a subset of use case features: GIS, Sensors, and the support of data analysis and fusion by streaming data between filters.

Use Case Classifications I (12:42)
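
The idea of streaming data between filters can be sketched with Python generators, where each filter consumes one stream and yields another. This is only an illustrative single-process sketch; the sensor readings, the filter names, and the choice of "average of the latest value per sensor" as the fusion rule are all assumptions, not something taken from the lecture.

```python
# Hypothetical sensor events: (sensor_id, value); None marks a failed reading.
readings = [("a", 10.0), ("b", None), ("b", 20.0), ("a", 30.0)]

def sensor_stream(events):
    # Source: emits events one at a time, as a live sensor feed would.
    for event in events:
        yield event

def drop_invalid(stream):
    # Filter 1: discard events with missing values.
    for sensor_id, value in stream:
        if value is not None:
            yield sensor_id, value

def fuse_latest(stream):
    # Filter 2 (fusion): after each event, emit the average of the most
    # recent value seen from every sensor so far.
    latest = {}
    for sensor_id, value in stream:
        latest[sensor_id] = value
        yield sum(latest.values()) / len(latest)

# Filters are chained; each event flows through the whole pipeline lazily.
fused = list(fuse_latest(drop_invalid(sensor_stream(readings))))
print(fused)
```

Because generators are lazy, no filter needs to hold the whole data set, which is the property that makes this style suitable for unbounded sensor streams.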

Use Case 3-5

This discusses a subset of use case features: Classification, Monte Carlo, Streaming, PP, MR, MRStat, MRIter and HPC(MPI), global and local analytics (machine learning), parallel computing, Expectation Maximization, graphs and Collaborative Filtering.

Use Case Classifications II (20:18)
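
As one possible illustration of MRIter (iterative MapReduce) applied to a global analytics task, the sketch below runs k-means clustering on one-dimensional data: each iteration is a map step (assign every point to its nearest center) followed by a reduce step (average the points per center), repeated until the centers stop changing. The data, starting centers, and function names are hypothetical, and k-means is used here only as a familiar example of an iterative algorithm.

```python
from collections import defaultdict

# Hypothetical 1-D data with two obvious clusters, and rough starting centers.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers = [0.0, 5.0]

def map_assign(point, current_centers):
    # Map: emit (index of nearest center, point).
    nearest = min(
        range(len(current_centers)),
        key=lambda i: abs(point - current_centers[i]),
    )
    return nearest, point

def reduce_mean(values):
    # Reduce: the new center is the mean of the points assigned to it.
    return sum(values) / len(values)

def kmeans_iteration(data, current_centers):
    groups = defaultdict(list)
    for point in data:
        index, value = map_assign(point, current_centers)
        groups[index].append(value)
    # A center with no assigned points keeps its previous position.
    return [
        reduce_mean(groups[i]) if groups[i] else current_centers[i]
        for i in range(len(current_centers))
    ]

# Iterate: the output of one map/reduce round is the input of the next.
for _ in range(10):
    new_centers = kmeans_iteration(points, centers)
    if new_centers == centers:
        break
    centers = new_centers

print(centers)
```

The loop around the map and reduce steps is what distinguishes MRIter from plain MR: the same data is revisited on every round, so frameworks that keep it in memory between rounds tend to suit this pattern better.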

Use Case 3-6

This discusses Classification, PP, Fusion, EGO, HPC, GIS, Agent, MC, MR, Expectation Maximization and benchmarks.

Use Case 3-7 Other Benchmark Sets and Classifications

This video looks at several efforts to divide applications into categories of related applications. It includes “Computational Giants” from the National Research Council; Linpack or HPL from the HPC community; the NAS Parallel benchmarks from NASA; and finally the Berkeley Dwarfs from UCB. The second part of this video describes efforts in the Digital Science Center to develop a Big Data classification and to unify Big Data and simulation categories. This leads to the Ogre and Convergence Diamonds. Diamonds have facets representing the different aspects by which we classify applications. See http://hpc-abds.org/kaleidoscope/

Lecture set 4. The 10 Use Case Patterns from the NIST Big Data Public Working Group (NBD-PWG)

Presentation or Google Slides

In this last slide deck of the use cases unit, we focus on 10 use case patterns. These include multi-user querying, real-time analytics, batch analytics, data movement from external data sources, interactive analysis, data visualization, ETL, data mining, and orchestration of sequential and parallel data transformations. We go through the different ways the user and system interact in each case. The use case patterns are divided into three classes: 1) initial examples, 2) science data use case patterns, and 3) remaining use case patterns.

Resources

Some of the links below may be outdated. Please notify us of any outdated links and let us know the new ones.

Last modified June 17, 2021 : add aliasses (6b7beab5)