Big Data Use Cases Survey
This unit has four lectures (slide decks). The survey is 6 years old, but its illustrative scope of Big Data applications is still valid and has no better alternative. The problems and the use of clouds have not changed. There have been algorithmic advances (deep learning) in some cases. The lectures are:
- Overview of NIST Process
- The 51 Use cases divided into groups
- Common features of the 51 Use Cases
- 10 Patterns of data – computer – user interaction seen in Big Data Applications
There is an overview of these lectures below. The use case overview slides recorded here are available as Google Slides.
Lecture set 1. Overview of NIST Big Data Public Working Group (NBD-PWG) Process and Results
This is the first of four lectures on Big Data Use Cases. It describes the process by which NIST produced this survey.
Use Case 1-1 Introduction to NIST Big Data Public Working Group
The focus of the NBD-PWG is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor-neutral, technology- and infrastructure-agnostic deliverables that enable Big Data stakeholders to pick and choose the best analytics tools for their processing and visualization requirements on the most suitable computing platforms and clusters, while allowing value-added from Big Data service providers and the flow of data between the stakeholders in a cohesive and secure manner.
Introduction (13:02)
Use Case 1-2 Definitions and Taxonomies Subgroup
The focus is to gain a better understanding of the principles of Big Data. It is important to develop a consensus-based common language and vocabulary for the terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is critical to identify the essential actors with their roles and responsibilities, and to subdivide them into components and sub-components according to how they interact and relate with each other and to their similarities and differences. For Definitions: compile terms used by all stakeholders regarding the meaning of Big Data from various standards bodies, domain applications, and diversified operational environments. For Taxonomies: identify key actors with their roles and responsibilities from all stakeholders, and categorize them into components and sub-components based on their similarities and differences. In particular, Data Science and Big Data terms are discussed.
Taxonomies (7:42)
Use Case 1-3 Reference Architecture Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus-based, vendor-neutral, technology- and infrastructure-agnostic approach to orchestrating analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick and choose technology-agnostic analytics tools for processing and visualization on any computing platform and cluster, while allowing value-added from Big Data service providers and the flow of data between the stakeholders in a cohesive and secure manner. Results include a reference architecture with well-defined components and linkage as well as several exemplars.
Architecture (10:05)
Use Case 1-4 Security and Privacy Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards. The Top Ten Big Data Security and Privacy Challenges from the CSA (Cloud Security Alliance) BDWG are studied. Specialized use cases include Retail/Marketing, Modern Day Consumerism, Nielsen Homescan, Web Traffic Analysis, Healthcare, Health Information Exchange, Genetic Privacy, Pharma Clinical Trial Data Sharing, Cyber-security, Government, Military and Education.
Security (9:51)
Use Case 1-5 Technology Roadmap Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward by performing a good gap analysis through the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities through an understanding of what standards are available or under development as part of the recommendations. Tasks are: gather input from NBD subgroups and study the taxonomies for the actors' roles and responsibilities, use cases and requirements, and secure reference architecture; gain an understanding of what standards are available or under development for Big Data; perform a thorough gap analysis and document the findings; identify what possible barriers may delay or prevent the adoption of Big Data; and document vision and recommendations.
Technology (4:14)
Use Case 1-6 Requirements and Use Case Subgroup
The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains. Tasks are: gather use case input from all stakeholders; derive Big Data requirements from each use case; analyze and prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment; develop a set of general patterns capturing the essence of use cases (not done yet); and work with the Reference Architecture subgroup to validate requirements and the reference architecture by explicitly implementing some patterns based on use cases. Progress in gathering use cases (discussed in the next two units) and in systematizing requirements is discussed.
Requirements (27:28)
Use Case 1-7 Recent Updates of work of NIST Public Big Data Working Group
This video is an update of recent work in this area. The first slide of this short lesson discusses a new version of the use case survey that has many improvements, including tags to label key features (as discussed in slide deck 3) and a significant set of merged-in security and privacy fields. These fields came from the security and privacy working group described in lesson 4 of this slide deck. A link for this new use case form is https://bigdatawg.nist.gov/_uploadfiles/M0621_v2_7345181325.pdf
A recent (December 2018) use case form for Astronomy’s Square Kilometer Array, which uses a simplification of the official new form, is at https://docs.google.com/document/d/1CxqCISK4v9LMMmGox-PG1bLeaRcbAI4cDIlmcoRqbDs/edit?usp=sharing
The second (last) slide in the update gives some useful information on the latest work. NIST’s latest work, just published, is at https://bigdatawg.nist.gov/V3_output_docs.php and related activities are described at http://hpc-abds.org/kaleidoscope/
Lecture set 2: 51 Big Data Use Cases from NIST Big Data Public Working Group (NBD-PWG)
Use Case 2-1 Government Use Cases
This covers Census 2010 and 2000 - Title 13 Big Data; National Archives and Records Administration Accession NARA, Search, Retrieve, Preservation; Statistical Survey Response Improvement (Adaptive Design) and Non-Traditional Data in Statistical Survey Response Improvement (Adaptive Design).
Government Use Cases (17:43)
Use Case 2-2 Commercial Use Cases
This covers Cloud Eco-System, for Financial Industries (Banking, Securities & Investments, Insurance) transacting business within the United States; Mendeley - An International Network of Research; Netflix Movie Service; Web Search; IaaS (Infrastructure as a Service) Big Data Business Continuity & Disaster Recovery (BC/DR) Within A Cloud Eco-System; Cargo Shipping; Materials Data for Manufacturing; and Simulation driven Materials Genomics.
This lesson is divided into 3 separate videos:
Part 1 (9:31)
Part 2 (19:45)
Part 3 (10:48)
Use Case 2-3 Defense Use Cases
This covers Large Scale Geospatial Analysis and Visualization; Object identification and tracking from Wide Area Large Format Imagery (WALF) Imagery or Full Motion Video (FMV) - Persistent Surveillance; and Intelligence Data Processing and Analysis.
Defense Use Cases (15:43)
Use Case 2-4 Healthcare and Life Science Use Cases
This covers Electronic Medical Record (EMR) Data; Pathology Imaging/digital pathology; Computational Bioimaging; Genomic Measurements; Comparative analysis for metagenomes and genomes; Individualized Diabetes Management; Statistical Relational Artificial Intelligence for Health Care; World Population Scale Epidemiological Study; Social Contagion Modeling for Planning, Public Health and Disaster Management; and Biodiversity and LifeWatch.
Healthcare and Life Science Use Cases (30:11)
Use Case 2-5 Deep Learning and Social Networks Use Cases
This covers Large-scale Deep Learning; Organizing large-scale, unstructured collections of consumer photos; Truthy: Information diffusion research from Twitter Data; Crowd Sourcing in the Humanities as Source for Big and Dynamic Data; CINET: Cyberinfrastructure for Network (Graph) Science and Analytics; and NIST Information Access Division analytic technology performance measurement, evaluations, and standards.
Deep Learning and Social Networks Use Cases (14:19)
Use Case 2-6 Research Ecosystem Use Cases
This covers DataNet Federation Consortium DFC; The ‘Discinnet process’, metadata - big data global experiment; Semantic Graph-search on Scientific Chemical and Text-based Data; and Light source beamlines.
Research Ecosystem Use Cases (9:09)
Use Case 2-7 Astronomy and Physics Use Cases
This covers Catalina Real-Time Transient Survey (CRTS): a digital, panoramic, synoptic sky survey; DOE Extreme Data from Cosmological Sky Survey and Simulations; Large Survey Data for Cosmology; Particle Physics: Analysis of LHC Large Hadron Collider Data: Discovery of Higgs particle; and Belle II High Energy Physics Experiment.
Astronomy and Physics Use Cases (17:33)
Use Case 2-8 Environment, Earth and Polar Science Use Cases
This covers EISCAT 3D incoherent scatter radar system; ENVRI, Common Operations of Environmental Research Infrastructure; Radar Data Analysis for CReSIS Remote Sensing of Ice Sheets; UAVSAR Data Processing, Data Product Delivery, and Data Services; NASA LARC/GSFC iRODS Federation Testbed; MERRA Analytic Services MERRA/AS; Atmospheric Turbulence - Event Discovery and Predictive Analytics; Climate Studies using the Community Earth System Model at DOE’s NERSC center; DOE-BER Subsurface Biogeochemistry Scientific Focus Area; and DOE-BER AmeriFlux and FLUXNET Networks.
Environment, Earth and Polar Science Use Cases (25:29)
Use Case 2-9 Energy Use Case
This covers Consumption forecasting in Smart Grids.
Energy Use Case (4:01)
Lecture set 3: Features of 51 Big Data Use Cases from the NIST Big Data Public Working Group (NBD-PWG)
This unit discusses the categories used to classify the 51 use cases. These categories include concepts used for parallelism and for low- and high-level computational structure. The first lesson is an introduction to all categories, and the further lessons give details of particular categories.
Use Case 3-1 Summary of Use Case Classification
This discusses concepts used for parallelism and for low- and high-level computational structure.
- Parallelism can be over People (users or subjects), Decision makers; Items such as Images, EMR, Sequences; observations, contents of online store; Sensors – Internet of Things; Events; (Complex) Nodes in a Graph; Simple nodes as in a learning network; Tweets, Blogs, Documents, Web Pages etc.; Files or data to be backed up, moved or assigned metadata; Particles/cells/mesh points.
- Low-level computational types include PP (Pleasingly Parallel); MR (MapReduce); MRStat; MRIter (iterative MapReduce); Graph; Fusion; MC (Monte Carlo) and Streaming.
- High-level computational types include Classification; S/Q (Search and Query); Index; CF (Collaborative Filtering); ML (Machine Learning); EGO (Large Scale Optimizations); EM (Expectation Maximization); GIS; HPC; Agents.
- Patterns include Classic Database; NoSQL; Basic processing of data as in backup or metadata; GIS; Host of Sensors processed on demand; Pleasingly parallel processing; HPC assimilated with observational data; Agent-based models; Multi-modal data fusion or Knowledge Management; Crowd Sourcing.
Summary of Use Case Classification (23:39)
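As a rough illustration of the two simplest low-level computational types named above, the following Python sketch contrasts PP (Pleasingly Parallel) with MR (MapReduce). It is not taken from the lecture: the function names (`pleasingly_parallel`, `map_reduce`, `word_mapper`) and the sample documents are assumptions chosen for illustration, and the MapReduce part runs in a single process rather than on a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def pleasingly_parallel(items, task, workers=4):
    """PP: apply an independent task to every item; tasks never communicate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the items.
        return list(pool.map(task, items))


def map_reduce(records, mapper, reducer):
    """MR: map records to (key, value) pairs, group by key, reduce each group.

    Single-process sketch; a real framework distributes the map and
    reduce phases across many nodes.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def word_mapper(document):
    """Hypothetical mapper: emit (word, 1) for every word in a document."""
    return [(word.lower(), 1) for word in document.split()]


if __name__ == "__main__":
    docs = ["big data use cases", "big data patterns"]
    # PP: each document is measured independently (parallelism over items).
    print(pleasingly_parallel(docs, len))
    # MR: counts are combined across documents in the reduce step.
    print(map_reduce(docs, word_mapper, sum))
```

The difference to notice is that the PP function needs no step that combines results across items, whereas MapReduce adds a grouping and reduction step after the independent map phase.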
Use Case 3-2 Database (SQL) Use Case Classification
This discusses the classic (SQL) database approach to data handling with Search & Query and Index features. Comparisons are made to NoSQL approaches.
Database (SQL) Use Case Classification (11:13)
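The Search & Query and Index features mentioned above can be sketched with Python's built-in `sqlite3` module. This is only a minimal example under stated assumptions: the table name `use_cases`, its columns, the index name, and the domain labels are hypothetical and are not part of the NIST material; the three titles and their use case numbers are taken from this page.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per use case.
conn.execute(
    "CREATE TABLE use_cases (id INTEGER PRIMARY KEY, title TEXT, domain TEXT)"
)
conn.executemany(
    "INSERT INTO use_cases (id, title, domain) VALUES (?, ?, ?)",
    [
        (7, "Netflix Movie Service", "Commercial"),
        (8, "Web Search", "Commercial"),
        (51, "Consumption forecasting in Smart Grids", "Energy"),
    ],
)

# Index: lets the engine find rows by domain without scanning the table.
conn.execute("CREATE INDEX idx_use_cases_domain ON use_cases (domain)")


def titles_in_domain(domain):
    """Search & Query: return the titles of all use cases in one domain."""
    rows = conn.execute(
        "SELECT title FROM use_cases WHERE domain = ? ORDER BY id", (domain,)
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(titles_in_domain("Commercial"))
```

The schema is fixed up front and the query is declarative, which is the main point of contrast with the NoSQL approaches discussed in the next lesson.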
Use Case 3-3 NoSQL Use Case Classification
This discusses NoSQL (compared in the previous lesson) with HDFS, Hadoop and HBase. The Apache Big Data Stack is introduced, and further details of the comparison with SQL are given.
NoSQL Use Case Classification (11:20)
Use Case 3-4 Other Use Case Classifications
This discusses a subset of use case features: GIS, Sensors, and the support of data analysis and fusion by streaming data between filters.
Use Case Classifications I (12:42)
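The idea of streaming data between filters can be sketched with Python generators, where each filter consumes one stream and yields another. The sketch below is an assumption-laden illustration rather than the lecture's own example: the filter names (`readings`, `drop_invalid`, `moving_average`) and the sample sensor values are hypothetical.

```python
from collections import deque


def readings(values):
    """Source: yield sensor readings one at a time (here from a list)."""
    for value in values:
        yield value


def drop_invalid(stream, low, high):
    """Filter: pass on only readings inside the plausible range [low, high]."""
    for value in stream:
        if low <= value <= high:
            yield value


def moving_average(stream, window):
    """Filter: emit the mean of the last `window` readings once enough arrive."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        if len(recent) == window:
            yield sum(recent) / window


if __name__ == "__main__":
    # Filters are chained; each reading flows through the whole pipeline
    # before the next one is read, so no stage stores the full data set.
    pipeline = moving_average(drop_invalid(readings([1, 2, 100, 3, 4]), 0, 10), 2)
    for smoothed in pipeline:
        print(smoothed)
```

Because each stage pulls items lazily from the previous one, the same structure works whether the source is a short list or an unbounded sensor feed.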
Use Case 3-5
This discusses a subset of use case features: Classification, Monte Carlo, Streaming, PP, MR, MRStat, MRIter and HPC(MPI), global and local analytics (machine learning), parallel computing, Expectation Maximization, graphs and Collaborative Filtering.
Use Case Classifications II (20:18)
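To make the MRIter (iterative MapReduce) category more concrete, here is a small sketch that uses one-dimensional k-means clustering as the example algorithm. The choice of k-means, the function names (`assign`, `kmeans_1d`), and the sample points are assumptions for illustration and are not drawn from the lecture; the code runs in a single process, whereas a real iterative MapReduce runtime would distribute each phase.

```python
from collections import defaultdict


def assign(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: abs(point - centers[i]))
    return nearest, point


def kmeans_1d(points, centers, iterations=10):
    """MRIter: repeat a map phase and a reduce phase until centers stop moving."""
    centers = list(centers)
    for _ in range(iterations):
        # Map phase: every point is assigned independently of the others.
        groups = defaultdict(list)
        for point in points:
            key, value = assign(point, centers)
            groups[key].append(value)
        # Reduce phase: each group is averaged into a new center.
        # A center with no assigned points keeps its old position.
        updated = [
            sum(groups[i]) / len(groups[i]) if groups[i] else centers[i]
            for i in range(len(centers))
        ]
        if updated == centers:
            break
        centers = updated
    return centers


if __name__ == "__main__":
    print(kmeans_1d([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))
```

What distinguishes this from plain MapReduce is the outer loop: the output of one reduce phase (the new centers) becomes the input of the next map phase.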
Use Case 3-6
This discusses a subset of use case features: Classification, PP, Fusion, EGO, HPC, GIS, Agent, MC, MR, Expectation Maximization and benchmarks.
Use Case 3-7 Other Benchmark Sets and Classifications
This video looks at several efforts to divide applications into categories of related applications. It includes the “Computational Giants” from the National Research Council; Linpack or HPL from the HPC community; the NAS Parallel Benchmarks from NASA; and finally the Berkeley Dwarfs from UCB. The second part of this video describes efforts in the Digital Science Center to develop a Big Data classification and to unify Big Data and simulation categories. This leads to the Ogre and Convergence Diamonds. Diamonds have facets representing the different aspects by which we classify applications. See http://hpc-abds.org/kaleidoscope/
Lecture set 4. The 10 Use Case Patterns from the NIST Big Data Public Working Group (NBD-PWG)
In this last slide deck of the use cases unit, we focus on 10 use case patterns. These include multi-user querying, real-time analytics, batch analytics, data movement from external data sources, interactive analysis, data visualization, ETL, data mining, and orchestration of sequential and parallel data transformations. We go through the different ways the user and system interact in each case. The use case patterns are divided into 3 classes: 1) initial examples, 2) science data use case patterns, and 3) remaining use case patterns.
Resources
- NIST Big Data Public Working Group (NBD-PWG) Process
- Big Data Definitions
- Big Data Taxonomies
- Big Data Use Cases and Requirements
- Big Data Security and Privacy
- Big Data Architecture White Paper Survey
- Big Data Reference Architecture
- Big Data Standards Roadmap
Some of the links below may be outdated. Please let us know the new links and notify us of the outdated links.
- DCGSA Standard Cloud (this link does not exist any longer)
- On line 51 Use Cases
- Summary of Requirements Subgroup
- Use Case 6 Mendeley
- Use Case 7 Netflix
- Use Case 8 Search
- http://www.slideshare.net/kleinerperkins/kpcb-internet-trends-2013 (this link does not exist any longer)
- https://web.archive.org/web/20160828041032/http://webcourse.cs.technion.ac.il/236621/Winter2011-2012/en/ho_Lectures.html (Archived Pages)
- http://www.ifis.cs.tu-bs.de/teaching/ss-11/irws
- http://www.slideshare.net/beechung/recommender-systems-tutorialpart1intro
- http://www.worldwidewebsize.com/
- Use Case 9 IaaS (infrastructure as a Service) Big Data Business Continuity & Disaster Recovery (BC/DR) Within A Cloud Eco-System provided by Cloud Service Providers (CSPs) and Cloud Brokerage Service Providers (CBSPs)
- Use Case 11 and Use Case 12 Simulation driven Materials Genomics
- Use Case 13 Large Scale Geospatial Analysis and Visualization
- Use Case 14 Object identification and tracking from Wide Area Large Format Imagery (WALF) Imagery or Full Motion Video (FMV) - Persistent Surveillance
- https://web.archive.org/web/20160828235002/http://www.militaryaerospace.com/topics/m/video/79088650/persistent-surveillance-relies-on-extracting-relevant-data-points-and-connecting-the-dots.htm (Archived Pages)
- http://www.defencetalk.com/wide-area-persistent-surveillance-revolutionizes-tactical-isr-45745/
- Use Case 15 Intelligence Data Processing and Analysis
- http://www.afcea-aberdeen.org/files/presentations/AFCEAAberdeen_DCGSA_COLWells_PS.pdf
- http://stids.c4i.gmu.edu/STIDS2011/papers/STIDS2011_CR_T1_SalmenEtAl.pdf
- http://stids.c4i.gmu.edu/papers/STIDSPapers/STIDS2012/_T14/_SmithEtAl/_HorizontalIntegrationOfWarfighterIntel.pdf (this link does not exist any longer)
- https://www.youtube.com/watch?v=l4Qii7T8zeg (this link does not exist any longer)
- http://dcgsa.apg.army.mil/ (this link does not exist any longer)
- Use Case 16 Electronic Medical Record (EMR) Data:
- Regenstrief Institute
- Logical observation identifiers names and codes
- Indiana Health Information Exchange
- Institute of Medicine Learning Healthcare System (this link does not exist any longer)
- Use Case 17
- Pathology Imaging/digital pathology
- https://web.cci.emory.edu/confluence/display/HadoopGIS (this link does not exist any longer)
- Use Case 19 Genome in a Bottle Consortium:
- www.genomeinabottle.org (this link does not exist any longer)
- Use Case 20 Comparative analysis for metagenomes and genomes
- Use Case 25
- Use Case 26 Deep Learning: Recent popular press coverage of deep learning technology:
- http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html
- http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html
- http://www.wired.com/2013/06/andrew_ng/
- A recent research paper on HPC for Deep Learning (this link does not exist any longer)
- Widely-used tutorials and references for Deep Learning:
- Use Case 27 Organizing large-scale, unstructured collections of consumer photos
- Use Case 28
- Use Case 30 CINET: Cyberinfrastructure for Network (Graph) Science and Analytics (this link does not exist any longer)
- Use Case 31 NIST Information Access Division analytic technology performance measurement, evaluations, and standards
- Use Case 32
- DataNet Federation Consortium DFC: The DataNet Federation Consortium
- iRODS
- Use Case 33 The ‘Discinnet process’, big data global experiment (this link does not exist any longer)
- Use Case 34 Semantic Graph-search on Scientific Chemical and Text-based Data
- Use Case 35 Light source beamlines
- http://www-als.lbl.gov/
- https://www1.aps.anl.gov/ (this link does not exist any longer)
- Use Case 36
- Use Case 37 DOE Extreme Data from Cosmological Sky Survey and Simulations
- Use Case 38 Large Survey Data for Cosmology
- Use Case 39 Particle Physics: Analysis of LHC Large Hadron Collider Data: Discovery of Higgs particle
- Use Case 40 Belle II High Energy Physics Experiment
- Use Case 41 EISCAT 3D incoherent scatter radar system
- Use Case 42 ENVRI, Common Operations of Environmental Research Infrastructure
- ENVRI Project website
- ENVRI Reference Model (Archive Pages)
- ENVRI deliverable D3.2 : Analysis of common requirements of Environmental Research Infrastructures (this link does not exist any longer)
- ICOS
- Euro-Argo
- EISCAT 3D (Archived Pages)
- LifeWatch
- EPOS
- EMSO
- Use Case 43 Radar Data Analysis for CReSIS Remote Sensing of Ice Sheets
- Use Case 44 UAVSAR Data Processing, Data Product Delivery, and Data Services
- Use Case 47 Atmospheric Turbulence - Event Discovery and Predictive Analytics
- Use Case 48 Climate Studies using the Community Earth System Model at DOE’s NERSC center
- Use Case 50 DOE-BER AmeriFlux and FLUXNET Networks
- Use Case 51 Consumption forecasting in Smart Grids
- https://web.archive.org/web/20160412194521/http://dslab.usc.edu/smartgrid.php (Archived Pages)
- https://web.archive.org/web/20120130051124/http://ganges.usc.edu/wiki/Smart_Grid (Archived Pages)
- https://www.ladwp.com/ladwp/faces/ladwp/aboutus/a-power/a-p-smartgridla?_afrLoop=157401916661989&_afrWindowMode=0&_afrWindowId=null#%40%3F_afrWindowId%3Dnull%26_afrLoop%3D157401916661989%26_afrWindowMode%3D0%26_adf.ctrl-state%3Db7yulr4rl_17 (this link does not exist any longer)
- http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6475927