Big Data Use Cases Survey

This section covers 51 values of X (the 51 use cases) and the overall picture of Big Data that emerged from a NIST (National Institute of Standards and Technology) study. The section covers the NIST Big Data Public Working Group (NBD-PWG) process and summarizes the work of five subgroups: the Definitions and Taxonomies Subgroup, Reference Architecture Subgroup, Security and Privacy Subgroup, Technology Roadmap Subgroup, and Requirements and Use Case Subgroup. The 51 use cases collected in this process are briefly discussed, with a classification of the source of parallelism and the high- and low-level computational structure. We describe the key features of this classification.

NIST Big Data Public Working Group

This unit covers the NIST Big Data Public Working Group (NBD-PWG) process and summarizes the work of five subgroups: the Definitions and Taxonomies Subgroup, Reference Architecture Subgroup, Security and Privacy Subgroup, Technology Roadmap Subgroup, and Requirements and Use Case Subgroup. The work of the latter is continued in the next two units.

Presentation Overview (45)

Introduction to NIST Big Data Public Working Group

The focus of the NBD-PWG is to form a community of interest from industry, academia, and government, with the goal of developing consensus definitions, taxonomies, secure reference architectures, and a technology roadmap. The aim is to create vendor-neutral, technology- and infrastructure-agnostic deliverables that enable Big Data stakeholders to pick and choose the best analytics tools for their processing and visualization requirements on the most suitable computing platforms and clusters, while allowing value to be added by Big Data service providers and data to flow between the stakeholders in a cohesive and secure manner.

Video Introduction (13:02)

Definitions and Taxonomies Subgroup

The focus is to gain a better understanding of the principles of Big Data. It is important to develop a consensus-based common language and vocabulary for the terms used in Big Data across stakeholders from industry, academia, and government. In addition, it is critical to identify the essential actors with their roles and responsibilities, and to subdivide them into components and subcomponents according to how they interact or relate with each other and their similarities and differences.

For Definitions: compile the terms used by all stakeholders regarding the meaning of Big Data from various standards bodies, domain applications, and diversified operational environments. For Taxonomies: identify key actors with their roles and responsibilities from all stakeholders, and categorize them into components and subcomponents based on their similarities and differences. In particular, Data Science and Big Data terms are discussed.

Video Taxonomies (7:42)

Reference Architecture Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus-based approach to orchestrating vendor-neutral, technology- and infrastructure-agnostic analytics tools and computing environments. The goal is to enable Big Data stakeholders to pick and choose technology-agnostic analytics tools for processing and visualization on any computing platform and cluster, while allowing value to be added by Big Data service providers and data to flow between the stakeholders in a cohesive and secure manner. Results include a reference architecture with well-defined components and linkage as well as several exemplars.

Video Architecture (10:05)

Security and Privacy Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus secure reference architecture to handle security and privacy issues across all stakeholders. This includes gaining an understanding of what standards are available or under development, as well as identifying which key organizations are working on these standards. The Top Ten Big Data Security and Privacy Challenges from the CSA (Cloud Security Alliance) BDWG are studied. Specialized use cases include Retail/Marketing, Modern Day Consumerism, Nielsen Homescan, Web Traffic Analysis, Healthcare, Health Information Exchange, Genetic Privacy, Pharma Clinical Trial Data Sharing, Cyber-security, Government, Military and Education.

Video Security (9:51)

Technology Roadmap Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus vision with recommendations on how Big Data should move forward, by performing a thorough gap analysis of the materials gathered from all other NBD subgroups. This includes setting standardization and adoption priorities, based on an understanding of what standards are available or under development, as part of the recommendations. The tasks are to gather input from the NBD subgroups and study the taxonomies for the actors' roles and responsibilities, the use cases and requirements, and the secure reference architecture; gain an understanding of what standards are available or under development for Big Data; perform a thorough gap analysis and document the findings; identify possible barriers that may delay or prevent adoption of Big Data; and document the vision and recommendations.

Video Technology (4:14)

Interfaces Subgroup

This subgroup is working on the following document: NIST Big Data Interoperability Framework: Volume 8, Reference Architecture Interface.

This document summarizes interfaces that are instrumental for the interaction with Clouds, Containers, and HPC systems to manage virtual clusters in support of the NIST Big Data Reference Architecture (NBDRA). The Representational State Transfer (REST) paradigm is used to define these interfaces, allowing easy integration and adoption by a wide variety of frameworks. This volume, Volume 8, uses the work performed by the NBD-PWG to identify objects instrumental for the NBDRA, which is introduced in the NBDIF: Volume 6, Reference Architecture.
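As a rough illustration of the REST paradigm mentioned above, the following sketch maps HTTP verbs onto operations on a collection of virtual cluster descriptions. The resource name `cluster`, its fields, and the `handle` function are hypothetical and are not taken from Volume 8; a real service would sit behind an HTTP server rather than an in-memory dictionary.

```python
import json

# In-memory store standing in for a service backend (hypothetical).
clusters = {}


def handle(method, path, body=None):
    """Dispatch a REST-style request and return (status_code, json_text)."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "cluster":
        return 404, json.dumps({"error": "unknown resource"})
    name = parts[1] if len(parts) > 1 else None

    if method == "GET" and name is None:
        # List the names of all resources in the collection.
        return 200, json.dumps(sorted(clusters))
    if method == "GET":
        if name not in clusters:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(clusters[name])
    if method == "PUT" and name is not None:
        # Create or replace the named resource.
        clusters[name] = json.loads(body)
        return 201, json.dumps(clusters[name])
    if method == "DELETE" and name is not None:
        if name not in clusters:
            return 404, json.dumps({"error": "not found"})
        del clusters[name]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


# Example field names ("nodes", "kind") are illustrative only.
status, _ = handle("PUT", "/cluster/demo", json.dumps({"nodes": 4, "kind": "container"}))
print(status)
print(handle("GET", "/cluster"))
print(handle("DELETE", "/cluster/demo"))
```

Because every object is addressed by a path and manipulated through the same small set of verbs, a client written for one resource type can be adapted to another with little change, which is one reason REST eases integration across frameworks.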

This presentation was given at the 2nd NIST Big Data Public Working Group (NBD-PWG) Workshop in Washington DC in June 2017. It explains our thoughts on automatically deriving a reference architecture from the Reference Architecture Interface specifications contained in the document.

The workshop Web page is located at

https://bigdatawg.nist.gov/workshop2.php

The agenda of the workshop is as follows:

https://bigdatawg.nist.gov/2017_NIST_Big_Data_PWG_WorkshopAgenda_with_Speakers_Bio.pdf

The webcast of the presentation is given below; you need to fast forward to the indicated time.

Webcast: Interface subgroup: https://www.nist.gov/news-events/events/2017/06/2nd-nist-big-data-public-working-group-nbd-pwg-workshop
- see: Big Data Working Group Day 1, part 2 Time start: 21:00 min, Time end: 44:00
Slides: https://github.com/cloudmesh/cloudmesh.rest/blob/master/docs/NBDPWG-vol8.pptx?raw=true
Document: https://github.com/cloudmesh/cloudmesh.rest/raw/master/docs/NIST.SP.1500-8-draft.pdf

You are welcome to view other presentations if you are interested.

Requirements and Use Case Subgroup

The focus is to form a community of interest from industry, academia, and government, with the goal of developing a consensus list of Big Data requirements across all stakeholders. This includes gathering and understanding various use cases from diversified application domains. The tasks are to gather use case input from all stakeholders; derive Big Data requirements from each use case; analyze and prioritize a list of challenging general requirements that may delay or prevent adoption of Big Data deployment; develop a set of general patterns capturing the essence of the use cases (not done yet); and work with the Reference Architecture Subgroup to validate requirements and the reference architecture by explicitly implementing some patterns based on use cases. The progress of gathering use cases (discussed in the next two units) and the systemization of requirements are discussed.

Video Requirements (27:28)

51 Big Data Use Cases

This unit consists of one or more slides for each of the 51 use cases; the additional slides (where there is more than one) typically contain pictures. Each use case is identified with its source of parallelism and its high- and low-level computational structure. As each new classification topic is introduced we briefly discuss it, but the full discussion of the topics is given in the following unit.

Presentation 51 Use Cases (100)

Government Use Cases

This covers Census 2010 and 2000 - Title 13 Big Data; National Archives and Records Administration Accession NARA, Search, Retrieve, Preservation; Statistical Survey Response Improvement (Adaptive Design) and Non-Traditional Data in Statistical Survey Response Improvement (Adaptive Design).

Video Government Use Cases (17:43)

Commercial Use Cases

This covers Cloud Eco-System, for Financial Industries (Banking, Securities & Investments, Insurance) transacting business within the United States; Mendeley - An International Network of Research; Netflix Movie Service; Web Search; IaaS (Infrastructure as a Service) Big Data Business Continuity & Disaster Recovery (BC/DR) Within A Cloud Eco-System; Cargo Shipping; Materials Data for Manufacturing and Simulation driven Materials Genomics.

Video Commercial Use Cases (17:43)

Defense Use Cases

This covers Large Scale Geospatial Analysis and Visualization; Object identification and tracking from Wide Area Large Format Imagery (WALF) Imagery or Full Motion Video (FMV) - Persistent Surveillance and Intelligence Data Processing and Analysis.

Video Defense Use Cases (15:43)

Healthcare and Life Science Use Cases

This covers Electronic Medical Record (EMR) Data; Pathology Imaging/digital pathology; Computational Bioimaging; Genomic Measurements; Comparative analysis for metagenomes and genomes; Individualized Diabetes Management; Statistical Relational Artificial Intelligence for Health Care; World Population Scale Epidemiological Study; Social Contagion Modeling for Planning, Public Health and Disaster Management and Biodiversity and LifeWatch.

Video Healthcare and Life Science Use Cases (30:11)

Deep Learning and Social Networks Use Cases

This covers Large-scale Deep Learning; Organizing large-scale, unstructured collections of consumer photos; Truthy: Information diffusion research from Twitter Data; Crowd Sourcing in the Humanities as Source for Big and Dynamic Data; CINET: Cyberinfrastructure for Network (Graph) Science and Analytics and NIST Information Access Division analytic technology performance measurement, evaluations, and standards.

Video Deep Learning and Social Networks Use Cases (14:19)

Research Ecosystem Use Cases

This covers DataNet Federation Consortium DFC; The ‘Discinnet process’, metadata - big data global experiment; Semantic Graph-search on Scientific Chemical and Text-based Data and Light source beamlines.

Video Research Ecosystem Use Cases (9:09)

Astronomy and Physics Use Cases

This covers Catalina Real-Time Transient Survey (CRTS): a digital, panoramic, synoptic sky survey; DOE Extreme Data from Cosmological Sky Survey and Simulations; Large Survey Data for Cosmology; Particle Physics: Analysis of LHC Large Hadron Collider Data: Discovery of Higgs particle and Belle II High Energy Physics Experiment.

Video Astronomy and Physics Use Cases (17:33)

Environment, Earth and Polar Science Use Cases

This covers EISCAT 3D incoherent scatter radar system; ENVRI, Common Operations of Environmental Research Infrastructure; Radar Data Analysis for CReSIS Remote Sensing of Ice Sheets; UAVSAR Data Processing, Data Product Delivery, and Data Services; NASA LARC/GSFC iRODS Federation Testbed; MERRA Analytic Services MERRA/AS; Atmospheric Turbulence - Event Discovery and Predictive Analytics; Climate Studies using the Community Earth System Model at DOE’s NERSC center; DOE-BER Subsurface Biogeochemistry Scientific Focus Area and DOE-BER AmeriFlux and FLUXNET Networks.

Video Environment, Earth and Polar Science Use Cases (25:29)

Energy Use Case

This covers Consumption forecasting in Smart Grids.

Video Energy Use Case (4:01)

Features of 51 Big Data Use Cases

This unit discusses the categories used to classify the 51 use cases. These categories include the concepts used for parallelism and the low- and high-level computational structure. The first lesson is an introduction to all categories, and the further lessons give details of particular categories.

Presentation Features (43)

Summary of Use Case Classification

This discusses the concepts used for parallelism and the low- and high-level computational structure.

Parallelism can be over People (users or subjects), Decision makers; Items such as Images, EMR, Sequences; observations, contents of online store; Sensors – Internet of Things; Events; (Complex) Nodes in a Graph; Simple nodes as in a learning network; Tweets, Blogs, Documents, Web Pages etc.; Files or data to be backed up, moved or assigned metadata; Particles/cells/mesh points.

Low-level computational types include PP (Pleasingly Parallel); MR (MapReduce); MRStat; MRIter (Iterative MapReduce); Graph; Fusion; MC (Monte Carlo) and Streaming.

High-level computational types include Classification; S/Q (Search and Query); Index; CF (Collaborative Filtering); ML (Machine Learning); EGO (Large Scale Optimizations); EM (Expectation Maximization); GIS; HPC; Agents.

Patterns include Classic Database; NoSQL; Basic processing of data as in backup or metadata; GIS; Host of Sensors processed on demand; Pleasingly parallel processing; HPC assimilated with observational data; Agent-based models; Multi-modal data fusion or Knowledge Management; Crowd Sourcing.
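To make the difference between two of the low-level types concrete, here is a minimal sketch contrasting PP (Pleasingly Parallel) with MR (MapReduce), using parallelism over documents. The sample documents and the helper names `pleasingly_parallel` and `map_reduce` are assumptions made for illustration; the code runs sequentially and only shows where independent work and grouping occur.

```python
from collections import defaultdict

# Made-up sample data for illustration.
documents = ["big data use cases", "big data requirements", "use cases survey"]


def pleasingly_parallel(items, task):
    """PP: each item is processed independently; no communication is needed."""
    return [task(item) for item in items]


def map_reduce(items, mapper, reducer):
    """MR: map each item to (key, value) pairs, group by key, then reduce."""
    groups = defaultdict(list)
    for item in items:
        for key, value in mapper(item):
            groups[key].append(value)  # grouping by key is the "shuffle" step
    return {key: reducer(values) for key, values in groups.items()}


# PP example: number of words per document, one document at a time.
lengths = pleasingly_parallel(documents, lambda doc: len(doc.split()))

# MR example: word count across all documents.
counts = map_reduce(documents, lambda doc: [(w, 1) for w in doc.split()], sum)

print(lengths)
print(counts["big"], counts["cases"])
```

In the PP case each call to `task` could run on a different worker with no coordination, whereas the MR case needs the grouping step to bring all values for a key together before the reduction.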

Video Summary of Use Case Classification (23:39)

Database(SQL) Use Case Classification

This discusses the classic (SQL) database approach to data handling with Search & Query and Index features. Comparisons are made to NoSQL approaches.
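A small sketch of the Search & Query and Index features, using Python's built-in `sqlite3` module. The table name `use_case`, its columns, and the index name are assumptions chosen for this example; the three rows reuse use case numbers and titles that appear elsewhere on this page.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE use_case (id INTEGER PRIMARY KEY, title TEXT, domain TEXT)")
conn.executemany(
    "INSERT INTO use_case (id, title, domain) VALUES (?, ?, ?)",
    [
        (7, "Netflix Movie Service", "Commercial"),
        (8, "Web Search", "Commercial"),
        (51, "Consumption forecasting in Smart Grids", "Energy"),
    ],
)

# Index: lets the engine locate rows by domain without scanning the table.
conn.execute("CREATE INDEX idx_use_case_domain ON use_case (domain)")

# Search & Query: a declarative query with a bound parameter.
rows = conn.execute(
    "SELECT id, title FROM use_case WHERE domain = ? ORDER BY id", ("Commercial",)
).fetchall()
print(rows)

# Ask SQLite how it plans to execute a lookup by domain.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM use_case WHERE domain = ?", ("Energy",)
).fetchall()
print(plan)
```

The schema is fixed up front and the query states what is wanted rather than how to find it, which is the main point of contrast with the NoSQL approaches.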

Video Database (SQL) Use Case Classification (11:13)

NoSQL Use Case Classification

This discusses NoSQL (compared with SQL in the previous lesson) with HDFS, Hadoop and HBase. The Apache Big Data Stack is introduced, and further details of the comparison with SQL are given.

Video NoSQL Use Case Classification (11:20)

Other Use Case Classifications

This discusses a subset of use case features: GIS, Sensors, and the support of data analysis and fusion by streaming data between filters.

Video Use Case Classifications I (12:42)

This discusses a subset of use case features: Pleasingly parallel, MRStat, Data Assimilation, Crowd sourcing, Agents, data fusion and agents, EGO and security.

Video Use Case Classifications II (20:18)

This discusses a subset of use case features: Classification, Monte Carlo, Streaming, PP, MR, MRStat, MRIter and HPC(MPI), global and local analytics (machine learning), parallel computing, Expectation Maximization, graphs and Collaborative Filtering.
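As an illustration of MRIter (Iterative MapReduce), the sketch below runs one-dimensional k-means, where every iteration is a map step followed by a reduce step. K-means is chosen here only as a familiar example and is an assumption, as are the function name `kmeans_1d` and the sample points.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Iterative MapReduce sketch: each pass is one map step and one reduce step."""
    for _ in range(iterations):
        # Map: pair every point with the index of its nearest centroid.
        assigned = [
            (min(range(len(centroids)), key=lambda i: abs(p - centroids[i])), p)
            for p in points
        ]
        # Reduce: average the points assigned to each centroid.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for j, p in assigned if j == i]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:  # no change, so stop iterating
            break
        centroids = updated
    return centroids


# Made-up sample data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, centroids=[0.0, 5.0]))
```

Unlike plain MapReduce, the output of one reduce step (the updated centroids) feeds the next map step, so the same data is traversed repeatedly until the result stops changing.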

Video Use Case Classifications III (17:25)

Resources

Some of the links below may be outdated. Please notify us of outdated links and let us know the new ones.

DCGSA Standard Cloud
On line 51 Use Cases
Summary of Requirements Subgroup
~~Use Case 6 Mendeley~~ (this link does not exist any longer)
~~Use Case 7 Netflix~~
Use Case 8 Search
- http://www.slideshare.net/kleinerperkins/kpcb-internet-trends-2013
- http://webcourse.cs.technion.ac.il/236621/Winter2011-2012/en/ho_Lectures.html
- http://www.ifis.cs.tu-bs.de/teaching/ss-11/irws
- http://www.slideshare.net/beechung/recommender-systems-tutorialpart1intro
- http://www.worldwidewebsize.com/
Use Case 9 IaaS (Infrastructure as a Service) Big Data Business Continuity & Disaster Recovery (BC/DR) Within A Cloud Eco-System provided by Cloud Service Providers (CSPs) and Cloud Brokerage Service Providers (CBSPs)
Use Case 11 and Use Case 12 Simulation driven Materials Genomics
Use Case 13 Large Scale Geospatial Analysis and Visualization
- http://www.opengeospatial.org/standards
- http://geojson.org/
- http://earth-info.nga.mil/publications/specs/printed/CADRG/cadrg.html
Use Case 14 Object identification and tracking from Wide Area Large Format Imagery (WALF) Imagery or Full Motion Video (FMV) - Persistent Surveillance
- http://www.militaryaerospace.com/topics/m/video/79088650/persistent-surveillance-relies-on-extracting-relevant-data-points-and-connecting-the-dots.htm
- http://www.defencetalk.com/wide-area-persistent-surveillance-revolutionizes-tactical-isr-45745/
Use Case 15 Intelligence Data Processing and Analysis
- http://www.afcea-aberdeen.org/files/presentations/AFCEAAberdeen_DCGSA_COLWells_PS.pdf
- http://stids.c4i.gmu.edu/STIDS2011/papers/STIDS2011_CR_T1_SalmenEtAl.pdf
- ~~http://stids.c4i.gmu.edu/papers/STIDSPapers/STIDS2012/_T14/_SmithEtAl/_HorizontalIntegrationOfWarfighterIntel.pdf,~~
- ~~https://www.youtube.com/watch?v=l4Qii7T8zeg~~
- ~~http://dcgsa.apg.army.mil/~~
Use Case 16 Electronic Medical Record (EMR) Data:
- Regenstrief Institute
- Logical observation identifiers names and codes
- Indiana Health Information Exchange
- ~~Institute of Medicine Learning Healthcare System~~
Use Case 17
- Pathology Imaging/digital pathology
- https://web.cci.emory.edu/confluence/display/HadoopGIS
Use Case 19 Genome in a Bottle Consortium:
- ~~www.genomeinabottle.org~~
Use Case 20 Comparative analysis for metagenomes and genomes
Use Case 25
- Biodiversity
- LifeWatch
Use Case 26 Deep Learning: Recent popular press coverage of deep learning technology:
- http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html
- http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html
- http://www.wired.com/2013/06/andrew_ng/
- ~~A recent research paper on HPC for Deep Learning~~
- Widely-used tutorials and references for Deep Learning:
  - http://ufldl.stanford.edu/wiki/index.php/Main_Page
  - http://deeplearning.net/
Use Case 27 Organizing large-scale, unstructured collections of consumer photos
Use Case 28
- Truthy: Information diffusion research from Twitter Data
- http://cnets.indiana.edu/groups/nan/truthy/
- http://cnets.indiana.edu/groups/nan/despic/
~~Use Case 30 CINET: Cyberinfrastructure for Network (Graph) Science and Analytics~~
Use Case 31 NIST Information Access Division analytic technology performance measurement, evaluations, and standards
Use Case 32
- DataNet Federation Consortium DFC: The DataNet Federation Consortium
- iRODS
Use Case 33 The ‘Discinnet process’, big data global experiment
Use Case 34 Semantic Graph-search on Scientific Chemical and Text-based Data
- http://www.eurekalert.org/pub_releases/2013-07/aiop-ffm071813.php
- http://xpdb.nist.gov/chemblast/pdb.pl
Use Case 35 Light source beamlines
- http://www-als.lbl.gov/
- https://www1.aps.anl.gov/
Use Case 36
- CRTS survey
- CSS survey
- For an overview of the classification challenges
Use Case 37 DOE Extreme Data from Cosmological Sky Survey and Simulations
- http://www.lsst.org/lsst/
- http://www.nersc.gov/
- http://www.nersc.gov/assets/Uploads/HabibcosmosimV2.pdf
Use Case 38 Large Survey Data for Cosmology
- http://desi.lbl.gov/
- http://www.darkenergysurvey.org/
Use Case 39 Particle Physics: Analysis of LHC Large Hadron Collider Data: Discovery of Higgs particle
- http://grids.ucs.indiana.edu/ptliupages/publications/Where%20does%20all%20the%20data%20come%20from%20v7.pdf
- http://www.es.net/assets/pubs_presos/High-throughput-lessons-from-the-LHC-experience.Johnston.TNC2013.pdf
~~Use Case 40 Belle II High Energy Physics Experiment~~ (old link does not exist, new link: https://www.belle2.org)
Use Case 41 EISCAT 3D incoherent scatter radar system
Use Case 42 ENVRI, Common Operations of Environmental Research Infrastructure
- ENVRI Project website
- ~~ENVRI Reference Model~~
- ~~ENVRI deliverable D3.2 : Analysis of common requirements of Environmental Research Infrastructures~~
- ICOS
- Euro-Argo
- EISCAT 3D
- LifeWatch
- EPOS
- EMSO
Use Case 43 Radar Data Analysis for CReSIS Remote Sensing of Ice Sheets
Use Case 44 UAVSAR Data Processing, Data Product Delivery, and Data Services
- http://uavsar.jpl.nasa.gov/
- http://www.asf.alaska.edu/program/sdc
- http://geo-gateway.org/main.html
Use Case 47 Atmospheric Turbulence - Event Discovery and Predictive Analytics
- http://oceanworld.tamu.edu/resources/oceanography-book/teleconnections.htm
- http://www.forbes.com/sites/toddwoody/2012/03/21/meet-the-scientists-mining-big-data-to-predict-the-weather/
Use Case 48 Climate Studies using the Community Earth System Model at DOE’s NERSC center
- http://www-pcmdi.llnl.gov/
- http://www.nersc.gov/
- http://science.energy.gov/ber/research/cesd/
- http://www2.cisl.ucar.edu/
Use Case 50 DOE-BER AmeriFlux and FLUXNET Networks
- http://ameriflux.lbl.gov/
- http://www.fluxdata.org/default.aspx
Use Case 51 Consumption forecasting in Smart Grids
- ~~http://smartgrid.usc.edu/~~ (old link does not exist, new link: http://dslab.usc.edu/smartgrid.php)
- http://ganges.usc.edu/wiki/Smart_Grid
- https://www.ladwp.com/ladwp/faces/ladwp/aboutus/a-power/a-p-smartgridla?_afrLoop=157401916661989&_afrWindowMode=0&_afrWindowId=null#%40%3F_afrWindowId%3Dnull%26_afrLoop%3D157401916661989%26_afrWindowMode%3D0%26_adf.ctrl-state%3Db7yulr4rl_17
- http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6475927

Last modified June 17, 2021 : add aliasses (6b7beab5)