Presentation is loading. Please wait.

Presentation is loading. Please wait.

Kirk Borne George Mason University, Fairfax, VA ● Dynamic Events in Massive Data Streams, from Astrophysics to Marketing.

Similar presentations


Presentation on theme: "Kirk Borne George Mason University, Fairfax, VA ● Dynamic Events in Massive Data Streams, from Astrophysics to Marketing."— Presentation transcript:

1 Kirk Borne George Mason University, Fairfax, VA ● www.kirkborne.net @KirkDBorne Dynamic Events in Massive Data Streams, from Astrophysics to Marketing Automation

2 1) Correlation Discovery  Finding patterns and dependencies, which reveal new natural laws or new scientific principles 2) Novelty Discovery Finding new, rare, one-in-a-[million / billion / trillion] objects and events 3) Class Discovery Finding new classes of objects and behaviors Learning the rules that constrain class boundaries 4) Association Discovery Finding unusual (improbable) co-occurring associations Data Science = 4 Types of Discovery (Learning from Data)

3 This graph says it all … 3 Steps to Discovery – Learning from Data Graphic provided by Professor S. G. Djorgovski, Caltech Unsupervised Learning : Cluster Analysis – partition the data items into clusters, without bias, ignoring any initially assigned categories = Class Discovery ! Supervised Learning : Classification – for each new data item, assign it to a known class (i.e., a known category or cluster) = Predictive Power Discovery ! Semi-supervised Learning : Outlier/Novelty Detection – identify data items that are outside the bounds of the known classes of behavior = Surprise Discovery ! 3

4 Goal of Data Science: Take Data to Information to Knowledge to Insights (and Action!) From Sensors (Measurement & Data Collection)… … to Sentinels (Monitoring & Alerts) … … to Sense-making (Data Science) … … to Cents-making (ROI) … Productizing your Big Data 4

5 Empowering Data Scientists at Booz-Allen The Field Guide to Data Science http://www.boozallen.com/insights/2013/11/data-science-field-guide National Data Science Bowl http://www.datasciencebowl.com/ Explore Data Science https://exploredatascience.com/ 5

6 The BIG Big Data Challenge: Identifying, characterizing, & responding to millions of events in real-time streaming data Astronomy example:  Real-Time Event Mining: deciding which events (out of millions) need follow-up investigation & response (triage for maximum scientific return) Web Analytics example:  Web Behavior Modeling and Automated System Response (from online interactions & web browse patterns, personalization, user segmentation, 1-to-1 marketing, advanced analytics discovery,…) Many other examples:  Health alerts (from EHRs and national health systems)  Tsunami alerts (from geo sensors everywhere)  Cybersecurity alerts (from network logs)  Social event alerts or early warnings (from social media)  Preventive Fraud alerts (from financial applications)  Predictive Maintenance alerts (from machine / engine sensors) 6

7 The MIPS model for Dynamic Data-Driven Application Systems (DDDAS) MIPS = –Measurement – Inference – Prediction – Steering This applies to any Network of Sensors: –Web user interactions & actions (web analytics data), Cyber network usage logs, Social network sentiment, Machine logs (of any kind), Manufacturing sensors, Health & Epidemic monitoring systems, Financial transactions, National Security, Utilities and Energy, Remote Sensing, Tsunami warnings, Weather/Climate events, Astronomical sky events, … Machine Learning enables the “IP” part of MIPS: –Autonomous (or semi-autonomous) Classification –Intelligent Data Understanding –Rule-based –Model-based –Neural Networks –Markov Models –Bayes Inference Engines Alert & Response systems: LSST 10million events “Mars Rover” anywhere Automation of any data- driven operational system 7 http://dddas.org

8 The MIPS model for Dynamic Data-Driven Application Systems (DDDAS) MIPS = –Measurement – Inference – Prediction – Steering This applies to any Network of Sensors: –Web user interactions & actions (web analytics data), Cyber network usage logs, Social network sentiment, Machine logs (of any kind), Manufacturing sensors, Health & Epidemic monitoring systems, Financial transactions, National Security, Utilities and Energy, Remote Sensing, Tsunami warnings, Weather/Climate events, Astronomical sky events, … Machine Learning enables the “IP” part of MIPS: –Autonomous (or semi-autonomous) Classification –Intelligent Data Understanding –Rule-based –Model-based –Neural Networks –Markov Models –Bayes Inference Engines Alert & Response systems: LSST 10million events “Mars Rover” anywhere Automation of any data- driven operational system 8 http://dddas.org From Sensors to Sentinels to Sense

9 Creating Value from Big Data : The 3 D2D’s o Knowledge Discovery – Data-to-Discovery (D2D) o Data-driven Decision Support – Data-to-Decisions (D2D) o Big ROI (Return On Innovation) !!! – Data-to-Dollars (D2D) or Data-to-Dividends 9

10 Creating Value from Big Data Knowledge Discovery – LSST (discovery) o Data-driven Decision Support – Mars Rovers (decisions) o Big ROI (Return On Innovation) !!! – Innovative Applications of Big Data everywhere 10

11 Astronomy Big Data Example The LSST (Large Synoptic Survey Telescope) 11

12 LSST = Large Synoptic Survey Telescope http://www.lsst.org/ 8.4-meter diameter primary mirror = 10 square degrees! Hello ! (mirror funded by private donors) 12

13 LSST = Large Synoptic Survey Telescope http://www.lsst.org/ 8.4-meter diameter primary mirror = 10 square degrees! Hello ! (mirror funded by private donors) Construction began August 2014 (funded by NSF and DOE) 13

14 LSST = Large Synoptic Survey Telescope http://www.lsst.org/ 8.4-meter diameter primary mirror = 10 square degrees! Hello ! (mirror funded by private donors) –100-200 Petabyte image archive –20-40 Petabyte database catalog 14

15 LSST in time and space: – When? ~2022-2032 – Where? Cerro Pachon, Chile LSST Key Science Drivers: Mapping the Dynamic Universe – Complete inventory of the Solar System (Near-Earth Objects; killer asteroids???) – Nature of Dark Energy (Cosmology; Supernovae at edge of the known Universe) – Optical transients (10 million daily event notifications sent within 60 seconds) – Digital Milky Way (Dark Matter; Locations and velocities of 20 billion stars!) Architect’s design of LSST Observatory 15

16 LSST Summary http://www.lsst.org/ http://www.lsst.org/ 3-Gigapixel camera One 6-Gigabyte image every 20 seconds 30 Terabytes every night for 10 years Repeat images of the entire night sky every 3 nights: Celestial Cinematography 100-Petabyte final image data archive anticipated – all data are public!!! 20-Petabyte final database catalog anticipated Real-Time Event Mining: ~10 million events per night, every night, for 10 years! –Follow-up observations required to classify these –Which ones should we follow up? … … Decisions! Decisions! ( = D2D !) 16

17 LSST Database = massive complexity challenge http://www.lsst.org/ http://www.lsst.org/ 20-Petabyte final database catalog anticipated ~20 trillion rows (astronomical sources), 200-300 columns How can we find descriptive and predictive models of trillions of astrophysical sources? … … Soft10ware.com’s fast statistical modeling technology offers an excellent approach: –Non-parametric, multi-model generation –Rapid model evaluation & ranking 17

18 But the LSST is not the biggest Big Data Astronomy project being planned … 18

19 SKA (starting in 2024) 19

20 SKA = Square Kilometer Array joint project: Australia and South Africa http://www.ska.gov.au/ 20 http://www.extremetech.com/extreme/124561-ibm-to-build-exascale-supercomputer-for-the-worlds-largest-million-antennae-telescope

21 21 That’s a data processing job for fast streaming real-time in-memory analytics = MapR (Apache Spark) is excellent choice http://www.mapr.com/

22 Creating Value from Big Data o Knowledge Discovery – LSST (discovery) Data-driven Decision Support – Mars Rovers (decisions) o Big ROI (Return On Innovation) !!! – Innovative Applications of Big Data everywhere 22

23 The Mars Rover Metaphor 23

24 Mars Rover: intelligent data-gatherer, mobile data mining agent, and autonomous decision-support system Rove around the surface of Mars and take samples of rocks (experimental data type: mass spectroscopy = data histogram = feature vector) Intelligent Data Operations in Action: Classification (assign rocks to known classes) Supervised Learning (search for rocks with known compositions) Unsupervised Learning (discover what types of rocks are present, without preconceived biases) Clustering (find the set of unique classes of rocks) Association Mining (find unusual associations) Deviation/Outlier Detection (one-of-kind; interesting?) On-board Intelligent Data Understanding & Decision Support Systems (Fuzzy Logic & Decision Trees & Cased-Based Reasoning ) = = Science Goal Monitoring : – “stay here and do more” ; or else “follow trend to most interesting location” – “send results to Earth immediately” ; or “send results later” 24

25 Mars Rover = metaphor for any Decision Science Engine (enabling data-to-decisions = D2D!) http://legacy.samsi.info/200506/astro/presentations/tut1loredo-7.pdf 25 New knowledge and insights are acquired by mining actionable data from all streaming inputs (sensors!) Decisions are based on the new knowledge mined, prior experience, and the “business” decisioning rules embedded within the pipeline (sentinels!) “Smart Sensors” act autonomously in real-time, without human intervention, to learn (= make sense!)

26 Voyager 1 becomes first human-made object to leave solar system http://www.nasa.gov/mission_pages/voyager/ The Movie : http://youtu.be/L4hf8HyP0LI Artist’s concept of the long journey into the abyss of deep space: http://bit.ly/13ecxov The Long Tail of the Data tells the story – – entering a new region of space – – the solar particle flux essentially vanishes! http://bit.ly/QKLHM7 Speed: 61,000 km/hr Distance: 19 billion km from Earth

27 Creating Value from Big Data o Knowledge Discovery – LSST (discovery) o Data-driven Decision Support – Mars Rovers (decisions) Big ROI (Return On Innovation) !!! – Innovative Applications of Big Data everywhere 27

28 Cognitive Analytics: for Insights, Innovation, & Decision Support – based on massive amounts of multi-channel information From Devices…… … Intentions… … Location, weather, and other geographic attributes… … Demographics… 28 © SYNTASA.com 2015

29 Enter... Advanced Analytics Automation! Learning from Data (Data Science): –Outlier / Anomaly / Novelty / Surprise detection –Clustering (= New Class discovery, Segmentation) –Correlation & Association discovery –Classification, Diagnosis, Prediction … for D2D: –Data-to-Discoveries –Data-to-Decisions –Data-to-Dividends (big ROI = Return on Innovation) 29

30 Automating Analytics as-as-Service Based on the SYNTASA.com Marketing Analytics-as-a-Service TM (MAaaS) Your own “Mars Rover in a box” – Your business rules determine the goals, decision points, alerts, and responses. – Moving beyond historical hindsight and oversight (Descriptive & Diagnostic Analytics) to new world of insight and foresight (Predictive & Prescriptive), eventually achieving right sight (Cognitive Analytics = the 360 view, enabling the right action, for the right customer, at the right place, at the right time). Mining multi-channel big data streams (across your organization) Target Marketing, Personalization, and Segmentation (“segment of one”) Decision Automation in a rich content (Big Data) environment 30 Based on Marketing Analytics-as-a-Service TM (MAaaS) from http://www.syntasa.com/ Domain Modeling & Event Response

31 Summary From Sensors to Sense(making) Learning from Data: Data Science (on dynamic data from scientific instruments, web logs, customer interactions, social media, financial systems, machine / engine logs,…) Text Analytics (on structured & unstructured reports) Causal Analysis (from “all data” = the 360 o view = big data) Achieving the 3 V’s of Productizing Big Data: Visualization – Explorability (Telling the data’s story) Value – Exploitability (Using the correct data) Veracity – Provability (Using the data correctly)


Download ppt "Kirk Borne George Mason University, Fairfax, VA ● Dynamic Events in Massive Data Streams, from Astrophysics to Marketing."

Similar presentations


Ads by Google