Principal Data Scientist Booz Allen Hamilton, Strategic Innovation Data for Discovery and Innovation: From Data Management to Digital.


1 Principal Data Scientist Booz Allen Hamilton, Strategic Innovation Group @KirkDBorne Data for Discovery and Innovation: From Data Management to Digital Leadership http://www.boozallen.com/datascience Data Scientist and Astrophysicist http://phys.org/news/2013-10-deluge-big-space.html http://www.dynamiccio.com/2011/11/the-hidden-power-of-knowledge-management.php

2 Ever since we first began to explore our world… http://www.philaprintshop.com/geological.html

3 …We asked questions about the world around us. https://jefflynchdev.wordpress.com/tag/adobe-photoshop-lightroom-3/page/5/

4 So, we have collected evidence (data), which leads to more questions, which leads to more data collection... http://www.kgs.ku.edu/Publications/Bulletins/Sub9/

5 It was the best of times, it was the worst of times… January 1986 – Shuttle Challenger disaster!!! August 1986 – Hubble Space Telescope (HST) was scheduled for launch, but postponed until April 1990. 1986-1990: Time of reflection, re-tooling, improvements, and … a new look at Scientific Data Management

6 It was the best of times, it was the worst of times… 1986-1990: new look at Scientific Data Management! – In the pre-1986 era, NASA managers decided that HST didn’t need a data archive, just a “Data Management Facility” (e.g., that wooden crate holding the Ark of the Covenant at the end of the Indiana Jones movie “Raiders of the Lost Ark”) – After some “lobbying” by HST science managers, the concept of a Hubble Science Data Archive was born! (and Borne! – who eventually became HST Data Archive Project Scientist!)

7 The New Data “Management” Data Reuse for Discovery is the “new normal” The Hubble Data Archive became a widely used research tool for scientists, who conducted “secondary” investigations on the data that were initially collected for some other primary research program. Now, the number of refereed papers for HST science is larger for archival research than for primary observation programs. Another project (on the ground, not in space) > The Sloan Digital Sky Survey carried out a survey of ¼ of the sky. The Sloan project scientists had their own primary science programs, but the real value came in the community re-use of the data – nearly 6000 refereed papers thus far! http://blog.sdss3.org/2013/03/26/sdss-has-now-been-used-by-over-5000-refereed-papers/ Big Science Data = focused on Discovery, not Management!

8 The New Data “Management” Data Reuse for Discovery is the “new normal” The Hubble Data Archive became a widely used research tool for scientists, who conducted “secondary” investigations on the data that were initially collected for some other primary research program. Now, the number of refereed papers for HST science is larger for archival research than for primary observation programs. Another project (on the ground, not in space) > The Sloan Digital Sky Survey carried out a survey of ¼ of the sky. The Sloan project scientists had their own primary science programs, but the real value came in the community re-use of the data – over 6000 refereed papers thus far! http://blog.sdss3.org/2013/03/26/sdss-has-now-been-used-by-over-5000-refereed-papers/ Big Data = focused on Discovery, not Management!

9 The Large Hadron Collider (LHC) ATLAS Experiment – over 1 Petabyte/sec data generated in this huge detector (just one of several LHC experiments) http://atlas.ch/ Hello !

10 LSST = Large Synoptic Survey Telescope http://www.lsst.org/ 8.4-meter diameter primary mirror = 10 square degrees! Hello ! (mirror funded by private donors) –100-200 Petabyte image archive –20-40 Petabyte database catalog

11 LSST = Large Synoptic Survey Telescope http://www.lsst.org/ 8.4-meter diameter primary mirror = 10 square degrees! Hello ! (mirror funded by private donors) Construction began August 1, 2014 –100-200 Petabyte image archive –20-40 Petabyte database catalog

12 LSST in time and space: – When? ~2022-2032 – Where? Cerro Pachon, Chile LSST Key Science Drivers: Mapping the Dynamic Universe – Complete inventory of the Solar System (Near-Earth Objects; killer asteroids???) – Nature of Dark Energy (Cosmology; Supernovae at edge of the known Universe) – Optical transients (10 million daily event notifications sent within 60 seconds) – Digital Milky Way (Dark Matter; Locations and velocities of 20 billion stars!) Architect’s design of LSST Observatory

13 LSST Summary http://www.lsst.org/ 3-Gigapixel camera One 6-Gigabyte image every 20 seconds 30 Terabytes every night for 10 years Repeat images of the entire night sky every 3 nights: Celestial Cinematography 100-200 Petabyte final image data archive anticipated – all data are public!!! 20-40 Petabyte final database catalog anticipated Real-Time Event Mining: ~10 million events per night, every night, for 10 years! – Follow-up observations required to classify these – Which ones should we follow up? … TRIAGE! … Decision Support through Data Science!
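The nightly figures on this slide can be rolled up into survey totals as a quick sanity check. The sketch below is only a back-of-the-envelope calculation, not anything from the LSST project: it assumes observing on all 365 nights of every year (an upper bound that ignores weather and downtime) and decimal units (1 PB = 1000 TB). The function name `survey_totals` is illustrative.

```python
# Figures quoted on the slide
TB_PER_NIGHT = 30              # "30 Terabytes every night"
EVENTS_PER_NIGHT = 10_000_000  # "~10 million events per night"
SURVEY_YEARS = 10              # "for 10 years"

# Assumptions (not stated on the slide)
NIGHTS_PER_YEAR = 365          # upper bound: no allowance for weather or downtime
TB_PER_PB = 1000               # decimal units


def survey_totals(tb_per_night=TB_PER_NIGHT,
                  events_per_night=EVENTS_PER_NIGHT,
                  years=SURVEY_YEARS,
                  nights_per_year=NIGHTS_PER_YEAR):
    """Return (nights, image volume in PB, total events) for the whole survey."""
    nights = years * nights_per_year
    image_pb = tb_per_night * nights / TB_PER_PB
    events = events_per_night * nights
    return nights, image_pb, events


if __name__ == "__main__":
    nights, image_pb, events = survey_totals()
    print(f"{nights} nights, {image_pb:.1f} PB of images, {events:,} events")
```

Under these assumptions the nightly rate accumulates to roughly 110 PB, which sits at the lower end of the 100-200 Petabyte archive anticipated on the slide, and the event stream adds up to tens of billions of candidates to triage.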

14 Visualizing 10 nights of LSST data… The CD Sea in Kilmington, England (600,000 CDs ~ 300 TB)
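The comparison on this slide can be checked with simple arithmetic. The sketch below uses the 30 Terabytes per night quoted on the LSST summary slide and the CD count from this slide; decimal units (1 TB = 1,000,000 MB) are an assumption, and the function names are illustrative.

```python
TB_PER_NIGHT = 30       # from the LSST summary slide
NIGHTS = 10             # "10 nights of LSST data"
CD_COUNT = 600_000      # CDs in the CD Sea, as quoted on this slide
MB_PER_TB = 1_000_000   # assumption: decimal units


def ten_night_volume_tb(tb_per_night=TB_PER_NIGHT, nights=NIGHTS):
    """Total data volume, in TB, collected over the given number of nights."""
    return tb_per_night * nights


def implied_mb_per_cd(total_tb, cd_count=CD_COUNT):
    """Average capacity per CD, in MB, implied by the slide's equivalence."""
    return total_tb * MB_PER_TB / cd_count


if __name__ == "__main__":
    total_tb = ten_night_volume_tb()
    print(f"{total_tb} TB over {NIGHTS} nights")
    print(f"{implied_mb_per_cd(total_tb):.0f} MB per CD")
```

Ten nights at 30 TB each gives the 300 TB quoted on the slide, and spreading that over 600,000 discs implies about 500 MB per CD, a little under the 650-700 MB a standard CD holds, so the equivalence is a reasonable round-number approximation.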

15 The LSST Big Data Challenges

16 But the LSST is not the biggest Big Data Astronomy project being planned …

17 SKA (starting in 2024)

18 SKA = Square Kilometer Array joint project: Australia and South Africa http://www.ska.gov.au/ http://www.extremetech.com/extreme/124561-ibm-to-build-exascale-supercomputer-for-the-worlds-largest-million-antennae-telescope

19 Data Science = Data-Oriented Discovery Scientific experiments can now be run against the data collection. Hypotheses are inferred, questions are posed, experiments are designed & run, results are analyzed, hypotheses are tested & refined! This is the 4th Paradigm of Science This is especially (and correctly) true if the data collection is the “full” data set for a given domain: – astronomical sky surveys, human genome (1000 Genomes Project), large-scale simulations, earth observing system, ocean observatories, social networks, banking, retail, marketing, telecomm, energy exploration, national security, cybersecurity, … and the list goes on and on …

20 The Fourth Paradigm: Data-Intensive Scientific Discovery http://research.microsoft.com/en-us/collaboration/fourthparadigm/ The 4 Scientific Paradigms: 1. Experiment (sensors) 2. Theory (modeling) 3. Simulation (HPC) 4. Data Exploration (KDD)

21 Huge quantities of data are acquired everywhere: Big Data is a big issue in all aspects of life: science, social networks, transportation, business, healthcare, government, national security, media, education, etc.

22 Data Science Use Case examples Retail (Dynamic Pricing, Smart Supply Chain, Precision Demand Forecasting) Marketing (Personalized Real-time Ad Campaigns for Next Best Offer) Smart Highways (monitoring vehicles, weather, road conditions, closures) Precision Traffic (Self-driving & Self-parking Connected Cars) Smart Cities (Growth, Dynamic Street-lighting, Smart Energy Usage) Law Enforcement (Predictive, Prescriptive personnel & resource placements) Healthcare (Wearables, Personalized Medicine, Patient/Provider Monitoring) Online Education (Personalized Learning, Real-time interventions) Forests, Farms, Vineyards,… (Precision Planning, Nurturing, Harvesting) Financial / Banking / Insurance (Real-time Risk Mitigation, Fraud detection) Organizations (Smart Ergonomics, Improved Employee Workflow, Process Mining for Efficiencies) Machines (Early Warning, Prescriptive Maintenance, Smart Obsolescence, Internet of Things IoT, and Industrial IoT) Invisibles (under-the-skin smart sensors – not only measure, but also learn, react, and proactively respond) = The Internet of Emotions!

23 Smart X Precision Y Personalized Z http://blog.autoserviceintelligence.com/a-smart-crm-in-action-building-consumer-trust-part-2 The XYZ of Data Science: (intelligence at the point of data collection)

24 Data Science = A National Imperative
1. National Academies report: Bits of Power: Issues in Global Access to Scientific Data, (1997) http://www.nap.edu/catalog.php?record_id=5504
2. NSF (National Science Foundation) report: Knowledge Lost in Information: Research Directions for Digital Libraries, (2003) downloaded from http://www.sis.pitt.edu/~dlwkshop/report.pdf
3. NSF report: Cyberinfrastructure for Environmental Research and Education, (2003) downloaded from http://www.ncar.ucar.edu/cyber/cyberreport.pdf
4. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century, (2005) downloaded from http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf
5. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda, (2005) downloaded from http://www.cra.org/reports/cyberinfrastructure.pdf
6. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure, (2005) downloaded from http://www.nsf.gov/od/oci/reports/atkins.pdf
7. NSF report: The Role of Academic Libraries in the Digital Data Universe, (2006) downloaded from http://www.arl.org/bm~doc/digdatarpt.pdf
8. National Research Council, National Academies Press report: Learning to Think Spatially, (2006) downloaded from http://www.nap.edu/catalog.php?record_id=11019
9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery, (2007) downloaded from http://www.nsf.gov/od/oci/ci_v5.pdf
10. JISC/NSF Workshop report on Data-Driven Science & Repositories, (2007) downloaded from http://www.sis.pitt.edu/~repwkshop/NSF-JISC-report.pdf
11. DOE report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale, (2007) downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/DOE-Visualization-Report-2007.pdf
12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report, (2008) downloaded from http://www.sc.doe.gov/ascr/ProgramDocuments/Docs/PetascaleDataWorkshopReport.pdf
13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society, (2009) downloaded from http://www.nitrd.gov/about/Harnessing_Power_Web.pdf
14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age, (2009) downloaded from http://www.nap.edu/catalog.php?record_id=12615
15. NSF report: Data-Enabled Science in the Mathematical and Physical Sciences, (2010) downloaded from http://www.cra.org/ccc/docs/reports/DES-report_final.pdf
16. National Big Data Research and Development Initiative, (2012) downloaded from http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
17. National Academies report: Frontiers in Massive Data Analysis, (2013) downloaded from http://www.nap.edu/catalog.php?record_id=18374

25 Creating Value from Big Data: The 3 D2D’s o Knowledge Discovery – Data-to-Discovery (D2D) o Data-driven Decision Support – Data-to-Decisions (D2D) o Big ROI (Return On Innovation) – Data-to-Dollars (D2D) or Data-to-Dividends – Innovative Applications of sense-making from sensors and sentinels everywhere

26 Big Data and Data Science in Education

27 Big Data Science for the Masses: start small, Think Big This is the CD Sea in Kilmington, England (600,000 CDs ~ 300 TB). 1) Big Data in Education Work with data in all learning settings: Informatics (Data Science) enables transparent reuse and analysis of data in inquiry-based classroom learning. Learning is enhanced when students work with real data and information (especially online data) that are related to the topic (any topic) being studied. http://serc.carleton.edu/usingdata/ (“Using Data in the Classroom”) 2) An Education in Big Data Students are specifically trained to: access & query big data repositories; conduct meaningful inquiries into data; mine, visualize, and analyze the data; make objective data-driven inferences, discoveries, and decisions; and communicate “stories” through data. 3) Big Data for Education = Learning Analytics Visualize This: A sea of Data (sea of CDs) We need more Data Scientists in order to discover the unknown unknowns in BIG DATA collections more efficiently and more effectively. “Big Data” are different! “Big Data” is different!

28 Data Literacy For All – A Reading List http://rocketdatascience.org/?p=356 http://www.boozallen.com/datascience

