Principal Data Scientist, Booz Allen Hamilton, Strategic Innovation. Data for Discovery and Innovation: From Data Management to Digital Leadership.

Presentation transcript:

Data for Discovery and Innovation: From Data Management to Digital Leadership
Principal Data Scientist, Booz Allen Hamilton, Strategic Innovation
Data Scientist and Astrophysicist

Ever since we first began to explore our world…

…We asked questions about the world around us.

So, we have collected evidence (data), which leads to more questions, which leads to more data collection...

It was the best of times, it was the worst of times…
– January 1986: Shuttle Challenger disaster!!!
– August 1986: Hubble Space Telescope (HST) was scheduled for launch, but postponed until April 1990
– Time of reflection, re-tooling, improvements, and … a new look at Scientific Data Management

It was the best of times, it was the worst of times… A new look at Scientific Data Management!
– In the pre-1986 era, NASA managers decided that HST didn’t need a data archive, just a “Data Management Facility” (e.g., that wooden crate holding the Ark of the Covenant at the end of the Indiana Jones movie “Raiders of the Lost Ark”)
– After some “lobbying” by HST science managers, the concept of a Hubble Science Data Archive was born! (and Borne! – who eventually became HST Data Archive Project Scientist!)

The New Data “Management”: Data Reuse for Discovery is the “new normal”
– The Hubble Data Archive became a widely used research tool for scientists, who conducted “secondary” investigations on data that were initially collected for some other primary research program. Now, the number of refereed papers for HST science is larger for archival research than for primary observation programs.
– Another project (on the ground, not in space): the Sloan Digital Sky Survey carried out a survey of ¼ of the sky. The Sloan project scientists had their own primary science programs, but the real value came in the community re-use of the data – nearly 6000 refereed papers thus far!
Big Science Data = focused on Discovery, not Management!


The Large Hadron Collider (LHC) ATLAS Experiment – over 1 Petabyte/sec of data generated in this huge detector (just one of several LHC experiments)


LSST = Large Synoptic Survey Telescope
– 8.4-meter diameter primary mirror = 10 square degrees! (mirror funded by private donors)
– Construction began August 1, 2014
– Petabyte image archive
– 20-40 Petabyte database catalog

LSST in time and space:
– When? ~
– Where? Cerro Pachon, Chile
LSST Key Science Drivers: Mapping the Dynamic Universe
– Complete inventory of the Solar System (Near-Earth Objects; killer asteroids???)
– Nature of Dark Energy (Cosmology; Supernovae at edge of the known Universe)
– Optical transients (10 million daily event notifications sent within 60 seconds)
– Digital Milky Way (Dark Matter; Locations and velocities of 20 billion stars!)
(Figure: Architect’s design of LSST Observatory)

LSST Summary
– 3.2 Gigapixel camera
– One 6-Gigabyte image every 20 seconds
– 30 Terabytes every night for 10 years
– Repeat images of the entire night sky every 3 nights: Celestial Cinematography
– Petabyte final image data archive anticipated – all data are public!!!
– Petabyte final database catalog anticipated
– Real-Time Event Mining: ~10 million events per night, every night, for 10 years!
– Follow-up observations are required to classify these
– Which ones should we follow up? … TRIAGE! … Decision Support through Data Science!
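To get a feel for the scale behind these rates, here is a minimal back-of-envelope sketch. It uses only the figures stated on the slide (30 Terabytes per night, ~10 million events per night, 10 years). The assumption of 365 observing nights per year is mine, not the slide's, so the totals should be read as rough upper bounds; the function name `survey_totals` is purely illustrative.

```python
# Back-of-envelope totals from the rates quoted on the LSST summary slide.
# Assumption (not in the slides): the telescope observes every night of the
# year, so these numbers are upper bounds. Decimal units: 1 PB = 1000 TB.

TB_PER_NIGHT = 30               # from the slide
EVENTS_PER_NIGHT = 10_000_000   # from the slide (~10 million)
SURVEY_YEARS = 10               # from the slide
NIGHTS_PER_YEAR = 365           # assumption: no weather or maintenance downtime


def survey_totals(tb_per_night=TB_PER_NIGHT,
                  events_per_night=EVENTS_PER_NIGHT,
                  years=SURVEY_YEARS,
                  nights_per_year=NIGHTS_PER_YEAR):
    """Return (number of nights, total petabytes, total events)."""
    nights = years * nights_per_year
    total_pb = tb_per_night * nights / 1000
    total_events = events_per_night * nights
    return nights, total_pb, total_events


if __name__ == "__main__":
    nights, total_pb, total_events = survey_totals()
    print(f"Observing nights: {nights}")
    print(f"Raw data volume:  {total_pb} PB")
    print(f"Events to triage: {total_events:,}")
```

Even under these simple assumptions, the event count lands in the tens of billions, which is why the slide frames follow-up as a triage problem rather than something to be handled by hand.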

Visualizing 10 nights of LSST data… The CD Sea in Kilmington, England (600,000 CDs ~ 300 TB)
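The comparison can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and takes the 30 TB per night figure from the LSST summary slide; the helper names are hypothetical. It shows that the slide's equivalence implies a round figure of about 500 MB per CD, somewhat below the 650 to 700 MB capacity of a standard CD.

```python
# Sanity check of "600,000 CDs ~ 300 TB ~ 10 nights of LSST data".
# Assumptions: decimal units (1 TB = 1_000_000 MB); 30 TB per night is the
# rate quoted on the LSST summary slide. Function names are illustrative.

def implied_mb_per_cd(total_tb, n_cds):
    """Megabytes each CD must hold for n_cds to add up to total_tb."""
    return total_tb * 1_000_000 / n_cds


def nights_of_data(total_tb, tb_per_night):
    """Number of observing nights needed to produce total_tb."""
    return total_tb / tb_per_night


if __name__ == "__main__":
    print("Implied capacity per CD (MB):", implied_mb_per_cd(300, 600_000))
    print("Nights of LSST data:", nights_of_data(300, 30))
```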

The LSST Big Data Challenges

But the LSST is not the biggest Big Data Astronomy project being planned …

SKA (starting in 2024)

SKA = Square Kilometer Array, a joint project of Australia and South Africa

Data Science = Data-Oriented Discovery
Scientific experiments can now be run against the data collection. Hypotheses are inferred, questions are posed, experiments are designed & run, results are analyzed, hypotheses are tested & refined!
This is the 4th Paradigm of Science.
This is especially (and correctly) true if the data collection is the “full” data set for a given domain:
– astronomical sky surveys, human genome (1000 Genomes Project), large-scale simulations, earth observing system, ocean observatories, social networks, banking, retail, marketing, telecomm, energy exploration, national security, cybersecurity, … and the list goes on and on …

The Fourth Paradigm: Data-Intensive Scientific Discovery
The 4 Scientific Paradigms:
1. Experiment (sensors)
2. Theory (modeling)
3. Simulation (HPC)
4. Data Exploration (KDD)

Huge quantities of data are acquired everywhere: Big Data is a big issue in all aspects of life: science, social networks, transportation, business, healthcare, government, national security, media, education, etc.

Data Science Use Case examples
– Retail (Dynamic Pricing, Smart Supply Chain, Precision Demand Forecasting)
– Marketing (Personalized Real-time Ad Campaigns for Next Best Offer)
– Smart Highways (monitoring vehicles, weather, road conditions, closures)
– Precision Traffic (Self-driving & Self-parking Connected Cars)
– Smart Cities (Growth, Dynamic Street-lighting, Smart Energy Usage)
– Law Enforcement (Predictive, Prescriptive personnel & resource placements)
– Healthcare (Wearables, Personalized Medicine, Patient/Provider Monitoring)
– Online Education (Personalized Learning, Real-time interventions)
– Forests, Farms, Vineyards, … (Precision Planning, Nurturing, Harvesting)
– Financial / Banking / Insurance (Real-time Risk Mitigation, Fraud detection)
– Organizations (Smart Ergonomics, Improved Employee Workflow, Process Mining for Efficiencies)
– Machines (Early Warning, Prescriptive Maintenance, Smart Obsolescence, Internet of Things IoT, and Industrial IoT)
– Invisibles (under-the-skin smart sensors – not only measure, but also learn, react, and proactively respond) = The Internet of Emotions!

The XYZ of Data Science: Smart X, Precision Y, Personalized Z (intelligence at the point of data collection)

Data Science = A National Imperative
1. National Academies report: Bits of Power: Issues in Global Access to Scientific Data (1997)
2. NSF (National Science Foundation) report: Knowledge Lost in Information: Research Directions for Digital Libraries (2003)
3. NSF report: Cyberinfrastructure for Environmental Research and Education (2003)
4. NSB (National Science Board) report: Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century (2005)
5. NSF report with the Computing Research Association: Cyberinfrastructure for Education and Learning for the Future: A Vision and Research Agenda (2005)
6. NSF Atkins Report: Revolutionizing Science & Engineering Through Cyberinfrastructure: Report of the NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (2005)
7. NSF report: The Role of Academic Libraries in the Digital Data Universe (2006)
8. National Research Council, National Academies Press report: Learning to Think Spatially (2006)
9. NSF report: Cyberinfrastructure Vision for 21st Century Discovery (2007)
10. JISC/NSF Workshop report on Data-Driven Science & Repositories (2007)
11. DOE report: Visualization and Knowledge Discovery: Report from the DOE/ASCR Workshop on Visual Analysis and Data Exploration at Extreme Scale (2007)
12. DOE report: Mathematics for Analysis of Petascale Data Workshop Report (2008)
13. NSTC Interagency Working Group on Digital Data report: Harnessing the Power of Digital Data for Science and Society (2009)
14. National Academies report: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age (2009)
15. NSF report: Data-Enabled Science in the Mathematical and Physical Sciences (2010)
16. National Big Data Research and Development Initiative (2012)
17. National Academies report: Frontiers in Massive Data Analysis (2013)

Creating Value from Big Data: The 3 D2D’s
o Knowledge Discovery – Data-to-Discovery (D2D)
o Data-driven Decision Support – Data-to-Decisions (D2D)
o Big ROI (Return On Innovation) – Data-to-Dollars (D2D) or Data-to-Dividends – Innovative Applications of sense-making from sensors and sentinels everywhere

Big Data and Data Science in Education

Big Data Science for the Masses: start small, Think Big
1) Big Data in Education
Work with data in all learning settings: Informatics (Data Science) enables transparent reuse and analysis of data in inquiry-based classroom learning. Learning is enhanced when students work with real data and information (especially online data) that are related to the topic (any topic) being studied. (“Using Data in the Classroom”)
2) An Education in Big Data
Students are specifically trained to: access & query big data repositories; conduct meaningful inquiries into data; mine, visualize, and analyze the data; make objective data-driven inferences, discoveries, and decisions; and communicate “stories” through data.
3) Big Data for Education = Learning Analytics
Visualize This: A sea of Data (sea of CDs). This is the CD Sea in Kilmington, England (600,000 CDs ~ 300 TB).
We need more Data Scientists in order to discover the unknown unknowns in BIG DATA collections more efficiently and more effectively.
“Big Data” are different! “Big Data” is different!

Data Literacy For All – A Reading List