Earth Data Science Planning Meeting #1 February 20, 2013.

Presentation transcript:

Earth Data Science Planning Meeting #1 February 20, 2013

Data Science: An Emerging Discipline for Analyzing Massive Data
Data Science is the intersection between data analysis, statistics, computer science, software engineering, and discipline science for the purposes of learning from massive data
Recent Activities
– GC&E Investment in CDX resulted in $12M+ of new business in data system technology
– Applied to key 8x science initiatives (IPCC, CO2 data record, etc)
– JPL leadership in the Earth System Grid
– JPL participated in the NRC Massive Data Analysis study
– Working with the NSF SAMSI on a massive data research study for earth science for
– Acquired new business from DOE, DARPA, NOAA, NSF, and NASA
Plans & Expectations
– Establish a multi-year roadmap for earth data science research and technology for massive data
– Shift technology research from serving data products to providing online, analytic data services
– Hold a 3rd IT for Climate Research Workshop [funded through ROSES]
– Work with HQ Program Managers (e.g., CMAC, AIST, etc) on formulating ROSES calls around data science
– Establish Lab for Earth Data Science that integrates infrastructure, tools, and methods
Needs
– Align science and mission roadmaps with emerging data science approaches for massive data
– Support a FY13 advanced study to generate the roadmap and business plan for a JPL data science program

Task | Program | Awarded
Coastline Marine Discovery | NASA ACCESS | $583K
Likelihood-based Quantification of Agreement Between Climate Model Output and NASA Data Records | NASA ESDRERR | $1,300K
Water Resource Management | NASA ARRA | $2,000K
Virtual Oceanographic Data Center (VODC) | NASA ACCESS | $650K
Facilitate integration of NASA and ESG | NASA IPP | $250K
NASA NCA/RCMES | NASA | $1,200K
Multivariate Data Fusion & Uncertainty Quantification for Remote Sensing | NASA AIST | $1,500K
RCMES | NASA AIST | $400K
ESG/RCMES Integration | NASA CMAC | $650K
Collaborative Climate Model & Observational Data Services | NASA ACCESS | $700K
CO2 Virtual Science Environment | NASA OCO Mission | $750K
Development of the ESG | DOE | $250K
NOAA Earth Science Grid Integration | NOAA IAA | $450K
DARPA Big Data | DARPA | $5,000K
Total | | $15,683K

Data Science Computing Stack
[Diagram: layered computing stack. Layers: Science Data Processing, Storage/Management of Scientific Data; Data Access/Usage – Data Search, Query, Retrieval; Tools/Services/Methods for Massive Earth Data Analysis; Decision Support Tools and Applications. Annotations: competency; Needed competency; Data Science Gap; Mission/Instrument SDS’s, ESG, PO.DAAC, etc; Computational and Data Services for Earth Data]

Traditional NASA Earth Science Pipeline
For JPL Internal Use Only
[Diagram. Elements shown: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Science Data Systems (Science Data Processing: L0B, L1, L2, L3, L4); EOSDIS Data Centers (Science Data Management, Archive & Distribution); Science Teams; Research; Outreach]

Addressing the Data Science Gap
– In a decade (…many say much sooner), serving scientific data to the community is not going to be sufficient
  – Distribution and volume of data will make this obsolete
  – Technology will allow for on-demand analysis leveraging enormous computational and data infrastructures
  – Science research will depend on these infrastructures for “services”
– Online analytic services will create a competitive advantage in the area of “data science”
  – The next generation of scientists are growing up with online services; they will go where the services are
  – Those with the service will be branded as the leaders
  – We should put in place the on-demand analytic capabilities for these measurements

Shifting To Online Data Analysis Paradigms
[Diagram. Elements shown: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Science Data Systems (Science Data Processing: L0B, L1, L2, L3, L4); EOSDIS Data Centers (Science Data Management, Archive & Distribution); Science Teams; Outreach; Applications; Analysis, Modeling and Application Environments/Gateways; Decision Support; Research]

The Big Picture: Enabling Multi-Disciplinary Analysis through a Systematic Approach
An opportunity to improve the efficiency of data analysis for the world-wide science community
[Diagram. Labels: Generate, Capture, Analyze; Observational Data; Predictive Models, Understanding; Earth Science Projects; Planetary Sciences Projects; Radio Astronomy Projects; Other Disciplines; Compare; Science]

Challenges: Moving Towards Exabyte Data Analysis for Earth Science
– Growing, distributed, massive record of observational and climate model output
  – CMIP3: ~34 Terabytes
  – CMIP5: ~3 Petabytes
  – CMIP6: 350 PBs – 3 Exabytes (per D. Williams and 2011 Climate Knowledge Discovery Workshop)
– A new paradigm is required to shift focus from data access and independent data analysis to online analysis services for highly distributed, heterogeneous data to
  – Fuse data together for long-term records
  – Compute higher order data products on request
  – Analyze distributed data (e.g., climate model output, satellite data, etc) with distributed computation
  – Establish a scalable computing infrastructure for missions and science projects
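A rough calculation can make the case against copying whole archives to each analyst concrete. The sketch below is illustrative only: the archive sizes are the ones quoted on the slide, while the 10 Gbit/s sustained link rate and the decimal unit definitions are assumptions introduced here, not figures from the presentation.

```python
# Archive sizes quoted on the slide, in bytes.
# ASSUMPTION: decimal units (1 TB = 10**12 bytes).
TB, PB, EB = 10**12, 10**15, 10**18
ARCHIVES = {
    "CMIP3": 34 * TB,
    "CMIP5": 3 * PB,
    "CMIP6 (low)": 350 * PB,
    "CMIP6 (high)": 3 * EB,
}

# ASSUMPTION: a hypothetical sustained 10 Gbit/s link; not a figure from the slides.
LINK_BITS_PER_SECOND = 10 * 10**9


def transfer_days(size_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Days needed to copy size_bytes over a link with the given sustained rate."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86_400


if __name__ == "__main__":
    for name, size in ARCHIVES.items():
        print(f"{name:13s} {transfer_days(size):12.1f} days")
```

Under these assumptions a full copy grows from hours to years across the three archive generations, which is one way to read the slide's argument for moving the analysis to the data.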

Example: Data challenge of CMIP3 archive vs. CMIP5 archive
Source: D. Williams, LLNL Climate SFA Review, 9/5/12

CMIP3 modeling center | Country | Volume (GB)
BCCR | Norway | 862
CCCma | Canada | 2,071
CNRM | France | 999
CSIRO | Australia | 2,088
GFDL | USA | 3,843
GISS | USA | 1,097
IAP | China | 2,868
INGV | Italy | 1,472
INMCM3 | Russia | 368
IPSL | France | 998
MIROC3 | Japan | 3,975
MIUB | Germany/Korea | 477
MPI | Germany | 2,700
MRI | Japan | 1,025
NCAR | USA | 9,173
UKMO | UK | 973
Total | | 34,989 GB (about 35 TB)
Archive size: 35 TB

CMIP5 modeling center | Country | Volume (TB)
BCC | China | 51
CCCma | Canada | 51
CMCC | Europe (Italy) | 158
CNRM | France | 71
CSIRO | Australia | 81
EC-EARTH | Europe (Netherlands) | 97
GCESS | China | 24
INM | Russia | 30
IPSL | France | 121
LASG | China | 100
MIROC | Japan | 350
MOHC | UK | 195
MPI | Germany | 166
MRI | Japan | 269
NASA | USA | 375
NCAR | USA | 739
NCC | Norway | 32
NCEP | USA | 26
NIMR/KMA | Korea | 14
NOAA GFDL | USA | 158
Total | | 3,108 TB (about 3.1 PB)
Archive size: currently 1.4 PB; total 3.1 PB by 2013

CMIP5/CMIP3 = 10^2
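The totals and the "10^2" ratio on this slide can be checked from the per-center volumes. The following is a small sketch, assuming decimal units (1 TB = 1000 GB); the dictionary and variable names are illustrative and the numbers are transcribed from the slide.

```python
import math

# Per-center volumes as listed on the slide.
CMIP3_GB = {
    "BCCR": 862, "CCCma": 2071, "CNRM": 999, "CSIRO": 2088,
    "GFDL": 3843, "GISS": 1097, "IAP": 2868, "INGV": 1472,
    "INMCM3": 368, "IPSL": 998, "MIROC3": 3975, "MIUB": 477,
    "MPI": 2700, "MRI": 1025, "NCAR": 9173, "UKMO": 973,
}
CMIP5_TB = {
    "BCC": 51, "CCCma": 51, "CMCC": 158, "CNRM": 71, "CSIRO": 81,
    "EC-EARTH": 97, "GCESS": 24, "INM": 30, "IPSL": 121, "LASG": 100,
    "MIROC": 350, "MOHC": 195, "MPI": 166, "MRI": 269, "NASA": 375,
    "NCAR": 739, "NCC": 32, "NCEP": 26, "NIMR/KMA": 14, "NOAA GFDL": 158,
}

GB_PER_TB = 1000  # ASSUMPTION: decimal units, not 1024

cmip3_total_gb = sum(CMIP3_GB.values())
cmip5_total_tb = sum(CMIP5_TB.values())

# Express both archives in TB before comparing them.
ratio = cmip5_total_tb / (cmip3_total_gb / GB_PER_TB)
order_of_magnitude = round(math.log10(ratio))

print(f"CMIP3 total: {cmip3_total_gb} GB")
print(f"CMIP5 total: {cmip5_total_tb} TB")
print(f"CMIP5/CMIP3: {ratio:.1f} (about 10^{order_of_magnitude})")
```

The computed ratio is a little under 100, so the slide's "10^2" is best read as an order-of-magnitude statement rather than an exact factor.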

ESGF: IPCC CMIP5 Data System Credit: D. Williams, LLNL Climate SFA Review

Emerging Technologies
– Big Data Analytics/Computation (Hadoop)
– Cloud Computing
– Virtual Systems
– Distributed Computing
– Data Mining
– Large-scale Data Management

Scaling the Analysis
[Diagram. Elements shown: Data Acquisition and Command (Mission Operations, TDRS Network, On Board Processing); Instrument Operations; EDOS/Ground Data Systems (L0A Processing); Network w/ Cloud Storage & Computation; Science Data Processing; Science Data Manage; NASA Mission/Multi-Mission Data & Science Centers; Other Data Systems (e.g. NOAA); Applications; Analysis, Modeling and Application Environments/Gateways; Decision Support; Research; Science Teams]

Recent Examples
VISION
– ESGF provides support for online sharing of data, not for online analytic services
– CMIP5 via Earth System Grid Federation – International data sharing (including observations)

Data Fusion for Remote Sensing Data
– Fixed-Rank Filtering - Cressie, N., Shi, T., and Kang, E.L. (2010)
– Multiple process multiple source spatial/spatio-temporal data fusion - Nguyen, H., Cressie, N., and Braverman, A. (2012), and Nguyen, H., Katzfuss, M., Cressie, N., and Braverman, A. (2012)

Example Research Questions
– What architectural design produces the most efficient system topology for the types of data movement that will be required given scientific objectives? Can we study this as an optimization problem?
– How do we design computational methods that exploit the system topology and its distributed nature? Need algorithms that operate on distributed data to produce statistics of interest, or approximations. Study this trade-off.
– Data analysis choreography: how to assemble algorithms most efficiently given a set of analysis goals? How to optimize the movement of data?
– How can statistics and other disciplines (e.g., computer science) education be better aligned? Statisticians and computer scientists need to work together to plan how system architectures can enable analysis of highly distributed data.
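One concrete instance of an algorithm that operates on distributed data to produce a statistic of interest is a mergeable summary: each site reduces its own data to a few numbers, and only those numbers travel. The sketch below is an illustration, not a method taken from the presentation. It uses the well-known pairwise combining formula for mean and variance (Chan, Golub and LeVeque); the `Summary` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Summary:
    """Sufficient statistics for the mean and variance of one data partition."""
    n: int        # number of values
    mean: float   # partition mean
    m2: float     # sum of squared deviations from the partition mean


def summarize(values: Sequence[float]) -> Summary:
    """Runs locally at the site that holds the data; raw values never leave."""
    n = len(values)
    if n == 0:
        return Summary(0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Summary(n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partition summaries into the summary of their union."""
    n = a.n + b.n
    if n == 0:
        return a
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def combine(parts: Iterable[Summary]) -> Summary:
    """Fold any number of site summaries into one global summary."""
    total = Summary(0, 0.0, 0.0)
    for part in parts:
        total = merge(total, part)
    return total


def variance(s: Summary) -> float:
    """Population variance of the combined data."""
    return s.m2 / s.n if s.n else float("nan")


if __name__ == "__main__":
    # Three hypothetical sites, each holding part of a record.
    sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    total = combine(summarize(site) for site in sites)
    print(total.n, total.mean, variance(total))
```

Mean and variance merge exactly, so nothing is lost by keeping the data in place. Statistics such as medians or quantiles generally do not merge this way and need approximate summaries, which is the trade-off the slide proposes to study.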

Earth Data Science Study
– 8X retreat discussion
– Recommendation to form a working group to explore the opportunities
  – Focus initially on Earth Science, but open to other areas
– Convene a cross-disciplinary group

Study Objective (1)
– Evaluation of the business case of targeting “data science” as a technology growth area in earth data systems research
– Identification of near-term science questions/challenges to address
– Identification of Data Science vs. Big Data synergies and differences
– Development of a capabilities roadmap
– Current state of JPL vs competitors
– Required staffing needs and gaps

Study Objective (2)
– Key partnerships
– Necessary facilities support vs. current state
– Recommendations on how to structure a long-term program
– Identify opportunities to work NASA ESD Program and propose

Study Team
– Michael Gunson, Earth Science
– Duane Waliser, Earth Science
– Joe Lazio, Astronomy/Science
– Amy Braverman, Statistics and Data Science
– Becky Castano, Machine Learning/AI
– David Thompson, Machine Learning/AI
– Robert Granat, Machine Learning/AI
– Michael Turmon, Machine Learning/AI
– Liz Kay-Im, Data Systems
– Chris Mattmann, IT Data Systems
– Tom Soderstrom, OCIO IT Chief Technologist
– Jason Hyon, 8X Chief Technologist
– Dan Crichton, IT Data Systems
– Emily Law, IT Data Systems

A Few Plans
– ESTO White Paper on Data Science for Earth Science
– 3rd IT for Climate Research Workshop
– ACCESS proposals due in June
– AIST proposals in 2014 (bigger target)
– We’ve proposed to ESTO that they should fund an architecture study to address data science

Discussion

Future Meetings