
National Aeronautics and Space Administration
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California

Facilitating Distributed Climate Modeling Research and Analysis via the Climate Data eXchange
18 September 2008, GO-ESSP 2008 Workshop
Dan Crichton, Chris Mattmann, Amy Braverman

NASA's Satellite Data and Climate Research

Two major legacies from NASA's Earth Observing System Data and Information System (EOSDIS):
– Archiving of an explosion in observational data at the Distributed Active Archive Centers (DAACs); request-driven retrieval from the archive is time consuming
– Adoption of the Hierarchical Data Format (HDF) for data files; each format is defined by and unique to its instrument, and is not necessarily consistent between instruments (see the sketch below)

What are the next steps to accelerating use of an ever-increasing observational data collection?
– What data are available?
– What is the information content?
– How should it be interpreted in climate modeling research?
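A minimal sketch of the inconsistency problem the slide describes: two instruments store the same geophysical quantity under different internal dataset names, so any cross-instrument analysis starts with per-instrument plumbing. The dataset paths below are illustrative placeholders, not the real AIRS or MLS product layouts, and the sketch assumes HDF5 granules readable with h5py (EOS-era HDF4 products would need pyhdf instead).

```python
import h5py

# Hypothetical per-instrument dataset paths for "water vapor"; real
# products use different, instrument-specific layouts.
VAPOR_PATHS = {
    "AIRS": "HDFEOS/SWATHS/Retrieval/Data Fields/H2O_MMR",  # illustrative
    "MLS":  "HDFEOS/SWATHS/H2O/Data Fields/L2gpValue",      # illustrative
}

def read_vapor(filename, instrument):
    """Return the water-vapor array from a granule, hiding the
    instrument-specific dataset path from the caller."""
    with h5py.File(filename, "r") as f:
        return f[VAPOR_PATHS[instrument]][...]
```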

EOSDIS DAACs: Earth Observing System Data and Information System Distributed Active Archive Centers (figure slide)

Data Processing Levels (figure slide)

Researcher's Challenge

Scientists cannot easily locate, access, or manipulate the observational data or model output needed to support climate research:
– The latest data live in independent instrument project data systems.
– Scientists may not even be aware of what repositories or data exist.
– Observational data and model output are heterogeneous in form and cannot simply be compared or combined.

Research data systems are often ad hoc:
– They lack a modular approach, which limits extensibility.
– They are designed individually rather than as a system.
– Few capabilities are shared between systems.

They require a "human in the loop":
– Web forms and manual FTP transfers (see the sketch below).
– Rectification is left to individual scientists.
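To make the "human-in-the-loop" point concrete, here is the kind of one-off transfer script each scientist ends up writing against one archive's FTP layout: a bulk pull of a static product, with no subsetting and no rectification. The host and paths are placeholders, not a real DAAC endpoint.

```python
from ftplib import FTP

def fetch_granule(host, remote_path, local_path):
    """Pull one static product over anonymous FTP; the scientist still
    has to regrid, subset, and interpret it afterward."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)

# fetch_granule("ftp.example-daac.nasa.gov",
#               "/allData/airs/granule.hdf", "granule.hdf")
```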

Current Data System

– The system serves static data products.
– Users must find, move, and manipulate all data themselves.
– Users must change spatial and temporal resolutions to match.
– Users must understand instrument observation strategies and subtleties in order to interpret the data.

Experience in Planetary Science: NASA's PDS

Pre-October 2002, there was no unified view across the distributed operational planetary science data repositories:
– Science data distributed across the country
– Science data distributed on physical media

The planetary data archive grew from 4 TB in 2001 to 100+ TB in 2008:
– Traditional distribution became infeasible due to cost and system constraints
– Mars Odyssey data could not be distributed using the traditional method

PDS now has a distributed, federated framework in place (see the sketch below):
– Supports online distribution of science data to planetary scientists
– Enables interoperability between nine institutions
– Supports real-time access to distributed catalogs and repositories
– Provides uniform software interfaces to all PDS data holdings, enabling scientists and developers to link in their own tools
– Moving toward international standardization with the International Planetary Data Alliance
– Operational October 1, 2002, in time for Mars Odyssey
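A hedged sketch of the federation idea: one client-side interface fans a query out to each institution's catalog service and merges the results, so callers never address the nine nodes individually. The node URLs, query parameters, and JSON response shape are assumptions for illustration; they do not describe the actual PDS interfaces.

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

PDS_NODES = [  # hypothetical catalog endpoints, one per institution
    "https://geo.example.edu/pds/query",
    "https://img.example.gov/pds/query",
]

def federated_query(target, instrument):
    """Query every node's catalog and return the merged product list."""
    params = urlencode({"target": target, "instrument": instrument})
    results = []
    for node in PDS_NODES:
        with urlopen(f"{node}?{params}") as resp:
            results.extend(json.load(resp))  # assume a JSON list of products
    return results
```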

Experience in Cancer Research: NCI's EDRN

Experience in science information systems has led to interagency agreements with both NIH and NCI.

Provided the NCI with a bioinformatics infrastructure for establishing a virtual knowledge system:
– Currently deployed at 15 of 31 NCI research institutions for the Early Detection Research Network (EDRN)
– Providing real-time access to distributed, heterogeneous databases
– Capturing validation study results, instrument results, images, biomarkers, protocols, etc.
– Funded through NCI's Early Detection Research Network

Currently working with a new initiative to establish an "informatics plan" for the Clinical Proteomics Technology Initiative (Cancer Biomarkers Group, Division of Cancer Prevention).

CDX

What: build open source software to
– connect existing systems into a virtual network (one "big disk"),
– push as much computation as possible to remote nodes to minimize movement of data (see the sketch below),
– provide operators that rectify and fuse heterogeneous data sets and supply uncertainties.

Why: scientists need command-line access to data sets (model output and observations) such that all data look local and rectified.

How: use existing technologies in new ways
– distributed computing technologies already in place at JPL (OODT, others), and the Earth System Grid for parallel transfer,
– rigorous mathematical/statistical methods for interpolation, transformation, fusion, and comparison; comparisons require new methods developed specifically for massive, distributed data sets, and uncertainties are key.

Why is this different:
– the system will capture the intellectual capital of instrument scientists and modelers through multiple, flexible operators,
– it is NOT trying to be all things to all people!
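A minimal sketch in the spirit of "push computation to the data": rather than pulling a whole granule, the client asks a remote CDX-like node to subset and average server-side, so only the small result crosses the network. The endpoint, parameters, and response format here are hypothetical, not CDX's actual service interface.

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

def remote_mean(node_url, dataset, variable, bbox, t0, t1):
    """Ask the remote node for the space-time mean of one variable over
    a bounding box and time window; a single number comes back, not
    the underlying data."""
    params = urlencode({
        "op": "mean", "dataset": dataset, "variable": variable,
        "bbox": ",".join(map(str, bbox)), "start": t0, "end": t1,
    })
    with urlopen(f"{node_url}/compute?{params}") as resp:
        return json.load(resp)["value"]

# Hypothetical usage:
# remote_mean("https://cdx-node.example.nasa.gov", "AIRS_L3", "H2O_MMR",
#             (-180, -90, 180, 90), "2007-01-01", "2007-12-31")
```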

Climate Data eXchange Research Flow (figure slide)

Climate Data eXchange Architecture (figure slide)

Conclusions

CDX is a paradigm shift in data access, delivery, and analysis systems:
– Data analysis should not be decoupled from access and delivery
– The system should support interactive analysis

A distributed computing (e.g., web services) architecture is key:
– Supports remote query, access, and computation
– Not tied to any particular implementation
– ESG is a success story for access and delivery
– A partnership between JPL and LLNL will extend that success to interactive, distributed data analysis

JPL will develop, deploy, and test V1.0 of CDX over the next 18 months:
– NASA has funded construction of a JPL ESG data node
– Critical components are proposed for internal support at JPL to enable model evaluations, validation, and projections

Feedback, suggestions, and collaborations on the path forward are welcome.

Backup

Climate Research Use Case

What is the radiative effect of the vertical distribution of water vapor in the atmosphere under clear-sky conditions?
– Warming by water vapor radiating back to the surface could increase evaporation and accelerate the "greenhouse effect" (a positive feedback)

Climate model representations of water vapor distributions can be investigated and validated by comparison against both AIRS and MLS measurements of water vapor:
– AIRS provides water vapor measurements up to 200 mb (15 km)
– MLS provides water vapor measurements from 300 mb to 100 mb (8 km to 18 km)
– AIRS and MLS sample different states: each can measure vapor in clear scenes, but under cloudy conditions they have different biases
– These data must be combined to get the full picture (see the sketches below)
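A hedged sketch of the first step in such a comparison: interpolating AIRS and MLS water-vapor profiles onto a common set of pressure levels in their region of overlap (roughly 300 mb to 200 mb, per the ranges above). The profile arrays and level choices are illustrative inputs, not real retrievals.

```python
import numpy as np

def to_common_grid(pressure_mb, vapor, common_levels_mb):
    """Interpolate one retrieved profile onto shared pressure levels.
    np.interp requires increasing sample points, so sort by pressure."""
    p = np.asarray(pressure_mb, dtype=float)
    v = np.asarray(vapor, dtype=float)
    order = np.argsort(p)
    return np.interp(common_levels_mb, p[order], v[order])

# Illustrative overlap levels where both instruments measure, in mb:
common = np.array([300.0, 250.0, 200.0])
# airs_on_common = to_common_grid(airs_p, airs_h2o, common)
# mls_on_common  = to_common_grid(mls_p,  mls_h2o,  common)
```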

Combining Instrument Data to Enable Climate Research: AIRS and MLS

Combining AIRS and MLS requires:
– Rectifying horizontal, vertical, and temporal mismatch
– Assessing and correcting for the instruments' scene-specific error characteristics (diagram omitted from transcript); a simple fusion recipe follows
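As one simple way to fuse the two rectified measurements while carrying uncertainty along, consider an inverse-variance weighted mean: each co-located value is weighted by the inverse of its scene-specific error variance. This is a generic statistical recipe sketched for illustration, not the specific estimator CDX proposes.

```python
import numpy as np

def fuse(x_airs, var_airs, x_mls, var_mls):
    """Combine co-located AIRS and MLS values, weighting each by the
    inverse of its error variance; also return the variance of the
    fused estimate. Works elementwise on arrays or on scalars."""
    w_a = 1.0 / np.asarray(var_airs, dtype=float)
    w_m = 1.0 / np.asarray(var_mls, dtype=float)
    fused = (w_a * x_airs + w_m * x_mls) / (w_a + w_m)
    fused_var = 1.0 / (w_a + w_m)
    return fused, fused_var
```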