Earth System Grid: Model Data Distribution & Server-Side Analysis to Enable Intercomparison Projects PCMDI Software Team UCRL-PRES-226277.

Slides:



Advertisements
Similar presentations
NG-CHC Northern Gulf Coastal Hazards Collaboratory Simulation Experiment Integration Sandra Harper 1, Manil Maskey 1, Sara Graves 1, Sabin Basyal 1, Jian.
Advertisements

A. Sim, CRD, L B N L 1 ANI and Magellan Launch, Nov. 18, 2009 Climate 100: Scaling the Earth System Grid to 100Gbps Networks Alex Sim, CRD, LBNL Dean N.
Earth System Curator Spanning the Gap Between Models and Datasets.
CLIMATE SCIENTISTS’ BIG CHALLENGE: REPRODUCIBILITY USING BIG DATA Kyo Lee, Chris Mattmann, and RCMES team Jet Propulsion Laboratory (JPL), Caltech.
The Earth System Grid Discovery and Semantic Web Technologies Line Pouchard Oak Ridge National Laboratory Luca Cinquini, Gary Strand National Center for.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Information Technology for Ocean Observations and Climate Research TYKKI Workshop, December 9-11, 1998, Tokyo, Japan Nancy N. Soreide NOAA Pacific Marine.
TPAC Digital Library Talk Overview Presenter:Glenn Hyland Tasmanian Partnership for Advanced Computing & Australian Antarctic Division Outline: TPAC Overview.
Tools for accessing distributed in-situ data collections Donald W. Denbo, NOAA/PMEL-JISAO Jason E. Fabritz, NOAA/PMEL-JISAO Bernard J. Kilonsky, Sea Level.
Z EGU Integration of external metadata into the Earth System Grid Federation (ESGF) K. Berger 1, G. Levavasseur 2, M. Stockhause 1, and M. Lautenschlager.
CCSM Portal/ESG/ESGC Integration (a PY5 GIG project) Lan Zhao, Carol X. Song Rosen Center for Advanced Computing Purdue University With contributions by:
AIRNow-International The future of the United States real-time air quality reporting and forecasting program and GEOSS participation John E. White U.S.
Research Data at NCAR 1 August, 2002 Steven Worley Scientific Computing Division Data Support Section.
Presented by The Earth System Grid: Turning Climate Datasets into Community Resources David E. Bernholdt, ORNL on behalf of the Earth System Grid team.
Updates from EOSDIS -- as they relate to LANCE Kevin Murphy LANCE UWG, 23rd September
A Vision Fulfilled Transformative Science, Data Infrastructures and the IPCC Experience Lawrence Buja National Center for Atmospheric Research Boulder,
Geospatial Systems Architecture Todd Bacastow. GIS Evolution
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
Mid-Course Review: NetCDF in the Current Proposal Period Russ Rew
The Earth System Grid (ESG) Goals, Objectives and Strategies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
1 Use of SRMs in Earth System Grid Arie Shoshani Alex Sim Lawrence Berkeley National Laboratory.
AR5 Data and Product Access Architecture Concepts for Discussion Steve Hankin (NOAA/PMEL) (Not including metadata architecture or security)
Integrated Model Data Management S.Hankin ESMF July ‘04 Integrated data management in the ESMF (ESME) Steve Hankin (NOAA/PMEL & IOOS/DMAC) ESMF Team meeting.
Fisheries Oceanography Collaboration Software Donald Denbo NOAA/PMEL-UW/JISAO Presented by Nancy Soreide NOAA/PMEL AMS 2002/IIPS 10.3.
GEM Portal and SERVOGrid for Earthquake Science PTLIU Laboratory for Community Grids Geoffrey Fox, Marlon Pierce Computer Science, Informatics, Physics.
Data Publication and Quality Control Procedure for CMIP5 / IPCC-AR5 Data WDC Climate / DKRZ:
Opendap dev - meeting, Boulder, Feb 2007 OPeNDAP infrastructure in European Operational Oceanography T Loubrieu (IFREMER) T Jolibois (CLS)
Using the Global Change Master Directory (GCMD) to Promote and Discover ESIP Data, Services, and Climate Visualizations Presented by GCMD Staff January.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
IODE Ocean Data Portal - ODP  The objective of the IODE Ocean Data Portal (ODP) is to facilitate and promote the exchange and dissemination of marine.
The Earth System Grid: A Visualisation Solution Gary Strand.
Web Portal Design Workshop, Boulder (CO), Jan 2003 Luca Cinquini (NCAR, ESG) The ESG and NCAR Web Portals Luca Cinquini NCAR, ESG Outline: 1.ESG Data Services.
1 Earth System Modeling Framework Documenting and comparing models using Earth System Curator Sylvia Murphy: Julien Chastang:
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
Metadata Standards for Gridded Climate Data in the Earth System Grid Robert Drach LLNL/PCMDI UCRL-PRES
- Vendredi 27 mars PRODIGUER un nœud de distribution des données CMIP5 GIEC/IPCC Sébastien Denvil Pôle de Modélisation, IPSL.
ESG Observational Data Integration Presented by Feiyi Wang Technology Integration Group National Center of Computational Sciences.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
May 6, 2002Earth System Grid - Williams The Earth System Grid Presented by Dean N. Williams PI’s: Ian Foster (ANL); Don Middleton (NCAR); and Dean Williams.
00/XXXX 1 Data Processing in PRISM Introduction. COCO (CDMS Overloaded for CF Objects) What is it. Why is COCO written in Python. Implementation Data Operations.
1 Accomplishments. 2 Overview of Accomplishments  Sustaining the Production Earth System Grid Serving the current needs of the climate modeling community.
1 Overall Architectural Design of the Earth System Grid.
Product-Generation in ESG: some explorations of the user experience and discussion of implications for the design of ESG Steve Hankin & Roland Schweitzer.
1 Gateways. 2 The Role of Gateways  Generally associated with primary sites in ESG-CET  Provides a community-facing web presence  Can be branded as.
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
Working with masks. Example 1 – getting and applying a mask # Let us open a data file that contains surface type # (land fraction) data and extract data.
1 Summary. 2 ESG-CET Purpose and Objectives Purpose  Provide climate researchers worldwide with access to data, information, models, analysis tools,
1 Earth System Grid Center for Enabling Technologies (ESG-CET) Overview ESG-CET Team Climate change is not only a scientific challenge of the first order.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
IPCC WG II + III Requirements for AR5 Data Management GO-ESSP Meeting, Paris, Michael Lautenschlager, Hans Luthardt World Data Center Climate.
ESMF and the future of end-to-end modeling Sylvia Murphy National Center for Atmospheric Research
Application of RDF-OWL in the ESG Ontology Sylvia Murphy: Julien Chastang: Luca Cinquini:
The Earth Information Exchange. Portal Structure Portal Functions/Capabilities Portal Content ESIP Portal and Geospatial One-Stop ESIP Portal and NOAA.
Climate-SDM (1) Climate analysis use case –Described by: Marcia Branstetter Use case description –Data obtained from ESG –Using a sequence steps in analysis,
INFSO-RI JRA2 Test Management Tools Eva Takacs (4D SOFT) ETICS 2 Final Review Brussels - 11 May 2010.
What was done for AR4. Software developed for ESG was modified for CMIP3 (IPCC AR4) Prerelease ESG version 1.0 Modified data search Advance search Pydap.
1 2.5 DISTRIBUTED DATA INTEGRATION WTF-CEOP (WGISS Test Facility for CEOP) May 2007 Yonsook Enloe (NASA/SGT) Chris Lynnes (NASA)
1 Alison Pamment, 2 Calum Byrom, 1 Bryan Lawrence, 3 Roy Lowry 1 NCAS/BADC,Science and Technology Facilities Council, 2 Tessella plc, 3 British Oceanogrphic.
Metadata Development in the Earth System Curator Spanning the Gap Between Models and Datasets Rocky Dunlap, Georgia Tech 5 th GO-ESSP Community Meeting.
GO-ESSP The Earth System Grid The Challenges of Building Web Client Geo-Spatial Applications Eric Nienhouse NCAR.
The Earth System Curator Metadata Infrastructure for Climate Modeling Rocky Dunlap Georgia Tech.
Metadata Support for Model Intercomparison Projects Sylvia Murphy: Cecelia DeLuca: Julien.
IPDA Registry Definitions Project Dan Crichton Pedro Osuna Alain Sarkissian.
WGISS Connected Data Assets Oct 24, 2018 Yonsook Enloe
Data Management Components for a Research Data Archive
Presentation transcript:

Earth System Grid: Model Data Distribution & Server-Side Analysis to Enable Intercomparison Projects PCMDI Software Team UCRL-PRES

2 Williams_ESG Challenges facing ESG-CET  Building on the very successful CMIP3 IPCC AR4 ESG data portal.  How best to collect and distribute data on a much larger scale?  At each stage tools could be developed to improve efficiency  Substantially more ambitious community modeling projects (>~300 TBs) will require a distributed database  Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.)  How to make information understandable to end-users so that they can interpret the data correctly  More users from WGI. (Possibly WGII and WGIII?)  Client and Server-side analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …)  Testbed needed by late 2008 – early 2009

3 Williams_ESG 200 scientific papers published to date based on analysis of CMIP3 IPCC AR4 data Downloads to date  123 TB  543,500 files  300 GB/day (average) To support the infrastructural needs of the national and international climate community, ESG is providing crucial technology to securely access, monitor, catalog, transport, and distribute data in today’s Grid computing environment. 818 registered users 28 TB of data at the PCMDI site location  68,400 files  Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change  Model data from 11 countries CMIP3 IPCC AR4 ESG Portal ESG Objective ESG facts and figures Nov 2004 – Oct 2006 IPCC Downloads (10/12/06) Worldwide ESG user base

4 Williams_ESG Providing climate scientists with virtual proximity to large simulation results needed for their research Very large distributed data archives  Easy federation of sites  Across the US and around the world “Virtual Datasets” created through subsetting and aggregation Metadata-based search and discovery Web-based and analysis tool access Increased flexibility and robustness Server-side analysis ESG Goal Current ESG Sites

5 Williams_ESG Evolving ESG for the future ESG Data System Evolution Central database Centralized curated data archive Time aggregation Distribution by file transport No ESG responsibility for analysis Shopping-cart-oriented web portal ESG connection to desktop analysis tools (i.e., CDAT and CDAT-LAS) 2006 Testbed data sharing Federated metadata Federated portals Unified user interface Quick look server-side analysis with CDAT Location independence Distributed aggregation Manual data sharing Manual publishing Early 2009 Full data sharing (add to testbed…) Synchronized federation  metadata, data Full suite of server-side analysis with CDAT Model/observation integration ESG embedded into desktop productivity tools with CDAT GIS integration Model intercomparison metrics User support, life cycle maintenance 2011 CCSM, AR5, satellite, In situ biogeochemistry, ecosystems ESG Data Archive TerabytesPetabytes CCSM AR4

6 Williams_ESG The growing importance of climate simulation data standards  Global Organization for Earth System Science Portal (GO-ESSP)  International collaboration to develop new generation of software infrastructure  Access to observed and simulated data from climate and weather communities  Working closely together using agreed upon standards  Last Annual meeting held at PCMDI  NetCDF Climate and Forecast (CF) Metadata Convention standards  Specify syntax and vocabulary for climate and forecast metadata  Promotes the processing and sharing of data  The use of CF was essential for the success of the IPCC data dissemination

7 Williams_ESG  Develop further fundamental tools (such as Climate Model Output Rewriter - CMOR)  Develop staggered and unstructured grids  Deliver netCDF data into Geographical Information Systems (GIS)  Upgrade to netCDF-4  Include in situ observations Supporting CF and CMOR CF/CMOR Development New CF website developed by PCMDI repository  News  Documents CF Conventions CF Standard Name table Conformance  Requirements & Recommendations  CF Compliance Checker Mailing List  Archives New CF website Future issues for CF

8 Williams_ESG Architecture of the next-generation ESG-CET  Huge data archives  Broader geographical distribution of archives  across the United States  around the world  Easy federation of sites  Increased flexibility and robustness data user AR5 ESG Gateway (PCMDI) user registration security services monitoring services metadata services notification services services startup/shutdown OPeNDAP/OLFS (aggregation) product server publishing (harvester) storage management backend analysis and vis engine workflow metrics services replica location services replica management ESG Node (GFDL) access control HTTP/FTP/GFTP servers metrics services publishing (extraction) OPeNDAP/OLFSOPeNDAP/BS backend analysis and vis engine monitoring info provider storage management disk cache deep archive data online data ESG node ESG Gateway (CCES) centralized metrics services ESG node ESG Gateway (CCSM) centralized security services Analysis Tool browser Analysis Tool browser data provider service API service implementation client ? ARS mandatory component FEDERATION JOIN ESG-CET architecture

Climate Data Analysis Tools: Software for Distributed Model Diagnosis & Intercomparison Research PCMDI Software Team UCRL-PRES

10 Williams_ESG Challenges facing CDAT  Integrating CDAT into a distributed environment  Providing climate diagnostics  Delivering climate component software to the community  Working with other forms of climate Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.)  Testbed needed by late 2008 – early 2009

11 Williams_ESG CDAT objectives Seamless mechanisms for climate information exploration and analysis. CDAT Objectives

12 Williams_ESG Enabling data management, data analysis, and visualization for intercomparison research Address the challenges of enabling data management, discovery, access, and advanced data analysis for climate model diagnosis and intercomparison research. CDAT Goal Typical usage examples of CDAT  Calculate a long-term average  Define wind-speed from u- and v-components  Subset a dataset, selecting a spatiotemporal region  Aggregate 1000s of files into a small XML file  Generate a Hovmoller plot  CDAT IS Python!  Designed for climate science data  Scriptable  Open-source and free What is CDAT?

13 Williams_ESG Evolving CDAT into an integrated client technology workplace CDAT Integrated Analysis Evolution 2006 Testbed distributed analysis Equal-access to shared resources (Web/Grid services) Quick look server-side analysis tool for ESG Diagnostics specific to AR5 GFDL Ncvtk 3D visualization Web-CDAT: discover, learn, and browse via web browser Serving Google Maps and Google Earth Data with CDAT Early CDMS, Numeric, Genutil, Cdutil, Ncvtk, VACET, Diagnostics, ESG CDAT Core Modules StandaloneDistributed CDMS Numeric / MV Genutil / Cdutil VCS CDMS Numeric / MV Genutil / Cdutil VCS Community software Python based Start to finish environment Diverse analysis tools Languages: C/C++, Java, FORTRAN, Python Platforms: Unix, Mac, Windows VCDAT: discover, learn, and browse with a few clicks Connection to ESG Full analysis sharing Full suite server-side analysis tool for ESG ESG embedded into desktop productivity tools (i.e., CDAT) GIS integration with CDAT SciDAC VACET analysis and visualization collaboration Global Organization for Earth System Science Portal (GO- ESSP) Remote generic apps for ESG

14 Williams_ESG Data aggregation: collections of files/datasets are treated as single entities. Aspects of aggregation:  combining/merging variables,  joining variables,  new coordinate axes,  overlaying/adding metadata,  nesting datasets PCMDI CDAT supports aggregations via the cdscan utility that uses XML representation  cdscan will analyse the archive for:  variable information  axis information  global (universal) metadata Why use cdscan  Large datasets described as a grouped entity.  No need to know underlying data format.  No need to know file-names.  Datasets can be sliced in any way the user chooses using logical spatio-temporal selectors rather than loops of programming code.  You can use it to improve the metadata of your data files… cdscan in action  $ cdscan –x monthly_means.xml./*.nc MVCDSCAN CDAT examples >>> import cdms, MV >>> f_surface = cdms.open('sftlf_ta.nc') >>> surf = f_surface('sftlf') # Designate land where "surf" has values # not equal to 100 >>> land_only = MV.masked_not_equal(surf, 100.) >>> land_mask = MV.getmask(land_only) # Now extract a variable from another file >>> f = cdms.open('ta_ nc') >>> ta = f('ta') # Apply this mask to retain only land values. >>> ta_land = cdms.createVariable(ta, mask=land_mask, copy=0, id='ta_land') >>> import cdms, MV >>> f_surface = cdms.open('sftlf_ta.nc') >>> surf = f_surface('sftlf') # Designate land where "surf" has values # not equal to 100 >>> land_only = MV.masked_not_equal(surf, 100.) >>> land_mask = MV.getmask(land_only) # Now extract a variable from another file >>> f = cdms.open('ta_ nc') >>> ta = f('ta') # Apply this mask to retain only land values. >>> ta_land = cdms.createVariable(ta, mask=land_mask, copy=0, id='ta_land')

15 Williams_ESG CDAT examples Regridder Ncvtk #!/usr/local/cdat/bin/python import cdms from regrid import Regridder f = cdms.open('temp.nc') t= f.variables['t'] ingrid = t.getGrid() outgrid = cdms.createUniformGrid( -90.0, 46, 4.0, 0.0, 72, 5.0) regridFunc = Regridder(ingrid, outgrid) newt = regridFunc(t) import vcs vcs.init().plot(t) vcs.init().plot(newt) Collaboration: CDAT developers are currently working with Ncvtk developers to make Ncvtk 3D graphics accessible to the CDAT community. Ncvtk is a collection of commonly used 3D visualization methods applied to data on structured lat/lon grids.

16 Williams_ESG CDAT facts and figures Some CDAT development centers:  British Atmospheric Data Center  LBNL  GFDL  Laboratory of Science of Climate and the Environment (LSCE), FR  PCMDI  University of Chicago  University of Hawaii  University of Reading, UK CDAT Collaborations CDAT Core Modules  Over 120 mailing list registers  Probably 7 to 10 times more casual users  Mailing list archive: over 1,000 message (~30 per month)  912 Downloads since May 19, 2006 for version 4.1  Improved Documentation CDAT Users

17 Williams_ESG Simple intercomparison use case scenario  Browse PCMDI’s centralized database  Download data  Organize data on local site  Regrid data at local site  Perform diagnostics  Produces results  Search, browse and discover distributed data  Remote site  Request data  Regrids  Diagnostics  ESG returns results Future Scenario Current Scenario