Published by Meghan James. Modified over 9 years ago.
1
Earth System Grid Center for Enabling Technologies (ESG-CET)
Introduction and Overview
Dean N. Williams, Don E. Middleton, Ian T. Foster, and David E. Bernholdt
On behalf of the ESG-CET Team
Project Web Site: http://esg-pcmdi.llnl.gov
Mid-Term Project Review, Rockville, MD, May 11, 2009
2
Agenda
1. Introduction and Overview
2. Overall Architecture Design
3. Gateway
4. Data Node
Break
5. Accomplishments
6. Collaborations and Partnerships
7. Recap of Morning Presentations
Lunch
8. Research and Development
Break
9. Demonstration
10. Future Work
11. Summary
Review folder: http://esg-pcmdi.llnl.gov/review-folder
Review presentations: http://esg-pcmdi.llnl.gov/review-folder/presentations
3
A Brief History: ESG-I, 2000-2001
- The emerging challenge of climate data
- Proposal to DOE’s Next Generation Internet (NGI) program in March 1999
- ANL, LANL, LBNL, LLNL, NCAR, USC/ISI
- Data movement and replication
- Prototype climate “data browser”
- “Hottest Infrastructure” award at SC2000
- NGI cut short, follow-on funding from OBER & MICS
- Ideas on the table, partnerships, experience
- Minimal end-user deployment or use
- Began development of SciDAC proposal
4
A Brief History: ESG-II, 2001-2006
- SciDAC Program announced, began proposal in 2000
- ANL, LANL, LBNL, LLNL, NCAR, ORNL, USC/ISI
- “Turning Climate Datasets into Community Resources”
- New focus on web-based portals, metadata, seamless access to archival storage, security, operational service
- Uncertain about size of audience, hoping for 100-200
- Very positive mid-term assessment in 2003
- PCMDI accepted WGCM/CMIP role in 2004
- Operational CCSM portal in 2004
- Operational IPCC/CMIP portal later in 2004
- In 2006, 200 TB of data, 4000 users, 130 TB served
5
Purpose and Scope
Purpose
- Provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational resources required to make sense of enormous climate simulation datasets
Scope
- Petabyte-scale data volumes
- Gateway to climate change data products, model outputs, and informational sites (i.e., globally federated sites)
- Comprehensive registry of climate change Earth Science research results and components
- Support for climate change scientists and their partner scientists, analysts, data managers, educators, and decision makers
- Resource to national and international science and societal benefit initiatives
- Resource to climate change data products through interoperable web services and climate analysis tools
6
Objectives
- Meet specific distributed database, data access, and data movement needs of national and international climate projects
- Provide a universal and secure web-based data access portal for broad multi-model data collections
- Provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies
- Develop Grid technology that enhances data accessibility and usability
- Make newly developed tools and technologies available for use in other domains
7
Project Participants and Focus Areas
8
Project Team
ANL: Rachana Ananthakrishnan, Ian Foster, Neill Miller, Frank Siebenlist
LBNL: Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim
LLNL: Robert Drach, Dean N. Williams
LANL: Phil Jones
NCAR: David Brown, Julien Chastang, Luca Cinquini, Peter Fox, Danielle Harper, Nathan Hook, Don Middleton, Eric Nienhouse, Gary Strand, Patrick West, Hannah Wilcox, Nathaniel Wilhelmi, Stephan Zednik
PMEL: Steve Hankin, Roland Schweitzer
ORNL: David Bernholdt, Meili Chen, Jens Schwidder, Sudharshan Vazhkudai
USC/ISI: S. Bharathi, Ann Chervenak, Robert Schuler, Mei-Hui Su
Key: Institutional PI; Project Co-PI; Project Lead PI; Executive Committee
9
Project Organization
10
Concept Overview
[Diagram labels: Workstation Applications, Thick Clients; Standard Browser, Web Services]
11
Capabilities, Usage, and Impact
Capabilities
- “Virtual Datasets” created through subsetting and aggregation
- Metadata-based search and discovery
- Bulk data access
- Web-based access
Usage
Archive Facts
NCAR Gateway (http://www.earthsystemgrid.org)
- Data holdings: 198 TB
- Registered users: 13,000+
- Data downloaded: 100 TB
PCMDI/LLNL CMIP3 Gateway (http://www-pcmdi.llnl.gov)
- Data holdings: 35 TB
- Registered users: 3,000+
- Data downloaded: 600+ TB
- Over 500 sites worldwide
- Over 500 scientific papers published based on CMIP3 data
- Average downloads: 400 to 600 GB/day
12
Data Integration Challenges Facing Climate Science
- Modeling groups will generate more data in the near future than exist today
- Large part of research consists of writing programs to analyze data
- How best to collect, distribute, and find data on a much larger scale?
- At each stage tools could be developed to improve efficiency
- Substantially more ambitious community modeling projects (petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes)) will require a distributed database
- Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.) (But wait, there’s more: economy, public health, energy, etc.)
- How to make information understandable to end-users so that they can interpret the data correctly
- More users than just Working Group (WG) 1 (science): also WG2 (impacts) and WG3 (mitigation) (policy makers, economists, health officials, etc.)
- Integration of multiple analysis tools, formats, data from unknown sources
- Trust and security on a global scale (not just an agency or country, but worldwide)
13
Complexity of Data Distribution
- Future coupled runs will produce much larger data sets
- Storage and retrieval needs new thinking
- Additional quality assurance data and software
- Tools to facilitate publication and cataloging of output
  - Publication: the act of putting data in the database and making it visible to others
  - Cataloging: describes information about where a data set, file, or database entity is located
- Automated updating of output availability/status pages
- Automated notification to users with updates tailored to their interests (new, withdrawn, replaced data)
- Sophisticated discovery capabilities
- Common data transfer tasks can be automated
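The publication and cataloging definitions on this slide can be illustrated with a toy in-memory catalog. This is only a sketch under stated assumptions: the class names, method names, dataset ids, and `example.org` URLs are hypothetical and are not taken from the ESG-CET software. The three status values mirror the slide's "new, withdrawn, replaced data".

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry describing where one dataset version lives."""
    dataset_id: str
    version: int
    location: str        # illustrative URL of the hosting data node
    status: str = "new"  # "new", "replaced", or "withdrawn"


@dataclass
class Catalog:
    """Toy catalog: maps a dataset id to the list of its published versions."""
    records: dict = field(default_factory=dict)

    def latest(self, dataset_id):
        """Return the most recent record for a dataset, or None."""
        versions = self.records.get(dataset_id, [])
        return versions[-1] if versions else None

    def publish(self, dataset_id, location):
        """Publication: store the entry and make it visible to others."""
        previous = self.latest(dataset_id)
        if previous is not None:
            previous.status = "replaced"
        version = 1 if previous is None else previous.version + 1
        record = DatasetRecord(dataset_id, version, location)
        self.records.setdefault(dataset_id, []).append(record)
        return record

    def withdraw(self, dataset_id):
        """Mark the latest version as withdrawn (no-op if unknown)."""
        record = self.latest(dataset_id)
        if record is not None:
            record.status = "withdrawn"

    def locate(self, dataset_id):
        """Cataloging: report where the current version is located."""
        record = self.latest(dataset_id)
        if record is None or record.status == "withdrawn":
            return None
        return record.location

    def status_report(self, interests):
        """Status lines restricted to the dataset ids a user follows."""
        return [
            (r.dataset_id, r.version, r.status)
            for dataset_id in sorted(interests)
            for r in self.records.get(dataset_id, [])
        ]


catalog = Catalog()
catalog.publish("model-a.run1", "http://node1.example.org/model-a/run1/v1")
catalog.publish("model-a.run1", "http://node2.example.org/model-a/run1/v2")
catalog.publish("model-b.run1", "http://node1.example.org/model-b/run1/v1")
catalog.withdraw("model-b.run1")
print(catalog.locate("model-a.run1"))
print(catalog.status_report({"model-a.run1"}))
```

Keeping every version in the list, rather than overwriting, is what makes a tailored "replaced" or "withdrawn" notification possible in this sketch.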
14
It’s All About the Data
- Data publication
- Data access
- Data viewing
- Data sharing
- Data versioning
- Data replication
- Data products
- Data delivery
- Standards and interoperability
15
Strategic Challenges for ESG-CET
- Sustain and build upon the existing ESG archives
- Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data
  - Coupled Model Intercomparison Project, Phase 5 (CMIP5) for scientists contributing to the IPCC Fifth Assessment Report (AR5) in 2010
  - SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
  - The Climate Science Computational End Station (CCES)
  - The North American Regional Climate Change Assessment Program (NARCCAP)
  - Other wide-ranging climate model evaluation activities
- How to make information understandable to end-users so that they can interpret the data correctly
- Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …)
- Integrating analysis into a distributed environment
- Providing climate diagnostics
- Delivering climate component software to the community
16
CMIP5 (IPCC AR5) is a Major Driver for ESG Development
- CMIP5 multi-model archive expected to include
  - 3 suites of experiments (“Near-Term” decadal prediction, “Long-Term” (century & longer), and “Atmosphere-Only”)
  - 40+ models
  - 600+ TB “core” data, 6+ PB total data
  - Contributed by 25+ modeling centers in 17+ countries
- Driver for scale of data, global distribution
- Timeline fixed by IPCC
- Already working with key international partners to establish testbed
  - Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.)
  - National Center for Atmospheric Research - NCAR (U.S.)
  - Oak Ridge National Laboratory - ORNL (U.S.)
  - Geophysical Fluid Dynamics Laboratory - GFDL (U.S.)
  - British Atmospheric Data Centre - BADC (U.K.)
  - Max Planck Institute for Meteorology - MPIM (Germany)
  - JAMSTEC and University of Tokyo Center for Climate System Research (Japan)
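To put the scale in perspective, the short calculation below compares the expected CMIP5 volumes with the average download rate of 400 to 600 GB/day reported on the usage slide earlier in this deck. It is a rough back-of-the-envelope sketch, not a projection from the presenters: it assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), treats the "600+ TB" and "6+ PB" figures as lower bounds, and assumes a constant rate. The function and variable names are illustrative.

```python
GB_PER_TB = 1000          # decimal units assumed
GB_PER_PB = 1000 * 1000   # 1 PB = 1000 TB = 1,000,000 GB

CORE_TB = 600             # "600+ TB core data" (lower bound)
TOTAL_PB = 6              # "6+ PB total data" (lower bound)
RATES_GB_PER_DAY = (400, 600)  # average daily downloads from the usage slide


def days_to_serve(volume_gb, rate_gb_per_day):
    """Days needed to deliver volume_gb once at a constant daily rate."""
    return volume_gb / rate_gb_per_day


core_gb = CORE_TB * GB_PER_TB
total_gb = TOTAL_PB * GB_PER_PB

print(f"core share of total: {core_gb / total_gb:.0%}")
for rate in RATES_GB_PER_DAY:
    print(
        f"at {rate} GB/day: core once = {days_to_serve(core_gb, rate):.0f} days, "
        f"total once = {days_to_serve(total_gb, rate):.0f} days"
    )
```

Under these assumptions, delivering even a single copy of the core data at the then-current rate would take on the order of years, which is consistent with the slide's point that CMIP5 drives the scale and global distribution of the system.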
17
ESG-CET AR5 Timeline
2008: Design and implement core functionality:
- Browse and search
- Registration
- Single sign-on / security
- Publication
- Distributed metadata
- Server-side processing
Early 2009: Testbed
- Plan to include at least seven centers in the US, Europe, and Japan
2009: Deal with system integration issues, develop production system
2010: Modeling centers publish data
2011-2012: Research and journal article submissions
2013: IPCC AR5 Assessment Report
18
ESG-CET Collaborates Extensively
- Leverage best-in-class tools and capabilities developed elsewhere
- Increase outreach, ability to serve scientific community, impact
- Joint development of new ideas, technologies of common interest
Key:
- Those relying on ESG to reach their goals are highlighted in “italic blue”
- Those relying on ESG to develop tools and technologies are highlighted in “italic red”
- Those relying on ESG to deliver their products to the climate science community are highlighted in “italic green”
19
Accomplishments: Development
- Gateway web application (new)
- Data Node components integration (new publishing client integrated with existing TDS and LAS servers, and with Gateway)
- Security architecture for federation across Gateways and partner Data Centers
  - OpenID for web SSO
  - MyProxy integration for rich client access
  - Web Services for user attribute retrieval
- Architecture for metadata exchange among Gateways and partner Data Centers (based on OAI-PMH)
- BeStMan middleware for deep storage file retrieval (new)
- Handling and access of detailed model metadata (in collaboration with Earth System Curator)
Two major accomplishments are the Gateway and the Data Node, which form the main components of the ESG-CET architecture.
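The slide states only that metadata exchange is "based on OAI-PMH". As a hedged illustration of what a harvesting request looks like in that protocol, the sketch below builds `ListRecords` URLs using the verb and argument names defined by OAI-PMH 2.0. The base URL `http://gateway.example.org/oai` is a placeholder, and the choice of `oai_dc` (the Dublin Core format every OAI-PMH repository must support) is an assumption; the metadata format actually exchanged between ESG Gateways is not given in the source.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request for a (selective) harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date      # e.g. "2009-05-01"
    if until_date is not None:
        params["until"] = until_date
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, resumption_token):
    """Build the follow-up request for the next page of a long result list.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with
    the verb only, without metadataPrefix, from, until, or set.
    """
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real ESG service.
BASE = "http://gateway.example.org/oai"
print(list_records_url(BASE, from_date="2009-05-01"))
print(resume_url(BASE, "abc123"))
```

A harvester would fetch the first URL, parse the XML response, and keep requesting `resume_url(...)` for as long as the response carries a non-empty resumption token; incremental harvests use `from` so that only records changed since the last run are exchanged.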
20
Accomplishments: Operational
- Sustained data delivery from 2004 to present from three ESG data portals
- Registered over 16,000 users worldwide
- Over 700 TB downloaded (coming up on 1 PB milestone)
- Reached milestone of 500 scientific research papers published based on CMIP3
- Added C-LAMP, NARCCAP, and CFMIP to the distributed archive
21
Future Plans
Short-term:
- Packaging and documentation of Gateway software
- Packaging and documentation of the Data Node software
- Integration with Data Mover Lite (DML)
- Federation with partner data centers
Longer-term:
- Gateway customization
- Expanded visualization services
- Gateway and Data Node invoking more of the LAS functionality
- GIS services
- Google Earth services
- Remote query services for rich client access
- User and Group workspaces
- Server-side processing and analysis services