NCAR NCAR Data and Grid Efforts: The Earth System Grid & The Community Data Portal Don Middleton NCAR Scientific Computing Division CAS2003 September 11,

Slides:



Advertisements
Similar presentations
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
Advertisements

NCAR The Earth System Grid (ESG) & The Community Data Portal (CDP) (NCARs Data & GriD Efforts) for COMMISSION FOR BASIC SYSTEMS INFORMATION SYSTEMS and.
NCAR Cyberinfrastructure for Earth System Modeling Don Middleton NCAR Scientific Computing Division APAN eScience Workshop, Honolulu January 28, 2004.
NG-CHC Northern Gulf Coastal Hazards Collaboratory Simulation Experiment Integration Sandra Harper 1, Manil Maskey 1, Sara Graves 1, Sabin Basyal 1, Jian.
A. Sim, CRD, L B N L 1 ANI and Magellan Launch, Nov. 18, 2009 Climate 100: Scaling the Earth System Grid to 100Gbps Networks Alex Sim, CRD, LBNL Dean N.
The Anatomy of the Grid: An Integrated View of Grid Architecture Carl Kesselman USC/Information Sciences Institute Ian Foster, Steve Tuecke Argonne National.
DOE Global Modeling Strategic Goals Anjuli Bamzai Program Manager Climate Change Prediction Program DOE/OBER/Climate Change Res Div
Earth System Curator Spanning the Gap Between Models and Datasets.
Metadata Development in the Earth System Curator Spanning the Gap Between Models and Datasets Rocky Dunlap, Georgia Tech.
NCAR/SCD/VETS The NCAR Community Data Portal
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Application of GRID technologies for satellite data analysis Stepan G. Antushev, Andrey V. Golik and Vitaly K. Fischenko 2007.
Toni Saarinen, Tite4 Tomi Ruuska, Tite4 Earth System Grid - ESG.
NextGRID & OGSA Data Architectures: Example Scenarios Stephen Davey, NeSC, UK ISSGC06 Summer School, Ischia, Italy 12 th July 2006.
4b.1 Grid Computing Software Components of Globus 4.0 ITCS 4010 Grid Computing, 2005, UNC-Charlotte, B. Wilkinson, slides 4b.
Cyberinfrastructure for Rapid Prototyping Capability Tomasz Haupt, Anand Kalyanasundaram, Igor Zhuk, Vamsi Goli Mississippi State University GeoResouces.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
The Earth System Grid Discovery and Semantic Web Technologies Line Pouchard Oak Ridge National Laboratory Luca Cinquini, Gary Strand National Center for.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
TPAC Digital Library Talk Overview Presenter:Glenn Hyland Tasmanian Partnership for Advanced Computing & Australian Antarctic Division Outline: TPAC Overview.
CCSM Portal/ESG/ESGC Integration (a PY5 GIG project) Lan Zhao, Carol X. Song Rosen Center for Advanced Computing Purdue University With contributions by:
Presented by The Earth System Grid: Turning Climate Datasets into Community Resources David E. Bernholdt, ORNL on behalf of the Earth System Grid team.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
NE II NOAA Environmental Software Infrastructure and Interoperability Program Cecelia DeLuca Sylvia Murphy V. Balaji GO-ESSP August 13, 2009 Germany NE.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
Ian Foster Argonne National Lab University of Chicago Globus Project The Grid and Meteorology Meteorology and HPN Workshop, APAN.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
ESP workshop, Sept 2003 the Earth System Grid data portal presented by Luca Cinquini (NCAR/SCD/VETS) Acknowledgments: ESG.
ESG The Earth System Grid (ESG) Presented by Don Middleton & Luca Cinquini NCAR Scientific Computing Division On Behalf of the ESG Team SCD Executive Committee.
The Earth System Grid (ESG) Goals, Objectives and Strategies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
1 Use of SRMs in Earth System Grid Arie Shoshani Alex Sim Lawrence Berkeley National Laboratory.
Integrated Model Data Management S.Hankin ESMF July ‘04 Integrated data management in the ESMF (ESME) Steve Hankin (NOAA/PMEL & IOOS/DMAC) ESMF Team meeting.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Intergrid KoM Santander 22 june, 2006 E-Infraestructure shared between Europe and Latin America José Manuel Gutiérrez
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
The Earth System Grid: A Visualisation Solution Gary Strand.
Web Portal Design Workshop, Boulder (CO), Jan 2003 Luca Cinquini (NCAR, ESG) The ESG and NCAR Web Portals Luca Cinquini NCAR, ESG Outline: 1.ESG Data Services.
The Earth System Grid (ESG) Computer Science and Technologies DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003.
NIEeS Workshop, Cambridge (UK), Sep 2002 Luca Cinquini for the Earth System Grid METADATA DEVELOPMENT for the EARTH SYSTEM GRID Luca Cinquini (SCD/NCAR)
Metadata Standards for Gridded Climate Data in the Earth System Grid Robert Drach LLNL/PCMDI UCRL-PRES
- Vendredi 27 mars PRODIGUER un nœud de distribution des données CMIP5 GIEC/IPCC Sébastien Denvil Pôle de Modélisation, IPSL.
Fox 2 AISRP April 4-6, 2005  Earth System Grid  Grid-enabled OPeNDAP  Architecture - Server and Application access  Framework experience.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
ESG Observational Data Integration Presented by Feiyi Wang Technology Integration Group National Center of Computational Sciences.
The VIRTUAL SOLAR-TERRESTRIAL OBSERVATORY - Exploring paradigms for interdisciplinary data-driven science Peter Fox 1 Don Middleton 2,
May 6, 2002Earth System Grid - Williams The Earth System Grid Presented by Dean N. Williams PI’s: Ian Foster (ANL); Don Middleton (NCAR); and Dean Williams.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
Access Control for NCAR Data Portals A report on work in progress about the future of the NCAR Community Data Portal Luca Cinquini GO-ESSP Workshop, 6-8.
1 Accomplishments. 2 Overview of Accomplishments  Sustaining the Production Earth System Grid Serving the current needs of the climate modeling community.
1 Overall Architectural Design of the Earth System Grid.
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
1 Summary. 2 ESG-CET Purpose and Objectives Purpose  Provide climate researchers worldwide with access to data, information, models, analysis tools,
2. WP9 – Earth Observation Applications ESA DataGrid Review Frascati, 10 June Welcome and introduction (15m) 2.WP9 – Earth Observation Applications.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
SCD User Briefing The Community Data Portal and the Earth System Grid Don Middleton with presentation material developed by Luca Cinquini, Mary Haley,
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Climate-SDM (1) Climate analysis use case –Described by: Marcia Branstetter Use case description –Data obtained from ESG –Using a sequence steps in analysis,
The NOAA Operational Model Archive and Distribution System NOMADS CEOS-Grid Application Status Report Glenn K. Rutledge NOAA NCDC CEOS WGISS-19 Cordoba,
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
1 Scientific Data Management Group LBNL SRM related demos SC 2002 DemosDemos Robust File Replication of Massive Datasets on the Grid GridFTP-HPSS access.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
The Earth System Grid: A Visualisation Solution
improve the efficiency, collaborative potential, and
Data Requirements for Climate and Carbon Research
HAO/SCD: VO, metadata, catalogs, ontologies, querying
Metadata Development in the Earth System Curator
Data Management Components for a Research Data Archive
Presentation transcript:

NCAR NCAR Data and Grid Efforts: The Earth System Grid & The Community Data Portal Don Middleton NCAR Scientific Computing Division CAS2003 September 11, 2003

NCAR The Earth System Grid U.S. DOE SciDAC funded R&D effort - a “ Collaboratory Pilot Project” U.S. DOE SciDAC funded R&D effort - a “ Collaboratory Pilot Project” Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data l Build upon Globus Toolkit  and DataGrid technologies and deploy l Potential broad application to other areas

NCAR ESG Team l ANL –Ian Foster (PI) –Veronika Nefedova –(John Bresenhan) –(Bill Allcock) l LBNL –Arie Shoshani –Alex Sim l ORNL –David Bernholdte –Kasidit Chanchio –Line Pouchard l LLNL/PCMDI –Bob Drach –Dean Williams (PI) l USC/ISI –Anne Chervenak –Carl Kesselman –(Laura Perlman) l NCAR –David Brown –Luca Cinquini –Peter Fox –Jose Garcia –Don Middleton (PI) –Gary Strand

NCAR

Baseline Numbers l T42 CCSM (current, 280km) –7.5GB/yr, 100 years ->.75TB l T85 CCSM (140km) –29GB/yr, 100 years -> 2.9TB l T170 CCSM (70km) –110GB/yr, 100 years -> 11TB

NCAR Capacity-related Improvements Increased turnaround, model development, ensemble of runs Increase by a factor of 10, linear data l Current T42 CCSM –7.5GB/yr, 100 years ->.75TB * 10 = 7.5TB

NCAR Capability-related Improvements Spatial Resolution: T42 -> T85 -> T170 Increase by factor of ~ 10-20, linear data Temporal Resolution: Study diurnal cycle, 3 hour data Increase by factor of ~ 4, linear data CCM3 at T170 (70km)

NCAR Capability-related Improvements Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice Increase by another factor of 2-3, data flat Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), middle Atmosphere Model… Increase by another factor of 10+, linear data

NCAR Model Improvement Wishlist Grand Total: Increase compute by a Factor O( )

NCAR Longer-term Missions - Observation of Key Earth System Interactions Terra Aura Aqua Landsat 7 Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies GRACE PICASSO Cloudsat QuikScat EO-1 ICEsatJason-1 SRTM VCL We Will Examine Practically Every Aspect of the Earth System from Space in This Decade Triana Courtesy of Tim Killeen, NCAR

NCAR ESG Scenario l End 2002: 1.2 million files comprising ~75TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI l End 2007: As much as 3 PB (3,000 TB) of data (!) l Current practice is already broken – the future will be even worse if something isn’t done…

NCAR ESG Scenario (cont.) l Data –Different formats are converted to netCDF –netCDF is not standardized to the CF model –Different sites require knowledge of different methods of access l Metadata –Most kept in online files separate from data and unsearchable unless one is “in the know” –Some kept in people’s brains l Access control –Manual –Not formalized l Data requests –Beginnings of a formal process (e.g., the PCMDI model) –Beginnings of web portals –Far too much done by hand –Logging nearly non-existent

NCAR ESG: Challenges l Enabling the simulation and data management team l Enabling the core research community in analyzing and visualizing results l Enabling broad multidisciplinary communities to access simulation results We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

NCAR ESG: Strategies l Move data a minimal amount, keep it close to computational point of origin when possible –Data access protocols, distributed analysis l When we must move data, do it fast and with a minimum amount of human intervention –Storage Resource Management, fast networks l Keep track of what we have, particularly what’s on deep storage –Metadata and Replica Catalogs l Harness a federation of sites, web portals –Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

NCAR Server Tera/Peta-scale Archive HRM Tools for reliable staging, transport, and replication Server Tera/Peta-scale Archive HRM Client Selection Control Monitoring HRM Storage/Data Management

NCAR HRM aka “DataMover” l Running well across DOE/HPSS systems l New component built that abstracts NCAR Mass Storage System l Defining next generation of requirements with climate production group l First “real” usage “The bottom line is that it now works fines and is over 100 times faster than what I was doing before. As important as two orders of magnitude increase in throughput is, more importantly I can see a path that will essentially reduce my own time spent on file transfers to zero in the development of the climate model database” – Mike Wehner, LBNL

NCAR OPeNDAP An Open Source Project for a Network Data Access Protocol (originally DODS, the Distributed Oceanographic Data System)

NCAR OPeNDAP-g -Transparency -Performance -Security -Authorization -(Processing) Typical Application Data (local) netCDF lib Application Data (remote) OPeNDAP Client Application OPeNDAP Via http Big Data (remote) ESG client Application ESG + DODS OpenDAP Server ESG Server Distributed Application data Distributed Data Access Services OPeNDAP Via Grid

NCAR l For XML encoding of metadata (and data) of any generic netCDF file l Objects: netCDF, dimension, variable, attribute l Beta version reference implementation as Java Library ( ESG: NcML Core Schema netCDF nc:netCDFType nc:dimension nc:variable nc: attribute nc:values nc:VariableType

NCAR Object [1] id Object [1] id Activity [0,1] name [0,1] description [0,1] rights [0,n] date type= [0,n] note [0,n] participant role= [0,n] reference uri= Activity [0,1] name [0,1] description [0,1] rights [0,n] date type= [0,n] note [0,n] participant role= [0,n] reference uri= isA Investigation isA Project [0,n] topic type= [0,1] funding Project [0,n] topic type= [0,1] funding isA Ensemble Campaign isPartOf Simulation [0,n] simulationInput type= [0,n] simulationHardware Simulation [0,n] simulationInput type= [0,n] simulationHardware Observation Experiment Analysis isPartOf hasParent hasChild hasSibling Dataset [0,1] type [0,1] conventions [0,n] date type= [0,n] format type= uri= [0,1] timeCoverage [0,1] spaceCoverage Dataset [0,1] type [0,1] conventions [0,n] date type= [0,n] format type= uri= [0,1] timeCoverage [0,1] spaceCoverage isA generated By isPart Of Person [0,1] firstName [0,1] lastName [0,1] contact Person [0,1] firstName [0,1] lastName [0,1] contact Institution [0,1] name [0,1] type [0,1] contact Institution [0,1] name [0,1] type [0,1] contact isA worksFor participant role= Class AbstractClass inheritance association LEGEND Service [0,1] name [0,1] description Service [0,1] name [0,1] description serviceId

NCAR ESG Metadata Progress l Co-developed NcML with Unidata –CF conventions in progress, almost done l Developed & evaluated a prototype metadata system l Finalized an initial schema for PCM/CCSM –Address interoperability with federal standards and NASA/GCMD via the generation of DIF/FGDC/ISO –Address interoperability with digital libraries via the creation of Dublin Core l Testing relational and native XML databases, and OGSA-DAI l Exploratory work for first-generation ontology l Authoring of discovery metadata in progress

NCAR ESG Web Portal Demonstration

NCAR RLS MSS HRM HPSS HRM RLS HPSS HRM RLS DISK HRM RLS DISK cache OGSA-DAI MySQL RDBMS ESG WEB PORTAL Tomcat/Struts cross-update gridFTP query MyProxy authenticate GRAM GATEKEEPER submit execute gridFTP SERVER LAS SERVER visualize LBNL ISI LLNL NCAR ORNL CAS ANL ESG Topology (CAS 2003)

NCAR Collaborations & Relationships l CCSM Data Management Group l The Globus Project l Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High- performance DataGrid Toolkit l OPeNDAP/DODS (multi-agency) l NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project) l U.K. e-Science and British Atmospheric Data Center l NOAA NOMADS and CEOS-grid l Earth Science Portal group (multi-agency, intnl.)

NCAR Immediate Directions l Broaden usage of DataMover and refine l Continue building metadata catalogs l Revisit overall security model and consider simplified approaches l Redesign and implement user interface l Alpha version of OPeNDAPg – Test and evaluate with three client applications (ncview, CDAT, & NCL) l Develop automation for data publishing (GT3) l Deploy for IPCC runs

NCAR The Community Data Portal (CDP) l Provide a common portal to NCAR, UCAR, and university data l Provide cyberinfrastructure that dramatically lowers the cost of sharing data (there is large interest in this) l Directly couple to simulation and data analysis systems l Begin capturing rich metadata and catalog our scientific experiments for the world l MSS -> A petascale Mass Knowledge System l Federate internationally (ESG, THREDDS, U.K. e-Science, NOMADS, PRISM, GEON, etc.) “The dataportal has changed my life…” Ben Kirtman, COLA

NCAR A Quick Tour of the CDP ‘ing Our Data

Community Data Portal Metadata Software THREDDS catalogs ESG metadata DC metadata NcML metadata THREDDS catalog parser application relational DB (MySQL) XML native DB (Xindice XML viewer web application schema- specific stylesheets stores full XML doc shreds XML doc into tables Search & Discovery web application simple query (SQL) Results: list of triplets (dataset id, metadata schema, metadata URL) THREDDS catalogs browser Web application reference other metadata parses future advanced query (Xpath, Xquery) displays links to uses

NCAR Data->Knowledge Mass Storage System (1.3PB) Petascale Knowledge Repository Establish new paradigms for managing and accessing scientific data based on semantic organization.

NCAR Closing Thoughts l Building an environment for the long-term –Difficult, expensive, and time-consuming –Requires longer-term projects l Team-building is a critical process –Collaboration technologies really help l Managing all the collaborations is a challenge –But extremely valuable l Good progress, first real usage

NCAR Managing Expectations…

NCAR Links l Earth System Grid – l Community Data Portal –dataportal.ucar.edu

NCAR END