The Grid and Meteorology
Ian Foster
Argonne National Lab
University of Chicago
Globus Project
Meteorology and HPN Workshop, APAN 2003, Busan, August 26, 2003
Image Credit: Electronic Visualization Lab, UIC
2 ARGONNE CHICAGO Overview
• The Grid: why and what
  – Global knowledge communities
  – Resource sharing technologies
  – Open standards and software
• The Grid and meteorology
  – Opportunities
  – Espresso interface
  – Earth System Grid project
3 It’s Easy to Forget How Different 2003 Is From 1993
• Enormous quantities of data: Petabytes
  – For an increasing number of communities, gating step is not collection but analysis
• Ubiquitous Internet: 100+ million hosts
  – Collaboration & resource sharing the norm
• Ultra-high-speed networks: 10+ Gb/s
  – Global optical networks
• Huge quantities of computing: 100+ Top/s
  – Moore’s law gives us all supercomputers
4 Consequence: The Emergence of Global Knowledge Communities
• Teams organized around common goals
  – Communities: “Virtual organizations”
• With diverse membership & capabilities
  – Heterogeneity is a strength, not a weakness
• And geographic and political distribution
  – No location/organization possesses all required skills and resources
• Must adapt as a function of the situation
  – Adjust membership, reallocate responsibilities, renegotiate resources
5 For Example: High Energy Physics
6 Grid Technologies Address Key Requirements
• Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations
  – Dynamic, autonomous, domain independent
  – On-demand, ubiquitous access to computing, data, and services
• Mechanisms for creating and managing workflow within such federations
  – New capabilities constructed dynamically and transparently from distributed services
  – Service-oriented, virtualization
7 The Grid World: Current Status
• Substantial number of Grid success stories
  – Major projects in science
  – Emerging infrastructure deployments
  – Growing number of commercial deployments
• Open source Globus Toolkit® a de facto standard for major protocols & services
  – Simple protocols & APIs for authentication, discovery, access, etc.: infrastructure
  – Large user and developer base
  – Multiple commercial support providers
• Global Grid Forum: community & standards
• Emerging Open Grid Services Architecture
8 What We Can Do Today
• A core set of Grid capabilities is available and distributed in good-quality form, e.g.
  – Globus Toolkit: security, discovery, access, data movement, etc.
  – Condor: scheduling, workflow management
  – Virtual Data Toolkit, NMI, EDG, etc.
• Deployed at moderate scales
  – WorldGrid, TeraGrid, NEESgrid, DOE SG, EDG, …
• Usable with some hand-holding, e.g.
  – US-CMS event production: O(6) sites, 2 months
  – NEESgrid: earthquake engineering experiment
10 NEESgrid Earthquake Engineering Collaboratory
[Figure label: U. Nevada Reno]
11 CMS Event Simulation Production
• Production run on the Integration Testbed
  – Simulate 1.5 million full CMS events for physics studies: ~500 sec per event on 850 MHz processor
  – 2 months continuous running across 5 testbed sites
  – Managed by a single person at the US-CMS Tier 1
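The figures on this slide imply a sizeable CPU budget. A minimal back-of-envelope sketch in Python, taking the event count and per-event time from the slide and assuming that "2 months" means 60 days of continuous running (the function names are illustrative, not part of any CMS tool):

```python
# Back-of-envelope CPU budget for the production run described on the slide.
EVENTS = 1_500_000      # from the slide: full CMS events simulated
SEC_PER_EVENT = 500     # from the slide: ~500 s per event on an 850 MHz processor
RUN_DAYS = 60           # assumption: "2 months" taken as 60 days

SECONDS_PER_DAY = 24 * 60 * 60


def cpu_days(events: int, sec_per_event: float) -> float:
    """Total single-processor time needed, in days."""
    return events * sec_per_event / SECONDS_PER_DAY


def average_busy_processors(events: int, sec_per_event: float, run_days: float) -> float:
    """Average number of processors that must stay busy for the whole run."""
    return cpu_days(events, sec_per_event) / run_days


total_cpu_days = cpu_days(EVENTS, SEC_PER_EVENT)
busy = average_busy_processors(EVENTS, SEC_PER_EVENT, RUN_DAYS)
print(f"{total_cpu_days:,.0f} CPU-days")
print(f"{busy:.0f} processors busy on average")
```

Under these assumptions the run corresponds to roughly 8,700 CPU-days, or about 145 processors of the 850 MHz class kept busy on average across the testbed sites.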
12 Key Areas of Concern
• Integration with site operational procedures
  – Many challenging issues
• Scalability in multiple dimensions
  – Number of sites, resources, users, tasks
• Higher-level services in multiple areas
  – Virtual data, policy, collaboration
• Integration with end-user science tools
  – Science desktops
• Coordination of international contributions
• Integration with commercial technologies
13 Overview
• The Grid: why and what
  – Global knowledge communities
  – Resource sharing technologies
  – Open standards and software
• The Grid and meteorology
  – Opportunities
  – Espresso interface
  – Earth System Grid project
14 The Grid and Meteorology: Opportunities
• Inter-personal collaboration
  – E.g., Access Grid, CHEF
• On-demand access to simulation models
  – E.g., Espresso
• Access to, and integration of, data sources
  – E.g., Earth System Grid
• Dynamic, virtual computing resources
  – “Metacomputing”
• Integration of all of the above
  – Collaborative, computationally intensive analysis of large quantities of online data
15 Espresso Modeling Interface (Michael Dvorak, John Taylor)
• “Meteorology on demand”
16 Earth System Grid (ESG)
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
18 ESG: Strategies
• Move data a minimal amount, keep it close to point of origin when possible
  – Data access protocols, distributed analysis
• When we must move data, do it fast and with minimum human intervention
  – Storage Resource Management, fast networks
• Keep track of what we have, particularly what’s on deep storage
  – Metadata and Replica Catalogs
• Harness a federation of sites, web portals
  – GT -> Earth System Grid -> UltraDataGrid
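The "keep track of what we have" and "move data a minimal amount" strategies can be illustrated with a toy replica catalog. This is a hedged sketch only: the `Replica` and `ReplicaCatalog` classes, the site names, and the URLs are hypothetical and do not reflect the actual ESG or Globus replica catalog interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str               # hypothetical site name
    url: str                # hypothetical physical location of this copy
    on_deep_storage: bool   # True if the copy sits on tape / mass storage


class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its physical copies."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica):
        self._replicas.setdefault(logical_name, []).append(replica)

    def lookup(self, logical_name):
        return list(self._replicas.get(logical_name, []))

    def best_replica(self, logical_name, local_site):
        """Prefer a copy at the local site; break ties in favor of online copies."""
        candidates = self.lookup(logical_name)
        if not candidates:
            raise KeyError(f"no replica registered for {logical_name!r}")
        # Tuples sort False before True: local beats remote, then online beats tape.
        return min(candidates, key=lambda r: (r.site != local_site, r.on_deep_storage))


catalog = ReplicaCatalog()
catalog.register(
    "ccsm/run01/atm.nc",
    Replica("site-a", "gsiftp://site-a.example.org/archive/atm.nc", on_deep_storage=True),
)
catalog.register(
    "ccsm/run01/atm.nc",
    Replica("site-b", "gsiftp://site-b.example.org/cache/atm.nc", on_deep_storage=False),
)

# A user at a third site gets the online copy rather than the one on deep storage.
print(catalog.best_replica("ccsm/run01/atm.nc", local_site="site-c").site)
```

The selection rule here (local first, then online) is one simple policy chosen for illustration; a real deployment would also weigh network bandwidth and storage staging time.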
19 Distributed Data Access Protocols
[Diagram; labels recovered from the figure]
• Typical application: application, netCDF lib, data (local)
• OPeNDAP client: application, OPeNDAP via http, OPeNDAP server, data (remote)
• ESG client: application, ESG + DODS, OPeNDAP via Grid, ESG server, big data (remote)
• Distributed application data
• OPeNDAP-g: transparency, performance, security, authorization, (processing)
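Remote access protocols such as OPeNDAP let a client request only the part of an array it needs, which is one way to move a minimal amount of data. A minimal sketch of building such a subset request, assuming the DAP2-style constraint-expression syntax `variable[start:stride:stop]` with inclusive stop indices; the server URL, variable name, and helper functions are hypothetical:

```python
def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one dimension's index range; stop is inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url: str, variable: str, ranges, response: str = "dods") -> str:
    """Build a request URL for a rectangular subset of one variable.

    ranges is a list of (start, stop, stride) tuples, one per dimension.
    """
    slabs = "".join(hyperslab(start, stop, stride) for start, stop, stride in ranges)
    return f"{dataset_url}.{response}?{variable}{slabs}"


def subset_size(ranges) -> int:
    """Number of array elements selected by the request."""
    count = 1
    for start, stop, stride in ranges:
        count *= len(range(start, stop + 1, stride))
    return count


# Hypothetical dataset: 12 time steps over a 20 x 40 lat/lon window.
ranges = [(0, 11, 1), (40, 59, 1), (100, 139, 1)]
url = subset_url("http://data.example.org/dods/ccsm/tas.nc", "tas", ranges)
print(url)
print(subset_size(ranges), "values requested")
```

The request covers 12 x 20 x 40 = 9,600 values, so only that window crosses the network instead of the whole file.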
20 ESG: Metadata Services
[Layered architecture diagram; labels recovered from the figure]
• ESG clients, API & user interfaces: publishing, search & discovery, administration, browsing & display, analysis & visualization
• High-level metadata services: metadata & data registration, metadata extraction, annotation, validation, aggregation, discovery, display, browsing, query
• Core metadata services: metadata access (update, insert, delete, query), service translation library
• Metadata holdings: data & metadata catalog, Dublin Core database, COARDS database, mirror, Dublin Core XML files, comments XML files
21 ESG: NcML Core Schema
• XML encoding of metadata (and data) of any generic netCDF file
• Objects: netCDF, dimension, variable, attribute
• Beta version reference implementation as Java Library (
[Schema diagram labels: netCDF, nc:netCDFType, nc:dimension, nc:variable, nc:VariableType, nc:attribute, nc:values]
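To make the idea of an XML encoding of netCDF metadata concrete, here is a small sketch that emits an NcML-like document using Python's standard library. It is illustrative only: the element and attribute names follow the objects named on the slide (netCDF, dimension, variable, attribute, values), but the exact schema, namespace, and the `describe_netcdf` helper are assumptions, not the ESG reference implementation.

```python
import xml.etree.ElementTree as ET


def describe_netcdf(location, dimensions, variables, global_attrs):
    """Build an NcML-like XML tree describing a netCDF file's structure."""
    root = ET.Element("netcdf", {"location": location})
    for name, length in dimensions.items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, value in global_attrs.items():
        ET.SubElement(root, "attribute", {"name": name, "value": str(value)})
    for var in variables:
        elem = ET.SubElement(
            root,
            "variable",
            {"name": var["name"], "type": var["type"], "shape": " ".join(var["shape"])},
        )
        for attr_name, attr_value in var.get("attributes", {}).items():
            ET.SubElement(elem, "attribute", {"name": attr_name, "value": str(attr_value)})
        if "values" in var:
            # Optional: the encoding can carry data as well as metadata.
            values = ET.SubElement(elem, "values")
            values.text = " ".join(str(v) for v in var["values"])
    return root


# Hypothetical file layout used only for illustration.
doc = describe_netcdf(
    location="tas.nc",
    dimensions={"time": 12, "lat": 64},
    variables=[
        {"name": "lat", "type": "float", "shape": ["lat"], "attributes": {"units": "degrees_north"}},
        {"name": "tas", "type": "float", "shape": ["time", "lat"], "attributes": {"units": "K"}},
        {"name": "level", "type": "int", "shape": [], "values": [1, 2, 3]},
    ],
    global_attrs={"Conventions": "COARDS"},
)
print(ET.tostring(doc, encoding="unicode"))
```

Because the description is plain XML, catalog and discovery services can index a file's dimensions, variables, and attributes without opening the binary netCDF file itself.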
23 Collaborations & Relationships
• CCSM Data Management Group
• OPeNDAP/DODS (multi-agency)
• NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
• U.K. e-Science and British Atmospheric Data Center
• NOAA NOMADS and CEOS-grid
• Earth Science Portal group (multi-agency, international)
24 For More Information
• The Globus Project® –
• Earth System Grid –
• Global Grid Forum –
• Background information –
• GlobusWORLD 2004 –
  – Jan 20–23, San Francisco
2nd Edition: November 2003