Building an Open Grid: A Status Report Ian Foster Argonne National Lab University of Chicago Globus Alliance CERN, September 9, 2003
2 ARGONNE CHICAGO Abstract Grid technologies and infrastructure support the integration of services and resources within and among enterprises, and thus allow new approaches to problem solving and interaction within distributed, multi-organizational collaborations. Sustained effort by computer scientists and application developers has resulted in the creation of a substantial open source technology, numerous infrastructure deployments, a vibrant international community, and significant application success stories. Long- term success now depends critically on three issues: open standards, open software, and open infrastructure. I discuss current efforts and future directions in each area, referring in particular to recent developments in "cyber-infrastructure" in the U.S., EGEE in Europe, Open Grid Services Architecture standards, and adoption within scientific communities beyond the physical sciences.
3 ARGONNE CHICAGO It’s Easy to Forget How Different 2003 is From 1993 l Ubiquitous Internet: 100+ million hosts u Collaboration & resource sharing the norm l Ultra-high-speed networks: 10+ Gb/s u Global optical networks l Enormous quantities of data: Petabytes u For an increasing number of communities, gating step is not collection but analysis l Huge quantities of computing: 100+ Top/s u Moore’s law gives us all supercomputers
4 ARGONNE CHICAGO Consequence: The Emergence of Global Knowledge Communities l Teams organized around common goals u Communities: “Virtual organizations” l With diverse membership & capabilities u Heterogeneity is a strength not a weakness l And geographic and political distribution u No location/organization possesses all required skills and resources l Must adapt as a function of the situation u Adjust membership, reallocate responsibilities, renegotiate resources
5 ARGONNE CHICAGO For Example: High Energy Physics
6 ARGONNE CHICAGO Resource Integration as a Fundamental Challenge R Discovery Many sources of data, services, computation R Registries organize services of interest to a community Access Data integration activities may require access to, & exploration/analysis of, data at many locations Exploration & analysis may involve complex, multi-step workflows RM Resource management is needed to ensure progress & arbitrate competing demands Security service Security service Policy service Policy service Security & policy must underlie access & management decisions
7 ARGONNE CHICAGO Grid Technologies Promise to Address Key Requirements l Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations u Dynamic, autonomous, domain independent u On-demand, ubiquitous access to computing, data, and services l Mechanisms for creating and managing workflow within such federations u New capabilities constructed dynamically and transparently from distributed services u Service-oriented, virtualization
8 ARGONNE CHICAGO Successful Applns Realizing the Promise: Building an Open Grid Open Standards Open Source Open Infrastructure
9 ARGONNE CHICAGO Realizing the Promise: Building an Open Grid Open Standards Open Source Open Infrastructure Successful Applns
10 ARGONNE CHICAGO Why Open Standards Matter l Ubiquitous adoption demands open, standard protocols u Standard protocols enable interoperability u Avoid product/vendor lock-in u Enables innovation/competition on end points l Further aided by open, standard APIs u Standard APIs enable portability u Allow implementations to port to different vendor platforms l Internet and Web as exemplars
11 ARGONNE CHICAGO Grids and Open Standards Increased functionality, standardization Time Custom solutions Open Grid Services Arch GGF: OGSI, … (+ OASIS, W3C) Multiple implementations, including Globus Toolkit Web services Globus Toolkit Defacto standards GGF: GridFTP, GSI X.509, LDAP, FTP, … App-specific Services
12 ARGONNE CHICAGO Open Grid Services Architecture l Service-oriented architecture u Key to virtualization, discovery, composition, local-remote transparency l Leverage industry standards u Internet, Web services l Distributed service management u A “component model for Web services” l A framework for the definition of composable, interoperable services “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002
13 ARGONNE CHICAGO More Specifically l A standard substrate: the Grid service u OGSI = Open Grid Service Infrastructure u Web services interfaces and behaviors that address key distributed system issues l … supports standard service specifications u Resource mgt, dbms, workflow, security, … u Target of current & planned GGF efforts u OGSA wg defines “OGSA compliance” l … and arbitrary application-specific services based on these & other definitions
14 ARGONNE CHICAGO What Is OGSI? l Useful, general purpose plumbing to make it easier to build Web services relevant to Grids u OGSI came about because we started trying to define Globus Toolkit ® functionality using WSDL, and found there were common, base behaviors that we wanted to define once and reuse in all of our services l There is nothing Grid specific about OGSI l Perhaps it should have been better named: WS-UsefulInterfacesToBuildAnInterestingClassOfWebServices
15 ARGONNE CHICAGO “Web Services” l For OGSI, Web Services = WSDL, Web Services Description Language u OGSI is defined in terms of WSDL portTypes, messages, and XML Schema types u Largely silent on WSDL binding and service l SOAP is important as a standard, inter-operable binding under WSDL, but OGSI is silent on this u Ditto for WS-Security, etc. l UDDI registry can be used with OGSI u OGSI also defines primitives for building custom or domain-specific registries
16 ARGONNE CHICAGO Service registry Service requestor (e.g. user application) Service factory Create Service Grid Service Handle Resource allocation Service instances Register Service Service discovery Interactions standardized using WSDL Service data Keep-alives Notifications Service invocation Authentication & authorization are applied to all requests Open Grid Services Infrastructure (OGSI)
17 ARGONNE CHICAGO Example: Reliable File Transfer Service Performance Policy Faults service data elements Pending File Transfer Internal State Grid Service Notf’n Source Policy interfaces Query &/or subscribe to service data Fault Monitor Perf. Monitor Client Request and manage file transfer operations Data transfer operations
18 ARGONNE CHICAGO OGSA Standardization & Implementation l OGSI defines core interfaces and behaviors for manageable services u Supported by strong open source technology & major commercial vendors l Efforts are underway within GGF, OASIS, and other bodies to define standards for u Agreement negotiation u Common management model u Data access and integration u Security and policy u Etc., etc., etc.
19 ARGONNE CHICAGO Relevant Standards Organizations l GGF: Grid services: OGSI/A, WS-Agreement l W3C: Web services: WSDL, SOAP l OASIS: Web services security, WSDM, SAML l IETF: Internet protocols and security l Project Liberty Alliance: Identity federation l DMTF: Common Information Model (CIM)
20 ARGONNE CHICAGO Globus Toolkit Implements Standards As They Emerge l GT 2.x u X.509 (Proxy) Certs, GridFTP, LDAP, GSS-API l GT 3.0 (June 2003) u GT2 + WSDL, SOAP, OGSI, WS-Security, etc. l GT 3.2 (1Q2004) u Maintenance release + new GridFTP code l GT 3.x (3-4Q2004) u First implementation of many new OGSA standards: WS-Agreement, OGSA-DAI, …, …
21 ARGONNE CHICAGO Realizing the Promise: Building an Open Grid Open Standards Open Source Open Infrastructure Successful Applns
22 ARGONNE CHICAGO Open Source l Encourage adoption of standards by reducing barriers to entry u Overcome new technology Catch-22 l Enable broad Grid technology ecosystem u Key to international cooperation u Key to science-commerce partnership l Jumpstart Grid industry and allow vendors to focus on value-add u E.g., IBM, Oracle, HP use GT; Platform Globus u Open source is industry friendly!
23 ARGONNE CHICAGO “Open Source” is More than Software and a License l A community of contributors to code design, development, packaging, testing, documentation, training, support u United by common architectural perspective l A community of users u May be major contributors via e.g. testing l A governance structure u To determine how the software evolves l Processes for coordinating all these activities u Packaging, testing, reporting, …
24 ARGONNE CHICAGO The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC) l An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit u Design, engineering, support, governance l Academic Affiliates make major contributions u EU: CERN, Imperial, MPI, Poznan u AP: AIST, TIT, Monash u US: NCSA, SDSC, TACC, UCSB, UW, etc. l Significant industrial contributions l 1000s of users worldwide, many contribute
25 ARGONNE CHICAGO Other Relevant Efforts l NSF Middleware Infrastructure initiative u Funds system integrators for middleware l U.K. Open Middleware Initiative u Archive, integration, testing, documentation l EU DataGrid, GridLab, EGEE, GriPhyN, Earth Science Grid, NEESgrid, etc., etc. u Produce/integrate software l U.S. Virtual Data Toolkit u GT + Condor in physics-friendly package
26 ARGONNE CHICAGO Globus Toolkit Contributors Include l Grid Packaging Technology (GPT) NCSA l Persistent GRAM Jobmanager Condor l GSI/Kerberos interchangeability Sandia l Documentation NASA, NCSA l Ports IBM, HP, Sun, SDSC, … l MDS stress testing EU DataGrid l Support IBM, Platform, UK eScience l Testing and patches Many l Interoperable tools Many l Replica location service EU DataGrid l Python hosting environment LBNL l Data access & integration UK eScience l Data mediation services SDSC l Tooling, Xindice, JMS IBM l Brokering framework Platform l Management frameworkHP l $$ DARPA, DOE, NSF, NASA, Microsoft, EU
27 ARGONNE CHICAGO GT Processes Are Evolving Rapidly (Thanks to Partners & the GT Team) l Before 2000 u -based problem tracking, aka “req” l 2000 u Detailed documentation, release notes (Q1) u Legal framework for external contributions (Q1) l 2001 u Packaging; module & binary releases (Q4) u Substantial regression tests (Q4) l 2002 u Bugzilla problem reporting & tracking (Q2) u Processes for external contrib (Q2) u Distributed testing infrastructure (Q3) l (in progress) u Distributed support infrastructure: GT “support centers” u Standardized Grid testing framework(s) u GT “contrib” components u Grid Technology Repository
28 ARGONNE CHICAGO Distributed Test Facility (NSF GRIDS Center activity: grids-center.org)
29 ARGONNE CHICAGO The Grid Technology Repository l Community repository l Clearing house for service definitions, code, documentation l Encourage collaboration & avoid redundant work International advisory committee: Ian Foster (Chair), Malcolm Atkinson, John Brooke, Fabrizio Gagliardi, Dennis Gannon, Wolfgang Gentzsch, Andrew Grimshaw, Keith Jackson, Gregor von Laszewski, Satoshi Matsuoka, Jarek Nabrzyski, Bill St. Arnaud, Jay Unger 20+ contributions received
30 ARGONNE CHICAGO Realizing the Promise: Building an Open Grid Open Standards Open Source Open Infrastructure Successful Applns
31 ARGONNE CHICAGO Grid Infrastructure l Broadly deployed services in support of fundamental collaborative activities u Formation & operation of virtual organizations u Authentication, authorization, discovery, … l Services, software, and policies enabling on- demand access to critical resources u Computers, databases, networks, storage, software services,… l Operational support for 24x7 availability l Integration with campus and commercial infrastructures
The Foundations Are Being Laid
Current Focus of U.S. Physics Grid Efforts: Grid3 A continuation of GriPhyN/PPDG/iVDGL testbed efforts, focused on establishing a functional federated Grid
34 ARGONNE CHICAGO Resource Center (Processors, disks) Grid server Nodes Resource Center Resource Center Resource Center Operations Center Regional Support Center (Support for Applications Local Resources) Regional Support Regional Support Regional Support EGEE: Enabling Grids for E-Science in Europe
35 ARGONNE CHICAGO Realizing the Promise: Building an Open Grid Open Standards Open Source Open Infrastructure Successful Applns
36 ARGONNE CHICAGO What We Can Do Today l A core set of Grid capabilities are available and distributed in good quality form, e.g. u GT: security, discovery, access, data movement u Condor, EDG: scheduling, workflow management u Virtual Data Toolkit, NMI, etc. l Deployed at moderate scales u NEESgrid, EU DataGrid, Grid3, LCG-1, … l Usable with some hand holding, e.g. u CMS event production: 5 sites, 2 months u NEESgrid earthquake engineering experiment
37 ARGONNE CHICAGO
38 ARGONNE CHICAGO NEESgrid Earthquake Engineering Collaboratory U.Nevada Reno
39 ARGONNE CHICAGO Earth System Grid (ESG) Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models
40 ARGONNE CHICAGO CMS Event Simulation Production l Production run on the integration testbed u Simulate 1.5 million full CMS events for physics studies: ~500 sec per event on 850 MHz processor u 2 months continuous running across 5 testbed sites u Managed by a single person at the US-CMS Tier 1 l EU DataGrid and LCG-1 operating at similar scales
41 ARGONNE CHICAGO Challenges l Integration with site operational procedures u Many difficult issues l Scalability in multiple dimensions u Number of sites, resources, users, tasks l Higher-level services in multiple areas u Virtual data, policy, collaboration l Integration with end-user science tools u Science desktops l Coordination of international contributions l Integration with commercial technologies
42 ARGONNE CHICAGO Earth System Grid
43 ARGONNE CHICAGO Scalable, Resilient Services: EDG/Globus Replica Location l Soft state maintenance of replicated index l E.g., Earth System Grid replica registry
44 ARGONNE CHICAGO Strategies for Success l Establish real international collaboration u Partition the space of what to do: it’s large u Be each other’s partners, not customers l Tackle process as well as software u Standardize on packaging, testing, support u Deployment and operations issues l Build sustained, critical mass teams u Software is hard; requires time & expertise l Build and operate large-scale Grids with real application groups to drive all of this u Look beyond data grids and physics
45 ARGONNE CHICAGO Strategies for Failure l Select software for political reasons u E.g., “We must have a European solution” l Underestimate cost & difficulty of software u “It’s easy to hire programmers & students” l Start from scratch if “A does not do Z” u You don’t need U, V, W, X, Y. Or do you? l Embrace diversity and non-interoperability u “We can add an interoperability layer” l Develop discipline-specific solutions
46 ARGONNE CHICAGO Summary l We have made a good start on creating an open Grid technology in three key areas u Standards, software, infrastructure u Real application success stories l We must now scale this technology base u Address deployment & operational issues u Functionality and resilience at large scales l International collaboration is key to success: we’ve scored a C so far, let’s aim for an A
47 ARGONNE CHICAGO For More Information l The Globus Alliance® u l Global Grid Forum u l Background information u l GlobusWORLD 2004 u u Jan 20–23, San Francisco 2nd Edition: November 2003