1 Observations on Architecture, Protocols, Services, APIs, SDKs, and the Role of the Grid Forum
Ian Foster, Carl Kesselman, Steven Tuecke
2 Why this Talk?
• Considerable progress over GF1-5 in interest in, and understanding of, Grid concepts
• Seems timely to attempt to
  – Define the scope of the problem that we are tackling
  – Define a common vocabulary for describing components and activities
3 Overview
1. The Grid problem: controlled resource sharing in multi-institutional settings
2. Definition, role, and importance of protocols, services, SDKs, and APIs
3. A categorization of protocols, services, SDKs, and APIs in the Grid environment
4 The Grid Problem
• Grid terminology has been shaped by its origins in metacomputing, but…
• In practice, the Grid is about resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations
• Focus on how to enable, maintain, and control the sharing of resources to achieve a common goal
5 Universal Nature of the Grid Problem
• “Sharing” is fundamental in many settings
  – Application Service Providers, Storage Service Providers, etc.; peer-to-peer computing; distributed computing; business to business; …
• Sharing issues not adequately addressed by existing technologies
  – Sharing at a deep level, across broad ranges of resources and in a general way
  – E.g., user provides ASP with controlled access to their data on an SSP: how?
• Grid community has unique experience
6 Some Important Definitions
• Resource
• Network protocol
• Network enabled service
• Application Programmer Interface (API)
• Software Development Kit (SDK)
• Not discussed, but important: policies
7 Resource
• Entity that is to be shared
  – Includes computers, storage, data, software
• Does not have to be a physical entity
  – Condor pool, distributed file system, …
• Defined in terms of interfaces, not devices
  – E.g., LSF defines a compute resource
  – Open/close/read/write defines access to a distributed file system, e.g., NFS, AFS, DFS
8 Network Protocol
• A formal description of message formats and a set of rules for message exchange
  – Rules may define the sequence of message exchanges
  – Protocol may define a state change in the endpoint
• Good protocols are designed to do one thing
  – Protocols can be layered
• Examples of protocols
  – IP, TCP, TLS, HTTP, Kerberos
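The definition above (message formats plus rules for exchange) can be made concrete with a toy example. This is a minimal sketch, not any real protocol: the three message types, the 3-byte header layout, and the `Endpoint` class are all hypothetical names chosen for illustration.

```python
import struct
from enum import Enum
from typing import List, Tuple

# Hypothetical message format: 1-byte type, 2-byte big-endian payload length, payload.
HEADER = struct.Struct("!BH")


class MsgType(Enum):
    HELLO = 1
    DATA = 2
    BYE = 3


def encode(msg_type: MsgType, payload: bytes = b"") -> bytes:
    """Serialize one message according to the format above."""
    return HEADER.pack(msg_type.value, len(payload)) + payload


def decode(frame: bytes) -> Tuple[MsgType, bytes]:
    """Parse one message, rejecting frames whose length field does not match."""
    type_code, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return MsgType(type_code), payload


class Endpoint:
    """Enforces the exchange rules: HELLO first, then any number of DATA, then BYE."""

    def __init__(self) -> None:
        self.state = "idle"              # state changes as messages arrive
        self.received: List[bytes] = []

    def handle(self, frame: bytes) -> None:
        msg_type, payload = decode(frame)
        if self.state == "idle" and msg_type is MsgType.HELLO:
            self.state = "open"
        elif self.state == "open" and msg_type is MsgType.DATA:
            self.received.append(payload)
        elif self.state == "open" and msg_type is MsgType.BYE:
            self.state = "closed"
        else:
            raise RuntimeError(f"{msg_type.name} not allowed in state {self.state}")
```

Here `encode`/`decode` capture the message format, while `Endpoint.handle` captures the sequencing rules and the resulting state change in the endpoint.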
9 Network Enabled Services
• Implementation of a protocol that defines a set of capabilities
  – Protocol defines interaction with service
  – All services require protocols
  – Not all protocols are used to provide services (e.g., IP, TLS)
• Examples: FTP and Web servers
[Figure: Web server layered on the IP, TCP, TLS, and HTTP protocols; FTP server layered on the IP, TCP, Telnet, and FTP protocols]
10 Application Programmer Interface
• A specification for a set of routines to facilitate application development
  – Refers to definition, not implementation, e.g., there are many implementations of MPI
• Spec often language-specific (or IDL)
  – Routine name; number, order, and type of arguments; mapping to language constructs
  – Behavior or function of routine
• Examples
  – GSS API, MPI
11 Software Development Kit
• A particular instantiation of an API
• SDK consists of libraries and tools
  – Provides implementation of API specification
• Can have multiple SDKs for an API
• Examples of SDKs
  – MPICH, Motif Widgets
12 Multiple APIs but a Single Protocol
Example: TCP/IP
• Multiple APIs: BSD sockets, Winsock, System V streams, …
• Different programs use different APIs
• Interoperability: programs using different APIs can exchange information
[Figure: applications using the WinSock API and the Berkeley Sockets API, both over the TCP/IP protocol (reliable byte streams)]
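The point of this slide can be sketched in a few lines. The example assumes a toy wire protocol (UTF-8 text terminated by a newline) and two hypothetical programmer-facing APIs for it; neither stands in for the real socket interfaces named above. An in-memory `io.BytesIO` plays the role of the byte stream.

```python
import io

# Assumed toy wire protocol: each message is UTF-8 text terminated by b"\n".


# API 1 (hypothetical, procedural style)
def send_message(channel: io.BytesIO, text: str) -> None:
    channel.write(text.encode("utf-8") + b"\n")


def recv_message(channel: io.BytesIO) -> str:
    return channel.readline().rstrip(b"\n").decode("utf-8")


# API 2 (hypothetical, object style)
class LineStream:
    def __init__(self, channel: io.BytesIO) -> None:
        self._channel = channel

    def put(self, text: str) -> None:
        self._channel.write(f"{text}\n".encode("utf-8"))

    def get(self) -> str:
        return self._channel.readline()[:-1].decode("utf-8")


def demo() -> str:
    """A program using API 1 writes; a program using API 2 reads the same bytes."""
    channel = io.BytesIO()
    send_message(channel, "hello grid")
    channel.seek(0)  # rewind so the reader starts at the first byte
    return LineStream(channel).get()
```

Because both APIs produce and consume the same bytes, a writer built on one and a reader built on the other interoperate, even though their source code looks nothing alike.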
13 Single API, but Multiple Protocols
E.g., Message Passing Interface
• MPI provides portability: any correct program compiles & runs on a platform
• Does not provide interoperability: all processes must link against same SDK
  – E.g., MPICH and LAM versions of MPI
[Figure: one application and the MPI API over two SDKs, LAM (LAM protocol) and MPICH-P4 (MPICH-P4 protocol), both running over TCP/IP; the two protocols use different message formats, exchange sequences, etc.]
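The portability-versus-interoperability distinction can be illustrated with a small sketch. Everything here is hypothetical: `MessagingApi` is an invented two-routine API, and `BinarySdk` and `JsonSdk` are invented implementations with deliberately different wire formats. They are not models of MPICH or LAM.

```python
import json
import struct
from abc import ABC, abstractmethod
from typing import Tuple


class MessagingApi(ABC):
    """The API: a specification of routines, with no implementation."""

    @abstractmethod
    def pack(self, dest: int, data: bytes) -> bytes: ...

    @abstractmethod
    def unpack(self, frame: bytes) -> Tuple[int, bytes]: ...


class BinarySdk(MessagingApi):
    """One SDK: 4-byte destination, 2-byte length, then raw payload."""

    _HEADER = struct.Struct("!IH")

    def pack(self, dest: int, data: bytes) -> bytes:
        return self._HEADER.pack(dest, len(data)) + data

    def unpack(self, frame: bytes) -> Tuple[int, bytes]:
        dest, length = self._HEADER.unpack_from(frame)
        data = frame[self._HEADER.size:]
        if len(data) != length:
            raise ValueError("not a BinarySdk frame")
        return dest, data


class JsonSdk(MessagingApi):
    """Another SDK for the same API: JSON text with a hex-encoded payload."""

    def pack(self, dest: int, data: bytes) -> bytes:
        return json.dumps({"dest": dest, "data": data.hex()}).encode("utf-8")

    def unpack(self, frame: bytes) -> Tuple[int, bytes]:
        obj = json.loads(frame.decode("utf-8"))
        return obj["dest"], bytes.fromhex(obj["data"])


def roundtrip(sdk: MessagingApi, dest: int, data: bytes) -> Tuple[int, bytes]:
    """Application code written only against the API: portable across SDKs."""
    return sdk.unpack(sdk.pack(dest, data))
```

`roundtrip` runs unchanged on either SDK (portability), but a frame packed by one SDK is rejected by the other's `unpack` (no interoperability), which is why all communicating processes must link against the same SDK.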
14 Back to Grids: The Programming & Systems Problems
• The programming problem
  – Making it easy to develop sophisticated applications
  – Requires programming environments: APIs, SDKs, tools
• The systems problem
  – Facilitating coordinated use of diverse resources; sharing infrastructure
  – Requires systems: protocols and services
• “Standards” can help in both cases, but in different ways
15 I.e., Standard APIs and Protocols are Both Important: For Different Reasons
• Standard APIs/SDKs are important
  – They enable application portability
  – But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
• Standard protocols are important
  – Enable cross-site interoperability
  – Enable shared infrastructure
  – But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
16 Grid Architecture
• We now proceed to analyze Grid systems with respect to sharing
• Identify key areas where protocols, services, APIs, and SDKs can occur
• Result is a layered protocol architecture
• We assert this can be useful as a means of describing and structuring Grid Forum activities
17 Layered Grid Protocol Architecture (By Analogy to Internet Architecture)
Grid layers, top to bottom:
• Application
• User: “Specialized services”: user- or appln-specific distributed services
• Collective: “Managing multiple resources”: ubiquitous infrastructure services
• Resource: “Sharing single resources”: negotiating access, controlling use
• Connectivity: “Talking to things”: communication (Internet protocols) & security
• Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture, top to bottom: Application, Transport, Internet, Link
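As a rough aid for reasoning about the layering, the stack can be written down as an ordered type. This is only a sketch: the bottom-to-top ordering is an assumption read off the flattened diagram, and `GridLayer`, `ROLE`, and `builds_on` are illustrative names, not part of any Grid software.

```python
from enum import IntEnum
from typing import List


class GridLayer(IntEnum):
    """Layers numbered bottom to top (ordering assumed from the slide)."""
    FABRIC = 1
    CONNECTIVITY = 2
    RESOURCE = 3
    COLLECTIVE = 4
    USER = 5
    APPLICATION = 6


# Taglines quoted from the slide; the slide gives none for the Application layer.
ROLE = {
    GridLayer.FABRIC: "Controlling things locally",
    GridLayer.CONNECTIVITY: "Talking to things",
    GridLayer.RESOURCE: "Sharing single resources",
    GridLayer.COLLECTIVE: "Managing multiple resources",
    GridLayer.USER: "Specialized services",
}


def builds_on(layer: GridLayer) -> List[GridLayer]:
    """Return the layers beneath `layer`, lowest first."""
    return [lower for lower in GridLayer if lower < layer]
```

For example, `builds_on(GridLayer.RESOURCE)` lists Fabric and Connectivity, matching the idea that resource-level protocols rely on local control plus communication and security.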
18 Protocols, Services, and Interfaces Occur at Each Level
[Figure: layered diagram with the following elements]
• Applications; Languages/Frameworks
• User Services: User Service APIs and SDKs; User Service Protocols
• Collective Services: Collective Service APIs and SDKs; Collective Service Protocols
• Resource Services: Resource APIs and SDKs; Resource Service Protocols
• Connectivity: Connectivity APIs; Connectivity Protocols
• Fabric Layer: Local Access APIs and Protocols
19 Example: User Portal
[Figure: layered view of a Web portal]
• Layer labels in the figure: User, Appln, Collective, Resource, Connect, Fabric
• Functions shown: Web Portal; source code discovery, application configuration; brokering, co-allocation, certificate authorities; access to data, access to computers, access to network performance data; communication, service discovery (DNS), authentication, authorization, delegation; storage systems, schedulers
• Side boxes: Compute Resource (SDK, API, Access Protocol); Source Code Repository (SDK, API, Lookup Protocol)
20 Example: High-Throughput Computing System
[Figure: layered view of a high-throughput computing system]
• Layer labels in the figure: User, Appln, Collective, Resource, Connect, Fabric
• Functions shown: High Throughput Computing System; dynamic checkpoint, job management, failover, staging; brokering, certificate authorities; access to data, access to computers, access to network performance data; communication, service discovery (DNS), authentication, authorization, delegation; storage systems, schedulers
• Side boxes: Compute Resource (SDK, API, Access Protocol); Checkpoint Repository (SDK, API, C-point Protocol)
21 Important Points
• We build on Internet protocols
• One or many protocols?
  – No one “right” protocol for any one function
  – But: interoperability requires that we define and commit to core “Intergrid” protocols
  – Definition: “A resource is Grid-enabled if it speaks Intergrid protocols”
• One or many APIs and SDKs?
  – Many APIs, SDKs, programming models can target Intergrid protocols
  – But: code sharing requires standards
22 Summary
• Grids are about [large-scale] sharing
  – Hence require standard protocols to enable interoperability and shared infrastructure
  – As well as, of course, standard APIs and SDKs to enable portability & code sharing
• A well-defined protocol architecture is essential to understanding & progress
  – Provides a framework for figuring out where the pieces fit
24 Additional Slides
25 Back to Grids
• Grid applications are complex and multifaceted
  – Need to develop abstractions
  – Need to be able to share code
• Grids are multi-organizational
  – Heterogeneous in systems, policy, mechanisms
  – Need to be able to share resources
• “Standards” are the vehicle by which sharing occurs
26 “Standards Enable Sharing” -- But of What, and How?
• Of code & abstractions
  – Via SDKs, APIs
  – E.g., MPI
• Of infrastructure services
  – Via protocols, policies
  – E.g., GIS, CA
• Of resources
  – Via protocols, policies
  – E.g., TCP/IP, TLS
[Figure: three panels with the labels App 1, App 2, SDK; App 1, App 2, CA, GIS; App, TCP/IP, TLS, Site 1, Site 2]
27 Aspects of the Programming Problem
• Need for abstractions and models to improve the speed, robustness, etc. of development
  – E.g., OO abstractions, MPI for messaging
• Need for code sharing to allow reuse of code components developed by others
  – E.g., MPI allows reuse of message-passing code
• Need for tool sharing to allow reuse of tools developed by others
  – E.g., standard debuggers
28 Aspects of the Systems Problem
• Need for interoperability when different groups want to share resources
  – Diverse components, policies, mechanisms
  – E.g., standard notions of identity, means of communication, resource descriptions
• Need for shared infrastructure services to avoid repeated development, installation
  – E.g., one port/service for remote access to computing, not one per tool/application
  – E.g., Certificate Authorities: expensive to run