Promoting and Standardizing Grid Computing OGSA - A View From The Trenches Andrew Grimshaw GGF Architecture Area Co-Director January, 2005
2 Agenda Background – quick OGSA objectives and process OGSA design teams Opportunities for collaboration
3 What is an architecture? In the computer systems world, an architecture is the definition of the components, their interactions, and the design philosophy used in the development of the whole system. In a grid (a high-performance, secure, shared, collaborative distributed system), the architecture defines the services, their interactions, and the design philosophy. In other words: what are the pieces of the puzzle, how do they fit together, and what does the puzzle look like when complete? One of our design philosophies is that the pieces can be replaced, extended, and tailored to particular use cases. Further, a system architecture is the architecture on which applications, application services, and specialized views or profiles of the architecture are built. OGSA is a grid system architecture.
4 Architecture Requirements Simple Secure Standards-based Multiple interoperable implementations Scalable Extensible Site Autonomy Persistence & I/O Multi-Language Legacy Support Transparency Heterogeneity Fault-tolerance & Exception Management Success Requires an integrated model at the foundation. Manage Complexity!!
5 The Importance of a Strong Foundation
6 OGSA Aims and Perspective Goals −Interoperable solutions to Grid based applications Grid definitions sidebar −Addressing loosely coupled distributed computing Philosophy −Standardization at the Architectural level Similar to profiling. Developed before and/or during standards development −Use existing standards and technology where possible −Use case driven gap analysis −Gaps are filled proactively Not exclusively within the GGF (e.g. naming).
7 OGSA Process Use Case Driven −21 Detailed Use Cases (~ 6 pages each) Tier 1 Available at: Distributed Specification and Standardization −Identify and/or develop open and accessible standard specifications −Active current work in GGF, OASIS, W3C, and DMTF. “Design Team” Working Model −Facilitate cross-fertilization within and outside GGF. −Avoid redundant work across applicable efforts −Focus mind share (the most valuable commodity) e.g. DAIS-WG and OGSA-Data Design Team Iterative Refinement −Abstract services evolving to concrete specifications Documents: −OGSA: Use Cases, Informal Specification, Recommendation
8 OGSA –What is it? Two streams −Profiles −Design Teams Working Groups Process for design team, working group, profile development interaction −Draw circle
9 Profiles Define a usage pattern and include specifications developed by working groups both within and external to GGF. Issue: How mature and “widely adopted”? Three “in the pipe” −Basic −Data −Execution Management
10 Design Teams Naming – the foundation on which distributed systems are built Security – deeply dependent on WS-Security Data of all types Execution Management Services – EMS Logging – spun off into a working group
11 “A Rose by any other name would smell as sweet” Terms Resource Abstract resource name Human name (paths and attributes) Resource address Resource identity Binding scheme Bind time
12 Why names? Transparencies −Location −Migration −Failure −Replication −Scalability −and so on
13 Distributed naming is a well-understood area - properties Unique Provide identity Comparable Location portable Widely adopted Scalable – high performance Extensible Dynamic binding …. Two and three level name schemes dominate
14 Two level schemes Human name -> address −E.g., DNS, Unix file system (string->inode) abstract name -> address
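A two-level scheme can be sketched as a single lookup table. This is a minimal illustration, not DNS or a real file system: the class name, the path strings, and the address format are all hypothetical. It shows the cost of binding human names directly to addresses: when a resource moves, every name that points at it has to be rebound.

```python
class TwoLevelDirectory:
    """Hypothetical two-level scheme: human name -> address, nothing in between."""

    def __init__(self):
        self._table = {}  # human name -> address

    def bind(self, human_name, address):
        self._table[human_name] = address

    def resolve(self, human_name):
        return self._table[human_name]

    def names_for(self, address):
        # Every human name currently bound to this address.
        return sorted(n for n, a in self._table.items() if a == address)


directory = TwoLevelDirectory()
old_address = "hostA:9000/obj/17"   # made-up address format
new_address = "hostB:9000/obj/17"

# Two human names refer to the same resource.
directory.bind("/home/alice/data.txt", old_address)
directory.bind("/projects/shared/data.txt", old_address)

# The resource migrates: each name must be found and rebound individually.
stale_names = directory.names_for(old_address)
for name in stale_names:
    directory.bind(name, new_address)

resolved = directory.resolve("/projects/shared/data.txt")
```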
15 Three level schemes Human -> abstract -> address In OGSA, −Human -> address and Human -> abstract will likely be handled by RNS – Resource Naming Service being developed by the GGF GFS-WG
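A three-level scheme adds an abstract name between the human name and the address. The sketch below is an assumption-laden illustration and not the RNS interface: the class, its method names, and the use of a UUID as the abstract name are all hypothetical. It shows why the extra level buys migration transparency: a move rebinds one abstract name, and every human name keeps working.

```python
import uuid


class ThreeLevelNaming:
    """Hypothetical three-level scheme: human -> abstract -> address."""

    def __init__(self):
        self._human_to_abstract = {}
        self._abstract_to_address = {}

    def register(self, address):
        # Assumption: a random UUID stands in for a unique, comparable abstract name.
        abstract_name = uuid.uuid4().hex
        self._abstract_to_address[abstract_name] = address
        return abstract_name

    def link(self, human_name, abstract_name):
        self._human_to_abstract[human_name] = abstract_name

    def rebind(self, abstract_name, new_address):
        # Only the abstract -> address binding changes when a resource moves.
        if abstract_name not in self._abstract_to_address:
            raise KeyError(abstract_name)
        self._abstract_to_address[abstract_name] = new_address

    def resolve(self, human_name):
        abstract_name = self._human_to_abstract[human_name]
        return abstract_name, self._abstract_to_address[abstract_name]


naming = ThreeLevelNaming()
abstract = naming.register("hostA:9000/obj/17")
naming.link("/home/alice/data.txt", abstract)
naming.link("/projects/shared/data.txt", abstract)

# The resource migrates: one rebind, no human names touched.
naming.rebind(abstract, "hostB:9000/obj/17")

id_a, addr_a = naming.resolve("/home/alice/data.txt")
id_b, addr_b = naming.resolve("/projects/shared/data.txt")
same_resource = (id_a == id_b)  # identity is compared on the abstract name
```

Comparing abstract names rather than addresses is what lets two clients decide they hold the same resource even after it has moved.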
16 OGSA Security Process is not moving rapidly −Partly because the team is waiting on WS-Security Maybe too focused on one set of use cases (big government labs working together) (my opinion)
17 OGSA Data & InfoD Use case driven Many different data “types” and use scenarios, from HEP to business intelligence Strong consensus emerging, with some issues still around meta-data and information dissemination Strawman services defined for flat files, interacting with GFS. Pushing for early specs. Interacting with existing GGF WGs including GFS, GSM, DAIS, Info-D Interaction begun with WSDM
18 Info Services Troubleshooting Event Management Discovery Logging – spun off
19 EMS Overview Basic problem: provision, execute and manage services (including legacy applications) in a grid −Some use cases: start up a cache service on demand; utility computing; start up and manage a set of legacy applications Want to be able to “instantiate” a service and have the grid figure it out, and provide management interfaces throughout the lifetime of the service
20 EMS addresses issues such as:
−Where can a service execute? At which locations can it execute, given resource restrictions such as memory, CPU and binary type, available libraries, and available licenses? Given the above, what policy restrictions are in place that may further limit the candidate set of execution locations?
−Where should the service execute? Once it is known where the service can execute, the question is where it should execute. This may involve different selection algorithms that optimize different objective functions or try to enforce different policies or service level agreements.
−Prepare the service to execute. Just because a service can execute somewhere does not necessarily mean it can execute there without some setup. Setup could include deployment and configuration of binaries and libraries, staging data, or other operations to prepare the local execution environment to execute the service.
−Get the service executing. Once everything is ready, actually start the service and register it in the appropriate places.
−Manage (monitor, restart, move, etc.). Once the service is started it must be managed and monitored. What if it fails, or fails to meet its service agreements? Should it be restarted in another location? What about state? Should the state be “checkpointed” periodically to ensure restartability? Is the service participating in some sort of fault-detection and recovery scheme?
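The questions above can be read as a pipeline: build a candidate set, select from it, prepare, start, then manage. The sketch below is a toy illustration under stated assumptions, not an OGSA interface: the `Host` and `ServiceRequest` fields, the function names, and the "least-loaded host" objective are all hypothetical, and the prepare/start/manage steps are reduced to log entries.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    memory_mb: int
    arch: str
    licenses: set = field(default_factory=set)
    allowed_users: set = field(default_factory=set)  # stand-in for site policy
    load: float = 0.0


@dataclass
class ServiceRequest:
    user: str
    memory_mb: int
    arch: str
    license: str


def candidate_set(request, hosts):
    """Where CAN it execute: resource restrictions first, then policy."""
    fits = [h for h in hosts
            if h.memory_mb >= request.memory_mb
            and h.arch == request.arch
            and request.license in h.licenses]
    return [h for h in fits if request.user in h.allowed_users]


def select(candidates):
    """Where SHOULD it execute: assumed objective is the lowest current load."""
    return min(candidates, key=lambda h: h.load) if candidates else None


def execute(request, hosts, log):
    target = select(candidate_set(request, hosts))
    if target is None:
        log.append("no candidate host")
        return None
    log.append(f"prepare {target.name}")  # deploy binaries, stage data
    log.append(f"start {target.name}")    # start and register the service
    log.append(f"manage {target.name}")   # monitor, restart, move
    return target


hosts = [
    Host("a", 4096, "x86_64", {"solver"}, {"alice"}, load=0.9),
    Host("b", 8192, "x86_64", {"solver"}, {"alice"}, load=0.2),
    Host("c", 16384, "x86_64", {"solver"}, {"bob"}, load=0.1),   # policy excludes alice
    Host("d", 1024, "x86_64", {"solver"}, {"alice"}, load=0.0),  # too little memory
]
log = []
chosen = execute(ServiceRequest("alice", 2048, "x86_64", "solver"), hosts, log)
```

Keeping candidate generation separate from selection means a different objective function or policy can be swapped in without touching the resource-matching step.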
21 EMS Services fall into three sets Resources that model processing, storage, executables, resource management, and provisioning Job management and monitoring services; and Resource selection services that collectively decide where to execute a service.
22 Typical Pattern Provisioning Deployment Configuration Information Services Service Container Persistent State Handle Service Accounting Services Execution Planning Services Candidate Set Generator (Work - Resource mapping) Job Manager Reservation
23 Opportunities For Collaboration OMII and EGEE efforts intersect with OGSA design team efforts We all win if we can come to consensus EMS −The basic problem that everyone (Globus, SGE, LSF, Legion, EGEE, OMII) solves is the same. −Solutions have many similarities −EMS team spent quite a bit of time hammering those out −We’re here to make sure that OMII input is part of the design Similarly for data