1 I.Foster LCG Grid Technology: Introduction & Overview Ian Foster Argonne National Laboratory University of Chicago

Grid Technologies: Expanding the Horizons of HEP Computing
Including New Zealand!
Enabling thousands of physicists to harness the resources of hundreds of institutions in pursuit of knowledge

The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

Some “Large” Grid Issues (H. Newman)
Consistent transaction management
Query (task completion time) estimation
Queuing and co-scheduling strategies
Load balancing (e.g., Self Organizing Neural Network)
Error Recovery: Fallback and Redirection Strategies
Strategy for use of tapes
Extraction, transport and caching of physicists’ object-collections; Grid/Database Integration
Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
Network Performance and Problem Handling
Monitoring and Response to Bottlenecks
Configuration and Use of New-Technology Networks, e.g., Dynamic Wavelength Scheduling or Switching
Fault-Tolerance, Performance of the Grid Services Architecture

How Large is “Large”?
Is the LHC Grid
  – Just the O(10) Tier 0/1 sites and O(20,000) CPUs?
  – + the O(50) Tier 2 sites: O(40,000) CPUs?
  – + the collective computing power of O(300) LHC institutions: perhaps O(60,000) CPUs in total?
Are the LHC Grid users
  – The experiments and their relatively few, well-structured “production” computing activities?
  – The curiosity-driven work of 1000s of physicists?
Depending on our answer, the LHC Grid is
  – A relatively simple deployment of today’s technology
  – A significant information technology challenge

The Problem: Resource Sharing Mechanisms That …
Address security and policy concerns of resource owners and users
Are flexible enough to deal with many resource types and sharing modalities
Scale to large numbers of resources, many participants, many program components
Operate efficiently when dealing with large amounts of data & computation

Aspects of the Problem
1) Need for interoperability when different groups want to share resources
  – Diverse components, policies, mechanisms
  – E.g., standard notions of identity, means of communication, resource descriptions
2) Need for shared infrastructure services to avoid repeated development, installation
  – E.g., one port/service/protocol for remote access to computing, not one per tool/application
  – E.g., Certificate Authorities: expensive to run
A common need for protocols & services

Hence, Grid Architecture Must Address
Development of Grid protocols & services
  – Protocol-mediated access to remote resources
  – New services: e.g., resource brokering
  – “On the Grid” = speak Intergrid protocols
  – Mostly (extensions to) existing protocols
Development of Grid APIs & SDKs
  – Interfaces to Grid protocols & services
  – Facilitate application development by supplying higher-level abstractions
The (hugely successful) model is the Internet

Grid Architecture
Application
Fabric: “Controlling things locally”: access to, & control of, resources
Connectivity: “Talking to things”: communication (Internet protocols) & security
Resource: “Sharing single resources”: negotiating access, controlling use
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
For more info:

HENP Grid Architecture (H. Newman)
Physicists’ Application Codes
  – Reconstruction, Calibration, Analysis
Experiments’ Software Framework Layer
  – Modular and Grid-aware: Architecture able to interact effectively with the lower layers (above)
Grid Applications Layer (Parameters and algorithms that govern system operations)
  – Policy and priority metrics
  – Workflow evaluation metrics
  – Task-Site Coupling proximity metrics
Global End-to-End System Services Layer
  – Monitoring and Tracking Component performance
  – Workflow monitoring and evaluation mechanisms
  – System self-monitoring, evaluation and optimization mechanisms

Architecture (1): Fabric Layer
Diverse resources that may be shared
  – Computers, clusters, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc.
Speak connectivity, resource protocols
  – The neck of the protocol hourglass
May implement standard behaviors
  – Reservation, pre-emption, virtualization
  – Grid operation can have profound implications for resource behavior
Grid resource: registration, enquiry, management, access protocol(s)

Architecture (2): Connectivity Layer Protocols & Services
Communication
  – Internet protocols: IP, DNS, routing, etc.
Security: Grid Security Infrastructure (GSI)
  – Uniform authentication & authorization mechanisms in multi-institutional setting
  – Single sign-on, delegation, identity mapping
  – Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
  – Supporting infrastructure: Certificate Authorities, key management, etc.
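
The single sign-on, delegation, and identity-mapping bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions: it performs no cryptography, and the `Credential` class, the `/CN=proxy` naming rule, and the function names are hypothetical illustrations, not the GSI API. It shows the idea that a user signs on once, hands short-lived proxy credentials to remote services, and a resource traces any proxy back to a user identity issued by a trusted Certificate Authority before mapping it to a local account.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical naming rule: a delegated credential extends its parent's subject.
PROXY_SUFFIX = "/CN=proxy"


@dataclass(frozen=True)
class Credential:
    """Toy credential: no keys or signatures, only the chain structure."""
    subject: str
    issuer: Optional["Credential"]  # None marks a self-issued root (a CA)
    not_after: float                # expiry time on an arbitrary clock


def delegate(parent: Credential, lifetime: float, now: float) -> Credential:
    """Issue a proxy from `parent`; it can never outlive its parent."""
    expiry = min(parent.not_after, now + lifetime)
    return Credential(parent.subject + PROXY_SUFFIX, parent, expiry)


def authenticate(cred: Credential, trusted_ca: Credential,
                 now: float) -> Optional[str]:
    """Walk the chain upward; return the user's subject, or None if invalid."""
    current = cred
    while True:
        if now > current.not_after:
            return None                      # some link has expired
        issuer = current.issuer
        if issuer is None:
            return None                      # ended at a root we do not trust
        if issuer == trusted_ca:
            # `current` is the end-entity credential issued by the trusted CA.
            return current.subject if now <= trusted_ca.not_after else None
        if current.subject != issuer.subject + PROXY_SUFFIX:
            return None                      # a proxy must extend its issuer's name
        current = issuer


def map_to_local_account(subject: str, mapping: Dict[str, str]) -> Optional[str]:
    """Identity mapping: translate a Grid-wide subject into a site-local account."""
    return mapping.get(subject)
```

In this sketch a proxy of a proxy still authenticates as the original user, which is what lets a job started on one resource act on the user's behalf at another without a second sign-on.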

Architecture (3): Resource Layer Protocols & Services
Resource management: GRAM
  – Remote allocation, reservation, monitoring, control of [compute=>arbitrary] resources
Data access: GridFTP
  – High-performance data access & transport
Information/monitoring
  – MDS: Access to structure & state information
  – GMA & others: database access, code repository access, virtual data, …
All integrated with GSI

Grid Services Architecture (4): Collective Layer Protocols & Services
Community membership & policy
  – E.g., Community Authorization Service
Index/metadirectory/brokering services
  – E.g., Globus GIIS, Condor Matchmaker, DAGMAN
Replica management and replica selection
  – E.g., GDMP
  – Optimize aggregate data access performance
Co-reservation and co-allocation services
  – End-to-end performance
Middle tier services
  – MyProxy credential repository, portal services
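
Replica selection, as listed above, can be sketched as choosing the copy of a dataset with the lowest estimated access cost. The following is a minimal illustration, not the GDMP implementation: the `Replica` fields and the cost model (startup latency plus size divided by bandwidth) are assumptions made for the example, and a real service would draw these estimates from monitoring data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical description of one copy of a dataset."""
    site: str
    bandwidth_mb_s: float  # estimated throughput to the requesting site
    latency_s: float       # estimated startup cost (e.g. staging, connection)


def estimated_transfer_time(replica: Replica, size_mb: float) -> float:
    """Assumed cost model: fixed startup latency plus streaming time."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def select_replica(replicas: Sequence[Replica], size_mb: float) -> Replica:
    """Return the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas registered for this dataset")
    return min(replicas, key=lambda r: estimated_transfer_time(r, size_mb))
```

Under this cost model the best replica depends on the request: a nearby, low-latency copy tends to win for small files, while a high-bandwidth copy with a larger startup cost wins for large transfers.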

Evolution of Grid Architecture
Up to 1998
  – Basic mechanisms: Authentication, virtualization, resource management, information/monitoring
  – Condor, Globus Toolkit, SRB, etc.
  – Early application experiments on O(60) site testbeds
  – Data Grid protocols and services; GDMP, GridFTP, DRM, etc.
  – First experiences with production operation
  – Further evolution in protocol base (Web services)
  – Higher-level services, reliability, scalability

The Grid Information Problem
Large numbers of distributed “sensors” with different properties
Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type

Grid Information Architecture
Registration & enquiry protocols, information models, query languages
  – Provides standard interfaces to sensors
  – Supports different “directory” structures supporting various discovery/access strategies
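
The split between registration and enquiry can be sketched with a small in-memory directory. This is an illustrative toy, not the MDS protocols: the `Directory` class and its method names are hypothetical, and the use of expiring (time-to-live) registrations is an assumption chosen so that sensors which stop reporting drop out of the directory's view without explicit removal.

```python
from typing import Any, Callable, Dict, List, Tuple

Attributes = Dict[str, Any]


class Directory:
    """Toy directory: sensors register themselves, clients enquire by predicate."""

    def __init__(self) -> None:
        # name -> (attributes, expiry time on an arbitrary clock)
        self._entries: Dict[str, Tuple[Attributes, float]] = {}

    def register(self, name: str, attributes: Attributes,
                 ttl: float, now: float) -> None:
        """Registration: announce or refresh an entry that expires after `ttl`."""
        self._entries[name] = (dict(attributes), now + ttl)

    def enquire(self, predicate: Callable[[Attributes], bool],
                now: float) -> List[str]:
        """Enquiry: names of unexpired entries whose attributes match."""
        return sorted(
            name
            for name, (attrs, expires) in self._entries.items()
            if now < expires and predicate(attrs)
        )
```

Different “views” of the same information then correspond to different predicates, or to separate directories that accept registrations only from one community.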

Web Services
“Web services” provide
  – A standard interface definition language (WSDL)
  – Standard RPC protocol (SOAP) [but not required]
  – Emerging higher-level services (e.g., workflow)
Nothing to do with the Web
Useful framework/toolset for Grid applications?
  – See proposed Open Grid Services Architecture
Represent a natural evolution of current technology
  – No need to change any existing plans
  – Introduce in phased fashion when available
  – Maintain focus on hard issues: how to structure services, build applications, operate Grids
For more info:

Identifying and Addressing Technology Challenges
1) Identify and correct critical technology challenges
  – We don’t know all of the problems yet
2) Develop coherent Grid technology architecture
  – To conserve scarce resources; for experiments
Both challenges can be addressed by a pragmatic, experiential strategy
  – Build and run joint testbeds of increasing size
  – Gain experience “at scale”
  – Mix and match technologies
  – Coordinated projects to resolve problems

Summary
We have a solid base on which we can build
  – Still learning how to deploy and operate
Success of LCG (and EDG, GriPhyN, PPDG, …) requires
  – Focused, methodical effort to deploy and operate
  – Continued iteration on core components
  – Collaborative design and development of higher-level services
  – Early adoption and experimentation by experiments
We are not alone in these endeavors
  – Dozens of other Grid projects worldwide
  – Significant and growing industrial participation