An Introduction to Grid Computing
Richard Fujimoto
Reference: The Grid 2, ch. 1-4, 7; Ian Foster & Carl Kesselman (eds.)

Outline
What is Grid Computing?
Why are we interested in Grids?
Grid Architecture from 10,000 feet

Evolution of Technology
Phase I: Developmental Stage
  – Concerned with development of the technology
  – Focus is the technology itself: how it is built, how it works
  – Users of the technology are experts
  – If successful, the technology grows in popularity, standards develop, costs decline, and use becomes widespread
  – Examples: automobile, electric power
Phase II: Post-Technology Phase
  – Technology is taken for granted, except when it fails
  – Main issues are application of the technology, ease of use, reliability, availability, cost
  – Experts behind the scenes make it work, transparently to users

Information Technology
Fast approaching the post-technology phase (mass adoption)
  – Increasing commoditization (processors, memory, storage, communications)
  – More complex, more powerful systems
  – Possible to have systems with billions of devices and the sophistication to hide them from users
Issues
  – Integration and standards
  – Efficiency while maintaining transparency
      Virtualization seen as the key approach to allow transparent, shared resource usage
  – Quality of Service
      Sophisticated, end-to-end resource management needed to ensure high quality at low price

Virtual Organizations
“… mutually distrustful participants with varying degrees of prior relationships (perhaps none at all) want to share resources in order to perform some task.” [Foster/Kesselman, p. 39]
Coordinated, controlled resource sharing among dynamic multi-institutional virtual organizations
  – Resources
      Computational facilities
      Software
      Data
      Sensors, instruments, actuators
  – Control over what is shared, who has access, and the conditions under which sharing occurs

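To make "control over what is shared, who has access, and conditions under which sharing occurs" concrete, here is a minimal Python sketch of one resource owner's sharing policy. The names `SharingRule` and `may_access`, the action strings, and the use of a daily time window as the "condition" are hypothetical illustrations, not part of any Grid toolkit.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class SharingRule:
    """One owner-defined rule: what is shared, with whom, and under what condition."""
    resource: str       # what is shared, e.g. a cluster or a data archive
    vo: str             # who has access: members of this virtual organization
    actions: frozenset  # permitted operations, e.g. {"read", "submit"}
    start: time         # condition: daily sharing window opens
    end: time           # condition: daily sharing window closes

def may_access(rules, resource, vo, action, at):
    """Default deny: True only if some rule grants `vo` the `action` on `resource` at `at`."""
    return any(
        r.resource == resource
        and r.vo == vo
        and action in r.actions
        and r.start <= at <= r.end
        for r in rules
    )

# Hypothetical example: an observatory shares its archive read-only, overnight.
rules = [SharingRule("sky-archive", "astro-vo", frozenset({"read"}), time(0, 0), time(6, 0))]
print(may_access(rules, "sky-archive", "astro-vo", "read", time(2, 30)))   # True
print(may_access(rules, "sky-archive", "astro-vo", "write", time(2, 30)))  # False
print(may_access(rules, "sky-archive", "other-vo", "read", time(2, 30)))   # False
```

Denying by default keeps the owner in control: a participant with no prior relationship gets nothing until a rule is written for its VO. Windows that wrap past midnight are not handled in this sketch.
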
What is a Grid?
Includes three essential elements:
1. Coordinates sharing of distributed resources
  – Resources and users live within different control domains
  – Issues such as security, policy, payment, membership, etc.
2. Uses standard, open, general-purpose protocols and interfaces
  – Address issues such as authentication, authorization, resource discovery, resource access
3. Delivers non-trivial qualities of service
  – Throughput, response time, availability, security
But what does this mean? Main elements:
  Distributed computing, using standard interfaces, APIs, and tools, in order to virtualize resources, people, and applications, to support virtual organizations

Virtual Observatory Application
Multiple archives of astronomical data stored at geographically distributed sites
  – Each covers part of the electromagnetic spectrum, for a certain period of time, for certain celestial objects
  – Desire to do multi-spectral or temporal studies of specific objects by combining data from different archives
  – Terabytes to petabytes of data; data growing at an exponential rate
  – Peer-reviewed data!
Virtual data [Grid Physics Network Project, GriPhyN]
  – Pipelined processing of data is typical
  – Data used by analysis packages might be generated dynamically, e.g., query distributed data and process it in a pipeline specified by the user (e.g., recalibration followed by object detection)
  – Moving data vs. moving computation? Large data sets and operations involving much reduction suggest moving the computation to the data
Reference: Grid 2, Chapter 7 (Szalay, Gray)

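The "virtual data" idea, where derived data is produced on demand by a user-specified pipeline rather than stored up front, can be sketched in a few lines of Python. This is a toy illustration, not GriPhyN's actual virtual data system: `recalibrate`, `detect_objects`, `VirtualDataCatalog`, the dark level, gain, and threshold are all hypothetical.

```python
def recalibrate(pixels):
    """Hypothetical calibration step: subtract a fixed dark level, then apply a gain."""
    return [(p - 10.0) * 1.5 for p in pixels]

def detect_objects(pixels):
    """Hypothetical detection step: indices of pixels brighter than a threshold."""
    return [i for i, p in enumerate(pixels) if p > 100.0]

class VirtualDataCatalog:
    """Derived data is computed from raw data plus a recipe (the pipeline) on first
    request, then kept so repeated requests for the same derivation skip recomputation."""

    def __init__(self, raw):
        self.raw = raw    # dataset name -> raw values held by an archive
        self.cache = {}   # (dataset, stage names) -> materialized result
        self.runs = 0     # number of times a pipeline was actually executed

    def get(self, dataset, pipeline):
        # The recipe (dataset + ordered stage names) doubles as provenance metadata.
        key = (dataset, tuple(stage.__name__ for stage in pipeline))
        if key not in self.cache:
            data = self.raw[dataset]
            for stage in pipeline:
                data = stage(data)
            self.cache[key] = data
            self.runs += 1
        return self.cache[key]

catalog = VirtualDataCatalog({"field-42": [50.0, 90.0, 80.0, 5.0]})
print(catalog.get("field-42", [recalibrate, detect_objects]))  # [1, 2]
print(catalog.get("field-42", [recalibrate, detect_objects]))  # served from the cache
print(catalog.runs)                                            # 1
```

Keying the cache on the recipe means the same request names both how the data was produced and where to find it once it exists.
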
Hierarchical Architecture
Archives
  – Text, images, raw data
  – Data mining tools to search and subset data objects
  – Metadata (units, provenance)
Web services
  – Queries
  – File transfer
  – Data format standards (VOTable), similar to the HLA OMT
Registries
  – Record the kinds of information stored in each archive: sky coverage, temporal coverage, spectral coverage, resolution
Portals
  – Process user queries by integrating data from different archives

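A registry entry records coverage metadata, and a portal uses it to pick which archives to query. The Python sketch below assumes a hypothetical `ArchiveRecord` layout with rectangular sky coverage in degrees; it is not the VOTable format or any real registry schema, and it ignores right-ascension wrap-around at 360°.

```python
from dataclasses import dataclass

@dataclass
class ArchiveRecord:
    """Registry entry describing what one archive holds (fields are illustrative)."""
    name: str
    ra_deg: tuple          # sky coverage: right ascension range (min, max), degrees
    dec_deg: tuple         # sky coverage: declination range (min, max), degrees
    years: tuple           # temporal coverage: (first, last) observing year
    wavelength_nm: tuple   # spectral coverage: (min, max), nanometres
    resolution_arcsec: float

def find_archives(registry, ra, dec, years, max_resolution_arcsec):
    """Portal-side lookup: archives that cover (ra, dec), overlap the requested years,
    and are fine-grained enough; ordered by wavelength for a multi-spectral study."""
    hits = [
        r for r in registry
        if r.ra_deg[0] <= ra <= r.ra_deg[1]
        and r.dec_deg[0] <= dec <= r.dec_deg[1]
        and r.years[0] <= years[1] and years[0] <= r.years[1]
        and r.resolution_arcsec <= max_resolution_arcsec
    ]
    return [r.name for r in sorted(hits, key=lambda r: r.wavelength_nm[0])]

registry = [  # hypothetical archives
    ArchiveRecord("optical-survey", (0, 180), (-30, 30), (1998, 2005), (300, 1000), 1.0),
    ArchiveRecord("xray-mission", (0, 360), (-90, 90), (1999, 2010), (0.1, 10), 0.5),
    ArchiveRecord("radio-survey", (90, 270), (-40, 80), (1993, 1996), (2e8, 2.1e8), 45.0),
]
print(find_archives(registry, ra=120, dec=10, years=(2000, 2004), max_resolution_arcsec=5.0))
# ['xray-mission', 'optical-survey']
```

The portal would then send the actual data queries only to the archives returned here.
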
Issues
Economics of database queries
  – Empirical costs for computation, disk space, network bandwidth, DB access; use them to compute the most economical approach to processing a query
  – Most queries are data intensive (<10K instructions per byte), suggesting it is usually better to move the computation near the data
  – Either provide a cluster near the data, or move the database to the user (Internet or sneakernet)
Compute-intensive tasks
  – Raw data must be converted to calibrated, cataloged data
  – Must reprocess data ~annually due to s/w improvements
  – Currently, reprocessing 15 TB of data takes about 10 CPU years
  – Clusters can do it in about 6 weeks
  – Exploit grid computing
Data mining and statistical calculations
  – Amount of computation for large data sets is a major impediment

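The "most economical approach" bullet can be read as a simple cost comparison: ship the data to the computation, or ship the computation to the data and return only the result. The Python sketch below uses made-up unit prices purely for illustration; where the crossover lands depends entirely on the empirical costs plugged in. It also works the slide's arithmetic: 10 CPU years finished in about 6 weeks implies roughly 10 × 52 / 6 ≈ 87 processors, assuming perfect parallelism.

```python
import math

def cheapest_plan(data_bytes, instr_per_byte, result_bytes,
                  net_cost_per_byte, cpu_cost_local, cpu_cost_remote):
    """Compare two ways to run one query; all unit costs are caller-supplied placeholders.
    move data:        ship the data to the user's machine and compute there
    move computation: compute next to the archive, ship back only the (small) result"""
    instructions = data_bytes * instr_per_byte
    move_data = data_bytes * net_cost_per_byte + instructions * cpu_cost_local
    move_computation = instructions * cpu_cost_remote + result_bytes * net_cost_per_byte
    if move_computation <= move_data:
        return "move computation to data", move_computation
    return "move data to computation", move_data

def processors_needed(cpu_years, wall_clock_weeks, weeks_per_year=52):
    """Processors required to finish in the given time, assuming perfect parallelism."""
    return math.ceil(cpu_years * weeks_per_year / wall_clock_weeks)

# Hypothetical prices: $1 per GB moved, $1 per 10^12 local instructions,
# remote CPU time twice as expensive as local.
prices = dict(net_cost_per_byte=1e-9, cpu_cost_local=1e-12, cpu_cost_remote=2e-12)
print(cheapest_plan(1e12, 100, 1e6, **prices)[0])  # 1 TB, data-intensive query
print(cheapest_plan(1e12, 1e6, 1e6, **prices)[0])  # 1 TB, compute-intensive query
print(processors_needed(10, 6))                    # 87
```

With these placeholder prices the data-intensive query favors moving the computation, while the compute-intensive one favors moving the data to cheaper local CPUs.
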
Sample VO Grid Workflow
1. Locate suitable sites (data archives)
2. Authenticate access to these sites
3. Allocate resources on those computers
4. Select, configure, and initiate computations at those sites
5. Automatically and transparently adapt to changes in resource availability and in user requirements
6. Display output to user

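The six steps can be walked through with a toy Python model. Everything here is hypothetical: `Site`, `run_workflow`, and the dataset names are illustrative, authentication is reduced to a `trusted` flag, and "adapting" only covers sites that are down or full (changes in user requirements are not modeled).

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    datasets: set
    free_cpus: int
    trusted: bool = True   # stands in for a successful authentication exchange
    up: bool = True        # availability may change while the workflow runs

def run_workflow(sites, dataset, cpus):
    """Walk the workflow steps against a list of candidate sites."""
    log = []
    # Locate: archives that hold the requested dataset.
    candidates = [s for s in sites if dataset in s.datasets]
    log.append("located: " + ", ".join(s.name for s in candidates))
    for site in candidates:
        # Authenticate: skip sites that reject our credentials.
        if not site.trusted:
            log.append(f"authentication failed at {site.name}")
            continue
        # Allocate, adapting to availability: move on if the site is down or full.
        if not site.up or site.free_cpus < cpus:
            log.append(f"{site.name} unavailable, trying next site")
            continue
        site.free_cpus -= cpus
        # Select, configure, initiate (here just a placeholder result string).
        result = f"analysis of {dataset} on {site.name}"
        # Display output to the user.
        log.append(result)
        return result, log
    return None, log

sites = [
    Site("archive-a", {"m31"}, 64, up=False),
    Site("archive-b", {"m31"}, 64, trusted=False),
    Site("archive-c", {"m31", "m87"}, 32),
]
result, log = run_workflow(sites, "m31", cpus=16)
print("\n".join(log))
```

Falling through to the next candidate is the simplest form of adaptation; a real system would also react to failures after a computation has started.
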
Grid Architecture (layered, shown alongside the Internet Protocol Architecture)
Application
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture, for comparison: Application, Transport, Internet, Link
Slide courtesy of C. Kesselman, Cal(IT)2 Presentation

Fabric Layer
Two types of basic services for individual resources:
Introspection mechanisms
  – Determination of structure, state, capability of resource
Resource management mechanisms
  – Control over delivered quality of service
Resource types and example services:
Computational resources
  – Characteristics of hardware/software resources available, status (e.g., load, job queue length)
  – Starting programs, monitoring and controlling execution of processes
  – Control over resources allocated to processes, advance reservations
Storage resources
  – File access (read, write)
  – Check availability of memory or disk space
  – Control of resources allocated for data transfer (e.g., disk bandwidth)
Network resources
  – Control over prioritization, bandwidth allocation
  – Interrogate for network characteristics and load

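The two kinds of fabric services for a computational resource, introspection and resource management, can be sketched as methods on one toy Python class. `ComputeResource` and its methods are hypothetical; real fabric resources expose these functions through local schedulers and operating-system interfaces, each in its own way.

```python
from dataclasses import dataclass, field

@dataclass
class ComputeResource:
    """Toy compute node exposing the two kinds of fabric-layer services."""
    name: str
    cpus: int
    queue: list = field(default_factory=list)         # submitted programs
    reservations: list = field(default_factory=list)  # (start_hour, end_hour, cpus)

    # Introspection: structure, state, capability.
    def status(self):
        return {"name": self.name, "cpus": self.cpus, "queue_length": len(self.queue)}

    # Resource management: starting programs.
    def submit(self, program):
        self.queue.append(program)
        return len(self.queue) - 1  # job id = position in the queue

    # Resource management: advance reservation of CPUs for the window [start, end).
    def reserve(self, start, end, cpus):
        # Conservative check: add up every existing reservation that overlaps the request.
        overlapping = sum(c for s, e, c in self.reservations if s < end and start < e)
        if overlapping + cpus > self.cpus:
            return False
        self.reservations.append((start, end, cpus))
        return True

node = ComputeResource("cluster-1", cpus=64)
print(node.reserve(9, 12, 48))   # True
print(node.reserve(10, 11, 32))  # False: 48 + 32 > 64 during 10-11
print(node.reserve(12, 14, 32))  # True: starts when the first reservation ends
print(node.submit("calibrate"), node.status())
```

The overlap check is deliberately conservative: it may reject a request that would fit if two existing reservations never overlap each other, which keeps the sketch short at some cost in utilization.
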
Connectivity Layer
Communication services between fabric layer resources
  – Basically, Internet protocols (TCP, UDP, DNS, RSVP, etc.)
Authentication protocols
  – Single sign-on to access multiple resources
  – Delegation: give a program the ability to access the resources the user is authorized to access
  – Integration with local security mechanisms
  – User-based trust relationships: if a user can access A and B, the user should be able to use both together without requiring A’s and B’s security administrators to interact

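Single sign-on and delegation can be illustrated with a chain of credentials in which each delegated proxy carries no more rights, and no longer a lifetime, than its parent. This Python sketch omits all cryptography: real systems such as the Globus Toolkit's GSI sign each link using X.509 proxy certificates. `Credential`, `delegate`, `accept`, and the CA names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Credential:
    """Simplified credential: who, what they may do, until when, and who vouches for them."""
    subject: str
    rights: frozenset
    expires: int                           # hour at which the credential lapses
    ca: str                                # certificate authority behind the root credential
    parent: Optional["Credential"] = None  # None for the user's own sign-on credential

    def delegate(self, subject, rights, lifetime, now):
        """Proxy for a program: never more rights, never longer-lived, than its parent."""
        return Credential(subject, frozenset(rights) & self.rights,
                          min(now + lifetime, self.expires), self.ca, self)

def accept(cred, right, now, trusted_cas):
    """Resource-side check of every link in the delegation chain."""
    link = cred
    while link is not None:
        if now >= link.expires or right not in link.rights:
            return False
        link = link.parent
    return cred.ca in trusted_cas

# Single sign-on: the user authenticates once and gets a 12-hour credential.
alice = Credential("alice", frozenset({"read", "submit"}), expires=12, ca="grid-ca")
# Delegation: a job receives read-only rights for 2 hours.
job = alice.delegate("alice/job-7", {"read"}, lifetime=2, now=0)
site_a_trusts = {"grid-ca"}
site_b_trusts = {"grid-ca", "other-ca"}
print(accept(job, "read", 1, site_a_trusts), accept(job, "read", 1, site_b_trusts))  # True True
print(accept(job, "submit", 1, site_a_trusts))  # False: right was not delegated
print(accept(job, "read", 3, site_a_trusts))    # False: proxy has expired
```

Sites A and B each decide independently to trust the same CA, so the user-based trust relationship needs no coordination between their administrators.
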
Resource Layer: Sharing Single Resources
Protocols for secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources
  – Envisioned to be a small set of protocols
  – Use fabric layer functions to access and control local resources
Information protocols: obtain information on structure and state of a resource (e.g., loading, configuration, cost of use)
Management protocols: negotiate access to a resource, e.g., for QoS
  – Check usage against policy
  – Accounting and payment

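The split between information and management protocols can be shown with a single toy endpoint in Python: one call reports structure, state, and cost of use; the other negotiates access, checks it against local policy (here a per-user quota), and records accounting. `ResourceManager`, its pricing, and the quota rule are hypothetical, and releasing CPUs after a job finishes is not modeled.

```python
class ResourceManager:
    """Toy resource-layer endpoint for one resource (all names are hypothetical)."""

    def __init__(self, cpus, price_per_cpu_hour, quota_cpu_hours):
        self.cpus = cpus
        self.in_use = 0                  # CPUs currently allocated
        self.price = price_per_cpu_hour
        self.quota = quota_cpu_hours     # local policy: per-user ceiling
        self.ledger = {}                 # accounting: user -> CPU-hours charged

    def info(self):
        """Information protocol: structure, state, and cost of use."""
        return {"cpus": self.cpus, "in_use": self.in_use, "price_per_cpu_hour": self.price}

    def request(self, user, cpus, hours):
        """Management protocol: check capacity and policy, then grant and account."""
        if cpus > self.cpus - self.in_use:
            return None, "insufficient capacity"
        charged = self.ledger.get(user, 0)
        if charged + cpus * hours > self.quota:
            return None, "quota exceeded"
        self.in_use += cpus
        self.ledger[user] = charged + cpus * hours
        return cpus * hours * self.price, "granted"

rm = ResourceManager(cpus=32, price_per_cpu_hour=0.25, quota_cpu_hours=100)
print(rm.request("alice", 8, 10))  # (20.0, 'granted')
print(rm.request("alice", 4, 10))  # (None, 'quota exceeded')
print(rm.request("bob", 30, 1))    # (None, 'insufficient capacity')
print(rm.info())
```

Returning the price with the grant lets the requester (or a collective-layer broker acting for it) compare offers from several resources before committing.
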
Collective: Coordinating Multiple Resources
Discovery services to allow discovery of resources and queries of their status
Co-allocation, scheduling, and brokering services to utilize multiple resources for a specific purpose
Monitoring and diagnostic services
Data replication services: manage storage resources to achieve acceptable performance
Programming models and tools
  – Grid-enabled programming systems, e.g., grid-MPI
  – Workflow specification and management
Software discovery services
Collaborative work services
Security, policy, accounting issues

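Two collective services, discovery and co-allocation, can be sketched together in Python. Co-allocation is shown as all-or-nothing: either every requested resource is obtained, or any partial allocation is rolled back. `discover`, `coallocate`, and the free-CPU directory are hypothetical; a real broker would also need timeouts and concurrency control, since other clients may be allocating at the same time.

```python
def discover(directory, min_cpus):
    """Discovery service: resources currently advertising at least min_cpus free CPUs."""
    return sorted(name for name, free in directory.items() if free >= min_cpus)

def coallocate(directory, requests):
    """All-or-nothing co-allocation across several resources.
    requests maps resource name -> CPUs wanted. On any failure, undo what was taken."""
    granted = []
    for name, cpus in requests.items():
        if directory.get(name, 0) < cpus:
            for done, taken in granted:  # roll back the partial allocation
                directory[done] += taken
            return False
        directory[name] -= cpus
        granted.append((name, cpus))
    return True

directory = {"site-a": 64, "site-b": 16, "site-c": 128}    # hypothetical free-CPU counts
print(discover(directory, 32))                              # ['site-a', 'site-c']
print(coallocate(directory, {"site-a": 32, "site-c": 64}))  # True
print(coallocate(directory, {"site-a": 16, "site-b": 32}))  # False, site-a restored
print(directory)  # {'site-a': 32, 'site-b': 16, 'site-c': 64}
```

All-or-nothing semantics matter for applications such as a distributed simulation that cannot start until every piece has its resources.
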
Final Comments
Current trends
  – Merging of Grid and Web services
  – Has much momentum; substantial industry support
  – Universally embraced by the scientific computing community
  – Enterprise computing in the commercial sector
Ideas have been around for a while (e.g., meta-computing)
  – Standardization is perhaps the most important aspect