The Grid: Globus and the Open Grid Services Architecture
Dr. Carl Kesselman, Director, Center for Grid Technologies, Information Sciences Institute, University of Southern California
Outline
- Why Grids
- Grid Technology
- Applications of Grids in Physics
- Summary
Grid Computing
How do we solve problems?
- Communities committed to common goals
  - Virtual organizations
- Teams with heterogeneous members & capabilities
- Distributed geographically and politically
  - No single location/organization possesses all required skills and resources
- Adapt as a function of the situation
  - Adjust membership, reallocate responsibilities, renegotiate resources
The Grid Vision
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
- On-demand, ubiquitous access to computing, data, and services
- New capabilities constructed dynamically and transparently from distributed services
“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances” (George Gilder)
The Grid Opportunity: eScience and eBusiness
- Physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake-table experiments
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal & external resources to support eBusiness workload
Grid Communities & Applications: Data Grids for High Energy Physics
[Figure: hierarchical LHC data grid, Tier 0 (CERN) down to Tier 4 (physicist workstations)]
- The online system streams ~100 MBytes/sec to the CERN Computer Centre (Tier 0); the offline processor farm is ~20 TIPS; the raw physics data cache fills at ~PBytes/sec
- Tier 1 regional centres (e.g. FermiLab ~4 TIPS; France, Italy, and Germany regional centres) are linked at ~622 Mbits/sec (or air freight, deprecated)
- Tier 2 centres (e.g. Caltech) are ~1 TIPS each, linked at ~622 Mbits/sec; institute servers are ~0.25 TIPS; physicist workstations connect at ~1 MBytes/sec
- There is a “bunch crossing” every 25 nsecs; there are ~100 “triggers” per second; each triggered event is ~1 MByte in size
- Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
- 1 TIPS is approximately 25,000 SpecInt95 equivalents
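A quick back-of-the-envelope check (an illustrative Python sketch using only the numbers on this slide, plus an assumed ~10^7 live accelerator seconds per year) shows how the trigger rate and event size yield the quoted ~100 MBytes/sec into Tier 0 and petabyte-scale yearly volumes:

    # Back-of-the-envelope check of the data rates quoted above (illustrative only).
    trigger_rate_hz = 100        # ~100 triggered events per second
    event_size_mb = 1.0          # each triggered event is ~1 MByte

    rate_mb_per_s = trigger_rate_hz * event_size_mb
    print(f"Rate into Tier 0: ~{rate_mb_per_s:.0f} MBytes/sec")          # ~100 MBytes/sec

    live_seconds_per_year = 1e7  # assumption: a typical accelerator live year
    petabytes_per_year = rate_mb_per_s * live_seconds_per_year / 1e9
    print(f"Triggered data volume: ~{petabytes_per_year:.0f} PBytes/year")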
Grid Communities and Applications: Network for Earthquake Eng. Simulation
- NEESgrid: US national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
- On-demand access to experiments, data streams, computing, archives, collaboration
- NEESgrid partners: Argonne, Michigan, NCSA, UIUC, USC
Living in an Exponential World: (1) Computing & Sensors
- Moore's Law: transistor count doubles every 18 months
[Figure: magnetohydrodynamics simulation of star formation]
Living in an Exponential World: (2) Storage
- Storage density doubles every 12 months
- Dramatic growth in online data (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes)
  - 2000: ~0.5 petabyte
  - 2005: ~10 petabytes
  - 2010: ~100 petabytes
  - 2015: ~1,000 petabytes?
- Transforming entire disciplines in the physical and, increasingly, the biological sciences; humanities next?
Living in an Exponential World: (3) Networks (or, Coefficients Matter …)
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = an order of magnitude per 5 years
- 1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
- 2001 to 2010
  - Computers: x 60
  - Networks: x 4,000
[Figure: Moore's Law vs. storage improvements vs. optical improvements; graph from Scientific American (Jan 2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins]
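The quoted factors follow directly from the doubling times; a small illustrative Python check (taking the end years as 2000 and 2010, as reconstructed above) makes the “order of magnitude per 5 years” gap concrete:

    # Growth implied by the doubling times quoted above (illustrative check).
    def growth(years, doubling_months):
        return 2 ** (years * 12 / doubling_months)

    # Over any 5-year window:
    print(round(growth(5, 18)))    # computers: ~10x
    print(round(growth(5, 9)))     # networks: ~100x -> the gap widens ~10x per 5 years

    # 1986 to 2000 (14 years):
    print(round(growth(14, 18)))   # ~640x (slide quotes ~500x)
    print(round(growth(14, 9)))    # ~420,000x (slide quotes ~340,000x)

    # 2001 to 2010 (9 years):
    print(round(growth(9, 18)))    # ~64x (slide quotes ~60x)
    print(round(growth(9, 9)))     # ~4,096x (slide quotes ~4,000x)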
Requirements Include …
- Dynamic formation and management of virtual organizations
- Online negotiation of access to services: who, what, why, when, how
- Establishment of applications and systems able to deliver multiple qualities of service
- Autonomic management of infrastructure elements
- Open, extensible, evolvable infrastructure
The Grid World: Current Status
- Dozens of major Grid projects in scientific & technical computing / research & education
- Considerable consensus on key concepts and technologies
  - The open-source Globus Toolkit™ is a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, with a large tool and user base
- Industrial interest emerging rapidly
- Opportunity: convergence of eScience and eBusiness requirements & technologies
Globus Toolkit
- The Globus Toolkit is the source of many of the protocols described in the “Grid architecture”
- Adopted by almost all major Grid projects worldwide as a source of infrastructure
- Open source, open architecture framework encourages community development
- Active R&D program continues to move the technology forward
- Developers at ANL, USC/ISI, NCSA, LBNL, and other institutions
The Globus Toolkit in One Slide
- Grid protocols (GSI, GRAM, …) enable resource sharing within virtual organizations; the toolkit provides a reference implementation (= Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data management, workflow, …
[Figure: a user authenticates with GSI (Grid Security Infrastructure) and creates a proxy credential; GRAM (Grid Resource Allocation & Management) provides reliable remote invocation through a gatekeeper (factory) that creates user processes (each with its own proxy) and registers them with a reporter (registry + discovery); other GSI-authenticated remote services (e.g. GridFTP) are invoked the same way; MDS-2 (Meta Directory Service) collects soft-state registrations and supports enquiry and discovery through a GIIS (Grid Information Index Server)]
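A minimal sketch of how a user typically exercises these services with the Globus Toolkit 2 command-line clients (grid-proxy-init for GSI, globus-job-run for GRAM, globus-url-copy for GridFTP), driven here from Python purely for illustration; host names and file paths are hypothetical placeholders, not a complete recipe:

    # Illustrative sketch: driving the classic GT2 command-line clients from Python.
    # Host names and file paths are hypothetical.
    import subprocess

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. GSI: authenticate once, creating a short-lived proxy credential
    #    (prompts interactively for the user's passphrase).
    run("grid-proxy-init")

    # 2. GRAM: reliable remote invocation through the gatekeeper on a resource.
    run("globus-job-run", "gatekeeper.example.edu", "/bin/hostname")

    # 3. GridFTP: GSI-authenticated, high-performance data movement.
    run("globus-url-copy",
        "gsiftp://gridftp.example.edu/scratch/run01/out.dat",
        "file:///tmp/out.dat")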
Globus Toolkit: Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This & good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on the tools
Globus Toolkit: Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting environments)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties
“Web Services”
- Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
- WSDL: Web Services Description Language
  - Interface definition language for Web services
- SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
- WS-Inspection
  - Conventions for locating service descriptions
- UDDI: Universal Description, Discovery, & Integration
  - Directory for Web services
Web Services Example: Database Service
- A WSDL definition for a “DBaccess” portType defines operations and bindings, e.g.:
  - Query(QueryLanguage, Query, Result)
  - SOAP protocol binding
- Client APIs in C, Java, Python, etc. can then be generated from the WSDL
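As an illustration only, a generic SOAP client such as the Python zeep library (not part of the original toolset) could drive a hypothetical DBaccess service roughly like this; the WSDL URL, operation signature, and query are placeholders matching the slide's example:

    # Illustrative sketch: calling the hypothetical "DBaccess" Web service via SOAP.
    # The WSDL URL and operation details are assumptions, not a real service.
    from zeep import Client

    client = Client("http://dbservice.example.org/DBaccess?wsdl")

    # The portType's Query operation can then be invoked like a local call:
    result = client.service.Query("SQL", "SELECT id, energy FROM events WHERE run = 42")
    print(result)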
Transient Service Instances
- “Web services” address discovery & invocation of persistent services
  - Interface to the persistent state of an entire enterprise
- In Grids, we must also support transient service instances, created and destroyed dynamically
  - Interfaces to the states of distributed activities
  - E.g. workflow, video conferencing, distributed data analysis
- Significant implications for how services are managed, named, discovered, and used
  - In fact, much of our work is concerned with the management of service instances
OGSA Design Principles
- Service orientation to virtualize resources
  - Everything is a service
- From Web services
  - Standard interface definition mechanisms: multiple protocol bindings, local/remote transparency
- From Grids
  - Service semantics, reliability and security models
  - Lifecycle management, discovery, other services
- Multiple “hosting environments”
  - C, J2EE, .NET, …
OGSA Service Model
- A system comprises (typically a few) persistent services & (potentially many) transient services
  - Everything is a service
- OGSA defines basic behaviors of services: fundamental semantics, life-cycle, etc.
  - More than defining WSDL wrappers
Open Grid Services Architecture: Fundamental Structure
- WSDL conventions and extensions for describing and structuring services
  - Useful independent of “Grid” computing
- Standard WSDL interfaces & behaviors for core service activities
  - portTypes and operations => protocols
The Grid Service = Interfaces + Service Data
- A Grid service exposes the GridService interface plus other interfaces, together with a set of service data elements
- Interfaces cover: service data access, explicit destruction, soft-state lifetime, notification, authorization, service creation, service registry, manageability, concurrency
- Behaviors include reliable invocation and authentication
- The implementation runs in a hosting environment/runtime (“C”, J2EE, .NET, …)
[Figure: a Grid service composed of the GridService interface, other interfaces, and service data elements, layered over an implementation and a hosting environment]
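A minimal sketch, in plain Python rather than any real OGSA hosting environment, of the service model described on the last few slides: a persistent factory creates transient service instances, each carrying service data elements and GridService-style operations for service data access, soft-state lifetime, and explicit destruction. All class and method names here are illustrative, not the normative OGSA interfaces.

    # Illustrative sketch of the OGSA service model: a factory creates transient
    # instances with service data, soft-state lifetime, and explicit destruction.
    import time, uuid

    class GridServiceInstance:
        def __init__(self, lifetime_s=300):
            self.handle = f"instance-{uuid.uuid4()}"          # unique service handle
            self.termination_time = time.time() + lifetime_s  # soft-state lifetime
            self.service_data = {"status": "active"}          # service data elements

        def find_service_data(self, name):
            return self.service_data.get(name)                # service data access

        def set_termination_time(self, extra_s):
            self.termination_time = time.time() + extra_s     # client keeps it alive

        def destroy(self):
            self.service_data["status"] = "destroyed"         # explicit destruction

    class Factory:
        """Persistent service that creates (potentially many) transient instances."""
        def __init__(self):
            self.registry = {}                                # registry of live instances

        def create_service(self, lifetime_s=300):
            inst = GridServiceInstance(lifetime_s)
            self.registry[inst.handle] = inst                 # soft-state registration
            return inst

    factory = Factory()
    svc = factory.create_service(lifetime_s=60)
    print(svc.handle, svc.find_service_data("status"))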
The GriPhyN Project
- Amplify science productivity through the Grid
  - Provide powerful abstractions for scientists: datasets and transformations, not files and programs
  - Using a grid is harder than using a workstation; GriPhyN seeks to reverse this situation!
- These goals challenge the boundaries of computer science in knowledge representation and distributed computing
- Apply these advances to major experiments
  - Not just developing solutions, but proving them through deployment
GriPhyN Approach
- Virtual data
  - Tracking the derivation of experiment data with high fidelity
  - Transparency with respect to location and materialization
- Automated grid request planning
  - Advanced, policy-driven scheduling
- Achieve this at peta-scale magnitude
- We present here a vision that is still 3 years away, but the foundation is starting to come together
Virtual Data
- Track all data assets
- Accurately record how they were derived
- Encapsulate the transformations that produce new data objects
- Interact with the grid in terms of requests for data derivations
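A toy sketch (ordinary Python, not the actual GriPhyN catalogs or virtual data language) of the bookkeeping this implies: record transformations and derivations so that any requested data object can either be located or re-derived on demand. File names, program paths, and URLs below are invented placeholders.

    # Toy virtual-data catalog: record how each logical file is derived so a request
    # can be satisfied by locating an existing replica or re-running the derivation.
    transformations = {
        "reconstruct": "/sw/cms/reconstruct",          # logical name -> program (assumed)
    }
    derivations = {
        "run42.reco": ("reconstruct", ["run42.raw"]),  # derived file -> (transform, inputs)
    }
    replicas = {
        "run42.raw": ["gsiftp://se.example.edu/data/run42.raw"],
        "run42.reco": [],                              # not yet materialized anywhere
    }

    def request(logical_file):
        """Return a location for the data, deriving it first if necessary."""
        if replicas.get(logical_file):
            return replicas[logical_file][0]           # already materialized somewhere
        transform, inputs = derivations[logical_file]  # otherwise re-derive it
        resolved = [request(f) for f in inputs]        # recursively materialize inputs
        print(f"run {transformations[transform]} on {resolved}")
        location = f"gsiftp://se.example.edu/derived/{logical_file}"
        replicas[logical_file] = [location]
        return location

    print(request("run42.reco"))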
GriPhyN/PPDG Data Grid Architecture
[Figure: layered Data Grid architecture. An Application submits an abstract DAG to the Planner; the Planner produces a concrete DAG that the Executor (DAGMan, Kangaroo) runs against Compute Resources (via GRAM) and Storage Resources (via GridFTP, GRAM, SRM). Supporting services include Catalog Services (MCAT, GriPhyN catalogs), Replica Management (GDMP), a Reliable Transfer Service, Information Services (MDS), Monitoring (MDS), and Policy/Security (GSI, CAS); Globus components supply much of the underlying infrastructure]
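A simplified sketch of the planner/executor split in this architecture: an abstract DAG names logical transformations and logical files, and a planner binds each node to a site and physical file locations, producing the concrete DAG an executor such as DAGMan would run. Everything here (sites, catalogs, URLs) is an invented placeholder; a real planner consults the catalog, information, and policy services shown in the figure.

    # Illustrative sketch of abstract -> concrete DAG planning (not the real planner).
    abstract_dag = {
        # node -> (transformation, logical inputs, logical outputs)
        "gen":  ("montecarlo",  [],             ["events.sim"]),
        "reco": ("reconstruct", ["events.sim"], ["events.reco"]),
    }
    replica_catalog = {"events.sim": None}                      # toy stand-in for catalogs
    site_for = {"montecarlo": "condor.wisc.example.edu",        # toy stand-in for
                "reconstruct": "cluster.ncsa.example.edu"}      # policy-driven selection

    def plan(dag):
        """Bind each logical node to a site and physical files (edges omitted for brevity)."""
        concrete = []
        for node, (transform, inputs, outputs) in dag.items():
            site = site_for[transform]
            stage_in = [replica_catalog.get(f)                  # existing replica, or a
                        or f"gsiftp://storage.example.edu/{f}"  # file staged by an earlier job
                        for f in inputs]
            concrete.append({"node": node, "site": site, "stage_in": stage_in,
                             "run": transform, "stage_out": outputs})
        return concrete                                         # handed to the executor

    for job in plan(abstract_dag):
        print(job)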
Virtual Data in CMS
[Figure: long-term virtual data vision of CMS; see CMS Note 2001/047, GriPhyN]
GriPhyN Challenge Problem: CMS Event Reconstruction
[Figure: distributed workflow spanning a Caltech workstation, a master Condor job running at Caltech, a secondary Condor job on the Wisconsin (WI) Condor pool, the NCSA Linux cluster, and NCSA UniTree (a GridFTP-enabled FTP server)]
2) Launch secondary job on WI pool; input files transferred via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary job reports complete to master
6) Master starts reconstruction jobs via the Globus jobmanager on the cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
Work of: Scott Koranda, Miron Livny, Vladimir Litvin, & others
GriPhyN-LIGO SC2001 Demo
Work of: Ewa Deelman, Gaurang Mehta, Scott Koranda, & others
iVDGL: A Global Grid Laboratory
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, Europe, Asia, South America, …)
  - A place to conduct Data Grid tests “at scale”
  - A mechanism to create common Grid infrastructure
  - A laboratory for other disciplines to perform Data Grid tests
  - A focus of outreach efforts to small institutions
- U.S. part funded by NSF: $13.7M (NSF) + $2M (matching)
“We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” (from the NSF proposal, 2001)
iVDGL Components
- Computing resources
  - 2 Tier1 laboratory sites (funded elsewhere)
  - 7 Tier2 university sites (software integration)
  - 3 Tier3 university sites (outreach effort)
- Networks
  - USA (TeraGrid, Internet2, ESNET), Europe (Géant, …)
  - Transatlantic (DataTAG), Transpacific, AMPATH?, …
- Grid Operations Center (GOC)
  - Joint work with TeraGrid on GOC development
- Computer Science support teams
  - Support, test, and upgrade the GriPhyN Virtual Data Toolkit
- Education and Outreach
- Coordination, management
iVDGL Components (cont.)
- High level of coordination with DataTAG
  - Transatlantic research network (2.5 Gb/s) connecting EU & US
- Current partners
  - TeraGrid, EU DataGrid, EU projects, Japan, Australia
- Experiments/labs requesting participation
  - ALICE, CMS-HI, D0, BaBar, BTeV, PDC (Sweden)
Initial US iVDGL Participants
- Tier2 / Software: U Florida (CMS); Caltech (CMS, LIGO); UC San Diego (CMS, CS); Indiana U (ATLAS, GOC); Boston U (ATLAS); U Wisconsin, Milwaukee (LIGO); Penn State (LIGO); Johns Hopkins (SDSS, NVO)
- CS support: U Chicago/Argonne (CS); U Southern California (CS); U Wisconsin, Madison (CS)
- Tier3 / Outreach: Salish Kootenai (Outreach, LIGO); Hampton U (Outreach, ATLAS); U Texas, Brownsville (Outreach, LIGO)
- Tier1 / Labs (funded elsewhere): Fermilab (CMS, SDSS, NVO); Brookhaven (ATLAS); Argonne Lab (ATLAS, CS)
Summary
- Technology exponentials are changing the shape of scientific investigation & knowledge
  - More computing, even more data, yet more networking
- The Grid: resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- Current Grid technology: the Globus Toolkit and the emerging Open Grid Services Architecture
Partial Acknowledgements
- Open Grid Services Architecture design
  - Karl Czajkowski (USC/ISI)
  - Ian Foster, Steve Tuecke (ANL)
  - Jeff Nick, Steve Graham, Jeff Frey (IBM)
- Globus Toolkit R&D also involves many fine scientists & engineers at ANL, USC/ISI, and elsewhere (see the Globus Project web site)
- Strong links with many EU, UK, and US Grid projects
- Support from DOE, NASA, NSF, and Microsoft
For More Information
- The Grid book
- The Globus Project™
- OGSA
- Global Grid Forum