The Grid and the Future of Business Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer Science The University of Chicago http://www.mcs.anl.gov/~foster
Grid Computing
Technical Universe, Circa 1890 Ubiquitous communication infrastructure The telegraph Local power generation Electric power generators serve at most city blocks
Technical Universe, Circa 2000 Ubiquitous comms infrastructure Internet, email, Web Ubiquitous power distribution (Reliable?) standard access Tremendous variety of devices Local computing Most computing and storage on internal enterprise computers
Exponentials (and Coefficients)
Network vs. computer performance
  Computer speed doubles every 18 months
  Network speed doubles every 9 months
  Difference = order of magnitude per 5 years
1986 to 2000: Computers x 500; Networks x 340,000
2001 to 2010: Computers x 60; Networks x 4000
Source: Scientific American (Jan-2001)
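The "order of magnitude per 5 years" claim follows from the two doubling times alone. The short Python sketch below works through that arithmetic; the function name `growth_factor` is illustrative, and the calculation does not attempt to reproduce the Scientific American figures for 1986 to 2000 or 2001 to 2010, which come from the cited article rather than from these idealized doubling rates.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Growth multiple after `months` if capacity doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


FIVE_YEARS = 60  # months

computer = growth_factor(FIVE_YEARS, 18)  # roughly 10x
network = growth_factor(FIVE_YEARS, 9)    # roughly 100x
gap = network / computer                  # roughly 10x: one order of magnitude

print(f"computers: x{computer:.1f}, networks: x{network:.1f}, gap: x{gap:.1f}")
```

Because the network doubling time is half the computer doubling time, the network factor is the square of the computer factor, so the gap over any period equals the computer growth over that same period.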
Therefore: A Computing Grid On-demand, ubiquitous access to computing, data, and services New capabilities constructed dynamically and transparently from distributed services “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder)
My Presentation
The emergence of the Grid concept
  Origins in eScience, and the Globus Toolkit
Grids and e-business
  Opportunities & requirements
Technology convergence
  Open Grid Services Architecture
Summary
E-Science: The Original Grid Driver Pre-electronic science Theorize &/or experiment, in small teams Post-electronic science Construct and mine very large databases Develop computer simulations & analyses Access specialized devices remotely Exchange information within distributed multidisciplinary teams Need to manage dynamic, distributed infrastructures, services, and applications
And Thus: The Grid “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
The Grid: A Brief History
Early 90s: Gigabit testbeds, metacomputing
Mid to late 90s: Early experiments (e.g., I-WAY), academic software projects (e.g., Globus), applications
2002: Dozens of application communities & projects; significant technology base (Globus Toolkit™); Global Grid Forum: ~500 people, 20+ countries
Sloan Digital Sky Survey Analysis
Size distribution of galaxy clusters?
Chimera Virtual Data System + iVDGL Data Grid (many CPUs)
[Plot: galaxy cluster size distribution]
A Large Virtual Organization: CERN’s Large Hadron Collider 1800 Physicists, 150 Institutes, 32 Countries 100 PB of data by 2010; 50,000 CPUs?
Data Grids for High Energy Physics
[Diagram: tiered data grid for LHC physics; labels recovered from the figure]
Sites: Online System; Offline Processor Farm (~20 TIPS); CERN Computer Centre; HPSS; FermiLab (~4 TIPS); France Regional Centre; Italy Regional Centre; Germany Regional Centre; Tier2 Centre (~1 TIPS); Caltech (~1 TIPS); Institutes (~0.25 TIPS); physics data cache; physicist workstations (Pentium II 300 MHz)
Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
Link rates: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec; ~622 Mbits/sec or Air Freight (deprecated); ~1 MBytes/sec
Notes:
  There is a “bunch crossing” every 25 nsecs.
  There are 100 “triggers” per second.
  Each triggered event is ~1 MByte in size.
  Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
  1 TIPS is approximately 25,000 SpecInt95 equivalents.
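The trigger numbers on this slide imply the ~100 MBytes/sec rate shown in the diagram. The Python sketch below works through the arithmetic; the variable names are illustrative, and the per-day volume assumes uninterrupted running and decimal units (1 TB = 10^6 MB), which the slide does not state.

```python
# Figures taken from the slide.
BUNCH_CROSSING_INTERVAL_S = 25e-9  # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100               # events kept per second
EVENT_SIZE_MB = 1.0                # approximate size of one triggered event

# Assumption for illustration only: the detector runs continuously for a day.
SECONDS_PER_DAY = 86_400

crossing_rate_hz = 1.0 / BUNCH_CROSSING_INTERVAL_S          # about 40 million/s
crossings_per_trigger = crossing_rate_hz / TRIGGERS_PER_S   # about 400,000
triggered_mb_per_s = TRIGGERS_PER_S * EVENT_SIZE_MB         # 100 MB/s
tb_per_day = triggered_mb_per_s * SECONDS_PER_DAY / 1e6     # about 8.64 TB/day

print(f"crossings per second:  {crossing_rate_hz:,.0f}")
print(f"crossings per trigger: {crossings_per_trigger:,.0f}")
print(f"triggered data rate:   {triggered_mb_per_s:.0f} MB/s")
print(f"volume per day:        {tb_per_day:.2f} TB")
```

Only about one crossing in 400,000 is kept, yet the retained stream still fills a 100 MBytes/sec link, which is why the data must be distributed across tiers rather than analyzed in one place.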
Grids at NASA: Aviation Safety
[Diagram: coupled simulation models; labels recovered from the figure]
Models: Wing Models; Stabilizer Models; Airframe Models; Human Models; Engine Models; Landing Gear Models
Labeled capabilities:
  Lift capabilities, drag capabilities, responsiveness
  Deflection capabilities, responsiveness
  Crew capabilities: accuracy, perception, stamina, reaction times, SOPs
  Braking performance, steering capabilities, traction, dampening capabilities
  Thrust performance, reverse thrust performance, responsiveness, fuel consumption
Life Sciences: Telemicroscopy DATA ACQUISITION PROCESSING, ANALYSIS ADVANCED VISUALIZATION NETWORK IMAGING INSTRUMENTS COMPUTATIONAL RESOURCES LARGE DATABASES
Underlying Technical Requirements Dynamic formation and management of virtual organizations Online negotiation of access to services: who, what, why, when, how Configuration of applications and systems able to deliver multiple qualities of service Autonomic management of distributed infrastructures, services, and applications
State of the Art: Globus Toolkit™ (since 1996) Small, standards-based set of protocols Authentication, delegation; resource discovery; reliable invocation; etc. Information-centric design Data models; publication, discovery protocols Open source implementation & community With commercial support Enabler of services and applications
Grid Projects in eScience
And What About Business? Fragmentation of enterprise infrastructure Specialized platforms -> commodity servers “Intelligence” embedded in networks The rise of the “eUtility” (IBM, HP, …) Outsourcing, economies of scale Business-to-business computing Especially complex virtual organizations Ever more challenging QoS requirements “Green-screen” -> “ubiquitous web presence”
Today’s Enterprise Computing Environment
The Business Opportunity On-demand computing, storage, services Significant savings due to reduced build-out, economies of scale, reduced admin costs Greater flexibility => greater productivity Entirely new applications and services Based on high-speed resource integration Solution to enterprise computing crisis Render distributed infrastructures manageable
Grids and Industry: Early Examples Entropia: Distributed computing (BMS, Novartis, …) Butterfly.net: Grid for multi-player games
Realizing the Promise Requires Significant Innovation Automation of infrastructure operation to achieve economies of scale Management and component models for distributed service provisioning New applications and tools powered by distributed services and resources Business and service models to support specialization of function
Grid Evolution: Open Grid Services Architecture Refactor Globus protocol suite to enable common base and expose key capabilities Service orientation to virtualize resources and unify resources/services/information Embrace key Web services technologies: standard IDL, leverage commercial efforts Result: standard interfaces & behaviors for distributed system management
Transient Service Instances “Web services” address discovery & invocation of persistent services Interface to persistent state of entire enterprise In Grids, must also support transient service instances, created/destroyed dynamically Interfaces to the states of distributed activities E.g. workflow, video conf., dist. data analysis Significant implications for how services are managed, named, discovered, and used
Open Grid Services Architecture Defines fundamental (WSDL) interfaces and behaviors that define a Grid Service Required + optional interfaces = WS “profile” Defines WSDL extensibility elements E.g., serviceType (a group of portTypes) Open source Globus Toolkit 3.0 Leverage GT experience, code, community And also commercial implementations
The Grid Service = Interfaces/Behaviors + Service Data
Required interfaces: service data access; explicit destruction; soft-state lifetime
Optional interfaces (… other interfaces …):
  Standard: notification; authorization; service creation; service registry; manageability; concurrency
  + application-specific interfaces
Service data: a set of service data elements
Binding properties: reliable invocation; authentication
Implementation: hosting environment/runtime (“C”, J2EE, .NET, …)
Example: Database Service
DBaccess Grid service supports at least two portTypes: GridService and DBaccess
Each has service data
  GridService: basic introspection, lifetime, …
  DBaccess: database type, current load, …
Maybe other portTypes as well
  E.g., NotificationSource (SDE = subscribers)
[Diagram: Grid Service portType with service data “Name, lifetime, etc.”; DBaccess portType with service data “DB info”]
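One way to picture a service that supports several portTypes, each contributing its own service data, is the minimal Python sketch below. It is an illustration only: the class and method names (`GridService`, `DBAccessService`, `port_types`, `find_service_data`) and the sample values are hypothetical and are not the WSDL interfaces or the Globus Toolkit API.

```python
from typing import Any, Dict, List


class GridService:
    """Hypothetical stand-in for the required GridService portType."""

    def __init__(self, name: str, lifetime: int) -> None:
        # Service data contributed by the GridService portType.
        self.service_data: Dict[str, Any] = {"name": name, "lifetime": lifetime}

    def port_types(self) -> List[str]:
        """Names of the portTypes this service supports (basic introspection)."""
        return ["GridService"]

    def find_service_data(self, key: str) -> Any:
        """Look up one service data element by name."""
        return self.service_data[key]


class DBAccessService(GridService):
    """Hypothetical database service that adds a DBaccess portType."""

    def __init__(self, name: str, lifetime: int, db_type: str, current_load: float) -> None:
        super().__init__(name, lifetime)
        # Service data contributed by the DBaccess portType.
        self.service_data.update(db_type=db_type, current_load=current_load)

    def port_types(self) -> List[str]:
        return super().port_types() + ["DBaccess"]


# Sample values are made up for illustration.
db = DBAccessService("BioDB-1", lifetime=1000, db_type="relational", current_load=0.25)
print(db.port_types())
print(db.find_service_data("db_type"), db.find_service_data("lifetime"))
```

A client that knows only the GridService portType can still discover that the service also offers DBaccess and read its service data, which is the point of making introspection a required interface.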
Data Mining for Bioinformatics
[Animated diagram, shown as a sequence of frames. Components: User Application; Community Registry; Mining Factory; Database Factory; Database Services (BioDB 1 … BioDB n); Compute Service Provider; Storage Service Provider]
1. User goal: “I want to create a personal database containing data on e.coli metabolism”
2. Request: “Find me a data mining service, and somewhere to store data”
3. Reply: handles for Mining and Database factories
4. Requests: “Create a data mining service with initial lifetime 10”; “Create a database with initial lifetime 1000”
5. Miner and Database instances are created
6. Queries are issued (two Query arrows)
7. Keepalive messages are sent for the Miner and the Database while the queries run
8. Results are returned; keepalives continue
9. Keepalive continues for the Database only
10. The Miner instance is gone; the Database remains, with keepalive
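The sequence above (factories create transient instances with an initial lifetime, and keepalives extend that lifetime) can be sketched as a small simulation. This is a minimal Python illustration under stated assumptions: the class names, the integer clock, the `reap` method, and the keepalive extension of 10 are all hypothetical; only the initial lifetimes 10 and 1000 come from the slides, and their time unit is not given there.

```python
from typing import Dict


class TransientService:
    """A service instance that expires unless its lifetime is extended."""

    def __init__(self, kind: str, now: int, lifetime: int) -> None:
        self.kind = kind
        self.expires_at = now + lifetime

    def keepalive(self, now: int, extension: int) -> None:
        """Soft-state refresh: push the expiry time further out."""
        self.expires_at = max(self.expires_at, now + extension)


class Factory:
    """Creates transient instances and discards the ones that have expired."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self.instances: Dict[int, TransientService] = {}
        self._next_handle = 0

    def create(self, now: int, lifetime: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.instances[handle] = TransientService(self.kind, now, lifetime)
        return handle

    def reap(self, now: int) -> None:
        """Remove every instance whose lifetime has run out."""
        expired = [h for h, s in self.instances.items() if s.expires_at <= now]
        for handle in expired:
            del self.instances[handle]


# The registry hands out the two factories (steps 2 and 3).
registry = {"mining": Factory("miner"), "database": Factory("database")}

# Steps 4 and 5: create instances with the initial lifetimes from the slide.
miner = registry["mining"].create(now=0, lifetime=10)
database = registry["database"].create(now=0, lifetime=1000)

# Step 7: while the query runs, the client keeps the miner alive.
registry["mining"].instances[miner].keepalive(now=8, extension=10)

# Steps 9 and 10: the client stops refreshing the miner, so it expires.
for factory in registry.values():
    factory.reap(now=20)

print("miner alive:", miner in registry["mining"].instances)
print("database alive:", database in registry["database"].instances)
```

With soft-state lifetimes, no explicit cleanup message is needed if a client disappears: an instance that stops receiving keepalives is simply reclaimed when its lifetime ends.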
GT3: OGSA-Based Globus Toolkit
GT3 Core
  Grid service interfaces
  Reference implementation of evolving standard
  Multiple hosting environments: Java/J2EE, C, C#/.NET?
GT3 Base Services
  Globus capabilities
Many other services
[Diagram layers: Other Grid Services; GT3 Data Services; GT3 Base Services; GT3 Core]
OGSA: Current Status Grid service specification & other documents moving forward in GGF www.gridforum.org/ogsi-wg Globus Project on track for open source OGSA-based GT3 release end of 2002 www.globus.org/ogsa IBM committed to various OGSA-compliant software releases (e.g., WebSphere) Other industrial efforts underway
Summary: The Grid Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations On-demand, ubiquitous access to computing, data, and services New capabilities constructed dynamically and transparently from distributed services Evolved to be dominant in eScience, now transitioning to industry (think Web in 1994)
Open Grid Services Architecture Open Grid Services Architecture represents next step in Grid evolution Service orientation enables unified treatment of resources, data, and services Standard interfaces and behaviors (the Grid service) for managing distributed state Open source Globus Toolkit implementation (and numerous commercial value adds)
For More Information
Grid Book: www.mkp.com/grids
Survey articles: www.mcs.anl.gov/~foster
The Globus Project™: www.globus.org
Global Grid Forum: www.gridforum.org
  Edinburgh, July 22-24
  Chicago, Oct 15-17