1 ARGONNE  CHICAGO How the Linux and Grid Communities can Build the Next- Generation Internet Platform Ian Foster Argonne National Lab University of Chicago Globus Project

2 ARGONNE  CHICAGO Ottawa Linux Symposium, July 24, 2003. Linux has gained tremendous traction as a server operating system. However, a variety of technology trends, the Grid among them, are converging to create a service-based future in which functions such as computing and storage are virtualized, and services and resources are increasingly integrated within and across enterprises. The servers that will power this sort of environment will require new capabilities, including high scalability, integrated resource management, and RAS (reliability, availability, serviceability). I discuss what I see as the development priorities if Linux is to retain its leadership role as a server operating system.

3 ARGONNE  CHICAGO The (Power) Grid: On-Demand Access to Electricity
[Chart: quality and economies of scale increasing over time]

4 ARGONNE  CHICAGO By Analogy, A Computing Grid
Decouple production and consumption
– Enable on-demand access
– Achieve economies of scale
– Enhance consumer flexibility
– Enable new devices
On a variety of scales
– Department
– Campus
– Enterprise
– Internet

5 ARGONNE  CHICAGO Requirements
Dynamically link resources/services
– From collaborators, customers, eUtilities, … (members of an evolving “virtual organization”)
Into a “virtual computing system”
– Dynamic, multi-faceted system spanning institutions and industries
– Configured to meet instantaneous needs, for:
Multi-faceted QoX for demanding workloads
– Security, performance, reliability, …

6 ARGONNE  CHICAGO For Example: Real-Time Online Processing
[Diagram: Applications (delivery) → Application services (distribution) → Servers (execution)]
Application virtualization
– Automatically connect applications to services
– Dynamic & intelligent provisioning
Infrastructure virtualization
– Dynamic & intelligent provisioning
– Automatic failover

7 ARGONNE  CHICAGO Examples of Linux-Based Grids: High Energy Physics
Production run on the Integration Testbed
– Simulate 1.5 million full CMS events for physics studies: ~500 sec per event on an 850 MHz processor
– 2 months of continuous running across 5 testbed sites
– Managed by a single person at the US-CMS Tier 1

8 ARGONNE  CHICAGO Examples of Linux-Based Grids: Earthquake Engineering
[Image: U. Nevada Reno]

9 ARGONNE  CHICAGO Grid Technologies & Community
Grid technologies developed since the mid-90s
– Product of work on resource sharing for scientific collaboration; commercial adoption
Open source Globus Toolkit has emerged as a de facto standard
– International community of contributors
– Thousands of deployments worldwide
– Commercial support providers
Global Grid Forum serves as a community and standards body
– Home to recent OGSA work

10 ARGONNE  CHICAGO The Emergence of Open Grid Standards
[Timeline, toward increased functionality and standardization (through 2010):
– Custom solutions (computer science research)
– Globus Toolkit: de facto standard, single implementation, built on Internet standards
– Open Grid Services Architecture: real standards, multiple implementations, built on Web services, etc.
– Managed shared virtual systems]

11 ARGONNE  CHICAGO Open Grid Services Infrastructure (OGSI)
[Diagram: a service requestor (e.g., a user application) discovers services through a service registry and asks a service factory to create a service; resource allocation backs the creation, and the requestor receives a Grid Service Handle. Service instances register with the registry, carry service data, and support keep-alives, notifications, and service invocation. Interactions are standardized using WSDL and SOAP; authentication & authorization are applied to all requests.]
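The factory/registry/handle interactions above can be sketched as plain objects. This is an illustrative sketch of the pattern, not the OGSI API: all class and method names, and the `gsh://example.org/...` handle format, are invented for the example.

```python
import itertools

class Registry:
    """Service registry: instances register; requestors discover."""
    def __init__(self):
        self._services = {}

    def register(self, handle, instance):
        self._services[handle] = instance

    def discover(self, service_type):
        return [h for h, s in self._services.items()
                if s.service_type == service_type]

class ServiceInstance:
    def __init__(self, handle, service_type):
        self.handle = handle
        self.service_type = service_type
        self.service_data = {}   # queryable state, as in OGSI service data
        self.alive = True        # would be refreshed by keep-alives

    def invoke(self, operation, *args):
        # In OGSI the call would travel over SOAP, described by WSDL.
        return (operation, args)

class Factory:
    """Service factory: allocates resources and creates instances."""
    _ids = itertools.count(1)

    def __init__(self, registry, service_type):
        self.registry = registry
        self.service_type = service_type

    def create_service(self):
        handle = f"gsh://example.org/{self.service_type}/{next(self._ids)}"
        inst = ServiceInstance(handle, self.service_type)
        self.registry.register(handle, inst)  # instance registers itself
        return handle

registry = Registry()
factory = Factory(registry, "compute")
handle = factory.create_service()        # requestor gets a Grid Service Handle
print(handle)
print(registry.discover("compute"))      # requestor finds the instance later
```

The point of the pattern is indirection: clients hold only handles, so instances can be created, migrated, or reclaimed behind the registry without clients hard-coding server locations.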

12 ARGONNE  CHICAGO OGSA and the Virtual Integration Architecture
[Diagram: a layered stack. Web services provide basic functionality; OGSI is the interface to Grid infrastructure; OGSA (Open Grid Services Architecture) builds on it. A generic virtual service access and integration layer provides transformation, registry, job submission, data transport, resource usage, banking, brokering, workflow, and authorisation services, plus structured data integration and structured data access over relational, XML, and semi-structured data. Above sit distributed application & integration technology, applications, and users in problem domain X; below sit compute, data & storage resources.]

13 ARGONNE  CHICAGO But It’s Not Turtles All the Way Down
Our ability to deliver virtualized services efficiently and with the desired QoX ultimately depends on the underlying platform! At multiple levels, including but not limited to:
– Dynamic provisioning & resource management
– Reliability, availability, manageability
– Performance and parallelism
New demands on the OS in each area.

14 ARGONNE  CHICAGO (1) Dynamic Provisioning
Static provisioning dedicates resources
– Typical of “co-lo” hosting
– Reprovision manually as needed
But load is dynamic
– Must overprovision for surges
– High variable cost of capacity
Need dynamic provisioning to achieve true economies of scale
– Load multiplexing
– Tradeoff of cost vs. quality
– Service level agreements
– Dynamic resource recruitment
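The scale-up/scale-down loop this slide calls for can be sketched in a few lines. Everything here (function names, the 20% headroom, the per-server capacity, the one-server-per-step shrink) is illustrative, not from the talk:

```python
import math

def servers_needed(load_rps, per_server_rps, headroom=0.2):
    """Servers required to carry the offered load with spare headroom."""
    return max(1, math.ceil(load_rps * (1 + headroom) / per_server_rps))

def provision(active, load_rps, per_server_rps, cooldown_ok=True):
    """One control step: grow immediately on surges, shrink cautiously.

    Shrinking only when cooldown_ok holds is a crude hysteresis guard,
    so a brief dip does not power servers down just before a surge.
    """
    target = servers_needed(load_rps, per_server_rps)
    if target > active:
        return target          # scale up right away
    if target < active and cooldown_ok:
        return active - 1      # scale down one server per step
    return active

# A daily cycle like the traces on the next slide: quiet, then a surge.
active = 4
for load in [100, 300, 900, 1200, 800, 200]:
    active = provision(active, load, per_server_rps=100)
    print(load, active)
```

The asymmetry (fast up, slow down) trades a little energy for protection against SLA violations when load rebounds.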

15 ARGONNE  CHICAGO Load Is Dynamic
[Charts:
– ibm.com external site, February 2001: daily fluctuations (3x), workday cycle, weekends off
– World Cup soccer site, May–June 1998 (trace from ita.ee.lbl.gov): seasonal fluctuations, event surges (11x)]

16 ARGONNE  CHICAGO For Example: Energy-Conscious Provisioning
Light load: concentrate traffic on a minimal set of servers
– Step down surplus servers to a low-power state (APM and ACPI)
– Activate surplus servers on demand (Wake-on-LAN)
Browndown: provision for a specified energy target
Even smarter: also manage the air conditioning
[Measured power draw:
– CPU idle: 93 W
– CPU max: 120 W
– Boot: 136 W
– Disk spin: 6–10 W
– Off/hibernate: 2–3 W]
Idling consumes 60% to 70% of peak power demand.

17 ARGONNE  CHICAGO Power Management via MUSE: IBM Trace Run (Before)
[Chart: throughput (requests/s), power draw (watts), and latency (ms × 50) over the trace]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

18 ARGONNE  CHICAGO Power Management via MUSE: IBM Trace Run (After)
[Chart: the same trace with MUSE power management enabled]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

19 ARGONNE  CHICAGO Dynamic Provisioning: OS Issues
Hot-plug memory, CPU, and I/O
– For partitioning; core virtualization capabilities
Security
– Containment & data integrity in a virtualized environment: User-Mode Linux++?
Scheduler improvements for resource and workload management
– Allocate for required resource consumption
– Dynamic, sub-processor logical partitioning
Improved instrumentation & accounting
– Determine actual resource consumption

20 ARGONNE  CHICAGO (2) Reliability, Availability, Manageability
Error log and diagnostics frameworks
– Foundation for automated error analysis and recovery of distributed & remote systems
– Enable problem determination, automated reconfiguration, localization of failures
Configuration management
– Determine hardware configuration/inventory
– Apply/remove service/support patches
– Isolate failing components quickly

21 ARGONNE  CHICAGO (3) Performance and Parallelism: E.g., Data Integration
Assume:
– Remote data arriving at 1 GB/s (>1 GByte/s is achievable today: FAST, 7 streams, LA to Geneva)
– 10 local bytes accessed per remote byte
– 100 operations per byte
Then the local system must sustain:
– Parallel I/O: 10 GB/s
– Parallel computation: 1000 Gop/s
[Diagram: wide-area link (end-to-end switched lambda?) feeding a local network]
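The slide's requirements follow from simple arithmetic, which can be checked directly:

```python
# Back-of-envelope check of the data-integration numbers.
# Inputs are the slide's assumptions; units are GB/s and Gop/s.

remote_rate_gbps = 1           # remote data arrives at 1 GB/s
local_bytes_per_remote = 10    # 10 local bytes per remote byte
ops_per_byte = 100             # 100 operations per byte

local_io_gbps = remote_rate_gbps * local_bytes_per_remote
compute_gops = local_io_gbps * ops_per_byte

print(local_io_gbps)   # required parallel I/O rate
print(compute_gops)    # required parallel computation rate
```

A single wide-area stream thus demands an order of magnitude more local I/O and three orders of magnitude more compute, which is why parallel file systems and parallel computation appear on the next slide.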

22 ARGONNE  CHICAGO Performance and Parallelism
– Distributed/cluster/parallel file systems
– Optimized TCP/IP stacks
– Scheduling of computation & communication
– Web100 configuration & instrumentation

23 ARGONNE  CHICAGO Web100: Overcome TCP/IP “Wizard Gap”

24 ARGONNE  CHICAGO Web100 Kernel Instrument Set
Definition
– A set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection
How it is implemented
– Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure
– The Linux /proc interface is used to expose these instruments outside the kernel
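Web100's per-connection instruments are exposed under /proc; the stock kernel already exposes coarser, system-wide TCP counters in /proc/net/snmp using a similar header-line/value-line layout. A sketch of parsing such counters (run here on a canned sample so it works anywhere; `parse_proc_net_snmp` and the shortened field list are illustrative, though `OutSegs` and `RetransSegs` are real /proc/net/snmp fields):

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp-style output: each protocol contributes a
    header line of field names and a matching line of values."""
    tables = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[::2], lines[1::2]):
        proto, names = header.split(":")
        _, vals = values.split(":")
        tables[proto] = dict(zip(names.split(),
                                 (int(v) for v in vals.split())))
    return tables

# Canned sample with a subset of the real Tcp fields.
SAMPLE = """\
Tcp: RtoAlgorithm RtoMin RtoMax ActiveOpens PassiveOpens OutSegs RetransSegs
Tcp: 1 200 120000 8813 1022 994077 1571"""

tcp = parse_proc_net_snmp(SAMPLE)["Tcp"]
retrans_ratio = tcp["RetransSegs"] / tcp["OutSegs"]
print(f"retransmitted {retrans_ratio:.2%} of segments")
# On a live system, pass open("/proc/net/snmp").read() instead of SAMPLE.
```

Web100's contribution is precisely that it makes this kind of counter available per connection, with many more instruments, so a single misbehaving transfer can be diagnosed rather than inferred from system-wide aggregates.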

25 ARGONNE  CHICAGO For Example …
A recent transatlantic transfer showed frequent drops in data rate, but no loss or retransmits. Web100 identified the problem as Linux send-stall congestion events.

26 ARGONNE  CHICAGO Tier0/1 facility Tier2 facility 10 Gbps link 2.5 Gbps link 622 Mbps link Other link Tier3 facility Grid/Linux Cooperation: We Have Testbeds, Users, Applications Cambridge Newcastle Edinburgh Oxford Glasgow Manchester Cardiff Soton London Belfast DL RAL Hinxton

27 ARGONNE  CHICAGO Evolution of the Server
[Chart: flexibility (and complexity) increasing over time]
Significant implications for the underlying operating system.

28 ARGONNE  CHICAGO Summary
The Grid community is creating middleware for distributed resource & service sharing
– Open source software for resource & service virtualization, service management/integration
– Motivated by wonderful applications
– But we need help from the OS
Linux: the next-generation Internet platform?
– Could be, but significant evolution is required to address provisioning/resource management; availability and manageability; performance and parallelism; and other issues
– The Grid community can provide testbeds, users, requirements, applications

29 ARGONNE  CHICAGO For More Information
The Globus Project™
Global Grid Forum
Background information
GlobusWORLD 2004
– Jan 20–23, San Francisco
[Book cover: 2nd edition, November 2003]