Grid Computing Exposing the myths of desktop scavenging grids John Easton – IBM Grid computing

Slides:



Advertisements
Similar presentations
Scheduling Algorithems
Advertisements

Display Power Management Policies in Practice Stephen P. Tarzia Peter A. Dinda Robert P. Dick Gokhan Memik Presented by: Andrew Hahn.
11 Measuring performance Kosarev Nikolay MIPT Feb, 2010.
Energy Efficiency through Burstiness Athanasios E. Papathanasiou and Michael L. Scott University of Rochester, Computer Science Department Rochester, NY.
CS1104: Computer Organisation School of Computing National University of Singapore.
CS2100 Computer Organisation Performance (AY2014/2015) Semester 2.
SLA-Oriented Resource Provisioning for Cloud Computing
Interrupts Chapter 8 – pp Chapter 10 – pp Appendix A – pp 537 &
INSE - Lectures 19 & 20 SE for Real-Time & SE for Concurrency  Really these are two topics – but rather tangled together.
1 Uniprocessor Scheduling Types of scheduling –The aim of processor scheduling is to assign processes to be executed by the processor so as to optimize.
CNT 4603: Managing/Maintaining Server 2008 – Part 3 Page 1 Dr. Mark Llewellyn © CNT 4603: System Administration Spring 2014 Managing And Maintaining Windows.
Making HTCondor Energy Efficient by identifying miscreant jobs Stephen McGough, Matthew Forshaw & Clive Gerrard Newcastle University Stuart Wheater Arjuna.
Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 19 Scheduling IV.
Best Practice for Performance Edward M. Kwang President.
GridFlow: Workflow Management for Grid Computing Kavita Shinde.
SE 450 Software Processes & Product Metrics Reliability: An Introduction.
CS 3013 & CS 502 Summer 2006 Scheduling1 The art and science of allocating the CPU and other resources to processes.
04/16/2010CSCI 315 Operating Systems Design1 I/O Systems Notice: The slides for this lecture have been largely based on those accompanying an earlier edition.
Present by Chen, Ting-Wei Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids Maria Chtepen, Filip H.A. Claeys, Bart Dhoedt,
1 Internet Management and Security We will look at management and security of networks and systems. Systems: The end nodes of the Internet Network: The.
Fall 2001CS 4471 Chapter 2: Performance CS 447 Jason Bakos.
Chapter 3 Overview of Operating Systems Copyright © 2008.
OS and the Computer System  Some OS programs exist permanently in the system area of the memory to monitor and control activities in the computer system.
PRESTON SMITH ROSEN CENTER FOR ADVANCED COMPUTING PURDUE UNIVERSITY A Cost-Benefit Analysis of a Campus Computing Grid Condor Week 2011.
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
Ajou University, South Korea ICSOC 2003 “Disconnected Operation Service in Mobile Grid Computing” Disconnected Operation Service in Mobile Grid Computing.
Parallel Computing The Bad News –Hardware is not getting faster fast enough –Too many architectures –Existing architectures are too specific –Programs.
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 10.1.
I/O Systems I/O Hardware Application I/O Interface
Your First Azure Application Michael Stiefel Reliable Software, Inc.
Scheduling Basic scheduling policies, for OS schedulers (threads, tasks, processes) or thread library schedulers Review of Context Switching overheads.
20 October 2006Workflow Optimization in Distributed Environments Dynamic Workflow Management Using Performance Data David W. Walker, Yan Huang, Omer F.
Chapter 8 I/O. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 8-2 I/O: Connecting to Outside World So far,
Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others.
10/19/2015Erkay Savas1 Performance Computer Architecture – CS401 Erkay Savas Sabanci University.
INDUSTRIAL PROJECT (234313) ULTRASOUND SCANNER EMBEDDED ONLINE PROFILER Students: Liat Peterfreund, Hagay Myr Supervisor: Mr. Tomer Gal (GE Healthcare)
Networked Computer Power Management Software Determining “Equivalency” to Surveyor RTF Meeting February 5, 2008.
1 University of Maryland Linger-Longer: Fine-Grain Cycle Stealing in Networks of Workstations Kyung Dong Ryu © Copyright 2000, Kyung Dong Ryu, All Rights.
IPDPS 2005, slide 1 Automatic Construction and Evaluation of “Performance Skeletons” ( Predicting Performance in an Unpredictable World ) Sukhdeep Sodhi.
BOINC.
Chapter 13: I/O Systems. 13.2/34 Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware.
Computer Performance Computer Engineering Department.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem.
Math 4030 – 9a Introduction to Hypothesis Testing
GRID activities in Wuppertal D0RACE Workshop Fermilab 02/14/2002 Christian Schmitt Wuppertal University Taking advantage of GRID software now.
Douglas Thain, John Bent Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny Computer Sciences Department, UW-Madison Gathering at the Well: Creating.
EGRE 426 Computer Organization and Design Chapter 4.
Microsoft ® Official Course Module 6 Managing Software Distribution and Deployment by Using Packages and Programs.
1.  System Characteristics  Features of Real-Time Systems  Implementing Real-Time Operating Systems  Real-Time CPU Scheduling  An Example: VxWorks5.x.
BYOD: An IT Security Perspective. What is BYOD? Bring your own device - refers to the policy of permitting employees to bring personally owned mobile.
18 May 2006CCGrid2006 Dynamic Workflow Management Using Performance Data Lican Huang, David W. Walker, Yan Huang, and Omer F. Rana Cardiff School of Computer.
Project Initium: Remote Job Submission Pawel Krepsztul Fairfield University Electrical and Computer Engineering program.
Local Scheduling for Volunteer Computing David P. Anderson U.C. Berkeley Space Sciences Lab John McLeod VII Sybase March 30, 2007.
Chapter 6: CPU Scheduling (Cont’d)
Chapter 19: Real-Time Systems
Job Scheduling in a Grid Computing Environment
Operations Management
CSCI 315 Operating Systems Design
Basic Grid Projects – Condor (Part I)
I/O Systems I/O Hardware Application I/O Interface
CPU SCHEDULING.
Operations Management
Chapter 19: Real-Time Systems
Chapter 13: I/O Systems I/O Hardware Application I/O Interface
GPU Select Scheduling to Improve Job Success Rates on Titan
Implementation of a small-scale desktop grid computing infrastructure in a commercial domain    
Chapter 2: Performance CS 447 Jason Bakos Fall 2001 CS 447.
Chapter 1: Creating a Program.
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

Grid Computing Exposing the myths of desktop scavenging grids John Easton – IBM Grid computing

Grid Computing With so much “compute power” out there…  Sat 08 Oct 21:08: GMT  URL: setiathome.ssl.berkeley.edu/ setiathome.ssl.berkeley.edu/  Total credit granted: –2,799,193,542 cobblestones  Total users: –220,769 users  Total hosts: –469,244 host(s) World Community Grid  10/09/ :06:02 (UTC)  URL:  Total Run Time (y:d:h:m:s) –17,675:218:05:11:19  Members –94,038  Devices –152,071 … how come so many companies have failed to make desktop scavenging “work”? Cobblestone 1, is 1/100 day of CPU time on a reference computer that does: 1,000 double-precision MIPS based on the Whetstone benchmark. 1,000 VAX MIPS based on the Dhrystone benchmark. On this basis cobblestones = 76,690 years of 1000 VAX MIPS

Grid Computing A desktop scavenging “timeline” System is ‘idle’ Screensaver activated Keyboard / mouse / network activity - screensaver deactivated Average job runtime Average user idle time Typically set by security policy KMN daemon Triggers grid Activation (polling interval Vs. interrupt driven) System restored to usable state

Grid Computing Things to think about….  Push vs. pull scheduling  100% utilisation is generally a BAD thing for a PC  How do you know the machine is available for work? –Screensaver kicks in –KMN (keyboard / mouse / network) activity  Polling vs. interrupt-driven  Job criticality –Response time  What is the critical mass of systems to deliver response time AND throughput? – System population vs. user activity profiles vs. application considerations

Grid Computing Job & task elapse distributions Total Jobs: 1167 Average elapse: 100 sec. Total tasks: Average elapse: 83 sec.

Grid Computing Problem statement  Average task duration = 83 seconds on a 1.7 GHz CPU. –2.5 minutes to complete on the standard customer configuration of a 800 MHz Intel.  To determine whether tasks lasting on average 2.5 minutes would statistically complete on a PC grid, during the day time, in intervals where PC users would be inactive on their machine. –How often will an active task be kicked out by a user getting back to his machine.  The answer to this question depends on two major data. –Processing priority in favor of the total restitution time  Redispatch the task to another engine in case of user activity vs. a mechanism that would put the process into sleep mode for an unpredictable duration –What is the statistical behavior of the average PC user?  How many intervals is he/she is away from the PC during the day  What are durations of those intervals.  Strategy used by the grid to determine when an interval should start. –The usual strategy is to assume that if the user has been inactive for say 10 minutes, the chance that they will return in the next minute is low enough to consider sending a task to the PC.

Grid Computing PC Grid availability daily profile Total PC inventory : 100 Idle time threshold : 10 minutes Actual data from a live production grid - On average only 20% of systems are available during the working day !! - This is for ‘desktop’ systems – imagine what the effect of laptops is!!

Grid Computing PC Grid usability analysis Total PC inventory : 100 Idle time threshold : 10 minutes Actual data from a live production grid 17 Intervals -If we assume a 10 minute idle time and a 2.5 minute runtime there are 17 -intervals of between 11 and 12 minutes

Grid Computing PC Grid Usability Simulation Total PC inventory : 100 Idle time threshold : 10 minutes Task elapse: 2.5 minutes Number of Intervals : 425 Processed tasks : 2726 Interrupted tasks : 425 i.e. 6.2 % Average time lost: 0.93 min / retry 17x4 = 68 tasks x2.5 min = 170 min

Grid Computing Results  17 intervals of between 11 & 12 minutes can accommodate on average 4 tasks of 2.5 minutes each. –17 intervals could support processing of 68 tasks (4 x 17) => 170 minutes of usable capacity with 17 interrupted tasks.  Analysis across all possible intervals gives cumulated figures of 425 intervals supporting up to 2726 task executions with only 425 interruptions. –A 6.2 % probability of interruption and an average lost time of less than 1 minute –2726 intervals represent 14% of the total PC capacity across an 8 hours window.  This kind of analysis demonstrated that the Grid was indeed viable for this organisation.  The target deployment was 1000 PCs delivering an average of 140 PCs full in addition to the 1000 PCs capacity fully available during nights and week ends.  If average task duration had been longer, then the probability for each task to be interrupted would have been higher. –For longer running tasks, this grid implementation would created significant network traffic without any work done. Results would have been unacceptable.

Grid Computing Conclusions  100% utilisation is generally a bad thing on a PC –It is better to run on an idle system than ‘compete’ with the user  Criticality of response time? –Task runtime should be shorter than average idle time –How do you work out whether the dispatch a work-unit to an idle PC?  What is the critical mass of systems to deliver response time AND throughput? – System population vs. user activity profiles vs. application considerations

Grid Computing Questions?