Presentation transcript:

Scalably Scheduling Processes with Arbitrary Speedup Curves (Better Scheduling in the Dark). Jeff Edmonds (York University) and Kirk Pruhs (University of Pittsburgh). MAPSP 2009.

The Scheduling Problem: Allocate $p$ processors to a stream of $n$ jobs. Candidate algorithms: Shortest Remaining Processing Time (SRPT) and Shortest Elapsed Time First (SETF).
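For concreteness, here are minimal Python sketches of the two rules; the job representation (dicts with "remaining" and "elapsed" fields) and the tie-breaking are illustrative assumptions, not part of the talk:

    def srpt(alive, speed=1.0):
        # Shortest Remaining Processing Time: all speed goes to the
        # job with the least remaining work (requires clairvoyance).
        job = min(alive, key=lambda j: j["remaining"])
        return {job["id"]: speed}

    def setf(alive, speed=1.0):
        # Shortest Elapsed Time First: all speed goes to the job that
        # has been processed least so far (nonclairvoyant).
        job = min(alive, key=lambda j: j["elapsed"])
        return {job["id"]: speed}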

Sublinear Nondecreasing Speedup Functions: A set of jobs $J = \{J_1, \ldots, J_n\}$. Each job $J_i$ has phases $\langle J_i^1, \ldots, J_i^{q_i} \rangle$. Each phase $J_i^q = \langle W_i^q, \Gamma_i^q \rangle$ consists of an amount of work $W_i^q$ and a speedup function $\Gamma_i^q$, which is nondecreasing and sublinear. Examples: a fully parallelizable phase ($\Gamma(p) = p$) and a sequential phase ($\Gamma(p) = 1$).
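To make the model concrete, a minimal Python sketch of jobs with phased speedup functions; the class names and the two example functions are illustrative assumptions consistent with the examples above:

    from dataclasses import dataclass
    from typing import Callable, List

    # A speedup function maps a processor allocation p >= 0 to a rate.
    SpeedupFn = Callable[[float], float]

    def parallelizable(p: float) -> float:
        return p                # Gamma(p) = p: rate scales with processors

    def sequential(p: float) -> float:
        return 1.0              # Gamma(p) = 1: extra processors are wasted

    @dataclass
    class Phase:
        work: float             # W_i^q: amount of work in this phase
        speedup: SpeedupFn      # Gamma_i^q: nondecreasing and sublinear

    @dataclass
    class Job:
        release: float          # r_i: arrival time
        phases: List[Phase]     # executed in order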

The Scheduling Problem: Allocate $p$ processors to a stream of $n$ jobs. Measure of quality: total flow time $F(A, I) = \sum_{i=1}^{n} (c_i - r_i) = \int_t n_t \, dt$, where $c_i$ is job $i$'s completion time, $r_i$ its release time, and $n_t$ the number of jobs alive at time $t$. Competitive ratio: $\max_I \frac{F(A, I)}{F(Opt, I)}$.
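The two expressions for flow time agree because each alive job contributes 1 per unit of time to $n_t$. A quick numerical check in Python (the release/completion pairs are made-up data):

    # Hypothetical (release, completion) pairs from some schedule.
    jobs = [(0.0, 3.0), (1.0, 2.5), (2.0, 6.0)]

    # F as the sum of flow times c_i - r_i.
    flow_sum = sum(c - r for r, c in jobs)

    # F as the integral of n_t over time, on a fine grid.
    dt = 1e-4
    t, integral = 0.0, 0.0
    horizon = max(c for _, c in jobs)
    while t < horizon:
        n_t = sum(1 for r, c in jobs if r <= t < c)
        integral += n_t * dt
        t += dt

    print(flow_sum, round(integral, 2))   # both give 8.5 (up to grid error)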

Sublinear Nondecreasing Speedup Functions: The online nonclairvoyant scheduler knows nothing about the future, or about each job's remaining work and speedup functions. The optimal schedule is all-knowing and all-powerful. Without help, the nonclairvoyant scheduler is not competitive.

Nonclairvoyant lower bounds: $\frac{F(SETF)}{Opt} = \Omega(n)$, $\frac{F(Equi)}{Opt} = \Omega(\frac{n}{\log n})$, and for every nonclairvoyant algorithm, $\frac{F(Nonclairvoyant)}{Opt} = \Omega(n^{1/3})$ [MPT].

Performance vs Load: without help, $\max_I \frac{F(A, I)}{F(Opt, I)} = \Omega(n)$. [Plot: average performance of $A$ vs load; $A$'s curve diverges from Opt's.]

Performance vs Load: the goal is $\max_I \frac{F(A, I)}{F(Opt, I)} = O(1)$, i.e., $A$'s average performance stays within a constant factor $c$ of Opt's. [Plot: $A$'s curve tracks $c \cdot F(Opt)$ across loads.]

Performance vs Load: with a modestly faster processor this is achievable: $\max_I \frac{F(A_s, I)}{F(Opt, I)} = O(1)$, where $A_s$ is $A$ run at speed $s > 1$. [Plot: the speed-$s$ curve stays within a constant factor of Opt up to nearly full load.]

Resource Augmentation: The nonclairvoyant scheduler still knows nothing about the future, but is given extra speed. The optimal schedule remains all-knowing and all-powerful. With this help, the nonclairvoyant scheduler becomes competitive.

Resource Augmentation: $\frac{F(SETF_{1+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$ [KP], but this requires fully parallelizable jobs. $\frac{F(Equi_{2+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$ [E].

Sublinear Nondecreasing Speedup Functions: [Diagram: jobs arrive over time; the jobs currently alive under $Opt$.] Opt gives all its resources to the parallelizable jobs and hence completes them as they arrive. The sequential jobs complete with no resources (a sequential phase progresses at rate 1 regardless of its allocation).

Sublinear Nondecreasing Speedup Functions: [The same instance under $SETF_s$.] Shortest Elapsed Time First (SETF) gives all its resources to a sequential job, wasting them. The parallelizable jobs, getting no resources, never complete. Hence $\frac{F(SETF_s)}{Opt} = \Omega(n)$.

Sublinear Nondecreasing Speedup Functions: [The same idea for $Equi_{1+\epsilon}$, on an instance with $\ell$ jobs.] Equi spreads its resources fairly, so most are wasted on the sequential jobs; it has only $\epsilon$ extra speed to cover the waste. The parallelizable jobs don't get enough and fall behind. Hence $\frac{F(Equi_{1+\epsilon})}{Opt} = \Omega(\ell)$.

Sublinear Nondecreasing Speedup Functions: speed $2+\epsilon$ is required for Equi: $\frac{F(Equi_{2+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$ [E].

The $n_t$ jobs currently alive are sorted by arrival time. Latest Arrival Processor Sharing ($LAPS$) shares the processors equally among the $\beta n_t$ jobs that arrived latest. It is a compromise: $SETF$ runs 1 job, but that job may be sequential; $Equi$ runs all $n_t$ jobs, but spreads too thin and needs speed $2+\epsilon$. New result [EP]: $\frac{F(LAPS\langle \beta, 1+\epsilon \rangle)}{Opt} = \Theta(\frac{1}{\beta\epsilon})$.
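A minimal sketch of the LAPS allocation rule in Python; the function name, the discrete job list, and rounding $\beta n_t$ up to an integer are my assumptions (the talk's model is continuous):

    import math

    def laps_allocation(alive_jobs, beta, speed):
        # Share `speed` equally among the ceil(beta * n_t) most
        # recently arrived alive jobs; everyone else gets nothing.
        # alive_jobs: list of (job_id, arrival_time) pairs.
        n_t = len(alive_jobs)
        if n_t == 0:
            return {}
        k = max(1, math.ceil(beta * n_t))
        latest = sorted(alive_jobs, key=lambda j: j[1])[-k:]
        return {job_id: speed / k for job_id, _ in latest}

    # beta = 1 recovers Equi; beta near 0 runs only the latest arrival.
    print(laps_allocation([(1, 0.0), (2, 3.0), (3, 5.0)], beta=0.5, speed=1.1))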

The $n_t$ jobs currently alive are sorted by arrival time. LAPS interpolates between the extremes: with $\beta \approx 0$ it runs 1 job, like $SETF$, for which $\frac{F(SETF_{1+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$; with $\beta = 1$ it runs all $n_t$ jobs, i.e., it is $Equi$, for which $\frac{F(Equi_{2+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$. New result [EP]: $\frac{F(LAPS\langle \beta, 1+\epsilon \rangle)}{Opt} = \Theta(\frac{1}{\beta\epsilon})$.

Backwards Quantifiers: Desired result: $\exists Alg\; \forall \epsilon$: $\frac{F(Alg_{1+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon})$. Obtained: $\forall \epsilon\; \exists Alg$: $\frac{F(Alg_{1+\epsilon})}{Opt} = \Theta(\frac{1}{\epsilon^2})$, by setting $\beta = \frac{1}{2}\epsilon$ in the new result [EP] $\frac{F(LAPS\langle \beta, 1+\epsilon \rangle)}{Opt} = \Theta(\frac{1}{\beta\epsilon})$. New result [E STOC09?]: the quantifiers cannot be swapped: $\forall Alg\; \exists \epsilon$: $\frac{F(Alg_{1+\epsilon})}{Opt} = \omega(1)$; e.g., for $LAPS_\beta$ this holds with $\epsilon = \frac{1}{2}\beta$.

Performance vs Load Threshold: Defn: A set of jobs $I$ has load $L \in [0, 1]$ if $F(Opt_L, I) < \infty$, i.e., $I$ can be optimally handled with speed $L$. Defn: $F_\beta(L) = \max_{I \text{ with load } L} F(LAPS\langle \beta, 1 \rangle, I)$. Equi ($\beta = 1$) has the best performance, but it can only handle half load, $L = \frac{1}{2}$. Small $\beta$ can handle almost full load, $L = \frac{1}{1+\beta}$, but its performance degrades with $L$.

Proof Sketch: $\frac{F(LAPS\langle \beta, 1+\epsilon \rangle)}{Opt} = \Theta(\frac{1}{\beta\epsilon})$.

Proof Sketch: In the worst-case inputs, each phase is either sequential or parallelizable. [Diagram: phases of the jobs under LAPS.]

Potential Function: Define a potential function $\Phi_t$; it says how much debt LAPS has in the bank. $\Phi_0 = \Phi_{final} = 0$. $\Phi_t$ does not increase as jobs arrive or complete. At all other times, $\frac{dF(LAPS)}{dt} + \frac{d\Phi_t}{dt} \le c \cdot \frac{dOpt}{dt}$. The result follows by integrating: $F(LAPS) + \Phi_{final} - \Phi_0 \le c \cdot Opt$.
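Spelling out the integration step in LaTeX (this uses only the conditions stated above: the boundary values, the monotonicity at arrivals and completions, and the running inequality):

    F(\mathrm{LAPS})
      = \int_0^\infty \frac{dF(\mathrm{LAPS})}{dt}\,dt
      \le \int_0^\infty \Bigl( c\,\frac{d\,\mathrm{Opt}}{dt} - \frac{d\Phi_t}{dt} \Bigr)\,dt
      = c \cdot \mathrm{Opt} - (\Phi_{\mathrm{final}} - \Phi_0)
      = c \cdot \mathrm{Opt}.

The jumps in $\Phi_t$ at arrivals and completions are nonpositive, so dropping them from the integral only strengthens the inequality.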

Potential Function: The $n_t$ jobs currently alive are sorted by arrival time and given coefficients $1, 2, 3, \ldots, n_t$. Let $x_i$ be the parallelizable work done by Opt but not by LAPS on job $i$. Define $\Phi = \gamma \sum_{i \in [n_t]} i \cdot \max(x_i, 0)$. Job arrives: the new job takes coefficient $n_t + 1$ with $x = 0$, so $d\Phi_t = 0$.
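A small sketch of this potential in Python (variable names are mine; `gamma` is the constant $\gamma$ that the proof tunes later):

    def potential(lags, gamma):
        # Phi = gamma * sum_i i * max(x_i, 0), with the alive jobs in
        # arrival order: lags[i] is x_{i+1}, paired with coefficient i+1.
        return gamma * sum((i + 1) * max(x, 0.0) for i, x in enumerate(lags))

    # A newly arrived job appends a lag of 0, so Phi is unchanged on arrival.
    lags = [0.4, 0.0, 1.2]
    print(potential(lags, gamma=2.0))           # 2 * (1*0.4 + 3*1.2) = 8.0
    print(potential(lags + [0.0], gamma=2.0))   # still 8.0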

Potential Function: Job $i$ completes: the coefficient of each later-arriving job drops by one (job $i+1$ takes coefficient $i$, and so on), and every term $\max(x_i, 0)$ is nonnegative, so $d\Phi_t \le 0$.

Potential Function: Opt works: Opt has total speed 1, so it increases the lags $x_i$ at total rate at most 1, each against a coefficient of at most $n_t$; hence $d\Phi_t \le \gamma \cdot n_t$.

Potential Function: LAPS works: LAPS shares speed $1+\epsilon$ equally among the $\beta n_t$ latest-arriving jobs, each of which has coefficient at least $(1-\beta) n_t$. Let $b_\ell$ be the number of these jobs that are sequential under LAPS or on which LAPS is ahead (i.e., $x_i = 0$); they contribute nothing. On each of the remaining (at least $\beta n_t - b_\ell$) jobs, $x_i$ decreases at rate $\frac{1+\epsilon}{\beta n_t}$, so $d\Phi_t \le -\gamma \, (1-\beta) n_t \cdot \frac{(1+\epsilon)(\beta n_t - b_\ell)}{\beta n_t}$.

Potential Function: Let $n_t$ be the number of jobs alive under LAPS and $N_t$ the number alive under Opt; the proof bounds $b_\ell$ by $N_t$. Plugging everything into $\frac{dF(LAPS)}{dt} + \frac{d\Phi_t}{dt} \le c \cdot \frac{dOpt}{dt}$ gives the resulting competitive ratio $c = \Theta(\frac{1}{\beta\epsilon})$. A page of math later, and the proof is done.

Conclusions: Latest Arrival Processor Sharing ($LAPS$) shares the processors among the $\beta n_t$ latest-arriving of the $n_t$ currently alive jobs. [EP] Resource Augmentation: $\frac{F(LAPS\langle \beta, 1+\epsilon \rangle)}{Opt} = \Theta(\frac{1}{\beta\epsilon})$. [Work in Progress] Suboptimal Load Threshold: $\forall Alg\; \exists \epsilon$: $\frac{F(Alg_{1+\epsilon})}{Opt} = \omega(1)$.

Other Models, Same Techniques. Broadcast: many page requests serviced simultaneously [EP:SODA02, EP:SODA03]. TCP: Additive Increase & Multiplicative Decrease behaves like EQUI [EDD:PAA03, E:Latin04]. Speed Scaling: each algorithm can dynamically choose its speed $s$, but it must pay for it with energy $P(s) = s^\alpha$ [CELLSP:STACS09, CEP:STACS09].

Thank you