Green HPC: Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-Enabled Data Centers. Kyong Hoon Kim 1, Rajkumar Buyya 1, and Jong Kim 2. 1 Grid Computing and Distributed Systems (GRIDS) Laboratory, Dept. of Computer Science and Software Engineering, The University of Melbourne, Australia. 2 POSTECH, Korea.
2 Outline Introduction Related Work System Model Job Admission Control DVS-based Cluster Scheduling EDF-based scheduling Proportional share-based scheduling Simulation Results Summary
3 Background Traditionally, the high-performance computing (HPC) community has focused on performance (speed). At the same time, microprocessor vendors have not only doubled the number of transistors (and speed) every months, but they have also doubled the power densities. [Figure: Moore’s Law for Power Consumption]
4 Research Motivations of Power Aware/Energy Efficient High Performance Computing (HPC) Rapid uptake of HPC-architecture-based data centers for hosting industrial applications. Reducing the operational costs of powering and cooling HPC systems: the tremendous increase in computer performance has come with an even greater increase in power usage. According to Eric Schmidt, CEO of Google, what matters most to Google is “not speed but power, because data centers can consume as much electricity as a city.” Improving reliability: as a rule of thumb, for every 10°C increase in temperature, the failure rate of a system doubles. The computing environment can also affect the correctness of results: an 18-node Linux cluster produced an answer outside the residual (i.e., a silent error) when running in a dusty 85°F warehouse but produced the correct answer when running in a 65°F machine-cooled room.
5 Reliability/Implications [Table: Reliability of leading-edge supercomputers (D. Reed, 2004)] [Table: Estimated cost of an hour of system downtime (W. Feng, ACM Queue, 2003)]
6 Power Aware Computing Power Aware (PA) computing/communications: the objective of PA computing/communications is to improve power management and reduce consumption by using awareness of the power consumption of devices. Power consumption is one of the most important considerations in mobile devices because battery life is limited. System-level power management: recent devices (CPU, disk, communication links, etc.) support multiple power modes. The system scheduler can use these power modes to reduce power consumption.
8 Related Work (1/3) Research on power reduction for scientific applications. Hsu and Feng (SC 2005), Los Alamos National Lab, USA: β-adaptation algorithm; automatic adaptation of CPU frequencies; β: the intensity level of off-chip accesses. Ge, Feng, and Cameron (2005): three DVS scheduling strategies; software framework to implement scheduling techniques. Hotta et al. (2006): profile-based power-performance optimization; selection of an appropriate gear using DVS scheduling; development of a power-profiling system called PowerWatch. …
9 Related Work (2/3) Energy reduction for MPI programs. Kappiah et al. (2005), NC State and Georgia Uni, USA: inter-node bottleneck problem in MPI programs; selection of an appropriate gear based on slack time. Lim et al. (2006), NC State and Georgia Uni, USA: adaptive DVS of communication phases in MPI programs. Son et al. (2006) … Two approaches for building power-aware cluster platforms: (1) design and develop systems with consideration of energy consumption (BlueGene/L, Green Destiny, …); (2) use DVS-enabled commodity systems (clusters with AMD Athlon64s, Pentium Ms, AMD Opterons, …).
10 Related Work (3/3)
  DVS (Dynamic Voltage Scaling) technique
    Reduces dynamic energy consumption by lowering the supply voltage, at the cost of performance degradation.
    Recent processors can adjust the supply voltage dynamically.
    Dynamic energy consumption: $E = \alpha \cdot V_{dd}^2 \cdot N_{cycle}$
      $\alpha$ : a proportional constant
      $V_{dd}$ : the supply voltage
      $N_{cycle}$ : the number of clock cycles
  An example [Figure: power vs. time for a task with a 25 msec deadline; time axis marked at 10 msec and 25 msec]
    (a) Supply voltage = 5.0 V
    (b) Supply voltage = 2.0 V
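As a rough worked example of the quadratic dependence on supply voltage: for a fixed number of clock cycles, the ratio of dynamic energy at two voltages depends only on the voltages. The sketch below assumes the leading factor of the energy formula is a constant that is the same at both operating points (called `alpha` here); the function name and the cycle count are hypothetical, and the 5.0 V and 2.0 V values come from the example on this slide.

```python
def dynamic_energy(vdd: float, n_cycles: float, alpha: float = 1.0) -> float:
    """Dynamic energy under the model E = alpha * Vdd^2 * Ncycle."""
    return alpha * vdd ** 2 * n_cycles


# Hypothetical workload: the same number of cycles at both voltages.
n_cycles = 1_000_000

e_high = dynamic_energy(5.0, n_cycles)  # case (a): 5.0 V
e_low = dynamic_energy(2.0, n_cycles)   # case (b): 2.0 V

# The cycle count and alpha cancel out, leaving (2.0 / 5.0) ** 2.
ratio = e_low / e_high
print(f"energy at 2.0 V relative to 5.0 V: {ratio:.2f}")
```

Under these assumptions the lower voltage needs about 16% of the energy, provided the slower execution still finishes before the deadline.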
11 DVS-based Power Aware Cluster Scheduling Research motivation: previous work has focused on the development of DVS-enabled cluster systems. Few works have considered the scheduling problem in power-aware clusters. Problem to solve: provide scheduling algorithms for DVS-enabled cluster systems that minimize energy consumption while meeting job deadlines. Exploit the industry's move towards the utility model / SLA-based resource allocation…
13 System Model (1/2)
  Cluster model
    A cluster system is defined as (N, Q).
    N : the number of processors (PEs)
    Q : the processing performance of each PE in MIPS
  Job model
    A job is considered to be a bag-of-tasks application.
    The deadline is used as the QoS parameter of a job.
    A job $J = (p, \{l_1, l_2, \ldots, l_p\}, d)$
    $p$ : the number of sub-tasks
    $l_i$ : the length in MI of the i-th task
    $d$ : the job deadline
14 System Model (2/2)
  Energy model
    Energy consumption of a task execution: $E = \alpha V^2 L$
    $L$ : the task length
    $V$ : the supply voltage
    $\alpha$ : a proportional constant
  Dynamic Voltage Scaling
    $\{V_1, \ldots, V_m\}$ : m different voltage levels
    $Q_i$ : the processor speed (MIPS) under the associated voltage level $V_i$
    $S_i$ : the normalized speed of each voltage level $V_i$ ($S_i = Q_i/Q_m$)
  An example
    Voltage (V_i)   MIPS (Q_i)   Relative speed (S_i)
    0.9 V           4,000        0.4
    1.1 V           6,000        0.6
    1.3 V           8,000        0.8
    1.5 V           10,000       1.0
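The energy model and the relative-speed definition can be tried out with a few lines of Python. This is only a sketch: the four operating points are assumed to be 0.9 V, 1.1 V, 1.3 V and 1.5 V at 4,000 to 10,000 MIPS, the proportional constant is set to 1.0 as a placeholder, and the task length and function names are hypothetical.

```python
# Operating points as (V_i in volts, Q_i in MIPS), lowest to highest.
# Assumption: values as read from the example table on this slide.
LEVELS = [(0.9, 4_000), (1.1, 6_000), (1.3, 8_000), (1.5, 10_000)]
Q_MAX = LEVELS[-1][1]  # Q_m, the speed at the highest voltage level
ALPHA = 1.0            # proportional constant; 1.0 is only a placeholder


def relative_speed(q_mips: int) -> float:
    """S_i = Q_i / Q_m."""
    return q_mips / Q_MAX


def task_energy(voltage: float, length_mi: float) -> float:
    """E = alpha * V^2 * L for a task of L million instructions."""
    return ALPHA * voltage ** 2 * length_mi


def run_time_seconds(length_mi: float, q_mips: int) -> float:
    """Execution time of a task of length_mi MI on a PE running at q_mips."""
    return length_mi / q_mips


LENGTH_MI = 600_000  # hypothetical task length
for voltage, q in LEVELS:
    print(f"{voltage} V  S={relative_speed(q):.1f}  "
          f"time={run_time_seconds(LENGTH_MI, q):.0f} s  "
          f"energy={task_energy(voltage, LENGTH_MI):.0f}")
```

The loop makes the trade-off visible: the lowest level takes 2.5 times as long as the highest one but, under this model, uses 36% of its energy.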
16 Proposed Cluster RMS System Architecture with Energy-Efficient Resource Allocation
  (1) Job submission
  (2) Schedulability test & energy estimation
  (3) Acknowledgement of schedulability and energy amount
  (4) Selection of PEs
  [Figure: a User interacts with the Resource Controller through steps (1) to (4); the Cluster consists of PE 1, PE 2, …, PE N, each with a Processor, an Energy Estimator, and a Job-Queue.]
17 Application Admission and Resource Allocation Algorithm
  Questions for each PE: Can the task be completed on this PE? How should its energy consumption be estimated?
  We propose two DVS scheduling algorithms, based on:
    Space-shared policy (EDF)
    Time-shared policy (proportional share)

  Algorithm Admission_Resource_Allocation (J = (p, {l_1, …, l_p}, d))
    for i from 1 to p do
      PE_alloc ← null;
      energy_min ← MAX_VALUE;
      for k from 1 to N do
        if schedulable(PE_k, l_i, d) == true then
          energy_k ← energy_estimate(PE_k, l_i, d);
          if energy_k < energy_min then
            energy_min ← energy_k;
            PE_alloc ← PE_k;
          endif
        endif
      endfor
      if PE_alloc != null then
        Allocate the i-th task of J to PE_alloc;
      else
        Cancel all tasks of J;
        return reject;
      endif
    endfor
    return accept;
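The admission loop can be sketched in Python as follows. The control flow mirrors the pseudocode (pick the schedulable PE with the lowest estimated energy per task, roll everything back on the first failure), but `ToyPE`, its capacity check and its flat energy-per-MI estimate are hypothetical stand-ins, not the EDF or proportional-share tests from the slides.

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Optional, Tuple


@dataclass
class ToyPE:
    """Stand-in PE with deliberately simple tests (assumptions for illustration)."""
    mips: float
    energy_per_mi: float
    queue: List[Tuple[float, float]] = field(default_factory=list)  # (length, deadline)

    def schedulable(self, length: float, deadline: float) -> bool:
        # Toy rule: all queued work plus the new task must fit before the deadline.
        busy = sum(l for l, _ in self.queue) / self.mips
        return busy + length / self.mips <= deadline

    def energy_estimate(self, length: float, deadline: float) -> float:
        # Toy rule: energy grows linearly with task length.
        return self.energy_per_mi * length


def admit(lengths: List[float], deadline: float, pes: List[ToyPE]) -> Optional[List[int]]:
    """Return the chosen PE index per task, or None if the job is rejected."""
    placed: List[Tuple[ToyPE, Tuple[float, float]]] = []
    allocation: List[int] = []
    for length in lengths:
        best, best_energy = None, inf
        for idx, pe in enumerate(pes):
            if pe.schedulable(length, deadline):
                energy = pe.energy_estimate(length, deadline)
                if energy < best_energy:
                    best, best_energy = idx, energy
        if best is None:
            # Cancel all tasks of this job that were already allocated.
            for pe, task in placed:
                pe.queue.remove(task)
            return None
        task = (length, deadline)
        pes[best].queue.append(task)
        placed.append((pes[best], task))
        allocation.append(best)
    return allocation


pes = [ToyPE(mips=10, energy_per_mi=1.0), ToyPE(mips=10, energy_per_mi=2.0)]
accepted = admit([50, 50], deadline=6, pes=pes)

fresh = [ToyPE(mips=10, energy_per_mi=1.0), ToyPE(mips=10, energy_per_mi=2.0)]
rejected = admit([50, 50, 50], deadline=6, pes=fresh)
print(accepted, rejected)
```

Because each allocation is appended to the PE's queue before the next task is tested, later tasks of the same job see the load created by earlier ones, which is what the pseudocode implies by allocating inside the loop.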
19 EDF-based DVS scheduling (1/4)
  Basics
    $T_k = \{\tau_{k,i}(e_{k,i}, d_{k,i}) \mid i = 1, \ldots, n_k\}$ : the current available task set in the k-th PE
    $n_k$ : the current number of tasks
    $\tau_{k,i}(e_{k,i}, d_{k,i})$ : the i-th task in $T_k$
    $e_{k,i}$ : the remaining execution time
    $d_{k,i}$ : the remaining deadline
  EDF (Earliest Deadline First) policy
    $T_k$ is sorted by deadline so that $d_{k,i} \le d_{k,i+1}$.
    The scheduler always executes the earliest-deadline task in the queue.
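20 EDF-based DVS scheduling (2/4)
  The temporary utilization, $u_{k,i}$ : the required processor utilization for task $\tau_{k,i}$ by EDF
    $u_{k,i} = \frac{\sum_{j=1}^{i} e_{k,j}}{d_{k,i}}$
  The continuous speed level of the highest-priority task, $\tilde{s}_k$
    $\tilde{s}_k = \max\{u_{k,i} \mid i = 1, \ldots, n_k\}$
  The supply voltage level of the highest-priority task, $v_k$
    $v_k = \min\{V_j \mid S_j \ge \tilde{s}_k\}$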
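21 EDF-based DVS scheduling (3/4)
  An example
    $T_k = \{\tau_{k,1}(1, 4), \tau_{k,2}(2, 6), \tau_{k,3}(2, 10)\}$
  Temporary utilizations at time 0
    $u_{k,1} = 1/4$
    $u_{k,2} = (1 + 2)/6 = 1/2$
    $u_{k,3} = (1 + 2 + 2)/10 = 1/2$
  Scaling factors at time 0
    $\tilde{s}_k = \max\{u_{k,1}, u_{k,2}, u_{k,3}\} = 1/2$
    $v_k = 1.1$ V (the lowest level with $S_j \ge 1/2$)
  Energy model to use
    Voltage (V_i)   MIPS (Q_i)   Relative speed (S_i)
    0.9 V           4,000        0.4
    1.1 V           6,000        0.6
    1.3 V           8,000        0.8
    1.5 V           10,000       1.0
  [Figure: speed level vs. time for $\tau_{k,1}$, $\tau_{k,2}$, $\tau_{k,3}$, time axis from 0 to 10 with a mark at 5; $\tau_{k,1}$ runs at $v_k = 1.1$ V and completes at time 10/6 (1 time unit of work at relative speed 0.6); the 0.9 V level appears later in the schedule.]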
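The numbers in this example can be checked with a short script. It is a sketch under two assumptions: the temporary utilization of a task is the cumulative execution time up to that task divided by its deadline (as the worked values suggest), and the available levels are the four voltage/speed pairs of the example table. Function names are hypothetical; `Fraction` is used so that comparisons against the speed levels are exact.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Tuple

# (relative speed S_j, voltage V_j), lowest to highest; assumed example levels.
LEVELS = [(Fraction(2, 5), 0.9), (Fraction(3, 5), 1.1),
          (Fraction(4, 5), 1.3), (Fraction(1), 1.5)]


def temporary_utilizations(tasks: List[Tuple[int, int]]) -> List[Fraction]:
    """tasks: (remaining execution time, remaining deadline), sorted by deadline."""
    cumulative = accumulate(e for e, _ in tasks)
    return [Fraction(total, d) for total, (_, d) in zip(cumulative, tasks)]


def select_level(required: Fraction) -> Tuple[Fraction, float]:
    """Lowest discrete level whose relative speed covers the required speed."""
    for speed, voltage in LEVELS:
        if speed >= required:
            return speed, voltage
    raise ValueError("required speed exceeds the maximum level")


tasks = [(1, 4), (2, 6), (2, 10)]
utils = temporary_utilizations(tasks)
speed, voltage = select_level(max(utils))
first_finish = Fraction(tasks[0][0]) / speed  # time at which the first task completes

print(utils, speed, voltage, first_finish)
```

The required continuous speed of 1/2 falls between the 0.4 and 0.6 levels, so the next higher level (0.6 at 1.1 V) is chosen, and the first task then finishes at 1/0.6 = 10/6 time units.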
22 EDF-based DVS scheduling (4/4)
  Schedulability test of EDF

  Algorithm schedulable_EDF (PE_k, l, d)
    T_k' ← T_k ∪ {(l/Q_m, d)};
    Sort T_k' in the order of deadline.
    for i from 1 to n_k + 1 do
      u_k',i ← (Σ_{j=1..i} e_k',j) / d_k',i;
      if u_k',i > 1 then return false;
    endfor
    return true;

  Energy estimation of EDF

  Algorithm energy_estimate_EDF (PE_k, l, d)
    E_current ← energy_consumption(T_k, n_k);
    T_k' ← T_k ∪ {(l/Q_m, d)};
    E_new ← energy_consumption(T_k', n_k + 1);
    return (E_new - E_current);

  function energy_consumption (T, n)
    Energy ← 0;
    time ← the current time;
    for i from 1 to n do
      for j from i to n do
        u_j ← (Σ_{k=i..j} e_k) / d_j;
      endfor
      s' ← max{u_j | j = i, …, n};
      v ← min{V_j | S_j ≥ s'};
      s ← min{S_j | S_j ≥ s'};
      Energy ← Energy + α·v²·e_i·Q_m;
      time ← time + e_i/s;
      for j from i to n do
        d_j ← d_j - e_i/s;
      endfor
    endfor
    return Energy;
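A compact Python rendering of the three routines is given below as a sketch, not as the authors' implementation. It assumes that utilizations are cumulative sums over the deadline-sorted queue, that the energy term carries a proportional constant (`ALPHA`, set to 1.0 as a placeholder), that the task set passed to `energy_consumption` is schedulable, and that the levels are the four example voltage/speed pairs with Q_m = 10,000 MIPS. `Fraction` keeps the speed comparisons exact.

```python
from fractions import Fraction
from typing import List, Tuple

Task = Tuple[Fraction, Fraction]  # (remaining execution time at max speed, remaining deadline)

# (relative speed S_j, voltage V_j), lowest to highest; assumed example levels.
LEVELS = [(Fraction(2, 5), 0.9), (Fraction(3, 5), 1.1),
          (Fraction(4, 5), 1.3), (Fraction(1), 1.5)]
Q_MAX = 10_000  # Q_m in MIPS (assumed)
ALPHA = 1.0     # proportional constant; placeholder value


def schedulable_edf(tasks: List[Task], length_mi: int, deadline: int) -> bool:
    """EDF test at maximum speed: every cumulative utilization must be <= 1."""
    merged = sorted(tasks + [(Fraction(length_mi, Q_MAX), Fraction(deadline))],
                    key=lambda t: t[1])
    total = Fraction(0)
    for e, d in merged:
        total += e
        if total / d > 1:
            return False
    return True


def energy_consumption(tasks: List[Task]) -> float:
    """Walk the EDF queue, re-selecting the voltage level before each task."""
    queue = sorted(tasks, key=lambda t: t[1])
    energy = 0.0
    while queue:
        total, required = Fraction(0), Fraction(0)
        for e, d in queue:
            total += e
            required = max(required, total / d)
        # Lowest level covering the required speed (the set is assumed schedulable).
        speed, voltage = next((s, v) for s, v in LEVELS if s >= required)
        e_head, _ = queue.pop(0)
        elapsed = e_head / speed
        energy += ALPHA * voltage ** 2 * float(e_head) * Q_MAX
        queue = [(e, d - elapsed) for e, d in queue]  # deadlines move closer
    return energy


def energy_estimate_edf(tasks: List[Task], length_mi: int, deadline: int) -> float:
    """Extra energy caused by adding one task to the PE's queue."""
    new_task = (Fraction(length_mi, Q_MAX), Fraction(deadline))
    return energy_consumption(tasks + [new_task]) - energy_consumption(tasks)


example = [(Fraction(1), Fraction(4)), (Fraction(2), Fraction(6)), (Fraction(2), Fraction(10))]
print(energy_consumption(example))
print(schedulable_edf(example, 20_000, 5), schedulable_edf(example, 40_000, 4))
```

For the three-task example, the first two tasks run at 1.1 V and the last at 0.9 V, so the total is (1.21·1 + 1.21·2 + 0.81·2)·Q_m = 5.25·Q_m, compared with 11.25·Q_m if all three ran at 1.5 V.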
23 Proportional Share-based DVS scheduling (1/2)
  The proportional share scheme
    Multiple tasks share the processor performance in proportion to each task’s weight.
    To meet its deadline, each task must be given a share of at least $e_{k,i}/d_{k,i}$ of the maximum processor speed.
  The continuous processor speed level, $\tilde{s}_k$
    $\tilde{s}_k = \sum_{i=1}^{n_k} \frac{e_{k,i}}{d_{k,i}}$
  The supply voltage level, $v_k$
    $v_k = \min\{V_j \mid S_j \ge \tilde{s}_k\}$
  The proportional share of each task, $share_{k,i}$
    $share_{k,i} = \frac{e_{k,i}/d_{k,i}}{\tilde{s}_k}$
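24 Proportional Share-based DVS scheduling (2/2)
  An example
    $T_k = \{\tau_{k,1}(1, 4), \tau_{k,2}(2, 6), \tau_{k,3}(2, 10)\}$
  [Figure: speed level vs. time from time 0; recovered labels:]
    first interval: $v_k = 1.3$ V (speed level 0.8), $share_{k,1} = 0.32$
    after $\tau_{k,1}$ completes: 1.1 V, $share_{k,2} = 0.62$
    after $\tau_{k,2}$ completes: $share_{k,3} = 1.0$
  The schedulability test and energy estimation are similar to those of the EDF algorithm.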
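The first interval of this example can be reproduced with a few lines of Python. The sketch assumes that the required speed is the sum of the per-task demands e/d, that each task's share is its demand divided by that sum, and that the levels are the four example voltage/speed pairs; these assumptions were chosen because they reproduce the 1.3 V level and the 0.32 share shown for the first task. The function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

# (relative speed S_j, voltage V_j), lowest to highest; assumed example levels.
LEVELS = [(Fraction(2, 5), 0.9), (Fraction(3, 5), 1.1),
          (Fraction(4, 5), 1.3), (Fraction(1), 1.5)]


def proportional_shares(tasks: List[Tuple[int, int]]):
    """tasks: (remaining execution time, remaining deadline) pairs."""
    demands = [Fraction(e, d) for e, d in tasks]  # minimum rate each task needs
    required = sum(demands)                       # continuous speed level
    speed, voltage = next((s, v) for s, v in LEVELS if s >= required)
    shares = [u / required for u in demands]      # fractions of the chosen speed
    return required, speed, voltage, shares


required, speed, voltage, shares = proportional_shares([(1, 4), (2, 6), (2, 10)])
rounded = [round(float(s), 2) for s in shares]
print(required, speed, voltage, rounded)
```

The demands 1/4, 1/3 and 1/5 add up to 47/60 (about 0.78), so the 0.8 level at 1.3 V is the lowest one that suffices, and the first task receives 15/47 (about 0.32) of the processor.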
26 Simulation Environment
  Using the GridSim toolkit
  A cluster system with 32 DVS-enabled processors
  Operating points of simulated processors, based on Athlon
    Frequency   Voltage   MIPS     Relative speed
    0.8 GHz     0.9 V     4,000    0.4
    … GHz       1.0 V     5,000    0.5
    … GHz       1.1 V     6,000    0.6
    … GHz       1.2 V     7,000    0.7
    … GHz       1.3 V     8,000    0.8
    … GHz       1.4 V     9,000    0.9
    … GHz       1.5 V     10,000   1.0
  Bag-of-tasks applications
  Task characteristics
    Task length: 600,000 MI to 7,200,000 MI
    Number of tasks: 2 to 32
    Deadline: 20% to 100% more than the average execution time
27 Simulated algorithms DVS-based scheduling: EDF-DVS, PShare-DVS. Scheduling at maximum processor speed: EDF-1.5V, PShare-1.5V. Scheduling at minimum processor speed: EDF-0.9V, PShare-0.9V.
28 Job Acceptance Rate [Figure; y-axis: acceptance rate (%)]
29 Energy consumption, normalized to EDF-1.5V at an inter-arrival time of 2 min [Figure; x-axis: inter-arrival time (min); y-axis: normalized value]
30 Normalized performance of DVS [Figure, two panels: EDF-DVS vs EDF-1.5V and PShare-DVS vs PShare-1.5V; x-axis: inter-arrival time (min); series: energy reduction (%) and acceptance degradation (%)]
31 Impact of granularity/number of controllable voltage levels [Figure, two panels: normalized performance of EDF and normalized performance of PShare; y-axis: normalized value]
33 Summary Two primary drivers for power-aware HPC: operational cost and reliability. Power-aware scheduling with deadline constraints: reducing energy consumption while meeting jobs’ deadlines. The proposed scheduling algorithms: DVS-based scheduling built on a space-shared policy (EDF) and on a time-shared policy (proportional resource sharing); minimizing cost under the constraint of the job deadline. Future work: budget-constrained power-aware scheduling; power-aware workflow scheduling.
34 Thanks for your attention! We Welcome Cooperation in Research and Development! eScience2007.org