Published by Paula Daniels. Modified over 9 years ago.
1
ROBUST RESOURCE ALLOCATION OF DAGS IN A HETEROGENEOUS MULTI-CORE SYSTEM
Luis Diego Briceño, Jay Smith, H. J. Siegel, Anthony A. Maciejewski, Paul Maxwell, Russ Wakefield, Abdulla Al-Qawasmeh, Ron C. Chiang, and Jiayin Li
outline
● motivation and introduction
● system model
● robustness
● example of heuristic
● results and conclusions
Supported by the NSF under grants CNS-0615170 and CNS-0905399
2
Motivation
● need to execute applications on satellite data
● satellite data is processed in a heterogeneous computing system
● results are needed before a deadline
[Figure labels: satellite data; applications (app 1, app 2, ...); multi-core heterogeneous data processing system; result; deadline]
3
Problem Statement
● multiple applications (for this presentation consider one)
● each application is a DAG of tasks
● a set of applications must complete before a deadline Δ
● completion time of an application must be robust against uncertainties in the estimated execution time of its tasks
  ● actual time is data dependent
● goal: robust resource allocation of data and tasks to a heterogeneous multi-core system to meet deadline Δ for applications
[Figure: DAG of application α with tasks t 1,α through t 7,α, drawn against a time axis marked with Δ]
4
Environment
● consider a heterogeneous environment used to analyze satellite imaging
● based on commodity hardware
● these environments require analysis of large data sets
● environment similar to systems in use at
  ● National Center for Atmospheric Research (NCAR)
  ● DigitalGlobe
● static resource allocation
● estimated time to compute a task is known in advance
5
Contributions
● model and simulation of a complex multi-core-based data processing environment that executes data-intensive applications
  ● multi-core machines
  ● RAM management
  ● hard drive management
  ● parallel tasks
  ● satellite data placement
● a robustness metric for this environment
● resource allocation heuristics to maximize robustness using this metric
6
System Model: Satellite Data Placement
● satellite data is split into smaller subsets and distributed among the hard drives of the compute nodes
● processing element (PE) is a core
● PE j,k: PE k on compute node j (1 to 8 per node)
● PEs within a compute node are homogeneous
● no multi-tasking within a PE
[Figure labels: satellite data; multi-core heterogeneous data processing system; compute node j with PE j,1 … PE j,8, RAM j, and HD j]
7
System Model: Processing
● tasks execute on processing elements (PEs)
● required input data must be present in RAM to execute task
● ex.: satellite data at compute node j, task 1 (t 1)
  ● [if data on HD j] input data sets are staged to RAM
  ● task 1 (t 1) can start execution
  ● result is stored in RAM j
● RAM space is limited
[Figure labels: compute node j with PE j,1 … PE j,8, RAM j, and HD j; input data sets; task 1 (t 1); results]
8
System Model: RAM Management
● RAM has a fixed capacity
  ● 160 Gbytes (based on DigitalGlobe computer center)
  ● assume 152 Gbytes available for data
  ● typical data set was from 1 Gbyte to 32 Gbytes
● data sets can be swapped in and out of RAM if needed later
● all input data sets must be in RAM before task execution
● data sets must remain in RAM until execution is finished
● must reserve space in RAM for result
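The RAM constraints on this slide can be read as a simple admission test. The following is a minimal sketch under that reading, not the simulator used by the authors; the function name `can_start` and its parameters are hypothetical, and only the 152 Gbyte figure comes from the slide.

```python
# Space assumed available for data on one compute node (figure from the slide).
RAM_AVAILABLE_GB = 152


def can_start(resident_gb, input_sizes_gb, result_gb, available_gb=RAM_AVAILABLE_GB):
    """Return True if a task could start now on this node.

    resident_gb    -- data already held in RAM that must stay there (assumption)
    input_sizes_gb -- sizes of the task's input data sets not yet in RAM
    result_gb      -- space that must be reserved for the task's result
    """
    needed = sum(input_sizes_gb) + result_gb
    return resident_gb + needed <= available_gb


# Two inputs of 16 and 32 Gbytes plus a 4 Gbyte result fit next to 100 Gbytes
# of resident data; one more Gbyte of result space would not.
print(can_start(100, [16, 32], 4))
print(can_start(100, [16, 32], 5))
```

When the test fails, the slide's model allows data sets to be swapped out, or the task to be delayed until space is free.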
9
System Model: Storage
● satellite data sets allocated prior to task execution
● two scenarios for satellite data allocation
  ● determined by the heuristic
  ● randomly assigned (pre-determined)
● inter-task data is transmitted if destination is not equal to source
10
System Model: Applications
● each application app α must complete before Δ
● app α is divided into T α tasks (tasks form a DAG)
● each task requires satellite data sets or produced inter-task data sets
● t i,α is the i-th task in the application α
● each task produces other data items (e.g., data 7)
● last task produces a result
[Figure labels: app α; tasks t 1,α, t 2,α, t 3,α; sat. data 1, sat. data 4, sat. data 6; data 2, data 3, data 7; result]
11
System Model: Computation Parallelism
● 50% of tasks are parallelizable
  ● only parallelizable on PEs in the same compute node
● parallel time = sequential time / divider
● parallel execution time is used to model different speed ups
● two types of parallelizable tasks
  ● 25% good parallel tasks
  ● 25% average parallel tasks
● divider values chosen arbitrarily for the simulation study

PEs       1     2     3     4     5     6     7     8
divider   1     1.75  2.5   3.25  4     4.75  5.5   6.25

PEs       1     2     3     4     5     6     7     8
divider   1     1.5   2     2.5   3     3.5   4     4.5
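As a worked illustration of "parallel time = sequential time / divider", here is a small sketch. It assumes the first divider table (larger values) belongs to the good parallel tasks and the second to the average ones; the slide does not label the tables, and the names `DIVIDERS` and `parallel_time` are hypothetical.

```python
# Divider values from the slide, indexed by number of PEs (1..8).
# Assumption: the table with the larger speed-ups is the "good" task type.
DIVIDERS = {
    "good": [1, 1.75, 2.5, 3.25, 4, 4.75, 5.5, 6.25],
    "average": [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5],
}


def parallel_time(sequential_time, num_pes, kind):
    """Execution time of a parallelizable task on num_pes PEs of one node."""
    if not 1 <= num_pes <= 8:
        raise ValueError("a compute node has between 1 and 8 PEs")
    return sequential_time / DIVIDERS[kind][num_pes - 1]


# A 90-unit task of the "average" type on all 8 PEs: 90 / 4.5
print(parallel_time(90, 8, "average"))
```

Note that neither table reaches a divider of 8 on 8 PEs, so both task types model sub-linear speed-up.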
12
Robustness: Three Questions
● What behavior of the system makes it robust?
  ● all applications finish before Δ
● What uncertainties is the system robust against?
  ● differences between actual and estimated times
  ● assume communication times are fixed
● Quantitatively, exactly how robust is the system?
  ● smallest common percentage increase (ρ) for all task execution times that causes the makespan > Δ
● note: in a real system, the execution times of all tasks will not be increased by the same common percentage
  ● ρ is just a mathematical value used as a robustness measure
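The metric can be approximated numerically once a makespan model is available. The sketch below uses bisection, which is only one possible search and is not confirmed to be the authors' procedure. It treats ρ as a fractional increase (task times become (1 + ρ) times the estimate) and assumes the makespan does not decrease as ρ grows; `robustness` and `makespan_at` are hypothetical names.

```python
def robustness(makespan_at, deadline, hi=1.0, tol=1e-6):
    """Approximate the boundary value of rho at which the makespan exceeds the deadline.

    makespan_at(rho) -- makespan when every task time is scaled by (1 + rho);
                        assumed to be non-decreasing in rho
    Returns None if the deadline is already missed at rho = 0.
    """
    if makespan_at(0.0) > deadline:
        return None
    # Grow the upper bound until the deadline is violated.
    while makespan_at(hi) <= deadline:
        hi *= 2
        if hi > 1e6:
            return float("inf")  # makespan does not depend on task times
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if makespan_at(mid) <= deadline:
            lo = mid
        else:
            hi = mid
    return lo


# Toy model: 10 units of fixed communication plus 40 units of computation.
toy = lambda rho: 10 + 40 * (1 + rho)
print(robustness(toy, deadline=70))
```

In the toy model only the computation part scales, which mirrors the slide's assumption that communication times are fixed.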
13
Robustness: Example
● assume 3 applications: blue (b, d, g, and h), green (a, e, and i), and pink (c and f)
[Figure: two Gantt charts over PE 1,1, PE 2,1, and PE 3,1, each with a completion time axis marked with Δ. First chart: tasks a through i, makespan based on estimated task time. Second chart: tasks a′ through i′, makespan when task times = ρ ∙ estimated task time.]
14
Related Work
● significant amount of research
  ● assign a DAG to a heterogeneous computing system
  ● several critical path heuristics
  ● robustness in resource allocation
● our research considers the robustness of the allocation in DAGs
● two heuristics for minimization of makespan from literature were adapted to this paper
  ● heuristics originally meant to minimize makespan
  ● adapted heuristics can handle memory, satellite data placement, and robustness
● Dynamic Available Tasks Critical Path (DATCP) heuristic will be explained today
15
Dynamic Available Tasks Critical Path (DATCP) outline
1. calculate the critical path for each application
  ● for each task, from t exit to t entry
  ● edge labels are average transfer time/byte between any two nodes ∙ data size
  ● determine the maximum time from any successor (child) node to the t exit (max time)
  ● critical path value is the sum of task data and satellite data transfer times, max time, and average execution time of t i
[Figure: example DAG with each task annotated with its critical path value and average exec. time]
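Step 1 can be sketched as a bottom-up pass over the DAG. This is one reading of the slide, not a confirmed reproduction of the authors' calculation: each task's value is its average execution time, plus an optional satellite data transfer term, plus the largest (edge transfer time + child value) over its children. The names `critical_path_values`, `avg_exec`, `edges`, and `sat_transfer` are hypothetical.

```python
from functools import lru_cache


def critical_path_values(avg_exec, edges, sat_transfer=None):
    """Compute a critical path value for every task.

    avg_exec     -- {task: average execution time}
    edges        -- {(parent, child): average transfer time for that edge}
    sat_transfer -- optional {task: satellite data transfer time} (assumption)
    """
    sat_transfer = sat_transfer or {}
    children = {t: [] for t in avg_exec}
    for (parent, child), weight in edges.items():
        children[parent].append((child, weight))

    @lru_cache(maxsize=None)
    def cp(task):
        # Longest remaining path through any child; 0 for the exit task.
        tail = max((w + cp(c) for c, w in children[task]), default=0)
        return avg_exec[task] + sat_transfer.get(task, 0) + tail

    return {t: cp(t) for t in avg_exec}


# Diamond-shaped DAG: t1 -> {t2, t3} -> t4 (values are made up).
values = critical_path_values(
    {"t1": 3, "t2": 4, "t3": 5, "t4": 6},
    {("t1", "t2"): 2, ("t1", "t3"): 1, ("t2", "t4"): 3, ("t3", "t4"): 2},
)
print(values)  # {'t1': 18, 't2': 13, 't3': 13, 't4': 6}
```

The recursion starts at the exit task in effect, because every value depends only on the values of its children.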
16
Dynamic Available Tasks Critical Path (DATCP) outline
1. calculate the critical path for each application
2. dynamically create a list of all tasks available for mapping
3. determine the task with the longest critical path from the list of available tasks
4. task t i determined in (3) is assigned to the PE that gives the maximum system robustness based on partial mapping
5. repeat steps (2)–(4) until all tasks are mapped
[Figure: example DAG with each task annotated with its critical path value and average exec. time]
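Steps 2 to 5 of the outline can be sketched as a list-scheduling loop. This is a heavily simplified illustration, not the authors' implementation: it ignores communication, RAM, satellite data placement, and parallel tasks, and it scores a partial mapping with ρ = Δ / makespan − 1, which is the metric's value only when every part of the makespan scales with task time. All names (`datcp_sketch`, `etc`, `preds`, `cp_value`) are hypothetical, and the critical path values of step 1 are passed in precomputed.

```python
def datcp_sketch(etc, preds, cp_value, deadline):
    """Map tasks to PEs following steps 2-5 of the DATCP outline.

    etc      -- {task: {pe: estimated execution time}}
    preds    -- {task: iterable of parent tasks}
    cp_value -- {task: critical path value} from step 1
    Returns (mapping, rho of the final mapping).
    """
    pes = sorted(next(iter(etc.values())))  # assumes every task lists the same PEs
    pe_free = {pe: 0.0 for pe in pes}
    finish, mapping = {}, {}
    while len(mapping) < len(etc):
        # Step 2: tasks whose parents have all been mapped.
        available = [t for t in etc if t not in mapping
                     and all(p in mapping for p in preds.get(t, ()))]
        # Step 3: the available task with the longest critical path.
        task = max(available, key=lambda t: cp_value[t])
        ready = max((finish[p] for p in preds.get(task, ())), default=0.0)
        # Step 4: the PE that maximizes robustness of the partial mapping.
        best = None
        for pe in pes:
            end = max(ready, pe_free[pe]) + etc[task][pe]
            makespan = max([end] + list(finish.values()))
            key = (deadline / makespan - 1.0, -end)  # ties: earlier finish
            if best is None or key > best[0]:
                best = (key, pe, end)
        _, pe, end = best
        mapping[task], finish[task], pe_free[pe] = pe, end, end
    return mapping, deadline / max(finish.values()) - 1.0


etc = {"t1": {"A": 2, "B": 4}, "t2": {"A": 3, "B": 3},
       "t3": {"A": 4, "B": 2}, "t4": {"A": 2, "B": 2}}
preds = {"t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
cp = {"t1": 8, "t2": 5, "t3": 6, "t4": 2}
print(datcp_sketch(etc, preds, cp, deadline=12))
```

In this toy run t3 is mapped before t2 because its critical path value is larger, even though both become available at the same time.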
20
DATCP: Memory Management
● determine available space in RAM
● decide if the required task and the input data can be stored in RAM immediately
● if there is not enough space
  ● heuristic checks when the task's input data sets can be moved into memory
  ● heuristic schedules task to start execution at that time
● if incoming data is from another compute node
  ● send it to destination compute node's RAM
  ● if there is no space in RAM then send to the HD
21
DATCP: Parallelizable Tasks
● two approaches are studied
  ● no parallelization
  ● "max" approach
● "max" approach: heuristic always parallelizes across multiple PEs within a compute node
  ● determine system robustness for each possible assignment
  ● determine the node with the most PEs that have same maximum robustness
  ● map the task to all PEs that have the same robustness value within this compute node
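The selection rule of the "max" approach can be sketched as follows. This is one interpretation of the three sub-steps, with the per-PE robustness values assumed to be already computed; `max_parallel_choice` and `robustness_by_pe` are hypothetical names.

```python
import math
from collections import defaultdict


def max_parallel_choice(robustness_by_pe):
    """Pick the PEs for a parallelizable task under the "max" approach.

    robustness_by_pe -- {(node, pe_index): system robustness if the task
                         were assigned to that PE}
    Returns (node, sorted list of PE indices on that node).
    """
    best = max(robustness_by_pe.values())
    per_node = defaultdict(list)
    for (node, pe), rho in robustness_by_pe.items():
        # Compare with a tolerance, since robustness values are floats.
        if math.isclose(rho, best):
            per_node[node].append(pe)
    # Node with the most PEs that share the maximum robustness.
    node = max(per_node, key=lambda n: len(per_node[n]))
    return node, sorted(per_node[node])


choice = max_parallel_choice({
    (1, 1): 0.5, (1, 2): 0.5, (1, 3): 0.2,
    (2, 1): 0.5, (2, 2): 0.3,
})
print(choice)  # (1, [1, 2])
```

Node 1 wins in the example because two of its PEs reach the maximum robustness, while node 2 has only one.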
22
DATCP: Satellite Data Placement
● two methods
  ● random placement
  ● first time a satellite data set is required, that data set and the task that requires it are mapped
    ● task is assigned to the PE that maximizes robustness
    ● storage location of satellite data set has not been previously determined
    ● satellite data set is stored in the HD of this PE's corresponding compute node
23
Results
● DATCP 1: max parallel with satellite mapping
● DATCP 2: max parallel with random satellite mapping
● DATCP 3: no parallelism with random satellite mapping
● HRD 1: satellite data (SD) placement based on first task placement with duplication
● HRD 2: SD placement based on first task placement with no duplication
● HRD 3: SD placement based on reference count with no duplication
● HRD 4: random SD placement with duplication
● HRD 5: random SD placement and no duplication
24
Plot of Makespan vs. Robustness
25
Conclusions
● derived a metric to measure the robustness
  ● interdependency of tasks within applications complicates the derivation of a robustness metric
● DATCP has highest average robustness values
  ● initial ordering created by DATCP is much better than the order created by HRD
  ● if DATCP order is used in HRD then the results of HRD are significantly improved
● satellite data placement did not have any apparent effect on robustness
26
QUESTIONS?
27
Future Work
● include other heuristics
● realistic data instead of using the coefficient of variation based method to generate the ETC values for the simulations
● robustness metric could be improved so that it may be calculated directly instead of using the search procedure