Heuristics for Work Distribution of a Homogeneous Parallel Dynamic Programming Scheme on Heterogeneous Systems Javier Cuenca Departamento de Ingeniería.


1 Heuristics for Work Distribution of a Homogeneous Parallel Dynamic Programming Scheme on Heterogeneous Systems
Javier Cuenca, Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Spain
Domingo Giménez, Departamento de Informática y Sistemas, Universidad de Murcia, Spain
Juan-Pedro Martínez, Departamento de Estadística y Matemática Aplicada, Universidad Miguel Hernández de Elche, Spain
11 November 2018 HeteroPar2004

2 Our Goal
General goal: to obtain parallel routines with autotuning capacity.
Previous work: linear algebra routines on homogeneous systems.
This communication: parallel dynamic programming schemes on heterogeneous systems.
In the future: apply the techniques to other algorithmic schemes.

3 Outline
Parallel Dynamic Programming Schemes
Autotuning in Parallel Dynamic Programming Schemes
Work Distribution
Experimental Results

4 Parallel Dynamic Programming Schemes
There are several parallel dynamic programming schemes. The simple scheme of the "coins problem" is used: given a quantity C and n types of coins with values v=(v1,v2,…,vn) and a quantity q=(q1,q2,…,qn) of coins of each type, minimize the number of coins used to give C. The granularity of the computation has been varied in order to study the scheme rather than the problem.

5 Parallel Dynamic Programming Schemes
Sequential scheme:
    for i=1 to number_of_decisions
        for j=1 to problem_size
            obtain the optimum solution with i decisions and problem size j
        endfor
    endfor
Complete the table with the formula: [formula missing in the transcript]
[Figure: table with rows i = 1, 2, ..., n (decisions) and columns j = 1, 2, ..., N (problem sizes)]
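The sequential scheme can be illustrated on the coins problem itself. The following is a minimal Python sketch, not the authors' code. The function name `min_coins` and the recurrence (try k = 0 .. min(q_i, j // v_i) coins of type i on top of the previous row) are assumptions, because the formula on the slide did not survive in the transcript.

```python
import math

def min_coins(C, v, q):
    """Minimum number of coins needed to give C with at most q[i] coins of value v[i].

    Returns math.inf when C cannot be given exactly.
    """
    # prev[j]: optimum for amount j using only the coin types processed so far
    prev = [0] + [math.inf] * C
    for vi, qi in zip(v, q):                # one table row per decision (coin type)
        cur = [math.inf] * (C + 1)
        for j in range(C + 1):              # one table column per problem size
            # assumed recurrence: use k coins of this type, k = 0 .. min(qi, j // vi)
            for k in range(min(qi, j // vi) + 1):
                cand = prev[j - k * vi] + k
                if cand < cur[j]:
                    cur[j] = cand
        prev = cur
    return prev[C]

print(min_coins(11, [1, 5, 6], [10, 2, 1]))
```

Only the previous row is kept in memory, which is enough when just the optimum value (and not the list of coins) is required.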

6 Parallel Dynamic Programming Schemes
Parallel scheme:
    for i=1 to number_of_decisions
        In Parallel:
            for j=1 to problem_size
                obtain the optimum solution with i decisions and problem size j
            endfor
        endInParallel
    endfor
[Figure: table with rows 1, 2, ..., i, ..., n; the columns j of row i are split among processors P0 ... PK]
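Because row i depends only on row i-1, the columns of one row can be computed independently. The sketch below illustrates this under assumptions: the helper names (`split_blocks`, `compute_block`, `parallel_row`, `min_coins_parallel`) are hypothetical, the recurrence is the assumed coins recurrence (k coins of type i on top of the previous row), and Python threads stand in for the processors only to show the structure, not to gain speed.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def split_blocks(size, p):
    """Split range(size) into p contiguous blocks whose lengths differ by at most one."""
    base, extra = divmod(size, p)
    blocks, start = [], 0
    for r in range(p):
        stop = start + base + (1 if r < extra else 0)
        blocks.append(range(start, stop))
        start = stop
    return blocks

def compute_block(prev, vi, qi, block):
    """Entries of row i for the columns in `block`, computed from row i-1 (`prev`)."""
    return [min(prev[j - k * vi] + k for k in range(min(qi, j // vi) + 1))
            for j in block]

def parallel_row(prev, vi, qi, p):
    """One 'In Parallel' step: each worker fills its own block of columns."""
    blocks = split_blocks(len(prev), p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(lambda b: compute_block(prev, vi, qi, b), blocks))
    return [x for part in parts for x in part]

def min_coins_parallel(C, v, q, p):
    row = [0] + [math.inf] * C
    for vi, qi in zip(v, q):        # the loop over decisions stays sequential
        row = parallel_row(row, vi, qi, p)
    return row[C]

print(min_coins_parallel(11, [1, 5, 6], [10, 2, 1], p=3))
```

Equal-length blocks do not give equal work here, since columns with larger j try more values of k. That imbalance is one reason the block-to-processor assignment matters.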

7 Parallel Dynamic Programming Schemes
Message-passing scheme:
    In each processor Pj
        for i=1 to number_of_decisions
            communication step
            obtain the optimum solution with i decisions and the problem sizes Pj has assigned
        endfor
    endInEachProcessor
[Figure: table with rows 1, 2, ..., i, ..., n and columns up to N; the columns are split among processors P0 ... PK]
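The communication step exists because a process needs entries of the previous row that belong to other processes. As a rough illustration, the sketch below lists which processes must send data to a given process before it computes row i. It assumes the coins recurrence in which column j reads columns j - k*v_i of the previous row; the function names `owner` and `senders` are hypothetical.

```python
def owner(blocks, j):
    """Rank of the process whose block contains column j."""
    for rank, block in enumerate(blocks):
        if j in block:
            return rank
    raise ValueError(f"column {j} is not assigned to any process")

def senders(blocks, rank, vi, qi):
    """Ranks from which `rank` must receive row i-1 entries before computing row i."""
    needed = set()
    for j in blocks[rank]:
        for k in range(1, min(qi, j // vi) + 1):
            src = owner(blocks, j - k * vi)   # column read from the previous row
            if src != rank:
                needed.add(src)
    return needed

# Three processes, columns 0..11 split into contiguous blocks.
blocks = [range(0, 4), range(4, 8), range(8, 12)]
print(senders(blocks, 2, vi=5, qi=2))
```

Data only flows from lower-numbered columns to higher-numbered ones, so the process holding the first block never waits for anyone under this assumption.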

8 Parallel Dynamic Programming Schemes
There are different possibilities in heterogeneous systems:
    Heterogeneous algorithms.
    Homogeneous algorithms and assignment of:
        One process to each processor.
        A variable number of processes to each processor, depending on the relative speed.
The general assignment problem is NP → use of heuristic approximations.

9 Parallel Dynamic Programming Schemes
Dynamic Programming (the coins problem scheme)
[Figure: two tables with rows 1, 2, ..., i, ..., n and columns j, labelled "Homogeneous algorithm + distribution" and "Heterogeneous algorithm"; the labels p0 ... pr (processes) and P0 ... PK (processors) mark how the columns are assigned]

10 Autotuning in Parallel Dynamic Programming Schemes
The model: t(n,C,v,q,tc(n,C,v,q,p,b,d),ts(n,C,v,q,p,b,d),tw(n,C,v,q,p,b,d))
Problem size:
    n   number of types of coins
    C   value to give
    v   array of values of the coins
    q   quantity of coins of each type
Algorithmic parameters (AP):
    p   number of processes
    b   block size (here n/p)
    d   processes to processors assignment
System parameters (SP):
    tc  cost of basic arithmetic operations
    ts  start-up time
    tw  word-sending time

11 Autotuning in Parallel Dynamic Programming Schemes
Theoretical model:
    Sequential cost: [formula missing in the transcript]
    Computational parallel cost (qi large): [formula missing in the transcript]
    Communication cost: [formula missing in the transcript]
    (Annotations on the formulas: "one step", "Maximum values".)
The APs are p and the assignment array d.
The SPs are the one-dimensional array tc and the two-dimensional arrays ts and tw.
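Since the cost formulas are missing from the transcript, the following sketch only shows the general shape such a model can take; it is not the authors' model. Everything in it is an assumption: the operation count comes from the assumed coins recurrence (min(q_i, j // v_i) + 1 candidates per table entry), the columns are split into equal blocks of ceil((C+1)/p), each step is charged the slowest block, and the communication term (receiving up to p-1 blocks per step) is purely illustrative. The arguments tc, ts and tw stand for the maximum values over the processors in use.

```python
def step_ops(j, vi, qi):
    """Candidates examined for column j with coin value vi (assumed recurrence)."""
    return min(qi, j // vi) + 1

def sequential_time(C, v, q, tc):
    """Arithmetic cost of filling the whole table on one processor."""
    return tc * sum(step_ops(j, vi, qi) for vi, qi in zip(v, q) for j in range(C + 1))

def parallel_time(C, v, q, p, tc, ts, tw):
    """Illustrative estimate with p processes and worst-case tc, ts, tw."""
    cols = -(-(C + 1) // p)                   # columns per process, ceil((C+1)/p)
    total = 0.0
    for vi, qi in zip(v, q):                  # one step per decision
        # the step lasts as long as its most expensive block
        arith = max(sum(step_ops(j, vi, qi) for j in range(lo, min(lo + cols, C + 1)))
                    for lo in range(0, C + 1, cols))
        # assumed communication term: up to p-1 messages of one block each
        comm = (p - 1) * (ts + tw * cols) if p > 1 else 0.0
        total += tc * arith + comm
    return total

print(sequential_time(3, [1], [10], tc=1.0))
print(parallel_time(3, [1], [10], p=2, tc=1.0, ts=0.0, tw=0.0))
```

With p = 1 the estimate reduces to the sequential cost, which is a quick sanity check for any model of this kind.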

12 Work distribution
Assignment tree (P types of processors and p processes):
[Figure: tree with one level per process, up to p processes; the root has children 1, 2, 3, ..., P (processor types), and a node of type k has children k, k+1, ..., P]
Some limit on the height of the tree (the number of processes) is necessary.

13 Work distribution
Assignment tree (P types of processors and p processes):
P = 2 and p = 3: 10 nodes; in general: [formula missing in the transcript]
[Figure: the assignment tree for P = 2 and p = 3]
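The node count can be checked by enumerating the tree. The sketch below assumes the structure suggested by the figure: the root is the empty assignment, and each node is extended with any processor type not smaller than its last one, up to p processes. The generator name `tree_nodes` is hypothetical. Counting the root, the total equals the binomial coefficient C(P+p, p), which gives 10 for P = 2 and p = 3; whether this is the closed form shown on the slide cannot be recovered from the transcript.

```python
from math import comb

def tree_nodes(P, p):
    """Yield every node of the assignment tree as a tuple of processor types.

    The root is the empty tuple. A node's children append a type that is
    greater than or equal to the node's last type, up to p processes.
    """
    stack = [()]
    while stack:
        node = stack.pop()
        yield node
        if len(node) < p:
            first = node[-1] if node else 1
            # push in reverse so that children are visited in increasing order
            for t in range(P, first - 1, -1):
                stack.append(node + (t,))

nodes = list(tree_nodes(2, 3))
print(len(nodes), comb(2 + 3, 3))
```

Restricting children to non-decreasing types avoids visiting the same multiset of processors more than once, which is what keeps the tree much smaller than the P^p sequences of types.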

14 Work distribution
Systems:
    SUNEt: five SUN Ultra 1 and one SUN Ultra 5 (2.5 times faster), Ethernet.
    TORC (Innovative Computing Laboratory): 21 nodes of different types (dual and single, Pentium II, III and 4, AMD Athlon, …), FastEthernet, Myrinet, …

15 Work distribution
Assignment tree. SUNEt, P=2 types of processors (five SUN1 + one SUN5):
[Figure: assignment tree with nodes U5 and U1, from the level with 2 processors down to the level with p processes; annotation: "one process to each processor"]
When more processes than available processors are assigned to a type of processor, the costs of operations (SPs) change.

16 Work distribution
Assignment tree. TORC, used P=4 types of processors:
    Type 1: one 1.7 GHz Pentium 4 (only one process can be assigned).
    Type 2: one 1.2 GHz AMD Athlon.
    Type 3: one 600 MHz single Pentium III.
    Type 4: eight 550 MHz dual Pentium III.
[Figure: assignment tree over types 1, 2, 3, 4 with p processes; annotations: "4 processors", "not in the tree", "two consecutive processes are assigned to the same node", "the values of SPs change"]

17 Work distribution
Use Branch and Bound or Backtracking (with node elimination) to search through the tree:
Use the theoretical execution model to estimate the cost at each node, taking the highest values of the SPs among those of the types of processors considered, and multiplying these values by the number of processes assigned to the most heavily loaded processor of that type: [formula missing in the transcript]
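One possible reading of this estimate for the arithmetic parameter is sketched below. It is an interpretation, not the formula from the slide: the load of a type is taken as the number of its processes divided by the number of its processors, rounded up, and the node uses the largest resulting value. The function name `node_tc`, the dictionary layout, and the numeric tc values (chosen only so that U5 is 2.5 times faster than U1, as in SUNEt) are assumptions.

```python
import math
from collections import Counter

def node_tc(assignment, tc, processors):
    """Worst-case arithmetic cost per operation for a node of the assignment tree.

    assignment: processor type of each process, e.g. ("U5", "U1", "U1")
    tc:         cost of a basic operation on each type with one process per processor
    processors: number of physical processors of each type
    """
    counts = Counter(assignment)
    # processes on the most loaded processor of type t: ceil(count / processors)
    return max(tc[t] * math.ceil(n / processors[t]) for t, n in counts.items())

tc = {"U5": 1.0, "U1": 2.5}          # illustrative relative costs
processors = {"U5": 1, "U1": 5}
for node in [("U5", "U1", "U1"), ("U5", "U5", "U5"), ("U1",) * 6]:
    print(node, node_tc(node, tc, processors))
```

The same pattern would apply to ts and tw, taking the maximum over the pairs of processors involved.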

18 Work distribution
Use Branch and Bound or Backtracking (with node elimination) to search through the tree:
Use the theoretical execution model to obtain a lower bound for each node.
For example, with an array of types of processors (1,1,1,2,2,2,3,3,3,4,4,4), with relative speeds si, and an array of assignments a=(2,2,3), the array of possible assignments is pa=(0,0,0,1,1,0,1,1,1,1,1,1), and the maximum achievable speed is [formula missing in the transcript]. The minimum arithmetic cost is obtained from this speed, and the lowest communication costs are obtained from those between processors in the array of assignments.
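The array pa in this example can be reproduced with a short sketch. It assumes one process per processor and the non-decreasing rule of the assignment tree: processors of a type below the last assigned type count only if already used, and processors of the last type or higher can still be reached further down the tree. The function names are hypothetical, and `max_speed` (the sum of the relative speeds of the processors marked in pa) is an assumed form of the bound, since the formula on the slide is missing.

```python
def possible_assignments(types, a):
    """Mark the processors that are used, or still usable, below node `a`.

    types: processor type of each processor, in non-decreasing order
    a:     types already assigned to processes, in non-decreasing order
    """
    last = a[-1] if a else types[0]
    used = {t: a.count(t) for t in set(a)}
    seen = {}
    pa = []
    for t in types:
        seen[t] = seen.get(t, 0) + 1
        if t >= last:
            pa.append(1)                              # reachable further down the tree
        else:
            pa.append(1 if seen[t] <= used.get(t, 0) else 0)
    return pa

def max_speed(types, a, s):
    """Assumed bound: total relative speed of the processors marked in pa."""
    return sum(si for si, flag in zip(s, possible_assignments(types, a)) if flag)

types = (1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
print(possible_assignments(types, (2, 2, 3)))
```

A node can be eliminated when the cost derived from this optimistic speed is already worse than the best complete assignment found so far.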

19 Experimental Results
Systems:
    SUNEt: five SUN Ultra 1 and one SUN Ultra 5 (2.5 times faster) + Ethernet.
    TORC: 11 nodes of different types (1.7 GHz Pentium 4 + 1.2 GHz AMD Athlon + 600 MHz Pentium III + eight 550 MHz Dual Pentium III) + FastEthernet.
Varying:
    The problem size: C = 10000, 50000, …
    Large value of qi.
    The granularity of the computation (the cost of a computational step).

20 Experimental Results
How to estimate arithmetic SPs:
    Solving a small problem on each type of processor.
How to estimate communication SPs:
    CP1: using a ping-pong between each pair of processors, and between processes in the same processor. Does not reflect the characteristics of the system.
    CP2: solving a small problem varying the number of processors, and with linear interpolation. Larger installation time.
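For the ping-pong approach, a common way to turn measurements into ts and tw is a straight-line fit of one-way time against message size, time = ts + tw * size. The sketch below shows that fit with plain least squares. It is a generic technique, not necessarily the procedure used by the authors, and the sample data are synthetic values invented for the illustration (times in microseconds, sizes in words), not measurements from SUNEt or TORC.

```python
def fit_ts_tw(sizes, times):
    """Least-squares fit of time = ts + tw * size; returns (ts, tw)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    tw = sxy / sxx                 # slope: time per word
    ts = mean_y - tw * mean_x      # intercept: start-up time
    return ts, tw

# Synthetic one-way times (half of each round trip), for illustration only.
sizes = [1000, 2000, 4000, 8000]
times = [120.0, 140.0, 180.0, 260.0]
ts, tw = fit_ts_tw(sizes, times)
print(round(ts, 3), round(tw, 5))
```

Repeating the fit for every pair of processors would fill the two-dimensional arrays ts and tw of the model.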

21 Experimental Results
Three types of users are considered:
    GU (greedy user): uses all the available processors, with one process per processor.
    CU (conservative user): uses half of the available processors (the fastest), with one process per processor.
    EU (user expert in the problem, the system and heterogeneous computing): uses a different number of processes and processors depending on the granularity:
        1 process in the fastest processor, for low granularity.
        A number of processes equal to half of the available processors, in the appropriate processors, for middle granularity.
        A number of processes equal to the number of processors, in the appropriate processors, for large granularity.
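The three modelled users can be written down as simple selection rules. The sketch below is one way to do it, under assumptions: processors are identified by their index in a list of relative speeds, "the appropriate processors" is read as "the fastest ones", and the granularity classes are passed as the strings "low", "middle" and "large". The function names are hypothetical.

```python
def fastest(speeds, count):
    """Indices of the `count` fastest processors, in increasing index order."""
    ranked = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    return sorted(ranked[:count])

def greedy_user(speeds):
    """GU: one process on every available processor."""
    return fastest(speeds, len(speeds))

def conservative_user(speeds):
    """CU: one process on each processor of the fastest half."""
    return fastest(speeds, len(speeds) // 2)

def expert_user(speeds, granularity):
    """EU: the number of processes grows with the granularity."""
    count = {"low": 1, "middle": len(speeds) // 2, "large": len(speeds)}[granularity]
    return fastest(speeds, count)

# SUNEt-like system: five SUN Ultra 1 and one SUN Ultra 5 that is 2.5 times faster.
sunet = [1.0, 1.0, 1.0, 1.0, 1.0, 2.5]
print(greedy_user(sunet), conservative_user(sunet), expert_user(sunet, "low"))
```

These fixed rules serve as baselines: the autotuned selection is compared against what each kind of user would have chosen by hand.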

22 Experimental Results
Quotient between the execution time obtained with the parameters selected by each selection method and by each modelled user, and the lowest execution time, in SUNEt:
[Figure not included in the transcript]

23 Experimental Results
Parameters selection, in TORC, with CP2:
C gra LT
50000 10 (1,2) 50 (1,2,4,4) 100 100000 500000 (1,2,3,4)

24 Experimental Results
Parameters selection, in TORC (without the 1.7 GHz Pentium 4), with CP2:
    Type 1: one 1.2 GHz AMD Athlon.
    Type 2: one 600 MHz single Pentium III.
    Type 3: eight 550 MHz dual Pentium III.
C gra LT CP2
50000 10 (1,1,2) (1,1,2,3,3,3,3,3,3) 50 (1,1,2,3,3,3,3,3,3,3,3) 100 (1,1,3,3) 100000 (1,1,3) 500000 (1,1,2,3)

25 Experimental Results
Quotient between the execution time obtained with the parameters selected by each selection method and by each modelled user, and the lowest execution time, in TORC:
[Figure not included in the transcript]

26 Experimental Results
Quotient between the execution time obtained with the parameters selected by each selection method and by each modelled user, and the lowest execution time, in TORC (without the 1.7 GHz Pentium 4):
[Figure not included in the transcript]

27 Conclusions and future work
The inclusion of autotuning capacities in a parallel dynamic programming scheme for heterogeneous networks of processors has been considered. Parameter selection is combined with heuristic search in the assignment tree. Experimentally, the selection proves to be satisfactory, and useful in providing users with routines that achieve reduced execution times. In the future we plan to apply this technique to other algorithmic schemes.
