Published by Damon Harrington. Modified over 8 years ago.
Dynamic Scheduling Monte-Carlo Framework for Multi-Accelerator Heterogeneous Clusters
Authors: Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk
Source: Field-Programmable Technology (FPT), pp. 233-240, Dec. 2010
Presenter: Ming-Chih Li, ESL, Dept. of CSIE, CCU
2011/09/16
Outline
  Introduction
  Heterogeneous Framework
  Scheduling Policies
  Applications
  Performance Evaluation
  Conclusions
Introduction
To increase the raw computation capacity of a system:
  Computational power
  Number of processing units
High Performance Computing (HPC) systems
  Co-processing accelerators: FPGA, GPU
  Distributed computing: several nodes in a cluster
Introduction (cont’)
Challenges of design:
  Hardware accelerators are customized for specific computation and communication patterns
  High non-recurring engineering cost
  Communication overhead
Introduction (cont’)
The research focuses on Monte-Carlo (MC) simulation problems
Contributions:
  A scalable distributed Monte-Carlo framework for multi-accelerator heterogeneous clusters
  Load balancing schemes
  Dynamic runtime scheduling
  Mapped to two applications
Introduction (cont’)
What is a Monte-Carlo simulation problem?
  A class of computational algorithms that rely on repeated random sampling to compute their results
  Financial applications in banks
Example: calculation of the value of PI
  Random (x, y) with x, y in [0, 1]
  Area of square: 1 * 1 = 1
  (# of in-circle points / total points) * area of square ≈ area of circle
  Area of circle = PI * r^2
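The PI example can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper. Because the slide's figure did not survive, it assumes the circle is inscribed in the unit square (center (0.5, 0.5), r = 0.5). The function name `estimate_pi` and the fixed seed are hypothetical choices made for this sketch.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate PI by sampling uniform points in the unit square.

    Assumption (figure not preserved): the circle is inscribed in the
    square, so r = 0.5 and area of circle = PI * r^2.
    """
    rng = random.Random(seed)
    r = 0.5
    in_circle = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count the point if it falls inside the inscribed circle.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= r * r:
            in_circle += 1
    area_of_square = 1.0
    # Fraction of in-circle points times the square's area approximates
    # the circle's area.
    area_of_circle = in_circle / num_samples * area_of_square
    # Solve area = PI * r^2 for PI.
    return area_of_circle / (r * r)


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

Each sample is independent of the others, which is why this class of problem splits easily into tasks that can be handed to different workers.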
Heterogeneous Framework
Three major concerns:
  Application programmer productivity: no new languages and tool chains
  Scalability of approach: hierarchical model
  Resource utilization efficiency: extensible dynamic scheduling policies, based on computational performance or energy consumption
Heterogeneous Framework (cont’)
[Three figure slides; the figure content is not preserved in this transcript]
Scheduling Policies
The computational performance differs between nodes and between accelerators of the same node
Improper task distribution -> drastic performance reduction
For example, with computing rates FPGA = 1000 tasks per second and CPU = 1 task per second:
  One node, MC distributor with 2000 tasks: 1000 to the FPGA, 1000 to the CPU
  Total time: 1000 s
Scheduling Policies (cont’)
With the same computing rates (FPGA = 1000 tasks per second, CPU = 1 task per second):
  One node, MC distributor with 2000 tasks: 1998 to the FPGA, 2 to the CPU
  Total time: 2 s
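The arithmetic behind the two total times can be checked with a short sketch. It assumes the devices of a node run in parallel, so the slowest device determines the total time. The helper name `total_time` is hypothetical, and the even 1000/1000 split in the first case is inferred from the stated 1000 s total.

```python
def total_time(tasks_per_device, rate_per_device):
    """Return the completion time when all devices work in parallel.

    Each device needs tasks / rate seconds; the node is done only when
    its slowest device is done.
    """
    return max(
        tasks_per_device[name] / rate_per_device[name]
        for name in tasks_per_device
    )


# Computing rates from the slide, in tasks per second.
rates = {"FPGA": 1000, "CPU": 1}

# Even split (assumed): the CPU alone needs 1000 s.
even_split = {"FPGA": 1000, "CPU": 1000}
# Rate-aware split from the slide: both devices finish in about 2 s.
aware_split = {"FPGA": 1998, "CPU": 2}

if __name__ == "__main__":
    print(total_time(even_split, rates))
    print(total_time(aware_split, rates))
```

The FPGA is idle for almost the whole run in the even split, which is the performance loss the scheduling policies are meant to avoid.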
Scheduling Policies (cont’)
One static and two dynamic scheduling policies are proposed:
  A. Constant-Size policy
  B. Linear-Incremental policy
  C. Exponential-Incremental policy
Definitions:
  $TS_{init}$: initial task size for all child processes
  $TS_i^j$: task size for child i at the jth time of simulation
  $R_d$: remaining uncompleted task size
Scheduling Policies (cont’)
A. Constant-Size policy
For example, if the total simulation task size = 120 and $TS_{init}$ = 50, then:
  $TS_i^1$ = 50, $R_d$ = 70
  $TS_i^2$ = 50, $R_d$ = 20
  $TS_i^3$ = 20, $R_d$ = 0
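The policy's formula is not preserved in this transcript, but the worked example is consistent with handing out the initial task size on every request, capped by what remains. A minimal sketch under that assumption, following a single child as the example does (the function name `constant_size_schedule` is hypothetical):

```python
def constant_size_schedule(total_tasks, ts_init):
    """List (task size, remaining size) pairs for successive requests.

    Assumed rule, inferred from the worked example: every request
    receives ts_init tasks, or whatever is left if that is smaller.
    """
    if ts_init <= 0:
        raise ValueError("ts_init must be positive")
    remaining = total_tasks
    schedule = []
    while remaining > 0:
        size = min(ts_init, remaining)
        remaining -= size
        schedule.append((size, remaining))
    return schedule


if __name__ == "__main__":
    print(constant_size_schedule(120, 50))
```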
Scheduling Policies (cont’)
B. Linear-Incremental policy
For example, if the total simulation task size = 120, $TS_{init}$ = 50, and c = 5, then:
  $TS_i^1$ = 50, $R_d$ = 70
  $TS_i^2$ = 55, $R_d$ = 15
  $TS_i^3$ = 15, $R_d$ = 0
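The worked example suggests that the task size grows by the constant c on each request and is capped by the remaining size. The sketch below encodes that reading for a single child; the exact formula is not preserved in this transcript, so the rule and the name `linear_incremental_schedule` are assumptions.

```python
def linear_incremental_schedule(total_tasks, ts_init, c):
    """List (task size, remaining size) pairs for successive requests.

    Assumed rule, inferred from the worked example: the jth request
    receives ts_init + (j - 1) * c tasks, capped by the remaining size.
    """
    if ts_init <= 0 or c < 0:
        raise ValueError("ts_init must be positive and c non-negative")
    remaining = total_tasks
    schedule = []
    j = 1
    while remaining > 0:
        size = min(ts_init + (j - 1) * c, remaining)
        remaining -= size
        schedule.append((size, remaining))
        j += 1
    return schedule


if __name__ == "__main__":
    print(linear_incremental_schedule(120, 50, 5))
```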
Scheduling Policies (cont’)
C. Exponential-Incremental policy
For example, if the total simulation task size = 500, $TS_{init}$ = 50, and m = 2, then:
  $TS_i^1$ = 50, $R_d$ = 450
  $TS_i^2$ = 100, $R_d$ = 350
  $TS_i^3$ = 200, $R_d$ = 150
  $TS_i^4$ = 150, $R_d$ = 0
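Here the worked example is consistent with multiplying the task size by m on each request, again capped by the remaining size. A minimal sketch under that assumption, for a single child (the formula is not preserved in this transcript, and `exponential_incremental_schedule` is a hypothetical name):

```python
def exponential_incremental_schedule(total_tasks, ts_init, m):
    """List (task size, remaining size) pairs for successive requests.

    Assumed rule, inferred from the worked example: the jth request
    receives ts_init * m ** (j - 1) tasks, capped by the remaining size.
    """
    if ts_init <= 0 or m < 1:
        raise ValueError("ts_init must be positive and m at least 1")
    remaining = total_tasks
    schedule = []
    size = ts_init
    while remaining > 0:
        granted = min(size, remaining)
        remaining -= granted
        schedule.append((granted, remaining))
        size *= m  # next request is m times larger
    return schedule


if __name__ == "__main__":
    print(exponential_incremental_schedule(500, 50, 2))
```

In the last step the uncapped size would be 400, but only 150 tasks remain, which matches the final line of the example.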
Scheduling Policies (cont’)
Other possible policies:
  Mixed scheduling policy: use the Linear-Incremental policy at the beginning, then switch to the Constant-Size policy after a certain iteration
  Energy-Equal scheduling policy: each MC worker consumes the same amount of computational energy
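One possible reading of the mixed policy is sketched below. The slide does not say when the switch happens or which constant size is used afterwards, so both are assumptions here: the switch occurs after a hypothetical `switch_after` requests, and the size reached at that point is held constant. The increment c and the function name are also illustrative.

```python
def mixed_schedule(total_tasks, ts_init, c, switch_after):
    """List (task size, remaining size) pairs for successive requests.

    Assumptions (not specified on the slide): sizes grow linearly by c
    for the first switch_after requests, then stay at the last size.
    """
    if ts_init <= 0 or c < 0 or switch_after < 1:
        raise ValueError("invalid policy parameters")
    remaining = total_tasks
    schedule = []
    j = 1
    while remaining > 0:
        # Stop growing once the switch point has been reached.
        steps = min(j, switch_after) - 1
        size = min(ts_init + steps * c, remaining)
        remaining -= size
        schedule.append((size, remaining))
        j += 1
    return schedule


if __name__ == "__main__":
    print(mixed_schedule(300, 50, 5, 3))
```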
Applications
The authors have implemented two applications in the proposed framework:
  Asian option pricing using the control variate method
  GARCH asset simulation
Applications (cont’)
FPGA kernel
  The Constant-Size scheduling policy is the best choice, as all MC cores finish the computation in the exact same cycle
Applications (cont’)
The number of pipelined stages must be identical for all the pipelined loops in order to guarantee a consistent computation schedule
Applications (cont’)
Xilinx Virtex-5 xc5vlx330t FPGA
Applications (cont’)
GPU kernel
  Single Instruction Multiple Data (SIMD) computing devices
  Design CUDA kernels
CPU kernel
  C language
  Intel Math Kernel Library (MKL)
  Compiled with Intel compiler (icc) 11.1 with -O3
  OpenMP parallel for #pragma
Performance Evaluation
An accelerator cluster consists of 8 server nodes
  two AMD Phenom 9650 Quad-Core 2.3GHz CPUs
  one nVidia Tesla C1060 GPU
  one Xilinx Virtex-5 xc5vlx330t FPGA
Performance Evaluation (cont’)
Dynamic scheduling analysis of a single node
  The number of Monte-Carlo simulations is 10,000,000
  Using the Linear-Incremental policy with $TS_{init}$ = 1000
Performance Evaluation (cont’)
Dynamic scheduling analysis of a single node
[Results figure; content not preserved in this transcript]
Performance Evaluation (cont’)
Performance, energy and efficiency analysis of accelerator allocation of a cluster
  Acceleration performance versus energy consumption
  Power monitor
  Additional Power Consumption for Computation (APCC): APCC = Run-time Power - Static Power
  Additional Energy Consumption for Computation (AECC)
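The APCC definition translates directly into code. The slide does not define AECC, so the sketch below assumes it is the APCC multiplied by the run time, with power treated as constant over the run. The function names, parameter names, and the wattage figures in the usage lines are hypothetical and are not measurements from the paper.

```python
def apcc(runtime_power_w, static_power_w):
    """Additional Power Consumption for Computation, in watts.

    APCC = run-time power - static power, as stated on the slide.
    """
    return runtime_power_w - static_power_w


def aecc(runtime_power_w, static_power_w, run_time_s):
    """Additional Energy Consumption for Computation, in joules.

    Assumption (not defined on the slide): AECC = APCC * run time,
    with power held constant for the whole run.
    """
    return apcc(runtime_power_w, static_power_w) * run_time_s


if __name__ == "__main__":
    # Hypothetical readings: 250 W under load, 200 W static, 10 s run.
    print(apcc(250.0, 200.0))
    print(aecc(250.0, 200.0, 10.0))
```

Subtracting the static power isolates the cost of the computation itself, so configurations with different accelerators can be compared on the energy they add rather than on the node's baseline draw.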
The number of Monte-Carlo simulations is 100,000,000
Using the Linear-Incremental policy with $TS_{init}$ = 1000
The Constant-Size scheduling policy is employed at the higher-level MC distributor, with $TS_{init}$ = 100M, 50M, 25M, 12.5M for a cluster with 1, 2, 4, 8 nodes respectively
Performance Evaluation (cont’)
[Two results figure slides; content not preserved in this transcript]
Conclusions
The authors propose a dynamic scheduling Monte-Carlo framework for collaborative computation in a multi-accelerator heterogeneous cluster
The load balancing process is automated by employing dynamic scheduling policies in the proposed framework
The framework is scalable and extensible to a variety of dynamic scheduling policies
The authors have shown that the proposed framework is viable by mapping two applications involving financial computation
Conclusions
Future work:
  Automation of design development in this framework
  Applications involving data dependency will be tested
  The authors also intend to collaborate with other institutes to form a “cluster of heterogeneous clusters” for solving practical scientific problems
Thanks for your attention!