1
By Yequn Zhang, Yu Zhang
2
Contents
- Introduction
- Problem Analysis
- Proposed Algorithm
- Evaluation
4
Gaussian Elimination
- Forward Elimination
- Back Substitution
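For reference, the two phases named on this slide can be sketched in plain Python. This is a minimal sequential version with partial pivoting, not the authors' hybrid CPU/GPU code; the function name `gaussian_solve` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def gaussian_solve(a, b):
    """Solve a x = b by forward elimination (with partial pivoting)
    followed by back substitution. Illustrative sequential sketch."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    # Forward elimination: reduce m to upper-triangular form.
    for k in range(n):
        # Find the row with the largest absolute entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        # Swap the pivot row into position k.
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            # Coefficient that cancels entry (i, k).
            coeff = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= coeff * m[k][j]

    # Back substitution: solve from the last row upwards.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x
```

The steps inside the `k` loop (find max row, swap, calculate coefficients, eliminate) are the same steps the later slides distribute between CPU and GPU.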
6
Problem Analysis
- The data size used by the kernels changes continuously
- It is difficult to find an appropriate block size that avoids divergence
- Block-based approach
- Assign a certain part of the computation to the CPU, leaving the irregularity to the CPU
- Manually make the data size change in steps of one block size
- The number of blocks per grid is then easy to set
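One way to read the block-based idea above is sketched below: for a given pivot, the leftover piece of the current block goes to the CPU, and the GPU only ever sees whole blocks. The function `split_columns`, the split along columns, and the half-open ranges are all assumptions for illustration; the slides do not state the exact partitioning.

```python
def split_columns(k, n, block_size):
    """Split the columns to the right of pivot k into a CPU range and a
    GPU range. Ranges are half-open (start, end). Hypothetical helper."""
    # End of the block that contains pivot k, clamped to the matrix width.
    block_end = min((k // block_size + 1) * block_size, n)
    # Irregular piece: shrinks by one column with every pivot step.
    cpu_range = (k + 1, block_end)
    # Regular piece: stays fixed while k remains inside the same block.
    gpu_range = (block_end, n)
    # Blocks per grid follows directly from the GPU range (ceiling division).
    gpu_blocks = (n - block_end + block_size - 1) // block_size
    return cpu_range, gpu_range, gpu_blocks
```

With `n = 256` and `block_size = 64`, every pivot from 0 to 63 yields the same GPU range of three whole blocks, so the kernel's data size changes only when the pivot crosses a block boundary.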
8
Forward Elimination
- A block-based approach
- Try to avoid divergence
- Try to use the GPU
- Try to be fine-grained
9
K1: Find Max Row
10
Swap on CPU
- Now start to eliminate the block of data on the CPU
11
Calculate coefficients
12
Elimination on CPU
13
K1: Calculate Coefficients
14
K2: Elimination on CPU
15
K3: Swap on GPU
16
K4: Elimination on GPU
17
K5: Elimination on GPU
18
Intra-block loop
19
Inter-block loop
20
Last inter-block loop processed on CPU
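The loop structure named on the last three slides can be sketched as two nested loops over a square matrix. This is a sequential simulation under stated assumptions: the "GPU" part is an ordinary Python call standing in for a kernel launch, the split is along columns, and the names `eliminate_columns` and `blocked_forward_elimination` are hypothetical.

```python
def eliminate_columns(m, k, rows, col_start, col_end):
    """Subtract the scaled pivot row k from each row in `rows`,
    restricted to columns [col_start, col_end)."""
    for i, coeff in rows:
        for j in range(col_start, col_end):
            m[i][j] -= coeff * m[k][j]


def blocked_forward_elimination(m, block_size):
    """Reduce the square matrix m to upper-triangular form in place.
    Returns the pivots for which the GPU stand-in was invoked."""
    n = len(m)
    launches = []
    # Inter-block loop: one iteration per block of pivots.
    for block_start in range(0, n, block_size):
        block_end = min(block_start + block_size, n)
        # Intra-block loop: one iteration per pivot inside the block.
        for k in range(block_start, block_end):
            pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
            if m[pivot][k] == 0.0:
                raise ValueError("matrix is singular")
            m[k], m[pivot] = m[pivot], m[k]
            rows = [(i, m[i][k] / m[k][k]) for i in range(k + 1, n)]
            # CPU part: the irregular remainder of the current block.
            eliminate_columns(m, k, rows, k + 1, block_end)
            # GPU stand-in: whole blocks to the right of the current one.
            if block_end < n:
                eliminate_columns(m, k, rows, block_end, n)
                launches.append(k)
            # Entries below the pivot are now eliminated.
            for i, _ in rows:
                m[i][k] = 0.0
    return launches
```

In the last inter-block iteration `block_end` equals `n`, so nothing is left for the GPU stand-in and the whole block is processed on the CPU, matching the slide title.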
21
Back Substitution
- Launch a kernel when the number of coefficients per row exceeds four block sizes (64 * 4 = 256)
- A fine-grained approach similar to forward elimination: part on the CPU and part on the GPU
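The threshold rule above can be sketched as a small dispatch check inside an otherwise ordinary back substitution. Everything runs sequentially here; `choose_device` and `back_substitute` are hypothetical names, and counting the coefficients of row `i` as the `n - 1 - i` entries right of the diagonal is an assumption about what the slide means.

```python
def choose_device(num_coeffs, block_size=64):
    """Return "gpu" when a row has more than four blocks' worth of
    coefficients (4 * 64 = 256 by default), otherwise "cpu"."""
    return "gpu" if num_coeffs > 4 * block_size else "cpu"


def back_substitute(u, y, block_size=64):
    """Solve the upper-triangular system u x = y.
    Also returns the device chosen for each row, bottom row first."""
    n = len(u)
    x = [0.0] * n
    devices = []
    for i in range(n - 1, -1, -1):
        # Assumed count: entries of row i to the right of the diagonal.
        num_coeffs = n - 1 - i
        devices.append(choose_device(num_coeffs, block_size))
        # In a hybrid version this sum is what a kernel would compute.
        known = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - known) / u[i][i]
    return x, devices
```

Under this reading the bottom rows, which have few coefficients, stay on the CPU, and only rows with more than 256 coefficients would justify a kernel launch.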
23
Block size effect
24
The contribution of swap and find max row
- Is it necessary to implement every part on the GPU?
25
Performance breakdown
- Contribution of each part to the total performance, including the kernels as well as the CPU part
26
Speedup
27
Questions?