Published by Madeline Cherry. Modified over 9 years ago.
1
Course Project on: Strassen's Matrix Multiplication
Presented by: Gaurav Jain, Lalchand
Under the guidance of: Prof. Subodh Kumar
2
Basic Matrix Multiplication
Suppose we want to multiply two N × N matrices, A × B = C. For N = 2:
$C_{11} = a_{11} b_{11} + a_{12} b_{21}$
$C_{12} = a_{11} b_{12} + a_{12} b_{22}$
$C_{21} = a_{21} b_{11} + a_{22} b_{21}$
$C_{22} = a_{21} b_{12} + a_{22} b_{22}$
A 2 × 2 product therefore takes 8 multiplications ($2^{\log_2 8} = 2^3$). Applied recursively to N × N matrices split into 2 × 2 blocks, this gives $N^{\log_2 8} = N^3$ multiplications.
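As a minimal sketch using plain Python lists (the name `naive_matmul` is our own, not from the slides), the textbook triple loop below counts its scalar multiplications, confirming 8 for N = 2 and N^3 in general:

```python
def naive_matmul(a, b):
    """Textbook N x N product; also counts scalar multiplications."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1  # one multiplication per (i, j, k) triple
    return c, mults


if __name__ == "__main__":
    c, mults = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, mults)  # [[19, 22], [43, 50]] 8
```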
3
Strassen's Matrix Multiplication
4
$P_1 = (A_{11} + A_{22})(B_{11} + B_{22})$
$P_2 = (A_{21} + A_{22}) B_{11}$
$P_3 = A_{11} (B_{12} - B_{22})$
$P_4 = A_{22} (B_{21} - B_{11})$
$P_5 = (A_{11} + A_{12}) B_{22}$
$P_6 = (A_{21} - A_{11})(B_{11} + B_{12})$
$P_7 = (A_{12} - A_{22})(B_{21} + B_{22})$
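The seven products can be checked on a 2 × 2 matrix of plain numbers. This is an illustrative sketch: `strassen_products` is a hypothetical helper name, and the scalar case stands in for the block case, where each entry would itself be a submatrix.

```python
def strassen_products(a, b):
    """Return Strassen's seven products P1..P7 for 2x2 matrices of scalars.

    a and b are nested lists [[x11, x12], [x21, x22]].
    """
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    p1 = (a11 + a22) * (b11 + b22)
    p2 = (a21 + a22) * b11
    p3 = a11 * (b12 - b22)
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a12) * b22
    p6 = (a21 - a11) * (b11 + b12)
    p7 = (a12 - a22) * (b21 + b22)
    return p1, p2, p3, p4, p5, p6, p7


if __name__ == "__main__":
    print(strassen_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # (65, 35, -2, 8, 24, 22, -30)
```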
5
Strassen's Matrix Multiplication
Combining the seven products gives the quadrants of C:
$C_{11} = P_1 + P_4 - P_5 + P_7$
$C_{12} = P_3 + P_5$
$C_{21} = P_2 + P_4$
$C_{22} = P_1 + P_3 - P_2 + P_6$
Only 7 block multiplications are needed instead of 8, so applied recursively the algorithm performs $O(N^{\log_2 7}) \approx O(N^{2.81})$ multiplications.
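A minimal recursive sketch of the full algorithm, assuming square matrices whose size N is a power of two and using plain Python lists (helper names such as `split` and `join` are our own, not from the slides):

```python
def add(x, y):
    """Element-wise sum of two equal-size square matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    """Element-wise difference of two equal-size square matrices."""
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(m):
    """Split an n x n matrix (n even) into quadrants X11, X12, X21, X22."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])


def strassen(a, b):
    """Strassen product of two n x n matrices; n must be a power of two."""
    if len(a) == 1:  # base case: a single scalar multiplication
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive block products instead of eight.
    p1 = strassen(add(a11, a22), add(b11, b22))
    p2 = strassen(add(a21, a22), b11)
    p3 = strassen(a11, sub(b12, b22))
    p4 = strassen(a22, sub(b21, b11))
    p5 = strassen(add(a11, a12), b22)
    p6 = strassen(sub(a21, a11), add(b11, b12))
    p7 = strassen(sub(a12, a22), add(b21, b22))
    # Combine into the quadrants of C.
    c11 = add(sub(add(p1, p4), p5), p7)
    c12 = add(p3, p5)
    c21 = add(p2, p4)
    c22 = add(add(sub(p1, p2), p3), p6)
    return join(c11, c12, c21, c22)
```

Practical implementations usually stop the recursion at some cutoff block size and switch to ordinary multiplication, because the extra additions make Strassen slower on small blocks; the cutoff is a tuning choice that the slides do not specify.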
6
Strassen's Matrix Multiplication (Ref.: Accelerating High Performance Applications with CUDA and MPI)
7
Why MPI + CUDA?
➢ The seven product equations are independent of one another, which suits the CUDA environment naturally.
➢ Limitation of CUDA: it provides no communication between GPUs on different nodes.
➢ MPI: data distribution mechanism
➢ CUDA: main execution engine
8
MPI + CUDA
9
Steps Performed
➢ Each node contains a GPU; each process launches kernels on its own GPU to compute its result.
10
Steps Performed
➢ Divide each input matrix (A and B) into four equal quadrants.
➢ Send the appropriate quadrants to the corresponding process.
➢ Each process computes its corresponding product equation.
➢ Each process sends its result to the head process of each C equation that uses it.
➢ The heads collect the data.
➢ Each head computes its C equation.
➢ All heads send their partial results to the master node.
➢ The master combines and displays the result.
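The steps above can be mimicked in a single Python process to make the data flow concrete. This is a hypothetical sequential simulation, not the project's MPI/CUDA code: each MPI process becomes a function call, one worker per product is assumed, and `block_matmul` stands in for the CUDA kernel each node would run on its GPU.

```python
# Hypothetical sequential simulation of the MPI + CUDA pipeline.
# No MPI or GPU is used: each "process" is an ordinary function call.

def block_matmul(x, y):
    """Plain n x n product standing in for the per-node GPU kernel."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def quadrants(m):
    """Master step: divide an n x n matrix (n even) into four quadrants."""
    h = len(m) // 2
    return {"11": [r[:h] for r in m[:h]], "12": [r[h:] for r in m[:h]],
            "21": [r[:h] for r in m[h:]], "22": [r[h:] for r in m[h:]]}


def worker(k, A, B):
    """Process k builds its operands from the received quadrants and
    computes P_k (on its node's GPU in the original design)."""
    operands = {
        1: lambda: (add(A["11"], A["22"]), add(B["11"], B["22"])),
        2: lambda: (add(A["21"], A["22"]), B["11"]),
        3: lambda: (A["11"], sub(B["12"], B["22"])),
        4: lambda: (A["22"], sub(B["21"], B["11"])),
        5: lambda: (add(A["11"], A["12"]), B["22"]),
        6: lambda: (sub(A["21"], A["11"]), add(B["11"], B["12"])),
        7: lambda: (sub(A["12"], A["22"]), add(B["21"], B["22"])),
    }
    x, y = operands[k]()
    return block_matmul(x, y)


def heads(P):
    """Each head combines the products it received into one C quadrant."""
    return {
        "11": add(sub(add(P[1], P[4]), P[5]), P[7]),
        "12": add(P[3], P[5]),
        "21": add(P[2], P[4]),
        "22": add(add(sub(P[1], P[2]), P[3]), P[6]),
    }


def distributed_strassen(a, b):
    A, B = quadrants(a), quadrants(b)              # master divides inputs
    P = {k: worker(k, A, B) for k in range(1, 8)}  # seven workers
    C = heads(P)                                   # four heads
    top = [r1 + r2 for r1, r2 in zip(C["11"], C["12"])]
    bottom = [r1 + r2 for r1, r2 in zip(C["21"], C["22"])]
    return top + bottom                            # master combines
```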
11
Detailed Description – Step 1
[Figure: the seven product equations distributed across processes, grouped as (P1, P5), (P2, P6), (P3, P7) and P4]
12
Detailed Description – Step 2
[Figure: products grouped as (P1, P5), (P2, P6), (P3, P7), with P4 shown twice]
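Step 2 depends on which products each head needs. The sketch below derives that communication plan directly from the C equations. It assumes one head per C quadrant and that every product is sent straight to each head that uses it; the slides' grouping of products onto shared nodes may reduce the actual number of messages. `C_TERMS` and `send_plan` are illustrative names.

```python
# Terms of each C quadrant, taken from the combination equations:
# (sign, index k of the Strassen product P_k).
C_TERMS = {
    "C11": [(+1, 1), (+1, 4), (-1, 5), (+1, 7)],
    "C12": [(+1, 3), (+1, 5)],
    "C21": [(+1, 2), (+1, 4)],
    "C22": [(+1, 1), (-1, 2), (+1, 3), (+1, 6)],
}


def send_plan(c_terms):
    """Map each product index k to the list of heads that must receive P_k."""
    plan = {}
    for head, terms in c_terms.items():
        for _sign, k in terms:
            plan.setdefault(k, []).append(head)
    return {k: plan[k] for k in sorted(plan)}


if __name__ == "__main__":
    for k, dests in send_plan(C_TERMS).items():
        print(f"P{k} -> {', '.join(dests)}")
    # P1 -> C11, C22
    # P2 -> C21, C22
    # P3 -> C12, C22
    # P4 -> C11, C21
    # P5 -> C11, C12
    # P6 -> C22
    # P7 -> C11
```

Under these assumptions 12 result messages are needed: P1 through P5 each go to two heads, while P6 and P7 each go to one.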
13
Detailed Description – Step 3
[Figure: products grouped as (P1, P5), (P2, P6), (P3, P7), with P4 shown twice; "Declare Result"]
14
Experimental Result - 1
15
Experimental Result - 2
16
Experimental Result - 3
17
References:
➢ N. P. Karunadasa and D. N. Ranasinghe, "Accelerating High Performance Applications with CUDA and MPI."
➢ Junjie Li and Sanjay Ranka, "Strassen's Matrix Multiplication on GPUs."
18
Thanks