1
CSE5304 Project Proposal: Parallel Matrix Multiplication
Tian Mi
2
A naive version with MPI
[Figure: the data split into pieces P1, P2, …, Pi, …, PN, one per processor, and the corresponding pieces of the result]
3
A naive version with MPI
[Figure: the pieces of data handled by processor Pi]
4
Processor0 reads the input file.
Processor0 distributes (scatters) one matrix among all processors.
Processor0 broadcasts the other matrix to all processors.
All processors, in parallel, multiply their own piece of the data.
Processor0 gathers the result.
Processor0 writes the result to the output file.
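The steps above can be checked without a cluster by simulating them serially. The sketch below is a minimal Python illustration, not the project's MPI code: it assumes the scattered matrix is A, that it is split into contiguous row blocks (the natural layout for MPI_Scatter on a row-major array), and that the matrix size n is divisible by the number of processors p. The names matmul and naive_parallel_matmul are hypothetical.

```python
import random


def matmul(a, b):
    """Serial product of an (r x k) matrix a and a (k x c) matrix b (lists of lists)."""
    k, c = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(c)] for row in a]


def naive_parallel_matmul(a, b, p):
    """Serially simulate the naive scheme on p hypothetical processes."""
    n = len(a)
    if n % p != 0:
        raise ValueError("this sketch assumes n is divisible by p")
    rows = n // p
    # "Scatter": process r gets rows [r*rows, (r+1)*rows) of A.
    pieces = [a[r * rows:(r + 1) * rows] for r in range(p)]
    # "Broadcast": every process sees all of B; here they simply share one list.
    # Local work: each process multiplies its piece by B (in parallel under MPI).
    partial = [matmul(piece, b) for piece in pieces]
    # "Gather": process 0 concatenates the partial results in rank order.
    return [row for part in partial for row in part]


if __name__ == "__main__":
    random.seed(0)
    n = 8
    a = [[random.randint(-1000, 1000) for _ in range(n)] for _ in range(n)]
    b = [[random.randint(-1000, 1000) for _ in range(n)] for _ in range(n)]
    for p in (1, 2, 4, 8):
        assert naive_parallel_matmul(a, b, p) == matmul(a, b)
    print("row-block result matches the serial product")
```

Because each process's rows of the result depend only on its rows of A and on all of B, the gathered pieces reproduce the serial product exactly; only the communication pattern differs.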
5
MPI_Scatter
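MPI_Scatter sends the same number of elements (its sendcount argument) to every rank, taking consecutive chunks of the root's send buffer in rank order. For row blocks of a row-major n*n matrix, each rank receives (n/P)*n elements, which requires n to be divisible by P; that holds for every matrix size and processor count in these experiments. When it does not, MPI_Scatterv accepts per-rank sendcounts and displs arrays instead. The hypothetical helper below sketches how those arrays could be computed; it is plain Python, not an MPI binding.

```python
def row_block_counts(n_rows, n_cols, p):
    """Element counts and displacements for splitting a row-major
    n_rows x n_cols matrix into p contiguous row blocks, in the shape of
    MPI_Scatterv's sendcounts and displs arrays. When n_rows % p != 0,
    the first n_rows % p ranks receive one extra row."""
    base, extra = divmod(n_rows, p)
    counts, displs, offset = [], [], 0
    for r in range(p):
        rows = base + (1 if r < extra else 0)
        counts.append(rows * n_cols)
        displs.append(offset)
        offset += rows * n_cols
    return counts, displs


# Equal blocks, as plain MPI_Scatter requires: 1024*1024 over 8 ranks.
counts, displs = row_block_counts(1024, 1024, 8)
print(counts[0], displs[1])  # 131072 131072
# Unequal blocks, which need MPI_Scatterv: 10 x 4 over 3 ranks.
print(row_block_counts(10, 4, 3))  # ([16, 12, 12], [0, 16, 28])
```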
7
MPI_Bcast
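MPI_Bcast copies the root's buffer to every rank in the communicator. The MPI standard leaves the communication algorithm to the implementation; a binomial tree is one classic choice, in which the set of ranks holding the data doubles every round, so the broadcast finishes in ceil(log2 P) rounds rather than P - 1. The sketch below only computes that schedule; the function name is hypothetical, and the algorithm an actual MPI library picks depends on message size and configuration.

```python
def binomial_bcast_schedule(p, root=0):
    """Rounds of a binomial-tree broadcast over p ranks.
    Each round is a list of (sender, receiver) pairs. Ranks are numbered
    relative to the root and mapped back with (rel + root) % p."""
    rounds = []
    have = 1  # ranks with relative number 0..have-1 already hold the data
    while have < p:
        pairs = []
        for rel_src in range(have):
            rel_dst = rel_src + have
            if rel_dst < p:
                pairs.append(((rel_src + root) % p, (rel_dst + root) % p))
        rounds.append(pairs)
        have *= 2  # every holder sent one copy, so the holder set doubled
    return rounds


for rnd, pairs in enumerate(binomial_bcast_schedule(8)):
    print(rnd, pairs)
# 0 [(0, 1)]
# 1 [(0, 2), (1, 3)]
# 2 [(0, 4), (1, 5), (2, 6), (3, 7)]
```

For the 128-processor runs this schedule needs 7 rounds, although every round still moves a full copy of the broadcast matrix.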
9
MPI_Gather
11
Data generation
Data generated in R with the package “igraph”
Integers in the range [-1000, 1000]
Matrix size | File size
---|---
512*512 | 2.69 MB
1024*1024 | 10.7 MB
2048*2048 | 43.1 MB
4096*4096 | 172 MB
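The slides generate data in R; for readers without R, a rough Python equivalent is sketched below. The space-separated text format and the function names are assumptions, since the slides do not describe the file layout, so the file sizes will not necessarily match the table. What should carry over is the scaling: each doubling of the matrix dimension quadruples the number of entries, which matches the roughly fourfold growth of the file sizes above (2.69, 10.7, 43.1, 172 MB).

```python
import os
import random
import tempfile


def write_random_matrix(path, n, lo=-1000, hi=1000, seed=None):
    """Write an n x n matrix of integers drawn uniformly from [lo, hi],
    one row per line, values separated by spaces (an assumed format)."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n):
            f.write(" ".join(str(rng.randint(lo, hi)) for _ in range(n)) + "\n")


def read_matrix(path):
    """Read the format written above back into a list of integer rows."""
    with open(path) as f:
        return [[int(x) for x in line.split()] for line in f if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix_512.txt")
    write_random_matrix(path, 512, seed=1)
    m = read_matrix(path)
    print(len(m), "x", len(m[0]), "matrix,", os.path.getsize(path), "bytes")
```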
12
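Result Data size: 1024*1024
Processors | Experiments, five runs (s) | Average (s) | Speedup
---|---|---|---
1 | 44, 41, 45, 37, 42 | 41.8 | 1
2 | 23, 20, 21, 19, 22 | 21 | 1.99
4 | 11, 10, 19, 18, 16 | 14.8 | 2.82
8 | 10, 9, 8, 9, n/a | 9.2 | 4.54
16 | 9, 9, 11, 9, 6 | 8.8 | 4.75
32 | 8, 10, 8, 7, 7 | 8 | 5.23
64 | 8, 8, 8, 8, 8 | 8 | 5.23
128 | 10, 9, 6, 8, 9 | 8.4 | 4.98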
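Speedup here is the one-processor time divided by the P-processor time, S = T1 / Tp, and the table's last column follows from the averages that way (for example 41.8 / 21 ≈ 1.99). Dividing by P gives the parallel efficiency E = S / P, a common companion metric. The small sketch below computes both from a dictionary of average times; the function name is hypothetical, and the same call works for the 2048*2048 and 4096*4096 tables.

```python
def speedup_and_efficiency(times):
    """times maps process count -> wall-clock seconds and must include 1.
    Returns {p: (speedup, efficiency)} with S = T1 / Tp and E = S / p."""
    t1 = times[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(times.items())}


# Average times (s) from the 1024*1024 table above.
avg_1024 = {1: 41.8, 2: 21, 4: 14.8, 8: 9.2, 16: 8.8, 32: 8, 64: 8, 128: 8.4}
for p, (s, e) in speedup_and_efficiency(avg_1024).items():
    print(f"P = {p:3d}   S = {s:5.2f}   E = {e:4.2f}")
```

For the 1024*1024 averages, E drops below 0.2 from P = 32 onward; for the 2048*2048 table the same calculation gives about 0.46 at P = 32.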
13
Result Data size: 1024*1024
14
Result Data size: 1024*1024
15
Result Data size: 2048*2048
Processors | Time (s) | Speedup
---|---|---
1 | 751 | 1
2 | 498 | 1.51
4 | 258 | 2.91
8 | 127 | 5.91
16 | 84 | 8.94
32 | 51 | 14.73
64 | 55 | 13.65
128 | 48 | 15.65
16
Result Data size: 2048*2048
17
Result Data size: 2048*2048
18
Result Data size: 4096*4096
Processors | Time (s) | Speedup
---|---|---
1 | 5920 | 1
2 | 3630 | 1.63
4 | 2813 | 2.10
8 | 925 | 6.40
16 | 745 | 7.95
32 | 576 | 10.28
64 | no data | no data
128 | no data | no data
19
Analysis
To see superlinear speedup, the computation must be increased, since it is not yet dominant enough: use larger matrices and larger integers.
However, a larger matrix or longer integers will also increase the communication time (broadcast, scatter, gather).
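A simple operation count makes the trade-off on this slide concrete. Under the naive scheme, assuming n is divisible by P and ignoring message latency and the broadcast algorithm, each non-root rank performs (n/P)*n*n multiply-adds while receiving n*n/P elements of A in the scatter and n*n elements of B in the broadcast, then sends n*n/P elements of its result back in the gather. The ratio of computation to data moved is therefore n / (P + 2): it grows linearly with n, so larger matrices should help even though they also move more data. The helper name below is hypothetical.

```python
def naive_costs(n, p):
    """Per-rank operation and data counts for one non-root rank in the naive
    scheme (scatter rows of A, broadcast B, gather rows of C). Ignores latency,
    the collective algorithms used, and element size in bytes."""
    if n % p != 0:
        raise ValueError("this sketch assumes n is divisible by p")
    rows = n // p
    multiply_adds = rows * n * n                  # (n/p) x n entries of C, n terms each
    elements_moved = rows * n + n * n + rows * n  # scatter in + bcast in + gather out
    return multiply_adds, elements_moved


for n in (512, 1024, 2048, 4096):
    ops, moved = naive_costs(n, 32)
    print(n, round(ops / moved, 1))  # equals n / (32 + 2)
```

For P = 32 the ratio is about 15 at n = 512 and about 120 at n = 4096.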
20
Cannon's algorithm: example
Source: http://www.vampire.vanderbilt.edu/education-outreach/me343_fall2008/notes/parallelMM_10_09.pdf
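Cannon's algorithm arranges P = q*q processes in a q*q grid and gives each one block of A, B and C. It skews the blocks once (row i of A shifted left by i, column j of B shifted up by j), then performs q rounds of local block multiply-accumulate, each followed by a one-step cyclic shift of A to the left and of B upward. The serial Python simulation below follows that textbook formulation; it is not the project's implementation, which the next slide reports is still in progress, and the function name is hypothetical. It could serve as a reference to test an MPI version against.

```python
def cannon_matmul(a, b, q):
    """Serial simulation of Cannon's algorithm on a q x q grid of processes.
    a, b: n x n lists of lists with n divisible by q. Grid cell (i, j)
    holds one (n/q) x (n/q) block each of A, B and C."""
    n = len(a)
    if n % q != 0:
        raise ValueError("this sketch assumes n is divisible by q")
    s = n // q

    def block(m, bi, bj):
        return [row[bj * s:(bj + 1) * s] for row in m[bi * s:(bi + 1) * s]]

    # Initial skew: cell (i, j) starts with A block (i, i+j) and B block (i+j, j).
    A = [[block(a, i, (i + j) % q) for j in range(q)] for i in range(q)]
    B = [[block(b, (i + j) % q, j) for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]

    for _ in range(q):
        # Every cell multiplies its current blocks and accumulates into its C block.
        for i in range(q):
            for j in range(q):
                x, y, z = A[i][j], B[i][j], C[i][j]
                for r in range(s):
                    for c in range(s):
                        z[r][c] += sum(x[r][t] * y[t][c] for t in range(s))
        # Shift A blocks one step left and B blocks one step up, with wrap-around.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Reassemble the global C from its blocks.
    return [[C[i // s][j // s][i % s][j % s] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))  # [[19, 22], [43, 50]]
```

After the skew, cell (i, j) sees A block (i, i+j+k) and B block (i+j+k, j) in round k, so over q rounds it accumulates every term of the block product exactly once.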
21
Cannon's algorithm
Still implementing and debugging; no results to share at present.
22
Thank you
Questions & comments?