CSE5304 Project Proposal: Parallel Matrix Multiplication
Tian Mi
A naive version with MPI
[Figure: one matrix is partitioned into row blocks P1 … PN; each processor Pi multiplies its block by the full second matrix to produce its block of the result]
- Processor 0 reads the input file
- Processor 0 distributes one matrix across all processors (MPI_Scatter)
- Processor 0 broadcasts the other matrix to all processors (MPI_Bcast)
- All processors, in parallel, multiply their piece of the data
- Processor 0 gathers the partial results (MPI_Gather)
- Processor 0 writes the result to the output file
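A minimal sketch of this workflow in C with MPI, assuming the matrix dimension n is divisible by the number of ranks and that matrices are stored row-major as plain int arrays; the dimension, buffer names, and the omitted file I/O are illustrative assumptions, not the project's actual code:

/* Sketch of the naive MPI workflow listed above (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, nprocs;
    int n = 1024;                          /* matrix dimension (example value) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = n / nprocs;                 /* rows of A owned by each rank */
    int  *A = NULL;                        /* full A, only on rank 0 */
    long *C = NULL;                        /* full result, only on rank 0 */
    int  *B     = malloc((size_t)n * n * sizeof(int));
    int  *A_loc = malloc((size_t)rows * n * sizeof(int));
    long *C_loc = malloc((size_t)rows * n * sizeof(long));

    if (rank == 0) {
        A = malloc((size_t)n * n * sizeof(int));
        C = malloc((size_t)n * n * sizeof(long));
        /* ... rank 0 reads A and B from the input file here ... */
    }

    /* distribute row blocks of one matrix, broadcast the other */
    MPI_Scatter(A, rows * n, MPI_INT, A_loc, rows * n, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, n * n, MPI_INT, 0, MPI_COMM_WORLD);

    /* every rank multiplies its row block: C_loc = A_loc * B */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < n; j++) {
            long sum = 0;
            for (int k = 0; k < n; k++)
                sum += (long)A_loc[i * n + k] * (long)B[k * n + j];
            C_loc[i * n + j] = sum;
        }

    /* collect the result on rank 0, which then writes the output file */
    MPI_Gather(C_loc, rows * n, MPI_LONG, C, rows * n, MPI_LONG, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        /* ... rank 0 writes C to the output file here ... */
    }

    free(B); free(A_loc); free(C_loc);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}

Launched with, e.g., mpirun -np 8 ./naive_mm. The long accumulator matters for the largest case: with entries in [-1000, 1000] a 4096-term dot product can reach about 4*10^9, which overflows a 32-bit int.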
Data generation
- Generated in R with the package "igraph"
- Integers in the range [-1000, 1000]
Matrix size: 512*512 | 1024*1024 | 2048*2048 | 4096*4096
File size:   2.69 MB | 10.7 MB   | 43.1 MB   | 172 MB
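The project's generator was written in R with igraph; purely as an illustrative stand-in, a C sketch that produces data of the same shape (n*n integers drawn from [-1000, 1000]); the output filename and the plain-text format are assumptions:

/* Illustrative stand-in data generator in C (the project generated data in R
 * with the "igraph" package; filename and text format here are assumptions). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 512;    /* e.g. 512, 1024, 2048, 4096 */
    FILE *f = fopen("matrix.txt", "w");
    if (!f) return 1;

    srand((unsigned)time(NULL));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            fprintf(f, "%d ", rand() % 2001 - 1000); /* value in [-1000, 1000] */
        fputc('\n', f);
    }
    fclose(f);
    return 0;
}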
Result: data size 1024*1024
[Table: # Processors | Experiments (s) | Average (s) | Speedup]
Result: data size 2048*2048
[Table: # Processors | Time (s) | Speedup]
Result: data size 4096*4096
[Table: # Processors | Time (s) | Speedup]
Analysis
- To see the superlinear speedup, increase the computation load, which is currently not dominant enough: use a larger matrix and larger integers
- However, a larger matrix or longer integers will also increase the communication time (broadcast, scatter, gather)
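As a rough cost model (an illustrative assumption, not a measurement from these experiments): counting multiply-adds per processor for computation and matrix elements moved per processor for communication, for n x n matrices on p processors,

\[
T_{\text{comp}} \approx \frac{n^{3}}{p},
\qquad
T_{\text{comm}} \approx \underbrace{n^{2}}_{\text{Bcast}} + \underbrace{\frac{2n^{2}}{p}}_{\text{Scatter}+\text{Gather}},
\qquad
\frac{T_{\text{comp}}}{T_{\text{comm}}} = O\!\left(\frac{n}{p}\right).
\]

Under this model, doubling n roughly doubles the compute-to-communication ratio, while moving to wider (e.g. 64-bit) integers mainly increases the bytes behind T_comm, which matches the trade-off noted above.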
Cannon's algorithm: example
Cannon's algorithm: still implementing and debugging; no results to share at present.
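Since the project's implementation is still in progress, the following is only an illustrative sketch of Cannon's algorithm in C with MPI, assuming the number of ranks is a perfect square q*q, n is divisible by q, and each rank already holds its b*b blocks (block loading, result gathering, and error checks are omitted):

/* Illustrative sketch of Cannon's algorithm (not the project's implementation). */
#include <mpi.h>
#include <math.h>
#include <stdlib.h>

static void multiply_add(const long *A, const long *B, long *C, int b) {
    /* C += A * B for b x b blocks stored row-major */
    for (int i = 0; i < b; i++)
        for (int k = 0; k < b; k++)
            for (int j = 0; j < b; j++)
                C[i * b + j] += A[i * b + k] * B[k * b + j];
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int n = 1024;                              /* global dimension (example) */
    int q = (int)(sqrt((double)nprocs) + 0.5); /* process grid is q x q */
    int b = n / q;                             /* local block dimension */

    /* periodic q x q Cartesian grid so the shifts wrap around */
    int dims[2] = {q, q}, periods[2] = {1, 1}, coords[2];
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Cart_coords(grid, rank, 2, coords);

    long *A = calloc((size_t)b * b, sizeof(long));
    long *B = calloc((size_t)b * b, sizeof(long));
    long *C = calloc((size_t)b * b, sizeof(long));
    /* ... load this rank's b x b blocks of A and B here ... */

    int src, dst;
    /* initial alignment: row i shifts its A block left by i, column j shifts its B block up by j */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(A, b * b, MPI_LONG, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(B, b * b, MPI_LONG, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);

    /* q steps: multiply local blocks, then rotate A one step left and B one step up */
    for (int step = 0; step < q; step++) {
        multiply_add(A, B, C, b);
        MPI_Cart_shift(grid, 1, -1, &src, &dst);
        MPI_Sendrecv_replace(A, b * b, MPI_LONG, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
        MPI_Cart_shift(grid, 0, -1, &src, &dst);
        MPI_Sendrecv_replace(B, b * b, MPI_LONG, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
    }

    /* C now holds this rank's block of the product; gather or write it as needed */
    free(A); free(B); free(C);
    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}

Unlike the naive version, no rank ever holds a full matrix: each stores only three b*b blocks, and per-rank communication drops to roughly 2*n^2/sqrt(p) elements over the whole multiplication.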
Thank you
Questions & Comments?