1
Matrix Multiplication (i,j,k)
for i = 1 to n do
  for j = 1 to n do
    for k = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endfor
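As a minimal sketch, the (i,j,k) loop nest can be written in Python as follows. The function name `matmul_ijk`, the 0-based indexing, and the list-of-rows representation are assumptions made for illustration; the slides use 1-based pseudocode.

```python
def matmul_ijk(A, B):
    """Multiply two n x n matrices (lists of rows) in (i,j,k) loop order."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # The innermost loop varies k, so B[k][j] is read down a column.
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For example, `matmul_ijk([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.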
2
(i,j,k) Memory Map
[Figure: C = A x B, with the loop indices i, j, k marked on the matrices]
3
Scalar Architecture
[Figure: registers, functional units, cache memory, memory bus, main memory]
4
Cache lines: matrix stored by rows
The stride-1 dimension runs along each row.
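To make the stride concrete, here is a small sketch of row-major addressing. The helper name `offset` and the 0-based indices are assumptions for illustration, not part of the slides.

```python
def offset(i, j, n):
    """Position of element (i, j) in an n x n matrix stored by rows (0-based)."""
    return i * n + j

n = 4
stride_along_row = offset(0, 1, n) - offset(0, 0, n)     # step in j
stride_along_column = offset(1, 0, n) - offset(0, 0, n)  # step in i
print(stride_along_row, stride_along_column)
```

Stepping along a row moves one element at a time, so neighbouring elements share a cache line; stepping down a column jumps n elements at a time.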
5
Matrix Multiplication (i,k,j)
Improve Spatial Locality
for i = 1 to n do
  for k = 1 to n do
    for j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endfor
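A minimal Python sketch of the (i,k,j) order follows. The name `matmul_ikj` and the hoisting of `A[i][k]` into a local variable are illustrative choices, not taken from the slides.

```python
def matmul_ikj(A, B):
    """Multiply two n x n matrices (lists of rows) in (i,k,j) loop order."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        c_row = C[i]
        for k in range(n):
            a = A[i][k]      # constant across the innermost loop
            b_row = B[k]
            for j in range(n):
                # Both C[i][j] and B[k][j] are traversed along a row.
                c_row[j] += a * b_row[j]
    return C
```

With j innermost, the two matrices touched in the inner loop are both walked along their stride-1 dimension, which is the spatial-locality improvement the slide refers to.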
6
(i,k,j) Memory Map
[Figure: C = A x B, with the loop indices i, j, k marked on the matrices]
7
Matrix Multiplication (i,k,j)
Improve Temporal Locality
| C11 C12 C13 |   | A11 A12 A13 |   | B11 B12 B13 |
| C21 C22 C23 | = | A21 A22 A23 | x | B21 B22 B23 |
| C31 C32 C33 |   | A31 A32 A33 |   | B31 B32 B33 |
C11 = A11 x B11 + A12 x B21 + A13 x B31
8
Submatrix Multiplication (i,k,j)
for it = 1 to n by s do
  for kt = 1 to n by s do
    for jt = 1 to n by s do
      for i = it to min(it+s-1,n) do
        for k = kt to min(kt+s-1,n) do
          for j = jt to min(jt+s-1,n) do
            C[i,j] = C[i,j] + A[i,k] x B[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endfor
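The submatrix (tiled) loop nest can be sketched in Python as below. The name `matmul_blocked` and the 0-based ranges are assumptions for illustration; `s` is the tile size from the pseudocode.

```python
def matmul_blocked(A, B, s):
    """Multiply two n x n matrices (lists of rows) using s x s tiles."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for it in range(0, n, s):
        for kt in range(0, n, s):
            for jt in range(0, n, s):
                # min(..., n) handles a partial tile when s does not divide n.
                for i in range(it, min(it + s, n)):
                    c_row = C[i]
                    for k in range(kt, min(kt + s, n)):
                        a = A[i][k]
                        b_row = B[k]
                        for j in range(jt, min(jt + s, n)):
                            c_row[j] += a * b_row[j]
    return C
```

The tile size would normally be chosen so that the tiles in use fit in cache together; a suitable value depends on the machine and is not given in the slides.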
9
(i,k,j) Memory Map
[Figure: C = A x B, with tiles of size s marked by the indices it, jt, kt]
10
Multiprocessor Architecture
[Figure: two CPUs, each with its own cache memory, connected by a memory bus to main memory]
11
Parallel (i,k,j): Inner loop
for i = 1 to n do
  for k = 1 to n do
    parfor j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endparfor
  endfor
endfor
12
Parallel (i,k,j): Inner loop memory mapping
[Figure: C = A x B, with the indices i and k marked on the matrices]
13
Parallel (i,k,j): Outer loop
parfor i = 1 to n do
  for k = 1 to n do
    for j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endparfor
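One way to sketch the outer-loop `parfor` in Python is with a thread pool in which each task computes one row of C. The names `matmul_parallel_rows` and `workers` are hypothetical, and the thread pool stands in for whatever parallel runtime the slides assume. In CPython the global interpreter lock typically prevents pure-Python arithmetic from running in parallel, so this shows the work decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_parallel_rows(A, B, workers=4):
    """Outer-loop parallel (i,k,j): one task per row i of C."""
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def compute_row(i):
        # Each task writes only to row i of C, so tasks never conflict.
        c_row = C[i]
        for k in range(n):
            a = A[i][k]
            b_row = B[k]
            for j in range(n):
                c_row[j] += a * b_row[j]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all tasks and re-raises any exception.
        list(pool.map(compute_row, range(n)))
    return C
```

Because the rows of C are disjoint across tasks, no locking is needed; B is only read, so it can be shared by every task.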
14
Parallel (i,k,j): Outer loop memory mapping
[Figure: C = A x B]
15
Parallel (i,k,j): Submatrix
parfor it = 1 to n by s do
  for kt = 1 to n by s do
    for jt = 1 to n by s do
      for i = it to min(it+s-1,n) do
        for k = kt to min(kt+s-1,n) do
          for j = jt to min(jt+s-1,n) do
            C[i,j] = C[i,j] + A[i,k] x B[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endparfor
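The parallel submatrix version can be sketched the same way, with one task per band of s rows. The names `matmul_parallel_blocked` and `workers` are hypothetical, and a Python thread pool is only a stand-in for the `parfor` construct; under CPython's global interpreter lock it illustrates the decomposition, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_parallel_blocked(A, B, s, workers=4):
    """Tiled (i,k,j) product with the outermost tile loop run as parallel tasks."""
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def compute_band(it):
        # Rows it .. min(it + s, n) - 1 of C belong to this task alone.
        for kt in range(0, n, s):
            for jt in range(0, n, s):
                for i in range(it, min(it + s, n)):
                    c_row = C[i]
                    for k in range(kt, min(kt + s, n)):
                        a = A[i][k]
                        b_row = B[k]
                        for j in range(jt, min(jt + s, n)):
                            c_row[j] += a * b_row[j]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(compute_band, range(0, n, s)))
    return C
```

Parallelizing over `it` gives each task a disjoint set of rows of C, so the tasks need no synchronization beyond the final join.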