
1 FACT: Fast Communication Trace Collection for Parallel Applications through Program Slicing
Jidong Zhai, Tianwei Sheng, Jiangzhou He, Wenguang Chen, Weimin Zheng
Tsinghua University

2 Motivation
The Importance of Communication Patterns
- Optimize application performance
  - Tune process placement on non-uniform communication platforms: MPIPP [ICS-08], OPP [EuroPar-09]
- Design better communication subsystems
  - Circuit-switched networks in parallel computing [IPDPS-05]
- Optimize MPI program debuggers
  - Communication locality: MPIWiz [PPoPP-09]

3 Communication Pattern
Communication patterns have three attributes:
- Spatial
- Volume
- Temporal
They can be acquired from communication trace files (message type, size, source, destination, etc.).
Example: the spatial and volume attributes of the NPB CG program (CLASS=D, NPROCS=64).

4 Previous Work
Previous work mainly relies on traditional trace collection techniques:
- Instrument the original program
- Execute it on a full-scale parallel system
- Collect communication traces at runtime
- Derive the communication pattern from the traces
Such tools include ITC/ITA, KOJAK, Paraver, TAU, VAMPIR, etc.

5 Limitations of Previous Work
Huge resource requirements:
- Computing resources: ASCI SAGE requires thousands of processors
- Memory requirements: NPB FT (CLASS=E) needs more than 600GB of memory
- Communication traces cannot be collected without a full-scale system
Long trace collection time:
- The entire parallel application must be executed from beginning to end
- For ASCI SAGE, this takes several months

6 Our Observations
Two important observations:
- Many important applications do not require communication temporal attributes
  - Process placement optimization
  - Communication locality analysis
- Most computation and message contents of parallel applications are not relevant to their spatial and volume communication attributes
If we can tolerate missing temporal attributes, can we find an efficient method to acquire communication traces?

7 Our Approach
FACT: FAst Communication Trace collection
FACT can acquire communication traces of large-scale parallel applications on small-scale systems.
Our idea:
- Reduce the original program to a program slice at compile time
  - Propose the Live-Propagation Slicing Algorithm (LPSA)
- Execute the program slice to collect communication traces at runtime
  - Custom communication library
FACT combines static analysis with traditional trace collection methods.

8 Design Overview
Two main components:
- Compilation framework (LPSA)
  - Input: an MPI program
  - Output: the program slice and directive information
- Runtime environment
  - Custom MPI communication library
  - Output: communication traces
[Figure: overview of FACT]

9 An Example (Matrix-Matrix Multiplication)
Fortran program: C = A * B

1  program MM
2  include 'mpif.h'
3  parameter (N = 80)
4  C memory allocation
5  real A(N,N), B(N,N), C(N,N)
6  call MPI_Init(ierr)
7  call MPI_COMM_Rank(MPI_COMM_WORLD, myid, ierr)
8  call MPI_COMM_Size(MPI_COMM_WORLD, nprocs, ierr)
9  cols = N/(nprocs-1)
10 size = cols*N
11 tag = 1
12 master = 0
13 if (myid .eq. master) then
14 C Initialize matrix A and B
15   do i=1, N
16     do j=1, N
17       A(i,j) = (i-1)+(j-1)
18       B(i,j) = (i-1)*(j-1)
19     end do
20   end do
21 C Send matrix data to the worker tasks
22   do dest=1, nprocs-1
23     offset = 1 + (dest-1)*cols
24     call MPI_Send(A, N*N, MPI_REAL, dest,
25    &              tag, MPI_COMM_WORLD, ierr)
26     call MPI_Send(B(1,offset), size, MPI_REAL,
27    &              dest, tag, MPI_COMM_WORLD, ierr)
28   end do
29 C Receive results from worker tasks
30   do source=1, nprocs-1
31     offset = 1 + (source-1)*cols
32     call MPI_Recv(C(1,offset), size, MPI_REAL,
33    &              source, tag, MPI_COMM_WORLD, status, ierr)
34   end do
35 else
36 C Worker receive data from master task
37   call MPI_Recv(A, N*N, MPI_REAL, master, tag,
38  &              MPI_COMM_WORLD, status, ierr)
39   call MPI_Recv(B, size, MPI_REAL, master, tag,
40  &              MPI_COMM_WORLD, status, ierr)
41 C Do matrix multiply
42   do k=1, cols
43     do i=1, N
44       C(i,k) = 0.0
45       do j=1, N
46         C(i,k) = C(i,k) + A(i,j) * B(j,k)
47       end do
48     end do
49   end do
50   call MPI_Send(C, size, MPI_REAL, master,
51  &              tag, MPI_COMM_WORLD, ierr)
52 endif
53 call MPI_Finalize(ierr)
54 end

10 After Slicing in FACT
The source code in the red boxes is deleted!
[The matrix-multiplication program from Slide 9 after slicing: the declaration "real A(N,N), B(N,N), C(N,N)" at line 5 is replaced by "real A(1,1), B(1,1), C(1,1)", the MPI_COMM_Rank and MPI_COMM_Size calls at lines 7-8 are marked [M], and the statements in the red boxes, i.e., the computation that does not affect any Comm Variable (such as the matrix initialization and the matrix multiply), are removed.]

11 Resource Consumption
Resource consumption of the original program (matrix size N, P processes):
- Each worker process: 3N^2 memory, 2N^3/(P-1) floating-point operations, 3 communication operations
- Master process: 3(P-1) communication operations
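As a quick worked example (using the example program's N = 80 and an assumed P = 5 processes, i.e. one master and four workers): each worker holds 3N^2 = 19,200 real elements, performs 2N^3/(P-1) = 256,000 floating-point operations, and takes part in 3 communication operations, while the master performs 3(P-1) = 12 communication operations.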

12 Live-Propagation Slicing Algorithm (LPSA)
Program slicing:
- Slicing criterion: <p, V>, where p is a program point and V is a subset of the program variables
- Program slice: a subset of program statements that preserves the behavior of the original program with respect to <p, V>
Two key points for the compilation framework:
- Determine the slicing criterion
- Design the slicing algorithm

13 Slicing Criterion in LPSA
Our goal: preserve communication spatial and volume attributes.
- Point-to-point communications: message type, size, source, dest, tag, and communicator id
- Collective communications: message type, sending size, receiving size, root id (if it exists), and communicator id
source, dest, communicator id → spatial attributes
message type, message size → volume attributes
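As an illustration of what a per-message record restricted to these attributes might contain, here is a hypothetical C sketch; the struct name and field layout are assumptions, not FACT's actual trace format.

#include <mpi.h>

/* Hypothetical trace record: spatial attributes (source, dest,
   communicator id) plus volume attributes (datatype, element count).
   Temporal data (timestamps) is deliberately absent. */
typedef struct {
    int          source;   /* sending rank              (spatial) */
    int          dest;     /* receiving rank            (spatial) */
    int          comm_id;  /* communicator identifier   (spatial) */
    int          tag;      /* message tag                         */
    MPI_Datatype type;     /* element datatype          (volume)  */
    int          count;    /* number of elements        (volume)  */
} comm_trace_record;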

14 Slicing Criterion in LPSA
Comm Variable: a parameter of a communication routine in a parallel program whose value directly determines the communication attributes of the program.
Comm Set → slicing criterion:
- For a program M, a Comm Set C(M) records all of its Comm Variables
- C(M) is the slicing criterion in LPSA
Example: MPI_Send(buf, count, type, dest, tag, comm)
- buf:   initial address of the send buffer
- count: number of elements in the send buffer  [Comm]
- type:  datatype of each buffer element        [Comm]
- dest:  rank of the destination                [Comm]
- tag:   uniquely identifies a message          [Comm]
- comm:  communication context                  [Comm]

15 Comm Variables and Comm Set
[The matrix-multiplication program from Slide 9, with its Comm Variables highlighted.]
C(M) = {(7, myid), (8, nprocs), (24, N), (24, dest), (25, tag), (26, size), (27, dest), (27, tag), (32, size), (33, source), (33, tag), (37, N), (37, master), (37, tag), (39, size), (39, master), (39, tag), (50, size), (50, master), (51, tag)}

16 How do we find all the statements and variables that can affect the values of Comm Variables?

17 Dependence of MPI Programs
- Data dependence (dd): can be represented with UD chains
- Control dependence (cd): can be converted into data dependence
- Communication dependence (md): an inherent characteristic of MPI programs
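As an illustration, here is a tiny hypothetical MPI program (meant to be run with exactly two processes, not taken from the paper) in which all three kinds of dependence reach the Comm Variable `count`:

#include <mpi.h>

int main(int argc, char **argv) {
    int myid, n, count, buf[4096];
    MPI_Status st;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {                 /* cd: whether each call runs depends on myid */
        n = 64;
        count = n * n;               /* dd: count is defined from n (UD chain)     */
        MPI_Send(&count, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else {
        /* md: this receive is communication-dependent on the send of
           `count` in process 0, so that send must also be preserved if
           the next receive's size is to be reproduced.               */
        MPI_Recv(&count, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
        MPI_Recv(buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &st);
    }
    MPI_Finalize();
    return 0;
}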

18 Data Dependence
[The matrix-multiplication program from Slide 9, annotated with data-dependence edges among its statements.]

19 Control Dependence
[The matrix-multiplication program from Slide 9, annotated with control-dependence edges.]

20 Communication Dependence
An inherent characteristic of MPI programs, arising from their message-passing behavior.
Statement x in process i is communication dependent on statement y in process j if and only if:
- process j sends a message to process i through explicit communication routines, and
- statement x is the receiving operation and statement y is the sending operation (x ≠ y)

21 Communication Dependence
[The matrix-multiplication program from Slide 9, annotated with communication-dependence edges between matching send and receive operations.]

22 Slice Set of an MPI Program
The slice set of an MPI program is computed from the slicing criterion C(M).
Live Variable: a variable that can affect the value of some Comm Variable through the program dependences (dd, cd, md).
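Spelled out in our own notation (a paraphrase of this slide and the next, not the paper's formal definition):

\mathrm{LIVE}(M) = \{ (s, v) \mid \text{the value of } v \text{ at statement } s \text{ can reach some } (s', c) \in C(M) \text{ through a chain of } dd, cd, md \text{ edges} \}

S(M) = \{ s \mid s \text{ defines some } v \text{ with } (s, v) \in \mathrm{LIVE}(M) \} \cup \{ \text{MPI statements of } M \}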

23 Slicing Algorithm
LPSA algorithm:
- Program slicing is a backward data-flow problem, solved with a worklist algorithm (sketched below)
  - The worklist WL[P] is initialized with the Comm Set
  - All Live Variables are computed iteratively through dd, cd, and md
- After slicing
  - Preserve all statements that define Live Variables, plus all MPI statements
  - Mark MPI statements that define Live Variables or are communication dependent on marked MPI statements (used at runtime)
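A minimal sketch of such a backward worklist propagation, written in C over a toy dependence table; it illustrates the general idea only and is not the LPSA implementation in Open64 (the dependence encoding and variable numbering are assumptions).

#include <stdio.h>
#include <string.h>

#define NVARS 8                      /* toy program: variables identified by small ints */

/* dep[u][v] != 0 means "u depends on v" via dd, cd, or md,
   i.e. v can affect the value of u. */
static int dep[NVARS][NVARS];

/* Backward worklist propagation: start from the Comm Set and
   iterate until no new live variables are discovered. */
static void propagate_live(const int *comm_set, int ncomm, int *live) {
    int worklist[NVARS], top = 0;
    memset(live, 0, NVARS * sizeof(int));
    for (int i = 0; i < ncomm; i++) {          /* WL <- Comm Set */
        live[comm_set[i]] = 1;
        worklist[top++] = comm_set[i];
    }
    while (top > 0) {
        int u = worklist[--top];               /* take one item  */
        for (int v = 0; v < NVARS; v++) {
            if (dep[u][v] && !live[v]) {       /* v affects u    */
                live[v] = 1;                   /* v becomes live */
                worklist[top++] = v;
            }
        }
    }
}

int main(void) {
    /* toy dependences: 0=count, 1=n, 2=myid, 3..7 unrelated */
    dep[0][1] = 1;                              /* count <- n    */
    dep[1][2] = 1;                              /* n     <- myid */
    int comm_set[] = {0};                       /* count is a Comm Variable */
    int live[NVARS];
    propagate_live(comm_set, 1, live);
    for (int v = 0; v < NVARS; v++)
        if (live[v]) printf("variable %d is live\n", v);
    return 0;
}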

24 Implementation: Compilation Framework
LPSA is implemented in the Open64 compiler:
- DU and UD chains: PreOPT phase
- MOD/REF analysis: IPA phase
- CFG and PCG
- Summary-based IPA framework
[Figure: FACT in Open64]

25 Implementation: Runtime Environment
- Provide a custom communication library built on the MPI profiling layer (PMPI)
- Collect communication traces while running the program slice
- The library judges the state of each MPI statement based on the slicing results
MPI_Send routine:
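The slide shows FACT's wrapped MPI_Send. As a stand-in, here is a minimal sketch of what a PMPI-based MPI_Send wrapper could look like; the trace output format and the record_trace helper are assumptions, and FACT's real library additionally consults the slicing results to decide how each call is handled.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical trace sink: log the spatial and volume attributes. */
static void record_trace(int dest, int count, MPI_Datatype type,
                         int tag, MPI_Comm comm) {
    int myid, type_size;
    PMPI_Comm_rank(comm, &myid);
    PMPI_Type_size(type, &type_size);
    fprintf(stderr, "SEND src=%d dest=%d bytes=%d tag=%d\n",
            myid, dest, count * type_size, tag);
}

/* Intercept MPI_Send via the MPI profiling interface, then forward
   the call to the underlying implementation through PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
    record_trace(dest, count, datatype, tag, comm);
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}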

26 Evaluation Benchmarks
- 7 NPB programs (NPB-3.3): BT, CG, EP, FT, LU, MG and SP
  - Data set: CLASS=D
- ASCI Sweep3D
  - Solves a three-dimensional particle transport problem
  - Weak-scaling mode, problem size 150*150*150

27 Evaluation Platforms
Test platform (32 cores):
- 4-node small-scale system
- Each node: 2-way quad-core Intel E5345, 8GB memory
- Gigabit Ethernet; 32GB memory in total
Validation platform (512 cores):
- 32-node large-scale system
- Each node: 4-way quad-core AMD 8347, 32GB memory
- InfiniBand network; 1024GB memory in total

28 Validation
- Compare the communication traces collected by FACT on the test platform with those collected by traditional trace collection methods on the validation platform
- Proof of the Live-Propagation Slicing Algorithm

29 Communication Patterns by FACT
Communication spatial and volume attributes acquired by FACT:

30 Memory Consumption
- FACT collects communication traces on the test platform; traditional trace collection methods cannot do so there due to the memory limitation
- For example, with FACT Sweep3D consumes 1.25GB of memory for 512 processes, while the original program consumes GB memory

31 Execution Time
For example, FACT takes 0.28 seconds to collect the communication traces of BT for 64 processes, while the original program takes seconds on the validation platform.

32 Application of FACT
Sensitivity analysis of communication patterns to key input parameters.
Key input parameters in Sweep3D:
- i, j → number of processes = i*j
- mk  → the computation granularity
- mmi → the angle blocking factor
7 sets of communication traces were collected on the test platform in less than 1 second.

33 Application of FACT
Communication locality:
- i=8, j=8:  Process 8 communicates frequently with Processes 0, 9, 18
- i=4, j=16: Process 8 communicates frequently with Processes 4, 9, 12

34 Application of FACT
Message size is determined by mk and mmi.

35 Limitations of FACT
Limitation: absence of communication temporal attributes.
- CAN: process mapping, MPI debugger optimization, design of better communication subsystems
- CANNOT: analyze the overhead of message transmission or the message generation rate
Potential solutions: analytical methods (the PMaC method)

36 Related Work
- Traditional trace collection methods: ITC, KOJAK, TAU, VAMPIR, etc.
- Trace reduction techniques: FACT performs no compression itself, but can be integrated with existing compression methods to reduce communication trace size
- Symbolic expression: cannot deal with complex branches, loops, etc.
- Program slicing techniques: program debugging, software testing, etc.

37 Conclusions and Future Work
FACT:
- Observation: most computation and message contents are not relevant to communication patterns
- Efficiently acquires communication traces of large-scale parallel applications on small-scale systems, with about 1-2 orders of magnitude of improvement
Future work:
- Acquire temporal attributes for performance prediction

38 Thank you!

39 Backup

40 Live-Propagation Slicing Algorithm (LPSA)

41 Some Considerations for Inter-Procedural Analysis
Live Variables can propagate through:
- global variables
- function arguments
Special considerations for inter-procedural analysis:
- MOD/REF analysis to build precise UD chains
- Two-phase analysis over the PCG (top-down and bottom-up)
- Solve an iterative data-flow equation in LPSA
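A small hypothetical C fragment showing both propagation channels (all names here are made up for illustration): the tag reaches MPI_Send through a global variable, and the count reaches it through function arguments across a call.

#include <mpi.h>

int g_tag = 7;   /* global variable: live, because its value reaches MPI_Send's tag */

/* Liveness propagates backward through the arguments n and nprocs:
   the value returned here becomes the message count at the call site. */
static int block_count(int n, int nprocs) {
    return (n * n) / nprocs;
}

void send_block(const double *buf, int n, int nprocs, int dest) {
    int count = block_count(n, nprocs);  /* inter-procedural UD chain: count <- n, nprocs */
    MPI_Send(buf, count, MPI_DOUBLE, dest, g_tag, MPI_COMM_WORLD);
}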

42 Slicing Results
Results for the example program:
- All Live Variables:
  LIVE[P] = {(7, myid), (8, nprocs), (27, dest), (33, source), (37, N), (50, size), (50, master), (51, tag), (22, nprocs), (30, nprocs), (13, myid), (13, master), (10, cols), (10, N), (9, N), (9, nprocs)}
- Slice set: S(P) = {3, 7, 8, 9, 10, 11, 12, 13, 22, 30}
- Marked MPI statements: lines 7-8

43 `buf` is a Live Variable!
LPSA can cover this case:

1       if(myid == 0){
2 [M]     MPI_Send(&num, 1, MPI_INT, 1, 55,...)
3         MPI_Recv(buf, num, MPI_INT, 1, 66,...)
4       }else{
5 [M]     MPI_Irecv(&num, 1, MPI_INT, 0, 55,..., req)
6         MPI_Wait(req,...)
7         size = num
8         MPI_Send(buf, size, MPI_INT, 0, 66,...)
9       }

Here `num` is a Live Variable:
- size at line 8 is a Comm Variable; by data dependence, num at line 7 and then num at line 5 become Live Variables, so line 5 is marked
- by communication dependence, line 2 is marked as well
In the worst case, FACT is the same as traditional trace collection tools: nothing can be sliced away.

44 Communication Dependence
Communication operations need to be matched, and communication matching is a hard issue!
Current method:
- A simple algorithm to match MPI operations
- In fact, there are no point-to-point communications
More precise methods:
- Users add annotations
- Execute the program with a small-scale problem size to identify communication dependence
- A more precise matching algorithm (G. Bronevetsky, CGO 2009)

45 Memory Consumption: Null Micro-benchmark
- The Null micro-benchmark contains only MPI_Init and MPI_Finalize; the MPI library itself consumes a certain amount of memory for process management
- At 512 processes: NULL: 1.04GB, EP: 1.11GB, CG: 1.22GB
- Results are compared against the Null micro-benchmark; AVG is the arithmetic mean over all programs

46 Execution Time
Reasons (when more nodes are used):
- Buffer limitation of the file system
- Communication contention (BT: MPI_Bcast)
With more nodes (12 nodes): MG: 2.43 sec, BT: sec

47 An Example (Matrix-Matrix Multiplication)
[The matrix-multiplication program from Slide 9, repeated here for reference.]

