Fire Benchmark Parallelisation. Programming of Supercomputers, WS 11/12. Sam Maurus
What is Fire Benchmark? A CFD solver for arbitrary geometries. This project concerned itself with the gccg solver.
How Fast is Fire Benchmark Sequentially?
What effect does the input file format have?
Data structures in gccg: points and elements
Data structures in gccg: the points array (x, y, z)
Data structures in gccg: the elems array
Data structures in gccg: the lcc array
Data distribution approach
Processes: Process 0 (root), Process 1, Process 2, Process 3
Root process tasks:
- Read the input file
- Partition the elements using the chosen approach
- Create and send the relevant mapping arrays to each process
- Broadcast the common data package to each process: lcc, ne, epart, countPart, bs_local, be_local, …
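The partitioning step on the root process can be sketched as follows. This is only an illustration: the slides do not say which partitioning approach was chosen, so a simple contiguous block split is assumed here. The function name block_partition is hypothetical, and the meanings given to epart (owning process of each element) and countPart (number of elements per process) are assumptions based on their names.

```c
#include <assert.h>

/* Hypothetical helper: assign ne elements to nprocs processes in
 * contiguous blocks (an assumed stand-in for "the chosen approach").
 * epart[e]     receives the rank assumed to own element e.
 * countPart[p] receives the number of elements assigned to rank p. */
void block_partition(int ne, int nprocs, int *epart, int *countPart)
{
    assert(ne >= 0 && nprocs > 0);

    int base = ne / nprocs; /* every rank gets at least this many */
    int rem = ne % nprocs;  /* the first rem ranks get one extra  */
    int e = 0;

    for (int p = 0; p < nprocs; ++p) {
        countPart[p] = base + (p < rem ? 1 : 0);
        for (int i = 0; i < countPart[p]; ++i)
            epart[e++] = p;
    }
}
```

With arrays like these in hand, the root could derive the per-process mapping arrays before sending them out; a graph-based partitioner would fill epart differently but could be consumed the same way.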
Communication model
Communication model: the has_ghost_neighbour array (figure labels: P3, P5, has_ghost_neighbour = 0, has_ghost_neighbour = 1)
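One way such a flag array could be filled is sketched below. The layout is an assumption, not taken from the slides: lcc is treated as a flat array with six neighbour indices per element, a neighbour index outside 0..ne-1 is treated as an external cell with no owner, and epart is taken to hold the owning rank of each element. The function name mark_ghost_neighbours is hypothetical.

```c
#include <assert.h>

#define FACES 6 /* assumed: six neighbours per element */

/* Hypothetical helper: for every element owned by `rank`, set
 * has_ghost_neighbour[e] = 1 if at least one neighbour is owned by a
 * different rank, else 0. Elements not owned by `rank` are set to 0. */
void mark_ghost_neighbours(int ne, const int *lcc, const int *epart,
                           int rank, int *has_ghost_neighbour)
{
    assert(ne >= 0);

    for (int e = 0; e < ne; ++e) {
        has_ghost_neighbour[e] = 0;
        if (epart[e] != rank)
            continue; /* not a local element */

        for (int f = 0; f < FACES; ++f) {
            int n = lcc[e * FACES + f];
            /* Indices outside 0..ne-1 are assumed to be external cells. */
            if (n >= 0 && n < ne && epart[n] != rank) {
                has_ghost_neighbour[e] = 1;
                break;
            }
        }
    }
}
```

Elements flagged 0 depend only on local data, so they can be processed while ghost values are still in transit; elements flagged 1 have to wait for the received data.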
Communication model
Processes: Process x, Process 0, Process 1, …, Process k (k = count)
Computational loop, phase one:
1. Start Isend to the required processes (where cellCountsToSend[i] > 0)
2. Start Irecv from the required processes (where cellCountsToRecv[i] > 0)
3. Process local elements that have no ghost neighbours
4. Wait on all requests
5. Update the remaining local elements
Problems overcome: MPI_Wait function
Problem: MPI_Wait was being executed for both the send and the receive requests for every element processed.
Solution: The has_ghost_neighbour array was introduced, allowing intermediate computation while the requests are outstanding. MPI_Wait is then called only once for each request.
(Figure: before and after)
Problems overcome: redundant reprocessing of the input file
Problem: The input file was being read once at initialisation and again when writing the result, which was redundant.
Solution: The 'write solution' code was refactored to re-use the relevant file information obtained from the first read.
(Figure: before and after)
Speedup – cojack
Speedup – pent
Speedup – drall
Speedup – tjunc
Speedup – full execution
Thanks for listening. Discussion time!