High Performance Computing (CS 540)
Overview and Challenge
Jeremy Johnson
Dept. of Computer Science
Drexel University
High Performance Computing Tools
  Algorithms
    FFT (Cooley-Tukey)
    Integer multiplication (Karatsuba, Schönhage-Strassen)
    Matrix multiplication (Block, Strassen, Coppersmith-Winograd)
  Compiler optimization
    Loop unrolling, fusion
    Tiling
    Instruction reordering, CSE
  High performance computer architecture
    Instruction level parallelism
    Memory hierarchy
    Vectorization (short vector, e.g. SSE)
    Parallelism (multithreading, multicore, SMP, GPU)
  Autotuning
    ATLAS, FFTW, GMP, SPIRAL
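As an illustration of the "Tiling" and "Matrix multiplication (Block)" entries in the outline, here is a minimal sketch of a blocked matrix multiplication next to the plain triple loop. The function names and the `block` parameter are made up for this example and do not come from the slides. Pure Python will not show the cache benefit that tiling gives in compiled code; the sketch only shows how the loop nest is restructured while the result stays the same.

```python
def matmul_naive(A, B):
    """Plain triple-loop product of an n x m and an m x p matrix (lists of lists)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C


def matmul_blocked(A, B, block=4):
    """Same product, with the loops tiled into block x block sub-problems.

    In a compiled language the tile size would be chosen so that the tiles
    being worked on fit in cache; here `block` is an arbitrary illustrative value.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Outer loops walk over tiles of C, A and B.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Inner loops do a small matrix product on one tile.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, p)):
                            C[i][j] += a * B[k][j]
    return C


if __name__ == "__main__":
    A = [[i + j for j in range(6)] for i in range(5)]
    B = [[i * j for j in range(7)] for i in range(6)]
    print(matmul_blocked(A, B, block=4) == matmul_naive(A, B))
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the tile size, so the blocked version accepts the same inputs as the naive one.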
The power of a good algorithm
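One way to make this point concrete is to count basic operations for Karatsuba multiplication, which the outline lists under integer multiplication. The sketch below is illustrative only: the function names, the single-bit base case, and the `counter` argument are assumptions made for this example, and it uses the subtractive form of Karatsuba so that every recursive call stays within `n // 2` bits.

```python
def karatsuba(x, y, n, counter):
    """Multiply x and y, both non-negative and below 2**n, with n a power of two.

    counter is a one-element list used to count single-bit multiplications.
    """
    if n == 1:
        counter[0] += 1          # one single-bit multiplication
        return x * y
    half = n // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # Three recursive products instead of the four used by the schoolbook split.
    hi = karatsuba(x_hi, y_hi, half, counter)
    lo = karatsuba(x_lo, y_lo, half, counter)
    dx, dy = x_hi - x_lo, y_hi - y_lo
    sign = -1 if (dx < 0) != (dy < 0) else 1
    diff = sign * karatsuba(abs(dx), abs(dy), half, counter)
    # x_hi*y_lo + x_lo*y_hi == hi + lo - (x_hi - x_lo)*(y_hi - y_lo)
    mid = hi + lo - diff
    return (hi << n) + (mid << half) + lo


def schoolbook_count(n):
    """Single-bit multiplications used by the schoolbook method on n-bit operands."""
    return n * n


if __name__ == "__main__":
    counter = [0]
    x, y = 0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF
    assert karatsuba(x, y, 64, counter) == x * y
    print(counter[0], "vs", schoolbook_count(64))   # 729 vs 4096
```

For 64-bit operands the recursion has depth 6, so Karatsuba performs 3^6 = 729 single-bit multiplications against 64^2 = 4096 for the schoolbook method, and the gap widens as the operands grow.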
Matrix Multiplication Performance
Challenge of Obtaining Efficient Code
  Multiple threads: 2x
  Vector instructions: 3x
  Memory hierarchy: 5x
  High performance library development has become a nightmare