4- Performance Analysis of Parallel Programs
Performance Evaluation of Computer Systems
1. CPU time (response time metric): depends on the program and on compiler efficiency; the user CPU time is the product of the instruction count, the average number of cycles per instruction (CPI), and the cycle time.
2. MIPS and MFLOPS (throughput metrics): depend on the program and on the type of instructions executed; MFLOPS makes no differentiation between a floating-point add and a floating-point divide, although their costs differ considerably.
3. Performance of processors with a memory hierarchy: the time spent waiting for memory must be added to the pure compute time. In case of multiple cache levels, the average memory access time is the L1 access time plus, weighted by the respective miss rates, the additional access times of the lower levels (see the sketch after this list).
4. Benchmark programs:
   - Synthetic benchmarks: small artificial programs that represent a large class of real applications, e.g. Whetstone and Dhrystone.
   - Kernel benchmarks: small but relevant parts of real applications, such as the Livermore Loops, or toy programs like quicksort.
   - Real application benchmarks: several entire programs reflecting the workload of a typical user, collected in benchmark suites, e.g. the SPEC benchmarks (Standard Performance Evaluation Corporation) and the EEMBC benchmarks (EDN Embedded Microprocessor Benchmark Consortium).
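To make items 1 and 3 concrete, here is a minimal Python sketch; all parameter values (instruction count, CPI, cache latencies, miss rates) are made-up illustration numbers, not measurements of any particular processor.

```python
# Sketch: user CPU time and average memory access time for a two-level cache.
# All numbers below are made-up illustration values, not measurements.

def cpu_time(instruction_count, cpi, cycle_time_s):
    """User CPU time = instruction count * cycles per instruction * cycle time."""
    return instruction_count * cpi * cycle_time_s

def avg_memory_access_time(t_l1, miss_l1, t_l2, miss_l2, t_mem):
    """Average access time with two cache levels: every access pays the L1 time;
    L1 misses additionally pay the L2 time; L2 misses additionally pay the
    main-memory time."""
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_mem)

if __name__ == "__main__":
    # 10^9 instructions, CPI of 1.5, 0.5 ns cycle time (2 GHz clock)
    print("CPU time [s]:", cpu_time(1e9, 1.5, 0.5e-9))
    # L1: 1 ns, 5% misses; L2: 10 ns, 20% misses; main memory: 100 ns
    print("Avg. access time [ns]:", avg_memory_access_time(1, 0.05, 10, 0.20, 100))
```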
Performance Metrics for Parallel Programs
Parallel runtime Tp(n): the time between the start of the program and the end of the execution on all participating processors. It comprises:
- the execution of local computations on each participating processor;
- the exchange of data between processors in the case of a distributed address space;
- the synchronization of the participating processors when accessing shared data structures;
- waiting times caused by an unequal load distribution;
- the parallelization overhead.
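A small sketch of how the parallel runtime is determined by the slowest processor; the per-processor workloads below are invented values used only to illustrate the waiting times caused by load imbalance.

```python
# Sketch: the parallel runtime Tp is the finishing time of the slowest processor,
# so an unequal load distribution shows up as waiting time on the other processors.
# The per-processor workloads are made-up illustration values (seconds).

local_compute = [1.00, 1.05, 0.95, 1.60]   # local computation per processor
comm_and_sync = [0.10, 0.10, 0.10, 0.10]   # data exchange / synchronization overhead

finish_times = [c + o for c, o in zip(local_compute, comm_and_sync)]
Tp = max(finish_times)                      # program ends when the last processor ends
waiting = [Tp - t for t in finish_times]    # idle time caused by load imbalance

print("Tp =", Tp)
print("waiting times:", waiting)
```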
Cost of a parallel program Cp(n): the total amount of work performed by all processors, Cp(n) = p * Tp(n). A parallel program is called cost-optimal if its cost is proportional to the execution time T*(n) of the fastest sequential program, i.e. Cp(n) = O(T*(n)). Speedup: Sp(n) = T*(n) / Tp(n), the factor by which the parallel program is faster than the best sequential program. Efficiency: Ep(n) = Sp(n) / p = T*(n) / Cp(n), the fraction of the ideal speedup p that is actually achieved.
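A minimal sketch of these definitions in Python, assuming the sequential and parallel runtimes have already been measured; the numbers are illustrative only.

```python
# Sketch: cost, speedup and efficiency from measured runtimes.
# T_seq and T_par below are made-up illustration values (seconds).

def metrics(T_seq, T_par, p):
    cost = p * T_par                 # Cp(n) = p * Tp(n)
    speedup = T_seq / T_par          # Sp(n) = T*(n) / Tp(n)
    efficiency = speedup / p         # Ep(n) = Sp(n) / p
    return cost, speedup, efficiency

T_seq = 100.0        # runtime of the fastest sequential program
T_par = 14.0         # parallel runtime on p processors
p = 8
cost, S, E = metrics(T_seq, T_par, p)
print(f"cost = {cost:.1f} s, speedup = {S:.2f}, efficiency = {E:.2f}")
```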
Speedup limit
Amdahl's law: when a (constant) fraction f, 0 ≤ f ≤ 1, of a parallel program must be executed sequentially, the attainable speedup is bounded by Sp(n) = 1 / (f + (1 - f)/p) ≤ 1/f, independently of the number p of processors. For example, if 20% of a program must be executed sequentially (f = 0.2), the attainable speedup is limited to 1/f = 5.
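A short sketch of Amdahl's bound for f = 0.2, showing the saturation at 1/f = 5:

```python
# Sketch: Amdahl's law for a sequential fraction f = 0.2.
# The speedup saturates at 1/f = 5 no matter how many processors are used.

def amdahl_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

f = 0.2
for p in (1, 2, 4, 8, 16, 64, 1024):
    print(f"p = {p:5d}: speedup = {amdahl_speedup(f, p):.3f}")
print("limit 1/f =", 1.0 / f)
```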
Scalability
For a fixed problem size n, a saturation of the speedup can be observed when the number p of processors is increased. The efficiency can be kept constant if both the number p of processors and the problem size n are increased: larger problems can be solved in the same time as smaller problems if a sufficiently large number of processors is employed. Gustafson's law covers the special case that the sequential program part has a constant execution time, independent of the problem size; if the parallelizable part grows with n and is distributed evenly over the p processors, the resulting scaled speedup approaches p as the problem size grows.
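A sketch of the scaled speedup under Gustafson's assumptions (constant sequential time, perfectly parallelizable part growing with n); the work function 0.01 * n below is an arbitrary illustration.

```python
# Sketch: Gustafson's law with a constant sequential time and a perfectly
# parallelizable part that grows with the problem size n.
# tau_seq and the work function are made-up illustration values.

def scaled_speedup(tau_seq, T_par_n, p):
    """Sp(n) = (tau_seq + T_par(n)) / (tau_seq + T_par(n)/p)."""
    return (tau_seq + T_par_n) / (tau_seq + T_par_n / p)

tau_seq = 1.0               # constant sequential part (seconds)
p = 16
for n in (10, 100, 1000, 10000):
    T_par_n = 0.01 * n      # parallelizable work grows linearly with n
    print(f"n = {n:6d}: scaled speedup = {scaled_speedup(tau_seq, T_par_n, p):.2f}")
# As n grows, the scaled speedup approaches p = 16.
```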