
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 14 The Roofline Visual Performance Model Prof. Zhang Gang gzhang@tju.edu.cn.


1 Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 14 The Roofline Visual Performance Model Prof. Zhang Gang School of Computer Sci. & Tech. Tianjin University, Tianjin, P. R. China

2 The Roofline model
Roofline is a visually intuitive performance model. It is one way to compare the potential floating-point performance of variations of SIMD architectures. It ties together floating-point performance, memory performance, and arithmetic intensity in a two-dimensional graph. Arithmetic intensity is the ratio of floating-point operations to bytes of memory accessed, that is, floating-point operations per byte read.

3 Figure 4.10 Arithmetic intensity
Arithmetic intensity = (total number of floating-point operations for a program) / (total number of data bytes transferred to main memory during program execution)
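As a worked illustration of the ratio above, here is a minimal Python sketch. The DAXPY kernel (y = a*x + y), the vector length, the 8-byte double-precision elements, and the assumption of no cache reuse are illustrative choices, not taken from the slides; the function name is hypothetical.

```python
def arithmetic_intensity(flops, bytes_transferred):
    """Return FLOPs per byte moved to or from main memory."""
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return flops / bytes_transferred


# Illustrative assumption: DAXPY on n doubles with no cache reuse.
n = 1_000_000
daxpy_flops = 2 * n       # one multiply and one add per element
daxpy_bytes = 3 * 8 * n   # read x[i], read y[i], write y[i]; 8 bytes each
daxpy_ai = arithmetic_intensity(daxpy_flops, daxpy_bytes)
print(f"DAXPY arithmetic intensity: {daxpy_ai:.4f} FLOP/byte")
```

Under these assumptions the intensity is 2/24 FLOP/byte regardless of n, which places such a kernel far to the left on the X-axis of a Roofline graph.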

4 How to find the peak memory performance?
Peak floating-point performance can be found from the hardware specifications. Many of the kernels in this case study do not fit in on-chip caches, so peak memory performance is defined by the memory system behind the caches. Note that we need the peak memory bandwidth that is available to the processors, not just the bandwidth at the DRAM pins. One way to find the (delivered) peak memory performance is to run the STREAM benchmark.
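The idea of measuring delivered bandwidth can be sketched in a few lines of Python. This is not the STREAM benchmark itself, only a rough Copy-style stand-in using the standard library; the function name, buffer size, and repeat count are assumptions chosen for illustration, and the buffer should be much larger than the on-chip caches for the number to mean anything.

```python
import time


def copy_bandwidth_gb_per_s(n_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate delivered memory bandwidth with a Copy-style kernel.

    Each pass reads n_bytes from src and writes n_bytes to dst, so
    2 * n_bytes are counted as moved. The fastest pass is reported.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return (2 * n_bytes) / best / 1e9


print(f"Approximate copy bandwidth: {copy_bandwidth_gb_per_s():.1f} GB/s")
```

The result is a single-threaded estimate and will typically understate what a tuned, multithreaded STREAM run reports on the same machine.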

5 Examples of Roofline model
The NEC SX-9 is a vector supercomputer. The Intel Core i7 920 is a multicore computer with SIMD extensions. Note that the graph uses a log–log scale, and that a Roofline is drawn just once for a computer. Figure 4.11 Roofline model for one NEC SX-9 and the Intel Core i7 920

6 How could we plot the peak memory performance?
Since the X-axis is FLOP/byte and the Y-axis is FLOP/sec, peak memory bandwidth in bytes/sec appears as a diagonal line at a 45-degree angle on the log–log graph. We can express the limits as a formula to plot these lines in the graphs:
Attainable GFLOPs/sec = Min(Peak Memory BW × Arithmetic Intensity, Peak Floating-Point Perf.)
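The Min formula translates directly into code. The sketch below is a minimal Python version; the function names and the machine figures (100 GFLOP/s peak, 20 GB/s bandwidth) are hypothetical and are not the values of the NEC SX-9 or the Intel Core i7 920.

```python
def attainable_gflops(peak_bw_gb_s, arithmetic_intensity, peak_gflops):
    """Roofline bound: min(memory-limited rate, compute-limited rate)."""
    return min(peak_bw_gb_s * arithmetic_intensity, peak_gflops)


def ridge_point(peak_bw_gb_s, peak_gflops):
    """Arithmetic intensity at which the diagonal meets the flat roof."""
    return peak_gflops / peak_bw_gb_s


# Hypothetical machine, for illustration only.
PEAK_GFLOPS = 100.0
PEAK_BW_GB_S = 20.0

for ai in (0.25, 1.0, 4.0, 8.0, 16.0):
    bound = attainable_gflops(PEAK_BW_GB_S, ai, PEAK_GFLOPS)
    limit = "memory" if ai < ridge_point(PEAK_BW_GB_S, PEAK_GFLOPS) else "compute"
    print(f"AI = {ai:5.2f} FLOP/byte -> {bound:6.1f} GFLOP/s ({limit}-bound)")
```

Kernels whose arithmetic intensity lies left of the ridge point sit under the diagonal part of the roof and are limited by memory bandwidth; those to the right are limited by peak floating-point performance.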

7 Exercises
What is the meaning of Roofline?
What is the meaning of arithmetic intensity?
How do we find the peak memory performance?
How do we plot the peak memory performance?

