The Problem
Finding a needle in a haystack. An expert (CPU). A group of non-experts (GPU).

Micro-benchmarking GPU micro-architectures
Suhas Thejaswi Muniyappa, Department of Computer Science, Aalto University

Overview
Micro-processor trend: CPU micro-processor trend, GPU micro-processor trend.
Micro-benchmarking: pointer chase, fine-grain pointer chase, piecewise linear fine-grain pointer chase.
Hardware characteristics.

CPU micro-processor trend
Hardware support for advanced instructions. Hardware documentation is available. Expensive hardware. No significant change in per-core performance over the past decade. Parallelize execution to achieve speedup.

GPU micro-processor trend
Low hardware cost. High arithmetic and memory bandwidth. Thousands of cores. Built for processing graphics. No hardware support for advanced instructions. Limited documentation of memory hierarchy. How to overcome the limitations of GPUs?

Micro-benchmarking
Hacking into the system to reveal hardware details. Using access latency to determine the hardware architecture. Details of the memory system are necessary to achieve optimal hardware performance.

Pointer chase
Benchmarking approach for CPUs by Saavedra et al. (1995). Each array element is initialized with the index of the next memory access. Access latency depends on the stride size. The average memory access latency is stored.
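The array setup and the dependent loads described above can be sketched as follows. This is a minimal illustration, not the original benchmark: the names build_chase_array and chase, the array size and the stride are all assumptions, and Python interpreter overhead dominates the timing, so the average it reports says nothing about real cache latency. A real measurement would use a low-level language and a hardware cycle counter.

```python
import time

def build_chase_array(n, stride):
    """Element i holds the index of the next access: (i + stride) mod n."""
    return [(i + stride) % n for i in range(n)]

def chase(array, iterations, start=0):
    """Follow the stored indices; return (final index, average ns per access)."""
    j = start
    t0 = time.perf_counter_ns()
    for _ in range(iterations):
        j = array[j]  # each load depends on the result of the previous one
    elapsed = time.perf_counter_ns() - t0
    return j, elapsed / iterations

# Hypothetical parameters, chosen only for illustration.
array = build_chase_array(n=1 << 16, stride=16)
final_index, avg_ns = chase(array, iterations=100_000)
```

Only one number, the average, survives the run. That is the limitation the fine-grain variant addresses.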

Fine-grain pointer chase
Record and analyze every memory access latency. Mei and Chu (2016) designed fine-grain benchmarks for GPUs. Access latencies are stored in shared memory. Shared memory is not sufficient for large arrays.
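To show what recording every access buys, here is a small host-side simulation. Everything in it is an assumption made for illustration: ToyCache is a hypothetical fully associative LRU cache with made-up hit and miss latencies, and the capacity argument stands in for the limited size of shared memory. It is not the GPU kernel from Mei and Chu.

```python
from collections import OrderedDict

class ToyCache:
    """Hypothetical fully associative LRU cache, not a model of any real GPU."""

    def __init__(self, num_lines, line_elems, hit_cycles=1, miss_cycles=10):
        self.num_lines = num_lines
        self.line_elems = line_elems
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self.lines = OrderedDict()  # line tag -> None, least recently used first

    def access(self, index):
        """Return the simulated latency of loading element `index`."""
        tag = index // self.line_elems
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return self.hit_cycles
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return self.miss_cycles

def fine_grain_chase(array, cache, iterations, capacity):
    """Record the latency of every access in a buffer of at most `capacity` entries."""
    if iterations > capacity:
        raise ValueError("latency buffer too small for this many accesses")
    latencies = []
    j = 0
    for _ in range(iterations):
        latencies.append(cache.access(j))  # latency of loading array[j]
        j = array[j]
    return latencies

array = [(i + 1) % 8 for i in range(8)]  # stride of one element
trace = fine_grain_chase(array, ToyCache(num_lines=2, line_elems=4),
                         iterations=8, capacity=8)
```

With these toy parameters the trace alternates one miss with three hits, a pattern that an average would hide. The ValueError branch mirrors the problem stated above: the recording buffer cannot hold a trace for a large array.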

Piecewise fine-grain pointer chase
Disk storage: after each iteration, the contents of shared memory are written to disk. A sliding-window approach is used to record access latencies.
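One plausible reading of the sliding-window idea, sketched on the host side: keep only a small window of latencies in the buffer and write it to a file whenever it fills. The function name piecewise_chase, the measure callable and the latency values are hypothetical, and the slides do not say how the GPU version splits the run, so treat this as an assumption rather than the author's implementation.

```python
import os
import tempfile

def piecewise_chase(array, measure, iterations, window, path):
    """Chase pointers, writing each window of latencies to `path`; return the flush count."""
    j = 0
    window_buf = []  # stands in for the small shared-memory buffer
    flushes = 0
    with open(path, "w") as out:
        for _ in range(iterations):
            window_buf.append(measure(j))
            j = array[j]
            if len(window_buf) == window:
                out.write("".join(f"{lat}\n" for lat in window_buf))
                window_buf.clear()
                flushes += 1
        if window_buf:  # write the final, partially filled window
            out.write("".join(f"{lat}\n" for lat in window_buf))
            flushes += 1
    return flushes

def measure(j):
    """Made-up latency stand-in: every fourth element is slow."""
    return 10 if j % 4 == 0 else 1

array = [(i + 1) % 16 for i in range(16)]
path = os.path.join(tempfile.mkdtemp(), "latencies.txt")
flushes = piecewise_chase(array, measure, iterations=10, window=4, path=path)
```

The buffer never holds more than window entries, so the length of the recorded trace is bounded by disk space rather than by the buffer.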

Hardware characteristics
L1 cache: hardware characteristics are deduced from the access latency.
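As an illustration of deducing a characteristic from a latency trace: with a stride of one element, the distance between consecutive misses suggests how many elements fit in one cache line. The function infer_line_elems, the miss threshold and the trace below are assumptions for this sketch. The trace is synthetic, not a measurement of any device.

```python
def infer_line_elems(latencies, miss_threshold):
    """Return the common gap between consecutive misses, or None if there is none."""
    misses = [i for i, lat in enumerate(latencies) if lat >= miss_threshold]
    gaps = {b - a for a, b in zip(misses, misses[1:])}
    return gaps.pop() if len(gaps) == 1 else None

# Synthetic trace (arbitrary units): one miss followed by three hits, repeated.
trace = [10, 1, 1, 1, 10, 1, 1, 1, 10, 1, 1, 1]
line_elems = infer_line_elems(trace, miss_threshold=5)
```

Multiplying the result by the element size in bytes would give a candidate line size. Other characteristics, such as cache capacity, would need traces taken over different array sizes and strides.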

Summary
GPUs can be used for general-purpose computations. GPUs provide an environment for executing algorithms that scale. Details of the memory system are necessary to achieve optimal hardware performance. Benchmarking reveals hardware characteristics that the manufacturers do not disclose.

References
[1] Mei, X., and Chu, X. Dissecting memory hierarchy through microbenchmarking. IEEE Transactions on Parallel and Distributed Systems, preprint, 99 (2016), 1.
[2] Mei, X., Zhao, C., and Chu, X. Benchmarking the memory hierarchy of modern GPUs. Network and Parallel Computing: 11th IFIP WG 10.3 International Conference Proceedings (NPC) (2014).
[3] Saavedra, R.H., and Smith, A.J. Measuring cache and TLB performance and their effect on benchmark runtimes. IEEE Transactions on Computers 44, 10 (1995).
[4] Saavedra, R.H. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking. PhD thesis, University of California, Berkeley, 1992.

Questions?

Thank you
