Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 17 NVIDIA GPU Computational Structures Prof. Zhang Gang gzhang@tju.edu.cn.

School of Computer Sci. & Tech., Tianjin University, Tianjin, P. R. China

NVIDIA GPU Computational Structures
Similarities to vector machines: both work well with data-level parallel problems, support scatter-gather transfers, use mask registers, and have large register files.
Differences: a GPU has no scalar processor, uses multithreading to hide memory latency, and has many functional units, as opposed to a few deeply pipelined units like a vector processor.

Grid, Thread block, SIMD thread
A Grid is the code that runs on a GPU; it consists of a set of Thread Blocks. Example: multiply two vectors together, each 8192 elements long.

Grid, Thread block, SIMD thread
A Grid is composed of Thread Blocks, each with up to 512 elements. A SIMD instruction executes 32 elements at a time. In this example, the Grid has 16 Thread Blocks, since 8192 ÷ 512 = 16. Each Thread Block contains 16 SIMD threads, since 512 ÷ 32 = 16.
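The arithmetic above can be checked with a short sketch. The function names `decompose` and `locate` are hypothetical, chosen only for illustration. The code assumes elements are laid out contiguously, so that consecutive indices fall into the same Thread Block and SIMD thread.

```python
# Sizes taken from the example: 8192-element vectors, 512-element
# Thread Blocks, 32 elements per SIMD instruction.
VECTOR_LENGTH = 8192
BLOCK_SIZE = 512
SIMD_WIDTH = 32


def decompose(n, block_size=BLOCK_SIZE, simd_width=SIMD_WIDTH):
    """Return (thread_blocks, simd_threads_per_block) for n elements."""
    thread_blocks = -(-n // block_size)  # ceiling division
    simd_threads_per_block = block_size // simd_width
    return thread_blocks, simd_threads_per_block


def locate(i, block_size=BLOCK_SIZE, simd_width=SIMD_WIDTH):
    """Map element index i to (thread block, SIMD thread, element slot).

    Assumes a contiguous layout; the slot is the position of the
    element within its 32-wide SIMD instruction.
    """
    block, offset = divmod(i, block_size)
    simd_thread, slot = divmod(offset, simd_width)
    return block, simd_thread, slot


if __name__ == "__main__":
    print(decompose(VECTOR_LENGTH))
    print(locate(545))
```

Under these assumptions, element 545 lands in Thread Block 1, SIMD thread 1, slot 1, because 545 = 1 × 512 + 1 × 32 + 1.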

Thread Block Scheduler
A Thread Block is assigned to a processor by the Thread Block Scheduler. The Thread Block Scheduler has some similarities to a control processor in a vector architecture. It determines the number of thread blocks needed for the loop and keeps allocating them to different multithreaded SIMD Processors until the loop is completed.
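A minimal sketch of this allocation loop follows. It is a simplification and not the hardware's actual policy: the function name `schedule_thread_blocks` is hypothetical, the processor count of 4 is an arbitrary assumption, and every Thread Block is assumed to take the same time, so the processors are refilled in rounds.

```python
from collections import deque


def schedule_thread_blocks(num_blocks, num_processors):
    """Hand Thread Blocks to SIMD Processors in rounds until none remain.

    Assumption: all blocks take equal time, so every processor becomes
    free at the same moment and blocks are dealt out round-robin.
    Data placement in local memories is ignored in this sketch.
    """
    pending = deque(range(num_blocks))
    assignment = {p: [] for p in range(num_processors)}
    while pending:
        for p in range(num_processors):
            if not pending:
                break
            assignment[p].append(pending.popleft())
    return assignment


if __name__ == "__main__":
    # 16 Thread Blocks from the example, 4 processors (assumed).
    for proc, blocks in schedule_thread_blocks(16, 4).items():
        print(proc, blocks)
```

With 16 Thread Blocks and 4 assumed processors, each processor runs four blocks one after another, and the loop is complete when the queue of pending blocks is empty.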

Multithreaded SIMD Processor
The figure shows a simplified block diagram of a multithreaded SIMD Processor. It has 16 SIMD lanes.

SIMD Thread Scheduler
The SIMD Thread Scheduler includes a scoreboard, so it knows which threads of SIMD instructions are ready to run. It sends them off to a dispatch unit to be run on the multithreaded SIMD Processor. It is identical to a hardware thread scheduler in a traditional multithreaded processor, except that it schedules threads of SIMD instructions.
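The role of the scoreboard can be illustrated with a toy model. Everything here is an assumption made for illustration: the function name `run_simd_threads`, the latency values, the rule that each instruction depends on the previous one in the same thread, and the lowest-id-first pick among ready threads. Real hardware tracks individual registers and uses its own selection policy.

```python
def run_simd_threads(programs, latencies):
    """Issue at most one SIMD instruction per cycle from a ready thread.

    programs:  dict mapping thread id -> list of operation names
    latencies: dict mapping operation name -> cycles until its result
               is available
    Returns a list of (cycle, thread_id, op) issue events.
    """
    pc = {t: 0 for t in programs}
    # Toy scoreboard: the earliest cycle at which each thread may issue.
    ready_at = {t: 0 for t in programs}
    trace = []
    cycle = 0
    while any(pc[t] < len(programs[t]) for t in programs):
        ready = [
            t for t in sorted(programs)
            if pc[t] < len(programs[t]) and ready_at[t] <= cycle
        ]
        if ready:
            t = ready[0]  # assumed policy: lowest thread id first
            op = programs[t][pc[t]]
            trace.append((cycle, t, op))
            pc[t] += 1
            ready_at[t] = cycle + latencies[op]
        cycle += 1
    return trace


if __name__ == "__main__":
    programs = {0: ["load", "mul"], 1: ["load", "mul"]}
    latencies = {"load": 4, "mul": 1}  # assumed values
    for event in run_simd_threads(programs, latencies):
        print(event)
```

In this run, thread 1 issues its load at cycle 1 while thread 0 is still waiting for its own load, which is the latency-hiding effect of multithreading in miniature. With more independent threads, the idle cycles 2 and 3 would also be filled.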

Two levels of hardware scheduler
GPU hardware has two levels of hardware schedulers: (1) the Thread Block Scheduler, which assigns Thread Blocks to multithreaded SIMD Processors and ensures that thread blocks are assigned to the processors whose local memories have the corresponding data; and (2) the SIMD Thread Scheduler within a SIMD Processor, which schedules when threads of SIMD instructions should run.

Exercises
What is the meaning of Grid in GPUs? What is the meaning of Thread Block in GPUs? What is the meaning of SIMD thread in GPUs? What are the hardware schedulers in GPUs?
