Published by Barrett Cater. Modified over 9 years ago.
iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, Department of Computer Science, University of Wisconsin-Madison
Presented at ISCA 2012
Executive Summary
Compiler/hardware co-design for efficient, general-purpose GPUs
Exception support with 1.5% overhead (no more than 4%)
Demand paging support with 2.5% overhead
Context switch (no more than 4%)
Exploiting speculation provides > 10% energy savings
Outline
Motivation and Background
iGPU Mechanisms
  General exception handling
  Context switching
  Speculation support
iGPU Architecture
  Software
  Hardware
Evaluation
Conclusion
CPU Evolution Retrospective
IBM 360 era: precise exceptions as a performance tradeoff
However, two key shifts in processor design:
  Virtual memory no longer optional
  Speculative execution on ILP processors
Precise exception handling and speculation were key enablers for modern CPUs
GPU Architectural Trends
Significant interest in supporting demand paging
  A single unified CPU-GPU address space
Emerging necessity for supporting speculation
  More workloads: “irregular” workloads
  Handling reliability problems
Need general-purpose exception and speculation support for GPUs
Why not just borrow CPU ideas?
CPUs use buffering to preserve architectural state
  Future file, history file, re-order buffer, ...
But GPUs have 1000x as many registers
Not practical!
Fundamental Challenges
1. Well-defined restart point in the program
   GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
   Need to save 1000s of registers
Key Ideas of our Solution
1. Well-defined restart point in the program (creation of restart points)
   Idempotent code regions
   Restartable regions producing the same effect
2. Preserving architectural state prior to restart (preservation of necessary state)
   Regions constructed with small live state: 1 to 3 registers
   Save only this live state
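The idea that an idempotent region can be restarted with the same effect can be illustrated with a small sketch. This is a toy model, not the iGPU compiler or hardware: memory is a Python dict, a region is a plain function, and the names `idempotent_region`, `non_idempotent_region`, and `run_with_restart` are hypothetical. The property shown is that a region which never overwrites its own inputs gives the same result whether it runs once or is restarted.

```python
# Toy model (assumption): memory is a dict and a region is a function over it.

def non_idempotent_region(mem):
    # Reads x and then overwrites it, so a re-run sees a different input.
    mem["x"] = mem["x"] + 1


def idempotent_region(mem):
    # Reads x but writes only y, so the region's inputs are never overwritten.
    mem["y"] = mem["x"] + 1


def run_with_restart(region, mem, restarts=1):
    """Run a region, then re-run it `restarts` times as if a fault forced a restart."""
    for _ in range(1 + restarts):
        region(mem)
    return mem


once = run_with_restart(idempotent_region, {"x": 5, "y": 0}, restarts=0)
twice = run_with_restart(idempotent_region, {"x": 5, "y": 0}, restarts=1)
print(once == twice)  # True: restarting produces the same effect

bad_once = run_with_restart(non_idempotent_region, {"x": 5}, restarts=0)
bad_twice = run_with_restart(non_idempotent_region, {"x": 5}, restarts=1)
print(bad_once == bad_twice)  # False: the restart changed the result
```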
Exception Support (Creation idea)
Idempotent regions mark restart points
Register file provides all the required state!
Idempotence guarantees correctness
Implicit checkpoints using idempotence
[Figure: regions A and B, with the exception handler followed by B again]
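A minimal sketch of restarting from a region marker follows. It assumes a made-up three-opcode instruction format (`region`, `load`, `add`), a single restart PC, and a stand-in handler that simply maps the missing page; none of this is the actual iGPU instruction set or handler. It only shows that when every region is idempotent, jumping back to the last marker after the handler runs yields the correct result without any saved checkpoint.

```python
def run(program, memory, mapped_pages):
    """Interpret a toy program; on a page fault, restart at the last region marker."""
    regs = {}
    pc = 0        # program counter
    rpc = 0       # restart PC: index of the most recent region marker
    executed = 0  # instructions issued, including re-executed ones
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        if op == "region":
            rpc = pc  # implicit checkpoint: just remember where the region starts
        elif op == "load":
            dst, page = args
            if page not in mapped_pages:
                mapped_pages.add(page)  # stand-in exception handler (assumption)
                pc = rpc                # re-execute the current region from its start
                continue
            regs[dst] = memory[page]
        elif op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        pc += 1
    return regs, executed


program = [
    ("region",),
    ("load", "r1", 0),
    ("region",),
    ("load", "r2", 1),          # page 1 is unmapped, so this faults once
    ("add", "r3", "r1", "r2"),
]
regs, executed = run(program, memory={0: 7, 1: 35}, mapped_pages={0})
print(regs["r3"], executed)  # 42 7
```

The program has five instructions and seven are issued: the fault costs only the re-execution of the second region's marker and load.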
Context Switch
Exception is a page fault
Page-fault handling:
1. Cleanly remove process 1 ?
2. Start another process and execute
3. Get page from disk concurrently
4. Restore process 1 ?
5. Restart process 1 ?
[Figure: regions A and B; page-fault handling; B ?]
Context Switch (Preserve idea)
Must save and restore architectural state
But... GPUs have megabytes of register state
Save only live state
Save state at points of minimal live state
Implicit minimum-live-state checkpoints using idempotence
[Figure: regions A and B annotated with # live registers, a candidate cut point, and the exception handler]
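Picking a cut point with minimal live state can be sketched with a standard backward liveness pass over straight-line code. The instruction tuples, register names, and the helper names `live_sets` and `best_cut` are illustrative assumptions; the actual iGPU region construction is more involved and is not reproduced here.

```python
def live_sets(instrs, live_out):
    """Backward liveness for straight-line code.

    instrs: list of (dest, [sources]).
    Returns a list where entry i is the set of registers live just before
    instruction i, and the final entry is live_out.
    """
    live = set(live_out)
    points = [set(live)]
    for dest, srcs in reversed(instrs):
        live.discard(dest)   # dest is defined here, so it is not live above
        live.update(srcs)    # sources must be live above this instruction
        points.append(set(live))
    points.reverse()
    return points


def best_cut(instrs, live_out):
    """Return (index, live_count) of the interior point with the fewest live registers."""
    points = live_sets(instrs, live_out)
    interior = range(1, len(instrs))  # cut points strictly between instructions
    idx = min(interior, key=lambda i: len(points[i]))
    return idx, len(points[idx])


instrs = [
    ("r2", ["r0"]),        # r2 = load a[r0]
    ("r3", ["r1"]),        # r3 = load b[r1]
    ("r4", ["r2", "r3"]),  # r4 = r2 * r3
    ("r5", ["r4", "r0"]),  # r5 = r4 + r0
    ("r6", ["r5"]),        # r6 = r5 * 2
]
counts = [len(s) for s in live_sets(instrs, live_out={"r6"})]
print(counts)                     # [2, 3, 3, 2, 1, 1]
print(best_cut(instrs, {"r6"}))   # (4, 1)
```

Cutting just before instruction 4 means only one register (r5) needs to be saved for a context switch at that point, rather than the whole register file.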
Speculation (tuning the Creation idea)
Speculation generates state that is wrong
  Need even more buffers
  Recall: buffers are impractical for GPUs
Use idempotence!
  Reduce re-execution cost by sub-dividing regions
Implicit checkpoints with low re-execution overhead using idempotence
Speculation
[Figure: regions A, B (# live registers: 2), and C; B is sub-divided into B1 and B2; misspeculation]
* Region construction details: Idempotent Processing, PLDI '12
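The effect of sub-dividing a region on re-execution cost can be shown with simple arithmetic. The model below is an assumption made for illustration: a region of 8 instructions, a misspeculation equally likely to be detected at any instruction, and a cost equal to the number of already-completed instructions that must be redone. The function names are hypothetical.

```python
def reexecution_cost(region_starts, fault_index):
    """Completed instructions redone when restarting at the nearest region start
    at or before the faulting instruction."""
    start = max(s for s in region_starts if s <= fault_index)
    return fault_index - start


def average_cost(region_starts, length):
    """Mean re-execution cost if the fault is equally likely at each instruction."""
    total = sum(reexecution_cost(region_starts, i) for i in range(length))
    return total / length


print(average_cost([0], 8))     # 3.5  (one region B of 8 instructions)
print(average_cost([0, 4], 8))  # 1.5  (B split into B1 and B2 of 4 each)
```

Under these assumptions, splitting the region in half cuts the average re-execution cost from 3.5 to 1.5 instructions, at the price of one more region marker.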
iGPU Architecture
[Figure: Application, Compiler, Hardware]
iGPU Architecture - Software
Form regions (Creation idea)
  Region formation: region marker instructions
Preserve state (Preserve idea)
  State preservation: register re-assignment, moves and spills
[Figure label: Reg. pressure]
iGPU Architecture - Software
[Figure: compilation flow with boxes Kernel Source Code, Source Code Compiler, Device Code Generator, and Device Code; later builds of the slide add Region formation and State preservation and relabel the output Idempotent Device Code]
iGPU Architecture - Hardware (Creation idea)
[Figure: block diagram showing L2 Cache, SIMD Processor, Core, L1 cache & TLB, General Purpose Registers, Fetch Unit, Decode, and RPCs (not to scale)]
iGPU Architecture - Hardware
[Figure: General Purpose Registers and Restart PC Register (to scale)]
2 RPCs per warp, one each for Sparse and Short regions
Compare to 1024 GPRs per warp (32 x 32)
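The comparison on this slide can be worked through as a quick ratio. The sketch counts registers only, using the slide's figures of 2 RPCs and 32 x 32 GPRs per warp; register widths are not given in the slide, so no bit-level storage figure is derived.

```python
# Figures taken from the slide; the variable names are illustrative.
GPRS_PER_WARP = 32 * 32   # 1024 general purpose registers per warp
RPCS_PER_WARP = 2         # one restart PC each for Sparse and Short regions

ratio = RPCS_PER_WARP / GPRS_PER_WARP
print(GPRS_PER_WARP)      # 1024
print(f"{ratio:.4%}")     # 0.1953%
```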
iGPU Architecture - Hardware (Preserve idea)
State preservation handled purely by the compiler!
Not the hardware's responsibility
Evaluation
Evaluation – Voltage Speculation
Conclusions
Exception support for GPUs is practical
  Enables better integration with CPUs in CPU-GPU architectures
Speculative execution on GPUs
  Both for performance and reliability
  Presents interesting possibilities in the context of “irregular” workloads
Questions