1 iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, Department of Computer Science, University of Wisconsin-Madison
Presented at ISCA 2012

2 Executive Summary
- Compiler/hardware co-design for efficient, general-purpose GPUs
- Exception support with 1.5% overhead (no more than 4%)
- Demand paging support with 2.5% overhead
- Context switch (no more than 4%)
- Exploiting speculation provides > 10% energy savings

3 Outline: Motivation and Background; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

4 CPU Evolution Retrospective
- IBM 360 era: precise exceptions as a performance tradeoff
- However, two key shifts in processor design:
  - Virtual memory no longer optional
  - Speculative execution on ILP processors

5 Precise exception handling and speculation were key enablers for modern CPUs

6 GPU Architectural Trends
- Significant interest in supporting demand paging
  - A single unified CPU-GPU address space
- Emerging necessity for supporting speculation
  - More workloads: “irregular” workloads
  - Handling reliability problems

7 Need general-purpose exception and speculation support for GPUs

8 Why not just borrow CPU ideas?
- CPUs use buffering to preserve architectural state
  - Future file, History file, Re-order Buffer, …
- But GPUs have 1000x as many registers
- Not practical!

9 Fundamental Challenges
1. Well-defined restart point in program
   - GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
   - Need to save 1000s of registers

10 Key Ideas of our Solution
1. Well-defined restart point in program (creation of restart points)
   - Idempotent code regions
   - Restartable regions producing the same effect
2. Preserving architectural state prior to restart (preservation of necessary state)
   - Regions constructed with small live state: 1 to 3 registers
   - Save only this live state
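
As a minimal illustration of the idempotence property named above (a Python sketch, not code from the talk; the function names and the dictionary standing in for a register file are assumptions made here), a region that never overwrites its own inputs can be re-run from its entry and produce the same effect, while a region that overwrites an input cannot:

```python
def idempotent_region(regs):
    # Reads r0 and r1, writes only r2: the inputs are never overwritten.
    regs["r2"] = regs["r0"] + regs["r1"]


def non_idempotent_region(regs):
    # Reads r0 and then overwrites it, so a second run sees a different input.
    regs["r0"] = regs["r0"] + regs["r1"]


def run_once(region, regs):
    state = dict(regs)
    region(state)
    return state


def run_with_restart(region, regs):
    """Run a region, pretend an exception hit at its end, then re-run it."""
    state = dict(regs)
    region(state)  # first (interrupted) execution
    region(state)  # restart from the region entry
    return state


initial = {"r0": 1, "r1": 2, "r2": 0}
same = run_with_restart(idempotent_region, initial) == run_once(idempotent_region, initial)
differs = run_with_restart(non_idempotent_region, initial) != run_once(non_idempotent_region, initial)
print(same, differs)  # True True
```

Only the first region is safe to restart without any saved copy of its inputs, which is why the restart points are placed at idempotent region boundaries.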

11 Outline: Challenges and Implications; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

12 Exception Support
- Idempotent regions mark restart points
- Register file provides all the required state!
- Idempotence guarantees correctness
- Implicit checkpoints using idempotence (Creation idea)
- [Figure: regions A and B; exception handler; re-execution of B]
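
The restart behaviour on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the iGPU hardware: `PageFault`, `load`, `run`, and the set of mapped pages are all hypothetical names, and "mapping the page" stands in for whatever the real exception handler does. The variable `rpc` plays the role of a restart PC that only advances when a region completes.

```python
class PageFault(Exception):
    def __init__(self, page):
        super().__init__(f"page {page} not mapped")
        self.page = page


def load(memory, mapped, page):
    # Simulated memory access: faults if the page is not mapped yet.
    if page not in mapped:
        raise PageFault(page)
    return memory[page]


def region_a(regs, memory, mapped):
    regs["r1"] = load(memory, mapped, 0) + 1


def region_b(regs, memory, mapped):
    # Idempotent: its inputs (r1 and page 1) are never overwritten.
    regs["r2"] = regs["r1"] * 2
    regs["r2"] = regs["r2"] + load(memory, mapped, 1)  # may fault mid-region


def run(regions, regs, memory, mapped):
    restarts = 0
    rpc = 0  # restart PC: index of the region currently executing
    while rpc < len(regions):
        try:
            regions[rpc](regs, memory, mapped)
            rpc += 1  # region completed, advance the restart point
        except PageFault as fault:
            mapped.add(fault.page)  # stand-in for the exception handler
            restarts += 1           # loop re-executes from the region entry
    return restarts


regs = {"r1": 0, "r2": 0}
memory = {0: 10, 1: 5}
mapped = {0}
restarts = run([region_a, region_b], regs, memory, mapped)
print(regs["r2"], restarts)  # 27 1
```

Region B is interrupted after its first write to `r2`, yet re-running it from the top still yields the correct result because nothing it reads was modified.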

13 Outline: Challenges and Implications; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation

14 Context Switch
- Exception is page fault
1. Cleanly remove process 1 (?)
2. Start another process and execute
3. Get page from disk concurrently
4. Restore process 1 (?)
5. Restart process 1 (?)
- [Figure: regions A and B; page-fault handling; restart of B marked "?"]

16 Context Switch
- Must save and restore architectural state
- But... GPUs have megabytes of register state
- Save only live state
- Save only live state at points of minimal live state

17 Context Switch
- Must save and restore architectural state
- But... GPUs have megabytes of register state
- Save only live state
- Save state at points of minimal live state
- Implicit minimum live state checkpoints using idempotence (Preserve idea)
- [Figure: regions A and B annotated with # live registers at candidate cut points; B is cut where 2 registers are live; exception handler]
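
To make "points of minimal live state" concrete, the following Python sketch runs a standard backward liveness pass over straight-line code and picks the interior point with the fewest live registers. It is an illustration under assumptions, not the iGPU compiler: the instruction encoding `(dest, sources)`, the function names, and the example program are all invented here.

```python
def live_before(instrs, live_out):
    """Backward liveness over straight-line code.

    instrs is a list of (dest, sources) tuples. Entry i of the result is the
    set of registers live just before instrs[i]; the last entry is live_out.
    """
    live = set(live_out)
    before = [None] * (len(instrs) + 1)
    before[len(instrs)] = set(live)
    for i in range(len(instrs) - 1, -1, -1):
        dest, srcs = instrs[i]
        live.discard(dest)  # dest is (re)defined here, so it is dead above
        live.update(srcs)   # sources must be live on entry to the instruction
        before[i] = set(live)
    return before


def min_live_cut(instrs, live_out):
    """Return (index, registers) for the interior point with the least live state."""
    before = live_before(instrs, live_out)
    cut = min(range(1, len(instrs)), key=lambda i: len(before[i]))
    return cut, sorted(before[cut])


program = [
    ("r3", ("r0", "r1")),  # r3 = r0 + r1
    ("r4", ("r0", "r2")),  # r4 = r0 * r2
    ("r5", ("r3", "r4")),  # r5 = r3 + r4
    ("r6", ("r5",)),       # r6 = f(r5)
    ("r7", ("r6", "r5")),  # r7 = r6 + r5
]
cut, to_save = min_live_cut(program, live_out={"r7"})
print(cut, to_save)  # 3 ['r5']
```

Cutting before instruction 3 means a context switch there needs to save one register rather than the three that are live at the start of the program.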

18 Outline: Challenges and Implications; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

19 Speculation
- Speculation generates state that is wrong
- Need even more buffers
- Recall: buffers are impractical for GPUs
- Use idempotence!
- Reduce re-execution cost by sub-dividing regions
- Implicit checkpoints with low re-execution overhead using idempotence (tuning the Creation idea)

20 Speculation
- [Figure: regions A, B (# live registers: 2), and C; B is sub-divided into B1 and B2; misspeculation]
- * Region construction details: Idempotent Processing, PLDI '12
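
Why sub-dividing a region lowers re-execution cost can be shown with a small cost model. The Python sketch below is an assumption-laden illustration, not a measurement from the talk: it assumes unit-cost instructions, a misspeculation equally likely at every instruction, and restart from the entry of the enclosing (sub-)region. The function names are hypothetical.

```python
from bisect import bisect_right


def reexecution_cost(boundaries, fault_index):
    """Instructions re-executed when instruction fault_index misspeculates.

    boundaries is the sorted list of (sub-)region start indices; it must
    begin with 0. Execution restarts at the start of the enclosing region.
    """
    start = boundaries[bisect_right(boundaries, fault_index) - 1]
    return fault_index - start + 1


def average_cost(boundaries, length):
    """Mean re-execution cost, assuming each instruction is equally likely to fault."""
    return sum(reexecution_cost(boundaries, i) for i in range(length)) / length


whole = average_cost([0], 8)     # one region B of 8 instructions
split = average_cost([0, 4], 8)  # B sub-divided into B1 and B2
print(whole, split)  # 4.5 2.5
```

Finer regions cost less to replay, but each extra boundary is another point where live state has to be preserved, so the split is a tuning decision rather than a free win.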

21 Outline: Motivation and Background; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

22 iGPU Architecture
- [Figure: Compiler, Hardware, Application]

23 iGPU Architecture - Software
- Form regions (Creation idea)
  - Region formation: region marker instructions
- Preserve state (Preserve idea)
  - State preservation: register re-assignment, moves and spills
- [Figure label: Reg. pressure]

24 iGPU Architecture - Software
- [Figure: Kernel Source Code -> Source Code Compiler -> Device Code Generator -> Device Code]

25 iGPU Architecture - Software
- [Figure: Kernel Source Code -> Source Code Compiler -> Device Code Generator -> Idempotent Device Code, with a Region formation step added]

26 iGPU Architecture - Software
- [Figure: Kernel Source Code -> Source Code Compiler -> Device Code Generator -> Idempotent Device Code, with Region formation and State preservation steps added]
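
A rough feel for region formation can be had from the Python sketch below. It is a deliberately simplified greedy pass and not the algorithm used by the iGPU compiler (the slides defer those details to the PLDI '12 work): it models registers only, ignores memory, and assumes no instruction reads its own destination. The encoding `(dest, sources)` and the name `form_regions` are invented for this example. The rule it applies is that a region stays idempotent as long as it never overwrites one of its own inputs, so a region marker is placed just before any instruction that would.

```python
def form_regions(instrs):
    """Greedy region formation over straight-line register code.

    instrs is a list of (dest, sources) tuples. Returns the list of indices
    at which a region marker would be placed (region start points).
    """
    starts = [0]
    inputs = set()   # registers read in this region before being written in it
    written = set()  # registers written so far in this region
    for i, (dest, srcs) in enumerate(instrs):
        if dest in srcs:
            # Cutting cannot fix this; the destination must be renamed first.
            raise ValueError(f"instruction {i} overwrites its own source {dest}")
        if dest in inputs:
            # Writing dest would clobber a region input: start a new region here.
            starts.append(i)
            inputs, written = set(), set()
        inputs.update(s for s in srcs if s not in written)
        written.add(dest)
    return starts


program = [
    ("r2", ("r0", "r1")),  # r2 = r0 + r1
    ("r3", ("r2",)),       # r3 = r2 * 2
    ("r0", ("r3",)),       # r0 = r3       (clobbers input r0)
    ("r4", ("r0", "r1")),  # r4 = r0 + r1
    ("r1", ("r4",)),       # r1 = r4       (clobbers input r1)
]
print(form_regions(program))  # [0, 2, 4]
```

The `ValueError` case corresponds to the state-preservation side of the slide: when cutting cannot help, the register has to be re-assigned, which is where the register-pressure cost comes from.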

27 iGPU Architecture - Hardware
- [Figure: GPU block diagram with L2 Cache and SIMD Processor cores; each core has L1 cache & TLB, General Purpose Registers, Fetch Unit, Decode, and the added RPCs (not to scale)] (Creation idea)

28 iGPU Architecture - Hardware
- [Figure: General Purpose Registers vs. Restart PC (RPC) Register, drawn to scale]
- 2 RPCs per warp, one each for Sparse and Short regions
- Compare to 1024 GPRs per warp (32 x 32)
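
The scale comparison on this slide reduces to a one-line ratio. The Python snippet below works it out under two assumptions that the slide does not state: that "32 x 32" means 32 threads per warp times 32 registers per thread, and that an RPC is about as wide as one GPR entry.

```python
THREADS_PER_WARP = 32  # assumption: first factor of "32 x 32"
REGS_PER_THREAD = 32   # assumption: second factor of "32 x 32"
RPCS_PER_WARP = 2      # from the slide: one each for Sparse and Short regions

gprs_per_warp = THREADS_PER_WARP * REGS_PER_THREAD
overhead = RPCS_PER_WARP / gprs_per_warp  # assumes RPC width ~ GPR width
print(gprs_per_warp, f"{overhead:.2%}")   # 1024 0.20%
```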

29 iGPU Architecture - Hardware
- State preservation handled purely by compiler! (Preserve idea)
- Not hardware’s responsibility

30 Outline: Motivation and Background; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

31 Evaluation

32 Evaluation: Voltage Speculation

33 Outline: Motivation and Background; iGPU Mechanisms (General exception handling, Context switching, Speculation support); iGPU Architecture (Software, Hardware); Evaluation; Conclusion

34 Executive Summary
- Compiler/hardware co-design for efficient, general-purpose GPUs
- Exception support with 1.5% overhead (no more than 4%)
- Demand paging support with 2.5% overhead
- Context switch (no more than 4%)
- Exploiting speculation provides > 10% energy savings

35 Conclusions
- Exception support for GPUs is practical
  - Enables better integration with CPUs in CPU-GPU architectures
- Speculative execution on GPUs, both for performance and reliability, presents interesting possibilities in the context of “irregular” workloads

36 Questions

