Department of Computer Science iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf Karthikeyan Sankaralingam Vertical.

Presentation transcript:

Department of Computer Science
iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, University of Wisconsin–Madison
Presented at ISCA 2012

Executive Summary
- Compiler/hardware co-design for efficient, general-purpose GPUs
- Exception support with 1.5% overhead (no more than 4%)
- Demand paging support with 2.5% overhead
- Context switch support with no more than 4% overhead
- Exploiting speculation provides > 10% energy savings

Outline
- Motivation and Background
- iGPU Mechanisms
  - General exception handling
  - Context switching
  - Speculation support
- iGPU Architecture
  - Software
  - Hardware
- Evaluation
- Conclusion

CPU Evolution Retrospective
- IBM 360 era: precise exceptions as a performance tradeoff
- However, two key shifts in processor design:
  - Virtual memory no longer optional
  - Speculative execution on ILP processors

Precise exception handling and speculation were key enablers for modern CPUs

GPU Architectural Trends
- Significant interest in supporting demand paging
- Emerging necessity for supporting speculation
  - More workloads, including "irregular" workloads
  - Handling reliability problems
- A single unified CPU-GPU address space

GPUs need general-purpose exception and speculation support

Why not just borrow CPU ideas?
- CPUs use buffering to preserve architectural state
  - Future file, history file, re-order buffer, …
- But GPUs have 1000x as many registers
- Not practical!

Fundamental Challenges
1. Well-defined restart point in the program
   - GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
   - Need to save 1000s of registers

Key Ideas of our Solution
1. Well-defined restart point in the program (creation of restart points)
   - Idempotent code regions
   - Restartable regions producing the same effect
2. Preserving architectural state prior to restart (preservation of necessary state)
   - Regions constructed with small live state: 1 to 3 registers
   - Save only this live state
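To make the notion of an idempotent region concrete, here is a minimal Python sketch. It is not taken from the talk: the two region functions and the `run` helper are hypothetical, and a dictionary stands in for registers and memory. It only illustrates that a region which never overwrites its own inputs produces the same effect when restarted, while a region that reads and then overwrites an input does not.

```python
def non_idempotent_region(mem):
    # Reads x and then overwrites it, so a restart sees a modified input.
    mem["x"] = mem["x"] + 1


def idempotent_region(mem):
    # Reads x but writes only y, so the inputs survive a restart.
    mem["y"] = mem["x"] + 1


def run(region, restarts):
    """Execute `region` once, then re-execute it from its start `restarts` times."""
    mem = {"x": 1, "y": 0}  # hypothetical initial state
    for _ in range(1 + restarts):
        region(mem)
    return mem


if __name__ == "__main__":
    for region in (idempotent_region, non_idempotent_region):
        same = run(region, restarts=0) == run(region, restarts=1)
        print(f"{region.__name__}: restart is harmless = {same}")
```

Under these assumptions the first region can be re-executed after an exception with no saved checkpoint, which is the property the slide relies on.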

Outline
- Challenges and Implications
- iGPU Mechanisms
  - General exception handling
  - Context switching
  - Speculation support
- iGPU Architecture
  - Software
  - Hardware
- Evaluation
- Conclusion

Exception Support
- Idempotent regions mark restart points
- Register file provides all the required state!
- Idempotence guarantees correctness
- Implicit checkpoints using idempotence
[Figure: regions A and B, the exception handler, then B again; labeled "Creation idea"]
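The restart mechanism can be sketched as a toy interpreter. This is an illustration under stated assumptions, not the iGPU hardware: the instruction tuples, the `execute` function, and the single `rpc` variable are hypothetical, and the "exception handler" is reduced to a comment. A region marker records a restart PC; when an instruction faults, execution resumes from that restart PC using whatever is currently in the register file.

```python
# Hypothetical toy program: ("region",) marks the start of an idempotent region,
# ("add", dst, a, b) computes regs[dst] = regs[a] + regs[b].
PROGRAM = [
    ("region",),
    ("add", "r2", "r0", "r1"),
    ("add", "r3", "r2", "r1"),
    ("region",),
    ("add", "r0", "r3", "r2"),  # r0 is not read in this region, so overwriting it is safe
]


def execute(program, regs, fault_at=None):
    """Run `program`; a fault at index `fault_at` restarts from the last region marker."""
    pc = 0
    rpc = 0                    # restart PC
    restarts = 0
    pending_fault = fault_at   # the fault fires once, then the handler has fixed its cause
    while pc < len(program):
        op = program[pc]
        if op[0] == "region":
            rpc = pc           # implicit checkpoint: remember where to restart
        elif pc == pending_fault:
            pending_fault = None  # an exception handler would run here
            restarts += 1
            pc = rpc           # re-execute the region from its start
            continue
        else:
            _, dst, a, b = op
            regs[dst] = regs[a] + regs[b]
        pc += 1
    return regs, restarts


if __name__ == "__main__":
    start = {"r0": 1, "r1": 2, "r2": 0, "r3": 0}
    print(execute(PROGRAM, dict(start)))
    print(execute(PROGRAM, dict(start), fault_at=2))
```

In this sketch the faulting run ends with the same register values as the fault-free run, with one restart recorded, because each region only overwrites registers it has not read.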


Context Switch
[Figure: regions A and B; the exception in B is a page fault]
Page-fault handling:
1. Cleanly remove process 1?
2. Start another process and execute
3. Get page from disk concurrently
4. Restore process 1?
5. Restart process 1?


Context Switch
- Must save and restore architectural state
- But GPUs have megabytes of register state
- Save only live state
- Save only live state at points of minimal live state

Context Switch
- Must save and restore architectural state
- But GPUs have megabytes of register state
- Save only live state
- Save state at points of minimal live state
- Implicit minimum-live-state checkpoints using idempotence
[Figure: regions A and B annotated with # live registers; a candidate cut point in B where # live registers is 2; exception handler; labeled "Preserve idea"]
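Choosing a point of minimal live state can be sketched with a standard backward liveness pass over straight-line code. The instruction list, register names, and the `live_before` and `best_cut` helpers below are hypothetical and are not the iGPU compiler's algorithm; the sketch only shows how counting live registers at each point exposes a cheap place to cut.

```python
def live_before(instrs, live_out):
    """Backward liveness over straight-line code.

    Each instruction is (dst, [srcs]). Returns the set of live registers
    immediately before each instruction.
    """
    live = set(live_out)
    result = [None] * len(instrs)
    for i in range(len(instrs) - 1, -1, -1):
        dst, srcs = instrs[i]
        live.discard(dst)   # dst is (re)defined here, so its old value is dead above
        live.update(srcs)   # sources must be live on entry to the instruction
        result[i] = frozenset(live)
    return result


def best_cut(instrs, live_out):
    """Index of the earliest point with the fewest live registers, plus that live set."""
    sets = live_before(instrs, live_out)
    idx = min(range(len(sets)), key=lambda i: len(sets[i]))
    return idx, sets[idx]


if __name__ == "__main__":
    # Hypothetical kernel fragment.
    instrs = [
        ("t0", ["x", "y"]),
        ("t1", ["z", "w"]),
        ("t2", ["t0", "t1"]),
        ("acc", ["t2"]),
        ("u", ["acc", "acc"]),
        ("v", ["acc", "u"]),
    ]
    print([len(s) for s in live_before(instrs, {"v"})])
    print(best_cut(instrs, {"v"}))
```

For this fragment the live-register counts before each instruction are 4, 3, 2, 1, 1, 2, so a cut placed before the fourth instruction needs only `t2` to be preserved.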


Speculation
- Speculation generates state that is wrong
- Need even more buffers
- Recall: buffers are impractical for GPUs
- Use idempotence!
- Reduce re-execution cost by sub-dividing regions
- Implicit checkpoints with low re-execution overhead using idempotence (tuning the Creation idea)
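Why sub-dividing regions lowers re-execution cost can be seen in a deliberately simple model. This is an assumption-laden sketch, not a result from the talk: it assumes equally sized regions, a misspeculation detected at a uniformly random instruction, and a cost equal to the number of instructions executed since the last region boundary. It ignores the extra region markers and state preservation that smaller regions would require.

```python
def wasted_work(detect_at, region_size):
    """Instructions re-executed when a misspeculation is detected at index `detect_at`.

    Execution restarts at the most recent region boundary, so everything
    executed since that boundary is repeated.
    """
    return detect_at % region_size


def average_wasted_work(total_instrs, region_size):
    """Average re-executed instructions over all possible detection points."""
    total = sum(wasted_work(k, region_size) for k in range(total_instrs))
    return total / total_instrs


if __name__ == "__main__":
    for size in (64, 16, 4):  # hypothetical region sizes
        print(size, average_wasted_work(64, size))
```

In this model, splitting one 64-instruction region into four 16-instruction regions cuts the average re-executed work from 31.5 to 7.5 instructions. The real tradeoff also includes the cost of creating more restart points.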

Speculation
[Figure: regions A, B, and C; B (# live registers: 2) is sub-divided into B1 and B2; misspeculation]
* Region construction details: Idempotent Processing, PLDI '12


iGPU Architecture
[Figure: Application, Compiler, Hardware]

iGPU Architecture - Software
- Form regions (Creation idea)
  - Region formation
  - Region marker instructions
- Preserve state (Preserve idea)
  - State preservation
  - Register re-assignment, moves and spills
[Figure label: register pressure]

iGPU Architecture - Software
[Figure: kernel source code, source code compiler, device code generator, device code]

iGPU Architecture - Software
[Figure: kernel source code, source code compiler, device code generator with region formation, idempotent device code]

iGPU Architecture - Software
[Figure: kernel source code, source code compiler, device code generator with region formation and state preservation, idempotent device code]

iGPU Architecture - Hardware
[Figure: SIMD processor with L2 cache; each core has an L1 cache & TLB, general purpose registers, a fetch unit, decode, and RPCs (not to scale); labeled "Creation idea"]

iGPU Architecture - Hardware
[Figure: general purpose registers and the restart PC (RPC) register, to scale]
- 2 RPCs per warp: one each for Sparse and Short regions
- Compare to 1024 GPRs per warp (32 x 32)

iGPU Architecture - Hardware
- State preservation handled purely by compiler! (Preserve idea)
- Not hardware's responsibility


Evaluation

Evaluation – Voltage Speculation


Executive Summary
- Compiler/hardware co-design for efficient, general-purpose GPUs
- Exception support with 1.5% overhead (no more than 4%)
- Demand paging support with 2.5% overhead
- Context switch support with no more than 4% overhead
- Exploiting speculation provides > 10% energy savings

Conclusions
- Exception support for GPUs is practical
  - Enables better integration with CPUs in CPU-GPU architectures
- Speculative execution on GPUs
  - Both for performance and reliability
  - Presents interesting possibilities in the context of "irregular" workloads

Questions