Chris Rossbach, Microsoft Research Jon Currey, Microsoft Research Emmett Witchel, University of Texas at Austin HotOS 2011.

Lots of GPUs
Must they be so hard to use?
We need dataflow… support in the OS

• There are lots of GPUs!
  ◦ ~ more powerful than CPUs
  ◦ Great for Halo and HPC, but little else
  ◦ Underutilized
• GPUs are difficult to program
  ◦ SIMD execution model
  ◦ Cannot access main memory
  ◦ Treated as an I/O device by the OS

A. These two things are related
B. We need OS abstractions (dataflow)

[Diagram: programmer-visible interface → OS-level abstractions → hardware interface]

[Diagram: programmer-visible interface → 1 OS-level abstraction!]
The programmer gets to work with great abstractions… Why is this a problem?

• We expect traditional OS guarantees:
  ◦ Fairness
  ◦ Isolation
  No user-space runtime can provide these!
• No kernel-facing interface
  ◦ The OS cannot use the GPU
  ◦ The OS cannot manage the GPU
• Lost optimization opportunities
  ◦ Suboptimal data movement
  ◦ Poor composability

Test machine: Windows 7 x64, 8 GB RAM, Intel Core 2 Quad 2.66 GHz, nVidia GeForce GT230
[Figure: higher is better.]
The CPU scheduler and the GPU scheduler are not integrated!

[Figure, same test machine: flatter lines are better.]

#> capture | filter | detect | hidinput &

Data flow: capture → raw images → filter → point cloud → detect → "hand" events → hidinput → HID input → OS

• Data crosses the user/kernel (u/k) boundary
• Double-buffering between camera drivers and GPU drivers
• Pipes between filter and detect move data to and from the GPU even when it's already there
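The cost of those pipes can be illustrated with a toy count of host↔GPU copies. This is a minimal sketch, not a measurement. It assumes filter and detect run on the GPU while capture and hidinput run on the CPU, which is how the slide's remark about data that is "already there" reads. The names `STAGES` and `count_copies` are hypothetical.

```python
# Toy model: count host<->GPU copies in capture | filter | detect | hidinput.
# Assumption: filter and detect run on the GPU; capture and hidinput on the CPU.
STAGES = [("capture", "cpu"), ("filter", "gpu"), ("detect", "gpu"), ("hidinput", "cpu")]


def count_copies(stages, pipes_via_host):
    """Return the number of host<->GPU transfers needed to run the pipeline.

    pipes_via_host=True models Unix pipes: every stage boundary passes
    through host memory, so a GPU stage copies out after it runs and the
    next GPU stage copies in again.
    pipes_via_host=False models a device-aware OS that copies only when
    consecutive stages sit on different devices.
    """
    copies = 0
    for (_, src), (_, dst) in zip(stages, stages[1:]):
        if pipes_via_host:
            copies += (src == "gpu") + (dst == "gpu")
        else:
            copies += (src != dst)
    return copies


if __name__ == "__main__":
    print("pipes:", count_copies(STAGES, True))
    print("device-aware:", count_copies(STAGES, False))
```

Under these assumptions the pipe-based pipeline needs four transfers and the device-aware one needs two. The two that disappear are the round trip between filter and detect.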

• Process API analogues
• IPC API analogues
• Scheduler hint analogues
• Abstractions that enable:
  ◦ Composition
  ◦ Data movement optimization
  ◦ Easier programming

• ptask (parallel task)
  ◦ Analogous to a process for GPU execution
  ◦ Has a priority for fairness
  ◦ Has a list of input/output resources (e.g. stdin, stdout…)
• ports
  ◦ A data source or sink (e.g. a buffer in GPU memory)
  ◦ Can be mapped to ptask inputs/outputs
• channels
  ◦ Similar to pipes
  ◦ Connect arbitrary ports
  ◦ Specialize to eliminate double-buffering

Computation expressed as a graph
[Diagram nodes: process: capture; ptask: filter; ptask: detect; process: hidinput. Labels: usbsrc, rawimg, cloud, det_inp, hands, hid_in. Legend: process, ptask, port, channel.]

Related work:
• Synthesis [Massalin 89] (streams, pumps)
• Dryad [Isard 07]
• StreamIt [Thies 02]
• Offcodes [Weinsberg 08]
• others…
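The graph can be written down as plain data. The sketch below is illustrative only: the classes `Port`, `Channel` and `Ptask` and the helpers `build_graph` and `path` are hypothetical names, not an API from the talk. It also assumes that the six diagram labels are port names, connected in the order capture → filter → detect → hidinput.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """A data source or sink, e.g. a buffer in GPU memory."""
    name: str
    owner: str = ""  # name of the ptask or process the port is mapped to


@dataclass
class Channel:
    """Pipe-like link between two arbitrary ports."""
    src: Port
    dst: Port


@dataclass
class Ptask:
    """Analogue of a process for GPU execution."""
    name: str
    priority: int = 0  # a scheduler could use this for fairness
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def bind(self, port, as_input):
        """Map a port to one of this ptask's inputs or outputs."""
        port.owner = self.name
        (self.inputs if as_input else self.outputs).append(port)
        return port


def build_graph():
    """Build the capture -> filter -> detect -> hidinput graph (assumed wiring)."""
    filt, detect = Ptask("filter"), Ptask("detect")
    usbsrc = Port("usbsrc", owner="capture")   # written by the capture process
    hid_in = Port("hid_in", owner="hidinput")  # read by the hidinput process
    rawimg = filt.bind(Port("rawimg"), as_input=True)
    cloud = filt.bind(Port("cloud"), as_input=False)
    det_inp = detect.bind(Port("det_inp"), as_input=True)
    hands = detect.bind(Port("hands"), as_input=False)
    channels = [Channel(usbsrc, rawimg), Channel(cloud, det_inp), Channel(hands, hid_in)]
    return [filt, detect], channels


def path(channels):
    """List port owners along the channels, in order, without adjacent repeats."""
    order = []
    for ch in channels:
        for owner in (ch.src.owner, ch.dst.owner):
            if not order or order[-1] != owner:
                order.append(owner)
    return order
```

Because the channels are explicit objects rather than anonymous pipes, a runtime holding this graph could see that `cloud` and `det_inp` both live in GPU memory and skip the copy through host memory.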

[Same graph, with channels annotated: USB → GPU mem; GPU mem → GPU mem]
Eliminate unnecessary communication…

[Same graph]
• Eliminates unnecessary communication
• Eliminates u/k crossings, computation
• New data triggers new computation
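"New data triggers new computation" is the usual dataflow firing rule: a node runs as soon as every one of its inputs holds a value, and its result is pushed straight to the next node. Here is a minimal sketch of that rule, with a hypothetical `Node` class and stand-in lambdas in place of real GPU kernels.

```python
from collections import deque


class Node:
    """A computation that fires when every input queue holds a value."""

    def __init__(self, name, fn, n_inputs=1):
        self.name, self.fn = name, fn
        self.inputs = [deque() for _ in range(n_inputs)]
        self.downstream = []  # list of (node, input_index) pairs

    def connect(self, node, index=0):
        """Route this node's result to input `index` of `node`."""
        self.downstream.append((node, index))

    def push(self, value, index=0, log=None):
        """Deliver a value; run the node while all inputs are ready."""
        self.inputs[index].append(value)
        while all(self.inputs):  # an empty deque is falsy
            args = [q.popleft() for q in self.inputs]
            result = self.fn(*args)
            if log is not None:
                log.append(self.name)
            for node, i in self.downstream:
                node.push(result, i, log)


def demo():
    """Push one fake image through filter -> detect -> hidinput."""
    results, log = [], []
    filt = Node("filter", lambda img: [p for p in img if p > 0])  # stand-in kernel
    detect = Node("detect", lambda cloud: len(cloud))             # stand-in kernel
    sink = Node("hidinput", results.append)
    filt.connect(detect)
    detect.connect(sink)
    filt.push([0, 3, 5], log=log)  # arrival of data is the only trigger
    return results, log


if __name__ == "__main__":
    print(demo())
```

Nothing polls and nothing is scheduled by hand. A single `push` at the head of the graph drives every stage, which is the behavior the slide asks the OS to provide.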

• The OS must get involved in GPU support
• Current approaches:
  ◦ Require wasteful data movement
  ◦ Inhibit modularity/reuse
  ◦ Cannot guarantee fairness, isolation
• OS-level abstractions are required

Questions?