Codeplay CEO © Copyright 2012 Codeplay Software Ltd 45 York Place Edinburgh EH1 3HP United Kingdom Visit us at www.codeplay.com The unique challenges of.

Presentation transcript:

The unique challenges of producing compilers for GPUs
Andrew Richards, Codeplay CEO
© Copyright 2012 Codeplay Software Ltd, 45 York Place, Edinburgh EH1 3HP, United Kingdom. Visit us at www.codeplay.com

The GPU is taking over from the CPU
Why? How? And what does this mean for the compiler developer?

Growth of the GPU in HPC
GPU computing is taking over the Supercomputing conference floor
Source: NVIDIA

The growth of the GPU in mobile: Apple’s A4-A6X
[Figure: Apple A4, A5, A5X, A6 and A6X chips with the CPU and GPU areas labelled]
Source: Chipworks, analysis/resources/recent-teardowns/2012/03/the-apple-a5x-versus-the-a5-and-a4-%E2%80%93-big-is-beautiful/

What is all this power being used for?
Motion blur, depth of field, bloom
1920 x 1080 x 60 fps x 3 (RGB) x 4x4 (samples) x 4 (flops) = ~23 GFLOPS and ~23 GB/s
This is just a simple example!
Source: Guerrilla Games, Killzone 2
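The estimate on this slide can be checked with a few lines of arithmetic. The sketch below multiplies out the factors quoted above; the constant names are illustrative, and reading the "~23 GB/s" figure as four bytes moved per sample is an assumption, since the slide does not say how the bandwidth number was derived. The product comes to roughly 23.9, which the slide rounds to ~23.

```python
# Back-of-envelope throughput for a full-screen post-processing pass.
# The factors are the ones quoted on the slide.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
CHANNELS = 3              # RGB
SAMPLES = 4 * 4           # 4x4 samples per pixel
FLOPS_PER_SAMPLE = 4
BYTES_PER_SAMPLE = 4      # assumption: one 32-bit value moved per sample


def samples_per_second():
    """Number of colour samples processed each second."""
    return WIDTH * HEIGHT * FPS * CHANNELS * SAMPLES


def gflops():
    """Arithmetic rate in units of 10^9 floating-point operations per second."""
    return samples_per_second() * FLOPS_PER_SAMPLE / 1e9


def gbytes_per_second():
    """Memory traffic in GB/s under the 4-bytes-per-sample assumption."""
    return samples_per_second() * BYTES_PER_SAMPLE / 1e9


print(f"{gflops():.1f} GFLOPS, {gbytes_per_second():.1f} GB/s")
```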

Why is this happening?
1. Because once software is parallel, it might as well be very parallel
  – The ease of programming reason
2. Because GPUs run existing graphics software much faster, whereas CPUs only run existing parallel software faster
  – The business reason
3. Because of power consumption

History of power consumption
We have probably hit peak power consumption with the current console generation. The next console generation is unlikely to exceed 180 W at launch.
We have also hit peak clock frequency. Increases above 3.2 GHz will happen slowly.
Therefore, all future increases in performance will come from parallelism.
[Charts: power consumption over time; increase in CPU clock frequency over time]

How do we keep GPU power efficiency high?
The cost of data movement is much higher than the cost of computation
GPUs control data movement distances carefully
Preserve locality explicitly instead of caching
Source: NVIDIA, Bill Dally’s presentation at SC10
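To make "preserve locality explicitly" concrete, here is a small CPU-side model, not GPU code. It assumes a 1D box filter as the workload and counts reads from a far "global" array. The naive version re-reads every neighbour for every output; the tiled version copies each tile plus its halo into a small local buffer once and reuses it. The function names, the filter and the tile size are all illustrative choices, not taken from the talk.

```python
def blur_naive(data, radius):
    """Every output re-reads its whole neighbourhood from 'global' memory."""
    n = len(data)
    out, global_reads = [], 0
    for i in range(n):
        total = 0
        for j in range(i - radius, i + radius + 1):
            total += data[min(max(j, 0), n - 1)]  # clamp at the edges
            global_reads += 1
        out.append(total)
    return out, global_reads


def blur_tiled(data, radius, tile):
    """Each tile is staged once into a small local buffer, then reused."""
    n = len(data)
    out, global_reads = [], 0
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # Explicit staging: one global read per element of the tile plus halo.
        local = [data[min(max(j, 0), n - 1)]
                 for j in range(start - radius, end + radius)]
        global_reads += len(local)
        for i in range(start, end):
            k = i - start  # local index of the left edge of the window
            out.append(sum(local[k:k + 2 * radius + 1]))
    return out, global_reads


data = list(range(64))
naive_out, naive_reads = blur_naive(data, radius=2)
tiled_out, tiled_reads = blur_tiled(data, radius=2, tile=16)
print(naive_out == tiled_out, naive_reads, tiled_reads)  # True 320 80
```

Both versions compute the same result; only the number of long-distance reads changes. On real hardware the staging buffer would be an on-chip memory that the programmer or compiler manages directly.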

What does this mean for the compiler developer?
CPUs
  – Widely understood and standardized
  – Can test by running existing software
  – Instruction sets only add new instructions
  – Separated from hardware by the OS
  – The only data movement the compiler needs to handle is between registers and memory
GPUs
  – New technologies and standards every year
  – Need to write new test software for new features
  – New GPUs completely change ISAs
  – Compilers, drivers and OS tightly integrated and developed rapidly
  – Need to handle data movement explicitly

New technologies and standards
New graphics standards need to be implemented very fast to be competitive
Need to write new front-ends, libraries and runtimes very quickly
  – OpenCL/OpenGL
  – DirectX/C++ AMP/HLSL/DirectCompute
  – Renderscript
  – Proprietary graphics technologies

Need to write new tests for new features
When writing a compiler for an existing language, we can run existing software as tests
With a new standard, we need to write new tests
GPUs have varying specifications of accuracy, so testing needs to show whether results are ‘good enough’
Tests need to cover the full graphics pipeline as well as compute capability, so they are not purely compiler tests
Graphics and compiler test processes are very different
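One common way to express "good enough" for floating-point results is a tolerance in units in the last place (ULPs); OpenCL, for example, states the accuracy of its built-in math functions this way. The sketch below is a minimal version of such a check for single-precision values. The helper names are hypothetical, and NaN and infinity handling is deliberately left out.

```python
import struct


def float32_key(x):
    """Integer key for x rounded to IEEE-754 single precision.

    Keys are ordered the same way as the floats they represent, so the
    difference between two keys is their distance in ULPs.
    """
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a, b):
    """Number of representable float32 values between a and b."""
    return abs(float32_key(a) - float32_key(b))


def close_enough(result, reference, max_ulps):
    """True when the device result is within max_ulps of the reference."""
    return ulp_distance(result, reference) <= max_ulps


print(close_enough(1.0 + 2**-22, 1.0, max_ulps=2))
```

A test suite built this way would pick `max_ulps` per operation from the relevant standard or hardware specification rather than using one global threshold.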

New GPUs completely change ISAs
GPUs are programmed in high-level languages, or in virtual ISAs
  – So we can change the ISA and still run old software
  – But correctness is a critical problem
Need to write GPU back-ends very fast (1-2 years, instead of 1-20 years of CPU back-ends…)
GPU back-ends are complex because of the extent of optimizations for power and area

Compilers, drivers & OS tightly integrated
We have not standardized the interface between GPU compilers and the OS or drivers
  – Instead, we standardize the API, compiler and driver as a whole
CPU compilers can be written independently of the OS (mostly) and with little to no runtime API
  – But GPU compilers must be written in tandem with the runtime API, driver and OS

Need to handle data movement explicitly
Register allocation in a GPU compiler is complex because of trade-offs for power and area
  – Typically there are multiple register files with different rules
Memory handling is more complex
  – Typically there are multiple memory spaces with different instructions
  – Affects both compiler front-end and back-end
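The point about memory spaces can be illustrated with a toy instruction selector. In this sketch the front-end is assumed to tag every pointer with an address space (the four names follow OpenCL's address spaces), and the back-end picks a different load instruction for each. The mnemonics are invented for an imaginary target and do not belong to any real ISA.

```python
from enum import Enum


class AddressSpace(Enum):
    GLOBAL = "global"
    LOCAL = "local"
    CONSTANT = "constant"
    PRIVATE = "private"


# Hypothetical mnemonics for an imaginary target, not a real ISA.
LOAD_MNEMONIC = {
    AddressSpace.GLOBAL: "load_global",
    AddressSpace.LOCAL: "load_local",
    AddressSpace.CONSTANT: "load_const",
    AddressSpace.PRIVATE: "load_private",
}


def select_load(dest, address, space):
    """Back-end step: choose the load instruction from the address space."""
    if not isinstance(space, AddressSpace):
        # Front-end responsibility: every pointer must carry its space.
        raise TypeError("pointer has no address space attached")
    return f"{LOAD_MNEMONIC[space]} {dest}, [{address}]"


print(select_load("r0", "r1", AddressSpace.LOCAL))  # load_local r0, [r1]
```

The same source-level dereference therefore lowers to different instructions depending on type information, which is why both the front-end (tracking the space) and the back-end (selecting the instruction) are affected.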

What problems is Codeplay working on?
Higher-level C++ programming model for GPUs
  – Generic programming: parallel reduce algorithms
  – Abstracting details of GPU hardware: memory sizes, tile sizes, execution models
  – Data structures shareable between host and device
  – Performance portability
  – Standardization
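As an illustration of what a generic parallel reduce looks like, the sketch below models the usual two-level scheme on the CPU: each group of elements is reduced by a tree of pairwise combinations, then the per-group partial results are reduced the same way. This is not Codeplay's implementation. The function names are hypothetical, and `group_size` stands in for the hardware detail (work-group or tile size) that a higher-level model would hide.

```python
def tree_reduce(values, op):
    """Combine adjacent pairs until one value is left.

    All pairs within one pass are independent, so on a GPU each pass
    could run in parallel. Pairing neighbours keeps the original order.
    """
    items = list(values)
    while len(items) > 1:
        items = [op(items[i], items[i + 1]) if i + 1 < len(items) else items[i]
                 for i in range(0, len(items), 2)]
    return items[0]


def parallel_reduce(data, op, identity, group_size):
    """Reduce each group independently, then reduce the partial results."""
    if not data:
        return identity
    partials = [tree_reduce(data[start:start + group_size], op)
                for start in range(0, len(data), group_size)]
    return tree_reduce(partials, op)


print(parallel_reduce(list(range(100)), lambda a, b: a + b, 0, group_size=16))  # 4950
```

Because neighbours are always combined in order, the operator only needs to be associative, so the same code works for sums, maxima or string concatenation.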

Conclusions
GPU compilers are little understood but critical to future innovation and performance
Don’t forget that GPUs are mostly for graphics!

Questions?