BWLOCK++: Protecting GPU Kernels on Integrated CPU-GPU Platforms

Presentation transcript:

BWLOCK++: Protecting GPU Kernels on Integrated CPU-GPU Platforms. Waqar Ali, Heechul Yun. Department of Electrical Engineering and Computer Science, University of Kansas at Lawrence.

Introduction: Platforms with integrated GPUs provide excellent size, weight, and power (SWaP) benefits. Examples: NVIDIA Tegra K1, NVIDIA Tegra X1, NVIDIA Tegra X2. Image courtesy: http://www.nvidia.com/object/embedded-systems-dev-kits-modules.html

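Problem Statement: Sharing main memory between the CPU and the GPU can be harmful, because CPU applications running alongside a GPU kernel compete with it for the same memory bandwidth. [Figure: (a) Solo execution, (b) Co-run execution]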

Solution: BWLOCK++
Step-1: Limit the bandwidth of non-RT CPU applications.
Step-2: Periodically throttle the CPU bandwidth offenders.
Step-3: Ensure that the system is not stagnant.
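To make Step-1 and Step-2 concrete, here is a minimal simulation sketch of budget-based throttling. It is not the BWLOCK++ implementation: the names `Core`, `simulate`, `budget`, `period_ticks`, and `lock_held` are all hypothetical, and the model assumes that each non-RT core gets a fixed budget of memory accesses per period, that the budget is enforced only while a GPU kernel is running, and that budgets are replenished at every period boundary. Step-3 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """A non-RT CPU core in the toy model (hypothetical, for illustration)."""
    name: str
    demand_per_tick: int       # memory accesses the core wants to issue per tick
    used: int = 0              # accesses issued in the current period
    served: int = 0            # total accesses issued over the whole run
    throttled_ticks: int = 0   # ticks spent stalled by the regulator


def simulate(cores, budget, period_ticks, total_ticks, lock_held):
    """Run the toy regulator for total_ticks ticks.

    budget       -- accesses each core may issue per period while the lock is held
    period_ticks -- length of one regulation period, in ticks
    lock_held    -- function tick -> bool; True while a GPU kernel is running
                    (an assumption of this sketch)
    """
    for tick in range(total_ticks):
        # Period boundary: replenish every core's budget.
        if tick % period_ticks == 0:
            for core in cores:
                core.used = 0

        locked = lock_held(tick)
        for core in cores:
            # Step-1/Step-2: while the GPU kernel runs, a core that would
            # exceed its budget is stalled until the next period begins.
            if locked and core.used + core.demand_per_tick > budget:
                core.throttled_ticks += 1
                continue
            core.used += core.demand_per_tick
            core.served += core.demand_per_tick
    return cores


if __name__ == "__main__":
    cores = [Core("memory_hog", 100), Core("light_task", 10)]
    simulate(cores, budget=300, period_ticks=10, total_ticks=20,
             lock_held=lambda tick: True)
    for core in cores:
        print(core.name, "served:", core.served,
              "throttled ticks:", core.throttled_ticks)
```

In this model only the core that exceeds its budget is stalled; a core with modest memory demand keeps running at full speed, which is the point of throttling "offenders" rather than all CPU work.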