Low-power Task Scheduling for GPU Energy Reduction Li Tang, Yiji Zhang.

Introduction
▫ DVFS (dynamic voltage and frequency scaling) implementation
▫ Building GPU linear regression power model

DVFS implementation
▫ Dynamic voltage and frequency scaling (DVFS): a method to provide a variable amount of energy for a task by scaling the operating voltage and frequency.
▫ Power & energy consumption
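The power and energy formulas on this slide did not survive extraction. As a minimal sketch, assuming the standard CMOS dynamic-power relation P = C_eff · V² · f (not taken from the slide) and ignoring static and leakage power, the effect of scaling voltage and frequency on a fixed-cycle task can be illustrated as follows. The function names and all numeric values are hypothetical.

```python
def dynamic_power(c_eff, voltage, frequency):
    """Dynamic power in watts under the assumed model P = C_eff * V^2 * f."""
    return c_eff * voltage ** 2 * frequency


def task_energy(c_eff, voltage, frequency, cycles):
    """Energy in joules for a task that needs a fixed number of clock cycles."""
    runtime = cycles / frequency  # a lower frequency stretches the runtime
    return dynamic_power(c_eff, voltage, frequency) * runtime


# Hypothetical operating points: nominal versus scaled-down voltage/frequency.
nominal = task_energy(c_eff=1e-9, voltage=1.0, frequency=1.0e9, cycles=2e9)
scaled = task_energy(c_eff=1e-9, voltage=0.8, frequency=0.6e9, cycles=2e9)
print(f"nominal: {nominal:.3f} J, scaled: {scaled:.3f} J")
```

Under this simplified model the frequency cancels out of the energy, so the saving comes from the squared voltage term; frequency is lowered because a lower supply voltage cannot sustain the nominal clock.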

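GPU architecture and linear regression power model
▫ GPU architecture: On-chip; Device Memory
▫ GPU linear power model: $P_{total} = \sum_{i} P_i^{max} \cdot u_i + P_0$
▫ $P_{total}$: total power
▫ $P_i^{max}$: maximum power of the i-th component
▫ $u_i$: usage rate of the i-th component
▫ $P_0$: intercept power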
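The linear model on this slide (total power as a usage-weighted sum of per-component maximum powers plus an intercept) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fit uses ordinary least squares via the normal equations, and any data passed in would come from your own measurements.

```python
def predict_power(max_powers, usage_rates, intercept):
    """Total power = sum_i(max_power_i * usage_rate_i) + intercept."""
    if len(max_powers) != len(usage_rates):
        raise ValueError("one usage rate is needed per component")
    return sum(p * u for p, u in zip(max_powers, usage_rates)) + intercept


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: usage samples are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_power_model(usage_samples, measured_powers):
    """Least-squares fit; returns (max_powers, intercept)."""
    rows = [list(u) + [1.0] for u in usage_samples]  # trailing 1.0 fits the intercept
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * p for r, p in zip(rows, measured_powers)) for i in range(k)]
    coeffs = solve_linear_system(ata, atb)
    return coeffs[:-1], coeffs[-1]
```

A fit needs at least one more sample than there are components, and the usage rates must vary independently across samples; otherwise the solver reports a singular system.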

Energy measurement
▫ NI USB-6216 DAQ + two FLUKE 80i-110s current clamps
▫ Sampling rate: 10 readings per millisecond
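At 10 readings per millisecond the sampling interval is 0.1 ms, so energy can be approximated by summing power over the samples. The sketch below assumes each reading is a current in amperes on a rail of known, constant voltage; the function names and the idea of a single fixed rail voltage are assumptions, not details from the slides. With two clamps, per-rail energies would presumably be added together.

```python
SAMPLE_RATE_HZ = 10_000  # 10 readings per millisecond


def energy_joules(current_samples_a, supply_voltage_v, sample_rate_hz=SAMPLE_RATE_HZ):
    """Approximate energy as the sum of V * I * dt over all samples (rectangle rule)."""
    dt = 1.0 / sample_rate_hz
    return sum(supply_voltage_v * current * dt for current in current_samples_a)


def average_power_watts(current_samples_a, supply_voltage_v):
    """Mean power over the sampled window."""
    if not current_samples_a:
        raise ValueError("at least one sample is required")
    return supply_voltage_v * sum(current_samples_a) / len(current_samples_a)


# Hypothetical trace: a constant 2 A on a 12 V rail for 0.1 s (1000 samples).
trace = [2.0] * 1000
print(energy_joules(trace, 12.0), average_power_watts(trace, 12.0))
```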

Preliminary results
▫ WAXPY function: W[i] = alpha*X[i] + beta*Y[i] (i: thread number)
▫ Kernel launch: WAXPY<<<…>>>
▫ Vector size and type: 1,000,000 float

Thread*Block | 1*1 | 1*4 | 1*16 | 1*64 | 4*64 | 16*64 | 64*16
WAXPY Time =0.071
WAXPY GPU Power >
WAXPY GPU Energy
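The WAXPY operation and the effect of the Thread*Block configuration on work distribution can be emulated on the CPU. This is a Python sketch rather than the CUDA kernel from the slides: it assumes that each thread walks the vector with a stride equal to the total thread count (the slides do not say how 1,000,000 elements are divided among as few as one thread), and the function name is hypothetical.

```python
def waxpy_emulated(alpha, x, beta, y, threads_per_block, blocks):
    """Compute W[i] = alpha*X[i] + beta*Y[i], emulating a strided thread layout."""
    n = len(x)
    if len(y) != n:
        raise ValueError("X and Y must have the same length")
    total_threads = threads_per_block * blocks
    w = [0.0] * n
    for block in range(blocks):
        for thread in range(threads_per_block):
            tid = block * threads_per_block + thread  # global thread number
            # Assumed layout: thread tid handles tid, tid + T, tid + 2T, ...
            for i in range(tid, n, total_threads):
                w[i] = alpha * x[i] + beta * y[i]
    return w


# Configurations from the results table, read as (threads, blocks) pairs.
configs = [(1, 1), (1, 4), (1, 16), (1, 64), (4, 64), (16, 64), (64, 16)]
x = [float(i) for i in range(1000)]
y = [2.0 * i for i in range(1000)]
for threads, blocks in configs:
    w = waxpy_emulated(0.5, x, 0.25, y, threads, blocks)
    print(threads, blocks, w[:3])
```

Every configuration produces the same vector; only the amount of work per thread changes, which is what would make time, power, and energy differ between columns on real hardware.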