Heterogeneity-Aware Peak Power Management for Accelerator-Based Systems

Presentation transcript:

Heterogeneity-Aware Peak Power Management for Accelerator-Based Systems. Gui-Bin Wang, Yi-Song Lin. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS). Presented by Po-Ting Liu, 2013/10/24.

Outline: Introduction, Motivation, Mathematical Analysis and Algorithms, Experiment, Conclusion.

Introduction. Importance of energy efficiency. Problems of high power consumption: cooling overhead, reduced reliability, and increased system running cost.

Introduction (cont.). Related work: most of it targets homogeneous systems; none of it is application-aware.

Motivation. Under the same power budget, different partition ratios can produce different performance. Under different power budgets, the best partition ratio may be different.
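A toy sketch of these two observations. It is not the authors' model: the cubic power model, the frequency levels, and every constant below are assumptions chosen only for illustration. The partition ratio `r` is taken to be the fraction of the work given to the CPU, with CPU and GPU running in parallel.

```python
from itertools import product

# Hypothetical frequency levels in GHz (assumed, not from the slides).
CPU_LEVELS = [1.0, 1.5, 2.0, 2.5]
GPU_LEVELS = [0.3, 0.5, 0.7, 0.9]


def cpu_power(f):
    """Assumed CPU power in watts: static part plus a cubic dynamic part."""
    return 10.0 + 4.0 * f ** 3


def gpu_power(f):
    """Assumed GPU power in watts: static part plus a cubic dynamic part."""
    return 20.0 + 100.0 * f ** 3


def cpu_rate(f):
    """Assumed CPU throughput (work units per second) at frequency f."""
    return 1.0 * f


def gpu_rate(f):
    """Assumed GPU throughput (work units per second) at frequency f."""
    return 10.0 * f


def exec_time(r, fc, fg):
    """Time for one unit of work when a fraction r runs on the CPU."""
    return max(r / cpu_rate(fc), (1.0 - r) / gpu_rate(fg))


def best_config(budget):
    """Search all frequency pairs that fit the budget.

    For each pair, the ratio that makes both devices finish together is
    used. Returns (time, ratio, cpu_freq, gpu_freq), or None if nothing fits.
    """
    best = None
    for fc, fg in product(CPU_LEVELS, GPU_LEVELS):
        if cpu_power(fc) + gpu_power(fg) > budget:
            continue
        r = cpu_rate(fc) / (cpu_rate(fc) + gpu_rate(fg))
        t = exec_time(r, fc, fg)
        if best is None or t < best[0]:
            best = (t, r, fc, fg)
    return best


if __name__ == "__main__":
    # Same budget, same frequencies, two different partition ratios.
    print(exec_time(0.5, 2.0, 0.7), exec_time(2.0 / 9.0, 2.0, 0.7))
    # Different budgets lead to different best ratios in this toy setup.
    print(best_config(100.0))
    print(best_config(120.0))
```

In this toy setup an even split is noticeably slower than the balanced split at the same frequencies, and raising the budget shifts the best ratio further toward the GPU.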

Mathematical Analysis and Algorithms. Definition of schedule unit and work space: a loop iteration in a parallel loop is the basic schedule unit; the work space is defined as
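The work-space definition itself is not preserved in this transcript, and the sketch below does not try to reproduce it. It only illustrates, under assumptions, what treating each loop iteration as a schedule unit could look like: a parallel loop of `n_iterations` iterations is split into a CPU part and a GPU part by a partition ratio. The function name and the contiguous-range split are hypothetical.

```python
def split_iterations(n_iterations, cpu_ratio):
    """Split loop iterations 0..n_iterations-1 between CPU and GPU.

    Each iteration is one schedule unit. cpu_ratio is the fraction of
    iterations assigned to the CPU; the GPU takes the remainder.
    """
    if n_iterations < 0:
        raise ValueError("n_iterations must be non-negative")
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must be between 0 and 1")
    cpu_count = round(n_iterations * cpu_ratio)
    cpu_part = range(0, cpu_count)
    gpu_part = range(cpu_count, n_iterations)
    return cpu_part, gpu_part


if __name__ == "__main__":
    cpu_part, gpu_part = split_iterations(1000, 0.25)
    print(len(cpu_part), len(gpu_part))
```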

Mathematical Analysis and Algorithms (cont.). Execution time. Total power consumption.

Mathematical Analysis and Algorithms (cont.). Use a Lagrange multiplier.
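The derivation on this slide is not preserved in the transcript. As a hedged sketch of how a Lagrange multiplier could be applied, assume (none of this is from the source) that device $i$ processes work at rate $c_i f_i$ at frequency $f_i$, draws power $s_i + k_i f_i^3$, and that work is balanced so that the total throughput is $\sum_i c_i f_i$ under a power budget $B$:

```latex
\begin{aligned}
&\max_{f_1,\dots,f_n} \; \sum_{i=1}^{n} c_i f_i
  \quad \text{s.t.} \quad \sum_{i=1}^{n} \left( s_i + k_i f_i^{3} \right) = B, \\
&\mathcal{L}(f,\lambda) = \sum_{i=1}^{n} c_i f_i
  - \lambda \left( \sum_{i=1}^{n} \left( s_i + k_i f_i^{3} \right) - B \right), \\
&\frac{\partial \mathcal{L}}{\partial f_i} = c_i - 3 \lambda k_i f_i^{2} = 0
  \;\Longrightarrow\; f_i = \sqrt{\frac{c_i}{3 \lambda k_i}} .
\end{aligned}
```

Substituting these $f_i$ back into the budget constraint then determines $\lambda$. The symbols $c_i$, $s_i$, $k_i$, and $B$ are illustrative names, not the paper's notation.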

Mathematical Analysis and Algorithms (cont.). The model predicts the power usage: some processors can run at their peak frequency, while the frequency of the remaining processors should be lower than their peak.
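A minimal sketch of how such an outcome can arise, under assumptions that are not from the slides: each device has throughput `c * f`, power `static + k * f**3`, and a peak frequency `fmax`. For a multiplier `lam`, each device takes the unconstrained optimum `sqrt(c / (3 * lam * k))` clamped to its peak, and `lam` is found by bisection so that the total power meets the budget. All names and constants are hypothetical.

```python
import math


def total_power(devices, freqs):
    """Sum of assumed per-device power: static + k * f^3."""
    return sum(d["static"] + d["k"] * f ** 3 for d, f in zip(devices, freqs))


def allocate(devices, budget, iters=200):
    """Pick per-device frequencies that maximize sum(c * f) within budget.

    devices: list of dicts with keys "c", "k", "static", "fmax".
    Returns a list of frequencies, one per device.
    """
    if budget <= sum(d["static"] for d in devices):
        raise ValueError("budget does not cover static power")

    def freqs_for(lam):
        # Stationary point of the Lagrangian, clamped to the peak frequency.
        return [min(d["fmax"], math.sqrt(d["c"] / (3.0 * lam * d["k"])))
                for d in devices]

    peak = [d["fmax"] for d in devices]
    if total_power(devices, peak) <= budget:
        return peak  # everything fits at peak frequency

    lo, hi = 1e-12, 1e12  # power decreases as lam grows
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection in log space
        if total_power(devices, freqs_for(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return freqs_for(hi)  # hi always satisfies the budget


if __name__ == "__main__":
    cpu = {"c": 1.0, "k": 4.0, "static": 10.0, "fmax": 2.5}
    gpu = {"c": 10.0, "k": 100.0, "static": 20.0, "fmax": 0.9}
    print(allocate([cpu, gpu], budget=120.0))
```

With these made-up constants the GPU stays at its peak frequency while the CPU is scaled below its peak, which is the pattern the slide describes.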

Experiment. Experimental environment. Note: one CPU core manages and schedules the GPU; the other cores execute the program.

Experiment (cont.). Tools. Frequency tuning: CPU via ACPI (Advanced Configuration and Power Interface); GPU via AMD's ADL interface (AMD Display Library). Performance measurement: CPU via PCM (Performance Counter Monitor); GPU performance is calculated from the speed on the CPU and the relative speedup of the GPU.
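The GPU estimate described above can be sketched as a single multiplication. This is an assumption about what "calculate from the speed on CPU and the relative speedup of GPU" means; the function name, the units, and the sample numbers are hypothetical.

```python
def estimate_gpu_rate(cpu_rate, gpu_speedup):
    """Estimate GPU throughput from measured CPU throughput.

    cpu_rate: measured CPU speed (for example, loop iterations per second).
    gpu_speedup: GPU speed relative to the CPU for the same kernel.
    """
    if cpu_rate <= 0 or gpu_speedup <= 0:
        raise ValueError("cpu_rate and gpu_speedup must be positive")
    return cpu_rate * gpu_speedup


if __name__ == "__main__":
    # Made-up numbers: 2e6 iterations/s on the CPU, GPU 8x faster.
    print(estimate_gpu_rate(2.0e6, 8.0))
```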

Experiment (cont.). Experimental applications: memory-intensive and compute-intensive.

Experiment (cont.). Power control accuracy.

Experiment (cont.). Baseline: peak frequency. Best choice.

Conclusion. Power management for heterogeneous systems. Application-aware power management. Maximizes system performance within a given power budget. Improves performance by 7.3% on average compared with the existing method.