Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.

Presentation transcript:

Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 4, December 2005, Pages Presented by: Manal Houri March 23, 2006

Motivation
Low-power computing has long been an important design objective for mobile, battery-operated devices. More recently, however, power consumption in high-performance microprocessors has drawn considerable attention from industry and researchers. Traditionally, power dissipation in CMOS technology has been significantly lower than in other technologies. However, at current speeds and feature sizes, CMOS power consumption has increased dramatically, which makes microprocessor cooling increasingly difficult and expensive.
CSE 8383 Advanced Computer Architecture

Direction
As a result, over the last few years, power has become a first-priority concern for microprocessor designers and manufacturers. Industry and researchers are eyeing chip multiprocessor (CMP) architectures, which can attain higher performance by running multiple threads in parallel. By integrating multiple cores on a chip, designers hope to deliver performance growth while depending less on raw circuit speed, and thus on power. So far, however, very little work has been done on the power-performance issues of parallel applications executing on multiprocessors in general, and on multicore chips in particular.

Problem
In a parallel run, processors synchronize and exchange data as they cooperate toward a common goal. Synchronization and communication constitute overheads that typically grow in importance as the number of processors increases. This generally results in decreased parallel efficiency (speedup divided by the number of processors used). Given an application's changing parallel efficiency, it is not obvious which voltage and frequency levels should be applied, in combination with which number of processors, to optimize a given power-performance trade-off and/or meet a particular constraint.

Objective of this paper
Investigate the power-performance issues of running parallel applications on a CMP. First, develop an analytical model to study the effect of the number of processors used, the parallel efficiency, and the voltage/frequency scaling applied on the performance and power consumption delivered by a CMP. More specifically, the model covers two scenarios: optimizing power consumption given a performance target, and optimizing performance given a certain power budget. Then, conduct detailed simulations of parallel applications running on a power-performance model of a CMP.

Analytical Study
Power Equations
Performance Equations
Scenario I: Power Optimization
Scenario II: Performance Optimization

Analytical Study – Power Equations
This equation establishes the relationship between the supply voltage $V$ and the maximum operating frequency $f_{max}$:
$$f_{max} = \eta \, \frac{(V - V_{th})^{\alpha}}{V}$$
Where:
$f_{max}$: maximum operating frequency
$\eta$, $\alpha$: experimentally derived constants
$V$: supply voltage
$V_{th}$: threshold voltage
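
The voltage-frequency relationship above can be explored numerically. The following is a minimal sketch, not the paper's code: the function name `f_max` and the values chosen for `v_th`, `alpha`, `eta`, and the nominal voltage are illustrative assumptions, since the slides do not give the fitted constants.

```python
def f_max(v, v_th=0.3, alpha=1.5, eta=1.0):
    """Maximum operating frequency from f_max = eta * (V - V_th)^alpha / V.

    v_th, alpha and eta are hypothetical placeholders, not values from the slides.
    """
    if v <= v_th:
        return 0.0  # at or below threshold, this model gives no usable frequency
    return eta * (v - v_th) ** alpha / v


# Hypothetical nominal voltage, used only to normalize the printed frequencies.
V_NOMINAL = 1.2
f_nominal = f_max(V_NOMINAL)

for v in (1.2, 1.0, 0.8, 0.6):
    print(f"V = {v:.1f} V -> f_max / f_nominal = {f_max(v) / f_nominal:.3f}")
```

With these assumed constants the frequency falls faster than the voltage as the supply approaches the threshold, which is why aggressive voltage scaling costs disproportionate performance per core.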

Analytical Study – Power Equations
This equation defines power consumption $P$ as the sum of dynamic and static components:
$$P = P_D + P_S = A C V^2 f + V\,[\ldots]$$
Where:
$P_D$: dynamic component
$P_S$: static component
$A$: gate activity factor
$C$: total capacitance
$V$: supply voltage
$f$: operating frequency
$I_{leak}$: leakage current at nominal supply voltage $V_n$ and room temperature $T_{std}$ (25°C)
$[\ldots]$ (symbols missing in the transcript): abbreviation for the dependence of this curve-fitted formula on supply voltage and temperature
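
As a rough illustration of the two power components, the sketch below evaluates the dynamic term $A C V^2 f$ exactly as written and replaces the static term with a simplification. The static part is an assumption: it uses a constant leakage current and ignores the voltage and temperature dependence that the slide mentions. All function names and numeric values are hypothetical.

```python
def dynamic_power(a, c, v, f):
    """Dynamic power P_D = A * C * V^2 * f."""
    return a * c * v ** 2 * f


def static_power(v, i_leak):
    """Simplified static power P_S = V * I_leak.

    ASSUMPTION: i_leak is treated as a constant; the curve-fitted dependence
    on supply voltage and temperature from the slide is not modeled here.
    """
    return v * i_leak


def total_power(a, c, v, f, i_leak):
    """Total power as the sum of the dynamic and (simplified) static parts."""
    return dynamic_power(a, c, v, f) + static_power(v, i_leak)


# Hypothetical normalized operating points: nominal, then half voltage and frequency.
nominal = total_power(a=0.5, c=2.0, v=1.0, f=1.0, i_leak=0.1)
scaled = total_power(a=0.5, c=2.0, v=0.5, f=0.5, i_leak=0.1)
print(f"nominal: {nominal:.3f}, half V and f: {scaled:.3f}")
```

Halving both voltage and frequency cuts the dynamic term by a factor of eight, while the simplified static term only halves, so leakage takes a growing share of total power at low operating points.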

Analytical Study – Performance Equations
Execution time formula as proposed by Hennessy and Patterson:
$$t = IC \cdot CPI \cdot f^{-1}$$
$IC$: dynamic instruction count
$CPI$: average number of cycles per instruction
$f$: operating frequency
Two measures are defined for an application running on $N$ processors: the nominal parallel efficiency and the parallel efficiency.
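
A small sketch of these quantities follows. The execution-time function implements the formula above directly. The efficiency function is an assumption: it uses the description given on the Problem slide (speedup divided by the number of processors), because the exact formulas on this slide are not preserved. Names and sample numbers are hypothetical.

```python
def execution_time(ic, cpi, f):
    """Execution time t = IC * CPI / f (seconds if f is in Hz)."""
    return ic * cpi / f


def parallel_efficiency(t_1, t_n, n):
    """Speedup over processor count: (t_1 / t_n) / n.

    ASSUMPTION: this follows the wording 'speedup over number of processors
    used'; the slide's own formula is not available.
    """
    return t_1 / (n * t_n)


# Hypothetical workload: 1e9 instructions, CPI of 1.5, 2 GHz clock.
t_1 = execution_time(ic=1e9, cpi=1.5, f=2e9)
# Hypothetical measured time of the same workload on 4 processors.
t_4 = 0.25
print(f"t_1 = {t_1:.2f} s, efficiency on 4 processors = {parallel_efficiency(t_1, t_4, 4):.2f}")
```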

Analytical Study - Scenario I: Power Optimization
Goal: Find the configuration that maximizes power savings while delivering a pre-specified level of performance; in particular, the performance of a sequential execution on one processor at full throttle. In other words:
$$t_1 = t_N$$
$$IC_1 \cdot CPI_1 \cdot f_1^{-1} = IC_N \cdot CPI_N \cdot f_N^{-1}$$
This condition is then rewritten using the definition of nominal parallel efficiency.
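
To make the equal-time condition concrete, the sketch below computes the per-core frequency that N cores would need in order to match one full-throttle core. It rests on an assumption not confirmed by the transcript: that the nominal parallel efficiency equals $IC_1 \cdot CPI_1 / (N \cdot IC_N \cdot CPI_N)$ and does not change with frequency, which gives $f_N = f_1 / (N \cdot \varepsilon_n)$. The function name and sample values are hypothetical.

```python
def required_frequency(f_1, n, eff_n):
    """Per-core frequency at which n cores match one core running at f_1.

    ASSUMPTION: eff_n = IC_1*CPI_1 / (n * IC_N*CPI_N), independent of frequency,
    so that t_1 = t_N reduces to f_N = f_1 / (n * eff_n).
    """
    if n < 1 or eff_n <= 0.0:
        raise ValueError("n must be >= 1 and eff_n must be positive")
    return f_1 / (n * eff_n)


# Hypothetical example: a 3.0 GHz single core replaced by 4 cores at 75% efficiency.
print(f"{required_frequency(3.0, 4, 0.75):.2f} GHz per core")
```

Under this assumption, lower efficiency directly raises the frequency (and therefore the voltage) each core needs, which eats into the power savings from parallelism.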

Analytical Study - Scenario I: Power Optimization
Performance is then expressed for the N-processor configuration, and that expression is rewritten in terms of a voltage scaling ratio defined for this purpose.

Analytical Study - Scenario I: Power Optimization
We look at two process technologies: 130 nm and 65 nm. The operating temperature of the single-core configuration is set at 100°C. Using a 32-way CMP as the baseline, configurations running on different numbers of processor cores are studied. Assumption: unused cores are shut down.
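
The kind of sweep described here can be sketched in a few lines. This is an illustrative toy model, not the paper's: every constant is a hypothetical placeholder rather than a 130 nm or 65 nm parameter, leakage current is assumed constant (no voltage or temperature dependence), the iso-performance frequency is assumed to be $f_1 / (N \cdot \varepsilon_n)$, and unused cores are assumed to draw no power.

```python
# All constants are hypothetical placeholders, not values from the paper.
V_NOM = 1.2    # nominal supply voltage (V)
V_TH = 0.3     # threshold voltage (V)
ALPHA = 1.5    # alpha-power-law exponent
AC = 1.0       # activity factor times capacitance, normalized
I_LEAK = 0.1   # leakage current, ASSUMED constant


def f_max(v):
    """Alpha-power law with eta normalized to 1."""
    return (v - V_TH) ** ALPHA / v if v > V_TH else 0.0


def core_power(v, f):
    """Dynamic plus simplified static power of one active core."""
    return AC * v ** 2 * f + v * I_LEAK


def voltage_for_frequency(f_target, tol=1e-9):
    """Lowest voltage in [V_TH, V_NOM] whose f_max reaches f_target (bisection)."""
    lo, hi = V_TH, V_NOM
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_max(mid) >= f_target:
            hi = mid
        else:
            lo = mid
    return hi


def relative_power(n, eff_n):
    """Power of n cores matching one full-throttle core, relative to that core."""
    f_1 = f_max(V_NOM)
    f_n = f_1 / (n * eff_n)  # ASSUMED iso-performance frequency
    if f_n > f_1:
        raise ValueError("target frequency exceeds the nominal maximum")
    v_n = voltage_for_frequency(f_n)
    return n * core_power(v_n, f_n) / core_power(V_NOM, f_1)


for n, eff in [(1, 1.0), (2, 0.9), (4, 0.8), (8, 0.6), (16, 0.4)]:
    print(f"N = {n:2d}, efficiency = {eff:.1f} -> relative power = {relative_power(n, eff):.3f}")
```

The bisection works because `f_max` increases monotonically with voltage above the threshold. A real chip would also impose a minimum supply voltage, which this sketch leaves out.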

Analytical Study - Scenario I: Power Optimization (130 nm technology)

Analytical Study - Scenario I: Power Optimization (65 nm technology)

Scenario I: Power Optimization – Experimental Evaluation

Analytical Study - Scenario II: Performance Optimization
Goal: Find the configuration that maximizes performance under a constrained power budget. The maximum power budget is set to that of executing on one processor at full throttle: $P_1 = P_N$.

Analytical Study - Scenario II: Performance Optimization
The performance gain, or speedup $S$, on $N$ processors is first written as an expression, which is then rewritten using the definition of the voltage ratio and the frequency equation.

Analytical Study - Scenario II: Performance Optimization
To compute the remaining unknown, we use the power constraint $P_1 = P_N$. Once it is obtained, we can solve for $S$.
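
The procedure of fixing the power budget and then solving for speedup can be sketched numerically. This is a toy model under stated assumptions, not the paper's derivation: all constants are hypothetical, leakage current is assumed constant, each core is assumed to run at the maximum frequency its voltage allows, and speedup is assumed to be $S = N \cdot \varepsilon_n \cdot f_N / f_1$.

```python
# All constants are hypothetical placeholders, not values from the paper.
V_NOM = 1.2    # nominal supply voltage (V)
V_TH = 0.3     # threshold voltage (V)
ALPHA = 1.5    # alpha-power-law exponent
AC = 1.0       # activity factor times capacitance, normalized
I_LEAK = 0.1   # leakage current, ASSUMED constant


def f_max(v):
    """Alpha-power law with eta normalized to 1."""
    return (v - V_TH) ** ALPHA / v if v > V_TH else 0.0


def core_power(v):
    """Power of one core running at f_max(v), simplified static term included."""
    return AC * v ** 2 * f_max(v) + v * I_LEAK


def speedup_under_budget(n, eff_n, tol=1e-9):
    """Speedup of n cores whose total power equals one full-throttle core.

    Bisection finds the highest voltage with n * core_power(v) <= budget.
    If leakage alone exceeds the budget, the result is 0 (no feasible point).
    """
    budget = core_power(V_NOM)
    lo, hi = V_TH, V_NOM
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * core_power(mid) > budget:
            hi = mid
        else:
            lo = mid
    return n * eff_n * f_max(lo) / f_max(V_NOM)  # ASSUMED speedup formula


for n, eff in [(1, 1.0), (2, 0.9), (4, 0.8), (8, 0.6), (16, 0.4)]:
    print(f"N = {n:2d}, efficiency = {eff:.1f} -> speedup = {speedup_under_budget(n, eff):.3f}")
```

The bisection relies on total power growing monotonically with voltage in this model. Adding cores forces each one to a lower voltage and frequency, so the speedup depends on whether the gain from parallelism outweighs the per-core slowdown and the added leakage.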

Analytical Study - Scenario II: Performance Optimization

Scenario II: Performance Optimization – Experimental Evaluation

Concluding remarks
This study shows that, under the right circumstances, parallel computing may bring significant power-performance benefits over a uniprocessor setup of similar performance. However, it also illustrates the dependency of the optimum operating point on multiple interacting factors, such as the application's parallel efficiency, the chip's voltage/frequency scaling characteristics, the process technology, and restrictions in performance, power, and, sometimes, the number of available processors. These factors can interact in nonobvious ways, making power-performance optimization of on-chip parallel computation quite challenging.