Performance Counter Based Architecture Level Power Modeling

Presentation transcript:

Puneet Sharma
Advisor: Prof. Andrew B. Kahng, Electrical & Computer Engineering
Joint work with Mr. John Seng and Prof. Dean Tullsen, UCSD CSE department
UC San Diego Computer Engineering VLSI CAD Laboratory

Abstract
Modern microprocessors have built-in performance counters that are used primarily for compiler and processor optimization. We investigate whether built-in performance counters can also be used to predict the amount of power consumed by the processor. This poster reports early efforts toward correlating processor power consumption with increments in performance counters, via statistical model fitting.

Motivation & Goals
- Processor power is increasing, and power management is a “grand challenge” in the semiconductor roadmap (ITRS).
- Processor architects need accurate architecture-level power models.
- Low-overhead solutions are preferable.
- Can architecture power be estimated accurately using existing performance counters?
Potential applications:
- Hints for low-power compilers and embedded programmers to reduce power consumption.
- Guidance for processor designers seeking to reduce power.
- “Zero-overhead temperature sensing” for thermal reliability-driven processor throttling (dynamic voltage and frequency scaling).
This project:
- Study the feasibility of power modeling based on built-in performance counters.
- Study the effects of architectural events on dynamic power.
[Figure: block diagram with the labels Processor, Read Counters, Power Estimator, Power Consumption]

Related Work
Joseph et al. (ISLPED-01) model the power consumption of an Intel Pentium Pro based on known maximum power dissipations in microarchitectural structures. The relative contribution of each structure to total power depends on counter readings, but no claims of accuracy are established. Bellosa et al. (SIGOPS-00) studied several performance counters to demonstrate correlation with total chip power, and estimated the energy consumed for each microarchitectural event.
Power consumption estimates have also been made using statistics from architecture-level performance simulators, with the activities of particular structures used to estimate power. Wattch, SimplePower, Architecture Power Model and AccuPower fall in this category. For example, Wattch has an accuracy of 10%-13% for individual processor structures (compared to actual circuit implementations) and 30% for full chip power (compared to reported maximum full chip power values).

Power Measurement Platform
[Figure: measurement setup with the labels Pentium 4 motherboard, Voltage Regulator, A/D Converter, Single Board Computer, Data Collection Computer, and an element labeled .015]
Setup
- Motherboard: Gigabyte GA-8IEXP
- Processor: Intel Pentium … GHz, 1.5 V Vdd, 512KB L2 cache
- A/D converter: TI ADS…, … Hz sample rate, 22 bits resolution
Experiment
- We use the SPEC 2000 benchmark suite for all experiments.
- Performance counter values are collected on the processor under test at a rate of 50Hz.
- During the run, the power consumption of the processor under test is read at a rate of 100Hz on another machine.
- Our counter collection method prevents us from collecting all counters simultaneously, so we perform multiple runs to collect all counters.
- We form two subsets of the benchmark suite: our model is fitted using the training set, and model accuracy is evaluated using the test set.

Methodology
Problem 1: Counter-Counter Synchronization
- Only certain subsets of counters may be collected simultaneously, due to limitations imposed by the collection method.
- Multiple runs are required to collect all counters.
- Different runs collect counters at different time instants, so the runs need to be synchronized.
Solution:
- The number of micro-operations retired is collected in all runs and used as a “timeline”.
- Counter values from all runs are linearly interpolated to match the first run (the “reference run”).
- “Poorly interpolable” counters are put in the reference run.
[Figure: counter value plotted against micro-operations retired. The counter is collected at the black points. Blue points represent its interpolated values at the micro-operations-retired values of the reference run.]

Problem 2: Counter-Power Synchronization
- Counter values and power values are collected on different systems, which are not synchronized.
- We need to synchronize counters and power to know which counter readings correspond to each power reading.
Solution:
- An initial sleep phase is introduced: both counter and power readings drop to zero, improving the initial alignment of counters and power.
- Power and counter readings are time-stamped.
- Sliding time windows of n counter readings are considered, and the energy consumed in each window is computed.
[Figure: power is collected at t'_i, t'_{i+1}, …, t'_{i+p}, t'_{i+p+1}. We need to find the energy consumed in the time window t_k to t_{k+w} (given by the shaded area).]

Problem 3: Model-fitting
- Relate the energy consumed to the increments in performance counters.
Solution:
- Linear, quadratic, cubic, etc. regression
- Cluster analysis

Results
- Training set = 14 floating point benchmarks, 10 integer benchmarks
- Test set = 3 floating point and 3 integer benchmarks
- The benchmark gap has the maximum error (25.17%).
[Figure: linear regression model results (error in total energy consumption per benchmark)]
[Figure: estimated power vs. actual power (blue = estimated power, red = actual power) for art, applu, crafty, gzip, gap, lucas]

Conclusion
- Initial results: performance counters can potentially yield accurate models and predictions of processor power consumption.
- More flexible nonlinear regression models may yield improved predictions of power from counter values.
- Counters that could be useful for power prediction are not available, e.g., number of divides, multiplies, …
- Splitting certain counters might be useful: the Pentium 4 processor contains a counter for the number of floating point operations; more specific counters which count different operations separately might be more useful.
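
The two synchronization steps (Problem 1 and Problem 2) can be illustrated with a small sketch. This is not the authors' code: the function names, the sample data, and the choices to clamp out-of-range queries and to integrate power with the trapezoidal rule are all assumptions made for illustration. It also assumes the counter and power clocks have already been aligned, for example by the initial sleep phase.

```python
from bisect import bisect_right

def interp(xs, ys, queries):
    """Linearly interpolate the series (xs, ys) at each query point.

    xs must be strictly increasing. Queries outside the sampled range are
    clamped to the nearest sample (an assumption of this sketch).
    """
    out = []
    for x in queries:
        if x <= xs[0]:
            out.append(ys[0])
        elif x >= xs[-1]:
            out.append(ys[-1])
        else:
            j = bisect_right(xs, x)  # xs[j-1] <= x < xs[j]
            frac = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
            out.append(ys[j - 1] + frac * (ys[j] - ys[j - 1]))
    return out

def align_counter(run_uops, run_values, ref_uops):
    """Problem 1: map a counter from another run onto the reference run,
    using micro-operations retired as the shared timeline."""
    return interp(run_uops, run_values, ref_uops)

def window_energy(times, power, t_start, t_end):
    """Problem 2: energy (joules) in [t_start, t_end], computed as the
    trapezoidal area under the time-stamped power samples."""
    # Integration knots: the window edges plus every sample strictly inside.
    knots = [t_start] + [t for t in times if t_start < t < t_end] + [t_end]
    watts = interp(times, power, knots)
    return sum(0.5 * (watts[i] + watts[i + 1]) * (knots[i + 1] - knots[i])
               for i in range(len(knots) - 1))

# Hypothetical data: a second run sampled at different micro-op counts.
ref_uops = [0, 100, 200, 300]
run2_uops = [0, 150, 300]
run2_counter = [0, 30, 90]
aligned = align_counter(run2_uops, run2_counter, ref_uops)

# Hypothetical 100 Hz power trace (seconds, watts) and one counter window.
p_times = [0.00, 0.01, 0.02, 0.03, 0.04]
p_watts = [0.0, 20.0, 40.0, 40.0, 20.0]
energy = window_energy(p_times, p_watts, 0.005, 0.035)
```

In this sketch the window edges generally fall between power samples, so the power at each edge is itself interpolated before the area is summed.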
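
The linear regression step (Problem 3) can be sketched as an ordinary least-squares fit of window energy against counter increments, followed by the per-benchmark total-energy error used in the results. The model form with an intercept, the normal-equations solver, the function names, and the numbers below are assumptions for illustration only; they are not the poster's data or implementation.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(deltas, energies):
    """Least-squares weights w with energy ~ w[0] + sum_i w[i+1] * delta_i."""
    rows = [[1.0] + list(d) for d in deltas]  # leading 1.0 is the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, energies)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, delta):
    """Estimated energy of one window from its counter increments."""
    return w[0] + sum(wi * di for wi, di in zip(w[1:], delta))

def total_energy_error(w, deltas, energies):
    """Relative error in total energy over all windows of one benchmark."""
    estimated = sum(predict(w, d) for d in deltas)
    actual = sum(energies)
    return abs(estimated - actual) / actual

# Hypothetical training windows: increments of two counters and the
# measured energy (joules) in each window.
train_deltas = [(100, 10), (200, 40), (300, 20), (400, 80)]
train_energy = [0.8, 1.3, 1.3, 2.1]
w = fit_linear(train_deltas, train_energy)

# Hypothetical test benchmark, evaluated by its total-energy error.
test_deltas = [(150, 25), (250, 50)]
test_energy = [1.05, 1.5]
err = total_energy_error(w, test_deltas, test_energy)
```

Quadratic or cubic models fit the same framework by appending powers of each increment as extra columns. For many counters, a library least-squares routine would be numerically safer than the normal equations used here.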