3rd Nov. 2008. CSV881: Low Power Design. Power Estimation and Modeling. M. Balakrishnan.

Presentation transcript:


Outline
- Estimation problem
- Estimation at different levels
  - System level
  - Algorithmic level
  - Processor level
  - RT level
  - Gate level
  - Circuit level

Power Estimation Problem
- The objective in power estimation is similar to that in other estimation problems: minimize the estimation time needed to reach a given accuracy, or maximize accuracy for a given effort.
- High fidelity is another objective: even when absolute values are off, the estimates should rank design alternatives in the same order as their actual power.

Abstraction Levels
- The higher the level of abstraction, the less time estimation is likely to take, but the lower its accuracy.
- Suitable models are needed to speed up estimation at higher levels.
- The primitive operations or structures change as we move up the abstraction levels.

System Level
- Energy is estimated in terms of very coarse-grained events, e.g. specific tasks initiated by specific triggers or interrupts.
- Estimates for components such as memory and buses are handled separately.
- The estimates support system-level power management decisions.

System Level Approaches
- Early approaches [3] were based on Monte Carlo simulation: random input vectors were generated and power data for them obtained by simulation.
- Approaches varied in efficiency and accuracy. Some allowed the confidence level to be controlled.
- The quality of the results depends on the statistical properties of the input vectors and their impact on power.
- It is difficult to handle the various power modes of operation.
- A recent approach for IP power estimation [4] works with hierarchical models.
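The Monte Carlo idea above can be sketched as follows. This is a minimal illustration, not the procedure of [3]: the function `toy_power` is a hypothetical stand-in for a real power simulator, and the 0.1 mW-per-toggle figure, the 16-bit input width, and the normal-approximation stopping rule are all assumptions chosen for the example.

```python
import math
import random
import statistics


def toy_power(prev_vec, vec):
    """Hypothetical stand-in for a power simulator (assumption):
    0.1 mW for every input bit that toggles between two vectors."""
    return 0.1 * bin(prev_vec ^ vec).count("1")


def monte_carlo_power(simulate, n_bits=16, rel_error=0.05, z=1.96,
                      min_samples=30, max_samples=100_000, seed=0):
    """Draw random input vectors until the confidence interval of the
    mean power is narrower than rel_error * mean (z = 1.96 ~ 95%)."""
    rng = random.Random(seed)
    samples = []
    prev = rng.getrandbits(n_bits)
    while len(samples) < max_samples:
        vec = rng.getrandbits(n_bits)
        samples.append(simulate(prev, vec))
        prev = vec
        if len(samples) >= min_samples:
            mean = statistics.fmean(samples)
            half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
            if mean > 0 and half_width <= rel_error * mean:
                break
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width, len(samples)


mean_mw, half_width_mw, n_vectors = monte_carlo_power(toy_power)
print(f"{mean_mw:.3f} mW +/- {half_width_mw:.3f} mW after {n_vectors} vectors")
```

Tightening `rel_error` or raising `z` trades more simulated vectors for a narrower interval, which is the efficiency versus accuracy tradeoff mentioned on the slide.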

Park et al. [4]
- Estimation at different hierarchical levels
  - Direct tradeoff between accuracy and estimation time
- Creation of power models at different levels
- TLM (transaction-level modeling)

Park et al. [4] (contd.)
[Figure: hierarchy for the H.264 prediction IP. Labels: Active, IDLE; Inter, Intra; Luma, Chroma; Luma (16x16), Chroma, Luma (4x4); Loc 0, Loc m; Mod 0, Mod n.]

Algorithmic/Behavioral Level
- Models of RT-level components are expressed in terms of input characteristics that can be extracted from the behavior
  - e.g. adder energy per operation in terms of word length and Hamming distance of the inputs
  - e.g. memory energy per read or write access
- Energy is estimated as a weighted sum of such basic operations.
- Behavioral transformations need to be supported in terms of the energy change they cause.
- Predicting the interconnect power consumed by data transfer is an issue.
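A minimal sketch of the two ideas above, under stated assumptions: the adder macro-model is taken to be linear in word length and in the Hamming distance between successive operand values, and all coefficients (in pJ) are illustrative, not characterized data. The function names are hypothetical.

```python
def hamming_distance(a, b, width):
    """Number of bit positions (within `width` bits) where a and b differ."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def adder_energy(prev_inputs, inputs, width, e_per_bit=0.1, e_per_toggle=0.2):
    """Assumed linear macro-model (pJ): a word-length term plus a term
    proportional to the input bits toggled since the previous operation."""
    toggles = sum(hamming_distance(p, c, width)
                  for p, c in zip(prev_inputs, inputs))
    return e_per_bit * width + e_per_toggle * toggles


def behavior_energy(op_counts, energy_per_op):
    """Weighted sum: (number of operations of each kind) x (energy per operation)."""
    return sum(count * energy_per_op[op] for op, count in op_counts.items())


# One 4-bit addition whose first operand changes 0000 -> 0101 (2 toggles).
e_add = adder_energy((0b0000, 0b1111), (0b0101, 0b1111), width=4)

# A behavior with 100 additions and 20 memory reads (5.0 pJ per read, assumed).
total = behavior_energy({"add": 100, "mem_read": 20},
                        {"add": e_add, "mem_read": 5.0})
print(e_add, total)
```

A behavioral transformation can then be compared by recomputing `behavior_energy` with the operation counts it produces.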

Algorithmic Energy Components
Total energy consumed =
  energy consumed in computation
  + energy consumed in storage access
  + energy consumed in data transfer
  + energy consumed in control (a function of allocation as well as binding)

Adder Module Characteristics
[Figure: adder energy plotted against Hamming distance.]

Processor Level
- First proposed by Tiwari et al. [5,6] for software power estimation.
- The methodology is based on measuring the power consumption of each instruction.
- Overall energy consumption is computed as a weighted sum of the number of instructions of each type; the weighting factor is the power consumption of the individual instruction.
- Being based on measurements, this approach is valid only for a processor that has already been fabricated.
- Sama et al. [7] modified it to create an instruction-level power model using a gate-level simulator.

Vivek Tiwari's Model [5]
Energy cost of an instruction =
  base cost (measured as the current drawn by a repetitive set of identical instructions)
  + circuit state overhead cost (measured as the current drawn by pairs of instructions)
  + resource constraint cost (to account for stall cycles due to resource contention)
  + cache energy cost (to account for cache misses)
- Tiwari observed that, at least for CISC processors, operand and data value variations affect less than 3% of the total energy consumption.
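The four cost terms can be combined over an instruction trace roughly as below. This is a sketch under assumptions, not the measurement flow of [5]: the cost tables, the energy units, and the per-stall and per-miss costs are invented for illustration, and `program_energy` is a hypothetical name.

```python
from collections import Counter


def program_energy(trace, base_cost, pair_overhead,
                   stall_cycles=0, stall_cost=0.0,
                   cache_misses=0, miss_cost=0.0):
    """Instruction-level energy estimate for an executed trace.

    base_cost[ins]         energy of one instance of instruction `ins`
    pair_overhead[(a, b)]  circuit state overhead when b follows a
                           (pairs not listed are assumed to cost 0)
    """
    # Weighted sum: instruction count times base cost per instruction.
    counts = Counter(trace)
    base = sum(base_cost[ins] * n for ins, n in counts.items())
    # Overhead for every pair of consecutive instructions in the trace.
    overhead = sum(pair_overhead.get(pair, 0.0)
                   for pair in zip(trace, trace[1:]))
    return (base + overhead
            + stall_cycles * stall_cost
            + cache_misses * miss_cost)


trace = ["mov", "add", "add", "mov"]
base_cost = {"mov": 1.0, "add": 1.5}                      # illustrative values
pair_overhead = {("mov", "add"): 0.25, ("add", "mov"): 0.25}

energy = program_energy(trace, base_cost, pair_overhead,
                        stall_cycles=2, stall_cost=0.5,
                        cache_misses=1, miss_cost=4.0)
print(energy)  # 5.0 base + 0.5 overhead + 1.0 stalls + 4.0 misses = 10.5
```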

Lee et al. [6]
- Approach similar to the one proposed in [5].
- The processor is a Fujitsu DSP instead of the Intel 486DX2.
- Unlike on the CISC processor, the base cost of the instructions varies significantly.
- Instructions are classified into 6 classes to reduce the number of measurements (individual as well as pairwise).
- Power minimization strategies suggested include
  - "Intelligent" register bank assignment
  - Instruction packing to reduce cycles
  - Instruction scheduling to reduce circuit state switching energy
  - Operand swapping to reduce computation in Booth's algorithm

Sama et al. [7]
- Instruction set model similar to the models proposed by Tiwari [5].
- Energy numbers are obtained from a power simulator rather than from actual measurement, so models can be built at design time and used in micro-architecture and/or instruction set architecture exploration.
- Considerable speedup over gate-level or circuit-level simulation of the processor model.

Issues in Instruction Set Power Models
- Instructions are not executed one at a time
  - All current processors are deeply pipelined; since many instructions are active concurrently in the pipeline, their interactions should also be accounted for.
  - Tiwari [5] also measured interactions between consecutive instructions from different classes.
- The effect of varying data (as well as addresses) is ignored in the model
  - It can, however, be accounted for by an additive factor.

RT Level Estimation
- Models of RTL components such as adders, comparators, decoders and multiplexers
- Models based on effective capacitance
- Switching activity estimated from the RTL code
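One common way to combine the two ingredients above is the dynamic power relation P = activity x C_eff x Vdd^2 x f, summed over components. The sketch below assumes that form, with `activity` meaning the average number of times per clock cycle that a component's effective capacitance is charged; the component list and all values are illustrative.

```python
def rtl_dynamic_power(components, vdd, freq_hz):
    """Sum of activity * C_eff * Vdd^2 * f over RTL components, in watts."""
    return sum(c["activity"] * c["c_eff_farads"] * vdd ** 2 * freq_hz
               for c in components)


# Hypothetical datapath: effective capacitances would come from a
# characterized library, activities from the RTL code or its simulation.
datapath = [
    {"name": "adder", "activity": 0.50, "c_eff_farads": 2e-12},
    {"name": "mux",   "activity": 0.25, "c_eff_farads": 1e-12},
]

power_w = rtl_dynamic_power(datapath, vdd=1.0, freq_hz=100e6)
print(f"{power_w * 1e6:.1f} uW")
```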

Gate Level Estimation
- Effective capacitance models at the gate level are held in the library.
- Switching activity is estimated for a given application (specified as a set of Boolean equations).
- Switching activity can be based on probabilistic input vector characteristics or on actual input vector characteristics.
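Both ways of obtaining switching activity can be sketched briefly. The probabilistic part assumes that gate inputs are statistically independent and that successive values of a signal are independent, so a signal that is 1 with probability p toggles with probability 2p(1-p) per cycle; real estimators must also deal with correlated signals. The function names are hypothetical.

```python
def and_prob(pa, pb):
    """P(output = 1) for a 2-input AND with independent inputs."""
    return pa * pb


def or_prob(pa, pb):
    """P(output = 1) for a 2-input OR with independent inputs."""
    return pa + pb - pa * pb


def not_prob(pa):
    """P(output = 1) for an inverter."""
    return 1.0 - pa


def switching_activity(p):
    """Toggle probability per cycle, assuming temporal independence."""
    return 2.0 * p * (1.0 - p)


def measured_activity(values):
    """Toggle rate of a signal taken from an actual sequence of values."""
    toggles = sum(a != b for a, b in zip(values, values[1:]))
    return toggles / (len(values) - 1)


# Probabilistic estimate for y = (a AND b) OR (NOT c), each input 1 with p = 0.5.
p_y = or_prob(and_prob(0.5, 0.5), not_prob(0.5))
print(p_y, switching_activity(p_y))

# Estimate from an actual vector sequence observed on one signal.
print(measured_activity([0, 1, 1, 0, 1]))
```

Multiplying each node's activity by its effective capacitance from the library gives the gate-level counterpart of the RT-level estimate.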

Circuit Level
- Comparison is always with SPICE
  - How much faster, and how close in terms of prediction?
- Compact set of vectors
  - How representative are these vectors of actual use?
- PowerMill [3], a popular tool, achieves a speedup of 2 to 3 orders of magnitude while staying within 10% accuracy. It is based on event-driven timing simulation and uses simplified table-driven device models.

References
[1] M. Pedram, "Power Minimization in IC Design", ACM TODAES, Vol. 1, No. 1, Jan. 1996.
[2] Macii et al., "High-level Power Modeling, Estimation and Optimization", DAC 1997.
[3] Burch et al., "A Monte-Carlo Approach for Power Estimation", IEEE TVLSI, Vol. 1, No. 1, Mar.
[4] Park et al., "System Level Power Estimation Methodology with H.264 Decoder Prediction IP Case Study".

References (contd.)
[5] Tiwari et al., "Power Analysis of Embedded Software: A First Step towards Software Power Minimization", IEEE TVLSI, Vol. 2, No. 4, Dec. 1994.
[6] Lee et al., "Power Analysis and Minimization Techniques for Embedded DSP Software", IEEE TVLSI, Vol. 5, No. 1, Mar. 1997.
[7] Sama et al., "Speeding up Power Estimation of Embedded Software", ISLPED 2000.