1 Power estimation in the algorithmic and register-transfer level September 25, 2006 Chong-Min Kyung.

2 Software power analysis
Objectives:
- Compare different programs
- Select processors
- Optimize software
Three levels of granularity (differing in execution speed, availability, and accuracy):
- Source code level
- Instruction level
- BFM (Bus Functional Model) level

3 Execution performed on
1) Target processor: compile the source code and run it, then either
- measure the generated heat to estimate the power, or
- monitor the execution (with inserted monitoring instructions or dedicated hardware, ideally with negligible overhead and disturbance on power and speed) to count the occurrences of each instruction and compute the total estimated power.
Dynamic code can also be handled. Keeping the disturbance caused by the monitoring overhead minimal is the key to accuracy.
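A minimal sketch of the monitoring option, assuming an executed-instruction trace has already been collected. The opcode names and the per-instruction energy table `ENERGY_NJ` are hypothetical placeholders, not measured data: the sketch counts each opcode's occurrences and weights them by a per-instruction energy.

```python
from collections import Counter

# Hypothetical per-instruction energy costs in nanojoules (illustrative only).
ENERGY_NJ = {"add": 1.0, "mul": 3.0, "ld": 2.5, "st": 2.5, "br": 1.5}


def count_instructions(trace):
    """Count how often each opcode occurs in an executed-instruction trace."""
    return Counter(trace)


def estimated_energy_nj(counts, energy_table=ENERGY_NJ):
    """Total energy = sum over opcodes of (occurrences x energy per instruction)."""
    return sum(n * energy_table[op] for op, n in counts.items())


def estimated_power_mw(energy_nj, exec_time_us):
    """Average power: nanojoules divided by microseconds gives milliwatts."""
    return energy_nj / exec_time_us
```

For the trace `["ld", "add", "mul", "st", "br", "ld", "add", "br"]` this gives 15.5 nJ. Because the counts come from actual execution, dynamically executed code paths are reflected in the estimate.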

4 Execution performed on
2) Another processor: run a program that estimates the power consumption, using the code compiled for the target as input data. Only the power consumption of the static code can be estimated.
3) Simulator: at either the source code level or the instruction code level (the same as ‘Another processor’).

5 Power estimation of software
Simplest approach: energy consumption is proportional to the program execution time.
Instruction-set approach: energy consumption differs for each instruction class (a class groups instructions with similar power behavior) and for each pair of instruction classes (inter-instruction dependency).
- Measurements are made by running long loops of the same instruction.
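The two approaches can be contrasted in a short sketch. The class names, base costs `BASE_NJ`, and pair overheads `PAIR_OVERHEAD_NJ` are hypothetical values chosen for illustration; in practice they would come from the long-loop measurements described above.

```python
# Hypothetical base energy per instruction class (nJ). Real values would be
# measured by running long loops of the same instruction.
BASE_NJ = {"alu": 1.0, "mem": 2.5, "branch": 1.5}

# Hypothetical extra energy when one class follows another (nJ), modeling the
# inter-instruction dependency. Pairs not listed are assumed to add nothing.
PAIR_OVERHEAD_NJ = {("alu", "mem"): 0.3, ("mem", "alu"): 0.3, ("alu", "branch"): 0.2}


def time_based_energy(exec_cycles, avg_energy_per_cycle_nj):
    """Simplest model: energy proportional to execution time (in cycles)."""
    return exec_cycles * avg_energy_per_cycle_nj


def instruction_level_energy(class_trace):
    """Base cost of every instruction plus an overhead for each adjacent pair."""
    energy = sum(BASE_NJ[c] for c in class_trace)
    for prev, cur in zip(class_trace, class_trace[1:]):
        energy += PAIR_OVERHEAD_NJ.get((prev, cur), 0.0)
    return energy
```

For `["alu", "mem", "alu", "branch"]`, the base costs sum to 6.0 nJ and the three transitions add 0.8 nJ, for 6.8 nJ in total. The time-based model cannot see this ordering effect at all.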

6 Power estimation of software
Estimation becomes more difficult with more complex processors (multithreading, out-of-order execution, …) and memory-system architectures (caches, …). Accurate estimation requires software profiling on an instruction-set simulator (ISS) together with the bus access pattern. An estimation model accurate to within 5% was developed for an ARM processor [T. Simunic, "Cycle-accurate simulation of energy consumption in embedded systems," DAC 99].

7 Algorithmic-level power estimation
Algorithmic-level power estimation consists of:
- Architecture estimation
- Activation estimation
- Power model evaluation
Architecture estimation by high-level synthesis (HLS):
- Allocation, scheduling, and binding (ASB). Allocation in the narrow sense is ‘unit selection’, where each operation can be performed by more than one type of unit.
- Allocation and scheduling affect each other.
HLS considering communication (interconnect):
- ASB + floorplanning
- Cycle-time violation check based on wire delay (derived from wire-length estimation)
HLS considering interconnect and power
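A minimal sketch of the last two steps (activation estimation feeding power model evaluation). It assumes that architecture estimation has already fixed the unit types. The unit names, the per-activation energies in `ENERGY_PER_ACTIVATION_PJ`, and the activation counts are hypothetical placeholders.

```python
# Hypothetical energy per activation for each allocated unit type (pJ).
ENERGY_PER_ACTIVATION_PJ = {"adder": 2.0, "multiplier": 12.0, "register": 0.5}


def algorithmic_power_mw(activations, n_cycles, clock_mhz,
                         energy_table=ENERGY_PER_ACTIVATION_PJ):
    """Power model evaluation from activation estimates.

    activations: unit type -> number of activations during n_cycles
                 (the output of activation estimation).
    Power = total switched energy / elapsed time.
    """
    energy_pj = sum(count * energy_table[unit] for unit, count in activations.items())
    time_us = n_cycles / clock_mhz          # cycles / MHz = microseconds
    power_uw = energy_pj / time_us          # pJ / us = microwatts
    return power_uw / 1000.0                # microwatts -> milliwatts
```

With 100 adder, 50 multiplier, and 400 register activations over 200 cycles at 100 MHz, the total is 1000 pJ over 2 µs, or 0.5 mW.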

8 Target architecture of HLS
- Datapath <- dataflow of the CDFG
- Controller <- dataflow and control flow
- Clock tree

9 Target architecture of HLS
Architecture synthesis =
- Schedule the operations under timing and resource constraints, and
- Allocate the required resources (operation units). An operation unit (OU) can be an arithmetic, logic, or memory module.
The output of architecture synthesis is:
- A set of operation units
- Registers
- Steering logic to transfer data between operation units and registers, and
- A controller that generates the control signals steering the MUXes and OUs and driving the register enable signals
How can power optimization be integrated into HLS?
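List scheduling is one common way to schedule operations under resource constraints. The sketch below is a generic version, not necessarily the scheduler this lecture assumes. It assumes single-cycle operations, an acyclic dependency graph, and a fixed number of allocated units per type (`resources`). All operation names in the usage are hypothetical.

```python
def list_schedule(ops, deps, op_type, resources):
    """Resource-constrained list scheduling with single-cycle operations.

    ops:       operation names in priority order (earlier = higher priority)
    deps:      dict op -> iterable of predecessor ops (graph must be acyclic)
    op_type:   dict op -> unit type it needs, e.g. "add" or "mul"
    resources: dict unit type -> number of allocated units (must be >= 1)
    Returns dict op -> control step (0-based).
    """
    for op in ops:
        if resources.get(op_type[op], 0) < 1:
            raise ValueError(f"no unit allocated for {op_type[op]!r}")
    step = {}
    cycle = 0
    while len(step) < len(ops):
        used = {t: 0 for t in resources}     # units busy in this control step
        for op in ops:
            if op in step:
                continue
            # Ready only if every predecessor was scheduled in an earlier step.
            ready = all(p in step and step[p] < cycle for p in deps.get(op, ()))
            t = op_type[op]
            if ready and used[t] < resources[t]:
                step[op] = cycle
                used[t] += 1
        cycle += 1
    return step
```

The usage shows that allocation and scheduling interact. With `ops = ["m1", "m2", "a1", "a2"]`, where `a1` depends on both multiplies and `a2` depends on `a1`, one multiplier yields a 4-step schedule. Allocating a second multiplier shortens it to 3 steps.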

10 RTL power modeling
RTL power modeling = constructing a model $\text{Power} = P(X_1, X_2, \ldots, X_n)$ from n model parameters

11 Issues of RTL power modeling
- Granularity
- Choice of model parameters: activity model, complexity model, or both?
- Semantics of the model: cumulative or cycle-accurate?
- How to build and store the model: top-down or bottom-up? Table or equation?

12 Model granularity
Model granularity:
- Should not be too coarse: e.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible.
- Should not be too fine either.
- FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller.
Five main components:
- Controller
- Register file
- Bus
- Memory
- Functional blocks

14 Activity model or complexity model, or both?
Model parameters:
- Which parameters should be included in the model?
- Model parameters must be observable at RTL.
$P_{total} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, an activity model ($A_i$) and a capacitance model ($C_i$).
Activity model or complexity model, or both?
- The complexity model can be just a capacitance model, or it can also include the transistor count to account for leakage current.
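A minimal sketch of evaluating the decoupled model. It assumes that $k = \tfrac{1}{2} V_{dd}^2 f$ and that $A_i$ counts all transitions (both directions) per clock cycle. Under those conventions this is the usual dynamic-power expression, but the slide itself does not define k. Leakage is not modeled here.

```python
def dynamic_power_w(activities, capacitances_f, vdd, freq_hz):
    """P_total = k * sum_i A_i * C_i, taking k = 0.5 * Vdd^2 * f (assumed).

    activities:     A_i, average transitions per clock cycle at node/module i
                    (from the activity model)
    capacitances_f: C_i, switched capacitance of node/module i in farads
                    (from the capacitance/complexity model)
    """
    if len(activities) != len(capacitances_f):
        raise ValueError("need one capacitance per activity value")
    k = 0.5 * vdd ** 2 * freq_hz
    return k * sum(a * c for a, c in zip(activities, capacitances_f))
```

Because the two factors are separate inputs, the activity model and the capacitance model can be built and refined independently. For example, A = [0.5, 0.25] with C = [10 fF, 40 fF] at 1.2 V and 100 MHz gives about 1.08 µW.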

15 Activity parameters
RTL activity: an approximation of all intra-clock-cycle activity, projected onto the relevant clock transition point.
The main parameters are static and transition probabilities.
- Choose between bitwise and word-wise probabilities according to the desired accuracy and speed: an n-input, m-output component has (n+m) bitwise parameters but only two word-wise parameters.
Additional parameters:
- Transition density: the average switching rate per second; it also covers non-periodic signals.
- Correlation measures: useful for computing switching power.
  - Spatial correlation
  - Temporal correlation (i.e., transition probability)
- Entropy: behaves somewhat like the transition probability; compare $2p(1-p)$ with $H(p) = p\log_2(1/p) + (1-p)\log_2(1/(1-p))$, where p is the static probability of the signal being 1.
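A minimal sketch of extracting these parameters from simulated traces. It assumes a per-cycle bit trace for the bitwise measures and integer words of a known width for the word-wise measure. The word-wise metric here (average fraction of toggling bits per cycle) is one illustrative choice, not necessarily the one the slide intends.

```python
import math


def static_probability(bits):
    """Fraction of clock cycles in which the bit is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the bit toggles (needs >= 2 samples)."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def independent_transition_probability(p):
    """2p(1-p): transition probability if successive values are temporally independent."""
    return 2 * p * (1 - p)


def entropy(p):
    """Binary entropy H(p) = p log2(1/p) + (1-p) log2(1/(1-p)), using 0 log 0 = 0."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)


def wordwise_transition_activity(words, width):
    """Average fraction of the `width` bits that toggle per cycle (needs >= 2 words)."""
    hamming = [bin(a ^ b).count("1") for a, b in zip(words, words[1:])]
    return sum(hamming) / (len(hamming) * width)
```

Both $2p(1-p)$ and $H(p)$ peak at p = 0.5 and vanish at p = 0 or 1, which is the similarity the slide points to. A measured transition probability that differs from $2p(1-p)$ indicates temporal correlation.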

16 Complexity parameters
Capacitance is roughly proportional to gate count, transistor (TR) count, etc. The only complexity parameters available at RTL are:
- Width of a component: number of inputs and outputs
- Number of states: applicable to the controller
Architecture-specific models:
- $k_1 N^2$ for an N×N multiplier
- $k_2 N$ for an N-bit ripple-carry adder
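A minimal sketch of the two architecture-specific complexity models. The coefficient values `K1_MULT_FF` and `K2_ADDER_FF` are hypothetical placeholders. Real coefficients would be fitted by characterizing actual implementations.

```python
# Hypothetical fitted coefficients (femtofarads); not from any real library.
K1_MULT_FF = 25.0    # k1 in k1 * N^2 for an N x N array multiplier
K2_ADDER_FF = 40.0   # k2 in k2 * N for an N-bit ripple-carry adder


def multiplier_capacitance_ff(n):
    """Switched-capacitance estimate growing quadratically with operand width."""
    return K1_MULT_FF * n * n


def ripple_adder_capacitance_ff(n):
    """Switched-capacitance estimate growing linearly with operand width."""
    return K2_ADDER_FF * n
```

The coefficients only set the scale. The width dependence is what matters at RTL: doubling the width from 16 to 32 bits quadruples the multiplier estimate but only doubles the adder estimate.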

17 Model semantics
Cumulative (average) vs. cycle-accurate:
- Cumulative power = sum over modules of the average (cumulative) power of each module
- Cycle-accurate power = sum over modules of their power in each clock cycle
Cumulative power is adequate only for tracking battery lifetime, average heat dissipation, etc. Cycle-accurate power is needed for IR-drop, noise, and reliability (electromigration) analysis. Pseudo-cycle-accurate power estimation may be sufficient for dynamic power management.
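A minimal sketch of the difference between the two semantics, assuming a per-cycle power trace is already available. The windowed average is one hypothetical way to realize a pseudo-cycle-accurate view.

```python
def cumulative_average(power_trace):
    """Cumulative semantics: a single average over all cycles."""
    return sum(power_trace) / len(power_trace)


def peak_power(power_trace):
    """Worst single cycle, visible only with cycle-accurate semantics."""
    return max(power_trace)


def windowed_average(power_trace, window):
    """Pseudo-cycle-accurate view: averages over non-overlapping windows of cycles."""
    averages = []
    for i in range(0, len(power_trace), window):
        chunk = power_trace[i:i + window]
        averages.append(sum(chunk) / len(chunk))
    return averages
```

For the trace [1, 1, 1, 9, 1, 1, 1, 1] (mW), the cumulative average is 2.0 mW, which hides the 9 mW spike relevant to IR-drop and noise analysis. Windows of 4 cycles give [3.0, 1.0]. That is coarser than cycle-accurate but may be enough for dynamic power management.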

18 How to build and store the model
Model construction:
- Top-down: good when
  - the implementation follows a predictable template (e.g., memory), or
  - dealing with a new circuit for which no measured data are available.
- Bottom-up: can be equation-based:
  - a template for the power model is given first;
  - statistical techniques are then used to fit the model to the measured values by adjusting its coefficients.
Model storage:
- Equation-based
- Table-based
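A minimal sketch of table-based storage. It assumes a single model parameter (input switching activity) and power values characterized at a few sample points. Linear interpolation between entries and clamping outside the characterized range are illustrative choices, not something the slide prescribes.

```python
import bisect


class TablePowerModel:
    """Table-based macromodel: characterized power at sample activities."""

    def __init__(self, activities, powers):
        if len(activities) != len(powers) or len(activities) < 2:
            raise ValueError("need at least two (activity, power) entries")
        pairs = sorted(zip(activities, powers))
        self.xs = [a for a, _ in pairs]
        self.ys = [p for _, p in pairs]

    def __call__(self, activity):
        xs, ys = self.xs, self.ys
        # Clamp outside the characterized range.
        if activity <= xs[0]:
            return ys[0]
        if activity >= xs[-1]:
            return ys[-1]
        # Linear interpolation between the two surrounding entries.
        i = bisect.bisect_right(xs, activity)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (activity - x0) / (x1 - x0)
```

For example, `TablePowerModel([0.0, 0.5, 1.0], [1.0, 3.0, 4.0])(0.25)` returns 2.0. An equation-based model would instead store only a few fitted coefficients.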

19 Accuracy issue
Metric: $E = |P_e - P| / \max(P_e, P)$
- Average error
- Standard deviation
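A minimal sketch of computing the metric and its summary statistics. It assumes $P_e$ is the model's estimate, $P$ is the reference (e.g., lower-level simulation) power for the same test case, and both are positive. The population standard deviation is used here; the sample version would be equally reasonable.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |P_e - P| / max(P_e, P); assumes at least one value is nonzero."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


def error_summary(estimates, references):
    """Average and standard deviation of E over a set of test cases."""
    errors = [relative_error(e, r) for e, r in zip(estimates, references)]
    return statistics.mean(errors), statistics.pstdev(errors)
```

Normalizing by the larger of the two values keeps E in [0, 1] and makes over- and under-estimation by the same absolute amount score symmetrically. For example, estimates (8, 10) against references (10, 8) both give E = 0.2.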

20 Macromodeling flow
1. Choose model parameters, e.g., the average switching activity of the inputs and/or outputs.
2. Design the training set: it should have good coverage, be unbiased, and resemble actual operating conditions.
3. Characterization: run a power-accurate lower-level simulator. For example, for RTL model training, run a gate-level simulator with good coverage of input/output switching activities.
4. Model extraction: for an equation-based model, run an LMS regression engine; for a table-based model, merge entries according to the available table space.
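A minimal sketch of step 4 for an equation-based model. It assumes a linear template $P = c_0 + c_1 a_{in} + c_2 a_{out}$ and fits it by ordinary least squares via the normal equations, which is one way to realize the regression step. The two-parameter template and the helper names are hypothetical. A singular system signals a training set that does not exercise every parameter.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: training set does not excite all parameters")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_macromodel(features, powers):
    """Least-squares fit of P = c0 + sum_j c_j * x_j.

    features: one tuple of model parameters per training vector
              (e.g. input and output switching activity)
    powers:   characterized power for each training vector
    """
    rows = [[1.0, *x] for x in features]          # prepend the constant term
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, powers)) for i in range(n)]
    return _solve(ata, aty)


def predict(coeffs, x):
    """Evaluate the fitted macromodel for one parameter tuple."""
    return coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))
```

In a real flow, `powers` would come from step 3 (e.g., gate-level simulation of each training vector). Here it could simply be generated from known coefficients to check that the fit recovers them.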
