Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns.

Slides:



Advertisements
Similar presentations
Digital System Clocking: High-Performance and Low-Power Aspects Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic Wiley-Interscience.
Advertisements

VERILOG: Synthesis - Combinational Logic Combination logic function can be expressed as: logic_output(t) = f(logic_inputs(t)) Rules Avoid technology dependent.
Synchronous Sequential Logic
Combinational Logic.
1 ECE734 VLSI Arrays for Digital Signal Processing Chapter 3 Parallel and Pipelined Processing.
Table 7.1 Verilog Operators.
ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)
Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Sequential Synthesis.
Chapter 4 Retiming.
Give qualifications of instructors: DAP
ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options.
Logic Synthesis – 3 Optimization Ahmed Hemani Sources: Synopsys Documentation.
4/22/ Clock Network Synthesis Prof. Shiyan Hu Office: EREC 731.
Modern VLSI Design 2e: Chapter 8 Copyright  1998 Prentice Hall PTR Topics n High-level synthesis. n Architectures for low power. n Testability and architecture.
Modern VLSI Design 4e: Chapter 8 Copyright  2008 Wayne Wolf Topics High-level synthesis. Architectures for low power. GALS design.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Register-transfer Design n Basics of register-transfer design: –data paths and controllers.
Synchronous Digital Design Methodology and Guidelines
Lower Power Design Guide 성균관대학교 조 준 동 교수
CS 151 Digital Systems Design Lecture 37 Register Transfer Level
Kazi Spring 2008CSCI 6601 CSCI-660 Introduction to VLSI Design Khurram Kazi.
ECE Synthesis & Verification - Lecture 8 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Introduction.
Behavioral Synthesis Outline –Synthesis Procedure –Example –Domain-Specific Synthesis –Silicon Compilers –Example Tools Goal –Understand behavioral synthesis.
Modern VLSI Design 2e: Chapter4 Copyright  1998 Prentice Hall PTR.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 09: RC Principles: Software (2/4) Prof. Sherief Reda.
Pipelining and Retiming 1 Pipelining  Adding registers along a path  split combinational logic into multiple cycles  increase clock rate  increase.
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
Mehdi Amirijoo1 Power estimation n General power dissipation in CMOS n High-level power estimation metrics n Power estimation of the HW part.
VHDL Coding Exercise 4: FIR Filter. Where to start? AlgorithmArchitecture RTL- Block diagram VHDL-Code Designspace Exploration Feedback Optimization.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
CS3350B Computer Architecture Winter 2015 Lecture 5.2: State Circuits: Circuits that Remember Marc Moreno Maza [Adapted.
ECE 2372 Modern Digital System Design
Logic Synthesis for Low Power(CHAPTER 6) 6.1 Introduction 6.2 Power Estimation Techniques 6.3 Power Minimization Techniques 6.4 Summary.
L7: Pipelining and Parallel Processing VADA Lab..
EKT 221/4 DIGITAL ELECTRONICS II  Registers, Micro-operations and Implementations - Part3.
L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수
Modern VLSI Design 4e: Chapter 8 Copyright  2008 Wayne Wolf Topics Basics of register-transfer design: –data paths and controllers; –ASM charts. Pipelining.
HYPER: An Interactive Synthesis Environment for Real Time Applications Introduction to High Level Synthesis EE690 Presentation Sanjeev Gunawardena March.
Anurag Dwivedi. Basic Block - Gates Gates -> Flip Flops.
Topics Combinational network delay.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
L13 :Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
Introduction to ASIC flow and Verilog HDL
Modern VLSI Design 3e: Chapter 8 Copyright  1998, 2002 Prentice Hall PTR Topics n Basics of register-transfer design: –data paths and controllers; –ASM.
CDA 4253 FPGA System Design RTL Design Methodology 1 Hao Zheng Comp Sci & Eng USF.
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
VADA Lab.SungKyunKwan Univ. 1 L5:Lower Power Architecture Design 성균관대학교 조 준 동 교수
L10 : Lower Power High Level Synthesis(1) 성균관대학교 조 준 동 교수
Retiming EECS 290A Sequential Logic Synthesis and Verification.
L9 : Low Power DSP Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab.
RTL Hardware Design by P. Chu Chapter 9 – ECE420 (CSUN) Mirzaei 1 Sequential Circuit Design: Practice Shahnam Mirzaei, PhD Spring 2016 California State.
Resource Sharing in LegUp. Resource Sharing in High Level Synthesis Resource Sharing is a well-known technique in HLS to reduce circuit area by sharing.
Overview Logistics Last lecture Today HW5 due today
Flip Flops Lecture 10 CAP
Register Transfer Specification And Design
Introduction Introduction to VHDL Entities Signals Data & Scalar Types
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Reading: Hambley Ch. 7; Rabaey et al. Sec. 5.2
Introduction to cosynthesis Rabi Mahapatra CSCE617
ECE 551: Digital System Design & Synthesis
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
Multiplier-less Multiplication by Constants
A SoC Design Automation Seoul National University
Architectural-Level Synthesis
Levels in Processor Design
Levels in Processor Design
Levels in Processor Design
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Presentation transcript:

Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns in the parse trees. –Example: – a= x+ y; b = a+ 1; c = x+ y; – a= x+ y; b = a+ 1; c = a;

Examples of other transformations Dead-code elimination: –a= x; b = x+ 1; c = 2 * x; –a= x; can be removed if not referenced. Operator-strength reduction: –a= x 2 ; b = 3 * x; –a= x * x; t = x<<1; b = x+ t; Code motion: –for ( i = 1; i < a * b) { } –t = a * b; for ( i = 1; i < t) { }

Strength reduction

Strength Reduction

Control- flow based transformations Model expansion. –Expand subroutine flatten hierarchy. – Useful to expand scope of other optimization techniques. – Problematic when routine is called more than once. – Example: –x= a+ b; y= a * b; z = foo( x, y) ; –foo( p, q) {t =q-p; return(t);} –By expanding foo: –x= a+ b; y= a * b; z = y-x; Conditional expansion Transform conditional into parallel execution with test at the end. Useful when test depends on late signals. May preclude hardware sharing. Always useful for logic expressions. Example: y= ab; if ( a) x= b+d; else x= bd; can be expanded to: x= a( b+ d) + a’bd; y= ab; x= y+ d( a+ b);

Pipelining

Associativity Transformation

FIR Parallelization

FIR PARALLELIZATION

FIR Filter Parallelization

FIR parallelization: two working phases

IIR filter recursive function

Recursive Function

Interlaced Accumulation Programming for Low Power

4. Register Transfer Level Design

FIR3 Block Diagram and Flow Graph

High-Level Power Estimation P core = P DP + P MEM + P CNTR + P PROC P DP = P REG +P MUX +P FU + +P FU, where P REG is the power of the registers P MUX is the power of multiplexers P FU is the power of functional units P INT is the power of physical interconnet capacitance

High-Level Power Estimation: P REG Compute the lifetimes of all the variables in the given VHDL code. Represent the lifetime of each variable as a vertical line from statement i through statement i + n in the column j reserved for the corresponding varibale v j. Determine the maximum number N of overlapping lifetimes computing the maximum number of vertical lines intersecting with any horizontal cut-line. Estimate the minimal number of N of set of registers necessary to implement the code by using register sharing. Register sharing has to be applied whenever a group of variables, with the same bit-width b i. Select a possible mapping of variables into registers by using register sharing Compute the number w i of write to the variables mapped to the same set of registers. Estimate n i of each set of register dividing w i by the number of statements S: i =w i /S; hence TR imax = n i f clk. Power of latches and flip flops is consumed not only during output transitions, but also during all clock edges by the internal clock buffers The non-switching power P NSK dissipated by internal clock buffers accounts for 30% of the average power for the 0.38-micron and 3.3 V operating system. In total,

P CNTR After scheduling, the control is defined and optimized by the hardware mapper and further by the logic synthesis process before mapping to layout. Like interconnect, therefore, the control needs to be estimated statistically. Global control model: Local control model: the local controller account for a larger percentage of the total capacitance than the global controller. Where Ntrans is the number of tansitions, nstates is the number of states, Bf is the bus factor, and Clc is the capacitance switched in any local controller in one sample period. Bf is the ratio of the number of bus accesses to the number of busses.

N trans The number of transitions depends on assignment, scheduling, optimizations, logic optimization, the standard cell library used, the amount of glitchings and the statistics of the inputs.

Behavioral Synthesis loop unrolling : localize the data to reduce the activity of the inputs of the functional units or two output samples are computed in parallel based on two input samples. Neither the capacitance switched nor the voltage is altered. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, The transformation yields critical path of 3, thus voltage can be dropped. Clock Selection : Choose optimal system clock period Eliminate slacks/improve resource utilization and Enable greater voltage scaling Module selection : For each operation, choose library template Flow graph restructuring : pull out operations on the critical cycle.

High-Level Power Estimation: P MUX and P FU

Critical Path Longest delayed path from input to output in combinational logic Determine operating clock frequency Resizing non-critical path transistor (In-Place Optimization) Critical path in Synchronous Sequential logic

Loop Unrolling for Low Power

Retiming Flip- flop insertion to minimize hazard activity moving a flip- flop in a circuit

Exploiting spatial locality for interconnect power reduction Global Local Adder1 Adder2

Balancing maximal time-sharing and fully-parallel implementation A fourth-order parallel-form IIR filter (a) Local assignment (2 global transfers), (b) Non-local assignment (20 global transfers)

Retiming/pipelining for Critical path