1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

Presentation transcript:

1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

2 Outline: Motivation; Previous Work; Approaches (Fanout-Splitting, Cell order optimization by ILP); Conclusions.

3 Motivation. Interconnect dominates gates in present process technology (delay, power, reliability, process variation, etc.). Conventional datapath design focuses on logic depth minimization. Source: ITRS roadmap 2005.

4 Technology Trends. Two tables, each with columns 90 nm, 65 nm, and % decreasing. Interconnect (Updated Berkeley Predictive Interconnect Model): gate length (nm) 90, 65; inter-layer dielectric constant; capacitance of local interconnect (fF/µm). Device (ITRS roadmap 2005, Table 40a): gate length (nm) 90, 65; Vdd (V) 1.1, -; Vth (V); NMOS gate capacitance (fF/µm); NMOS intrinsic delay (ps).

5 Shifter Taxonomy. Functionality: logical shift (MSBs stuffed with 0's); arithmetic shift (extends the original MSB); cyclic shift (rotation); bidirectional shift. Circuit topology: barrel shifter; logarithmic shifter.
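The three right-shift behaviours in the taxonomy can be illustrated on plain integers. This is a minimal behavioural sketch, not circuit code; the function names and the explicit word width `n` are assumptions made for the example.

```python
def logical_shift_right(x, s, n):
    """Shift right by s; the vacated MSBs are filled with 0's."""
    mask = (1 << n) - 1
    return (x & mask) >> s


def arithmetic_shift_right(x, s, n):
    """Shift right by s; the vacated MSBs repeat the original MSB."""
    mask = (1 << n) - 1
    x &= mask
    shifted = x >> s
    if x >> (n - 1):                      # original MSB is 1
        shifted |= mask & ~(mask >> s)    # set the top s bits
    return shifted


def rotate_right(x, s, n):
    """Cyclic shift: bits leaving on the right re-enter on the left."""
    mask = (1 << n) - 1
    x &= mask
    s %= n
    return ((x >> s) | (x << (n - s))) & mask
```

For the 8-bit word 0b10010110 shifted by 3, the three functions return 0b00010010, 0b11110010, and 0b11010010 respectively.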

6 Barrel Shifter. Pros: every data signal passes through only one transmission gate. Cons: input capacitance is [expression not preserved]; # transistors = [expression not preserved]; requires an additional decoder for control signals. Figures: schematic, layout.
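A behavioural sketch of a barrel rotator may help picture the structure: a decoder turns the shift amount into a one-hot select, and a crossbar of switches connects each output to exactly one input. The names below are hypothetical, and the switch count reported is a property of this toy crossbar (one switch per output and shift amount), not a figure taken from the slide.

```python
def decode_one_hot(shift, n):
    """The additional decoder: binary shift amount to n one-hot control lines."""
    return [1 if k == shift % n else 0 for k in range(n)]


def barrel_rotate_right(bits, shift):
    """Rotate right; bits[i] is bit i, with index 0 as the LSB."""
    n = len(bits)
    select = decode_one_hot(shift, n)
    out = [0] * n
    switches = 0
    for j in range(n):              # output column
        for k in range(n):          # one switch per candidate shift amount
            switches += 1
            if select[k]:           # only the selected switch conducts
                out[j] = bits[(j + k) % n]
    return out, switches
```

Each data bit crosses a single conducting switch, which matches the advantage listed on the slide, while every input fans out to one switch per output.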

7 Logarithmic Shifter. Pros: # transistors = [expression not preserved]. Cons: long inter-stage wires, especially for cyclic shifters (the target of optimization). Figures: schematic, layout.
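The logarithmic structure can be sketched as a chain of stages, where stage k rotates by 2^k when bit k of the shift amount is set. The code below is an illustrative model with hypothetical names; it assumes a power-of-two width and measures wire span in columns, which is an assumption of the sketch rather than the slide's metric.

```python
def log_rotate_right(bits, shift):
    """Rotate right through log2(n) stages of 2:1 selections."""
    n = len(bits)
    stages = n.bit_length() - 1
    assert 1 << stages == n, "n must be a power of two"
    cur = list(bits)
    mux_count = 0
    for k in range(stages):
        step = 1 << k
        ctrl = (shift >> k) & 1
        cur = [cur[(j + step) % n] if ctrl else cur[j] for j in range(n)]
        mux_count += n              # one 2:1 selection per bit per stage
    return cur, mux_count


def longest_wire_per_stage(n):
    """Longest shifted wire of each stage in a cyclic shifter, in columns."""
    stages = n.bit_length() - 1
    return [max(abs((j + (1 << k)) % n - j) for j in range(n))
            for k in range(stages)]
```

For n = 8 the wrap-around connections give longest spans of 7, 6, and 4 columns in the three stages, which illustrates why the inter-stage wires of a cyclic shifter are singled out on the slide.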

8 Cyclic Shifter: Applications. Finite field arithmetic: in normal basis, squaring is done by cyclic shifting. Encryption: ShiftRows operation in the Rijndael algorithm. DCT processing unit: address generator. Bidirectional shifting: can be implemented as a cyclic shifter with additional masking logic. CORDIC algorithm, etc.

9 Previous Work: bit interleave; two-dimensional folding strategy; gate duplicating; ternary shifting; comparison between barrel shifter and log shifter.

10 Cyclic Shifter: Traditional Design. MUX-based; shifting and non-shifting paths are intertwined. Timing: wire load on the critical path is [expression not preserved]. Power: when configured to pass through, the non-shifting paths have to be switched as well.

11 Fanout Splitting Shifter. Use DEMUXes instead of MUXes; shifting and non-shifting paths are now decoupled. Timing: wire load on the longest path is [expression not preserved]. Power: when shifting, non-shifting paths are at rest.

12 Example: right rotate by 5 bits. Red lines are signal lines; green lines are quiet lines.
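The right-rotate-by-5 example can be mimicked with a small behavioural model of a DEMUX-based rotator. In this sketch each cell steers its data onto exactly one of its two output wires and leaves the other quiet; modelling a quiet wire as a constant 0 and merging with OR is an assumption of the sketch (a NAND network would rest at the opposite level), and all names are hypothetical.

```python
def demux_rotate_right(bits, shift):
    """Rotate a list of 0/1 values right, counting signal and quiet wires."""
    n = len(bits)
    stages = n.bit_length() - 1
    assert 1 << stages == n, "n must be a power of two"
    cur = list(bits)
    signal_wires = quiet_wires = 0
    for k in range(stages):
        step = 1 << k
        ctrl = (shift >> k) & 1
        straight = [0] * n          # wire from cell j to cell j
        shifted = [0] * n           # wire from cell j to cell (j - step) mod n
        for j in range(n):
            if ctrl:
                shifted[(j - step) % n] = cur[j]
            else:
                straight[j] = cur[j]
        signal_wires += n           # the selected wire of every cell
        quiet_wires += n            # the unselected wire of every cell
        cur = [straight[j] | shifted[j] for j in range(n)]
    return cur, signal_wires, quiet_wires
```

For an 8-bit rotate by 5, half of the 48 inter-stage wires carry data and the other half stay quiet, which corresponds to the red and green lines of the slide.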

13 Dynamic Power Consumption. Dynamic power: [expression not preserved]. SP = switching probability. MUX-based design: SP = 1/4. DEMUX-based design: SP = 3/16.

14 Gate Complexity. Re-factoring the design adds no extra complexity at gate level; both are [expression not preserved]. Figures: MUX-based, DEMUX-based.

15 Duality: NAND gate network and NOR gate network. Duality provides flexibility for low-level implementation: NAND gates are good for static CMOS; NOR gates are good for dynamic circuits.

16 Cell Permutation. Datapaths usually assume a bit-slice structure. The cell order of the input/output stages must be fixed; however, the cells in the intermediate stages are free to permute. Figure labels: fixed, free.

17 Problem Statement. Given: an N-bit rotator with a fixed linear order of the input/output stages. Find: an optimal permutation scheme of the intermediate stages such that the longest path is minimized (or the total wire length is minimized subject to a delay constraint).
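The problem statement can be made concrete with a brute-force search on a toy instance. The model below is an assumption made for illustration, not the authors' netlist: an 8-bit rotator with three rows of cells, the first and last rows fixed in identity order, and a middle row whose cell b receives from first-row cells b and b+1 and drives last-row cells b and b-2 (indices mod 8). Path length is measured in columns.

```python
from itertools import permutations

N = 8  # toy width; exhaustive search over 8! placements is still cheap


def cell_delay(b, p, n=N):
    """Longest wire path through middle-row cell b when placed in column p."""
    sources = [b, (b + 1) % n]      # fixed first-row columns feeding cell b
    sinks = [b, (b - 2) % n]        # fixed last-row columns driven by cell b
    return max(abs(a - p) for a in sources) + max(abs(p - c) for c in sinks)


def longest_path(placement):
    """placement[b] is the column assigned to middle-row cell b."""
    return max(cell_delay(b, p) for b, p in enumerate(placement))


def best_placement():
    """Exhaustively search all permutations for the minimum longest path."""
    table = [[cell_delay(b, p) for p in range(N)] for b in range(N)]
    best, best_cost = None, None
    for perm in permutations(range(N)):
        cost = max(table[b][perm[b]] for b in range(N))
        if best_cost is None or cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost
```

Exhaustive search is only workable for very small widths, which is one reason to look for a formulation that a solver can handle.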

18 ILP Formulation. Introduce a set of binary decision variables, each equal to 1 if and only if a given logic cell is at a given physical location on a given level. The solution space is fully defined by the constraints [expressions not preserved].

19 ILP Formulation (cont'd). Minimum delay formulation: [expression not preserved]. Minimum power formulation: [expression not preserved], which can be expanded into the objective [expression not preserved].

20 ILP Formulation (cont'd). Represent the length of a single wire segment. Formulate the absolute-value operation. Pseudo-linear constraints are discarded because we are trying to minimize.
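The encoding described on the ILP slides can be sketched for a single level without a solver. The helper names and the two-index matrix `x[i][j]` (cell i at location j) are assumptions for illustration; the slides' own symbols are not preserved in the transcript. The last helper shows the usual way to linearize an absolute value under minimization: keep only the two lower bounds d >= a - b and d >= b - a, since the minimizer pushes d down onto the larger of them.

```python
def assignment_matrix(placement):
    """x[i][j] = 1 if and only if logic cell i sits at physical location j."""
    n = len(placement)
    return [[1 if placement[i] == j else 0 for j in range(n)]
            for i in range(n)]


def satisfies_assignment_constraints(x):
    """Each cell takes exactly one location; each location holds one cell."""
    n = len(x)
    rows_ok = all(sum(x[i]) == 1 for i in range(n))
    cols_ok = all(sum(x[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok


def position(x, i):
    """Location of cell i as a linear expression in the binary variables."""
    return sum(j * x[i][j] for j in range(len(x)))


def min_feasible_length(pa, pb):
    """Smallest d with d >= pa - pb and d >= pb - pa, which equals |pa - pb|."""
    return max(pa - pb, pb - pa)
```

Because the wire length only ever appears in an objective or bound that is being minimized, no upper-bounding constraint on d is needed in this sketch.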

21 Complexity. Minimum total wire length formulation: the case of one level of free cells is a minimum-weight bipartite matching problem; for two or more levels of free cells, an optimal polynomial algorithm is unknown. Hardness of the minimum delay formulation is unestablished. Figure labels: logic index, physical location.
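With one free level, total wire length separates into a cost per (cell, location) pair, so the problem is an assignment problem. The sketch below solves it exactly with a bitmask dynamic program, used here only because it is short; it is exponential in the width and stands in for a polynomial matching algorithm such as the Hungarian method. The neighbour pattern in `rotator_cost_matrix` is a toy assumption, not the authors' netlist.

```python
def rotator_cost_matrix(n):
    """cost[b][p]: wire length (in columns) of cell b's four wires at column p."""
    def wires(b, p):
        neighbours = [b, (b + 1) % n, b, (b - 2) % n]   # fixed adjacent rows
        return sum(abs(p - q) for q in neighbours)
    return [[wires(b, p) for p in range(n)] for b in range(n)]


def min_weight_assignment(cost):
    """Exact minimum-weight perfect matching of cells to columns."""
    n = len(cost)
    inf = float("inf")
    dp = [inf] * (1 << n)           # dp[mask]: best cost using columns in mask
    choice = [-1] * (1 << n)        # column given to the last placed cell
    dp[0] = 0
    for mask in range(1 << n):
        b = bin(mask).count("1")    # index of the next cell to place
        if dp[mask] == inf or b == n:
            continue
        for p in range(n):
            if not mask & (1 << p):
                new = mask | (1 << p)
                if dp[mask] + cost[b][p] < dp[new]:
                    dp[new] = dp[mask] + cost[b][p]
                    choice[new] = p
    placement, mask = [0] * n, (1 << n) - 1
    for b in range(n - 1, -1, -1):  # walk the recorded choices backwards
        placement[b] = choice[mask]
        mask ^= 1 << choice[mask]
    return placement, dp[(1 << n) - 1]
```

Calling `min_weight_assignment(rotator_cost_matrix(8))` returns a placement of the eight free cells together with its total wire length in the toy model.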

22 ILP Complexity. The ILP formulation does not scale well: both #integer variables and #constraints are [expression not preserved]. CPLEX relies on branch & bound, with exponential growth. Sliding window scheme: only cells in the window are allowed to permute; it consists of multiple passes and terminates when there is no improvement between passes. WW = 8 columns, WH = 3 rows, HS = 4 columns, VS = 1 column.
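The sliding window idea can be sketched on a single free row: only the cells currently inside the window may be reordered, the window slides by a fixed step, and passes repeat until one of them brings no improvement. The sketch replaces the per-window ILP with brute force over the window's permutations and uses a toy wire-length cost; the `width` and `step` parameters play the role of the window width and horizontal step but are otherwise hypothetical, as is the neighbour pattern.

```python
from itertools import permutations


def wire_cost(b, p, n):
    """Toy wire length of cell b at column p, with fixed neighbouring rows."""
    neighbours = [b, (b + 1) % n, b, (b - 2) % n]
    return sum(abs(p - q) for q in neighbours)


def total_cost(placement):
    n = len(placement)
    return sum(wire_cost(b, p, n) for b, p in enumerate(placement))


def sliding_window(placement, width=4, step=2):
    """Repeat window passes until a full pass yields no improvement."""
    n = len(placement)
    placement = list(placement)
    best = total_cost(placement)
    improved = True
    while improved:
        improved = False
        for left in range(0, n - width + 1, step):
            slots = range(left, left + width)
            # Cells currently occupying the window; the set stays the same
            # while they are permuted among the window's columns.
            cells = [b for b, p in enumerate(placement) if p in slots]
            for perm in permutations(slots):
                trial = list(placement)
                for b, p in zip(cells, perm):
                    trial[b] = p
                cost = total_cost(trial)
                if cost < best:     # accept strict improvements only
                    best, placement, improved = cost, trial, True
    return placement, best
```

Since every accepted move strictly lowers an integer cost, the loop terminates, but the result is only locally optimal with respect to the window moves.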

23 Power & Delay Evaluation Overall Flow

24 Optimal Solution for the 8-bit Case: a global optimal solution.

25 16-bit and 32-bit Cases. 16-bit: global optimal solution. 32-bit: suboptimal solution by the sliding window method.

26 Analysis Results

27 Power-Delay Tradeoff. Panels: 8-bit, 16-bit, 32-bit, 64-bit. Given a T_max constraint, optimize T_total. The 8-bit and 16-bit results are global optima found by CPLEX; the 32-bit and 64-bit results are suboptimal results from the sliding window scheme.

28 Implementation Results. Implementation methodology: standard-cell-based design using relative placement (by Synopsys Physical Compiler); routing and timing analysis in Synopsys Astro; power estimation by Synopsys PrimePower; control signals manually buffered following the FO4 rule; TSMC 90nm technology. Three types of designs investigated: mux shifter using NAND2X2 gates; mux shifter using MX2X2 gates; demux shifter using NAND2X2 gates.

29 Implementation Results (cont'd). 64-bit results. Columns: A = muxsft64 (NAND2X2); B = muxsft64 (MX2X2); C = Demuxsft64 (NAND2X2); Imp. (C/A). Delay rows: global critical path delay (ns) and critical data-in/data-out path delay (ns)*, each reported for three sets of delay components (gate only; gate and wire load; gate, wire load, and Xtalk); the numeric entries are not preserved. Power 500M rows: cell power, A = 5.165e-04, Imp. 0.7%; net power, A = 1.112e-03; total, A = 1.629e-03, Imp. 8.2%; the B and C entries are not preserved. Most improvement comes from the interconnect. * For mux shifters this is d[0]->z[0], while for the demux shifter it is d[0]->z[1].

30 Outline: Motivation; Previous Work; Approaches (Fanout-Splitting, Cell order optimization by ILP); Conclusions and future work.

31 Conclusions & Future Work. We have proposed: a fanout-splitting design; ILP-based layout optimization. Future directions: extend the fanout-splitting idea and ILP formulation to the ternary shifter; try an alternative hierarchical approach to tackle the ILP complexity issue.

32 Thank you! The End

33 Derivation of Switching Probability. Gate-level implementation of MUX and DEMUX; truth table. For DEMUX: [expression not preserved]. For MUX: [expression not preserved].
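The quoted figures SP = 1/4 (MUX) and SP = 3/16 (DEMUX) can be reproduced from truth tables under a common activity model, which is assumed here because the slide's own derivation is not preserved: inputs are independent and equally likely to be 0 or 1, consecutive cycles are independent, and the switching probability of a node is P(0) times P(1). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def one_probability(fn, n_inputs):
    """P(output = 1) over all equally likely input combinations."""
    ones = sum(fn(*values) for values in product((0, 1), repeat=n_inputs))
    return Fraction(ones, 2 ** n_inputs)


def switching_probability(fn, n_inputs):
    """P(0 -> 1 transition) = P(0) * P(1) for independent cycles."""
    p1 = one_probability(fn, n_inputs)
    return (1 - p1) * p1


def mux_output(s, a, b):
    """2:1 MUX: the output follows a when s = 1, otherwise b."""
    return a if s else b


def demux_branch(s, d):
    """One DEMUX output: carries d when selected, otherwise rests at 0."""
    return d & s
```

The MUX output is 1 for half of its input combinations, giving 1/2 x 1/2 = 1/4; a DEMUX branch is 1 for a quarter of them, giving 3/4 x 1/4 = 3/16.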

34 Evaluating Interconnect Effect. Based on the logical effort model: technology independent; easy to incorporate the interconnect effect. Gate delay d = gh + p, where g is the logical effort (depends only on gate type), h is the electrical effort (depends on load capacitance), and p is the parasitic delay (depends only on gate type). Wire load is integrated into h: h_w is the electrical effort contributed by wire per column spanned, with wire length normalized to cell width.

35 Deciding h_w. Evaluate h_w for a set of technology nodes. It is safe to assume h_w = 1.
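The delay model of the last two slides can be sketched as a small calculator. The standard logical effort delay is d = g*h + p in units of tau; folding the wire into h as an extra h_w per column spanned is the reading assumed here, with h_w = 1 as the default suggested on the final slide. The function names and the tuple layout of a stage are hypothetical.

```python
def stage_delay(g, p, h_gate, wire_columns, h_w=1):
    """Delay of one stage, in units of tau.

    g: logical effort; p: parasitic delay; h_gate: electrical effort from
    the gate load alone; wire_columns: wire length in cell widths.
    """
    h = h_gate + h_w * wire_columns     # wire load folded into h
    return g * h + p


def path_delay(stages, h_w=1):
    """Sum of stage delays; each stage is (g, p, h_gate, wire_columns)."""
    return sum(stage_delay(g, p, h_gate, cols, h_w)
               for g, p, h_gate, cols in stages)
```

With the textbook values g = 4/3 and p = 2 for a 2-input NAND, a gate load of h = 2 plus a wire spanning 4 columns gives a stage delay of (4/3)(2 + 4) + 2 = 10 tau.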