1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

Slides:



Advertisements
Similar presentations
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
Advertisements

EE141 Adder Circuits S. Sundar Kumar Iyer.
1 Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion Presented By Cesare Ferri Takumi Okamoto, Jason Kong.
Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors Yedidya Hilewitz and Ruby B. Lee Princeton Architecture Lab for Multimedia.
Yuanlin Lu Intel Corporation, Folsom, CA Vishwani D. Agrawal
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE VLSI Circuit Design Lecture 5 - Hierarchical.
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Introduction to CMOS VLSI Design Lecture 5: Logical Effort David Harris Harvey Mudd College Spring 2004.
[M2] Traffic Control Group 2 Chun Han Chen Timothy Kwan Tom Bolds Shang Yi Lin Manager Randal Hong Wed. Oct. 27 Overall Project Objective : Dynamic Control.
Logical Effort.
Interconnect Optimizations
Solving the Protein Threading Problem in Parallel Nocola Yanev, Rumen Andonov Indrajit Bhattacharya CMSC 838T Presentation.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 22: Material Review Prof. Sherief Reda Division of Engineering, Brown University.
1 MICROELETTRONICA Logical Effort and delay Lection 4.
Introduction to CMOS VLSI Design Lecture 5: Logical Effort
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
 2000 M. CiesielskiPTL Synthesis1 Synthesis for Pass Transistor Logic Maciej Ciesielski Dept. of Electrical & Computer Engineering University of Massachusetts,
EE 447 VLSI Design Lecture 5: Logical Effort. EE 447 VLSI Design 5: Logical Effort2 Outline Introduction Delay in a Logic Gate Multistage Logic Networks.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 5.1 EE4800 CMOS Digital IC Design & Analysis Lecture 5 Logic Effort Zhuo Feng.
Digital Integrated Circuits© Prentice Hall 1995 Combinational Logic COMBINATIONAL LOGIC.
Datapath Functional Units Adapted from David Harris of Harvey Mudd College.
Lec 17 : ADDERS ece407/507.
Parallel Prefix Adders A Case Study
Yulei Zhang1, James F. Buckwalter1, and Chung-Kuan Cheng2
A Methodology for Interconnect Dimension Determination By: Jeff Cobb Rajesh Garg Sunil P Khatri Department of Electrical and Computer Engineering, Texas.
Lecture 18: Datapath Functional Units
ENGG 6090 Topic Review1 How to reduce the power dissipation? Switching Activity Switched Capacitance Voltage Scaling.
Review: CMOS Inverter: Dynamic
Power Reduction for FPGA using Multiple Vdd/Vth
LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.
Elmore Delay, Logical Effort
1 Design Space Exploration for Power-Efficient Mixed-Radix Ling Adders Chung-Kuan Cheng Computer Science and Engineering Depart. University of California,
Arithmetic Building Blocks
Chapter 07 Electronic Analysis of CMOS Logic Gates
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Reference: Digital Integrated.
Advanced VLSI Design Unit 05: Datapath Units. Slide 2 Outline  Adders  Comparators  Shifters  Multi-input Adders  Multipliers.
Introduction  Chip designers face a bewildering array of choices –What is the best circuit topology for a function? –How many stages of logic give least.
Chapter 14 Arithmetic Circuits (I): Adder Designs Rev /12/2003
Optimal digital circuit design Mohammad Sharifkhani.
Low Power – High Speed MCML Circuits (II)
Logical Effort and Transistor Sizing Digital designs are usually expected to operate at high frequencies, thus designers often have to choose the fastest.
Lecture 6: Logical Effort
Introduction to CMOS VLSI Design Lecture 5: Logical Effort GRECO-CIn-UFPE Harvey Mudd College Spring 2004.
Chap 7. Register Transfers and Datapaths. 7.1 Datapaths and Operations Two types of modules of digital systems –Datapath perform data-processing operations.
4. Combinational Logic Networks Layout Design Methods 4. 2
Introduction to CMOS VLSI Design Lecture 6: Logical Effort
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
EE141 © Digital Integrated Circuits 2nd Arithmetic Circuits 1 Digital Integrated Circuits A Design Perspective Arithmetic Circuits Jan M. Rabaey Anantha.
EEC 118 Lecture #7: Designing with Logical Effort Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation.
Jianhua Liu1, Yi Zhu1, Haikun Zhu1, John Lillis2, Chung-Kuan Cheng1
EKT 221 : Chapter 4 Computer Design Basics
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
Technology Mapping. 2 Technology mapping is the phase of logic synthesis when gates are selected from a technology library to implement the circuit. Technology.
16 Bit Logarithmic Converter Tinghao Liang and Sara Nadeau.
EE141 © Digital Integrated Circuits 2nd Combinational Circuits 1 A few notes for your design  Finger and multiplier in schematic design  Parametric analysis.
Seok-jae, Lee VLSI Signal Processing Lab. Korea University
1 Revamping Electronic Design Process to Embrace Interconnect Dominance Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003 Rev /05/2003.
PROCEED: Pareto Optimization-based Circuit-level Evaluation Methodology for Emerging Devices Shaodi Wang, Andrew Pan, Chi-On Chui and Puneet Gupta Department.
EE586 VLSI Design Partha Pande School of EECS Washington State University
1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.
EE141 Arithmetic Circuits 1 Chapter 14 Arithmetic Circuits Rev /12/2003.
Lecture 6: Logical Effort
Digital Integrated Circuits A Design Perspective
Lecture 6: Logical Effort
Introduction to CMOS VLSI Design Lecture 5: Logical Effort
Estimating Delays Would be nice to have a “back of the envelope” method for sizing gates for speed Logical Effort Book by Sutherland, Sproull, Harris Chapter.
Lecture 6: Logical Effort
Arithmetic Building Blocks
Arithmetic Circuits.
Presentation transcript:

1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

2 Outline Motivation Previous Work Approaches Fanout-Splitting Cell order optimization by ILP Conclusions

3 Motivation Interconnect dominates gate in present process technology Delay, power, reliability, process variation, etc. Conventional datapath design focuses on logic depth minimization Source: ITRS roadmap 2005

4 Technology Trends Device (ITRS roadmap 2005, Table 40a) Updated Berkeley Predictive Interconnect Model Gate length (nm)9065% decreasing Inter-layer dielectric constant Capacitance of local interconnect (fF/um) % Gate length (nm)9065% decreasing Vdd (V)1.1 - Vth (V) NMOS gate Cap (fF/  m) % NMOS intrinsic delay (ps) %

5 Shifter Taxonomy Functionality Logical Shift: MSBs stuffed with 0’s Arithmetic Shift: Extend original MSB Cyclic Shift (rotation) Bidirectional Shift Circuit Topology Barrel Shifter Logarithmic Shifter

6 Barrel Shifter Pros Every data signal pass only one transmission gate Cons Input capacitance is # transistors = Requires additional decoder for control signals Schematiclayout

7 Logarithmic Shifter Schematic layout Pros # transistors = Cons Long inter-stage wires, especially for cyclic shifters Target of Optimization

8 Cyclic Shifter -- Applications Finite Field Arithmetic In normal basis, squaring is done by cyclic shifting. Encryption ShiftRows operation in Rijndael algorithm. DCT processing unit Address generator Bidirectional shifting Can be implemented as a cyclic shifter with additional masking logic CORDIC algorithm etc …

9 Previous Work Bit interleave Two dimensional folding strategy Gate duplicating Ternary shifting Comparison between barrel shifter and log shifter

10 Cyclic Shifter – Traditional Design MUX-based Shifting & non-shifting paths are intertwined together. Wire load on the critical path is Timing Power When configured to pass through, the non-shifting paths have to be switched as well.

11 Fanout Splitting Shifter Use DEMUXes instead of MUXes Power When shifting, non-shifting paths are at rest. Shifting & non-shifting paths are now decoupled. Wire load on the longest path is Timing

12 Example Right rotate 5 bits Red lines are signal lines Green lines are quiet lines

13 Dynamic Power Consumption Dynamic Power Switching Probability MUX based designDEMUX based design SP= 1/4 SP= 3/16 SP = Switching Probability

14 Gate Complexity Re-factoring design No extra complexity at gate level, both are MUX-based DEMUX-based

15 Duality NAND gates network NOR gates network Duality provides flexibility for low level implementation NAND gates are good for static CMOS. NOR gates are good for dynamic circuits.

16 Cell Permutation Datapath usually assumes bit-slice structure The cell order of the input/output stages must be fixed However, the cells in the intermediate stages are free to permute. fixed free

17 Problem Statement Given A N-bit rotator Fixed linear order of the input/output stages Find An optimal permutation scheme of the intermediate stages such that the longest path is minimized (or, the total wire length s.t. delay constraint).

18 ILP Formulation Introduce a set of binary decision variables if and only if logic cell is at physical location on level The solution space is fully defined by constraints

19 ILP Formulation (cont’) Minimum delay formulation Minimum power formulation Which can be expanded into objective

20 ILP Formulation (cont’) Represent the length of a single wire segment Formulating absolute operation Psuedo-linear constraints discarded because we ’ re trying to minimize

21 Complexity Minimum total wire length formulation The case of one level of free cells is a minimum weight bipartite matching problem For the case of two or more levels of free cells, optimal polynomial algorithm is unknown Hardness of minimum delay formulation un- established. Logic index Physical location

22 ILP Complexity The ILP formulation does not scale well Both #integer variables and #constraints are CPLEX uses on branch & bound – exponential growth Sliding window scheme Only cells in the window are allowed to permute Consists of multiple passes; terminate when there is no improvement between passes WW = 8 columns WH = 3 rows HS = 4 columns VS = 1 column

23 Power & Delay Evaluation Overall Flow

24 Optimal solution for 8-bit case A global optimal solution

25 16-bit and 32-bit cases 16-bit, global optimal solution 32-bit, suboptimal solution by sliding window method

26 Results

27 Power-Delay Tradeoff 8-bit16-bit 32-bit64-bit Given T max constraint, optimize T total 8-bit & 16-bit are global optimum by cplex 32-bit & 64-bit are suboptimal result by sliding window scheme

28 Outline Motivation Previous Work Approaches Fanout-Splitting Cell order optimization by ILP Conclusions and future work

29 Conclusions & Future Work We have proposed Fanout-splitting design ILP based layout optimization Future directions Extend the fanout splitting idea and ILP formulation to ternary shifter Try alternative hierarchical approach to tackle the ILP complexity issue

30 Thank you! The End

31 Derivation of switching probability Gate level implementation of MUX and DEMUX Truth table For DEMUX For MUX

32 Evaluating interconnect effect Based on logical effort model Technology independent Easy to incorporate interconnect effect Gate delay Logical effort, only depends on gate type Electrical effort, depends on load cap Parasitic delay, only depends on gate type Wire load is integrated into h Electrical effort contributed by wire per column spanned Wire length normalized to cell width

33 Deciding Evaluate for a set of technology nodes It is safe to assume h w =1