Timing-Driven Synthesis for Fast Barrel Shifters. Sabyasachi Das (University of Colorado, Boulder), Sunil P. Khatri (Texas A&M University)

Presentation transcript:

1 Timing-Driven Synthesis for Fast Barrel Shifters Sabyasachi Das University of Colorado, Boulder Sunil P. Khatri Texas A&M University

2 What is a Shifter?
- IC block that performs shifting of data signals
- Well-known logic architectures
- Computationally intensive
- Occupies a significant amount of area
- Wide usage in DSP, graphics, and microprocessors

3 Introduction to Barrel Shifter
- Widely used shifter architecture
- Exhibits good timing characteristics
- Area-efficient as well
- Inherent regularity in physical structure

4 Structure of Barrel Shifter
- Width of input and output data signals = $n$ bits
- Width of input shift signal = $\log_2 n$ bits
- Shifter consists of $\log_2 n$ stages
- Each bit of the shift signal controls one stage
- Stage $i$ handles a single shift of 0 or $2^i$ bits
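The stage structure on this slide can be sketched in software. The following is a minimal behavioral model, not the authors' implementation: the function name `barrel_shift_left`, the choice of a left shift with zero fill, and the power-of-two width check are assumptions made for illustration.

```python
def barrel_shift_left(x, shift, n):
    """Left-shift the n-bit value x by `shift` positions, one stage per shift bit.

    Hypothetical behavioral model: n is assumed to be a power of two and
    vacated positions are filled with zeros.
    """
    assert n > 0 and n & (n - 1) == 0, "n is assumed to be a power of two"
    stages = n.bit_length() - 1      # log2(n) stages
    mask = (1 << n) - 1              # keep the result n bits wide
    for i in range(stages):
        s_i = (shift >> i) & 1       # bit i of the shift signal controls stage i
        if s_i:
            x = (x << (1 << i)) & mask   # stage i shifts by 2**i
        # when s_i is 0 the stage passes its input through unchanged
    return x


if __name__ == "__main__":
    print(bin(barrel_shift_left(0b1011, 1, 4)))
```

Each loop iteration corresponds to one row of 2-to-1 multiplexers in hardware, so the model makes the $\log_2 n$ depth of the unmerged shifter explicit.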

5 Example of a Barrel Shifter (2-stage)
[Figure: 4-bit, 2-stage barrel shifter built from two rows of 2-to-1 multiplexers. Data inputs $x_3 \dots x_0$ with 1'b0 fill, shift bits $s_0$ and $s_1$, outputs $z_3 \dots z_0$. Illustrated case: $s_0$ = 1'b1, $s_1$ = 1'b0.]

6 Proposed Dual-Merged Stage
- Merge stages $i$ and $j$
- 4-bit input data signal
- 2-bit shift signal
- $i$ and $j$ do not need to be two consecutive bits of the shift signal
[Figure: one bit slice of a dual-merged stage. Data inputs $x_a$, $x_b$, $x_c$, $x_d$; select inputs $s_i$, $s_j$; output $z_{ij}$.]

7 Proposed Dual-Merged Stage
For the $q$-th bit slice (column) of a left shifter:
- $x_a = x_q$
- $x_b = x_{q - 2^i}$
- $x_c = x_{q - 2^j}$
- $x_d = x_{q - 2^i - 2^j}$
For the $q$-th bit slice (column) of a right shifter:
- $x_a = x_q$
- $x_b = x_{q + 2^i}$
- $x_c = x_{q + 2^j}$
- $x_d = x_{q + 2^i + 2^j}$
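As a sanity check on the left-shifter index equations, the sketch below models a dual-merged stage bit slice by bit slice and compares it with two cascaded unmerged stages. It is an illustration only: the names `bit`, `single_stage_left` and `dual_merged_left`, the LSB-first list representation, the zero fill for out-of-range indices, and the mapping of ($s_i$, $s_j$) onto the four inputs are assumptions, not details stated on the slide.

```python
def bit(x, idx):
    """Return x[idx], or 0 when idx falls outside the data word (assumed zero fill)."""
    return x[idx] if 0 <= idx < len(x) else 0


def single_stage_left(x, i, s_i):
    """One unmerged stage: each column picks x_q or x_(q - 2**i)."""
    return [bit(x, q - 2**i) if s_i else x[q] for q in range(len(x))]


def dual_merged_left(x, i, j, s_i, s_j):
    """Dual-merged stage: each column is a 4-to-1 selection among x_a..x_d.

    x is a list of bits with x[0] as the least significant bit.
    """
    out = []
    for q in range(len(x)):
        x_a = bit(x, q)                  # no shift
        x_b = bit(x, q - 2**i)           # shift by 2**i only
        x_c = bit(x, q - 2**j)           # shift by 2**j only
        x_d = bit(x, q - 2**i - 2**j)    # shift by 2**i + 2**j
        # Assumed select encoding: s_i is the low select bit, s_j the high one.
        out.append((x_a, x_b, x_c, x_d)[s_i + 2 * s_j])
    return out


if __name__ == "__main__":
    x = [1, 0, 1, 1]                     # x0, x1, x2, x3
    print(dual_merged_left(x, 0, 1, 1, 0))
```

For every input word and every select combination, the merged stage should agree with `single_stage_left` applied twice, which is the property that lets one multiplexer level replace two.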

8 Example of a 2-Stage Shifter Using Dual-Merged Stages
[Figure: 4-bit shifter implemented as a single row of dual-merged stages. Each output $z_3 \dots z_0$ is selected by $s_1 s_0$ from the data inputs $x_3 \dots x_0$ and 1'b0 fill. Illustrated case: $S_1 S_0$ = 2'b01.]

9 Proposed Triple-Merged Stage
- Merge stages $i$, $j$ and $k$
- 8-bit input data signal
- 3-bit shift signal
- $i$, $j$ and $k$ do not need to be three consecutive bits of the shift signal
[Figure: one bit slice of a triple-merged stage. Data inputs $x_a \dots x_h$; select inputs $s_i$, $s_j$, $s_k$; output $z_{ijk}$.]

10 Triple-Merged Stages
For the $q$-th bit slice (column) of a left shifter:
- $x_a = x_q$
- $x_b = x_{q - 2^i}$
- $x_c = x_{q - 2^j}$
- $x_d = x_{q - 2^k}$
- $x_e = x_{q - 2^i - 2^j}$
- $x_f = x_{q - 2^i - 2^k}$
- $x_g = x_{q - 2^j - 2^k}$
- $x_h = x_{q - 2^i - 2^j - 2^k}$
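The eight inputs of a triple-merged bit slice can be modeled the same way. This is a hedged sketch: `triple_merged_left`, the LSB-first bit list, the zero fill, and the assignment of select combinations to $x_a \dots x_h$ are assumptions chosen so that each asserted select bit contributes its own power-of-two offset.

```python
def triple_merged_left(x, i, j, k, s_i, s_j, s_k):
    """Triple-merged left-shift stage: each column is an 8-to-1 selection.

    x is a list of bits with x[0] as the least significant bit; indices
    outside the word read as 0 (assumed zero fill).
    """
    n = len(x)

    def bit(idx):
        return x[idx] if 0 <= idx < n else 0

    di, dj, dk = 2**i, 2**j, 2**k
    out = []
    for q in range(n):
        # Assumed mapping from (s_i, s_j, s_k) to the inputs named on the slide.
        inputs = {
            (0, 0, 0): bit(q),                 # x_a
            (1, 0, 0): bit(q - di),            # x_b
            (0, 1, 0): bit(q - dj),            # x_c
            (0, 0, 1): bit(q - dk),            # x_d
            (1, 1, 0): bit(q - di - dj),       # x_e
            (1, 0, 1): bit(q - di - dk),       # x_f
            (0, 1, 1): bit(q - dj - dk),       # x_g
            (1, 1, 1): bit(q - di - dj - dk),  # x_h
        }
        out.append(inputs[(s_i, s_j, s_k)])
    return out


if __name__ == "__main__":
    x = [1, 1, 0, 0, 0, 0, 0, 0]               # value 3, LSB first
    print(triple_merged_left(x, 0, 1, 2, 1, 0, 1))   # shift by 1 + 4
```

With $i, j, k = 0, 1, 2$ and an 8-bit word, a single triple-merged row covers every shift amount from 0 to 7, where the unmerged design needs three rows.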

11 Identification of Mergeable Stages
- Timing-driven algorithm
- Uses the arrival times of the shift signals
- Uses the timing characteristics of the technology library cells

12 Algorithm to Find Mergeable Stages
- Sort the shift signals by arrival time ($S_i$, $S_j$, $S_k$ are the earliest)
- Analyze the dual-merged stage, the triple-merged stage and the unmerged stage to decide whether
  - to create a triple-merged stage by merging stages $i$, $j$ and $k$
  - to create a dual-merged stage by merging stages $i$ and $j$
  - to create an unmerged stage for stage $i$
- Continue the analysis with the next three stages, corresponding to the three earliest-arriving remaining shift bits

13 Analysis of an Unmerged Stage
- Compute the impact of an unmerged stage ($i$)
- $T_{single} = Arr\_T(s_i) + Del_1$
[Figure: three cascaded unmerged stages controlled by $s_i$, $s_j$, $s_k$, annotated with $Arr\_T(s_i)$, $Arr\_T(s_j)$, $Arr\_T(s_k)$; $T_{single}$ is marked at the output of the first stage.]

14 Analysis of Two Unmerged Stages
- Compute the impact of 2 cascaded unmerged stages
- $T_{single2} = \max(T_{single}, Arr\_T(s_j)) + Del_1$
[Figure: three cascaded unmerged stages controlled by $s_i$, $s_j$, $s_k$, annotated with their arrival times; $T_{single2}$ is marked at the output of the second stage.]

15 Analysis of Three Unmerged Stages
- Compute the impact of 3 cascaded unmerged stages
- $T_{single3} = \max(T_{single2}, Arr\_T(s_k)) + Del_1$
[Figure: three cascaded unmerged stages controlled by $s_i$, $s_j$, $s_k$, annotated with their arrival times; $T_{single3}$ is marked at the output of the third stage.]
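The three unmerged-stage formulas form one recurrence, which the sketch below evaluates for a list of shift-signal arrival times. The function name `unmerged_chain_times` and the numeric values (integers in an arbitrary time unit) are hypothetical; like the slide formulas, the model considers only the shift-signal arrival times and the per-stage delay $Del_1$.

```python
def unmerged_chain_times(arrivals, del_1):
    """Output times of cascaded unmerged stages, earliest shift bit first.

    Returns [T_single, T_single2, T_single3, ...] following
    T_next = max(T_prev, Arr_T(s)) + Del_1, with T_single = Arr_T(s_i) + Del_1.
    """
    times = []
    t = None
    for arr in arrivals:
        if t is None:
            t = arr + del_1              # first stage: T_single
        else:
            t = max(t, arr) + del_1      # later stages wait for data or select
        times.append(t)
    return times


if __name__ == "__main__":
    # Hypothetical arrival times for s_i, s_j, s_k and a hypothetical Del_1.
    print(unmerged_chain_times([0, 100, 500], 200))   # [200, 400, 700]
```

In this example the third stage is limited by the late arrival of $s_k$ rather than by the data path, which is the situation the timing-driven merging decision has to take into account.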

16 Analysis of a Dual-merged Stage
- Compute the impact of a dual-merged stage ($i$, $j$)
- $T_{dual} = Arr\_T(s_j) + Del_2$
[Figure: dual-merged stage with data inputs $x_a \dots x_d$, select inputs $s_i$ and $s_j$ annotated with $Arr\_T(s_i)$ and $Arr\_T(s_j)$, and output $z_{ij}$ annotated with $T_{dual}$.]

17 Analysis of a Triple-merged Stage
- Compute the impact of a triple-merged stage ($i$, $j$, $k$)
- $T_{triple} = Arr\_T(s_k) + Del_3$
[Figure: triple-merged stage with data inputs $x_a \dots x_h$, select inputs $s_i$, $s_j$, $s_k$ annotated with $Arr\_T(s_i)$, $Arr\_T(s_j)$, $Arr\_T(s_k)$, and output $z_{ijk}$ annotated with $T_{triple}$.]

18 Selection of Mergeable Stages
If ($T_{triple} < T_{single3}$) and ($T_{triple} < T_{dual} + Del_1/2$)
    Implement triple-merged stage (for stages $i$, $j$ and $k$)
Else if ($T_{dual} < T_{single2}$)
    Implement dual-merged stage (for stages $i$ and $j$)
Else
    Implement single unmerged stage for stage $i$
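Putting the sort, the timing estimates and the selection rule together gives a loop along the following lines. This is a sketch under stated assumptions, not the authors' tool: `select_stages` and all delay values are hypothetical, each window of up to three stages is evaluated on its own (the slides do not say how the output time of one group feeds into the next), and the handling of fewer than three remaining stages is a guess.

```python
def select_stages(arrivals, del_1, del_2, del_3):
    """Group shift-signal stages into triple-merged, dual-merged or unmerged stages.

    arrivals maps a stage index to the arrival time of its shift bit.
    Returns a list of tuples; each tuple is one implemented stage.
    """
    order = sorted(arrivals, key=arrivals.get)   # earliest-arriving bits first
    groups = []
    pos = 0
    while pos < len(order):
        window = order[pos:pos + 3]
        a = [arrivals[s] for s in window]
        t_single = a[0] + del_1
        take = 1                                  # default: one unmerged stage
        if len(window) >= 2:
            t_single2 = max(t_single, a[1]) + del_1
            t_dual = a[1] + del_2
            if len(window) == 3:
                t_single3 = max(t_single2, a[2]) + del_1
                t_triple = a[2] + del_3
                if t_triple < t_single3 and t_triple < t_dual + del_1 / 2:
                    take = 3                      # triple-merged stage
            if take == 1 and t_dual < t_single2:
                take = 2                          # dual-merged stage
        groups.append(tuple(window[:take]))
        pos += take
    return groups


if __name__ == "__main__":
    # Hypothetical delays: Del_1 = 100, Del_2 = 150, Del_3 = 180.
    print(select_stages({0: 0, 1: 0, 2: 0}, 100, 150, 180))      # [(0, 1, 2)]
    print(select_stages({0: 0, 1: 0, 2: 1000}, 100, 150, 180))   # [(0, 1), (2,)]
```

In the second call the late-arriving shift bit makes the triple-merged stage slower than three cascaded stages, so only the two early bits are merged and the late bit keeps its own stage.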

19 Results
- On average, 10.19% faster than the result of the commercial datapath synthesis tool

20 Summary
- Merge 2 stages to form a dual-merged stage
- Merge 3 stages to form a triple-merged stage
- Timing-driven algorithm to identify mergeable stages
- Reduces the number of stages to as few as one-third (33.33%) of the unmerged count
- On average, 10.19% faster

21 Thank you