Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

Similar presentations


Presentation on theme: "1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College."— Presentation transcript:

1 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

2 2 Outline Motivation Previous Work Approaches Fanout-Splitting Cell order optimization by ILP Conclusions

3 3 Motivation Interconnect dominates gate in present process technology Delay, power, reliability, process variation, etc. Conventional datapath design focuses on logic depth minimization Source: ITRS roadmap 2005

4 4 Technology Trends Device (ITRS roadmap 2005, Table 40a) Updated Berkeley Predictive Interconnect Model Gate length (nm)9065% decreasing Inter-layer dielectric constant 2.82.6- Capacitance of local interconnect (fF/um) 0.1860.1737.0% Gate length (nm)9065% decreasing Vdd (V)1.1 - Vth (V)0.1950.165- NMOS gate Cap (fF/  m) 0.5730.46918.2% NMOS intrinsic delay (ps)0.8700.64026.4%

5 5 Shifter Taxonomy Functionality Logical Shift: MSBs stuffed with 0’s Arithmetic Shift: Extend original MSB Cyclic Shift (rotation) Bidirectional Shift Circuit Topology Barrel Shifter Logarithmic Shifter

6 6 Barrel Shifter Pros Every data signal pass only one transmission gate Cons Input capacitance is # transistors = Requires additional decoder for control signals Schematiclayout

7 7 Logarithmic Shifter Schematic layout Pros # transistors = Cons Long inter-stage wires, especially for cyclic shifters Target of Optimization

8 8 Cyclic Shifter -- Applications Finite Field Arithmetic In normal basis, squaring is done by cyclic shifting. Encryption ShiftRows operation in Rijndael algorithm. DCT processing unit Address generator Bidirectional shifting Can be implemented as a cyclic shifter with additional masking logic CORDIC algorithm etc …

9 9 Previous Work Bit interleave Two dimensional folding strategy Gate duplicating Ternary shifting Comparison between barrel shifter and log shifter

10 10 Cyclic Shifter – Traditional Design MUX-based Shifting & non-shifting paths are intertwined together. Wire load on the critical path is Timing Power When configured to pass through, the non-shifting paths have to be switched as well.

11 11 Fanout Splitting Shifter Use DEMUXes instead of MUXes Power When shifting, non-shifting paths are at rest. Shifting & non-shifting paths are now decoupled. Wire load on the longest path is Timing

12 12 Example Right rotate 5 bits Red lines are signal lines Green lines are quiet lines

13 13 Dynamic Power Consumption Dynamic Power Switching Probability MUX based designDEMUX based design SP= 1/4 SP= 3/16 SP = Switching Probability

14 14 Gate Complexity Re-factoring design No extra complexity at gate level, both are MUX-based DEMUX-based

15 15 Duality NAND gates network NOR gates network Duality provides flexibility for low level implementation NAND gates are good for static CMOS. NOR gates are good for dynamic circuits.

16 16 Cell Permutation Datapath usually assumes bit-slice structure The cell order of the input/output stages must be fixed However, the cells in the intermediate stages are free to permute. fixed free

17 17 Problem Statement Given A N-bit rotator Fixed linear order of the input/output stages Find An optimal permutation scheme of the intermediate stages such that the longest path is minimized (or, the total wire length s.t. delay constraint).

18 18 ILP Formulation Introduce a set of binary decision variables if and only if logic cell is at physical location on level The solution space is fully defined by constraints

19 19 ILP Formulation (cont’) Minimum delay formulation Minimum power formulation Which can be expanded into objective

20 20 ILP Formulation (cont’) Represent the length of a single wire segment Formulating absolute operation Psuedo-linear constraints discarded because we ’ re trying to minimize

21 21 Complexity Minimum total wire length formulation The case of one level of free cells is a minimum weight bipartite matching problem For the case of two or more levels of free cells, optimal polynomial algorithm is unknown Hardness of minimum delay formulation un- established. Logic index Physical location

22 22 ILP Complexity The ILP formulation does not scale well Both #integer variables and #constraints are CPLEX uses on branch & bound – exponential growth Sliding window scheme Only cells in the window are allowed to permute Consists of multiple passes; terminate when there is no improvement between passes WW = 8 columns WH = 3 rows HS = 4 columns VS = 1 column

23 23 Power & Delay Evaluation Overall Flow

24 24 Optimal solution for 8-bit case A global optimal solution

25 25 16-bit and 32-bit cases 16-bit, global optimal solution 32-bit, suboptimal solution by sliding window method

26 26 Results

27 27 Power-Delay Tradeoff 8-bit16-bit 32-bit64-bit Given T max constraint, optimize T total 8-bit & 16-bit are global optimum by cplex 32-bit & 64-bit are suboptimal result by sliding window scheme

28 28 Outline Motivation Previous Work Approaches Fanout-Splitting Cell order optimization by ILP Conclusions and future work

29 29 Conclusions & Future Work We have proposed Fanout-splitting design ILP based layout optimization Future directions Extend the fanout splitting idea and ILP formulation to ternary shifter Try alternative hierarchical approach to tackle the ILP complexity issue 313029282726252423222120191817161514131211109876543210 313029282726252423222120191817161514131211109876543210

30 30 Thank you! The End

31 31 Derivation of switching probability Gate level implementation of MUX and DEMUX Truth table For DEMUX For MUX

32 32 Evaluating interconnect effect Based on logical effort model Technology independent Easy to incorporate interconnect effect Gate delay Logical effort, only depends on gate type Electrical effort, depends on load cap Parasitic delay, only depends on gate type Wire load is integrated into h Electrical effort contributed by wire per column spanned Wire length normalized to cell width

33 33 Deciding Evaluate for a set of technology nodes It is safe to assume h w =1


Download ppt "1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College."

Similar presentations


Ads by Google