Forbidden Transition Free Crosstalk Avoidance CODEC Design Chunjie Duan Mitsubishi Electric Research Labs, Cambridge, MA, USA Chengyu Zhu Polaris Microelectronic.

Presentation transcript:

Forbidden Transition Free Crosstalk Avoidance CODEC Design Chunjie Duan Mitsubishi Electric Research Labs, Cambridge, MA, USA Chengyu Zhu Polaris Microelectronic System, Shanghai, China Sunil P. Khatri Texas A&M University, College Station, TX, USA

Outline
• Background
  - On-chip bus crosstalk classification
  - Forbidden Transition Free (FTF) crosstalk avoidance code (CAC)
• CODEC design for FTF code
  - Previous approaches (exponential growth)
  - Our approach (quadratic growth)
• Experimental results and comparison
• Conclusions

On-chip Bus Interconnects
• In DSM processes C1 >> C2, and hence inter-wire crosstalk becomes dominant
  - λ = C1 / C2 > 10 for Metal4 in a 0.1 µm CMOS process
• As a consequence:
  - Wire delay depends on the state of adjacent wires
  - Interconnect delay >> gate delay
  - Global interconnect becomes the performance bottleneck
[Figure: neighboring bus wires labeled a and v, with capacitances C1 and C2]

Bus Classification
• A bus can be classified by the maximum value of the effective capacitance charged, over all its bits
• 4C sequence: 101 → 010
• 3C sequence: 101 → 011
• 2C sequence: 100 → 011
• 1C sequence: 001 → 111
• 0C sequence: 000 → 111
• Delay impact of different sequences confirmed by SPICE simulations
  - 0.1 µm CMOS process
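The classification above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' tool: the function name `crosstalk_class` is hypothetical, and it assumes that a neighbor switching in the same direction contributes 0, a steady neighbor contributes 1, an opposite-switching neighbor contributes 2, and a missing neighbor at the bus edge contributes 0.

```python
def crosstalk_class(before: str, after: str) -> int:
    """Return the crosstalk class (0 to 4) of one bus transition.

    `before` and `after` are equal-length bit strings such as "101".
    Assumption: a missing neighbor at the bus edge contributes nothing.
    """
    if len(before) != len(after):
        raise ValueError("both bus states must have the same width")

    def step(i: int) -> int:
        # +1 for a rising wire, -1 for a falling wire, 0 for a steady wire
        return int(after[i]) - int(before[i])

    width = len(before)
    worst = 0
    for j in range(width):
        dj = step(j)
        if dj == 0:
            continue  # a steady wire is not charged in this transition
        load = 0
        for k in (j - 1, j + 1):
            if 0 <= k < width:
                # same direction: 0, steady neighbor: 1, opposite direction: 2
                load += abs(dj - step(k))
        worst = max(worst, load)
    return worst


# The five example sequences from the slide
for before, after in [("101", "010"), ("101", "011"), ("100", "011"),
                      ("001", "111"), ("000", "111")]:
    print(before, "->", after, ":", f"{crosstalk_class(before, after)}C")
```

Under these assumptions the five slide examples come out as 4C, 3C, 2C, 1C and 0C respectively.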

Crosstalk Avoidance Codes
• The strong dependence of delay on crosstalk class has motivated much work on crosstalk avoidance codes (CACs)
• CACs are a class of codes that avoid certain undesired classes of crosstalk when transmitted on the bus
• CACs can be categorized based on the crosstalk classes eliminated
  - 4C/3C/2C/1C-free codes
• CACs can also be categorized based on the memory requirement
  - Memory-based / Memoryless CACs
• CACs can be categorized based on the bus type
  - Binary / Multi-level buses
[Figure: Transmitted sequence (n-bit) → Encoder → Driver → m-bit bus → Receiver → Decoder → Recovered sequence]

Crosstalk Avoidance Codes
• Memoryless CACs
  - Earliest work by our group for 4C-free and 3C-free "forbidden pattern free" (FPF) codes in 2001
  - Forbidden transition free (FTF) codes by Victor et al (2001)
• We focus on 3C-free, FTF codes
• CODEC design for these and other codes was done in an ad-hoc manner
  - Worst-case area of CODEC is exponential in bus width
• Key Contribution: This paper reports a systematic 3C-free CODEC design approach which is based on the Fibonacci Numeral System (FNS)
  - Complexity grows quadratically with bus width

FTF CACs
• Forbidden transition: two adjacent bits transition in opposite directions, i.e., 01 ↔ 10
• An FTF code is a set of vectors such that transitions between codewords have no forbidden transitions
  - e.g., {00, 01, 11}, {000, 001, 100, 101, 111}
• How to design FTF codes?
  - All codewords that are compatible with a class-1 codeword form an FTF code with maximum cardinality
  - A class-1 codeword is a vector with alternating '0's and '1's
  - 010101 and 101010 are the two 6-bit class-1 codewords
  - In other words, we avoid '01' in d_{2j} d_{2j-1} (even) boundaries and avoid '10' in d_{2j+1} d_{2j} (odd) boundaries
  - Hence, no forbidden transitions are possible
• There are two FTF codes with maximum cardinality
  - Derived from the two possible class-1 codewords
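The definition of a forbidden transition can be checked mechanically. The sketch below is illustrative only; the helper names `has_forbidden_transition` and `is_ftf_code` are hypothetical, and codewords are assumed to be equal-length bit strings.

```python
from itertools import combinations


def has_forbidden_transition(a: str, b: str) -> bool:
    """True if some adjacent bit pair switches between 01 and 10 going from a to b."""
    return any({a[i:i + 2], b[i:i + 2]} == {"01", "10"}
               for i in range(len(a) - 1))


def is_ftf_code(codewords) -> bool:
    """True if no pair of codewords produces a forbidden transition."""
    return not any(has_forbidden_transition(a, b)
                   for a, b in combinations(codewords, 2))


print(is_ftf_code(["00", "01", "11"]))                   # example set from the slide
print(is_ftf_code(["000", "001", "100", "101", "111"]))  # example set from the slide
print(is_ftf_code(["01", "10"]))                         # contains the forbidden pair
```

Both example sets from the slide pass the check, while any set containing both 01 and 10 in the same two positions fails it.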

Inductive FTF Code Generation
• Generating the set of m-bit codewords Q_m from the (m-1)-bit set Q_{m-1}
• Suppose class-1 codeword = …
• Q_2 = {00, 01, 11}
• For even m > 2, take each (m-1)-bit v ∈ Q_{m-1}
  - v = 0xxx => Q_m = Q_m ∪ {00xxx}
  - v = 1xxx => Q_m = Q_m ∪ {01xxx, 11xxx}
• For odd m > 2, take each (m-1)-bit v ∈ Q_{m-1}
  - v = 0xxx => Q_m = Q_m ∪ {10xxx, 00xxx}
  - v = 1xxx => Q_m = Q_m ∪ {11xxx}
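The inductive rule can be written as a short generator. This is a minimal sketch of the rule as transcribed above, starting from Q_2 = {00, 01, 11}; the function name `ftf_codewords` is hypothetical, and codewords are written as strings d_m … d_1 with the new bit prepended on the left.

```python
def ftf_codewords(m: int) -> list:
    """Build the m-bit FTF codeword list Q_m inductively from Q_2."""
    if m < 2:
        raise ValueError("this sketch starts from the 2-bit set Q_2")
    q = ["00", "01", "11"]
    for width in range(3, m + 1):
        grown = []
        for v in q:
            if width % 2 == 0:
                # even width: a leading 0 may only be extended with 0
                grown += ["0" + v] if v[0] == "0" else ["0" + v, "1" + v]
            else:
                # odd width: a leading 1 may only be extended with 1
                grown += ["1" + v, "0" + v] if v[0] == "0" else ["1" + v]
        q = grown
    return q


print(sorted(ftf_codewords(3)))
print([len(ftf_codewords(m)) for m in range(2, 9)])
```

For m = 3 this reproduces the 3-bit example set {000, 001, 100, 101, 111}, and the set sizes follow the Fibonacci sequence 3, 5, 8, 13, …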

FTF Cardinality, Area Overhead
• A difference equation can be derived from the inductive algorithm: T(m) = T(m-1) + T(m-2)
  - Initial conditions: T(2) = 3, T(3) = 5
• Maximum cardinality of the FTF code is T(m) = f_{m+2}
• Define area overhead as the ratio of additional wires required in the coded bus to the uncoded bus size: (m - n) / n
• Minimum number of bits m required to code n-bit data satisfies f_{m+2} ≥ 2^n
  - It is well known that f_m ≈ φ^m / √5 for large m, where φ = 1.618 is the golden ratio
  - Therefore m ≥ n · log_φ 2, or m ≥ 1.44·n (for large n)
• Overhead lower bound: 44%
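The bound f_{m+2} ≥ 2^n can be evaluated directly for concrete bus widths. The sketch below is illustrative; the function names `min_coded_width` and `area_overhead` are hypothetical, and the Fibonacci indexing f_1 = f_2 = 1 is assumed to match the slides.

```python
def min_coded_width(n: int) -> int:
    """Smallest m such that f_{m+2} >= 2**n, with f_1 = f_2 = 1."""
    target = 1 << n
    prev, cur = 1, 2   # f_2 and f_3, so cur equals f_{m+2} for m = 1
    m = 1
    while cur < target:
        prev, cur = cur, prev + cur
        m += 1
    return m


def area_overhead(n: int) -> float:
    """Extra wires relative to the uncoded n-bit bus: (m - n) / n."""
    return (min_coded_width(n) - n) / n


for n in (4, 8, 16, 32):
    print(n, min_coded_width(n), f"{area_overhead(n):.1%}")
```

For a 32-bit data word this gives a 46-wire coded bus, an overhead of 43.75%, which is consistent with the asymptotic figure of about 44%.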

Designing An Efficient CODEC
• We focus on the 3C-free FTF CODEC designs
  - Most efficient, robust and popular codes
  - Existing solutions have some deficiencies
• Potential solutions:
  - Solution 1: Brute-force logic optimization
  - Solution 2: Bus partitioning
  - Solution 3: Fibonacci Numeral System based CODEC

Brute-force Logic Optimization
• Multi-level implementation based on random mapping
• Too many permutations, more codewords than needed
• Relies purely on logic optimization
• CODEC size grows exponentially
• Not composable: designs are not extendable
• Does not work for large busses
* S.R. Sridhara et al., "Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip busses", ICCD, 2004

Bus Partitioning
• Small size bus group → small CODEC
  - Exhaustively search for the optimal CODEC for small bus groups
• Forbidden transition across the group boundary
  - Group complement
  - Bit overlapping
• Area overhead goes up
  - from 44% to 62% or more
[Figure: bus split into groups b(13:16), b(9:12), b(5:8), b(1:4)]

Fibonacci Numeral System
• Fibonacci Sequence: F = {0, 1, 1, 2, 3, 5, 8, 13, 21, …}
• Useful properties:
  - Golden ratio expression: f_m = (φ^m - (1 - φ)^m) / √5, with φ = (1 + √5) / 2
  - So for large m: f_m ≈ φ^m / √5
  - Summation identity: f_1 + f_2 + … + f_m = f_{m+2} - 1
• Fibonacci Numeral System (FNS)
  - Use Fibonacci numbers as base: v = d_m·f_m + … + d_2·f_2 + d_1·f_1, where d_k ∈ {0, 1}
  - Fibonacci numeral system is complete but ambiguous
  - Range: [0, f_{m+2} - 1]
  - A total of f_{m+2} values can be represented by m-bit Fibonacci vectors
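The "complete but ambiguous" property is easy to see by enumeration. The following sketch is illustrative; the names `fib_weights`, `fns_value` and `representations` are hypothetical, and it assumes an m-bit vector d_m … d_1 carries the weights f_m, …, f_2, f_1 with f_1 = f_2 = 1.

```python
from collections import defaultdict
from itertools import product


def fib_weights(m: int) -> list:
    """Weights [f_m, ..., f_2, f_1] for an m-bit Fibonacci vector."""
    w = [1, 1]
    while len(w) < m:
        w.append(w[-1] + w[-2])
    return w[:m][::-1]


def fns_value(bits: str) -> int:
    """Value of the bit string d_m ... d_1 in the Fibonacci numeral system."""
    return sum(int(d) * w for d, w in zip(bits, fib_weights(len(bits))))


def representations(m: int) -> dict:
    """Map each representable value to every m-bit vector that encodes it."""
    table = defaultdict(list)
    for combo in product("01", repeat=m):
        bits = "".join(combo)
        table[fns_value(bits)].append(bits)
    return dict(table)


reps = representations(4)
print(sorted(reps))   # every value in the range is reachable (completeness)
print(reps[3])        # several vectors share the value 3 (ambiguity)
```

With m = 4 the 16 vectors cover exactly the values 0 through 7 = f_6 - 1, and some values, such as 3, have more than one representation.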

FTF CODEC Design
• Theorem: For a number v in the range [0, f_{m+2}), there exists at least one m-bit FTF vector d_m d_{m-1} … d_2 d_1 in the Fibonacci numeral system
• Proof
  - There exists at least one Fibonacci vector for v (completeness)
  - If this vector is not FTF, an equivalent FTF vector can be generated by replacing the prohibited patterns at the boundaries
  - v ∈ S01 can be replaced by v ∈ S00 or v ∈ S10
  - v ∈ S10 can be replaced by v ∈ S01 or v ∈ S11
[Figure: number line with marks 0, f_{k-1}, f_k, f_{k+1}, 2f_k, f_{k+2} and value ranges S00, S01, S10, S11]
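For small bus widths the theorem can be checked by brute force. The sketch below is not part of the original proof. It assumes the boundary convention that matches Q_2 = {00, 01, 11} and the worked encoding example: '10' is excluded at even boundaries d_{2j} d_{2j-1} and '01' at odd boundaries d_{2j+1} d_{2j}. The mirrored convention gives the other maximum-cardinality code. All function names are hypothetical.

```python
from itertools import product


def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def fns_value(bits: str) -> int:
    """Value of d_m ... d_1 with weights f_m ... f_1."""
    m = len(bits)
    return sum(int(d) * fib(m - i) for i, d in enumerate(bits))


def obeys_boundary_rule(bits: str) -> bool:
    """Assumed rule: no '10' at even boundaries, no '01' at odd boundaries."""
    m = len(bits)
    for i in range(m - 1):
        upper = m - i              # index of the left bit of the pair
        pair = bits[i:i + 2]
        if upper % 2 == 0 and pair == "10":
            return False
        if upper % 2 == 1 and pair == "01":
            return False
    return True


def ftf_values(m: int) -> list:
    """Sorted FNS values of all m-bit vectors that obey the boundary rule."""
    vectors = ("".join(p) for p in product("01", repeat=m))
    return sorted(fns_value(b) for b in vectors if obeys_boundary_rule(b))


for m in range(2, 9):
    print(m, ftf_values(m) == list(range(fib(m + 2))))
```

Under this convention every value in [0, f_{m+2}) is hit exactly once, so the FTF vectors give an unambiguous representation even though the full FNS is ambiguous.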

Encoding Algorithm
• Encoder consists of m-1 stages
  - Each stage produces one coded bit
  - Each stage outputs a remainder
  - The remainder of one stage is the input of the following stage
• Decoder implements v = d_m·f_m + … + d_2·f_2 + d_1·f_1
  - An m-input adder
  - No multipliers needed
[Figure: encoder drawn as a chain of compare stages producing bits d_m, d_{m-1}, …, d_1 and remainders r_m, …, r_3; decoder drawn as the coded bits weighted by f_m, f_{m-1}, …, f_3 and summed to recover v]

Encoding Example
• Input: v = 19
• Output: 7-bit FTF vector
  ⑦ v ≥ 13 → d_7 = 1, r_7 = v - 13 = 6
  ⑥ r_7 < 13 → d_6 = 0, r_6 = r_7 - 0 = 6
  ⑤ r_6 ≥ 5 → d_5 = 1, r_5 = r_6 - 5 = 1
  ④ r_5 < 5 → d_4 = 0, r_4 = r_5 - 0 = 1
  ③ r_4 < 2 → d_3 = 0, r_3 = r_4 - 0 = 1
  ② r_3 < 2 → d_2 = 0, r_2 = r_3 - 0 = 1
  ① d_1 = r_2 = 1
• Output: 1010001
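The staged encoder and the adder-based decoder can be sketched as follows. This is a behavioral sketch, not the authors' hardware description. The comparison thresholds are an assumption inferred from the worked example and from the even/odd stage description: an even stage k compares the remainder against f_{k+1}, an odd stage k compares against f_k, and a stage that outputs 1 subtracts f_k. The names `ftf_encode` and `ftf_decode` are hypothetical.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def ftf_encode(v: int, m: int) -> str:
    """Encode v in [0, f_{m+2}) as an m-bit vector d_m ... d_1."""
    if not 0 <= v < fib(m + 2):
        raise ValueError("v is outside the range of an m-bit FTF vector")
    remainder = v
    bits = []
    for k in range(m, 1, -1):          # stages m down to 2
        # assumed thresholds: f_{k+1} for even stages, f_k for odd stages
        threshold = fib(k + 1) if k % 2 == 0 else fib(k)
        if remainder >= threshold:
            bits.append("1")
            remainder -= fib(k)
        else:
            bits.append("0")
    bits.append(str(remainder))        # d_1 is the final remainder, 0 or 1
    return "".join(bits)


def ftf_decode(bits: str) -> int:
    """Sum the Fibonacci weights of the set bits (an m-input adder)."""
    m = len(bits)
    return sum(fib(m - i) for i, d in enumerate(bits) if d == "1")


print(ftf_encode(19, 7))        # 1010001, as in the worked example
print(ftf_decode("1010001"))    # 19
```

Each stage needs only a comparison and a conditional subtraction, which is why the structure extends to wider buses by appending stages.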

Implementation
• Multi-stage structure
  - Systematic
  - Extendable modular design
  - Easily pipelined
• Internal logic
  - Even stage: 2 adders + 1 MUX
  - Odd stage: 1 adder + 1 MUX
  - Combining 2 stages: 2 adders + 1 MUX
[Figure: stage block diagrams. Even stage: CMP, SUB and SEL blocks with f_k, f_{k+1} and r_{k+1} as inputs and d_k, r_k as outputs. Odd stage: SUB and SEL blocks with f_k and r_{k+1} as inputs and d_k, r_k as outputs. Combined stage: two SUB blocks and one SEL block with f_k, f_{k+1} and r_{k+1} as inputs and d_k, d_{k-1}, r_{k-1} as outputs]

CODEC Gate Count & Speed
• Gate count grows quadratically with bus size, as opposed to exponentially for a brute-force design
• Delay also grows quadratically
• Pipelined design with special adder is estimated to reach 3 GHz speed
• Combined with bus partitioning, our approach will
  - Further reduce CODEC size
  - Also improve CODEC speed
  - Require a single ground wire between groups
[Figure: gate count comparison; series labeled Brute: 12bit and FTF: 12bit, 32bit]

Results – Speed Improvement
• Random sequence directly into bus buffer
  - 10 mm trace
  - 45x buffer
  - >1 ns delay variation
• Random sequence into an FTF encoder
  - 10 mm trace
  - 45x buffer
  - <500 ps delay variation

Results – Speed Improvement
• Without coding
  - Edge jitter > 1000 ps
• With coding
  - Edge jitter < 500 ps
[Figure: received data without coding; waveforms Voo1 to Voo5]

Summary
• Showed Forbidden Transition Free code is an efficient CAC
• Showed existing CODEC designs are not efficient
  - Exponential growth in area as bus size increases
• Proposed a mapping scheme based on Fibonacci Numeral System
• Designed efficient CODECs for the FTF code
  - A deterministic mapping
  - Area overhead performance reaches asymptotic lower bound
  - Systematic implementation
  - Implementation results confirm quadratic growth in both size and delay

Thank you!