Forbidden Transition Free Crosstalk Avoidance CODEC Design
Chunjie Duan, Mitsubishi Electric Research Labs, Cambridge, MA, USA
Chengyu Zhu, Polaris Microelectronic System, Shanghai, China
Sunil P. Khatri, Texas A&M University, College Station, TX, USA
Outline
- Background
- On-chip bus crosstalk classification
- Forbidden Transition Free (FTF) crosstalk avoidance code (CAC)
- CODEC design for FTF code
  - Previous approaches (exponential growth)
  - Our approach (quadratic growth)
- Experimental results and comparison
- Conclusions
On-chip Bus Interconnects
- In DSM processes C1 >> C2, and hence inter-wire crosstalk becomes dominant
- λ = C1 / C2 > 10 for Metal4 in a 0.1 μm CMOS process
- As a consequence:
  - Wire delay depends on the state of adjacent wires
  - Interconnect delay >> gate delay
  - Global interconnect becomes the performance bottleneck
[Figure: bus wires labeled a and v with capacitances C1 and C2]
Bus Classification
- A bus can be classified by the maximum value of the effective capacitance charged, over all its bits
  - 4C sequence: 101 → 010
  - 3C sequence: 101 → 011
  - 2C sequence: 100 → 011
  - 1C sequence: 001 → 111
  - 0C sequence: 000 → 111
- Delay impact of the different sequences confirmed by SPICE simulations (0.1 μm CMOS process)
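The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name `crosstalk_class` is hypothetical, only switching wires are scored, and edge wires are assumed to have a single neighbor.

```python
def crosstalk_class(before: str, after: str) -> int:
    """Return k for a 'kC' transition between two equal-width bit strings."""
    if len(before) != len(after):
        raise ValueError("vectors must have the same width")
    # Per-wire transition direction: +1 rising, -1 falling, 0 unchanged.
    delta = [int(a) - int(b) for b, a in zip(before, after)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # assumption: a wire that does not switch is not scored
        # Coupling seen by wire j: |delta_j - delta_i| for each existing neighbor i.
        load = sum(abs(d - delta[i]) for i in (j - 1, j + 1) if 0 <= i < len(delta))
        worst = max(worst, load)
    return worst


if __name__ == "__main__":
    for a, b in [("101", "010"), ("101", "011"), ("100", "011"),
                 ("001", "111"), ("000", "111")]:
        print(a, "->", b, ":", f"{crosstalk_class(a, b)}C")
```

Run on the five sequences listed on the slide, the sketch reproduces the 4C, 3C, 2C, 1C and 0C labels.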
Crosstalk Avoidance Codes
- The strong dependence of delay on crosstalk class has motivated much work on crosstalk avoidance codes (CACs)
- CACs are a class of codes that, when transmitted on the bus, avoid certain undesired classes of crosstalk
- CACs can be categorized based on the crosstalk classes eliminated
  - 4C/3C/2C/1C-free codes
- CACs can also be categorized based on the memory requirement
  - Memory-based / memoryless CACs
- CACs can be categorized based on the bus type
  - Binary / multi-level buses
[Figure: n-bit transmitted sequence → Encoder → Driver → m-bit bus → Receiver → Decoder → recovered sequence]
Crosstalk Avoidance Codes
- Memoryless CACs
  - Earliest work by our group, in 2001, for 4C-free and 3C-free "forbidden pattern free" (FPF) codes
  - Forbidden transition free (FTF) codes by Victor et al. (2001)
  - We focus on 3C-free FTF codes
- CODEC design for these and other codes was done in an ad-hoc manner
  - Worst-case area of the CODEC is exponential in bus width
- Key contribution: this paper reports a systematic 3C-free CODEC design approach based on the Fibonacci Numeral System (FNS)
  - Complexity grows quadratically with bus width
FTF CACs
- Forbidden transition: two adjacent bits transition in opposite directions, i.e., 01 ↔ 10
- An FTF code is a set of vectors such that transitions between codewords have no forbidden transitions, e.g., {00, 01, 11} or {000, 001, 100, 101, 111}
- How to design FTF codes?
  - A class-1 codeword is a vector with alternating '0's and '1's; 010101 and 101010 are the two 6-bit class-1 codewords
  - All codewords that are compatible with a class-1 codeword form an FTF code with maximum cardinality
  - In other words, we avoid '01' at d_{2j} d_{2j-1} (even) boundaries and avoid '10' at d_{2j+1} d_{2j} (odd) boundaries; hence, no forbidden transitions are possible
- There are two FTF codes with maximum cardinality, derived from the two possible class-1 codewords; the second code swaps the roles of '01' and '10' (the example sets above follow the swapped rule)
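A brute-force sketch of the construction described above: enumerate all m-bit vectors and keep those compatible with a chosen class-1 codeword. The function names are hypothetical and the exhaustive search is only practical for small widths.

```python
from itertools import product


def has_forbidden_transition(a: str, b: str) -> bool:
    """True if some adjacent bit pair switches 01 -> 10 or 10 -> 01 between a and b."""
    for j in range(len(a) - 1):
        if {a[j:j + 2], b[j:j + 2]} == {"01", "10"}:
            return True
    return False


def ftf_code_from_class1(class1: str) -> list:
    """All vectors of the same width that are compatible with the class-1 codeword."""
    words = ("".join(bits) for bits in product("01", repeat=len(class1)))
    return [w for w in words if not has_forbidden_transition(w, class1)]


def is_ftf_code(code) -> bool:
    """Check that no pair of codewords produces a forbidden transition."""
    return not any(has_forbidden_transition(a, b) for a in code for b in code)


if __name__ == "__main__":
    code = ftf_code_from_class1("101")
    print(code, is_ftf_code(code))
```

With the class-1 codeword 101 this yields the 3-bit example set {000, 001, 100, 101, 111}; with 01 it yields {00, 01, 11}.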
Inductive FTF Code Generation
- Generating the set of m-bit codewords Q_m from the (m-1)-bit set Q_{m-1}
- Suppose the class-1 codeword is ...0101; then Q_2 = {00, 01, 11}
- For even m > 2, take each (m-1)-bit v ∈ Q_{m-1}:
  - v = 0xxx => Q_m = Q_m ∪ {00xxx}
  - v = 1xxx => Q_m = Q_m ∪ {01xxx, 11xxx}
- For odd m > 2, take each (m-1)-bit v ∈ Q_{m-1}:
  - v = 0xxx => Q_m = Q_m ∪ {10xxx, 00xxx}
  - v = 1xxx => Q_m = Q_m ∪ {11xxx}
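The inductive rules above translate directly into a short loop. This is a sketch with a hypothetical function name, starting from Q_2 = {00, 01, 11} and prepending one bit per step.

```python
def generate_ftf(m: int) -> list:
    """Build the m-bit set Q_m inductively from Q_2 = {00, 01, 11}."""
    if m < 2:
        raise ValueError("m must be at least 2")
    q = ["00", "01", "11"]
    for width in range(3, m + 1):
        nxt = []
        for v in q:
            if width % 2 == 0:
                # Even width: 0xxx -> 00xxx; 1xxx -> 01xxx, 11xxx
                prefixes = ["0"] if v[0] == "0" else ["0", "1"]
            else:
                # Odd width: 0xxx -> 10xxx, 00xxx; 1xxx -> 11xxx
                prefixes = ["1", "0"] if v[0] == "0" else ["1"]
            nxt.extend(p + v for p in prefixes)
        q = nxt
    return q


if __name__ == "__main__":
    for m in range(2, 8):
        print(m, len(generate_ftf(m)))
```

The set sizes printed for m = 2..7 are 3, 5, 8, 13, 21, 34, which follow the Fibonacci sequence.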
FTF Cardinality, Area Overhead
- A difference equation can be derived from the inductive algorithm: T(m) = T(m-1) + T(m-2)
  - Initial conditions: T(2) = 3, T(3) = 5
  - Maximum cardinality of the FTF code is T(m) = f_{m+2}
- Define area overhead as the ratio of additional wires required in the coded bus to the uncoded bus size: (m - n) / n
- The minimum number of bits m required to code n-bit data satisfies f_{m+2} ≥ 2^n
- It is well known that f_m ≈ φ^m / √5, where φ ≈ 1.618 is the golden ratio
- Therefore m ≥ n·log_φ(2) + log_φ(√5) - 2, or m ≥ 1.44·n (for large n)
- Overhead lower bound: (m - n) / n ≥ 0.44, i.e., about 44% (for large n)
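As a numeric check of the bound, the sketch below searches for the smallest m with f_{m+2} ≥ 2^n and reports the resulting overhead. The helper names are hypothetical; the Fibonacci indexing assumes f_0 = 0, f_1 = 1, which matches T(2) = f_4 = 3.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def min_coded_bits(n: int) -> int:
    """Smallest m such that f_{m+2} >= 2**n."""
    m = 1
    while fib(m + 2) < 2 ** n:
        m += 1
    return m


def area_overhead(n: int) -> float:
    """Ratio of additional wires to the uncoded bus width."""
    return (min_coded_bits(n) - n) / n


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, min_coded_bits(n), f"{area_overhead(n):.1%}")
```

For a 32-bit bus this gives m = 46, an overhead of 43.75%, close to the asymptotic 44%.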
Designing An Efficient CODEC
- We focus on 3C-free FTF CODEC designs
  - Most efficient, robust and popular codes
  - Existing solutions have some deficiencies
- Potential solutions:
  - Solution 1: Brute-force logic optimization
  - Solution 2: Bus partitioning
  - Solution 3: Fibonacci Numeral System based CODEC
Brute-force Logic Optimization
- Multi-level implementation based on random mapping
  - Too many permutations, more codewords than needed
  - Relies purely on logic optimization
- CODEC size grows exponentially
- Not composable: designs are not extendable
- Does not work for large buses
* S.R. Sridhara et al., "Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip busses", ICCD, 2004
Bus Partitioning
- Small bus group → small CODEC
- Exhaustively search for the optimal CODEC for small bus groups
- Forbidden transitions across the group boundary
  - Group complement
  - Bit overlapping
- Area overhead goes up from 44% to 62% or more
[Figure: bus partitioned into groups b(13:16), b(9:12), b(5:8), b(1:4)]
Fibonacci Numeral System
- Fibonacci sequence: F = {0, 1, 1, 2, 3, 5, 8, 13, 21, ...}, i.e., f_0 = 0, f_1 = 1, f_k = f_{k-1} + f_{k-2}
- Useful properties:
  - Golden ratio expression: f_m = (φ^m - (1 - φ)^m) / √5
  - So for large m: f_m ≈ φ^m / √5
  - Summation identity: f_1 + f_2 + ... + f_m = f_{m+2} - 1
- Fibonacci Numeral System (FNS)
  - Use Fibonacci numbers as base: v = d_m·f_m + ... + d_2·f_2 + d_1·f_1, where d_k ∈ {0, 1}
  - The Fibonacci numeral system is complete but ambiguous
  - Range: [0, f_{m+2} - 1]; a total of f_{m+2} values can be represented by m-bit Fibonacci vectors
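To make "complete but ambiguous" concrete, the following sketch enumerates every m-bit Fibonacci vector and groups them by value. It assumes bit d_k carries weight f_k with f_1 = f_2 = 1; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def fns_value(bits: str) -> int:
    """Value of the vector d_m ... d_1, where bit d_k has weight f_k."""
    m = len(bits)
    return sum(int(d) * fib(m - i) for i, d in enumerate(bits))


def representations(m: int) -> dict:
    """Map each representable value to all m-bit vectors that encode it."""
    table = defaultdict(list)
    for bits in product("01", repeat=m):
        word = "".join(bits)
        table[fns_value(word)].append(word)
    return dict(table)


if __name__ == "__main__":
    for value, words in sorted(representations(4).items()):
        print(value, words)
```

For m = 4 every value in [0, 7] appears (completeness), and some values, such as 3, have several vectors (ambiguity).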
FTF CODEC Design
- Theorem: For a number v in the range [0, f_{m+2}), there exists at least one m-bit FTF vector d_m d_{m-1} ... d_2 d_1 in the Fibonacci numeral system
- Proof:
  - There exists at least one Fibonacci vector for v (completeness)
  - If this vector is not FTF, an equivalent FTF vector can be generated by replacing the prohibited patterns at the boundaries
  - v ∈ S_01 can be replaced by v ∈ S_00 or v ∈ S_10
  - v ∈ S_10 can be replaced by v ∈ S_01 or v ∈ S_11
[Figure: number line with marks 0, f_{k-1}, f_k, f_{k+1}, 2f_k, f_{k+2} and the value ranges S_00, S_01, S_10, S_11]
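The theorem can be checked exhaustively for small widths. The sketch below is not the authors' proof; it assumes the FTF code that contains Q_2 = {00, 01, 11} (no '10' at d_{2j} d_{2j-1} boundaries, no '01' at d_{2j+1} d_{2j} boundaries) and weights d_k by f_k. All function names are hypothetical.

```python
from itertools import product


def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def fns_value(bits: str) -> int:
    """Value of d_m ... d_1, where bit d_k has weight f_k."""
    m = len(bits)
    return sum(int(d) * fib(m - i) for i, d in enumerate(bits))


def is_ftf_vector(bits: str) -> bool:
    """Pattern rule assumed here: code derived from class-1 codeword ...0101."""
    m = len(bits)
    for k in range(2, m + 1):                 # boundary d_k d_{k-1}
        pair = bits[m - k] + bits[m - k + 1]
        banned = "10" if k % 2 == 0 else "01"
        if pair == banned:
            return False
    return True


def ftf_values(m: int) -> list:
    """Sorted FNS values of all m-bit vectors satisfying the FTF rule."""
    words = ("".join(b) for b in product("01", repeat=m))
    return sorted(fns_value(w) for w in words if is_ftf_vector(w))


def theorem_holds(m: int) -> bool:
    """Every v in [0, f_{m+2}) has at least one m-bit FTF vector."""
    return set(ftf_values(m)) >= set(range(fib(m + 2)))


if __name__ == "__main__":
    print(all(theorem_holds(m) for m in range(2, 11)))
```

For these small widths each value is in fact hit exactly once, so the FTF vectors form a one-to-one map onto [0, f_{m+2}).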
Encoding Algorithm
- Encoder consists of m-1 stages
  - Each stage produces one coded bit
  - Each stage outputs a remainder
  - The remainder of one stage is the input of the following stage
- Decoder implements an m-input adder
  - No multipliers needed
[Figure: encoder as a chain of compare stages (< f_m, < f_{m-2}, ..., < f_4, < f_2) taking v and producing bits d_m ... d_1 with remainders r_m ... r_3; decoder summing d_m ... d_1 weighted by f_m ... f_1 to recover v]
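A software model of the staged encoder and the adder-based decoder might look as follows. This is a sketch under assumptions, not the authors' RTL: the threshold rule (odd stage k compares against f_k, even stage k against f_{k+1}) is inferred from the 7-bit example with v = 19, and the function names are hypothetical.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def encode(v: int, m: int) -> str:
    """Map v in [0, f_{m+2}) to an m-bit FTF vector d_m ... d_1."""
    if not 0 <= v < fib(m + 2):
        raise ValueError("v out of range for an m-bit FTF code")
    bits = []
    r = v
    for k in range(m, 1, -1):                 # m-1 stages, from d_m down to d_2
        # Assumed rule: odd stages compare with f_k, even stages with f_{k+1}.
        threshold = fib(k) if k % 2 else fib(k + 1)
        d = 1 if r >= threshold else 0
        r -= d * fib(k)                       # remainder feeds the next stage
        bits.append(str(d))
    bits.append(str(r))                       # d_1 is the last remainder (0 or 1)
    return "".join(bits)


def decode(bits: str) -> int:
    """Sum of f_k over the set bits: an m-input adder, no multipliers."""
    m = len(bits)
    return sum(fib(m - i) for i, d in enumerate(bits) if d == "1")


if __name__ == "__main__":
    word = encode(19, 7)
    print(word, decode(word))
```

Each stage needs only a comparison and a conditional subtraction, which is what makes the structure easy to extend to wider buses.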
Encoding Example
- Input: v = 19; output: 7-bit FTF vector
  ⑦ v ≥ 13 → d_7 = 1, r_7 = v - 13 = 6
  ⑥ r_7 < 13 → d_6 = 0, r_6 = r_7 - 0 = 6
  ⑤ r_6 ≥ 5 → d_5 = 1, r_5 = r_6 - 5 = 1
  ④ r_5 < 5 → d_4 = 0, r_4 = r_5 - 0 = 1
  ③ r_4 < 2 → d_3 = 0, r_3 = r_4 - 0 = 1
  ② r_3 < 2 → d_2 = 0, r_2 = r_3 - 0 = 1
  ① d_1 = r_2 = 1
- Output: 1010001
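The stage-by-stage numbers of this example can be reproduced with a small trace function. It is an illustrative sketch with a hypothetical name; the thresholds are assumed to be f_k for odd stages and f_{k+1} for even stages, which matches the comparisons shown in steps ⑦ to ②.

```python
def fib(k: int) -> int:
    """k-th Fibonacci number with f_0 = 0, f_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def trace_encoding(v: int, m: int):
    """Return ([(stage k, threshold, d_k, r_k), ...] for k = m..2, d_1)."""
    rows = []
    r = v
    for k in range(m, 1, -1):
        threshold = fib(k) if k % 2 else fib(k + 1)
        d = int(r >= threshold)
        r -= d * fib(k)
        rows.append((k, threshold, d, r))
    return rows, r


if __name__ == "__main__":
    rows, d1 = trace_encoding(19, 7)
    for k, threshold, d, r in rows:
        print(f"stage {k}: compare with {threshold}, d_{k} = {d}, r_{k} = {r}")
    print("d_1 =", d1)
    print("codeword:", "".join(str(d) for _, _, d, _ in rows) + str(d1))
```

Reading the d column top to bottom and appending d_1 gives the transmitted vector.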
Implementation
- Multi-stage structure
  - Systematic
  - Extendable modular design
  - Easily pipelined
- Internal logic
  - Even stage: 2 adders + 1 MUX
  - Odd stage: 1 adder + 1 MUX
  - Combining 2 stages: 2 adders + 1 MUX
[Figure: block diagrams of the even stage (CMP, SUB, SEL; signals f_k, f_{k+1}, r_{k+1}, d_k, r_k), the odd stage (SUB, SEL; signals f_k, r_{k+1}, d_k, r_k) and the combined stage (two SUB blocks, SEL; signals f_k, f_{k+1}, r_{k+1}, d_k, d_{k-1}, r_{k-1})]
CODEC Gate Count & Speed
- Gate count grows quadratically with bus size, as opposed to exponentially for a brute-force design
- Delay also grows quadratically
- Pipelined design with special adder is estimated to reach 3 GHz speed
- Combined with bus partitioning, our approach will
  - Further reduce CODEC size
  - Also improve CODEC speed
  - Require a single ground wire between groups
[Figure legend: Brute: 12bit; FTF: 12bit, 32bit]
Results – Speed Improvement
- Random sequence directly into bus buffer: 10 mm trace, 45x buffer, > 1 ns delay variation
- Random sequence into an FTF encoder: 10 mm trace, 45x buffer, < 500 ps delay variation
Results – Speed Improvement
- Without coding: edge jitter > 1000 ps
- With coding: edge jitter < 500 ps
[Figure: received data w/o coding; traces Voo1 to Voo5]
Summary
- Showed that the Forbidden Transition Free code is an efficient CAC
- Showed that existing CODEC designs are not efficient
  - Exponential growth in area as bus size increases
- Proposed a mapping scheme based on the Fibonacci Numeral System
- Designed efficient CODECs for the FTF code
  - A deterministic mapping
  - Area overhead performance reaches the asymptotic lower bound
  - Systematic implementation
  - Implementation results confirm quadratic growth in both size and delay
Thank you!