Energy Efficient and High Speed On-Chip Ternary Bus

Presentation transcript:

Energy Efficient and High Speed On-Chip Ternary Bus
Chunjie Duan, Mitsubishi Electric Research Labs, Cambridge, MA, USA
Sunil P. Khatri, Texas A&M University, College Station, TX, USA

03/13/ Motivation

Trends in VLSI design
– Shrinking feature size: Deep SubMicron (DSM) and Very Deep SubMicron (VDSM) processes
– Scaling down supply voltage
– Increasing die size (e.g. SoC, NoC, CMP)

Impacts
– Benefits: smaller gate delay (high-speed logic), lower switching power per gate, high complexity (>billion gates)
– Costs: increasing power consumption, higher leakage current (standby power), reduced noise margin, increasing interconnect delay
– Interconnect delay >> gate delay
– Global interconnect becomes the performance bottleneck

On-chip Bus Interconnects

The impact of DSM / VDSM:
– W↓, P↓
– L↑, T↑ to avoid quadratic increase in resistance of the wire
– Inter-wire capacitance C_I is much greater than substrate capacitance C_L → crosstalk becomes dominant
– λ = C_I / C_L > 10 for metal 4 in a  m CMOS process

[Figure: wire cross-sections for an earlier process and a DSM process, labeled with W, T, P, C_I and C_L]

Ternary Bus and Mapping

Advantage of a ternary bus
– Low voltage step: V_dd/2 instead of V_dd

We propose a bit-to-bit binary-ternary mapping scheme
– Each binary bit is mapped directly to a line on the ternary bus.
– A binary 0 is mapped to the middle value on the ternary bus, i.e. 0_b → 0_t.
– A binary 1 is mapped to either the high or the low value on the ternary bus, i.e. 1_b → + or 1_b → -.

Disadvantage: lower bit density (1 bit/line vs. 1.58 bit/line for a true ternary bus)

Advantages: direct mapping and flexible polarity
– Direct mapping avoids true ternary-to-binary conversion, which is very slow and complex.
– Flexible polarity results in low crosstalk, e.g. the ternary vectors +0+, -0-, +0- and -0+ all represent the same binary value 101.

Each ternary value T_j is represented by the polarity P_j and the magnitude D_j.

Ternary driver truth table
D_j   P_j   T_j   V_out
0     X     0     V_0
1     0     -     V_-
1     1     +     V_+
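The direct mapping can be illustrated with a short Python sketch. The symbols '+', '0', '-' and the function names drive, decode and encodings are illustrative assumptions, not names from the slides; the sketch only restates the driver truth table and the 101 example.

```python
import math
from itertools import product


def drive(d, p):
    """Driver truth table: D=0 gives '0' (P is don't-care); D=1 gives '-' for P=0, '+' for P=1."""
    if d == 0:
        return "0"
    return "+" if p == 1 else "-"


def decode(ternary):
    """A line decodes to binary 1 exactly when it is not at the middle level."""
    return "".join("0" if t == "0" else "1" for t in ternary)


def encodings(bits):
    """All ternary vectors that represent the binary word `bits`."""
    choices = [("0",) if b == "0" else ("+", "-") for b in bits]
    return ["".join(v) for v in product(*choices)]


if __name__ == "__main__":
    print(encodings("101"))        # the four vectors that all decode to 101
    print(round(math.log2(3), 2))  # bit density of a true ternary line: 1.58
```

Because decoding ignores polarity, the encoder is free to pick whichever polarity produces the least crosstalk.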

Crosstalk in a Multi-valued Bus

Define the effective crosstalk as [equation not preserved in the transcript]
– where  j,k = sgn(  j )  V k is the normalized voltage change, and NOL is the number of logic levels

Delay can be approximated as [equation not preserved in the transcript]

Energy consumption is [equation not preserved in the transcript]

For a ternary bus, V_step = V_dd/2, we know
– max(X_eff,j) = 8
– min(X_eff,j) = 0

Bus speed/power is highly data pattern dependent!

Table 1. Examples of Total Crosstalk (columns: V_{t-1}, V_t, X_eff; entries not preserved in the transcript)
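The bounds 0 and 8 can be reproduced with a small Python sketch. The metric used here is an assumption, since the slide's own formula did not survive: X_eff,j is taken as the sum, over the neighbours of line j, of the absolute difference between the two lines' voltage changes, measured in units of V_step = V_dd/2. The names LEVEL, x_eff, obeys_rule1 and extremes are hypothetical.

```python
from itertools import product

LEVEL = {"-": -1, "0": 0, "+": 1}  # line voltage in units of V_step = Vdd/2


def x_eff(prev, curr, j):
    """ASSUMED metric: sum of |delta_j - delta_k| over the existing neighbours k of line j."""
    delta = [LEVEL[c] - LEVEL[p] for p, c in zip(prev, curr)]
    return sum(abs(delta[j] - delta[k])
               for k in (j - 1, j + 1) if 0 <= k < len(delta))


def obeys_rule1(prev, curr):
    """True when no line swings directly between '-' and '+'."""
    return all({p, c} != {"+", "-"} for p, c in zip(prev, curr))


def extremes(rule1_only):
    """Min and max X_eff of the middle wire over all 3-wire transitions."""
    states = ["".join(s) for s in product("-0+", repeat=3)]
    values = [x_eff(p, c, 1) for p in states for c in states
              if not rule1_only or obeys_rule1(p, c)]
    return min(values), max(values)


if __name__ == "__main__":
    print(extremes(rule1_only=False))  # unrestricted ternary bus
    print(extremes(rule1_only=True))   # direct - <-> + swings forbidden
```

Under this assumed metric the unrestricted bus spans 0 to 8, and forbidding direct swings between '-' and '+' caps the maximum at 4.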

A Low Power, High Speed 4X Ternary Bus

Using direct bit-to-bit mapping

Coding rules:
– Rule #1: A direct - ↔ + transition is prohibited.
– Rule #2: A 1_b → 0_b transition is mapped as -_t → 0_t or +_t → 0_t, depending only on the current polarity of the 1_b.
– Rule #3: For a 0_b → 1_b transition on b_j, if b_{j-1} is transitioning, P_j is coded so that both lines transition in the same direction.
– Rule #4: For a 0_b → 1_b transition on b_j, if b_{j-1} is not transitioning and b_{j+1} is transitioning from 1 to 0, P_j is coded so that the j-th and (j+1)-th lines transition in the same direction.
– Rule #5: For a 0_b → 1_b transition on b_j, if there is no transition on either neighbor, P_j is coded so that P_j = P_{j-1} or P_j = P_{j+1}, with P_j = P_{j-1} having the higher priority.

The first rule guarantees max(X_eff,j) = 4, and therefore a 2X speed-up over a conventional binary bus.
The other rules are designed to lower the probability that high values of X_eff,j occur on the bus.
Identical encoder/decoder logic for each bit.

[Table: an example of 4X ternary sequences (columns: Binary, Ternary, X_eff); entries not preserved in the transcript]
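One way to read the five rules is as a per-cycle polarity chooser, sketched below in Python. This is an interpretation under stated assumptions, not the authors' encoder: lines are processed left to right so that a rising left neighbour's new polarity is already known, a bit that stays at 1 keeps its polarity, and cases the rules leave open (for example, no neighbour with a polarity to copy) fall back to a hypothetical default polarity. The names flip and encode_step are illustrative.

```python
def flip(t):
    """Opposite polarity of a non-zero ternary symbol."""
    return "-" if t == "+" else "+"


def encode_step(prev, bits, default="+"):
    """Map the binary word `bits` to a ternary word, given the previous ternary word `prev`."""
    n = len(bits)
    was_one = [t != "0" for t in prev]
    now_one = [b == "1" for b in bits]
    out = []
    for j in range(n):
        if not now_one[j]:
            out.append("0")          # Rule 2 (and 0 -> 0): settle at the middle level
        elif was_one[j]:
            out.append(prev[j])      # 1 -> 1 keeps its polarity, so Rule 1 holds
        elif j > 0 and was_one[j - 1] != now_one[j - 1]:
            # Rule 3: left neighbour is moving; move in the same direction.
            # A rising neighbour shares its new polarity; a falling '+' moves
            # downward, so this line goes to '-', and vice versa.
            out.append(out[j - 1] if now_one[j - 1] else flip(prev[j - 1]))
        elif j + 1 < n and was_one[j + 1] and not now_one[j + 1]:
            out.append(flip(prev[j + 1]))   # Rule 4: follow the falling right neighbour
        elif j > 0 and now_one[j - 1]:
            out.append(out[j - 1])          # Rule 5: copy a static left polarity first
        elif j + 1 < n and was_one[j + 1] and now_one[j + 1]:
            out.append(prev[j + 1])         # Rule 5: otherwise copy a static right polarity
        else:
            out.append(default)             # ASSUMPTION: nothing to copy, use the default
    return "".join(out)


if __name__ == "__main__":
    print(encode_step("000", "101"))  # +0+
    print(encode_step("+0", "01"))    # 0-  (both lines move downward)
```

Whatever polarity is chosen, the decoder still only checks whether a line is away from the middle level, which is consistent with the slide's remark that the decoder is simple.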

An Even Faster 3X Ternary Bus

– Partition the bus into 5-bit groups
– Insert a shield wire between groups
– Apply the same rules as for the 4X bus

It can be proven that such a configuration guarantees max(X_eff) = 3
– Additional 33% speed-up over the 4X ternary bus
– At the cost of 20% additional wires

[Figures: 4X bus encoder and driver circuit; 3X bus encoder and driver circuit]
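The wire overhead and the speed-up figure can be checked with a few lines of Python. The sketch assumes that delay scales with max(X_eff), and that the quoted 20% corresponds to one shield per 5-bit group; the function names are hypothetical.

```python
def shielded_layout(n_bits, group=5):
    """Wire order with a shield 'S' between consecutive groups of data wires 'D'."""
    wires = []
    for i in range(n_bits):
        if i and i % group == 0:
            wires.append("S")
        wires.append("D")
    return "".join(wires)


def shield_overhead(n_bits, group=5):
    """Extra wires as a fraction of the data wires."""
    return shielded_layout(n_bits, group).count("S") / n_bits


if __name__ == "__main__":
    print(shielded_layout(10))          # DDDDDSDDDDD
    print(shield_overhead(40))          # 0.175, approaching 0.20 for wide buses
    print(round((4 / 3 - 1) * 100))     # 33 (% speed-up if delay scales with max X_eff)
```

With shields only between groups, the overhead stays just below 1/5 and approaches it as the bus gets wider.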

Circuit Implementations

– Encoder implemented based on the 5 rules
– Decoder is extremely simple (implemented with two 2-input gates)
– Ternary driver and receiver can be implemented in current or voltage mode
  – Current mode is more power hungry (static current)
  – Voltage mode requires a low-impedance V_dd/2 supply

[Figure: (A) current mode: I-driver and I-receiver; (B) voltage mode: V-driver and V-receiver with shared V-ref]

Experimental Results

Crosstalk distribution and normalized energy consumption comparison (coded ternary vs. half-swing binary)

[Table: columns Bus Size, 0X, 1X, 2X, 3X, 4X, EF (×10^4), %; one B row and one T row per bus size; numeric entries not preserved in the transcript]

The power saving comes from the redistribution of X_eff
– More transitions are pushed towards lower X_eff

The average power saving is ~27%

4X: ternary bus using 4X code; HB: half-swing binary bus; RP: ternary bus with random polarity; TT: true ternary bus

Experimental Results

The proposed 4X and 3X busses are advantageous over other bus coding schemes.

Bus performance comparison
[Table: columns 4XT, 3XT, SB, HB, RP, TT; rows EF (×10^4), Delay, PDP (×10^5), Pwr saving (%), PDP gain (%), Bus Area; only the Delay entries "4x 3x 4x 8x" survive in the transcript]

EF: normalized total energy; PDP: power delay product
4XT: ternary bus using 4X code; 3XT: ternary bus with 3X code; SB: binary bus with shielding; HB: half-swing binary bus; RP: ternary bus with random polarity; TT: true ternary bus

Experimental Results

[Figure: eye diagrams for uncoded and coded busses (10 mm)]

Summary

– Crosstalk classification was extended to multi-valued buses.
– We proposed a direct bit-to-bit binary-ternary mapping scheme which results in a simple CODEC design.
– We proposed a 4X coding scheme that allows us to double the speed of a conventional ternary bus and save energy.
– We proposed a coding scheme (3X coding) to attain an additional 33% speed gain at the cost of 20% area overhead.
– We designed and implemented the CODEC and ternary driver/receiver.
– Our experimental results show significant power saving (27%) and speed gain (2X or more) over other schemes.