ASYNC07 High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar Technion – Israel Institute.

ASYNC07 High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar Technion – Israel Institute of Technology Electrical Engineering Department – VLSI Lab March 12, 2007

ASYNC07 2 Presentation Outline
- Why Serial Link?
- Fast Asynchronous Serial Link
  - Transmitter, Fast LEDR Encoder
  - Receiver, Fast Toggle Circuit
  - Channel, Current Mode Async Signaling
- Performance
- Summary

ASYNC07 3 Serial Link Employment Benefits
Why Serial Link?
- Less interconnect area
- Less routing congestion
- Less coupling
- Less power (depends on range)
The relative improvement grows with technology scaling.
The example on the right refers to:
- Single gate delay serial link
- Fully-shielded parallel link with 8 gate delay clock cycle
- Equal bit-rate
- Word width N=8
[Figure: plots over technology node [nm] and link length [mm], marking the regions where the parallel link or the serial link dissipates less power, and where each requires less area.]

ASYNC07 4 Serial Link Applications
- P2P long-range interconnect
- Long range NoC links
- Pin-limited on-chip module interfaces
  - Chips are presently pin-limited, and that limit will migrate inside the chip
- Cross-bar
  - Simpler routing, less congestion
- Communications inside many-core CMPs

ASYNC07 5 Serial Link – Top Structure
- Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
- Acknowledge per word instead of per bit
- Wave-pipelining over channel
- Differential encoding (DS-DE, IEEE )
- Low-latency synchronizers

ASYNC07 6 Encoding – Two Phase NRZ LEDR
- Two-phase Non-Return-to-Zero Level Encoded Dual Rail
- Delta encoding (one transition per bit)
[Figure: waveforms of the uncoded bit (B), the state bit (S), and the phase bit (P).]
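The one-transition-per-bit property of LEDR can be illustrated with a small behavioural sketch. It assumes the usual LEDR convention in which the state rail S carries the data value and the phase rail P toggles whenever the data does not change; the function names and the initial rail values are hypothetical, and this is not a model of the circuit in the slides.

```python
def ledr_encode(bits, s0=0, p0=0):
    """Encode a bit sequence as (S, P) rail values, one rail toggle per bit.

    Assumption: S follows the data value; P toggles when the data repeats.
    s0 and p0 are the (hypothetical) rail values before the first bit.
    """
    s, p = s0, p0
    rails = []
    for b in bits:
        if b != s:
            s = b      # data changed: the state rail toggles
        else:
            p ^= 1     # data repeated: the phase rail toggles
        rails.append((s, p))
    return rails


def ledr_decode(rails):
    """Recover the data: after each transition the state rail holds the bit."""
    return [s for s, _ in rails]


def transitions_per_bit(rails, s0=0, p0=0):
    """Count how many rails change for every transmitted bit."""
    counts = []
    prev = (s0, p0)
    for cur in rails:
        counts.append((prev[0] != cur[0]) + (prev[1] != cur[1]))
        prev = cur
    return counts


word = [1, 1, 0, 0, 1]
rails = ledr_encode(word)
print(rails)                       # [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
print(transitions_per_bit(rails))  # [1, 1, 1, 1, 1]
```

Because exactly one rail changes per bit, the receiver can detect each new bit from a transition on either rail without a separate clock.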

ASYNC07 7 Transmitter – Fast SR Approach Transition Generator Targeted Speed: One gate delay between bits

ASYNC07 8 Fast Asynchronous Shift Register

ASYNC07 9 Wave-pipelined Control Characteristics
- The highest speed (the single gate-delay cycle) relates to the pole of the Bode diagram
- This operating point results in signal degradation along the inverter chain
[Figure label: Single Gate Delay Rate]

ASYNC07 10 Splitter Architecture
- The shift-register is partitioned into M shift-registers
- M× slower operation in each shift-register
- Signal is no longer degraded
- Single gate-delay operation is localized to output (input) stage only
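The rate reduction from splitting can be sketched behaviourally. The slides do not state how bits are distributed among the M shift-registers, so the round-robin interleaving below is an assumption, and the function names are hypothetical.

```python
def split_round_robin(bits, m):
    """Distribute a word over m lanes: lane i gets bits i, i+m, i+2m, ...

    Assumption: round-robin interleaving (not confirmed by the slides).
    """
    return [bits[i::m] for i in range(m)]


def merge_round_robin(lanes):
    """Interleave the lanes back into the original bit order."""
    m = len(lanes)
    total = sum(len(lane) for lane in lanes)
    return [lanes[i % m][i // m] for i in range(total)]


def lane_rate(link_rate, m):
    """Each lane shifts one bit for every m bits sent on the link."""
    return link_rate / m


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]  # 16-bit word
lanes = split_round_robin(word, 4)
print(lanes)                       # four lanes of four bits each
print(merge_round_robin(lanes) == word)
print(lane_rate(67.0, 4))          # per-lane rate in Gbps for a 67 Gbps link
```

Only the stage that merges (or splits) the lanes has to run at the full link rate, which matches the point that single gate-delay operation is confined to the output (input) stage.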

11 Transmitter Splitter Architecture

ASYNC07 12 Transmitter – SPICE Simulation (65nm node) Simulations done at

ASYNC07 13 Receiver

14 Receiver Splitter Architecture

ASYNC07 15 Toggle Circuit
- Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only ~1.5 gate delay cycle)
- Novel toggle:
  - Single gate delay operation support
  - Internal and output latches

ASYNC07 16 Channel
- Four transmission lines (DS-DE)
- High metal layers utilization
  - Metals 5-8 of 65nm process
  - RLC modeled
- Careful layout
  - Small crosstalk
  - Small relative variations

ASYNC07 17 SSPPSP LEDR Interconnect Layout

ASYNC07 18 Differential Channel Driver and Receiver
- Current mode differential low-swing signaling
- Currents in opposite directions
- Controllable current return path
[Figure label: P / S]

ASYNC07 19 Channel Characteristic Impedance
- Based on data from BPTM. Drawn for constant R, L, C
- Z depends on F
- Voltage changes with F
- Fast changes cause voltage drifts
- The drifts bound the operating speed
[Figure: characteristic impedance Z plotted against frequency F.]
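The frequency dependence of Z for constant R, L, C can be sketched with the standard lossy-line formula Z0 = sqrt((R + jωL) / (jωC)), assuming negligible shunt conductance. The per-unit-length values below are hypothetical placeholders chosen only to make the trend visible; they are not the BPTM data used in the slides.

```python
import cmath
import math

# Hypothetical per-unit-length parameters (NOT taken from the slides or BPTM)
R_PER_M = 50e3      # series resistance, ohm/m (50 ohm/mm)
L_PER_M = 0.4e-6    # series inductance, H/m
C_PER_M = 0.16e-9   # shunt capacitance, F/m


def z0(freq_hz, r=R_PER_M, l=L_PER_M, c=C_PER_M):
    """Characteristic impedance of a lossy line with zero shunt conductance."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * w * l) / (1j * w * c))


# At low frequency R dominates and |Z0| is large; at high frequency
# |Z0| approaches the lossless value sqrt(L/C).
for f in (1e8, 1e9, 1e10, 1e11):
    print(f"{f:8.0e} Hz  |Z0| = {abs(z0(f)):7.1f} ohm")
print(f"lossless limit = {math.sqrt(L_PER_M / C_PER_M):.1f} ohm")
```

A driver sized for the high-frequency impedance therefore sees a different load when the signal toggles slowly, which is one way to read the voltage drift described above.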

ASYNC07 20 Channel Driver with Adaptive Control
- Compensates for Z changes
- Turned on for low frequencies
[Figure labels: Adaptive Control, Inertial Delay]

ASYNC07 21 Adaptive Control – Simulation Example
- SPICE simulation setup: 65nm technology, 4mm range, 67Gbps data rate
- RLC modeled channel (using Raphael-like three-dimensional field solver)
- Adaptive control is turned on only for low frequencies

ASYNC07 22 Channel Receiver Amplifier

ASYNC07 23 Performance
- SPICE simulations show correct operation at the target data cycle of 15ps (65nm technology node)
- Power for a 67Gbps, 4mm, 16-bit word link under 100% utilization:
  - Total power: 150mW
  - Channel differential pair: 18mW
  - Leakage power: 4mW (due to the use of low-VT transistors)
- Power reduction:
  - Deeper split ( M power reduction)
  - Circuit optimizations
  - Circuit shut down during idle states
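The performance figures can be cross-checked with simple arithmetic. The energy per bit and the channel's share of the power are derived here from the quoted numbers and are not stated in the slides; the function names are hypothetical.

```python
def bit_rate_gbps(data_cycle_ps):
    """One bit per data cycle: rate in Gbps for a cycle given in ps."""
    return 1000.0 / data_cycle_ps


def energy_per_bit_pj(power_mw, data_cycle_ps):
    """Energy per bit at 100% utilization: mW * ps = fJ, divided by 1000 for pJ."""
    return power_mw * data_cycle_ps / 1000.0


rate = bit_rate_gbps(15)                 # 15 ps data cycle from the slides
energy = energy_per_bit_pj(150, 15)      # 150 mW total power from the slides
channel_share = 18 / 150                 # channel differential pair vs. total

print(f"bit rate       = {rate:.1f} Gbps")    # about 67 Gbps, as quoted
print(f"energy per bit = {energy:.2f} pJ")
print(f"channel share  = {channel_share:.0%}")
```

A 15ps cycle gives about 66.7 Gbps, consistent with the 67Gbps figure, and the quoted total power corresponds to roughly 2.25 pJ per transmitted bit.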

ASYNC07 24 In-Die Variations
- Splitter architecture
  - High-speed operation localized to input and output stages
- High-speed components design and verification
  - Monte-Carlo simulations (>5σ)
  - 26 PVT corners
  - Iterative design with legging and sizing for sensitive transistors
- Asynchronous structure
  - Supports any slow down
  - Minimal time separation between successive bits must be provided!

ASYNC07 25 Summary
- High speed Serial Link requires special circuits:
  - Fast serializers and de-serializers
  - Wave-pipelined control
  - Splitter architecture: long word transmission, power reduction
  - On-the-fly LEDR encoding
  - Adaptive control for fast asynchronous signals handling
  - Low crosstalk interconnect layout
- Single FO4 inverter delay data cycle support (15ps on 65nm process, 67 Gbps)
- The Serial Link is preferred over the Parallel Link thanks to:
  - Reduced interconnect and active area
  - Easier routing, less coupling
  - Reduced power for long on-chip interconnects

ASYNC07 26 The End Thank you