EE222 Winter 2013 Steve Kang Lecture 5 Interconnects and Clock Signaling Open systems interconnect (
Wire geometry parameters: t, w, L, H; U = supply voltage scaling
power-delay product
Average noise power (derivation on the next slide)
Derivation of Optimal Buffer Insertion
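The slide's own derivation is not preserved in this transcript. As a stand-in, here is a minimal sketch of the standard closed-form result under an assumed Bakoglu-style RC delay model, which may differ from the model used in the lecture. The symbols `r0`, `c0` (minimum-size repeater output resistance and input capacitance), `rw`, `cw` (total wire resistance and capacitance), `k` (number of segments) and `h` (repeater size multiple) are illustrative names, and the numeric values are hypothetical.

```python
import math

def wire_delay(k, h, r0, c0, rw, cw):
    """Delay of a wire cut into k segments, each driven by a repeater
    h times the minimum size.

    Assumed model: a repeater has output resistance r0/h and input
    capacitance h*c0; a segment has resistance rw/k and capacitance cw/k.
    The 0.7 and 0.4 factors are the usual 50%-delay coefficients for
    lumped and distributed RC, respectively.
    """
    return k * (0.7 * (r0 / h) * (cw / k + h * c0)
                + (rw / k) * (0.4 * cw / k + 0.7 * h * c0))

def optimal_repeaters(r0, c0, rw, cw):
    """Minimizers obtained by setting dT/dk = 0 and dT/dh = 0."""
    k_opt = math.sqrt(0.4 * rw * cw / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * cw / (rw * c0))
    return k_opt, h_opt

# Hypothetical example values (not from the lecture).
R0, C0 = 10e3, 1e-15   # minimum inverter: 10 kOhm, 1 fF
RW, CW = 1e3, 1e-12    # whole wire: 1 kOhm, 1 pF

k_opt, h_opt = optimal_repeaters(R0, C0, RW, CW)
t_opt = wire_delay(k_opt, h_opt, R0, C0, RW, CW)
t_unbuffered = wire_delay(1, 1, R0, C0, RW, CW)
print(f"k_opt = {k_opt:.2f}, h_opt = {h_opt:.1f}")
print(f"delay: {t_unbuffered:.3e} s unbuffered, {t_opt:.3e} s with repeaters")
```

In practice `k` must be an integer, so one would evaluate `wire_delay` at the two integers around `k_opt` and keep the smaller result.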
Level Buffering
Twisted wire idea
E = equalizing switch, P = precharging switch
Capacitively Driven Wires [Figure labels: R, Z]
GMS: Generic Memristive Structure for Non-Volatile FPGAs. Pierre-Emmanuel Gaillardon, Davide Sacchetto, Shashikanth Bobba, Yusuf Leblebici, Giovanni De Micheli (EPFL-LSI, Lausanne, Switzerland). International VLSI-SoC, Oct. 9, 2012
The FPGA Organization [Figure: array of CLBs and SBs; a CLB with N BLEs and I inputs; a BLE with a K-input LUT and a DFF clocked by Clk; labels: Memories (Routing), Memories (Logic), Routing resources; signals: inN, inS, inE, inW, outW]
FPGA: Where to play? (Lin et al., 2007) [Figure: charts of area, delay, and power split between Logic Block (LB) and Routing Resources (RR), and between logic, memory, and interconnects + buffers + MUXs; values shown: 14%, 8%, 43%, 35%; 20%/80%; 40%/60%; annotations: routing costs, memory area, area reduction, delay reduction]
GMS-based Configuration Node: Programming Scheme
GMS-based programming
–All nodes must be addressed uniquely
–Uses an architecture inspired by standalone memories
GMS-based Routing Multiplexer: Structure
Multi-stage multiplexers are based on pass-gates
–Memristors = non-volatile switches
–Replacement of all the pass-gates
–Non-volatile routing MUX
–Performance improvement
Multi-stage multiplexer paths are complementarily selected
–GMS-based operation
GMS-based MUX: Electrical Characterization
Memristors demonstrate low R_ON values
–As low as 20 Ω
–45 nm CMOS R_ON is around 4 kΩ (minimum-size n-type transistor from NANGATE)
GMS-based MUX introduces memristors in the datapath
Simulation setup: Pt/TiO2/Pt memristor, HSPICE simulator, PTM 45 nm CMOS model
GMS-based Configuration Node: Performance Results
Compared to baseline Flash technology:
–Compact solution (structure): reduction of the memory FE footprint to only 1 transistor
–Faster writing operation (memristor technology)
–Lower leakage power (memristor technology)
Table columns: Cell; Elements; Area [F²]; Write time [ns]; Prog. energy [pJ]; Leakage at 1 V [nW]
Table rows (cell, elements): SRAM, 5T; Flash cell, 2T; ReRAM, 1T2R
ReRAM material stacks: (Pt/TiO2/Pt), (Cu/TiO2/Pt)
Flash vs. ReRAM: -x 3x 16.6x 8.3x 0.2 – x 105
FPGA Final View [Figure: the FPGA organization (CLBs, SBs, BLEs, LUT, DFF, routing and logic memories, routing resources) with legend marking GMS-based configuration nodes and GMS-based MUXs]
FPGA Architecture: Results
Toolflow: ABC – T-VPACK – VPR, MCNC benchmarks
Area reduction up to 8%
–Slight reduction due to the programming circuits
Delay reduction up to 73%
–Low on-resistance in data paths: 100 Ω vs. 4 kΩ (Pt/TiO2/Pt)
CLB leakage power reduction to 10%
–No leakage current in the MUXs
Further improvements are expected
Red indicates inverted data; p = 1 if inverted
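The polarity bit p on this slide matches the well-known bus-invert coding idea; the slide's exact scheme is not recoverable from the transcript, so the following is a minimal sketch under stated assumptions: the bus starts at all zeros, a word is inverted when more than half of the data lines would otherwise toggle, and transitions on the p line itself are not counted. The function names are illustrative.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, p) pairs for a `width`-bit data bus.

    p = 1 means the word was sent inverted. The previous bus state is
    assumed to start at 0 (an assumption, not from the slide).
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for w in words:
        w &= mask
        # Data lines that would toggle if w were sent unmodified.
        toggles = bin(prev ^ w).count("1")
        if 2 * toggles > width:
            sent, p = (~w) & mask, 1   # inverting toggles fewer lines
        else:
            sent, p = w, 0
        encoded.append((sent, p))
        prev = sent
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, p) pairs."""
    mask = (1 << width) - 1
    return [(sent ^ mask) if p else sent for sent, p in encoded]

words = [0x00, 0xFF, 0x0F]
encoded = bus_invert_encode(words, 8)
decoded = bus_invert_decode(encoded, 8)
print(encoded, decoded)
```

With this rule, at most half of the data lines toggle on any transfer, at the cost of one extra wire for p.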
Additional References for Energy Saving
C. W. Kim and S. M. Kang, “A low-swing clock edge-triggered flip-flop,” IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp , Nov
–Lowers power consumption through a reduced clock voltage swing while improving throughput by producing outputs on both edges of the clock signal.
K. W. Kim, K. H. Baek, N. Shanbhag, C. L. Liu, and S. M. Kang, “Coupling-driven signal encoding scheme for low-power interface design,” IEEE International Conf. on Computer-Aided Design, San Jose, CA, Nov. 5-9, 2000, pp
–To avoid the Miller effect, signal encoding is used to reduce transitions in interface circuits.
Intel NoC-based 80 Core Programmable Processor
Clock: Block diagram of a Phase-Locked Loop (PLL) [Figure blocks: phase-frequency detector, charge pump, loop filter]
Harmonic and Relaxation Oscillator
Voltage Controlled Oscillator
All-Digital PLL [Figure blocks: time-to-digital converter, digital loop filter, delay cell]