A Carbon Nanotube Transistor based RISC-V Processor using Pass Transistor Logic
Aporva Amarnath, Siying Feng, Subhankar Pal, Tutu Ajayi, Austin Rovinski, Ronald G Dreslinski
University of Michigan, Ann Arbor
Introduction
- End of Dennard scaling
  - Performance stagnation
- Discovery of new technologies, to continue performance scaling while maintaining power density
  - Spintronics
  - Tunnel FETs
  - Quantum computing
  - Carbon-nanotube field-effect transistor (CNTFET): more promising
- CNTFETs
  - Great strides in manufacturability: no longer spaghetti on the wall
  - Better device scaling and yield
  - High current-carrying capacity
  - High carrier velocity
  - Exceptional electrostatics
  - p-type and n-type devices
Image sources:
  Spintronics: https://static.electronicsweekly.com
  CNTFET: M. Shulaker et al., Computing with Carbon Nanotubes, IEEE Spectrum (2016)
  Tunnel FETs: https://masumnawaz.wordpress.com/emerging-nano-devices
  Quantum computing: https://media.licdn.com
CNTFETs
- Similar to Si-FETs
- Use CNTs as channel medium instead of Si
Figure: CNT growth on quartz; deposit gold on CNTs and peel with tape; transfer CNTs to wafer

CNTFET fabrication breakthroughs
- Stanford: Chemical Vapor Deposition (CVD) method
  - Grow CNTs on quartz and then transfer onto a wafer
  - Short all metallic CNTs (m-CNTs)
  - 100 CNTs/µm
- IBM: Purify CNTs
  - Attract suspended CNTs into adhesive-filled trenches
  - 20 CNTs/µm
- University of Wisconsin-Madison: Floating Evaporative Self-Assembly (FESA) method
  - Suspend on water to align and deposit on substrate
  - 50 CNTs/µm

References:
  Stanford method: M. Shulaker et al., "Carbon nanotube computer". Nature (2013)
  IBM method: H. Park et al., "High-density integration of carbon nanotubes via chemical self-assembly". Nat. Nanotechnol. (2012)
  University of Wisconsin-Madison method: Y. Joo et al., "Dose-Controlled, Floating Evaporative Self-assembly and Alignment of Semiconducting Carbon Nanotubes from Organic Solvents". Langmuir (2014)
Overview
- Design parameters
- CNTFET characterization
- Methodology
- Evaluation
- Conclusion
Design Parameters
CNTFET Design Parameters
- CNT pitch (s)
- CNTFET width (W)
- Number of CNTs (NCNT) = W / s
Figure: CNTFET showing the design parameters, width (W) and pitch (s). Source: M. Shulaker et al., Computing with Carbon Nanotubes, IEEE Spectrum (2016)
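The relation NCNT = W / s can be checked against the values quoted in the experimental setup later in the deck (40 nm pitch, 200 nm minimum width, 5 CNTs per FET). The sketch below is only an illustration: the function name `num_cnts` and the choice to round down to a whole number of tubes are assumptions, not something the slides specify.

```python
def num_cnts(width_nm: float, pitch_nm: float) -> int:
    """Number of CNTs under the gate, NCNT = W / s.

    Rounding down to whole tubes is an assumption made for this sketch.
    """
    if width_nm <= 0 or pitch_nm <= 0:
        raise ValueError("width and pitch must be positive")
    return int(width_nm // pitch_nm)


# Values from the experimental setup slide: W = 200 nm, s = 40 nm.
print(num_cnts(200, 40))  # 5
```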
Effect of Pitch
- Effect of pitch on delay: s ↑ => NCNT ↓ => Delay ↑
- Effect of pitch on energy: s ↑ => NCNT ↓ => Delay ↑ but Power ↓, so energy remains almost the same
Effect of Width
- Effect of width on delay: W ↑ => NCNT ↑ (more drive), but output load ↑ as well => Delay remains the same
- Effect of width on energy: W ↑ => NCNT ↑ => Power ↑ => Energy ↑
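The pitch and width trends can be reproduced with a first-order toy model. This is a sketch under stated assumptions, not the VS-CNFET model used in the study: drive current is taken as proportional to NCNT = W / s, the load capacitance as proportional to W only (the driven gates have the same width), delay as C·V/I, and switching energy as C·V². All constants are arbitrary units and the function name `toy_fo4` is hypothetical.

```python
def toy_fo4(width_nm, pitch_nm, vdd=0.4, i_per_cnt=1.0, c_per_nm=1.0):
    """First-order toy model of a loaded inverter (arbitrary units).

    Assumptions (not from the slides): current scales with the number of
    CNTs, load capacitance scales with width only.
    """
    n_cnt = width_nm / pitch_nm          # NCNT = W / s
    drive = i_per_cnt * n_cnt            # on-current grows with NCNT
    load = c_per_nm * width_nm           # load grows with width
    delay = load * vdd / drive           # t ~ C * V / I
    energy = load * vdd ** 2             # E ~ C * V^2
    power = energy / delay               # P = E / t
    return delay, energy, power


base = toy_fo4(200, 40)
wider_pitch = toy_fo4(200, 80)   # s doubled: delay up, power down, energy flat
wider_fet = toy_fo4(400, 40)     # W doubled: delay flat, power and energy up
for name, (d, e, p) in [("base", base), ("2x pitch", wider_pitch), ("2x width", wider_fet)]:
    print(f"{name:9s} delay={d:.1f} energy={e:.1f} power={p:.1f}")
```

In this model the pitch cancels out of the energy and the width cancels out of the delay, which matches the qualitative behavior on the two slides; real devices deviate from it, which is why the slide says energy stays "almost" the same.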
CNTFET Characterization
FO4 Inverter Characterization
- Effect of voltage on delay
  - CNTFET < Si-CMOS at low voltage
  - CNTFETs suffer from high contact resistance
- Effect of voltage on energy
  - CNTFET < Si-CMOS at all voltages
- Effect of voltage on energy-delay product (EDP)
  - CNTFET < Si-CMOS at all voltages
- Low energy and relatively faster devices at 0.4 V
- Only 1.8x EDP improvement, but theoretical models promise 5-10x EDP improvements
- Hence, we need better design techniques
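For reference, the EDP comparisons in this deck are ratios of energy × delay between a baseline and a candidate design. A minimal sketch follows; the function names and the numbers in the usage example are hypothetical placeholders, not measurements from this work.

```python
def edp(energy, delay):
    """Energy-delay product; units follow whatever the inputs use."""
    return energy * delay


def edp_improvement(base_energy, base_delay, new_energy, new_delay):
    """How many times lower the candidate's EDP is than the baseline's."""
    return edp(base_energy, base_delay) / edp(new_energy, new_delay)


# Hypothetical numbers (fJ, ps): the candidate uses half the energy but is
# 25% slower, so the EDP gain is smaller than the 2x energy gain.
ratio = edp_improvement(1.0, 10.0, 0.5, 12.5)
print(f"{ratio:.2f}x")  # 1.60x
```

The example shows why a large energy advantage does not translate one-to-one into EDP: any delay penalty, such as the one caused by contact resistance, eats into the ratio.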
Pass-Transistor Logic using CNTFETs
- Si-FETs with PTL
  - Rapid Vthreshold drop across each additional PTL gate
- CNTFETs with PTL
  - Very low threshold voltage
  - Low power dissipation
  - Equal-strength PFETs and NFETs
  - Still require restoring logic
- Restoring logic for cascaded full adders
  - More frequent restoring logic at higher voltages due to larger contact resistance
  - The number of restoring buffers reduces by 6x at 0.4 V
Figure: cascaded full adders (FA) with inputs A0-A3 and B0-B3, carry-in Cin and carry-out Cout
Plot: maximum number of full adder stages vs. operating voltage (V)
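Periodic restoring-logic insertion can be sketched as follows: given the maximum number of cascaded PTL full-adder stages a signal can traverse before it must be restored, place a buffer after every such run. The helper name `restoring_buffer_positions` and the stage limits in the example are hypothetical; the actual per-voltage limits come from the plot on this slide and are not reproduced here.

```python
def restoring_buffer_positions(n_stages, max_stages):
    """Stage indices after which a restoring buffer is inserted.

    A buffer is placed after every `max_stages` cascaded stages, except at
    the very end of the chain (assumed to be restored by the next block).
    """
    if n_stages <= 0 or max_stages <= 0:
        raise ValueError("stage counts must be positive")
    return list(range(max_stages, n_stages, max_stages))


# Hypothetical limits for a 32-stage carry chain.
print(restoring_buffer_positions(32, 4))       # [4, 8, 12, 16, 20, 24, 28]
print(len(restoring_buffer_positions(32, 8)))  # 3
```

A larger allowed run between buffers directly reduces the buffer count, which is the effect the slide reports at 0.4 V.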
Methodology
Experimental Setup
- Operating voltage
  - Vthreshold ~ 0.35 V
  - 0.4-0.7 V voltage study
- Transistor model
  - Stanford's VS-CNFET model
- Tools used
  - HSpice: circuit characterization and simulation
  - Design Compiler: netlist synthesis
- Implementation
  - 16 nm technology
  - Standard cell libraries
    - Complementary CNTFET gate library (CCNT)
    - Pass transistor logic CNTFET library (PTL-CNT)
  - Restoring logic for PTL-CNT designs
- Design parameters

  Parameter              Value
  Pitch (s)              40 nm
  Minimum width CNTFET   200 nm
  Number of CNTs/FET     5
RISC-V V-scale Processor
- 32-bit, single-issue, in-order, 3-stage pipeline
- Based on the RISC-V ISA
- Critical components
  - ALU: implemented using a 32-bit adder
    - Ripple Carry Adder
    - Kogge-Stone Adder
    - Hybrid sparse-tree adders
  - Multiplier: 32-bit, 2-stage array-based pipelined multiplier
- Implementations
  - Si-CMOS
  - CCNT
  - Hybrid (CCNT + PTL-CNT)
Figure: V-scale pipeline block diagram (PC Gen., IF, DE, EX, MEM, MUL, WB; I-Bus and D-Bus request/response)
Evaluation
Evaluation
- Evaluation components
  - Full Adder
  - 32-bit Adder and ALU
  - Multiplier
  - Registers
  - Full Pipeline
- Evaluation metrics
  - Delay
  - Energy
  - Energy-delay product (EDP)
32-bit Adder and ALU
- Ripple Carry Adder
  - With periodic restoring buffer insertion
  - Achieves a 1.5-9.8x EDP improvement over Si-CMOS
- Kogge-Stone Adder
  - Custom restoring buffer insertion due to varying load
  - Achieves a ~2x EDP improvement over Si-CMOS
- V-scale ALU (sparse-tree adder)
  - Custom for non-RCA chains and periodic for RCA components
  - Achieves a 0.9-3.5x EDP improvement over Si-CMOS
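As a reminder of why the Kogge-Stone adder sees varying load, it computes carries with a parallel-prefix tree in which each level combines generate/propagate signals that are 1, 2, 4, ... bits apart. The following is a bit-level behavioral model only, with a hypothetical function name; it says nothing about the transistor-level PTL implementation or the buffer placement used in this work.

```python
def kogge_stone_add(a, b, width=32):
    """Behavioral Kogge-Stone adder: returns (a + b) mod 2**width."""
    mask = (1 << width) - 1
    g = a & b & mask            # per-bit generate
    p = (a ^ b) & mask          # per-bit propagate
    half_sum = p                # kept for the final sum
    dist = 1
    while dist < width:
        # Combine each group with the group `dist` bits below it.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
    carries = (g << 1) & mask   # carry into bit i is the carry out of bit i-1
    return half_sum ^ carries


print(hex(kogge_stone_add(0xFFFFFFFF, 1)))  # 0x0
print(kogge_stone_add(1234, 5678))          # 6912
```

A 32-bit adder needs five prefix levels (distances 1, 2, 4, 8, 16), and wires at later levels span more bits, which is consistent with the slide's note that buffer insertion has to be customized per node rather than placed periodically.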
Multiplier
- 32-bit, 2-stage pipelined array-based multiplier
- Carry-save adders with periodic restoring logic insertion
- 0.9-1.6x EDP reduction over Si-CMOS
- Full pipeline has a hybrid multiplier
Figure: Improvement of PTL-CNT and CCNT over silicon for the multiplier
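The carry-save structure of an array multiplier can be illustrated with a short behavioral model: each row adds one partial product into a running (sum, carry) pair without propagating carries, and a single carry-propagate addition finishes the result. Function names are hypothetical, and the model ignores the 2-stage pipelining and the restoring logic of the actual design.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def array_multiply(a, b, width=32):
    """Unsigned width x width multiply using carry-save accumulation."""
    mask = (1 << (2 * width)) - 1
    s, c = 0, 0
    for i in range(width):
        partial = (a << i) if (b >> i) & 1 else 0   # one row of the array
        s, c = carry_save_add(s, c, partial)
        s &= mask
        c &= mask
    return (s + c) & mask                            # final carry-propagate add


print(array_multiply(6, 7))  # 42
```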
Full Pipeline
- Critical path = ALU + parts of the multiplier
- CCNT achieves a 1.0-2.9x improvement over the Si-CMOS pipeline implementation
- Hybrid design includes PTL-CNT components for the critical path
- 1.9-5.0x reduction in EDP for the hybrid implementation
  - Greater than the ALU and multiplier improvements, due to lower leakage energy than Si-CMOS
  - Both dynamic and idle energy are considered here
Figure: Improvement of PTL-CNT and CCNT over silicon for the multiplier
Conclusions
- Theoretical CNTFET models promise a 5-10x EDP improvement over Si-FET-based designs, but current models based on physical experiments provide only ~2x improvement.
- We need circuit and architectural overhauls, along with further fabrication improvements, to suit CNTFETs while building larger blocks and systems.
- This work shows a 5x improvement in EDP of a PTL-based CNTFET V-scale core over a Si-CMOS-based design, bringing us one step closer to the full potential of CNTFETs.
Backup slides
Full Adder
- 20-transistor PTL-based full adder
  - 2 transistors on critical path
  - Decoupled Sum and Cout
- Comparison to CCNT and Si-CMOS
  - Very low energy at all voltage ranges
  - Least delay at 0.4 V
  - 7-19x reduction in EDP from 0.7 to 0.4 V
Fig: Pass transistor-based full adder
Fig: Comparison of PTL-CNT full adder over CCNT and Si-CMOS
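At the logic level, "decoupled Sum and Cout" means that neither output is derived from the other. The behavioral sketch below computes each output directly from the three inputs and then chains the cell into a ripple-carry adder; the function names are hypothetical, and this is a functional model rather than the 20-transistor PTL circuit.

```python
def full_adder(a, b, cin):
    """One-bit full adder with Sum and Cout computed independently."""
    s = a ^ b ^ cin                           # Sum: parity of the inputs
    cout = (a & b) | (a & cin) | (b & cin)    # Cout: majority of the inputs
    return s, cout


def ripple_carry_add(a, b, width=32):
    """Chain `width` full adders; returns (sum mod 2**width, carry out)."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(5, 7))           # (12, 0)
print(ripple_carry_add(0xFFFFFFFF, 1))  # (0, 1)
```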
Registers
- D-flip flops made of inverters and transmission gates
- Implementation
  - CCNT
  - Si-CMOS
- Si-CMOS has better delay than CCNT
- CCNT D-flip flops consume much lower energy
- Achieve a 0.9-1.8x improvement in EDP over the 0.7-0.4 V range
Fig: Comparison of CCNT D-flip flop over Si-CMOS