Institut des Nanotechnologies de Lyon UMR CNRS 5270 ICECS 2010 – Athens, Greece
Logic cells and interconnect strategies for nanoscale reconfigurable computing fabrics
I. O'Connor, K. Jabeur, D. Navarro, N. Yakymets (Lyon Institute of Nanotechnology, Lyon, France)
P.E. Gaillardon, M.H. Ben Jamaa, F. Clermidy (CEA-LETI-MINATEC, Grenoble, France)
nano.grain
Outline
Some technology fabric considerations
Logic cells
– Reduced-complexity dynamic standard cells
– Reconfigurable logic cells and design methods
Interconnect strategies
– Matrix topologies
– Island-style architecture
– Metrics and comparisons
Conclusions
Explaining the jargon
Nanoscale computing fabric (nanoFabric):
– nanoFabric: an array of connected nanoscale logic blocks (nanoBlocks)
– nanoBlock: a circuit block containing programmable devices to compute Boolean logic functions, and means to route data
Usually a hybrid approach (silicon die, or CMOS-compatible):
– bottom-up structure: chemical self-assembly for dense and regular arrangement of elements
– top-down structure: conventional process options for interconnect or for computation
– and memory …
Double-gate ambipolarity
In DG-CNTFETs, the I_d-V_g characteristic demonstrates ambipolarity (Y.-M. Lin et al., IEEE Trans. Nanotechnology, 4(5), 2005):
– V_bg > 0V: device behaves as an n-type FET
– V_bg < 0V: device behaves as a p-type FET
– V_bg floating / 0V: device is in the off state
Verilog-A model developed (IMS)
Enables:
– Reduced-complexity logic cells
– Ultra fine-grain reconfigurable logic cells
[Table: columns G, PG, state; states listed are on (n), off (n), on (p), off (p); G = X with PG = 0 gives off (n/p)]
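The polarity rules above can be sketched as an idealized switch model. This is a minimal illustration, not the Verilog-A model mentioned on the slide: the function name `dg_cntfet_state` is hypothetical, and the assumption that an n-type device conducts for a positive front-gate voltage (and a p-type device for a negative one) is ordinary FET behaviour rather than something stated in the source.

```python
def dg_cntfet_state(v_g: float, v_bg: float) -> str:
    """Idealized state of an ambipolar double-gate CNTFET.

    v_bg (back gate) selects the polarity, as on the slide:
      v_bg > 0 -> n-type, v_bg < 0 -> p-type, v_bg == 0 -> off.
    Assumption (not from the source): the n-type device conducts when
    v_g > 0 and the p-type device conducts when v_g < 0.
    """
    if v_bg > 0:
        return "on (n)" if v_g > 0 else "off (n)"
    if v_bg < 0:
        return "on (p)" if v_g < 0 else "off (p)"
    # Back gate at 0 V: the device is off whatever the front gate does.
    return "off (n/p)"


if __name__ == "__main__":
    for v_bg in (+1.0, -1.0, 0.0):
        for v_g in (+1.0, -1.0):
            print(f"V_g={v_g:+.0f}V V_bg={v_bg:+.0f}V -> {dg_cntfet_state(v_g, v_bg)}")
```

A floating back gate is not modelled here; only the 0V case is covered.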
Hybrid technology
– L. Ding et al., "Selective Growth of Well-Aligned Semiconducting Single-Walled Carbon Nanotubes", Nano Lett., 9(2), 800 (2009)
– D. Akinwande et al., "Monolithic integration of CMOS VLSI and carbon nanotubes for hybrid nanotechnology applications", IEEE Trans. Nanotechnology, 7(5), 636 (2008)
Dynamic logic standard cells
Use the extra gate (PG) to reduce complexity: the function path includes the EV phase
Transistor count (n inputs):
– 2n (static logic)
– n+2 (conventional DL)
– n+1 (this work)
Clocking strategy:
– Double clock (DCK)
– Multiple clock (MCK)
– Single clock (SCK)
Layout flipping: rich set of operators
[Figure: generic cell with inputs In 1 … In n, function path, PC and EV, output Out; example cell with inputs A and B, back-gate inputs V_bA and V_bB, output Y, clocks PC and EV, supplies V_dd and gnd]
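The three transistor-count formulas can be compared directly for a given number of inputs. The sketch below only evaluates the expressions quoted on the slide; the function name and dictionary keys are illustrative.

```python
def transistor_counts(n: int) -> dict:
    """Transistor count of an n-input cell for each logic style on the slide."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "static": 2 * n,            # 2n (static logic)
        "conventional_dl": n + 2,   # n+2 (conventional dynamic logic)
        "this_work": n + 1,         # n+1 (PG absorbs the EV transistor)
    }


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, transistor_counts(n))
```

For any n the saving over conventional dynamic logic is one transistor per cell, so the relative gain is largest for small cells.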
Clocking strategies and cell variants
DCK (PUN):
– clocks: EV {0,+V}, PC {0,+V}
– Precharge: (PC=+V, EV=0)
– Evaluation: (PC=0, EV=+V)
MCK (PUN):
– clocks: EV+ {0,+V}, EV- {0,-V}, PC {0,+V}
– mixed N- and P- function path: more complex functions
SCK (PDN):
– clock: Clk {0,+V}
– Precharge: (Clk=0)
– Evaluation: (Clk=+V)
– complementary functions
[Figures: one cell schematic per variant, each with inputs In 1 … In n, a function path and output Out; clocked by PC and EV (DCK), by PC, EV+ and EV- (MCK), or by Clk (SCK)]
Comparison (simulation)
Conditions: V_dd = 1V, f_clk = 3GHz, t_r = t_f = 20ps, C_L = 150aF
– av. power: +(0-20)%
– wc. delay: -(30-50)% (no EV transistor, lower branch resistance)
– PDP: -(25-40)%
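Since the power-delay product is power multiplied by delay, a relative change in each combines multiplicatively. The sketch below shows that arithmetic. The input pairs are illustrative values taken from within the quoted ranges, not per-cell results from the simulation, and the function name is hypothetical.

```python
def pdp_change(power_change: float, delay_change: float) -> float:
    """Relative PDP change from relative power and delay changes.

    Changes are fractions: +0.20 means +20%, -0.50 means -50%.
    PDP = power * delay, so the ratios multiply.
    """
    return (1.0 + power_change) * (1.0 + delay_change) - 1.0


if __name__ == "__main__":
    # Illustrative pairs only; the slide gives ranges, not per-cell pairs.
    for dp, dd in ((0.00, -0.30), (0.20, -0.50)):
        print(f"power {dp:+.0%}, delay {dd:+.0%} -> PDP {pdp_change(dp, dd):+.0%}")
```

A 20% power penalty combined with a 50% delay reduction still gives a 40% lower PDP, which is why the delay gain dominates the comparison.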
Reconfigurable logic cell CNT-DR7T
(J. Liu, I. O'Connor, D. Navarro, F. Gaffiot, El. Lett., 43(9), April 2007)
– Boolean data inputs A and B, data output Y: {0,+V}
– four-phase non-overlapping clock signals PC 1, PC 2, EV 1, EV 2: {0,+V}
– ternary configuration inputs V_bgA, V_bgB, V_bgC: {-V,0,+V}
Two stages: internal node C = f(A, B, V_bA, V_bB), output Y = f(C, V_bC)
Device parameters: […] = 1.5nm, I_off = […] A, I_on/I_off = 10^5
[Figure: cell schematic with inputs A, B, back-gate inputs V_bA, V_bB, V_bC, clocks PC 1, EV 1, PC 2, EV 2, internal node C, output Y, supply V_dd; timing diagram of PC 1, EV 1, PC 2, EV 2, C and Y against t]
[Table: columns V_bgA, V_bgB, V_bgC, Y; configuration values drawn from {-V, 0, +V}; listed outputs include A+B, A.B, A, B, 1 and 0, complement bars not recoverable]
Towards complete operator sets
– DRLC-6T: 15 functions
– DRLC-9T: 16 functions
– 1.5X-2X decrease in power consumption
– more functions, fewer transistors, one extra configuration signal
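For a two-input cell there are exactly 2^4 = 16 possible Boolean functions, which is presumably what "complete operator set" refers to here (an assumption; the slide does not define the term). The sketch below enumerates them as truth tables and computes what fraction of that space a cell covers. All names are illustrative.

```python
from itertools import product

# Input combinations in the order (A, B) = 00, 01, 10, 11.
INPUTS = list(product((0, 1), repeat=2))


def all_two_input_functions() -> list:
    """Return all 16 two-input Boolean functions as 4-entry truth tables."""
    return [tuple((code >> i) & 1 for i in range(4)) for code in range(16)]


def coverage(implemented) -> float:
    """Fraction of the 16 possible functions present in `implemented`."""
    return len(set(implemented)) / 16


if __name__ == "__main__":
    functions = all_two_input_functions()
    print(len(functions), "functions over inputs", INPUTS)
    # A cell with 15 distinct functions covers 15/16 of the space.
    print(coverage(functions[:15]), coverage(functions))
```

Under that assumption, 15 functions correspond to 15/16 coverage and 16 functions to the complete set.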
Physical view: clusters of matrices
Directed matrix interconnect topologies
Topologies: Mod_Omega_4d4w, Baseline_4d4w, Flip_4d4w, Banyan_4d4w
[Figure: logic cell with data inputs A and B, logic function f, configuration inputs rc, data output Y (x2)]
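As background for the topology names, the sketch below shows the textbook inter-stage permutations that omega-type and flip-type networks are built on: the perfect shuffle and its inverse. This is the standard definition only. The slide's Mod_Omega variant is a modified form whose exact wiring is not given, so nothing here should be read as the deck's actual netlists; the function names are hypothetical.

```python
def perfect_shuffle(i: int, bits: int) -> int:
    """Rotate the `bits`-bit index i left by one position (omega-style link)."""
    msb = (i >> (bits - 1)) & 1
    return ((i << 1) & ((1 << bits) - 1)) | msb


def inverse_shuffle(i: int, bits: int) -> int:
    """Rotate the `bits`-bit index i right by one position (flip-style link)."""
    lsb = i & 1
    return (i >> 1) | (lsb << (bits - 1))


if __name__ == "__main__":
    # Width 4 uses 2-bit indices; width 8 uses 3-bit indices.
    print([perfect_shuffle(i, 2) for i in range(4)])
    print([perfect_shuffle(i, 3) for i in range(8)])
    print([inverse_shuffle(i, 3) for i in range(8)])
```

At width 4 the shuffle and its inverse coincide, so differences between such topologies only show up in how stages are combined or at larger widths.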
Mapping success rate for matrices
Omega topology can achieve up to 25% more functions
[Plot: % exploitable cases for 0-fault baseline, omega, flip and banyan matrices]
Towards undirected topologies
[Figures: 2x2 matrices of cells f 11, f 12, f 21, f 22 in cross-cap, banyan and systolic-array topologies]
Metrics (Banyan / Systolic array / Cross-cap):
– Max. I/O data width / side: ---+
– Intra-matrix connectivity: - / - / +
– Total wire length: wa+2a(w-1)
– Max. primary I/O path length: wa+2a(w-1)
– Av. mapping success rate (2x2): 61% / 58% / 66%
When to move to island-style?
Application: 1-bit FA
Metrics (Island-style / Cell matrix):
– No. transistors involved in mapping, T
– % mapped matrices in a cluster: 75 / 100
– No. of switches added to connect matrices: 1680
[Figures: island-style and cell-matrix mappings]
Wrap-up
Logic with ambipolar DG-CNTFETs:
– reduced-complexity dynamic-logic standard cells with -(25-40)% PDP
– complete operator set dynamic-logic reconfigurable cells with low transistor count and power consumption
Interconnect strategies:
– directed matrix interconnect topology exploration
– cross-cap topology proposed to relieve latency and data-directivity issues
– matrices within islands allow efficient packing
– routing between islands to be explored …
Ambipolar double-gate FET BDD (A-BDD) for reconfigurable logic cells
Method:
1. Define output functions
2. Combine their BDDs in a common A-BDD
3. Label every edge connecting two different nodes
4. Define implementation rules to implement the A-BDD at the circuit level
5. Pass-transistor logic circuit implementation
[Figure: A-BDD of 2-input reconfigurable cell, with nodes A, B, B, output F, terminals 0 and 1, and edge labels a, b, c, d, e, f, g, h, i, j, k, L; legend: different edge value (reconfigurable), shared edge value (non-reconfigurable)]
PTL reconfigurable cell (RSL-12T):
– 6X decrease in power consumption
– Slightly improved time delay
– Full functionality
– More transistors
– High number of config signals
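The distinction between shared and different edge values can be illustrated on plain truth tables. The sketch below is one possible reading of that legend, not the A-BDD construction itself: for a set of target functions it marks each input combination as "shared" when all functions agree on the output, and "different" when they disagree, so that a configuration signal would be needed. The function name and the AND/OR example are hypothetical.

```python
from itertools import product


def classify_leaf_edges(functions: dict) -> dict:
    """Classify each (a, b) leaf as shared or different across functions.

    functions maps a name to a callable f(a, b) -> 0 or 1.
    Returns (a, b) -> ("shared", value) or ("different", {name: value}).
    """
    result = {}
    for a, b in product((0, 1), repeat=2):
        values = {name: f(a, b) for name, f in functions.items()}
        distinct = set(values.values())
        if len(distinct) == 1:
            # Every target function agrees: no reconfiguration needed here.
            result[(a, b)] = ("shared", distinct.pop())
        else:
            # Functions disagree: this leaf must depend on the configuration.
            result[(a, b)] = ("different", values)
    return result


if __name__ == "__main__":
    funcs = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
    for leaf, label in classify_leaf_edges(funcs).items():
        print(leaf, label)
```

For AND and OR, the leaves (0,0) and (1,1) are shared and only the two mixed-input leaves need a configurable edge.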
Methodology for mapping applications onto matrix-based nanocomputer architectures
Aims:
– mapping applications onto selected architecture
– obtaining diverse solutions with required area, power and delay
– comparing different architectures
Inputs:
– Architecture (logic cell, matrix, cluster): interconnection type, matrix size, cell characteristics
– Application: truth table
Notation: Pi, number of primary inputs; Po, number of primary outputs; n, number of inputs; m, number of outputs
Flow:
– Test: n ≤ Pi, m ≤ Po?
– yes: 1. Mapping matrices using GA (mapping application onto a matrix using GA)
– no: partitioning, 2.1. Partitioning outputs, 2.2. Partitioning inputs / Shannon's expansion
– Test: application is mapped? If no, continue partitioning and mapping
– If yes: configuration for mapping matrices onto clusters (application, cluster architecture)
– Mapping matrices onto clusters (e.g. VPR tool)
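Input partitioning by Shannon's expansion splits a function f on one variable x into two cofactors, f with x = 0 and f with x = 1, each with one input fewer. The sketch below applies this to a truth table. The dictionary representation, the function names and the choice of the 1-bit full-adder sum as an example are assumptions for illustration; the slide does not specify how the methodology stores or splits functions.

```python
from itertools import product


def shannon_cofactors(truth_table: dict, var: int):
    """Split a truth table on input `var`.

    truth_table maps input tuples of 0/1 to an output bit.
    Returns (f0, f1): the cofactors for var = 0 and var = 1,
    each keyed by the input tuple with position `var` removed.
    """
    f0, f1 = {}, {}
    for bits, out in truth_table.items():
        reduced = bits[:var] + bits[var + 1:]
        (f1 if bits[var] else f0)[reduced] = out
    return f0, f1


def recombine(f0: dict, f1: dict, var: int) -> dict:
    """Rebuild f = (not x) * f0 + x * f1 from its two cofactors."""
    table = {}
    for reduced in f0:
        for x, cofactor in ((0, f0), (1, f1)):
            bits = reduced[:var] + (x,) + reduced[var:]
            table[bits] = cofactor[reduced]
    return table


if __name__ == "__main__":
    # Sum output of a 1-bit full adder: a XOR b XOR cin.
    fa_sum = {b: b[0] ^ b[1] ^ b[2] for b in product((0, 1), repeat=3)}
    f0, f1 = shannon_cofactors(fa_sum, var=2)
    print("cin=0:", f0)
    print("cin=1:", f1)
    print("recombined matches:", recombine(f0, f1, 2) == fa_sum)
```

Each cofactor is a two-input function, so a three-input application can be split into pieces that fit two-input cells, at the cost of a selection stage on the expanded variable.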
Methodology evaluation
Normalized area, delay and track count
Flows compared:
– Flow_1: ABC → T-VPack → VPR
– Flow_2: new methodology → VPR
Two versions of a 1-bit full adder:
– Version 1: 12 cells
– Version 2: 10 cells