Published by Natalie Lynch. Modified over 9 years ago.
1
Computer Architectures for DNA Self-Assembled Nanoelectronics
Alvin R. Lebeck
Department of Computer Science, Duke University
Duke Computer Architecture
2
© 2006 A. R. Lebeck Duke Computer Architecture
Acknowledgements
People
–Students: Jaidev Patwardhan, Constantin Pistol, Vijeta Johri, Sung-Ha Park, Nathan Sadler, Niranjan Soundararajan, Ben Burnham, R. Curt Harting
–Chris Dwyer, Daniel J. Sorin, Thomas H. LaBean, Jie Liu, John H. Reif, Hao Yan
–Sean Washburn, Dorothy A. Erie (UNC)
Funding
–Air Force Research Lab
–National Science Foundation (ITR)
–Duke University Office of the Provost
–Equipment from IBM & Intel
3
Current Processor Designs
Large, complex systems (millions to billions of transistors)
Mature technology (CMOS)
Precise control of the entire design and fabrication process
Lithographic process creates smaller and smaller features
–But it has limits: cost of facility, high defect rates, process variation, etc.
[Figure: transistor with source (S), drain (D), and gate on N-doped silicon]
4
The Red Brick Wall
“Eventually, toward the end of the Roadmap or beyond, scaling of MOSFETs (transistors) will become ineffective and/or very costly, and advanced non-CMOS solutions will need to be implemented.”
[International Technology Roadmap for Semiconductors, 2003 Edition, Difficult Challenge #10]
5
The Potential Solution: Self-Assembled Nanoelectronics
Self-assembly
–Molecules self-organize into stable structures (nano)
What nanostructures?
What nanoelectronic devices?
How does self-assembly affect computer system design?
6
Outline
Nanostructures & Components
Circuit Design Issues
Architectural Implications
Proposed Architectures
Defect Tolerance
Conclusion
7
DNA Self-Assembly
Well-defined rules for base pair matching
–Thermodynamics-driven hybridization
Can specify sequence of pairs, forms double helix
–Synthetic DNA
–Engineered nanostructures
–Inexpensive lab equipment
Base pairs: Adenine (A) with Thymine (T), Cytosine (C) with Guanine (G)
Strands → Tiles → Structures
[Figure: DNA tile with sticky end (tag), 20 nm scale] [Seeman ’99, Winfree et al. ’98, Yan, et al. ’03]
8
DNA-based Self-Assembly of Nanoscale Systems
Use synthetic DNA as scaffolding for nanoelectronics
Create circuits (nodes) using aperiodic patterning
–Demonstrated aperiodic patterns with 20 nm pitch [FNANO ’05, Angewandte Chemie ’06, DAC ’06]
9
Nanoelectronic Components
Many choices / challenges
–Good transistor behavior
–Interaction with DNA lattice
Crossed nanotube transistor [Fuhrer et al. ’01]
Demonstrated functionalization of tube ends [Dwyer, et al. ’02]
Other candidates: ring-gated, crossed nanorod, crossed carbon nanotube FETs
[Dwyer, et al. IEEE FNANO ’04]
10
Circuit Design Issues
Goal: construct a computing system using the DNA lattice and nanoelectronic components.
Proposal: use DNA tags (sticky ends) to place nano-components on the lattice.
1. Regularity of DNA lattice
–Easy to replicate simple structures on a moderate scale
2. Complexity of digital circuits
–Large graph with many unique nodes and edges
3. Tolerating defects
–Single-stranded DNA for tags (sticky ends) may have partial matches (must minimize number of unique tags)
–Nanotubes may not work as advertised
11
Balancing Regularity & Complexity
Array of simple objects
Unit cell based on lattice cavity
–Uniform-length nanotubes
–Minimizes # of DNA tags => reduces probability of partial match
–20 nm x 20 nm
Two levels of interconnect
Complex circuits on single lattice (10K FETs)
Envision ~9 µm² node size: ~10,000 FETs + interconnect
How to get billions or more?
[Figure: node cross-section with 20 nm unit cells (A, B), Vdd plane, ground plane, insulating layer, interconnect layers]
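The node-size figures on this slide can be checked with a little arithmetic. The sketch below assumes a square 3 µm x 3 µm node (which gives the ~9 µm² quoted) tiled by 20 nm x 20 nm unit cells; the idea that a FET occupies roughly one cavity is an assumption made only for illustration, not something the slide states.

```python
# Back-of-envelope check of the node geometry, using integer nanometres
# to avoid floating-point rounding.
NODE_SIDE_NM = 3_000   # 3 um node edge, i.e. ~9 um^2 node area
UNIT_CELL_NM = 20      # 20 nm lattice cavity pitch
TARGET_FETS = 10_000   # ~10K FETs per node, from the slide

cells_per_side = NODE_SIDE_NM // UNIT_CELL_NM
unit_cells = cells_per_side ** 2

# Assumption (illustrative only): one FET per lattice cavity.
fraction_used_by_fets = TARGET_FETS / unit_cells

print(cells_per_side, unit_cells, round(fraction_used_by_fets, 2))
```

Under that assumption fewer than half of the cavities hold FETs, which is at least consistent with the slide's "~10,000 FETs + interconnect".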
12
Self-Assembled System
Self-assemble ~10^9 to 10^12 simple nodes (~10K FETs)
Potential: tera- to peta-scale computing
Random graph of small-scale nodes
–There will be defects
–Scaled CMOS may look similar
How do we perform useful computation?
[Figure: nodes joined by self-assembled interconnect; node wire by selective metallization [Yan ’03]]
13
Outline
Nanostructures & Components
Circuit Design Issues
Architectural Implications
Proposed Architectures
Defect Tolerance
Conclusion
14
Implications of Small Nodes
Node: DNA grid FETs
–3 µm x 3 µm node
–Carbon nanotube [Dwyer ’02]
–Ring gated [Skinner ’05]
Small-scale control
–Controlled complexity only within one node
Limited space on each node
–Simple circuits (e.g., full adder)
Limited communication between nodes
–Only 4 neighbors
–No global (long-haul) interconnect
Limited coordination
–Difficult to get many nodes to work together (e.g., 64-bit adder)
15
Implications of Randomness
Self-assemble interconnect of nodes
1. Random node placement
2. Random node orientation
3. Random connectivity
4. High defect rates (assume fail-stop nodes)
Limitations -> architectural challenges
16
Architectural Challenges
Node design
Utilizing multiple nodes
–Each node is very simple
Routing
Execution model
–Must overcome implementation constraints
Instruction set
Micro-scale interface
17
Outline
Nanostructures & Components
Circuit Design Issues
Architectural Implications
Proposed Architectures
–Defect Isolation & Structure
–NANA [JETC ’06]
–SOSA [ASPLOS ’06]
Defect Tolerance
Conclusion
18
Nano-scale Active Network Architecture
Large-scale fabrication (10^12 nodes, 10^9 cells)
Via provides micro-scale interface
Multiple node types
First cut: understand issues
[Figure: a single cell; system view]
19
Defect Isolation/Structure
Grid w/ defects → random graph
Reverse path forwarding (RPF) [Dalal ’78]
Broadcast on all links except input [Nanoarch ’05]
–Forward broadcast if not seen before
–Implement fail-stop nodes [Nanoarch ’06]
RPF maps out defective regions
–No external defect map
–Can tolerate up to 30% defective nodes
Distributed algorithm to create spanning tree
Route packets along tree
–Up*/down*
–Depth first
How do we compute?
[Figure: grid before and after RPF, showing the anchor, defective nodes, live nodes, and root direction]
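The broadcast-and-remember-your-input idea can be sketched in a few lines. This is a simplified, centralized model written for illustration, not the distributed hardware algorithm from the cited papers: the grid coordinates, the function name `build_spanning_tree`, and the use of a FIFO queue to stand in for asynchronous message arrival are all assumptions.

```python
from collections import deque

def build_spanning_tree(width, height, defective, anchor):
    """Flood a broadcast from `anchor` over a 4-neighbor grid.

    Each live node records the neighbor it first heard the broadcast
    from (its parent, i.e. the direction toward the root). Defective
    nodes are fail-stop: they never receive or forward anything.
    """
    parent = {anchor: None}
    frontier = deque([anchor])
    while frontier:
        node = frontier.popleft()
        x, y = node
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = neighbor
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # no link beyond the edge of the grid
            if neighbor in defective or neighbor in parent:
                continue  # fail-stop node, or broadcast already seen
            parent[neighbor] = node
            frontier.append(neighbor)
    return parent

# 3x3 grid with two defective nodes; the tree routes around them.
tree = build_spanning_tree(3, 3, {(1, 0), (1, 1)}, anchor=(0, 0))
print(sorted(tree))
```

Nodes that never appear in the returned mapping are either defective or cut off by defects, which is how the broadcast maps out defective regions without an external defect map.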
20
NANA: Computing on a Random Graph
Perform 3 operations: Add, Add, Multiply
Search along path for correct blocks to perform function
Execution packets carry operation and values
Proof-of-concept simulations
[Figure: packet path from Enter to Exit through nodes labeled X, +, and -]
21
NANA: Execution Model & ISA
Accumulator-based ISA
Carry data and instructions in a “packet”
Use bit-serial processing elements
–Each element operates on one bit at a time
–Minimize inter-bit communication
[Figure: packet format with header, tail, three opcodes (op 1, op 2, op 3), and bit-interleaved operands A0 B0 C0 D0, A1 B1 C1 D1, …, A31 B31 C31 D31]
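Bit interleaving is what lets a bit-serial element see bit i of every operand together. A minimal sketch follows, assuming four 32-bit operands and that index 0 denotes the least significant bit; the slide shows the A0 B0 C0 D0 ordering but does not say which end is bit 0, and the function names are made up for this example.

```python
def interleave_operands(operands, width=32):
    """Return payload bits ordered A0 B0 C0 D0, A1 B1 C1 D1, ...

    Assumption: bit 0 is the least significant bit of each operand.
    """
    return [(value >> bit) & 1 for bit in range(width) for value in operands]

def deinterleave_operands(bits, count, width=32):
    """Inverse of interleave_operands for `count` operands."""
    values = [0] * count
    for index, bit in enumerate(bits):
        values[index % count] |= bit << (index // count)
    return values

payload = interleave_operands([5, 3, 0, 1], width=4)
print(payload)
print(deinterleave_operands(payload, count=4, width=4))
```

With this layout a processing element that handles one bit at a time can consume the stream in order and never has to buffer a whole operand.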
22
NANA: System Overview
Simple programs
–Fibonacci
–String compare
Utilization is low
Divide 10^12 nodes into 10^9 cells
Peak performance potentially higher than IBM Blue Gene and NEC Earth Simulator
Need to use more nodes!
[Figure: log peak performance (bitops/sec)]
23
Self-Assembled System
Self-assemble ~10^9 to 10^12 simple nodes (~10K FETs)
Potential: tera- to peta-scale computing
Random graph of small-scale nodes
–There will be defects
–Scaled CMOS may look similar
How do we perform useful computation?
Group many nodes into a SIMD PE
PEs connected in a logical ring
Familiar data parallel programming
[Figure: nodes joined by self-assembled interconnect; node wire by selective metallization [Yan ’03]; PEs and control processor]
24
Self-Organizing SIMD Architecture (SOSA)
Nodes grouped to form SIMD processing element (PE)
–Head, tail, N computation nodes (k-wide bit-slice of PE)
Configuration: depth-first traversal of spanning tree
–Orders nodes within PE (Head → LSB → … → MSB → Tail)
–Orders PEs
Many SIMD PEs on logical ring → familiar data parallel programming abstraction
[Figure: spanning tree over nodes 1 to 12 showing the via, tree edges, and PE boundaries]
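The configuration step can be pictured as a depth-first walk that numbers the nodes, followed by cutting that sequence into fixed-size groups. The sketch below is only an illustration: the `children` mapping, both function names, and the choice to leave leftover nodes unused are assumptions, and the real configuration runs as a distributed protocol rather than as a central loop.

```python
def depth_first_order(children, root):
    """Visit the spanning tree in depth-first (preorder) order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order

def group_into_pes(order, compute_nodes_per_pe):
    """Cut the traversal order into PEs of head + N compute nodes + tail.

    Assumption: nodes left over after the last complete PE stay unused.
    """
    size = compute_nodes_per_pe + 2
    pes = []
    for start in range(0, len(order) - size + 1, size):
        chunk = order[start:start + size]
        pes.append({"head": chunk[0], "bits": chunk[1:-1], "tail": chunk[-1]})
    return pes

tree = {1: [2, 3], 2: [4], 3: [5, 6]}
order = depth_first_order(tree, root=1)
print(order)
print(group_into_pes(order, compute_nodes_per_pe=1))
```

Because consecutive PEs come from consecutive stretches of the same traversal, the PE order falls out of the node order, which is what lets the PEs be treated as a logical ring.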
25
SOSA: Instruction Broadcast
Instructions broadcast to all nodes
Instructions decomposed into three “microinstructions” (opcode, registers, synch)
Can reach nodes/PEs at different times (5 before 9)
[Figure: spanning tree over nodes 1 to 12 with the entry point and PE boundaries]
26
SOSA: Instruction Execution
Instructions execute asynchronously within/across PEs
XOR: parallel within PE, vs. addition: serial within PE
ISA: three register operands, predication, optimizations; see paper for details…
[Figure: spanning tree over nodes 1 to 12 with the entry point and PE boundaries]
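The XOR-versus-addition contrast comes from the carry chain. A small model of one bit-sliced PE makes this concrete; it assumes 2-bit slices and 16 compute nodes per 32-bit PE (the figures given in the evaluation setup later in the talk), and the helper names are invented for this sketch.

```python
SLICE_BITS = 2                       # bits held by each compute node
NODES_PER_PE = 16                    # 16 nodes x 2 bits = one 32-bit PE
SLICE_MASK = (1 << SLICE_BITS) - 1

def split(value):
    """Distribute a 32-bit value across the nodes, LSB slice first."""
    return [(value >> (SLICE_BITS * i)) & SLICE_MASK for i in range(NODES_PER_PE)]

def join(slices):
    """Reassemble the per-node slices into one integer."""
    return sum(s << (SLICE_BITS * i) for i, s in enumerate(slices))

def pe_xor(a, b):
    # Every node works on its own slice; no node waits for another.
    return join([x ^ y for x, y in zip(split(a), split(b))])

def pe_add(a, b):
    # The carry ripples from the LSB node to the MSB node, so each
    # node must wait for its lower neighbor before it can finish.
    carry, out = 0, []
    for x, y in zip(split(a), split(b)):
        total = x + y + carry
        out.append(total & SLICE_MASK)
        carry = total >> SLICE_BITS
    return join(out)  # carry out of the MSB node is dropped (32-bit wrap)

print(hex(pe_xor(0xF0F0F0F0, 0x0FF00FF0)), pe_add(0xFFFFFFFF, 1))
```

In `pe_xor` the list comprehension has no dependence between iterations, while the loop in `pe_add` threads `carry` through every node in order, which is the serialization the slide refers to.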
27
Two System Configurations
One large system
–Latency
Space sharing: multiple “cells”
–Throughput
28
Outline
Nanostructures & Components
Circuit Design Issues
Architectural Implications
Proposed Architectures
–SOSA [ASPLOS ’06]
–Node Design
–Evaluation
Defect Tolerance
Conclusion
29
SOSA Node
Homogeneous nodes
–Specialized during configuration
Asynchronous logic
Communication
–4 transceivers (4-phase handshake)
–3 virtual channels (inst bcast, ring left & right)
Computation
–ALU
–Register (32 bits: 32x1 or 16x2)
–Inst buffer
Configuration
–Route setup
Subcomponent BIST [Nanoarch ’06]
[Figure: node block diagram with transceivers 0 to 3, virtual channels VC0 to VC2 with input and output buffers, instruction buffer, register file, ALU, control logic, and route setup logic]
30
SOSA Node
VHDL
–~10K FETs
Area ~= 9 µm²
–Custom layout tools for standard cells
Power ~= 6.5 W/cm²
–Semi-empirical SPICE model [IEEE Nano ’04]
–1 ns switching time
–88% devices active
–0.775 µW / node
Modern proc > 75 W/cm²
[Figure: node layout showing transceivers 0 to 3, configuration logic, and compute logic]
31
Evaluation Methodology
Custom event simulator
–Conservative 1 ns time quantum (switching time)
–2 bits per node (16 registers, 16 + 2 for 32-bit PE)
Nine benchmarks
–Integer code only – no hardware support for floating point
–Matrix multiplication, image filters (Gaussian, generic, median), encryption (TEA, XTEA), sort, search, bin-packing
Compare performance to four other architectures
–Pentium 4 (P4) (real hardware)
–Ideal out-of-order superscalar (I-SS): 10 GHz, 128-wide, 8K ROB
–Ideal chip multiprocessor (I-CMP): 16-way ideal
–Ideal SOSA (I-SOSA): no communication overhead, unit inst latency
–Extrapolate for large SOSA systems (back validate)
32
Matrix Multiply (Execution Time)
Hand optimizations (loop unrolling, etc.)
Better scalability than other systems (crossover < 1000)
Still room for improvement
33
TEA Encryption (Throughput)

Architecture                   Encryptions/sec
P4 @ 3 GHz (100 mm²)           3.9 M/sec
I-SS                           73.62 M/sec
16-CMP                         1180 M/sec
SOSA (1 cell ~ 0.019 mm²)      0.175 M/sec
I-SOSA (1 cell)                27.7 M/sec
SOSA (5400 cells, 100 mm²)     940 M/sec
I-SOSA (5400 cells)            72300 M/sec

TEA
–Used in XBOX
–Shift, add and xor
–64-bit data blocks
–128-bit key
Pipelined on 64 PEs
Configure multiple cells of 64 PEs
Single cell: poor
5400 cells: 200X better than P4 in same area
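For readers unfamiliar with the benchmark, the sketch below shows the standard TEA round structure: 64-bit blocks handled as two 32-bit halves, a 128-bit key as four 32-bit words, and only shifts, adds, and XORs. It is a plain reference-style Python rendering for illustration, not the SOSA assembly used in the talk; the 32-round count and the delta constant are the usual published TEA parameters.

```python
MASK = 0xFFFFFFFF      # keep arithmetic in 32 bits
DELTA = 0x9E3779B9     # standard TEA key-schedule constant
ROUNDS = 32

def _mix(v, total, ka, kb):
    # One Feistel half-round built from shift, add, and xor only.
    return ((v << 4) + ka) ^ (v + total) ^ ((v >> 5) + kb)

def tea_encrypt(block, key):
    """Encrypt one 64-bit block given as two 32-bit halves."""
    v0, v1 = block
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + _mix(v1, total, key[0], key[1])) & MASK
        v1 = (v1 + _mix(v0, total, key[2], key[3])) & MASK
    return v0, v1

def tea_decrypt(block, key):
    """Invert tea_encrypt by running the rounds backwards."""
    v0, v1 = block
    total = (DELTA * ROUNDS) & MASK
    for _ in range(ROUNDS):
        v1 = (v1 - _mix(v0, total, key[2], key[3])) & MASK
        v0 = (v0 - _mix(v1, total, key[0], key[1])) & MASK
        total = (total - DELTA) & MASK
    return v0, v1

key = (0xA56BABCD, 0x0000FFFF, 0xFFFF0000, 0xABCDEF01)
cipher = tea_encrypt((0x01234567, 0x89ABCDEF), key)
print(tea_decrypt(cipher, key) == (0x01234567, 0x89ABCDEF))
```

Each round depends on the previous one, but independent blocks do not depend on each other, which is why the workload pipelines across PEs and scales with the number of cells.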
34
Outline
Nanostructures & Components
Circuit Design Issues
Architectural Implications
Proposed Architectures
Defect Tolerance (not transient faults)
Conclusion
35
Defect Tolerance
Simple fail-stop model
Encryption gracefully degrades
MXM: < 10% degradation up to 20% defective nodes
[Figure: panels for matrix multiply and encryption]
36
Node Failure Modes [Nanoarch ’06]
Failure modes defined over the three node components (transceiver, compute logic, configuration)
–Simple
–Compute-centric
–Communication-centric
–Hybrid – any two components
Exploit modular node design
–VHDL BIST for communication & configuration (all stuck-at faults)
–Assume software test for compute logic
Configuration logic is critical
37
Evaluation
Simple node model in C
Model network with 10,000 nodes
Vary transistor defect probability from 0% to 0.1%
–Map defective transistors to defective components
Average 500 runs per data point
How much do we benefit from node modularity?
–What device defect probability can it handle?
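The mapping from transistor defects to component defects can be approximated analytically before running any simulation. The sketch below is not the C model from the talk: it assumes independent transistor defects, a hypothetical split of the ~10K FETs across the three components, and one reading of the failure modes (configuration logic always required; "hybrid" additionally needs at least one of compute or transceivers). All of those are labeled assumptions.

```python
# Hypothetical transistor counts per component (sum to ~10K FETs).
COMPONENT_SIZES = {"configuration": 2_000, "compute": 4_000, "transceivers": 4_000}

def component_ok(p_defect, n_transistors):
    """Probability that a component has no defective transistor,
    assuming independent defects with probability p_defect each."""
    return (1.0 - p_defect) ** n_transistors

def usable_fraction(p_defect, mode, sizes=COMPONENT_SIZES):
    """Expected fraction of usable nodes under an assumed failure mode."""
    cfg = component_ok(p_defect, sizes["configuration"])
    comp = component_ok(p_defect, sizes["compute"])
    comm = component_ok(p_defect, sizes["transceivers"])
    if mode == "simple":          # every component must work
        return cfg * comp * comm
    if mode == "compute":         # configuration + compute logic
        return cfg * comp
    if mode == "communication":   # configuration + transceivers
        return cfg * comm
    if mode == "hybrid":          # configuration + at least one other
        return cfg * (1.0 - (1.0 - comp) * (1.0 - comm))
    raise ValueError(f"unknown mode: {mode}")

for mode in ("simple", "compute", "communication", "hybrid"):
    print(mode, round(usable_fraction(1e-4, mode), 3))
```

Under these assumptions the simple mode keeps roughly (1 - p)^10000 of the nodes, while the modular modes keep more because a defect in one component no longer disables the whole node; the actual curves in the talk come from the simulation, not from this estimate.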
38
Results: Usable Nodes
Hybrid failure mode can tolerate a higher device failure probability
–Three orders of magnitude greater than typical CMOS designs (10^-4 vs. 10^-7)
39
Results: Reachable Nodes
Hybrid increases the number of reachable nodes
–More nodes with functioning compute logic reachable and usable
40
Fail-Stop Summary
Test logic detects defects in node components
Modular node design enables partial node operation
Node is useful if
–It can compute, OR
–It can improve system connectivity
Hybrid failure mode increases available nodes
–Can help tolerate a device failure probability of 1.5 x 10^-4 (1000 times greater than typical CMOS designs)
41
SOSA Summary
Distributed algorithm for structure & defect tolerance
–No external defect map
Configuration groups nodes into SIMD PEs
High utilization w/ familiar programming model
Ability to reconfigure
–One system for latency-critical systems
–Multiple cells for throughput systems
Limitations: I/O bandwidth, general-purpose codes, FP, transient faults
42
Conclusion
Future limits on traditional CMOS scaling
–Multicore, etc. -> tera/peta scale w/ 1M nodes
Defects, cost of fabrication, process variation, etc.
High performance, low power despite randomness and defects
Engineered DNA nanostructures + nanoelectronics = computers of tomorrow
43
Duke Nanosystems Overview
DNA Self-Assembly [FNANO 2005, Ang. Chemie 2006, DAC 2006]
Nano Devices: electronic, optical, etc. [Nanoletters 2006]
Large Scale Interconnection [NANONETS 2006]
Circuit Architecture [FNANO 2004]
Logical Structure & Defect Isolation [NANOARCH 2005]
SOSA - Data Parallel Architecture [NANOARCH 2006, ASPLOS 2006]
NANA - General Purpose Architecture [JETC 2006]
44
Generic Filter (Execution Time)
3x3 generic filter (Gaussian & median similar)
45
Circuit Architecture
Unit cell based on lattice cavity
–Place uniform-length nanoelectronic devices
–Reduces probability of partial matches
–Two layers of interconnect
Achieve balance between
–Regularity of DNA lattice
–Complexity required for circuits
–Defect tolerance
[Figure: node as a DNA lattice with CNFETs: 20 nm unit cells, carbon nanotubes, metal nanoparticles, Vdd plane, ground plane, insulating layer, interconnect layers]
46
Fail-Stop Transceivers
Minimize test overhead
–Reuse node hardware during test
Hardware test
–Send ‘0’ and ‘1’ in a loop
–If data returns, enable component
–If data does not return, component remains disabled
Similar principle for configuration logic
Modular design enables graceful degradation
[Figure: transmit logic, output buffer, input buffer, receive logic, and test logic joined by a test loopback path; TEST_OK=0]
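The loopback test amounts to "stay disabled unless both a 0 and a 1 make it around the loop". A behavioral sketch follows; the `Transceiver` class, the `path` callable that stands in for the transmit, buffer, and receive chain, and the use of `None` for data that never returns are all assumptions made for illustration.

```python
class Transceiver:
    """Behavioral model of a fail-stop transceiver with a loopback test."""

    def __init__(self, path):
        # `path` is a hypothetical stand-in for transmit logic, buffers,
        # loopback wire, and receive logic: it maps a sent bit to the
        # received bit, or to None if nothing comes back.
        self.path = path
        self.enabled = False  # fail-stop: disabled until the test passes

    def self_test(self):
        # Send both '0' and '1'; any stuck-at fault on the path breaks
        # at least one of the two checks.
        self.enabled = all(self.path(bit) == bit for bit in (0, 1))
        return self.enabled

healthy = Transceiver(lambda bit: bit)
stuck_at_zero = Transceiver(lambda bit: 0)
dead = Transceiver(lambda bit: None)
print(healthy.self_test(), stuck_at_zero.self_test(), dead.self_test())
```

Defaulting `enabled` to false is the important design choice: a component whose test logic or data path is broken simply never turns on, so its neighbors see a missing link rather than corrupted data.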