ERD Architecture Benchmarking: The NRI MIND Activity Ralph K. Cavin, III, Kerry Bernstein & Jeff Welser July 12, 2009 San Francisco, CA.

Presentation transcript:

Goals of the NRI/MIND Benchmarking Project
- Develop circuit/subsystem-level examples of the applications of novel devices.
- Evaluate the circuits/subsystems in the energy-time-space context versus CMOS implementations.
- Determine the most promising applications for emerging devices, with an emphasis on integration with CMOS.

Architectural innovations haven't been the major driver for system performance
Analysis of high-performance architectures and the technologies they were built in, examining device vs. architecture contributions to throughput:
- Predominant influence on SPEC2000 is from device technology
- Modest contributions from architecture

Four Architectural Projections
1) CMOS is not going away anytime soon. Charge (the state variable) and the MOSFET (the fundamental switch) will remain the preferred HPC solution until new switches appear as the long-term replacement solution in years.
2) Hardware accelerators execute selected functions faster than software performing them on the CPU. Accelerators are responsible for substantial improvements in throughput.
3) Alternative switches often exhibit emergent, idiosyncratic behavior. We should exploit them. Certain physical behaviors may emulate selected HPC instruction sequences. Some operations may be superior to digital solutions.
4) New switches may improve high-utilization accelerators. The shorter-term supplemental solution (5-15 years) improves or replaces accelerators "built in CMOS and designed for CMOS", either on-chip, on-3D-stack, or on-planar.

Matching Logic Functions & New Switch Behaviors
New Switch Ideas: Single Spin; Spin Domain; Tunnel-FETs; NEMS; MQCA; Molecular; Bio-inspired; CMOL; Excitonics; ?
Popular Accelerators: Encrypt / Decrypt; Compress / Decompress; Regular Expression Scan; Discrete Cosine Transform; Bit Serial Operations; H.264 Standard Filtering; DSP, A/D, D/A; Viterbi Algorithms; Image, Graphics

Example: Cryptography Hardware Acceleration
- Operations required: Rotate, Byte Alignment, EXORs, Multiply, Table Lookup
- Circuits used in accelerator: Transmission Gates ("T-Gates")
- New switch opportunity (example): A number of new switches (e.g., T-FETs) don't have thermionic barriers, so they won't suffer from the CMOS pass-gate V_T drop, body effect, or source-follower delay.
- Potential opportunity: Replace 4 T-Gate MOSFETs with 1 low-power switch.

Examples of Benchmarking Work in Progress
- Magnetic Tunnel Junction one-bit adder
- Magnetic Logic for one-bit adder
- Magnetic Ring Logic Devices
Many other devices are being evaluated in a variety of circuit configurations.

Background - MTJ
- Researchers have been investigating post-CMOS devices for many years.
- In the short term, people are looking for switches that supplement CMOS, are CMOS-compatible, and support ultra-low-power operation.
- The MTJ (Magnetic Tunnel Junction) is one of the strongest candidates, and it is available in practice rather than only in theory.
  - Excellent for memory and storage. STT-RAM using MTJs is a strong candidate for universal memory.
  - For logic design, good or not? Any memory device can also be used to build logic circuits, in theory at least, and the MTJ is no exception.
- The discovery of spin torque transfer (STT) makes the MTJ scalable and completely CMOS-compatible.

MTJ-based DyCML 1-Bit Full Adder
- The MTJ is used as both a memory cell and a functional input.
- MTJ switching is performed by STT using the control signals WL and BL.
- The circuit is actually a combined CMOS-MTJ version of DyCML, so it is more reasonable to compare it with CMOS-based DyCML to see the MTJ's impact.

Results
[Figure: ED curve for a 65 nm process; curves shown: DyCML-MTJ, SCMOS, DyCML-CMOS]

Nanomagnet Logic (NML)
PIs: Gary Bernstein (1), X. Sharon Hu (2), Michael Niemier (2), Wolfgang Porod (1)
Student Researchers: M. Tanvir Alam (1), Michael Crocker (2), Aaron Dingler (2), Steve Kurtz (2), Shawn Liu (2), M. Jafar Siddiq (1), Edit Varga (1)
Affiliations: (1) Department of Electrical Engineering, (2) Department of Computer Science and Engineering

Comparison to CMOS
- Hard to compare a magnet to a transistor
  - Need to make the technology comparison at the functional-unit level; consider initial projections here
- Natural comparison = low-power CMOS systems, sub-threshold, etc.
- Base performance projections on an adder design.
[Figure: adder with inputs A, B, C, outputs Sum and C out, and elements labeled M1, M2, M3]
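The slide does not say how the adder is wired. One common way to build a one-bit full adder from three three-input majority gates is sketched below. Treating M1, M2, and M3 as majority gates, and the specific wiring used here, are assumptions made for illustration and are not stated in the transcript.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote over bits (0 or 1)."""
    return int(a + b + c >= 2)


def full_adder_majority(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder built from three majority gates plus inverters.

    The M1/M2/M3 labels are borrowed from the slide's figure; mapping them
    to these particular gates is a hypothetical choice for this sketch.
    """
    cout = majority(a, b, cin)            # "M1": carry out
    m2 = majority(a, b, 1 - cin)          # "M2": uses inverted carry in
    total = majority(1 - cout, m2, cin)   # "M3": sum bit
    return total, cout


def verify_full_adder() -> bool:
    """Check all eight input combinations against integer addition."""
    for a, b, cin in product((0, 1), repeat=3):
        total, cout = full_adder_majority(a, b, cin)
        if a + b + cin != total + 2 * cout:
            return False
    return True
```

Calling `verify_full_adder()` exhaustively checks the construction against ordinary binary addition.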

Trends
[Figure: Energy (pJ), Delay (ns), and EDP (pJ ns) versus V &  r, with curves for CMOS, NML NP, and NML P]
- Because of sensitivity to sub-threshold slope, threshold voltage, etc., energy and delay can vary significantly from technology to technology. These are the best data points for CMOS (0.3 V - 1 V).
- With  r = 1, can still see ~15X performance gain due to higher throughput. If the supply voltage is raised to match delay, ~7X energy savings.
- With  r = 5, ~17X (NP) and ~158X (P) energy savings with better performance.
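The energy-delay product (EDP) on this slide is the energy per operation multiplied by the delay. The sketch below shows how an energy-savings factor and an EDP ratio could be computed for two technologies. The function names and all numeric values are hypothetical placeholders, not data from the slides.

```python
def edp(energy_pj: float, delay_ns: float) -> float:
    """Energy-delay product in pJ*ns."""
    return energy_pj * delay_ns


def improvement_factor(baseline: float, candidate: float) -> float:
    """How many times smaller the candidate value is than the baseline."""
    if candidate <= 0:
        raise ValueError("candidate value must be positive")
    return baseline / candidate


# Hypothetical placeholder numbers, chosen only to make the arithmetic easy.
cmos = {"energy_pj": 4.0, "delay_ns": 20.0}
nml = {"energy_pj": 0.5, "delay_ns": 10.0}

energy_savings = improvement_factor(cmos["energy_pj"], nml["energy_pj"])
edp_ratio = improvement_factor(edp(**cmos), edp(**nml))
```

Because EDP multiplies the two quantities, a technology can show an energy saving at matched delay and an even larger EDP advantage when it is also faster.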

Magnetic Ring Logic Devices - Benchmarks/Metrics
Caroline Ross - MIT
- These devices work by the movement of domain walls around thin-film rings with the general structure hard layer/spacer/soft layer, e.g. Co/Cu/NiFe or Co/MgO/NiFe.
- Rings can have several remanent states with different resistances. This is useful for multibit memory.
- However, digital logic uses two levels, so in these examples some of the complexity available in ring devices is wasted.
- NAND/NOR configurations are being analyzed.

Prototype Magnetic Ring Device Performance
- Device area
  - Existing prototype: 1 µm²
  - Projection: improve x 100?
- Switching speed
  - Existing prototype: 5 ns
  - Projection: proportional to 1/device length (improve x 10?) and domain wall velocity (improve x 10?)
- Switching energy
  - Existing prototype: J (10^7 kT)
  - Projection: proportional to switching speed (improve x 100??), to device cross-section area (improve x 10-20?), and to critical current for wall motion (improve x ?)
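The switching energy above is quoted in multiples of the thermal energy kT. As a rough guide, the sketch below converts such a multiple to joules. The temperature of 300 K is an assumption (the slide does not state one), and the function name is hypothetical.

```python
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant, exact SI value


def kt_multiples_to_joules(multiples: float, temperature_k: float = 300.0) -> float:
    """Convert an energy given in units of kT to joules.

    temperature_k defaults to 300 K, an assumed room temperature.
    """
    return multiples * BOLTZMANN_J_PER_K * temperature_k


# 10^7 kT at the assumed 300 K, on the order of 4e-14 J
prototype_energy_j = kt_multiples_to_joules(1e7)
```

At the assumed 300 K, kT is about 4.14e-21 J, so 10^7 kT is on the order of 4e-14 J.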

Summary
- The Nanoelectronics Research Initiative benchmarking project should be nearing completion by mid-August 2009.
- The ERA section plans to provide a summary of findings for 2009.