A New Design Approach for High-Throughput Arithmetic Circuits for Single-Flux-Quantum Microprocessors Masamitsu Tanaka, Nagoya Univ., JSPS Co-workers:

Presentation transcript:

A New Design Approach for High-Throughput Arithmetic Circuits for Single-Flux-Quantum Microprocessors
Masamitsu Tanaka, Nagoya Univ., JSPS
Co-workers: Y. Yamanashi (2), Y. Kamiya (1), A. Akimoto (2), N. Irie (1), H. Park (2), A. Fujimaki (1), N. Yoshikawa (2), H. Terai (3), S. Yorozu (4)
(1) Nagoya Univ., (2) Yokohama National Univ., (3) NICT, (4) ISTEC-SRL
Acknowledgment: This work was supported by the NEDO through ISTEC as Collaborative Research and Superconductors Network Device Project.

Introduction
Single-flux-quantum (SFQ) logic circuits use impulse-shaped voltage pulses as signals. Ultra-high throughput is achieved in applications with a unidirectional data flow. A crossbar switch has been demonstrated at up to 50 Gbps/ch.*
[Figure: chip photograph of the crossbar switch, 1 mm scale bar.]
* Y. Kameda et al., IEEE Trans. Appl. Supercond., vol. 15, issue 1, pp. 6-10, 2005.

Problem in SFQ Circuits with Loop Paths
Microprocessors have very complex interconnects, including loops of data in the datapath. The loops spoil the high-throughput nature of SFQ logic.
[Figure: chip photograph of CORE1β v6 (see 3EY01) with the ALU marked, 1 mm scale bar, and a diagram of a typical bit-serial adder. CORE1β v6: 10,927 JJs, 3.3 mW, 4-stage pipeline, 1500 MOPS.]

Purpose of This Study
A simple approach is to reduce the number of junctions in the loops by optimizing the wiring and the physical pin alignment of the logic gates. However, the number of junctions that can be removed is limited. We present a new design approach for high-throughput computation in complex SFQ arithmetic circuits.

Conventional Implementation
In the conventional implementation of a sequential logic circuit, a feedback loop is required. The state is stored in latches with destructive readout.
[Figure: block diagram with labels: inputs, decoder, output, state, feedback loop to update the state, latches (destructive readout gates).]
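To make the feedback loop concrete, here is a minimal behavioral sketch in Python, assuming the sequential circuit is a bit-serial adder (the slide itself is generic). The function and helper names are hypothetical, and this models logic only, not SFQ timing. The point to notice is that the decoder needs the current state (the carry) as one of its inputs, so the next bit cannot be processed until the previous state update has travelled around the loop.

```python
def to_bits(value, width):
    """Return the bits of value, least significant bit first (hypothetical helper)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits (hypothetical helper)."""
    return sum(bit << i for i, bit in enumerate(bits))


def conventional_bit_serial_add(x_bits, y_bits):
    """Add two LSB-first bit sequences using a carry feedback loop."""
    carry = 0  # state held in the latch
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        # The decoder takes the inputs AND the current state.
        sum_bits.append(x ^ y ^ carry)
        # The new state depends on the old state: this is the feedback loop.
        carry = (x & y) | (carry & (x ^ y))
    return sum_bits, carry


bits, carry_out = conventional_bit_serial_add(to_bits(5, 4), to_bits(6, 4))
print(from_bits(bits), carry_out)
```

In this model the loop shows up as the data dependency `carry -> carry` inside the decoder expression.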

Our Approach Based on State Transitions
In our design approach, we use nondestructive readout gates such as NDROs to store the state. Calculations are first decoded into state transitions. The circuit has no loops and can be fully pipelined.
[Figure: block diagram with labels: inputs, pre-decoder, decoded transitions, state, SFQ NDROs (nondestructive readout), post-decoder, output.]

Implementation of Bit-Serial Adder
We select the carry signal as the internal state. According to the inputs X and Y, the state transition kills (k), propagates (p), or generates (g) the carry. The NDRO is controlled by the conditions k and g. (The nondestructive readout operation corresponds to p.)
[Figure label: store the carry as internal state.]

X  Y  | Carry to the Next Bit
0  0  | killed (k)
0  1  | propagated (p)
1  0  | propagated (p)
1  1  | generated (g)
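The k/p/g scheme can be sketched behaviorally as follows. This is a minimal Python model under stated assumptions, not the fabricated circuit: the class `CarryCell` stands in for the NDRO (reset on k, set on g, left untouched on p), the names `pre_decode`, `bit_serial_add`, `to_bits`, and `from_bits` are hypothetical, and pulse timing is ignored. The pre-decoder looks only at X and Y, never at the stored carry.

```python
from enum import Enum


class Transition(Enum):
    KILL = "k"
    PROPAGATE = "p"
    GENERATE = "g"


def pre_decode(x, y):
    """Decode the inputs into a state transition; the stored carry is not consulted."""
    if x and y:
        return Transition.GENERATE
    if x or y:
        return Transition.PROPAGATE
    return Transition.KILL


class CarryCell:
    """Assumed stand-in for the NDRO: holds the carry and can be read without erasing it."""

    def __init__(self):
        self.carry = 0

    def read(self):
        return self.carry  # nondestructive readout

    def apply(self, transition):
        if transition is Transition.KILL:
            self.carry = 0  # reset
        elif transition is Transition.GENERATE:
            self.carry = 1  # set
        # PROPAGATE: leave the stored carry as it is


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(x_bits, y_bits):
    """Add two LSB-first bit sequences using decoded carry transitions."""
    cell = CarryCell()
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        transition = pre_decode(x, y)
        carry_in = cell.read()   # carry from the previous bit
        cell.apply(transition)   # carry to the next bit
        sum_bits.append((x ^ y) ^ carry_in)  # post-decoder
    return sum_bits, cell.read()


bits, carry_out = bit_serial_add(to_bits(9, 4), to_bits(7, 4))
print(from_bits(bits), carry_out)
```

The only place the state is touched is `CarryCell`, so the decoding stages before and after it form a straight pipeline in this model.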

Benefits of New Approach
We achieve high throughput. The loop path of the carry signal is eliminated. The carry can easily be controlled externally by inserting confluence buffers just before the NDRO, without decreasing the throughput. This scheme is applicable to bit-slice adders. Calculate p and g in the same way as in carry-lookahead adders*, where i:k-1 is the lower-order group and k:j is the higher-order group:
p_{i:j} = p_{i:k-1} · p_{k:j},   g_{i:j} = g_{k:j} + p_{k:j} · g_{i:k-1}.
Finally, k is obtained from p and g (k = p NOR g).
* P. Bunyk et al., IEEE Trans. Appl. Supercond., vol. 9, no. 2, pp. 3714-20, 1999.
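For the bit-slice case, the group signals can be sketched as below. This is a hedged Python illustration of the standard carry-lookahead recurrence, assuming that per-bit p is X XOR Y and per-bit g is X AND Y, and that groups are combined from the low-order end; the function names and the 4-bit default width are assumptions for illustration only.

```python
from functools import reduce


def bit_pg(x, y):
    """Per-bit propagate and generate (assumed definitions: p = x XOR y, g = x AND y)."""
    return (x ^ y, x & y)


def combine(low, high):
    """Merge a lower-order group with the adjacent higher-order group."""
    p_low, g_low = low
    p_high, g_high = high
    # The merged group propagates only if both halves propagate.
    # It generates if the high half generates, or the high half
    # propagates a carry generated in the low half.
    return (p_low & p_high, g_high | (p_high & g_low))


def slice_transition(x, y, width=4):
    """Return (k, p, g) for one slice of the operands x and y."""
    pairs = [bit_pg((x >> i) & 1, (y >> i) & 1) for i in range(width)]
    p, g = reduce(combine, pairs)  # pairs are ordered LSB first
    k = 1 - (p | g)                # k = p NOR g
    return k, p, g


print(slice_transition(0b1111, 0b0001))
```

Exactly one of k, p, g is 1 for any slice, so the result can drive the same reset/hold/set control as in the bit-serial case; the carry out of the slice is `g | (p & carry_in)`.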

Demonstration of Bit-Serial Adder
Fabricated using the NEC 2.5 kA/cm² Nb standard process II.
[Figure: chip photograph with labels: designed adder, shift registers, ladder oscillator.]

Experimental Results
We confirmed correct operation at up to 36 GHz. The throughput will not decrease even if we add circuitry to control the carry externally. The limit of the conventional approach is ~20 GHz.
[Figure: operating region plotted as dc bias current [%] (normalized by the designed value) versus frequency [GHz]; max. 36 GHz.]

Summary
We have proposed a new design approach for high-throughput SFQ arithmetic circuits without loops. We use NDROs to store the internal state. By translating the calculations into transitions of the state, we can eliminate the loops and achieve high throughput. We have implemented a bit-serial adder using the new approach and demonstrated its operation at up to 36 GHz. High throughput is also expected even if we add functions such as a controller for carry propagation.

Thank You!

Implementation of 4-Bit-Slice Adder
[Figure: block diagram with labels: decode inputs (calculate p, g, k), update the state, output.]

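Implementation of 1-bit ALU with a Buffer
[Figure: 1-bit ALU cell with terminals Cin, Din, Cout, B, Dout; scale markers 80 µm and 80 µm.]
[Table, cell values partly lost in extraction. Columns: Op., Din, Cin, State Trans. Operations and their state transitions: AND: reset; OR: set; XOR: invert; ADD: (invert x2).]
[Figure: test result of the nondestructive resettable/settable TFF.]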
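One possible reading of the operation table can be sketched in Python. The input conditions below are assumptions, since the Din and Cin cell values did not survive in the transcript: AND resets the stored bit when Din is 0, OR sets it when Din is 1, XOR inverts it when Din is 1, and ADD inverts it once for Din and once for Cin. The class `SettableTFF` and the function `apply_op` are hypothetical names, and only the stored bit is modeled; the carry output of ADD is not.

```python
class SettableTFF:
    """Assumed behavioral model of a resettable/settable toggle flip-flop."""

    def __init__(self, state=0):
        self.state = state

    def reset(self):
        self.state = 0

    def set(self):
        self.state = 1

    def invert(self):
        self.state ^= 1


def apply_op(op, stored, din, cin=0):
    """Apply one ALU operation as a state transition and return the new stored bit."""
    cell = SettableTFF(stored)
    if op == "AND":
        if din == 0:
            cell.reset()   # stored AND 0 is 0; otherwise hold
    elif op == "OR":
        if din == 1:
            cell.set()     # stored OR 1 is 1; otherwise hold
    elif op == "XOR":
        if din == 1:
            cell.invert()  # stored XOR 1 flips the bit
    elif op == "ADD":
        if din == 1:
            cell.invert()  # first possible invert (Din)
        if cin == 1:
            cell.invert()  # second possible invert (Cin)
    else:
        raise ValueError(f"unknown operation: {op}")
    return cell.state


print(apply_op("XOR", 1, 1), apply_op("ADD", 0, 1, 1))
```

Under these assumptions each operation reduces to the expected Boolean result on the stored bit, for example `apply_op("AND", b, d) == b & d` for all bit values.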