A Reconfigurable Low-Power High-Performance Matrix Multiplier Architecture With Borrow Parallel Counters
Rong Lin, SUNY at Geneseo

Main topics of the presentation:
1. Overview of the Reconfigurable Matrix Multiplier Architecture and the Circuit-Level Reconfiguration
2. Overview of the Implementation Circuits: Borrow Parallel Counters

1. The Reconfigurable Matrix Multiplier Architecture And The Circuit-Level Reconfiguration

The Partial Product Decomposition-Based Arithmetic Architecture
(a) The 4x4 partial product matrix; (b) addition of the partial product bits; (c, d) multiplication of two 8-bit numbers using four 4x4 multipliers

Multiplying two 8-b numbers with four 4 x 4 multipliers
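The decomposition named on this slide can be sketched in software. This is a minimal behavioral sketch, not the circuit from the presentation: `mul4x4` and `mul8x8` are hypothetical names, and `mul4x4` simply stands in for one 4 x 4 hardware multiplier.

```python
def mul4x4(a, b):
    """Stand-in for one 4 x 4 multiplier: two 4-bit inputs, one 8-bit product."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8x8(x, y):
    """Multiply two 8-bit numbers using four 4 x 4 products."""
    assert 0 <= x < 256 and 0 <= y < 256
    # Split each operand into a high and a low 4-bit half.
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # Four partial products, one per 4 x 4 multiplier.
    ll = mul4x4(xl, yl)
    lh = mul4x4(xl, yh)
    hl = mul4x4(xh, yl)
    hh = mul4x4(xh, yh)
    # Each partial product is weighted by the position of its operand halves.
    return ll + ((lh + hl) << 4) + (hh << 8)
```

The identity used is x * y = xl*yl + (xl*yh + xh*yl) * 2^4 + xh*yh * 2^8, which holds for any pair of 8-bit operands.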

Pipelined multiplication of X (2x2) and Y (2x2), producing Z (2x2)
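For reference, the arithmetic behind the 2x2 product can be written as a plain multiply-accumulate loop. This sketch only shows which element products contribute to each entry of Z; it does not model the pipeline timing of the architecture, and the function name `matmul` is an assumption.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            # Each output element accumulates n element products.
            for k in range(n):
                acc += X[i][k] * Y[k][j]
            Z[i][j] = acc
    return Z
```

For two 2x2 matrices this performs 8 element multiplications, two per output element.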

The size-8 base-4 reconfigurable matrix multiplier architecture

The size-16 base-4 reconfigurable matrix multiplier architecture

The square recursive partial product bit matrix decomposition
The mapping between the partial product matrix and the 64 8x8 multipliers
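The square recursive decomposition can be illustrated by splitting both operands in half repeatedly until the base multiplier size is reached. The sketch below is an assumption-laden software model (the name `decompose_mul` and the counter argument are hypothetical); it only confirms that a 64 x 64-bit product decomposes into 64 products of 8 x 8 bits.

```python
def decompose_mul(x, y, bits, base=8, counter=None):
    """Multiply two `bits`-wide numbers using only `base`-wide multiplications.

    `counter`, if given, is a one-element list that counts base multipliers used.
    """
    if bits <= base:
        if counter is not None:
            counter[0] += 1
        return x * y  # stands in for one base-size hardware multiplier
    half = bits // 2
    mask = (1 << half) - 1
    xh, xl = x >> half, x & mask
    yh, yl = y >> half, y & mask
    # Four quadrants of the partial product bit matrix.
    ll = decompose_mul(xl, yl, half, base, counter)
    lh = decompose_mul(xl, yh, half, base, counter)
    hl = decompose_mul(xh, yl, half, base, counter)
    hh = decompose_mul(xh, yh, half, base, counter)
    return ll + ((lh + hl) << half) + (hh << (2 * half))
```

Going from 64 bits to 8 bits takes three levels of splitting, and each level multiplies the number of sub-products by four, which gives 4^3 = 64 base multipliers.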

The reconfiguration switch states
The input duplication network

Two of the four matrix products, which are produced in parallel, each with 16 output elements

3. The Implementation Circuits: Borrow Parallel Counters

The building block circuits: borrow parallel counters
The 5_1 borrow parallel counter

About the large parallel counter 5_1:
It receives 5 binary input bits, one of them weighted 2 (called the borrow bit) and the others weighted 1.
It produces 2 output bits (and 3 in-stage carry-in and carry-out bits), so that the weighted sums of all input bits and all output bits are equal.
It is a CMOS pass-transistor circuit processing 4-b 1-hot encoded signals, each representing an integer value ranging from 0 to 3.
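Two ideas from this slide can be made concrete: the weighted sum of the five inputs, and the 4-b 1-hot encoding of a value from 0 to 3. The sketch below is illustrative only. All function names are hypothetical, and the line ordering of the 1-hot code (line i is hot when the value is i) is an assumption, not something stated in the slides.

```python
def weighted_input_sum(borrow, others):
    """Weighted sum of the 5 inputs: the borrow bit counts 2, the other four count 1."""
    assert borrow in (0, 1)
    assert len(others) == 4 and all(b in (0, 1) for b in others)
    return 2 * borrow + sum(others)


def one_hot4(value):
    """Encode an integer 0..3 on four lines, exactly one of which is hot."""
    assert 0 <= value <= 3
    return [1 if i == value else 0 for i in range(4)]


def decode_one_hot4(lines):
    """Recover the integer from a valid 4-b 1-hot signal."""
    assert len(lines) == 4 and sum(lines) == 1
    return lines.index(1)


def toggled_lines(old, new):
    """Number of lines that change when the encoded value goes from old to new."""
    return sum(a != b for a, b in zip(one_hot4(old), one_hot4(new)))
```

With this encoding the input sum ranges from 0 to 6, and any change of an encoded value toggles exactly two lines, regardless of how far apart the old and new values are.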

(1) Low switching activity
(2) Fewer hot lines (data paths)
(3) Low transistor count (78; equivalent to 3.3 FA’s; 23 per FA)
(4) A very compact layout due to good transistor distribution and regularity (it processes four data paths of the same structure; binary logic does not have this advantage). Layout simulation: Cadence Analog Affirma tools with the Spectre simulator, 0.18 µm models.

The borrow bit:
(1) Simplifies the logic and reduces the number of transistors
(2) Reduces the number of cascaded pass transistors (no more than 4, including 1 within the input inverter)
(3) Rearranges and balances input bits for small multipliers (see Topic 2)

No type conversion needed: a major improvement over the previous work. The embedded full adder adds two 4-b 1-hot encoded bits (s0, column j+1; s1, column j) and 1 binary bit (q, column j-1) directly, since they have the same weight.

Typical simulation data
Note: the comparison uses the best (3,2) counter to the best of our knowledge. It is meaningful to compare speeds in application.

The 6 x 6-b borrow parallel multiplier
An array of borrow parallel counters (virtually eliminating all area needed for inter-counter connections)
Input: two 6-b numbers. Output: two numbers, p10 - p0 and q10 - q5 (note: the first half (6 bits) is a single number). The output is CSA style because the block serves as an intermediate block.
Inherits all advantages of borrow parallel counters
Delay = a single counter delay
Height = a single counter height
A unique property: extra compact, with near-zero area for inter-counter connections
The height of the block is very small
Ovals with the same color form an embedded FA (or HA or a binary bit): (3,2): 3 ovals; (2,2): 2 ovals; single bit: 1 oval
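A CSA-style (carry-save) output means the result is left as two numbers whose sum is the true value, so that a later stage performs the final carry-propagating addition. The sketch below shows the generic bitwise (3,2) carry-save step on integers; it is a textbook illustration, not a model of the borrow parallel counter circuit, and the name `carry_save_add` is an assumption.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to two whose sum is unchanged.

    Every bit column is handled independently, as by a row of full adders.
    """
    # Sum bit of each column: parity of the three input bits.
    s = a ^ b ^ c
    # Carry bit of each column: majority of the three bits, moved one column left.
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry
```

No carry ripples across columns inside this step, which is why an intermediate block can hand on a two-number result without paying for a full adder chain.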

The 8 x 8 multiplier with 10 borrow counters.

Concluding Remarks
1. A reconfigurable matrix multiplier architecture has been presented:
(1) The processor can be reconfigured at run time to trade bitwidth for matrix size.
(2) It can be efficiently reconfigured to compute the product of matrices X (4x4) and Y (4x4) for typical graphics and image applications.
(3) Hardware equivalent to one 64 x 64-bit high-precision multiplier can provide four computation options.
(4) The common irregularity is minimized.
(5) The overall logic scheme and wiring structures are simplified.

Concluding Remarks (cont’d)
2. New arithmetic circuits for implementation, which achieve low power and high performance through a novel logic approach, including:
(1) 4-b 1-hot data paths dominate (lower switching activity in each logic stage)
(2) Fewer hot lines generated in the logic process (power and leakage power)
(3) Lower transistor count
(4) Higher circuit regularity, lower layout complexity
(5) Lower complexity of component interconnection

Concluding Remarks (cont’d)
(6) Borrow bits are utilized for simple circuits and high speed and, more importantly, for reducing pass-transistor path length (no more than 4) and for rearranging and balancing the input bits to each column of the small multipliers
(7) Partial product bit matrix decomposition is utilized for full self-testability, achieving high observability and controllability for component circuits (small multipliers are exhaustively testable)
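Exhaustive testability of a small multiplier follows from its small input space: a 4 x 4-bit multiplier has only 256 input pairs. The sketch below checks a simple shift-and-add model against the reference product for every pair. Both function names are hypothetical, and the shift-and-add model merely stands in for a component multiplier under test; it is not the borrow parallel design.

```python
def shift_add_mul4(a, b):
    """Shift-and-add model of a 4 x 4-bit multiplier (device under test)."""
    product = 0
    for i in range(4):
        if (b >> i) & 1:
            product += a << i  # add the i-th partial product row
    return product


def exhaustive_test(mul, bits=4):
    """Apply every input pair and return (failing pairs, number of pairs tried)."""
    limit = 1 << bits
    failures = [
        (a, b)
        for a in range(limit)
        for b in range(limit)
        if mul(a, b) != a * b
    ]
    return failures, limit * limit
```

The same loop applied directly to a 64 x 64-bit multiplier would need 2^128 input pairs, which is why decomposing into small, individually testable multipliers matters for self-test.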