Datapath Designs CK Cheng CSE Department UC, San Diego.

Prefix Adder – Well-known and Well-developed? Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc.
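As a rough illustration of what a prefix network computes, the sketch below adds two unsigned integers using a Kogge-Stone style network over generate/propagate signals. It is not taken from the slides: the function name `kogge_stone_add` and the default 8-bit width are assumptions made for this example, and it models only the logic, not timing, power, or area.

```python
def kogge_stone_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit unsigned integers with a Kogge-Stone prefix network."""
    # Bitwise generate and propagate signals, LSB at index 0.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    half_sum = p[:]  # keep the bit-level propagate for the final sum

    # Each level combines position i with position i - dist using the
    # prefix operator (G, P) o (G', P') = (G | P & G', P & P').
    dist = 1
    while dist < n:
        g_next, p_next = g[:], p[:]
        for i in range(dist, n):
            g_next[i] = g[i] | (p[i] & g[i - dist])
            p_next[i] = p[i] & p[i - dist]
        g, p = g_next, p_next
        dist *= 2

    # After the last level, g[i] is the carry out of bit i (carry-in is 0).
    carries = [0] + g  # carries[i] is the carry into bit i
    total = 0
    for i in range(n):
        total |= (half_sum[i] ^ carries[i]) << i
    total |= carries[n] << n  # the carry-out becomes bit n
    return total


print(kogge_stone_add(200, 100))  # 300
```

The loop runs for log2(n) levels when n is a power of two, which is the logic depth of the network. The other classic networks compute the same carries while trading depth, fanout, and wiring differently.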

Prefix Adder – New Respects, New Method
Realistic design considerations: timing, power, and area.
Integer Linear Programming for prefix adders:
–Logic effort timing model (gate cap. + wire cap.)
–Activity-statistic power model
–Non-uniform signal arrival/required times
[Figure labels: Logic Levels, Max Fanouts, Max Wire Tracks; Timing, Power, Area]

Prefix Adder – Optimum Prefix adders
Uniform signal arrival/required times
[Figure panels: Sklansky adder; Kogge-Stone adder; fastest depth-4 optimal prefix adder; fastest depth-3 optimal prefix adder]

Prefix Adder – Optimum Prefix adders Uniform signal arrival/required times

Prefix Adder – Optimum Prefix adders
Non-uniform signal arrival/required times
[Figure panels: increasing signal arrival times; decreasing signal arrival times; convex signal arrival times]

Division – Iteration effort
Pencil-and-paper method ($A = Q \times B + 2^{-n} R$ and $R < B$): 1 bit of partial quotient per iteration, $n$ iterations.
$Q = A / B$
$Q_i$: partial quotient
$R_i$: partial remainder
$R_{i+1} = R_i - B \times Q_i$
[Figure: worked example with partial remainders $R_0 = A$, $R_1$, $R_2$, $R_3$, $R_4$ and partial quotients $Q_1 = 0.1$, $Q_2 = 0.01$, $Q_3$, $Q_4$]
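The pencil-and-paper method can be sketched as a restoring division that produces one quotient bit per iteration. This is a minimal illustration rather than the slide's own example: the function name `restoring_divide` is hypothetical, and the operands are taken as integers with `a < b` so that the quotient is an n-bit fraction.

```python
def restoring_divide(a: int, b: int, n: int):
    """Return (q, r) with a * 2**n == q * b + r and 0 <= r < b.

    The quotient Q = q / 2**n is an n-bit fraction, one bit per iteration.
    """
    if not 0 <= a < b:
        raise ValueError("expected 0 <= a < b")
    q, r = 0, a
    for _ in range(n):
        r <<= 1  # shift the partial remainder left by one bit
        q <<= 1  # make room for the next quotient bit
        if r >= b:  # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
    return q, r


q, r = restoring_divide(3, 5, 8)
print(q, r)  # 153 3, i.e. 3/5 is about 153/256
```

The returned pair satisfies the slide's invariant in scaled form: dividing `a * 2**n == q * b + r` by `2**n` gives $A = Q \times B + 2^{-n} R$ with $R < B$.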

Division – Memory effort
A lookup table is the simplest way to obtain multiple partial-quotient bits in each iteration.
SRT method: a lookup table stores m-bit partial quotients selected by m bits of the partial remainder and m bits of the divisor.
Table size: $2^{2m} \times m$ bits
The SRT method is limited by the memory wall.
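To make the memory wall concrete, the short sketch below evaluates the table size $2^{2m} \times m$ for a few values of m. The function name `srt_table_bits` and the chosen values of m are assumptions for illustration only.

```python
def srt_table_bits(m: int) -> int:
    """Bits in a table indexed by 2m address bits with m-bit entries."""
    return (1 << (2 * m)) * m


for m in (4, 8, 12, 16):
    print(f"m = {m:2d}: {srt_table_bits(m)} bits")
```

Each additional quotient bit per iteration multiplies the number of entries by four, so the table grows from 1,024 bits at m = 4 to 524,288 bits at m = 8.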

Division – Arithmetic effort
The partial quotient is calculated by arithmetic functions: prescaling, Taylor expansion, or series expansion.

Division – Solution space
Modern FPGAs contain plenty of memory and built-in multipliers, which enable high-performance dividers.
[Figure: axes Iteration Effort, Memory Effort, Arithmetic Effort; labels Memory Wall, Pencil-and-paper, SRT, Prescaling, Taylor Expansion, Low area, Series Expansion, Low latency, Our target]

Division – PST algorithm
Use the power of series expansion, which needs a good starting point.
Prescaling provides a scaled divisor close to 1.
A 0-order Taylor expansion then iterates to reach the final quotient.

Division – PST algorithm
$E_0 = \mathrm{Table}(B^{(m)}) \approx 1/B$
$A_1 = A \times E_0$; $B_1 = B \times E_0$
$E_1 = (2 - B_1) \approx \mathrm{INV}(B_1^{(2m)})$
$Q_i = R_{i-1} \times E_1$
$R_i = R_{i-1} - Q_i \times B_1$
$Q = Q + Q_i$
Example:
$A$ = …,0110; $B$ = …,1011
$B^{(m)}$ = …; $E_0$ = …
$E_1 = \mathrm{INV}(B_1^{(2m)})$ = …,1110
$A_1 = A \times E_0$ = …,1000,0010
$B_1 = B \times E_0$ = …,0001,0001
$Q_1 = A_1 \times E_1$ = …,0011
$R_1 = A_1 - Q_1 \times B_1$ = …,0010,0101,1110,1101
$Q_2 = R_1 \times E_1$ = …,1111
$R_2 = R_1 - Q_2 \times B_1$ = …,0001,1111,1011,0001
$Q$ = … + …,0010,0111,11 = …,0101,0111,11
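The recurrence above can be sketched in exact rational arithmetic to see how quickly it converges. This is a minimal model under stated assumptions, not the authors' hardware design: the name `pst_divide`, the divisor range [1, 2), the table output width of m + 2 fractional bits, the choice $R_0 = A_1$, and modeling INV as a one's complement of the top 2m fractional bits are all choices made for this example. Partial quotients are kept at full precision here, whereas hardware would truncate them.

```python
from fractions import Fraction
from math import floor


def truncate(x: Fraction, frac_bits: int) -> Fraction:
    """Keep only the top frac_bits fractional bits of a non-negative x."""
    scale = 1 << frac_bits
    return Fraction(floor(x * scale), scale)


def pst_divide(a, b, m: int = 4, iterations: int = 3) -> Fraction:
    """Approximate a / b with prescaling followed by series iterations."""
    a, b = Fraction(a), Fraction(b)
    if not 1 <= b < 2:
        raise ValueError("divisor must be normalized to [1, 2)")

    # Prescaling: E0 = Table(B^(m)) ~ 1/B, looked up from the top m bits of B.
    # (Assumption: the table entry has m + 2 fractional bits.)
    e0 = truncate(1 / truncate(b, m), m + 2)
    a1, b1 = a * e0, b * e0  # B1 is now close to 1

    # E1 ~ 2 - B1, modeled as a bitwise inversion of the top 2m bits of B1.
    ulp = Fraction(1, 1 << (2 * m))
    e1 = 2 - ulp - truncate(b1, 2 * m)

    # Iterate: Q_i = R_{i-1} * E1, R_i = R_{i-1} - Q_i * B1, Q = Q + Q_i.
    q, r = Fraction(0), a1  # assumption: R_0 = A_1
    for _ in range(iterations):
        q_i = r * e1
        r = r - q_i * b1
        q += q_i
    return q


a, b = Fraction(150, 256), Fraction(427, 256)
print(float(pst_divide(a, b)), float(a / b))
```

In this model each iteration multiplies the remaining error by $1 - E_1 \times B_1$, which is on the order of $2^{-2m}$, so a few iterations are enough for a 32-bit quotient when m is moderately large.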

Division – FPGA Implementation
The PST algorithm is suitable for high-performance division unit design in FPGAs.
32-bit division with 5-cycle latency:

Design                  Fmax (Period)          ALUTs  Memory Bits  DSP Blocks  Power Consumption (Dynamic+Static)  Throughput
IP Core (no DSP)        50.16 MHz (19.935 ns)  …      …            …           … mW (52 mW + 329 mW)               50.16 Mdiv/s
PST (DSP)               72.8 MHz (13.737 ns)   …      …            …           … mW (23 mW + 327 mW)               24.3 Mdiv/s
PST (no DSP)            73.20 MHz (13.661 ns)  …      …            …           … mW (50 mW + 328 mW)               24.4 Mdiv/s
PST-pipelined (DSP)     74.15 MHz (13.486 ns)  …      …            …           … mW (17 mW + 327 mW)               74.15 Mdiv/s
PST-pipelined (no DSP)  76.05 MHz (13.150 ns)  …      …            …           … mW (31 mW + 328 mW)               76.05 Mdiv/s