Grade School Again: A Parallel Perspective. Great Theoretical Ideas In Computer Science. Steven Rudich, CS 15-251, Spring 2004, Lecture 17, Mar 16, 2004, Carnegie Mellon University

Plus/Minus Binary (Extended Binary). Base 2, but each digit can be -1, 0, or 1. Example: the two-digit string 1 (-1) has value 2 - 1 = 1.

One weight for each power of 3, placed on a balance scale. Left pan = “negative”; right pan = “positive”.

How to add 2 n-bit numbers. [Animation: the grade-school algorithm works right to left, adding one column of digits at a time and carrying into the next column.]

How to add 2 n-bit numbers. Let k be the maximum time it takes you to add one column. The total time is at most kn, which is proportional to n.

The time grows linearly with input size. [Plot: time versus number of bits in the numbers; the line is time = kn.]

If n people agree to help you add two n bit numbers, it is not obvious that they can finish faster than if you had done it yourself.

Is it possible to add two n bit numbers in less than linear parallel-time? Darn those carries.

Fast parallel addition is not obvious in usual binary. But it is amazingly direct in extended binary!

Extended binary means base 2 allowing digits to be from {-1, 0, 1}. We can call each digit a “trit”.
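
To make the representation concrete, here is a tiny sketch (my own, not from the lecture) that evaluates a plus/minus binary number in Python; the digit-list layout and helper name are just for illustration.

    # Evaluate a plus/minus binary ("extended binary") number.
    # trits is a list of digits from {-1, 0, 1}, most significant first.
    def eb_value(trits):
        value = 0
        for t in trits:
            value = 2 * value + t
        return value

    # Example: the trit string 1 0 -1 has value 4 - 1 = 3.
    assert eb_value([1, 0, -1]) == 3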

n people can add two n-trit plus/minus binary numbers in constant time!

An Addition Party: adding two example numbers.

An Addition Party: Invite n people to add two n-trit numbers. Assign one person to each trit position.

An Addition Party: Each person should add the two input trits in their possession. Problem: 2 and -2 are not allowed in the final answer.

Pass Left: If you have a 1 or a 2, subtract 2 from yourself and pass a 1 to the left. (Nobody keeps more than 0.) Then add in anything that is given to you from the right. (Nobody ends up with more than a 1.)

After passing left: there will never again be any 2s, since everyone kept at most 0 and received at most 1 more.

Passing left again: If you have a -1 or a -2, add 2 to yourself and pass a -1 to the left. (Nobody keeps less than 0.)

After passing left again: no -2s any more either, since everyone kept at least 0 and received at least -1. Every position now holds a -1, 0, or 1.
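
A sequential sketch (my own) of the whole party: people act simultaneously in the story, so the code computes everyone's pass first and then applies it, which has the same effect. The list layout (least significant trit first, one spare position at the top) is an assumption of this sketch.

    # Carry-free addition of two plus/minus binary numbers (the "addition party").
    # x and y are lists of trits from {-1, 0, 1}, least significant first.
    def party_add(x, y):
        n = max(len(x), len(y))
        x = x + [0] * (n - len(x))
        y = y + [0] * (n - len(y))
        held = [x[i] + y[i] for i in range(n)] + [0]   # one spare top position

        # Round 1 ("pass left"): anyone holding 1 or 2 keeps (value - 2) and
        # passes a 1 to the left; then everyone adds in what they received.
        passed = [1 if h >= 1 else 0 for h in held]
        held = [held[i] - 2 * passed[i] + (passed[i - 1] if i > 0 else 0)
                for i in range(len(held))]

        # Round 2: anyone holding -1 or -2 keeps (value + 2) and passes a -1 left.
        passed = [-1 if h <= -1 else 0 for h in held]
        held = [held[i] - 2 * passed[i] + (passed[i - 1] if i > 0 else 0)
                for i in range(len(held))]

        assert all(-1 <= h <= 1 for h in held)         # a legal trit string
        return held                                    # trits of x + y

    def value(trits):                                  # least significant first
        return sum(t * 2 ** i for i, t in enumerate(trits))

    a, b = [1, -1, 1], [1, 0, -1]                      # 3 and -3
    assert value(party_add(a, b)) == value(a) + value(b)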


Caution: Parties and Algorithms Do not Mix

Is there a fast parallel way to convert an Extended Binary number into a standard binary number?

Not obvious: sub-linear addition in standard binary. Also not obvious: sub-linear conversion from extended binary to standard binary.

Let’s reexamine grade school addition from the view of a computer circuit.

Grade School Addition: add a4 a3 a2 a1 a0 and b4 b3 b2 b1 b0; the carries are c5 c4 c3 c2 c1 and the sum bits are s4 s3 s2 s1 s0.

Ripple-carry adder: a chain of full adders, one per bit position. Stage i takes a_i, b_i, and the carry c_i from the previous stage, and produces sum bit s_i and carry c_{i+1}; the carry ripples from the lowest stage all the way up to c_n.

Logical representation of binary: 0 = false, 1 = true. For a full adder with inputs a_i, b_i, c_i and outputs s_i, c_{i+1}: s_1 = (a_1 XOR b_1) XOR c_1, and c_2 = (a_1 AND b_1) OR (a_1 AND c_1) OR (b_1 AND c_1).

[Gate-level diagram of one full adder: XOR gates compute s_i from a_i, b_i, c_i; AND and OR gates compute c_{i+1}.]

How long does it take to add two n-bit numbers this way? The propagation time through the ripple-carry circuit is Θ(n).
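
For contrast with the parallel ideas to come, here is a bit-level sketch (my own; bits as Python 0/1 ints, least significant bit first) of the ripple-carry adder. Each loop iteration is one full adder built from the XOR/AND/OR formulas above, and the loop-carried variable is exactly the Θ(n) carry chain.

    def full_adder(a, b, c):
        s = (a ^ b) ^ c                          # sum bit
        c_out = (a & b) | (a & c) | (b & c)      # carry out
        return s, c_out

    def ripple_carry_add(a_bits, b_bits):
        s_bits, carry = [], 0
        for a, b in zip(a_bits, b_bits):         # the carry ripples stage by stage
            s, carry = full_adder(a, b, carry)
            s_bits.append(s)
        return s_bits + [carry]

    # Example: 6 + 7 = 13 (bit lists are least significant bit first).
    assert ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]) == [1, 0, 1, 1, 0]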

Circuits compute things in parallel. We can think of the propagation delay as PARALLEL TIME.

Is it possible to add two n bit numbers in less than linear parallel-time?

I suppose the EB addition algorithm could be helpful somehow.

Plus/minus binary means base 2 allowing digits to be from {-1, 0, 1}. We can call each digit a “trit”.

n people can add two n-trit plus/minus binary numbers in constant time!

Can we still do addition quickly in the standard representation?

Yes, but first a neat idea… Instead of adding two numbers to make one number, let’s think about adding 3 numbers to make 2 numbers.

Carry-Save Addition: The sum of three numbers can be converted into the sum of two numbers in constant parallel time!

Carry-Save Addition: The sum of three numbers can be converted into the sum of two numbers in constant parallel time. The bitwise XOR of the three numbers gives one of the two outputs; the carries (one per position, shifted left one place) give the other.

Cool! So if we represent x as a+b and y as c+d, then we can add x and y using two of these (this is basically the same trick as that plus/minus binary thing): (a+b+c)+d = (e+f)+d = g+h.
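
A word-level sketch (my own, using Python's integers to stand in for the wires): the bitwise XOR of the three numbers is one output, and the per-position carries, shifted left one place, are the other. Every bit position is handled independently, which is why the circuit has constant depth.

    def carry_save(x, y, z):
        s = x ^ y ^ z                                # sum bits, no carrying
        c = ((x & y) | (x & z) | (y & z)) << 1       # carry bits, shifted left
        return s, c                                  # x + y + z == s + c

    s, c = carry_save(9, 13, 7)
    assert s + c == 9 + 13 + 7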

Even in standard representation, this is really useful for multiplication.

Grade School Multiplication. [Diagram: n shifted partial products, one per bit of the multiplier, which must all be added up.]

We need to add n 2n-bit numbers: a_1, a_2, a_3, …, a_n.

A tree of carry-save adders: [Diagram] feed a_1, …, a_n into carry-save adders arranged in a tree; each adder turns three numbers into two; at the bottom, add the last two numbers with an ordinary adder.

A tree of carry-save adders: T(n) ≈ log_{3/2}(n) + [the last step: add the last two numbers].
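
A sketch (my own) of the reduction the tree performs: keep replacing three of the partial products by the two numbers a carry-save adder produces, until only two numbers remain; those two go to an ordinary adder. The function name and the fixed bit width are assumptions of the sketch.

    # Multiply by reducing the n partial products with carry-save adders.
    def csa_multiply(x, y, n_bits):
        # One shifted copy of x for every 1 bit of y.
        terms = [x << i for i in range(n_bits) if (y >> i) & 1]
        while len(terms) > 2:
            a, b, c, terms = terms[0], terms[1], terms[2], terms[3:]
            s = a ^ b ^ c                            # carry-save step:
            k = ((a & b) | (a & c) | (b & c)) << 1   # three numbers become two
            terms += [s, k]
        return sum(terms)     # the last two numbers go to an ordinary adder

    assert csa_multiply(27, 41, 8) == 27 * 41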

So let’s go back to the problem of adding two numbers. In particular, if we can add two numbers in O(log n) parallel time, then we can multiply in O(log n) parallel time too!

If we knew the carries, it would be very easy to do fast parallel addition.

What do we know about the carry-out before we know the carry-in? (One full adder: inputs a, b, c_in; outputs s, c_out.)

What do we know about the carry-out before we know the carry-in? If a = b = 0, then c_out = 0 no matter what. If a = b = 1, then c_out = 1 no matter what. If a ≠ b, then c_out = c_in.

Hey, this is just a function of a and b. We can do this in parallel.

Idea #1: do this calculation first. In parallel, label every position with 0 (carry killed), 1 (carry generated), or p (carry propagated). This takes just one step!

Idea #1: do this calculation first. Also, once we have the carries, it only takes one step more: s_i = (a_i XOR b_i) XOR c_i.

So, everything boils down to: can we find a fast parallel way to convert each position's 0/1/p label into its final 0/1 carry value? This is called the “parallel prefix problem”.

So, we need to do this quickly…

Idea #2: define an operator ★ on the labels by 1 ★ x = 1, 0 ★ x = 0, and p ★ x = x. Then the carry into a position is what you get by combining, with ★, all the labels to its right (more significant label on the left): the carries we want are exactly all the partial (prefix) results of that chain.

 (  (  (     (   ) (   )        And, the operator is associative. Idea #2 (cont):           

Just using the fact that we have an associative binary operator. Binary operator: an operation that takes two objects and returns a third, A ★ B = C. Associative: (A ★ B) ★ C = A ★ (B ★ C).

Examples: addition on the integers; Min(a,b); Max(a,b); Left(a,b) = a; Right(a,b) = b; Boolean AND; Boolean OR.

In what we are about to do “+” will mean an arbitrary binary associative operator.

Prefix Sum Problem. Input: X_{n-1}, X_{n-2}, …, X_1, X_0. Output: Y_{n-1}, Y_{n-2}, …, Y_1, Y_0, where Y_0 = X_0, Y_1 = X_0 + X_1, Y_2 = X_0 + X_1 + X_2, Y_3 = X_0 + X_1 + X_2 + X_3, …, Y_{n-1} = X_0 + X_1 + X_2 + X_3 + … + X_{n-1}.

Prefix Sum example. Input: 6, 9, 2, 3, 4, 7 (that is, X_5, …, X_0). Output: 31, 25, 16, 14, 11, 7 (that is, Y_5, …, Y_0), where Y_0 = X_0, Y_1 = X_0 + X_1, Y_2 = X_0 + X_1 + X_2, and so on up to Y_5 = X_0 + … + X_5.
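
A quick sequential check of the example (my own; note the slide lists the values with the highest index first, so the Python list below is written in the opposite order, X_0 first).

    from itertools import accumulate

    X = [7, 4, 3, 2, 9, 6]                   # X0, X1, ..., X5
    Y = list(accumulate(X))                  # Y[i] = X[0] + X[1] + ... + X[i]
    assert Y == [7, 11, 14, 16, 25, 31]      # Y0, Y1, ..., Y5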

Example circuitry (n = 4): y_0 = X_0; y_1 = X_1 + X_0; y_2 = X_2 + X_1 + X_0; y_3 = X_3 + X_2 + X_1 + X_0, each output computed by its own tree of + gates.

Divide, conquer, and glue for computing y_{n-1}: compute the sum of X_{n-1}, …, X_{⌈n/2⌉} and the sum of X_{⌈n/2⌉-1}, …, X_0 recursively, then combine them with one +. T(1) = 0, T(n) = T(⌈n/2⌉) + 1, so T(n) = ⌈log_2 n⌉.

Modern computers do something slightly different. This algorithm is fast, but how many components does it use?

Size of circuit (number of gates) for computing y_{n-1} this way: S(1) = 0, S(n) = S(⌈n/2⌉) + S(⌊n/2⌋) + 1, so S(n) = n − 1.

Sum of sizes, if each y_i gets its own circuit: S(n) = 0 + 1 + 2 + … + (n − 1) = n(n − 1)/2 gates.

Recursive Algorithm, on n items (n a power of 2). If n = 1, Y_0 = X_0. Otherwise, first compute the adjacent pairwise sums X_1 + X_0, X_3 + X_2, X_5 + X_4, …, X_{n-1} + X_{n-2}.

Recursive Algorithm (continued): run a prefix sum on these n/2 pairwise sums.

Recursive Algorithm (continued): the recursive answers are exactly the odd-indexed outputs Y_1, Y_3, …, Y_{n-1}; each even-indexed output Y_{2i} (for i ≥ 1) takes one more +, adding X_{2i} to Y_{2i-1}, and Y_0 = X_0.
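
A sketch of this recursion in plain Python (my own, sequential code, for n a power of 2); the structure mirrors the circuit: one level of pairwise sums, one recursive call, and one level that fills in the even-indexed outputs.

    def prefix_sums(X):
        n = len(X)
        if n == 1:
            return [X[0]]
        # One parallel step: sum adjacent pairs.
        pairs = [X[2 * i] + X[2 * i + 1] for i in range(n // 2)]
        # Recurse on n/2 items; this yields Y1, Y3, Y5, ..., Y_{n-1}.
        odd = prefix_sums(pairs)
        # One more parallel step: fill in the even-indexed outputs.
        Y = [X[0]]
        for i in range(1, n // 2):
            Y += [odd[i - 1], odd[i - 1] + X[2 * i]]
        return Y + [odd[-1]]

    assert prefix_sums([7, 4, 3, 2, 9, 6, 1, 5]) == [7, 11, 14, 16, 25, 31, 32, 37]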

Parallel time complexity: T(1) = 0, T(2) = 1, T(n) = T(n/2) + 2 (one step for the pairwise sums, one step to fill in the even-indexed outputs), so T(n) = 2 log_2(n) − 1.

Size: S(1) = 0, S(n) = S(n/2) + n − 1 (n/2 gates for the pairwise sums, S(n/2) for the recursion, n/2 − 1 gates to fill in), so S(n) = 2n − log_2 n − 2.

Putting it all together: Carry Look-Ahead Addition. To add two n-bit numbers a and b: 1 step to compute the x values (0, 1, or p at each position), 2 log_2 n − 1 steps to compute the carries c, and 1 step to compute c XOR (a XOR b): 2 log_2 n + 1 steps total.
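
Putting the pieces together in a sketch (my own, and sequential: in the real circuit, step 2 is the parallel-prefix computation above). The '0'/'1'/'p' labels follow the naming used earlier; bit lists are least significant bit first.

    # Carry look-ahead addition of two n-bit numbers.
    def carry_lookahead_add(a_bits, b_bits):
        # Step 1 (one step): kill/generate/propagate label for each position.
        x = ['p' if a != b else ('1' if a == 1 else '0')
             for a, b in zip(a_bits, b_bits)]
        # Step 2 (parallel prefix in the circuit): the carry into each position.
        carries, running = [], '0'                   # no carry into position 0
        for xi in x:
            carries.append(running)
            running = running if xi == 'p' else xi   # xi "star" running
        # Step 3 (one step): s_i = (a_i XOR b_i) XOR c_i.
        s = [(a ^ b) ^ (1 if c == '1' else 0)
             for a, b, c in zip(a_bits, b_bits, carries)]
        return s + [1 if running == '1' else 0]      # final carry out

    # Example: 11 + 6 = 17.
    assert carry_lookahead_add([1, 1, 0, 1], [0, 1, 1, 0]) == [1, 0, 0, 0, 1]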

Putting it all together: multiplication. T(n) ≈ log_{3/2}(n) + 2 log_2(2n), where the first term is the carry-save tree and the second is the carry look-ahead addition of the final two numbers.

For a 64-bit word that works out to a parallel time of 22 for multiplication, and 13 for addition.

Addition requires linear time sequentially, but has a practical 2 log_2 n + 1 step parallel algorithm.

Multiplication (which is a lot harder sequentially) also has an O(log n) time parallel algorithm.

And this is how addition works on commercial chips…
Processor   n    2 log_2 n + 1
Pentium     32   11
Alpha       64   13

In order to handle integer addition/subtraction we use 2's complement representation; e.g., in 8 bits, -44 = 11010100.

Addition of two numbers works the same way (assuming no overflow)

To negate a number, flip each of its bits and add 1.

x + flip(x) = -1. So, -x = flip(x) + 1.

Most computers use two's complement representation to perform integer addition and subtraction.
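
A small sketch (my own, assuming an 8-bit word) showing that flip-and-add-one really negates, and that x + flip(x) is the all-ones word, i.e. -1.

    # Two's complement negation in a fixed w-bit word.
    def negate(x, w=8):
        mask = (1 << w) - 1
        flipped = x ^ mask           # flip every bit
        return (flipped + 1) & mask  # ...and add 1

    assert negate(44) == 0b11010100              # -44 in 8 bits
    assert (44 + (44 ^ 0xFF)) & 0xFF == 0xFF     # x + flip(x) = -1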

Grade School Division. [Long-division diagram.] For n bits of precision: n subtractions, each costing 2 log_2 n + 1, giving Θ(n log n).
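
As a baseline for that Θ(n log n) count, here is a sketch (my own) of grade-school integer division by shift, compare, and subtract: each of the n quotient bits costs one subtraction.

    def long_divide(dividend, divisor, n_bits):
        q, r = 0, dividend
        for i in range(n_bits - 1, -1, -1):
            if (divisor << i) <= r:      # one comparison/subtraction per bit
                r -= divisor << i
                q |= 1 << i
        return q, r                      # quotient and remainder

    assert long_divide(1107, 41, 8) == (27, 0)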

Let’s see if we can reduce to O(n) by being clever about it.

Idea: internally, allow ourselves to “go negative” using trits so we can do constant-time subtraction. Then convert back at the end. (technically, called “extended binary”)

SRT division algorithm. Rule: each bit of the quotient is determined by comparing the first bit of the divisor with the first bit of the dividend. Easy! [Worked example omitted.] One constant-time extended-binary addition per bit; at the end, convert to standard representation by subtracting the negative bits from the positive. Time for n bits of precision in the result: ≈ 3n + 2 log_2(n) + 1.

Intel Pentium division error. The Pentium uses essentially the same algorithm, but computes more than one bit of the result in each step. Several leading bits of the divisor and of the running remainder are examined at each step, and the next quotient digits are looked up in a table. The table had several bad entries. Ultimately Intel offered to replace any defective chip, estimating their loss at $475 million.

If I have millions of processors, how much of a speed-up might I get over a single processor?

Brent’s Law: At best, p processors will give you a factor of p speedup over the time it takes on a single processor.

The traditional GCD algorithm will take linear time to operate on two n bit numbers. Can it be done faster in parallel?

If n² people agree to help you compute the GCD of two n-bit numbers, it is not obvious that they can finish faster than if you had done it yourself.

Is GCD inherently sequential?

No one knows.

Deep Blue Many processors help DB look many chess moves ahead. It was not obvious that more processors could really help with game tree search. I remember a heated debate in C.B.’s Ph.D. thesis defense.