15-251 Great Theoretical Ideas in Computer Science.


Great Theoretical Ideas in Computer Science

Grade School Again: A Parallel Perspective

a question If a man can plough a field in 25 days, how long does it take for 5 men to plough the same field? 5 days

a similar question If a processor can add two n-bit numbers in n microseconds, how long does it take for n processors to add together two n-bit numbers? hmm…

Warming up: thinking about parallelism

Dot products Given two n-length vectors a and b, the dot product of a and b is the sum of the products of corresponding entries: a · b = a1b1 + a2b2 + … + anbn. Also called "inner product".

Dot products If we can add or multiply two numbers in time C (a simplifying assumption for now), how long does it take to compute the dot product of two n-length vectors? n multiplications and n-1 additions, hence C*(2n-1) time.

Dot products What if n people decided to compute dot products and they worked in parallel? Modeling decision: what are people allowed to do in parallel? Assume they have shared memory Can read same location in memory in parallel Each location in memory can be written to by only one person at a time. Can write to different locations in memory simultaneously

Parallel dot products What if n people decided to compute dot products and they worked in parallel? All the pairwise products can be computed in parallel! (1 unit of time) How to add these n products up fast?

Binary tree

Parallel dot products What if n people decided to compute dot products and they worked in parallel? All the pairwise products can be computed in parallel! (1 unit of time) How to add these n products up fast? Can add these numbers up in ⌈log2 n⌉ rounds. Hence dot products take ⌈log2 n⌉ + 1 time in parallel.
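A sequential sketch of this schedule (the function name and the example vectors are my own): each iteration of the while-loop corresponds to one parallel round of the halving tree.

```python
def parallel_dot_product(a, b):
    """Simulate the parallel schedule: one round computes all pairwise
    products at once; each later round adds adjacent pairs (one tree level)."""
    vals = [x * y for x, y in zip(a, b)]        # round 1: the n products
    rounds = 1
    while len(vals) > 1:
        # everyone adds one adjacent pair; a leftover element passes through
        vals = [vals[i] + vals[i + 1] if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds

# for n = 8 this takes 1 + ceil(log2 8) = 4 rounds
total, rounds = parallel_dot_product([1, 2, 3, 4, 5, 6, 7, 8],
                                     [8, 7, 6, 5, 4, 3, 2, 1])
```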

Not enough people? What if there were fewer than n people?

Another example: Matrix-vector multiplication Suppose we were given an m×n matrix A and an n-length vector x. How much time does it take to compute Ax?

How much time in parallel? Ax is just m dot-product computations, and all of them can be done in parallel. So if we had m·n people, we could compute Ax in O(log n) time.

Back to our question… If a single processor can add two n-bit numbers in n microseconds, how long does it take n processors to add together two n-bit numbers?

How to add 2 n-bit numbers. [Animation: the two numbers written one above the other; working right to left, each column is summed in turn and its carry written above the next column to the left.]

How to add 2 n-bit numbers. Let k be the maximum time that it takes you to do one column. Time = kn is proportional to n.

The time grows linearly with input size. [Graph: time = kn versus n, the number of bits in the numbers.]

If n people agree to help you add two n bit numbers, it is not obvious that they can finish faster than if you had done it yourself.

Is it possible to add two n bit numbers in less than linear parallel-time? Darn those carries.

Plus/Minus Binary (Extended Binary) Base 2, but each digit can be -1, 0, or 1. Example: the digits 1, -1, -1 (most significant first) represent 4 - 2 - 1 = 1, and so do the digits 0, 0, 1. Not a unique representation system.

Fast parallel addition is not obvious in usual binary. But it is amazingly direct in Extended Binary!

Extended binary means base 2 allowing digits to be from {-1, 0, 1}. We can call each digit a “trit”.

n people can add two n-trit plus/minus binary numbers in constant time!
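Before the party, a quick sanity check on the representation (the helper name `eb_value` is my own): the value of a trit string, least significant digit first.

```python
def eb_value(trits):
    """Value of a plus/minus binary number; trits[0] is the least
    significant digit, and each digit is -1, 0, or 1."""
    return sum(d * (1 << i) for i, d in enumerate(trits))

# Not a unique representation: both of these denote the number 1.
assert eb_value([1]) == 1            # ordinary binary 1
assert eb_value([-1, -1, 1]) == 1    # 4 - 2 - 1 = 1
```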

An Addition Party

An Addition Party Invite n people to add two n-trit numbers Assign one person to each trit position

An Addition Party Each person should add the two input trits in their possession. Problem: 2 and -2 are not allowed in the final answer.

Pass Left If you have a 1 or a 2, subtract 2 from yourself and pass a 1 to the left. (Nobody keeps more than 0.) Add in anything that is given to you from the right. (Nobody has more than a 1.)

After passing left There will never again be any 2s, as everyone kept at most 0 and received at most 1 more.

Passing left again If you have a -1 or a -2, add 2 to yourself and pass a -1 to the left. (Nobody keeps less than 0.)

After passing left again No -2s anymore either. Everyone kept at least 0 and received at least -1, so everyone now holds -1, 0, or 1.
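The whole party can be simulated in a few lines (a sketch; `addition_party`, `eb_value`, and the example inputs are my own). Each list comprehension is one simultaneous step for all n people, which is why the parallel depth is constant.

```python
def eb_value(trits):
    """Value of a plus/minus binary number, least significant trit first."""
    return sum(d * (1 << i) for i, d in enumerate(trits))

def addition_party(x, y):
    """Add two n-trit plus/minus binary numbers using the two simultaneous
    'pass left' rounds from the slides. Returns n+1 trits."""
    d = [a + b for a, b in zip(x, y)] + [0]      # column sums, in {-2..2}
    # Round 1: anyone holding 1 or 2 subtracts 2 and passes a 1 to the left.
    send = [1 if v >= 1 else 0 for v in d]
    d = [v - 2 * s for v, s in zip(d, send)]
    d = [v + (send[i - 1] if i > 0 else 0) for i, v in enumerate(d)]
    # Round 2: anyone holding -1 or -2 adds 2 and passes a -1 to the left.
    send = [-1 if v <= -1 else 0 for v in d]
    d = [v - 2 * s for v, s in zip(d, send)]     # adds 2 where s == -1
    d = [v + (send[i - 1] if i > 0 else 0) for i, v in enumerate(d)]
    assert all(-1 <= v <= 1 for v in d)          # every digit is now a trit
    return d

x, y = [1, -1, 0, 1], [1, 1, -1, 0]              # made-up EB numbers: 7 and -1
s = addition_party(x, y)
assert eb_value(s) == eb_value(x) + eb_value(y)
```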


Strategy To add two n-bit binary numbers: consider them to be in extended binary (EB) (no work required!); sum them up to get an answer in EB (constant parallel time!); then convert the answer back to binary (how do we do this fast in parallel???).

Is there a fast parallel way to convert an Extended Binary number into a standard binary number?

Both problems are not quite obvious: sub-linear-time addition in standard binary, and sub-linear-time conversion from EB to binary.

Let’s reexamine grade school addition from the view of a computer circuit.

Grade School Addition a4 a3 a2 a1 a0 + b4 b3 b2 b1 b0, producing carries c5 c4 c3 c2 c1 and sum bits s4 s3 s2 s1 s0.

Ripple-carry adder A chain of n full adders: stage i takes ai, bi, and carry-in ci, and produces sum bit si and carry-out ci+1. The carry-in of stage 0 is 0, and the last stage emits the final carry cn.

Logical representation of binary: 0 = false, 1 = true. s1 = (a1 XOR b1) XOR c1 c2 = (a1 AND b1) OR (a1 AND c1) OR (b1 AND c1)
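As a bit-level sketch in Python (using `^`, `&`, `|` for XOR, AND, OR), the formulas above become a one-bit full adder:

```python
def full_adder(a, b, c):
    """One column of the addition: sum bit and carry-out, straight
    from the formulas above."""
    s = (a ^ b) ^ c
    c_out = (a & b) | (a & c) | (b & c)
    return s, c_out

assert full_adder(1, 1, 0) == (0, 1)    # 1 + 1 = 10 in binary
```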

[Circuit: si is computed by two XOR gates; ci+1 by three AND gates feeding an OR gate.]

Ripple-carry adder How long to add two n-bit numbers? Propagation time through the circuit will be Θ(n).
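A sketch of the ripple-carry adder (helper names are mine; bits are little-endian lists). The loop makes the Θ(n) carry chain explicit: each iteration must wait for the previous carry.

```python
def to_bits(x, n):
    """Little-endian list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def ripple_carry_add(a_bits, b_bits):
    """n cascaded full adders; the carry ripples from position 0 upward,
    so the circuit depth grows linearly with n."""
    c, s_bits = 0, []
    for a, b in zip(a_bits, b_bits):
        s_bits.append((a ^ b) ^ c)            # sum bit of this stage
        c = (a & b) | (a & c) | (b & c)       # carry into the next stage
    return s_bits, c

s, c = ripple_carry_add(to_bits(13, 5), to_bits(11, 5))
assert from_bits(s + [c]) == 24
```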

Circuits compute things in parallel. We can think of the propagation delay as PARALLEL TIME.

Is it possible to add two n bit numbers in less than linear parallel-time?

I suppose the EB addition algorithm could be helpful somehow.

Plus/minus binary means base 2 allowing digits to be from {-1, 0, 1}. We can call each digit a “trit”.

n people can add 2, n-trit, plus/minus binary numbers in constant time!

Can we still do addition quickly in the standard representation?

Yes, but first a neat idea… Instead of adding two numbers together to make one number, let’s think about adding 3 numbers to make 2 numbers.

Carry-Save Addition The sum of three numbers can be converted into the sum of 2 numbers in constant parallel time!

Carry-Save Addition The sum of three numbers can be converted into the sum of 2 numbers in constant parallel time! One output number is the position-wise XOR of the three inputs; the other holds the carries, shifted left one position.
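On Python integers the whole transformation is two bitwise expressions (a sketch for nonnegative inputs; `carry_save` is my name). Every bit position is handled independently of every other, which is why the parallel depth is constant.

```python
def carry_save(x, y, z):
    """Convert x + y + z into u + v with the same sum, in constant depth:
    u is the bitwise XOR (position-wise sums without carrying), and
    v is the majority of each position (the carries), shifted left once."""
    u = x ^ y ^ z
    v = ((x & y) | (x & z) | (y & z)) << 1
    return u, v

u, v = carry_save(25, 17, 9)
assert u + v == 25 + 17 + 9
```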

Cool! So if we represent x as a+b, and y as c+d, then we can add x and y using two of these (this is basically the same as that extended binary thing): (a+b+c)+d = (e+f)+d = g+h.

An aside: Even in standard representation, this is really useful for multiplication.

Grade School Multiplication [Diagram: multiplying two n-bit numbers produces n shifted partial-product rows, one per bit of the multiplier, which must then all be summed.]

We need to add n 2n-bit numbers: a1, a2, a3, …, an.

A tree of carry-save adders Feed a1, …, an into layers of carry-save adders; each layer turns groups of 3 numbers into 2, so the count shrinks by a factor of 3/2 per layer, and at the end an ordinary adder adds the last two. T(n) ≤ log3/2(n) + [last step]
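A quick count of the layers (a sketch; `csa_tree_layers` is my name):

```python
def csa_tree_layers(n):
    """Layers of carry-save adders needed to reduce n numbers to 2:
    each layer turns every full group of 3 numbers into 2."""
    layers = 0
    while n > 2:
        n = 2 * (n // 3) + (n % 3)    # 3 -> 2 per full group, leftovers pass
        layers += 1
    return layers

# e.g. the 64 partial products of a 64-bit multiply need 10 layers,
# close to log base 3/2 of 64 (about 10.3)
```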

So let’s go back to the problem of adding two numbers. In particular, if we can add two numbers in O(log n) parallel time, then we can multiply in O(log n) parallel time too!

If we knew the carries, it would be very easy to do fast parallel addition.

What do we know about the carry-out before we know the carry-in?

a b → c-out
0 0 → 0 (the carry is killed)
0 1 → c-in (the carry propagates)
1 0 → c-in (the carry propagates)
1 1 → 1 (a carry is generated)

Hey, this is just a function of a and b. We can do this in parallel.

Idea #1: do this calculation first. At every position, compute from a and b whether the carry-out is 0, 1, or "propagate the carry-in". This takes just one step!

Idea #1: do this calculation first. Also, once we actually have the carries, it only takes one step more: si = (ai XOR bi) XOR ci.

But we only have the carries in this peculiar {0, 1, propagate} notation! How fast can we convert from this notation back to the standard notation? This is called the "parallel prefix problem".

Idea #2: think of each position's symbol as 0, 1, or p (propagate), and define the operator ⊗ by: p ⊗ x = x, 1 ⊗ x = 1, 0 ⊗ x = 0. The carry into each position is then the ⊗-product of all the symbols to its right, so the carries are exactly all the partial results (prefixes) under ⊗.

Idea #2 (cont): and the operator ⊗ is associative: (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z).

Just using the fact that we have an Associative, Binary Operator Binary Operator: an operation that takes two objects and returns a third. A  B = C Associative: ( A  B )  C = A  ( B  C )

Examples of binary associative operators Addition on the integers Min(a,b) Max(a,b) Left(a,b) = a Right(a,b) = b Boolean AND Boolean OR

In what we are about to do “+” will mean an arbitrary binary associative operator.

Prefix Sum Problem Input: Xn-1, Xn-2, …, X1, X0 Output: Yn-1, Yn-2, …, Y1, Y0 where Y0 = X0 Y1 = X0 + X1 Y2 = X0 + X1 + X2 Y3 = X0 + X1 + X2 + X3 … Yn-1 = X0 + X1 + X2 + X3 + … + Xn-1

Prefix Sum example when + = addition Input: 6, 9, 2, 3, 4, 7 Output: 31, 25, 16, 14, 11, 7 (both listed Xn-1 down to X0 and Yn-1 down to Y0) where Y0 = X0 Y1 = X0 + X1 Y2 = X0 + X1 + X2 … Yn-1 = X0 + X1 + … + Xn-1
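The sequential version is immediate (a sketch; the point of the next slides is beating its n-step depth):

```python
def prefix_sums(xs):
    """Y_i = X_0 + X_1 + ... + X_i, listed Y_0 first."""
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

# the slide's example, written X0 first: 7, 4, 3, 2, 9, 6
assert prefix_sums([7, 4, 3, 2, 9, 6]) == [7, 11, 14, 16, 25, 31]
```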

Example circuitry (n = 4) Each output gets its own tree of + gates: y0 = X0, y1 = X0 + X1, y2 = X0 + X1 + X2, y3 = X0 + X1 + X2 + X3.

Divide, conquer, and glue for computing yn-1 Recursively sum the low ⌈n/2⌉ items X⌈n/2⌉-1 … X1 X0 and the high ⌊n/2⌋ items Xn-1 … X⌈n/2⌉, then combine the two results with one +. T(1) = 0, T(n) = T(⌈n/2⌉) + 1, so T(n) = ⌈log2 n⌉.

Slightly more fancy construction coming up…

The above construction had small parallel run-time, but it used a lot of addition gates. Let's calculate how many we used…

Size of Circuit (number of gates) for computing yn-1 alone: the sum on ⌈n/2⌉ items, the sum on ⌊n/2⌋ items, and one final +. S(1) = 0, S(n) = S(⌈n/2⌉) + S(⌊n/2⌋) + 1, so S(n) = n - 1.

Sum of Sizes Giving every yi its own circuit costs S(n) = 0 + 1 + 2 + … + (n-1) = n(n-1)/2 gates in total.

Recursive Algorithm n items (n = power of 2) If n = 1, Y0 = X0. Otherwise: first add adjacent pairs X1 + X0, X3 + X2, …, Xn-1 + Xn-2 (one step, n/2 gates); a recursive prefix sum on these n/2 pair-sums yields the odd-indexed outputs Y1, Y3, …, Yn-1; finally Y0 = X0 for free, and one more layer of + gates computes each even-indexed output Y2j = Y2j-1 + X2j (one step, n/2 - 1 gates).
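The recursive construction as a function (a sketch; `parallel_prefix` is my name, and `op` can be any associative binary operator). The indices mirror the wiring: the recursion returns the odd-indexed outputs, and one extra layer of `op` fills in the even-indexed ones.

```python
def parallel_prefix(xs, op):
    """The recursive circuit: combine adjacent pairs, recurse on the n/2
    pair-sums (giving the odd-indexed outputs), then one more layer of op
    fills in the even-indexed outputs. Assumes len(xs) is a power of 2."""
    n = len(xs)
    if n == 1:
        return [xs[0]]
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(n // 2)]
    odd = parallel_prefix(pairs, op)            # odd[j] == Y_{2j+1}
    out = [xs[0]]                               # Y_0 = X_0 for free
    for j in range(n // 2):
        out.append(odd[j])                      # Y_{2j+1}
        if 2 * j + 2 < n:
            out.append(op(odd[j], xs[2 * j + 2]))   # Y_{2j+2}
    return out

assert parallel_prefix([6, 9, 2, 3, 4, 7, 1, 5], lambda a, b: a + b) == \
       [6, 15, 17, 20, 24, 31, 32, 37]
```

The same function works for any associative operator, e.g. `max`, which is what lets it compute the carries in the ⊗ notation.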

Parallel time complexity T(1) = 0; T(2) = 1; T(n) = T(n/2) + 2 (one step for the pair sums, one step for the final layer). So T(n) = 2 log2(n) - 1.

Size S(1) = 0; S(n) = S(n/2) + n - 1 (n/2 gates for the pair sums, n/2 - 1 for the final layer). So S(n) = 2n - log2 n - 2.

End of fancier construction

Putting it all together: Carry Look-Ahead Addition To add two n-bit numbers a and b: compute the carry at every position in the peculiar {0, 1, p} notation; convert it to carries in the standard notation using parallel prefix; the sum is then carry XOR (a XOR b).

To add two n-bit numbers a and b: 1 step to compute the x values (0, 1, or p at each position); 2 log2 n - 1 steps to compute the carries c; 1 step to compute c XOR (a XOR b). Total: 2 log2 n + 1 parallel steps.
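Putting the pieces together as a sketch on Python ints (`cla_add` is my name). The fold over the {0, 1, p} symbols is written sequentially here; in hardware it is the 2 log2 n - 1 step parallel-prefix circuit.

```python
def cla_add(a, b, n):
    """Add two n-bit numbers via the carry symbols: classify each position
    as kill/generate/propagate, fold to get the carry into every position,
    then finish with one XOR layer."""
    c = 0            # running carry of the fold
    carry_bits = 0   # carry INTO each position, packed as an int
    for i in range(n):
        carry_bits |= c << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai == bi:
            c = ai                  # '0' kills the carry, '1' generates one
        # otherwise the symbol is 'p' and the carry passes through unchanged
    return (a ^ b ^ carry_bits) & ((1 << n) - 1), c

assert cla_add(13, 11, 5) == (24, 0)    # 13 + 11, no overflow in 5 bits
```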

Putting it all together: multiplication T(n) ≤ log3/2(n) + 2 log2(2n) [the carry-save adder tree, then one carry look-ahead addition].

For a 64-bit word that works out to a parallel time of 22 for multiplication, and 13 for addition.

Addition requires linear time sequentially, but has a practical 2 log2 n + 1 time parallel algorithm.

Multiplication (which is a lot harder sequentially) also has an O(log n) time parallel algorithm.

And this is how addition works on commercial chips…

Processor  n   2 log2 n + 1
Pentium    32  11
Alpha      64  13

In order to handle integer addition/subtraction we use 2's complement representation, e.g., -44 = 11010100 (in 8 bits).

Addition of two numbers works the same way (assuming no overflow)

To negate a number, flip each of its bits and add 1.

x + flip(x) = -1 (the all-1s pattern). So -x = flip(x) + 1.
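An 8-bit sketch (the word width and the helper names are my choices):

```python
MASK = 0xFF                     # pretend we have 8-bit words

def flip(x):
    return ~x & MASK            # flip each of the 8 bits

def negate(x):
    """-x = flip(x) + 1, because x + flip(x) = 11111111 = -1 (mod 2^8)."""
    return (flip(x) + 1) & MASK

assert negate(44) == 0b11010100         # the 8-bit representation of -44
assert (44 + negate(44)) & MASK == 0    # x + (-x) wraps around to 0
```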

Most computers use two's complement representation to perform integer addition and subtraction.

If millions of processors, how much of a speed-up might I get over a single processor?

Brent’s Law At best, p processors will give you a factor of p speedup over the time it takes on a single processor.

The traditional GCD algorithm will take linear time to operate on two n bit numbers. Can it be done faster in parallel?

If n 2 people agree to help you compute the GCD of two n bit numbers, it is not obvious that they can finish faster than if you had done it yourself.

Is GCD inherently sequential?

No one knows.

Here's What You Need to Know… Parallel computation: addition in extended binary; the one-bit adder; ripple-carry adders; computing carries using parallel prefix sum; addition in parallel O(log n) time; multiplication in parallel O(log n) time; two's complement.