1
Chapter Four Arithmetic and Logic Unit
[Figure: ALU block with 32-bit inputs a and b, an Operation control input, and a 32-bit Result output]
CSE SUNY New Paltz
2
Numbers
Bits are just bits (no inherent meaning); conventions define the relationship between bits and numbers
Binary numbers (base 2): n bits cover decimal 0 ... 2^n - 1
How do we represent negative numbers? i.e., which bit patterns will represent which numbers?
  Sign Magnitude    Two's Complement
  000 = +0          000 = +0
  001 = +1          001 = +1
  010 = +2          010 = +2
  011 = +3          011 = +3
  100 = -0          100 = -4
  101 = -1          101 = -3
  110 = -2          110 = -2
  111 = -3          111 = -1
Which one is best? Why?
3
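MIPS
32 bit signed numbers:
  0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
  0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
  0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten
  ...
  0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
  0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten   (max)
  1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten   (min)
  1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
  1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten
  ...
  1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten
  1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
  1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
Converting n bit numbers into numbers with more than n bits:
  MIPS 16 bit immediate gets converted to 32 bits for arithmetic
  copy the most significant bit (the sign bit) into the other bits
    0010 -> 0000 0010
    1010 -> 1111 1010
  "sign extension" (lbu vs. lb)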
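Sign extension can be sketched in a few lines of Python. This is only an illustration of copying the sign bit into the upper bits; the helper names `sign_extend` and `to_signed` are made up for this example and are not part of MIPS or the slides.

```python
def sign_extend(value: int, from_bits: int, to_bits: int = 32) -> int:
    """Sign-extend the low `from_bits` bits of `value` to a `to_bits`-wide pattern."""
    value &= (1 << from_bits) - 1            # keep only the source field
    sign = (value >> (from_bits - 1)) & 1    # most significant bit of the field
    if sign:
        # copy the sign bit into every new upper position
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def to_signed(pattern: int, bits: int = 32) -> int:
    """Interpret a `bits`-wide pattern as a two's complement integer."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


print(hex(sign_extend(0x0002, 16)))            # positive immediate: upper bits stay 0
print(hex(sign_extend(0xFFFE, 16)))            # negative immediate: upper bits become 1
print(to_signed(sign_extend(0xFFFE, 16)))      # value is preserved: -2
```

The value of the number is unchanged by the extension, which is why a 16 bit immediate can be used directly in 32 bit arithmetic.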
4
Overflow
  Decimal  Binary    Decimal  2's Complement
  0        0000      0        0000
  1        0001      -1       1111
  2        0010      -2       1110
  3        0011      -3       1101
  4        0100      -4       1100
  5        0101      -5       1011
  6        0110      -6       1010
  7        0111      -7       1001
                     -8       1000
Examples:
  7 + 3 = 10 but ...          – 4 – 5 = – 9 but ...
      0111     7                  1100    – 4
    + 0011     3                + 1011    – 5
      1010   – 6                  0111      7
5
Detecting Overflow
Overflow (result too large for finite computer word): e.g., adding two n-bit numbers does not yield an n-bit number
Note that the term overflow is somewhat misleading; it does not mean a carry "overflowed"
No overflow when adding a positive and a negative number
No overflow when signs are the same for subtraction
Overflow occurs when the value affects the sign:
  overflow when adding two positives yields a negative
  or, adding two negatives gives a positive
  or, subtract a negative from a positive and get a negative
  or, subtract a positive from a negative and get a positive
In MIPS, add, addi, and sub cause an exception (interrupt) on overflow
  Details based on software system
Don't always want to detect overflow
  new MIPS instructions: addu, addiu, subu
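The sign rules above can be checked with a short Python sketch. The function names are hypothetical, operands are passed as raw bit patterns, and a small word size is used so the 4-bit examples are easy to follow; real MIPS hardware signals an exception instead of returning a flag.

```python
def add_with_overflow(a: int, b: int, bits: int = 32):
    """Add two `bits`-wide patterns; report overflow using the sign rules."""
    mask = (1 << bits) - 1
    result = (a + b) & mask
    sa = (a >> (bits - 1)) & 1
    sb = (b >> (bits - 1)) & 1
    sr = (result >> (bits - 1)) & 1
    # overflow only when both operands share a sign and the result's sign differs
    return result, (sa == sb) and (sr != sa)


def sub_with_overflow(a: int, b: int, bits: int = 32):
    """Compute a - b; overflow only when the operand signs differ."""
    mask = (1 << bits) - 1
    result = (a - b) & mask
    sa = (a >> (bits - 1)) & 1
    sb = (b >> (bits - 1)) & 1
    sr = (result >> (bits - 1)) & 1
    return result, (sa != sb) and (sr != sa)


print(add_with_overflow(0b0111, 0b0011, bits=4))  # 7 + 3 in 4 bits: overflow
print(add_with_overflow(0b0111, 0b1100, bits=4))  # 7 + (-4): no overflow
print(sub_with_overflow(0b0111, 0b1100, bits=4))  # 7 - (-4): overflow
```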
6
Building a 32 bit ALU
Let's look at a 1-bit ALU for addition:
  Sum = a XOR b XOR cin
  Cout = a b + (a XOR b) cin
  Cout = a b + a cin + b cin
How could we build a 1-bit ALU for add, and, and or?
[Figure: 1-bit adder with inputs a, b, CarryIn and outputs Sum, CarryOut; 1-bit ALU in which Operation selects among the AND (0), OR (1), and adder (2) outputs to drive Result]
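A minimal Python sketch of the 1-bit adder and a 1-bit ALU follows. The function names are illustrative, and the operation encoding (0 = and, 1 = or, 2 = add) is an assumption taken from the multiplexor input labels rather than a definitive hardware description.

```python
def full_adder(a: int, b: int, cin: int):
    """1-bit adder: returns (sum, carry_out) from the equations on the slide."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def alu_1bit(a: int, b: int, cin: int, op: int):
    """1-bit ALU: all three results are computed, a multiplexor picks one."""
    s, cout = full_adder(a, b, cin)
    candidates = (a & b, a | b, s)   # mux inputs 0, 1, 2
    return candidates[op], cout


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```

Both forms of Cout given above produce the same truth table, which the sketch's exhaustive loop makes easy to confirm by hand.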
7
What about subtraction (a – b)?
Two's complement approach: a – b = a + (NOT b) + 1, i.e., invert every bit of b and set the carry-in of the least significant bit to 1
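As a sketch of this trick, the Python below reuses one ripple adder for subtraction by inverting b and forcing the first carry-in to 1. The names `ripple_add` and `subtract` are hypothetical, and the loop models the chain of 1-bit adders rather than any particular circuit.

```python
def ripple_add(a: int, b: int, cin: int, bits: int = 32):
    """Chain of 1-bit full adders; returns (result, carry_out)."""
    result, carry = 0, cin
    for i in range(bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry


def subtract(a: int, b: int, bits: int = 32) -> int:
    """a - b computed as a + (NOT b) + 1 on the same adder."""
    mask = (1 << bits) - 1
    inverted_b = ~b & mask                            # invert every bit of b
    return ripple_add(a, inverted_b, 1, bits)[0]      # carry-in of bit 0 supplies the +1


print(bin(subtract(0b0111, 0b0011, bits=4)))  # 7 - 3
print(bin(subtract(0b0011, 0b0111, bits=4)))  # 3 - 7, result is the pattern for -4
```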
8
Overflow Detection Logic
Carry into MSB XOR Carry out of MSB
For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
  X  Y  |  X XOR Y
  0  0  |  0
  0  1  |  1
  1  0  |  1
  1  1  |  0
[Figure: four 1-bit ALUs chained by carries (CarryIn0 ... CarryOut3) with inputs a0..a3, b0..b3 and outputs Result0..Result3; CarryIn3 and CarryOut3 feed an XOR gate that produces Overflow]
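The XOR rule can be compared against the sign-based definition of overflow with a small simulation. This is a sketch with a made-up function name; it walks the carry chain bit by bit and records the carry entering the most significant bit.

```python
def add_with_carries(a: int, b: int, bits: int = 4):
    """Ripple add; overflow = carry into the MSB XOR carry out of the MSB."""
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(bits):
        if i == bits - 1:
            carry_into_msb = carry          # CarryIn[N - 1]
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    overflow = carry_into_msb ^ carry       # carry now holds CarryOut[N - 1]
    return result, overflow


print(add_with_carries(0b0111, 0b0011))  # 7 + 3: carry in 1, carry out 0 -> overflow
print(add_with_carries(0b1100, 0b1011))  # -4 + -5: carry in 0, carry out 1 -> overflow
print(add_with_carries(0b0111, 0b1100))  # 7 + -4: carries agree -> no overflow
```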
9
Supporting slt
Need to support the set-on-less-than instruction (slt)
  remember: slt is an arithmetic instruction
  produces a 1 if rs < rt and 0 otherwise
  use subtraction: (a-b) < 0 implies a < b
  and use sign bit
[Figure: 1-bit ALU with a Less input added as multiplexor input 3 (inputs a, b, Binvert, CarryIn, Operation); the MSB version also produces Set (the sign bit) and Overflow]
10
[Figure: 32-bit ALU built from ALU0 ... ALU31 with shared Binvert and Operation lines and a ripple carry from CarryIn through each CarryOut; each stage takes ai, bi and a Less input and produces Resulti; the Set output of ALU31 (the sign bit) feeds the Less input of ALU0; ALU31 also produces Overflow]
11
Test for equality
Notice control lines (Bnegate, Operation):
  000 = and
  001 = or
  010 = add
  110 = subtract
  111 = slt
[Figure: 32-bit ALU (ALU0 ... ALU31) with Bnegate and Operation control lines; Result0 ... Result31 are combined to drive a Zero output; ALU31 also produces Set and Overflow]
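The control encoding can be exercised with a word-level Python sketch. The function `alu` and its return shape are assumptions made for illustration; in particular, slt here uses the raw sign bit of a - b as the slides describe, so it gives the wrong answer when that subtraction overflows.

```python
def alu(a: int, b: int, control: int, bits: int = 32):
    """Word-level model of the ALU; returns (result, zero)."""
    mask = (1 << bits) - 1
    bnegate = (control >> 2) & 1        # top control bit: invert b and set carry-in
    operation = control & 0b11          # low two bits drive the multiplexor
    b_in = (~b & mask) if bnegate else (b & mask)
    total = ((a & mask) + b_in + bnegate) & mask   # adder output
    if operation == 0b00:
        result = a & b_in
    elif operation == 0b01:
        result = a | b_in
    elif operation == 0b10:
        result = total
    else:
        # slt: the sign bit of (a - b) becomes bit 0 of the result
        result = (total >> (bits - 1)) & 1
    zero = int(result == 0)             # Zero is 1 when every result bit is 0
    return result, zero


print(alu(5, 3, 0b110))   # subtract
print(alu(3, 3, 0b110))   # subtract equal values: Zero is set, usable for equality tests
print(alu(3, 5, 0b111))   # slt: 3 < 5
```

Equality testing falls out of subtraction: a equals b exactly when a - b sets the Zero output.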
12
Conclusion
We can build an ALU to support the MIPS instruction set
  key idea: use multiplexor to select the output we want
  we can efficiently perform subtraction using two's complement
  we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware
  the speed of a gate is affected by the number of inputs to the gate
  the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
Our primary focus is comprehension; however,
  clever changes to organization can improve performance (similar to using better algorithms in software)
  we'll look at two examples for addition and multiplication
13
Problem: ripple carry adder is slow
Is a 32-bit ALU as fast as a 1-bit ALU?
Is there more than one way to do addition?
  two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
  c1 = b0c0 + a0c0 + a0b0
  c2 = b1c1 + a1c1 + a1b1      c2 =
  c3 = b2c2 + a2c2 + a2b2      c3 =
  c4 = b3c3 + a3c3 + a3b3      c4 =
Not feasible! Why?
14
Carry-lookahead adder
An approach in-between our two extremes
Motivation:
  c1 = b0c0 + a0c0 + a0b0 = (b0 + a0)c0 + a0b0
  If we didn't know the value of carry-in, what could we do?
  When would we always generate a carry? gi = ai bi
  When would we propagate the carry? pi = ai + bi
Did we get rid of the ripple?
  c1 = g0 + p0c0
  c2 = g1 + p1c1      c2 =
  c3 = g2 + p2c2      c3 =
  c4 = g3 + p3c3      c4 =
15
Carry Look Ahead (Design trick: peek)
g = a b, p = a + b
  c1 = g0 + p0 c0
  c2 = g1 + p1 g0 + p1 p0 c0
  c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
  C4 = . . .
Names:
  suppose G0 is 1 => carry no matter what else => generates a carry
  suppose G0 = 0 and P0 = 1 => carry IFF C0 is a 1 => propagates a carry
  Like dominoes
What about more than 4 bits?
[Figure: four 1-bit stages (a0, b0 -> S0 through a3, b3 -> S3) fed by cin, each producing g and p; the 4-bit block as a whole outputs G and P]
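The expanded carry equations can be tried out in Python. This is a sketch with an illustrative function name; the slides leave the expansion of c4 open, so the c4 line below is an assumption that simply continues the same pattern.

```python
def cla_4bit(a: int, b: int, c0: int):
    """4-bit carry-lookahead add: every carry is computed directly from g, p, c0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # gi = ai bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # pi = ai + bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    # assumed continuation of the pattern (not spelled out on the slide)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, c4


print(cla_4bit(0b0111, 0b0011, 0))   # 7 + 3
print(cla_4bit(0b1111, 0b0001, 0))   # 15 + 1 produces a carry out
```

No carry depends on another carry here, only on the g and p signals and c0, which is what removes the ripple.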
16
Plumbing as Carry Lookahead Analogy
17
To build bigger adders
Can't build a 16 bit adder this way ... (too big)
Could use ripple carry of 4-bit CLA adders
Better: use the CLA principle again
[Figure: four 4-bit ALUs (bits 0-3, 4-7, 8-11, 12-15) producing Result0-3 through Result12-15; each sends its block propagate Pi and generate Gi to a carry-lookahead unit, which returns the carry-in for the next block]
18
Cascaded Carry Look-ahead (16-bit)
  C1 = G0 + P0 C0
  C2 = G1 + P1 G0 + P1 P0 C0
  C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
  C4 = . . .
[Figure: four 4-bit adders, each producing block signals G and P (G0, P0 for the first) for a second-level lookahead unit that computes C1 ... C4]
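A second lookahead level can be sketched the same way. The definitions of the block signals used below (P is the AND of the four pi; G is g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0) and the expansion of C4 are not written out on the slides, so treat them as assumptions following the usual construction; the function names are hypothetical.

```python
def block_pg(a4: int, b4: int):
    """Block propagate and generate for one 4-bit slice."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]                  # block propagates a carry
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))            # block generates a carry
    return P, G


def second_level_carries(a: int, b: int, c0: int):
    """Carries into blocks 1..3 and out of block 3 for a 16-bit add."""
    P, G = [], []
    for k in range(4):
        Pk, Gk = block_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF)
        P.append(Pk)
        G.append(Gk)
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return C1, C2, C3, C4


print(second_level_carries(0x00FF, 0x0001, 0))   # carry ripples out of blocks 0 and 1 only
print(second_level_carries(0xFFFF, 0x0001, 0))   # carry propagates through every block
```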
19
2nd level Carry, Propagate as Plumbing
20
Carry Lookahead Example
Example: Determine the gi, pi, Pi, and Gi values of the following two 16 bit numbers. What is Cout15 (C16)?
  a:
  b:
  pi = ai + bi
  gi = ai bi
  ci
Repeat using Pi and Gi:
  P0 =      P1 =      P2 =      P3 =
  G0 =      G1 =      G2 =      G3 =
  C4 =
21
Speed of Ripple Carry Versus Carry Lookahead
One simple way to model time for logic is to assume each AND and OR gate takes the same time for a signal to pass through it. Time is then estimated by counting the number of gates along the longest path through a piece of logic.
Compare the number of gate delays for the critical paths of two 16-bit adders, one using ripple carry and one using two-level carry lookahead.
22
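Other Design Tricks: Guess
Use multiplexor to save time: guess both ways and then select (assumes mux is faster than adder)
Carry-select adder
[Figure: n-bit adders for the lower bits; for the upper bits, two n-bit adders run in parallel, one assuming carry-in 0 and one assuming carry-in 1, and the Cout of the lower adder drives a multiplexor that selects between them]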
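The guess-and-select idea can be sketched in Python. The function name, the 16-bit width, and the even split into two halves are assumptions chosen for illustration; Python's `+` stands in for each n-bit adder.

```python
def carry_select_add(a: int, b: int, cin: int = 0, bits: int = 16, half: int = 8):
    """Add by computing the upper half for both possible carries, then selecting."""
    lo_mask = (1 << half) - 1
    hi_bits = bits - half
    hi_mask = (1 << hi_bits) - 1

    lo = (a & lo_mask) + (b & lo_mask) + cin
    lo_sum, lo_carry = lo & lo_mask, lo >> half

    hi_a, hi_b = (a >> half) & hi_mask, (b >> half) & hi_mask
    hi_if_0 = hi_a + hi_b          # guess: carry into the upper half is 0
    hi_if_1 = hi_a + hi_b + 1      # guess: carry into the upper half is 1

    hi = hi_if_1 if lo_carry else hi_if_0    # the multiplexor
    return ((hi & hi_mask) << half) | lo_sum, hi >> hi_bits


print(carry_select_add(0x00FF, 0x0001))   # lower half carries into the upper half
print(carry_select_add(0xFFFF, 0x0001))   # carry out of the whole adder
```

In hardware the two upper-half additions proceed in parallel with the lower half, so the total delay is roughly one half-width add plus one multiplexor.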
23
Multiplication
Let's look at 3 versions based on the grade school algorithm
      (multiplicand)
    x (multiplier)
Negative numbers: convert and multiply
  there are better techniques (e.g., Booth's algorithm); we won't look at them
m bits x n bits = m+n bit product
Binary makes it easy:
  0 => place 0 (0 x multiplicand)
  1 => place a copy (1 x multiplicand)
24
Multiplication (version 1)
64-bit Multiplicand register, 64-bit ALU, 64-bit Product register, 32-bit Multiplier register
[Figure: Multiplicand (64 bits, shift left) and Product (64 bits, write) feed the 64-bit ALU; Multiplier (32 bits, shift right); Control drives the shifts and the write]
Multiplier = datapath + control
25
Multiplication Algorithm Version 1
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to product & place the result in Product register
   Multiplier0 = 0: no operation
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions, go back to step 1
   Yes: 32 repetitions, Done
Trace of the steps (4-bit example, multiplier 0011):
   Step                  Multiplier
   initial               0011
   1a. 1 => P = P+Mcand  0011
   2. Shl Mcand          0011
   3. Shr M'ier          0001
   1a. 1 => P = P+Mcand  0001
   2. Shl Mcand          0001
   3. Shr M'ier          0000
   1. 0 => nop           0000
   2. Shl Mcand          0000
   3. Shr M'ier          0000
   1. 0 => nop           0000
   2. Shl Mcand          0000
   3. Shr M'ier          0000
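The version 1 flowchart translates almost line for line into Python. This is a sketch with a hypothetical function name; register widths are modelled with masks, and operands are treated as unsigned.

```python
def multiply_v1(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Shift-and-add multiply with a double-width Multiplicand register."""
    wide_mask = (1 << (2 * bits)) - 1
    mcand = multiplicand & ((1 << bits) - 1)    # sits in the right half of a 2*bits register
    mier = multiplier & ((1 << bits) - 1)
    product = 0
    for _ in range(bits):                        # one repetition per multiplier bit
        if mier & 1:                             # 1. test Multiplier0
            product = (product + mcand) & wide_mask   # 1a. add multiplicand to product
        mcand = (mcand << 1) & wide_mask         # 2. shift Multiplicand left
        mier >>= 1                               # 3. shift Multiplier right
    return product


print(multiply_v1(0b0010, 0b0011, bits=4))   # 2 x 3
```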
26
Observations on Multiplication: Version 1
1 clock per step => about 100 clocks per multiply (32 repetitions of 3 steps)
  Ratio of multiply to add 5:1 to 100:1
Half of the bits in the multiplicand are always 0 => 64-bit adder is wasted
0's are inserted at the right of the multiplicand as it is shifted left => least significant bits of product never change once formed
Instead of shifting multiplicand to left, shift product to right?
27
Multiplication Version 2
32-bit Multiplicand register, 32-bit ALU, 64-bit Product register, 32-bit Multiplier register
[Figure: Multiplicand (32 bits) and the left half of Product feed the 32-bit ALU; Product (64 bits, shift right, write); Multiplier (32 bits, shift right); Control drives the shifts and the write]
28
Multiplication Algorithm Version 2
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to the left half of product & place the result in the left half of Product register
   Multiplier0 = 0: no operation
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions, go back to step 1
   Yes: 32 repetitions, Done
Trace (4-bit example):
   Step                  Multiplier  Multiplicand  Product
   initial               0011        0010          0000 0000
   1a. 1 => P = P+Mcand  0011        0010          0010 0000
   2. Shr P              0011        0010          0001 0000
   3. Shr M'ier          0001        0010          0001 0000
   1a. 1 => P = P+Mcand  0001        0010          0011 0000
   2. Shr P              0001        0010          0001 1000
   3. Shr M'ier          0000        0010          0001 1000
   1. 0 => nop           0000        0010          0001 1000
   2. Shr P              0000        0010          0000 1100
   3. Shr M'ier          0000        0010          0000 1100
   1. 0 => nop           0000        0010          0000 1100
   2. Shr P              0000        0010          0000 0110
   3. Shr M'ier          0000        0010          0000 0110
30
Observations on Multiplication Version 2
Product register wastes space that exactly matches size of multiplier
  => Combine Multiplier register and Product register
31
Multiplication Version 3
32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (0-bit Multiplier register)
[Figure: Multiplicand (32 bits) and the left half of Product feed the 32-bit ALU; Product (64 bits, shift right, write) holds the multiplier in its right half; Control drives the shift and the write]
32
Multiplication Algorithm Version 3
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to the left half of product & place the result in the left half of Product register
   Product0 = 0: no operation
2. Shift the Product register right 1 bit
32nd repetition?
   No: < 32 repetitions, go back to step 1
   Yes: 32 repetitions, Done
Trace (4-bit example, multiplier 0011 loaded into the right half of Product):
   Step                  Multiplicand  Product
   initial               0010          0000 0011
   1a. 1 => P = P+Mcand  0010          0010 0011
   2. Shr P              0010          0001 0001
   1a. 1 => P = P+Mcand  0010          0011 0001
   2. Shr P              0010          0001 1000
   1. 0 => nop           0010          0001 1000
   2. Shr P              0010          0000 1100
   1. 0 => nop           0010          0000 1100
   2. Shr P              0010          0000 0110
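A compact Python sketch of version 3 is shown below. The function name is made up, operands are unsigned, and the Python integer is allowed to hold one extra bit so the carry out of the left-half addition is shifted back in, a detail a real datapath has to handle explicitly.

```python
def multiply_v3(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Multiply with the multiplier stored in the right half of the Product register."""
    mask = (1 << bits) - 1
    product = multiplier & mask                      # left half 0, right half = multiplier
    for _ in range(bits):
        if product & 1:                              # 1. test Product0
            product += (multiplicand & mask) << bits # 1a. add to the left half (carry kept)
        product >>= 1                                # 2. shift Product right
    return product


print(format(multiply_v3(0b0010, 0b0011, bits=4), "08b"))   # 2 x 3
```

Each multiplier bit is consumed from the right end of the register at the same rate that product bits are finalized, so no separate Multiplier register is needed.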
33
Observations on Final Version
2 steps per bit because Multiplier & Product combined
How can you make it faster?
What about signed multiplication?
  Booth's Algorithm
34
Unsigned Combinational Multiplier
Stage i accumulates A * 2^i if Bi == 1
Q: How much hardware for 32 bit multiplier? Critical path?
35
Floating Point (a brief look)
We need a way to represent
  numbers with fractions
  very small numbers
  very large numbers, e.g., 10^9
Representation:
  sign, exponent, significand: (–1)^sign x significand x 2^exponent
  more bits for significand gives more accuracy
  more bits for exponent increases range
IEEE 754 floating point standard:
  single precision: sign bit, 8 bit exponent, 23 bit significand
  double precision: sign bit, 11 bit exponent, 52 bit significand
36
IEEE 754 floating-point standard
Leading "1" bit of significand is implicit
Exponent is "biased" to make sorting easier
  all 0s is smallest exponent, all 1s is largest
  bias of 127 for single precision and 1023 for double precision
  summary: (–1)^sign x (1 + significand) x 2^(exponent – bias)
Example:
  decimal: -0.75 = -3/4 = -3/2^2
  binary: -0.11 = -1.1 x 2^-1
  floating point: exponent = 126 = 01111110
  IEEE single precision (sign bit, 8 bit exponent, 23 bit significand): 1 01111110 10000000000000000000000
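The -3/4 example can be reproduced with Python's standard `struct` module, which packs a float into IEEE 754 single precision. The helper names are illustrative, and the decoding formula below only covers normalized numbers (it ignores zero, denormals, infinity, and NaN).

```python
import struct


def float_to_fields(x: float):
    """Split a value into IEEE 754 single precision sign, exponent, and fraction fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # biased exponent, 8 bits
    fraction = bits & 0x7FFFFF           # 23 stored significand bits
    return sign, exponent, fraction


def fields_to_value(sign: int, exponent: int, fraction: int, bias: int = 127) -> float:
    """(-1)^sign x (1 + significand) x 2^(exponent - bias), normalized numbers only."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - bias)


sign, exponent, fraction = float_to_fields(-0.75)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
print(fields_to_value(sign, exponent, fraction))
```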
37
Floating-Point Arithmetic
Addition example (three digit significand): 9.999 x 10^1 + ... x 10^-1
  Step 1: Align the smaller operand ==> 0.016 x 10^1
  Step 2: Add significands: 9.999 + 0.016 = 10.015
  Step 3: Normalize: 10.015 x 10^1 ==> 1.0015 x 10^2
  Step 4: Round: 1.0015 x 10^2 ==> 1.002 x 10^2
Multiplication example: ... x ... x 10^-5
  Step 1: Add exponents
  Step 2: Multiply significands
  Step 3: Normalize: ... x 10^5 = ... x 10^6
  Step 4: Sign of product: ... x 10^6
38
Floating Point Complexities
Operations are somewhat more complicated
In addition to overflow we can have "underflow"
Accuracy can be a big problem
  IEEE 754 keeps two extra bits, guard and round
  four rounding modes: round up, round down, truncate, nearest even
  positive divided by zero yields "infinity" (see page 300)
  zero divided by zero yields "not a number" (see page 300)
  other complexities
39
Chapter Four Summary
Computer arithmetic is constrained by limited precision
Bit patterns have no inherent meaning but standards do exist
  two's complement
  IEEE 754 floating point
Computer instructions determine "meaning" of the bit patterns
Performance and accuracy are important, so there are many complexities in real machines (i.e., algorithms and implementation)
40
Barrel Shifter
Technology-dependent solutions: transistor per switch
  Qi are data inputs
  Di are outputs
  SRi are control lines that make the connection
Can do N x N shifter in N**2 transistors: 32 x 32 = 1024 transistors
Simplicity led to inclusion even if large shifts are rare
[Figure: switch matrix with control lines SR0, SR1, SR2, SR3]
41
Motivation for Booth's Algorithm
Example 2 x 6 = 0010 x 0110:
         0010
       x 0110
     +   0000    shift (0 in multiplier)
     +  0010     add (1 in multiplier)
     + 0010      add (1 in multiplier)
     + 0000      shift (0 in multiplier)
       0001100
ALU with add or subtract gets same result in more than one way:
  6 = – 2 + 8
  0110 = – 0010 + 1000
For example:
         0010
       x 0110
         0000    shift (0 in multiplier)
     –  0010     sub (first 1 in multiplier)
       0000      shift (mid string of 1s)
     + 0010      add (prior step had last 1)
       0001100
42
Booth's Algorithm
  Current Bit  Bit to the Right  Explanation          Op
  1            0                 Begins run of 1s     sub
  1            1                 Middle of run of 1s  none
  0            1                 End of run of 1s     add
  0            0                 Middle of run of 0s  none
Originally for speed (when shift was faster than add)
Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later an add for the bit after the last one, e.g., 01111 = 10000 – 1
43
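Booth's Example (2 x 7)
  Operation          Multiplicand  Product       next?
  0. initial value   0010          0000 0111 0   10 -> sub
  1a. P = P - m      0010          1110 0111 0   shift P (sign ext)
  1b.                0010          1111 0011 1   11 -> nop, shift
  2.                 0010          1111 1001 1   11 -> nop, shift
  3.                 0010          1111 1100 1   01 -> add
  4a.                0010          0001 1100 1   shift
  4b.                0010          0000 1110 0   done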
44
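Booth's Example (2 x -3)
  Operation          Multiplicand  Product       next?
  0. initial value   0010          0000 1101 0   10 -> sub
  1a. P = P - m      0010          1110 1101 0   shift P (sign ext)
  1b.                0010          1111 0110 1   01 -> add
  2a.                0010          0001 0110 1   shift P
  2b.                0010          0000 1011 0   10 -> sub
  3a.                0010          1110 1011 0   shift
  3b.                0010          1111 0101 1   11 -> nop
  4a.                0010          1111 0101 1   shift
  4b.                0010          1111 1010 1   done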
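Booth's algorithm as used in these examples can be sketched in Python. The function name and register layout (left half, multiplier, one extra bit on the right) are assumptions for illustration, and the sketch does not handle the most negative multiplicand, whose negation does not fit in the left half.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Signed multiply; the Product register holds [left half | multiplier | extra bit]."""
    mask = (1 << bits) - 1
    width = 2 * bits + 1
    m = multiplicand & mask
    p = (multiplier & mask) << 1          # left half 0, extra bit 0
    for _ in range(bits):
        pair = p & 0b11                   # current bit and the bit to its right
        left = p >> (bits + 1)
        if pair == 0b10:                  # begins a run of 1s: subtract
            left = (left - m) & mask
        elif pair == 0b01:                # end of a run of 1s: add
            left = (left + m) & mask
        p = (left << (bits + 1)) | (p & ((1 << (bits + 1)) - 1))
        sign = p >> (width - 1)
        p = (p >> 1) | (sign << (width - 1))   # arithmetic shift right (sign extend)
    product = p >> 1                      # drop the extra bit
    if product >> (2 * bits - 1):         # reinterpret as two's complement
        product -= 1 << (2 * bits)
    return product


print(booth_multiply(2, 7))    # 14
print(booth_multiply(2, -3))   # -6
```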
45
How does it work?
[Figure: 4 x 4 array multiplier; row i combines A3 A2 A1 A0 with Bi (B0 through B3) and the rows are summed to form P7 ... P0]
at each stage shift A left (x 2)
use next bit of B to determine whether to add in shifted multiplicand
accumulate 2n bit partial product at each stage