Structure of Computer Systems Course 3 The Arithmetical and Logical Unit.


ALU - Arithmetical and Logical Unit
- Purpose: computes arithmetical and logical operations
  - arithmetical:
    - basic operations: add, subtract, multiply, divide, modulo
    - special functions: exponential, logarithm, sine, cosine, tangent, arctangent, etc.
  - logical: AND, OR, NOT, inclusive OR, exclusive OR
- Types of arithmetic units:
  - integer arithmetic
  - floating point arithmetic (e.g. Intel's co-processor)
  - signal processing arithmetic (e.g. with saturation, MMX)
  - parallel arithmetic (MMX: integer, SSE2: floating point)

Addition
- the most used operation
- all the other arithmetic operations are based on addition:
  - subtraction: adding the complement
  - multiplication: repeated addition
  - division: repeated subtraction and addition
- an efficient implementation of addition:
  - directly influences all the other operations
  - efficiency: speed and cost (complexity)

Addition
- Basic (full) adder unit: one bit adder
  - inputs: $x_i$, $y_i$, $C_{i-1}$
  - outputs:
    - $S_i = x_i \oplus y_i \oplus C_{i-1}$
    - $C_i = x_i y_i + (x_i \oplus y_i) C_{i-1}$
  - delay: 3 * gate_delay
[Figure: one bit adder block with inputs x_i, y_i, C_{i-1} and outputs S_i, C_i, next to its gate-level schematic]
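The two output equations of the one bit adder can be checked with a few lines of Python. This is only an illustrative sketch; the function name `full_adder` is hypothetical and not part of the course material.

```python
def full_adder(x, y, c_in):
    """One bit full adder: returns (sum, carry_out) for the bits x, y, c_in."""
    s = x ^ y ^ c_in                      # S_i = x_i XOR y_i XOR C_{i-1}
    c_out = (x & y) | ((x ^ y) & c_in)    # C_i = x_i y_i + (x_i XOR y_i) C_{i-1}
    return s, c_out


# Exhaustive check of all 8 input combinations against integer addition.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(x, y, c)
            assert 2 * c_out + s == x + y + c
```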

"n" bit adder with ripple carry
- n bit adder = n * (1 bit full adder)
- delay: n * 3 * gate_delay
- example: n = 32, gate_delay = 10 ns (TTL gate) => delay = 32 * 3 * 10 ns = 960 ns ≈ 1000 ns => f_clk_max = 1 / 1000 ns = 10^6 Hz = 1 MHz !!!
[Figure: chain of n one bit adders; stage i takes x_i, y_i and C_{i-1} and produces S_i and C_i; the carry input of stage 0 is C_{-1} and the carry output of the last stage is C_{n-1}; shown also as a single "n bit adder" block with inputs X, Y and output S]
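A ripple-carry adder can be modeled by chaining the one bit adder, with the carry passed from each stage to the next. The sketch below is a software model under that assumption; `ripple_add` and `ripple_delay_ns` are hypothetical helper names, and the delay function simply reuses the slide's estimate of 3 gate delays per bit.

```python
def full_adder(x, y, c_in):
    """One bit full adder: returns (sum, carry_out)."""
    return x ^ y ^ c_in, (x & y) | ((x ^ y) & c_in)


def ripple_add(x, y, n, c_in=0):
    """Add two n-bit unsigned integers bit by bit; returns (sum, carry_out)."""
    result, carry = 0, c_in
    for i in range(n):                    # bit 0 first: the carry ripples upward
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_delay_ns(n, gate_delay_ns):
    """Worst-case delay, using the estimate of 3 gate delays per bit."""
    return n * 3 * gate_delay_ns


print(ripple_add(12, 10, 8))              # prints (22, 0)
print(ripple_delay_ns(32, 10))            # prints 960
```

With 960 ns per addition the clock cannot run much faster than about 1 MHz, which is the point of the slide's example.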

Subtract
- subtraction = adding the two's complement of the second number
- n bit add and subtract:
  - Add/Sub = 0 => addition
  - Add/Sub = 1 => subtraction
[Figure: n bit ripple-carry adder/subtractor built from 1 bit adders, with an Add/Sub control input]
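A common way to realize this, assumed here because the figure itself did not survive extraction, is to invert every bit of Y when Add/Sub = 1 and to feed Add/Sub into the carry input of bit 0, which together form the two's complement of Y. The function name `add_sub` is hypothetical.

```python
def add_sub(x, y, n, sub):
    """n-bit adder/subtractor: sub=0 gives x+y, sub=1 gives x-y (modulo 2**n).

    Returns (result, carry_out).
    """
    mask = (1 << n) - 1
    y_eff = (y ^ (mask if sub else 0)) & mask   # invert y when subtracting
    total = (x & mask) + y_eff + sub            # Add/Sub also acts as carry-in
    return total & mask, total >> n


print(add_sub(5, 3, 8, 0))   # prints (8, 0)
print(add_sub(5, 3, 8, 1))   # prints (2, 1)
print(add_sub(3, 5, 8, 1))   # prints (254, 0), i.e. -2 in two's complement
```

Note that in this model a carry-out of 1 after a subtraction means that no borrow occurred.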

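Sequence of steps for adding

Step  BUS  SEL  LD_A/  LD_B/  Add/Sub  Wr_m/  Result
1     X    1    0      1      -        1      A <= X
2     Y    0    1      0      0        1      B <= Y
3     -    0    0      1      0        1      A <= X+Y
4     Z    -    1      1      -        0      Z <= X+Y

[Figure: data path with the Data Bus (D0-D15), MUX, Reg. A, Reg. B, Add&Sub unit, Amp. Temp and the control unit driven by Clk and the instruction code; control signals Sel, Ld_A/, Ld_B/, Add/Sub, Wr_m/]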

Improving the Adder: Carry Look-ahead Adder
- Issue: the delay of the carry
- Solution: direct generation of the carry => "carry look-ahead adder"

  $C_i = x_i y_i + (x_i \oplus y_i) C_{i-1} = g_i + p_i C_{i-1}$

  where $g_i = x_i y_i$ is the carry generator and $p_i = x_i \oplus y_i$ is the carry propagator

  $C_0 = x_0 y_0 + (x_0 \oplus y_0) C_{-1} = g_0 + p_0 C_{-1}$
  $C_1 = x_1 y_1 + (x_1 \oplus y_1) C_0 = g_1 + p_1 C_0 = g_1 + p_1 (g_0 + p_0 C_{-1}) = g_1 + p_1 g_0 + p_1 p_0 C_{-1}$
  $C_2 = x_2 y_2 + (x_2 \oplus y_2) C_1 = g_2 + p_2 C_1 = g_2 + p_2 [g_1 + p_1 (g_0 + p_0 C_{-1})] = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_{-1}$
  ...
  $C_i = f(g_0, g_1, \ldots, g_i, p_0, p_1, \ldots, p_i, C_{-1}) = f(x_0, x_1, \ldots, x_i, y_0, y_1, \ldots, y_i, C_{-1})$

- Conclusion: $C_i$ is obtained directly by combining ONLY input signals
- Drawbacks:
  - the circuit's complexity grows quickly with the number of bits (n): $C_i$ needs $i+2$ product terms, the widest of them with $i+2$ inputs
  - it requires gates with a lot of input signals
- Ideal delay = 2 * gate_delay
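The expanded carry equations can be evaluated directly from the generate and propagate signals. The sketch below does that in software; the names `cla_carries` and `cla_add` are hypothetical. The inner loop only enumerates the product terms of one sum-of-products expression, which the hardware would evaluate in parallel in two gate levels.

```python
def cla_carries(x, y, n, c_in=0):
    """Return [C_0, ..., C_{n-1}], each computed only from g, p and C_{-1}."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]      # g_i = x_i y_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]    # p_i = x_i XOR y_i
    carries = []
    for i in range(n):
        # C_i = g_i + p_i g_{i-1} + ... + p_i..p_1 g_0 + p_i..p_0 C_{-1}
        c, prod = 0, 1
        for j in range(i, -1, -1):
            c |= prod & g[j]          # here prod = p_i ... p_{j+1}
            prod &= p[j]
        c |= prod & c_in              # last term: p_i ... p_0 C_{-1}
        carries.append(c)
    return carries


def cla_add(x, y, n, c_in=0):
    """n-bit addition whose sum bits use the look-ahead carries."""
    carries = cla_carries(x, y, n, c_in)
    s = 0
    for i in range(n):
        prev = c_in if i == 0 else carries[i - 1]
        s |= ((((x >> i) ^ (y >> i)) & 1) ^ prev) << i
    return s, carries[-1]


print(cla_add(9, 7, 4))   # prints (0, 1)
```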

Carry Look-ahead Adder - CLU
- generates the result in a shorter time
- a CLU is feasible for 4 bits: the number of gate inputs is limited
- it can be extended by putting together 4 bit adders
[Figure: n one bit adders, each taking x_i, y_i and producing S_i, p_i, g_i; a Carry Look-ahead Unit (CLU) receives C_{-1} and all p_i, g_i and returns the carries C_0 ... C_{n-1}]

Carry Look-ahead Adder
- extension from 4 bits to 16 bits
- Generators and propagators for blocks of bits from "i" to "k":
  - group generate $G_{i,k}$
  - group propagate $P_{i,k}$
- For a block of 4 bits:
  $G_{0,3} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
  $P_{0,3} = p_3 p_2 p_1 p_0$
- Using this notation we obtain the block carries $C_3$, $C_7$, $C_{11}$, $C_{15}$:
  $C_3 = G_{0,3} + P_{0,3} C_{-1}$
  $C_7 = G_{4,7} + P_{4,7} C_3 = G_{4,7} + P_{4,7} (G_{0,3} + P_{0,3} C_{-1})$
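The group signals and block carries can be sketched as follows for 16 bit operands. The helper names `block_gp` and `block_carries` are hypothetical. The loop computes the block carries one after another for brevity; a real look-ahead unit would use the expanded expressions so that all four appear at the same time.

```python
def block_gp(x, y, lo):
    """Group generate and propagate (G, P) for the 4 bit block starting at bit lo."""
    g = [(x >> (lo + k)) & (y >> (lo + k)) & 1 for k in range(4)]
    p = [((x >> (lo + k)) ^ (y >> (lo + k))) & 1 for k in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def block_carries(x, y, c_in=0):
    """Return (C_3, C_7, C_11, C_15) for two 16 bit operands."""
    carries, c = [], c_in
    for lo in (0, 4, 8, 12):
        G, P = block_gp(x, y, lo)
        c = G | (P & c)               # block carry = G + P * (carry into the block)
        carries.append(c)
    return tuple(carries)


print(block_carries(0xFFFF, 0x0001))  # prints (1, 1, 1, 1)
print(block_carries(0x00F0, 0x0010))  # prints (0, 1, 0, 0)
```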

Carry Look-ahead Adder
- 16 bit carry look-ahead adder made of:
  - 4 units of 4 bit carry look-ahead adders
  - one 4 bit carry look-ahead unit
[Figure: four 4 bit adders, each with inputs X, Y for its group of 4 bits and outputs S, group propagate p and group generate g; a 4 bit carry look-ahead unit takes C_{-1} and the four (p, g) pairs and produces the block carries C_3, C_7, C_11, C_15]

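Carry select adder
- Extra hardware to speed up the addition
- Avoids a complex carry look-ahead unit
[Figure: 8 bit carry select adder; one 4 bit adder adds X_{3,0} and Y_{3,0} and produces S_{3,0} and C_3; two 4 bit adders add X_{7,4} and Y_{7,4} with carry-in 0 and 1; a MUX controlled by C_3 selects C_7, S_{7,4}]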
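The idea can be sketched as follows: the upper half is added twice, once for each possible carry-in, and the real carry from the lower half only has to drive a multiplexer. The 8 bit width and the names `add4` and `carry_select_add8` are assumptions made for illustration.

```python
def add4(x, y, c_in):
    """4 bit adder: returns (sum, carry_out)."""
    total = (x & 0xF) + (y & 0xF) + c_in
    return total & 0xF, total >> 4


def carry_select_add8(x, y, c_in=0):
    """8 bit addition from 4 bit adders; returns (sum, carry_out)."""
    s_lo, c3 = add4(x, y, c_in)
    # Both candidates for the upper half are computed without waiting for c3.
    s_hi0, c7_0 = add4(x >> 4, y >> 4, 0)
    s_hi1, c7_1 = add4(x >> 4, y >> 4, 1)
    # The MUX, controlled by C_3, picks the candidate that matches the real carry.
    s_hi, c7 = (s_hi1, c7_1) if c3 else (s_hi0, c7_0)
    return (s_hi << 4) | s_lo, c7


print(carry_select_add8(0x8F, 0x01))   # prints (144, 0), i.e. 90h
```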

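Serial adder
- Adding two sequences of bits with a single 1 bit adder
[Figure: two shift registers deliver A_0, A_1, ..., A_{n-1} and B_0, B_1, ..., B_{n-1} to a 1 bit adder, least significant bit first; a D flip-flop clocked by Clk feeds C_i back as C_{i-1}; the sum bits S_0 ... S_{n-1} enter a shift register]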
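A serial adder can be modeled as a loop with one iteration per clock cycle, where a single variable plays the role of the D flip-flop that stores the carry. The function name `serial_add` and the choice of bit lists (least significant bit first) are assumptions of this sketch.

```python
def serial_add(a_bits, b_bits):
    """Add two equally long bit lists (LSB first); returns (sum_bits, final_carry)."""
    carry = 0                              # state of the D flip-flop
    out = []
    for a, b in zip(a_bits, b_bits):       # one iteration per clock cycle
        out.append(a ^ b ^ carry)          # S_i
        carry = (a & b) | ((a ^ b) & carry)  # C_i, stored for the next cycle
    return out, carry


# 12 + 10 = 22: 1100 + 1010 = 1 0110, written LSB first
print(serial_add([0, 0, 1, 1], [0, 1, 0, 1]))   # prints ([0, 1, 1, 0], 1)
```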

BCD adder
- adding numbers in BCD (binary coded decimal) representation
  - a correction is needed:
    - if the resulting digit is not a valid decimal digit (it is greater than 9)
    - if a carry is generated to the next group of 4 bits (to the next decimal digit)
  - solution: add 6 (in both cases)
- Example:

     89 +
     42
    ---
     CB +     (digit sums: 8 + 4 = Ch, 9 + 2 = Bh)
     66       correction
    ---
    131

[Figure: a 4 bit adder adds X_{3,0} and Y_{3,0} and produces S'_{3,0} and the carry C; a second 4 bit adder adds the correction to obtain S_{3,0}; truth table of the correction signal Corr as a function of S_3, S_2, S_1, S_0 and C]
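The correction rule can be sketched digit by digit. The function names `bcd_digit_add` and `bcd_add` are hypothetical, and digits are stored least significant first purely for convenience.

```python
def bcd_digit_add(x, y, c_in=0):
    """Add two BCD digits (0..9) plus a carry; returns (digit, carry_out)."""
    s = x + y + c_in                  # raw binary sum, 0..19
    carry = s >> 4                    # carry out of the 4 bit adder
    s &= 0xF
    if carry or s > 9:                # invalid digit, or carry to the next digit
        s = (s + 6) & 0xF             # correction: add 6
        carry = 1
    return s, carry


def bcd_add(a, b):
    """Add two lists of decimal digits (least significant first, equal length)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        d, carry = bcd_digit_add(x, y, carry)
        out.append(d)
    return out + [carry]


print(bcd_add([9, 8], [2, 4]))        # 89 + 42: prints [1, 3, 1], i.e. 131
```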

Multiplication
- Multiply = repeated adding

       1100 *      (12)
       1010        (10)
       ----
       0000
      1100
     0000
    1100
    -------
    1111000 = 78h = 120

- Issues:
  - we need a 2n bit adder
  - partial products must be placed in different positions
- Modified multiply:

            00000000     accumulator (AC)
    "0" →   0000000 0    shift right
    "1" →   1100         add
            0001100 0    partial product
            000110 00    shift right
    "0" →   00011 000    shift right
    "1" →   1100         add
            1111 000     final product

- Solution: shift the partial result to the right and add each partial product in the same place
- Advantages:
  - we need just an n bit adder
  - partial products are added in the same place

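Multiplication
[Figure: multiplier circuit; register B (B_S, B_{n-1} ... B_0) loaded from X, accumulator A (A_S, A_{n-1} ... A_0), register Q (Q_S, Q_{n-1} ... Q_0) loaded from Y, an (n+1) bit adder, and a command unit that tests Q_0 and issues Clear, Write and Shift]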

Multiply algorithm
1. Write the operands in the registers (B ← X, Q ← Y) and clear the accumulator (A ← 0)
2. Complement the negative numbers
3. Test $Q_0$:
   - if $Q_0 = 0$, shift A and Q right
   - if $Q_0 = 1$, add A ← A + B, then shift A and Q right
4. Repeat step 3 until $Y_{n-1}$ arrives in $Q_0$; no shift is needed after the last step
5. Compute the sign: $A_S = B_S \oplus Q_S$ (sum modulo 2)
6. If $A_S = 1$, complement the result
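The register-level steps can be sketched as below. The function names are hypothetical, and one detail differs from the listed algorithm and is an assumption of this sketch: the pair (A, Q) is shifted after every one of the n steps, including the last, so that the 2n bit product ends up exactly in A:Q.

```python
def shift_add_multiply(x, y, n):
    """Multiply two n-bit unsigned magnitudes; returns the 2n-bit product."""
    mask = (1 << n) - 1
    a, q, b = 0, y & mask, x & mask       # A <- 0, Q <- Y, B <- X
    for _ in range(n):
        if q & 1:                         # test Q_0
            a += b                        # A <- A + B (the sum may need bit n)
        # Shift the pair (A, Q) right: the low bit of A enters the top of Q.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a >>= 1
    return (a << n) | q


def signed_multiply(x, y, n):
    """Sign-and-magnitude wrapper: multiply magnitudes, then fix the sign."""
    negative = (x < 0) ^ (y < 0)          # A_S = B_S XOR Q_S
    magnitude = shift_add_multiply(abs(x), abs(y), n)
    return -magnitude if negative else magnitude


print(shift_add_multiply(12, 10, 4))      # prints 120
print(signed_multiply(-3, 5, 4))          # prints -15
```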

Multiply with Booth algorithm
- Improvements:
  - multiplies numbers in two's complement; no initial and final complementation are needed
  - for long sequences of 0s and 1s only shift operations are needed:
    - for 0s, it is obvious from the previous method
    - for a sequence of 1s:
      - examples: 1111 = 10000 - 1; 11...1111 = 100...000 - 1
      - a sequence of 1s can be changed into a sequence of 0s
  - only transitions from 0 to 1 or from 1 to 0 need an addition or a subtraction, as follows
- If two consecutive bits of the second operand (the multiplier), written as current bit and the bit to its right, are:
  - 0 and 0: shift the partial result to the right
  - 0 and 1: add the first operand (the multiplicand) and shift the partial result to the right
  - 1 and 0: subtract the first operand (the multiplicand) and shift the partial result to the right
  - 1 and 1: shift the partial result to the right
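A compact way to check the recoding rule is shown below. It is a sketch, not the register-level circuit: instead of shifting the partial result right, it shifts the multiplicand left by the bit position, which gives the same product. The function name `booth_multiply` is hypothetical, and `y` is assumed to fit in n bits of two's complement.

```python
def booth_multiply(x, y, n):
    """Multiply x by the n-bit two's complement multiplier y using Booth recoding."""
    product, prev = 0, 0                  # prev is the bit to the right of the current one
    for i in range(n):
        cur = (y >> i) & 1                # Python's >> keeps the sign, so these are
                                          # the two's complement bits of y
        if (cur, prev) == (1, 0):         # a run of 1s starts: subtract the multiplicand
            product -= x << i
        elif (cur, prev) == (0, 1):       # a run of 1s ends: add the multiplicand
            product += x << i
        prev = cur                        # (0, 0) and (1, 1): nothing to add
    return product


print(booth_multiply(7, 7, 4))            # prints 49
print(booth_multiply(3, -4, 4))           # prints -12
```

For the multiplier 0111 only two operations are needed: one subtraction at bit 0 and one addition at bit 3, since 0111 = 1000 - 0001.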

Division
- Multiple solutions:
  - Compare and subtract
    - hard to compare on different positions
  - Subtract and restore the partial result (if necessary)
    - subtract the second operand from the most significant part of the first operand
    - if the result is positive, then it is OK (the quotient gets a 1)
    - else restore the result by adding back the second operand (the quotient gets a 0)
    - drawback: some steps require 2 arithmetical operations (a subtraction and an addition)
  - Subtract without restoring the partial result
    - try to subtract B from the partial remainder: R' = R - B
    - if a wrong subtraction was made in the previous step, the correction is made in the next step by adding the second operand instead of subtracting it
    - with correction: ((R - B) + B)*2 - B = R*2 - B (the *2 is A shifted one position to the left)
    - without correction: (R - B)*2 + B = R*2 - B
    - advantage: each step needs at most one subtraction or addition
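The third method (no restoring) can be sketched for unsigned operands as follows. The function name `nonrestoring_divide` is hypothetical; the sketch assumes a 2n bit dividend, a non-zero n bit divisor and a quotient that fits in n bits.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Unsigned non-restoring division; returns (quotient, remainder).

    Assumes divisor > 0 and (dividend >> n) < divisor.
    """
    mask = (1 << n) - 1
    a, q = dividend >> n, dividend & mask     # A holds the upper half, Q the lower
    for _ in range(n):
        top = q >> (n - 1)                    # bit shifted from Q into A
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + top - divisor         # previous remainder was fine: subtract
        else:
            a = 2 * a + top + divisor         # previous subtraction overshot: add instead
        if a >= 0:
            q |= 1                            # quotient bit is 1 for a non-negative remainder
    if a < 0:
        a += divisor                          # one final correction of the remainder
    return q, a


print(nonrestoring_divide(100, 7, 4))         # prints (14, 2)
```

Each iteration performs exactly one addition or subtraction, which is the advantage named above; only the final remainder may need one extra addition.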

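Division circuit for the second method: restoring the partial result
[Figure: divider circuit; registers A (A_S, A_{n-1} ... A_0) and Q (Q_S, Q_{n-1} ... Q_0) hold the first operand X, register B (B_S, B_{n-1} ... B_0) holds the second operand Y; an adder/subtractor with an Add/Sub control input and a command unit]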

Division algorithm: with restoring the partial result
1. Load the first operand in A and Q; load the second operand in B
2. Write $A_S \oplus B_S$ in $Q_S$. If $A_S = 1$, complement A, Q. If $B_S = 1$, complement B
3. Tests:
   - A ≥ B: overflow
   - B = 0: division by 0
   - A = 0 and Q < B: result = 0
4. Shift A, Q to the left and put 0 in $Q_0$
5. Subtract B from A and put the result in A:
   - if $A_S = 0$ (positive remainder), shift A, Q to the left and put 1 in $Q_0$
   - else ($A_S = 1$, negative remainder), add B to A, shift A, Q to the left and put 0 in $Q_0$
6. Repeat step 5 n times
7. Round the result: if A ≥ B, add 1 to Q
8. If $Q_S = 1$, complement register Q
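The core loop of the restoring method (shift, trial subtraction, restore if negative) can be sketched for unsigned operands as below. Sign handling and rounding (steps 2, 7 and 8) are left out, and the function name `restoring_divide` is hypothetical.

```python
def restoring_divide(dividend, divisor, n):
    """Unsigned restoring division of a 2n-bit dividend by an n-bit divisor.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by 0")
    mask = (1 << n) - 1
    a, q = dividend >> n, dividend & mask     # A holds the upper half, Q the lower
    if a >= divisor:
        raise OverflowError("quotient does not fit in n bits")
    for _ in range(n):
        a = (a << 1) | (q >> (n - 1))         # shift (A, Q) left by one position
        q = (q << 1) & mask                   # Q_0 is 0 for now
        a -= divisor                          # trial subtraction
        if a < 0:
            a += divisor                      # restore; the quotient bit stays 0
        else:
            q |= 1                            # positive remainder: quotient bit is 1
    return q, a


print(restoring_divide(120, 12, 4))           # prints (10, 0)
print(restoring_divide(100, 7, 4))            # prints (14, 2)
```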

Multiply with look-up tables
- Principle: all the results are pre-computed and stored in a non-volatile memory
- Multiplication becomes a simple read from the memory
- The operands form the address of the location where the result is stored
- Problem: the memory must have $2^{2n}$ locations
  - Examples:
    - 8*8 bits => 16 address lines => $2^{16}$ = 64 K locations
    - 16*16 bits => 32 address lines => $2^{32}$ = 4 G locations (TOO MUCH)
- Solution: multiply 8*8 bits in multiple steps to obtain multiplication on 16, 32 or 64 bits
  - Example:
    $X = X_{15,8} X_{7,0}$, $Y = Y_{15,8} Y_{7,0}$
    $P = X \cdot Y = X_{7,0} Y_{7,0} + X_{15,8} Y_{7,0} \cdot 2^8 + X_{7,0} Y_{15,8} \cdot 2^8 + X_{15,8} Y_{15,8} \cdot 2^{16}$
- Observation: the multiplications by $2^8$ and $2^{16}$ are achieved by placing the result in the proper binary position; also, the first and the last partial products may be combined in a single 32 bit register with no addition required
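The decomposition can be sketched with a Python list standing in for the memory. The names `TABLE`, `lookup8` and `multiply16` are hypothetical, and the order of the two operands inside the address is an arbitrary choice.

```python
# Look-up table for 8*8 bit products: 2**16 locations of 16 bits each.
TABLE = [a * b for a in range(256) for b in range(256)]


def lookup8(a, b):
    """Read an 8*8 bit product; the concatenated operands form the address."""
    return TABLE[(a << 8) | b]


def multiply16(x, y):
    """16*16 bit multiplication built from four table reads."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    p0 = lookup8(xl, yl)                  # X_{7,0}  * Y_{7,0}
    p1 = lookup8(xh, yl)                  # X_{15,8} * Y_{7,0}
    p2 = lookup8(xl, yh)                  # X_{7,0}  * Y_{15,8}
    p3 = lookup8(xh, yh)                  # X_{15,8} * Y_{15,8}
    acc = (p3 << 16) | p0                 # P3 and P0 occupy disjoint bits: no adder needed
    return acc + ((p1 + p2) << 8)         # the two middle products are added at bit 8


print(len(TABLE))                         # prints 65536
print(multiply16(1234, 5678))             # prints 7006652
```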

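Multiply with look-up table
[Figure: data path; registers X_{15,0} and Y_{15,0} (written with Wr X, Wr Y) are split into X_{15,8}, X_{7,0}, Y_{15,8}, Y_{7,0}; multiplexers controlled by Sel 0, Sel 1, Sel 2 select the operand halves that form the address A_{15,0} of the look-up table memory; the data output D_{15,0} is written into the partial product registers X_{7,0}*Y_{7,0} (Wr P0), X_{15,8}*Y_{7,0} and X_{7,0}*Y_{15,8} (Wr P1,2), X_{15,8}*Y_{15,8} (Wr P3); an adder and an accumulator (Wr Acc) combine them under the control unit]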

Multiply with look-up table

Step  Wr X  Wr Y  Wr P0  Wr P1,2  Wr P3  Wr Acc  Sel 0  Sel 1  Sel 2  Description
1     1     1     0      0        0      0       0      0      0      Load operands
2     0     0     1      0        0      0       0      0      0      Write P0
3     0     0     0      0        1      0       1      1      0      Write P3
4     0     0     0      1        0      0       1      0      0      Write P1
5     0     0     0      0        0      1       0      1      0      Acc = P0 + P3 + P1
6     0     0     0      1        0      0       0      1      0      Write P2
7     0     0     0      0        0      1       0      0      1      Acc = Acc + P2

- Multiplication with a look-up table requires only 7 steps instead of 16-20
- it can be further optimized

Arithmetical operations in floating point (FP) representation
- Floating point representation of a number:
  - used in case of very big or very small numbers
  - 3 fields for representation:
    - sign
    - exponent: the magnitude of the number
    - mantissa: some significant figures (digits) of the number
- IT IS NOT THE REPRESENTATION OF REAL NUMBERS from mathematics !!!!!
- A lot of anomalies and precision problems:
  - operating with numbers having different magnitudes may generate errors caused by rounding:
    - M + m - M = 0 ; M - M + m = m
  - numbers with fractional parts have, in most cases, no precise FP representation
    - example: 0.3 has no precise representation in floating point
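Both anomalies are easy to reproduce with Python floats, which are 64 bit IEEE 754 doubles in the standard CPython interpreter. The values of `M` and `m` are arbitrary choices for illustration.

```python
M, m = 1e20, 1.0

lost = (M + m) - M        # m is absorbed when it is added to the much larger M
kept = (M - M) + m        # the same terms in another order keep m
print(lost, kept)         # prints 0.0 1.0

# 0.1, 0.2 and 0.3 have no exact binary representation, so the rounding
# errors of the operands and of the sum do not cancel.
print(0.1 + 0.2 == 0.3)   # prints False
```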

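Floating point adder/subtracter
[Figure: registers X and Y, each with sign (S), exponent and mantissa fields; a comparator (<, =, >) on the exponents; Inc/Dec logic for the exponents and Shift right for the mantissas; an Add & subtract unit for the mantissas with an Add/Sub control; a control unit]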

Adding floating point numbers
1. Load the operands
2. Compare the exponents (5 cases):
   - $e_x = e_y$: add the mantissas and copy the exponent
   - $e_x > e_y$ and $(e_x - e_y)$ < number of bits in the mantissa: mantissa $m_y$ is aligned by shifting it $e_x - e_y$ positions to the right
   - $e_x \gg e_y$ and $(e_x - e_y)$ ≥ number of bits in the mantissa: X is copied in the result (Y is too small); go to step 4
   - $e_x < e_y$ and $(e_y - e_x)$ < number of bits in the mantissa: mantissa $m_x$ is aligned by shifting it $e_y - e_x$ positions to the right
   - $e_x \ll e_y$ and $(e_y - e_x)$ ≥ number of bits in the mantissa: Y is copied in the result (X is too small); go to step 4
3. Add the mantissas
4. Realign (normalize) the result if necessary: shift the resulting mantissa to the right or to the left until the integer part is 0 and the first bit after the binary point is 1; at the same time, increment or decrement the exponent in accordance with the shift
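The alignment and normalization steps can be sketched with a toy format. Everything about the format is an assumption made for illustration: positive numbers only, an 8 bit mantissa stored as an integer m with value = (m / 2^8) * 2^e, normalized so that the first bit after the binary point is 1 (128 ≤ m < 256), and truncation of the bits that are shifted out. The names `MANT_BITS`, `fp_add` and `to_float` are hypothetical.

```python
MANT_BITS = 8   # toy format: value = (m / 2**MANT_BITS) * 2**e, with 128 <= m < 256


def fp_add(ex, mx, ey, my):
    """Add two positive normalized toy numbers (e, m); returns (e, m)."""
    if ex < ey:                               # make X the operand with the larger exponent
        ex, mx, ey, my = ey, my, ex, mx
    diff = ex - ey
    if diff >= MANT_BITS:                     # Y is too small: the result is X
        return ex, mx
    m = mx + (my >> diff)                     # align Y's mantissa, then add
    e = ex
    if m >> MANT_BITS:                        # integer part is not 0: shift right once
        m >>= 1
        e += 1
    return e, m


def to_float(e, m):
    """Value of a toy number as a Python float."""
    return m / 2**MANT_BITS * 2.0**e


print(to_float(*fp_add(3, 128, 1, 128)))      # 4.0 + 1.0: prints 5.0
print(fp_add(10, 128, 1, 128))                # second operand too small: prints (10, 128)
```

With two positive operands the sum can only overflow to the left, so one right shift is enough; the left shifts mentioned in step 4 are needed after a subtraction, which this sketch does not cover.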

Multiply and division in floating point representation
- Multiply:
  - add the exponents
  - multiply the mantissas
  - adjust the result (shift the mantissa to the left and decrement the exponent if necessary)
- Division:
  - subtract the exponents
  - divide the mantissas
  - adjust the result (if necessary)

Add and Subtract with saturation
- Idea: if there is an overflow or an underflow after an addition or a subtraction, the result should be the maximum or the minimum possible value
- Example, unsigned 8 bit representation:

  Normal adding (wraparound)           With saturation
  80h + 90h = 10h (error, overflow)    80h + 90h = FFh (maximum value)
  80h - 90h = F0h (underflow)          80h - 90h = 00h (minimum value)

- Example, signed (two's complement) 8 bit representation:

  Normal adding (wraparound)           With saturation
  70h + 20h = 90h (error, negative)    70h + 20h = 7Fh (maximum value)
  80h - 20h = 60h (error, positive)    80h - 20h = 80h (minimum value)
  (-128 - 32 wraps around to +96)

- Used in case of:
  - signal processing
  - multimedia processing
- Typical signal processing operation: amplification, $U_e = U_i \cdot A$
  - supply: +10 V; -10 V
  - $U_i$ = 0.05 V; A = 100 => $U_e$ = 5 V
  - $U_i$ = 1.00 V; A = 100 => $U_e$ = 10 V !!! (upper saturation)
[Figure: amplifier with input U_i, output U_e and resistors R_1, R_2]
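The difference between wraparound and saturation can be reproduced with a few helper functions. The names are hypothetical; the signed helpers take and return ordinary Python integers in the range -128..127 rather than raw bit patterns.

```python
def wrap_u8(v):
    """Ordinary 8 bit unsigned result: keep only the low 8 bits."""
    return v & 0xFF


def add_sat_u8(x, y):
    """Unsigned 8 bit addition with saturation at FFh."""
    return min(x + y, 0xFF)


def sub_sat_u8(x, y):
    """Unsigned 8 bit subtraction with saturation at 00h."""
    return max(x - y, 0x00)


def add_sat_s8(x, y):
    """Signed 8 bit addition with saturation at -128 and +127."""
    return max(-128, min(127, x + y))


def sub_sat_s8(x, y):
    """Signed 8 bit subtraction with saturation at -128 and +127."""
    return max(-128, min(127, x - y))


print(hex(wrap_u8(0x80 + 0x90)), hex(add_sat_u8(0x80, 0x90)))   # prints 0x10 0xff
print(hex(wrap_u8(0x80 - 0x90)), hex(sub_sat_u8(0x80, 0x90)))   # prints 0xf0 0x0
print(add_sat_s8(0x70, 0x20), sub_sat_s8(-128, 32))             # prints 127 -128
```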

Add and Subtract with saturation
- Add and subtract with saturation for unsigned 8 bit representation
- the result is selected with a multiplexer:
  - Carry (C) = 0 => the result is correct
  - C = 1 and adding => overflow, result = FFh
  - C = 1 and subtract => underflow, result = 00h

Carry C  Add/Sub  Operation  Result           S1  S0
0        0        adding     correct, X+Y     1   X
0        1        subtract   correct, X-Y     1   X
1        0        adding     overflow, FFh    0   1
1        1        subtract   underflow, 00h   0   0

- homework: do it for two's complement
[Figure: Add&Sub unit with inputs X_{7,0}, Y_{7,0} and Add/Sub; its output feeds MUX inputs 3 and 2, the constant FF feeds input 1 and the constant 00 feeds input 0; the MUX, selected by S1 and S0, outputs S_{7,0}; Karnaugh maps of S1 and S0 as functions of C and Add/Sub]
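The selection table can be modeled as below. Two points are assumptions of this sketch rather than statements from the slides: C is treated as the carry for additions and as the borrow for subtractions, and the select equations S1 = NOT C, S0 = NOT Add/Sub are one choice that is consistent with the table (S0 is a don't-care when C = 0). The function name `sat_unit_u8` is hypothetical.

```python
def sat_unit_u8(x, y, sub):
    """Unsigned 8 bit add (sub=0) or subtract (sub=1) with saturation via a MUX."""
    raw = x - y if sub else x + y
    c = 1 if (raw < 0 or raw > 0xFF) else 0   # carry (adding) or borrow (subtract)
    s1 = 1 - c                                # S1 = NOT C
    s0 = 1 - sub                              # S0 = NOT Add/Sub (don't-care when C = 0)
    mux_inputs = {
        0: 0x00,                              # input 0: minimum value
        1: 0xFF,                              # input 1: maximum value
        2: raw & 0xFF,                        # inputs 2 and 3: adder output
        3: raw & 0xFF,
    }
    return mux_inputs[(s1 << 1) | s0]


print(hex(sat_unit_u8(0x80, 0x90, 0)))        # prints 0xff
print(hex(sat_unit_u8(0x80, 0x90, 1)))        # prints 0x0
print(hex(sat_unit_u8(0x10, 0x20, 0)))        # prints 0x30
```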
