CS 224 Computer Organization Spring 2011 Arithmetic for Computers With thanks to M.J. Irwin, D. Patterson, and J. Hennessy for some lecture slide contents.


CS224 Spring 2010, Chapter 3
Numbers
- Bits are just bits (no inherent meaning): conventions define the relationship between bits and numbers
- Binary numbers (base 2): 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
  - decimal: 0 ... 2^n - 1
- Of course it gets more complicated:
  - numbers are finite (overflow)
  - fractions and real numbers
  - negative numbers, e.g., there is no MIPS subi instruction; addi can add a negative number
- How do we represent negative numbers? That is, which bit patterns will represent which numbers?

Number Representations
- 32-bit signed numbers (2's complement), MSB on the left and LSB on the right:
    0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
    0000 0000 0000 0000 0000 0000 0000 0001_two = +1_ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110_two = +2,147,483,646_ten
    0111 1111 1111 1111 1111 1111 1111 1111_two = +2,147,483,647_ten   (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000_two = -2,147,483,648_ten   (minint)
    1000 0000 0000 0000 0000 0000 0000 0001_two = -2,147,483,647_ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1110_two = -2_ten
    1111 1111 1111 1111 1111 1111 1111 1111_two = -1_ten
- Converting values narrower than 32 bits into 32-bit values
  - copy the most significant bit (the sign bit) into the "empty" bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  - sign extend versus zero extend (lb vs. lbu)
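The bit patterns and the sign-extension rule on this slide can be checked with a small sketch. Python is used only for illustration; the helper names to_signed, sign_extend and zero_extend are hypothetical and do not come from the slides.

```python
def to_signed(pattern, bits=32):
    """Interpret an unsigned bit pattern as a two's complement value."""
    sign_bit = 1 << (bits - 1)
    # The MSB carries weight -2^(bits-1); every other bit has a positive weight.
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)


def sign_extend(pattern, from_bits, to_bits=32):
    """Copy the sign bit into the upper bits, as lb does for a loaded byte."""
    return to_signed(pattern, from_bits) & ((1 << to_bits) - 1)


def zero_extend(pattern, from_bits):
    """Fill the upper bits with zeros, as lbu does for a loaded byte."""
    return pattern & ((1 << from_bits) - 1)
```

For example, sign_extend(0b1010, 4, 8) yields the pattern 1111 1010, while zero_extend(0b1010, 4) leaves the value as 0000 1010.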

MIPS Arithmetic Logic Unit (ALU)
- Must support the arithmetic/logic operations of the ISA
  - add, addi, addiu, addu
  - sub, subu
  - mult, multu, div, divu
  - sqrt
  - and, andi, nor, or, ori, xor, xori
  - beq, bne, slt, slti, sltu, sltiu
- With special handling for
  - sign extend: addi, addiu, slti, sltiu
  - zero extend: andi, ori, xori
  - overflow detection: add, addi, sub
[Figure: ALU block with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and ovf outputs]

Dealing with Overflow
- Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit
- When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

    Operation | Operand A | Operand B | Result indicating overflow
    A + B     | ≥ 0       | ≥ 0       | < 0
    A + B     | < 0       | < 0       | ≥ 0
    A - B     | ≥ 0       | < 0       | < 0
    A - B     | < 0       | ≥ 0       | ≥ 0

- MIPS signals overflow with an exception (a.k.a. interrupt): an unscheduled procedure call to the OS, where the Exception Program Counter (EPC) contains the address of the instruction that caused the exception
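The sign rules for overflow can be written out as a short sketch. This models only the detection logic, not the MIPS exception mechanism; the function names are hypothetical, and both operands are assumed to already be valid signed 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def wrap32(value):
    """Keep the low 32 bits and reinterpret them as a signed value."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add_with_overflow(a, b):
    """Return (wrapped result, overflow flag) for signed a + b."""
    result = wrap32(a + b)
    # Overflow needs operands of the same sign and a result of the other sign.
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow


def sub_with_overflow(a, b):
    """Return (wrapped result, overflow flag) for signed a - b."""
    result = wrap32(a - b)
    # Overflow needs operands of different signs and a result whose sign differs from a.
    overflow = ((a >= 0) != (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow
```

Adding 1 to maxint, for instance, wraps to minint and sets the flag, while mixed-sign additions never do.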

A MIPS ALU Implementation
[Figure: 32 one-bit ALU slices (A0/B0/result0 through A31/B31/result31), each with a "less" input, plus a "set" signal, add/subt and op controls, and ovf and zero outputs]
- Enable overflow bit setting for signed arithmetic (add, addi, sub)
- Zero detect (slt, slti, sltiu, sltu, beq, bne)

But What about Performance?
- Critical path of an n-bit ripple-carry adder is n*CP, where CP is the delay through one 1-bit stage
- Design trick: compute the carries in parallel with a Carry Lookahead Unit (CLU)
[Figure: four 1-bit ALUs (bits 0 to 3) in a ripple-carry chain; each CarryOut feeds the next CarryIn]
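The slide names the carry-lookahead idea without giving equations. The sketch below uses the standard generate/propagate formulation (g = a AND b, p = a OR b), which is an assumption about what the CLU computes rather than something stated on the slide. Each carry is expressed directly in terms of g, p and the initial carry, so hardware could evaluate all of them at once; the Python loops only enumerate the terms of that flat expression.

```python
def lookahead_carries(a, b, n=4, carry_in=0):
    """Return [c0, c1, ..., cn], each derived only from g, p and c0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate: both bits are 1
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # propagate: at least one bit is 1
    carries = [carry_in]
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]..p[1]g[0] | p[i]..p[0]c0
        carry = 0
        chain = 1  # running AND of p[i], p[i-1], ...
        for j in range(i, -1, -1):
            carry |= chain & g[j]
            chain &= p[j]
        carry |= chain & carry_in
        carries.append(carry)
    return carries


def lookahead_add(a, b, n=4, carry_in=0):
    """Add two n-bit values using the lookahead carries; returns (sum, carry_out)."""
    c = lookahead_carries(a, b, n, carry_in)
    total = 0
    for i in range(n):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total, c[n]
```

In a ripple-carry adder each carry waits for the previous one; here no carry depends on another carry, only on the operand bits and c0.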

Multiply
- Binary multiplication is just a series of shifts and adds
- The partial products can be formed in parallel and added in parallel for faster multiplication
[Figure: an n-bit multiplicand and n-bit multiplier form a partial product array that sums to a 2n-bit double precision product]
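A minimal sketch of shift-and-add multiplication for unsigned operands follows. It assumes the textbook sequential scheme in which the multiplicand shifts left and the multiplier shifts right each step; the slide's figure is not recoverable, so that arrangement and the function name are assumptions.

```python
def shift_add_multiply(multiplicand, multiplier, n=32):
    """Multiply two unsigned n-bit values into a 2n-bit product."""
    product = 0
    for _ in range(n):
        if multiplier & 1:           # low multiplier bit selects this partial product
            product += multiplicand
        multiplicand <<= 1           # next partial product has twice the weight
        multiplier >>= 1             # expose the next multiplier bit
    return product & ((1 << (2 * n)) - 1)
```

Each loop iteration corresponds to one row of the partial product array.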

Multiplication Hardware
[Figure: sequential multiplier datapath; the product register is initially 0]

Optimized Multiplier
- Perform steps in parallel: add/shift
- One cycle per partial-product addition
  - That's OK if the frequency of multiplications is low

Fast Multiplication Hardware
- Can build a faster multiplier by using a parallel tree of adders, with one 32-bit adder for each bit of the multiplier at the base
- Rather than use a single 32-bit adder 31 times, this hardware "unrolls the loop" to use 31 adders, organized in a "tree" to minimize delay
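The tree organization can be illustrated by summing the partial products pairwise, level by level. This is a software sketch of the idea only (real hardware would typically use carry-save adders, which the slide does not discuss); the function name and return shape are hypothetical.

```python
def tree_multiply(multiplicand, multiplier, n=32):
    """Return (product, tree levels, adders used) for unsigned n-bit operands."""
    # One partial product per multiplier bit, already shifted into position.
    partials = [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(n)]
    levels = 0
    adders = 0
    while len(partials) > 1:
        # All additions within one level are independent of each other,
        # so hardware can perform them at the same time.
        next_level = [partials[i] + partials[i + 1] for i in range(0, len(partials) - 1, 2)]
        adders += len(next_level)
        if len(partials) % 2:
            next_level.append(partials[-1])  # odd one out passes through
        partials = next_level
        levels += 1
    return partials[0], levels, adders
```

For n = 32 the tree uses 31 adders but only 5 levels of delay, compared with 31 sequential additions in the loop version.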

Multiply in MIPS
- Product stored in two 32-bit registers called Hi and Lo
    mult  $s0, $s1       # $s0 * $s1 -> Hi/Lo
    multu $s0, $s1       # $s0 * $s1 (unsigned)
- Results are moved from Hi/Lo
    mfhi $t0             # $t0 = Hi
    mflo $t0             # $t0 = Lo
- Pseudoinstruction
    mul $t0, $s0, $s1    # $t0 = $s0 * $s1
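As a rough model of how a 64-bit product is split across the two result registers, the sketch below returns the upper and lower 32 bits as a pair. The Python functions merely borrow the instruction names for readability; they are illustrative stand-ins, not a simulator.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mult(rs, rt):
    """Signed multiply: rs and rt are signed values; returns (hi, lo) bit patterns."""
    product = (rs * rt) & MASK64     # 64-bit two's complement pattern of the product
    return product >> 32, product & MASK32


def multu(rs, rt):
    """Unsigned multiply: operands are treated as 32-bit unsigned patterns."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32
```

The pair mult(0x10000, 0x10000) is (1, 0): the product 2^32 does not fit in the lower word, which is why code that needs the full result must read both registers.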

Divide
Long-hand division (108 ÷ 12 = 9):

                1 0 0 1        (n-bit Quotient)
        +---------------
1 1 0 0 | 1 1 0 1 1 0 0        (Divisor | Dividend)
        - 1 1 0 0
          -------
            0 0 1 1
          - 0 0 0 0
            -------
              0 1 1 0
            - 0 0 0 0
              -------
                1 1 0 0
              - 1 1 0 0
                -------
                0 0 0 0        (Remainder)

N + 1 steps

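Solution #1: Operations
Start
1.  Remainder = Remainder - Divisor
    Test Remainder:
2a. Remainder ≥ 0: shift Quotient left and set the new rightmost bit to 1
2b. Remainder < 0: restore Remainder (add the Divisor back); shift Quotient left and set the new rightmost bit to 0
3.  Shift Divisor right 1 bit
    33rd repetition? If fewer than 33 repetitions, go back to step 1; after 33 repetitions, Done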
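The flowchart can be traced with a short sketch for unsigned operands. It assumes, as in the textbook version of this algorithm, that the divisor starts in the upper half of a 2n-bit register; the function name is hypothetical and a nonzero divisor is assumed.

```python
def restoring_divide(dividend, divisor, n=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    remainder = dividend
    shifted_divisor = divisor << n          # divisor starts in the left half
    quotient = 0
    for _ in range(n + 1):                  # n + 1 repetitions (33 for n = 32)
        remainder -= shifted_divisor        # step 1
        if remainder >= 0:
            quotient = ((quotient << 1) | 1) & mask   # step 2a
        else:
            remainder += shifted_divisor    # step 2b: restore
            quotient = (quotient << 1) & mask
        shifted_divisor >>= 1               # step 3
    return quotient, remainder
```

With n = 4, dividing 7 by 2 takes five repetitions and produces quotient 3 and remainder 1; the first repetition always fails because the divisor is still entirely in the upper half.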

Solution #1: Hardware
[Figure: 64-bit Divisor register (shift right), 64-bit ALU, 64-bit Remainder register (write), 32-bit Quotient register (shift left), and Control]

Issues and Alternative
Notice that:
- Half of the divisor bits are always 0
  - Either high or low bits are 0, even as shifted
- Half of the 64-bit adder is therefore wasted
Solution #2:
- Instead of shifting the divisor right, shift the remainder left
- Adder only needs to be 32 bits wide
- Can also remove an iteration by switching the order to shift and then subtract
- Remainder is in the left half of the register

Solution #2: Hardware
[Figure: 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register (shift left, write), 32-bit Quotient register (shift left), and Control]

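Solution #3: Operations
Start
1.  Shift Remainder left 1 bit
2.  Rem (left half) = Rem (left half) - Div
    Test Remainder:
3a. Remainder ≥ 0: shift Rem left and set the new rightmost bit to 1
3b. Remainder < 0: restore Rem; shift Rem left and set the new rightmost bit to 0
    32nd repetition? If fewer than 32 repetitions, go back to step 2; after 32 repetitions, Done: shift the left half of Rem right 1 bit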
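The shared-register version can be sketched the same way. Here one integer stands in for the 2n-bit Remainder register, whose right half gradually fills with quotient bits. The function name is hypothetical, a nonzero divisor is assumed, and Python's unbounded integers sidestep the extra carry bit that real hardware would need for very large divisors.

```python
def divide_shared_register(dividend, divisor, n=32):
    """Unsigned division with quotient and remainder sharing one register."""
    mask = (1 << n) - 1
    rem = (dividend & mask) << 1                 # step 1: initial shift left
    for _ in range(n):
        left = (rem >> n) - divisor              # step 2: subtract from the left half
        if left >= 0:
            rem = (left << n) | (rem & mask)     # keep the difference
            rem = (rem << 1) | 1                 # step 3a: shift left, set bit to 1
        else:
            rem = rem << 1                       # step 3b: left half unchanged (restored), bit is 0
    quotient = rem & mask                        # right half holds the quotient
    remainder = (rem >> n) >> 1                  # done: shift the left half right 1 bit
    return quotient, remainder
```

Compared with Solution #1, this needs only n repetitions and no separate quotient register.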

Solution #3: Hardware
[Figure: 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register (shift left, write), and Control; there is no separate Quotient register]

Divide in MIPS
- Quotient and remainder stored in two 32-bit registers called Hi and Lo
    div  $s0, $s1        # $s0 / $s1 -> Hi/Lo
    divu $s0, $s1        # $s0 / $s1 (unsigned)
- Results are moved from Hi/Lo
    mfhi $t0             # $t0 = Hi (remainder)
    mflo $t0             # $t0 = Lo (quotient)
- Pseudoinstructions
    div $t0, $s0, $s1    # $t0 = $s0 / $s1
    rem $t0, $s0, $s1    # $t0 = $s0 % $s1

Floating Point (§3.5)
- Representation for non-integer numbers
  - Including very small and very large numbers
- Like scientific notation
  - -2.34 × 10^56      (normalized)
  - +0.002 × 10^-4     (not normalized)
  - +987.02 × 10^9     (not normalized)
- In binary
  - ±1.xxxxxxx_two × 2^yyyy
- Types float and double in C

Floating Point Standard
- Defined by IEEE Std 754-1985
- Developed in response to divergence of representations
  - Portability issues for scientific code
- Now almost universally adopted
- Two representations
  - Single precision (32-bit)
  - Double precision (64-bit)

IEEE Floating-Point Format

    | S | Exponent                        | Fraction                         |
          single: 8 bits, double: 11 bits   single: 23 bits, double: 52 bits

- S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
- Normalize significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
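The three fields can be pulled out of a real value with Python's standard struct module, which packs a number into its IEEE 754 byte representation. The function names below are hypothetical.

```python
import struct


def float_fields(x):
    """Return (sign, biased exponent, fraction) of x as a single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x):
    """Return (sign, biased exponent, fraction) of x as a double-precision value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & 0xFFFFFFFFFFFFF
```

For 1.0 the stored exponent equals the bias (127 or 1023) and the fraction is zero, because the leading 1 is the hidden bit.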

Single-Precision Range
- Exponents 00000000 and 11111111 are reserved
- Smallest value
  - Exponent: 00000001 ⇒ actual exponent = 1 - 127 = -126
  - Fraction: 000...00 ⇒ significand = 1.0
  - ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
- Largest value
  - Exponent: 11111110 ⇒ actual exponent = 254 - 127 = +127
  - Fraction: 111...11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38

IEEE 754 Double Precision
- Double precision number represented in 64 bits
- MIPS format (two 32-bit words):
    word 1: | S (sign, 1 bit) | E (11 bits) | F (20 bits) |
    word 2: | F, continued (32 bits)                      |
- Value: (-1)^S × Significand × 2^E, or (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
- Exponent: bias 1023 binary integer, 0 < E < 2047
- Significand: magnitude, normalized binary significand with hidden bit (1): 1.F

Double-Precision Range
- Exponents 0000...00 and 1111...11 are reserved
- Smallest value
  - Exponent: 00000000001 ⇒ actual exponent = 1 - 1023 = -1022
  - Fraction: 000...00 ⇒ significand = 1.0
  - ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
- Largest value
  - Exponent: 11111111110 ⇒ actual exponent = 2046 - 1023 = +1023
  - Fraction: 111...11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308

IEEE 754 FP Standard Encoding
- Special encodings are used to represent unusual events
  - ± infinity for division by zero
  - NaN (not a number) for the results of invalid operations such as 0/0
  - True zero is the bit string all zero

    Single Precision                   | Double Precision                    | Object Represented
    E (8)                   | F (23)   | E (11)                   | F (52)   |
    0000 0000               | 0        | 0000...0000              | 0        | true zero (0)
    0000 0000               | nonzero  | 0000...0000              | nonzero  | ± denormalized number
    0000 0001 to 1111 1110  | anything | 0000...0001 to 1111...1110 | anything | ± floating point number
    1111 1111               | 0        | 1111...1111              | 0        | ± infinity
    1111 1111               | nonzero  | 1111...1111              | nonzero  | not a number (NaN)
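The encoding table translates directly into a classifier over the exponent and fraction fields. The sketch covers single precision only; the function name and the returned labels are hypothetical.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern according to the encoding table."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"
```

The sign bit plays no part in the classification, so 0x7F800000 and 0xFF800000 are both reported as infinity.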

Floating-Point Precision
- Relative precision
  - All fraction bits are significant
  - Single: approx 2^-23
    - Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^-52
    - Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
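The digit estimates can be reproduced numerically. The slide rounds log10(2) to 0.3, so the unrounded figures below come out slightly higher for single precision. The check on the spacing of doubles assumes that Python floats are IEEE 754 doubles, which holds on mainstream platforms.

```python
import math
import sys

single_digits = 23 * math.log10(2)   # a little under 7 decimal digits
double_digits = 52 * math.log10(2)   # between 15 and 16 decimal digits

# Gap between 1.0 and the next larger double: the relative precision 2^-52.
double_epsilon = sys.float_info.epsilon

# A value half that size is lost when added to 1.0.
lost = (1.0 + 2.0 ** -53) == 1.0
```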

Floating-Point Example
- What number is represented by the single-precision float 11000000101000...00?
  - S = 1
  - Fraction = 01000...00_two
  - Exponent = 10000001_two = 129
- x = (-1)^1 × (1 + .01_two) × 2^(129 - 127)
    = (-1) × 1.25 × 2^2
    = -5.0
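The same decoding can be carried out mechanically. The sketch below applies the formula to a 32-bit pattern and handles normalized numbers only (exponent field neither all zeros nor all ones); the function name is hypothetical.

```python
def decode_single(bits):
    """Evaluate (-1)^S x (1 + Fraction) x 2^(Exponent - 127) for a normalized pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = 1 + fraction / 2 ** 23      # restore the hidden "1."
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)
```

The slide's pattern 1 10000001 01000...00 is 0xC0A00000 in hexadecimal, and decode_single(0xC0A00000) gives -5.0.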

Floating-Point Addition
- Consider a 4-digit decimal example
  - 9.999 × 10^1 + 1.610 × 10^-1
1. Align decimal points
  - Shift number with smaller exponent
  - 9.999 × 10^1 + 0.016 × 10^1
2. Add significands
  - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
  - 1.0015 × 10^2
4. Round and renormalize if necessary
  - 1.002 × 10^2

Floating-Point Addition
- Now consider a 4-digit binary example
  - 1.000_two × 2^-1 + -1.110_two × 2^-2   (0.5 + -0.4375)
1. Align binary points
  - Shift number with smaller exponent
  - 1.000_two × 2^-1 + -0.111_two × 2^-1
2. Add significands
  - 1.000_two × 2^-1 + -0.111_two × 2^-1 = 0.001_two × 2^-1
3. Normalize result & check for over/underflow
  - 1.000_two × 2^-4, with no over/underflow
4. Round and renormalize if necessary
  - 1.000_two × 2^-4 (no change) = 0.0625
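The four steps can be followed in a toy model with 4-bit significands. Operands are written as hypothetical (sign, magnitude, exponent) triples, where the magnitude is the significand scaled by 2^3, so 1.110_two is 0b1110. Bits shifted out during alignment are simply dropped, which is a simplification; the example on the slide loses no bits.

```python
def fp_add(a, b, frac_bits=3):
    """Add two (sign, magnitude, exponent) values; sign is +1 or -1."""
    (sign_a, mag_a, exp_a), (sign_b, mag_b, exp_b) = a, b
    # Step 1: align binary points by shifting the operand with the smaller exponent.
    if exp_a < exp_b:
        (sign_a, mag_a, exp_a), (sign_b, mag_b, exp_b) = (sign_b, mag_b, exp_b), (sign_a, mag_a, exp_a)
    mag_b >>= exp_a - exp_b
    # Step 2: add the signed significands.
    total = sign_a * mag_a + sign_b * mag_b
    if total == 0:
        return (1, 0, 0)
    sign, mag, exp = (-1 if total < 0 else 1), abs(total), exp_a
    # Step 3: normalize so that 1.0 <= significand < 2.0.
    one = 1 << frac_bits
    while mag >= 2 * one:
        mag >>= 1
        exp += 1
    while mag < one:
        mag <<= 1
        exp -= 1
    # Step 4: nothing to round here, because no extra bits are kept.
    return (sign, mag, exp)


def fp_value(x, frac_bits=3):
    """Convert a (sign, magnitude, exponent) triple to an ordinary number."""
    sign, mag, exp = x
    return sign * (mag / (1 << frac_bits)) * 2.0 ** exp
```

Calling fp_add((1, 0b1000, -1), (-1, 0b1110, -2)) returns (1, 0b1000, -4), that is 1.000_two × 2^-4 = 0.0625.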

FP Adder Hardware
- Much more complex than integer adder
- Doing it in one clock cycle would take too long
  - Much longer than integer operations
  - Slower clock would penalize all instructions
- FP adder usually takes several cycles
  - Can be pipelined

FP Adder Hardware
[Figure: FP adder datapath, annotated with Steps 1 to 4 of the addition algorithm]

Floating-Point Multiplication
- Consider a 4-digit decimal example
  - 1.110 × 10^10 × 9.200 × 10^-5
1. Add exponents
  - For biased exponents, subtract bias from sum
  - New exponent = 10 + -5 = 5
2. Multiply significands
  - 1.110 × 9.200 = 10.212 ⇒ 10.212 × 10^5
3. Normalize result & check for over/underflow
  - 1.0212 × 10^6
4. Round and renormalize if necessary
  - 1.021 × 10^6
5. Determine sign of result from signs of operands
  - +1.021 × 10^6

Floating-Point Multiplication
- Now consider a 4-digit binary example
  - 1.000_two × 2^-1 × -1.110_two × 2^-2   (0.5 × -0.4375)
1. Add exponents
  - Unbiased: -1 + -2 = -3
  - Biased: (-1 + 127) + (-2 + 127) - 127 = -3 + 254 - 127 = -3 + 127
2. Multiply significands
  - 1.000_two × 1.110_two = 1.110_two ⇒ 1.110_two × 2^-3
3. Normalize result & check for over/underflow
  - 1.110_two × 2^-3 (no change), with no over/underflow
4. Round and renormalize if necessary
  - 1.110_two × 2^-3 (no change)
5. Determine sign: if same, +; else, -
  - -1.110_two × 2^-3 = -0.21875
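The five steps can be traced in a toy model with 4-bit significands. Operands are hypothetical (sign, magnitude, exponent) triples, with the magnitude equal to the significand scaled by 2^3 and both operands assumed normalized. Rounding uses round to nearest even, which is one possible choice; the slide's example needs no rounding at all.

```python
def fp_multiply(a, b, frac_bits=3, bias=127):
    """Multiply two normalized (sign, magnitude, exponent) values; sign is +1 or -1."""
    (sign_a, mag_a, exp_a), (sign_b, mag_b, exp_b) = a, b
    # Step 1: add the biased exponents and subtract the bias once.
    biased = (exp_a + bias) + (exp_b + bias) - bias
    exp = biased - bias
    # Step 2: multiply significands; the product has 2 * frac_bits fraction bits.
    product = mag_a * mag_b
    # Step 3: normalize. The product of two values in [1, 2) lies in [1, 4).
    drop = frac_bits
    if product >= (2 << (2 * frac_bits)):
        drop += 1
        exp += 1
    # Step 4: round to nearest even, then renormalize if rounding carried out.
    kept = product >> drop
    rest = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
    if kept == (2 << frac_bits):
        kept >>= 1
        exp += 1
    # Step 5: the result is negative exactly when the operand signs differ.
    return (sign_a * sign_b, kept, exp)
```

Calling fp_multiply((1, 0b1000, -1), (-1, 0b1110, -2)) returns (-1, 0b1110, -3), that is -1.110_two × 2^-3 = -0.21875.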

FP Arithmetic Hardware
- FP multiplier is of similar complexity to FP adder
  - But uses a multiplier for significands instead of an adder
- FP arithmetic hardware usually does
  - Addition, subtraction, multiplication, division, reciprocal, square-root
  - FP ↔ integer conversion
- Operations usually take several cycles
  - Can be pipelined

FP Instructions in MIPS
- FP hardware is coprocessor 1
  - Adjunct processor that extends the ISA
- Separate FP registers
  - 32 single-precision: $f0, $f1, ... $f31
  - Paired for double-precision: $f0/$f1, $f2/$f3, ...
    - Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
- FP instructions operate only on FP registers
  - Programs generally don't do integer ops on FP data, or vice versa
  - Gives us more registers with minimal code-size impact
- FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
    - e.g., ldc1 $f8, 32($sp)

FP Instructions in MIPS
- Single-precision arithmetic
  - add.s, sub.s, mul.s, div.s
    - e.g.: add.s $f0, $f1, $f6
- Double-precision arithmetic
  - add.d, sub.d, mul.d, div.d
    - e.g.: mul.d $f4, $f4, $f6
- Single- and double-precision comparison
  - c.xx.s, c.xx.d (xx is eq, lt, le, ...)
  - Sets or clears FP condition-code bit
    - e.g.: c.lt.s $f3, $f4
- Branch on FP condition code true or false
  - bc1t, bc1f
    - e.g.: bc1t TargetLabel

FP Example: °F to °C
- C code:
    float f2c (float fahr) {
      return ((5.0/9.0)*(fahr - 32.0));
    }
  - fahr in $f12, result in $f0, literals in global memory space
- Compiled MIPS code:
    f2c: lwc1  $f16, const5($gp)     # $f16 = 5.0
         lwc1  $f18, const9($gp)     # $f18 = 9.0
         div.s $f16, $f16, $f18      # $f16 = 5.0 / 9.0
         lwc1  $f18, const32($gp)    # $f18 = 32.0
         sub.s $f18, $f12, $f18      # $f18 = fahr - 32.0
         mul.s $f0, $f16, $f18       # $f0 = (5.0/9.0) * (fahr - 32.0)
         jr    $ra                   # return

Associativity (§3.6 Parallelism and Computer Arithmetic: Associativity)
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail, since FP operations are not associative!
- Need to validate parallel programs under varying degrees of parallelism
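A three-operand sum is enough to see the failure of associativity. The operand values below are illustrative choices, not taken from the slide, and the arithmetic is Python's double precision rather than MIPS single precision.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y is exactly 0.0, so the sum is 1.0
right = x + (y + z)   # y + z rounds back to y, so the sum is 0.0
```

The two groupings give 1.0 and 0.0 because 1.0 is far smaller than the spacing between representable values near 1.5 × 10^38. A parallel reduction that regroups the additions can therefore change the result.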

Support for Accurate Arithmetic
- Rounding (except for truncation) requires the hardware to include extra F bits during calculations
  - Guard bit (G): used to provide one F bit when shifting left to normalize a result (e.g., when normalizing F after division or subtraction)
  - Round bit (R): used to improve rounding accuracy
  - Sticky bit (S): used to support round to nearest even; is set to 1 whenever a 1 bit shifts (right) through it (e.g., when aligning F during addition/subtraction)
      F = 1.xxxxxxxxxxxxxxxxxxxxxxx G R S
- IEEE 754 FP rounding modes
  - Always round up (toward +∞)
  - Always round down (toward -∞)
  - Truncate (toward 0)
  - Round to nearest even (when the Guard || Round || Sticky are 100): always creates a 0 in the least significant (kept) bit of F
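How the three extra bits drive round to nearest even can be sketched with integers. The significand is carried with G, R and S appended as its three lowest bits; the function names and this packing are assumptions made for illustration.

```python
def shift_right_with_grs(significand, shift):
    """Shift a significand right, returning it with G, R, S as the 3 low bits."""
    extended = significand << 3                    # make room for G, R and S
    shifted_out = extended & ((1 << shift) - 1)    # bits that fall past the sticky position
    extended >>= shift
    if shifted_out:
        extended |= 1                              # sticky records that a 1 was lost
    return extended


def round_nearest_even(extended):
    """Round a significand carrying G, R, S in its 3 low bits."""
    kept = extended >> 3
    grs = extended & 0b111
    # Above the halfway point: round up. Exactly halfway (100): round to the even neighbor.
    if grs > 0b100 or (grs == 0b100 and kept & 1):
        kept += 1
    return kept
```

With G R S = 100 the value is exactly halfway, so 1010 stays 1010 and 1011 becomes 1100; in both cases the least significant kept bit ends up 0. The caller would still need to renormalize if the increment carries out of the top bit.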
