1 Integer Operations
2 Outline
Arithmetic operations
– Overflow
– Unsigned addition, multiplication
– Signed addition, negation, multiplication
– Using shifts to perform power-of-2 multiply/divide
Suggested reading
– Chap. 2.3
Negation: computing the additive inverse, -x
3 Unsigned Addition
[Figure: w-bit operands u and v; the true sum u + v needs w+1 bits; discarding the carry leaves the w-bit result $\mathrm{UAdd}_w(u, v)$.]
4 Unsigned Addition
Standard addition function
– Ignores carry output
Implements modular arithmetic
– $s = \mathrm{UAdd}_w(u, v) = (u + v) \bmod 2^w$ (P67, Eq. 2.9)
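The modular definition can be sketched in C. This is a minimal illustration that assumes 1 ≤ w ≤ 31, so the true sum of two w-bit values fits in a `uint32_t`. The name `uadd_w` is chosen here and is not a library function. For w = 32, ordinary `unsigned` addition in C already wraps this way.

```c
#include <stdint.h>

/* UAdd_w(u, v) = (u + v) mod 2^w, for 1 <= w <= 31 and u, v < 2^w.
   The true sum needs up to w+1 bits; masking keeps only the low w bits,
   which is the same as discarding the carry out of bit w-1. */
uint32_t uadd_w(uint32_t u, uint32_t v, unsigned w)
{
    uint32_t mask = (UINT32_C(1) << w) - 1;   /* w low-order 1 bits */
    return (u + v) & mask;
}
```

For example, with w = 4, 12 + 5 = 17 = $10001_2$. Dropping the carry leaves $0001_2$, so `uadd_w(12, 5, 4)` returns 1.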
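5 Visualizing Unsigned Addition (P68, Figure 2.16)
Wraps around
– If true sum ≥ $2^w$
– At most once
[Figure: surface plot of $\mathrm{UAdd}_4(u, v)$ over u and v. The true sum runs from 0 to $2^{w+1}$ and the modular sum from 0 to $2^w$. The region where the true sum reaches $2^w$ is marked Overflow.]
Modulo: taking the remainder after division (here by $2^w$)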
6 Unsigned Addition Forms an Abelian Group (P68)
Closed under addition
– $0 \le \mathrm{UAdd}_w(u, v) \le 2^w - 1$
Commutative
– $\mathrm{UAdd}_w(u, v) = \mathrm{UAdd}_w(v, u)$
Associative
– $\mathrm{UAdd}_w(t, \mathrm{UAdd}_w(u, v)) = \mathrm{UAdd}_w(\mathrm{UAdd}_w(t, u), v)$
7 Unsigned Addition Forms an Abelian Group
0 is the additive identity
– $\mathrm{UAdd}_w(u, 0) = u$
Every element has an additive inverse
– Let $\mathrm{UComp}_w(u) = 2^w - u$ for $u > 0$, and $\mathrm{UComp}_w(0) = 0$
– $\mathrm{UAdd}_w(u, \mathrm{UComp}_w(u)) = 0$ (P68, Eq. 2.10)
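Here is a sketch of the additive inverse, again for 1 ≤ w ≤ 31. The helper `ucomp_w` is hypothetical. Reducing mod $2^w$ makes the u = 0 case come out as 0 instead of $2^w$. For w = 32, C's unary minus on an unsigned value computes the same thing.

```c
#include <stdint.h>

/* UComp_w(u): the w-bit unsigned additive inverse (1 <= w <= 31, u < 2^w).
   It is 2^w - u for u > 0 and 0 for u == 0, i.e. (2^w - u) mod 2^w. */
uint32_t ucomp_w(uint32_t u, unsigned w)
{
    uint32_t mask = (UINT32_C(1) << w) - 1;
    return ((UINT32_C(1) << w) - u) & mask;
}
```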
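8 Signed Addition
Functionality
– True sum requires w+1 bits
– Drop off the MSB
– Treat the remaining bits as a two's-complement integer
$$\mathrm{TAdd}_w(u, v) = \begin{cases} u + v - 2^w, & 2^{w-1} \le u + v \quad \text{(PosOver)} \\ u + v, & -2^{w-1} \le u + v < 2^{w-1} \\ u + v + 2^w, & u + v < -2^{w-1} \quad \text{(NegOver)} \end{cases}$$
PosOver: positive overflow; NegOver: negative overflow (P70, Eq. 2.12)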
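9 Signed Addition (P70, Figure 2.17)
[Figure 2.17: the true sum (from $-2^w$ to $2^w$) is mapped to the TAdd result (from $-2^{w-1}$ to $2^{w-1} - 1$). Sums of two negative operands below $-2^{w-1}$ wrap up (NegOver). Sums of two positive operands at or above $2^{w-1}$ wrap down (PosOver).]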
10 Visualizing Two's-Complement Addition
Values
– 4-bit two's complement
– Range from -8 to +7
Wraps around
– If sum ≥ $2^{w-1}$: becomes negative
– If sum < $-2^{w-1}$: becomes positive
11 Visualizing Two's-Complement Addition (P72, Figure 2.19)
[Figure: surface plot of $\mathrm{TAdd}_4(u, v)$ over u and v, with the PosOver and NegOver regions marked.]
12 Detecting TAdd Overflow (P71)
Task
– Given $s = \mathrm{TAdd}_w(u, v)$
– Determine whether s equals the true sum u + v
Claim
– Overflow iff either:
  u, v < 0 and s ≥ 0 (NegOver)
  u, v ≥ 0 and s < 0 (PosOver)
– In C: ovf = (u<0 == v<0) && (u<0 != s<0);
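The claim can be checked with a small C sketch for w = 32. Signed overflow is undefined behavior in C, so the sketch first computes the wrapped sum with unsigned arithmetic. The helper names `tadd32` and `tadd_overflows` are our own.

```c
#include <stdint.h>
#include <string.h>

/* TAdd_32: add with wraparound. The sum is computed in unsigned arithmetic
   (well defined mod 2^32) and its bits are reinterpreted as int32_t,
   which C requires to be two's complement. */
int32_t tadd32(int32_t u, int32_t v)
{
    uint32_t bits = (uint32_t)u + (uint32_t)v;
    int32_t s;
    memcpy(&s, &bits, sizeof s);
    return s;
}

/* Returns 1 if TAdd_32(u, v) overflowed: both operands have the same
   sign and the sign of the wrapped sum differs from it. */
int tadd_overflows(int32_t u, int32_t v)
{
    int32_t s = tadd32(u, v);
    return (u < 0) == (v < 0) && (u < 0) != (s < 0);
}
```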
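13 Mathematical Properties of TAdd
Two's complement under TAdd forms a group
– Closed, commutative, associative; 0 is the additive identity
– Every element has an additive inverse: $\mathrm{TComp}_w(u) = -u$ for $u \ne \mathrm{TMin}_w$, and $\mathrm{TComp}_w(\mathrm{TMin}_w) = \mathrm{TMin}_w$, so that $\mathrm{TAdd}_w(u, \mathrm{TComp}_w(u)) = 0$ (P73, Eq. 2.13)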
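14 Mathematical Properties of TAdd
Isomorphic algebra to UAdd
– $\mathrm{TAdd}_w(u, v) = \mathrm{U2T}(\mathrm{UAdd}_w(\mathrm{T2U}(u), \mathrm{T2U}(v)))$
– Since both have identical bit patterns: $\mathrm{T2U}(\mathrm{TAdd}_w(u, v)) = \mathrm{UAdd}_w(\mathrm{T2U}(u), \mathrm{T2U}(v))$
Isomorphic: having the same structure (a one-to-one mapping that preserves the operation)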
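With w = 4 there are only 256 operand pairs, so the isomorphism can be checked exhaustively. The sketch below is an illustration only. It writes T2U and U2T out for 4 bits and uses helper names of our own.

```c
/* Illustrative 4-bit helpers (not library functions). */
int t2u4(int x) { return x < 0 ? x + 16 : x; }    /* T2U_4: [-8,7] -> [0,15] */
int u2t4(int x) { return x >= 8 ? x - 16 : x; }   /* U2T_4: [0,15] -> [-8,7] */
int uadd4(int u, int v) { return (u + v) % 16; }  /* UAdd_4 on [0,15] */

/* TAdd_4 from its definition: wrap the true sum back into [-8, 7]. */
int tadd4(int u, int v)
{
    int sum = u + v;                  /* true sum, in [-16, 14] */
    if (sum >= 8)  return sum - 16;   /* PosOver */
    if (sum < -8)  return sum + 16;   /* NegOver */
    return sum;
}

/* Returns 1 if TAdd_4(u,v) == U2T(UAdd_4(T2U(u), T2U(v))) for all u, v. */
int check_isomorphism4(void)
{
    for (int u = -8; u <= 7; u++)
        for (int v = -8; v <= 7; v++)
            if (tadd4(u, v) != u2t4(uadd4(t2u4(u), t2u4(v))))
                return 0;
    return 1;
}
```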
15 Negating with Complement & Increment (P73)
In C: ~x + 1 == -x
Complement
– Observation: ~x + x == $1111\ldots111_2$ == -1
~x: the bitwise complement of x (every bit flipped)
16 Signed Addition
Increment
– ~x + 1 = ~x + [x + (-x)] + 1
– = (~x + x) + (-x) + 1 = -1 + (-x) + 1 = -x
– So ~x + 1 == -x
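The identity can be tried in C on 32-bit patterns. This sketch uses the hypothetical helper `neg_by_complement` and works on `uint32_t`, so every step wraps mod $2^{32}$ as defined. Doing the same on `int` would overflow (undefined behavior) for TMin.

```c
#include <stdint.h>

/* Negate a 32-bit two's-complement bit pattern by complement and increment.
   Unsigned arithmetic keeps every step defined, including TMin
   (0x80000000), whose negation is TMin again. */
uint32_t neg_by_complement(uint32_t x)
{
    return ~x + 1u;
}
```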
17 Multiplication (P75)
Computing the exact product of w-bit numbers x, y
– Either signed or unsigned
Ranges
– Unsigned: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$; up to 2w bits
– Two's complement min: $x \cdot y \ge (-2^{w-1})(2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$; up to 2w-1 bits
– Two's complement max: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$; up to 2w bits, but only for $(\mathrm{TMin}_w)^2$
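The range formulas can be spot-checked by brute force for a small word size. This sketch assumes w = 8 and uses `long long` so every exact product fits. It finds the extreme products. `product_extremes8` is an illustrative name.

```c
/* Brute-force extremes of exact products of 8-bit operands. */
void product_extremes8(long long *umax, long long *tmin, long long *tmax)
{
    *umax = 0; *tmin = 0; *tmax = 0;
    for (long long x = 0; x <= 255; x++)            /* unsigned operands */
        for (long long y = 0; y <= 255; y++)
            if (x * y > *umax) *umax = x * y;
    for (long long x = -128; x <= 127; x++)         /* two's-complement operands */
        for (long long y = -128; y <= 127; y++) {
            if (x * y < *tmin) *tmin = x * y;
            if (x * y > *tmax) *tmax = x * y;
        }
}
```

The results should be 65025 = $2^{16} - 2^9 + 1$, -16256 = $-2^{14} + 2^7$, and 16384 = $2^{14}$, which match the formulas above.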
18 Multiplication
Maintaining exact results
– Would need to keep expanding the word size with each product computed
– Done in software by "arbitrary precision" arithmetic packages
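19 Power-of-2 Multiply with Shift
[Figure: the w-bit operand u times $2^k$ (a 1 followed by k zeros). The true product $u \cdot 2^k$ needs w+k bits. Discarding the upper k bits leaves the w-bit result $\mathrm{UMult}_w(u, 2^k)$ or $\mathrm{TMult}_w(u, 2^k)$, whose low k bits are 0.]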
20 Power-of-2 Multiply with Shift
Operation
– u << k gives $u \cdot 2^k$
– Both signed and unsigned
Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24
– Most machines shift and add much faster than they multiply
– Compilers generate this code automatically
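These are shift-and-add versions of the two examples, sketched in C with unsigned operands so wraparound is defined. The function names are ours. Note the parentheses: in C, `-` binds more tightly than `<<`.

```c
#include <stdint.h>

/* u * 8 as a single shift. */
uint32_t mul8(uint32_t u)
{
    return u << 3;
}

/* u * 24 as 2^5 - 2^3: two shifts and a subtract (wraps mod 2^32). */
uint32_t mul24(uint32_t u)
{
    return (u << 5) - (u << 3);
}
```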
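21 Unsigned Power-of-2 Divide with Shift
Quotient of unsigned by power of 2
– u >> k gives $\lfloor u / 2^k \rfloor$
– Uses logical shift
[Figure: dividend u divided by $2^k$. The quotient is u shifted right k places, with k zeros shifted in at the left and the bits right of the binary point discarded.]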
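Here is a one-line sketch that assumes a 32-bit unsigned type and 0 ≤ k ≤ 31. `udiv_pow2` is a hypothetical name.

```c
#include <stdint.h>

/* floor(u / 2^k) via a logical right shift: for unsigned operands,
   C's >> shifts in zeros from the left. */
uint32_t udiv_pow2(uint32_t u, unsigned k)
{
    return u >> k;
}
```

For example, `udiv_pow2(15213, 4)` gives 950, the same result as `15213 / 16`.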
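22 Two's-Complement Power-of-2 Divide with Shift (P77)
Quotient of signed by power of 2
– u >> k gives $\lfloor u / 2^k \rfloor$
– Uses arithmetic shift
– Rounds in the wrong direction when u < 0 (down instead of toward 0)
[Figure: dividend u divided by $2^k$. The result is RoundDown($u / 2^k$): u shifted right k places, with copies of the sign bit shifted in.]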
23 Correct Power-of-2 Divide
Quotient of negative number by power of 2
– Want $\lceil u / 2^k \rceil$ (round toward 0)
– Compute as $\lfloor (u + 2^k - 1) / 2^k \rfloor$
– In C: (u + (1 << k) - 1) >> k
– Biases the dividend toward 0
Quotient: the result of a division
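The biased division can be sketched as follows. It assumes that `>>` on a negative `int32_t` is an arithmetic shift. GCC and Clang behave this way, but the C standard leaves it implementation-defined. It also assumes 0 ≤ k ≤ 30. `sdiv_pow2` is our own name.

```c
#include <stdint.h>

/* x / 2^k rounded toward zero (0 <= k <= 30). For x < 0 the dividend is
   biased by 2^k - 1 first, so the shift's round-down becomes
   round-toward-zero. Assumes >> on negative values is arithmetic. */
int32_t sdiv_pow2(int32_t x, unsigned k)
{
    int32_t bias = (x < 0) ? ((INT32_C(1) << k) - 1) : 0;
    return (x + bias) >> k;
}
```

For x = -7 and k = 1, a plain `-7 >> 1` rounds down to -4. The biased form computes (-7 + 1) >> 1 = -3, which matches C's `-7 / 2`.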
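24 Correct Power-of-2 Divide
Case 1: No rounding (the low k bits of the dividend u are all 0)
[Figure: dividend u plus the bias $2^k - 1$ (k low-order 1 bits), divided by the divisor $2^k$.]
– Biasing has no effect: the added 1s stay right of the binary point and are discarded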
25 Correct Power-of-2 Divide
Case 2: Rounding (some of the low k bits of the dividend u are 1)
[Figure: adding the bias $2^k - 1$ to u carries a 1 across the binary point.]
– Biasing adds 1 to the final result (the quotient is incremented by 1)
26 Floating Point
27 Topics
– Fractional binary numbers
– IEEE 754 standard
– Rounding modes
– FP operations
– Floating point in C
Suggested reading: Chap. 2.4
28 Encoding Rational Numbers (P80)
Form $V = x \times 2^y$
– Very useful when $|V| \gg 0$ or $|V| \ll 1$
– An approximation to real arithmetic
From the programmer's perspective
– Uninteresting
– Arcane and incomprehensible
* Arcane: mysterious, understood by only a few
* Incomprehensible: impossible to understand
29 Encoding Rational Numbers
Until the 1980s
– Many idiosyncratic formats: fast and easy to implement, but less accurate
IEEE 754
– Designed by W. Kahan for Intel processors
– Based on a small, consistent set of principles: elegant and understandable, but hard to make fast
* Idiosyncratic: peculiar to one individual or system
* Elegant: simple and graceful
30 Fractional Binary Numbers
[Figure: bit positions $b_m b_{m-1} \cdots b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} \cdots b_{-n}$ with weights $2^m, 2^{m-1}, \ldots, 4, 2, 1, 1/2, 1/4, 1/8, \ldots, 2^{-n}$]
31 Fractional Binary Numbers
– Bits to the right of the “binary point” represent fractional powers of 2
– Represents the rational number $b = \sum_{i=-n}^{m} b_i \times 2^i$ (P81, Eq. 2.17)
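Equation 2.17 can be evaluated directly from a digit string. This is a small illustrative sketch. The function name and the string input format are assumptions. It accepts the digits 0 and 1 with an optional binary point.

```c
/* Value of a binary string such as "101.11". Digits left of the point are
   accumulated by doubling; the i-th digit right of the point adds 2^-i.
   Assumes well-formed input; exact while the value fits in a double. */
double binary_fraction_value(const char *s)
{
    double value = 0.0, weight = 0.5;
    int after_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') { after_point = 1; continue; }
        int bit = *s - '0';
        if (!after_point) {
            value = 2.0 * value + bit;   /* shift integer part left, add bit */
        } else {
            value += bit * weight;       /* add bit * 2^-i */
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, `binary_fraction_value("101.11")` yields 5.75, i.e. 4 + 1 + 1/2 + 1/4.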
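32 Fractional Numbers to Binary Bits
```c
/* Convert a fraction x in [0, 1) to binary: bit 31 of the result is the
   1/2 place, bit 30 the 1/4 place, and so on (assumes 32-bit unsigned). */
unsigned frac_to_bits(double x)
{
    unsigned result_bits = 0, current_bit = 0x80000000u;
    for (int i = 0; i < 32; i++) {
        x *= 2;                  /* next binary digit moves left of the point */
        if (x >= 1) {
            result_bits |= current_bit;
            if (x == 1) break;   /* all remaining digits are 0 */
            x -= 1;
        }
        current_bit >>= 1;       /* move to the next lower place value */
    }
    return result_bits;
}
```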
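33 Fraction Binary Number Examples
[Table: values and their binary-fraction representations; one entry ends in the repeating group [0011]]
Observations:
– Numbers of the form $0.111\ldots1_2$ are just below 1.0, written $1.0 - \varepsilon$
– Binary fractions can exactly represent only numbers of the form $x / 2^k$
– Other rationals have repeating bit patterns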
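The repeating patterns can be shown without any floating-point rounding by generating the digits of p/q through exact integer long division. The sketch below uses a hypothetical helper. It packs the first n digits after the point into an integer, most significant digit first.

```c
#include <stdint.h>

/* First n (<= 32) binary digits after the point of p/q, for 0 <= p < q and
   0 < q < 2^31, packed into an integer with the first digit most significant. */
uint32_t binary_digits(uint32_t p, uint32_t q, int n)
{
    uint32_t r = p, digits = 0;          /* r: current remainder */
    for (int i = 0; i < n; i++) {
        r *= 2;                          /* bring down the next binary place */
        digits <<= 1;
        if (r >= q) { digits |= 1; r -= q; }
    }
    return digits;
}
```

For example, `binary_digits(1, 5, 12)` returns 0x333. That is, $1/5 = 0.001100110011\ldots_2$, with the group 0011 repeating.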
34 IEEE Floating-Point Representation (P83)
Numeric form
– $V = (-1)^s \times M \times 2^E$
– Sign bit s determines whether the number is negative or positive
– Significand M is normally a fractional value in the range [1.0, 2.0)
– Exponent E weights the value by a power of two
35 IEEE Floating-Point Representation
Encoding: | s | exp | frac |
– s is the sign bit
– exp field encodes E
– frac field encodes M
Sizes
– Single precision (32 bits): 1 sign bit, 8 exp bits, 23 frac bits
– Double precision (64 bits): 1 sign bit, 11 exp bits, 52 frac bits
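Extracting the three fields from a `float` is a matter of shifting and masking its bit pattern. This is a minimal sketch that assumes `float` is IEEE 754 single precision, which holds on mainstream platforms. `memcpy` reads the bits without violating C's aliasing rules. The function name is ours.

```c
#include <stdint.h>
#include <string.h>

/* Split a single-precision float into its s, exp, and frac fields. */
void float_fields(float f, unsigned *sign, unsigned *exp_bits, unsigned *frac_bits)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* reinterpret the 32 bits */
    *sign      = bits >> 31;             /* 1 sign bit */
    *exp_bits  = (bits >> 23) & 0xFF;    /* 8 exponent bits */
    *frac_bits = bits & 0x7FFFFF;        /* 23 fraction bits */
}
```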
36 Normalized Values (P84)
Condition
– exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as a biased value: E = Exp - Bias
– Exp: unsigned value denoted by exp
– Bias: bias value
– Single precision: 127 (Exp: 1…254, E: -126…127)
– Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
– In general: $\mathrm{Bias} = 2^{k-1} - 1$, where k is the number of exponent bits
37 Normalized Values
Significand coded with implied leading 1
– $M = 1.xxx\ldots x_2$, where xxx…x are the bits of frac
– Minimum when frac = 000…0 (M = 1.0)
– Maximum when frac = 111…1 ($M = 2.0 - \varepsilon$)
– Get the extra leading bit for "free"
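38 Normalized Encoding Example
Value: 12345 (hex 0x3039)
Binary bits: $12345 = 11000000111001_2$
Fraction representation: $12345 = 1.1000000111001_2 \times 2^{13}$
– M: $1.1000000111001_2$, so frac = 10000001110010000000000
– E: 13, so exp = 13 + 127 = 140 = $10001100_2$
Binary encoding
– 0 10001100 10000001110010000000000
– Hex: 4640E400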
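The encoding in this example can be reassembled in C. The sketch assumes IEEE 754 single precision. `float_from_fields` is a hypothetical helper that packs s, exp, and frac and then reinterprets the bits.

```c
#include <stdint.h>
#include <string.h>

/* Pack s | exp | frac into a 32-bit pattern and reinterpret it as a float. */
float float_from_fields(uint32_t sign, uint32_t exp_bits, uint32_t frac_bits)
{
    uint32_t bits = (sign << 31) | (exp_bits << 23) | frac_bits;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

`float_from_fields(0, 140, 0x40E400)` should compare equal to 12345.0f, and the bits of 12345.0f should read back as 0x4640E400.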
39 Denormalized Values (P84)
Condition
– exp = 000…0
Values
– Exponent value: E = 1 - Bias
– Significand value: $M = 0.xxx\ldots x_2$, where xxx…x are the bits of frac
40 Denormalized Values
Cases
– exp = 000…0, frac = 000…0
  Represents the value 0
  Note the distinct values +0 and -0
– exp = 000…0, frac ≠ 000…0
  Numbers very close to 0.0
  Lose precision as they get smaller
  "Gradual underflow"
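The normalized and denormalized cases can be combined into one decoder. This sketch assumes single-precision fields (Bias = 127, 23 frac bits) with exp ≠ 111…1. It computes powers of two by repeated doubling so it does not depend on the math library. Both function names are ours.

```c
/* 2^e as a double by repeated doubling/halving (exact for -1074 <= e <= 1023). */
double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Value of single-precision fields with exp_field != 255:
   normalized (exp != 0):  E = exp - 127, M = 1 + frac/2^23
   denormalized (exp == 0): E = 1 - 127,  M = frac/2^23 */
double decode_float_fields(unsigned sign, unsigned exp_field, unsigned frac)
{
    double m = frac / 8388608.0;         /* frac / 2^23 */
    int e;
    if (exp_field == 0) {
        e = 1 - 127;                     /* denormalized */
    } else {
        e = (int)exp_field - 127;        /* normalized */
        m += 1.0;                        /* implied leading 1 */
    }
    return (sign ? -1.0 : 1.0) * m * pow2(e);
}
```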
41 Special Values (P85)
Condition
– exp = 111…1
42 Special Values
exp = 111…1, frac = 000…0
– Represents the value $\infty$ (infinity)
– Result of an operation that overflows
– Both positive and negative
– E.g., 1.0/0.0 = -1.0/-0.0 = $+\infty$, 1.0/-0.0 = $-\infty$
43 Special Values
exp = 111…1, frac ≠ 000…0
– Not-a-Number (NaN)
– Represents the case when no numeric value can be determined
– E.g., sqrt(-1), $\infty - \infty$, $\infty \times 0$
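The zero, denormalized, normalized, and special cases are distinguished entirely by the exp and frac fields. Here is a sketch for single precision that assumes the IEEE 754 layout. The enum and function names are our own labels.

```c
#include <stdint.h>
#include <string.h>

enum fp_kind { KIND_ZERO, KIND_DENORM, KIND_NORM, KIND_INF, KIND_NAN };

/* Classify a float by its exp and frac fields. */
enum fp_kind float_kind(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t exp_bits  = (bits >> 23) & 0xFF;
    uint32_t frac_bits = bits & 0x7FFFFF;
    if (exp_bits == 0)                   /* exp = 000...0 */
        return frac_bits == 0 ? KIND_ZERO : KIND_DENORM;
    if (exp_bits == 0xFF)                /* exp = 111...1 */
        return frac_bits == 0 ? KIND_INF : KIND_NAN;
    return KIND_NORM;
}
```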
44 Summary of Real Number Encodings (P85, Figure 2.22)
[Figure 2.22: the number line with regions, left to right: NaN, $-\infty$, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, $+\infty$, NaN]
45 8-bit Floating-Point Representations
Layout: s (1 bit) | exp (4 bits) | frac (3 bits); Bias = $2^{4-1} - 1 = 7$
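46 8-bit Floating-Point Representations
| Exp | exp | E | $2^E$ |
|---|---|---|---|
| 0 | 0000 | -6 | 1/64 (denorms) |
| 1 | 0001 | -6 | 1/64 |
| 2 | 0010 | -5 | 1/32 |
| 3 | 0011 | -4 | 1/16 |
| 4 | 0100 | -3 | 1/8 |
| 5 | 0101 | -2 | 1/4 |
| 6 | 0110 | -1 | 1/2 |
| 7 | 0111 | 0 | 1 |
| 8 | 1000 | +1 | 2 |
| 9 | 1001 | +2 | 4 |
| 10 | 1010 | +3 | 8 |
| 11 | 1011 | +4 | 16 |
| 12 | 1100 | +5 | 32 |
| 13 | 1101 | +6 | 64 |
| 14 | 1110 | +7 | 128 |
| 15 | 1111 | n/a | (inf, NaN) |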
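47 Dynamic Range (Denormalized numbers) (P86, Figure 2.23)
| s | exp | frac | E | Value |
|---|---|---|---|---|
| 0 | 0000 | 000 | -6 | 0 |
| 0 | 0000 | 001 | -6 | 1/8 × 1/64 = 1/512 (closest to zero) |
| 0 | 0000 | 010 | -6 | 2/8 × 1/64 = 2/512 |
| … | | | | |
| 0 | 0000 | 110 | -6 | 6/8 × 1/64 = 6/512 |
| 0 | 0000 | 111 | -6 | 7/8 × 1/64 = 7/512 (largest denorm) |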
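48 Dynamic Range (Normalized numbers)
| s | exp | frac | E | Value |
|---|---|---|---|---|
| 0 | 0001 | 000 | -6 | 8/8 × 1/64 = 8/512 (smallest norm) |
| 0 | 0001 | 001 | -6 | 9/8 × 1/64 = 9/512 |
| … | | | | |
| 0 | 0110 | 110 | -1 | 14/8 × 1/2 = 14/16 |
| 0 | 0110 | 111 | -1 | 15/8 × 1/2 = 15/16 (closest to 1 below) |
| 0 | 0111 | 000 | 0 | 8/8 × 1 = 1 |
| 0 | 0111 | 001 | 0 | 9/8 × 1 = 9/8 (closest to 1 above) |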
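49 Dynamic Range (Normalized numbers, continued)
| s | exp | frac | E | Value |
|---|---|---|---|---|
| 0 | 0111 | 010 | 0 | 10/8 × 1 = 10/8 |
| … | | | | |
| 0 | 1110 | 110 | 7 | 14/8 × 128 = 224 |
| 0 | 1110 | 111 | 7 | 15/8 × 128 = 240 (largest norm) |
| 0 | 1111 | 000 | n/a | inf |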
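All of the rows above follow from one rule. The sketch below decodes any small IEEE-like format, given k exponent bits and n fraction bits. The function name and parameters are ours. For the 8-bit format it maps 0x01 to 1/512, 0x38 to 1, 0x77 to 240, and flags 0x78 (infinity) as special.

```c
/* Value of a small IEEE-like format with k exponent bits and n fraction bits
   (layout s | exp | frac in the low 1+k+n bits of `bits`, Bias = 2^(k-1) - 1).
   Sets *special to 1 and returns 0.0 when exp is all ones (inf or NaN). */
double tiny_value(unsigned bits, int k, int n, int *special)
{
    unsigned frac   = bits & ((1u << n) - 1);
    unsigned e_bits = (bits >> n) & ((1u << k) - 1);
    unsigned sign   = (bits >> (n + k)) & 1u;
    int bias = (1 << (k - 1)) - 1;

    *special = (e_bits == (1u << k) - 1);
    if (*special)
        return 0.0;

    double m = (double)frac / (1 << n);        /* 0.frac */
    int e = (e_bits == 0) ? 1 - bias : (int)e_bits - bias;
    if (e_bits != 0)
        m += 1.0;                               /* implied leading 1 */

    double scale = 1.0;                         /* 2^e, exact for small e */
    for (int i = 0; i < e; i++) scale *= 2.0;
    for (int i = 0; i > e; i--) scale /= 2.0;
    return (sign ? -m : m) * scale;
}
```

Calling it with k = 3 and n = 2 on codes 0 through 27 lists the non-negative finite values of the 6-bit format on the next slide. Codes 28 to 31 have exp = 111 and are special.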
50 Distribution of Representable Values
6-bit IEEE-like format
– k = 3 exponent bits
– n = 2 fraction bits
– Bias is 3
Notice how the distribution gets denser toward zero.
51 Distribution of Representable Values
52 Interesting Numbers (P88, Figure 2.24)
53 Special Properties of Encoding
FP zero is the same as integer zero
– All bits = 0
Can (almost) use unsigned integer comparison
– Must first compare sign bits
– Must consider -0 = 0
– NaNs are problematic: they will be greater than any other values
– Otherwise OK: denorm vs. normalized, normalized vs. infinity
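Comparing floats through their bit patterns can be sketched as follows. The sketch assumes IEEE 754 single precision and non-NaN inputs. The function name is ours. The magnitude bits order like unsigned integers, but the sign must be handled first, and -0 must equal +0.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if a < b, decided from the bit patterns alone (no NaNs). */
int float_less(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uint32_t sa = ua >> 31, sb = ub >> 31;
    uint32_t ma = ua & 0x7FFFFFFFu, mb = ub & 0x7FFFFFFFu;  /* exp|frac */
    if (ma == 0 && mb == 0) return 0;  /* -0 and +0 compare equal */
    if (sa != sb) return sa > sb;      /* the negative one is smaller */
    if (sa == 0) return ma < mb;       /* both positive: unsigned order */
    return ma > mb;                    /* both negative: order reverses */
}
```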
54 Rounding Modes (P89)
Round down:
– the rounded result is close to, but no greater than, the true result
Round up:
– the rounded result is close to, but no less than, the true result
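55 Rounding Modes (P90, Figure 2.25)
| Mode | 1.40 | 1.60 | 1.50 | 2.50 | -1.50 |
|---|---|---|---|---|---|
| Round-to-even | 1 | 2 | 2 | 2 | -2 |
| Round-toward-zero | 1 | 1 | 1 | 2 | -1 |
| Round-down | 1 | 1 | 1 | 2 | -2 |
| Round-up | 2 | 2 | 2 | 3 | -1 |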
56 Round-to-Even
Default rounding mode
– Hard to get any other kind without dropping into assembly
– All others are statistically biased: the sum of a set of positive numbers will consistently be over- or under-estimated
57 Round-to-Even (P89)
Applying to other decimal places
– When exactly halfway between two possible values, round so that the least significant digit is even
– E.g., rounding to the nearest hundredth:
  1.2349999 → 1.23 (less than halfway)
  1.2350001 → 1.24 (greater than halfway)
  1.2350000 → 1.24 (halfway, round up)
  1.2450000 → 1.24 (halfway, round down)
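58 Rounding Binary Numbers (P89)
– "Even" when the least significant bit is 0
– Halfway when the bits to the right of the rounding position = $100\ldots_2$
E.g., rounding to the nearest 1/4 (2 bits right of the binary point):
| Value | Binary | Rounded | Action | Rounded value |
|---|---|---|---|---|
| 2 3/32 | $10.00011_2$ | $10.00_2$ | (<1/2) Down | 2 |
| 2 3/16 | $10.00110_2$ | $10.01_2$ | (>1/2) Up | 2 1/4 |
| 2 7/8 | $10.11100_2$ | $11.00_2$ | (1/2) Up | 3 |
| 2 5/8 | $10.10100_2$ | $10.10_2$ | (1/2) Down | 2 1/2 |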
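Round-to-even on binary values can be done with integer operations alone. In this sketch, a hypothetical helper, x holds a non-negative value as an integer, and its low d bits are the ones to drop. The slide's values are 67, 70, 92, and 84 in units of 1/32. With d = 3 they are rounded to units of 1/4.

```c
#include <stdint.h>

/* Round x / 2^d to the nearest integer, ties to even (1 <= d <= 31).
   Dropped bits below 100...0 round down, above round up; exactly
   100...0 picks the candidate whose least significant bit is 0. */
uint32_t round_half_even(uint32_t x, unsigned d)
{
    uint32_t q    = x >> d;                             /* truncated result */
    uint32_t rest = x & ((UINT32_C(1) << d) - 1);       /* dropped bits */
    uint32_t half = UINT32_C(1) << (d - 1);             /* the 100...0 pattern */
    if (rest > half || (rest == half && (q & 1)))
        q++;
    return q;
}
```

`round_half_even(92, 3)` returns 12 (12/4 = 3), while `round_half_even(84, 3)` returns 10 (10/4 = 2 1/2), matching the two halfway rows.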
59 Floating-Point Operations
Conceptual view
– First compute the exact result
– Make it fit into the desired precision
  Possibly overflow if the exponent is too large
  Possibly round to fit into frac
60 Mathematical Properties of FP Add
Compare to those of an Abelian group
– Closed under addition? YES, but may generate infinity or NaN
– Commutative? YES
– Associative? NO: overflow and inexactness of rounding
– 0 is additive identity? YES
– Every element has additive inverse? ALMOST, except for infinities & NaNs
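Here is a concrete counterexample to associativity, sketched in C with illustrative function names. `volatile` forces each intermediate to be stored, and therefore rounded, as a `float`. This guards against compilers keeping extra precision.

```c
/* (a + b) + c, with the intermediate rounded to float. */
float add_left(float a, float b, float c)
{
    volatile float t = a + b;
    return t + c;
}

/* a + (b + c), with the intermediate rounded to float. */
float add_right(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}
```

Take a = 3.14f, b = 1e10f, and c = -1e10f. The left grouping gives 0.0, because 3.14 is lost when added to 1e10, where the float spacing is 1024. The right grouping gives 3.14.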
61 Mathematical Properties of FP Add
Monotonicity
– a ≥ b ⇒ a + c ≥ b + c? ALMOST, except for infinities & NaNs
62 Algebraic Properties of FP Mult
Compare to a commutative ring
– Closed under multiplication? YES, but may generate infinity or NaN
– Multiplication commutative? YES
– Multiplication associative? (P92) NO: possibility of overflow, inexactness of rounding
– 1 is multiplicative identity? YES
– Multiplication distributes over addition? NO: possibility of overflow, inexactness of rounding
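Overflow to infinity is the easiest way to break both properties. Here is a sketch with illustrative names and `volatile` intermediates. It assumes IEEE 754 single precision, whose largest finite value is about 3.4e38.

```c
/* (a * b) * c and a * (b * c), intermediates rounded to float. */
float mul_left(float a, float b, float c)  { volatile float t = a * b; return t * c; }
float mul_right(float a, float b, float c) { volatile float t = b * c; return a * t; }

/* a * (b + c) versus a * b + a * c. */
float dist_factored(float a, float b, float c)
{
    volatile float t = b + c;
    return a * t;
}

float dist_expanded(float a, float b, float c)
{
    volatile float ab = a * b, ac = a * c;   /* may overflow to +/-inf */
    return ab + ac;
}
```

(1e20f * 1e20f) * 1e-20f is $+\infty$, while 1e20f * (1e20f * 1e-20f) is about 1e20. Likewise, 1e20f * (1e20f - 1e20f) is 0, but 1e20f * 1e20f - 1e20f * 1e20f is $\infty - \infty$ = NaN.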
63 Algebraic Properties of FP Mult (P90)
Monotonicity
– a ≥ b and c ≥ 0 ⇒ a × c ≥ b × c? ALMOST, except for infinities & NaNs
64 FP Multiplication
Operands: $(-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}$
Exact result: $(-1)^s M 2^E$
– Sign s: s1 ^ s2
– Significand M: M1 × M2
– Exponent E: E1 + E2
65 FP Multiplication
Fixing
– If M ≥ 2, shift M right, increment E
– If E is out of range, overflow
– Round M to fit frac precision
66 FP Addition
Operands: $(-1)^{s_1} M_1 2^{E_1}$ and $(-1)^{s_2} M_2 2^{E_2}$
– Assume E1 > E2
Exact result: $(-1)^s M 2^E$
– Sign s, significand M: result of signed align & add
– Exponent E: E1
67 FP Addition
Fixing
– If M ≥ 2, shift M right, increment E
– If M < 1, shift M left k positions, decrement E by k
– Overflow if E is out of range
– Round M to fit frac precision
68 FP Addition
[Figure: $(-1)^{s_1} M_1$ aligned above $(-1)^{s_2} M_2$, which is shifted right by E1 - E2 places; their sum is $(-1)^s M$.]
69 Answers to Floating Point Puzzles
int x = …;
float f = …;
double d = …;
Assume neither d nor f is NaN or infinity
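70 Floating Point in C
– x == (int)(float) x: No, float has a 24-bit significand
– x == (int)(double) x: Yes, double has a 53-bit significand
– f == (float)(double) f: Yes, converting to double increases precision
– d == (float) d: No, converting to float loses precision
– f == -(-f): Yes, negation just changes the sign bit
– 2/3 == 2/3.0: No, 2/3 == 0 (integer division)
– d < 0.0 ⇒ ((d*2) < 0.0): Yes!
– d > f ⇒ -f < -d: No (negation reverses the order: -f > -d)
– d * d >= 0.0: Yes!
– (d+f)-d == f: No, addition is not associative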
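Several of the answers can be confirmed on concrete values. This sketch assumes 32-bit `int` and IEEE 754 `float`/`double`. The wrapper names are illustrative.

```c
/* 2/3 is integer division; 2/3.0 is floating-point division. */
int int_division_is_zero(void) { return 2 / 3 == 0; }
int int_vs_fp_division(void)   { return 2 / 3 == 2 / 3.0; }

/* (d + f) - d == f fails when f is absorbed by a much larger d. */
int add_then_sub_recovers(double d, float f) { return (d + f) - d == f; }

/* d == (float)d fails when d needs more than float's precision. */
int survives_float_roundtrip(double d) { return d == (float)d; }
```

`add_then_sub_recovers(1e20, 1.0f)` is 0 because 1.0 is absorbed when added to 1e20. `survives_float_roundtrip(0.1)` is 0 because 0.1 has no exact binary representation, and rounding it to float loses more of it than rounding to double.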
71 Answers to Floating Point Puzzles
C guarantees two levels
– float: single precision
– double: double precision
72 Answers to Floating Point Puzzles
Conversions
– Casting between int, float, and double changes the bit representation, and can change the value
– double or float to int
  Truncates the fractional part, like rounding toward zero
  Not defined in C when out of range; in practice platform-specific (x86-64 yields TMin, while some other architectures saturate to TMin or TMax)
– int to double
  Exact conversion, as long as int has ≤ 53-bit word size
– int to float
  Will round according to the rounding mode
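The conversion rules can be checked the same way. This sketch assumes 32-bit `int` and IEEE 754 types and uses illustrative names. 16777217 = $2^{24} + 1$ is the smallest positive integer that `float` cannot hold exactly. Under round-to-even it becomes 16777216.

```c
/* Casting double to int truncates toward zero (in-range values only). */
int truncate_to_int(double d) { return (int)d; }

/* int -> float rounds when more than 24 significant bits are needed;
   int -> double is exact for 32-bit int (53-bit significand). */
int int_float_roundtrip_exact(int x)  { return x == (int)(float)x; }
int int_double_roundtrip_exact(int x) { return x == (int)(double)x; }
```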