Topic 3d Representation of Real Numbers Introduction to Computer Systems Engineering (CPEG 323) 2018/9/19 cpeg323-08F\Topic3d-323
Recap
What can be represented in n bits?
  Unsigned: 0 to 2^n - 1
  2's complement: -2^(n-1) to 2^(n-1) - 1
  BCD: 0 to 10^(n/4) - 1
But what about:
  very large numbers: 9,349,398,989,787,762,244,859,087,678
  very small numbers: 0.0000000000000000000000045691
  rationals: 2/3
  irrationals: 2.71828..., 3.1415926...
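The three ranges in the recap can be checked numerically. The following is a minimal sketch; the function name representable_range is a hypothetical helper, and the BCD entry assumes one decimal digit packed into every 4 bits.

```python
def representable_range(n_bits):
    """Return (min, max) for three n-bit integer encodings."""
    return {
        "unsigned": (0, 2 ** n_bits - 1),
        "twos_complement": (-(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1),
        # Assumption: packed BCD, one decimal digit per 4 bits.
        "bcd": (0, 10 ** (n_bits // 4) - 1),
    }

for name, (low, high) in representable_range(8).items():
    print(name, low, high)
```

With 8 bits this gives 0 to 255, -128 to 127, and 0 to 99, none of which can hold the very large, very small, or fractional values listed on the slide.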
Conceptual Overview: Finite-Precision Numbers
[Figure: number line, left to right: negative overflow, expressible negative numbers, negative underflow, zero, positive underflow, expressible positive numbers, positive overflow]
Is there any "underflow" region in integer representation? No: between two adjacent integers there is no other integer.
Representing Real Numbers
How do we represent the fractional part to the right of the "decimal" point?
A number like 0.1 in base 2 (i.e., 1/2 in base 10) is not represented well by the integers 0 or 1!
Two ways to represent real numbers better:
  Fixed point
  Floating point
Fixed-Point Data Format
[Figure: fixed-point word with a sign bit S, integer bits weighted 4, 2, 1, a hypothetical binary point, and fractional bits weighted 1/2, 1/4, 1/8, ...; the fields are labeled Sign, Integer, Fractional]
When representing a real value, you have two choices: the fixed-point format or the floating-point format. Either way, to avoid the nonlinear effects introduced by overflow, underflow, saturation, and wrapping during computation, the range of each variable has to be controlled by appropriate scaling of the operands.
In the floating-point case, the exponent of a variable is part of the run-time representation and is computed automatically by the floating-point ALU. In the fixed-point case, the exponent of each variable is implicit and is determined by the programmer off-line, using the predicted dynamic range of the variable. The scaling of fixed-point operands therefore has to be done explicitly by the programmer, which is generally a tedious and error-prone process.
In fact, you may regard fixed point as a form of integer with an additional hypothetical binary point. In our research, the number of bits to the left of the binary point, excluding the sign bit, is called the Integer Word-Length, or IWL.
Fixed Point: Pros and Cons
Pros
  Add two reals just by adding the integers
  Much less logic than floating point
  Faster
  Often used in digital signal processing
Cons
  The range of numbers is narrow
  Example: number = 400 000 000 000 000 000 000 000 000.000
  It is much more economical to represent this as 4 x 10^26
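The fixed-point idea of adding two reals by adding integers, with scaling left to the programmer, can be sketched as follows. This is an illustration only: FRAC_BITS, to_fixed, and from_fixed are hypothetical names, and 4 fractional bits is an arbitrary choice.

```python
FRAC_BITS = 4  # hypothetical: 4 bits to the right of the binary point

def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a real value as an integer scaled by 2**frac_bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits=FRAC_BITS):
    """Decode a scaled integer back into a real value."""
    return q / (1 << frac_bits)

a = to_fixed(3.5)    # 3.5  * 16 = 56
b = to_fixed(1.25)   # 1.25 * 16 = 20

# Addition: operands share the same scaling, so integer addition is enough.
total = a + b

# Multiplication: the product carries 2 * FRAC_BITS fractional bits,
# so the programmer must rescale it explicitly.
product = (a * b) >> FRAC_BITS

print(from_fixed(total))    # 4.75
print(from_fixed(product))  # 4.375
```

The explicit shift after the multiplication is the kind of manual scaling the notes describe as tedious and error-prone. With only 4 fractional bits, values that are not multiples of 1/16 would be rounded.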
Recall Scientific Notation
6.02 x 10^-23
  6.02: significand (containing the decimal point)
  10: radix (base)
  -23: exponent
Issues:
  Arithmetic (+, -, *, /)
  Representation, normal form
  Range and precision (determined by?)
  Rounding and errors
  Exceptions (e.g., divide by zero, overflow, underflow)
  Properties
Scientific Notation: Normalized
12.35 x 10^-9 ? 1.235 x 10^-8 ?
Scientific notation: has a single digit to the left of the decimal point.
Normalized scientific notation: that single digit must be nonzero.
Floating Point Representation
Numerical form: (-1)^s x M x 2^E
  Sign bit s determines whether the number is negative or positive
  Significand M is normally a fractional value in the range [1.0, 2.0)
  Exponent E weights the value by a power of two
Encoding: | s | exp | frac |
  MSB is the sign bit: s = 0 or 1
  exp field encodes E
  frac field encodes M
IEEE 754 Standard
Three formats: single, double, and extended precision (32, 64, and 80 bits).
Single precision: | s (1 bit) | exp (8 bits) | frac (23 bits) |
Double precision: see page 192 in the P&H book.
IEEE 754 Standard Floating Point Representation
The representation: (-1)^s x (1 + Fraction) x 2^(E - bias), where E is the value stored in the exponent field
  Sign bit s determines whether the number is negative or positive
  Fraction is normally a fractional value between 0 and 1
  The leading 1 added to the fraction is "implicit"
  The exponent uses a "biased" notation
  For single precision, the bias is 127
That is, when you convert the notation to the number it represents, read E out of the IEEE 754 exponent field and use E - 127 as the actual exponent.
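The formula above can be tried out on a real bit pattern. The sketch below uses Python's standard struct module to obtain the 32-bit encoding; decode_single and value_from_fields are hypothetical helper names, and the reconstruction covers only the normalized case (exponent field neither all zeros nor all ones).

```python
import struct

def decode_single(x):
    """Split x into the (s, exp, frac) fields of its single-precision encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31               # 1 sign bit
    exp = (bits >> 23) & 0xFF    # 8 exponent bits (biased)
    frac = bits & 0x7FFFFF       # 23 fraction bits
    return s, exp, frac

def value_from_fields(s, exp, frac, bias=127):
    """Apply (-1)^s x (1 + Fraction) x 2^(E - bias); normalized numbers only."""
    return (-1) ** s * (1 + frac / 2 ** 23) * 2.0 ** (exp - bias)

s, exp, frac = decode_single(-0.75)
print(s, exp, frac)                      # 1 126 4194304
print(value_from_fields(s, exp, frac))   # -0.75
```

Here -0.75 is -1.1 (base 2) x 2^-1, so the stored exponent is -1 + 127 = 126 and the fraction field holds the .1 after the implicit leading 1.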
Advantage of Using the Biased Notation for Exponents
Under the single-precision IEEE 754 standard: bias = 127
If the real exponent is +1, what is the biased exponent? Answer: 1 + 127 = 128 (note that 128 - 127 = +1).
What if the real exponent is -1? Answer: -1 + 127 = 126 (note that 126 - 127 = -1).
What if the real exponent is -127? E - 127 = -127, then E = ?
Observations
Using biased notation, the exponent field in the single-precision IEEE 754 notation itself never becomes negative.
Normalized Encoding Example
Value: float F = 15213.0;
  15213 (base 10) = 1.1101101101101 (base 2) x 2^13
Significand
  M = 1.1101101101101 (base 2)
  frac = 11011011011010000000000 (23 bits, with the leading 1 hidden)
Exponent
  E = 13, bias = 127
  exp = 140 (base 10) = 10001100 (base 2)
Floating point representation:
  Binary: 01000110011011011011010000000000
  Fields: 0 | 10001100 | 11011011011010000000000 (s | exp | frac)
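The worked example can be cross-checked against the machine's own encoding. This is a small sketch using the standard struct module; the variable names are illustrative only.

```python
import struct

# Pack 15213.0 as a big-endian single-precision float, then reread it as a 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", 15213.0))[0]
pattern = format(bits, "032b")

sign, exp, frac = pattern[0], pattern[1:9], pattern[9:]
print(pattern)                  # 01000110011011011011010000000000
print(sign, exp, frac)          # 0 10001100 11011011011010000000000
print(int(exp, 2) - 127)        # 13
```

The three slices correspond to the 1-bit sign, 8-bit exp, and 23-bit frac fields, and subtracting the bias from exp recovers E = 13.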
Denormalized Values
Condition
  exp = 000…0
Value
  Significand value M = 0.xxx…x, where xxx…x are the bits of frac
Cases
  exp = 000…0, frac = 000…0
    Represents the value 0
    Note that there are distinct values +0 and -0
  exp = 000…0, frac ≠ 000…0
    Numbers very close to 0.0
    Lose precision as they get smaller
    "Gradual underflow"
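A few denormalized bit patterns can be inspected directly. The sketch below builds single-precision values from raw bits with the standard struct module; single_from_bits is a hypothetical helper. It relies on a detail the slide does not state: IEEE 754 gives denormalized numbers the exponent 1 - bias = -126, so the smallest one is 2^-23 x 2^-126 = 2^-149.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_denorm = single_from_bits(0x00000001)  # exp = 0, frac = 00...01
largest_denorm = single_from_bits(0x007FFFFF)   # exp = 0, frac = 11...11
smallest_norm = single_from_bits(0x00800000)    # exp = 1, frac = 00...00
neg_zero = single_from_bits(0x80000000)         # sign = 1, exp = 0, frac = 0

print(smallest_denorm == 2.0 ** -149)           # True
print(largest_denorm < smallest_norm)           # True
print(neg_zero == 0.0, math.copysign(1.0, neg_zero))  # True -1.0
```

The denormalized values fill the gap between zero and the smallest normalized number (2^-126), which is what "gradual underflow" refers to. The last line shows that -0 compares equal to +0 while still carrying its sign bit.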
Special Values
Condition
  exp = 111…1
Cases
  exp = 111…1, frac = 000…0
    Represents the value ∞ (infinity)
    Operation that overflows
    Both positive and negative
    E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
  exp = 111…1, frac ≠ 000…0
    Not-a-Number (NaN)
    Represents the case when no numeric value can be determined
    E.g., sqrt(-1)
IEEE 754: Summary
                sign   exp             frac
  Normalized:   ±      0 < exp < max   any bit pattern
  Denormalized: ±      0               any nonzero bit pattern
  Zero:         ±      0               0
  Infinite:     ±      11…1            0
  NaN:          ±      11…1            any nonzero bit pattern
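The summary table maps directly onto a small classifier over the exp and frac fields. The following is a sketch for the single-precision layout only; classify_single is a hypothetical name.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern using the exp and frac fields."""
    exp = (bits >> 23) & 0xFF    # 8-bit exponent field
    frac = bits & 0x7FFFFF       # 23-bit fraction field
    if exp == 0:
        return "zero" if frac == 0 else "denormalized"
    if exp == 0xFF:              # exp = 11...1
        return "infinite" if frac == 0 else "NaN"
    return "normalized"          # 0 < exp < max

for pattern in (0x466DB400, 0x00000001, 0x80000000, 0x7F800000, 0x7FC00000):
    print(hex(pattern), classify_single(pattern))
```

The sign bit is ignored on purpose: as the table shows, every class exists in both a positive and a negative form.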
IEEE 754: Summary (Cont.)
[Figure: number line, left to right: NaN, -∞ (negative overflow), -Normalized, -Denorm (negative underflow), -0, +0, +Denorm (positive underflow), +Normalized, +∞ (positive overflow), NaN]
FP Addition
Operands: (-1)^s1 x M1 x 2^E1 and (-1)^s2 x M2 x 2^E2
  Assume E1 > E2
Exact result: (-1)^s x M x 2^E
  Sign s, significand M: result of signed align and add, where (-1)^s2 x M2 is shifted right by E1 - E2 places and then added to (-1)^s1 x M1
  Exponent E: E1
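The align-and-add step can be sketched with exact rational arithmetic. This is an illustration under stated assumptions, not a model of real hardware: fp_add_exact is a hypothetical name, significands are held as Fraction values, and the result is neither renormalized into [1.0, 2.0) nor rounded to a fixed number of bits.

```python
from fractions import Fraction

def fp_add_exact(s1, m1, e1, s2, m2, e2):
    """Exact sum of (-1)^s1 * m1 * 2^e1 and (-1)^s2 * m2 * 2^e2 as (s, M, E)."""
    if e1 < e2:
        # Swap so that the first operand has the larger exponent.
        s1, m1, e1, s2, m2, e2 = s2, m2, e2, s1, m1, e1
    # Align: shift the second significand right by e1 - e2 places.
    aligned_m2 = Fraction(m2) / 2 ** (e1 - e2)
    # Signed add of the aligned significands.
    signed = (-1) ** s1 * Fraction(m1) + (-1) ** s2 * aligned_m2
    s = 1 if signed < 0 else 0
    return s, abs(signed), e1

# 1.5 x 2^3 + 1.0 x 2^1 = 12 + 2 = 14 = 1.75 x 2^3
print(fp_add_exact(0, Fraction(3, 2), 3, 0, 1, 1))
```

A real floating-point unit would follow this step by renormalizing M (for example when the sum reaches 2.0 or drops below 1.0) and rounding it to the available frac bits.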
Decimal Number Conversion
Convert binary to decimal (base 2 to base 10)
  Digits:  x    x    x    x    x  .  d     d     d     d     d     d    ... (base 2)
  Weights: 2^4  2^3  2^2  2^1  2^0   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6 ...
1101.011 (base 2) = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 + 0*2^-1 + 1*2^-2 + 1*2^-3 = 13.375 (base 10)
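The positional-weight sum above can be written as a short routine. This is a minimal sketch; binary_to_decimal is a hypothetical name, and the input is assumed to be a string of 0s and 1s with at most one binary point.

```python
def binary_to_decimal(text):
    """Sum bit * weight over a binary string such as '1101.011'."""
    int_part, _, frac_part = text.partition(".")
    value = 0.0
    # Integer bits: weights 2^0, 2^1, ... counted from the binary point leftward.
    for i, bit in enumerate(reversed(int_part)):
        value += int(bit) * 2 ** i
    # Fraction bits: weights 2^-1, 2^-2, ... counted from the binary point rightward.
    for i, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2 ** -i
    return value

print(binary_to_decimal("1101.011"))  # 13.375
```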
Decimal Number Conversion (Cont.)
Convert decimal integer to binary integer
Divide the decimal value by 2 repeatedly and write down the remainders from bottom to top (divide by 2 and get the remainders).
37 (base 10) = ? (base 2)
  Division   Quotient   Remainder
  37 ÷ 2  =  18    …    1
  18 ÷ 2  =   9    …    0
   9 ÷ 2  =   4    …    1
   4 ÷ 2  =   2    …    0
   2 ÷ 2  =   1    …    0
   1 ÷ 2  =   0    …    1
Reading the remainders from bottom to top: 100101 (base 2)
Can it always be converted into an accurate binary number? YES!
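The divide-by-2 procedure can be sketched as a loop. The name int_to_binary is hypothetical, and the sketch assumes a non-negative integer input.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)       # quotient feeds the next step, remainder is one bit
        remainders.append(str(r))
    # The first remainder is the least significant bit, so read bottom to top.
    return "".join(reversed(remainders))

print(int_to_binary(37))  # 100101
```

The loop always terminates because the quotient shrinks at every step, which is why an integer always has an exact binary representation.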
Decimal Number Conversion (Cont.)
Convert decimal fraction to binary fraction
Multiply the decimal value by 2 repeatedly and write down the integer parts from top to bottom (multiply by 2 and get the integers).
0.375 (base 10) = ? (base 2)
  .375 * 2 = 0.75   integer part 0
  .75  * 2 = 1.5    integer part 1
  .5   * 2 = 1.0    integer part 1
Where to put the binary point? Before the first digit: 0.011 (base 2)
Can it always be converted into an accurate binary number? NO! For example, 0.1 (base 10) has no finite binary expansion.
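The multiply-by-2 procedure, and the reason the answer to the last question is "no", can be sketched as follows. The name fraction_to_binary and the max_bits cutoff are hypothetical; Fraction is used so that the decimal input is held exactly rather than as an already-rounded binary float.

```python
from fractions import Fraction

def fraction_to_binary(x, max_bits=24):
    """Convert a decimal fraction in [0, 1) to binary; return (digits, exact)."""
    x = Fraction(x)
    bits = []
    while x != 0 and len(bits) < max_bits:
        x *= 2
        bit = int(x)              # integer part of the product, 0 or 1
        bits.append(str(bit))
        x -= bit                  # keep only the fractional part
    exact = (x == 0)              # False if digits were cut off at max_bits
    return "0." + ("".join(bits) or "0"), exact

print(fraction_to_binary("0.375"))     # ('0.011', True)
print(fraction_to_binary("0.1", 12))   # ('0.000110011001', False)
```

For 0.1 the pattern 0011 repeats forever, so any finite number of bits gives only an approximation.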
Decimal Number Conversion (Cont.)
Convert decimal number to binary
Put together: integer part . fraction part
37.375 (base 10) = 100101.011 (base 2)
Summary
Computer arithmetic is constrained by limited precision.
Bit patterns have no inherent meaning, but standards do exist:
  two's complement
  IEEE 754 floating point
The opcode determines the "meaning" of the bit patterns.
Performance and accuracy are important, so there are many complexities in real machines (i.e., in algorithms and implementation).