Published by Noel Randolf Carter. Modified over 9 years ago.
1
Number Representation Fixed and Floating Point
No Method is Capable of Representing ALL Real Numbers Using Finite Register Lengths
Must Use Approximations to Represent Values
Concentrate on Two Forms:
  Fixed Point
  Floating Point
Others are:
  Rational Number Systems – use ratios of integers
  Logarithmic Number Systems – use signs and logarithms of values
2
Fixed Versus Floating Point
Fixed Point Values
  Represent Values where Any Two Adjacent Values Differ by 1 unit in the last place (ulp)
  Equal Spacing Between Numbers
Floating Point Values
  Use Two Multi-Bit Words: Mantissa and Exponent
Both Forms Must be Capable of Representing Signed Quantities
Fixed Point Values CAN be Used to Represent Fractional Quantities
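The equal spacing of fixed point values, including fractional ones, can be sketched as follows. The choice of 4 fractional bits and the helper names to_fixed and from_fixed are assumptions made for illustration; they do not come from the slides.

```python
FRAC_BITS = 4                      # assumed number of fractional bits
ULP = 2.0 ** -FRAC_BITS            # one unit in the last place = 0.0625


def to_fixed(x):
    """Quantize x to the nearest multiple of one ulp, as a scaled integer."""
    return round(x * (1 << FRAC_BITS))


def from_fixed(q):
    """Convert a scaled integer back to the value it represents."""
    return q / (1 << FRAC_BITS)


# Four consecutive fixed point codes starting at the code for -0.75.
codes = [to_fixed(-0.75) + i for i in range(4)]
values = [from_fixed(q) for q in codes]

# The gap between neighbours is the same everywhere: exactly one ulp.
gaps = [b - a for a, b in zip(values, values[1:])]
print(values, gaps)
```

Consecutive integer codes always map to values exactly one ulp apart, whatever the magnitude.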
3
Floating Point Characteristics
Total Number of Representations = Total Bit Strings
For an n-bit Register we have 2^n
Range of Values is Larger than Fixed Point
Precision of Values is Smaller
Distance Between Two Consecutive Values Increases as the Magnitude Grows
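A quick way to see the growing distance between consecutive floating point values is to ask Python for the ulp at different magnitudes. This sketch assumes Python 3.9 or later (for math.ulp and math.nextafter) and the usual 64-bit double format.

```python
import math

# Spacing between a double and the next larger double, at three magnitudes.
for x in (1.0, 1024.0, 2.0 ** 52):
    print(x, math.ulp(x))

# The same gap measured directly from the neighbouring value.
gap_at_one = math.nextafter(1.0, math.inf) - 1.0
```

At 1.0 the gap is 2^-52, while at 2^52 it has grown to 1.0, so integers beyond that point are no longer all representable.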
4
Floating Point
Fields: s e m
s – Sign Bit (signed magnitude)
e – Exponent (in 2's Complement Form)
m – Mantissa (significand or fraction); mMAX = 1 - ulp; range [0,1); hidden bit
float – BIAS = 127 (32 bits: 23 for m and 8 for e)
double – BIAS = 1023 (64 bits: 52 for m and 11 for e)
Sign of Exponent is Complement of its MSb
Thus, adding/subtracting bias is just complementation of MSb (exact for a bias of 2^(k-1) with a k-bit exponent; the IEEE biases 127 and 1023 are one less than that)
5
Floating Point Example
double = bfe80000
Big Endian – MSW has Higher Address
Fields: s e m
s = 1; e = 1022; m = 0.5
Value = (-1)^1 × 1.5 × 2^(1022 - 1023)
Value = -(1.5)(0.5) = -0.75
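The field extraction in this example can be checked with a short sketch. The function name decode_double is hypothetical, and the low-order 32 bits of the double are assumed to be zero, since the slide shows only the word bfe80000.

```python
import struct


def decode_double(bits):
    """Split a 64-bit pattern into sign, biased exponent, fraction and value."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF                   # 11 exponent bits
    fraction = (bits & ((1 << 52) - 1)) / (1 << 52)   # 52 fraction bits as [0,1)
    value = struct.unpack(">d", bits.to_bytes(8, "big"))[0]
    return sign, exponent, fraction, value


# Assumption: the low-order word is all zeros.
s, e, m, v = decode_double(0xBFE80000 << 32)

# Rebuild the value by hand: hidden bit plus fraction, bias of 1023.
manual = (-1) ** s * (1 + m) * 2.0 ** (e - 1023)
print(s, e, m, v, manual)
```

Both the hardware interpretation and the manual formula give the same value as the slide.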
6
Floating Point Normalization
Redundant Representations are Possible! Hidden Bit Helps
Out of All Possible Representations, Choose the One With the Fewest Leading Zeros in the Significand
This is Normalization
After Performing Arithmetic, Renormalization May be Needed
7
Floating Point Special Numbers
Value v when exponent e and fraction f are special values (IEEE standard) Note: NaN = Not a Number
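The table from this slide did not survive extraction. As a rough sketch of the standard IEEE 754 double-precision rules (not a reconstruction of the slide's table), the special cases can be recognized from the e and f fields alone; the function name classify is hypothetical.

```python
import struct


def classify(x):
    """Classify a double by its exponent field e and fraction field f."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    if e == 0x7FF:                       # exponent all ones
        return "infinity" if f == 0 else "NaN"
    if e == 0:                           # exponent all zeros
        return "zero" if f == 0 else "denormal"
    return "normal"


for x in (float("inf"), float("nan"), -0.0, 5e-324, 1.0):
    print(x, classify(x))
```

The sign bit is ignored here, so both signed zeros and both infinities fall into the same class.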
8
IEEE/ANSI 754/854 Standard
9
Denormalized Numbers
Allow for Gradual Degradation on Underflow
10
Denormals
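Gradual underflow can be observed directly in 64-bit doubles. This sketch assumes the platform does not flush denormals to zero, which holds for standard CPython floats; the variable names are illustrative.

```python
import sys

smallest_normal = sys.float_info.min      # 2**-1022
smallest_denormal = 2.0 ** -1074          # least positive double

# Keep halving the smallest normal number until the result becomes zero.
x = smallest_normal
halvings = 0
while x > 0.0:
    x /= 2.0
    halvings += 1

# Two distinct small values whose difference is below the normal range.
a = 1.5 * smallest_normal
b = smallest_normal
difference = a - b                        # a denormal, not zero
print(halvings, difference)
```

Without denormals the very first halving would underflow to zero, and a - b would be zero even though a and b differ.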
11
Operations – Internal Precision
12
Floating Point Addition/Subtraction
13
Floating Point Multiplication/Division
14
Conversions and Roundings
15
Exceptions
16
Rounding Schemes
Signed Magnitude
Two's Complement
17
Round to Nearest (Signed Magnitude)
18
Rounding Comments
19
Round to Nearest Even/Odd
Round to Nearest Odd (R*)
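The two tie-breaking rules named here can be sketched on non-negative integers by dropping k low-order bits. The function name round_nearest and the tie parameter are hypothetical, and the sketch covers magnitudes only.

```python
def round_nearest(value, k, tie="even"):
    """Drop the k low-order bits (k >= 1) of a non-negative integer,
    rounding to nearest and breaking ties toward an even or odd result."""
    kept = value >> k
    rem = value & ((1 << k) - 1)      # the discarded bits
    half = 1 << (k - 1)               # exactly halfway between neighbours
    if rem > half:
        return kept + 1
    if rem < half:
        return kept
    # Tie: keep the truncated value if its LSB already has the wanted parity.
    wanted_lsb = 1 if tie == "odd" else 0
    return kept if (kept & 1) == wanted_lsb else kept + 1


# 0b10.10 is 2.5 and 0b11.10 is 3.5 when the two low bits are fractional.
print(round_nearest(0b1010, 2), round_nearest(0b1010, 2, tie="odd"))
print(round_nearest(0b1110, 2), round_nearest(0b1110, 2, tie="odd"))
```

Only the halfway case differs between the two schemes; every other input rounds the same way.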
20
Jamming/von Neumann Rounding
21
ROM Rounding
22
Rounding
23
Rounding Examples Round Towards + Downward Directed Rounding
24
Floating Point Operations
25
Adders/Subtractors
26
Operand Packing/Unpacking
27
Other Key Parts of FP Add/Sub Unit
28
Pre-Shifting
29
Four-stage Combinational Shifter
Pre-shifts Operand by 0 to 15 Bits
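A software model of such a shifter, assuming the usual logarithmic arrangement in which stage i shifts by 2^i positions when bit i of the shift amount is set. The stage sizes (1, 2, 4, 8), the right-shift direction and the name pre_shift are assumptions; the slide's figure is not available.

```python
def pre_shift(operand, amount):
    """Right-shift operand by 0..15 bits using four conditional stages."""
    if not 0 <= amount <= 15:
        raise ValueError("a four-stage shifter handles shifts of 0 to 15 bits")
    for stage in range(4):
        # Stage i is enabled by bit i of the shift amount and shifts by 2**i.
        if (amount >> stage) & 1:
            operand >>= 1 << stage
    return operand


print(hex(pre_shift(0xABCD, 5)))
```

Four stages suffice because any amount from 0 to 15 is a sum of distinct powers of two below 16.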
30
Leading Zeros/Ones – Counting vs. Prediction
31
Leading Zeros Prediction
32
Guard Digits
What is the smallest number of extra digits needed for rounding? For post-normalization?
Multiplication – Double Length Result
Add/Sub w/ differing exp. – Can have Double Length Result
FP Unit Provides Single Length Result
33
Significand Ranges
Assume Significand M ∈ (0, 1-ulp]
Then Normalized M ranges as:
Multiplication: prod = M1 × M2
For postnormalization need at most one shift left to get:
34
Significand Ranges (cont)
Division: quot = M1 / M2
Need at most one shift right to get:
Conclusion:
1 Extra Digit Needed for Postnormalization
1 Extra Digit Needed for Round-to-Nearest
2 Extra Digits Needed: G – guard, R – round
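The range formulas on these two slides were lost in extraction. A sketch of the argument, assuming binary significands that are normalized fractions (an assumption consistent with the [0,1) significand range given earlier, but not confirmed by the surviving text):

```latex
\begin{align*}
\text{Normalized operands:} \quad
  & \tfrac{1}{2} \le M_1,\, M_2 < 1 \\
\text{Multiplication:} \quad
  & \tfrac{1}{4} \le M_1 M_2 < 1
  && \text{(at most one left shift restores } \tfrac{1}{2} \le \mathit{prod} < 1\text{)} \\
\text{Division:} \quad
  & \tfrac{1}{2} < M_1 / M_2 < 2
  && \text{(at most one right shift restores } \tfrac{1}{2} \le \mathit{quot} < 1\text{)}
\end{align*}
```

The single left shift after multiplication moves one bit in from below the retained digits, which is why one extra digit is needed for postnormalization.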
35
“Sticky Bit” in std754
Round-to-Nearest-Even Requires 1 Extra Bit: The “sticky bit”, S
S Turns out to be the Logical-OR of the Other Additional Bits
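How the guard, round and sticky bits drive round-to-nearest-even can be sketched on unsigned integer significands. The function names and the integer encoding are assumptions made for illustration, not the slide's hardware design.

```python
def shift_right_grs(sig, shift):
    """Shift a non-negative significand right by `shift` bits and return
    (kept, G, R, S): G and R are the first two bits shifted out, and S is
    the logical OR of every bit shifted out beyond them."""
    ext = sig << 2                          # make room for G and R
    kept_gr = ext >> shift
    lost = ext & ((1 << shift) - 1)         # everything below R
    sticky = 1 if lost else 0
    return kept_gr >> 2, (kept_gr >> 1) & 1, kept_gr & 1, sticky


def round_nearest_even(kept, g, r, s):
    """Round up when the discarded part exceeds half an ulp, or when it is
    exactly half an ulp and the kept value is odd."""
    if g and (r or s or (kept & 1)):
        return kept + 1
    return kept


# 0b1011 shifted right by 3 is 1.011 in binary: G=0, R=1, S=1.
print(shift_right_grs(0b1011, 3))
# 0b1100 shifted right by 3 is 1.100, an exact tie, so it rounds to even (2).
print(round_nearest_even(*shift_right_grs(0b1100, 3)))
```

The sticky bit is what separates an exact tie (G=1, R=0, S=0) from a value slightly above the halfway point, without keeping all the discarded bits.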
36
Floating Point Multiplier
37
Floating Point Divider