Download presentation
Presentation is loading. Please wait.
1
Arithmetic for Computers
Chapter 3 Arithmetic for Computers
3
Morgan Kaufmann Publishers
14 September, 2018 Floating Point §3.5 Floating Point Representation for non-integral numbers Including very small and very large numbers Like scientific notation –2.34 × 1056 × 10–4 × 109 Normalized number: A number in floating-point notation that has no leading 0s In binary ±1.xxxxxxx2 × 2yyyy Types float and double in C Floating point: Computer arithmetic that represents numbers in which the binary point is not fixed normalized not normalized Chapter 3 — Arithmetic for Computers
4
LEGv8 floating-point number
Floating point numbers are of the form (-1)s x F x 2E S: sign bit (0 non-negative, 1 negative), E: exponent, F: fraction Numbers representable in computer Smallest fractions 2.0ten × 10−38 Largest numbers 2.0ten × 1038 overflow (floating-point): a positive exponent becomes too large to fit in the exponent field. Raises exception underflow (floating-point): a negative exponent becomes too large to fit in the exponent field. Raises interrupt 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 S exponent fraction 1 bit bits bits
5
Floating Point Standard
Morgan Kaufmann Publishers Floating Point Standard 14 September, 2018 Defined by IEEE Std Developed in response to divergence of representations Portability issues for scientific code Now almost universally adopted Two representations Single precision (32-bit) Double precision (64-bit) IEEE 754 makes the leading 1 bit of normalized binary numbers implicit. Single precision: 24 bits (implied 1 and a 23-bit fraction), and double precision: 53 bits (1 +52). Significand: represents the 24- or 53-bit number (1 plus the fraction) Fraction: 23- or 52-bit number. Since 0 has no leading 1, it is given the reserved exponent value 0 so that the hardware won’t attach a leading 1 to it. Chapter 3 — Arithmetic for Computers
6
IEEE 754 Floating Point Standard
00 … 00two represents 0; Representation of the rest of the numbers: (-1)s x (1+Fraction) x 2E Special symbols to represent unusual events. For example, instead of interrupting on a divide by 0, software can set the result to a bit pattern representing +∞ or −∞; the largest exponent is reserved for these special symbols. NaN (Not a Number): symbol for the result of invalid operations, such as 0/0 or subtracting infinity from infinity.
7
LEGv8 floating-point number
Floating point numbers are of the form (-1)s x F x 2E S: sign bit (0 non-negative, 1 negative), E: exponent, F: fraction Numbers representable in computer Smallest fractions 2.0ten × 10−38 Largest numbers 2.0ten × 1038 overflow (floating-point): a positive exponent becomes too large to fit in the exponent field. Raises exception underflow (floating-point): a negative exponent becomes too large to fit in the exponent field. Raises interrupt 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 S exponent fraction 1 bit bits bits
8
Floating Point Standard
Morgan Kaufmann Publishers Floating Point Standard 14 September, 2018 Defined by IEEE Std Developed in response to divergence of representations Portability issues for scientific code Now almost universally adopted Two representations Single precision (32-bit) Double precision (64-bit) IEEE 754 makes the leading 1 bit of normalized binary numbers implicit. Single precision: 24 bits (implied 1 and a 23-bit fraction), and double precision: 53 bits (1 +52). Significand: represents the 24- or 53-bit number (1 plus the fraction) Fraction: 23- or 52-bit number. Since 0 has no leading 1, it is given the reserved exponent value 0 so that the hardware won’t attach a leading 1 to it. Chapter 3 — Arithmetic for Computers
9
IEEE 754 Floating Point Standard
00 … 00two represents 0; Representation of the rest of the numbers: (-1)s x (1+Fraction) x 2E Special symbols to represent unusual events. For example, instead of interrupting on a divide by 0, software can set the result to a bit pattern representing +∞ or −∞; the largest exponent is reserved for these special symbols. NaN (Not a Number): symbol for the result of invalid operations, such as 0/0 or subtracting infinity from infinity.
10
IEEE 754 Floating Point Standard
Floating-point representation to process integer comparisons Consequence: sign is in the most significant bit Placing the exponent before the significand simplifies the sorting of floating-point numbers using integer comparison instructions numbers with bigger exponents look larger than numbers with smaller exponents, as long as both exponents have the same sign Negative exponents pose a challenge to simplified sorting Single precision representation of 1.0two x 2-1: exponent looks like a big number The value 1.0two × 2+1 would look like the smaller binary number 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
11
IEEE 754 Floating-Point Format
Morgan Kaufmann Publishers IEEE 754 Floating-Point Format 14 September, 2018 single: 8 bits double: 11 bits single: 23 bits double: 52 bits S Exponent Fraction Most negative exponent represented as and most positive exponent represented as Biased notation: Exponent: unsigned representation = actual exponent + Bias Ensures exponent is unsigned Single precision: Bias = 127; Double precision: Bias = 1203 exponent −1 represented by −1 +127ten = 126ten = two, and Exponent +1 represented by = 128ten = two Chapter 3 — Arithmetic for Computers
12
Single-Precision Range
Morgan Kaufmann Publishers 14 September, 2018 Single-Precision Range Exponents 0000…00 and 1111…11 reserved Smallest value Exponent: actual exponent = 1 – 127 = –126 Fraction: 000…00 significand = 1.0 ±1.0 × 2–126 ≈ ±1.2 × 10–38 Largest value exponent: actual exponent = 254 – 127 = +127 Fraction: 111…11 significand ≈ 2.0 ±2.0 × ≈ ±3.4 × 10+38 Chapter 3 — Arithmetic for Computers
13
Double-Precision Range
Morgan Kaufmann Publishers 14 September, 2018 Double-Precision Range Exponents 0000…00 and 1111…11 reserved Smallest value Exponent: actual exponent = 1 – 1023 = –1022 Fraction: 000…00 significand = 1.0 ±1.0 × 2–1022 ≈ ±2.2 × 10–308 Largest value Exponent: actual exponent = 2046 – 1023 = +1023 Fraction: 111…11 significand ≈ 2.0 ±2.0 × ≈ ±1.8 × Chapter 3 — Arithmetic for Computers
14
Floating-Point Precision
Morgan Kaufmann Publishers 14 September, 2018 Floating-Point Precision Relative precision all fraction bits are significant Single: approx 2–23 Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits of precision Double: approx 2–52 Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision Chapter 3 — Arithmetic for Computers
15
Floating-Point Example
Morgan Kaufmann Publishers 14 September, 2018 Floating-Point Example Represent –0.75 –0.75 = (–1)1 × 1.12 × 2–1 S = 1 Fraction = 1000…002 Exponent = –1 + Bias Single: – = 126 = Double: – = 1022 = Single: …00 Double: …00 Chapter 3 — Arithmetic for Computers
16
Floating-Point Example
Morgan Kaufmann Publishers 14 September, 2018 Floating-Point Example What number is represented by the single-precision float …00 S = 1 Fraction = 01000…002 = 1 ×2−2 = ¼ = 0.25 exponent = = 129 x = (–1)1 × ( ) × 2(129 – 127) = (–1) × 1.25 × 22 = –5.0 Chapter 3 — Arithmetic for Computers
17
Morgan Kaufmann Publishers
Denormal Numbers 14 September, 2018 Exponent = hidden bit is 0 Smaller than normal numbers allow for gradual underflow, with diminishing precision Denormal with fraction = Two representations of 0.0! Chapter 3 — Arithmetic for Computers
18
Morgan Kaufmann Publishers
14 September, 2018 Infinities and NaNs Exponent = , Fraction = ±Infinity Can be used in subsequent calculations, avoiding need for overflow check Exponent = , Fraction ≠ Not-a-Number (NaN) Indicates illegal or undefined result e.g., 0.0 / 0.0 Can be used in subsequent calculations Chapter 3 — Arithmetic for Computers
19
Floating-Point Addition
Morgan Kaufmann Publishers 14 September, 2018 Floating-Point Addition Consider a 4-digit decimal example 9.999 × × 10–1 1. Align decimal points Shift number with smaller exponent 9.999 × × 101 2. Add significands 9.999 × × 101 = × 101 3. Normalize result & check for over/underflow × 102 4. Round and renormalize if necessary 1.002 × 102 Chapter 3 — Arithmetic for Computers
20
Algorithm for binary floating-point addition
21
Binary Floating-Point Addition
Morgan Kaufmann Publishers 14 September, 2018 Binary Floating-Point Addition Now consider a 4-digit binary example × 2–1 + – × 2–2 (0.5 + –0.4375) 1. Align binary points Shift number with smaller exponent × 2–1 + – × 2–1 2. Add significands × 2–1 + – × 2–1 = × 2–1 3. Normalize result & check for over/underflow × 2–4, Since 127 ≥−4 ≥−126, no overflow/underflow 4. Round and renormalize if necessary × 2–4 (no change) = Chapter 3 — Arithmetic for Computers
22
Morgan Kaufmann Publishers
14 September, 2018 FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long Much longer than integer operations Slower clock would penalize all instructions FP adder usually takes several cycles Can be pipelined Chapter 3 — Arithmetic for Computers
23
Morgan Kaufmann Publishers
FP Adder Hardware 14 September, 2018 Step 1 Step 2 Step 3 Step 4 Chapter 3 — Arithmetic for Computers
24
Floating-Point Multiplication
Morgan Kaufmann Publishers 14 September, 2018 Floating-Point Multiplication Consider a 4-digit decimal example 1.110 × 1010 × × 10–5 1. Add exponents For biased exponents, subtract bias from sum New exponent = 10 + –5 = 5 2. Multiply significands 1.110 × = × 105 3. Normalize result & check for over/underflow × 106 4. Round and renormalize if necessary 1.021 × 106 5. Determine sign of result from signs of operands × 106 Chapter 3 — Arithmetic for Computers
25
Floating-Point Multiplication
26
Binary Floating-Point Multiplication
Morgan Kaufmann Publishers 14 September, 2018 Binary Floating-Point Multiplication Now consider a 4-digit binary example × 2–1 × – × 2–2 (0.5 × –0.4375) 1. Add exponents Unbiased: –1 + –2 = –3 Biased: (– ) + (– ) = – – 127 = – 2. Multiply significands × = × 2–3 3. Normalize result & check for over/underflow × 2–3 (no change), since 127 ≥−3 ≥−126, no overflow/underflow 4. Round and renormalize if necessary × 2–3 (no change) 5. Determine sign: +ve × –ve –ve – × 2–3 = two = - 7/25ten = - 7/32ten = – ten Chapter 3 — Arithmetic for Computers
27
FP Arithmetic Hardware
Morgan Kaufmann Publishers 14 September, 2018 FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does Addition, subtraction, multiplication, division, reciprocal, square-root FP integer conversion Operations usually takes several cycles Can be pipelined Chapter 3 — Arithmetic for Computers
28
FP Instructions in LEGv8
Morgan Kaufmann Publishers 14 September, 2018 FP Instructions in LEGv8 Separate FP registers 32 single-precision: S0, …, S31 32 double-precision: DS0, …, D31 Sn stored in the lower 32 bits of Dn FP instructions operate only on FP registers Programs generally don’t do integer ops on FP data, or vice versa More registers with minimal code-size impact FP load and store instructions LDURS, LDURD STURS, STURD Chapter 3 — Arithmetic for Computers
29
FP Instructions in LEGv8
Morgan Kaufmann Publishers 14 September, 2018 FP Instructions in LEGv8 Single-precision arithmetic FADDS, FSUBS, FMULS, FDIVS e.g., FADDS S2, S4, S6 Double-precision arithmetic FADDD, FSUBD, FMULD, FDIVD e.g., FADDD D2, D4, D6 Single- and double-precision comparison FCMPS, FCMPD Sets or clears FP condition-code bits Branch on FP condition code true or false B.cond Chapter 3 — Arithmetic for Computers
30
Morgan Kaufmann Publishers
14 September, 2018 FP Example: °F to °C C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr )); } fahr in S12, result in S0, literals in global memory space Compiled LEGv8 code: f2c: LDURS S16, [X27,const5] // S16 = 5.0 (5.0 in memory) LDURS S18, [X27,const9] // S18 = 9.0 (9.0 in memory) FDIVS S16, S16, S // S16 = 5.0 / 9.0 LDURS S18, [X27,const32] // S18 = 32.0 FSUBS S18, S12, S // S18 = fahr – 32.0 FMULS S0, S16, S // S0 = (5/9)*(fahr – 32.0) BR LR // return Chapter 3 — Arithmetic for Computers
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.