Binary 5: Floating Point Numbers

Floating Point Numbers
Floating point numbers are real numbers. In Java, this just means any number that isn't an integer (whole number).
For example:
  2.86
  -0.5
  4.000
  -0.03
  3.1415926
  7.2 x 10^9
In Java we use either double or float.

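As a small illustration of the two Java types named above, the sketch below stores the same value as a double and as a float. The class and field names are made up for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class FloatVsDouble {
    // A literal with a decimal point is a double unless it ends in 'f'.
    static final double D = 3.1415926;
    static final float F = 3.1415926f;
    // 'e' notation writes a power of ten: 7.2e9 means 7.2 x 10^9.
    static final double BIG = 7.2e9;

    public static void main(String[] args) {
        System.out.println(D);
        System.out.println(F);
        System.out.println(BIG);
        // The float holds fewer digits, so widening it back to a double
        // does not give the same value as the double literal.
        System.out.println((double) F == D); // false
    }
}
```
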
Floating Point Numbers
A floating point number is represented in three parts: M x B^E
  M  mantissa, represented as a number that is >= 0 and < 1
  B  base (2 is binary, 10 is decimal)
  E  exponent (the power of the base, which can be positive or negative)

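To make the M x B^E idea concrete for base 2, here is a minimal sketch that splits a Java double into a mantissa and exponent in the form used on these slides (mantissa below 1). The class and method names are hypothetical. Note that Java itself stores doubles with one binary digit before the point, so the code adjusts the exponent by 1; it assumes the input is a non-zero, ordinary (normal) value.

```java
// Hypothetical helper: splits x into M x 2^E with 0.5 <= |M| < 1.
class MantissaExponent {
    // Assumes x is a non-zero normal double (not 0, NaN, infinity or subnormal).
    static int exponent(double x) {
        // Math.getExponent gives e for the form 1.xxx x 2^e; moving the
        // point one place to the left adds 1 to the exponent.
        return Math.getExponent(x) + 1;
    }

    static double mantissa(double x) {
        // Math.scalb multiplies by a power of two without rounding here.
        return Math.scalb(x, -exponent(x));
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -0.8125, -0.15625};
        for (double x : samples) {
            System.out.println(x + " = " + mantissa(x) + " x 2^" + exponent(x));
        }
    }
}
```

For example, 1.0 comes out as 0.5 x 2^1 and -0.15625 as -0.625 x 2^-2.
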
Floating Point Numbers
For example, let's consider what happens when we write 0.06779 in the format M x B^E.
Scientific notation uses one significant digit before the decimal point, multiplied by the base to the appropriate power,
  e.g. 6.779 x 10^-2
Normalised notation has no significant digits before the decimal point, multiplied by the base to the appropriate power,
  e.g. 0.6779 x 10^-1

Floating Point Numbers
Floating point numbers are stored in normalised form, as this is the most efficient way: no mantissa digits are wasted on leading zeros.
Examples of normalisation (decimal):
  0.00156 x 10^-4  =  0.156 x 10^-6
  1732.1           =  0.17321 x 10^4
  164 x 10^-2      =  0.164 x 10^1
  794 x 10^-5      =  0.794 x 10^-2

Floating Point Numbers
Examples of normalisation (binary):
  1101 x 2^100       =  0.1101 x 2^1000
  1101.001 x 2^-111  =  0.1101001 x 2^-011
Note: These examples use unsigned binary, with a minus symbol where the exponent is negative.

To "normalise" a number…
  1. Move the decimal point to put a 0 to its left, and a non-zero digit to its right.
  2. Count how many places the decimal point moved.
  3. If the decimal point moved left, increase the exponent by the same amount.
     If the decimal point moved right, decrease the exponent by the same amount.

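The steps above can be sketched in Java for decimal numbers. This is only one possible approach: it uses BigDecimal so that the digits stay exact, and the class name, method name and output format are invented for this example.

```java
import java.math.BigDecimal;

// Hypothetical helper that normalises "mantissa x 10^exponent".
class DecimalNormaliser {
    static String normalise(String mantissa, int exponent) {
        BigDecimal m = new BigDecimal(mantissa).stripTrailingZeros();
        if (m.signum() == 0) {
            return "0"; // zero has no non-zero digit to put after the point
        }
        // precision() = number of significant digits,
        // scale()     = number of digits after the decimal point.
        // Their difference is how far the point must move to the left
        // (a negative result means it moves to the right).
        int movedLeft = m.precision() - m.scale();
        BigDecimal normalised = m.movePointLeft(movedLeft);
        // Moving left increases the exponent, moving right decreases it.
        return normalised.toPlainString() + " x 10^" + (exponent + movedLeft);
    }

    public static void main(String[] args) {
        System.out.println(normalise("1732.1", 0));   // 0.17321 x 10^4
        System.out.println(normalise("0.00156", -4)); // 0.156 x 10^-6
    }
}
```
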
Floating Point Numbers
Practice normalising the following decimal numbers:
  45.1 x 10^1 =
  0.063 x 10^3 =
  -178.135 x 10^-8 =
  0.0076 x 10^4 =
  1200.21 x 10^-12 =

Floating Point Numbers
Practice normalising the following decimal numbers:
  45.1 x 10^1       =  0.451 x 10^3      (moved 2 places to the left, so add 2)
  0.063 x 10^3      =  0.63 x 10^2       (moved 1 place to the right, so subtract 1)
  -178.135 x 10^-8  =  -0.178135 x 10^-5
  0.0076 x 10^4     =  0.76 x 10^2
  1200.21 x 10^-12  =  0.120021 x 10^-8

Floating Point Numbers
Practice normalising the following binary numbers (for now, just put in a minus for negatives):
  e.g. -1101.01 x 2^010 = -0.110101 x 2^110
  101.1 x 2^10 =
  -0.01101 x 2^-101 =
  1000.01 x 2^1100 =
  0.0001 x 2^-100 =
  -1101001 =
Note: If no exponent is provided, then you start at 0.

Floating Point Numbers
Practice normalising the following binary numbers:
  101.1 x 2^10       =  0.1011 x 2^101
  -0.01101 x 2^-101  =  -0.1101 x 2^-110
  1000.01 x 2^1100   =  0.100001 x 2^10000
  0.0001 x 2^-100    =  0.1 x 2^-111
  -1101001           =  -0.1101001 x 2^111
Note: If no exponent is provided, then you start at 0.

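Binary normalisation follows the same point-moving rule as decimal, so answers like these can be checked with a short program. The sketch below is an illustration only: the class name, method name and output format are assumptions, it works on the digits as text, and it writes the exponent without padding zeros (so -011 would print as -11).

```java
// Hypothetical helper that normalises "mantissa x 2^exponent" written in binary.
class BinaryNormaliser {
    // mantissa: binary digits with an optional minus and point, e.g. "-1101.01"
    // exponentBits: binary exponent with an optional minus, e.g. "-101"
    static String normalise(String mantissa, String exponentBits) {
        boolean negative = mantissa.startsWith("-");
        String digits = negative ? mantissa.substring(1) : mantissa;

        int point = digits.indexOf('.');
        if (point < 0) {
            point = digits.length(); // no point written: it sits after the last digit
        }
        String bits = digits.replace(".", "");

        int firstOne = bits.indexOf('1');
        if (firstOne < 0) {
            return "0"; // no 1 bits at all
        }
        int lastOne = bits.lastIndexOf('1');

        // The point must end up just before the first 1.
        // Positive = moved left, negative = moved right.
        int movedLeft = point - firstOne;
        int exponent = Integer.parseInt(exponentBits, 2) + movedLeft;

        String sign = negative ? "-" : "";
        String expSign = exponent < 0 ? "-" : "";
        return sign + "0." + bits.substring(firstOne, lastOne + 1)
                + " x 2^" + expSign + Integer.toBinaryString(Math.abs(exponent));
    }

    public static void main(String[] args) {
        System.out.println(normalise("101.1", "10"));     // 0.1011 x 2^101
        System.out.println(normalise("-0.01101", "-101")); // -0.1101 x 2^-110
        System.out.println(normalise("-1101001", "0"));    // -0.1101001 x 2^111
    }
}
```
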
Floating Point Numbers
To store a floating point number we need to allocate space for the mantissa, exponent and sign.
There needs to be a balance between:
  Precision
  Range
  Total memory required for storage

  Type    Size     Largest Value   Smallest Value   Precision
  float   32 bits  3.4 x 10^38     1.4 x 10^-45     6-7 sig figs
  double  64 bits  1.8 x 10^308    4.9 x 10^-324    14-15 sig figs

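The figures in the table can be read straight from Java's built-in constants. The sketch below prints them and adds a small, made-up helper (fitsInFloat) to show the precision limit of float; the class and method names are not from any library.

```java
// Hypothetical demo class for the sizes and limits of float and double.
class TypeLimits {
    // True if x survives a round trip through float unchanged.
    static boolean fitsInFloat(double x) {
        return (double) (float) x == x;
    }

    public static void main(String[] args) {
        // MIN_VALUE is the smallest positive value, not the most negative one.
        System.out.println("float:  " + Float.SIZE + " bits, largest "
                + Float.MAX_VALUE + ", smallest positive " + Float.MIN_VALUE);
        System.out.println("double: " + Double.SIZE + " bits, largest "
                + Double.MAX_VALUE + ", smallest positive " + Double.MIN_VALUE);

        System.out.println(fitsInFloat(0.5));        // true
        // 16777217 is 2^24 + 1, which needs more significant bits than a float has.
        System.out.println(fitsInFloat(16777217.0)); // false
    }
}
```
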
Floating Point Numbers
Let's look at some examples of floating point representation using a 16-bit word.
Example 1: 16-bit word with 5-bit 2's complement exponent and normalised mantissa

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  0                              00001                                 10 0000 0000

As a normalised binary number, this is 0.1 x 2^01
To convert that to decimal:
  0.1 (binary) == 0.5 (decimal)
  2^01 == 2
  0.5 x 2 = 1

Floating Point Numbers
Let's look at more examples:

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  1                              00000                                 11 0100 0000

As a normalised binary number, this is -0.1101 x 2^0
To convert that to decimal:
  -0.1101 (binary) == -0.8125 (decimal)
  2^0 == 1
  -0.8125 x 1 = -0.8125

Floating Point Numbers
Let's look at more examples:

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  1                              11110                                 10 1000 0000

As a normalised binary number, this is -0.101 x 2^11110
As a regular decimal:
  mantissa = -0.101 = -0.625
  exponent = 11110 = -2
  decimal number = -0.625 x 2^-2 = -0.15625

Floating Point Numbers
Let's look at more examples:

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  1                              10011                                 1101110110

As a normalised binary number, this is -0.110111011 x 2^10011
As a regular decimal:
  mantissa = -0.110111011 = -0.86523
  exponent = 10011 = -13
  decimal number = -0.86523 x 2^-13 = -0.00010562 (approximately)

Floating Point Numbers
Let's look at more examples:

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  0                              01111                                 1000010011

As a normalised binary number, this is ? x 2^?
As a regular decimal:
  mantissa =
  exponent =
  decimal number =

Floating Point Numbers
Let's look at more examples:

  Sign bit of mantissa (1 bit)   Exponent in 2's complement (5 bits)   Normalised mantissa (10 bits)
  0                              01111                                 1000010011

As a normalised binary number, this is 0.1000010011 x 2^01111
As a regular decimal:
  mantissa = 0.1000010011 = 0.5185546875 (about 0.5186)
  exponent = 01111 = 15
  decimal number = 0.5185546875 x 2^15 = 16992

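The decoding steps used in these examples (1 sign bit, a 5-bit two's complement exponent, then a 10-bit normalised mantissa) can be sketched in Java as follows. This is a teaching sketch for the slide format only, not Java's own float layout, and the class and method names are invented.

```java
// Hypothetical decoder for the 16-bit teaching format on these slides.
class Word16Decoder {
    // word: 16 binary digits, spaces allowed, e.g. "1 11110 10 1000 0000"
    static double decode(String word) {
        String bits = word.replace(" ", "");
        if (bits.length() != 16) {
            throw new IllegalArgumentException("expected 16 bits, got " + bits.length());
        }
        // Bit 0: sign of the mantissa (1 means negative).
        int sign = bits.charAt(0) == '1' ? -1 : 1;

        // Bits 1-5: exponent in 5-bit two's complement.
        int exponent = Integer.parseInt(bits.substring(1, 6), 2);
        if (exponent >= 16) {
            exponent -= 32; // top bit set, so the value is negative
        }

        // Bits 6-15: the ten binary digits after "0." in the mantissa.
        int mantissaBits = Integer.parseInt(bits.substring(6), 2);
        double mantissa = mantissaBits / 1024.0; // 1024 = 2^10

        // Math.scalb multiplies by 2^exponent.
        return Math.scalb(sign * mantissa, exponent);
    }

    public static void main(String[] args) {
        System.out.println(decode("0 00001 10 0000 0000")); // 1.0
        System.out.println(decode("1 00000 11 0100 0000")); // -0.8125
        System.out.println(decode("1 11110 10 1000 0000")); // -0.15625
    }
}
```
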
Problems with floating point
Floating point numbers are often an inexact representation.
Many decimals, e.g. 0.1, 0.2 and 0.4, are recurring binary fractions and cannot be represented exactly.
When carrying out calculations, the size of the mantissa changes: the result may have fewer significant digits than before, or more digits than the mantissa can hold.
Two examples using a 4-digit mantissa:

    0.6152 x 10^3            0.6152 x 10^5
  - 0.6151 x 10^3          x 0.6151 x 10^5
  ---------------          ---------------
    0.0001 x 10^3            0.3784 x 10^10
  = 0.1 x 10^0

In the subtraction, only one significant digit is left. In the multiplication, the exact mantissa should be 0.37840952, so it loses precision.

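Both problems can be seen in a few lines of Java. The sketch below is illustrative only: isExactInBinary and multiplyRounded are made-up helper names, and BigDecimal with a MathContext is used here simply to imitate a mantissa with a fixed number of decimal digits.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo of inexact storage and lost digits.
class InexactDemo {
    // True if the decimal written in the string is stored exactly as a double.
    static boolean isExactInBinary(String decimal) {
        // new BigDecimal(double) gives the exact value the double really holds.
        BigDecimal stored = new BigDecimal(Double.parseDouble(decimal));
        return stored.compareTo(new BigDecimal(decimal)) == 0;
    }

    // Multiplies two decimals, keeping only the given number of significant digits.
    static BigDecimal multiplyRounded(String a, String b, int digits) {
        return new BigDecimal(a).multiply(new BigDecimal(b), new MathContext(digits));
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3);       // false
        System.out.println(isExactInBinary("0.1")); // false
        System.out.println(isExactInBinary("0.5")); // true

        // Subtraction of two close numbers leaves few significant digits.
        System.out.println(new BigDecimal("615.2").subtract(new BigDecimal("615.1")));

        // The same product kept to 8 and then to 4 significant digits.
        System.out.println(multiplyRounded("0.6152", "0.6151", 8));
        System.out.println(multiplyRounded("0.6152", "0.6151", 4));
    }
}
```
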
Helpful Hint
Always convert a decimal to binary BEFORE attempting to normalise.

Floating Point Numbers
Represent the following binary numbers using a 16-bit representation, with:
  1 bit for the exponent sign
  1 bit for the mantissa sign
  6 bits for the exponent
  8 bits for the mantissa

  0.011 x 2^100
  -0.0011 111000
  10.11 x 2^-111

Precision
When using an 8-bit two's complement representation, which of the following decimals can be represented precisely (that is, without any rounding off)?
  128.5
  0.1
  0.03125