1
Floating Point Numbers
Photocopiable/digital resources may only be copied by the purchasing institution on a single site and for their own use © ZigZag Education, 2016
2
Floating Point
Floating point is an alternative method of representing binary numbers with a fractional part. As the name suggests, the binary point can move or ‘float’. A floating-point number is divided into two parts: the mantissa and the exponent.
Mantissa: holds the digits of the actual number.
Exponent: sets the position of the binary point.
3
Floating Point
There are many different formats of floating-point representation; here we are going to be working with the two’s complement version.
In the example, the value of the exponent is 3, so we need to move the binary point three places to the right. The place values of the mantissa bits then become:
8 4 2 1 1/2 1/4 1/8 1/16
Adding the place values of the bits that are set gives 4 + 2 + 0.5 = 6.5.
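The decoding steps above can be sketched in Python. This is a minimal illustration, not the slide's own method: the function names are hypothetical, and the bit patterns in the usage lines (an 8-bit mantissa `01101000` and a 4-bit exponent `0011`) are assumptions chosen to reproduce the value 6.5, since the original bit pattern is not shown here.

```python
from fractions import Fraction


def twos_complement_to_int(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # leading 1 means the number is negative
        value -= 1 << len(bits)
    return value


def decode_float(mantissa_bits: str, exponent_bits: str) -> Fraction:
    """Decode a two's complement mantissa/exponent pair.

    The binary point is assumed to sit after the first mantissa bit,
    so the mantissa is scaled by 2^(number of bits - 1).
    """
    mantissa = Fraction(twos_complement_to_int(mantissa_bits),
                        1 << (len(mantissa_bits) - 1))
    exponent = twos_complement_to_int(exponent_bits)
    return mantissa * Fraction(2) ** exponent


# Assumed bit patterns: mantissa 0.1101000, exponent 3
print(float(decode_float("01101000", "0011")))  # 6.5
```

Using `Fraction` keeps the arithmetic exact, so the result is not itself affected by rounding.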
4
Another Example
Both the mantissa and the exponent are in two’s complement format. If either starts with a 1 it is negative; to find its size, convert it to a positive number by flipping the bits and adding 1.
After the binary point has been moved, the place values of the mantissa bits are:
16 8 4 2 1 1/2 1/4 1/8
8 + 2 + 0.25 + 0.125 = 10.375
We know the result is a negative number because the mantissa is negative, so the value represented is -10.375.
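The "flip the bits and add 1" step can be sketched as follows. The function name is hypothetical, and the 8-bit mantissa `10101101` is an assumed pattern whose positive counterpart, `01010011`, gives 01010.011 = 10.375 once the binary point is moved four places to the right.

```python
def negate_twos_complement(bits: str) -> str:
    """Flip every bit, add 1, and keep the original width."""
    width = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    # Adding 1 can overflow the width (e.g. for all zeros), so wrap around.
    total = (int(flipped, 2) + 1) % (1 << width)
    return format(total, f"0{width}b")


# Assumed negative mantissa for the -10.375 example
print(negate_twos_complement("10101101"))  # 01010011
```

One edge case is worth knowing: the most negative pattern (a 1 followed by all 0s) maps back to itself, because its positive counterpart does not fit in the same number of bits.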
5
Negative Exponents
If the exponent is negative we shift the binary point to the left rather than to the right. First we have to convert the exponent to a positive value to work out the number of places the binary point needs to shift.
Place values: 1 1/2 1/4 1/8 1/16 1/32
In the example, the result is 0.25.
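Moving the binary point in either direction can be illustrated on the bit string itself. This is a sketch under assumptions: the function name is hypothetical, the exponent is passed as an already-decoded signed integer, and the example mantissa `0100` with exponent -1 is an assumed pattern that yields 0.25. When the point moves left, the gap is filled with copies of the leading (sign) bit so that negative mantissas keep their value.

```python
def place_binary_point(mantissa_bits: str, exponent: int) -> str:
    """Return the mantissa as a binary string with the point moved.

    The point starts after the first bit; a positive exponent moves it
    right, a negative exponent moves it left.
    """
    if exponent >= 0:
        # Pad with zeros on the right if the point moves past the last bit.
        shortfall = exponent - (len(mantissa_bits) - 1)
        padded = mantissa_bits + "0" * max(0, shortfall)
        split = 1 + exponent
        whole, fraction = padded[:split], padded[split:]
    else:
        # Moving left: repeat the sign bit in front (sign extension).
        padded = mantissa_bits[0] * (-exponent) + mantissa_bits
        whole, fraction = padded[:1], padded[1:]
    return whole + "." + (fraction or "0")


print(place_binary_point("0100", -1))      # 0.0100  (= 0.25)
print(place_binary_point("01101000", 3))   # 0110.1000  (= 6.5)
```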
6
Normalisation
When representing numbers in floating point we want to avoid wasting bits on repeated leading digits; to do this we normalise the numbers. To normalise a number you look at the start of the mantissa to see whether there are any repeated bits: a normalised mantissa starts with 01 if it is positive or 10 if it is negative.
The repeated values need to be removed and the binary point moved. In the example the binary point has moved one place, so the exponent needs to be updated by subtracting 1 from it.
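The normalisation rule can be sketched as a loop: while the first two mantissa bits are the same, drop the leading bit, append a 0 at the end, and subtract 1 from the exponent. The function name and the example bit patterns are assumptions for illustration; the exponent is handled as a plain integer, and overflow of a fixed-width exponent field is not checked.

```python
from typing import Tuple


def normalise(mantissa_bits: str, exponent: int) -> Tuple[str, int]:
    """Shift out repeated leading bits, adjusting the exponent each time."""
    if "1" not in mantissa_bits:
        # Zero has no normalised form; return it unchanged.
        return mantissa_bits, exponent
    while mantissa_bits[0] == mantissa_bits[1]:
        mantissa_bits = mantissa_bits[1:] + "0"   # remove the repeated bit
        exponent -= 1                             # point moved one place
    return mantissa_bits, exponent


print(normalise("00011000", 4))   # ('01100000', 2)
print(normalise("11101000", 3))   # ('10100000', 1)
```

In both cases the value is unchanged: 0.0011 × 2^4 and 0.11 × 2^2 are both 3, and the negative example is -1.5 before and after.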
7
Comparison
It is generally faster to process calculations using fixed-point numbers than using floating-point numbers. For the same number of bits, floating-point numbers can represent a larger range of numbers with fractional parts than fixed-point numbers. This means that floating-point numbers are best suited to situations where you need to represent a wide range of values. On the other hand, fixed-point numbers are best used when the speed of processing is more important than precision.
8
Rounding Errors
Some numbers cannot be represented exactly using the number of bits allocated to them. In this case the number is rounded to the nearest representable number. The rounding error is the difference between the actual value and the rounded value. There are two different methods of measuring the size of a rounding error:
Absolute Error: the difference between the actual number and the rounded number.
Relative Error: the difference between the actual number and the rounded number, divided by the actual number.
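The two error measures translate directly into code. The sketch below uses hypothetical function names and an illustrative value that is not from the slides: 1/3 rounded to the nearest multiple of 1/16, which is 5/16. Both errors are reported as non-negative sizes, which is an assumption about how the definitions are meant to be read.

```python
from fractions import Fraction


def absolute_error(actual: Fraction, rounded: Fraction) -> Fraction:
    """Size of the difference between the actual and rounded values."""
    return abs(actual - rounded)


def relative_error(actual: Fraction, rounded: Fraction) -> Fraction:
    """Absolute error as a proportion of the actual value (actual != 0)."""
    return absolute_error(actual, rounded) / abs(actual)


actual = Fraction(1, 3)
rounded = Fraction(5, 16)                 # nearest value with 4 fractional bits
print(absolute_error(actual, rounded))    # 1/48
print(relative_error(actual, rounded))    # 1/16
```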
9
Example
If we wanted to represent the decimal value 5.6 using 8-bit fixed-point binary, we would have to store the nearest value that the 8 bits can represent.
The absolute error is: 5.6 - (stored value)
The relative error is: (absolute error) / 5.6
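As a worked sketch of this example, the code below assumes an unsigned 8-bit format with 4 bits before and 4 bits after the binary point, and rounds to the nearest representable value. Both the 4/4 split and the names used are assumptions, so the figures on the original slide may differ (for instance if it truncates rather than rounds, or places the point elsewhere).

```python
from fractions import Fraction

TOTAL_BITS = 8
FRACTION_BITS = 4   # assumption: 4 bits after the binary point


def to_fixed_point(value: Fraction) -> int:
    """Round to the nearest multiple of 2^-FRACTION_BITS, as a raw integer."""
    scaled = round(value * 2**FRACTION_BITS)
    if not 0 <= scaled < 2**TOTAL_BITS:
        raise ValueError("value does not fit in the assumed format")
    return scaled


actual = Fraction("5.6")
raw = to_fixed_point(actual)
bits = format(raw, f"0{TOTAL_BITS}b")
stored = Fraction(raw, 2**FRACTION_BITS)

abs_err = abs(actual - stored)
rel_err = abs_err / actual

print(bits[:4] + "." + bits[4:])   # 0101.1010
print(float(stored))               # 5.625
print(float(abs_err))              # 0.025
print(rel_err)                     # 1/224
```

Under these assumptions the relative error of 1/224 is roughly 0.45% of the actual value.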
10
END