Binary Arithmetic
Binary Arithmetic
We will see:
- How numbers are stored in a computer
- The consequences of using fixed-length arithmetic
- How to convert to and from binary
- How to represent negative numbers
- How to represent fractional numbers: fixed-point and floating-point
Why Binary Arithmetic?
Early computers often used decimal arithmetic.
- Binary arithmetic is the most robust: the hardware only needs to distinguish between two different voltage levels.
- A decimal computer would need to distinguish ten different levels. It would need a much lower noise level to operate reliably, which is expensive, but each digit would carry more information, so it could pack far more into a given word length.
- Another possibility is an analogue computer, but the presence of noise means low accuracy.
8 bits, 1 byte, 256 values
- 3 bits give 8 combinations; 4 bits give 16 combinations.
- Each bit we add doubles the number of possible combinations.
- With $n$ bits, we have $2^n$ combinations.
- Hence 8 bits (1 byte) give $2^8 = 256$ different combinations.
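The doubling rule can be checked by brute force. This is a minimal Python sketch (the function name `bit_patterns` is hypothetical) that lists every n-bit pattern and counts them; each extra bit doubles the count, matching $2^n$.

```python
from itertools import product


def bit_patterns(n: int) -> list[str]:
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


for n in (3, 4, 8):
    # The number of patterns is always 2 ** n
    print(f"{n} bits: {len(bit_patterns(n))} combinations")

print(bit_patterns(3))
```

Enumerating the patterns is only practical for small n, but it makes the counting argument concrete.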
How many bits in a date?
Using binary values means we must divide our world into powers of 2. We usually pick the smallest power of 2 that is large enough. For the date format “DD/MM/YYYY”:
- 29 to 31 days in a month: $2^5 = 32$
- 12 months in a year: $2^4 = 16$
- 9999 possible years: $2^{14} = 16384$
So we would need $5 + 4 + 14 = 23$ bits in total to represent a date like this. This leaves many illegal bit combinations, e.g. “32/16/9999”.
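As an illustration, here is a minimal sketch that packs a date into a single 23-bit integer. The field order (day, then month, then year) and storing day and month counted from 0 (day 1 becomes code 0) are assumptions made for this example, as are the names `pack_date` and `unpack_date`. The layout accepts any values that fit the field widths, so it also stores dates that do not exist.

```python
DAY_BITS, MONTH_BITS, YEAR_BITS = 5, 4, 14  # 2**5 = 32, 2**4 = 16, 2**14 = 16384


def pack_date(day: int, month: int, year: int) -> int:
    """Pack DD/MM/YYYY into one 23-bit integer (hypothetical layout: day | month | year)."""
    d, m = day - 1, month - 1  # assumption: day and month are stored counting from 0
    if not (0 <= d < 2**DAY_BITS and 0 <= m < 2**MONTH_BITS and 0 <= year < 2**YEAR_BITS):
        raise ValueError("a field does not fit in its bit width")
    return (d << (MONTH_BITS + YEAR_BITS)) | (m << YEAR_BITS) | year


def unpack_date(code: int) -> tuple[int, int, int]:
    """Reverse pack_date, returning (day, month, year)."""
    year = code & (2**YEAR_BITS - 1)
    m = (code >> YEAR_BITS) & (2**MONTH_BITS - 1)
    d = code >> (MONTH_BITS + YEAR_BITS)
    return d + 1, m + 1, year


code = pack_date(25, 12, 2024)
print(code, unpack_date(code))
# The fields fit, so this packs without complaint, yet the date does not exist
print(unpack_date(pack_date(32, 16, 9999)))
```

Any validation of real calendar dates would have to be done separately, on top of the bit layout.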
Fixed-length Arithmetic
We are used to working with numbers of any length, but computers have to work with fixed-length numbers. This causes some interesting problems. Suppose we only have 3 decimal digits: 000, 001, 002, …, 999. We cannot represent:
- Negative numbers (no sign)
- Fractional numbers (no decimal point)
- Numbers larger than 999 (not enough digits)
Problems with Fixed-Length Arithmetic
Fixed-length arithmetic is not closed under the usual operations. For example, using 3 decimal digits:
- 600 + 600 = 1200 (too large)
- 003 - 005 = -2 (negative)
- 050 × 050 = 2500 (too large)
- 007 / 002 = 3.5 (not an integer)
We can divide the problems into three main classes:
- Overflow (too large)
- Underflow (too small)
- Not a member of the set
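The three classes of problem can be detected mechanically. This minimal sketch (the name `classify` and the sample operands are illustrative) computes each result exactly, using `Fraction` so that division is not silently truncated, and reports whether it still fits in three unsigned decimal digits.

```python
from fractions import Fraction

DIGITS = 3
LARGEST = 10**DIGITS - 1  # 999 is the largest value three decimal digits can hold


def classify(result) -> str:
    """Report whether an exact result fits in the set {000, 001, ..., 999}."""
    if result != int(result):
        return "not a member of the set"  # has a fractional part
    if result > LARGEST:
        return "overflow"                 # too large
    if result < 0:
        return "underflow"                # too small: there is no sign digit
    return "ok"


examples = {
    "600 + 600": 600 + 600,
    "003 - 005": 3 - 5,
    "050 x 050": 50 * 50,
    "007 / 002": Fraction(7, 2),  # exact quotient 7/2, i.e. 3.5
    "123 + 456": 123 + 456,
}
for expression, result in examples.items():
    print(f"{expression} = {result}: {classify(result)}")
```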
Binary-to-Decimal
Binary   Decimal
000      $0 = 0\times2^2 + 0\times2^1 + 0\times2^0$
001      $1 = 0\times2^2 + 0\times2^1 + 1\times2^0$
010      $2 = 0\times2^2 + 1\times2^1 + 0\times2^0$
011      $3 = 0\times2^2 + 1\times2^1 + 1\times2^0$
100      $4 = 1\times2^2 + 0\times2^1 + 0\times2^0$
101      $5 = 1\times2^2 + 0\times2^1 + 1\times2^0$
110      $6 = 1\times2^2 + 1\times2^1 + 0\times2^0$
111      $7 = 1\times2^2 + 1\times2^1 + 1\times2^0$
By convention, the right-most bit is the least significant. Each bit to its left is worth twice the one to its right. We can build this idea into an algorithm.
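One way to build the place-value idea into an algorithm is sketched below (the function name is hypothetical). Rather than computing each power of 2 separately, it uses Horner's rule: reading the bits from the most significant end, it doubles the running total and adds the next bit, which gives the same sum as the table.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of '0'/'1' characters to a non-negative integer.

    Reads the bits from left (most significant) to right; each step
    doubles the running total and adds the next bit (Horner's rule).
    """
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        value = value * 2 + int(bit)
    return value


for pattern in ("000", "011", "101", "111", "111010"):
    print(pattern, "->", binary_to_decimal(pattern))
```

Python's built-in `int(bits, 2)` performs the same conversion; the loop just makes the doubling explicit.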
Decimal-to-Binary
This involves the successive division of the decimal number by 2. The remainder at each stage is the next digit of the binary expansion, starting from the least-significant (right-most) bit. E.g. converting 58 into binary:
- 58 ÷ 2 = 29 remainder 0
- 29 ÷ 2 = 14 remainder 1
- 14 ÷ 2 = 7 remainder 0
- 7 ÷ 2 = 3 remainder 1
- 3 ÷ 2 = 1 remainder 1
- 1 ÷ 2 = 0 remainder 1
Reading the remainders from last to first, 58 is 111010 in binary.
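The successive-division procedure can be sketched in Python as follows (the function name is illustrative and only non-negative integers are handled). Because the first remainder is the least-significant bit, the collected remainders are reversed at the end.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # the quotient carries on; the remainder is the next bit
        remainders.append(str(r))
    # The first remainder is the least-significant bit, so reverse the list
    return "".join(reversed(remainders))


print(decimal_to_binary(58))  # 111010
```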
Representations of Negative Integers
There are four main ways of representing negative $m$-bit binary numbers:
- Sign and magnitude
- 1’s complement
- 2’s complement
- Excess $2^{m-1}$
All of them have problems, and they differ in how well they work with ordinary binary addition and subtraction.
Sign and Magnitude
We use one bit for the sign and the rest for the magnitude (size) of the number. E.g. using an 8-bit binary representation:
- 58 is 00111010
- -58 is 10111010 (using sign and magnitude)
The range is therefore -127 to 127.
Problem: there are two representations of zero, 00000000 = zero and 10000000 = -zero.
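A minimal sketch of 8-bit sign and magnitude, assuming bit patterns are handled as strings of '0'/'1' for readability (the function names are hypothetical). It shows the encoding of ±58 and that two different patterns decode to zero.

```python
BITS = 8


def to_sign_magnitude(x: int) -> str:
    """Encode x with 1 sign bit followed by BITS-1 magnitude bits."""
    limit = 2 ** (BITS - 1) - 1  # 127 for 8 bits
    if not -limit <= x <= limit:
        raise OverflowError(f"{x} is outside -{limit}..{limit}")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{BITS - 1}b")


def from_sign_magnitude(bits: str) -> int:
    """Decode a sign-and-magnitude bit string."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(to_sign_magnitude(58), to_sign_magnitude(-58))  # 00111010 10111010
# Two different bit patterns decode to the same value: +0 and -0
print(from_sign_magnitude("00000000"), from_sign_magnitude("10000000"))
```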
1’s Complement
Also has a sign bit. When negating a number, we simply flip every 0 to a 1 and vice versa. E.g. using an 8-bit binary representation:
- 58 is 00111010
- -58 is 11000101 (using 1’s complement)
The range is again -127 to 127. We also have two representations for zero: 00000000 = zero and 11111111 = -zero.
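The bit-flipping rule can be sketched with Python's `~` operator, masked to 8 bits because Python integers are unbounded. The string-based interface and function names are assumptions for illustration.

```python
BITS = 8
MASK = 2**BITS - 1  # 0b11111111


def to_ones_complement(x: int) -> str:
    """Encode x as a BITS-bit 1's-complement pattern."""
    limit = 2 ** (BITS - 1) - 1  # 127 for 8 bits
    if not -limit <= x <= limit:
        raise OverflowError(f"{x} is outside -{limit}..{limit}")
    # Negative values: write |x| in binary, then flip every bit
    pattern = x if x >= 0 else ~(-x) & MASK
    return format(pattern, f"0{BITS}b")


def from_ones_complement(bits: str) -> int:
    """Decode a BITS-bit 1's-complement pattern."""
    pattern = int(bits, 2)
    if bits[0] == "1":  # sign bit set: flip the bits back to recover |x|
        return -(~pattern & MASK)
    return pattern


print(to_ones_complement(58), to_ones_complement(-58))  # 00111010 11000101
# Both all-zeros and all-ones decode to zero
print(from_ones_complement("00000000"), from_ones_complement("11111111"))
```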
2’s Complement
Like 1’s complement, but to negate a number we invert all the bits and add 1 to the result. E.g. using an 8-bit binary representation:
- 58 is 00111010
- -58 is 11000110 (using 2’s complement)
This gives us a range of -128 to 127, with only one representation of zero. But we still have a problem: -(-128) = -(10000000) = 01111111 + 1 = 10000000 = -128.
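A minimal sketch of 8-bit 2's complement (function names hypothetical). `negate` applies the invert-and-add-1 rule and wraps the result to 8 bits, which reproduces the -(-128) = -128 problem.

```python
BITS = 8
MASK = 2**BITS - 1


def to_twos_complement(x: int) -> str:
    """Encode x in BITS-bit 2's complement (range -128..127 for 8 bits)."""
    if not -(2 ** (BITS - 1)) <= x < 2 ** (BITS - 1):
        raise OverflowError(f"{x} does not fit in {BITS} bits")
    return format(x & MASK, f"0{BITS}b")


def negate(bits: str) -> str:
    """Invert every bit, then add 1, keeping only the low BITS bits."""
    flipped = ~int(bits, 2) & MASK
    return format((flipped + 1) & MASK, f"0{BITS}b")


def from_twos_complement(bits: str) -> int:
    """Decode: a set top bit is worth -2**(BITS-1) instead of +2**(BITS-1)."""
    value = int(bits, 2)
    return value - 2**BITS if bits[0] == "1" else value


print(to_twos_complement(58), negate(to_twos_complement(58)))  # 00111010 11000110
# Negating -128 wraps around to -128 again
print(from_twos_complement(negate(to_twos_complement(-128))))  # -128
```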
Excess $2^{m-1}$
We store each number as its value plus $2^{m-1}$. E.g. using an 8-bit binary representation, this is “excess 128”:
- 58 is represented as 186, or 10111010
- -58 is represented as 70, or 01000110
This is equivalent to 2’s complement with the sign bit inverted. The range is again -128 to 127, so it is still unequal: there is one more negative value than positive.
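A short sketch of excess-128 encoding for 8 bits (names hypothetical). It also checks the claimed equivalence: flipping the top bit of the 2's-complement pattern gives the same bits.

```python
BITS = 8
BIAS = 2 ** (BITS - 1)  # 128, hence "excess 128" for 8 bits


def to_excess(x: int) -> str:
    """Store x as the unsigned BITS-bit pattern of x + BIAS."""
    if not -BIAS <= x < BIAS:
        raise OverflowError(f"{x} is outside {-BIAS}..{BIAS - 1}")
    return format(x + BIAS, f"0{BITS}b")


def from_excess(bits: str) -> int:
    """Recover x by subtracting the bias from the stored unsigned value."""
    return int(bits, 2) - BIAS


for x in (58, -58):
    stored = to_excess(x)
    # Compare with the 2's-complement pattern whose top bit has been flipped
    twos = format(x & (2**BITS - 1), f"0{BITS}b")
    flipped_sign = ("1" if twos[0] == "0" else "0") + twos[1:]
    print(x, stored, int(stored, 2), stored == flipped_sign)
```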
Fixed-Point Arithmetic
It is useful to be able to represent real numbers, e.g. 1.23, 3.141, … One way we can do this is to reserve a certain number of digits for the fractional part, e.g. 1010.1100 = 10.75 (“.1100” = $1\times2^{-1} + 1\times2^{-2}$ = 0.5 + 0.25 = 0.75).
Note that we don’t need any special hardware to deal with this; we just need to keep track of the binary point. Given $m$ bits before and $n$ bits after the point:
- $m$ determines the range
- $n$ determines the precision
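Keeping track of the point can be sketched by storing each value as an ordinary integer count of $2^{-n}$ steps. This minimal sketch assumes $n = 4$ fractional bits (a choice made only for this example) and uses hypothetical helper names. Addition is plain integer addition; multiplication has to discard the extra $n$ fractional bits the product picks up.

```python
FRACTION_BITS = 4  # n: bits after the point (assumed for this sketch)
SCALE = 2**FRACTION_BITS


def to_fixed(x: float) -> int:
    """Store x as the nearest whole number of 1/SCALE steps (ties go to even)."""
    return round(x * SCALE)


def from_fixed(q: int) -> float:
    return q / SCALE


def fixed_add(a: int, b: int) -> int:
    return a + b  # both share the same point, so ordinary integer addition works


def fixed_mul(a: int, b: int) -> int:
    # The raw product has 2n fractional bits; shift n of them away (rounds down)
    return (a * b) // SCALE


q = to_fixed(10.75)
print(format(q, "08b"), from_fixed(q))          # 10101100 10.75
print(from_fixed(fixed_add(q, to_fixed(0.5))))  # 11.25
print(from_fixed(to_fixed(0.1)))                # 0.125, the nearest multiple of 1/16
```

The last line shows the precision limit set by $n$: with 4 fractional bits, 0.1 can only be approximated to the nearest sixteenth.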
Problems with Fixed-point Arithmetic
Fixed-point arithmetic is useful, but limited: it is no good for extremely large or small numbers.
- The mass of the Sun is $2\times10^{33}$ grams
- The mass of an electron is $9\times10^{-28}$ grams
To use both these amounts in a calculation we would need over 60 decimal digits (34 before the point and 28 after), most of which would be irrelevant. We need floating-point arithmetic.
Floating Point Arithmetic
We borrow the familiar scientific notation to represent numbers:
- $3.14 = 0.314\times10^{1}$
- $0.00005 = 0.5\times10^{-4}$
We use powers of 2 instead of 10: $N = \text{mantissa}\times2^{\text{exponent}}$, with $-1 < \text{mantissa} < 1$.
- The range depends on the exponent
- The precision depends on the mantissa
This allows a huge range of numbers to be covered, with some loss in accuracy.
IEEE 754 floating-point standard:
- Single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits
- Double precision: 1 sign bit, 11 exponent bits, 52 mantissa bits
- Reserved values for infinity and NaN (not a number)
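The single-precision layout can be inspected with Python's standard `struct` module, which packs a value as a 4-byte IEEE 754 float and reads the same bytes back as an unsigned integer. This is a sketch for inspection only (the helper name is hypothetical). Note that IEEE 754 normalises differently from the signed-mantissa convention above: the sign is a separate bit, the exponent is stored with a bias of 127, and for normal numbers the 23 stored bits are the fraction of a significand in [1, 2) whose leading 1 is implicit.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to single precision, into (sign, exponent, fraction) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 mantissa (fraction) bits
    return sign, exponent, fraction


for x in (1.0, -2.5, 3.14, float("inf"), float("nan")):
    s, e, f = float32_fields(x)
    print(f"{x!r:>6}: sign={s} exponent={e:08b} fraction={f:023b}")
```

The reserved values show up as an all-ones exponent: infinity has a zero fraction, while NaN has a non-zero one.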
Problems with Floating-point Arithmetic
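[Figure: the real number line divided into regions: negative overflow, expressible negative numbers, negative underflow, zero, positive underflow, expressible positive numbers, positive overflow]
The number line is divided into seven regions. Using floating point we can only access three of them: the expressible negative numbers, zero, and the expressible positive numbers.
- The range is not continuous, nor equally sampled: compare the gap between $0.998\times10^{99}$ and $0.999\times10^{99}$ with the gap between $0.998\times10^{0}$ and $0.999\times10^{0}$.
- Some numbers cannot be expressed at all, e.g. $0.100\times10^{3} / 3$; we need to round them to the nearest expressible number.
- For a fixed number of bits, we must trade off the range against the precision of the representation.
- Fast performance requires special hardware.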
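These effects can be observed with Python's built-in `float`, which on typical platforms is an IEEE 754 double. The sketch below uses only the standard library (`math.ulp` needs Python 3.9 or later). It shows that the gap between neighbouring expressible numbers grows with magnitude, that some values are rounded, and that results can overflow to infinity or underflow to zero.

```python
import math
import sys
from fractions import Fraction

# 1. Expressible numbers are not equally spaced: the gap to the next
#    double (one "unit in the last place") grows with the magnitude.
for x in (1.0, 1e99):
    print(f"spacing near {x:g}: {math.ulp(x):g}")

# 2. Some values cannot be expressed exactly and are rounded to the nearest double.
third = 1 / 3
print(Fraction(third) == Fraction(1, 3))  # False: the stored value is only close to 1/3
print(0.1 + 0.2 == 0.3)                   # False: each operand was already rounded

# 3. Results that are too large overflow to infinity; results that are
#    too small underflow to zero.
print(sys.float_info.max * 2)             # inf
print(1e-200 * 1e-200)                    # 0.0
```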
Summary
- Computers use a binary representation for numbers.
- They are bound by the constraints of fixed-length arithmetic.
- Representing negative numbers is something of a challenge; no system is ideal.
- We can represent real numbers using fixed-point or floating-point arithmetic; again, neither system is ideal.