Binary Arithmetic
We will see:
- How numbers are stored in a computer
- The consequences of using fixed-length arithmetic
- How to convert to and from binary
- How to represent negative numbers
- How to represent fractional numbers
  - Fixed-point
  - Floating-point
Why Binary Arithmetic?
- Early computers often used decimal arithmetic
- Binary arithmetic is the most robust: we only need to distinguish between two different voltage levels
- A decimal computer would need to distinguish ten different levels
  - It would need a much lower noise level to operate reliably, which is expensive
  - But it could pack far more into a given word length
- Another possibility is an analogue computer, but the presence of noise means low accuracy
8 bits, 1 byte, 256 values
- 3 bits give 8 combinations; 4 bits give 16 combinations
- Each bit we add doubles the number of possible combinations
- With n bits, we have 2^n combinations
- Hence 8 bits (1 byte) gives 2^8 = 256 different combinations
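The doubling rule above can be checked directly; this small Python sketch is illustrative (the function name is ours, not from the slides):

```python
# Each extra bit doubles the number of representable bit patterns.
def combinations(n_bits: int) -> int:
    """Number of distinct values representable with n_bits bits."""
    return 2 ** n_bits

print(combinations(3))  # 8
print(combinations(4))  # 16
print(combinations(8))  # 256, i.e. one byte
```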
How many bits in a date?
- Using binary values means we must divide our world into powers of 2; we usually pick the nearest power of 2
- For the date format "DD/MM/YYYY":
  - 29 to 31 days in a month: 2^5 = 32
  - 12 months in a year: 2^4 = 16
  - 9999 possible years: 2^14 = 16384
- So we would need 5 + 4 + 14 = 23 bits in total to represent a date like this
- This leaves many illegal bit combinations, e.g. "32/16/9999"
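One hypothetical way to pack such a date into 23 bits (the field layout and function names are our own illustration, not a real format):

```python
# Pack DD/MM/YYYY into 23 bits: 5 for the day, 4 for the month, 14 for the year.
def pack_date(day: int, month: int, year: int) -> int:
    assert 0 <= day < 32 and 0 <= month < 16 and 0 <= year < 16384
    return (day << 18) | (month << 14) | year

def unpack_date(packed: int) -> tuple:
    day = (packed >> 18) & 0x1F     # top 5 bits
    month = (packed >> 14) & 0xF    # next 4 bits
    year = packed & 0x3FFF          # bottom 14 bits
    return day, month, year

print(unpack_date(pack_date(25, 12, 2024)))  # (25, 12, 2024)
```

Note that nothing stops us packing the illegal combination (32, 16, 9999) except the assertion; the bit pattern itself exists.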
Fixed-length Arithmetic
- We are used to working with numbers of any length, but computers have to work with fixed-length numbers
- This causes some interesting problems
- Suppose we only have 3 decimal digits: 000, 001, 002, ..., 999
- We cannot represent:
  - Negative numbers (no sign)
  - Fractional numbers (no decimal point)
  - Numbers larger than 999 (not enough digits)
Problems with Fixed-Length Arithmetic
- Fixed-length arithmetic is not closed. For example, using 3 decimal digits:
  - 600 + 600 = 1200 (too large)
  - 003 - 005 = -2 (negative)
  - 050 × 050 = 2500 (too large)
  - 007 / 002 = 3.5 (not an integer)
- We can divide the problems into three main classes:
  - Overflow (too large)
  - Underflow (too small)
  - Not a member of the set
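A quick sketch of the closure failure: a membership test for the 3-digit set (our own illustration) rejects each of the four results above.

```python
# The set of 3-decimal-digit values is {000, 001, ..., 999}.
LIMIT = 1000

def fits(n) -> bool:
    """Is n a member of the 3-digit unsigned integer set?"""
    return isinstance(n, int) and 0 <= n < LIMIT

print(fits(600 + 600))  # False: overflow (1200)
print(fits(3 - 5))      # False: negative (-2)
print(fits(50 * 50))    # False: overflow (2500)
print(fits(7 / 2))      # False: not an integer (3.5)
print(fits(400 + 400))  # True: this one stays in the set
```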
Binary-to-Decimal

Binary  Decimal
000     0 = 0×2^2 + 0×2^1 + 0×2^0
001     1 = 0×2^2 + 0×2^1 + 1×2^0
010     2 = 0×2^2 + 1×2^1 + 0×2^0
011     3 = 0×2^2 + 1×2^1 + 1×2^0
100     4 = 1×2^2 + 0×2^1 + 0×2^0
101     5 = 1×2^2 + 0×2^1 + 1×2^0
110     6 = 1×2^2 + 1×2^1 + 0×2^0
111     7 = 1×2^2 + 1×2^1 + 1×2^0

- By convention, the right-most bit is the least significant
- Each subsequent bit is worth twice the one before
- We can build this idea into an algorithm
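The "each bit is worth twice the one before" idea becomes a one-pass algorithm; this Python sketch (our own naming) doubles the running total for each new bit:

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to decimal, most-significant bit first."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # shift left, then add the new bit
    return value

print(binary_to_decimal("111"))     # 7
print(binary_to_decimal("111010"))  # 58
```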
Decimal-to-Binary
- This involves the successive division of the decimal number by 2
- The remainder at each stage is the next digit of the binary expansion (least significant first)
- E.g. converting 58 into binary:
  - 58 ÷ 2 = 29 remainder 0
  - 29 ÷ 2 = 14 remainder 1
  - 14 ÷ 2 = 7 remainder 0
  - 7 ÷ 2 = 3 remainder 1
  - 3 ÷ 2 = 1 remainder 1
  - 1 ÷ 2 = 0 remainder 1
- Reading the remainders from last to first, 58 is 111010 in binary
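The successive-division procedure above translates directly into code (a sketch, names ours); the remainders come out least-significant first, so they are reversed at the end:

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to a binary string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 2)   # divide by 2, keep the remainder
        digits.append(str(remainder))
    return "".join(reversed(digits))  # last remainder is the top bit

print(decimal_to_binary(58))  # 111010
```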
Representations of Negative Integers
- There are four main ways of representing negative m-bit binary numbers:
  - Sign and magnitude
  - 1's complement
  - 2's complement
  - Excess 2^(m-1)
- All of them have problems, but all of them work well with binary addition and subtraction
Sign and Magnitude
- We use one bit for the sign, and the rest for the magnitude of the number
- E.g. using an 8-bit binary representation:
  - 58 is 00111010
  - -58 is 10111010 (using sign-magnitude)
- The range is therefore -127 to 127
- Problem: two representations of zero
  - 00000000 = zero
  - 10000000 = -zero
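A minimal sketch of the encoding (function name ours): the top bit carries the sign, the remaining 7 bits the magnitude.

```python
def sign_magnitude(n: int, bits: int = 8) -> str:
    """Encode n in sign-and-magnitude: 1 sign bit + (bits-1) magnitude bits."""
    limit = 2 ** (bits - 1) - 1        # 127 for 8 bits
    assert -limit <= n <= limit
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")

print(sign_magnitude(58))   # 00111010
print(sign_magnitude(-58))  # 10111010
```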
1's Complement
- Also has a sign bit
- To negate a number, we simply flip every 0 to a 1 and vice versa
- E.g. using an 8-bit binary representation:
  - 58 is 00111010
  - -58 is 11000101 (using 1's complement)
- The range is again -127 to 127
- We again have two representations of zero:
  - 00000000 = zero
  - 11111111 = -zero
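The flip-every-bit negation is one line of Python (a sketch, name ours); note how negating zero yields the second zero:

```python
def ones_complement_negate(bits: str) -> str:
    """Negate a 1's complement number by flipping every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)

print(ones_complement_negate("00111010"))  # 11000101, i.e. -58
print(ones_complement_negate("00000000"))  # 11111111, the second zero
```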
2's Complement
- Like 1's complement, but to negate a number we flip all the bits and add 1 to the result
- E.g. using an 8-bit binary representation:
  - 58 is 00111010
  - -58 is 11000110 (using 2's complement)
- This gives us a range of -128 to 127, with only one representation of zero
- But we still have a problem: -(-128) = -(10000000) = 10000000 = -128
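A sketch of 2's complement negation (names ours); the modulo discards any carry out of the fixed width, which is what makes -(-128) come back as -128:

```python
def twos_complement_negate(bits: str) -> str:
    """Negate a 2's complement number: flip every bit, then add 1."""
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    width = 2 ** len(bits)
    total = (int(flipped, 2) + 1) % width  # discard carry out of the width
    return format(total, f"0{len(bits)}b")

print(twos_complement_negate("00111010"))  # 11000110, i.e. -58
print(twos_complement_negate("00000000"))  # 00000000: a single zero
print(twos_complement_negate("10000000"))  # 10000000: -(-128) is still -128
```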
Excess 2^(m-1)
- We store each number as its value plus 2^(m-1)
- E.g. using an 8-bit binary representation, this is "excess 128":
  - 58 is represented as 186, or 10111010
  - -58 is represented as 70, or 01000110
- This is equivalent to 2's complement with the sign bit reversed
- The range is again -128 to 127, and still asymmetric
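The excess (biased) encoding is just an addition before converting to binary; a sketch with our own naming:

```python
def to_excess(n: int, bits: int = 8) -> str:
    """Encode n in excess 2^(bits-1) notation: store n + bias."""
    bias = 2 ** (bits - 1)             # 128 for 8 bits
    assert -bias <= n <= bias - 1
    return format(n + bias, f"0{bits}b")

print(to_excess(58))   # 10111010 (stored as 186)
print(to_excess(-58))  # 01000110 (stored as 70)
```

Comparing the outputs with the 2's complement slide shows the same bit patterns with the top bit inverted.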
Fixed-Point Arithmetic
- It is useful to be able to represent real numbers, e.g. 1.23, 3.141, 0.0000123, ...
- One way we can do this is to reserve a certain number of digits for the fractional part
- E.g. 1010.1100 = 10.75 (".1100" = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75)
- Note that we don't need any special hardware to deal with this; we just need to keep track of the binary point
- Given m bits before and n bits after the binary point:
  - m determines the range
  - n determines the precision
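Because the hardware just sees an integer, interpreting a fixed-point value is a single division by 2^n; a sketch (name ours):

```python
def fixed_to_real(bits: str, frac_bits: int) -> float:
    """Interpret a bit string with frac_bits bits after the binary point."""
    return int(bits, 2) / (2 ** frac_bits)

# 1010.1100 with 4 fractional bits: the stored integer is 10101100 = 172,
# and 172 / 2^4 = 10.75.
print(fixed_to_real("10101100", 4))  # 10.75
```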
Problems with Fixed-point Arithmetic
- Fixed-point arithmetic is useful, but limited: it is no good for extremely large or small numbers
  - The mass of the Sun is 2×10^33 grams
  - The mass of an electron is 9×10^-28 grams
- To use both these amounts in a calculation we would need over 60 decimal digits, most of which would be irrelevant
- We need floating-point arithmetic
Floating-Point Arithmetic
- We borrow the familiar scientific notation to represent numbers:
  - 3.14 = 0.314 × 10^1
  - 0.0005 = 0.5 × 10^-4
- We use powers of 2 instead of 10: N = mantissa × 2^exponent (-1 < mantissa < 1)
  - The range depends on the exponent; the precision depends on the mantissa
- This allows a huge range of numbers to be covered, with some loss in accuracy
- IEEE floating-point standard 754:
  - Single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits
  - Double precision: 1 sign bit, 11 exponent bits, 52 mantissa bits
  - Reserved values for infinity and NaN (not a number)
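We can inspect the three IEEE 754 single-precision fields of a real float with the standard `struct` module; this sketch (field extraction ours) masks out the sign, exponent, and fraction bits:

```python
import struct

def float_fields(x: float) -> tuple:
    """Split a single-precision float into (sign, exponent, fraction) fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern
    sign = raw >> 31              # 1 bit
    exponent = (raw >> 23) & 0xFF  # 8 bits (biased by 127)
    fraction = raw & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction

print(float_fields(1.0))           # (0, 127, 0): exponent bias is 127
print(float_fields(-2.0))          # (1, 128, 0)
print(float_fields(float("inf")))  # (0, 255, 0): reserved exponent
```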
Problems with Floating-point Arithmetic

[Figure: the number line divided into regions of negative overflow, expressible negative numbers, underflow, expressible positive numbers, and positive overflow]

- The number line is divided into seven regions; using floating point we can only access three of them
- The range is not continuous, nor equally sampled: compare the gap between 0.998×10^99 and 0.999×10^99 with that between 0.998×10^0 and 0.999×10^0
- We cannot express some numbers at all, e.g. 0.100×10^3 / 3, and need to round them to the nearest expressible number
- For a fixed number of bits, we must trade off the range against the precision of the representation
- Floating point requires special hardware for fast performance
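The rounding problem is easy to demonstrate: 0.1 has no exact binary representation, so repeatedly adding the rounded value drifts away from the true sum.

```python
# Sum 0.1 ten times; each addend is a rounded approximation of 0.1.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)              # False: the rounding errors accumulate
print(abs(total - 1.0) < 1e-9)   # True: but the result is very close
```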
Summary
- Computers use a binary representation for numbers
- They are bound by the constraints of fixed-length arithmetic
- Representing negative numbers is something of a challenge; no system is ideal
- We can represent real numbers using fixed-point or floating-point arithmetic; again, neither system is ideal