Lecture Overview Introduction Positional Numbering System

Presentation transcript:

Lecture 2 Data Representation in Computer Systems Lecture Duration: 2 Hours

Lecture Overview Introduction Positional Numbering System Decimal to binary conversion Signed integer representation Floating-point representation

Introduction: Some notifications – A reminder (1/2)
Bit: the most basic unit of information in a digital computer (on/off, or 0/1, state).
Byte: a group of 8 bits.
Word: two or more adjacent bytes that are manipulated collectively.
Word size: the size of a word in bits depends on the computer organization (16, 32, 64 bits, …).
Nibble (or nybble): a group of 4 bits. A byte is usually divided into two nibbles, a high-order nibble and a low-order nibble.

Introduction: Some notifications – A reminder (2/2)
Example of a 16-bit word: 0110 0111 1000 1101
The leftmost bit is the Most Significant Bit (MSB); the rightmost bit is the Least Significant Bit (LSB).
The word consists of two bytes: 01100111 and 10001101.
Each byte consists of a high-order nibble followed by a low-order nibble: 0110 | 0111 and 1000 | 1101.


Positional Numbering System (1/3)
Any numeric value is represented through increasing powers of a radix (or base).
The set of valid numerals (digits) is equal in size to the radix of the system.
The smallest numeral is 0 and the largest is 1 less than the radix.
Example: in the decimal system (base 10), the radix is 10, the number of valid numerals is 10 (equal to the radix), and the set of valid numerals is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

Positional Numbering System (2/3) The most important radices (bases) in computer science are: Binary Radix 2 or base 2 Numerals: {0 , 1} Octal Radix 8 or Base 8 Numerals: {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7} Hexadecimal Radix 16 or base 16 Numerals: {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , A , B , C , D , E , F}

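Positional Numbering System (3/3)
Any numeric value is represented through increasing powers of a radix (or base).
Examples:
$243.51_{10} = 2 \times 10^{2} + 4 \times 10^{1} + 3 \times 10^{0} + 5 \times 10^{-1} + 1 \times 10^{-2}$
$212_{3} = 2 \times 3^{2} + 1 \times 3^{1} + 2 \times 3^{0} = 23_{10}$
$10110.01_{2} = 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2} = 22.25_{10}$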
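The weighted sums above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `positional_value` and the choice to return a float are assumptions made for illustration.

```python
ALPHABET = "0123456789ABCDEF"  # numerals for radices up to 16

def positional_value(digits: str, radix: int) -> float:
    """Evaluate a digit string such as '10110.01' in the given radix."""
    integer_part, _, fraction_part = digits.upper().partition(".")
    value = 0.0
    # Integer digits carry the weights radix^0, radix^1, ... from right to left.
    for power, digit in enumerate(reversed(integer_part)):
        d = ALPHABET.index(digit)
        if d >= radix:
            raise ValueError(f"{digit!r} is not a valid numeral in base {radix}")
        value += d * radix ** power
    # Fraction digits carry the weights radix^-1, radix^-2, ... from left to right.
    for power, digit in enumerate(fraction_part, start=1):
        d = ALPHABET.index(digit)
        if d >= radix:
            raise ValueError(f"{digit!r} is not a valid numeral in base {radix}")
        value += d * radix ** -power
    return value

print(positional_value("212", 3))
print(positional_value("10110.01", 2))
```

Because the result is a binary float, values whose fractional part is not a sum of powers of two (such as 243.51) are only approximated.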

Lecture Overview Introduction Positional Numbering System Decimal to binary conversion Converting Unsigned Whole Numbers Converting fractions Converting between Power-of-Two Radices Signed integer representation Floating-point representation

Decimal to binary conversion: Some numbers to remember (1/1)
Keep in mind the following tables, or how to obtain them! [Tables not reproduced in this transcript.]

Decimal to binary conversion: Converting Unsigned Whole Numbers (1/6)
Real number: can take any value (e.g. 10323.7643, -16813.5322703).
Whole number: no fractional part (e.g. 10, 1231, 3543, …, -12, -12334, …).
Unsigned number: carries no sign, so only non-negative values (e.g. 102313.43234, 1231.56234, 12357, …).
Unsigned whole number: no fractional part and no sign.

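Decimal to binary conversion: Converting Unsigned Whole Numbers (2/6)
Convert the decimal number $113_{10}$ to binary.
Method 1: Repeated subtraction. Subtract the largest power of 2 that fits and write a 1 for it; write a 0 for every power that does not fit.
113 - 64 = 49   ($2^6$ fits: 1)
 49 - 32 = 17   ($2^5$ fits: 1)
 17 - 16 =  1   ($2^4$ fits: 1)
 8, 4 and 2 do not fit (0 0 0)
  1 -  1 =  0   ($2^0$ fits: 1)
$113_{10} = 1110001_2$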

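Decimal to binary conversion: Converting Unsigned Whole Numbers (3/6)
Method 2: Division-remainder. Divide repeatedly by 2 and record each remainder; the first remainder is the LSB and the last one is the MSB.
113 / 2 = 56, remainder 1 (LSB)
 56 / 2 = 28, remainder 0
 28 / 2 = 14, remainder 0
 14 / 2 =  7, remainder 0
  7 / 2 =  3, remainder 1
  3 / 2 =  1, remainder 1
  1 / 2 =  0, remainder 1 (MSB)
Reading the remainders from last to first: $113_{10} = 1110001_2$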

Decimal to binary conversion: Converting Unsigned Whole Numbers (4/6)
A binary number with N bits can represent $2^N$ unsigned integers, from 0 to $2^N - 1$.
Example: with N = 4 bits, we can represent $2^4 = 16$ unsigned integers, from 0 to $2^4 - 1 = 15$.
The number 16 CANNOT be represented with only 4 bits!

Decimal to binary conversion: Converting Unsigned Whole Numbers (5/6)
The subtraction method is cumbersome.
The subtraction method requires familiarity with the powers of the radix being used.
The division-remainder method is faster and easier than the repeated-subtraction method.
The division-remainder method can be used to convert from decimal to any other base (not only to base 2).

Decimal to binary conversion: Converting Unsigned Whole Numbers (6/6)
Example: Convert $104_{10}$ to base 3 using the division-remainder method.
104 / 3 = 34, remainder 2 (least significant digit)
 34 / 3 = 11, remainder 1
 11 / 3 =  3, remainder 2
  3 / 3 =  1, remainder 0
  1 / 3 =  0, remainder 1 (most significant digit)
$104_{10} = 10212_3$
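The division-remainder method described above translates almost directly into a loop. This is an illustrative sketch only; the name `to_base` and the limit of radix 16 are assumptions, not something the lecture specifies.

```python
def to_base(n: int, radix: int) -> str:
    """Convert a non-negative decimal integer to the given radix (2..16)."""
    if n < 0 or not 2 <= radix <= 16:
        raise ValueError("expects n >= 0 and 2 <= radix <= 16")
    alphabet = "0123456789ABCDEF"
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, radix)    # quotient feeds the next step
        digits.append(alphabet[remainder]) # first remainder is the least significant digit
    return "".join(reversed(digits))       # read the remainders from last to first

print(to_base(113, 2))
print(to_base(104, 3))
```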


Decimal to binary conversion: Converting fractions (1/5)
Fractions in the decimal system can be converted (or approximated) to fractions in any other radix system.
A radix point separates the integer part of a number from its fractional part.
Examples (integer part to the left of the radix point, fractional part to the right):
Base 10: 2390167.1208
Base 3: 2012.11022
Base 2: 1011110.111011
The radix point is called a "decimal point" in the decimal system, a "binary point" in the binary system, and so on.

Decimal to binary conversion: Converting fractions (2/5)
To convert a fraction from decimal to any other base, we repeatedly multiply by the destination radix and collect the integer parts.
Example: Convert $0.4304_{10}$ to base 5.
0.4304 x 5 = 2.1520   The integer part is 2
0.1520 x 5 = 0.7600   The integer part is 0
0.7600 x 5 = 3.8000   The integer part is 3
0.8000 x 5 = 4.0000   The integer part is 4; the fractional part is zero, so we are done
$0.4304_{10} = 0.2034_5$

Decimal to binary conversion: Converting fractions (3/5)
Some fractions are indeterminate in a given base: they contain a repeating string of digits to the right of the radix point and never terminate.
Example: $(2/3)_{10} = (0.666\ldots)_{10}$
A fraction that is indeterminate in one base can be determinate in another base (and vice versa).
Example: $(2/3)_{10} = 0.2_3 = (0.666\ldots)_{10}$, so 2/3 is indeterminate in base 10 but determinate in base 3.
When a fraction is indeterminate, an approximation is needed: we fix the number of digits to the right of the radix point.
Approximation is also needed because computing resources are limited (for example, the limited size of the processor's registers).

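Decimal to binary conversion: Converting fractions (4/5)
Example: Convert $0.34375_{10}$ to binary with 4 bits to the right of the binary point.
0.34375 x 2 = 0.68750   The integer part is 0
0.68750 x 2 = 1.37500   The integer part is 1
0.37500 x 2 = 0.75000   The integer part is 0
0.75000 x 2 = 1.50000   The integer part is 1. This is our fourth bit, so we stop here.
$0.34375_{10} \approx 0.0101_2$ (the remaining fractional part is dropped)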
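The repeated-multiplication method can be sketched as follows. The function name `fraction_to_base` and its `max_digits` cutoff are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used here as a deliberate choice so that binary floating-point rounding does not disturb the digits; the slides themselves work by hand.

```python
from fractions import Fraction

def fraction_to_base(fraction: str, radix: int, max_digits: int) -> str:
    """Convert a decimal fraction in [0, 1), given as text, to the target radix."""
    x = Fraction(fraction)  # exact value of e.g. "0.4304"
    if not 0 <= x < 1:
        raise ValueError("expects a fraction in the range [0, 1)")
    alphabet = "0123456789ABCDEF"
    digits = []
    # Stop when the fractional part reaches zero or the digit budget is used up.
    while x != 0 and len(digits) < max_digits:
        x *= radix
        integer_part = int(x)              # the next digit is the integer part
        digits.append(alphabet[integer_part])
        x -= integer_part                  # keep only the fractional part
    if not digits:
        return "0.0"
    return "0." + "".join(digits)

print(fraction_to_base("0.4304", 5, 10))
print(fraction_to_base("0.34375", 2, 4))
```

With `max_digits=4` the second call truncates, matching the slide's decision to stop at the fourth bit.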

Decimal to binary conversion: Converting fractions (5/5)
Convert $26.78125_{10}$ to binary.
Using the methods just described, we convert the integer part and the fractional part separately: $26_{10} = 11010_2$ and $0.78125_{10} = 0.11001_2$.
So $26.78125_{10} = 11010.11001_2$.

Decimal to binary conversion: Going back to positional numbering system (1/1)
Any unsigned whole or fractional number can be converted to decimal by using the positional numbering system described previously.
Examples:
$0.0101_2 = 0 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4} = 0 + 0.25 + 0 + 0.0625 = 0.3125_{10}$
$134.2034_5 = 1 \times 5^{2} + 3 \times 5^{1} + 4 \times 5^{0} + 2 \times 5^{-1} + 0 \times 5^{-2} + 3 \times 5^{-3} + 4 \times 5^{-4} = 44.4304_{10}$


Decimal to binary conversion: Converting between Power-of-Two Radices (1/4)
To convert from one base to another when neither is base 10, it is usually easiest to pass through base 10.
Example: convert $3121_4$ to base 3.
First step: $3121_4 = 3 \times 4^{3} + 1 \times 4^{2} + 2 \times 4^{1} + 1 \times 4^{0} = 217_{10}$
Second step, using the division-remainder method: $217_{10} = 22001_3$
So $3121_4 = 22001_3$.
Working between bases that are powers of two is much easier.

Decimal to binary conversion: Converting between Power-of-Two Radices (2/4)
The most common power-of-two radices are binary (base 2), octal (base $2^3$ = base 8) and hexadecimal (base $2^4$ = base 16).
Each octal digit is equivalent to a group of 3 binary digits, called an octet [1].
Each hexadecimal digit is equivalent to a group of 4 binary digits, called a hextet.
We convert from binary to octal and from binary to hexadecimal by simply grouping bits.
[1] The term "octet" is also used in the literature to describe a group of 8 bits.

Decimal to binary conversion: Converting between Power-of-Two Radices (3/4)
Example: Convert $10110010011101_2$ to octal.
Make groups of 3 bits (from right to left): 10 110 010 011 101
Add zero(s) on the left to complete the leftmost octet: 010 110 010 011 101
Convert each octet to its corresponding octal digit: 2 6 2 3 5
Finally: $10110010011101_2 = 26235_8$

Decimal to binary conversion: Converting between Power-of-Two Radices (4/4)
Example: Convert $10110010011101_2$ to hexadecimal.
Make groups of 4 bits (from right to left): 10 1100 1001 1101
Add zero(s) on the left to complete the leftmost hextet: 0010 1100 1001 1101
Convert each hextet to its corresponding hexadecimal digit: 2 C 9 D
Finally: $10110010011101_2 = 2C9D_{16}$
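The grouping procedure used in the octal and hexadecimal examples can be sketched in a few lines. The helper name `regroup_bits` is an assumption for illustration; group sizes of 3 and 4 correspond to octal and hexadecimal respectively.

```python
def regroup_bits(bits: str, group_size: int) -> str:
    """Convert a binary string to octal (group_size=3) or hexadecimal (group_size=4)."""
    if group_size not in (3, 4):
        raise ValueError("group_size must be 3 (octal) or 4 (hexadecimal)")
    # Pad with zeros on the left so the leftmost group is complete.
    padding = (-len(bits)) % group_size
    padded = "0" * padding + bits
    groups = [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
    # Each group maps to exactly one digit of the target radix.
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

print(regroup_bits("10110010011101", 3))
print(regroup_bits("10110010011101", 4))
```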

Lecture Overview Introduction Positional Numbering System Decimal to binary conversion Signed integer representation Signed Magnitude Complement system Floating-point representation

Signed integer representation An integer is a whole number Signed integers are the set of positive and negative whole numbers How should we encode and deal with the actual sign of the number? Two concepts are used Signed Magnitude concept Complement concept

Signed integer representation: Signed Magnitude (1/13)
Signed magnitude is the most intuitive method.
The MSB (Most Significant Bit) of a binary number is kept as the "sign" of the number: MSB = 1 means a negative number, MSB = 0 means a positive number.
The remaining bits represent the magnitude (or absolute value) of the numeric value.

Signed integer representation: Signed Magnitude (2/13)
Example: In an 8-bit signed-magnitude system, give the decimal value of the following numbers.
00000001: The MSB is 0, so the number is positive. The remaining 7 bits are $0000001_2 = 1_{10}$. The decimal number is +1.
10000001: The MSB is 1, so the number is negative. The remaining 7 bits are $0000001_2 = 1_{10}$. The decimal number is -1.

Signed integer representation: Signed Magnitude (3/13)
Example: In an 8-bit signed-magnitude system, give the decimal value of the following numbers.
10001001: The MSB is 1, so the number is negative. The remaining 7 bits are $0001001_2 = 9_{10}$. The decimal number is -9.
01000001: The MSB is 0, so the number is positive. The remaining 7 bits are $1000001_2 = 65_{10}$. The decimal number is +65.

Signed integer representation: Signed Magnitude (4/13)
In an N-bit signed-magnitude system, 1 bit is used for the sign of the number and N-1 bits are used for its magnitude.
The largest integer is $2^{N-1} - 1$.
The smallest integer is $-(2^{N-1} - 1)$.
Example: in an 8-bit signed-magnitude system, the largest integer is $01111111_2 = 2^{7} - 1 = 127_{10}$ and the smallest integer is $11111111_2 = -(2^{7} - 1) = -127_{10}$.

Signed integer representation: Signed Magnitude (5/13)
Computers should be able to carry out mathematical operations.
Signed-magnitude arithmetic is carried out using essentially the same methods that humans use:
First, we look at the signs of the two operands.
We arrange the operands in a certain way based on their signs.
We perform the calculation without regard to the signs.
Finally, we supply the sign as appropriate.

Signed integer representation: Signed Magnitude (6/13)
Adding operands that have the same sign.
Example: Add $01001111_2$ to $00100011_2$ using signed-magnitude arithmetic. The sign bit is set aside and the 7-bit magnitudes are added:
              1 1 1 1        <= carries
  0       1 0 0 1 1 1 1      (79)
  0   +   0 1 0 0 0 1 1    + (35)
  0       1 1 1 0 0 1 0      (114)
 sign
We find $01001111_2 + 00100011_2 = 01110010_2$ in signed-magnitude representation.

Signed integer representation: Signed Magnitude (7/13)
Overflow condition.
In the last example, adding the seventh bits (the leftmost magnitude bits) produced no carry.
If there is a carry out of the seventh bit, we have an overflow condition: the carry is discarded, resulting in an incorrect sum.
Example: Add $01000001_2$ to $01100001_2$ using signed-magnitude arithmetic.

Signed integer representation: Signed Magnitude (8/13)
      1           1          <= carries (the leftmost one leaves the magnitude)
  0       1 0 0 0 0 0 1      (65)
  0   +   1 1 0 0 0 0 1    + (97)
  0       0 1 0 0 0 1 0      (34)  incorrect
The addition overflows: the last carry is discarded and the resulting sum (34 instead of 162) is incorrect.

Signed integer representation: Signed Magnitude (9/13)
Signed-magnitude subtraction is carried out in a manner similar to pencil-and-paper decimal arithmetic.
Example 1: Subtract $01001111_2$ (79) from $01100011_2$ (99) using signed-magnitude arithmetic.
            0 1 1 2          <= borrows
  0       1 1 0 0 0 1 1      (99)
  0   -   1 0 0 1 1 1 1    - (79)
  0       0 0 1 0 1 0 0      (20)
We find $01100011_2 - 01001111_2 = 00010100_2$ in signed-magnitude representation.

Signed integer representation: Signed Magnitude (10/13)
Example 2: Subtract $01100011_2$ (99) from $01001111_2$ (79) using signed-magnitude arithmetic.
Here the subtrahend, 01100011, is larger than the minuend, 01001111.
From the result obtained in Example 1, we know that the magnitude of the difference of these two numbers is $0010100_2$.
Because the subtrahend is larger than the minuend, all we need to do is change the sign of the difference.
So we find $01001111_2 - 01100011_2 = 10010100_2$ in signed-magnitude representation.

Signed integer representation: Signed Magnitude (11/13)
Example 3: Add $10010011_2$ (-19) to $00001101_2$ (+13) using signed-magnitude arithmetic.
The operand with the larger magnitude is negative, so the result is negative. We subtract 13 from 19. The result is $10000110_2$ (-6).
Example 4: Subtract $10011000_2$ (-24) from $10101011_2$ (-43) using signed-magnitude arithmetic.
This is equivalent to adding -43 to +24. We subtract 24 from 43. The result is $10010011_2$ (-19).

Signed integer representation: Signed Magnitude (12/13)
General rules when operands have different signs:
Determine which operand has the larger magnitude.
The sign of the result is the same as the sign of the operand with the larger magnitude.
The magnitude of the result is obtained by subtracting (not adding) the smaller magnitude from the larger one.
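The signed-magnitude rules above (add magnitudes when the signs match, otherwise subtract the smaller magnitude from the larger and keep the sign of the larger) can be sketched as below. The function name `sm_add` is hypothetical, and discarding the carry out of the magnitude follows the overflow behaviour described in the slides rather than any particular hardware.

```python
def sm_add(a: str, b: str) -> str:
    """Add two equal-width signed-magnitude bit strings (sign bit first)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    magnitude_bits = len(a) - 1
    sign_a, mag_a = a[0], int(a[1:], 2)
    sign_b, mag_b = b[0], int(b[1:], 2)
    if sign_a == sign_b:
        # Same sign: add the magnitudes; a carry out of the magnitude is
        # discarded (overflow), which yields an incorrect sum.
        sign = sign_a
        magnitude = (mag_a + mag_b) % (2 ** magnitude_bits)
    elif mag_a >= mag_b:
        # Different signs: subtract the smaller magnitude from the larger
        # and take the sign of the operand with the larger magnitude.
        sign, magnitude = sign_a, mag_a - mag_b
    else:
        sign, magnitude = sign_b, mag_b - mag_a
    return sign + format(magnitude, f"0{magnitude_bits}b")

print(sm_add("01001111", "00100011"))  # 79 + 35
print(sm_add("10010011", "00001101"))  # -19 + 13
```

Note that with equal magnitudes and opposite signs this sketch returns the sign of the first operand, so it can produce either of the two zero patterns; that ambiguity is one of the problems of signed magnitude.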

Signed integer representation: Signed Magnitude (13/13)
Problems related to signed magnitude:
Too many decisions to make (which number is larger? are there borrows? what are the signs?).
The number 0 has two representations: 10000000 and 00000000.
The method is complicated.
The circuits are expensive.


Signed integer representation: Complement system (1/19)
The complement system is used to represent (convert) negative numbers only.
When using a complement system, subtraction is converted to addition.
Advantages of the complement system:
It simplifies computer arithmetic.
There is no need to process sign bits separately.
The sign of a number is easily checked by looking at its high-order bit (MSB).

Signed integer representation: Complement system (2/19)
In base 10, "casting out 9s" was used to subtract numbers.
Say we want to find 167 - 52.
First, 999 - 52 is calculated: 999 - 52 = 947.
947 is then added to 167: 167 + 947 = 1114.
The last carry (the leading 1) is removed and added to the remaining sum: 114 + 1 = 115.
So 167 - 52 = 115.

Signed integer representation: Complement system (3/19)
The last method uses a "diminished radix complement".
Working in base r (the radix), the diminished radix is r - 1.
Example: in base 10, r = 10 and the diminished radix is r - 1 = 10 - 1 = 9.
We say that a negative number is converted to its nine's complement.
For example, $-2468_{10}$ is converted to its nine's complement as follows: $9999 - 2468 = 7531_{C9}$

Signed integer representation: Complement system (4/19)
In the binary system r = 2, so the diminished radix is r - 1 = 1.
We say that we work in one's complement (C1).
To convert a negative number to its one's complement, its magnitude is subtracted from all ones.
A positive number is directly converted to its binary representation.
Example: the one's complement of $0101_2$ is $1111_2 - 0101_2 = 1010_{C1}$.
This is nothing more than switching all of the 1s with 0s and vice versa!

Signed integer representation: Complement system (5/19)
Example: Express $23_{10}$ and $-9_{10}$ in 8-bit binary one's complement form.
$23_{10} = +(00010111_2) = 00010111_{C1}$
$-9_{10} = -(00001001_2) = 11110110_{C1}$

Signed integer representation: Complement system (6/19)
In one's complement, subtraction is converted into addition.
Example: $23_{10} - 9_{10} = 23_{10} + (-9_{10})$
Example: Add $23_{10}$ to $-9_{10}$ using 8-bit binary one's complement arithmetic.
      0 0 0 1 0 1 1 1      (23)
  +   1 1 1 1 0 1 1 0    + (-9)
  1   0 0 0 0 1 1 0 1      the last carry (1) is added back to the sum
  +                 1
      0 0 0 0 1 1 1 0      (14)
The result is $00001110_{C1} = +(00001110_2) = 14_{10}$.

Signed integer representation: Complement system (7/19)
Example: Add $9_{10}$ to $-23_{10}$ using 8-bit binary one's complement arithmetic.
$-23_{10} = -(00010111_2) = 11101000_{C1}$
$9_{10} = +(00001001_2) = 00001001_{C1}$
$9_{10} + (-23_{10}) = 00001001_{C1} + 11101000_{C1}$
      0 0 0 0 1 0 0 1      (9)
  +   1 1 1 0 1 0 0 0    + (-23)
      1 1 1 1 0 0 0 1      (-14), no carry out of the MSB
Result: $11110001_{C1} = -(00001110_2) = -14_{10}$
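One's complement encoding and addition, including the step of adding the last carry back into the sum (often called the end-around carry), can be sketched as follows. The names `ones_encode` and `ones_add` are assumptions made for this illustration.

```python
def ones_encode(value: int, width: int = 8) -> str:
    """Encode an integer in one's complement: negatives are subtracted from all ones."""
    all_ones = 2 ** width - 1
    if abs(value) > 2 ** (width - 1) - 1:
        raise OverflowError("value does not fit in the given width")
    pattern = value if value >= 0 else all_ones - abs(value)
    return format(pattern, f"0{width}b")

def ones_add(a: str, b: str) -> str:
    """Add two equal-width one's complement bit strings."""
    width = len(a)
    all_ones = 2 ** width - 1
    total = int(a, 2) + int(b, 2)
    if total > all_ones:
        # A carry left the MSB: drop it and add it back to the sum.
        total = (total & all_ones) + 1
    return format(total, f"0{width}b")

print(ones_add(ones_encode(23), ones_encode(-9)))   # 23 + (-9)
print(ones_add(ones_encode(9), ones_encode(-23)))   # 9 + (-23)
```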

Signed integer representation: Complement system (8/19)
In one's complement, we still have two representations for zero: 00000000 and 11111111.
Computer engineers long ago stopped using one's complement.
A more efficient representation for binary numbers is two's complement.

Signed integer representation: Complement system (9/19)
Two's complement is an example of a radix complement.
There is no need to subtract one from the radix r when working in a radix complement.
Example: in base 10, r = 10, and we say that a negative number is converted to its ten's complement.
For example, $-2468_{10}$ is converted to its ten's complement as follows: $10000 - 2468 = 7532_{C10}$

Signed integer representation: Complement system (10/19)
In the binary system the radix is r = 2.
We say that we work in two's complement (C2).
Let d be the number of digits.
To convert a negative number -N to its two's complement, its magnitude N is subtracted from $r^d = 2^d$: $-N_{10} = (2^d - N)_{C2}$
A positive number is directly converted to its binary representation.

Signed integer representation: Complement system (11/19)
Example: in a 4-bit system, d = 4.
All negative numbers are converted by subtracting their magnitude from $2^d = 2^4 = 16_{10} = 10000_2$.
The two's complement of $0011_2$ is $10000_2 - 0011_2 = 1101_{C2}$.
This is nothing more than the one's complement incremented by 1!

Signed integer representation: Complement system (12/19)
Example: Express $23_{10}$, $-23_{10}$, and $-9_{10}$ in 8-bit binary two's complement form.
$23_{10} = +(00010111_2) = 00010111_{C2}$
$-23_{10} = -(00010111_2) = 11101000_2 + 1 = 11101001_{C2}$
$-9_{10} = -(00001001_2) = 11110110_2 + 1 = 11110111_{C2}$

Signed integer representation: Complement system (13/19)
Unlike C1 arithmetic, in C2 the last carry is discarded.
Example 1: Add $9_{10}$ to $-23_{10}$ using two's complement arithmetic.
      0 0 0 0 1 0 0 1      (9)
  +   1 1 1 0 1 0 0 1    + (-23)
      1 1 1 1 0 0 1 0      (-14)
The result is $11110010_{C2} = -(00001110_2) = -14_{10}$.

Signed integer representation: Complement system (14/19)
Note how a negative binary number in C2 is converted to decimal:
First, all 0s and 1s in the C2 number are switched: 11110010 → 00001101
A 1 is then added to that number: 00001101 + 1 = 00001110
So $11110010_{C2} = -(00001110_2) = -14_{10}$
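The two conversions described for two's complement (subtracting the magnitude from 2^d to encode, and invert-then-add-one to decode a negative pattern) can be sketched as follows. The function names `twos_encode` and `twos_decode` are hypothetical.

```python
def twos_encode(value: int, width: int = 8) -> str:
    """Encode an integer in two's complement using `width` bits."""
    if not -(2 ** (width - 1)) <= value <= 2 ** (width - 1) - 1:
        raise OverflowError("value does not fit in the given width")
    # A negative number -N is stored as 2^d - N; a positive one is stored directly.
    pattern = value if value >= 0 else 2 ** width - abs(value)
    return format(pattern, f"0{width}b")

def twos_decode(bits: str) -> int:
    """Decode a two's complement bit string to a decimal integer."""
    if bits[0] == "0":
        return int(bits, 2)
    # Negative: switch all 0s and 1s, add 1, then attach the minus sign.
    inverted = "".join("1" if bit == "0" else "0" for bit in bits)
    return -(int(inverted, 2) + 1)

print(twos_encode(-23))
print(twos_decode("11110010"))
```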

Signed integer representation: Complement system (15/19)
Example 2: Find the sum of $23_{10}$ and $-9_{10}$ in binary using two's complement arithmetic.
$23_{10} = +(00010111_2) = 00010111_{C2}$
$-9_{10} = -(00001001_2) = 11110111_{C2}$
$23_{10} + (-9_{10}) = 00010111_{C2} + 11110111_{C2}$
      0 0 0 1 0 1 1 1      (23)
  +   1 1 1 1 0 1 1 1    + (-9)
  1   0 0 0 0 1 1 1 0      (14), the last carry is discarded
Result: $00001110_{C2} = +(00001110_2) = 14_{10}$

Signed integer representation: Complement system (16/19)
Advantages of two's complement:
It is the most popular choice for representing signed numbers.
The algorithm for adding and subtracting is quite easy.
It has the best representation for 0 (all 0 bits).
It is self-inverting.
It is easily extended to larger numbers of bits.

Signed integer representation: Complement system (17/19)
Drawback: the range of values that can be represented by N bits is asymmetric.
Examples:
With signed magnitude, 4 bits allow us to represent the values -7 ($1111_2$) through +7 ($0111_2$).
Using two's complement, we can represent the values -8 ($1000_{C2}$) through +7 ($0111_{C2}$).

Signed integer representation: Complement system (18/19)
Overflow in complement systems (C1 and C2):
An overflow occurs if two positive numbers are added and the result is negative, or if two negative numbers are added and the result is positive.
Overflow cannot occur when a positive and a negative number are added together.

Signed integer representation: Complement system (19/19)
To detect overflow, check the last two carries (the carry into the MSB and the carry out of the MSB):
If they are different, there is an overflow.
If they are equal, there is no overflow.
Example 1: Find the sum of $126_{10}$ and $8_{10}$ in binary using two's complement arithmetic.
  0 1 1 1 1                <= carries (carry out of the MSB = 0, carry into the MSB = 1)
    0 1 1 1 1 1 1 0        (126)
  + 0 0 0 0 1 0 0 0      + (8)
    1 0 0 0 0 1 1 0        (-122)  incorrect
The result is $10000110_{C2} = -(01111010_2) = -122_{10}$!
Note that the last two carries are different.
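The carry-comparison rule for detecting overflow can be sketched as follows for two's complement addition. The function name `twos_add` and its tuple return value are assumptions made for this illustration.

```python
def twos_add(a: str, b: str) -> tuple[str, bool]:
    """Add two equal-width two's complement bit strings.

    Returns the result bits (last carry discarded) and an overflow flag
    computed by comparing the last two carries.
    """
    width = len(a)
    x, y = int(a, 2), int(b, 2)
    low_mask = 2 ** (width - 1) - 1
    # Carry into the MSB: produced by adding everything below the MSB.
    carry_in = ((x & low_mask) + (y & low_mask)) >> (width - 1)
    total = x + y
    # Carry out of the MSB: whatever does not fit in `width` bits.
    carry_out = total >> width
    result = format(total & (2 ** width - 1), f"0{width}b")
    return result, carry_in != carry_out

print(twos_add("01111110", "00001000"))  # 126 + 8 overflows in 8 bits
print(twos_add("00010111", "11110111"))  # 23 + (-9) does not
```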

Lecture Overview Introduction Positional Numbering System Decimal to binary conversion Signed integer representation Floating-point representation A simple model Floating-point arithmetic Floating point errors

Floating-point representation (1/1)
A computer is expected to handle all kinds of problems.
Huge numbers, fractional numbers and complicated mathematical operations may be involved.
Floating-point representation is a solution that gives a good ratio between the largest representable number and the word size.

Floating-point representation
Computers use a form of scientific notation for floating-point representation.
Numbers written in scientific notation have three components: a sign, a mantissa and an exponent.
Scientific notation in base 10: $+0.579 \times 10^{7}$
Scientific notation in base 2: $+0.101101 \times 2^{3}$

Floating-point representation: A simple model (1/8)
In digital computers, floating-point numbers consist of three parts:
A sign bit.
An exponent part, representing the exponent on a power of 2.
A fractional part called a significand, which is a fancy word for a mantissa.

Floating-point representation: A simple model (2/8)
Using more bits for the exponent increases the range of numbers.
Using more bits for the significand increases the precision.
For simplicity, throughout this course we will use a simplified 14-bit model:
Sign bit: 1 bit
Exponent: 5 bits
Significand: 8 bits

Floating-point representation: A simple model (3/8)
Exercise 1: Represent the number 17 in the 14-bit floating-point representation.
In decimal: $17 = 17.0 \times 10^{0} = 1.7 \times 10^{1} = 0.17 \times 10^{2}$
Analogously in binary: $17_{10} = 10001_2 \times 2^{0} = 1000.1_2 \times 2^{1} = 100.01_2 \times 2^{2} = 10.001_2 \times 2^{3} = 1.0001_2 \times 2^{4} = 0.10001_2 \times 2^{5} = 0.010001_2 \times 2^{6} = 0.0010001_2 \times 2^{7} = \ldots$
As a convention, we stop when the integer part is zero and the MSB of the significand is 1: $0.10001_2 \times 2^{5}$
The exponent is $5_{10} = 00101_2$.
The significand is $10001_2$, padded to 8 bits: $10001000_2$.
So (sign, exponent, significand): 0 00101 10001000

Floating-point representation: A simple model (4/8)
The last floating-point representation is not suitable for negative exponents.
Example: the number $0.25 = 0.01_2 = 0.1_2 \times 2^{-1}$. How do we represent the negative exponent -1?
To solve such problems we use an excess-16 bias: 16 is added to every exponent, negative or positive.
We say that the real exponent is replaced by a biased exponent.
All exponents are thus stored as non-negative biased exponents.

Floating-point representation: A simple model (5/8)
With an excess-16 bias:
Biased exponent values less than 16 indicate negative exponents.
Biased exponent values greater than 16 indicate positive exponents.
Exponents of all zeros or all ones are typically reserved for special numbers (such as zero or infinity).

Floating-point representation: A simple model (6/8)
Example 1: Represent the number 17 in the 14-bit floating-point form with excess-16 bias.
The number is positive: the sign bit is 0.
$17_{10} = 0.10001_2 \times 2^{5}$
The exponent is $5_{10} \rightarrow (5 + 16)_{10} = 21_{10} = 10101_2$.
The significand is $10001_2$, padded to 8 bits: $10001000_2$.
So 17 in floating-point form with excess-16 bias is (sign, exponent, significand): 0 10101 10001000

Floating-point representation: A simple model (7/8)
Example 2: Represent the number $0.25_{10}$ in the 14-bit floating-point form with excess-16 bias.
The number is positive: the sign bit is 0.
$0.25_{10} = 0.01_2 \times 2^{0} = 0.1_2 \times 2^{-1}$
The exponent is $-1_{10} \rightarrow (-1 + 16)_{10} = 15_{10} = 01111_2$.
The significand is $1_2$, padded to 8 bits: $10000000_2$.
So 0.25 in floating-point form with excess-16 bias is (sign, exponent, significand): 0 01111 10000000

Floating-point representation: A simple model (8/8)
Example 3: Express $-0.03125_{10}$ in normalized floating-point form with excess-16 bias.
The number is negative: the sign bit is 1.
$0.03125_{10} = 0.00001_2 = 0.00001_2 \times 2^{0} = 0.0001_2 \times 2^{-1} = \ldots = 0.1_2 \times 2^{-4}$
The exponent is $-4_{10} \rightarrow (-4 + 16)_{10} = 12_{10} = 01100_2$.
The significand is $1_2$, padded to 8 bits: $10000000_2$.
So -0.03125 in floating-point form with excess-16 bias is (sign, exponent, significand): 1 01100 10000000
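The encoding steps of the 14-bit model (sign bit, 5-bit excess-16 exponent, 8-bit significand normalized as 0.1xxx) can be sketched as follows. The function name `encode14` is hypothetical, and so are two choices made here that the slides do not spell out: extra significand bits are truncated, and zero is rejected rather than given a special pattern. The standard-library call `math.frexp` happens to return exactly the 0.1xxx normalization used in the lecture.

```python
import math

def encode14(x: float) -> str:
    """Encode a non-zero number as 'sign exponent significand' in the 14-bit model."""
    if x == 0:
        raise ValueError("zero needs a special pattern, not covered by this sketch")
    sign = "1" if x < 0 else "0"
    # abs(x) = mantissa * 2**exponent with 0.5 <= mantissa < 1, i.e. 0.1xxx in binary.
    mantissa, exponent = math.frexp(abs(x))
    biased = exponent + 16                 # excess-16 bias
    if not 0 <= biased <= 31:
        raise OverflowError("exponent does not fit in 5 bits")
    significand = int(mantissa * 2 ** 8)   # keep 8 bits, truncate the rest
    return f"{sign} {biased:05b} {significand:08b}"

print(encode14(17))
print(encode14(0.25))
print(encode14(-0.03125))
```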


Floating-point representation: Floating-point arithmetic (1/2)
To add or subtract two numbers in floating-point form:
Both numbers should have the same exponent.
If the exponents are different, we change one of the numbers so that both are expressed with the same power of the base.
We add (or subtract) the binary numbers.
We represent the result in normalized floating-point form.

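Floating-point representation: Floating-point arithmetic (2/2)
Example: Add the following binary numbers, represented in the normalized 14-bit format with an excess-16 bias.
First number: biased exponent $18_{10}$, so the real exponent is $2_{10}$; the number is $0.11001000_2 \times 2^{2} = 11.001000_2 \times 2^{0}$
Second number: biased exponent $16_{10}$, so the real exponent is $0_{10}$; the number is $0.10011010_2 \times 2^{0}$
Now $0.10011010_2 + 11.001000_2$:
      0 . 1 0 0 1 1 0 1 0
  + 1 1 . 0 0 1 0 0 0 0 0
    1 1 . 1 0 1 1 1 0 1 0
The result is $11.10111010_2 \times 2^{0} = 0.1110111010_2 \times 2^{2}$
In floating-point form with excess-16 bias, the exponent is $(2 + 16)_{10} = 18_{10} = 10010_2$ and only the first 8 significand bits are kept: 0 10010 11101110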


Floating-point representation: Floating-point errors (1/2)
Computers are finite systems.
When dealing with floating-point form, we are modeling the infinite system of real numbers in a finite system of integers.
What we have, in truth, is an approximation of the real number system.
The more bits we use, the better the approximation.
However, there is always some element of error.
Such errors can propagate through a lengthy calculation, causing substantial loss of precision.

Floating-point representation: Floating-point errors (2/2)
Example: In our simple model, representable values are limited to the range $-0.11111111_2 \times 2^{15}$ through $+0.11111111_2 \times 2^{15}$.
We cannot store $2^{-19}$ or $2^{128}$; they simply don't fit.
Also, 128.5 cannot be stored accurately even though it is well within our range:
$128.5_{10} = 10000000.1_2 = 0.100000001_2 \times 2^{8}$
The significand needs more than 8 bits! In practice we store only the first 8 bits: 10000000
We actually store 128 and not 128.5, with an absolute error of 0.5.
The relative error is $(128.5 - 128) / 128.5 \approx 0.00389 = 0.39\%$.
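The truncation error in the 128.5 example can be reproduced with a short sketch. The helper names `stored_value` and `relative_error` are hypothetical, the sketch assumes a positive input, and it assumes that bits beyond the 8-bit significand are simply dropped, as in the slide.

```python
import math

def stored_value(x: float, significand_bits: int = 8) -> float:
    """Value actually kept when positive x is normalized as 0.1xxx * 2^e and truncated."""
    mantissa, exponent = math.frexp(x)            # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = 2 ** significand_bits
    kept = math.floor(mantissa * scale) / scale   # drop bits beyond the significand width
    return math.ldexp(kept, exponent)             # kept * 2**exponent

def relative_error(x: float, significand_bits: int = 8) -> float:
    """Relative error introduced by storing x with a truncated significand."""
    return abs(x - stored_value(x, significand_bits)) / abs(x)

print(stored_value(128.5))
print(round(relative_error(128.5), 5))
```

Raising `significand_bits` to 9 is enough to store 128.5 exactly, which illustrates the point that more significand bits give better precision.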

End of lecture 2 Try to solve all exercises related to lecture 2