Chapter 3 Data Representation

Chapter goals: Describe numbering systems and their use in data representation. Compare and contrast various data representation methods. Describe how nonnumeric data is represented.

Data representation: Humans have many symbolic forms to represent information: alphabets, numbers, pictograms. Computers can only represent information with electrical signals: is a circuit on or off?

Computers, numbers, and binary data: Computers use only on/off signals to represent information. These signals can only represent numeric data. Even character-based data is represented as a number.

Why binary data? A circuit has two states, on and off: on = 1, off = 0. Binary numbers contain only 0s and 1s. Data is stored as collections of binary numbers.

Binary numbers are “computer friendly”: Binary numbers are signals that can easily be transported. They can be easily processed (transformed) by two-state electrical devices that are easy to design and fabricate. These devices (AND/OR gates, adders) are strung together like an assembly line to carry out a function.

Logic gates

Boolean algebra: System developed by George Boole (19th-century mathematician) that operates on two-valued (true/false) quantities. Circuits built on it can determine whether two values are equal, not equal, less than, greater than, etc. Boolean algebra allows the CPU to carry out binary arithmetic (see White, pp. 36-37).
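As a rough illustration of how Boolean operations can carry out binary arithmetic, the sketch below builds a one-bit full adder from AND, OR, and XOR and chains it across a bit string. The function names (`full_adder`, `add_bits`) are hypothetical and chosen only for this example; real hardware does this with circuits, not Python.

```python
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def XOR(a, b):
    return a ^ b

def full_adder(a, b, carry_in):
    """Add three single bits; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out

def add_bits(x_bits, y_bits):
    """Add two equal-length bit lists (high-order bit first)."""
    carry = 0
    result = []
    # Work from the low-order bit to the high-order bit.
    for a, b in zip(reversed(x_bits), reversed(y_bits)):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    result.reverse()
    return result, carry

# 0101 (5) + 0011 (3) = 1000 (8), with no carry out of the high-order bit
print(add_bits([0, 1, 0, 1], [0, 0, 1, 1]))
```

Chaining the same small unit bit by bit mirrors the "assembly line" idea: each stage only needs its two input bits and the carry from the previous stage.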

Binary numbers: Can be combined into a positional numbering system. The base for decimal numbers is 10; the base for binary numbers is 2. Each position to the left is worth the next higher power of 2 (1, 2, 4, 8, ...).

Terminology for number systems: Base is also referred to as the radix. Binary numbers have a radix of 2. Decimal numbers have a radix of 10. The radix point separates whole values from fractional values. The decimal point is a kind of radix point.

Base 2 positional example
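A minimal sketch of the positional rule, assuming the bit string is given as text with the high-order bit first. The helper name `binary_to_decimal` is made up for this example; Python's built-in `int(text, 2)` performs the same conversion.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum digit * 2**position, counting positions from the right."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

# 1101 = 1*8 + 1*4 + 0*2 + 1*1
print(binary_to_decimal("1101"))  # prints 13
```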

Numbering systems: A higher base (radix) means fewer positions are needed to represent a number. Base 2 needs many more positions than base 10. Base 16 (hexadecimal) is often used to represent binary numbers.

Computers & binary numbers: Each digit of a binary number is called a bit. Bit string: a group of bits that describes a single value.

Bit strings: The leftmost bit (most significant bit) is called the high-order bit. The rightmost bit (least significant bit) is called the low-order bit. 8 bits make a byte. Programming languages, spreadsheets, etc. automatically translate from base 10 to base 2 and back again.

Hexadecimal notation: Base or radix is 16. More compact than binary. Symbols used are 0-9 and A-F. One hexadecimal position corresponds to 4 bits. Used to designate memory locations and colors (HTML and VB).
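The 4-bits-per-position correspondence can be sketched as follows. The helper `bits_to_hex` is a hypothetical name for this example, and it assumes the input contains only the characters 0 and 1.

```python
HEX_SYMBOLS = "0123456789ABCDEF"

def bits_to_hex(bits: str) -> str:
    """Convert a bit string to hexadecimal, one symbol per group of 4 bits."""
    # Pad with leading zeros so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_SYMBOLS[int(group, 2)] for group in groups)

# 1111 1010 -> F A
print(bits_to_hex("11111010"))  # prints FA
```

Eight binary positions collapse to two hexadecimal positions, which is why hexadecimal is the usual shorthand for bit strings.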

Goals of computer data representation: Any representation format for numeric data represents a balance among several factors, including compactness, accuracy, range, ease of manipulation, and standardization.

Balancing objectives: Compactness and range are inversely related: the more compact the format, the smaller the range. Accuracy increases with the number of bits used, especially with real numbers. Example: 1/3, or 0.33333333... (a non-terminating fraction).

Other objectives: Does the information format make it easier for the processor to perform operations? Is the data in a standard format, allowing simple transfer between computers?

CPU standard data types: Integer, real number, character, Boolean, memory address.

Integer data types: Unsigned: assumed to be positive. Signed: uses one bit (usually the high-order bit) to indicate sign; 0 is positive, 1 is negative.

Representing negative integers: Excess notation and two's complement. They allow subtraction to be carried out as addition. In two's complement, the number is converted to its complement (every bit inverted), then 1 is added to the result. When this is added to another binary number, the carry out of the high-order bit is ignored.
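The complement-and-add-one procedure can be sketched for an 8-bit width. The width and the helper names (`twos_complement`, `subtract`, `to_signed`) are assumptions made for this illustration, not part of the original slides.

```python
WIDTH = 8                 # assumed fixed width for this example
MASK = (1 << WIDTH) - 1   # 0b11111111

def twos_complement(value: int) -> int:
    """Invert every bit, then add 1, staying within WIDTH bits."""
    inverted = ~value & MASK
    return (inverted + 1) & MASK

def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (two's complement of b); the carry is dropped."""
    return (a + twos_complement(b)) & MASK

def to_signed(pattern: int) -> int:
    """Interpret a WIDTH-bit pattern as a signed value (high-order bit = sign)."""
    return pattern - (1 << WIDTH) if pattern >> (WIDTH - 1) else pattern

print(format(twos_complement(5), "08b"))  # prints 11111011
print(subtract(12, 5))                    # prints 7
print(to_signed(subtract(5, 12)))         # prints -7
```

Masking with `MASK` plays the role of ignoring the carry bit: anything that would spill past the eighth position is discarded.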

Range and overflow: Most CPUs use a fixed width of 32 or 64 bits to represent an integer. For small numbers, the format is padded with leading zeros. The machine processes fixed-width information more easily than variable-width information.

Integer overflow: If a number is too big for the fixed-width integer format, the CPU signals an overflow error. Integer format width is a tradeoff between overflow and wasted space (padded zeros). CPUs often use double-precision data types for arithmetic operations.
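Python integers grow as needed, so overflow has to be simulated. The sketch below assumes an unsigned 8-bit format by default and uses a hypothetical helper, `add_unsigned`, that reports both the bits that fit and whether the true result was too big.

```python
def add_unsigned(a: int, b: int, width: int = 8):
    """Add two unsigned values in a fixed-width format.

    Returns (stored_result, overflow_flag).
    """
    limit = 1 << width          # number of distinct values, e.g. 256
    total = a + b
    overflow = total >= limit   # the true sum does not fit in `width` bits
    return total % limit, overflow

print(add_unsigned(100, 27))   # prints (127, False)
print(add_unsigned(200, 100))  # prints (44, True)

# Small values are padded with leading zeros to fill the fixed width.
print(format(5, "08b"))        # prints 00000101
```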

Representing real numbers: A more complicated problem than storing integers. Real numbers contain whole and fractional components. How can both parts be represented together in one format?

Fixed format for real numbers

Floating point notation: Any real number can be rewritten using floating point (scientific) notation. 12.555 becomes 1.2555 × 10¹; the format stores 12555 (mantissa), 1 (exponent), and sign (+). -143.99 becomes -1.4399 × 10²; the format stores 14399 (mantissa), 2 (exponent), and sign (-).
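The decimal examples above can be reproduced with a short sketch. It uses Python's standard `decimal` module to split a finite number into sign, mantissa digits, and a scientific-notation exponent; the function name `decompose` is hypothetical, and this base-10 view is only an analogy for what a binary format stores.

```python
from decimal import Decimal

def decompose(text: str):
    """Split a finite decimal number into (sign, mantissa digits, exponent)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    mantissa = "".join(str(d) for d in digits)
    # Exponent for the form d.ddd x 10^n (one digit before the radix point).
    sci_exponent = len(digits) - 1 + exponent
    return ("-" if sign else "+"), mantissa, sci_exponent

print(decompose("12.555"))   # prints ('+', '12555', 1)
print(decompose("-143.99"))  # prints ('-', '14399', 2)
```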

IEEE floating point format for real numbers

Floating point range: The number of bits in a floating point format limits the range of the exponent and mantissa. Overflow (too large a number) always occurs in the exponent. Underflow (a number too close to zero, i.e. the negative exponent does not fit) also occurs in the exponent.

Range for mantissa: The number of bits for the mantissa limits the number of significant digits stored for a real number. 23 bits allow for approximately 7 significant decimal digits of precision. The mantissa is stored using truncation (information that does not fit is discarded). This does not raise an overflow condition.
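To see the limited mantissa in practice, the sketch below packs a value into the 32-bit IEEE single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits) using Python's standard `struct` module. The helper names `float32_fields` and `to_float32` are made up for this example.

```python
import struct

def float32_fields(x: float):
    """Return (sign, biased exponent, 23-bit fraction) of x as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF
    fraction = raw & 0x7FFFFF
    return sign, exponent, fraction

def to_float32(x: float) -> float:
    """Round-trip x through 32-bit storage to expose lost digits."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(float32_fields(1.0))     # prints (0, 127, 0)
print(float32_fields(-2.0))    # prints (1, 128, 0)

# 16,777,217 needs more significant bits than the mantissa holds,
# so the stored value silently loses the last digit.
print(to_float32(16777217.0))  # prints 16777216.0
```

No error is raised when digits are lost; the value simply comes back slightly different, which matches the point that a too-long mantissa does not cause an overflow condition.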

Processing complexity: As a general rule, floating point operations (+, -, *, etc.) take the CPU about twice as long as integer (binary) operations. Floating Point Operations Per Second (FLOPS) is a measure of processor speed.

Character data: Alphabetic letters (upper and lower case), numerals, punctuation marks, and special symbols are called characters. A variable of type character contains only one symbol. A sequence of symbols forming words, sentences, etc. is called a string.

How computers store characters: Character data cannot be directly processed by a computer. It must be translated into a number. Characters are converted into numbers using a table of correspondences between a character and a bit string.

Design issues for character coding schemes: The table must be publicly available and all users must use the same table. A coding scheme is a tradeoff among compactness, ease of manipulation, accuracy, range, and standardization.

Examples of character coding schemes: BCD and EBCDIC: older IBM mainframe computers. ASCII: PCs. Unicode: larger format allows for expanded and international alphabets (Java and internet applications).

ASCII coding scheme: The 7-bit format leaves one bit of each byte free for a parity bit (used to check for errors over transmission lines). Has unique codes for all uppercase and lowercase letters, numerals, and other printable characters. Also includes codes for device control.
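A minimal sketch of how a parity bit can ride along with a 7-bit ASCII code, assuming even parity and assuming the parity bit occupies the high-order position of the byte. Both choices, and the helper names `with_even_parity` and `parity_ok`, are assumptions for this example.

```python
def with_even_parity(ch: str) -> int:
    """Return the 8-bit value: parity bit (high order) + 7-bit ASCII code."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")
    parity = ones % 2          # 1 only if the count of 1 bits is odd
    return (parity << 7) | code

def parity_ok(byte: int) -> bool:
    """A received byte is plausible if its total number of 1 bits is even."""
    return bin(byte).count("1") % 2 == 0

sent = with_even_parity("C")   # 'C' is 1000011: three 1 bits, so parity = 1
print(format(sent, "08b"))     # prints 11000011
print(parity_ok(sent))         # prints True
print(parity_ok(sent ^ 0b1))   # one flipped bit is detected: prints False
```

A single parity bit detects any odd number of flipped bits but cannot say which bit changed.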

Device control: In many applications that handle text, formatting and commands to a device are included in the same stream of data as the text. Examples in applications: word processors (reveal codes), HTML tags. Examples of control codes: CR (carriage return), tab, form feed.

Limitations to ASCII: Not robust enough to represent multiple languages and symbols. The 7-bit format allows for 128 unique codes, while some languages have thousands of symbols. A 16-bit Unicode encoding has 65,536 entries.
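The size limits can be checked directly. Python's built-in `ord` returns a character's Unicode code point; the helper `fits_in_bits` is a hypothetical name used only here.

```python
def fits_in_bits(ch: str, bits: int) -> bool:
    """True if the character's code point is below 2**bits."""
    return ord(ch) < 2 ** bits

print(2 ** 7, 2 ** 16)         # prints 128 65536

print(ord("A"), fits_in_bits("A", 7))     # prints 65 True
print(ord("é"), fits_in_bits("é", 7))     # prints 233 False
print(ord("中"), fits_in_bits("中", 16))   # prints 20013 True
```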

Boolean data: Data type has two values, true and false. Can be stored with one bit. Many CPU operations (comparisons) generate a Boolean value that is stored in a register.

Memory addresses: Primary storage is a series of contiguous bytes. The CPU must be able to access sections of memory directly. Sections of memory are accessed by their address (location).

Formats for memory addresses: Flat memory model: memory starts at address 0 and goes to maximum capacity - 1; simple integers are used to store an address. Segmented memory model: memory is divided into equal-sized segments called pages; an address has two parts, e.g. 00FA:0034, a number for the page and a location within the page.
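The two-part address can be sketched as a pair of hexadecimal integers. The slides do not state a page size, so the 64 KiB value below is purely an assumption for illustration, as are the helper names `parse_segmented` and `to_flat`. Real architectures combine the two parts in their own specific ways.

```python
ASSUMED_PAGE_SIZE = 0x10000  # hypothetical: 65,536 bytes per page

def parse_segmented(address: str):
    """Split 'PPPP:OOOO' into (page number, location within page)."""
    page_text, offset_text = address.split(":")
    return int(page_text, 16), int(offset_text, 16)

def to_flat(page: int, offset: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Map a (page, offset) pair onto a flat address starting at 0."""
    if not 0 <= offset < page_size:
        raise ValueError("offset lies outside the page")
    return page * page_size + offset

page, offset = parse_segmented("00FA:0034")
print(page, offset)            # prints 250 52
print(to_flat(page, offset))   # prints 16384052
```

With equal-sized pages, the flat address is just the page number scaled by the page size plus the location within the page.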

Data structures: These five primitive types are quite limited for representing real-world data such as words, sentences, dates, and database tables. More complex data structures are constructed from these five primitive types.

Chapter summary To be processed by any device, data must be converted from its native format into a form suitable for the processing device. All data, including nonnumeric data, are represented within a modern computer system as strings of binary digits, or bits. Each bit string has a specific data format and coding method.

Summary (cont.): Numeric data is stored using integer, real number, and floating point formats. Characters are converted to numbers by means of a coding table. Boolean data can have only two values, true and false. Programs often need to define and manipulate data in larger and more complex units than primitive CPU data types.
