Data Encoding COSC 1301. Computers and Data Computers store information as sequences of bits Computers store many types of data: numbers text audio images.


1 Data Encoding COSC 1301

2 Computers and Data Computers store information as sequences of bits. Computers store many types of data: numbers, text, audio, images, video.

3 Standards Look around: how many items do you see that are based on a standard? Standards make our lives simpler and more efficient. Sometimes there aren't any.

4 Not Much of a Standard

5 A Small Number of Standards

6

7

8 Bitten by Lack of a Single Standard

9

10 Wishing for Standards http://www.sheldonbrown.com/tire-sizing.html

11 A General Trend Toward Standards Word sizes of early computers (machine: word size, year): EDVAC: 44 bits, 1947; MARK 1: 40 bits, 1948; EDSAC: 17 bits, 1949; CSIRAC: 20 bits, 1949; UNIVAC I: 12 digits, 1951; IBM 701: 36 bits, 1952; CDC 1604: 48 bits, 1959; CDC 6600: 60 bits, 1964; IBM 360: 32 bits, 1965; x-86: 16 bits, 1978; x-32: 32 bits, 1986; x-64: 64 bits, 2004.

12 Standard: Integer Representation Representing integers in base 2: 93 = 01011101
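The base-2 representation on this slide can be checked with a short sketch. The helper name `to_binary` and the 8-bit default width are assumptions made for illustration; they are not part of the slides.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Return the base-2 digits of a non-negative integer, zero-padded to `width` bits."""
    if n < 0 or n >= 2 ** width:
        raise ValueError("value does not fit in the requested width")
    return format(n, f"0{width}b")


print(to_binary(93))       # 01011101
print(int("01011101", 2))  # 93 (64 + 16 + 8 + 4 + 1)
```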

13 Integers Representing integers in base 2: 93 = 01011101. But what about -93? One option is a sign bit: 11011101 (the leading 1 marks the number as negative).

14 Integers But what about -93? With a sign bit: 11011101. Problem: two representations of zero (positive zero and negative zero). Unnecessary complexity. Better representations make it easier for the computer.
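A minimal sketch of the sign-bit scheme, assuming an 8-bit width with the leftmost bit as the sign. The function names are hypothetical. It shows that two different bit patterns both decode to zero, which is the problem this slide points out.

```python
def sign_magnitude(n: int, width: int = 8) -> str:
    """Encode n as one sign bit followed by (width - 1) magnitude bits."""
    if abs(n) >= 2 ** (width - 1):
        raise ValueError("magnitude does not fit in the requested width")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{width - 1}b")


def decode_sign_magnitude(bits: str) -> int:
    """Decode a sign-bit pattern back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(sign_magnitude(93))   # 01011101
print(sign_magnitude(-93))  # 11011101
# Positive zero and negative zero are distinct patterns with the same value:
print(decode_sign_magnitude("00000000"), decode_sign_magnitude("10000000"))  # 0 0
```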

15 Two's Complement: Negative Integers 93 = 01011101. Flip the bits: 10100010. Then add 1: 10100011 = -93. A good explanation of why it works: http://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html
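The flip-then-add-1 recipe can be sketched as follows, assuming 8-bit values held as strings of '0' and '1'. The helper name `negate_bits` is hypothetical.

```python
def negate_bits(bits: str) -> str:
    """Two's-complement negation: flip every bit, then add 1 (wrapping at the width)."""
    width = len(bits)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    result = (int(flipped, 2) + 1) % (2 ** width)
    return format(result, f"0{width}b")


print(negate_bits("01011101"))  # 10100011, the pattern for -93
print(negate_bits("10100011"))  # 01011101, back to 93
print(negate_bits("00000000"))  # 00000000, so there is only one zero

# Adding 93 and -93 as plain unsigned 8-bit values wraps around to 0.
print((0b01011101 + 0b10100011) % 256)  # 0
```

Because negation is just ordinary bit operations and addition, the same adder circuit handles positive and negative numbers, which is one reason this representation is easier for the computer than a sign bit.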

16 A Problem What should we do about: 104.23? If we always want two places after the decimal point, then we could write 10423 and always treat it as though the decimal point were there.
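This "pretend the decimal point is there" idea is usually called fixed-point representation. A small sketch, assuming non-negative values and exactly two places after the decimal point; `SCALE`, `to_fixed`, and `from_fixed` are illustrative names only.

```python
from decimal import Decimal

SCALE = 100  # two places after the decimal point (assumption from the slide)


def to_fixed(text: str) -> int:
    """Store a decimal string such as '104.23' as the plain integer 10423."""
    return int(Decimal(text) * SCALE)


def from_fixed(value: int) -> str:
    """Put the implied decimal point back (non-negative values only)."""
    return f"{value // SCALE}.{value % SCALE:02d}"


print(to_fixed("104.23"))   # 10423
print(from_fixed(10423))    # 104.23
# Arithmetic happens on ordinary integers:
print(from_fixed(to_fixed("104.23") + to_fixed("0.77")))  # 105.00
```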

17 Floating Point Numbers Floating point representation: exponential/scientific notation. Example: 123.45 can be represented as a decimal floating-point number with the integer 12345 as the significand and -2 as the exponent (and 10 as the base). Its value is given by the following: 123.45 = 12345 × 10^-2. See the following slide to see how a computer stores this.
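The significand/exponent split can be tried out with Python's `decimal` module, which happens to store decimal numbers in this form. The helper `decimal_float` is a hypothetical name used only for this sketch.

```python
from decimal import Decimal


def decimal_float(significand: int, exponent: int) -> Decimal:
    """Return significand x 10**exponent as an exact decimal value."""
    return Decimal(significand).scaleb(exponent)


print(decimal_float(12345, -2))  # 123.45

# Going the other way: Decimal exposes the digits and the base-10 exponent.
parts = Decimal("123.45").as_tuple()
print(parts.digits, parts.exponent)  # (1, 2, 3, 4, 5) -2
```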

18 IEEE Standard - Floating Point Single Format: 32 bits (4 bytes) to store a floating point number: 1 bit for the sign, 8 bits for the exponent, 23 bits for the mantissa or significand. Double Format: 64 bits (8 bytes) to store a floating point number: 1 bit for the sign, 11 bits for the exponent, 52 bits for the mantissa or significand.
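To see the three fields of the single format, one option is to pack a number into 4 bytes with the standard `struct` module and slice the 32 bits. This is a sketch; `float32_fields` is a hypothetical helper name. Note that the computer uses base 2 rather than base 10, and the stored exponent carries an offset of 127.

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split the 32-bit single-format pattern of x into (sign, exponent, mantissa) bit strings."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # 4 bytes viewed as an unsigned int
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


print(float32_fields(1.0))      # ('0', '01111111', '00000000000000000000000')
print(float32_fields(-2.0))     # ('1', '10000000', '00000000000000000000000')
print(float32_fields(0.15625))  # ('0', '01111100', '01000000000000000000000')
```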

19 Text To represent text digitally, need to be able to represent every possible character that may appear: Computers have revolutionized our world. コンピュータは私たちの世界に革命をもたらしました。 Les ordinateurs ont révolutionné notre monde.

20 Text Decide how many characters we need to represent. Then: determine the required number of bits. English: 26 letters, 52 for upper and lower case. Plus punctuation... And other languages? Character set: a list of characters and the codes used to represent each. Several character sets have been used over the years; a standard makes processing text easier.
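The "required number of bits" step is a small calculation: n bits give 2^n distinct codes, so we need the smallest n with 2^n at least the number of characters. A sketch, with `bits_needed` as a hypothetical helper name:

```python
def bits_needed(num_characters: int) -> int:
    """Smallest number of bits that can give each character its own code."""
    if num_characters < 1:
        raise ValueError("need at least one character")
    return (num_characters - 1).bit_length()


print(bits_needed(26))   # 5  (2**5 = 32 codes)
print(bits_needed(52))   # 6  (2**6 = 64 codes)
print(bits_needed(128))  # 7
print(bits_needed(256))  # 8
```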

21 ASCII ASCII: American Standard Code for Information Interchange. 1963: 7 bits per character = 128 different symbols. Thought to be enough at the time. 8th bit in each character byte: used as a check bit or parity bit, to check for errors in transmission of data. Later: Latin-1 Extended ASCII character set. All 8 bits used to represent a character. Represents 256 characters, including accented characters and other special characters.
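A parity bit can be sketched as below. The slide does not say which convention was used, so even parity (the total number of 1 bits in the byte is even) is an assumption here, and both function names are hypothetical.

```python
def add_even_parity(code: int) -> int:
    """Place an even-parity bit in the 8th (top) bit of a 7-bit ASCII code."""
    ones = bin(code).count("1")
    return code | ((ones % 2) << 7)


def parity_ok(byte: int) -> bool:
    """True if the byte has an even number of 1 bits, i.e. no single-bit error detected."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011: three 1 bits, so the parity bit is set
print(format(sent, "08b"))         # 11000011
print(parity_ok(sent))             # True
print(parity_ok(sent ^ 0b00000100))  # False: one bit was flipped in transmission
```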

22 ASCII http://www.krisl.net/cgi-bin/ascbin.pl

23 Representing Text Fourscore and seven … F = 01000110, o = 01101111, u = 01110101, r = 01110010

24 Representing Text "The number is 17." in hexadecimal, one byte per character (20 is the space): 54 68 65 20 6E 75 6D 62 65 72 20 69 73 20 31 37 2E
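Both text examples can be reproduced with a few lines. This is only a sketch; `to_bits` and `to_hex` are hypothetical helper names, and the text is assumed to be plain ASCII.

```python
def to_bits(text: str) -> str:
    """ASCII codes of the text as 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in text.encode("ascii"))


def to_hex(text: str) -> str:
    """ASCII codes of the text as two-digit hexadecimal numbers."""
    return " ".join(format(b, "02X") for b in text.encode("ascii"))


print(to_bits("Four"))
# 01000110 01101111 01110101 01110010
print(to_hex("The number is 17."))
# 54 68 65 20 6E 75 6D 62 65 72 20 69 73 20 31 37 2E
```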

25 Computing with Text Computers have revolutionized our world. They have changed the course of our daily lives, the way we do science, the way we entertain ourselves, the way that business is conducted, and the way we protect our security. Suppose we want to capitalize this entire paragraph: Let’s go back and look at the ASCII table to see how to do that.
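In the ASCII table, each lowercase letter's code is exactly 32 (hex 20) larger than the code of the matching capital: 'a' is 97 and 'A' is 65. So one way to capitalize is to subtract 32 from every lowercase code and leave everything else alone. A sketch, with `capitalize_ascii` as a hypothetical name:

```python
def capitalize_ascii(text: str) -> str:
    """Uppercase ASCII letters by subtracting 32 from each lowercase character code."""
    out = []
    for ch in text:
        code = ord(ch)
        if ord("a") <= code <= ord("z"):
            code -= 32  # 'a' (97) becomes 'A' (65), and so on
        out.append(chr(code))
    return "".join(out)


print(capitalize_ascii("Computers have revolutionized our world."))
# COMPUTERS HAVE REVOLUTIONIZED OUR WORLD.
```

Spaces, punctuation, and letters that are already capitals fall outside the 'a' to 'z' range, so they pass through unchanged.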

26 When We Need More Characters What about things like: 简体字

27 When We Need More Characters What about things like: 简体字 Answer: Unicode. A conversion applet: http://www.pinyin.info/tools/converter/chars2uninumbers.html

28 Unicode Previously, a letter mapped directly to some bits: A encoded as 0100 0001. In Unicode, a letter maps to a code point, a number like U+0639. "U+" means Unicode; the digits that follow are hexadecimal. Every character has a Unicode code point. This doesn't indicate how the code point is encoded as a sequence of bits, though. U+0041: English letter A. U+0639: Arabic letter Ain.

29 Unicode Example: Hello. 5 code points, one code point (i.e., number) per letter: U+0048 U+0065 U+006C U+006C U+006F. How is this stored in memory? There are different standards for this. One standard: UTF-8, a standard system for storing strings of Unicode code points in binary (i.e., U+DDDD stored in some number of bytes).
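Code points can be listed with Python's built-in `ord`, which returns the code point of a character. The helper name `code_points` is an assumption for this sketch.

```python
def code_points(text: str) -> list:
    """Return the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points("Hello"))
# ['U+0048', 'U+0065', 'U+006C', 'U+006C', 'U+006F']
print(code_points("A"))       # ['U+0041']
print(code_points("\u0639"))  # ['U+0639'], the Arabic letter Ain
```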

30 UTF-8 Code points 0-127 are stored in one byte, so English text looks the same in UTF-8 as in ASCII (backwards compatible). Code points 128 and higher: 2, 3, or 4 bytes (the original design allowed up to 6, but the current standard stops at 4). Hello: U+0048 U+0065 U+006C U+006C U+006F. Stored as: 48 65 6C 6C 6F (same as ASCII). For Hebrew characters, accented letters, etc., you may need more bytes.
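The byte counts can be checked with Python's built-in UTF-8 encoder. This is a sketch; `utf8_hex` is a hypothetical helper, and the sample characters are chosen only for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of the text as two-digit hexadecimal numbers."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


print(utf8_hex("Hello"))   # 48 65 6C 6C 6F  (one byte each, same as ASCII)
print(utf8_hex("\u00e9"))  # C3 A9           (accented e, U+00E9: two bytes)
print(utf8_hex("\u0639"))  # D8 B9           (Arabic Ain, U+0639: two bytes)
print(utf8_hex("\u7b80"))  # E7 AE 80        (U+7B80, the first character of 简体字: three bytes)
```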

