1
Chapter 3 Data Representation Text Characters
2
Representing Text To represent a text document in digital form, we need to be able to represent every possible character that may appear. There is a finite number of characters to represent, so the general approach is to list them all and assign each a binary string.
3
Representing Text A character set is a list of characters and the codes used to represent each one. In 1960, a survey revealed 60 different character sets in use. At IBM alone there were 9 different sets. By agreeing to use ONE particular character set, computer manufacturers have made the processing of text data easier.
4
The ASCII Character Set
ASCII stands for American Standard Code for Information Interchange. The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters. Wikipedia has an excellent entry on ASCII.
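As a minimal sketch of what "seven bits per character" means in practice, the snippet below converts a short string into its 7-bit codes. The function name `to_ascii_bits` is made up for this illustration; only the built-ins `ord` and `format` are standard Python.

```python
def to_ascii_bits(text):
    """Return the 7-bit ASCII code of each character as a binary string."""
    codes = []
    for ch in text:
        value = ord(ch)  # numeric code of the character
        if value > 127:
            # Only 128 codes (0..127) fit in seven bits.
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        codes.append(format(value, "07b"))  # zero-padded to 7 binary digits
    return codes


print(to_ascii_bits("Hi!"))
```

Each string in the result is exactly seven digits long, one code per character.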
5
The ASCII Character Set (7 bit)
6
The ASCII Character Set
Notice the organisation of the ASCII table. The table divides in half according to the MSB. Letters are all in the second half so all codes for alphabetic characters start with 1. This second half of the table divides in half again according to the next bit: UPPERCASE letters start 10. lowercase letters start 11. The first half of the table also divides in half according to the next bit: Control characters start 00. Numerals and punctuation start 01.
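The four quarters of the table can be told apart by looking only at the top two bits of the 7-bit code. The sketch below does that; `ascii_quarter` and its labels are illustrative names, not part of any library.

```python
def ascii_quarter(ch):
    """Classify a 7-bit ASCII character by the top two bits of its code."""
    value = ord(ch)
    if value > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    top_two = value >> 5  # keep bits 6 and 5, drop the lower five bits
    labels = {
        0b00: "control characters",
        0b01: "numerals and punctuation",
        # A few symbols such as '@' and '[' share the letter prefixes.
        0b10: "uppercase letters",
        0b11: "lowercase letters",
    }
    return labels[top_two]


for ch in ["\n", "7", "A", "z"]:
    print(repr(ch), format(ord(ch), "07b"), ascii_quarter(ch))
```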
7
The ASCII Character Set
Note that control characters (the first 32 in the ASCII character set) do not have simple character representations that you could print to the screen. Many, however, perform actions with which you are familiar. Some have their own keys; others need to be constructed.
8
The ASCII Character Set Control Characters
Control sequences are created by holding the Ctrl key (control) and pressing a letter. This has the effect of subtracting 64 from the ASCII value of the letter pressed. For example: ‘M’ has ASCII value 77 (1001101 in binary), Ctrl-M has ASCII value 13 (0001101 in binary). Alternatively, we can see this as “masking bit 6” (the bit worth 64).
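The Ctrl rule can be checked with a few lines of code. This is only a sketch of the arithmetic: the helper name `ctrl` is hypothetical, and real keyboards and terminals involve more than this single bit operation.

```python
def ctrl(key):
    """Return the ASCII value produced by Ctrl plus the given key."""
    code = ord(key.upper())  # the rule is stated for the uppercase value
    if not 64 <= code <= 95:
        raise ValueError(f"no control character is defined for {key!r}")
    # Clearing bit 6 (worth 64) is the same as subtracting 64 here.
    return code & ~0b1000000


print(ctrl("M"))  # carriage return
print(ctrl("G"))  # bell
print(ctrl("["))  # escape
```

For every key in the accepted range, `ctrl(key)` equals `ord(key.upper()) - 64`, so the subtraction view and the masking view agree.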
9
The ASCII Character Set Common Control Characters
Hex  Binary   Decimal  Name  Function
00   0000000  0        NUL   Null
07   0000111  7        BEL   Bell
08   0001000  8        BS    Backspace
09   0001001  9        HT    Horizontal Tab
0A   0001010  10       LF    Line Feed
0B   0001011  11       VT    Vertical Tab
0C   0001100  12       FF    Form Feed
0D   0001101  13       CR    Carriage Return
0E   0001110  14       SO    Shift Out
0F   0001111  15       SI    Shift In
1B   0011011  27       ESC   Escape
10
The ASCII Character Set
Coding letters in ASCII is easy. Let’s look at ‘j’ as an example: Since ‘j’ is a letter, its code starts with a 1. Since it’s lowercase, the next bit is also a 1. Since it’s the tenth letter of the alphabet, the rest of the code is 01010 (ten in binary). The complete ASCII code for ‘j’ is 1101010.
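The same recipe works for any letter: a two-bit prefix for the case, then the letter's position in the alphabet as five bits. A minimal sketch follows, with `letter_code` as an illustrative name.

```python
def letter_code(ch):
    """Build the 7-bit ASCII code of a letter from prefix and position."""
    if not (len(ch) == 1 and ch.isascii() and ch.isalpha()):
        raise ValueError(f"{ch!r} is not a single ASCII letter")
    prefix = "11" if ch.islower() else "10"    # lowercase 11, uppercase 10
    position = ord(ch.lower()) - ord("a") + 1  # 'a' is 1, ..., 'z' is 26
    return prefix + format(position, "05b")    # position as five bits


code = letter_code("j")
print(code, int(code, 2), ord("j"))
```

Converting the constructed string back to a number gives the same value as `ord`, which confirms the construction.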
11
The ASCII Character Set
ASCII evolved so that eight bits are used. The 7-bit codes were simply prefixed with another bit, giving another natural doubling. The original 7-bit codes were padded with 0. So the code for ‘j’ became 01101010. 128 new characters were added. The codes for this alternate character set start with 1.
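To see the extra leading bit, the sketch below prints 8-bit codes. The slides do not say which extended set they have in mind, so the choice of Latin-1 (ISO 8859-1) here is an assumption; `eight_bit_code` is an illustrative name.

```python
def eight_bit_code(ch, encoding="latin-1"):
    """Return the 8-bit code of a character in a single-byte encoding."""
    # Assumption: Latin-1 stands in for "extended ASCII" in this sketch.
    data = ch.encode(encoding)  # raises UnicodeEncodeError if not representable
    if len(data) != 1:
        raise ValueError(f"{encoding} does not encode {ch!r} as one byte")
    return format(data[0], "08b")


print(eight_bit_code("j"))       # original ASCII character, leading 0
print(eight_bit_code("\u00e9"))  # e with acute accent, leading 1
```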
12
The Unicode Character Set
Even the extended version of the ASCII character set is not enough for international use. The Unicode character set was originally designed to use 16 bits per character, which can represent 2^16, or over 65 thousand, characters. (Unicode has since grown beyond 16 bits.) Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set (specifically ISO 8859-1, also called Latin-1).
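The superset claim can be checked directly in a language whose strings are Unicode. The sketch below takes ISO 8859-1 (Latin-1) as the extended ASCII set in question; `code_point` is an illustrative helper, not a library function.

```python
def code_point(ch):
    """Format a character's Unicode code point in the usual U+XXXX style."""
    return "U+" + format(ord(ch), "04X")


# Every Latin-1 byte value decodes to the Unicode character with the same number.
latin1_matches = all(bytes([v]).decode("latin-1") == chr(v) for v in range(256))
# The first 128 of those are plain 7-bit ASCII.
ascii_matches = all(bytes([v]).decode("ascii") == chr(v) for v in range(128))

print(code_point("j"), code_point("\u20ac"), latin1_matches, ascii_matches)
```

The code point of ‘j’ is the same number as its ASCII code, while a character such as the euro sign needs a value well above 255.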
13
Examples of Unicode Characters
Figure 3.6 A few characters in the Unicode character set