Media: Text “Words and symbols in any form, spoken or written, are the most common system of communication.” ~ unknown.

1 Media: Text “Words and symbols in any form, spoken or written, are the most common system of communication.” ~ unknown

2 Text - Representation
● ASCII
  – 7-bit code
  – 128 values in the ASCII character set (English alphabet)
  – use of the 8th bit by text editors and word processors creates incompatibility
● ISO character sets
  – extend ASCII to support non-English text (symbols such as ¢ or œ)
  – ISO Latin provides support for accented characters (à, ö, ø, etc.)
  – ISO sets include Chinese, Japanese, Korean & Arabic
● UNICODE
  – 16-bit format (e.g. Roman or Western European text as well as Japanese Kanji)
  – 65,536 possible values (about 65,000 different symbols)
  – 25 supported scripts of Version 2.0 Unicode Standard: Arabic, Armenian, Bengali, Bopomofo, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Latin, Lao, Malayalam, Oriya, Phonetic, Tamil, Telugu, Thai, Tibetan
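The difference between the three representations can be sketched in a few lines of Python. This is only an illustration: the sample characters and the helper name `encoded_bytes` are assumptions, not part of the slides, and `utf-16-be` stands in for the 16-bit Unicode format described above.

```python
# Illustrative sketch: how the same characters look in the encodings
# named on the slide. Sample characters and helper name are assumptions.

def encoded_bytes(ch: str, encoding: str):
    """Return the bytes for ch in the given encoding, or None if unsupported."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

samples = ["A", "ö", "日"]
for ch in samples:
    print(
        repr(ch),
        "code point:", ord(ch),
        "| ascii:", encoded_bytes(ch, "ascii"),
        "| latin-1:", encoded_bytes(ch, "latin-1"),
        "| utf-16-be:", encoded_bytes(ch, "utf-16-be"),
    )

# Every ASCII code fits in 7 bits, so the 8th bit of each byte is 0.
ascii_fits_in_7_bits = all(b < 128 for b in "Hello, world!".encode("ascii"))
print("ASCII fits in 7 bits:", ascii_fits_in_7_bits)
```

Here "A" is representable in all three encodings, "ö" needs at least ISO Latin-1, and "日" needs Unicode.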

3 ASCII
● All uppercase and lowercase letters
● Punctuation symbols like ! . , ? : ; " ' etc.
● Digits 0, …, 9
● Arithmetic symbols + = - /
● Assorted special symbols like # @ $ % ^ & * ( ) { } [ ] etc.
● Invisible formatting characters
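As a rough check of these categories, the 128 ASCII codes can be grouped with Python's standard `string` module. The grouping below is a sketch: it lumps punctuation, arithmetic and special symbols together (as `string.punctuation` does) and treats codes 0-31 and 127 as the invisible formatting characters.

```python
import string

# Group the 128 ASCII codes into the categories listed on the slide.
all_ascii = [chr(code) for code in range(128)]

letters = [c for c in all_ascii if c in string.ascii_letters]
digits = [c for c in all_ascii if c in string.digits]
# string.punctuation covers punctuation, arithmetic and special symbols.
symbols = [c for c in all_ascii if c in string.punctuation]
# Control codes (0-31 and 127) are the "invisible formatting characters",
# e.g. tab (9), line feed (10) and carriage return (13).
control = [c for c in all_ascii if ord(c) < 32 or ord(c) == 127]

print("letters:", len(letters))
print("digits:", len(digits))
print("symbols:", len(symbols))
print("control:", len(control))
# The one remaining code is the space character (32).
```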

4 ASCII

5 Text - Representation
● Marked-up text
  – nroff, troff
  – LaTeX
  – SGML
    ● HTML
    ● HyTime
    ● XML, XSL, XLL
● Structured text
  – structure of text represented in a data structure, usually tree-based
  – ODA: structure embedded in byte-stream with content
● Hypertext
  – non-linear
  – graph or “web” structure: nodes and links
  – currently subject of intensive ISO standards activity
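To make the link between marked-up text and a tree-based structure concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The XML fragment and its element names are invented for illustration, and `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up marked-up fragment; element names are illustrative only.
source = """
<slide number="2">
  <title>Text - Representation</title>
  <item>ASCII</item>
  <item>ISO character sets</item>
  <item>UNICODE</item>
</slide>
"""

# Parsing turns the markup into a tree of elements.
root = ET.fromstring(source)

def outline(element, depth=0):
    """Return an indented outline of the element tree, one line per node."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))
```

The markup is linear in the byte stream, but once parsed each element becomes a node with children, which is the tree structure the slide refers to.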

6 Text - Operations
● Character operations
  – basic data type with assigned value
  – permits direct character comparison (a < b)
● String operations
  – comparison
  – concatenation
  – substring extraction and manipulation
● Editing
  – perhaps the most familiar set of operations on text
  – cut/copy/paste
  – strings vs. blocks, dependent on document structure
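These operations map directly onto Python's built-in string type. The following is a minimal sketch; the helper names `cut` and `paste` and the sample strings are assumptions made for the example.

```python
# Character operations: each character has an assigned numeric value,
# so characters can be compared directly.
a, b = "a", "b"
codes = (ord(a), ord(b))        # (97, 98)
char_less = a < b               # True, because 97 < 98

# String operations: comparison, concatenation, substring extraction.
s1, s2 = "multi", "media"
joined = s1 + s2                # "multimedia"
sub = joined[5:]                # "media"
ordered = sorted(["text", "Text", "image"])  # uppercase sorts first

# Editing: cut and paste expressed as slicing (hypothetical helpers).
def cut(text, start, end):
    """Remove text[start:end]; return (remaining text, clipboard)."""
    return text[:start] + text[end:], text[start:end]

def paste(text, position, clipboard):
    """Insert clipboard into text at the given position."""
    return text[:position] + clipboard + text[position:]

document = "Words and symbols in any form"
rest, clipboard = cut(document, 10, 18)
moved = paste(rest, 0, clipboard)
print(rest)
print(moved)
```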

7 Text - Operations
● Formatting
  – interactive or non-interactive (WYSIWYG vs. LaTeX)
  – formatted output
    ● bitmap
    ● page description language (PostScript, PDF)
  – font management
    ● typeface
    ● point size (1 point = 1/72 of an inch)
    ● TrueType fonts: geometric description + kerning
● Pattern-matching and searching
  – search and replace
  – wildcards
  – regular expressions
  – for large bodies of text, or text databases, use of inverted indices, hashing techniques and clustering
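The searching techniques listed above can be sketched as follows: a regular-expression search and replace on a single string, and a toy inverted index for a larger collection. The three sample documents and the function names are assumptions made for the example; a real text database would add stemming, ranking and persistent storage.

```python
import re
from collections import defaultdict

# Tiny made-up collection, keyed by document id.
documents = {
    1: "ASCII is a 7-bit code.",
    2: "Unicode is a 16-bit format.",
    3: "Huffman codes vary the number of bits per character.",
}

# Regular expressions: extract every "<n>-bit" width, then search and replace.
bit_widths = re.findall(r"(\d+)-bit", " ".join(documents.values()))
replaced = re.sub(r"\bcode\b", "encoding", documents[1])

def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return sorted ids of documents that contain every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []

index = build_inverted_index(documents)
print(bit_widths)
print(replaced)
print(search(index, "bit"))
print(search(index, "bit", "code"))
```

The index is built once; each query then becomes a set lookup and intersection instead of a scan over every document, which is why inverted indices suit large bodies of text.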

8 Text - Operations
● Sorting
  – numerous varieties of sort, all of them extensively studied in basic programming
  – sort complexity is a major factor in data handling performance
● Compression
  – ASCII uses 7 bits per character, though most word processors also use the 8th bit and so store a full byte per character
  – information theory estimates 1-2 bits per character to be sufficient for natural language text
  – this redundancy can be removed by encoding:
    ● Huffman: varies the number of bits used to represent characters, with the shortest codes for the highest-frequency characters
    ● Lempel-Ziv: identifies repeating strings and replaces them by pointers to a table
    ● both techniques compress English text at a ratio of between 2:1 and 3:1
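The Huffman idea can be illustrated with a short sketch that builds a variable-length prefix code from character frequencies. The function names and the sample sentence are assumptions; the code works on bit strings for readability rather than packing real bits, and Lempel-Ziv is not shown.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix code mapping each character to a string of '0'/'1'."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: code so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    reverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:      # prefix-free, so the first match is right
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "this is an example of a huffman tree"
code = huffman_code(text)
bits = encode(text, code)
restored = decode(bits, code)
print(len(bits), "bits versus", 8 * len(text), "bits at one byte per character")
```

Frequent characters such as the space receive the shortest codes, which is where the saving over a fixed 8 bits per character comes from.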

9 Text - Operations
● Encryption
  – text encryption is widely used in electronic mail and networked information systems
  – most widely used techniques:
    ● DES
    ● RSA public-key
    ● PGP
  – subject of major controversy:
    ● key escrow systems
    ● Clipper chip
    ● “strong” encryption now being outlawed in a number of countries
● Language-specific operations
  – spell-checking
  – parsing and grammar checking
  – style analysis
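The public-key idea behind RSA can be shown with a toy sketch. Everything here is an assumption chosen for illustration: the primes are tiny, each character is encrypted separately, and there is no padding, so this is in no way secure and is not how RSA is used in practice. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy "textbook RSA" with tiny primes. Insecure; for illustration only.
p, q = 61, 53
n = p * q                    # public modulus: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (modular inverse of e)

def encrypt(m):
    """Encrypt an integer m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Decrypt an integer c with the private key (d, n)."""
    return pow(c, d, n)

# Encrypt text one character code at a time (an assumption of this sketch).
message = "Hi"
cipher = [encrypt(ord(ch)) for ch in message]
plain = "".join(chr(decrypt(c)) for c in cipher)
print(cipher)
print(plain)
```

Anyone may encrypt with the public pair (e, n), but only the holder of d can decrypt, which is what distinguishes public-key schemes such as RSA from shared-key ciphers such as DES.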

10 About Fonts and Faces
● A typeface is a family of graphic characters (usually including many type sizes & styles)
● A font is a collection of characters of a single size
● Styles include boldface and italic (also underlining & outlining)
● Serif vs. sans serif (“sans” is French for “without”)

