Media: Text “Words and symbols in any form, spoken or written, are the most common system of communication.” ~ unknown.

Presentation transcript:

Text - Representation
● ASCII
  – 7-bit code
  – 128 values in the ASCII character set (English alphabet)
  – use of the 8th bit in text editors/word processors creates incompatibility
● ISO character sets
  – extend ASCII to support non-English text (symbols such as ¢ or œ)
  – ISO Latin provides support for accented characters
    ● à, ö, ø, etc.
  – ISO sets include Chinese, Japanese, Korean & Arabic
● UNICODE
  – 16-bit format (Roman vs. Western European or Kanji – Japan)
  – about 65,000 different symbols (2^16 = 65,536 code values)
  – 25 supported scripts in Version 2.0 of the Unicode Standard: Arabic, Armenian, Bengali, Bopomofo, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Latin, Lao, Malayalam, Oriya, Phonetic, Tamil, Telugu, Thai, Tibetan
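The difference between the three families of character sets can be seen by encoding a few characters. This is a minimal sketch, not part of the original slides: it assumes Python's built-in codecs, with "latin-1" standing in for the ISO Latin sets and "utf-16-be" standing in for the 16-bit Unicode format. The helper name encoded_size is hypothetical.

```python
# Sketch: how many bytes does one character need in each character set?
# Assumptions: "latin-1" represents ISO Latin, "utf-16-be" represents
# the 16-bit Unicode format described on the slide.

def encoded_size(ch, codec):
    """Return the byte count of `ch` in `codec`, or None if it cannot be encoded."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None

for ch in ["A", "¢", "ö", "漢"]:
    sizes = {codec: encoded_size(ch, codec)
             for codec in ("ascii", "latin-1", "utf-16-be")}
    print(f"U+{ord(ch):04X} {ch!r}: {sizes}")
```

A result of None means the character set has no code for that character, which is the incompatibility the slide refers to.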

ASCII
● All uppercase and lowercase letters
● Punctuation symbols like ! . , ? : ; “ ‘ etc.
● Digits 0, …, 9
● Arithmetic symbols + = - /
● Assorted special symbols like $ % ^ & * ( ) { } [ ] etc.
● Invisible formatting characters
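The categories above can be counted across all 128 ASCII codes. This is a rough sketch using Python's standard string module; the function name ascii_class and the category labels are assumptions chosen for illustration, and Python groups punctuation, arithmetic and special symbols together in string.punctuation.

```python
import string

def ascii_class(code):
    """Classify one 7-bit ASCII code (0..127) into a coarse category."""
    ch = chr(code)
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in string.punctuation:
        return "punctuation or symbol"
    if ch == " ":
        return "space"
    return "control (invisible)"

counts = {}
for code in range(128):
    kind = ascii_class(code)
    counts[kind] = counts.get(kind, 0) + 1

for kind, n in counts.items():
    print(f"{kind}: {n}")
```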

ASCII

Text - Representation
● Marked-up text
  – nroff, troff
  – LaTeX
  – SGML
    ● HTML
    ● HyTime
    ● XML, XSL, XLL
● Structured text
  – structure of text represented in a data structure, usually tree-based
  – ODA: structure embedded in the byte-stream with the content
● Hypertext
  – non-linear
  – graph or “web” structure: nodes and links
  – currently the subject of intensive ISO standards activity

Text - Operations
● Character operations
  – basic data type with an assigned value
  – permits direct character comparison (a<b)
● String operations
  – comparison
  – concatenation
  – substring extraction and manipulation
● Editing
  – perhaps the most familiar set of operations on text
  – cut/copy/paste
  – strings v. blocks, dependent on document structure
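The character, string and editing operations listed above might look like this in Python. This is only a sketch: the cut and paste helpers are hypothetical names written for this example, and they treat a document as a single flat string rather than a structured block.

```python
# Character operations: each character has an assigned numeric value,
# so characters can be compared directly.
print(ord("a"), ord("b"))   # the assigned values
print("a" < "b")            # comparison uses those values

# String operations: concatenation and substring extraction.
title = "Text" + " - " + "Operations"
first_word = title[:4]
last_word = title[7:]

# Editing: cut removes a span and returns it as a clipboard value;
# paste inserts a clipboard value at a position.
def cut(text, start, end):
    """Return (text without text[start:end], the removed span)."""
    return text[:start] + text[end:], text[start:end]

def paste(text, pos, clip):
    """Return text with `clip` inserted at index `pos`."""
    return text[:pos] + clip + text[pos:]

doc, clip = cut("cut copy paste", 0, 4)
doc = paste(doc, 5, clip)
print(doc)
```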

Text - Operations
● Formatting
  – interactive or non-interactive (WYSIWYG v. LaTeX)
  – formatted output
    ● bitmap
    ● page description language (PostScript, PDF)
  – font management
    ● typeface
    ● point size (1 point = 1/72 of an inch)
    ● TrueType fonts: geometric description + kerning
● Pattern-matching and searching
  – search and replace
  – wildcards
  – regular expressions
  – for large bodies of text, or text databases: inverted indices, hashing techniques and clustering
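The searching techniques above can be sketched with Python's standard re and fnmatch modules plus a small inverted index. The sample sentences, the docs dictionary and the search helper are all made up for illustration; a real text database would add ranking, stemming and persistent storage.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase

# Search and replace with a regular expression.
text = "Output formats: Postscript and PDF. Postscript is older."
fixed = re.sub(r"\bPostscript\b", "PostScript", text)

# Wildcard matching: "P*" matches any word starting with a capital P.
words = re.findall(r"\w+", fixed)
p_words = [w for w in words if fnmatchcase(w, "P*")]

# Inverted index: map each word to the set of documents containing it,
# so a query never has to scan the full text.
docs = {
    1: "postscript is a page description language",
    2: "pdf is a page description language too",
    3: "truetype fonts use geometric descriptions",
}
index = defaultdict(set)
for doc_id, body in docs.items():
    for word in re.findall(r"[a-z]+", body.lower()):
        index[word].add(doc_id)

def search(*query_words):
    """Return the ids of documents that contain every query word."""
    hits = [index.get(w, set()) for w in query_words]
    return set.intersection(*hits) if hits else set()

print(fixed)
print(p_words)
print(search("page", "language"))
```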

Text - Operations
● Sorting
  – numerous varieties of sort, all of them extensively studied in basic programming
  – sort complexity is a major factor in data handling performance
● Compression
  – ASCII uses 7 bits per character, though most word processors use the 8th bit as well, so each character occupies a full byte
  – information theory estimates 1-2 bits per character to be sufficient for natural language text
  – this redundancy can be removed by encoding:
    ● Huffman: varies the number of bits used to represent characters, with the shortest codes for the highest-frequency characters
    ● Lempel-Ziv: identifies repeating strings and replaces them by pointers to a table
    ● both techniques compress English text at a ratio of between 2:1 and 3:1
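The two compression ideas above can be tried out in a few lines. This is a minimal sketch, not a production codec: huffman_code is a hypothetical helper that builds a code table only (it does not pack bits or store the table), and Python's zlib module is used as a stand-in for Lempel-Ziv because its DEFLATE format combines an LZ77-style match finder with Huffman coding. The sample string is artificial and highly repetitive, so the ratios it prints will not match the 2:1 to 3:1 figure quoted for ordinary English text.

```python
import heapq
import zlib
from collections import Counter

def huffman_code(text):
    """Build a Huffman code table {symbol: bit string} for `text`."""
    freq = Counter(text)
    # Heap entries are (weight, tiebreak, partial code table); the unique
    # tiebreak integer keeps the dictionaries from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

sample = "this redundancy can be removed by encoding " * 20
codes = huffman_code(sample)
huffman_bits = sum(len(codes[ch]) for ch in sample)
fixed_bits = 8 * len(sample)          # one byte per character
print(f"Huffman: {fixed_bits / huffman_bits:.2f}:1 (code table not counted)")

raw = sample.encode("ascii")
packed = zlib.compress(raw)
print(f"zlib:    {len(raw) / len(packed):.2f}:1")
```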

Text - Operations
● Encryption
  – text encryption is widely used in electronic mail and networked information systems
  – most widely used techniques:
    ● DES
    ● RSA public-key
    ● PGP
  – subject of major controversy:
    ● key escrow systems
    ● Clipper chip
    ● “strong” encryption now being outlawed in a number of countries
● Language-specific operations
  – spell-checking
  – parsing and grammar checking
  – style analysis
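The public-key idea behind RSA can be illustrated with deliberately tiny numbers. This is a toy sketch only: the primes 61 and 53 and the exponent 17 are textbook-sized assumptions, each character is encrypted separately with no padding, and the result offers no real security. It needs Python 3.8 or later for the modular inverse form of pow.

```python
# Toy RSA: illustration only, NOT secure (tiny primes, no padding).
p, q = 61, 53                 # assumed small primes for the demonstration
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: inverse of e modulo phi

def encrypt(m):
    """Encrypt an integer m < n with the public key (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Decrypt an integer c with the private key (d, n)."""
    return pow(c, d, n)

message = "Text"
cipher = [encrypt(ord(ch)) for ch in message]
plain = "".join(chr(decrypt(c)) for c in cipher)
print(cipher)
print(plain)
```

Anyone who knows (e, n) can encrypt, but decryption needs d, which is hard to derive from n only when the primes are very large.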

About Fonts and Faces
● A typeface is a family of graphic characters (usually including many type sizes & styles)
● A font is a collection of characters of a single size and style
● Styles include boldface and italic (also underlining & outlining)
● Serif vs. sans serif (‘sans’ is French for ‘without’)