INFOCODING BASICS & EXAMPLES OF CURRENT USE Introduction to Computer Science Using Ruby (c) 2010 Gideon Frieder.

ASCII: Tables & Description
- ASCII stands for American Standard Code for Information Interchange
- Computers represent data as numbers, so an ASCII code is the numerical representation of a character such as ‘a’, or of an action of some sort
- ASCII was introduced more than half a century ago, so it includes non-printing characters that are rarely used for their original purpose
- See the enclosed ASCII character table(s), which include descriptions of the first 32 non-printing characters
- ASCII was originally designed for use with teletypes, so these descriptions are somewhat obscure
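The character-to-number mapping described above can be tried directly in Ruby. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```ruby
# Look up the numeric ASCII code of a character, and the character for a code.
letter_code = 'a'.ord      # numeric code of the letter a
letter      = 65.chr       # character stored under code 65
line_feed   = "\n".ord     # a non-printing "action" code (line feed)

# Codes 0 through 31 are the non-printing control codes.
control_codes = (0..31).to_a

puts "'a' is stored as #{letter_code}"
puts "code 65 displays as #{letter}"
puts "line feed is code #{line_feed}, one of #{control_codes.size} control codes"
```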

ASCII: Tables and Descriptions
- If an ASCII format document is requested, the document should contain just ‘plain’ text with no formatting such as tab stops, bold, or underscoring (raw format)
- This is usually done so that the document can easily be imported into almost any application
- Notepad creates plain text, and in MS Word you can save a file as ‘text only’ (.txt)
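One way to check whether a piece of text qualifies as plain ASCII is sketched below. The helper name `plain_ascii?` is hypothetical; it only wraps Ruby's built-in `String#ascii_only?`.

```ruby
# Hypothetical helper: true when every character in the text is a 7-bit ASCII code.
def plain_ascii?(text)
  text.ascii_only?
end

puts plain_ascii?("Hello, world!")   # every character is below code 128
puts plain_ascii?("caf\u00E9")       # the accented e lies outside ASCII
```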

ASCII  Originally used 128 codes (7 bits)  The 8 bit ASCII was 7 bit info and one bit for data assurance (parity bit)  Later extended to 256 codes (8 bits)  The extension was divided into two parts: “Unused” codes “Allowed” codes  Never really observed, THUS ISO 8859!! (c) 2010 Gideon Frieder

ASCII & UNICODE
- ASCII is still heavily used today
- It is gradually being replaced by a newer coding standard called UNICODE
- UNICODE comes in different encodings
- The most prevalent are UTF-8 (one to four bytes per character) and UTF-16 (two or four bytes per character)

UNICODE: UTF-8 & UTF-16
- The first 128 codes in UTF-8 are identical to the ASCII codes
- ASCII coded files can usually be processed by programs that assume UTF-8 coding
- UNICODE, in both UTF-8 and UTF-16, covers non-Latin character sets:
  - Asian scripts (e.g., Indic scripts, Thai, Japanese kana, Korean Hangul)
  - Ideograms and pictograms (e.g., Chinese, Japanese kanji)
- As symbols evolve and change, so does UNICODE (e.g., simplified Chinese)
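The relationship between ASCII, UTF-8, and UTF-16 can be checked by counting bytes. This is a minimal sketch: the helper `byte_lengths` is hypothetical, and it uses `UTF-16BE` (an assumption made here) so that no byte-order mark is added to the count.

```ruby
# Hypothetical helper: bytes needed for one character in each encoding.
def byte_lengths(char)
  {
    utf8:  char.encode('UTF-8').bytesize,
    utf16: char.encode('UTF-16BE').bytesize   # BE variant: no byte-order mark
  }
end

samples = {
  'Latin A'       => "A",
  'accented e'    => "\u00E9",
  'CJK character' => "\u4E2D",
  'emoji'         => "\u{1F600}"
}

samples.each do |name, char|
  puts "#{name}: #{byte_lengths(char)}"
end

# An ASCII character has the same single byte in US-ASCII and in UTF-8.
puts "A".encode('US-ASCII').bytes == "A".encode('UTF-8').bytes
```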

ASCII Code
- Standardized under ISO 8859, creating standard extended codes (codes )
- Major variants are:
  - ISO Latin
  - ISO Eastern European
  - ISO Cyrillic
- Microsoft has its own ASCII version called “code page 1252”, which is ISO 8859-1 compatible but uses the “unused” codes
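The difference between ISO 8859-1 and code page 1252 shows up when the same byte is decoded both ways. This is a sketch using Ruby's built-in `ISO-8859-1` and `Windows-1252` encodings; the helper `decode_byte` and the chosen byte values are illustrative.

```ruby
# Hypothetical helper: interpret one byte under a given encoding, return UTF-8 text.
def decode_byte(byte, encoding)
  [byte].pack('C').force_encoding(encoding).encode('UTF-8')
end

# 0xE9 is an "allowed" code: both standards agree it is an accented e.
puts decode_byte(0xE9, 'ISO-8859-1') == decode_byte(0xE9, 'Windows-1252')

# 0x80 is an "unused" code in ISO 8859-1 (a control position),
# but code page 1252 assigns it the euro sign.
latin1 = decode_byte(0x80, 'ISO-8859-1')
cp1252 = decode_byte(0x80, 'Windows-1252')
puts format('ISO 8859-1: U+%04X', latin1.ord)
puts format('CP 1252:    U+%04X', cp1252.ord)
```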

ASCII: Generic

Extended ASCII Codes

Structure of QR Code & Highlights of Functional Elements (from Wikipedia, 2011)