Lis508 lecture 1: bits, bytes and characters Thomas Krichel 2002-09-23.

Structure Bits Bytes Character sets –Coded character set –Character encoding

Literature Norton, New Inside the PC, chapter 4; http://wwwinfo.cern.ch/asdoc/WWW/publications/ictp99/ictp99N2705.html

Information Information is best understood as what it takes to answer a question. The simplest question has a yes or no answer. A bit holds exactly one such answer, so the bit is the natural unit of information. The term, a contraction of "binary digit", was first used by John Tukey.
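
The "one bit per yes/no answer" idea extends to bigger questions: n bits distinguish 2 power n answers, so a question with N equally possible answers needs the smallest n with 2 power n ≥ N. A minimal Python sketch of that count (the function name `bits_needed` is hypothetical, chosen for illustration):

```python
def bits_needed(n_answers: int) -> int:
    """Smallest number of yes/no answers (bits) that can single out
    one of n_answers possible answers."""
    if n_answers < 1:
        raise ValueError("a question needs at least one possible answer")
    # (n - 1).bit_length() is the number of binary digits needed to write
    # n - 1, i.e. the smallest k with 2**k >= n, computed without floats.
    return (n_answers - 1).bit_length()


for n in (2, 26, 128, 256):
    print(n, "answers ->", bits_needed(n), "bits")
```

For example, picking one of 128 characters takes 7 bits, which is the size of an ASCII code.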

Usage of bits Computers are sometimes classified by –the number of bits they can process at one time, i.e. the register size. Larger registers let the processor handle more data per instruction, which tends to make it faster. –the number of bits they use to represent addresses, i.e. the address size. A larger address size lets a computer address more memory, so it can run larger programs. Graphics are also often described by the number of bits used to represent each dot (pixel).
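
Both measures grow as powers of two: n address bits can name 2 power n addresses, and n bits per dot allow 2 power n colors. A small sketch, assuming byte-addressable memory (each address names one byte); the helper names are hypothetical:

```python
def addressable_bytes(address_bits: int) -> int:
    """Distinct byte addresses an n-bit address can name
    (assumes each address refers to one byte)."""
    return 2 ** address_bits


def color_count(bits_per_dot: int) -> int:
    """Distinct colors a single dot can take with n bits per dot."""
    return 2 ** bits_per_dot


for bits in (16, 32, 64):
    print(f"{bits}-bit addresses: {addressable_bytes(bits):,} bytes")
for bits in (1, 8, 24):
    print(f"{bits} bits per dot: {color_count(bits):,} colors")
```

Going from 16-bit to 32-bit addresses raises the addressable memory from 64 kilobytes to 4 gigabytes.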

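Many bits Early microprocessor chips processed 8 bits at a time, and it became customary to call a group of 8 bits a byte. Larger units are –kilobyte is 2 power 10 bytes –megabyte is 2 power 20 bytes –gigabyte is 2 power 30 bytes –terabyte is 2 power 40 bytes. The prefixes come from the ancient Greek words for "thousand", "large", "giant", and "monster", respectively. As metric prefixes they strictly denote powers of ten (kilo = 1000); in computing they are used for the nearby powers of two (2 power 10 = 1024). Metric prefixes date back to the French Revolution, although only kilo- was part of the original system; giga- and tera- were adopted in 1960.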
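
The gap between the binary reading (steps of 2 power 10) and the decimal reading (steps of 10 power 3) of each prefix widens with every step. A short sketch comparing the two; the list and function names are illustrative:

```python
PREFIXES = ["kilo", "mega", "giga", "tera"]


def binary_size(prefix: str) -> int:
    """Bytes in one <prefix>byte under the binary convention (2**10 steps)."""
    return 2 ** (10 * (PREFIXES.index(prefix) + 1))


def decimal_size(prefix: str) -> int:
    """Bytes in one <prefix>byte under the metric convention (10**3 steps)."""
    return 10 ** (3 * (PREFIXES.index(prefix) + 1))


for p in PREFIXES:
    b, d = binary_size(p), decimal_size(p)
    print(f"{p}byte: {b:,} vs {d:,} bytes ({100 * (b - d) / d:.1f}% larger)")
```

The binary kilobyte is 2.4% larger than the decimal one; for the terabyte the difference is close to 10%.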

More than a monster In 1975, the General Conference of Weights and Measures (CGPM), based at Sèvres near Paris, agreed to add peta- (P) and exa- (E). A petabyte is 2 power 50 bytes and an exabyte is 2 power 60 bytes. These are now followed by the zettabyte (2 power 70) and the yottabyte (2 power 80).
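
The full ladder of prefixes is what a program uses to print a byte count in readable form: divide by 1024 until the number is small. A minimal sketch using the binary convention of these slides (the function name and the unit abbreviations KB, MB, ... are our choices for illustration):

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(n_bytes: int) -> str:
    """Format a byte count with binary prefixes (1 KB = 1024 bytes)."""
    if n_bytes < 1024:
        return f"{n_bytes} B"
    value = n_bytes / 1024
    for unit in UNITS[1:]:
        # Stop at the first unit where the value is below 1024,
        # or at yottabytes, the largest unit listed.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


for size in (500, 1024, 2 ** 50, 3 * 2 ** 80):
    print(size, "->", human_readable(size))
```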

Hex numbers A byte is often written as two hexadecimal (hex) digits. Each hex digit encodes 16 values, i.e. 4 bits, written 0 to 9, then A B C D E F, so F is 15. Here, hex numbers are prefixed with 0x. The Microsoft Windows calculator, in its scientific view, can convert between decimal, hex, and binary.
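
The same conversions can be done in a few lines of Python. A sketch with hypothetical helper names, showing that each hex digit of a byte is one 4-bit half (nibble):

```python
def byte_to_hex(value: int) -> str:
    """Write one byte (0-255) as two hex digits with the 0x prefix."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte holds a value from 0 to 255")
    return f"0x{value:02X}"


def split_nibbles(value: int) -> tuple[int, int]:
    """Each hex digit stands for 4 bits: return (high 4 bits, low 4 bits)."""
    return value >> 4, value & 0x0F


def hex_to_int(text: str) -> int:
    """Parse hex text; with base 16, int() also accepts a leading 0x."""
    return int(text, 16)


print(byte_to_hex(200), split_nibbles(200), hex_to_int("0xC8"))
```

Here 200 is 0xC8: the high nibble C is 12 and the low nibble is 8, and 12 × 16 + 8 = 200.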

decimal/binary numbers
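
A decimal/binary/hex table for the values 0 to 15, the range of a single hex digit, can be generated rather than typed. A small sketch (the function name is illustrative):

```python
def conversion_table(limit: int = 16) -> list[tuple[int, str, str]]:
    """Rows of (decimal, 4-digit binary, hex) for 0 .. limit-1."""
    return [(n, format(n, "04b"), format(n, "X")) for n in range(limit)]


print("dec  bin   hex")
for dec, binary, hexa in conversion_table():
    print(f"{dec:>3}  {binary}  {hexa}")
```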

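Characters Much of the information processed by computers is in the form of characters. A character only makes sense to a human user who can read the writing system it belongs to. A character is not a glyph, i.e. not the shape used to draw it. –ligatures: a single glyph, such as the joined "fi", can stand for several characters.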
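
Unicode illustrates the character/glyph distinction: it keeps a few ligatures such as "fi" as single code points only for compatibility with older character sets, and compatibility normalization (NFKC) turns them back into the characters they stand for. A sketch using Python's standard `unicodedata` module:

```python
import unicodedata

ligature = "\ufb01"  # a compatibility code point for the "fi" ligature
print(unicodedata.name(ligature))  # LATIN SMALL LIGATURE FI
print(len(ligature))               # 1

# NFKC replaces the ligature with the two characters it represents.
expanded = unicodedata.normalize("NFKC", ligature)
print(expanded, len(expanded))     # fi 2
```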

Representing characters Computers do not understand text; they only understand numbers. For computers to process text, there must be a correspondence between numbers and text characters. Such a correspondence is called a coded character set. Important examples are –ASCII –ISO –cp1252
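
A coded character set is, in effect, a lookup table between characters and numbers. A sketch that uses Python's built-in ASCII codec as that table (the function names are hypothetical):

```python
def text_to_codes(text: str) -> list[int]:
    """Map each character to its number in the ASCII coded character set."""
    return list(text.encode("ascii"))  # raises UnicodeEncodeError for non-ASCII


def codes_to_text(codes: list[int]) -> str:
    """Map ASCII numbers back to characters."""
    return bytes(codes).decode("ascii")


codes = text_to_codes("Hi!")
print(codes)                 # [72, 105, 33]
print(codes_to_text(codes))  # Hi!
```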

ASCII American Standard Code for Information Interchange. It is a 7-bit character set; there is no such thing as 8-bit ASCII. It has 95 printable symbols (32 to 126) and 33 control characters (0 to 31, and 127). scii2.html has a list.
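
The 95/33 split can be checked by classifying all 128 codes. This sketch relies on Python's `str.isprintable()`, which counts the space (32) as printable, matching the slide's count; the function name is illustrative:

```python
def ascii_classes() -> tuple[list[int], list[int]]:
    """Split the 128 ASCII codes into printable and control codes."""
    printable = [c for c in range(128) if chr(c).isprintable()]
    control = [c for c in range(128) if not chr(c).isprintable()]
    return printable, control


printable, control = ascii_classes()
print(len(printable), "printable,", len(control), "control")

# ASCII is 7-bit: a byte value of 128 or above is not ASCII at all.
try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)
```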

ASCII control codes ACK (6, ^F) is used to acknowledge receipt of a message, NAK (21, ^U) to signal non-receipt. CR (13, ^M) is the carriage return. LF (10, ^J) is the linefeed. FF (12, ^L) is the form feed (new page). BS (8, ^H) is the backspace. DEL (127, ALT-127) is delete. ESC (27, ^[) is escape. Different programs use them in different ways, a big pain in the a…
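
The "^" names above follow a fixed rule, caret notation: flipping bit 6 of the code (XOR with 0x40) turns 0 to 31 into "@", "A" to "Z", "[" and so on, and turns DEL (127) into "?", which is why terminals commonly show delete as ^?. A sketch (the function name is hypothetical):

```python
def caret_notation(code: int) -> str:
    """Write an ASCII control code in caret notation, e.g. 13 -> '^M'."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # XOR 0x40 maps 0-31 to '@'..'_' and 127 to '?'.
    return "^" + chr(code ^ 0x40)


for name, code in [("ACK", 6), ("NAK", 21), ("CR", 13), ("LF", 10),
                   ("FF", 12), ("BS", 8), ("ESC", 27), ("DEL", 127)]:
    print(f"{name:>3} {code:>3} {caret_notation(code)}")
```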

ISO PCs work with 8-bit bytes, but ASCII uses only 7 bits, so manufacturers were free to fill the other 128 positions. The ISO standard known as ISO-latin-1 extends ASCII with characters used by the Western European languages. It is the default character set of HTML. Positions 128 to 159 are not used for printable characters. Microsoft's cp1252 fills most of them with graphic characters.
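
The difference in positions 128 to 159 can be seen with Python's built-in "latin-1" and "cp1252" codecs: "latin-1" maps these bytes to control characters, while "cp1252" maps 27 of them to graphic characters and leaves 5 undefined. A sketch (the helper `describe` is hypothetical; it prints character names rather than the characters to keep the output ASCII):

```python
import unicodedata


def describe(byte_value: int, codec: str) -> str:
    """What one byte decodes to: a character, 'control', or 'undefined'."""
    try:
        char = bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return "undefined"
    return "control" if unicodedata.category(char) == "Cc" else char


for codec in ("latin-1", "cp1252"):
    results = [describe(b, codec) for b in range(128, 160)]
    graphic = [r for r in results if r not in ("control", "undefined")]
    print(f"{codec}: {len(graphic)} graphic characters in 128-159")

print(unicodedata.name(describe(0x80, "cp1252")))  # EURO SIGN
```

Above 159 the two agree, e.g. byte 0xE9 is "é" in both.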

Three concepts for characters –Abstract character repertoire: the set of characters to be encoded, e.g., some alphabet or symbol set –Coded character set: a mapping from an abstract character repertoire to a set of non-negative integers –Character encoding scheme: a mapping from a coded character set to a serialized sequence of bytes
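
One character can be followed through all three levels. In this sketch the repertoire member is "é", the coded character set is UCS (where "é" is number 233, U+00E9), and UTF-8 and UTF-16 serve as two encoding schemes; those two schemes are our choice of example, not named on the slide, and the function name is hypothetical:

```python
import unicodedata


def three_levels(char: str) -> dict[str, str]:
    """Show a character as repertoire member, code point, and encoded bytes."""
    code_point = ord(char)  # coded character set: character -> integer
    return {
        "character": unicodedata.name(char),
        "code point": f"U+{code_point:04X}",
        # encoding schemes: integer -> serialized bytes
        "UTF-8": char.encode("utf-8").hex(" "),
        "UTF-16 (big-endian)": char.encode("utf-16-be").hex(" "),
    }


for level, value in three_levels("\u00e9").items():
    print(f"{level}: {value}")
```

The same number, 233, becomes two different byte sequences (c3 a9 and 00 e9), which is why the coded character set and the encoding scheme are kept apart.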

ISO defines the Universal Character Set (UCS). UCS contains the characters used by practically all known languages, even scripts such as Gurmukhi, Oriya, Telugu, Bopomofo, and Runic. There are proposals for more, like Hieroglyphs and Tengwar. Note that there are about 6800 known languages.

UCS organization Formally, ISO defines a 31-bit character set. Its characters are represented as 32 bits, i.e. 4 bytes, or 8 hex digits. The canonical form of ISO uses a four-dimensional coding space of 128 groups. Each group consists of 256 planes, each plane of 256 rows, and each row of 256 cells, giving 128 × 256 × 256 × 256 = 2 power 31 code positions.
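
Because each coordinate is one byte of the 4-byte form, the group, plane, row, and cell can be read off a code position with shifts and masks. A sketch (function names are hypothetical):

```python
# 128 groups x 256 planes x 256 rows x 256 cells = 2**31 code positions
assert 128 * 256 * 256 * 256 == 2 ** 31


def ucs4_coordinates(code_point: int) -> tuple[int, int, int, int]:
    """Split a 31-bit UCS code position into (group, plane, row, cell)."""
    if not 0 <= code_point < 2 ** 31:
        raise ValueError("UCS code positions use only 31 bits")
    return (code_point >> 24,           # group: top byte, 0x00-0x7F
            (code_point >> 16) & 0xFF,  # plane
            (code_point >> 8) & 0xFF,   # row
            code_point & 0xFF)          # cell


def ucs4_hex(code_point: int) -> str:
    """The 4-byte form written as 8 hex digits."""
    return f"{code_point:08X}"


print(ucs4_hex(0x20AC), [hex(part) for part in ucs4_coordinates(0x20AC)])
```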

UCS organization The first plane (plane 0x00) of group 0x00 is called the Basic Multilingual Plane (BMP). It has been fixed since first publication. The subsequent 223 planes (0x01 to 0xDF) of group 0x00, as well as planes 0x00 to 0xFF in groups 0x01 to 0x5F, are reserved for further standardization. The last 32 planes (0xE0 to 0xFF) of group 0x00, as well as all code positions of the 32 groups 0x60 to 0x7F, are reserved for private use.
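
The allocation above can be written as a small classifier over group and plane. This sketch follows the scheme exactly as the slide describes it; later revisions of the standard have narrowed the code space, so it should not be read as the current allocation. The function name is hypothetical:

```python
def ucs_region(code_point: int) -> str:
    """Classify a 31-bit UCS code position by the slide's group/plane scheme."""
    if not 0 <= code_point < 2 ** 31:
        raise ValueError("UCS code positions use only 31 bits")
    group = code_point >> 24
    plane = (code_point >> 16) & 0xFF
    if group == 0x00 and plane == 0x00:
        return "BMP"
    if group == 0x00 and plane >= 0xE0:
        return "private use"  # planes 0xE0-0xFF of group 0x00
    if 0x60 <= group <= 0x7F:
        return "private use"  # the 32 groups 0x60-0x7F
    return "reserved"         # group 0x00 planes 0x01-0xDF, groups 0x01-0x5F


for cp in (0x0041, 0x00010000, 0x00E00000, 0x01000000, 0x60000000):
    print(f"{cp:08X}: {ucs_region(cp)}")
```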

Relationship with legacy sets Let U+ followed by four hex digits denote a character in the BMP. The UCS characters U+0000 to U+007F are identical to those in ASCII. The range U+0000 to U+00FF is identical to ISO-latin-1.
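
These identities can be checked byte by byte: decoding byte value n with the legacy codec should give the UCS character with number n. A sketch with a hypothetical helper name; cp1252 is included as a contrast, since it departs from UCS in positions 128 to 159:

```python
def identical_range(codec: str, size: int) -> bool:
    """True if bytes 0..size-1 decode to the UCS characters with the same numbers."""
    return all(bytes([b]).decode(codec) == chr(b) for b in range(size))


print(identical_range("ascii", 128))    # True
print(identical_range("latin-1", 256))  # True
print(identical_range("cp1252", 128))   # True: cp1252 also extends ASCII
# In cp1252, byte 0x80 is the euro sign U+20AC, not U+0080.
print(bytes([0x80]).decode("cp1252") == chr(0x80))  # False
```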

Types of characters in UCS Letters –Base characters –Ideographic characters –Combining characters Digits Extenders
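
Unicode's general categories give a rough way to tell these types apart in code; mapping the slide's types onto categories (Ll for a base letter, Lo for an ideograph, Mn for a combining mark, Nd for a digit) is our assumption, and extenders are left out because no single category matches them. The sketch also shows that a combining character attaches to the base character before it:

```python
import unicodedata

samples = {
    "base character": "e",
    "ideographic character": "\u4e00",
    "combining character": "\u0301",
    "digit": "7",
}
for kind, char in samples.items():
    print(f"{kind}: {unicodedata.name(char)} "
          f"(category {unicodedata.category(char)})")

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # precomposed form
print(len(decomposed), len(composed), composed == "\u00e9")
```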

Thank you for your attention!