Digitizing Discrete Information


Digitizing Discrete Information Digitize Represent info with digits (symbols) Digits: { 0, 1, 2, …, 9 } Or digits: { A, B, C, …, Z } Or any set of distinct symbols

Symbols, Briefly Prefer short names for symbols One, two, …, Instead of “asterisk”, “closing parenthesis”, etc. Aside: we shorten many names in IT exclamation point => bang asterisk => star open parenthesis => open paren open curly brace => open brace

Ordering Symbols Want order for the digits/symbols 0 – 9 has obvious order But what about { !, @, #, …, ) }? Define a collating sequence
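
A collating sequence can be sketched as an explicit ordered list of symbols. The order chosen below (the symbols above the digit keys on a US keyboard) and the names COLLATING_SEQUENCE, RANK, and sort_symbols are assumptions for illustration; the slides do not fix a particular order.

```python
# Hypothetical collating sequence: the symbols above the digit keys 1..0
# on a US keyboard, ordered by key position (an assumption, not from the slides).
COLLATING_SEQUENCE = ["!", "@", "#", "$", "%", "^", "&", "*", "(", ")"]

# Map each symbol to its rank (position) in the sequence.
RANK = {symbol: position for position, symbol in enumerate(COLLATING_SEQUENCE)}

def sort_symbols(symbols):
    """Sort symbols by their position in the collating sequence."""
    return sorted(symbols, key=lambda s: RANK[s])

print(sort_symbols(["(", "#", "!", "&"]))  # ['!', '#', '&', '(']
```

Any agreed order works; what matters is that every symbol has exactly one position.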

Fundamental Information Representation Given digital info, how to store it? Use physical phenomena Light Current Magnetism

Fundamental Information Representation In digital world Don’t care how much, just presence In logical world (basis of computing) True and false

Fundamental Information Representation Physical world can implement logical world Presence => “true” Absence => “false”

The PandA Representation We will use “PandA” for presence and absence representation Only two states Could use false for absent, true for present Or 0 for absent, and 1 for present

The PandA Representation Such a formulation is said to be discrete Discrete means “distinct” or “separable” Opposite of continuous No “shades of gray”

Analog vs. Digital Analog is continuous data/information Sound waves

Analog vs. Digital Digital is discrete info Obtained by sampling

A Binary System PandA encoding is binary

Bits Form Symbols PandA unit is a binary digit (bit) Bit sequences form binary numbers

Encoding Bits on a CD-ROM PandA bit values are pits and lands

Bits in Computer Memory Memory is a long sequence of bits Sidewalk Analogy

Sidewalk Memory Imagine clean sidewalk consisting of squares Presence of a stone on a square => 1 Absence of a stone => 0 Sidewalk: sequence of bits

Sidewalk Memory 1 1 1

Sidewalk Memory Writing info Put stone on square (1) Remove stone from square (0) Reading info
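
The sidewalk analogy can be sketched as a list of bits. This is a minimal model, assuming an 8-square sidewalk; write_bit and read_bit are hypothetical helper names, and reading is modeled simply as looking at a square.

```python
# A "sidewalk" of 8 squares, all initially without a stone (0).
sidewalk = [0] * 8

def write_bit(memory, square, bit):
    """Put a stone on the square (1) or remove it (0)."""
    memory[square] = 1 if bit else 0

def read_bit(memory, square):
    """Look at the square: stone present -> 1, stone absent -> 0."""
    return memory[square]

write_bit(sidewalk, 0, 1)   # put a stone on square 0
write_bit(sidewalk, 3, 1)   # put a stone on square 3
write_bit(sidewalk, 3, 0)   # remove the stone from square 3 again
print(sidewalk)             # [1, 0, 0, 0, 0, 0, 0, 0]
print(read_bit(sidewalk, 0))  # 1
```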

Alternative PandA Encodings Other ways to encode two states Color of stone Number of stones Another?

Combining Bit Patterns One bit with two states isn’t enough So we combine them

Hex Explained Hex numbers are base-16 A bit sequence may be 1111111110011000111000101010 Error prone Instead use hex

The 16 Hex Digits Hex digits { 0, 1, 2, …, 9, A, B, C, D, E, F } Can represent 4-bit sequences 0000 = 0 hex 0001 = 1 hex … 1001 = 9 hex 1010 = A hex 1111 = F hex

Hex to Bits and Back Again Each hex digit corresponds to 4 bits 0010 1011 1010 1101 => 2 B A D; F A B 4 => 1111 1010 1011 0100; 1 9 C 6 => ?
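
The 4-bits-per-hex-digit correspondence can be checked with a short sketch. The function names bits_to_hex and hex_to_bits are hypothetical; the inputs are the examples from the slides.

```python
def bits_to_hex(bits):
    """Convert a bit string (spaces allowed) to hex, one digit per 4 bits."""
    bits = bits.replace(" ", "")
    if len(bits) % 4 != 0:
        raise ValueError("bit string length must be a multiple of 4")
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

def hex_to_bits(hex_digits):
    """Convert hex digits to 4-bit groups separated by spaces."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)

print(bits_to_hex("0010 1011 1010 1101"))  # 2BAD
print(hex_to_bits("FAB4"))                 # 1111 1010 1011 0100
```

The same helper shortens the long bit sequence shown earlier: bits_to_hex("1111111110011000111000101010") gives FF98E2A, seven characters instead of twenty-eight.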

Digitizing Numbers in Binary Need binary representations for Numbers Characters But also image video sound

Counting in Binary Binary numbers (base 2) use digits 0 and 1 Decimal numbers (base 10) use 0 through 9 Counting to ten

Counting in Binary Place value representation

Place Value in a Decimal Number Example, 1010 (base 10) is (1 × 1000) + (0 × 100) + (1 × 10) + (0 × 1)

Place Value in a Binary Number Binary is base 2 so powers of 2 are used

Place Value in a Binary Number 1010 in binary (1 × 8) + (0 × 4) + (1 × 2) + (0 × 1)
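
Place value works the same way in any base; only the powers change. The sketch below, with the hypothetical helpers place_value_terms and to_number, expands the digit string 1010 in base 2 and in base 10.

```python
def place_value_terms(digits, base):
    """Return (digit, place value) pairs, most significant digit first."""
    highest = len(digits) - 1
    return [(int(d, base), base ** (highest - i)) for i, d in enumerate(digits)]

def to_number(digits, base):
    """Sum digit * place value over all positions."""
    return sum(digit * place for digit, place in place_value_terms(digits, base))

print(place_value_terms("1010", 2))  # [(1, 8), (0, 4), (1, 2), (0, 1)]
print(to_number("1010", 2))          # 10
print(to_number("1010", 10))         # 1010
```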

Digitizing Text # of bits determines # of symbols that can be represented n bits => 2^n symbols

Digitizing Text To digitize English text Roman letters Arabic numbers Punctuation Arithmetic symbols

Assigning Symbols So we need to represent 26 uppercase 26 lowercase letters 10 numerals 20 punctuation characters 10 arithmetic characters 3 other characters (new line, tab, and backspace) 95 symbols…enough for English

Assigning Symbols To represent 95 distinct symbols we need how many bits? Need to represent control characters too
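
Since n bits give 2^n patterns, the number of bits needed for a symbol set is the smallest n with 2^n at least the number of symbols. A minimal sketch, using the symbol counts listed above; bits_needed is a hypothetical helper name.

```python
def bits_needed(symbol_count):
    """Smallest n such that 2**n >= symbol_count."""
    n = 0
    while 2 ** n < symbol_count:
        n += 1
    return n

# uppercase, lowercase, numerals, punctuation, arithmetic, other
counts = [26, 26, 10, 20, 10, 3]
total = sum(counts)

print(total)               # 95
print(bits_needed(total))  # 7
print(2 ** 7 - total)      # 33 patterns left over, e.g. for control characters
```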

Assigning Symbols ASCII stands for American Standard Code for Information Interchange Widely used 7-bit code Advantages of a “standard” Interoperability of h/w Communications among programs

Extended ASCII: An 8-Bit Code For other languages 7 bits aren’t enough IBM developed an 8-bit ASCII Uses 1 byte Uses 0 in leftmost bit followed by 7-bit ASCII codes Allows 128 more codes that start with 1 Can handle most Western languages

ASCII Character Set (Decimal) Decimal - Character 0 NUL 1 SOH 2 STX 3 ETX 4 EOT 5 ENQ 6 ACK 7 BEL 8 BS 9 HT 10 NL 11 VT 12 NP 13 CR 14 SO 15 SI 16 DLE 17 DC1 18 DC2 19 DC3 20 DC4 21 NAK 22 SYN 23 ETB 24 CAN 25 EM 26 SUB 27 ESC 28 FS 29 GS 30 RS 31 US 32 SP 33 ! 34 " 35 # 36 $ 37 % 38 & 39 ' 40 ( 41 ) 42 * 43 + 44 , 45 - 46 . 47 / 48 0 49 1 50 2 51 3 52 4 53 5 54 6 55 7 56 8 57 9 58 : 59 ; 60 < 61 = 62 > 63 ? 64 @ 65 A 66 B 67 C 68 D 69 E 70 F 71 G 72 H 73 I 74 J 75 K 76 L 77 M 78 N 79 O 80 P 81 Q 82 R 83 S 84 T 85 U 86 V 87 W 88 X 89 Y 90 Z 91 [ 92 \ 93 ] 94 ^ 95 _ 96 ` 97 a 98 b 99 c 100 d 101 e 102 f 103 g 104 h 105 i 106 j 107 k 108 l 109 m 110 n 111 o 112 p 113 q 114 r 115 s 116 t 117 u 118 v 119 w 120 x 121 y 122 z 123 { 124 | 125 } 126 ~ 127 DEL

ASCII Character Set (Hexadecimal) Hexadecimal - Character 00 NUL 01 SOH 02 STX 03 ETX 04 EOT 05 ENQ 06 ACK 07 BEL 08 BS 09 HT 0A NL 0B VT 0C NP 0D CR 0E SO 0F SI 10 DLE 11 DC1 12 DC2 13 DC3 14 DC4 15 NAK 16 SYN 17 ETB 18 CAN 19 EM 1A SUB 1B ESC 1C FS 1D GS 1E RS 1F US 20 SP 21 ! 22 " 23 # 24 $ 25 % 26 & 27 ' 28 ( 29 ) 2A * 2B + 2C , 2D - 2E . 2F / 30 0 31 1 32 2 33 3 34 4 35 5 36 6 37 7 38 8 39 9 3A : 3B ; 3C < 3D = 3E > 3F ? 40 @ 41 A 42 B 43 C 44 D 45 E 46 F 47 G 48 H 49 I 4A J 4B K 4C L 4D M 4E N 4F O 50 P 51 Q 52 R 53 S 54 T 55 U 56 V 57 W 58 X 59 Y 5A Z 5B [ 5C \ 5D ] 5E ^ 5F _ 60 ` 61 a 62 b 63 c 64 d 65 e 66 f 67 g 68 h 69 i 6A j 6B k 6C l 6D m 6E n 6F o 70 p 71 q 72 r 73 s 74 t 75 u 76 v 77 w 78 x 79 y 7A z 7B { 7C | 7D } 7E ~ 7F DEL

Beyond ASCII Unicode Uses up to 4 bytes to handle how many characters? Allows all modern scripts (Kanji, Arabic, Cyrillic, Hebrew, etc.) Contains 7-bit ASCII as the low 128 characters (and Latin-1 as the low 256) for compatibility Allows ancient scripts like Egyptian hieroglyphics
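
The "up to 4 bytes" point can be illustrated with UTF-8, one common Unicode encoding. The slides do not name an encoding, so UTF-8 is an assumption here, and the sample characters are chosen only for illustration.

```python
# Sample characters written as escapes so the source stays plain ASCII.
samples = [
    "A",            # Latin capital A (also ASCII)
    "\u00e9",       # Latin small e with acute
    "\u0416",       # Cyrillic capital Zhe
    "\u05d0",       # Hebrew Alef
    "\u6f22",       # a CJK ideograph (used in Kanji)
    "\U00013000",   # an Egyptian hieroglyph
]

for ch in samples:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} takes {len(encoded)} byte(s): {hex_bytes}")

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)  # [1, 2, 2, 2, 3, 4]
```

The ASCII character keeps its one-byte code, which is the compatibility the slide refers to.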

ASCII Coding of Phone Numbers How to encode 888 555 1212 in ASCII? Encode each digit with its ASCII byte 8 8 8 5 5 etc. 00111000 00111000 00111000 00110101 00110101 etc.
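
The digit-by-digit encoding can be reproduced in a few lines. This sketch assumes only the ten digits are stored (no spaces); ascii_bytes is a hypothetical helper name.

```python
def ascii_bytes(text):
    """Return the 8-bit ASCII pattern for each character in text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

phone = "8885551212"   # digits only; the spaces are assumed not to be stored
patterns = ascii_bytes(phone)

print(patterns[:5])
# ['00111000', '00111000', '00111000', '00110101', '00110101']
print(len(patterns))   # 10
```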

Another ASCII Example From Lab 1 CSCI ftw! Takes ? bytes to store. Representation in ASCII? 43 53 43 49 20 66 74 77 21 0A In Binary? 0100 0011 0101 0011 0100 0011 0100 1001 ... 0010 0001 0000 1010
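
The hex and binary forms above can be regenerated as a check. The sketch assumes the stored text ends with a newline, which is what the final 0A in the slide suggests.

```python
message = "CSCI ftw!\n"          # trailing newline assumed, matching the final 0A
data = message.encode("ascii")

hex_form = " ".join(f"{b:02X}" for b in data)
binary_form = " ".join(f"{b >> 4:04b} {b & 0x0F:04b}" for b in data)

print(len(data))     # 10
print(hex_form)      # 43 53 43 49 20 66 74 77 21 0A
print(binary_form)
```

Each byte is printed as two 4-bit halves so the binary line can be read against the hex line digit by digit.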

Advantages of Long Encodings Short encodings save memory Examples of longer encodings NATO Broadcast Alphabet Bar Codes

NATO Broadcast Alphabet NATO alphabet Used for radio communication Purposely inefficient Distinctive amid noise (‘m’ versus ‘n’) Letters represented with word “symbols” a => alpha, b => bravo, c => charlie Digits keep their usual names Except 9 => niner
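
The letter-to-word substitution can be sketched as a lookup table. The word list uses common English spellings (the slide's "alpha" rather than the official "alfa"), digits follow the slide's rule that only 9 changes, and the helper name spell is hypothetical.

```python
import string

WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo lima "
    "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
    "xray yankee zulu"
).split()

# One word per letter a..z.
LETTERS = dict(zip(string.ascii_lowercase, WORDS))

# Digits keep their usual names, except 9 => niner (per the slide).
DIGITS = dict(zip("0123456789",
                  "zero one two three four five six seven eight niner".split()))

TABLE = {**LETTERS, **DIGITS}

def spell(text):
    """Spell text with word symbols, skipping characters without an entry."""
    return [TABLE[ch] for ch in text.lower() if ch in TABLE]

print(spell("mn9"))  # ['mike', 'november', 'niner']
```

The encoding is far longer than one byte per letter, but "mike" and "november" are much harder to confuse over a noisy channel than "m" and "n".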

NATO Broadcast Alphabet

Bar Codes Universal Product Codes (UPC) use more bits than necessary UPC-A encoding uses 7 bits to encode the digits 0 – 9

Bar Codes Encodes manufacturer (left side) and product (right side) Different bit combinations are used for each side One side is complement of the other Bit patterns were chosen to appear as different as possible

Bar Codes Encodings for each side make it possible to recognize whether code is upside down
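
A sketch of how the two sides differ and how a scanner could tell orientation. The left-side bit patterns are quoted from the UPC-A specification rather than from the slides, so treat the table as an assumption to verify; the parity test in starts_on_right_side is one plausible way to detect an upside-down scan, not necessarily what real scanners do.

```python
# Left-side 7-bit patterns for digits 0-9 (assumed from the UPC-A standard).
LEFT = {
    "0": "0001101", "1": "0011001", "2": "0010011", "3": "0111101",
    "4": "0100011", "5": "0110001", "6": "0101111", "7": "0111011",
    "8": "0110111", "9": "0001011",
}

def complement(bits):
    """Flip every bit: 0 -> 1 and 1 -> 0."""
    return "".join("1" if b == "0" else "0" for b in bits)

# Right-side patterns are the bitwise complement of the left-side ones.
RIGHT = {digit: complement(pattern) for digit, pattern in LEFT.items()}

def starts_on_right_side(first_pattern):
    """Left patterns have an odd number of 1s, right patterns an even number.

    Reversing a pattern does not change its count of 1s, so an even count
    in the first digit scanned suggests the code is being read upside down.
    """
    return first_pattern.count("1") % 2 == 0

print(RIGHT["0"])                              # 1110010
print(starts_on_right_side(LEFT["5"]))         # False
print(starts_on_right_side(RIGHT["5"][::-1]))  # True
```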

Metadata and the OED To represent info Need to convert to binary Need to describe its properties Characteristics of the content also need to be encoded How is the content structured? What other content is it related to? Where was it collected? When was it created or captured? What units is it given in? How should it be displayed? And so on…

Metadata and the OED Metadata info describing info often specified with tags (like with HTML)

Properties of Data ASCII encodes characters Metadata gives properties of data font style color justification margins etc.

Properties of Data Content and metadata example

Using Tags for Metadata Oxford English Dictionary (OED) Definitive reference for every English word’s meaning, etymology, and usage Printed version is 20 volumes, weighs 150 pounds, and fills 4 feet of shelf space

Structure Tags Digital OED uses tags to indicate structure <hw> for a headword (word defined) <pr> for pronunciation <ph> for phonetic notations <ps> for part of speech <hm> for homonym numbers <e> for entire entry <hg> for head group (all info at start of definition)

Structure Tags Algorithms utilize tags Search Formatting …
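
A minimal sketch of how search and formatting could use structure tags. The sample entry is invented for illustration and is not a real OED record; it also assumes each tag has a matching closing tag such as </hw>, which the slides do not show.

```python
import re

# Hypothetical tagged entry (not taken from the OED).
entry = ("<e><hg><hw>byte</hw> <pr><ph>bait</ph></pr> <ps>n.</ps></hg> "
         "A group of eight bits.</e>")

def tag_contents(text, tag):
    """Search: return the text found inside each <tag>...</tag> pair."""
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return pattern.findall(text)

def strip_tags(text):
    """Formatting: drop all tags, leaving only the content."""
    return re.sub(r"</?\w+>", "", text)

print(tag_contents(entry, "hw"))  # ['byte']
print(tag_contents(entry, "ps"))  # ['n.']
print(strip_tags(entry))          # byte bait n. A group of eight bits.
```

Because the headword is marked with <hw>, a search for the word being defined does not match the same letters appearing elsewhere in a definition.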

Quiz What’s the first step in debugging? check for obvious isolate the problem reproduce the problem pinpoint Fix the error in this CSS body { color; red }

Quiz Like all engineers, programmers begin with a _____________ – a precise description of the input, how the system should behave, and how the output should be produced.

Summary Digitizing info Storing info using PandA Bits, bytes, hex ASCII Metadata