Numbers in Codes GCNU 1025 Numbers Save the Day

Coding
Converting information into another form of representation (codes) based on a specific rule
Encoding: information to symbols
Decoding: symbols back to information

Binary Codes Two symbols are used to represent data Example: Morse code, ASCII code

Morse Code On-off tones, lights, clicks, dots and dashes, etc.

Binary codes
Language of computers: 0 and 1 (binary system)
Codeword: a string of 0's and 1's representing a character
ASCII code: American Standard Code for Information Interchange
128 characters, of which 33 are control characters
Enables the use of the same codewords on different machines

Binary Codes: Play

Coding of Chinese characters (optional)
Example: Chinese telegraph code (non-binary, 4-digit)
Decoding (relatively easy for documentation): 3413 → 法
Encoding (more difficult for documentation): 法 → 3413
Four-corner method: a lookup method used to document encoding

Announcement
In-class Assignment #1 on Sep 19 (Friday), worth 10% of the final score
Coverage: up to Section 2.2
Books, notes, other materials and discussion are all allowed
Help from the instructor and teaching assistant is available
Assignments submitted after class are subject to penalty


Error-detection for binary codes
Rule: every valid codeword has a special property
Parity check: a validity check on the parity (i.e. being odd or even) of the number of 1's in a codeword
Example: 1001 sent as 1011
Original number of 1's: 2 (even); number of 1's in 1011: 3 (odd)
A single error changes the parity of the number of 1's, so the error is detected if all valid codewords contain an even number of 1's

Simple parity check
Rule: the last digit of a codeword is a check digit, appended to the original message (to be sent) so that the total number of 1's in the codeword is even
Example: sending a message with an even number of 1's; check digit appended: 0 (total number of 1's in the codeword: 2)
Example: sending a message with an odd number of 1's; check digit appended: 1 (total number of 1's in the codeword: 4)
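
The even-parity rule above can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
def add_parity_bit(message: str) -> str:
    """Append a check digit so the total number of 1's is even."""
    check = str(message.count("1") % 2)
    return message + check

def parity_ok(codeword: str) -> bool:
    """A received codeword is valid if its number of 1's is even."""
    return codeword.count("1") % 2 == 0

print(add_parity_bit("1001"))   # -> "10010" (two 1's already, so check digit 0)
print(parity_ok("1011"))        # -> False: one flipped bit changed the parity
```

Note that flipping two bits leaves the parity unchanged, so a simple parity check detects only an odd number of errors.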

Error-correction in codes
Is it possible to detect AND correct an error (without re-transmission/data re-entry)?
Error-correction by multiple entries: send every message 3 times, regardless of the existence/absence of errors
Example: 1100 sent as 1100 1100 1100
Error-correction power: a received message with a single flipped bit can be corrected automatically by comparing the three copies, without further data re-entry
High resource demand: the message length is tripled
Error-correction by multiple parity check digits
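
The triple-repetition idea can be sketched as a majority vote per bit position (a minimal illustration with my own function names):

```python
def encode3(message: str) -> str:
    # Send the whole message three times.
    return message * 3

def decode3(received: str) -> str:
    # Majority vote per bit position across the three copies.
    n = len(received) // 3
    copies = [received[i * n:(i + 1) * n] for i in range(3)]
    return "".join("1" if sum(c[j] == "1" for c in copies) >= 2 else "0"
                   for j in range(n))

codeword = encode3("1100")                       # "110011001100"
corrupted = codeword[:2] + "1" + codeword[3:]    # flip one bit
print(decode3(corrupted))                        # -> "1100"
```

A single error corrupts at most one of the three copies, so the other two outvote it; the cost is sending three times as many digits.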

Error-correction in codes Example: transmit 1001 by multiple parity check digits 

Error-correction in codes 1-error correction: if an erroneous message is received, how can we detect and correct the error? (assume at most 1 error)
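
The slides' parity-digit layout is not preserved in this transcript; one common scheme that fits the description is two-dimensional parity, sketched below for the 4-bit message 1001 arranged in a 2x2 grid (an assumption, not necessarily the slides' exact construction). A single error fails exactly one row check and one column check, and their crossing locates the bit to flip:

```python
def encode_2d(bits: str):
    """Arrange a 4-bit message in a 2x2 grid with row/column parity digits."""
    g = [[int(bits[0]), int(bits[1])], [int(bits[2]), int(bits[3])]]
    row_par = [sum(r) % 2 for r in g]
    col_par = [(g[0][c] + g[1][c]) % 2 for c in range(2)]
    return g, row_par, col_par

def correct_2d(g, row_par, col_par):
    """Fix a single flipped bit at the crossing of the failing row and column."""
    bad_rows = [r for r in range(2) if sum(g[r]) % 2 != row_par[r]]
    bad_cols = [c for c in range(2) if (g[0][c] + g[1][c]) % 2 != col_par[c]]
    if bad_rows and bad_cols:
        g[bad_rows[0]][bad_cols[0]] ^= 1
    return g

grid, rp, cp = encode_2d("1001")
grid[0][1] ^= 1                      # introduce one error
print(correct_2d(grid, rp, cp))      # -> [[1, 0], [0, 1]], the original grid
```

This costs only 4 extra digits instead of tripling the message, which is why parity-based correction scales better than repetition.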

Lengths of codes Basic question: how many digits do we need? How many digits are needed to encode 2 characters (e.g. A, B)? How many digits are needed to encode 4 characters? How many digits are needed to encode 26 characters (A-Z)?
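
With k binary digits there are 2^k distinct codewords, so encoding n characters needs the smallest k with 2^k >= n, i.e. the ceiling of log2(n). The three questions above can be answered directly:

```python
import math

def digits_needed(n_chars: int) -> int:
    # k binary digits give 2**k distinct codewords, so we need 2**k >= n_chars.
    return math.ceil(math.log2(n_chars))

for n in (2, 4, 26):
    print(n, "characters ->", digits_needed(n), "digits")
# 2 -> 1, 4 -> 2, 26 -> 5 (since 2**4 = 16 < 26 <= 32 = 2**5)
```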


Efficiency of data transmission
Run-length encoding (RLE): reduce the number of characters transmitted (data compression)
Example: black-and-white documents
Replaces runs of duplicated characters with shorter descriptions
Common for faxed documents and files containing long runs
Size increases if runs are absent
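
RLE can be sketched as replacing each run with a (character, count) pair; the scan line below is a hypothetical example, not one from the slides:

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    # Replace each run of a repeated character by (character, run length).
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

line = "WWWWWWBBBWWWW"        # a hypothetical scan line of a black-and-white fax
packed = rle_encode(line)
print(packed)                 # [('W', 6), ('B', 3), ('W', 4)]
assert rle_decode(packed) == line
```

On text with no runs (e.g. "ABAB"), every character becomes a pair, so the "compressed" form is longer than the original, which is the size-increase caveat above.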

Efficiency of data transmission Example: two different ways of encoding Type 1: fixed number of digits used Type 2: different numbers of digits used

Efficiency of data transmission Different coding methods Fixed length code: fixed number of digits used Variable length code: different numbers of digits used

Efficiency of data transmission Variable length code Shorter code for frequently used characters: efficiency enhanced Is there anything wrong with the following code? Is there anything wrong in encoding BIT? Is there anything wrong in decoding ?

Efficiency of data transmission Variable length code Is there anything wrong with the following code? Is there anything wrong in encoding BIT? No! Is there anything wrong in decoding ? Yes! Possible multiple interpretations (BIT or FET)! Prefix property: no codeword can be a prefix of another codeword Uniquely decipherable code: code satisfying the prefix property

Efficiency of data transmission Variable length code Prefix property: no codeword can be a prefix of another codeword Uniquely decipherable code: a code satisfying the prefix property Example: the code is not uniquely decipherable, as the codeword of B is a prefix of the codeword of F (this code does not satisfy the prefix property)
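
The prefix property can be checked mechanically by comparing every pair of codewords. The codewords below are hypothetical stand-ins (the slides' tables are not in the transcript), chosen so that one codeword is a prefix of another, as in the B/F example:

```python
def has_prefix_property(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# Hypothetical: if B = "01" and F = "011", B is a prefix of F,
# so a received string like "011..." is ambiguous.
print(has_prefix_property(["01", "011", "10"]))        # -> False
print(has_prefix_property(["0", "10", "110", "111"]))  # -> True
```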

Efficiency of data transmission Variable length code Example: do these two codes satisfy the prefix property?


Efficiency of data transmission Variable length code Which of the two uniquely decipherable codes has a shorter average length?

Efficiency of data transmission
Variable length code
Which of the two uniquely decipherable codes has a shorter average length? Scheme 1!
But for the example message DELETE THE FILE, Scheme 1 needs 52 digits in total while Scheme 2 needs only 49
E is a very common (heavy) character, so frequencies (weights) are also important
The weighted average length should be considered instead

Efficiency of data transmission
Variable length code
Weighted average code length
Choice of frequency tables:
Choice #1: frequency table from the specific message
Choice #2: general frequency table for typical English passages
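
The weighted average code length is the sum of each codeword's length times its character's frequency. The lengths and frequencies below are hypothetical stand-ins, since the slides' tables are not in the transcript:

```python
def weighted_average_length(code_lengths: dict[str, int],
                            freqs: dict[str, float]) -> float:
    """Average codeword length weighted by character frequency.

    `freqs` gives each character's relative frequency (summing to 1).
    """
    return sum(code_lengths[ch] * freqs[ch] for ch in code_lengths)

# Hypothetical four-character code and frequency table:
lengths = {"E": 1, "T": 2, "L": 3, "D": 3}
freqs = {"E": 0.5, "T": 0.25, "L": 0.15, "D": 0.10}
print(weighted_average_length(lengths, freqs))
# 1*0.5 + 2*0.25 + 3*0.15 + 3*0.10 = 1.75 digits per character on average
```

Giving the shortest codeword to the most frequent character is exactly what pulls this weighted average down.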

Efficiency of data transmission Variable length code Weighted average code length (Partial) example: Morse code

Classwork: Calculate the weighted average code length for the Morse codes, using the general frequency table Answer: 2.544


Huffman code Aim: produce a code with the smallest weighted average code length for a given frequency table Basic principle: shorter codewords for more frequent characters Tool: a tree built from bottom to top with characters being the “leaves”

Huffman code Example: a code for 4 characters Step 1: combine the 2 with lowest probabilities

Huffman code Example: a code for 4 characters Step 2: combine the 2 among “D”, “E” and “LT” with lowest probabilities

Huffman code Example: a code for 4 characters Step 3: combine the 2 among “E” and “LTD” with lowest probabilities

Huffman code Example: a code for 4 characters Step 4: assign “0” to the branch with the bigger probability and “1” to the branch with the smaller probability

Huffman code Example: a code for 4 characters Step 5: read out the codewords from the top of the tree

Huffman code Example: a code for 4 characters Does a code constructed this way always satisfy the prefix property? If "11" is the codeword for D, can any other codeword begin with "11"? No: the branch for D stops at "11", so no other leaf lies below it
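
The five steps above (repeatedly merge the two lowest-probability nodes, then label the bigger branch "0" and the smaller "1") can be sketched with a priority queue. The frequency table is a hypothetical stand-in, since the slides' table is not in the transcript:

```python
import heapq
from itertools import count

def huffman_code(freqs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least frequent nodes."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, small = heapq.heappop(heap)   # smallest frequency
        f2, _, big = heapq.heappop(heap)     # second smallest
        # Prepend "1" on the smaller branch and "0" on the bigger one,
        # matching the slides' labelling convention.
        merged = {ch: "1" + w for ch, w in small.items()}
        merged.update({ch: "0" + w for ch, w in big.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies for four characters:
code = huffman_code({"E": 0.5, "T": 0.25, "L": 0.15, "D": 0.10})
print(code)   # E gets the single-digit codeword; L and D get the longest ones
```

Because every character sits at a leaf, no codeword can continue past another, so the result always satisfies the prefix property.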

Classwork: Constructing Huffman code


Constructing Huffman code

Huffman code: remarks
Multiple Huffman codes are possible for the same frequency table, with different numbers of layers
Are their weighted average code lengths the same? Yes: different Huffman codes for the same frequency table have the same weighted average code length
The smallest possible weighted average code length is guaranteed (proof out of scope)

Huffman codes: comparison


Arithmetic coding No one-to-one correspondence between characters and codewords (unlike Huffman code) The whole message is encoded as one number Example: "DELETE" encoded as a single decimal number

Arithmetic coding Example: “DELETE” encoded as (decimal number) Step 1: Divide the interval (0, 1) into portions

Arithmetic coding Example: “DELETE” encoded as (decimal number) Step 2: Choose (zoom into) portion of first character “D” and divide the portion according to the probabilities (as in Step 1)

Arithmetic coding Example: “DELETE” encoded as (decimal number) Step 3: Choose (zoom into) portion of second character “E” and divide the portion according to the probabilities

Arithmetic coding Step 4: Keep choosing (zooming into) portions in correct order and dividing the chosen portion according to the probabilities

Arithmetic coding Step 5: Choose the portion of "END" when the message ends Step 6: Choose any number within the final "END" range as the codeword for the message

Arithmetic coding
Example: decoding a received number with the frequency table
Step 1: Divide (0, 1) into portions
Step 2: Locate the number and zoom in: it falls in section D, so the first character of the message is "D"
Step 3: Repeat Steps 1 and 2; stop when the number hits "END"
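
The zooming steps above can be sketched as interval arithmetic. The probabilities below are hypothetical stand-ins for the slides' table, and for simplicity this sketch decodes by known message length instead of using an "END" symbol:

```python
def arithmetic_encode(message: str, probs: dict[str, float]) -> float:
    """Zoom into nested sub-intervals of (0, 1), one per character."""
    low, high = 0.0, 1.0
    for ch in message:
        width = high - low
        start = 0.0
        for sym, p in probs.items():     # cumulative probability before ch
            if sym == ch:
                break
            start += p
        low, high = low + start * width, low + (start + probs[ch]) * width
    return (low + high) / 2              # any number in the final interval works

def arithmetic_decode(x: float, probs: dict[str, float], length: int) -> str:
    out = []
    low, high = 0.0, 1.0
    for _ in range(length):              # zoom into whichever portion holds x
        width = high - low
        start = 0.0
        for sym, p in probs.items():
            if low + (start + p) * width > x:
                out.append(sym)
                low, high = low + start * width, low + (start + p) * width
                break
            start += p
    return "".join(out)

probs = {"D": 0.2, "E": 0.5, "L": 0.15, "T": 0.15}   # hypothetical table
x = arithmetic_encode("DELETE", probs)
print(x)
print(arithmetic_decode(x, probs, 6))    # -> "DELETE"
```

The whole message collapses into one number whose interval shrinks fastest for rare characters, which is how arithmetic coding can beat one-codeword-per-character schemes.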

Arithmetic coding


Units in daily life Examples of prefixes: Mega-pixel Nano-meter Giga-watt

SI prefixes International system of units Examples: km, mm, cm, mL Some common SI prefixes:

Units in data transmission

SI prefixes commonly used for transmission speed
Example: 56 kbps
kbps: kilobits per second
kilo (SI prefix): 1000
bit: binary digit

Binary prefixes Different from SI prefixes: same letter, different meaning 1024 used instead of 1000 Comparison:

Units in computer systems (file size)

Units in telecommunication

Example: How long does it take to download a 4 MB song via a 56K modem?
4 MB: 4 x 1024 x 1024 x 8 bits
56K modem: 56 kbps transfer rate
56 kbps: 56 x 1000 bits per second
(Minimum) time needed for downloading: ~600 seconds
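
The calculation above works out as follows (note the mixed conventions: the file size uses binary prefixes, the modem speed uses SI prefixes):

```python
# 4 MB file, binary prefix: 1 MB = 1024 * 1024 bytes, 8 bits per byte
size_bits = 4 * 1024 * 1024 * 8      # 33,554,432 bits

# 56 kbps modem, SI prefix: kilo = 1000
rate_bps = 56 * 1000                 # 56,000 bits per second

seconds = size_bits / rate_bps
print(round(seconds))                # -> 599, i.e. roughly 600 seconds (10 minutes)
```

This is a minimum: real modems add protocol overhead, so actual downloads take longer.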

Classwork 10: telecommunication

Units in hard disk packaging Confusion in units: SI prefixes are used on the packaging of hard disks/flash drives, while the true capacity of a disk, computer memory (e.g. RAM), or file size is expressed in binary prefixes

Units in hard disk packaging

Numbers in Codes -End-