CS-2852 Data Structures LECTURE 13B Andrew J. Wozniewicz Image copyright © 2010 andyjphoto.com.


CS-2852 Data Structures, Andrew J. Wozniewicz

Agenda
- Encodings
- Morse Code
- Huffman Trees

Character Encodings

UNICODE (6.0) defines several different encodings:
- UTF-8: 8 bits per character for ASCII characters; up to 4 bytes per character for other characters
- UTF-16: 2 bytes per character; some characters encoded with 4 bytes
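These byte counts are easy to verify directly; a small illustrative check (the sample characters are my own choice):

```python
# Bytes needed to encode sample characters in UTF-8 and UTF-16.
# "utf-16-le" is used so the 2-byte byte-order mark is not counted.
for ch in ["A", "é", "€", "𝄞"]:
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-le"))
    print(ch, utf8, utf16)
```

ASCII characters take 1 byte in UTF-8 but 2 in UTF-16; characters outside the Basic Multilingual Plane (like the musical symbol) take 4 bytes in both.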

Character Encodings

Standard ASCII
- 7 bits per character
- 128 distinct values

Extended ASCII
- 8 bits per character
- 256 distinct values

EBCDIC ("manifestation of purest evil" – E. Raymond)
- 8 bits per character
- IBM-mainframe specific

Fixed-Length Encoding

- Extended ASCII (and UTF-8 restricted to the ASCII range): 8 bits per character
- Always the same number of bits per symbol
- ⌈log₂ n⌉ bits per symbol suffice to distinguish among n symbols
- More efficient encoding is possible if not all characters are equally likely to appear
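A quick sanity check of the ⌈log₂ n⌉ bound (the function name is my own):

```python
import math

# Minimum bits per symbol in a fixed-length code for n distinct symbols.
def fixed_code_bits(n: int) -> int:
    return math.ceil(math.log2(n))

print(fixed_code_bits(128))  # 7 bits: standard ASCII
print(fixed_code_bits(256))  # 8 bits: extended ASCII
```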

Variable-Length Encoding

Morse Code

The length of each character's code in Morse is approximately inversely proportional to its frequency of occurrence in English. The most common letter in English, "E", has the shortest code: a single dot.
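A few entries make the pattern visible: frequent English letters get one-symbol codes, rare ones get four (the selection of letters here is my own):

```python
# Morse code lengths track English letter frequency:
# common letters (E, T) get short codes, rare letters (Q, Z, J) long ones.
morse = {"E": ".", "T": "-", "A": ".-", "N": "-.",
         "Q": "--.-", "Z": "--..", "J": ".---"}
for letter, code in morse.items():
    print(letter, code, len(code))
```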

Prefix Codes

- Design the code so that no complete code for any symbol is the beginning (prefix) of the code for any other symbol.
- If the relative frequencies of symbols in a message are known, an efficient prefix encoding can be found.
- Huffman encoding is a prefix code.
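The prefix property is what makes decoding unambiguous: scan bits left to right, and the moment the buffer matches a complete code, emit that symbol and start over. A minimal sketch (the code table is an illustrative example, not from the slides):

```python
# Decode a bit string using a prefix-free code table.
# Because no code is a prefix of another, the greedy scan
# recovers exactly one parse of the message.
code = {"0": "E", "10": "T", "110": "A", "111": "N"}  # example prefix code

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in code:       # buffer matches a complete code
            out.append(code[buf])
            buf = ""
    return "".join(out)

print(decode("0101100"))  # ETAE
```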

Huffman Encoding

- Lossless data compression algorithm
- "Minimum-redundancy" code, devised by David Huffman (1952)
- Produces a variable-length code table
- Works by building a binary tree of nodes: the symbols with the lowest frequencies end up farthest from the root.
- The tree itself can be efficiently encoded and attached to the message to enable decoding.

Example of a Huffman Tree (left = 0, right = 1)

Huffman Algorithm

1. Begin with the set of leaf nodes, containing the symbols and their frequencies.
2. Find the two nodes with the lowest weights and merge them into a new node that has these two nodes as its left and right branches. The weight of the new node is the sum of the two weights.
3. Remove the two merged nodes from the set and replace them with the new node.
4. Repeat until a single node (the root) remains.
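These steps can be sketched compactly with a min-heap; the frequencies below match the tree-building example on the next slide (the function and variable names are my own):

```python
import heapq

# Build a Huffman code table from symbol frequencies.
# Heap entries are (weight, tiebreak, tree); the unique tiebreak keeps
# tuple comparison from reaching the non-comparable tree element.
def huffman_codes(freqs: dict) -> dict:
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lowest weights
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")      # left edge = 0
            walk(node[1], prefix + "1")      # right edge = 1
        else:                                # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 8, "B": 3, "C": 1, "D": 1,
                       "E": 1, "F": 1, "G": 1, "H": 1})
print(codes)
```

With these frequencies, A (the most frequent symbol) gets a 1-bit code, while the rare symbols get 4-bit codes.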

Example of Huffman Tree-Building

Initial leaves: {(A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1)}
Merge:          {(A 8) (B 3) ({C D} 2) (E 1) (F 1) (G 1) (H 1)}
Merge:          {(A 8) (B 3) ({C D} 2) ({E F} 2) (G 1) (H 1)}
Merge:          {(A 8) (B 3) ({C D} 2) ({E F} 2) ({G H} 2)}
Merge:          {(A 8) (B 3) ({C D} 2) ({E F G H} 4)}
Merge:          {(A 8) ({B C D} 5) ({E F G H} 4)}
Merge:          {(A 8) ({B C D E F G H} 9)}
Final merge:    {({A B C D E F G H} 17)}
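For this 17-character message, a fixed-length code would need ⌈log₂ 8⌉ = 3 bits per symbol, or 51 bits total. The Huffman tree above does better; reading each symbol's depth off the final tree gives the weighted code length (this arithmetic is mine, added to quantify the savings):

```python
# Depths read off the tree built above: A is merged last (depth 1),
# B sits under {B C D} (depth 3), the six rare symbols at depth 4.
freqs  = {"A": 8, "B": 3, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1}
depths = {"A": 1, "B": 3, "C": 4, "D": 4, "E": 4, "F": 4, "G": 4, "H": 4}

huffman_bits = sum(freqs[s] * depths[s] for s in freqs)
fixed_bits = sum(freqs.values()) * 3    # 3 bits/symbol fixed-length
print(huffman_bits, fixed_bits)  # 41 51
```

The Huffman encoding saves about 20% on even this tiny, skewed example.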

Summary
- Encodings
- Morse Code
- Huffman Trees

Questions? Image copyright © 2010 andyjphoto.com