Functional Programming Lecture 15 - Case Study: Huffman Codes


The Problem

Design a coding/decoding scheme and implement it in Haskell. This requires:
- an algorithm to encode a message,
- an algorithm to decode a message,
- an implementation.

Fixed and Variable Length Codes

A fixed length code assigns the same number of bits to each code word. E.g. ASCII assigns 7 bits per letter (up to 128 code words), so to encode the string "at" we need 14 bits.

A variable length code assigns a different number of bits to each code word, depending on the frequency of the code word: frequent words are assigned short codes; infrequent words are assigned long codes. E.g. with the tree on the next slide (0 for "go left", 1 for "go right"), a is encoded by 0 and t by 11, so "at" is encoded by 011: only 3 bits. The same tree is used to encode and to decode.
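A quick check of the arithmetic, as a minimal sketch; the code lengths (a: 1 bit, b: 2 bits, t: 2 bits) come from the tree on the next slide, and the names fixedBits, codeLengths and varBits are illustrative:

-- Fixed-length: 7 bits per character, regardless of frequency.
fixedBits :: String -> Int
fixedBits msg = 7 * length msg

-- Code length in bits for each character of our small alphabet.
codeLengths :: [(Char, Int)]
codeLengths = [('a', 1), ('b', 2), ('t', 2)]

-- Variable-length: sum the code lengths of the characters.
varBits :: String -> Int
varBits msg = sum [n | c <- msg, (c', n) <- codeLengths, c == c']

-- fixedBits "at" == 14, varBits "at" == 3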

Coding

The code tree:

        .
     0 / \ 1
      a   .
       0 / \ 1
        b   t

a is encoded by 1 bit: 0
b is encoded by 2 bits: 10
t is encoded by 2 bits: 11

An important property of a Huffman code is that the codes are prefix codes: no code of a letter (code word) is the prefix of the code of another letter (code word). E.g. 0 is not a prefix of 10 or 11; 10 is not a prefix of 0 or 11; 11 is not a prefix of 0 or 10. So aa is encoded by 00, and ba is encoded by 100.
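The prefix property can also be checked mechanically. A small sketch, writing codes as strings of 0s and 1s for brevity (prefixFree is an illustrative name, not part of the lecture's code):

import Data.List (isPrefixOf)

-- True iff no code word is a prefix of a different code word.
prefixFree :: [String] -> Bool
prefixFree cs = and [not (c1 `isPrefixOf` c2) | c1 <- cs, c2 <- cs, c1 /= c2]

-- prefixFree ["0", "10", "11"] == True    -- the codes of a, b, t
-- prefixFree ["0", "01"]       == False   -- 0 is a prefix of 01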

Decoding

Using the same tree, the encoded message 10011011 is decoded as:
10 - b
0 - a
11 - t
0 - a
11 - t
i.e. "batat". In view of the frequency of t, this is probably not a good code: t should be encoded by 1 bit!

P.S. Morse code is based on the same idea: the most frequent letters get the shortest code words.

A Haskell Implementation

Types:

-- Codes
data Bit = L | R deriving (Eq, Show)

type Hcode = [Bit]

-- Huffman coding tree: characters (plus frequencies) at leaf nodes,
-- frequencies as well at internal nodes.
data Tree = Leaf Char Int | Node Int Tree Tree

Assume that codes are kept in a table (rather than read off a tree):

-- Table of codes
type Table = [(Char, Hcode)]
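For concreteness, here are values of these types for the three-letter alphabet above; the names exampleTree and exampleTable are illustrative (the same tree reappears on the Example slide below):

exampleTree :: Tree
exampleTree = Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))

exampleTable :: Table
exampleTable = [('a', [L]), ('b', [R,L]), ('t', [R,R])]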

Encoding

-- Encode a message according to the code table:
-- encode each character and concatenate the results.
codeMessage :: Table -> [Char] -> Hcode
codeMessage tbl = concat . map (lookupTable tbl)

-- Look up the code for a character in the code table.
lookupTable :: Table -> Char -> Hcode
lookupTable [] c = error "lookupTable: character not in table"
lookupTable ((ch, code):tbl) c
  | ch == c   = code
  | otherwise = lookupTable tbl c
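For example, with the exampleTable defined above in scope (a GHCi session):

ghci> codeMessage exampleTable "bat"
[R,L,L,R,R]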

Decoding

-- Decode a message according to the code tree:
-- at an internal node, follow the sub-tree given by the next code bit;
-- at a leaf node, a character is decoded, so start again at the root.
decode :: Tree -> Hcode -> [Char]
decode tr = decodetree tr
  where
    decodetree (Node f t1 t2) (L:rest) = decodetree t1 rest
    decodetree (Node f t1 t2) (R:rest) = decodetree t2 rest
    decodetree (Leaf ch f)    rest     = ch : decodetree tr rest
    decodetree _              []       = []  -- no bits left: the message is fully decoded
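Encoding and decoding are inverses of each other; a quick round-trip check with the example definitions above in scope:

ghci> decode exampleTree (codeMessage exampleTable "bat")
"bat"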

Example

codetree = Node 3 (Leaf 'a' 0) (Node 3 (Leaf 'b' 1) (Leaf 't' 2))
-- assume a is most frequent, denoted by the smallest number

message = [R,L,L,R,R,R,R,L,R,R]

decode codetree message
=> decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) [R,L,L,R,R,R,R,L,R,R]    -- R: go right
=> decodetree (Node 3 (Leaf 'b' 1) (Leaf 't' 2)) [L,L,R,R,R,R,L,R,R]     -- L: go left
=> decodetree (Leaf 'b' 1) [L,R,R,R,R,L,R,R]                             -- leaf: emit 'b', restart at root
=> 'b' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) [L,R,R,R,R,L,R,R]  -- L: go left
=> 'b' : decodetree (Leaf 'a' 0) [R,R,R,R,L,R,R]                         -- leaf: emit 'a', restart at root
=> 'b' : 'a' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..)) [R,R,R,R,L,R,R]
=> ...
=> "battat"

We still have to make:
- the code tree,
- the code table.
(Next lecture!)
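For reference, a minimal sketch of the standard greedy construction: start with one leaf per character, then repeatedly merge the two trees of smallest total frequency. The names makeTree and codeTable are assumptions; the next lecture may structure this differently.

import Data.List (insertBy, sortOn)
import Data.Ord (comparing)

frequency :: Tree -> Int
frequency (Leaf _ f)   = f
frequency (Node f _ _) = f

-- Build a Huffman tree from (character, frequency) pairs
-- by repeatedly merging the two lightest trees.
-- Assumes a non-empty alphabet.
makeTree :: [(Char, Int)] -> Tree
makeTree = combine . sortOn frequency . map (uncurry Leaf)
  where
    combine [t]        = t
    combine (t1:t2:ts) =
      combine (insertBy (comparing frequency)
                        (Node (frequency t1 + frequency t2) t1 t2) ts)

-- Read the code table off the tree: L into the left sub-tree, R into the right.
codeTable :: Tree -> Table
codeTable (Leaf c _)     = [(c, [])]
codeTable (Node _ t1 t2) = [(c, L:code) | (c, code) <- codeTable t1]
                        ++ [(c, R:code) | (c, code) <- codeTable t2]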