Functional Programming Lecture 16 - Case Study: Huffman Codes (cont.)


Frequencies to Code Tree

How do we build a code tree with optimal codes?

1. Find the frequencies of the individual characters.
   i)   scan the message and pair each character with a count of 1,
        e.g. [('b',1),('a',1),('t',1),('t',1),('a',1),('t',1)] from "battat"
   ii)  bring together the counts for each character, sorting alphabetically,
        e.g. [('a',2),('b',1),('t',3)]
   iii) then sort on frequency,
        e.g. [('b',1),('a',2),('t',3)]

2. Build up the tree by repeatedly taking the two trees with the smallest frequencies and combining them into a single tree.
   i)  build a leaf node from each character and sort,
       e.g. [Leaf 'b' 1, Leaf 'a' 2, Leaf 't' 3]
   ii) amalgamate pairwise, according to smallest frequencies, until a single tree is built,
       e.g.  [Leaf 'b' 1, Leaf 'a' 2, Leaf 't' 3]
       then  [Node 3 (Leaf 'b' 1) (Leaf 'a' 2), Leaf 't' 3]
       then  [Node 6 (Node 3 (Leaf 'b' 1) (Leaf 'a' 2)) (Leaf 't' 3)]

Tree to Table

The tree built above,

        .
       / \
      .   t
     / \
    b   a

can be turned into a table:

    b   [L,L]
    a   [L,R]
    t   [R]

In Haskell, a list of pairs: [('b',[L,L]), ('a',[L,R]), ('t',[R])]
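The types used here carry over from the previous lecture. For reference, a minimal sketch of the declarations in the style of the book (the exact details in Lecture 15 may differ):

data Bit = L | R
           deriving (Eq, Show)

type Hcode = [Bit]

type Table = [(Char, Hcode)]

data Tree = Leaf Char Int
          | Node Int Tree Tree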

Frequencies

Finding frequencies involves:
  - counting each character,
  - adding together the counts,
  - sorting the list.

This means we sort twice:
  - to bring together the counts,
  - to sort on frequency.

So, we will define a general mergeSort and reuse it:

frequency :: [Char] -> [(Char,Int)]
-- count each character, add together counts, then sort the list --
frequency = mergeSort freqMerge . mergeSort alphaMerge . map start
            where
            start ch = (ch,1)
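For example, on the message used earlier (the evaluation is worked out by hand, following the steps above):

frequency "battat"
  = mergeSort freqMerge (mergeSort alphaMerge [('b',1),('a',1),('t',1),('t',1),('a',1),('t',1)])
  = mergeSort freqMerge [('a',2),('b',1),('t',3)]
  = [('b',1),('a',2),('t',3)]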

mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
-- merge, in order, the results of sorting the front and rear halves of the list --
mergeSort merge xs
  | length xs < 2 = xs
  | otherwise     = merge (mergeSort merge first) (mergeSort merge second)
    where
    first  = take half xs
    second = drop half xs
    half   = length xs `div` 2
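Because mergeSort is parameterised on the merging function, it can sort by any ordering. As a quick illustration, here is a standard merge (a hypothetical helper for this example only, not part of the lecture's code):

-- merge two sorted lists into one sorted list, ascending
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- mergeSort merge [3,1,2]  evaluates to  [1,2,3]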

Now, the two different merging functions:

-- compare characters, amalgamate entries for the same character --
alphaMerge :: [(Char,Int)] -> [(Char,Int)] -> [(Char,Int)]
alphaMerge xs [] = xs
alphaMerge [] ys = ys
alphaMerge ((p,n):xs) ((q,m):ys)
  | p == q    = (p,n+m) : alphaMerge xs ys
  | p < q     = (p,n)   : alphaMerge xs ((q,m):ys)
  | otherwise = (q,m)   : alphaMerge ((p,n):xs) ys

-- compare frequencies, sort on frequency --
freqMerge :: [(Char,Int)] -> [(Char,Int)] -> [(Char,Int)]
freqMerge xs [] = xs
freqMerge [] ys = ys
freqMerge ((p,n):xs) ((q,m):ys)
  | n < m || (n == m && p < q) = (p,n) : freqMerge xs ((q,m):ys)
  | otherwise                  = (q,m) : freqMerge ((p,n):xs) ys
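To see the difference between the two: alphaMerge combines duplicate characters, while freqMerge orders by count (both results worked out by hand):

alphaMerge [('a',1),('t',1)] [('b',1),('t',2)]
  = [('a',1),('b',1),('t',3)]        -- entries for 't' amalgamated

freqMerge [('b',1),('t',3)] [('a',2)]
  = [('b',1),('a',2),('t',3)]        -- ascending by frequency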

Make Code Tree

Make the code tree from the list of frequencies, in two stages: turn each character-frequency pair into a tree, then successively pair trees until one tree remains.

makeTree :: [(Char,Int)] -> Tree
makeTree = makeCodes . toTreeList

-- turn each character-frequency pair into a tree --
toTreeList :: [(Char,Int)] -> [Tree]
toTreeList = map (uncurry Leaf)

How does this work? By treating a constructor as a function!

data Tree = Leaf Char Int | Node …

can be viewed as defining a function

Leaf :: Char -> Int -> Tree

Similarly, data Bool = False | True defines the two (constant) constructors False :: Bool and True :: Bool.
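So, on the sorted frequency list from before:

toTreeList [('b',1),('a',2),('t',3)]
  = map (uncurry Leaf) [('b',1),('a',2),('t',3)]
  = [Leaf 'b' 1, Leaf 'a' 2, Leaf 't' 3]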

-- successively pair trees --
makeCodes :: [Tree] -> Tree
makeCodes [t] = t
makeCodes ts  = makeCodes (amalgamate ts)

-- amalgamate trees: pair together the first two in the list, then insert the result into the list in ascending order --
amalgamate :: [Tree] -> [Tree]
amalgamate (t1:t2:ts) = insTree (pair t1 t2) ts

-- pair trees, combining frequency counts --
pair :: Tree -> Tree -> Tree
pair t1 t2 = Node (v1+v2) t1 t2
             where
             v1 = value t1
             v2 = value t2

value :: Tree -> Int
value (Leaf _ n)   = n
value (Node n _ _) = n

-- insert a tree into a list sorted by frequency --
insTree :: Tree -> [Tree] -> [Tree]
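The slide leaves insTree at its type signature. A minimal sketch of a definition that keeps the list in ascending order of frequency (an assumption; the lecture does not show the body):

-- sketch: walk along the sorted list until the new tree's value fits
insTree :: Tree -> [Tree] -> [Tree]
insTree t []      = [t]
insTree t (t':ts)
  | value t <= value t' = t : t' : ts
  | otherwise           = t' : insTree t ts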

Code Table

Finally, convert a Huffman tree into a table of codes.

codeTable :: Tree -> Table
codeTable = convert []

-- build up a table of (character, code) pairs; the code describes the path "so far" to a character --
convert :: Hcode -> Tree -> Table
convert code (Leaf ch f)    = [(ch,code)]
convert code (Node f t1 t2) = convert (code ++ [L]) t1 ++ convert (code ++ [R]) t2

E.g.

codeTable (Node 6 (Node 3 (Leaf 'b' 1) (Leaf 'a' 2)) (Leaf 't' 3))
  = convert [] (Node 6 (Node 3 (Leaf 'b' 1) (Leaf 'a' 2)) (Leaf 't' 3))
  = convert [L] (Node 3 (Leaf 'b' 1) (Leaf 'a' 2)) ++ convert [R] (Leaf 't' 3)
  = convert [L,L] (Leaf 'b' 1) ++ convert [L,R] (Leaf 'a' 2) ++ convert [R] (Leaf 't' 3)
  = [('b',[L,L])] ++ [('a',[L,R])] ++ [('t',[R])]
  = [('b',[L,L]), ('a',[L,R]), ('t',[R])]

This hasn't been easy! Read the Book, Section 15.3 onwards (to the end of the chapter).
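Putting the whole pipeline together (a sketch; codes is just a hypothetical name for the result):

-- from message to code table in one go
codes :: Table
codes = codeTable (makeTree (frequency "battat"))
-- codes  evaluates to  [('b',[L,L]), ('a',[L,R]), ('t',[R])]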

Note: Haskell provides mechanisms for dealing with programming in the large, i.e. modularity, but these are not covered in this course.