Optimal Merging Of Runs


Optimal Merging Of Runs [figure: two binary merge trees over runs of sizes 4, 3, 6, 9; the balanced tree has merge cost 44, the skewed tree has merge cost 42] Which merge sequence is best? The level scheme described earlier is optimal only when we start with equal-size runs. The new scheme runs into problems when the input runs are on different disks: we may need to move some runs from one disk to another before the run-merging phase.

Weighted External Path Length WEPL(T) = Σ (weight of external node i) * (distance of node i from root of T). [figure: balanced merge tree with external nodes 4, 3, 6, 9, each at depth 2] WEPL(T) = 4*2 + 3*2 + 6*2 + 9*2 = 44 = merge cost.

Weighted External Path Length WEPL(T) = Σ (weight of external node i) * (distance of node i from root of T). [figure: skewed merge tree with external nodes 4 and 3 at depth 3, 6 at depth 2, 9 at depth 1] WEPL(T) = 4*3 + 3*3 + 6*2 + 9*1 = 42 = merge cost. Goal: find the binary tree with minimum WEPL.
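The WEPL computation above is easy to check mechanically. A minimal Python sketch (not from the slides); each tree is summarized as (weight, depth) pairs for its external nodes, with the values taken from the two examples:

```python
def wepl(leaves):
    """Weighted external path length: sum of weight * depth
    over all external (leaf) nodes of the tree."""
    return sum(weight * depth for weight, depth in leaves)

# Balanced tree: all four runs at depth 2.
balanced = [(4, 2), (3, 2), (6, 2), (9, 2)]
# Skewed tree: 4 and 3 at depth 3, 6 at depth 2, 9 at depth 1.
skewed = [(4, 3), (3, 3), (6, 2), (9, 1)]

print(wepl(balanced))  # 44
print(wepl(skewed))    # 42
```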

Other Applications Message coding and decoding. Lossless data compression.

Message Coding & Decoding Messages M0, M1, M2, …, M(n-1) are to be transmitted. The messages do not change, and both sender and receiver know them. So, it is adequate to transmit a code that identifies the message (e.g., the message index). Mi is sent with frequency fi. Select message codes so as to minimize transmission and decoding times. The messages could be text messages, as in wire (telegram) services.

Example n = 4 messages. The frequencies are [2, 4, 8, 100]. Use 2-bit codes [00, 01, 10, 11]. Transmission cost = 2*2 + 4*2 + 8*2 + 100*2 = 228. Decoding is done using a binary tree.

Example [figure: decode tree with leaves M0, M1, M2, M3, each at depth 2] Decoding cost = 2*2 + 4*2 + 8*2 + 100*2 = 228 = transmission cost = WEPL.

Example Every binary tree with n external nodes defines a code set for n messages. [figure: skewed decode tree with leaves M0 (weight 2) and M1 (weight 4) at depth 3, M2 (weight 8) at depth 2, M3 (weight 100) at depth 1] Decoding cost = 2*3 + 4*3 + 8*2 + 100*1 = 134 = transmission cost = WEPL.

Another Example [figure: two larger decode trees, one with leaves M0 through M8 and one with leaves M0 through M9] No code is a prefix of another!

Lossless Data Compression Alphabet = {a, b, c, d}. String with 10 as, 5 bs, 100 cs, and 900 ds. Use a 2-bit code. a = 00, b = 01, c = 10, d = 11. Size of string = 10*2 + 5*2 + 100*2 + 900*2 = 2030 bits. Plus size of code table.

Lossless Data Compression Use a variable length code that satisfies prefix property (no code is a prefix of another). a = 000, b = 001, c = 01, d = 1. Size of string = 10*3 + 5*3 + 100*2 + 900*1 = 1145 bits. Plus size of code table. Compression ratio is approx. 2030/1145 = 1.8.
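The two size computations above can be reproduced directly. A small Python sketch (not from the slides) comparing the fixed 2-bit code against the variable-length prefix code a = 000, b = 001, c = 01, d = 1:

```python
# Character counts from the example string.
counts = {"a": 10, "b": 5, "c": 100, "d": 900}

# Code lengths in bits under each scheme.
fixed_bits = {ch: 2 for ch in counts}             # a=00, b=01, c=10, d=11
variable_bits = {"a": 3, "b": 3, "c": 2, "d": 1}  # a=000, b=001, c=01, d=1

fixed_size = sum(counts[ch] * fixed_bits[ch] for ch in counts)
variable_size = sum(counts[ch] * variable_bits[ch] for ch in counts)

print(fixed_size)                                # 2030
print(variable_size)                             # 1145
print(round(fixed_size / variable_size, 1))      # 1.8
```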

Lossless Data Compression [figure: decode tree for a = 000, b = 001, c = 01, d = 1] The size of the coded/compressed string is the WEPL of the decode tree. Decode 0001100101… as addbc… The compression ratio is maximized when the decode tree has minimum WEPL.
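Because no code is a prefix of another, decoding never needs lookahead: we extend the current bit buffer until it matches exactly one code. A minimal sketch of this idea (a table walk rather than an explicit tree, which is equivalent for a prefix code):

```python
# Prefix code from the slide: a=000, b=001, c=01, d=1.
codes = {"000": "a", "001": "b", "01": "c", "1": "d"}

def decode(bits, codes):
    """Greedy prefix-code decoding: since no code is a prefix of
    another, the first buffer that matches a code is the right one."""
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in codes:
            out.append(codes[buf])
            buf = ""
    return "".join(out)

print(decode("0001100101", codes))  # addbc
```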

Huffman Trees Trees that have minimum WEPL. Binary trees with minimum WEPL may be constructed using a greedy algorithm. For higher order trees with minimum WEPL, a preprocessing step followed by the greedy algorithm may be used. Huffman codes: codes defined by minimum WEPL trees.

Greedy Algorithm For Binary Trees Start with a collection of external nodes, one for each of the given weights; each external node is itself a tree. Reduce step (decreases the number of trees by 1): select the 2 trees with minimum weight and combine them by making them the children of a new root node; the weight of the new tree is the sum of the weights of the two trees; add the new tree to the collection. Repeat the reduce step until only 1 tree remains.
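The reduce loop above maps directly onto a min heap. A minimal Python sketch using the standard `heapq` module (an index is carried in each heap entry only to break weight ties without comparing trees):

```python
import heapq

def huffman_merge(weights):
    """Greedy binary merging: repeatedly combine the two lightest
    trees.  Returns (total merge cost, tree); the cost equals the
    WEPL of the resulting tree.  Trees are nested (left, right) tuples."""
    heap = [(w, i, w) for i, w in enumerate(weights)]  # (weight, tiebreak, tree)
    heapq.heapify(heap)
    cost, counter = 0, len(weights)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # lightest tree
        w2, _, t2 = heapq.heappop(heap)   # second lightest
        cost += w1 + w2                   # cost of this merge
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))
        counter += 1
    return cost, heap[0][2]

cost, tree = huffman_merge([2, 5, 4, 7, 9])
print(cost)  # 60
```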

Example n = 5, w[0:4] = [2, 5, 4, 7, 9]. [figure: the greedy construction shown step by step] Step 1: combine 2 and 4 into a tree of weight 6, leaving {5, 6, 7, 9}. Step 2: combine 5 and 6 into 11, leaving {7, 9, 11}. Step 3: combine 7 and 9 into 16, leaving {11, 16}. Step 4: combine 11 and 16 into the final tree of weight 27.

Data Structure For Tree Collection The operations are: initialize with n trees; remove the 2 trees with least weight; insert a new tree. Use a min heap. Initialize … O(n). 2(n – 1) remove-min operations … O(n log n). n – 1 insert operations … O(n log n). Total time is O(n log n). Alternatively, use (n – 1) remove mins and (n – 1) change mins (replace the heap's min with the combined weight instead of doing a separate insert).

Higher Order Trees The greedy scheme doesn't work! Example: 3-way merging of runs with weights [3, 6, 1, 9]. Optimal tree: combine 1 and 3 into 4, then combine 4, 6, and 9 into 19; cost = 4 + 19 = 23. Greedy tree: combine 1, 3, and 6 into 10, then combine 10 and 9 into 19; cost = 10 + 19 = 29.
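The two costs in this counterexample are easy to check by hand or in code, since the merge cost is just the sum of the weights of the merged (internal) nodes. A quick numeric sketch:

```python
# Greedy on [3, 6, 1, 9]: merge the three lightest runs (1, 3, 6)
# into 10, then merge 10 and 9 into 19.
greedy_cost = (1 + 3 + 6) + (10 + 9)

# Optimal: pad with one length-0 run so that every merge is 3-way:
# merge (0, 1, 3) into 4, then merge (4, 6, 9) into 19.
optimal_cost = (0 + 1 + 3) + (4 + 6 + 9)

print(greedy_cost)   # 29
print(optimal_cost)  # 23
```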

Cause Of Failure In the greedy tree, one node is not a 3-way node: its root merges only 2 runs. A 2-way node is like a 3-way node one of whose children has a weight of 0. Fix: start with enough runs/weights of length 0 so that all internal nodes are 3-way nodes.

How Many Length 0 Runs To Add? k-way tree, k > 1. The initial number of runs is r. Add the least q >= 0 runs of length 0. Each k-way merge reduces the number of runs by k – 1, so the number of runs after s k-way merges is r + q – s(k – 1). For some positive integer s, the number of remaining runs must become 1.

How Many Length 0 Runs To Add? So, we want r + q – s(k–1) = 1 for some positive integer s. So, r + q – 1 = s(k – 1). Or, (r + q – 1) mod (k – 1) = 0. Or, r + q – 1 is divisible by k – 1. This implies that q < k – 1. (r – 1) mod (k – 1) = 0 => q = 0. (r – 1) mod (k – 1) != 0 => q = k –1 – (r – 1) mod (k – 1). Or, q = (1 – r) mod (k – 1). May simply use next to last formula if you wish. Last formula RHS evaluates first to a –ve number but must add k-1 to get into the permissible 0 to k-1 range.

Examples k = 2: q = (1 – r) mod (k – 1) = (1 – r) mod 1 = 0, so no runs of length 0 are ever added; indeed, (r – 1) mod (k – 1) = 0 for all r when k = 2. k = 4, r = 6: q = (1 – r) mod (k – 1) = (1 – 6) mod 3 = (–5) mod 3 = (6 – 5) mod 3 = 1. So we must start with 7 runs, and then apply the greedy method.