Representing Sets (2.3.3) Huffman Encoding Trees (2.3.4)

Slides:



Advertisements
Similar presentations
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Advertisements

Lecture 4 (week 2) Source Coding and Compression
Algorithm Design Techniques: Greedy Algorithms. Introduction Algorithm Design Techniques –Design of algorithms –Algorithms commonly used to solve problems.
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Functional Programming. Pure Functional Programming Computation is largely performed by applying functions to values. The value of an expression depends.
Greedy Algorithms Amihood Amir Bar-Ilan University.
Huffman Encoding Dr. Bernard Chen Ph.D. University of Central Arkansas.
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
3 -1 Chapter 3 The Greedy Method 3 -2 The greedy method Suppose that a problem can be solved by a sequence of decisions. The greedy method has that each.
מבוא מורחב - שיעור 10 1 Symbols Manipulating lists and trees of symbols: symbolic differentiation Lecture 10.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter 26 Binary Search Trees.
Is ASCII the only way? For computers to do anything (besides sit on a desk and collect dust) they need two things: 1. PROGRAMS 2. DATA A program is a.
מבוא מורחב - שיעור 81 Lecture 8 Lists and list operations (continue).
Data Structures – LECTURE 10 Huffman coding
Chapter 9: Huffman Codes
4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd.
Huffman Codes Message consisting of five characters: a, b, c, d,e
CS 46B: Introduction to Data Structures July 30 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak.
Data Structures Arrays both single and multiple dimensions Stacks Queues Trees Linked Lists.
© The McGraw-Hill Companies, Inc., Chapter 3 The Greedy Method.
Huffman Codes. Encoding messages  Encode a message composed of a string of characters  Codes used by computer systems  ASCII uses 8 bits per character.
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
CS-2852 Data Structures LECTURE 13B Andrew J. Wozniewicz Image copyright © 2010 andyjphoto.com.
4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd.
Data Structures and Algorithms Lecture (BinaryTrees) Instructor: Quratulain.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
Introduction to Algorithms Chapter 16: Greedy Algorithms.
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
Prof. Amr Goneid, AUC1 Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid Department of Computer Science, AUC Part 8. Greedy Algorithms.
Symbol Tables and Search Trees CSE 2320 – Algorithms and Data Structures Vassilis Athitsos University of Texas at Arlington 1.
Trees (Revisited) CHAPTER 15 6/30/15 Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights.
1 You’re Invited! Course VI Freshman Open House! Friday, April 7, :30-5:00 PM FREE Course VI T-Shirts (while supplies last) and Department.
Spring 2004Programming Development Techniques 1 Topic 11 Sets and their Representation April 2004.
Data Abstraction: Sets Binary Search Trees CMSC Introduction to Computer Programming October 30, 2002.
Huffman Codes Juan A. Rodriguez CS 326 5/13/2003.
© Copyright 2012 by Pearson Education, Inc. All Rights Reserved. 1 Chapter 19 Binary Search Trees.
Huffman’s Algorithm 11/02/ Weighted 2-tree A weighted 2-tree T is an extended binary tree with n external nodes and each of the external nodes is.
Foundation of Computing Systems
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
1 Huffman Codes. 2 ASCII use same size encoding for all characters. Variable length codes can produce shorter messages than fixed length codes Huffman.
Huffman encoding.
מבוא מורחב 1 Representing lists as binary trees
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 18.
CS 152: Programming Language Paradigms February 12 Class Meeting Department of Computer Science San Jose State University Spring 2014 Instructor: Ron Mak.
Lecture on Data Structures(Trees). Prepared by, Jesmin Akhter, Lecturer, IIT,JU 2 Properties of Heaps ◈ Heaps are binary trees that are ordered.
Data Abstraction: Sets
HUFFMAN CODES.
B/B+ Trees 4.7.
4.8 Huffman Codes These lecture slides are supplied by Mathijs de Weerd.
Chapter 25 Binary Search Trees
Top 50 Data Structures Interview Questions
Chapter 5 : Trees.
The Greedy Method and Text Compression
Lecture 14 - Environment Model (cont.) - Mutation - Stacks and Queues
The Greedy Method and Text Compression
Chapter 8 – Binary Search Tree
Chapter 9: Huffman Codes
Analysis & Design of Algorithms (CSCE 321)
Chapter 11 Data Compression
Huffman Coding CSE 373 Data Structures.
Binary Search Trees.
Greedy Algorithms TOPICS Greedy Strategy Activity Selection
Trees Addenda.
Data Structure and Algorithms
Lecture # , , , , מבוא מורחב.
CSE 589 Applied Algorithms Spring 1999
Algorithms CSCI 235, Spring 2019 Lecture 31 Huffman Codes
Analysis of Algorithms CS 477/677
Presentation transcript:

Representing Sets (2.3.3) Huffman Encoding Trees (2.3.4) Lecture 11 Representing Sets (2.3.3) Huffman Encoding Trees (2.3.4) מבוא מורחב - שיעור 11

The Abstract Data Type Set A set is a collection of distinct items. The following are allowable set operations. (empty-set? set) (make-empty-set) (element-of-set? x set) (adjoin-set x set) (union-set s1 s2) (intersection-set s1 s2) Contract: the operations have the usual meaning of set operations. Set  Boolean  Set Item × Set  Boolean Item × Set  Set Set × Set  Set מבוא מורחב - שיעור 11

Implementing Sets Must decide on a Representation How a set is represented. Then must write an implementation Write the code of the operations. Each operation can be written separately We will take a look at several alternative representations/implementations מבוא מורחב - שיעור 11

Version 1: Unordered List Representation: a set is represented as a list. No duplicates are allowed. Implementation: (define (empty-set? set)(null? set)) (define (make-empty-set)'()) מבוא מורחב - שיעור 11

Version 1: element-of-set? (define (element-of-set? x set) (cond ((null? set) #f) ((equal? x (car set)) #t) (else (element-of-set? x (cdr set))))) equal? : Like eq? for symbols. Compares the contents of lists and trees. Can be applied for numbers and strings. (eq? (list 'a 'b) (list 'a 'b)) (equal? (list 'a 'b) (list 'a 'b))  #f  #t מבוא מורחב - שיעור 11

Version 1: Adjoin-set (define (adjoin-set x set) (if (element-of-set? x set) set (cons x set))) מבוא מורחב - שיעור 11

Version 1: Intersection (define (intersection-set set1 set2) (cond ((or (null? set1) (null? set2)) '()) ((element-of-set? (car set1) set2) (cons (car set1) (intersection-set (cdr set1) set2))) (else (intersection-set (cdr set1) set2)))) Or is this better? (define (intersection-set set1 set2) (cond ((or (null? set1) (null? set2)) '()) ((element-of-set? (car set1) set2) (adjoin-set (car set1) (intersection-set (cdr set1) set2))) (else (intersection-set (cdr set1) set2)))) מבוא מורחב - שיעור 11

Version 1: Union Is the alternative better this time? (define (union-set set1 set2) (cond ((null? set1) set2)) ((not (element-of-set? (car set1) set2)) (cons (car set1) (union-set (cdr set1) set2))) (else (union-set (cdr set1) set2)))) Is the alternative better this time? (define (union-set set1 set2) (cond ((null? set1) set2)) (else (adjoin-set (car set1) (union-set (cdr set1) set2))))) מבוא מורחב - שיעור 11

Version 1: Time Complexity Suppose a set contains n elements Element-of-set Adjoin-set Intersection-set Union-set (n) (n) (n2) (n2) מבוא מורחב - שיעור 11

Version 2: Ordered List Representation: a set (of numbers) is represented as an ordered list without duplicates. (define (element-of-set? x set) (cond ((null? set) #f) ((= x (car set)) #t) ((< x (car set)) #f) (else (element-of-set? x (cdr set))))) Time complexity: (n) You will implement adjoin-set yourself. empty-set? and make-empty-set are the same. מבוא מורחב - שיעור 11

Version 2: Intersection intersection-set from version 1 will work here. (define (intersection-set set1 set2) (cond ((or (null? set1) (null? set2)) '()) ((element-of-set? (car set1) set2) (cons (car set1) (intersection-set (cdr set1) set2))) (else (intersection-set (cdr set1) set2)))) But, its complexity is (n2). Can we do it better ? מבוא מורחב - שיעור 11

Version 2: Better Intersection (define (intersection-set set1 set2) (if (or (null? set1) (null? set2)) '() (let ((x1 (car set1)) (x2 (car set2))) (cond ((= x1 x2) (cons x1 (intersection-set (cdr set1) (cdr set2)))) ((< x1 x2) (intersection-set (cdr set1) set2)) (else (intersection-set set1 (cdr set2))))))) מבוא מורחב - שיעור 11

Version 2: Intersection Example set1 set2 intersection (1 3 7 9) (1 4 6 7) (1 (3 7 9) (4 6 7) (1 (7 9) (4 6 7) (1 (7 9) (6 7) (1 (7 9) (7) (1 (9) () (1 7) Time and space  (n) Union -- similar מבוא מורחב - שיעור 11

Complexity unordered ordered empty-set? make-empty-set element-of-set adjoin-set intersection-set union-set (1) (1) (1) (1) (n) (n) (n) (n) (n2) (n) (n2) (n) מבוא מורחב - שיעור 11

Version 3: Binary Trees Binary Search: Lion in the desert מבוא מורחב - שיעור 11

Version 3: Binary Trees Store the elements in the nodes of a binary tree. The values in the left subtree of a node v are all smaller than the value stored at v. The values in the right subtree of a node v are all larger than the value stored at v. 7 3 1 5 9 12 A possible representation of the set {1,3,5,7,9,12} : מבוא מורחב - שיעור 11

Version 3: Binary Trees A set has many representations: 7 3 1 5 9 12 3 Height= (log n) Balanced Tree Unbalanced Tree מבוא מורחב - שיעור 11

Version 3: Representation (define (make-tree entry left right) (list entry left right)) (define (entry tree) (car tree)) (define (left-branch tree) (cadr tree)) (define (right-branch tree) (caddr tree)) 7 3 1 5 9 12 מבוא מורחב - שיעור 11

Version 3: Element-of-set (define (element-of-set? x set) (cond ((null? set) #f) ((= x (entry set)) true) ((< x (entry set)) (element-of-set? x (left-branch set))) (else (element-of-set? x (right-branch set))))) Complexity: (h), where h is the height of the tree. If tree is balanced, then h  log(n) In the worst case, h  n מבוא מורחב - שיעור 11

Version 3: Adjoin-set (define (adjoin-set x set)   (cond ((null? set) (make-tree x '() '()))         ((= x (entry set)) set)         ((< x (entry set))          (make-tree (entry set)                      (adjoin-set x (left-branch set))                     (right-branch set)))         (else          (make-tree (entry set)                     (left-branch set)                     (adjoin-set x (right-branch set)))))) Complexity: (h), where h is the height of the tree. מבוא מורחב - שיעור 11

If a tree is roughly balanced, then h  log(n). Complexity We omit the trivial operations. unordered ordered trees Element-of-set Adjoin-set Intersection-set Union-set (n) (n) (h) (n) (n) (h) (n2) (n) (n) (n2) (n) (n) If a tree is roughly balanced, then h  log(n). Main challenge: Keep the trees roughly balanced. (More on this next term in “Data Structures”.) מבוא מורחב - שיעור 11

Random trees are fairly balanced (define (rand-tree n range) (if (= n 0) '() (adjoin-set (random range) (rand-tree (- n 1) range)))) (define (height tree) (if (null? tree) 0 (+ 1 (max (height (left-branch tree)) (height (right-branch tree)))))) (height (rand-tree 1000 1000000))  (height (rand-tree 10000 1000000))  ~ 22 ~ 31 average over several runs. height of a balanced tree: ~ 10 , ~13 resp. מבוא מורחב - שיעור 11

Huffman encoding trees מבוא מורחב - שיעור 11

Data Transmission “sos” Bob Alice We wish to send information efficiently from Alice to Bob Morse code not necessarily the most efficient you could think of מבוא מורחב - שיעור 11

Fixed Length Codes Represent data as a sequence of 0’s and 1’s Sequence: BACADAEAFABBAAAGAH A fixed length code (ASCII): A 000 B 001 C 010 D 011 E 100 F 101 G 110 H 111 Encoding of sequence: 001000010000011000100000101000001001000000000110000111 The Encoding is 18x3=54 bits long. Can we make the encoding shorter? מבוא מורחב - שיעור 11

Variable Length Code Make use of frequencies. Frequency of A=8, B=3, others 1. A 0 B 100 C 1010 D 1011 E 1100 F 1101 G 1110 H 1111 Example: BACADAEAFABBAAAGAH 100010100101101100011010100100000111001111 42 bits (20% shorter) Morse code is a variable length code. We can encrypt more efficiently if we give frequent symbols E is the most frequent word and so is represented by a single dot. Top decode, we can use a special separator code as in Morse code. But how do we decode? מבוא מורחב - שיעור 11

Prefix code  Binary tree Prefix code: No codeword is a prefix of any other codeword A 0 B 100 C 1010 D 1011 E 1100 F 1101 G 1110 H 1111 A B C D E F G H 1 מבוא מורחב - שיעור 11

Decoding Example 10001010 10001010 B 10001010 BA 10001010 BAC 1 A B C F G H 1 10001010 10001010 B 10001010 BA 10001010 BAC מבוא מורחב - שיעור 11

Abstract representation of code trees Constructors: make-leaf - Construct a leaf make-code-tree - Construct a code tree Predicates: leaf? - Is leaf? Selectors: left-branch - Select left branch right-branch - Select right branch symbol-leaf - the symbol attched to leaf מבוא מורחב - שיעור 11

Decoding a Message (define (decode bits tree) (define (decode-one bits current-branch) (if (null? bits) '() (let ((next-branch (choose-branch (car bits) current-branch))) (if (leaf? next-branch) (cons (symbol-leaf next-branch) (decode-one (cdr bits) tree)) (decode-one (cdr bits) next-branch))))) (decode-one bits tree)) (define (choose-branch bit branch) (cond ((= bit 0) (left-branch branch)) ((= bit 1) (right-branch branch)) (else (error "bad bit -- CHOOSE-BRANCH" bit)))) מבוא מורחב - שיעור 11

Huffman Tree = Optimal Length Code B C D E F G H 1 8 3 Before we can describe the algorithm for generating optimal codes, lets talk a little bit about our representation. Optimal: no code has better weighted average length מבוא מורחב - שיעור 11

Representation 17 A 9 5 4 B 2 2 2 C D E F G H {A,B,C,D,E,F,G,H} 8 3 {C,D} {E,F} {G,H} C D E F G H 1 1 1 1 1 1 מבוא מורחב - שיעור 11

Representation (Cont.) (define (make-leaf symbol weight) (list 'leaf symbol weight)) (define (leaf? object) (eq? (car object) 'leaf)) (define (symbol-leaf x) (cadr x)) (define (weight-leaf x) (caddr x)) מבוא מורחב - שיעור 11

Representation (Cont.) (define (make-code-tree left right) (list left right (append (symbols left) (symbols right)) (+ (weight left) (weight right)))) (define (left-branch tree) (car tree)) (define (right-branch tree) (cadr tree)) מבוא מורחב - שיעור 11

Representation (Cont.) (define (symbols tree) (if (leaf? tree) (list (symbol-leaf tree)) (caddr tree))) (define (weight tree) (weight-leaf tree) (cadddr tree))) מבוא מורחב - שיעור 11

Huffman’s Algorithm Build tree bottom-up, so that lowest weight leaves are farthest from the root. Repeatedly: Find two trees of lowest weight. merge them to form a new tree whose weight is the sum of their weights. מבוא מורחב - שיעור 11

Construction of Huffman tree 17 {A,B,C,D,E,F,G,H} 9 {B,C,D,E,F,G,H} 5 {B,C,D} 4 {E,F,G,H} 2 {C,D} 2 {E,F} 2 {G,H} A B C D E F G H 8 3 1 מבוא מורחב - שיעור 11

Construction of Huffman Tree Initial leaves {(A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1)} Merge {(A 8) (B 3) ({C D} 2) (E 1) (F 1) (G 1) (H 1)} Merge {(A 8) (B 3) ({C D} 2) ({E F} 2) (G 1) (H 1)} Merge {(A 8) (B 3) ({C D} 2) ({E F} 2) ({G H} 2)} Merge {(A 8) (B 3) ({C D} 2) ({E F G H} 4)} Merge {(A 8) ({B C D} 5) ({E F G H} 4)} Merge {(A 8) ({B C D E F G H} 9)} Final merge {({A B C D E F G H} 17)} מבוא מורחב - שיעור 11

Construction of Huffman tree (generate-huffman-tree '((A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (H 1) (G 1)) ((leaf a 8) ((((leaf g 1) (leaf h 1) (g h) 2) ((leaf f 1) (leaf e 1) (f e) 2) (g h f e) 4) (((leaf d 1) (leaf c 1) (d c) 2) (leaf b 3) (d c b) 5) (g h f e d c b) 9) (a g h f e d c b) 17) left-branch right-branch symbols weight מבוא מורחב - שיעור 11

Construction of Huffman tree (define (generate-huffman-tree pairs) (successive-merge (make-leaf-set pairs))) Sort pairs (define (make-leaf-set pairs) (if (null? pairs) '() (let ((pair (car pairs))) (adjoin-set (make-leaf (car pair) (cadr pair)) (make-leaf-set (cdr pairs)))))) Ordered insert Set of leaves, represented as lists ordered by weight. מבוא מורחב - שיעור 11

Construction of Huffman tree Need a variation of the ordered adjoin-set (since was written only for sets of numbers) (define (adjoin-set x set) (cond ((null? set) (list x)) ((< (weight x) (weight (car set)))(cons x set)) (else (cons (car set) (adjoin-set x (cdr set)))))) (define (successive-merge trees) (if (null? (cdr trees)) (car trees) (let ((smallest (car trees)) (2smallest (cadr trees)) (rest (cddr trees))) (successive-merge (adjoin-set (make-code-tree smallest 2smallest) rest))))) מבוא מורחב - שיעור 11

Summary We saw today that even a seemingly simple abstract concept like a set, when implemented on a computer, can give rise to many implementations, each with a different complexity. מבוא מורחב - שיעור 11