CSCE 3110 Data Structures & Algorithm Analysis

Rada Mihalcea
http://www.cs.unt.edu/~rada/CSCE3110
Trees Applications

Trees: A Review (again? ) General trees one parent, N children Binary tree ISA General tree + max 2 children Binary search tree ISA Binary tree + left subtree < parent < right subtree AVL tree ISA Binary search tree + | height left subtree – height right subtree |  1

Trees: A Review (cont’d)
- Multi-way search tree ISA general tree + each node has k keys and k+1 children + all keys in child i < key i < all keys in child i+1
- 2-4 tree ISA multi-way search tree + every node has at most 3 keys / 4 children + all leaves are at the same level
- B-tree + every node except the root has at least T keys and at most 2T (or 2T+1, depending on the convention) keys
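The ordering rule "all keys in child i < key i < all keys in child i+1" is what makes search work: inside a node, a binary search over the sorted keys either finds the key or names the single child to descend into. A minimal sketch, assuming keys are kept sorted in a Python list and a leaf is a node with no children; `MNode` and `search` are hypothetical names.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class MNode:
    keys: list[int]                                       # k sorted keys
    children: list[MNode] = field(default_factory=list)  # k+1 children; empty for a leaf

def search(node: MNode | None, key: int) -> bool:
    """Descend one node per level, choosing the child whose key range holds `key`."""
    while node is not None:
        i = bisect_left(node.keys, key)          # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                    # reached a leaf without finding the key
            return False
        node = node.children[i]                  # child i holds keys between keys[i-1] and keys[i]
    return False

# A small 2-4 tree: every node has 1 to 3 keys and all leaves are on the same level.
root = MNode([10, 20], [MNode([3, 5, 7]), MNode([12]), MNode([25, 30])])
print(search(root, 12), search(root, 21))   # True False
```

Because all leaves sit on the same level in a 2-4 tree or B-tree, the number of nodes visited equals the tree's height.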

Tree Applications
- Data compression: Huffman tree
- Automatic learning: decision trees

Huffman code
- Very often used for text compression
  - Do you know how gzip or winzip works? -> compression methods
- ASCII code uses codes of equal length for all letters -> how many codes?
- Today’s alternative to ASCII?
- Idea behind Huffman code: use shorter codes for letters that are more frequent
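To see what an equal-length code costs, here is a small sketch, assuming each character is stored in one 8-bit byte; `fixed_length_bits` is a hypothetical helper. It also computes the shortest fixed-length code for the letters that actually occur: L distinct letters need ceil(log2 L) bits each.

```python
import math

def fixed_length_bits(text: str) -> tuple[int, int]:
    """Return (bits at 8 bits per character, bits with the shortest fixed-length code)."""
    if not text:
        return 0, 0
    distinct = len(set(text))
    # L distinct letters need ceil(log2 L) bits each; use at least 1 bit.
    width = max(1, math.ceil(math.log2(distinct)))
    return 8 * len(text), width * len(text)

print(fixed_length_bits("have a great day today"))   # (176, 88)
```

The 22-character string has 11 distinct letters, so even a tailored equal-length code needs 4 bits per letter; a variable-length code can go lower only by giving frequent letters shorter codes.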

Huffman Code
- Build a list of letters and frequencies for “have a great day today”
- Build a Huffman tree bottom up, by grouping letters with smaller occurrence frequencies
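The two steps above can be sketched in Python with `heapq` as the priority queue: count the letters, then repeatedly merge the two least frequent subtrees until one tree remains, and read each code off the root-to-leaf path (0 = left, 1 = right). The tie-breaking counter and the tuple-based tree are choices of this sketch, not part of the slides; with ties, different but equally good codes are possible.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree bottom up and return a letter -> bit-string table."""
    freq = Counter(text)                       # the list of letters and frequencies
    if not freq:
        return {}
    tiebreak = count()                         # keeps heapq from comparing trees on ties
    # Heap entries are (frequency, tiebreak, tree); a tree is a letter or a (left, right) pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                         # one distinct letter still needs a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # group the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, str):              # leaf: a single letter
            codes[tree] = prefix
        else:                                  # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")

    walk(heap[0][2], "")
    return codes

text = "have a great day today"
codes = huffman_codes(text)
print(sum(len(codes[ch]) for ch in text))     # 72
```

Whatever the tie-breaking, every Huffman code for this string totals 72 bits, against 22 × 8 = 176 bits at one byte per character.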

Huffman Codes
Write the Huffman codes for the strings “abracadabra” and “Veni Vidi Vici”.
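One way to check a hand-built answer is to encode the string and decode it back. Because no codeword is a prefix of another, a left-to-right scan can emit a letter as soon as the bits read so far match a codeword. The table `ABRA` below is one valid Huffman code for “abracadabra” (frequencies a:5, b:2, r:2, c:1, d:1); ties allow other, equally short answers. `encode` and `decode` are hypothetical helper names.

```python
def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every letter."""
    return "".join(code[ch] for ch in text)

def decode(bits: str, code: dict[str, str]) -> str:
    """Decode left to right; with a prefix code at most one codeword can match."""
    lookup = {word: letter for letter, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:          # reached a leaf of the code tree
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

ABRA = {"a": "0", "b": "10", "r": "110", "c": "1110", "d": "1111"}
bits = encode("abracadabra", ABRA)
print(len(bits), decode(bits, ABRA))   # 23 abracadabra
```

For “Veni Vidi Vici” (i:5, V:3, space:2, and e, n, d, c once each), any Huffman code encodes the 14 characters in 36 bits.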

Huffman Code
- Running time? Suppose N letters in the input string, with L unique letters
- What is the most important factor for obtaining the highest compression?
- Compare (assume a text with a total of 1000 characters):
  I. Three different characters, each occurring the same number of times
  II. 20 different characters, 19 of them occurring only once, and the 20th occurring the rest of the time
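The total encoded length can be computed from the frequencies alone: each merge adds the merged weight once, and a letter's weight is added once per level it sits below the root, so the sum of all merged weights equals the sum of frequency × code length. With a binary heap, counting costs O(N) and the L - 1 merges cost O(L log L). A sketch for the two cases, assuming case I splits 1000 characters as 334/333/333 since 1000 is not divisible by 3; `huffman_cost` is a hypothetical name.

```python
import heapq

def huffman_cost(weights) -> int:
    """Total Huffman-encoded length in bits = sum of the weights of all merged nodes."""
    heap = list(weights)
    heapq.heapify(heap)                  # O(L)
    total = 0
    while len(heap) > 1:                 # L - 1 merges, each O(log L)
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total                         # a one-letter alphabet gives 0 here

case_1 = [334, 333, 333]                 # three characters, (almost) equally frequent
case_2 = [1] * 19 + [981]                # 19 characters once, one character 981 times
print(huffman_cost(case_1), huffman_cost(case_2))   # 1666 1082
```

Case I needs about 1.67 bits per character and case II about 1.08, even though case II has far more distinct characters: how skewed the frequencies are matters more than the alphabet size.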

One More Application
Heuristic Search: Decision Trees
- Given a set of examples, each with an associated decision (e.g. good/bad, +/-, pass/fail, caseI/caseII/caseIII, etc.)
- Attempt to take a decision (automatically) when a new example is presented
- Predict the behavior in new cases!

Data Records
#    Name          A B C D E F G   Class
1.   Jeffrey B.    1 0 1 0 1 0 1   -
2.   Paul S.       0 1 1 0 0 0 1   -
3.   Daniel C.     0 0 1 0 0 0 0   -
4.   Gregory P.    1 0 1 0 1 0 0   -
5.   Michael N.    0 0 1 1 0 0 0   -
6.   Corinne N.    1 1 1 0 1 0 1   +
7.   Mariyam M.    0 1 0 1 0 0 1   +
8.   Stephany D.   1 1 1 1 1 1 1   +
9.   Mary D.       1 1 1 1 1 1 1   +
10.  Jamie F.      1 1 1 0 0 1 1   +

Fields in the Record
A: First name ends in a vowel?
B: Neat handwriting?
C: Middle name listed?
D: Senior?
E: Got extra-extra credit?
F: Google brings up home page?
G: Google brings up reference?

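Build a Classification Tree
- Internal nodes: features
- Leaves: classification
[Figure: example classification tree splitting on features F, A and D; record groups shown at the leaves: 2,3,7; 1,4,5,6; 10; 8,9. Training error: 30%]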

Different Search Problem
- Given a set of data records with their classifications, pick a decision tree: search problem!
- Challenges:
  - Scoring function?
  - Large space of trees
- What’s a good tree?
  - Low error on the given set of records
  - Small
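"Low error on the given set of records" can serve directly as a scoring function. A sketch, assuming each record's features A to G are stored as a bit string in table order; `RECORDS`, `has`, `training_error` and the one-test tree `stump_f` are hypothetical names, and the stump is only an illustration of a very small candidate tree.

```python
FEATURES = "ABCDEFG"
# (features A..G as a bit string, class label) for records 1-10, in table order.
RECORDS = [
    ("1010101", "-"), ("0110001", "-"), ("0010000", "-"), ("1010100", "-"),
    ("0011000", "-"), ("1110101", "+"), ("0101001", "+"), ("1111111", "+"),
    ("1111111", "+"), ("1110011", "+"),
]

def has(bits: str, feature: str) -> bool:
    """True when the record answers 'yes' (1) to the given field."""
    return bits[FEATURES.index(feature)] == "1"

def training_error(classify) -> float:
    """Fraction of the given records that a candidate tree misclassifies."""
    wrong = sum(classify(bits) != label for bits, label in RECORDS)
    return wrong / len(RECORDS)

def stump_f(bits: str) -> str:
    """Hypothetical one-test tree: '+' exactly when F (Google brings up home page) is 1."""
    return "+" if has(bits, "F") else "-"

print(training_error(stump_f))   # 0.2 (records 6 and 7 are misclassified)
```

Error alone ignores the "small" criterion; a search over trees would have to trade the two off, since a bigger tree can usually lower the training error further.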

“Perfect” Decision Tree
[Figure: decision tree testing Middle name? (C), EEC? (E), Google? (F) and Neat? (B)]
Training set error: 0% (can always do this?)
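A 0% training error tree exists exactly when no two records have the same feature values but different labels: if there is no such conflict, a tree that tests every feature puts only identical records in each leaf; if there is one, the conflicting records reach the same leaf and one of them is misclassified. A sketch of that check, assuming records are (feature bit string, label) pairs; `perfect_tree_possible` is a hypothetical name.

```python
def perfect_tree_possible(records) -> bool:
    """True iff no two records share a feature vector but differ in label."""
    seen = {}
    for bits, label in records:
        # setdefault stores the first label seen for this feature vector.
        if seen.setdefault(bits, label) != label:
            return False
    return True

print(perfect_tree_possible([("1111111", "+"), ("1111111", "+")]))   # True
print(perfect_tree_possible([("1111111", "+"), ("1111111", "-")]))   # False
```

For the ten records on the slides the check returns True: the only repeated feature vector (records 8 and 9) carries the same label both times. Fitting the training set perfectly does not by itself make a tree good on new records.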

Search For a Classification
Classify new records:
#      Name        A B C D E F G   Class
New1.  Mike M.     1 0 1 1 0 0 1   ?
New2.  Jerry K.    0 1 0 1 0 0 0   ?
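Classifying a new record means walking from the root, following the branch that matches each tested feature, until a leaf is reached. The slides do not spell out the full structure of the 0% error tree, so this sketch uses one hypothetical tree built from the same four tests (middle name? EEC? Google home page? neat handwriting?); checked by hand, it classifies all ten training records correctly. The `(feature, subtree_if_0, subtree_if_1)` tuple layout is a choice of this sketch.

```python
FEATURES = "ABCDEFG"

# Hypothetical tree. Internal node: (feature, subtree_if_0, subtree_if_1); leaf: "+" or "-".
TREE = ("C",                        # middle name listed?
        "+",                        # C = 0
        ("E",                       # C = 1: got extra-extra credit?
         ("F", "-", "+"),           # E = 0: Google brings up home page?
         ("F",                      # E = 1: Google brings up home page?
          ("B", "-", "+"),          # F = 0: neat handwriting?
          "+")))

def classify(tree, bits: str) -> str:
    """Follow the 0/1 branch for each tested feature until a leaf label is reached."""
    while not isinstance(tree, str):
        feature, if_0, if_1 = tree
        tree = if_1 if bits[FEATURES.index(feature)] == "1" else if_0
    return tree

print(classify(TREE, "1011001"))   # New1. Mike M.  -> -
print(classify(TREE, "0101000"))   # New2. Jerry K. -> +
```

A different tree with the same 0% training error could label the new records differently, which is why the choice among trees (the scoring function) matters.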

The very last tree for this class