E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment,

Slides:



Advertisements
Similar presentations
February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa.
Advertisements

Introduction to Computer Science 2 Lecture 7: Extended binary trees
Lecture 4 (week 2) Source Coding and Compression
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Greedy Algorithms Amihood Amir Bar-Ilan University.
22C:19 Discrete Structures Trees Spring 2014 Sukumar Ghosh.
On-line Linear-time Construction of Word Suffix Trees Shunsuke Inenaga (Japan Society for the Promotion of Science & Kyushu University) Masayuki Takeda.
Huffman Encoding Dr. Bernard Chen Ph.D. University of Central Arkansas.
Trie/Suffix Trie/Suffix Tree. Trie A trie (from retrieval), is a multi-way tree structure useful for storing strings over an alphabet. It has been used.
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
296.3: Algorithms in the Real World
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Tries Search for ‘bell’ O(n) by KMP algorithm O(dm) in a trie Tries
Advanced Algorithm Design and Analysis (Lecture 4) SW5 fall 2004 Simonas Šaltenis E1-215b
Modern Information Retrieval Chapter 8 Indexing and Searching.
Compression & Huffman Codes
Lecture 6: Huffman Code Thinh Nguyen Oregon State University.
Indexed Search Tree (Trie) Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
CSC 213 Lecture 18: Tries. Announcements Quiz results are getting better Still not very good, however Average score on last quiz was 5.5 Every student.
Compression & Huffman Codes Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Document and Query Forms Chapter 2. 2 Document & Query Forms Q 1. What is a document? A document is a stored data record in any form A document is a stored.
6/26/2015 7:13 PMTries1. 6/26/2015 7:13 PMTries2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3) Huffman encoding.
Chapter 9: Huffman Codes
CS 206 Introduction to Computer Science II 12 / 10 / 2008 Instructor: Michael Eckmann.
Data Structures & Algorithms Radix Search Richard Newman based on slides by S. Sahni and book by R. Sedgewick.
Data Compression Basics & Huffman Coding
Data Compression and Huffman Trees (HW 4) Data Structures Fall 2008 Modified by Eugene Weinstein.
Algorithm Design & Analysis – CS632 Group Project Group Members Bijay Nepal James Hansen-Quartey Winter
Data Compression1 File Compression Huffman Tries ABRACADABRA
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression.
Data Structures Week 6: Assignment #2 Problem
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
© 2004 Goodrich, Tamassia Tries1. © 2004 Goodrich, Tamassia Tries2 Preprocessing Strings Preprocessing the pattern speeds up pattern matching queries.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
Spring 2010CS 2251 Trees Chapter 6. Spring 2010CS 2252 Chapter Objectives Learn to use a tree to represent a hierarchical organization of information.
Data Structures and Algorithms Lecture (BinaryTrees) Instructor: Quratulain.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
Tries1. 2 Outline and Reading Standard tries (§9.2.1) Compressed tries (§9.2.2) Suffix tries (§9.2.3)
Fundamental Data Structures and Algorithms Margaret Reid-Miller 24 February 2005 LZW Compression.
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Huffman Codes Juan A. Rodriguez CS 326 5/13/2003.
Foundation of Computing Systems
Bahareh Sarrafzadeh 6111 Fall 2009
Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng.
1 Huffman Codes. 2 ASCII use same size encoding for all characters. Variable length codes can produce shorter messages than fixed length codes Huffman.
بسم الله الرحمن الرحيم My Project Huffman Code. Introduction Introduction Encoding And Decoding Encoding And Decoding Applications Applications Advantages.
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Basics
Contents What is a trie? When to use tries
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Lossless Compression-Statistical Model Lossless Compression One important to note about entropy is that, unlike the thermodynamic measure of entropy,
Generic Trees—Trie, Compressed Trie, Suffix Trie (with Analysi
Tries 4/16/2018 8:59 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Compression & Huffman Codes
Tries 07/28/16 11:04 Text Compression
Tries 5/27/2018 3:08 AM Tries Tries.
IP Routers – internal view
Tries 9/14/ :13 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Patricia Practical Algorithm To Retrieve Information Coded In Alphanumeric. Compressed binary trie. All nodes are of the same data type (binary tries use.
Chapter 9: Huffman Codes
Binary Tries (continued)
Chapter 11 Data Compression
Trees Addenda.
Data Structure and Algorithms
Tries 2/23/2019 8:29 AM Tries 2/23/2019 8:29 AM Tries.
Tries 2/27/2019 5:37 PM Tries Tries.
CSE 589 Applied Algorithms Spring 1999
Presentation transcript:

E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment, fast retrieval but large space overhead  Applications:  dictionaries  text searching (patricia tries)  compression (Ziv-Lembel encoding)

E.G.M. PetrakisTries2 Example  MACCBOY MACAW MACARON MACARONI MACARONIC MACAQUE MACADAMIA MACADAM MACACACO MACABRE.

E.G.M. PetrakisTries3 1. Words in word terminating symbol as many words as terminating symbols

E.G.M. PetrakisTries4 Word Searching  At most m+1 comparisons to find the word 1 a 2 ….a m+1, with root: i=1 and t(a i ): child of node a i. i=1; u = a i; while (u && i <= m) { if (t(a i ): undefined) X is not in the trie; else { u = t(a i ); i = i+1; } if (u && word(t) == X) X is in the trie; else X is not in the trie; }

E.G.M. PetrakisTries5 2. Words at the Leafs

E.G.M. PetrakisTries6 Example Implementation 26 symbols/node => large space overhead

E.G.M. PetrakisTries7 Insertions “bluejay” inserted

E.G.M. PetrakisTries8 3. Compact Trie compact nodes words at the leafs

E.G.M. PetrakisTries9 Patricia Trie  Practical Algorithm To Retrieve Information Coded In Alphanumeric  especially good for text searching  replace sub-words by a reference position in text together with its length  e.g., “jay” in “the blue jay” is (10,3)  internal nodes keep only the length and the first letter of the phrase  external nodes keep the position of the phrase in the text

E.G.M. PetrakisTries10 Suffix Trie  Construct a trie with all suffixes  “the => 12 suffixes  Merge branches with common prefix  Leafs point to suffices  Search: “e-jay”  go to position” 3 or 7  increase by query length: 5  check suffices at positions 3+5 and 7+5  the second one matches

E.G.M. PetrakisTries11 “ the blue-jay ” the b l uya j b Suffix Trie Search:

E.G.M. PetrakisTries12 Search Patricia  Scan query from left to right  go to first matching symbol in trie  increase by the increment  if not in the trie => search failed  if all symbols match the query => found !!!

E.G.M. PetrakisTries13 Patricia Pros & Cons  Good for secondary storage applications  the trie is not high  The space overhead is still a problem  Mainly useful for static environments

E.G.M. PetrakisTries14 Ziv-Lempel Encoding  Mainly compression method  for any kind of data  based on tries and symbol alphabets  Basic idea: compute code for repeating symbols or for sets of symbols  Globally and locally optimal  Compared with Huffman  no a-priori knowledge of symbol probabilities  Huffman is globally optimum but, not always  Locally optimal (for parts of the input)

E.G.M. PetrakisTries15 Encoding  Construct trie:  nodes: code of a symbol or of sequences of symbols  arc: alphabet symbol  initially a trie with all symbols  each symbol is assigned a unique code  Scan input from left to right:  read next symbol  follow appropriate branch in trie  if symbol not in trie => insert symbol in trie, assign it a new code and output its code

E.G.M. PetrakisTries16 initial trie root codes aczb

E.G.M. PetrakisTries17 alphabet {a, b, c} initial trie: Input:

E.G.M. PetrakisTries18  α output 1output 2  b  α  b

E.G.M. PetrakisTries19  c  b output 4 output 3  a  b b c 23 α 5 b 7

E.G.M. PetrakisTries20 output 5  a  b  a output 8  a output 1 1c 1 b 4 α 6

E.G.M. PetrakisTries21 Decoding  Follows the same sequence of steps in reverse order  start from the same initial trie  scan the code from left to right  read next code symbol  go to the node having the same code  if the code is not in the trie guess the appropriate position in the trie

E.G.M. PetrakisTries22 a b c 123 Alphabet: {a, b, c} Code:

E.G.M. PetrakisTries23  1  2  4 output “ α” output “ b” read(1): must be coming from “a” In order to go from “1” to “2”, we must have visited “b”

E.G.M. PetrakisTries24 output “ αb”  b 3 c b α 2 α In order to go from “2” to “4” we must go through “ab” (passing from 1) “a” was a child of “2”

E.G.M. PetrakisTries25 output “ c”  5 In order to go from “4” to “3” we must go through “c” and“c” was child of “4”

E.G.M. PetrakisTries26 8 In order to go from “3” to “5” we must go through “ba” and “b” was child of “3” The next symbol is “8”, but … there is no “8” in the trie!!

E.G.M. PetrakisTries27 output “ bαb”  1 … This may happen only if after “5” we passed through the same path since “8” cannot be under “6” or “7” (there is no “6” or “7” in the code)

E.G.M. PetrakisTries28 output “ α”

E.G.M. PetrakisTries29  n output “dxd” Special case: “x” can be null Generalize