Presentation is loading. Please wait.

Presentation is loading. Please wait.

E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment,

Similar presentations


Presentation on theme: "E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment,"— Presentation transcript:

1 E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment, fast retrieval but large space overhead  Applications:  dictionaries  text searching (patricia tries)  compression (Ziv-Lembel encoding)

2 E.G.M. PetrakisTries2 Example  MACCBOY MACAW MACARON MACARONI MACARONIC MACAQUE MACADAMIA MACADAM MACACACO MACABRE.

3 E.G.M. PetrakisTries3 1. Words in Tree @: word terminating symbol as many words as terminating symbols

4 E.G.M. PetrakisTries4 Word Searching  At most m+1 comparisons to find the word x@=a 1 a 2 ….a m+1, with root: i=1 and t(a i ): child of node a i. i=1; u = a i; while (u != @ && i <= m) { if (t(a i ): undefined) X is not in the trie; else { u = t(a i ); i = i+1; } if (u == @ && word(t) == X) X is in the trie; else X is not in the trie; }

5 E.G.M. PetrakisTries5 2. Words at the Leafs

6 E.G.M. PetrakisTries6 Example Implementation 26 symbols/node => large space overhead

7 E.G.M. PetrakisTries7 Insertions “bluejay” inserted

8 E.G.M. PetrakisTries8 3. Compact Trie compact nodes words at the leafs

9 E.G.M. PetrakisTries9 Patricia Trie  Practical Algorithm To Retrieve Information Coded In Alphanumeric  especially good for text searching  replace sub-words by a reference position in text together with its length  e.g., “jay” in “the blue jay” is (10,3)  internal nodes keep only the length and the first letter of the phrase  external nodes keep the position of the phrase in the text

10 E.G.M. PetrakisTries10 Suffix Trie  Construct a trie with all suffixes  “the blue-jay@” => 12 suffixes  Merge branches with common prefix  Leafs point to suffices  Search: “e-jay”  go to position” 3 or 7  increase by query length: 5  check suffices at positions 3+5 and 7+5  the second one matches 1. theblue-jay@ 2. heblue-jay@ 3. eblue-jay@ 4. blue-jay@ 5. lue-jay@ 6. ue-jay@ 7. e-jay@ 8. -jay@ 9. jay@ 10. ay@ 11. y@ 12. @

11 E.G.M. PetrakisTries11 “ the blue-jay ” the b l uya j - @ b- 916452111210 73 8 Suffix Trie Search: “e-jay@”

12 E.G.M. PetrakisTries12 Search Patricia  Scan query from left to right  go to first matching symbol in trie  increase by the increment  if not in the trie => search failed  if all symbols match the query => found !!!

13 E.G.M. PetrakisTries13 Patricia Pros & Cons  Good for secondary storage applications  the trie is not high  The space overhead is still a problem  Mainly useful for static environments

14 E.G.M. PetrakisTries14 Ziv-Lempel Encoding  Mainly compression method  for any kind of data  based on tries and symbol alphabets  Basic idea: compute code for repeating symbols or for sets of symbols  Globally and locally optimal  Compared with Huffman  no a-priori knowledge of symbol probabilities  Huffman is globally optimum but, not always  Locally optimal (for parts of the input)

15 E.G.M. PetrakisTries15 Encoding  Construct trie:  nodes: code of a symbol or of sequences of symbols  arc: alphabet symbol  initially a trie with all symbols  each symbol is assigned a unique code  Scan input from left to right:  read next symbol  follow appropriate branch in trie  if symbol not in trie => insert symbol in trie, assign it a new code and output its code

16 E.G.M. PetrakisTries16 initial trie root codes aczb 123 26........

17 E.G.M. PetrakisTries17 alphabet {a, b, c} initial trie: Input:

18 E.G.M. PetrakisTries18  α output 1output 2  b  α  b

19 E.G.M. PetrakisTries19  c  b output 4 output 3  a  b b c 23 α 5 b 7

20 E.G.M. PetrakisTries20 output 5  a  b  a output 8  a output 1 1c 1 b 4 α 6

21 E.G.M. PetrakisTries21 Decoding  Follows the same sequence of steps in reverse order  start from the same initial trie  scan the code from left to right  read next code symbol  go to the node having the same code  if the code is not in the trie guess the appropriate position in the trie

22 E.G.M. PetrakisTries22 a b c 123 Alphabet: {a, b, c} Code: 1 2 4 3 5 8 1

23 E.G.M. PetrakisTries23  1  2  4 output “ α” output “ b” read(1): must be coming from “a” In order to go from “1” to “2”, we must have visited “b”

24 E.G.M. PetrakisTries24 output “ αb”  3 5 1 4 b 3 c b α 2 α In order to go from “2” to “4” we must go through “ab” (passing from 1) “a” was a child of “2”

25 E.G.M. PetrakisTries25 output “ c”  5 In order to go from “4” to “3” we must go through “c” and“c” was child of “4”

26 E.G.M. PetrakisTries26 8 In order to go from “3” to “5” we must go through “ba” and “b” was child of “3” The next symbol is “8”, but … there is no “8” in the trie!!

27 E.G.M. PetrakisTries27 output “ bαb”  1 … This may happen only if after “5” we passed through the same path since “8” cannot be under “6” or “7” (there is no “6” or “7” in the code)

28 E.G.M. PetrakisTries28 output “ α”

29 E.G.M. PetrakisTries29  n output “dxd” Special case: “x” can be null Generalize


Download ppt "E.G.M. PetrakisTries1  Trees of order >= 2  Variable length keys  The decision on what path to follow is taken based on potion of the key  Static environment,"

Similar presentations


Ads by Google