Trie/Suffix Trie/Suffix Tree
Trie
A trie (from "retrieval") is a multi-way tree structure for storing strings over an alphabet. It has been used to store large dictionaries of words (English, say) in spell-checking programs and in natural-language "understanding" programs. Example data: an, ant, all, allot, alloy, aloe, are, ate, be.
Trie (Cont.)
The idea is that all strings sharing a common stem, or prefix, hang off a common node. When the strings are words over {a..z}, a node has at most 27 children: one for each letter, plus a terminator. The symbols of a string are recovered by scanning from the root to the leaf that ends the string. All strings in the trie can be recovered by a depth-first scan of the tree.
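A minimal Python sketch of the trie described above, using the example words. The nested-dict representation, the choice of "$" as the terminator symbol, and the function names (trie_insert, trie_words) are assumptions made for illustration, not part of the original slides.

```python
END = "$"  # terminator marking the end of a stored string (assumed symbol)

def trie_insert(root, word):
    """Walk down from the root, creating child nodes as needed."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = {}  # the terminator child ends the string

def trie_words(node, prefix=""):
    """Depth-first scan that yields every stored string."""
    for ch in sorted(node):
        if ch == END:
            yield prefix  # reached the leaf that ends a string
        else:
            yield from trie_words(node[ch], prefix + ch)

words = ["an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be"]
root = {}
for w in words:
    trie_insert(root, w)

print(sorted(root))            # children of the root: one per first letter
print(list(trie_words(root)))  # all stored words, in alphabetical order
```

Because the children are visited in sorted order, the depth-first scan returns the words alphabetically; words with a common prefix such as "all", "allot" and "alloy" share the nodes for that prefix.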
Suffix Trie
The idea behind a suffix trie is to assign to each symbol in a text an index corresponding to its position in the text: the first symbol has index 1 and the last symbol has index n, where n is the number of symbols in the text.
Suffix Trie (Cont.)
A suffix trie is an ordinary trie whose input strings are all the suffixes of a single text. A suffix of a text [t1 ... tn] is a substring [ti ... tn], where i is an integer between 1 and n.
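As a small illustration of this definition, the following Python sketch lists every suffix of a text together with its 1-based start position i. The function name suffixes is hypothetical, and the sample word "ant" is borrowed from the trie example only for brevity.

```python
def suffixes(text):
    """Return (i, suffix) pairs, where i is the 1-based start position."""
    return [(i + 1, text[i:]) for i in range(len(text))]

for i, s in suffixes("ant"):
    print(i, s)
```

A text of n symbols has exactly n non-empty suffixes, one per start position.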
Suffix Trie (Cont.)
To demonstrate the structure of the resulting tree, we build the suffix trie for the following text:
TEXT:     G O O G O L $
POSITION: 1 2 3 4 5 6 7
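The construction for this text can be sketched in Python as follows. This is a naive illustration, not an efficient algorithm: it inserts each suffix one symbol at a time. The Node class, the function names, and the choice to record the 1-based start position at each leaf are assumptions made for the example.

```python
class Node:
    def __init__(self):
        self.children = {}  # symbol -> Node
        self.pos = None     # 1-based start of the suffix that ends here

def build_suffix_trie(text):
    """Insert every suffix of text into an ordinary trie."""
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
        node.pos = i + 1  # the unique terminator makes this node a leaf
    return root

def occurrences(root, pattern):
    """Start positions of pattern: the leaves below the node it leads to."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.pos is not None:
            found.append(n.pos)
        stack.extend(n.children.values())
    return sorted(found)

root = build_suffix_trie("GOOGOL$")
print(sorted(root.children))    # symbols leaving the root
print(occurrences(root, "GO"))  # positions where "GO" starts
print(occurrences(root, "O"))   # positions where "O" starts
```

Every substring of the text is a prefix of some suffix, so looking up a pattern amounts to following it down from the root; the leaves below that point give the positions where the pattern occurs.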
Suffix Trie (Cont.)
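Suffix Tree
The suffix tree is created by compacting every unary node (a node with exactly one child) in the suffix trie: each chain of unary nodes is merged into a single edge labeled with the corresponding substring.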
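The compaction step can be sketched in Python as below, assuming a nested-dict suffix trie and edge labels stored as plain strings; both are simplifications chosen for readability. Practical suffix trees usually store each label as a pair of indices into the text and are built directly with a linear-time algorithm rather than by first building the trie. The function names are hypothetical.

```python
def build_suffix_trie(text):
    """Naive suffix trie as nested dicts: symbol -> child node."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def compact(node):
    """Merge each chain of single-child nodes into one string-labeled edge."""
    out = {}
    for ch, child in node.items():
        label = ch
        while len(child) == 1:  # unary node: absorb it into the edge label
            (nxt, grandchild), = child.items()
            label += nxt
            child = grandchild
        out[label] = compact(child)
    return out

def count_nodes(node):
    """Number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node.values())

trie = build_suffix_trie("GOOGOL$")
tree = compact(trie)
print(tree)
print(count_nodes(trie), count_nodes(tree))
```

For GOOGOL$ the compacted tree keeps only the root, the two branching nodes (reached by "GO" and "O") and the seven leaves, so the node count drops from 25 in the trie to 10 in the tree.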
Suffix Tree (Cont.)