Published by Vernon Pearson. Modified over 9 years ago.
1
Suffix Trees, Suffix Arrays and Suffix Trays
Richard Cole, Tsvi Kopelowitz, Moshe Lewenstein
2
Indexing problem
Input: text T = t_1,…,t_n (preprocess into a data structure, DS).
Queries: pattern P = p_1,…,p_m (answered using the DS).
[Figure: example text T; extracted labels: 51430]
3
Suffix Property
P appears at location i of T iff P is a prefix of the suffix T_i.
[Figure: example text T with the suffix T_14 marked; extracted labels: 51430]
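The suffix property can be checked directly, without any index. A minimal sketch (the function name `occurrences` and the example text are assumptions for illustration, not from the slides):

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Return the 1-based locations i where pattern is a prefix of suffix T_i."""
    # str.startswith(pattern, i) tests whether pattern is a prefix of text[i:].
    return [i + 1 for i in range(len(text)) if text.startswith(pattern, i)]


print(occurrences("mississippi", "ssi"))  # [3, 6]
```

This takes O(|T| |P|) time per query, which is what the index structures on the following slides avoid.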
4
Suffix Tree
A suffix tree for a string S is a compressed trie of all suffixes of S.
Example: S = abab$, with suffixes { $, b$, ab$, bab$, abab$ }.
[Figure: suffix tree of abab$]
5
Suffix Tree
The size of the suffix tree of S is O(|S|).
Example: S = abab$, with suffixes { $, b$, ab$, bab$, abab$ }.
[Figure: suffix tree of abab$ with leaves numbered 0 to 4]
6
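Suffix Tree
The size of the suffix tree of S is O(|S|).
Example: S = abab$, with suffixes { $, b$, ab$, bab$, abab$ }.
[Figure: the same tree with each edge label stored as a pair of indices into S, for example [2,3], [2,4], [4,4], [1,1], so every edge takes O(1) space]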
7
Indexing and Suffix Trees
Navigate from the root, using the suffix property.
Example: P = ssi
Time: O(|P| + occ)
8
Indexing and Suffix Trees
Navigate from the root, using the suffix property.
Example: P = ssi
Time: O(|P| log|Σ| + occ), where occ is the number of occurrences of P.
9
Suffix Trees
Weiner 1973 (linear time construction!)
McCreight 1975 (space efficient)
Ukkonen 1995 (online)
Farach 1997 (polynomial-range alphabets)
10
Suffix Array
S = mississippi

All suffixes:
S1   mississippi
S2   ississippi
S3   ssissippi
S4   sissippi
S5   issippi
S6   ssippi
S7   sippi
S8   ippi
S9   ppi
S10  pi
S11  i

Sorted suffixes:
S11  i
S8   ippi
S5   issippi
S2   ississippi
S1   mississippi
S10  pi
S9   ppi
S7   sippi
S4   sissippi
S6   ssippi
S3   ssissippi

POS = 11 8 5 2 1 10 9 7 4 6 3
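The POS array above can be reproduced by sorting the suffix start positions. This is a naive sketch for illustration only (the name `suffix_array` is an assumption); it is not one of the construction algorithms cited in this talk and can take O(n^2 log n) time.

```python
def suffix_array(s: str) -> list[int]:
    """Return the 1-based start positions of the suffixes of s in sorted order."""
    # Sort positions 1..n by the suffix that begins at each position.
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


print(suffix_array("mississippi"))  # [11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3]
```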
11
Suffix Array
S = mississippi
SA(S) = 11 8 5 2 1 10 9 7 4 6 3
P = pi
15
Suffix Array
S = mississippi
SA(S) = 11 8 5 2 1 10 9 7 4 6 3
P = pi
Binary search over SA(S) compares P with O(log |S|) suffixes, at O(|P|) per comparison.
Time: O(|P| log |S|)
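A sketch of the O(|P| log |S|) search: two binary searches find the range of suffixes that start with P. The function name `sa_search` is an assumption for illustration; the suffix array is the one shown on the slide.

```python
def sa_search(s: str, sa: list[int], p: str) -> list[int]:
    """Return the sorted 1-based occurrences of p in s, given suffix array sa."""
    m = len(p)

    def prefix(k: int) -> str:
        # First m characters of the k-th smallest suffix (sa holds 1-based starts).
        start = sa[k] - 1
        return s[start:start + m]

    # Lower bound: first suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= p:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


S = "mississippi"
SA = [11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3]
print(sa_search(S, SA, "pi"))   # [10]
print(sa_search(S, SA, "ssi"))  # [3, 6]
```

Each comparison costs O(|P|), which gives the bound on the slide. The O(|P| + log |S|) bound of Manber and Myers needs extra longest-common-prefix information that this sketch does not use.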
16
Suffix Array
Introduced: Manber and Myers (1993).
Gonnet, Baeza-Yates, Snider (1992) (PAT arrays).
Manber and Myers (1993): query time O(|P| + log |S|)
17
Suffix Array Construction
Manber and Myers (1993): O(n log n).
Kärkkäinen-Sanders (2003): O(n) (polynomial-range alphabets).
Two other papers as well.
18
End of Story?
No. Lots of questions.
1. Construction time of suffix trees.
2. Query time.
3. Compressed indexing structures.
4. Indexing with errors.
5. Real-time suffix tree construction.
19
Query Time for Large Alphabets
Suffix Trees: O(|P| log|Σ|) (deterministic)
Suffix Arrays: O(|P| + log |T|)
Suffix Trays: O(|P| + log|Σ|) for alphabets {1,…,|Σ|}
20
Query Time for Large Alphabets
Actually, it is easy to answer queries in O(|P|) time:
create an array of length |Σ| at every node of the suffix tree.
Then navigation at every node takes O(1) time.
However, the time and space of suffix tree construction become O(n|Σ|).
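The trade-off can be sketched with two node layouts over the alphabet {1,…,|Σ|}. Both class names and their methods are assumptions for illustration, not part of the talk.

```python
from bisect import bisect_left


class SortedNode:
    """Children kept in sorted order: O(log |Σ|) lookup, space proportional to
    the number of children."""

    def __init__(self):
        self.keys = []  # sorted first symbols of the outgoing edges
        self.kids = []  # child stored at the same index as its symbol

    def add(self, symbol, child):
        i = bisect_left(self.keys, symbol)
        self.keys.insert(i, symbol)
        self.kids.insert(i, child)

    def child(self, symbol):
        i = bisect_left(self.keys, symbol)  # binary search over the children
        if i < len(self.keys) and self.keys[i] == symbol:
            return self.kids[i]
        return None


class ArrayNode:
    """One slot per alphabet symbol: O(1) lookup, O(|Σ|) space per node."""

    def __init__(self, sigma):
        self.slots = [None] * (sigma + 1)  # index 0 unused; symbols are 1..sigma

    def add(self, symbol, child):
        self.slots[symbol] = child

    def child(self, symbol):
        return self.slots[symbol]
```

With `ArrayNode` at all O(n) nodes the total space is O(n|Σ|), which is the cost named on the slide.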
21
Query Time for Large Alphabets
Suffix Trees: O(|P| log|Σ|) (deterministic)
Suffix Arrays: O(|P| + log |S|)
Suffix Trays: O(|P| + log|Σ|) for alphabets {1,…,|Σ|}
22
Suffix Tree – Suffix Array connection
The left-to-right order of the leaves (suffixes) in the suffix tree is exactly the order of the suffix array.
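The connection can be checked on a small example. The sketch below builds a suffix tree by naive O(n^2) insertion (not one of the linear-time constructions cited earlier) and reads the leaves in child order. The names `Node`, `build_suffix_tree` and `leaves_in_order` are assumptions for illustration. Python orders `$` before the letters, so `$` sorts first here.

```python
class Node:
    def __init__(self, start=0, end=0, leaf=None):
        self.start, self.end = start, end  # edge label is s[start:end]
        self.children = {}                 # first edge character -> child Node
        self.leaf = leaf                   # 1-based suffix start, leaves only


def build_suffix_tree(s):
    """Insert every suffix of s into a compressed trie.

    Assumes s ends with a unique terminator such as '$'.
    """
    root, n = Node(), len(s)
    for i in range(n):
        node, j = root, i  # j is the next unmatched position of suffix i
        while True:
            child = node.children.get(s[j])
            if child is None:  # no edge starts with s[j]: hang a new leaf
                node.children[s[j]] = Node(j, n, leaf=i + 1)
                break
            k = child.start
            while k < child.end and s[k] == s[j]:  # walk down the edge
                k += 1
                j += 1
            if k == child.end:  # whole edge matched: continue below it
                node = child
                continue
            # Mismatch inside the edge: split it at position k.
            mid = Node(child.start, k)
            node.children[s[child.start]] = mid
            child.start = k
            mid.children[s[k]] = child
            mid.children[s[j]] = Node(j, n, leaf=i + 1)
            break
    return root


def leaves_in_order(node):
    """List leaf labels, visiting children in sorted order of first character."""
    if node.leaf is not None:
        return [node.leaf]
    out = []
    for c in sorted(node.children):
        out.extend(leaves_in_order(node.children[c]))
    return out


s = "abab$"
tree_order = leaves_in_order(build_suffix_tree(s))
sorted_order = sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])
print(tree_order)                  # [5, 3, 1, 4, 2]
print(tree_order == sorted_order)  # True
```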
23
Suffix Array
S = mississippi$

All suffixes:
S1   mississippi$
S2   ississippi$
S3   ssissippi$
S4   sissippi$
S5   issippi$
S6   ssippi$
S7   sippi$
S8   ippi$
S9   ppi$
S10  pi$
S11  i$
S12  $

Sorted suffixes:
S8   ippi$
S5   issippi$
S2   ississippi$
S11  i$
S1   mississippi$
S9   ppi$
S10  pi$
S6   ssippi$
S3   ssissippi$
S7   sippi$
S4   sissippi$
S12  $

POS = 8 5 2 11 1 9 10 6 3 7 4 12
24
Example: mississippi$
SA(mississippi$) = 8 5 2 11 1 9 10 6 3 7 4 12
[Figure: suffix array of mississippi$]
25
Suffix Tree – Suffix Array connection
We utilize this connection as follows:
every node in the suffix tree corresponds to an interval of the suffix array, namely the leaves in its subtree.
26
Example: mississippi$
SA(mississippi$) = 8 5 2 11 1 9 10 6 3 7 4 12
[Figure: suffix array of mississippi$]
27
Suffix Tree – Suffix Array connection
Moreover, the time to search the suffix array restricted to an interval I is O(|P| + log |I|).
28
Suffix Tree – Suffix Array connection
Definition: a |Σ|-leaf is a node such that (1) it has at least |Σ| leaves in its subtree, and (2) none of its children do.
The number of leaves in the subtree of a |Σ|-leaf is O(|Σ|^2).
Why? It has at most |Σ| children, each with fewer than |Σ| leaves in its subtree.
29
Suffix Tree – Suffix Array connection
The number of leaves in the subtree of a |Σ|-leaf is O(|Σ|^2).
The time to search the suffix array interval of a |Σ|-leaf is O(|P| + log |Σ|).
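The arithmetic behind the two bounds, written out for a |Σ|-leaf v with suffix array interval I (the symbols v, u and I are notation assumed here for illustration):

```latex
\[
|I| \;=\; \#\mathrm{leaves}(v)
    \;=\; \sum_{u \in \mathrm{children}(v)} \#\mathrm{leaves}(u)
    \;<\; |\Sigma| \cdot |\Sigma| \;=\; |\Sigma|^{2}
\]
\[
O\bigl(|P| + \log |I|\bigr)
    \;=\; O\bigl(|P| + \log |\Sigma|^{2}\bigr)
    \;=\; O\bigl(|P| + 2\log |\Sigma|\bigr)
    \;=\; O\bigl(|P| + \log |\Sigma|\bigr)
\]
```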
30
Example: mississippi$
SA(mississippi$) = 8 5 2 11 1 9 10 6 3 7 4 12
[Figure: suffix array of mississippi$]
31
Suffix Tray
Idea outline: navigate in the suffix tree until a |Σ|-leaf is hit, then move to the suffix array (time in the suffix array: O(|P| + log |Σ|)).
Problem: navigation in the suffix tree takes O(|P| log |Σ|) time, but we promised O(|P| + log |Σ|).
32
Suffix Tray
Recall the idea: create an array of length |Σ| at every node of the suffix tree.
Then navigation at every node takes O(1) time.
Too expensive overall: O(n|Σ|).
But it is affordable for O(n/|Σ|) nodes.
33
Suffix Tray
Idea: truncate the suffix tree at its |Σ|-leaves to obtain the Σ-tree.
It would be nice if the size of the Σ-tree were O(n/|Σ|).
However, this is not the case.
[Figure: example tree; legend: nodes with < |Σ| leaves, |Σ|-leaves, the rest]
34
[Figure: suffix tree of S = ababababa$; legend: nodes with < |Σ| leaves, |Σ|-leaves, the rest]
35
Suffix Tray
Alternative idea: extend the definition of the Σ-tree by removing every node with fewer than |Σ| leaves in its subtree.
Nodes in the Σ-tree:
1. Σ-leaf.
2. Branching-Σ-node: a node with at least 2 children.
3. Others: nodes with only one child.
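The three node types can be computed from leaf counts alone. A minimal sketch, assuming a hypothetical tree encoding in which a node is a list of its children and a leaf is the empty list (the function name `classify` and the path-tuple labels are also assumptions):

```python
def classify(node, sigma, path=(), labels=None):
    """Return (leaf_count, labels) for the subtree rooted at node.

    labels maps the path of every node kept in the Σ-tree to its type.
    Nodes with fewer than sigma leaves are removed and get no label.
    """
    if labels is None:
        labels = {}
    if not node:  # a leaf of the original tree
        count, kept = 1, 0
    else:
        count, kept = 0, 0
        for idx, child in enumerate(node):
            c, _ = classify(child, sigma, path + (idx,), labels)
            count += c
            if c >= sigma:  # this child survives in the Σ-tree
                kept += 1
    if count >= sigma:
        if kept == 0:
            labels[path] = "sigma-leaf"
        elif kept >= 2:
            labels[path] = "branching"
        else:
            labels[path] = "other"
    return count, labels


# Root with two children that each hold two leaves, plus one extra leaf.
tree = [[[], []], [[], []], []]
print(classify(tree, 2)[1])
# {(0,): 'sigma-leaf', (1,): 'sigma-leaf', (): 'branching'}
```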
36
Suffix Tray - Example
[Figure: suffix tree of ababababa$; legend: nodes with < |Σ| leaves, |Σ|-leaves, others, branching-|Σ|-nodes]
37
Suffix Tray
Observation: the number of Σ-leaves is O(n/|Σ|).
Hence, the number of branching-Σ-nodes is O(n/|Σ|).
So we can afford to store a |Σ|-length table for navigation at each of them.
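The counting argument can be spelled out as follows, writing k for the number of Σ-leaves, b for the number of branching-Σ-nodes, and n for the number of leaves of the suffix tree (k and b are notation assumed here). No Σ-leaf is an ancestor of another, so their subtrees are disjoint and each contains at least |Σ| leaves:

```latex
\[
k \cdot |\Sigma| \;\le\; n
\quad\Longrightarrow\quad
k \;\le\; \frac{n}{|\Sigma|}
\]
\[
b \;\le\; k - 1
\qquad \text{(a tree with $k$ leaves has at most $k-1$ nodes with two or more children)}
\]
\[
\text{table space} \;=\; O\bigl((k + b)\,|\Sigma|\bigr)
  \;=\; O\!\left(\frac{n}{|\Sigma|} \cdot |\Sigma|\right) \;=\; O(n)
\]
```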
38
Suffix Tray – What is Left?
[Figure: suffix tree of ababababa$; legend: nodes with < |Σ| leaves, |Σ|-leaves, others, branching-|Σ|-nodes]
39
Suffix Tray
Nodes in the Σ-tree with only one child.
[Figure: a one-child node over the suffix array 8 5 2 11 1 9 10 6 3 7 4 12; extracted edge labels: a b b c d e; marked interval of size less than |Σ|^2]
40
Suffix Tray
Size of the suffix tray: O(n)
Navigation:
1. Σ-leaf: jump to the suffix array.
2. Branching-Σ-node: look at the Σ-array.
3. Others: look at one character, leading to the single Σ-tree child.
Time: O(|P| + log|Σ|)
41
End of Story?
No. Lots of questions.
1. Construction time of suffix trees.
2. Query time.
3. Compressed indexing structures.
4. Indexing with errors.
5. Real-time suffix tree construction.