McCreight's suffix tree construction algorithm

Slides:



Advertisements
Similar presentations
On-line Construction of Suffix Trees Chairman : Prof. R.C.T. Lee Speaker : C. S. Wu ( ) June 10, 2004 Dept. of CSIE National Chi Nan University.
Advertisements

AVL Trees COL 106 Amit Kumar Shweta Agrawal Slide Courtesy : Douglas Wilhelm Harder, MMath, UWaterloo
MUMmer 游騰楷杜海倫 王慧芬曾俊雄 2007/01/02. Outlines Suffix Tree MUMmer 1.0 MUMmer 2.1 MUMmer 3.0 Conclusion.
Advanced Data structure
22C:19 Discrete Structures Trees Spring 2014 Sukumar Ghosh.
Suffix Trees Construction and Applications João Carreira 2008.
Suffix Tree. Suffix Tree Representation S=xabxac Represent every edge using its start and end text location.
Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU Linear Time Construction of Suffix Tree.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
1 Suffix Trees Charles Yan Suffix Trees: Motivations Substring problem: One is given a text T of length m. After O (m) preprocessing time, one.
Suffix Sorting & Related Algoritmics Martin Farach-Colton Rutgers University USA.
15-853Page : Algorithms in the Real World Suffix Trees.
296.3: Algorithms in the Real World
1 Prof. Dr. Th. Ottmann Theory I Algorithm Design and Analysis (12 - Text search: suffix trees)
Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU Linear Time Construction of Suffix Tree.
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
Discussion #36 Spanning Trees
1 Trees. 2 Outline –Tree Structures –Tree Node Level and Path Length –Binary Tree Definition –Binary Tree Nodes –Binary Search Trees.
Suffix trees and suffix arrays presentation by Haim Kaplan.
Department of Computer Eng. & IT Amirkabir University of Technology (Tehran Polytechnic) Data Structures Lecturer: Abbas Sarraf Search.
Data Structures – LECTURE 10 Huffman coding
Aho-Corasick Algorithm Generalizes KMP to handle sets of strings New ideas –keyword trees –failure functions/links –output links.
Building Suffix Trees in O(m) time Weiner had first linear time algorithm in 1973 McCreight developed a more space efficient algorithm in 1976 Ukkonen.
Princeton University COS 423 Theory of Algorithms Spring 2002 Kevin Wayne Fibonacci Heaps These lecture slides are adapted from CLRS, Chapter 20.
Survey: String Matching with k Mismatches Moshe Lewenstein Bar Ilan University.
Assignment 4. (Due on Dec 2. 2:30 p.m.) This time, Prof. Yao and I can explain the questions, but we will NOT tell you how to solve the problems. Question.
Randomized Algorithms - Treaps
A Space-Economical Suffix Tree Construction Algorithm Edward M. McCreight (1976) { From Ukkonen to McCreight and Weiner: A Unifying } View of Linear-Time.
1. 2 Overview  Suffix tries  On-line construction of suffix tries in quadratic time  Suffix trees  On-line construction of suffix trees in linear.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
String Matching with k Mismatches Moshe Lewenstein Bar Ilan University Modified by Ariel Rosenfeld.
Improved string matching with k mismatches (The Kangaroo Method) Galil, R. Giancarlo SIGACT News, Vol. 17, No. 4, 1986, pp. 52–54 Original: Moshe Lewenstein.
On the Sorting-Complexity of Suffix Tree Construction MARTIN FARACH-COLTON PAOLO FERRAGINA S. MUTHUKRISHNAN Requires Math fonts downloadable from herehere.
Suffix trees. Trie A tree representing a set of strings. a b c e e f d b f e g { aeef ad bbfe bbfg c }
Union Find ADT Data type for disjoint sets: makeSet(x): Given an element x create a singleton set that contains only this element. Return a locator/handle.
Joint Advanced Student School Compressed Suffix Arrays Compression of Suffix Arrays to linear size Fabian Pache.
1 Fat heaps (K & Tarjan 96). 2 Goal Want to achieve the performance of Fibonnaci heaps but on the worst case. Why ? Theoretical curiosity and some applications.
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Advanced Data Structures Lecture 8 Mingmin Xie. Agenda Overview Trie Suffix Tree Suffix Array, LCP Construction Applications.
Nov String algorithms, Q Ukkonen’s suffix tree algorithm ● Recall McCreight’s approach: – For i = 1.. n+1, build compressed trie of {x[j..n]$
Nov String Algorithms, Q Algorithmic programming...and testing your suffix trees...
15-853:Algorithms in the Real World
Succinct Data Structures
Andrzej Ehrenfeucht, University of Colorado, Boulder
DATA STRUCTURES AND OBJECT ORIENTED PROGRAMMING IN C++
Ukkonen's suffix tree construction algorithm
Binary Trees, Binary Search Trees
Lecture 9 Algorithm Analysis
Algorithms (2IL15) – Lecture 2
Heaps and the Heapsort Heaps and priority queues
Algorithms (2IL15) – Lecture 5 SINGLE-SOURCE SHORTEST PATHS
Lecture 9 Algorithm Analysis
3.4 Push-Relabel(Preflow-Push) Maximum Flow Alg.
Basic Graph Algorithms
Lecture 9 Algorithm Analysis
Suffix trees.
Suffix trees and suffix arrays
Binary Trees, Binary Search Trees
Trees.
Graphs and Algorithms (2MMD30)
Suffix Trees String … any sequence of characters.
Lower bound for sorting, radix sort
String Matching with k Mismatches
Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets
Fibonacci Heaps.
Disjoint Sets Data Structure (Chap. 21)
Huffman Coding Greedy Algorithm
Chapter 12&13: Binary Search Trees (BSTs)
Lecture 21 Amortized Analysis
Binary Trees, Binary Search Trees
Presentation transcript:

McCreight's suffix tree construction algorithm  3 b aab$ 2 $ba $baa 1 4 $ 5 a 6 String Algorithms; 12 Nov 2007

Motivation Recall: the suffix tree is an extremely useful data structure with space usage and construction time in O(n). Today we see the first algorithm for constructing a suffix tree in time O(n).

Suffix trees x=abaab 1: abaab$ 2: baab$ 3: aab$ 4: ab$ 5: b$ 6: $ A suffix tree of a sequence, x, is a compressed trie of all suffixes of the sequence x$. x=abaab 1: abaab$ 2: baab$ 3: aab$ 4: ab$ 5: b$ 6: $  3 b aab$ 2 $ba $baa 1 4 $ 5 a 6

McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, Ti, where ... is a trie of sequences x$[1..n+1], x$[2..n+1], ..., x$[i..n+1]    a b a b a b b b aab$ $baa 2 aab$ $baa 2 aab$ $b 3 1 1 1 T1 T2 T3

McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, Ti, where ... is a trie of sequences x$[1..n+1], x$[2..n+1], ..., x$[i..n+1] For i=n+1, Ti is the suffix tree for x.  3 b aab$ 2 $ba $baa 1 4 $ 5 a 6

McCreight's algorithm Iteratively, for i=1,...,n+1 build tries, Ti, where ... is a trie of sequences x$[1..n+1], x$[2..n+1], ..., x$[i..n+1] For i=n+1, Ti is the suffix tree for x. The essential trick is being clever in how we insert x[i..n]$ into Ti so we don't spend O(n2) all in all.

Terminology A common prefix of x and y is a string, p, that is a prefix of both: x: y: p: The longest common prefix, p=LCP(x,y), is a prefix such that: x[|p|+1] ≠ y[|p|+1] x: a y: a ≠ b b p:

LCP and LCA For suffixes of x, x[i..n], x[j..n], their longest common prefix is their lowest common ancestor in the suffix tree: x[i..n]: a x[j..n]: b LCP: a LCA(i,j) a ≠ b b LCP(x[i..n],x[j..n]) i j

Head and tail Let head(i) denote the longest LCP of x[i..n]$ and x[j..n]$ for all j < i Let tail(i) be the string such that x[i..n]$=head(i)tail(i) Iteration i in McCreight’s algorithm consist of finding (or inserting) the node for head(i), and appending tail(i) head(i) tail(i) i

“The Trick” The trick in McCreight's algorithm is a clever way of finding head(i)

Lemma 5.2.1 Let head(i) = x[i..i+h]. Then x[i+1..i+h] is a prefix of head(i+1) i i+h head(i) tail(i) head(i+1) tail(i+1) i+1 ≥ 0

Proof Trivial for h=0 (head(i) empty), so assume h>0: Let head(i) = ay: By def.  j<i such that LCP(i,j)=ay Thus suffix j+1 and i+1 share prefix y Thus y is a prefix of LCP(i+1,j+1) Thus y is a prefix of head(i+1) a i+1 x[i..n]: a x[j..n]: a j+1

Suffix link Define s(u) =  if u=, v if u=av As a pointer from x[i..k] to x[i+1..k]: x[i+1..k] s x[i..k] i+1 i (ex 5.2.3: if u is a node, so is s(u))

Corollary of Lemma 5.2.1 s(head(i)) is a prefix of head(i+1) Thus: s(head(i)) is an ancestor of head(i+1) s(head(i)) s head(i) head(i+1) i i+1 s(head(i)) can be used as a shortcut!

Slowscan and fastscan Slowscan: if we do not know if string y is in Ti, we must search character by character Fastscan: if we do know that y is in x, we can jump directly from node to node At node u at (path-)depth d, follow the edge with label starting with y[d] Continue until we reach the end of y On a node (if y is in Ti) Or on an edge (if y is a prefix of a string in Ti) y[1] y[d1] y[d2] y[d3]

Sketch of McCreight's algorithm  Begin with the tree T1: For i=1,...,n, build tree Ti+1 satisfying: Ti+1 is a compressed trie for x[j..n]$, j ≤ i+1 All non-terminal nodes (with the possible exception of head(i)) have a suffix link s(-) Each iteration must: Add node i+1 Potentially add head(i+1) Add tail(i+1) Add suffix link head(i) → s(head(i)) x$ 1

At the beginning of iteration i... s(u) u s(parent(head(i))) parent(head(i)) s s(head(i))=w v head(i) Let head(i)=uv and parent(head(i))=u and w=s(u)v=s(head(i)) By the invariant, s(parent(head(i))) and the suffix link exists; by the lemma, w is an ancestor of head(i+1)

Steps in iteration i... s(u) u s(parent(head(i))) parent(head(i)) s s(head(i))=w v head(i) head(i+1) Move quickly to w, then search for head(i+1) starting there

Observe: w is in Ti head(i) is a prefix of x[j..n] for some j < i Thus w is a prefix of x[j+1..n] for some j < i i.e w is a prefix of some suffix j ≤ i i.e. w is in Ti Consequently: we can search for w from s(u) using fastscan! s(u) u s s(head(i))=w v

If w is a node Update s(head(i)) := w Then search for head(i+1) using slowscan

If w is on an edge If w is not a node, then all suffix j<i with prefix w agree on the next letter By definition of head(i) there is j<i such that suffix x[i..n] and x[j..n] differs after head(i) x[i+1..n] must also disagree at that character Thus head(i+1) must be w Add node w, update head(i):=w and set head(i+1)=w

McCreight's algorithm Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1

Example: x = abaab abaab$ 1 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1  abaab$ 1

Example: x = abaab head(1)= tail(1)=abaab$ i=1 abaab$ 1 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=1  abaab$ 1

Example: x = abaab head(2)= tail(2)=baab$ i=1 abaab$ $baab 2 1 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=1  abaab$ $baab 2 1

Example: x = abaab head(2)= tail(2)=baab$ i=2 a $baab baab$ $ba 2 3 1 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 i=2  a $baab baab$ $ba 2 3 1 head(3)=a tail(3)=ab$

Example: x = abaab head(3)=a i=3 tail(3)=ab$ a $baab baab$ $ba 2 3 1 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$  a $baab baab$ $ba 2 3 1

Example: x = abaab head(3)=a i=3 tail(3)=ab$ u v = a a $baab baab$ $ba Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$ u  v = a a $baab baab$ $ba 2 3 1 head(i)

Example: x = abaab head(3)=a i=3 tail(3)=ab$ u w v = a a $baab baab$ Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$ u w  v = a a $baab baab$ v[2..|v|]= $ba 2 3 1 head(i)

Example: x = abaab head(3)=a i=3 tail(3)=ab$ w a $baab b aab$ $ba 2 3 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$ w  a $baab b aab$ $ba 2 3 1 head(i) head(i+1)

Example: x = abaab head(3)=a i=3 tail(3)=ab$ w a $baab b aab$ $ba 2 3 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$ w  a $baab b aab$ $ba 2 3 1 head(i) head(i+1)

Example: x = abaab head(3)=a i=3 tail(3)=ab$ w a $baab b aab$ $ba 2 3 Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$ w  a $baab b aab$ $ba 2 3 1 head(i) head(i+1)

Example: x = abaab head(3)=a i=3 tail(3)=ab$ a $baab b aab$ $ba 2 3 $ Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(3)=a i=3 tail(3)=ab$  a $baab b aab$ $ba 2 3 $ 1 head(i+1) 4

Example: x = abaab head(4)=ab i=4 tail(4)=$ a a u $baab v=b b aab$ $ba Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(4)=ab i=4 tail(4)=$  a a u $baab v=b b aab$ $ba 2 3 $ 1 head(i) 4

Example: x = abaab head(4)=ab i=4 tail(4)=$ s(u) w a a u $baa b v=b Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(4)=ab i=4 tail(4)=$ s(u) w  a a u $baa b v=b b aab$ $ba 2 3 $ 1 head(i) 4

Example: x = abaab head(4)=ab i=4 tail(4)=$ w a head(i+1) a $baa b Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(4)=ab i=4 tail(4)=$ w  a head(i+1) a $baa b b aab$ $ba 2 3 $ 1 head(i) 4

Example: x = abaab head(4)=ab i=4 tail(4)=$ w a head(i+1) a $baa b Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(4)=ab i=4 tail(4)=$ w  a head(i+1) a $baa b b aab$ $ba 2 3 $ 1 head(i) 4

Example: x = abaab head(4)=ab i=4 tail(4)=$ a head(i+1) a $baa b Construct tree for x[1..n] for i = 1 to n do if head(i)= then head(i+1) = slowscan(,s(tail(i))) add i+1 and head(i+1) as node if necessary continue u = parent(head(i)) ; v = label(u,head(i)) if u≠ then w = fastscan(s(u),v) else w = fastscan(,v[2..|v|]) if w is an edge then add a node for w head(i+1) = w else if w is a node then head(i+1) = slowscan(w,tail(i)) add head(i+1) as node if necessary s(head(i)) = w add leaf i+1 and edge between head(i+1) and i+1 head(4)=ab i=4 tail(4)=$  a head(i+1) a $baa b b aab$ 5 $ba 2 3 $ 1 head(i) 4

Done Nothing new for i=5, inserting $... b a $ b $ 2 $baa $ aab$ 5 $ba  b a $ b $ 2 $baa $ aab$ 5 $ba 6 3 4 1

Correctness Correctness follows from the invariant: At iteration i we have a trie of all suffixes j < i. After the final iteration we have a trie of all suffixes of x$, i.e. we have the suffix tree of x.

Time and space usage Everything but searching is constant time per suffix, so the running time is O(n + “slowscan” + “fastscan”). We are not using more space than time, so the space usage is the same.

Slowscan time usage We use slowscan to find head(i+1) from w=s(head(i)), i.e. time |head(i+1)|-|head(i)|+1 for iteration i head(i) tail(i) head(i+1) tail(i+1) = |head(i+1)|-|head(i)|+1 A telescoping sum Sum = |head(n+1)|-|head(1)|+n Equal to n since head(n+1)=head(1)=

Fastscan time usage Fastscan uses time proportional to the number of nodes it process Define d(v) as the (node-)depth of node v Fastscan increases the node depth Following parent and suffix pointers decreases the node depth Time usage of fastscan is bounded by the total depth-increase (Amortized analysis)

Proposition d(v) ≤ d(s(v))+1 Proof: For any ancestor u of v, s(u) is an ancestor of s(v) Except for the empty prefix and the single letter prefix of v, the s(u)’s are different u s(u) v s(v)

Corollary In each step, before calling fastscan, we decrease the depth by at most 2: d(u) = d(head(i))-1; d(w) ≥ d(u)-1 u w head(i) The total decrease is thus 2n

Time usage of fastscan The time usage of fastscan is bounded by n plus the total decrease of depth, i.e. the time usage of fastscan is O(n)

Summary We iteratively build tries of suffixes of x. Using suffix links and fastscan we can quickly find where to insert the next suffix in our current trie. By amortized analysis, the total running time becomes linear.