Everything is String.

Closed Factorization

Golnaz Badkobeh¹, Hideo Bannai², Keisuke Goto², Tomohiro I², Costas S. Iliopoulos³, Shunsuke Inenaga², Simon J. Puglisi⁴, and Shiho Sugimoto²

¹ University of Sheffield, United Kingdom
² Kyushu University, Japan
³ King’s College London, United Kingdom
⁴ University of Helsinki, Finland

Closed Strings

A closed string is a string with a proper substring, its closing border, that occurs as both a prefix and a suffix but has no internal occurrences [Fici, 2011].
– A string of length 1 is closed, and its closing border is the empty string ε.
– A closed string has a unique closing border.

[Figure: example strings with their closing borders marked.]
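The following brute-force Python sketch illustrates the definition. It is not from the slides, and the function names are hypothetical. It relies on one observation: a closing border must be the longest proper border, because any longer border would contain the closing border again at an internal position. So the sketch takes the longest proper border and checks that it occurs exactly twice.

```python
from typing import List, Optional


def occurrences(s: str, p: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) occurrences of p in s."""
    return [k for k in range(len(s) - len(p) + 1) if s.startswith(p, k)]


def closing_border(s: str) -> Optional[str]:
    """Return the closing border of s, or None if s is not closed.

    Strings of length <= 1 are closed with closing border ε; otherwise the
    longest proper border must occur in s exactly twice (prefix and suffix).
    """
    n = len(s)
    if n <= 1:
        return ""  # a single letter is closed; its closing border is ε
    for b in range(n - 1, 0, -1):  # try the longest proper border first
        if s[:b] == s[n - b:]:
            # Only the longest border can be the closing border.
            return s[:b] if len(occurrences(s, s[:b])) == 2 else None
    return None  # no non-empty border, so s is not closed


def is_closed(s: str) -> bool:
    return closing_border(s) is not None


print(closing_border("abcab"))  # ab
print(is_closed("abcabxab"))    # False: ab also occurs at an internal position
```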

Our Contribution

– We introduce the Longest Closed Factor Array of a string w of length n, and an algorithm that computes it in O(n log n / loglog n) time and O(n) space.
– We introduce the Closed Factorization of a string w of length n, and an algorithm that computes it in O(n) time and space.

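Definition of Longest Closed Factor Array

The longest closed factor array of a string w of length n is an array A[1..n] of integers such that, for any 1 ≤ i ≤ n, A[i] = l if and only if l is the length of the longest closed prefix of w[i..n].

Example (positions 1–15):
w = a b a b a a c b b b c b c c $
A = 5 4 3 5 2 1 6 3 2 4 3 1 2 1 1

For instance, A[1] = 5 because ababa (closing border aba) is the longest closed prefix of w[1..15], and A[6] = 1 because the letter a at position 6 never occurs again.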
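The table can be checked straight from the definition by testing every prefix of every suffix for closedness. The Python sketch below does this using hypothetical helper names. It runs in roughly quartic time, so it is only suitable for tiny examples like the one above.

```python
from typing import List, Optional


def closing_border(s: str) -> Optional[str]:
    """Closing border of s, or None if s is not closed (brute force)."""
    n = len(s)
    if n <= 1:
        return ""
    for b in range(n - 1, 0, -1):  # the closing border is the longest border
        if s[:b] == s[n - b:]:
            count = sum(s.startswith(s[:b], k) for k in range(n - b + 1))
            return s[:b] if count == 2 else None
    return None


def longest_closed_factor_array(w: str) -> List[int]:
    """A[i] = length of the longest closed prefix of w[i..n], by definition.
    Returned as a Python list, so the 1-based A[i] sits at index i - 1."""
    A = []
    for start in range(len(w)):
        suffix = w[start:]
        # Try prefixes longest first; a single letter is always closed.
        for length in range(len(suffix), 0, -1):
            if closing_border(suffix[:length]) is not None:
                A.append(length)
                break
    return A


print(longest_closed_factor_array("ababaacbbbcbcc$"))
# [5, 4, 3, 5, 2, 1, 6, 3, 2, 4, 3, 1, 2, 1, 1]
```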

Computing Longest Closed Factor Array

Theorem 1. Given a string w of length n over an integer alphabet, the longest closed factor array of w can be computed in O(n log n / loglog n) time and O(n) space.

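Computing Longest Closed Factor Array

Lemma 1. The longest prefix of w[i..n] that has another occurrence to the right of i is the closing border of the longest closed factor starting at i.

For example, in w = ababaacbbbcbcc$ the longest prefix of w[1..15] that occurs again to the right of position 1 is aba (next occurrence at position 3). This is exactly the closing border of ababa, the longest closed factor starting at position 1.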
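Lemma 1 can be sanity-checked exhaustively on small strings. The Python sketch below is an empirical check, not a proof, and its helper names are hypothetical. At every position i it compares two things: the closing border of the brute-force longest closed prefix, and the longest prefix of w[i..n] that occurs again to the right of i.

```python
from itertools import product
from typing import Optional


def closing_border(s: str) -> Optional[str]:
    """Closing border of s, or None if s is not closed (brute force)."""
    n = len(s)
    if n <= 1:
        return ""
    for b in range(n - 1, 0, -1):
        if s[:b] == s[n - b:]:
            count = sum(s.startswith(s[:b], k) for k in range(n - b + 1))
            return s[:b] if count == 2 else None
    return None


def longest_right_occurring_prefix(w: str, i: int) -> str:
    """Longest prefix of w[i:] that also starts at some position > i (0-based)."""
    best = ""
    for length in range(1, len(w) - i + 1):
        if w.find(w[i:i + length], i + 1) == -1:
            break  # then no longer prefix can occur to the right either
        best = w[i:i + length]
    return best


def lemma1_holds(w: str) -> bool:
    """Check Lemma 1 at every position of w (an empirical check, not a proof)."""
    for i in range(len(w)):
        suffix = w[i:]
        # Longest closed prefix of w[i:] by brute force (a single letter is closed).
        longest = next(suffix[:m] for m in range(len(suffix), 0, -1)
                       if closing_border(suffix[:m]) is not None)
        if closing_border(longest) != longest_right_occurring_prefix(w, i):
            return False
    return True


print(longest_right_occurring_prefix("ababaacbbbcbcc$", 0))  # aba
```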

Outline of Our Algorithm

1. Construct and preprocess the suffix tree of w.
2. i ← 1.
3. Compute the closing border b_i starting at position i, using the suffix tree constructed in Step 1.
4. Find the leftmost occurrence j of b_i in w[i+1..n], using a range successor query.
5. A[i] ← j + |b_i| − i.
6. i ← i + 1.
7. Repeat Steps 3–6 while i ≤ n.
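The outline can be written as a short loop. In the Python sketch below, Steps 3 and 4 are replaced by naive string searches; the helper closing_border_at is hypothetical and runs in quadratic time. The sketch therefore shows the control flow and the formula A[i] = j + |b_i| − i, not the stated time bound. Positions are 1-based to match the slides.

```python
from typing import List


def closing_border_at(w: str, i: int) -> str:
    """Step 3 stand-in (hypothetical helper, 1-based i): the longest prefix of
    w[i..n] that occurs again to the right of i, which is b_i by Lemma 1.
    The slides obtain it from the suffix tree; this scan is quadratic."""
    b = ""
    for length in range(1, len(w) - i + 2):
        candidate = w[i - 1:i - 1 + length]
        if w.find(candidate, i) == -1:  # 0-based index i is 1-based position i + 1
            break
        b = candidate
    return b


def lcf_array(w: str) -> List[int]:
    n = len(w)
    A = [0] * (n + 1)          # A[1..n]; A[0] is unused
    i = 1                      # Step 2
    while i <= n:              # Step 7: repeat while i <= n
        b = closing_border_at(w, i)            # Step 3
        # Step 4 stand-in: leftmost occurrence j of b in w[i+1..n]. The slides
        # use a range successor query; str.find is a naive substitute.
        # For b = ε, find returns 0-based index i, so j = i + 1.
        j = w.find(b, i) + 1                   # convert to a 1-based position
        A[i] = j + len(b) - i                  # Step 5
        i += 1                                 # Step 6
    return A[1:]


print(lcf_array("ababaacbbbcbcc$"))
# [5, 4, 3, 5, 2, 1, 6, 3, 2, 4, 3, 1, 2, 1, 1]
```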

Step 1

Construct the suffix tree of the given string w. Each leaf stores the beginning position of the suffix corresponding to that leaf. Each internal node v is labeled with the maximum leaf value in the subtree rooted at v.

[Figure: suffix tree of w = abaaba$ with leaves 1–7 labeled by suffix start position; the internal nodes for a, aba, and ba are labeled 6, 4, and 5; the suffix array (SA) is shown below the leaves.]
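The labeling in Step 1 can be illustrated with a suffix trie, that is, an uncompacted suffix tree. A suffix trie has O(n²) nodes instead of O(n), but it carries the same maximum-leaf labels along every root-to-leaf path. The Python sketch below uses such a trie as a simplified stand-in; the class and function names are hypothetical, and w is assumed to end with a unique terminator $.

```python
from typing import Dict


class Node:
    """Node of a suffix trie (an uncompacted suffix tree)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.label = 0  # maximum leaf value (suffix start) in this subtree


def build_labeled_suffix_trie(w: str) -> Node:
    """Insert every suffix w[i..n] (1-based i) and label each node with the
    maximum start position among the suffixes passing through it.
    Assumes w ends with a unique terminator such as '$'."""
    root = Node()
    for i in range(1, len(w) + 1):
        node = root
        node.label = max(node.label, i)
        for c in w[i - 1:]:
            node = node.children.setdefault(c, Node())
            node.label = max(node.label, i)
    return root


def label_of(root: Node, s: str) -> int:
    """Label of the node reached by spelling s from the root."""
    node = root
    for c in s:
        node = node.children[c]
    return node.label


trie = build_labeled_suffix_trie("abaaba$")
print(label_of(trie, "a"), label_of(trie, "aba"), label_of(trie, "ba"))  # 6 4 5
```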

Step 3

Compute the closing border b_i starting at position i.
– Find the highest node x labeled i.
– The string spelled by the path from the root to the parent of x is the closing border of the longest closed factor starting at position i.

[Figure: suffix tree of abaaba$ with its node labels.]
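The Python sketch below implements Step 3 on the same suffix-trie stand-in, rebuilt here so the block runs on its own. It walks down from the root along w[i..n] until it reaches the first node labeled i, and the string spelled up to that node's parent is b_i. In the trie, the highest node labeled i lies one character below the suffix-tree node u, so its trie parent spells exactly pathlabel(u). For i = n the root itself carries label n, so the sketch returns ε directly. All names are hypothetical.

```python
from typing import Dict


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.label = 0  # maximum leaf value in this subtree


def build_labeled_suffix_trie(w: str) -> Node:
    root = Node()
    for i in range(1, len(w) + 1):
        node = root
        node.label = max(node.label, i)
        for c in w[i - 1:]:
            node = node.children.setdefault(c, Node())
            node.label = max(node.label, i)
    return root


def closing_border_by_walk(root: Node, w: str, i: int) -> str:
    """b_i for 1-based i: walk from the root along w[i..n] to the highest
    node x labeled i; b_i is the string spelled up to x's parent."""
    if i == len(w):
        return ""  # the root has label n, and w[n..n] = '$' has border ε
    suffix = w[i - 1:]
    node, depth = root, 0
    while node.label != i:  # labels never increase on the way down to leaf i
        node = node.children[suffix[depth]]
        depth += 1
    return suffix[:depth - 1]  # path label of the parent of x


w = "abaaba$"
trie = build_labeled_suffix_trie(w)
print([closing_border_by_walk(trie, w, i) for i in range(1, len(w) + 1)])
# ['aba', 'ba', 'a', 'a', '', '', '']
```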

Step 3

[Figure: the path from the root of the suffix tree of w to leaf i; u is the parent of x, and pathlabel(u) and pathlabel(x) are prefixes of w[i..n].]
– x: the highest node labeled i.
– u: the parent of x.
How do we find node x?

Step 3

Compute the closing border b_i starting at position i.
– Find the highest node x labeled i.
Naively, we can traverse the suffix tree from the root:
– O(|x|) time for a constant alphabet.
– O(|x| log n) time for an integer alphabet.
An array P[1..n] lets us find node x in O(1) time.
– P[i] holds a pointer to the highest node x in the tree whose maximum leaf value is i.
– P can be computed in O(n) time with a pre-order traversal.
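The array P can be sketched on the same suffix-trie stand-in. Every node labeled i lies on the path from the root to leaf i, so a pre-order traversal reaches the highest of them first. In the Python sketch below, P[i] stores that node together with its string depth, which is enough to read off b_i without walking the tree. All names are hypothetical. On a real suffix tree the traversal takes O(n) time, whereas this trie has O(n²) nodes.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.label = 0  # maximum leaf value in this subtree


def build_labeled_suffix_trie(w: str) -> Node:
    root = Node()
    for i in range(1, len(w) + 1):
        node = root
        node.label = max(node.label, i)
        for c in w[i - 1:]:
            node = node.children.setdefault(c, Node())
            node.label = max(node.label, i)
    return root


def compute_P(root: Node, n: int) -> List[Optional[Tuple[Node, int]]]:
    """P[i] = (x, string depth of x), where x is the highest node labeled i."""
    P: List[Optional[Tuple[Node, int]]] = [None] * (n + 1)  # P[0] unused
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()  # each node is visited before its descendants
        if P[node.label] is None:
            P[node.label] = (node, depth)
        for child in node.children.values():
            stack.append((child, depth + 1))
    return P


def closing_border_via_P(w: str, P: List[Optional[Tuple[Node, int]]], i: int) -> str:
    """b_i from a single lookup in P: in the trie, |b_i| = depth(x) - 1
    (x is the root only when i = n, where b_n = ε). The slice just builds the string."""
    _, depth = P[i]
    return w[i - 1:i - 1 + max(depth - 1, 0)]


w = "abaaba$"
P = compute_P(build_labeled_suffix_trie(w), len(w))
print([closing_border_via_P(w, P, i) for i in range(1, len(w) + 1)])
# ['aba', 'ba', 'a', 'a', '', '', '']
```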

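Step 4

[Figure: the suffix tree of w, showing u, its child x on the path to leaf i, and another leaf h in the subtree of u.]
– x: the highest node labeled i; u: the parent of x.
– h is the successor of i in the set of leaf values in the subtree rooted at u. These leaf values are exactly the occurrences of pathlabel(u) = b_i, so h is the leftmost occurrence of b_i to the right of i.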

Step 4

Compute the longest closed factor starting at position i.
– Use a range successor query data structure for the suffix array [Yu et al., 2011].
– Each internal node v stores the beginning and ending positions of its corresponding range in the suffix array.
– Each range successor query takes O(log n / loglog n) time, and one query is made for each position i.

[Figure: suffix tree of abaaba$ with suffix array indices 1–7.]
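The slides use the range successor structure of Yu et al. (2011). The Python sketch below swaps in a simpler substitute, a merge-sort tree, which answers a range successor query in O(log² n) time using O(n log n) space. The naive suffix array construction and the linear scan for the suffix-array range of b_i are also stand-ins for what the suffix tree nodes store. All names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def suffix_array(w: str) -> List[int]:
    """1-based suffix start positions in lexicographic order (naive sort;
    '$' sorts before lowercase letters in Python string comparison)."""
    return sorted(range(1, len(w) + 1), key=lambda i: w[i - 1:])


class MergeSortTree:
    """Range successor: the smallest arr[k] > x with l <= k <= r (0-based)."""

    def __init__(self, arr: List[int]) -> None:
        self.n = len(arr)
        self.tree: List[List[int]] = [[] for _ in range(4 * self.n)]
        self._build(1, 0, self.n - 1, arr)

    def _build(self, v: int, lo: int, hi: int, arr: List[int]) -> None:
        if lo == hi:
            self.tree[v] = [arr[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * v, lo, mid, arr)
        self._build(2 * v + 1, mid + 1, hi, arr)
        self.tree[v] = sorted(self.tree[2 * v] + self.tree[2 * v + 1])

    def successor(self, l: int, r: int, x: int) -> Optional[int]:
        return self._query(1, 0, self.n - 1, l, r, x)

    def _query(self, v: int, lo: int, hi: int, l: int, r: int, x: int) -> Optional[int]:
        if r < lo or hi < l:
            return None
        if l <= lo and hi <= r:  # this node's range lies inside [l, r]
            vals = self.tree[v]
            k = bisect_right(vals, x)  # first value strictly greater than x
            return vals[k] if k < len(vals) else None
        mid = (lo + hi) // 2
        found = [a for a in (self._query(2 * v, lo, mid, l, r, x),
                             self._query(2 * v + 1, mid + 1, hi, l, r, x))
                 if a is not None]
        return min(found) if found else None


def leftmost_occurrence_after(w: str, sa: List[int], mst: MergeSortTree,
                              b: str, i: int) -> Optional[int]:
    """Step 4 for 1-based i: leftmost start j > i of b. The SA range of b is
    found by a linear scan here; in the slides each suffix tree node stores it."""
    rows = [k for k, s in enumerate(sa) if w.startswith(b, s - 1)]
    return mst.successor(rows[0], rows[-1], i)


w = "abaaba$"
sa = suffix_array(w)  # [7, 6, 3, 4, 1, 5, 2]
mst = MergeSortTree(sa)
print(leftmost_occurrence_after(w, sa, mst, "aba", 1))  # 4, so A[1] = 4 + 3 - 1 = 6
```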

Our Result 1

Given a string w of length n over an integer alphabet, the longest closed factor array of w can be computed in O(n log n / loglog n) time and O(n) space.

Definition of Closed Factorization

The closed factorization of a string w of length n is a sequence (G_0, G_1, …, G_k) of strings such that G_0 = ε, w = G_1 ⋯ G_k, and, for each 1 ≤ j ≤ k, G_j is the longest closed prefix of w[|G_1 ⋯ G_{j−1}| + 1..n].

Example: for w = ababaacbbbcbcc$ (positions 1–15), the factors are G_1 = ababa, G_2 = a, G_3 = cbbbcb, G_4 = cc, G_5 = $.
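Following the definition, the factorization can be built greedily: take the longest closed prefix, cut it off, and repeat. The Python sketch below finds each longest closed prefix with Lemma 1 using naive search. The helper names are hypothetical and each factor costs quadratic time, so the sketch illustrates the definition rather than Theorem 2.

```python
from typing import List


def longest_closed_prefix_length(w: str, i: int) -> int:
    """A[i] for 0-based i via Lemma 1: take the longest prefix b of w[i:] that
    occurs again to the right, then end the factor at its leftmost such
    occurrence. Naive search; hypothetical helper."""
    b = ""
    for length in range(1, len(w) - i + 1):
        if w.find(w[i:i + length], i + 1) == -1:
            break
        b = w[i:i + length]
    j = w.find(b, i + 1)  # for b = ε this is i + 1
    return j + len(b) - i


def closed_factorization(w: str) -> List[str]:
    """Greedy: G_j is the longest closed prefix of what remains of w."""
    factors, i = [], 0
    while i < len(w):
        length = longest_closed_prefix_length(w, i)
        factors.append(w[i:i + length])
        i += length
    return factors


print(closed_factorization("ababaacbbbcbcc$"))
# ['ababa', 'a', 'cbbbcb', 'cc', '$']
```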

Computing Closed Factorization

Theorem 2. Given a string w of length n over an integer alphabet, the closed factorization of w can be computed in O(n) time and space.

Outline of Our Algorithm

1. Construct and preprocess the suffix tree of w.
2. i ← 1.
3. Compute the closing border b_i starting at position i, using the suffix tree constructed in Step 1.
4. Find the leftmost occurrence j of b_i in w[i+1..n] with the KMP algorithm, stopping as soon as j is found.
5. Output the factor w[i..j+|b_i|−1] and set i ← j + |b_i|.
6. Repeat Steps 3–5 while i ≤ n.

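Algorithm of Closed Factorization

With the KMP algorithm, we can compute each factor G_j in O(|G_j|) time. The pattern b_i is shorter than G_j, and the search stops as soon as the first occurrence of b_i in w[i+1..n] ends, which is at the last position of G_j. At most |G_j| − 1 text characters are therefore scanned. Because the lengths of all factors sum to n, the total time to compute the closed factorization is O(n).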
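The KMP step can be sketched as follows. kmp_first_occurrence returns as soon as the first match ends, so for a factor starting at i it scans only w[i+1..j+|b_i|−1]. The closing border is still computed by a naive stand-in, closing_border_naive, rather than by the suffix tree and array P of Step 3. Only the KMP part of this Python sketch therefore reflects the O(|G_j|) bound, and all names are hypothetical.

```python
from typing import List


def failure_function(p: str) -> List[int]:
    """KMP failure function: fail[q] = length of the longest proper border of p[:q+1]."""
    fail = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[q] != p[k]:
            k = fail[k - 1]
        if p[q] == p[k]:
            k += 1
        fail[q] = k
    return fail


def kmp_first_occurrence(text: str, p: str, start: int) -> int:
    """0-based index of the leftmost occurrence of non-empty p in text[start:],
    stopping as soon as that occurrence ends; -1 if there is none."""
    fail = failure_function(p)
    k = 0
    for q in range(start, len(text)):
        while k > 0 and text[q] != p[k]:
            k = fail[k - 1]
        if text[q] == p[k]:
            k += 1
        if k == len(p):
            return q - len(p) + 1
    return -1


def closing_border_naive(w: str, i: int) -> str:
    """Stand-in for Step 3 (0-based i); the slides use the suffix tree instead."""
    b = ""
    for length in range(1, len(w) - i + 1):
        if w.find(w[i:i + length], i + 1) == -1:
            break
        b = w[i:i + length]
    return b


def closed_factorization_kmp(w: str) -> List[str]:
    factors, i = [], 0  # 0-based i
    while i < len(w):
        b = closing_border_naive(w, i)
        if b:
            j = kmp_first_occurrence(w, b, i + 1)  # Step 4: stop at the first match
            end = j + len(b)
        else:
            end = i + 1  # empty border: the factor is the single letter w[i]
        factors.append(w[i:end])
        i = end  # Step 5
    return factors


print(closed_factorization_kmp("ababaacbbbcbcc$"))
# ['ababa', 'a', 'cbbbcb', 'cc', '$']
```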

Our Result 2

Given a string w of length n over an integer alphabet, the closed factorization of w can be computed in O(n) time and space.

Conclusion

– We introduced the Longest Closed Factor Array of a string and proposed an algorithm that computes it in O(n log n / loglog n) time and O(n) space.
– We introduced the Closed Factorization of a string and proposed an algorithm that computes it in O(n) time and space.

Open Problems

– Can we efficiently compute the longest closed factor array without range successor queries?
– Can we find the longest closed factor containing each position without the longest closed factor array?

