HABATAKITAI Laboratory Everything is String. Computing palindromic factorization and palindromic covers on-line Tomohiro I, Shiho Sugimoto, Shunsuke Inenaga,

Slides:



Advertisements
Similar presentations
1 Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen.
Advertisements

Boosting Textual Compression in Optimal Linear Time.
Succinct Representations of Dynamic Strings Meng He and J. Ian Munro University of Waterloo.
Combinatorial Pattern Matching CS 466 Saurabh Sinha.
Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007.
Longest Common Subsequence
Embedding the Ulam metric into ℓ 1 (Ενκρεβάτωση του μετρικού χώρου Ulam στον ℓ 1 ) Για το μάθημα “Advanced Data Structures” Αντώνης Αχιλλέως.
Fast Algorithms For Hierarchical Range Histogram Constructions
1 Suffix Arrays: A new method for on-line string searches Udi Manber Gene Myers May 1989 Presented by: Oren Weimann.
Inferring Strings from Suffix Trees and Links on a Binary Alphabet Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda Kyushu University, Japan.
On-line Linear-time Construction of Word Suffix Trees Shunsuke Inenaga (Japan Society for the Promotion of Science & Kyushu University) Masayuki Takeda.
Refining Edits and Alignments Υλικό βασισμένο στο κεφάλαιο 12 του βιβλίου: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University.
1 CSCI-2400 Models of Computation. 2 Computation CPU memory.
Fast and Practical Algorithms for Computing Runs Gang Chen – McMaster, Ontario, CAN Simon J. Puglisi – RMIT, Melbourne, AUS Bill Smyth – McMaster, Ontario,
Probabilistic (Average-Case) Analysis and Randomized Algorithms Two different but similar analyses –Probabilistic analysis of a deterministic algorithm.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter 2: Boyer-Moore Algorithm.
UNIVERSITY OF SOUTH CAROLINA College of Engineering & Information Technology Bioinformatics Algorithms and Data Structures Chapter 2: KMP Algorithm Lecturer:
Knuth-Morris-Pratt Algorithm left to right scan like the naïve algorithm one main improvement –on a mismatch, calculate maximum possible shift to the right.
Dynamic Text and Static Pattern Matching Amihood Amir Gad M. Landau Moshe Lewenstein Dina Sokol Bar-Ilan University.
Sequence Alignment Variations Computing alignments using only O(m) space rather than O(mn) space. Computing alignments with bounded difference Exclusion.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
6/29/20151 Efficient Algorithms for Motif Search Sudha Balla Sanguthevar Rajasekaran University of Connecticut.
1 Sorting by Transpositions Based on the First Increasing Substring Concept Advisor: Professor R.C.T. Lee Speaker: Ming-Chiang Chen.
Aho-Corasick Algorithm Generalizes KMP to handle sets of strings New ideas –keyword trees –failure functions/links –output links.
An O(N 2 ) Algorithm for Discovering Optimal Boolean Pattern Pairs Hideo Bannai, Heikki Hyyro, Ayumi Shinohara, Masayuki Takeda, Kenta Nakai and Satoru.
1 Boyer-Moore Charles Yan Exact Matching Boyer-Moore ( worst-case: linear time, Typical: sublinear time ) Aho-Corasik ( A set of pattern )
Linear Time Algorithms for Finding and Representing all Tandem Repeats in a String Dan Gusfield and Jens Stoye Journal of Computer and System Science 69.
1 Exact Matching Charles Yan Na ï ve Method Input: P: pattern; T: Text Output: Occurrences of P in T Algorithm Naive Align P with the left end.
1 Exact Set Matching Charles Yan Exact Set Matching Goal: To find all occurrences in text T of any pattern in a set of patterns P={p 1,p 2,…,p.
Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda Kyushu University, Japan SPIRE Cartagena, Colombia.
Longest Palindromic Substring Yang Liu. Problem Given a string S Find the longest palindromic substring in S. Example: S=“abcbcbb”. The longest palindromic.
1 Efficient packet classification using TCAMs Authors: Derek Pao, Yiu Keung Li and Peng Zhou Publisher: Computer Networks 2006 Present: Chen-Yu Lin Date:
Longest Increasing Subsequences in Windows Based on Canonical Antichain Partition Erdong Chen (Joint work with Linji Yang & Hao Yuan) Shanghai Jiao Tong.
Zvi Kohavi and Niraj K. Jha 1 Memory, Definiteness, and Information Losslessness of Finite Automata.
Computing Left-Right Maximal Generic Words Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda Kyushu University, Japan.
MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.
Random Generation and Enumeration of Bipartite Permutation Graphs Toshiki Saitoh (JAIST, Japan) Yota Otachi (Gunma Univ., Japan) Katsuhisa Yamanaka (UEC,
Parallel Suffix Array Construction by Accelerated Sampling Matthew Felice Pace University of Warwick Joint work with Alexander Tiskin University of Warwick.
Finding Characteristic Substrings from Compressed Texts Shunsuke Inenaga Kyushu University, Japan Hideo Bannai Kyushu University, Japan.
Sorting Fun1 Chapter 4: Sorting     29  9.
An Implementation of The Teiresias Algorithm Na Zhao Chengjun Zhan.
1 Languages. 2 A language is a set of strings String: A sequence of letters Examples: “cat”, “dog”, “house”, … Defined over an alphabet:
Computing longest common substring and all palindromes from compressed strings Wataru Matsubara 1, Shunsuke Inenaga 2, Akira Ishino 1, Ayumi Shinohara.
Suffix trees. Trie A tree representing a set of strings. a b c e e f d b f e g { aeef ad bbfe bbfg c }
Semi-dynamic compact index for short patterns and succinct van Emde Boas tree 1 Yoshiaki Matsuoka 1, Tomohiro I 2, Shunsuke Inenaga 1, Hideo Bannai 1,
Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
ALGORITHMS.
Everything is String. Closed Factorization Golnaz Badkobeh 1, Hideo Bannai 2, Keisuke Goto 2, Tomohiro I 2, Costas S. Iliopoulos 3, Shunsuke Inenaga 2,
Bringing Together Paradox, Counting, and Computation To Make Randomness! CS Lecture 21 
28 Aug, 2006PSC Song Classification for Dancing Manolis Cristodoukalis, Costas Iliopoulos, M. Sohel Rahman, W.F. Smyth.
Applications of Suffix Trees Dr. Amar Mukherjee CAP 5937 – ST: Bioinformatics University of central Florida.
Dipankar Ranjan Baisya, Mir Md. Faysal & M. Sohel Rahman CSE, BUET Dhaka 1000 Degenerate String Reconstruction from Cover Arrays (Extended Abstract) 1.
Example 2 You are traveling by a canoe down a river and there are n trading posts along the way. Before starting your journey, you are given for each 1
Languages and Strings Chapter 2. (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree.
Computing smallest and largest repetition factorization in O(n log n) time Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai,
Efficient Computation of Substring Equivalence Classes with Suffix Arrays this study is a joint work with Shunsuke Inenaga, Hideo Bannai and Masayuki Takeda.
Burrows-Wheeler Transformation Review
Alternative Algorithms for Lyndon Factorization
Succinct Data Structures
Enumerating Distances Using Spanners of Bounded Degree
Searching Similar Segments over Textual Event Sequences
Suffix trees.
Reachability on Suffix Tree Graphs
Minimizing the Aggregate Movements for Interval Coverage
On the Range Maximum-Sum Segment Query Problem
Dynamic Programming II DP over Intervals
Recap lecture 18 NFA corresponding to union of FAs,example, NFA corresponding to concatenation of FAs,examples, NFA corresponding to closure of an FA,examples.
Lecture 5 Dynamic Programming
Presentation transcript:

HABATAKITAI Laboratory Everything is String. Computing palindromic factorization and palindromic covers on-line Tomohiro I, Shiho Sugimoto, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda (Kyushu University)

HABATAKITAI Laboratory Everything is String. When the union of intervals [b 1,e 1 ],…,[b h,e h ] equals [1,n], the set of these intervals is called a cover of a string w of length n. There exist several kind of covers with specific properties. –All intervals correspond to the same substring (cover). [Li and Smyth, 2002] Cover of string a a a b a a a b a a a a b a

HABATAKITAI Laboratory Everything is String. When the union of intervals [b 1,e 1 ],…,[b h,e h ] equals [1,n], the set of these intervals is called a cover of a string w of length n. There exist several kind of covers with specific properties. –All intervals correspond to the same substring (cover). [Li and Smyth, 2002] –All substrings corresponding to the intervals are permutations of w[b 1,e 1 ] (Abelian cover). [Matsuda et al., 2014 (unpublished)] Cover of string a a a b a a a b a a a a b a

HABATAKITAI Laboratory Everything is String. A mutually disjoint cover of a string w is called a factorization of string w. Each interval of a factorization is called a factor. There exist several factorizations with specific properties. –Each factor is the longest previously occurring substring (LZ77). [Ziv and Lempel, 1977] Factorizations of string a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. A mutually disjoint cover of a string w is called a factorization of string w. Each interval of a factorization is called a factor. There exist several factorizations with specific properties. –Each factor is the longest previously occurring substring (LZ77). [Ziv and Lempel, 1977] –A sequence of lexicographically non-increasing Lyndon words. (Lyndon factorization). [Chen, Fox, Lyndon, 1958] Factorizations of string a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. A mutually disjoint cover of a string w is called a factorization of string w. Each interval of a factorization is called a factor. There exist several factorizations with specific properties. –Each factor is the longest previously occurring substring (LZ77). [Ziv and Lempel, 1977] –A sequence of lexicographically non-increasing Lyndon words. (Lyndon factorization). [Chen, Fox, Lyndon, 1958] Factorizations of string In this work, we consider covers and factorizations with palindromes. a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. Palindrome When string w = ss R or w = sas R (s R is the reversed string of s and a is a character), string w is a palindrome. Examples – I– I –racecar – × 567 = 765 × – よのなかねかおかおかねかなのよ

HABATAKITAI Laboratory Everything is String. Palindrome When string w = ss R or w = sas R (s R is the reversed string of s and a is a character), string w is a palindrome. Examples – I –racecar – × 567 = 765 × – よのなかねかおかおかねかなのよ (= In this world, it is either money or face.)

HABATAKITAI Laboratory Everything is String. Smallest maximal palindromic factorization (SMPF) [Alatabbi et al., 2013] –off-line, O(n) time, O(n) space Smallest maximal palindromic factorization (SMPF) –on-line, O(n log n) time, O(n) space Smallest palindromic factorization (SPF) –on-line, O(n log n) time, O(n) space Smallest palindromic cover (SPC) –off-line, on-line, O(n) time, O(n) space Covers and factorizations with palindromes Previous Work This Work Independently Fici et al. obtained the same result for SPF.(Arxiv, 2014)

HABATAKITAI Laboratory Everything is String. For each interval [b, e] in a cover of string w, if w[b…e] is a palindrome, then this cover is called a palindromic cover of w. Palindromic cover a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. For each interval [b, e] in a cover of string w, if w[b…e] is a palindrome, then this cover is called a palindromic cover of w. Palindromic cover a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. Input string w of length n Output a smallest palindromic cover of w (a palindromic cover with minimum number of intervals) Smallest palindromic cover a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. The number of palindromes in w can be Ω(n 2 ). If we compute a smallest palindromic cover (SPC) with a naïve algorithm, then we need O(n 2 ) time. We propose an O(n) time algorithm which uses only maximal palindromes. Algorithm for smallest palindromic cover

HABATAKITAI Laboratory Everything is String. Definition If w[i..j] = (w[i..j]) R and w[i-1] ≠ w[j+1], i = 1 or j = n, then w[i..j] is the maximal palindrome of position (i+j)/2. Maximal palindrome a a b a a b a a b a b a a b b a a a b a 6

HABATAKITAI Laboratory Everything is String. Definition If w[i..j] = (w[i..j]) R and w[i-1] ≠ w[j+1], i = 1 or j = n, then w[i..j] is the maximal palindrome of position (i+j)/2. Maximal palindrome 14.5 a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. Maximal palindrome Definition If w[i..j] = (w[i..j]) R and w[i-1] ≠ w[j+1], i = 1 or j = n, then w[i..j] is the maximal palindrome of position (i+j)/2. Properties –The number of maximal palindromes of string of length n is 2n-1. –All palindromes of string w are substrings of some maximal palindromes of string w.

HABATAKITAI Laboratory Everything is String. A smallest palindromic cover can be computed in O(n) time. –Because we want to cover the input string with fewest palindromes, it suffices to consider only maximal palindromes. –All maximal palindromes can be computed in O(n) time [Manacher, 1975]. –Using a well known algorithm, we can compute a smallest cover of 2n-1 intervals in O(n) time. Algorithm For Smallest Palindromic Cover

HABATAKITAI Laboratory Everything is String. Off-line algorithm Algorithm For Smallest Palindromic Cover a a b a a b a a b a b a a b b a a a b a all maximal palindromes

HABATAKITAI Laboratory Everything is String. Off-line algorithm Algorithm For Smallest Palindromic Cover a a b a a b a a b a b a a b b a a a b a all maximal palindromes

HABATAKITAI Laboratory Everything is String. Off-line algorithm Algorithm For Smallest Palindromic Cover a a b a a b a a b a b a a b b a a a b a smallest palindromic cover

HABATAKITAI Laboratory Everything is String. Off-line algorithm We can extend this algorithm on-line (computing smallest palindromic covers for all prefixes). Algorithm For Smallest Palindromic Cover a a b a a b a a b a b a a b b a a a b a smallest palindromic cover

HABATAKITAI Laboratory Everything is String. Palindromic factorization When all factors of a factorization of string w are palindromes, it is called a palindromic factorization of string w.

HABATAKITAI Laboratory Everything is String. Palindromic factorization a a b a a b a a b a b a a b b a a a b a When all factors of a factorization of string w are palindromes, it is called a palindromic factorization of string w.

HABATAKITAI Laboratory Everything is String. Smallest Palindromic factorization When the number of factors of a palindromic factorization of string w is minimum, it is called a smallest palindromic factorization of string w. a a b a a b a a b a b a a b b a a a b a

HABATAKITAI Laboratory Everything is String. Computing online –We scan the input string w from left to right, and at each position i we compute the length of the last factor of a smallest palindromic factorization of w[1…i]. Assume we have processed w[1…i-1], and think about computing the length of the last factor of a smallest palindromic factorization of w[1…i]. –We use a dynamic programming. Algorithm for smallest palindromic factorization

HABATAKITAI Laboratory Everything is String. Any smallest palindromic factorization of w[1…i] has one of the suffix palindromes ending at position i. Algorithm for smallest palindromic factorization w i1

HABATAKITAI Laboratory Everything is String. Any smallest palindromic factorization of w[1…i] has one of the suffix palindromes ending at position i. Algorithm for smallest palindromic factorization w i1 suffix palindrome

HABATAKITAI Laboratory Everything is String. Algorithm for smallest palindromic factorization However the number of suffix palindromes ending at position i can be Ω(i). Ω(i)Ω(i) To overcome this, we show that at each position i, it suffices to consider only O(log i) suffix palindromes. w i1

HABATAKITAI Laboratory Everything is String. Algorithm for smallest palindromic factorization However the number of suffix palindromes ending at position i can be Ω(i). So a naïve DP algorithm takes O(n 2 ) time. Ω(i)Ω(i) To overcome this, we show that at each position i, it suffices to consider only O(log i) suffix palindromes. w i1

HABATAKITAI Laboratory Everything is String. Property of suffix palindromes There are O(log n) different periods for suffix palindromes of a string of length n [Apostolico et al., 1995] . a b a b a c a b a b a c a b a b a f a b a b a c a b a b a c a b a b a O(log n)

HABATAKITAI Laboratory Everything is String. Property of suffix palindromes There are O(log n) different periods for suffix palindromes of a string of length n [Apostolico et al., 1995] . –We can maintain these groups online in O(n log n) time, because they are periodic. O(log n) a b a b a c a b a b a c a b a b a f a b a b a c a b a b a c a b a b a

HABATAKITAI Laboratory Everything is String. Smallest palindromic factorization Consider a group of suffix palindromes w.r.t. period d. For any position j, let k j be the size (the number of factors) of a smallest palindromic factorization of w[1…j]. Let k i (d) be the size of a smallest palindromic factorization of w[1…i] which has a suffix palindrome with period d. Then we have k i (d) = min{k i-l +1, k i-d(d) } i d d d d d d i-di-d l

HABATAKITAI Laboratory Everything is String. Some strings don’t have a maximal palindromic factorization. –a b a c a On the other hand, all strings have a palindromic cover and palindromic factorization. For any string w, we have the following relationships between the sizes of smallest palindromic cover (SPC(w)), smallest palindromic factorization (SPF(w)), and smallest maximal palindromic factorization (SMPF(w)). –|SPC(w)| ≦ |SPF(w)| ≦ |SMPF(w)| Relationships between the sizes of palindromic cover and factorizations

HABATAKITAI Laboratory Everything is String. Some strings don’t have a maximal palindromic factorization. –a b a c a On the other hand, all strings have a palindromic cover and palindromic factorization. For any string w, we have the following relationships between the sizes of smallest palindromic cover (SPC(w)), smallest palindromic factorization (SPF(w)), and smallest maximal palindromic factorization (SMPF(w)). –|SPC(w)| ≦ |SPF(w)| ≦ |SMPF(w)| Relationships between the sizes of palindromic cover and factorizations 4 a b b a b a

HABATAKITAI Laboratory Everything is String. Some strings don’t have a maximal palindromic factorization. –a b a c a On the other hand, all strings have a palindromic cover and palindromic factorization. For any string w, we have the following relationships between the sizes of smallest palindromic cover (SPC(w)), smallest palindromic factorization (SPF(w)), and smallest maximal palindromic factorization (SMPF(w)). –|SPC(w)| ≦ |SPF(w)| ≦ |SMPF(w)| Relationships between the sizes of palindromic cover and factorizations 3 a b b a b a 4

HABATAKITAI Laboratory Everything is String. a b b a b a Some strings don’t have a maximal palindromic factorization. –a b a c a On the other hand, all strings have a palindromic cover and palindromic factorization. For any string w, we have the following relationships between the sizes of smallest palindromic cover (SPC(w)), smallest palindromic factorization (SPF(w)), and smallest maximal palindromic factorization (SMPF(w)). –|SPC(w)| ≦ |SPF(w)| ≦ |SMPF(w)| Relationships between the sizes of palindromic cover and factorizations 2 3 4

HABATAKITAI Laboratory Everything is String. We presented algorithms which compute –a smallest palindromic cover in O(n) time off-line and on-line. –a smallest palindromic factorization in O(n log n) time on-line. –a smallest maximal palindromic factorization in O(n log n) time on-line. Conclusion

HABATAKITAI Laboratory Everything is String. Enumerating all smallest palindromic covers. Enumerating all smallest palindromic factorizations. Computing a palindromic factorization where each factor is distinct. Open problems

HABATAKITAI Laboratory Everything is String. Algorithm For Smallest Palindromic Cover w l s i 352 h h

HABATAKITAI Laboratory Everything is String. The sum of periods