Linear-time computation of local periods. Gregory Kucherov, INRIA/LORIA, Nancy, France. Joint work with Roman Kolpakov.

Presentation transcript:

Linear-time computation of local periods. Gregory Kucherov, INRIA/LORIA, Nancy, France. Joint work with Roman Kolpakov (Moscow) and Jean-Pierre Duval, Thierry Lecroq, Arnaud Lefebvre (Rouen). Haifa Stringology Workshop, April

2 Periodicities (repetitions) in strings
• period: $p$ is a period of $w = a_1 \cdots a_n$ if $a_i = a_{i+p}$ for all $1 \le i \le n-p$
• the (global) period: minimal period
• periodicity = word of period $p \le |w|/2$
• Examples: square, cube; fractional periodicity
• periodicities = "runs" of squares
• terminology: (cyclic) root, exponent (e.g. 8/3)

3 Finding periodicities CGCGGCAGTTTTGCCGACTGTTTGGGACTTGCTCGAACTTGCCTATGCCAAGCTGCCGACGATTC CGCCCACCCTGTTGGAACGCGATTTTAATTTCCCGCCTTTTTCCGAACTCGAAGCCGAAGTCGCC AAAATCGCCGATTATCAAACGCGTGCCGGAAAGGAATGCCGCCGTGCAGCCTGAAACCTCCGCCC AATACCAGCACCGTTTCGCCCAAGCCATACGCGGGGGCGAAGCCGCAGACGGTCTGCCGCAAGAC CGACTGAACGTCTATATCCGCCTGATACGCAACAATATCTACAGCTTTATCGACCGTTGTTATAC CGAAACGCTGCAATACTTTGACCGCGAAGAATGGGGCCGTCTGAAAGAAGGTTTCGTCCGCGACG CGTGCGCCCAAACGCCCTATTTTCAAGAAATCCCCGGCGAGTTCCTCCAATATTGCCAAAGCCTG CCGCTTTTAGACGGCATTTTGGCACTGATGGATTTTGAATATACCCAATTGCTGGCAGAAGTTGC TCAAATTCCGGATATTCCCGACATTCATTATTCAAATGACAGCAAATACACACCTTCCCCTGCGG CCTTTATCCGGCAATATCGATATGATGTTACCGATGATTTGCATGAAGCGGAAACAGCCTTGTTA ATATGGCGAAACGCCGAAGATGATGTGATGTACCAAACATTGGACGGCTTCGATATGATGCTGCT AGAAATAATGGGGTTCTCCGCGCTTTCGTTTGACACCCTCGCCCAAACCCTTGTCGAATTTATGC CTGAGGACGATAATTGGAAAAATATTTTGCTTGGGAAATGGTCAGGCTGGACTGAACAAAGGATT ATCATCCCCTCCTTGTCCGCCATATCCGAAAATATGGAAGACAATTCCCCGGGCC

6 Some work has been done...
... see R. Kolpakov, G. Kucherov, Periodic structures in words, chapter of the 3rd Lothaire volume Applied Combinatorics on Words, Cambridge University Press, 2005
• different results based on common simple techniques: extension functions and s-factorization

7 Rest of this talk
• Basics
– extension functions
– computing periodicities in time $O(n \log n)$
– s-factorization (Lempel-Ziv factorization)
– computing periodicities in time $O(n)$
• Computing all local periods in time $O(n)$

9 Extension function: simplest definition
• all values can be computed in time $O(n)$ [Main&Lorentz 84]
• a refined algorithm is presented in [Lothaire 05] (inspired by Manacher's linear-time algorithm for computing palindromes)
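A minimal sketch of one such extension function, assuming the common definition in which the value at position i (0-based) is the length of the longest common prefix of w and its suffix starting at i. The name `extension_function` is illustrative, and the routine is the standard Z-algorithm, which may differ in detail from the algorithms of [Main&Lorentz 84] and [Lothaire 05]:

```python
def extension_function(w):
    """Return z, where z[i] is the length of the longest common prefix
    of w and w[i:]. Runs in O(len(w)) time (standard Z-algorithm)."""
    n = len(w)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # w[left:right] is the rightmost match found so far
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the current window.
            z[i] = min(right - i, z[i - left])
        # Extend the match by direct comparison.
        while i + z[i] < n and w[z[i]] == w[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


print(extension_function("aabaab"))  # [6, 1, 0, 3, 1, 0]
```

Each comparison either advances the right end of the window or ends the inner loop, which is why the total work stays linear.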

10 Extension function: variants

11 Using extension functions to compute periodicities
• Lemma: There exists a square of period $p$ iff

13 Using extension functions to compute periodicities
• Example: a t a c g a a c g a a c g g t a c g a a c g a c g a a g a a c

14 Using extension functions to compute periodicities
This implies (using binary division) that
• one can compute a compact representation of all squares (maximal periodicities) in time $O(n \log n)$
• one can compute all squares in time $O(n \log n)$ [Crochemore 81, Main&Lorentz 84]
• one can test square-freeness in time $O(n \log n)$
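A minimal sketch of how extension functions detect squares across a border, assuming the lemma in question is the classical Main-Lorentz condition: for words u and v, a square of period p that starts in u and is centered in v (or exactly on the border) exists iff LS(p) >= 1 and LS(p) + LP(p) >= p. Here LP(p) is taken to be the longest common prefix of v and v[p:], and LS(p) the longest common suffix of u and v[:p]. The names `lcp` and `right_square_periods` are illustrative, and the extension values are computed naively for brevity rather than in linear time:

```python
def lcp(a, b):
    """Length of the longest common prefix of a and b (naive)."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def right_square_periods(u, v):
    """Periods p such that u+v contains a square of period p that starts
    in u and whose second half starts in v (or exactly at the border)."""
    periods = []
    for p in range(1, len(v) + 1):
        lp = lcp(v, v[p:])                # forward extension inside v
        ls = lcp(u[::-1], v[:p][::-1])    # backward extension into u
        if ls >= 1 and ls + lp >= p:
            periods.append(p)
    return periods


print(right_square_periods("ab", "abab"))  # [2]
print(right_square_periods("aab", "aab"))  # [3]
```

Applying this test recursively to the two halves of the word is the "binary division" that leads to the $O(n \log n)$ bounds.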

15 s-factorization (Lempel-Ziv factorization)
• $w = f_1 f_2 \cdots f_k$, where:
– if the letter $a$ which immediately follows $f_1 \cdots f_{i-1}$ does not occur in $f_1 \cdots f_{i-1}$, then $f_i = a$
– otherwise $f_i$ is the longest subword occurring at least twice in $f_1 \cdots f_i$
• Example:
• the s-factorization (Lempel-Ziv factorization) can be computed in linear time using a suffix tree or DAWG
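A minimal sketch of the s-factorization as defined above, written for clarity rather than speed: it runs in quadratic time or worse, whereas the linear-time construction relies on a suffix tree or DAWG. The function name `s_factorization` is illustrative. An earlier occurrence of a factor is allowed to overlap the factor itself.

```python
def s_factorization(w):
    """Return the list of factors f_1, ..., f_k of w (naive version)."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        if w[i] not in w[:i]:
            # New letter: the factor is that single letter.
            factors.append(w[i])
            i += 1
            continue
        # Otherwise take the longest w[i:i+length] that also starts
        # at some position j < i (the two occurrences may overlap).
        length = 1
        while i + length < n and w.find(w[i:i + length + 1], 0, i + length) != -1:
            length += 1
        factors.append(w[i:i + length])
        i += length
    return factors


print(s_factorization("aabaabab"))  # ['a', 'a', 'b', 'aaba', 'b']
```

In this run the fourth factor `aaba` starts at position 3 and has an earlier, overlapping occurrence at position 0.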

18 Why s-factorization is useful here
• lemma of [Main 89]

19 Computing (a compact representation of) all squares in linear time
1. compute the s-factorization of $w$ (in $O(n)$)
2. for each factor $f_i$:
A. compute all maximal periodicities ending inside $f_i$ and crossing the border between $f_{i-1}$ and $f_i$
B. recover all maximal periodicities occurring inside $f_i$ from a left copy of $f_i$
Important: the number of maximal periodicities is $O(n)$, while the number of squares can be super-linear
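To make the notion of a compact representation concrete, here is a minimal brute-force sketch that lists the maximal periodicities (runs) of a word. It is not the linear-time procedure outlined above: it simply scans every candidate period, which takes quadratic time. The name `maximal_periodicities` and the `(start, end, period)` output format, with `end` exclusive, are illustrative choices.

```python
def maximal_periodicities(w):
    """Return runs as (start, end, period) with end exclusive: maximal
    segments w[start:end] whose minimal period p satisfies end-start >= 2p."""
    n = len(w)
    seen = set()  # intervals already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            start = i
            while i + p < n and w[i] == w[i + p]:
                i += 1
            end = i + p
            # Keep segments containing a full square; an interval seen
            # earlier has a smaller minimal period and is skipped.
            if end - start >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return runs


print(maximal_periodicities("mississippi"))
# [(2, 4, 1), (5, 7, 1), (8, 10, 1), (1, 8, 3)]
```

The run `(1, 8, 3)` is `ississi`, which encodes both squares `ississ` and `ssissi` in a single triple.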

20 Using extension functions + s-factorization to compute periodicities
This implies that
• one can compute a compact representation of all squares (maximal periodicities) in time $O(n)$ [Kolpakov,Kucherov 99]
• one can compute all squares (but also cubes, ...) in time $O(n + S)$, where $S$ is the output size
• one can test square-freeness in time $O(n)$ [Crochemore 83, Main&Lorentz 85]

21 Local periods
• minimal (local) square at $i$ = minimal square centered at $i$
• local period at $i$ (denoted $LP(i)$) = root length of the minimal square at $i$
• figure labels: internal square, right-external square, left- and right-external square

22 Critical Factorization Theorem
• for any $i$, $LP(i) \le$ global period of $w$
• Critical Factorization Theorem: For every $w$, there exists a position $i$ such that $LP(i)$ = global period of $w$
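A minimal brute-force sketch of these definitions, useful for checking the theorem on small words. It assumes that position i denotes the gap between w[i-1] and w[i] (0-based, 1 <= i <= n-1) and that an external square only has to match on the part that lies inside the word. The names `local_periods` and `global_period` are illustrative, and the running time is far from linear.

```python
def global_period(w):
    """Smallest p >= 1 with w[j] == w[j+p] wherever both are defined."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[j] == w[j + p] for j in range(n - p)):
            return p


def local_periods(w):
    """Return [LP(1), ..., LP(n-1)], where LP(i) is the root length of
    the shortest square centered at the gap between w[i-1] and w[i]."""
    n = len(w)
    result = []
    for i in range(1, n):
        p = 1
        # Compare w[j] with w[j+p] for j in [i-p, i), clipped to the word.
        while not all(w[j] == w[j + p]
                      for j in range(max(0, i - p), min(i, n - p))):
            p += 1
        result.append(p)
    return result


w = "abaab"
print(local_periods(w))   # [2, 3, 1, 3]
print(global_period(w))   # 3
```

For `abaab`, every local period is at most 3 and positions 2 and 4 reach the global period 3, as the theorem predicts.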

23 Computing local periods (minimal squares)
• compute separately
– internal minimal squares
– left-external and right-external minimal squares
– both left- and right-external minimal squares
• focus on internal minimal squares
• compute s-factorization
• for each factor, compute minimal squares ending in this factor

24 Minimal squares inside a factor

31 Minimal squares crossing factor border
• focus on squares crossing the left border of the current factor $f_i$
• focus on those of them centered inside $f_i$
• general idea: compute squares and pick the minimal ones
• be careful, the number of squares can be super-linear!
• compute maximal periodicities in increasing order of periods
• only a linear number of squares need to be tested for minimality!

39 Sketch of the proof
• assume we are looking at squares of period
• consider largest period for which squares have been found
• if, then test all squares of period (at most )
• if, then either, or
• at most squares need to be tested

44 Computing (right-)external squares
• use extension functions!
• for each $i$, find minimal $p$ such that
• can be done in time $O(n)$

45 Conclusions
• All local periods can be computed in $O(n)$
• note that the global period of $w$ is $\max_i LP(i)$