New Lower Bounds for the Maximum Number of Runs in a String Wataru Matsubara 1, Kazuhiko Kusano 1, Akira Ishino 1, Hideo Bannai 2, Ayumi Shinohara 1 1.

Slides:



Advertisements
Similar presentations
Pumping Lemma Examples
Advertisements

Author : Xinming Chen,Kailin Ge,Zhen Chen and Jun Li Publisher : ANCS, 2011 Presenter : Tsung-Lin Hsieh Date : 2011/12/14 1.
Small Subgraphs in Random Graphs and the Power of Multiple Choices The Online Case Torsten Mütze, ETH Zürich Joint work with Reto Spöhel and Henning Thomas.
Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU Linear Time Construction of Suffix Tree.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.
1 Module 19 LNFA subset of LFSA –Theorem 4.1 on page 131 of Martin textbook –Compare with set closure proofs Main idea –A state in FSA represents a set.
Fast and Practical Algorithms for Computing Runs Gang Chen – McMaster, Ontario, CAN Simon J. Puglisi – RMIT, Melbourne, AUS Bill Smyth – McMaster, Ontario,
1 Lecture 31 EQUAL language –Designing a CFG –Proving the CFG is correct.
Strings and Languages Operations
1 Module 30 EQUAL language –Designing a CFG –Proving the CFG is correct.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University.
Derrick Coetzee, Microsoft Research CC0 waiverCC0 waiver: To the extent possible under law, I waive all copyright and related or neighboring rights to.
Small Subgraphs in Random Graphs and the Power of Multiple Choices The Online Case Torsten Mütze, ETH Zürich Joint work with Reto Spöhel and Henning Thomas.
Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda Kyushu University, Japan SPIRE Cartagena, Colombia.
Second lecture REGULAR EXPRESSION. Regular Expression.
1 Learning languages from bounded resources: the case of the DFA and the balls of strings ICGI -Saint Malo September 2008 de la Higuera, Janodet and Tantini.
An Online Algorithm for Finding the Longest Previous Factors Daisuke Okanohara University of Tokyo Karlsruhe, Sep 15, 2008 Kunihiko.
Theory of Automata.
Formal Methods in SE Theory of Automata Qasiar Javaid Assistant Professor Lecture # 06.
Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.
Two examples English-Words English-Sentences alphabet S ={a,b,c,d,…}
Computing Left-Right Maximal Generic Words Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda Kyushu University, Japan.
SPLASH: Structural Pattern Localization Analysis by Sequential Histograms A. Califano, IBM TJ Watson Presented by Tao Tao April 14 th, 2004.
1 Context-Free Languages Not all languages are regular. L 1 = {a n b n | n  0} is not regular. L 2 = {(), (()), ((())),...} is not regular.  some properties.
Finding Characteristic Substrings from Compressed Texts Shunsuke Inenaga Kyushu University, Japan Hideo Bannai Kyushu University, Japan.
Kazunori Hirashima 1, Hideo Bannai 1, Wataru Matsubara 2, Kazuhiko Kusano 2, Akira Ishino 2, Ayumi Shinohara 2 1 Kyushu University, Japan 2 Tohoku University,
Module 2 How to design Computer Language Huma Ayub Software Construction Lecture 8.
L ECTURE 3 Chapter 4 Regular Expressions. I MPORTANT T ERMS Regular Expressions Regular Languages Finite Representations.
An asymptotic lower bound for the maximal-number-of- runs function F. Franek & Q. Yang Dept. of Computing & Software McMaster University Hamilton, Ontario.
Computing longest common substring and all palindromes from compressed strings Wataru Matsubara 1, Shunsuke Inenaga 2, Akira Ishino 1, Ayumi Shinohara.
Recursive Definitions & Regular Expressions (RE)
Lecture 02: Theory of Automata:08 Theory of Automata.
Semi-dynamic compact index for short patterns and succinct van Emde Boas tree 1 Yoshiaki Matsuoka 1, Tomohiro I 2, Shunsuke Inenaga 1, Hideo Bannai 1,
Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
Everything is String. Closed Factorization Golnaz Badkobeh 1, Hideo Bannai 2, Keisuke Goto 2, Tomohiro I 2, Costas S. Iliopoulos 3, Shunsuke Inenaga 2,
CS 3813: Introduction to Formal Languages and Automata
CS 203: Introduction to Formal Languages and Automata
Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.
MA/CSSE 474 Theory of Computation Minimizing DFSMs.
Linear-time computation of local periods Linear-time computation of local periods Gregory Kucherov INRIA/LORIA Nancy, France joint work with Roman Kolpakov.
Lecture # 4.
Lecture 2 Theory of AUTOMATA
Average Value of Sum of Exponents of Runs in Strings Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara Graduate School of Information Sciences.
Lecture 02: Theory of Automata:2014 Asif Nawaz Theory of Automata.
Lecture 03: Theory of Automata:2014 Asif Nawaz Theory of Automata.
1 Section 12.2 Pushdown Automata A pushdown automaton (PDA) is a finite automaton with a stack that has stack operations pop, push, and nop. PDAs always.
CHAPTER TWO LANGUAGES By Dr Zalmiyah Zakaria.
Section Recursion 2  Recursion – defining an object (or function, algorithm, etc.) in terms of itself.  Recursion can be used to define sequences.
By Dr.Hamed Alrjoub. 1. Introduction to Computer Theory, by Daniel I. Cohen, John Wiley and Sons, Inc., 1991, Second Edition 2. Introduction to Languages.
MA/CSSE 474 Theory of Computation How many regular/non-regular languages are there? Closure properties of Regular Languages (if there is time) Pumping.
Languages and Strings Chapter 2. (1) Lexical analysis: Scan the program and break it up into variable names, numbers, etc. (2) Parsing: Create a tree.
Recap Lecture 3 RE, Recursive definition of RE, defining languages by RE, { x}*, { x}+, {a+b}*, Language of strings having exactly one aa, Language of.
Computing smallest and largest repetition factorization in O(n log n) time Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai,
Chapter 1 INTRODUCTION TO THE THEORY OF COMPUTATION.
Continuous Distributions
Theory of Computation Lecture #
PROGRAMMING LANGUAGES
Alternative Algorithms for Lyndon Factorization
Andrzej Ehrenfeucht, University of Colorado, Boulder
Theory of Automata.
Regular Expressions (Examples)
Comparison of large sequences
Assume that p, q, and r are in
Mining Access Pattrens Efficiently from Web Logs Jian Pei, Jiawei Han, Behzad Mortazavi-asl, and Hua Zhu 2000년 5월 26일 DE Lab. 윤지영.
Reachability on Suffix Tree Graphs
MA/CSSE 474 Theory of Computation Minimizing DFSMs.
New Lower Bounds for the Maximum Number of Runs in a String
HIDING IN PLAIN SIGHT A QUICK INTRO TO STEGANOGRAPHY
Welcome to ! Theory Of Automata Irum Feroz
Recap Lecture 3 RE, Recursive definition of RE, defining languages by RE, { x}*, { x}+, {a+b}*, Language of strings having exactly one aa, Language of.
Presentation transcript:

New Lower Bounds for the Maximum Number of Runs in a String Wataru Matsubara 1, Kazuhiko Kusano 1, Akira Ishino 1, Hideo Bannai 2, Ayumi Shinohara 1 1 Tohoku University, Japan 2 Kyushu University, Japan

/ 23 Contents  Introduction  New lower bounds  A brief history of results on bounds  Simple heuristics for generating run-rich strings  Analyzing asymptotic lower bounds  Discussion  Conclusion and further research 2Prague Stringology Conference 2008

/ 23 runs  runs: occurrence of a periodic factor  non-extendable (maximal)  exponent at least two  primitive-rooted  example: aabaabaaaacaacac aabaabaa= ( aab ) period:3 root :aab exponent: Prague Stringology Conference 2008

/ 23 number of runs: ρ(n)  run(w) : number of runs in string w  ρ(n) = max{run(w) : |w| = n }  maximum number of runs in a string of length n  For any string w, 4Prague Stringology Conference 2008 example run (aabaabbaabaa)=8

/ n [Franek et al. ’03] [Franek & Yang ’06] 1.05n 0.90n 0.95n c 1.048n [Crochemore et al. ’08] Max Number of Runs in a String 5n [Rytter ’06] 3.48n [Puglisi et al. ’08] 3.44n [Rytter ’07] 1.6n [Crochemore & Ilie ’08] n 2n 3n 4n 5n 0 cn [Kolpakov & Kucherov ’99] c 1.00n 5

/ 23 Our result: New lower bound We discovered a run-rich string τ aababaababbabaababaababbabaababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaababbaba ababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabb abaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaaba bbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaa babaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaab abbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbaba ababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbaba ababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababba baababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaabab aababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababb abaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaaba baabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaaba baababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaa babaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaab abbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbaba ababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaabab τ = 6Prague Stringology Conference 2008 run(τ) = 1455, | τ | = 1558 Known best lower bound [Franek et al. ’03] Known best lower bound [Franek et al. ’03] New lower bound

/ 23 How to generate run-rich string  run(τ) = 1455, | τ |= 1558  Let τ’ = τ[1:1557] (delete the last character), the number of runs not decrease drastically.  run(τ’) = 1453, | τ’ |= 1557  In order to generate run-rich string, We only have to do is to append single character to run-rich string. 7Prague Stringology Conference 2008

/ 23 8Prague Stringology Conference 2008 a aa ab aaaa aaab aaba aabb abaa abab abba abbb aaa aab aba abb aaaaa 1 aaaab 1 aaaba 1 aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 abaaa 1 abaab 1 ababa 1 ababb 2 abbaa 2 abbab 1 abbba 1 abbbb 1 aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 ababb 2 abbaa 2 aaaaa 1 aaaab 1 aaaba 1 Select Top10  The search first starts with the single string “ a ” in the buffer.  At each round, two new strings are created from each string in the buffer by appending “ a ” or “ b ” to the string.  The new strings are then sorted with respect to the number of runs.  Only those that fit in the buffer size are retained for the next round. buffer size:10

/ 23 9 Prague Stringology Conference 2008 aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 ababb 2 abbaa 2 aaaaa 1 aaaab 1 aaaba 1 aabaab 3 aababb 3 aabbaa 3 aaabba 2 aaabbb 2 aabaaa 2 aababa 2 aabbab 2 aabbba 2 aabbbb 2 aabaabb 4 aabbabb 4 aabaaba 3 aababba 3 aababbb 3 aabbaaa 3 aabbaab 3 aaabbaa 3 aababaa 3 aabbaba 3 Select Top10 Select Top10 The string in the buffer become run-rich. aaabba aaabbb aabaaa aabaab aababa aababb aabbaa aabbab aabbba aabbbb ababba ababbb abbaaa abbaab aaaaaa aaaaab aaaaba aaaabb aaabaa aaabab aabaaba aabaabb aababba aababbb aabbaaa aabbaab aaabbaa aaabbab aaabbba aaabbbb aabaaaa aabaaab aababaa aababab aabbaba aabbabb aabbbaa aabbbab aabbbba aabbbbb

/ Improving lower bound of ρ(n) (1/2) Prague Stringology Conference 2008 We discovered a run-rich string τ such that  run(τ) = 1455, | τ |= 1558  run(τ 2 ) = 2915, | τ 2 |= 2 ・ 1558 = 3116 run(τ 2 ) > 2run(τ)

/ 23 Improving lower bound of ρ(n) (2/2)  Using run-rich string τ, can we push lower bounds higher up more? krun(τ k )|τk||τk| ( ρ(n) ≧ ) run(τ k )/|τ k | :: Next, we give a formula that calculate number of runs in w k. 11Prague Stringology Conference 2008

/ 23 Number of runs in w k Theorem Let w be a string of length n. For any k ≧ 2, run(w k ) = Ak - B where A = run(w 3 ) - run(w 2 ) and B = 2run(w 3 ) - 3run(w 2 ) Theorem Let w be a string of length n. For any k ≧ 2, run(w k ) = Ak - B where A = run(w 3 ) - run(w 2 ) and B = 2run(w 3 ) - 3run(w 2 ) 12Prague Stringology Conference 2008

/ 23 Proof of the theorem (1/4)  If two strings w k and w are concatenated, the number of runs in w k+1 is changed in two cases: case (a): increase A new run may be newly created at the border between two strings. abba abbaabba 13Prague Stringology Conference 2008

/ 23 Proof of the theorem (2/4)  If two strings w k and w are concatenated, the number of runs in w k+1 is changed in two cases: case (b):decrease A suffix run in w k and a prefix run in w may be merged into one run in w k+1. aabaaaabaa aabaaaabaa 14Prague Stringology Conference 2008

/ 23 Proof of the theorem (3/4)  By periodicity lemma, there is no runs in w k such that length is longer than 2|w| except the whole string w k.  For any k ≧ 3, run(w k ) - run(w k-1 ) = c ( constant). w w w w w w w w w w 15Prague Stringology Conference 2008

/ 23 Proof of the theorem (4/4) Theorem Let w be a string of length n. For any k ≧ 2, run(w k ) = Ak - B where A = run(w 3 ) - run(w 2 ) and B = 2run(w 3 ) - 3run(w 2 ) Theorem Let w be a string of length n. For any k ≧ 2, run(w k ) = Ak - B where A = run(w 3 ) - run(w 2 ) and B = 2run(w 3 ) - 3run(w 2 ) proof For any k ≧ 3, run(w k ) - run(w k-1 ) is a constant. 16Prague Stringology Conference 2008

/ 23 Asymptotic behavior of ρ (n) Theorem For any string w and any ε>0, there exists a positive integer N such that for any n ≧ N, Theorem For any string w and any ε>0, there exists a positive integer N such that for any n ≧ N, proof 17Prague Stringology Conference 2008

/ 23 Discovered run-rich strings Length of τ r(τ)r(τ)r(τ2)r(τ2)r(τ3)r(τ3) ρ (n) ≧ We found some run-rich strings by using heuristic search. The strings in the buffer are sorted with respect to r(w 3 )-r(w 2 ), instead of r(w) for improving asymptotic behavior. current best lower bound 18Prague Stringology Conference 2008 See our web site [

/ 23 Discussion  What is the class of run-rich strings?  Sturmian words are not run-rich. [Rytter2008]  (for any Sturmian word w )  Any recursive construction of a sequence of run-rich strings?  We believe that compression has a clue to understanding.  run-rich string τ (|τ|= ) can be represented by only 24 LZ factors. 19Prague Stringology Conference 2008

/ 23 LZ-factorization of τ ( |τ| = ) a, (0,1) / b / (1, 3) / (1, 4) / (2, 8) / (5, 13) (12,19) / (26,31) / (49,38) / (50,63) / (89,93) / (113,162) / (57,317) / (249,693) / (275,984) / (879,2120) / (942,3041) / (2811,6521) / (2999,9374) / (8764,20072) / (9332,28878) / (27096,45341) / (38210,67195) 20Prague Stringology Conference 2008 aababaababbabaababaababbabaababab… τ = LZ(τ)= (0,1) (1,3) (1,4) (2,8) (5,13) :

/ 23 Conclusion  We Introduced new approach for analyzing lower bounds using heuristic search.  We Improved the lower bound of the number of runs in a string.  new lower bound is Prague Stringology Conference 2008

/ 23 Further research  Improving heuristic algorithm  Speed up for counting runs in strings  Find good heuristics  Guess run-rich strings in compressed form (LZ factors)  Analyzing the class of run-rich strings  Any recursive construction of a sequence of run-rich strings?  Relation with compression  Algorithms for finding all runs in strings  process compressed string without decompression. 22Prague Stringology Conference 2008

/ n [Franek et al. ’03] 1.05n 0.90n 0.95n c n [Matsubara et al. ’08] 1.048n [Crochemore et al. ’08] Max Number of Runs in a String 5n [Rytter ’06] 3.48n [Puglisi et al. ’08] 3.44n [Rytter ’07] 1.6n [Crochemore & Ilie ’08] n 2n 3n 4n 5n 0 cn [Kolpakov & Kucherov ’99] c 1.00n 23

/ 23  Appendix 24Prague Stringology Conference 2008

/ Conjecture: ρ(n) < n Prague Stringology Conference 2008