New Lower Bounds for the Maximum Number of Runs in a String

Slides:



Advertisements
Similar presentations
Analysis of Algorithms II
Advertisements

Boosting Textual Compression in Optimal Linear Time.
Theory Of Automata By Dr. MM Alam
Longest Common Subsequence
Bounds on Code Length Theorem: Let l ∗ 1, l ∗ 2,..., l ∗ m be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L ∗ be.
Fast and Practical Algorithms for Computing Runs Gang Chen – McMaster, Ontario, CAN Simon J. Puglisi – RMIT, Melbourne, AUS Bill Smyth – McMaster, Ontario,
1 Lecture 31 EQUAL language –Designing a CFG –Proving the CFG is correct.
Strings and Languages Operations
1 Module 30 EQUAL language –Designing a CFG –Proving the CFG is correct.
Induction and recursion
Analysis of Recursive Algorithms October 29, 2014
New Lower Bounds for the Maximum Number of Runs in a String Wataru Matsubara 1, Kazuhiko Kusano 1, Akira Ishino 1, Hideo Bannai 2, Ayumi Shinohara 1 1.
Induction and recursion
CMPS 3223 Theory of Computation
Theory of Automata.
Formal Methods in SE Theory of Automata Qasiar Javaid Assistant Professor Lecture # 06.
Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.
Two examples English-Words English-Sentences alphabet S ={a,b,c,d,…}
Computing Left-Right Maximal Generic Words Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda Kyushu University, Japan.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
1 Chapter 1 Introduction to the Theory of Computation.
Kazunori Hirashima 1, Hideo Bannai 1, Wataru Matsubara 2, Kazuhiko Kusano 2, Akira Ishino 2, Ayumi Shinohara 2 1 Kyushu University, Japan 2 Tohoku University,
Module 2 How to design Computer Language Huma Ayub Software Construction Lecture 8.
An asymptotic lower bound for the maximal-number-of- runs function F. Franek & Q. Yang Dept. of Computing & Software McMaster University Hamilton, Ontario.
Computing longest common substring and all palindromes from compressed strings Wataru Matsubara 1, Shunsuke Inenaga 2, Akira Ishino 1, Ayumi Shinohara.
Recursive Definitions & Regular Expressions (RE)
Chapter 18: Searching and Sorting Algorithms. Objectives In this chapter, you will: Learn the various search algorithms Implement sequential and binary.
Semi-dynamic compact index for short patterns and succinct van Emde Boas tree 1 Yoshiaki Matsuoka 1, Tomohiro I 2, Shunsuke Inenaga 1, Hideo Bannai 1,
Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
Everything is String. Closed Factorization Golnaz Badkobeh 1, Hideo Bannai 2, Keisuke Goto 2, Tomohiro I 2, Costas S. Iliopoulos 3, Shunsuke Inenaga 2,
CS 203: Introduction to Formal Languages and Automata
Chapter 3 Regular Expressions, Nondeterminism, and Kleene’s Theorem Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
MA/CSSE 474 Theory of Computation Minimizing DFSMs.
Mathematical Induction Section 5.1. Climbing an Infinite Ladder Suppose we have an infinite ladder: 1.We can reach the first rung of the ladder. 2.If.
Lecture 2 Theory of AUTOMATA
1 Section 11.1 Regular Languages Problem: Suppose the input strings to a program must be strings over the alphabet {a, b} that contain exactly one substring.
1 Chapter Regular Languages. 2 Section 11.1 Regular Languages Problem: Suppose the input strings to a program must be strings over the alphabet.
Average Value of Sum of Exponents of Runs in Strings Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara Graduate School of Information Sciences.
Lecture 02: Theory of Automata:2014 Asif Nawaz Theory of Automata.
By Dr.Hamed Alrjoub. 1. Introduction to Computer Theory, by Daniel I. Cohen, John Wiley and Sons, Inc., 1991, Second Edition 2. Introduction to Languages.
MA/CSSE 474 Theory of Computation How many regular/non-regular languages are there? Closure properties of Regular Languages (if there is time) Pumping.
Computing smallest and largest repetition factorization in O(n log n) time Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai,
Chapter 1 INTRODUCTION TO THE THEORY OF COMPUTATION.
CSE15 Discrete Mathematics 03/06/17
CONTEXT-FREE LANGUAGES
Pumping Lemma.
Theory of Computation Lecture #
Implementation of Haskell Modules for Automata and Sticker Systems
PROGRAMMING LANGUAGES
Alternative Algorithms for Lyndon Factorization
Modeling with Recurrence Relations
Andrzej Ehrenfeucht, University of Colorado, Boulder
Theory of Automata.
Regular Expressions (Examples)
Pushdown Automata Reading: Chapter 6.
Algorithm design and Analysis
Assume that p, q, and r are in
Kleene’s Theorem Muhammad Arif 12/6/2018.
Pumping Lemma.
Reachability on Suffix Tree Graphs
MA/CSSE 474 Theory of Computation Minimizing DFSMs.
Copyright © Cengage Learning. All rights reserved.
MODULE – 1 The Number System
Chapter 1 Introduction to the Theory of Computation
Copyright © Cengage Learning. All rights reserved.
Discrete Mathematics 7th edition, 2009
Welcome to ! Theory Of Automata Irum Feroz
Cool Results Involving Compositions
Presentation transcript:

New Lower Bounds for the Maximum Number of Runs in a String 1: Title Thank you, Chairman. The title of our work is “New lower bounds for the maximum number of runs in a string” and I'm Wataru matsubara from Graduate School of Information Sciences Tohoku University, Japan. Wataru Matsubara1, Kazuhiko Kusano1, Akira Ishino1, Hideo Bannai2, Ayumi Shinohara1 1Tohoku University, Japan 2Kyushu University, Japan

Contents Introduction New lower bounds A brief history of results on bounds Simple heuristics for generating run-rich strings Analyzing asymptotic lower bounds Discussion Conclusion and further research 2: Contents This is the contents of my presentation. So, make use of a extra time, we will talk some appendix topics. At first, I show a brief history of result on bounds. Next, I introduce simple heuristics for generating run-rich strings. And I discuss about asymptotic lower bounds. And finally, I talk about conclusion and further research. Prague Stringology Conference 2008

runs runs: occurrence of a periodic factor aabaabaa=(aab) example: non-extendable (maximal) exponent at least two primitive-rooted example: aabaabaaaacaacac 3 8 aabaabaa=(aab) period :3 root :aab exponent: 3 8 3: runs At first I introduce what is run. Run is a maximal (non-extendable) repetition of exponent at least two. For example, the string ("aabaabaaaacaacac") has 7 runs. Especially, "aabaabaa" has period 3, and the substring which has a period 3 is non-extendable. and the root of run is "aab", which is primitive. Prague Stringology Conference 2008

number of runs: ρ(n) run(aabaabbaabaa)=8 run(w) :number of runs in string w ρ(n) = max{run(w) : |w| = n } maximum number of runs in a string of length n For any string w, example run(aabaabbaabaa)=8 4: number of runs: rho(n) Next, we define functions about the number of runs in a string. For any string w, function run(w) represents the number of runs in w. for example, this formula means this string has 8 runs. And, rho(n) represents maximum number of runs in a string of length n. For small number n, We can calculate rho(n) to count runs in all strings up to the length n. There is great interest in the behavior of function rho(n). n 1 2 3 4 5 6 7 8 9 10 11 12 … ρ(n) Prague Stringology Conference 2008

Max Number of Runs in a String c cn [Kolpakov & Kucherov ’99] 5n c 5n   [Rytter ’06] 1.048n [Crochemore et al. ’08] 1.05n 4n 3.48n [Puglisi et al. ’08] 1.00n 3n 3.44n   [Rytter ’07] 0.95n 0.927n [Franek et al. ’03] [Franek & Yang ’06] 2n I talk about a known result about rho(n). In 1999, Kolpakov & Kucherov showed the number of runs in strings is linear, however only existence of a linear constant was given, they could not provide any bound on the constant c. Although they were not able to give bounds for the constant factor, there have been several works to this end. Rytter showed rho(n) is lower than < 5n, and later improved to 3.48n by Puglisi, Simpson, & Smyth, and again by Rytter to 3.44n. And Crochemore & Ilie showed rho(n) is lower than 1.6n and hinted how to lower it, may be to as low as 1.048n by exhaustive calculation using computer. 1.048n is the current best upper bound. On the contrary, few result exists for lower bounds. In 2003, Franek, Simpson, & Smyth presented a beautiful recursive construction of a sequence of binary strings. And they showed rho(n) is greater than ( 3/1+sqrt(5)= ) 0.927. (a):In 2006,Franek and Qian Yang formally proved a family of true asymptotic lower bounds arbitrarily close to ( 3/1+sqrt(5)= ) 0.927. (b):On the other hand, an asymptotic lower bound on rho(n) is presented by Franek and Quen Yang. It is shown (that) for any epsilon > 0, there exists an integer N>0 such that for any n>N, rho(n) >= (α-ε)n, where α= ( 3/1+sqrt(5)= )0.9270. Still a gap between the upper and lower bound, and is not fully understood. It is believed (that) rho(n) is lower than n, for binary alphabets, as supported by computations for string lengths up to 31. we also computed for all strings length up to 45, it is not exist counter example. A stronger conjecture is proposed by Franek, they believed rho(n) equal to 0.927. 1.6n [Crochemore & Ilie ’08] 0.90n n

Our result: New lower bound We discovered a run-rich string τ aababaababbabaababaababbabaababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaabbabaababbabaababaababbabaababaabbabaababaababbabaababaababbabaabab τ = run(τ) = 1455, | τ | = 1558 6: Our result I talk about our result. We discovered a run-rich string τ. In particular string tau is this string. tau is the string of length 1558, and contains 1455 runs. By definition of rho(n), we have rho(n) is greater than 0.933n. By the way, known best lower bound is 0.927 by Franec et al. in 2003. So, 0.933 is new lower bound. Known best lower bound [Franek et al. ’03] New lower bound Prague Stringology Conference 2008

How to generate run-rich string Let τ’ = τ[1:1557] (delete the last character), the number of runs not decrease drastically. run(τ’) = 1453, | τ’ |= 1557 In order to generate run-rich string, We only have to do is to append single character to run-rich string. 8: How to generate run-rich string By the way, how do get these strings? We will explain how to get this run-rich string. Prague Stringology Conference 2008

aaaa aaa aaab aaba aa aab aabb abaa aba abab abba a ab abb abbb The search first starts with the single string “a” in the buffer. At each round, two new strings are created from each string in the buffer by appending “a” or “b” to the string. The new strings are then sorted with respect to the number of runs. Only those that fit in the buffer size are retained for the next round. aaaa aaab aaba aabb abaa abab abba abbb buffer size:10 aaaaa 1 aaaab 1 aaaba 1 aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 abaaa 1 abaab 1 ababa 1 ababb 2 abbaa 2 abbab 1 abbba 1 abbbb 1 aaa aab aba abb aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 ababb 2 abbaa 2 aaaaa 1 aaaab 1 aaaba 1 aa ab Select Top10 a The search first starts with the single string “a” in the buffer. In this example, we set the buffer size to 10. At each round, two new strings are created from each string in the buffer by appending “a” or “b” to the string. The new strings are then sorted with respect to the number of runs. Only those that fit in the buffer size are retained for the next round. In other words, we select the top 10 strings with respect to the number of runs. Prague Stringology Conference 2008

The string in the buffer become run-rich. aaabb 2 aabaa 2 aabab 2 aabba 2 aabbb 2 ababb 2 abbaa 2 aaaaa 1 aaaab 1 aaaba 1 aaabba aaabbb aabaaa aabaab aababa aababb aabbaa aabbab aabbba aabbbb ababba ababbb abbaaa abbaab aaaaaa aaaaab aaaaba aaaabb aaabaa aaabab aabaab 3 aababb 3 aabbaa 3 aaabba 2 aaabbb 2 aabaaa 2 aababa 2 aabbab 2 aabbba 2 aabbbb 2 aabaaba aabaabb aababba aababbb aabbaaa aabbaab aaabbaa aaabbab aaabbba aaabbbb aabaaaa aabaaab aababaa aababab aabbaba aabbabb aabbbaa aabbbab aabbbba aabbbbb aabaabb 4 aabbabb 4 aabaaba 3 aababba 3 aababbb 3 aabbaaa 3 aabbaab 3 aaabbaa 3 aababaa 3 aabbaba 3 Select Top10 Select Top10 The string in the buffer become run-rich. Prague Stringology Conference 2008

Improving lower bound of ρ(n) (1/2) We discovered a run-rich string τ such that run(τ) = 1455, | τ |= 1558 run(τ2) = 2915, | τ2 |= 2・1558 = 3116 run(τ2) > 2run(τ) Wouldn't be possible to push lower bound up using run-rich string tau? For example, what about doing concatenate same strings tau. If concatenate same two string tau, we get the string whose length is 3116, And the number of runs is 2915. Therefore we can get new lower bound 0.935. Improved!! Prague Stringology Conference 2008

Improving lower bound of ρ(n) (2/2) Using run-rich string τ, can we push lower bounds higher up more? k run(τk) |τk| ( ρ(n)≧ ) run(τk)/|τk| 1 1455 1558 0.933889 2 2915 3116 0.935494 3 4374 4674 0.935815 4 5833 6232 0.935976 5 7292 7790 0.936072 6 8751 9348 0.936136 7 10210 10906 0.936182 8 11669 12464 0.936216 : 8: Improvement lower bound of rho(n) The more number of repetitions, the more lower limit will go up like this. So, In general, How many runs in string w to the k-th power.? We will consider asymptotic behavior of the number of runs. Next, we give a formula that calculate number of runs in wk. Prague Stringology Conference 2008

Let w be a string of length n. For any k≧2, run(wk) = Ak - B Number of runs in wk Theorem Let w be a string of length n. For any k≧2, run(wk) = Ak - B where A = run(w3) - run(w2) and B = 2run(w3) - 3run(w2) 9: Number of runs is w^k We show following theorem which gives a formula for r(w^k). Let w be a string of length n. For any k greater than 2, r(w^k) can be represent using k, r(w^2) and r(w^3). Let us show the proof of theorem. Prague Stringology Conference 2008

Proof of the theorem (1/4) If two strings wk and w are concatenated, the number of runs in wk+1 is changed in two cases: case (a): increase A new run may be newly created at the border between two strings. abba abba 10: Proof of theorem (1/4) Assume w repeats twice, the number of runs is changed as follows: At first, number of runs can be increase. A new run may be newly created at the border between two strings. for example, we consider the string abba. there is one run in this string. If concatenate same two string, newly created runs aa and abbaabba at the border of two strings. therefore, number of runs can be increased in total. abbaabba Prague Stringology Conference 2008

Proof of the theorem (2/4) If two strings wk and w are concatenated, the number of runs in wk+1 is changed in two cases: case (b):decrease A suffix run in wk and a prefix run in w may be merged into one run in wk+1. aabaaaabaa aabaaaabaa 11: Proof of theorem (2/4) On the other hand, number of runs can be decrease. A suffix run in w^k and a prefix run in w may be merged into one run in w^k+1. for example, we think about the string "aabaaaabaa". there are four runs in this string. if concatenate same two strings, suffix run "aa" and prefix run "aa" are merged into one run "aaaa". And suffix run and prefix run “aabaa” to the second are merged into one run “aabaa” to the fourth . therefore, number of runs can be decrease in total. aabaaaabaaaabaaaabaa Prague Stringology Conference 2008

Proof of the theorem (3/4) By periodicity lemma, there is no runs in wk such that length is longer than 2|w| except the whole string wk. For any k≧3, run(wk) - run(wk-1) = c (constant). w 12: Proof of theorem (3/4) while string wk has period the length of w, there is no runs such that the length larger than twice of the length of w by periodicity lemma. Therefore, for any k larger than 3, if concatenate w^k and w, the number of runs will be increase some constant value. Prague Stringology Conference 2008

Proof of the theorem (4/4) Let w be a string of length n. For any k≧2, run(wk) = Ak - B where A = run(w3) - run(w2) and B = 2run(w3) - 3run(w2) proof For any k≧3, run(wk) - run(wk-1) is a constant. 13: Proof of theorem (4/4) we give the proof of theorem. Based on the formula, run(w^k) can be represented by using k, run(w^2) and run(w^3). Therefore, we can get the proof of the theorem. We can see that if k was increased infinitely, It can be seen that the asymptotic behavior of rho(n) is depend on r(w^3)-r(w^2). Formally saying, it is shown as follows. Prague Stringology Conference 2008

Asymptotic behavior of ρ(n) Theorem For any string w and any ε>0, there exists a positive integer N such that for any n≧N, proof 14: Asymptotic behavior of rho(n) For any string w and positive number ε, there exists a positive integer large N such that for any small n larger than large N, we have this formula. In other words, All we have to do is sort the strings in order of run(w^3)-run(w^2) in consideration of asymptotic behavior. Based on the theorem, we edit the heuristics of our algorithm. Prague Stringology Conference 2008

Discovered run-rich strings See our web site [http://www.shino.ecei.tohoku.ac.jp/runs] Length of τ r(τ) r(τ2) r(τ3) ρ (n) ≧ 125 110 227 343 0.928 1558 1455 2915 4374 0.93645 60064 56714 113448 170181 0.944542 105405 99541 199103 298664 0.944557 184973 174697 349417 524136 0.944565 We found some run-rich strings by using heuristic search. The strings in the buffer are sorted with respect to r(w3)-r(w2), instead of r(w) for improving asymptotic behavior. 15: Discovered run-rich strings In the our algorithms for finding run-rich strings, the new strings were sorted in order of run(w^3)-run(w^2) in consideration of asymptotic behavior. he table show discovered run-rich string using heuristic search algorithms. we get the best lower bound 0.944565 using run-rich string of length 184973. this is current best lower bound. current best lower bound Prague Stringology Conference 2008

Discussion What is the class of run-rich strings? Sturmian words are not run-rich. [Rytter2008] (for any Sturmian word w) Any recursive construction of a sequence of run-rich strings? We believe that compression has a clue to understanding. run-rich string τ (|τ|=184973) can be represented by only 24 LZ factors. 16: Discussion By the way, what is the class of run-rich strings? I found run-rich string using heuristic search, honesty saying I don't know the property of this string.. Sturmian word is known as the string class which has many repetition. Fibonacci word belong to Sturmian word. However Rytter show that the upper bound of the number of runs in Sturmian word is 0.8. So, Sturmian words is not run-rich string class. On the other hand, I found that run-rich string can be compressed efficiently. Prague Stringology Conference 2008

LZ-factorization of τ ( |τ| = 184973 ) τ = aababaababbabaababaababbabaababab… (0,1) (1,3) (1,4) (2,8) (5,13) : LZ(τ)= 17: LZ-factorization Lempel-Ziv is one of the most popular compression scheme. When the run-rich string is compressed using Lempel-ziv, it is expressible efficiently. I think there exist compression gives a clue to understanding the property of run-rich strings. a, (0,1) / b / (1, 3) / (1, 4) / (2, 8) / (5, 13) (12,19) / (26,31) / (49,38) / (50,63) / (89,93) / (113,162) / (57,317) / (249,693) / (275,984) / (879,2120) / (942,3041) / (2811,6521) / (2999,9374) / (8764,20072) / (9332,28878) / (27096,45341) / (38210,67195) Prague Stringology Conference 2008

Conclusion We Introduced new approach for analyzing lower bounds using heuristic search. We Improved the lower bound of the number of runs in a string. new lower bound is 0.944565. 18: Conclusion The conclusions are obtained as follows. We introduced new approach for analyzing lower bounds using heuristic search. We presented a new lower bound 0.944565 for the number of runs in a string. This is current best lower bounds. In other words, we show (that) Franek's conjecture was false. The proof was very simple, once after we verified (that) the runs is the string tau is 174719, and noticed some trivial properties of the strings. Prague Stringology Conference 2008

Further research Improving heuristic algorithm Speed up for counting runs in strings Find good heuristics Guess run-rich strings in compressed form (LZ factors) Analyzing the class of run-rich strings Any recursive construction of a sequence of run-rich strings? Relation with compression Algorithms for finding all runs in strings process compressed string without decompression. 19: Further research Further research is shown as follows. In order to improve heuristic algorithm, We will speed up for counting runs in strings, and find good heuristics. and we want to guess run-rich strings in compressed form. I want to develop algorithm for counting runs in a string. Especially, We are interested in fully-compressed algorithm, which can process without string decompression. Prague Stringology Conference 2008

Max Number of Runs in a String c cn [Kolpakov & Kucherov ’99] 5n c 5n   [Rytter ’06] 1.048n [Crochemore et al. ’08] 1.05n 4n 3.48n [Puglisi et al. ’08] 1.00n 3n 3.44n   [Rytter ’07] 0.944565n [Matsubara et al. ’08] 0.95n 2n 20: Max Number of Runs in a string Thank you for your attention! 1.6n [Crochemore & Ilie ’08] 0.927n [Franek et al. ’03] 0.90n n thank you for your attention.

Appendix Prague Stringology Conference 2008

Conjecture: ρ(n) < n Prague Stringology Conference 2008