Average Value of Sum of Exponents of Runs in Strings Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara Graduate School of Information Sciences.

Slides:



Advertisements
Similar presentations
Part VI NP-Hardness. Lecture 23 Whats NP? Hard Problems.
Advertisements

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Huffman code and ID3 Prof. Sin-Min Lee Department of Computer Science.
Swing time portfolio BY. Activity 2: Experiment 1 1 Coin2 Coins3 Coins Trial Trial Trial Average Time Period of.
Estimating Running Time Algorithm arrayMax executes 3n  1 primitive operations in the worst case Define a Time taken by the fastest primitive operation.
Factor Oracle, Suffix Oracle 1 Factor Oracle Suffix Oracle.
Refining Edits and Alignments Υλικό βασισμένο στο κεφάλαιο 12 του βιβλίου: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University.
. Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 7a Presentation partially taken.
Fast and Practical Algorithms for Computing Runs Gang Chen – McMaster, Ontario, CAN Simon J. Puglisi – RMIT, Melbourne, AUS Bill Smyth – McMaster, Ontario,
C++ Programming: From Problem Analysis to Program Design, Third Edition Chapter 4: Control Structures I (Selection)
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Adrian Ilie COMP 14 Introduction to Programming Adrian Ilie June 30, 2005.
7.2 Measurement – Section 2 1 Objective A: To Convert between Celsius and Fahrenheit using the formulas There are two temperature scales: Fahrenheit, which.
Improved results for a memory allocation problem Rob van Stee University of Karlsruhe Germany Leah Epstein University of Haifa Israel WADS 2007 WAOA 2007.
New Lower Bounds for the Maximum Number of Runs in a String Wataru Matsubara 1, Kazuhiko Kusano 1, Akira Ishino 1, Hideo Bannai 2, Ayumi Shinohara 1 1.
. Parameter Estimation For HMM Lecture #7 Background Readings: Chapter 3.3 in the text book, Biological Sequence Analysis, Durbin et al., 2001.
1 Speeding up on two string matching algorithms Advisor: Prof. R. C. T. Lee Speaker: Kuei-hao Chen, CROCHEMORE, M., CZUMAJ, A., GASIENIEC, L., JAROMINEK,
1 CSA4050: Advanced Topics in NLP Spelling Models.
Computing Left-Right Maximal Generic Words Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda Kyushu University, Japan.
Polynomial Functions. A Polynomial is an expression that is either a real number, a variable, or a product of real numbers and variables with whole number.
Basic Concepts in Number Theory Background for Random Number Generation 1.For any pair of integers n and m, m  0, there exists a unique pair of integers.
OLAP : Blitzkreig Introduction 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema :
Kazunori Hirashima 1, Hideo Bannai 1, Wataru Matsubara 2, Kazuhiko Kusano 2, Akira Ishino 2, Ayumi Shinohara 2 1 Kyushu University, Japan 2 Tohoku University,
Game theory & Linear Programming Steve Gu Mar 28, 2008.
1 Analysis of Algorithms CS 105 Introduction to Data Structures and Algorithms.
Sudoku and Orthogonality
An asymptotic lower bound for the maximal-number-of- runs function F. Franek & Q. Yang Dept. of Computing & Software McMaster University Hamilton, Ontario.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Computing longest common substring and all palindromes from compressed strings Wataru Matsubara 1, Shunsuke Inenaga 2, Akira Ishino 1, Ayumi Shinohara.
Relations coming from locations of the longest spotless segments in the 192 years long series of the every day sunspot observations The new way to forecast.
 APA  (American Psychological Association) is the most commonly used format for manuscripts in the Social Sciences.
POLYNOMIALS REVIEW Classifying Polynomials Adding & Subtracting Polynomials Multiplying Polynomials.
Fundamentals of Algorithms MCS - 2 Lecture # 8. Growth of Functions.
Huffman coding Content 1 Encoding and decoding messages Fixed-length coding Variable-length coding 2 Huffman coding.
Claim, Evidence, and Reasoning
Communication and Computation on Arrays with Reconfigurable Optical Buses Yi Pan, Ph.D. IEEE Computer Society Distinguished Visitors Program Speaker Department.
Programming 1 DCT 1033 Control Structures I (Selection) if selection statement If..else double selection statement Switch multiple selection statement.
OLAP Recap 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema : Hierarchical Dimensions.
Greedy Algorithms for the Shortest Common Superstring Overview by Anton Nesterov Saint Petersburg State University Russia Original paper by A. Frieze,
Presented by: Aneeta Kolhe. Named Entity Recognition finds approximate matches in text. Important task for information extraction and integration, text.
Quiz 3 is due Friday September 18 th Lab 6 is going to be lab practical hursSept_10/exampleLabFinal/
ICT Introduction to Programming Chapter 4 – Control Structures I.
Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.
Everything is String. Closed Factorization Golnaz Badkobeh 1, Hideo Bannai 2, Keisuke Goto 2, Tomohiro I 2, Costas S. Iliopoulos 3, Shunsuke Inenaga 2,
INVITATION TO Computer Science 1 11 Chapter 2 The Algorithmic Foundations of Computer Science.
Linear-time computation of local periods Linear-time computation of local periods Gregory Kucherov INRIA/LORIA Nancy, France joint work with Roman Kolpakov.
Compression for Fixed-Width Memories Ori Rottenstriech, Amit Berman, Yuval Cassuto and Isaac Keslassy Technion, Israel.
Two Least Squares Applications Data Fitting Noise Suppression.
Chapter 9 Introduction to Arrays Fundamentals of Java.
Geometry 7-5 Areas of Regular Polygons. Review Areas.
Computing smallest and largest repetition factorization in O(n log n) time Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai,
A Conjecture & a Theorem
Statistical Measures M M O D E E A D N I A R A N G E.
Algorithmic complexity: Speed of algorithms
Polynomials and Polynomial Functions
Geometric Mean Pythagorean Theorem Special Right Triangles
Strings: Tries, Suffix Trees
Stream Ciphers Day 18.
Merge Sort 11/28/2018 2:21 AM The Greedy Method The Greedy Method.
NanoBPM Status and Multibunch Mark Slater, Cambridge University
Faculty of Computer Science & Information System
Algorithmic complexity: Speed of algorithms
Try This… Measure (using your ruler), three segments 2 inches
Chapter 4: Control Structures I (Selection)
New Lower Bounds for the Maximum Number of Runs in a String
Algorithmic complexity: Speed of algorithms
Computational Geometry
Geometric Mean Pythagorean Theorem Special Right Triangles
Paths and Connectivity
Paths and Connectivity
Presentation transcript:

Average Value of Sum of Exponents of Runs in Strings Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara Graduate School of Information Sciences Tohoku University, Japan 1

Background 2

Run (Maximal Repetition) Substring w which has period Non-extendable left nor right Count once with it’s minimal period 3 :run

The Number & The Sum of Exponents The number of runs and the sum of exponents (repetition counts) of runs are interesting issue Number: 6 Sum of exponents: 14.2

Maximum The maximum number of runs and the maximum value of sum of exponents of runs are still unknown 5 Number Sum of exponents ≦ cn Kolpakov and Kucherov, 1999 ≦ cn Kolpakov and Kucherov, 1999 ≦ 5n Rytter, 2006 ≦ 5n Rytter, 2006 ≦ 3.48n Puglisi et al., 2007 ≦ 3.44n Rytter, 2007 ≦ 1.048n Crochemore and Ilie, 2008 ≧ 0.927n Franek et al., 2003 ≧ 0.945n Matsubara et al., = n ? Conjecture = n ? Conjecture ≦ 2.9n Crochemore and Ilie, 2007 ≦ 2.9n Crochemore and Ilie, 2007 = 2n ? Conjecture = 2n ? Conjecture ≧ 1.854n Franek et al., 2003 ≧ 1.889n Matsubara et al., 2008 ≦ cn Kolpakov and Kucherov, 1999 ≦ 25n Rytter, 2006 ≦ 25n Rytter, 2006

Average The average number of runs is presented We show the average value of sum of exponents of runs 6 Number of runs Sum of exponents Puglisi & Simpson Australasian Journal of Combinatorics To appear (2008)  : alphabet size  (d) : Möbius function Our result

7

The average value of sum of exponents of runs in strings of length n is represented as follows 8  : alphabet size L(p) : number of Lyndon words of length p Number of runs Sum of exponents [Puglisi & Simpson, 2008]

Detail 9

Runs in all strings of length n 10 Complicated!

d(w,p)d(w,p)d(w,p)d(w,p) A string d(w,p) of length |w|-p is defined as follows w[i..j+p] is a run if and only if d(w,p)[i..j] is a 0-segment (maximal block of 0's) of length l ≧ p 11 w d(w,2) w>>2 w d(w,2)

Runs are classified according to its period 12 w d(w,1) d(w,2) d(w,3)

13 d(w,2) d(w,3) 0-segments are classified according to its length l=2 l=3 l=4

The number of 0-segments of length p in  n c(n,p)c(n,p)c(n,p)c(n,p) 14 Example  = 2, n = 5, p = 2

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p)

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 16 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up ( σ -1) 2 choices c(n,p)c(n,p)c(n,p)c(n,p) 17 (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

C(n,p)C(n,p)C(n,p)C(n,p) 0-segments of length l in d(w,p) correspond to runs of period p in w The length of the run is l+p and the exponents is (l+p)/p We denote by C(n,p) the sum of (l+p)/p for each 0- segments of length p or longer as follows 18 w d(w, p) p=2, l=3

C(n,p)C(n,p)C(n,p)C(n,p) 19 Example  =2, n=5, p=2

0-segments and runs An 0-segment of length l ≧ p in d(w,p) correspond to  p runs having period p in w because d(w,p) and w[0..p-1] determine w[p..n-1] 20 d(w,2) w 00000, and are not runs of period 2 but period 1

0-segments and runs An 0-segment of length l ≧ p in d(w,p) correspond to  p runs having period p in w because d(w,p) and w[0..p-1] determine w[p..n-1] 21 d(w,2) w 00000, and are not runs of period 2 but period 1 In the roots all strings of length p appear once

Counting a run once To avoid counting a run more than once a run which has shorter period should be ignored A run has no shorter period ⇔ The root of a run is primitive The number of primitive strings of length p is pL(p) 22 L(p) :number of Lyndon words of length p

Counting a run once 23 To avoid counting a run more than once a run which has shorter period should be ignored A run has no shorter period ⇔ The root of a run is primitive The number of primitive strings of length p is pL(p) L(p) :number of Lyndon words of length p

Counting a run once 24 To avoid counting a run more than once a run which has shorter period should be ignored A run has no shorter period ⇔ The root of a run is primitive The number of primitive strings of length p is pL(p) L(p) :number of Lyndon words of length p

To avoid counting a run more than once a run which has shorter period should be ignored A run has no shorter period ⇔ The root of a run is primitive The number of primitive strings of length p is pL(p) Counting a run once 25 L(p) :number of Lyndon words of length p

Average value of sum of exponents The sum of exponents of runs in  n and the average value of sum of exponents of runs in strings of length n are as follows 26

Limit of e(n) The average value e(n) grows almost linearly, as n increases 27

Limit of e(n) The limit of e(n)/n and the actual values are follows 28  e(n)/n  (d) :Möbius function

Summary 29

Summary The number of 0-segments of length p in  n The sum of (l+p)/p for each runs of period p or longer as follows The average value of sum of exponents of runs in strings of length n 30 Thank you for your attension

31

周期 32 NG

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 33 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 34 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 35 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 36 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 37 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 38 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 39 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 

The number of 0-segments of length p in  n Instead of 0-segments, pairs of strings ( ,  ), which separated by 0-segments of length p, are counted up c(n,p)c(n,p)c(n,p)c(n,p) 40 ( σ -1) 2 choices (  -1) 2 choices σ n-p-2 choices  n-p-2  choices (n - p+1) choices for position of 0-segments     ≠ 