Published by Beverly Owens. Modified over 9 years ago.
1
Ravello, 19-20-21/09 C.E. On some researches... Chiara Epifanio
2
Outline: Compact representation of local automata; The multidimensional Critical Factorization Theorem.
3
The multidimensional Critical Factorization Theorem. Chiara Epifanio, Filippo Mignosi
4
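A word $w$ is a sequence of characters over an alphabet $A$: $w \in A^{\{1,2,\dots,n\}}$, $w \in A^{\mathbb{N}}$, or $w \in A^{\mathbb{Z}}$. A word $w = a_1 \dots a_n$ is periodic if there exists $p \in \mathbb{N}$ such that $w(x+p) = w(x)$ for all $x$ with $1 \le x \le n-p$; such a $p$ is a period of $w$.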
5
A word may have more than one period (e.g. abaababaabaababaaba has periods 8 and 13). The smallest period of $w$ is called "the" period of $w$.
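The example word can be checked by brute force. The following is a minimal sketch, not part of the talk: the function name `periods` is an assumption of this example, indices are 0-based, and $p = |w|$ is counted as a (trivial) period.

```python
def periods(w):
    """Return every p in 1..len(w) such that w[x + p] == w[x] whenever both exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[x] == w[x + p] for x in range(n - p))]

w = "abaababaabaababaaba"
ps = periods(w)
print(ps[:2])   # the two smallest periods: [8, 13]
print(min(ps))  # "the" period of w: 8
```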
6
A factor $v = w_j \dots w_{j+n-1}$ of length $n$ of $w$ is a repetition of order $\alpha$ if there exists a natural number $p$, $0 < p \le n$, such that $w_i = w_{i+p}$ for $i = j, \dots, j+n-1-p$ and such that $n/p \ge \alpha$. The number $p$ is called a period of the repetition. The smallest period of the repetition is called the period of the repetition. Ex: abaaba is a repetition of period 6 and order 1, of period 5 and order 6/5, and of period 3 and order 2.
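The orders in the abaaba example can be reproduced with a few lines of code. This is only an illustrative sketch: the names `repetition_orders` and `is_repetition` are assumptions of this example, $j$ is 1-based as on the slide, and the order attached to a period $p$ is taken to be the exact ratio $n/p$.

```python
from fractions import Fraction

def repetition_orders(w, j, n):
    """Map each period p of the factor v = w_j ... w_{j+n-1} to the ratio n/p."""
    v = w[j - 1:j - 1 + n]  # convert the 1-based start j to a 0-based slice
    return {p: Fraction(n, p)
            for p in range(1, n + 1)
            if all(v[i] == v[i + p] for i in range(n - p))}

def is_repetition(w, j, n, alpha):
    """True if some period p of the factor satisfies n/p >= alpha."""
    return any(order >= alpha for order in repetition_orders(w, j, n).values())

orders = repetition_orders("abaaba", 1, 6)
for p, order in sorted(orders.items()):
    print(p, order)  # periods 3, 5, 6 with orders 2, 6/5, 1
```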
7
A word $w$ has a central repetition of order $\alpha$ in position $i$ if there exists a factor $v$ centered in $i$ that is a repetition of order $\alpha$. In this case we denote by $c_\alpha(w,i)$ the smallest period among all the central repetitions of order $\alpha$ in position $i$, and we call it the central local period of order $\alpha$ in $i$. We denote by $P_\alpha(w)$ the maximum of the central local periods of order $\alpha$ in $w$. A position $i$ is critical if $c_\alpha(w,i) = P_\alpha(w)$.
8
The Critical Factorization Theorem. Let $w$ be a word of length $|w| \ge 2$. In every sequence of $l \ge \max\{1, p(w)-1\}$ consecutive positions there is a critical one, and $P_\alpha(w) = p(w)$ for $\alpha = 2$.
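The one-dimensional statement for order 2 can be observed on small words. The sketch below makes assumptions that go beyond the slides: it uses the classical formulation in which a position is a cut between two consecutive letters and the square centered at the cut may overflow the word on either side. The names `period` and `local_period` are hypothetical.

```python
def period(w):
    """Smallest p >= 1 such that w[x] == w[x + p] whenever both positions exist."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[x] == w[x + p] for x in range(n - p)))

def local_period(w, i):
    """Classical local period at the cut between w[:i] and w[i:], for 1 <= i < len(w)."""
    n = len(w)
    p = 1
    while True:
        # Compare the p letters before the cut with the p letters after it,
        # ignoring the positions that fall outside the word.
        if all(w[j] == w[j + p] for j in range(max(0, i - p), min(i, n - p))):
            return p
        p += 1

w = "abaab"
print([local_period(w, i) for i in range(1, len(w))])  # [2, 3, 1, 3]
print(period(w))                                       # 3
```

Here the cuts 2 and 4 are critical, and every window of $p(w)-1 = 2$ consecutive cuts contains one of them.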
9
The Critical Factorization Theorem in particular states that for $\alpha = 2$ there exists at least one point such that the central local period detected at this point coincides with the (global) period of the word, i.e., there exists an integer $j$, $1 \le j < |w|$, such that $c_\alpha(w,j) = p(w)$ for $\alpha = 2$. We have given a new proof for $\alpha = 4$.
10
Lemma 1. Let $u$, $v$, $w$ be words such that $uv$ and $vw$ have period $p$ and $|v| \ge p$. Then the word $uvw$ has period $p$. (cf. Lemma 8.1.2 in Lothaire 2, Chapter 8)
11
Lemma 2. Suppose that $w$ has period $q$ and that there exists a factor $v$ of $w$ with $|v| \ge q$ that has period $r$, where $r$ divides $q$. Then $w$ has period $r$. (cf. Lemma 8.1.3 in Lothaire 2, Chapter 8)
12
Fine and Wilf Theorem. Let $w$ be a word having periods $p$ and $q$, with $q \le p$. If $|w| \ge p + q - \gcd(p,q)$, then $w$ also has period $\gcd(p,q)$.
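The statement can be checked exhaustively on short words, and the earlier example word shows that the length bound cannot be lowered. This is a brute-force sketch for illustration only; `has_period` and `fine_wilf_holds` are hypothetical names, and the check is limited to binary words of length at most 10.

```python
from itertools import product
from math import gcd

def has_period(w, p):
    """True if w[i] == w[i + p] whenever both positions exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))

def fine_wilf_holds(w):
    """Check the implication of the theorem for every pair of periods of w."""
    n = len(w)
    ps = [p for p in range(1, n + 1) if has_period(w, p)]
    return all(has_period(w, gcd(p, q))
               for p in ps for q in ps
               if n >= p + q - gcd(p, q))

# Exhaustive check over all binary words of length 1..10.
print(all(fine_wilf_holds("".join(t))
          for n in range(1, 11) for t in product("ab", repeat=n)))  # True

# The bound is tight: periods 8 and 13, length 19 = 8 + 13 - 1 - 1, no period 1.
w = "abaababaabaababaaba"
print(has_period(w, 8), has_period(w, 13), has_period(w, 1))  # True True False
```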
13
Multidimensional case. Multidimensional periodicity was introduced by Amir and Benson (1991) for the design of pattern matching algorithms. Since then, many authors have worked on it, giving slightly different definitions.
14
If $u$ is a factor of $w$, then $v$ is a periodicity vector for $u$ if $w((x,y)+v) = w(x,y)$ for all $(x,y) \in \mathrm{Dom}(u)$ such that $(x,y)+v \in \mathrm{Dom}(u)$. The vector $v$ is a periodicity vector for $w$ if $w((x,y)+v) = w(x,y)$ for all $(x,y)$.
15
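A factor $u$ of $w$ is lattice-periodic with respect to $v_1$ and $v_2$ if every $v \in L$ is a periodicity vector for $u$, where $L = \langle v_1, v_2 \rangle = \{ i v_1 + j v_2 : i, j \in \mathbb{Z} \}$ is the lattice generated by $v_1$ and $v_2$.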
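Both notions are easy to test on a finite two-dimensional factor. The sketch below rests on assumptions that are not in the talk: a factor is stored as a dictionary from points of its domain to letters, the lattice is taken to be the set of integer combinations of $v_1$ and $v_2$, and the two vectors are assumed linearly independent. The function names are hypothetical.

```python
def is_periodicity_vector(u, v):
    """u maps each (x, y) in Dom(u) to a letter; v = (dx, dy)."""
    dx, dy = v
    return all(u[(x + dx, y + dy)] == a
               for (x, y), a in u.items()
               if (x + dx, y + dy) in u)

def is_lattice_periodic(u, v1, v2):
    """True if every integer combination i*v1 + j*v2 is a periodicity vector for u."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1 and v2 must be linearly independent")
    for (x, y), a in u.items():
        for (x2, y2), b in u.items():
            dx, dy = x2 - x, y2 - y
            # Cramer's rule: (dx, dy) = i*v1 + j*v2 with i = i_num/det, j = j_num/det.
            i_num = dx * v2[1] - dy * v2[0]
            j_num = v1[0] * dy - v1[1] * dx
            if i_num % det == 0 and j_num % det == 0 and a != b:
                return False
    return True

# A 4 x 4 checkerboard over the alphabet {a, b}.
board = {(x, y): "ab"[(x + y) % 2] for x in range(4) for y in range(4)}
print(is_periodicity_vector(board, (1, 1)))        # True
print(is_periodicity_vector(board, (1, 0)))        # False
print(is_lattice_periodic(board, (1, 1), (2, 0)))  # True
print(is_lattice_periodic(board, (1, 0), (0, 1)))  # False
```

Comparing every pair of points keeps the check exact on a finite domain, at quadratic cost in the size of the domain.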
16
Given a subgroup $H$ of $\mathbb{Z}^d$, a transversal $T_H$ of $H$ is a subset of $\mathbb{Z}^d$ such that for any element $i \in \mathbb{Z}^d$ there exists a unique element $j \in T_H$ such that $i - j \in H$. An $n$-cubic factor $v$ is a repetition of order $\alpha$ if $v$ is $L$-periodic, with $L$ a lattice, and $n$ is such that $n/h_L \ge \alpha$, where $h_L$ is the smallest integer such that every hypercube of side $h_L$ contains a transversal of $L$. The lattice $L$ is called a period of the $\alpha$-repetition $v$.
17
A word $w$ has a central repetition of order $\alpha$ in position $j \in \mathbb{Z}^d$ if there exists a factor $v$ of $w$ centered in $j$ that is a repetition of order $\alpha$. If $w$ has at least one central repetition of order $\alpha$ and period $L$ in $j$, we consider the set $H = \{ h_L \text{ s.t. every hypercube of side } h_L \text{ contains a transversal of } L \}$ and we denote $c_\alpha(w,j) = \min(H)$. Let $P_\alpha(w) = \limsup \{ c_\alpha(w,j) : j \text{ position in } w \}$.
18
Lemma 3. Let $v_1$ and $v_2$ be two factors of the same $d$-dimensional word $w$ (over $\mathbb{Z}^d$) that both have a subgroup $H$ as period. If $sh(v_1) \cap sh(v_2)$ contains a transversal of $H$, then the factor $v$ having shape $sh(v) = sh(v_1) \cup sh(v_2)$ also has period $H$.
19
Lemma 4. Let $v_1$ and $v_2$ be two factors of the same $d$-dimensional word $w$ (over $\mathbb{Z}^d$) such that $sh(v_2) \subseteq sh(v_1)$. Suppose that $v_1$ has period $H_1$ and that $v_2$ has period $H_2$, with $H_1$ a subgroup of $H_2$, and that $sh(v_2)$ contains a transversal of $H_1$. Under these hypotheses $v_1$ has period $H_2$.
20
A generalization of the Fine & Wilf Theorem. If $w$ has two periodicity vectors $v_1$ and $v_2$ and $w$ is "big enough" with respect to $v_1$ and $v_2$, then $w$ is lattice-periodic with respect to $v_1$ and $v_2$.
21
The multidimensional Critical Factorization Theorem. Informally, the C.F.T. states that the maximal local repetition of order 2 is also a period of the whole word. But there is no total order among lattices! Our solution is to order lattices by the length $h_L$ of the side of the smallest hypercube that contains a transversal of $L$. We further have to prove that all the lattices with the same maximal $h_L$ coincide over the word. To do this, for the moment, we lose the tightness of the local repetition order (4 instead of 2).
22
Theorem. Let $w$ be a cubic bidimensional word and $X$ be a cube included in the shape of $w$. Every cube $T \subseteq X$ of side $\max(1, P_4(X)-1)$ contains a position $l$ such that $c_4(w,l) = P_4(w)$. Let $v$ be the factor of $w$ whose shape is the intersection between $sh(w)$ and the union $X'$ of the shapes of the 4-repetitions centered in positions $l \in X$ such that $c_4(w,l) = P_4(X)$. Then $v$ has period $L$, where $L$ is a subgroup such that every cube of side $P_4(X)$ contains a transversal of $L$.
23
Proof of the theorem (diagram): Lemma 4, the Fine & Wilf generalization, and Lemma 3 lead to the thesis.
24
Conclusions and open problems. Importance of the extension to the $d$-dimensional case ($d \ge 2$). Difficulties of such an extension (new definitions, extension of already known results). It is known that for $d = 1$ the tight value is $\alpha = 2$. It remains an open problem to find the tight value of $\alpha$ for any dimension. Applications.
25
Compact representation of local automata. M. Crochemore, C. Epifanio, R. Grossi, F. Mignosi
26
Compacting is a standard technique for reducing the size of data structures such as factor automata, DAWGs and suffix trees; it consists of replacing paths in automata with single edges. In 2000 Crochemore, Mignosi, Restivo and Salemi gave an algorithm for "self-compressing" tries of antifactorial binary sets of words. The aim of that algorithm was to represent, in a compact way, antidictionaries to be sent to the decoder of a static compression scheme. What we have worked on is an improvement of that algorithm that works for sets of words over any alphabet.
27
The suffix trie Tr(w) of a word w is a trie whose set of leaves is the set of suffixes of w that do not appear previously as a factor in w. Ex.: [figure not reproduced]
28
The suffix tree T(w) of a word w is a compressed suffix trie, where only leaves and forks are kept. Each edge is labelled with a substring of w. In this way the number of nodes and leaves of T(w) is smaller than 2|w|. But if the labels of the arcs are stored explicitly, the implementation can have quadratic size. The simple solution is to represent labels by pairs of integers, (position, position) or (position, length), and to keep the text aside. Ex.: [figure not reproduced]
29
There are classical on-line linear-time implementations. All of them use the suffix link function s, which is defined over all the nodes of the suffix trie and of the suffix tree by s(root) = root and s(v) = v', where $\bar{v} = a \bar{v}'$, $\bar{v}$ being the label of the path from the root to v and a being the first letter of $\bar{v}$. Ex.: [figure not reproduced]
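To make the definitions concrete, here is a naive construction of the suffix trie together with its suffix links. It is a sketch for small inputs only and is not one of the on-line linear-time algorithms mentioned above: it inserts every suffix letter by letter and stores the full label in each node. The class `Node` and the function `build_suffix_trie` are hypothetical names.

```python
class Node:
    def __init__(self, label):
        self.label = label        # word spelled on the path from the root
        self.children = {}        # letter -> Node
        self.suffix_link = None   # s(v), filled in after construction

def build_suffix_trie(w):
    """Return (root, nodes), where nodes maps each factor of w to its trie node."""
    root = Node("")
    nodes = {"": root}
    for start in range(len(w)):
        node = root
        for letter in w[start:]:
            if letter not in node.children:
                child = Node(node.label + letter)
                node.children[letter] = child
                nodes[child.label] = child
            node = node.children[letter]
    root.suffix_link = root                     # s(root) = root
    for label, node in nodes.items():
        if label:
            # s(v) = v', where the label of v is a letter followed by the label of v'.
            node.suffix_link = nodes[label[1:]]
    return root, nodes

root, nodes = build_suffix_trie("abaab")
print(len(nodes))                                         # 12 (11 factors plus the root)
print(sorted(l for l, n in nodes.items() if not n.children))  # ['aab', 'abaab', 'baab']
print(nodes["aba"].suffix_link.label)                     # ba
```

The leaves are exactly the suffixes of abaab that do not occur earlier in the word, in line with the definition of Tr(w).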
30
Our new approach is basically the same as that of the suffix tree, but we compact a bit less: we keep all the nodes of the suffix tree and some more nodes of the trie, namely all the nodes v of the trie such that s(v) is a node of the suffix tree. For any arc of the form (v,v') with label a in the trie, we have an arc (v,x) with the same label in our compacted trie $T_2(w)$, where x is v' if $v' \in T_2(w)$, and x is the first node in $T_2(w)$ that is a descendant of v' in the original trie if $v' \notin T_2(w)$. In this second case, we consider that (v,x) represents the whole path from v to x in the suffix trie, and we add a sign + to node x in order to maintain this information.
31
To complete the definition of $T_2(w)$ we keep the suffix link function over these nodes. Notice that, by definition, for any node v of $T_2(w)$, s(v) is always a node of the suffix tree T(w) and hence it also belongs to $T_2(w)$. This new approach lets us avoid keeping the text aside.
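The node set and the arcs of $T_2(w)$ can be computed directly from the definition on a small word. The sketch below is only an illustration under stated assumptions, not the authors' compacting algorithm: nodes are identified with the factors they spell, the construction is brute force, and the input is assumed to end with a unique end marker `$` (a common convention that is not stated on the slides) so that every suffix ends at a leaf. The name `compact_trie_sketch` is hypothetical.

```python
def compact_trie_sketch(w):
    """Return (tree, t2, edges, plus) for the trie of all factors of w."""
    factors = {""} | {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    alphabet = sorted(set(w))

    def children(u):
        return [u + a for a in alphabet if u + a in factors]

    def s(u):
        return u[1:]  # suffix link on labels; s("") == ""

    # Suffix tree nodes: the root, the leaves (no child) and the forks (>= 2 children).
    tree = {u for u in factors if u == "" or len(children(u)) != 1}
    # T2 keeps, in addition, every trie node whose suffix link is a suffix tree node.
    t2 = tree | {u for u in factors if s(u) in tree}

    edges, plus = {}, set()
    for v in t2:
        for child in children(v):
            x = child
            while x not in t2:        # a skipped node has exactly one child
                x = children(x)[0]
            if x != child:
                plus.add(x)           # the arc (v, x) stands for a whole path
            edges[(v, child[-1])] = x
    return tree, t2, edges, plus

tree, t2, edges, plus = compact_trie_sketch("abaab$")
print(len(tree), len(t2))        # 10 13
print(edges[("ab", "a")])        # abaab$
print(sorted(plus))              # ['abaab$', 'baab$']
print(all(u[1:] in tree for u in t2))  # True
```

On this input the last line confirms the remark that s(v) is a node of the suffix tree for every node v of $T_2(w)$.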
32
State of the art. We have given compacting and decompacting algorithms; we have proved that the number of nodes in our compacted suffix tree is still linear; we have given an algorithm that can be used to check whether a pattern is present in a text without "decompacting" the automaton; we are currently running experiments on the Calgary and Canterbury corpora.