1
Dynamic Text and Static Pattern Matching
Amihood Amir, Gad M. Landau, Moshe Lewenstein, Dina Sokol
Bar-Ilan University
2
Classical Pattern Matching
Input: pattern P = p_1 p_2 … p_m and text T = t_1 t_2 t_3 … t_n over alphabet Σ, where m is the pattern size and n is the text size.
Output: the locations of T where P appears.
3
Pattern Matching (e.g.)
Input: P = agca, Σ = {a,g,c,t}
T = aaagcattagctagcagcat
4
Pattern Matching (e.g.)
Input: P = agca, Σ = {a,g,c,t}
T = aaagcattagctagcagcat
Output (text positions numbered from 1): 3, 13, 16, …
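The example above can be reproduced with a brute-force scan. This is only a minimal sketch for checking the slide's output; the function name `find_occurrences` is made up for illustration, and positions are 1-based to match the slide.

```python
def find_occurrences(pattern: str, text: str) -> list[int]:
    """Return every 1-based position of `text` where `pattern` starts.

    Brute force: O(n * m) character comparisons in the worst case.
    """
    m = len(pattern)
    return [i + 1
            for i in range(len(text) - m + 1)
            if text[i:i + m] == pattern]


print(find_occurrences("agca", "aaagcattagctagcagcat"))  # [3, 13, 16]
```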
5
“Dynamic” Pattern Matching A. Static Text and Dynamic Pattern. B. Dynamic Text and Dynamic Pattern. C. Dynamic Text and Static Pattern.
6
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern, a.k.a. the indexing problem.
Solution: preprocess the text and answer pattern queries.
Data structure: suffix trees [Wei73, McC75, Ukk95, Far97].
Time: O(n) preprocessing, O(m) query.
7
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern. Time: O(n) preprocessing, O(m) query.
B. Dynamic Text and Dynamic Pattern, a.k.a. the dynamic indexing problem.
Solution: sophisticated data structures [SV96, ABR00].
Time: query O(m + log² n), change O(log² n).
8
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern. Time: O(n) preprocessing, O(m) query.
B. Dynamic Text and Dynamic Pattern. Time: query O(m + log² n), change O(log² n).
C. Dynamic Text and Static Pattern?
9
Dynamic Text and Static Pattern Matching
The pattern does not change. The text changes over time.
Goal: report new occurrences of the pattern without performing a new search.
10
Motivation
1. Intrusion detection systems
2. Info alerts
3. Two-dimensional run-length compressed matching problem [ALS03]
[Figure: run-length compressed FAX example]
11
Problem Definition
Input: T and P over Σ = {1, …, m}.
Output:
1. At the start: all occurrences of P in T.
2. After each change operation: (a) report all new occurrences of P in T; (b) discard all old occurrences of P in T.
Change operation: change one character in the text, e.g. location 5 from a to b.
12
Example
Input: P = agagagc = (ag)^3 c, Σ = {a,g,c,t}
T = g a g a g c t a g c g a g c a t
13
Example
Input: P = agagagc = (ag)^3 c, Σ = {a,g,c,t}
T = g a g a g c t a g a g a g c a t (position 10 changed from c to a)
14
Example
Input: P = agagagc = (ag)^3 c, Σ = {a,g,c,t}
T = g a g a g c t a g a g a g c a t (position 10 changed from c to a)
Output: {8} (the new occurrence starts at position 8)
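For comparison, here is a naive baseline for the change operation. It is not the algorithm of this talk: it simply re-checks every start position whose window contains the changed character, which costs O(m²) comparisons per replacement in the worst case. The function name `replace_and_report` and the set-based bookkeeping are assumptions made for illustration.

```python
def replace_and_report(text: list[str], pattern: str, pos: int, ch: str,
                       matches: set[int]) -> tuple[set[int], set[int]]:
    """Replace text[pos] (1-based) by `ch` and update `matches` in place.

    Returns (new_occurrences, discarded_occurrences), both 1-based.
    Naive baseline: only start positions pos-m+1 .. pos can be affected.
    """
    m, n = len(pattern), len(text)
    lo = max(1, pos - m + 1)
    hi = min(pos, n - m + 1)
    window = range(lo, hi + 1)

    before = {s for s in window if s in matches}
    text[pos - 1] = ch
    after = {s for s in window
             if "".join(text[s - 1:s - 1 + m]) == pattern}

    matches -= before
    matches |= after
    return after - before, before - after


text = list("gagagctagcgagcat")
matches: set[int] = set()          # P = agagagc does not occur initially
new, gone = replace_and_report(text, "agagagc", 10, "a", matches)
print(new, gone)                   # {8} set()
```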
15
Results
After O(n log log m + ) preprocessing time, O(log log m) time per replacement.
16
“Dynamic” Pattern Matching
A. Static Text and Dynamic Pattern. Time: O(n) preprocessing, O(m) query.
B. Dynamic Text and Dynamic Pattern. Time: query O(m + log² n), change O(log² n).
C. Dynamic Text and Static Pattern. Time: change and announce O(log log m).
17
Static Stage To initially find all occurrences of P in T, use KMP [Knuth-Morris-Pratt ‘77]. All pattern occurrences in a text of length 2m can be stored in O(1) space.
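A compact sketch of the KMP search used for the static stage follows. The names `failure_function` and `kmp_search` are illustrative, positions are 1-based as on the slides, and the pattern is assumed to be non-empty.

```python
def failure_function(p: str) -> list[int]:
    """fail[i] = length of the longest proper border of p[:i+1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(p: str, t: str) -> list[int]:
    """All 1-based start positions of p in t, in O(n + m) time."""
    fail = failure_function(p)
    out = []
    k = 0                              # number of pattern characters matched
    for i, ch in enumerate(t):
        while k and ch != p[k]:
            k = fail[k - 1]
        if ch == p[k]:
            k += 1
        if k == len(p):                # full match ending at index i
            out.append(i - len(p) + 2)
            k = fail[k - 1]
    return out


print(kmp_search("agagagc", "gagagctagagagcat"))  # [8]
```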
18
Succinct Output
Assumption: the text is of size 2m. (Break the text T into overlapping strings of length 2m-1.)
[Figure: text T marked at positions 1, m, 2m, 3m, 4m, with the pattern P aligned below]
19
Succinct Output (cont.)
P is periodic: a string P is periodic if it matches itself at a shift before position |P|/2, e.g. P = abcabcabca matches itself when shifted by 3. Store the output as a ‘chain’ of pattern occurrences.
P is non-periodic: by definition, there are no more than two occurrences in a text of size 2m.
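The periodicity test and the ‘chain’ representation can be sketched as follows. This is an illustration under assumptions: the smallest period is read off the KMP failure function, "periodic" is taken to mean that the smallest period is at most |P|/2, and a chain is stored as a pair (first position, number of occurrences) with step equal to the period. The helper names are made up.

```python
def smallest_period(p: str) -> int:
    """Smallest period of p, computed as len(p) minus its longest border."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return len(p) - fail[-1]


def is_periodic(p: str) -> bool:
    # Assumption: periodic means smallest period <= |P| / 2.
    return 2 * smallest_period(p) <= len(p)


def compress_to_chains(occ: list[int], period: int) -> list[tuple[int, int]]:
    """Group sorted occurrences into chains (first, count) with step `period`."""
    chains: list[list[int]] = []
    for s in occ:
        if chains and s == chains[-1][0] + chains[-1][1] * period:
            chains[-1][1] += 1
        else:
            chains.append([s, 1])
    return [(first, count) for first, count in chains]


p = "abcabcabca"
print(smallest_period(p), is_periodic(p))          # 3 True
print(compress_to_chains([1, 4, 7, 10], 3))        # [(1, 4)]
```

In this sketch the occurrences 1, 4, 7, 10 of abcabcabca in a text of length 20 collapse to the single pair (1, 4), which is the constant-space output the slide refers to.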
20
On-line Algorithm Following each replacement: Delete old matches that are no longer pattern occurrences. Find new matches.
21
Delete Old Matches
Deleting is trivial since we store the matches in constant space.
P is periodic: truncate the chain of pattern occurrences.
P is non-periodic: discard all matches that are within distance m of the replacement.
22
Find New Matches Challenge: How can we locate occurrences of P, following each replacement, without actually searching for P?
23
Main Idea - Text Covers
We ‘cover’ the text with substrings of the pattern, i.e. store the text in terms of P.
Pattern = a g a g a g c (positions 1 2 3 4 5 6 7)
Text = g a g a g c t a g c g a g c a t
Cover: g a g a g c [2,7], t [-], a g c [5,7], g a g c [4,7], a [1,1], t [-]
24
Text Cover (cont.) The text cover must satisfy two properties: Substring Property: each element of the cover is a substring of P, or a character not included in P. Maximality Property: no two adjacent elements can concatenate to form a substring of P.
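One simple way to obtain a cover with both properties is a greedy left-to-right scan that always takes the longest piece that is still a substring of P. This is an assumption made for illustration, not necessarily the construction used in the talk, and the sketch uses Python's substring test instead of an efficient index. Pieces are reported as (string, (i, j)) with 1-based positions of the first occurrence in P, or (character, None) for characters that do not occur in P.

```python
def text_cover(text: str, pattern: str):
    """Greedy cover of `text` by maximal substrings of `pattern`."""
    cover = []
    i = 0
    while i < len(text):
        if text[i] not in pattern:
            cover.append((text[i], None))      # character not in P
            i += 1
            continue
        j = i + 1
        # Extend the piece while it remains a substring of P.
        while j < len(text) and text[i:j + 1] in pattern:
            j += 1
        start = pattern.find(text[i:j])        # 0-based first occurrence
        cover.append((text[i:j], (start + 1, start + (j - i))))
        i = j
    return cover


def is_maximal(cover, pattern: str) -> bool:
    """Maximality: no two adjacent pieces concatenate to a substring of P."""
    return all((a + b) not in pattern
               for (a, _), (b, _) in zip(cover, cover[1:]))


cover = text_cover("gagagctagcgagcat", "agagagc")
print([iv for _, iv in cover])
# [(2, 7), None, (5, 7), (4, 7), (1, 1), None]
```

The greedy choice gives maximality directly: if a piece and its successor concatenated to a substring of P, the scan would have extended the first piece further.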
25
Text Cover (cont.) How does a replacement in the text affect the text cover? Initially, in the static stage, we construct a text cover for T. We ensure that the cover satisfies both the substring and maximality property.
26
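Text Cover following replacement
Pattern = a g a g a g c (positions 1 2 3 4 5 6 7)
Text = g a g a g c t a g c g a g c a t
Cover before: g a g a g c, t, a g c, g a g c, a, t = (2,7) - (5,7) (4,7) (1,1) -
After replacing the c at position 10 by a, the piece (5,7) splits: (2,7) - (5,6) (1,1) (4,7) (1,1) -
Restoring maximality: (5,6) and (1,1) merge into (1,3), which then merges with (4,7) into (1,7).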
27
Updating the Text Cover At most 5 pieces can violate the maximality property.
28
Substring Concatenation Query
Query: given two substrings of P, P[i,j] and P[k,l], is their concatenation also a substring of P?
Query time: O(log log m). Preprocessing time: (also uses - [BG00])
Hence, in O(log log m) time we can update the cover so that it satisfies both properties.
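The cover update can be sketched with a naive stand-in for the substring concatenation query. In the sketch below the query is answered with Python's `in` operator, which takes time proportional to the string lengths rather than O(log log m), and the merge loop scans to the end of the piece list instead of inspecting only a constant number of pieces. Pieces are kept as plain strings, and the name `update_cover` is hypothetical.

```python
def update_cover(pieces: list[str], pattern: str, pos: int, ch: str) -> list[str]:
    """Replace the character at 1-based text position `pos` by `ch`
    and restore the substring and maximality properties of the cover."""
    # Locate the piece that contains position `pos`.
    offset = 0
    for idx, piece in enumerate(pieces):
        if offset + len(piece) >= pos:
            break
        offset += len(piece)
    k = pos - offset - 1                       # index inside the piece

    # Split into left part, new character, right part (drop empty parts).
    parts = [piece[:k], ch, piece[k + 1:]]
    pieces[idx:idx + 1] = [s for s in parts if s]

    # Merge adjacent pieces whose concatenation is a substring of P.
    i = max(0, idx - 1)
    while i + 1 < len(pieces):
        joined = pieces[i] + pieces[i + 1]
        if joined in pattern:                  # naive concatenation query
            pieces[i:i + 2] = [joined]
            i = max(0, i - 1)                  # re-check the previous pair
        else:
            i += 1
    return pieces


cover = ["gagagc", "t", "agc", "gagc", "a", "t"]
print(update_cover(cover, "agagagc", 10, "a"))
# ['gagagc', 't', 'agagagc', 'a', 't']
```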
29
Find New Matches Given: a text cover which satisfies both the substring and maximality properties. Find: all new locations of the pattern in the text.
30
Key Observations
A new match must begin within distance m of the change.
A new match can include at most one entire piece of the cover.
It can span at most three pieces of the cover.
31
Furthermore
A new match can begin in one of at most three pieces of the cover:
– the piece with the change
– the previous piece
– the one previous to that
32
Simplified Problem
The search starts within a piece X of the cover.
Simple O(m) time algorithm:
– Check each location in X for a pattern start.
– Use suffix trees and LCA queries to compare substrings in constant time.
33
Improved Algorithm
Really, we only have to check each suffix of X that is a pattern prefix, e.g. X = a g a g a.
The KMP automaton can give the necessary information.
However, the time is still O(m)!
34
Improved Algorithm We can group the prefixes of P by their periods. Each group of prefixes can be checked in constant time! There are at most O(log m) groups.
35
Groups (e.g.)
Pattern = a g a g a g c (positions 1 2 3 4 5 6 7)
X = a g a g a
There are three suffixes of X that are also pattern prefixes, in two groups: { agaga, aga } and { a }.
Prefixes with the same period fall into a single group.
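The grouping in this example can be reproduced with a short sketch. It enumerates the candidate suffixes by brute force and groups them by the smallest period of the corresponding pattern prefix, read off the KMP failure function. The function names are made up, and the brute-force enumeration is an assumption for illustration, not the method of the talk.

```python
from collections import defaultdict


def border_array(p: str) -> list[int]:
    """fail[i] = length of the longest proper border of p[:i+1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def group_by_period(x: str, p: str) -> dict[int, list[str]]:
    """Suffixes of x that are prefixes of p, grouped by smallest period."""
    fail = border_array(p)
    groups: dict[int, list[str]] = defaultdict(list)
    for k in range(min(len(x), len(p)), 0, -1):
        if x.endswith(p[:k]):
            period = k - fail[k - 1]           # smallest period of p[:k]
            groups[period].append(p[:k])
    return dict(groups)


print(group_by_period("agaga", "agagagc"))
# {2: ['agaga', 'aga'], 1: ['a']}
```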
36
Checking a group in Constant Time
Pattern = a g a g a g c (positions 1 2 3 4 5 6 7)
X = a g a g a
Idea: match the period ‘ag’ as far as possible. As soon as (ag)* doesn’t match, check for a ‘c’.
[Figure: X and the text that follows it, aligned against the pattern]
37
Groups A string cannot have more than O(log m) border groups. Hence, the time of the algorithm is O(log m). [Intuition: each new group has a new period which has to be at least double the size of the old period. e.g. aagaagaa]
38
Even Better... We check only a constant number of groups. Choosing these O(1) groups takes O(log log m) time. Hence, our algorithm takes O(log log m) time per replacement.
39
Open Problems Allowing insertions and deletions to the text. Searching for a set of multiple static patterns.