Published by Jane Henderson. Modified over 9 years ago.
1
Approximate schemas
Michel de Rougemont, LRI, University Paris II
Joint work with E. Fischer (Technion) and F. Magniez (LRI)
2
Distances between languages
1. Distance between words (structures): edit distance with moves
2. Distance between a word (structure) and a class of words (structures)
3. Distance between two languages (classes)
4. Applications: regular languages, DTDs
3
Results
1. Tester for equality, constant time
2. Tester for w in L, constant time
3. Tester for approximate equivalence of regular languages (equivalence tester), polynomial time
4
1. Approximate Satisfiability and Equivalence
1. Satisfiability: Tree |= F
2. Approximate satisfiability: Tree |=ε F (the tree is ε-close to a tree satisfying F)
3. Approximate equivalence: F ≡ε G
(Figure: the images of F and G on a class K of trees.)
5
Testers on a class K
Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
1. if U |= F, A accepts;
2. if U is ε-far from F, A rejects with high probability;
3. Time(A) is independent of n, the size of U.
(Goldreich, Goldwasser, Ron 1996; Rubinfeld, Sudan 1994.)
A tester usually implies a linear-time corrector.
6
History of Testers
1. Self-testers and correctors for linear algebra: Blum & Kannan, 1989
2. Robust characterizations of polynomials: R. Rubinfeld, M. Sudan, 1994
3. Testers for graph properties (k-colorability): Goldreich et al., 1996
4. Graph properties have testers: Alon et al., 1999
5. Regular languages have testers: Alon et al., 2000
6. Testers for regular tree languages: de Rougemont and Magniez, ICALP 2004
7
2. Equality tester
1. Classical edit distance: insertions, deletions, modifications
2. Edit distance with moves: 0111000011110011001 and 0111011110000011001 are at distance 1 (the block 1111 is moved in front of the block 000)
3. Edit distance with moves generalizes to trees
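A small Python check of the example above. `move_block` is a hypothetical helper that cuts the block 1111 out of the first word and reinserts it before 000, giving the second word in one move; the classical edit distance, computed with the standard dynamic program, needs more than one operation.

```python
def move_block(w, i, j, t):
    """One move: cut w[i:j] out and reinsert it at position t of the remaining word."""
    block, rest = w[i:j], w[:i] + w[j:]
    return rest[:t] + block + rest[t:]

def levenshtein(a, b):
    """Classical edit distance: insertions, deletions, modifications (no moves)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # modify ca into cb
        prev = cur
    return prev[-1]

w = "0111000011110011001"
w2 = "0111011110000011001"
print(move_block(w, 8, 12, 5) == w2, levenshtein(w, w2))
```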
8
Statistics on words
(Figure: subwords of length k, with labels k, K, t, k−t.)
Block statistics: b.stat
Uniform statistics: u.stat
Block-uniform statistics: bu.stat
9
Block statistics
W = 001010101110……, a word of length n, is cut into n/k blocks (subwords of length k). For W = 001010101110 (n = 12) and k = 2, there are n/k = 6 blocks.
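A sketch of b.stat as the vector of block frequencies over Σ^k, assuming a binary alphabet and that k divides n (the function name is hypothetical). For W = 001010101110 and k = 2 the blocks are 00, 10, 10, 10, 11, 10, so the frequencies of 00, 01, 10, 11 are 1/6, 0, 2/3, 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def b_stat(w, k, alphabet="01"):
    """Fraction of the n/k disjoint blocks of w equal to each u in alphabet^k.

    Assumes k divides len(w); a trailing partial block would be ignored.
    """
    n_blocks = len(w) // k
    counts = Counter(w[i * k:(i + 1) * k] for i in range(n_blocks))
    return {"".join(u): Fraction(counts["".join(u)], n_blocks)
            for u in product(alphabet, repeat=k)}

print(b_stat("001010101110", 2))
```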
10
Uniform statistics
W = 001010101110, a word of length n; the subwords of length k starting at every position give n − k + 1 windows (11 windows for n = 12 and k = 2).
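The same kind of sketch for u.stat, counting all n − k + 1 overlapping windows (binary alphabet assumed, function name hypothetical). For W = 001010101110 and k = 2, the 11 windows give frequencies 1/11, 4/11, 4/11, 2/11 for 00, 01, 10, 11.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def u_stat(w, k, alphabet="01"):
    """Fraction of the n - k + 1 overlapping windows of w equal to each u in alphabet^k."""
    n_windows = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(n_windows))
    return {"".join(u): Fraction(counts["".join(u)], n_windows)
            for u in product(alphabet, repeat=k)}

print(u_stat("001010101110", 2))
```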
11
Statistics and distance
W = 001010101110 (n = 12, k = 2)
W1 = 001101001110 (one letter moved)
W2 = 110100001110 (block 00 moved)
W3 = 110100001111 = W' (last letter modified)
dist(W, W') = 3, dist(W, W')/12 = 0.25
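A small sketch that checks the three steps of this slide (two moves, then one modification) and computes the L1 distance between the uniform statistics of W and W' for k = 2. The helper names are hypothetical. With k = 2 the additive error ε·n (ε = 1/k) is large, so the value 8/11 obtained here is not expected to match dist(W, W')/n = 0.25.

```python
from collections import Counter

def move_block(w, i, j, t):
    """One move: cut w[i:j] out and reinsert it at position t of the remaining word."""
    block, rest = w[i:j], w[:i] + w[j:]
    return rest[:t] + block + rest[t:]

def u_stat_l1(w, w2, k):
    """L1 distance between the uniform statistics of two words of the same length."""
    cw = Counter(w[i:i + k] for i in range(len(w) - k + 1))
    cw2 = Counter(w2[i:i + k] for i in range(len(w2) - k + 1))
    n_windows = len(w) - k + 1
    return sum(abs(cw[u] - cw2[u]) for u in set(cw) | set(cw2)) / n_windows

W, W1, W2, W3 = "001010101110", "001101001110", "110100001110", "110100001111"
assert move_block(W, 3, 4, 7) == W1               # move the letter W[3] = 0
assert move_block(W1, 0, 2, 4) == W2              # move the block 00
assert sum(a != b for a, b in zip(W2, W3)) == 1   # modify one letter
print(u_stat_l1(W, W3, 2))
```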
12
Goal: $d_1$ approximates the distance.
Let $\varepsilon = 1/k$. For $n > n_0$: $\mathrm{dist} - \varepsilon n < d_1 < \mathrm{dist} + \varepsilon n$.
Practical application: $\varepsilon = 10^{-2}$, hence $k = 100$ and the statistics have dimension $2^{100}$. For words of length $n = 10^9$, $d_1$ is approximated from N samples, and $N = O(1/\varepsilon^3)$ trials give a good approximation.
Remarks:
1. Distance with moves: W = 000…0001111…111 and W' = 1111…111000…000 are one move apart, while their classical edit distance is $\Theta(n)$.
2. Robustness to noise: if W, W' are noisy inputs (but ε-close), the method still works.
3. Random words are close with moves, far without.
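To illustrate remark 1, a small sketch compares the windows of $0^{500}1^{500}$ and $1^{500}0^{500}$ for k = 10 (n, k and the helper name are chosen for illustration). Only the 9 windows straddling the middle of each word differ, so the window counts differ by 18 in L1 norm out of 991 windows, while the two words differ in all 1000 positions.

```python
from collections import Counter

def u_counts(w, k):
    """Counts of the n - k + 1 windows of length k."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))

n, k = 1000, 10
w = "0" * (n // 2) + "1" * (n // 2)
w2 = "1" * (n // 2) + "0" * (n // 2)

cw, cw2 = u_counts(w, k), u_counts(w2, k)
diff = sum(abs(cw[u] - cw2[u]) for u in set(cw) | set(cw2))
hamming = sum(a != b for a, b in zip(w, w2))
print(diff, diff / (n - k + 1), hamming)
```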
13
Classical complexity of edit distances
1. Words without moves: P problem. Approximation? Sublinear algorithm?
2. Words with moves: NP-complete problem, O(1)-approximable.
3. Ordered trees without moves: P problem.
4. Unordered trees, and trees with moves: NP-complete problem.
14
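Basic tool: Chernoff bound
Random variables:
1. Markov bound: for $X \ge 0$ with mean $\mu$ and $a > 0$, $\Pr[X \ge a\mu] \le 1/a$.
2. Chebyshev bound: $\Pr[|X - \mu| \ge a] \le \mathrm{Var}(X)/a^2$.
3. Chernoff bound: for a sum $X$ of independent variables $X_i \in \{0,1\}$ whose mean is $\mu$, $\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2 \mu/3}$ for $0 < \delta \le 1$.
4. Hoeffding bound: for the average $\bar{X}$ of $N$ independent variables in $[0,1]$, $\Pr[|\bar{X} - \mathbb{E}\bar{X}| \ge t] \le 2e^{-2Nt^2}$.
(Figure: the distribution Prob[X = k] as a function of k, with the threshold a·μ marked.)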
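As an example of how such bounds set the number of samples: by the Hoeffding bound, $N = \lceil \ln(2/\delta)/(2t^2) \rceil$ samples estimate one frequency (for instance one coordinate of u.stat) within t with probability at least 1 − δ. The helper name is hypothetical, and this per-coordinate estimate is not the $N = O(1/\varepsilon^3)$ bound quoted earlier.

```python
import math

def hoeffding_samples(t, delta):
    """Smallest integer N with 2 * exp(-2 * N * t**2) <= delta.

    By the Hoeffding bound, the average of N independent samples in [0, 1]
    is then within t of its mean with probability at least 1 - delta.
    """
    return math.ceil(math.log(2 / delta) / (2 * t * t))

print(hoeffding_samples(0.01, 0.05))
```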
15
Tester for equality of strings
Edit distance with moves: an NP-complete problem, but O(1)-approximable.
Uniform statistics: W = 001010101110
Theorem 1. |u.stat(w) − u.stat(w')| approximates dist(w, w')/n.
Sample N subwords of length k, and compute Y(w) and Y(w').
Theorem 2. Y(w) approximates u.stat(w).
Corollary. |Y(w) − Y(w')| approximates dist(w, w')/n.
Tester: if |Y(w) − Y(w')| < ε, accept; else reject.
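A minimal sketch of this tester, assuming the L1 norm for |Y(w) − Y(w')| and caller-chosen parameters k and N; the function names are illustrative, not from the slides.

```python
import random
from collections import Counter

def sampled_u_stat(w, k, n_samples, rng):
    """Y(w): empirical frequencies of n_samples windows of length k drawn uniformly."""
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randrange(len(w) - k + 1)
        counts[w[i:i + k]] += 1
    return {u: c / n_samples for u, c in counts.items()}

def l1(p, q):
    """L1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in set(p) | set(q))

def equality_tester(w, w2, eps, k, n_samples, seed=0):
    """ACCEPT (True) iff |Y(w) - Y(w')|_1 < eps."""
    rng = random.Random(seed)
    y = sampled_u_stat(w, k, n_samples, rng)
    y2 = sampled_u_stat(w2, k, n_samples, rng)
    return l1(y, y2) < eps

print(equality_tester("0" * 1000, "1" * 1000, eps=0.1, k=5, n_samples=200))
```

Since Y(w) and Y(w') come from independent samples, even two equal words give a nonzero but small |Y(w) − Y(w')| in general; N controls that sampling error.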
16
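2a. Regular words
Definition: L is a regular language and A an automaton for L; we test whether w ∈ L.
Admissible path Z: a path in the graph of strongly connected components of A, from the component of the initial state (init) to a component containing an accepting state (accept).
A word W is Z-feasible if there are two states p, q in Z such that A can read W from p to q.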
17
Tester for regular words
Tester. Input: W, A, ε.
Sample subwords of W. For every admissible path Z: if the sampled subwords are Z-feasible, ACCEPT. If no admissible path works, REJECT.
Theorem: Tester(W, A, ε) is an ε-tester for L(A).
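A simplified sketch of this tester for a DFA, under stated assumptions: each admissible path Z is represented only by the set of states of its components, Z-feasibility is checked by reading the subword from some state of Z without leaving Z, and the sample consists of N subwords of one fixed length k (the original sampling scheme and constants are not recoverable here). All names and the 0*1* example are hypothetical.

```python
import random

def run_within(delta, start, word, allowed):
    """Read `word` in the DFA from `start`; True iff every state reached is in `allowed`."""
    state = start
    for a in word:
        state = delta.get((state, a))
        if state is None or state not in allowed:
            return False
    return True

def is_z_feasible(delta, z_states, word):
    # Simplification (assumption): W is Z-feasible if A can read W from some
    # state of Z without leaving the states of Z.
    return any(run_within(delta, p, word, z_states) for p in z_states)

def regular_tester(w, delta, admissible_paths, k, n_samples, seed=0):
    """ACCEPT (True) if some admissible path Z makes all sampled subwords
    Z-feasible; otherwise REJECT (False)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(w) - k + 1) for _ in range(n_samples)]
    samples = [w[i:i + k] for i in starts]
    for z in admissible_paths:
        if all(is_z_feasible(delta, z, s) for s in samples):
            return True
    return False

# Hypothetical example: L = 0*1*; states 0 (reading 0s), 1 (reading 1s), 2 (sink).
delta = {(0, "0"): 0, (0, "1"): 1, (1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
paths = [{0}, {0, 1}]  # states covered by each admissible path
print(regular_tester("0" * 50 + "1" * 50, delta, paths, k=4, n_samples=20))  # True
print(regular_tester("10" * 50, delta, paths, k=4, n_samples=20))            # False
```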
18
Proof schema of the Tester
Theorem: Regular words are testable.
1. Robustness lemma: if W is ε-far from L, then for every admissible path Z there exists i such that the number of Z-infeasible subwords is large.
2. Splitting lemma: if W is far from L, there are many disjoint infeasible subwords.
3. Amplifying lemma: if there are many infeasible subwords, there are many short ones.
19
Merging trees
Merging lemma: let Z be an admissible path, and let F be a Z-feasible cut of size h'. Construction of W' in L:
1. Take each word and split it along its connected components, removing single letters.
2. Rearrange all the words of the same component in its Z-order.
3. Add gluing words to obtain W' in L.
(Figure: pieces labeled by their component C.)
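A toy sketch of the rearranging step, assuming each piece has already been labeled with the component of Z in which it is feasible; the function name, the dictionary of gluing words, and the 0*1* example are hypothetical. Each piece is moved at most once, which suggests why the rearrangement is cheap in the edit distance with moves.

```python
import re

def merge_by_component(pieces, order, glue):
    """Concatenate the pieces component by component, following the Z-order.

    pieces: list of (component, subword); order: the components of Z in order;
    glue: dict (c1, c2) -> gluing word placed between consecutive components.
    Within a component, pieces keep their original relative order.
    """
    out = []
    for idx, comp in enumerate(order):
        if idx > 0:
            out.append(glue.get((order[idx - 1], comp), ""))
        out.extend(s for c, s in pieces if c == comp)
    return "".join(out)

# Hypothetical example: W = 0110010, L = 0*1*, Z = (C0, C1), no gluing word needed.
pieces = [("C0", "0"), ("C1", "11"), ("C0", "00"), ("C1", "1"), ("C0", "0")]
merged = merge_by_component(pieces, ["C0", "C1"], glue={})
print(merged, bool(re.fullmatch(r"0*1*", merged)))
```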
20
Splitting
Splitting lemma: if Z is an admissible path and W a word such that dist(W, L) > h, then W has many disjoint Z-infeasible subwords.
Proof by contraposition.
21
2b. Regular trees
Tree edit distance with moves (figure: trees with nodes a, b, c, d, e, f illustrating deletion, edge insertion, and node and label insertion; two trees with nodes a, b, c, d, e at distance 1 move).
The distance problem is NP-complete and non-approximable.
22
Binary trees: the distance with moves allows permutations
Tree edit distance on binary trees (figure: trees T1 and T2):
Distance(T1, T2) = 4, m-Distance(T1, T2) = 2
23
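Tree automata
Transitions (pair of child states → state; "-" means no child):
(q0, q0) → q1, (q0, q1) → q1, (q1, q1) → q2, (q1, q0) → q2, (q2, -) → q2, (-, q2) → q2
(Figure: a binary tree annotated with the states q0, q1, q0, q1, q2.)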
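A bottom-up run of this tree automaton, using the six transitions read off the slide. The slide does not show node labels, the leaf state, or the accepting states, so this sketch assumes leaves start in q0 and ignores labels; `run`, `RULES` and `LEAF_STATE` are hypothetical names.

```python
# Transitions from the slide: (left child state, right child state) -> state.
# "-" stands for a missing child.
RULES = {
    ("q0", "q0"): "q1", ("q0", "q1"): "q1",
    ("q1", "q1"): "q2", ("q1", "q0"): "q2",
    ("q2", "-"): "q2", ("-", "q2"): "q2",
}
LEAF_STATE = "q0"  # assumption: leaves start in q0

def run(tree):
    """Bottom-up run. A tree is None (missing child) or a pair (left, right);
    a leaf is (None, None). Returns the state at the root, or None if stuck."""
    if tree is None:
        return "-"
    left, right = tree
    if left is None and right is None:
        return LEAF_STATE
    ls, rs = run(left), run(right)
    if ls is None or rs is None:
        return None
    return RULES.get((ls, rs))

leaf = (None, None)
print(run(((leaf, leaf), leaf)))  # (q0, q0) -> q1, then (q1, q0) -> q2
```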
24
Infeasible subtrees
Fact. If T is ε-far from L(A), then the number of infeasible subtrees of constant size is Ω(n), i.e. linear in n.
25
Tester for regular Trees
Tester. Input: T, A, ε.
Theorem: Tester(T, A, ε) is an ε-tester for L(A).
26
Proof schema of the Tester
Theorem: Regular trees are testable.
1. Robustness lemma: if T is ε-far from L, then for every admissible path Z there exists i such that the number of Z-infeasible i-subtrees is large.
2. Splitting lemma: if T is far from L, there are many disjoint infeasible subtrees.
3. Amplifying lemma: if there are many infeasible subtrees, there are many small ones.
27
Splitting and Merging
Splitting and merging on words (figure: pieces labeled C).
Splitting and merging on trees (figure).
28
Splitting and Merging trees
(Figure: connected components labeled C, D, E, and the corrected tree.)
29
Correction in practice: right branch tree
http://www.lri.fr/~mdr/xml/
2 moves, dist = 2
30
3. Equivalence testing of Regular Languages
1. Inclusion
2. Equivalence: equivalence tester
31
Statistics on words
Block statistics: b.stat; uniform statistics: u.stat (figure: windows of length k).
The construction of the tester for regular languages is exponential in the size of the automaton; we need a construction polynomial in the size of the automaton. For equivalence testing, we use b.stat.
32
Automata for Regular languages
Regular languages and automata: for a non-deterministic automaton A, let A_k be the automaton over the alphabet Σ^k, which reads a word block by block, each letter v ∈ Σ^k being a block of k letters.
Definition: v, a word over Σ^k, is an A_k-loop if there are u, w such that:
1. the word u.v.w is accepted by A_k;
2. the state after u is identical to the state after u.v.
A finite set of loops is A_k-compatible if all the loops can occur in one accepting word.
Definition (convex hull): H is the union, over A_k-compatible sets of loops, of the convex hulls of their b-stats.
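A sketch of how the transitions of A_k, and its one-block loops, could be computed from an NFA A given as a dictionary (state, letter) → set of states. The helper names and the (ab)* example automaton are hypothetical, and the sketch does not check that a loop can occur in an accepting word (reachability and co-reachability), which the definition requires.

```python
from itertools import product

def block_automaton(delta, states, alphabet, k):
    """Transitions of A_k: (state, block) -> set of states reached after reading
    the k letters of the block. Blocks that lead nowhere are omitted."""
    delta_k = {}
    for q in states:
        for letters in product(alphabet, repeat=k):
            block = "".join(letters)
            current = {q}
            for a in block:
                current = {r for p in current for r in delta.get((p, a), set())}
            if current:
                delta_k[(q, block)] = current
    return delta_k

def one_block_loops(delta_k):
    """Pairs (q, v) such that reading the single block v from q can return to q."""
    return {(q, v) for (q, v), targets in delta_k.items() if q in targets}

# Hypothetical example: A accepts (ab)*, state 0 initial and accepting.
delta = {(0, "a"): {1}, (1, "b"): {0}}
dk = block_automaton(delta, [0, 1], "ab", 2)
print(sorted(one_block_loops(dk)))
```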
33
Automata for Regular languages
Basic property:
Proposition:
Carathéodory's theorem: in dimension d, the convex hull of N points is the union of the convex hulls of subsets of at most d + 1 of these points.
Hence large loops can be decomposed, and small loops (of length less than m = |A|) suffice.
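Carathéodory's theorem in the form used here, written as a formula for a finite set P of points in ℝ^d:

```latex
\operatorname{conv}(P) \;=\; \bigcup_{T \subseteq P,\ |T| \le d+1} \operatorname{conv}(T)
```

With d = |Σ|^k, the dimension of the b-stat vectors, this gives subsets of at most |Σ|^k + 1 loops, the bound used in the construction of H.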
34
Approximate Parikh mapping
Lemma (proof idea): find w'' ε-close to w by removing a loop v, i.e. at most m block letters.
Lemma: for every X in H and every n, there exists w in L such that b-stat(w) is close to X.
35
Approximate Parikh mapping
Lemma: for every X in H, there is w in L such that b-stat(w) is close to X (figure: X, b-stat(w), w and H).
H is a fair representation of L.
36
Construction of H
Enumerate all loops.
The number of distinct b-stats is smaller, since some loops have the same b-stat, e.g. ABBA and BBAA. It is bounded by the number of partitions of a word of length m into « big blocks ».
Construct H by matrix iteration.
37
Construction of H
Lemma: compute a set I of at most |Σ|^k + 1 compatible loops:
1. Compute P^t for t = 1, …, m.
2. In the diagonals, find the b-stats of small loops.
3. Consider the subsets of at most |Σ|^k + 1 elements which are compatible.
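The matrix iteration can be sketched over sets of block-count vectors: entry (p, q) of P^t holds the count vectors of all t-block paths from p to q, the "product" takes a union over middle states and adds counts, and the diagonal entries give the b-stats of loops. This is a hypothetical illustration (names, representation, toy automata); it omits the compatibility check and makes no claim about the running time stated on the slides.

```python
from fractions import Fraction

def loop_bstats(delta_k, states, blocks, m):
    """Normalized b-stats of block loops of length t = 1..m, read off the diagonals of P^t."""
    index = {blk: i for i, blk in enumerate(blocks)}

    def unit(blk):
        v = [0] * len(blocks)
        v[index[blk]] = 1
        return tuple(v)

    # P^1: one block per transition of A_k.
    p1 = {(p, q): set() for p in states for q in states}
    for (p, blk), targets in delta_k.items():
        for q in targets:
            p1[(p, q)].add(unit(blk))

    stats, pt = set(), p1
    for t in range(1, m + 1):
        for q in states:  # diagonal entries: loops of t blocks through q
            for counts in pt[(q, q)]:
                stats.add(tuple(Fraction(c, t) for c in counts))
        # P^{t+1} = P^t * P^1 with (union over middle states, sum of count vectors)
        nxt = {(p, q): set() for p in states for q in states}
        for p in states:
            for r in states:
                for q in states:
                    for u in pt[(p, r)]:
                        for v in p1[(r, q)]:
                            nxt[(p, q)].add(tuple(x + y for x, y in zip(u, v)))
        pt = nxt
    return stats

# Toy A_k with one state and two one-block loops "aa" and "bb".
print(loop_bstats({(0, "aa"): {0}, (0, "bb"): {0}}, [0], ["aa", "bb"], 2))
```

In the toy example, the loops aa.bb and bb.aa have the same count vector, so they contribute a single b-stat (1/2, 1/2), the same effect as ABBA and BBAA.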
38
Example
Automaton A (figure: states 1, 2, 3, 4 with transitions labeled a, b, c, d).
Blocks: k = 2, m = 4, |Σ| = 4, |Σ|^k + 1 = 17.
Loops: {(aa, ca : 1), (bb : 2), (cc, ac : 3), (dd : 4)}
(Figure: H_A, with the points aa, ca, ac, cc, bb, dd.)
39
Equivalence tester
Tester for w in L (regular): compute b-stat(w) and H, and decide whether dist(w, L) > ε.n. The time is polynomial in m = |A|; the previous tester was exponential in m.
Tester for the approximate equivalence of A and B:
1. Compute H_A and H_B.
2. Reject if H_A and H_B are different.
The time is polynomial in m = |A| + |B|.
40
Generalizations
Büchi automata. Distance on infinite words: two words are ε-close if … A word w is ε-close to a language L if there exists w' in L such that w and w' are ε-close.
Statistics: set of accumulation points. H: compatible loops of the connected components of accepting states.
Tester for Büchi automata: compute H_A and H_B; reject if H_A and H_B are different.
Equivalence of CF grammars is undecidable; approximate equivalence is in exponential time.
41
Conclusion
1. Testers and correctors
2. Constant-time algorithm for the edit distance with moves
3a. Testers and correctors for regular words
3b. Tester and corrector for regular trees
4. Equivalence tester for automata: polynomial-time algorithm
Generalization to Büchi automata and to context-free and tree regular languages
42
Soundness and Robustness
Let F be a property on a class K of structures U; here F is equality.
1. Soundness: close structures have close statistics.
2. Robustness: far structures have far statistics.
43
Robustness of b.stat
44
Soundness of u.stat
Simple edit: …
Move w = A.B.C.D → w' = A.C.B.D: …
Hence, for ε²·n operations: …
Problem: robustness of u.stat? Harder! It needs an auxiliary distribution and two key lemmas.
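A move w = A.B.C.D → w' = A.C.B.D only changes the windows that straddle one of the three cut points. There are at most k − 1 such windows per cut point in each word, so the window counts differ by at most 6(k − 1) in L1 norm. This bound is our own count, not the slide's formula; the sketch below (hypothetical helper names) checks it on a random move.

```python
import random
from collections import Counter

def u_counts(w, k):
    """Counts of the n - k + 1 windows of length k."""
    return Counter(w[i:i + k] for i in range(len(w) - k + 1))

def l1_counts(w, w2, k):
    """L1 distance between window counts (u.stat before normalization)."""
    cw, cw2 = u_counts(w, k), u_counts(w2, k)
    return sum(abs(cw[u] - cw2[u]) for u in set(cw) | set(cw2))

def random_move(w, rng):
    """Cut w = A.B.C.D at three random positions and return w' = A.C.B.D."""
    i, j, l = sorted(rng.randrange(len(w) + 1) for _ in range(3))
    return w[:i] + w[j:l] + w[i:j] + w[l:]

rng = random.Random(0)
w = "".join(rng.choice("01") for _ in range(1000))
w2 = random_move(w, rng)
for k in (2, 5, 10):
    print(k, l1_counts(w, w2, k), 6 * (k - 1))
```

After normalization by n − k + 1, one move changes u.stat by O(k/n), so about ε²·n moves change it by O(ε²·k), which is O(ε) when k = 1/ε.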
45
Block Uniform Statistics Lemma 1:
46
Uniform Statistics (figure: words A and B)
Lemma 2:
47
Robustness of the uniform Statistics
Robustness of u-stat: by Lemma 1 and by Lemma 2.
48
Tester for the distance with moves
NP-complete problem, but O(1)-approximable.
Approximate u.stat: sample N subwords of length k, and compute Y.
Y is a good approximation of u.stat (Chernoff bound), and the uniform statistics are a good approximation of the distance, by soundness and robustness.
Tester: if |Y(w) − Y(w')| < ε, accept; else reject.