Lower Bounds for Embedding Edit Distance into Normed Spaces A. Andoni, M. Deza, A. Gupta, P. Indyk, S. Raskhodnikova.

1 Lower Bounds for Embedding Edit Distance into Normed Spaces A. Andoni, M. Deza, A. Gupta, P. Indyk, S. Raskhodnikova

2 Definitions. Edit distance between two strings: the minimum number of edit operations needed to transform one string into the other. Edit operation: insertion, deletion, or substitution of one character. Edit distance is also called the Levenshtein metric.
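The definition can be evaluated with the textbook dynamic program, which runs in $O(|s|\cdot|t|)$ time, i.e. quadratic in the string length. A minimal Python sketch (the function name `edit_distance` is ours, not from the slides):

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning s into t."""
    # prev[j] = distance between the previous prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute (free if a == b)
            ))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory is linear while time stays quadratic.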

3 Edit distance. Important in computational biology and text processing. Computational bottlenecks: the widely used (dynamic programming) algorithm takes quadratic time, and no efficient algorithm is known for nearest neighbor search. A new approach for dealing with edit distance: embedding it into a normed space.

4 Embedding. Definition: a mapping $f:\text{Strings}\to\ell_p^d$ such that for any pair of strings $s$ and $s'$: $\mathrm{Edit}(s,s')\le\|f(s)-f(s')\|_p\le c\cdot\mathrm{Edit}(s,s')$. The factor $c$ is called the distortion. Embedding edit distance into a normed space is useful because efficient algorithms for normed spaces are known (e.g. for nearest neighbor search), and edit distance can be approximated within a factor $c$ in subquadratic time if the mapping itself can be computed in subquadratic time.
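As a small illustration of the definition (a sketch; all names are ours), the code below measures the distortion of a given map on a finite set of strings. Since rescaling a map does not change its quality, the smallest achievable $c$ after rescaling is the ratio of the largest to the smallest value of $\|f(s)-f(s')\|/\mathrm{Edit}(s,s')$. The example map, sending an equal-length bit string to its 0/1 vector in $\ell_1$ (so distances become Hamming distances), is our own toy choice.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    # standard O(|s|*|t|) Levenshtein dynamic program
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def hamming_l1(s: str, t: str) -> int:
    # l_1 distance between the 0/1 vectors of two equal-length bit strings
    return sum(a != b for a, b in zip(s, t))


def distortion(points, dist, image_dist) -> float:
    """Smallest c such that some rescaling of the map satisfies
    dist(x, y) <= image_dist(x, y) <= c * dist(x, y) for all pairs."""
    ratios = [image_dist(x, y) / dist(x, y) for x, y in combinations(points, 2)]
    return max(ratios) / min(ratios)


pts = ["0101", "1010", "0000"]
print(distortion(pts, edit_distance, hamming_l1))  # 2.0
```

Here "0101" and "1010" are at edit distance 2 (delete the first symbol, append one) but Hamming distance 4, while the other pairs keep ratio 1, so this map has distortion 2 on these three points.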

5 Embedding edit distance into a normed space. Essentially nothing was known. If moving a contiguous block of characters is allowed as a single edit operation, the resulting metric can be embedded into $\ell_1$ with distortion $O(\log d\cdot\log^* d)$ [CPSV'00], where $d$ is the length of the strings being embedded.

6 Result in this paper. A lower bound of 3/2 on the distortion of embedding edit distance into $\ell_1$ and into $(\ell_2)^2$. The bound cannot be improved using our technique.

7 Structure of the argument. We will show that: the edit metric contains the shortest-path metric of the graph $K_{2,n}$ (the $K_{2,n}$-metric) as a submetric; the $K_{2,n}$-metric cannot be embedded into $(\ell_2)^2$ with low distortion. We conclude that: the edit metric cannot be embedded into $(\ell_2)^2$ with distortion better than 3/2; the edit metric cannot be embedded into $\ell_1$ with distortion better than 3/2, since every $\ell_1$-metric embeds isometrically into $(\ell_2)^2$ [LLR94]. Finally, we show that the bound of 3/2 is tight for the graph considered.

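8 The $K_{2,n}$-metric as a submetric of the edit metric. The vertices of the graph are $A_1, A_2, B_1, B_2, \ldots, B_n$; the edges are $(A_i, B_j)$ for $1\le i\le 2$, $1\le j\le n$. The mapping: $A_1$ is mapped to the string $(10)^n$, $A_2$ to the string $(10)^{n-1}$, and $B_j$ to the string $(10)^{j-1}\,1\,(10)^{n-j}$. Then $\mathrm{Edit}(A_1,A_2)=2$, $\mathrm{Edit}(A_i,B_j)=1$ and $\mathrm{Edit}(B_i,B_j)=2$ for $i\ne j$, exactly the shortest-path distances in $K_{2,n}$.
Figure ($n=4$): $A_1$ = 10101010, $A_2$ = 101010, $B_1$ = 1101010, $B_2$ = 1011010, $B_3$ = 1010110, $B_4$ = 1010101.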
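A quick numerical check of the mapping (a sketch; the helper names are ours): it builds the strings assigned to $A_1, A_2, B_1, \ldots, B_n$ and compares all their pairwise edit distances with the shortest-path distances in $K_{2,n}$, which are 1 between an $A_i$ and a $B_j$ and 2 between two vertices on the same side.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    # standard O(|s|*|t|) Levenshtein dynamic program
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def k2n_strings(n: int) -> dict:
    # A1 -> (10)^n, A2 -> (10)^(n-1), Bj -> (10)^(j-1) 1 (10)^(n-j)
    pts = {"A1": "10" * n, "A2": "10" * (n - 1)}
    for j in range(1, n + 1):
        pts[f"B{j}"] = "10" * (j - 1) + "1" + "10" * (n - j)
    return pts


def k2n_distance(u: str, v: str) -> int:
    # shortest-path distance in K_{2,n}: 1 across the two sides, 2 within a side
    return 1 if (u[0] == "A") != (v[0] == "A") else 2


def is_isometric(n: int) -> bool:
    pts = k2n_strings(n)
    return all(edit_distance(pts[u], pts[v]) == k2n_distance(u, v)
               for u, v in combinations(pts, 2))


print(k2n_strings(4))
print(all(is_isometric(n) for n in range(1, 8)))  # True
```

For $B_i$ versus $B_j$ the value 2 also follows from a parity argument: both strings have length $2n-1$ and contain exactly $n$ ones, so they differ in an even number of positions and no single substitution can turn one into the other.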

9 Lower bound for embedding the $K_{2,n}$-metric into $(\ell_2)^2$. Theorem 1: for any $\varepsilon>0$ there exists $n$ such that the $K_{2,n}$-metric cannot be embedded into $(\ell_2)^2$ with distortion less than $3/2-\varepsilon$.

10 Proof of Theorem 1. Let $B_{-1}=A_1$ and $B_0=A_2$, and let $f$ be an embedding of the $K_{2,n}$-metric into $(\ell_2)^2$ with distortion $c$. The metric on the points $f(B_{-1}),\ldots,f(B_n)$ must satisfy the negative type inequality: for any integers $b_{-1},\ldots,b_n$ that sum to 0, $\sum_{-1\le i<j\le n} b_i b_j\|f(B_i)-f(B_j)\|_2^2\le 0$. With suitable values of $n$ and $b_i$, the inequality gives $c\ge 3/2-\varepsilon$.
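One choice of weights that yields the bound (our illustration; the slide does not state which values the paper uses): take $b_{-1}=b_0=n$ and $b_j=-2$ for $1\le j\le n$, which sum to 0. Write $D_{ij}=\|f(B_i)-f(B_j)\|_2^2$ and let $d_{ij}$ be the $K_{2,n}$ distance, so that $d_{ij}\le D_{ij}\le c\,d_{ij}$ as in the definition on slide 4. Bounding every term with a positive weight product from below by $d_{ij}$, and every term with a negative weight product from below via $D_{ij}\le c\,d_{ij}$:

```latex
\begin{aligned}
0 \;\ge\; \sum_{-1\le i<j\le n} b_i b_j D_{ij}
  &\;\ge\; \underbrace{n^2\cdot 2}_{(A_1,A_2)}
   \;-\; \underbrace{2n\cdot 2n\cdot c}_{(A_i,B_j)}
   \;+\; \underbrace{4\cdot\tbinom{n}{2}\cdot 2}_{(B_i,B_j)} \\
  &\;=\; 6n^2 - 4n - 4n^2 c,
\qquad\text{hence}\qquad
c \;\ge\; \frac{6n^2-4n}{4n^2} \;=\; \frac{3}{2}-\frac{1}{n}.
\end{aligned}
```

Choosing $n\ge 1/\varepsilon$ gives $c\ge 3/2-\varepsilon$.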

11 3/2 is a tight bound. We will prove that 3/2 is a tight bound for embedding the $K_{2,n}$-metric into $\ell_1$. Theorem 2: there exists an embedding $f$ of the $K_{2,n}$-metric into $\ell_1$ with distortion 3/2.

12 Proof of Theorem 2. We combine two embeddings $f_1$ and $f_2$. $f_1$ is: $f_1(A_1)=(0,\ldots,0)$, $f_1(A_2)=(1,\ldots,1)/2^n$, $f_1(B_j)=(\mathrm{bin}(0)_j,\ldots,\mathrm{bin}(2^n-1)_j)/2^n$, where $\mathrm{bin}(i)_j$ is the $j$-th bit of the binary representation of the integer $i$. $f_1$ satisfies: $\|f_1(A_1)-f_1(A_2)\|_1=1$; $\|f_1(A_i)-f_1(B_j)\|_1=1/2$ for $1\le i\le 2$, $1\le j\le n$; $\|f_1(B_i)-f_1(B_j)\|_1=1/2$ for $1\le i<j\le n$.

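13 Proof of Theorem 2 (cont.). $f_2$ is: $f_2(A_1)=f_2(A_2)=(0,\ldots,0)$, $f_2(B_j)=e_j/2$, where $e_j$ is the vector with 1 in the $j$-th position and 0 elsewhere. $f_2$ satisfies: $\|f_2(A_1)-f_2(A_2)\|_1=0$; $\|f_2(A_i)-f_2(B_j)\|_1=1/2$ for $1\le i\le 2$, $1\le j\le n$; $\|f_2(B_i)-f_2(B_j)\|_1=1$ for $1\le i<j\le n$. If $f_1$ and $f_2$ induce metrics $D_1$ and $D_2$, then $2D_1+D_2$ (realized in $\ell_1$ by $x\mapsto(2f_1(x),f_2(x))$) has distortion 3/2: it assigns distance 2 to $(A_1,A_2)$, 3/2 to $(A_i,B_j)$ and 2 to $(B_i,B_j)$, whose $K_{2,n}$ distances are 2, 1 and 2.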
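The construction can be checked exactly with rational arithmetic. This sketch (names ours) builds $f_1$ and $f_2$ as on the slides, concatenates $(2f_1, f_2)$, and reports the smallest and largest ratio of $\ell_1$ distance to $K_{2,n}$ distance. Bits are read least-significant first, which is an assumption; any fixed bit order gives the same distances.

```python
from fractions import Fraction
from itertools import combinations


def k2n_embedding(n: int) -> dict:
    """l_1 embedding x -> (2*f1(x), f2(x)) built from f1 and f2 on the slides."""
    m = 2 ** n
    f1 = {"A1": [Fraction(0)] * m, "A2": [Fraction(1, m)] * m}
    f2 = {"A1": [Fraction(0)] * n, "A2": [Fraction(0)] * n}
    for j in range(1, n + 1):
        # coordinate i of f1(B_j) is bit j of i (bit 1 = least significant), scaled by 1/2^n
        f1[f"B{j}"] = [Fraction((i >> (j - 1)) & 1, m) for i in range(m)]
        # f2(B_j) = e_j / 2
        f2[f"B{j}"] = [Fraction(1, 2) if k == j - 1 else Fraction(0) for k in range(n)]
    return {x: [2 * v for v in f1[x]] + f2[x] for x in f1}


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def k2n_distance(u: str, v: str) -> int:
    # 1 across the two sides of K_{2,n}, 2 within a side
    return 1 if (u[0] == "A") != (v[0] == "A") else 2


def ratios(n: int) -> list:
    emb = k2n_embedding(n)
    return [l1(emb[u], emb[v]) / k2n_distance(u, v) for u, v in combinations(emb, 2)]


r = ratios(5)
print(min(r), max(r))  # 1 3/2
```

Since the smallest ratio is 1, the combined map is non-contracting and its largest ratio, 3/2, is the distortion in the sense of slide 4.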

14 Computational experiments. Goal: raise the lower bound of 3/2. We tried the following approaches: optimal embedding of strings of length up to $d$ into $\ell_1$, using the cut-metric formulation, and into $(\ell_2)^2$, using semidefinite programming; lower bounds via expansion properties of the metric.

15 Optimal embedding into $\ell_1$. A metric is embeddable into $\ell_1$ iff it can be represented as a nonnegative combination of cut metrics, so the optimal distortion can be computed by linear programming. Deficiency: the number of variables is $2^{|X|-1}$, where $|X|=2^{d+1}-1$. This is infeasible for $d>3$. For $d=3$, the optimal distortion is $4/3<3/2$.
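The variable count is easy to tabulate, and the cut-metric idea can be seen on the simplest $\ell_1$ metric, points on a line. A sketch under our own assumptions: $X$ is the set of binary strings of length 0 to $d$ (which matches $|X|=2^{d+1}-1$), there is one LP variable per cut, and the helper names are ours.

```python
from itertools import combinations


def num_strings(d: int) -> int:
    # binary strings of length 0..d (binary alphabet assumed): 2^0 + ... + 2^d
    return 2 ** (d + 1) - 1


def num_cut_variables(d: int) -> int:
    # one LP weight per cut {S, X - S}; S and its complement give the same cut
    return 2 ** (num_strings(d) - 1)


def cut_metric(S: set, x, y) -> int:
    # delta_S(x, y) = 1 iff exactly one of x, y lies in S
    return int((x in S) != (y in S))


def line_as_cuts(xs):
    # a 1-D (hence l_1) metric is a nonnegative combination of threshold cuts:
    # cut S_k = {v <= t_k} gets weight t_{k+1} - t_k for consecutive sorted values
    ts = sorted(set(xs))
    return [({v for v in xs if v <= ts[k]}, ts[k + 1] - ts[k]) for k in range(len(ts) - 1)]


def decomposition_holds(xs) -> bool:
    cuts = line_as_cuts(xs)
    return all(sum(w * cut_metric(S, x, y) for S, w in cuts) == abs(x - y)
               for x, y in combinations(xs, 2))


for d in range(1, 6):
    print(d, num_strings(d), num_cut_variables(d))
print(decomposition_holds([0, 3, 4, 10]))  # True
```

For $d=3$ there are $2^{14}=16384$ cut variables, while $d=4$ already needs $2^{30}$, which is consistent with the slide's remark that the LP is infeasible beyond $d=3$.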

16 Optimal embedding into $(\ell_2)^2$. Formulated as a semidefinite programming problem. For $d=5$ we obtained an optimal distortion of about $1.30<3/2$. We could not run it for $d=6$, since that would require about 2 GB of memory.
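A standard way to write this problem (our formulation; the slide does not give the exact program) uses the Gram matrix $G$ of the images $f(x)$, $x\in X$. Because $\|f(x)-f(y)\|_2^2=G_{xx}+G_{yy}-2G_{xy}$ is linear in $G$, the distortion constraints are linear and only positive semidefiniteness is nonlinear:

```latex
\begin{aligned}
\text{minimize}\quad & c \\
\text{subject to}\quad & \mathrm{Edit}(x,y)\;\le\; G_{xx}+G_{yy}-2G_{xy}\;\le\; c\cdot\mathrm{Edit}(x,y)
  \qquad \text{for all } x\ne y\in X, \\
& G\succeq 0 .
\end{aligned}
```

Any feasible $G$ factors as $G=F^{\top}F$, and the columns of $F$ give an embedding into $(\ell_2)^2$ with distortion $c$.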

17 Lower bounds via expansion. Idea: show that the graph underlying the edit metric is a "good" expander. We considered the "two-layer" graph $G$: the graph of all strings of length $d$ and $d-1$, made regular by adding self-loops (up to degree $\Delta=3d-1$). The shortest-path metric of $G$ coincides with the edit metric restricted to these strings.
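A sketch of the two-layer graph for binary strings. The edge rule is our interpretation, chosen because it reproduces $\Delta=3d-1$: substitutions inside each layer, and one edge per (position, symbol) insertion from a length-$(d-1)$ string, counted with multiplicity. The shorter layer then has degree $(d-1)+2d=3d-1$, and the longer layer ($d$ substitutions plus $d$ deletions) is padded with self-loops. The code checks regularity and that BFS distances equal edit distances; all names are ours.

```python
from collections import deque
from itertools import product


def edit_distance(s: str, t: str) -> int:
    # standard O(|s|*|t|) Levenshtein dynamic program
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def strings(k: int) -> list:
    # all binary strings of length k (binary alphabet assumed)
    return ["".join(p) for p in product("01", repeat=k)]


def two_layer_graph(d: int) -> dict:
    """Adjacency lists (with multiplicity) on strings of length d and d-1."""
    V = strings(d) + strings(d - 1)
    adj = {v: [] for v in V}
    for v in V:
        for i in range(len(v)):  # substitutions stay inside a layer
            adj[v].append(v[:i] + ("1" if v[i] == "0" else "0") + v[i + 1:])
    for v in strings(d - 1):
        for i in range(d):  # insert symbol c at position i ...
            for c in "01":
                w = v[:i] + c + v[i:]
                adj[v].append(w)
                adj[w].append(v)  # ... and the matching deletion from w
    delta = 3 * d - 1
    for v in V:
        adj[v].extend([v] * (delta - len(adj[v])))  # pad with self-loops
    return adj


def bfs_distances(adj: dict, src: str) -> dict:
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def check(d: int) -> bool:
    adj = two_layer_graph(d)
    if any(len(nbrs) != 3 * d - 1 for nbrs in adj.values()):
        return False
    for u in adj:
        dist = bfs_distances(adj, u)
        if any(dist[v] != edit_distance(u, v) for v in adj):
            return False
    return True


print(check(4))  # True
```

Distances are preserved because any optimal sequence of edits can be reordered to alternate deletions and insertions, so every intermediate string has length $d$ or $d-1$.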

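18 Expansion. Goal: find $C$ such that for any set $A$ of vertices, $|e(A,V-A)|\ge C\,|A|\,|V-A|/n$, where $e(A,B)$ is the set of edges between $A$ and $B$. Then the distortion is at least $S\cdot C\cdot\mathrm{avg}(G)/\Delta$, where $S$ is a constant and $\mathrm{avg}(G)$ is the average distance in $G$. Moreover, $C$ is at least the "eigenvalue gap".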
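For a tiny graph the constant $C$ can be found by brute force over all vertex subsets. This sketch assumes $n=|V|$ and uses the hypercube as a test graph (our choice; the names are ours too). It returns the largest $C$ satisfying the inequality, i.e. the minimum of $|e(A,V-A)|\cdot n/(|A|\,|V-A|)$ over proper nonempty $A$.

```python
from fractions import Fraction
from itertools import combinations


def hypercube(d: int) -> dict:
    # vertices 0..2^d-1; neighbours differ in exactly one bit
    return {v: [v ^ (1 << k) for k in range(d)] for v in range(2 ** d)}


def expansion_constant(adj: dict) -> Fraction:
    """Largest C with |e(A, V-A)| >= C*|A|*|V-A|/n for every proper nonempty A
    (brute force, exponential in |V|; n = |V| assumed)."""
    V = list(adj)
    n = len(V)
    best = None
    for size in range(1, n):
        for subset in combinations(V, size):
            S = set(subset)
            cut = sum(1 for u in S for w in adj[u] if w not in S)
            ratio = Fraction(cut * n, len(S) * (n - len(S)))
            if best is None or ratio < best:
                best = ratio
    return best


print(expansion_constant(hypercube(3)))  # 2
```

For the 3-cube the minimum is attained by splitting it into two opposite faces (4 crossing edges, ratio $4\cdot 8/16=2$).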

19 Eigenvalue gap. Eigenvalues can be computed efficiently. The gap was not large enough: about 2.7 for $d=4, 8, 12, 16$, compared with 2 for the hypercube (which embeds isometrically into $\ell_1$). This gives a lower bound on the distortion below 3/2 for $d\le 16$.
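The hypercube value of 2 can be checked exactly without a numerical eigensolver, assuming "eigenvalue gap" means the difference between the two largest adjacency eigenvalues. The characters $\chi_S(x)=(-1)^{|S\cap x|}$ are $2^d$ orthogonal eigenvectors of the $d$-cube with eigenvalue $d-2|S|$, so they give the whole spectrum. A sketch (names ours):

```python
def hypercube_spectrum(d: int) -> list:
    """Verify that chi_S(x) = (-1)^{|S & x|} is an eigenvector of the d-cube's
    adjacency matrix with eigenvalue d - 2|S|; return the spectrum, largest first."""
    n = 2 ** d
    eigenvalues = []
    for S in range(n):
        chi = [(-1) ** bin(S & x).count("1") for x in range(n)]
        lam = d - 2 * bin(S).count("1")
        for x in range(n):
            # (A chi)(x) is the sum of chi over the d neighbours of x
            if sum(chi[x ^ (1 << k)] for k in range(d)) != lam * chi[x]:
                raise AssertionError("not an eigenvector")
        eigenvalues.append(lam)
    return sorted(eigenvalues, reverse=True)


spec = hypercube_spectrum(4)
print(spec[0] - spec[1])  # 2
```

The gap is $d-(d-2)=2$ for every $d$, independent of the dimension.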

20 Conclusion. A lower bound of 3/2 on the distortion of embedding the edit metric into $\ell_1$ and $(\ell_2)^2$, obtained using the $K_{2,n}$-metric. The bound is tight for the $K_{2,n}$-metric.

