Published by Barry Gilbert. Modified over 9 years ago.
1
Lower Bounds for Embedding Edit Distance into Normed Spaces
A. Andoni, M. Deza, A. Gupta, P. Indyk, S. Raskhodnikova
2
Definitions
Edit distance between two strings: the minimum number of edit operations needed to transform one string into the other.
Edit operation: insertion, deletion, or substitution of one character.
Edit distance is also known as the Levenshtein metric.
3
Edit Distance
Important in: computational biology, text processing.
Computational bottlenecks:
The widely used (dynamic programming) algorithm takes quadratic time.
No efficient algorithm is known for nearest neighbor search under edit distance.
New approach for dealing with edit distance: embedding it into a normed space.
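The quadratic-time algorithm referred to above is, in its most common form, the Wagner-Fischer dynamic program. Below is a minimal Python sketch of it, assuming unit cost for every insertion, deletion and substitution. The function name `levenshtein` is our own choice. The program fills a table with $|s|+1$ rows and $|t|+1$ columns, which is where the quadratic running time comes from.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions.

    Classic O(len(s) * len(t)) dynamic program that keeps only one row in memory.
    """
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # substitute (free when a == b)
            ))
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("10101010", "101010"))  # 2
```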
4
Embedding
Definition: a mapping $f:\text{Strings}\to\ell_p^d$ such that for every pair of strings $s, s'$:
$\mathrm{Edit}(s,s') \le \|f(s)-f(s')\|_p \le c\cdot\mathrm{Edit}(s,s')$
The factor $c$ is called the distortion.
Embedding edit distance into a normed space is useful because:
Efficient algorithms are known for normed spaces (e.g., nearest neighbor search).
Edit distance can be approximated within a factor $c$ in subquadratic time, provided the mapping itself can be computed in subquadratic time.
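To make the definition concrete, here is a small Python sketch that measures the distortion of a given map on a finite metric space. The helper names `lp_dist` and `distortion` are hypothetical. Rescaling $f$ does not change the distortion, so the sketch reports the ratio between the largest and the smallest expansion factor. After dividing $f$ by the smallest expansion, the condition above holds with exactly that $c$.

```python
from itertools import combinations


def lp_dist(u, v, p=1.0):
    """l_p distance between two vectors of equal length."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def distortion(points, metric, f, vec_dist=lp_dist):
    """Distortion of the map f on the finite metric space (points, metric).

    Rescaling f does not change the distortion, so it equals the ratio of the
    largest to the smallest expansion vec_dist(f(x), f(y)) / metric(x, y).
    """
    ratios = [vec_dist(f(x), f(y)) / metric(x, y) for x, y in combinations(points, 2)]
    return max(ratios) / min(ratios)


# Example: shortest-path metric of the 4-cycle, mapped to the corners of a unit square in l_2.
corners = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}


def cycle_metric(x, y):
    return min((x - y) % 4, (y - x) % 4)


c = distortion(list(corners), cycle_metric, lambda x: corners[x], lambda u, v: lp_dist(u, v, 2))
print(c)  # approximately 1.414: the diagonals shrink from 2 to sqrt(2)
```

For the $(\ell_2)^2$ case discussed later, one would pass `lambda u, v: lp_dist(u, v, 2) ** 2` as `vec_dist`. Rescaling still works there, because squared distances scale by the square of the factor.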
5
Embedding edit distance into a normed space
Essentially nothing is known.
If moving a contiguous block of characters is allowed as a single edit operation, the resulting metric can be embedded into $\ell_1$ with distortion $O(\log d\cdot\log^* d)$ [CPSV'00], where $d$ is the length of the strings being embedded.
6
Result in this paper
A lower bound of 3/2 on the distortion of embedding edit distance into $\ell_1$ and into $(\ell_2)^2$.
The bound cannot be improved using our technique.
7
Structure of the argument
We will show that:
The edit metric contains the shortest path metric over the graph $K_{2,n}$ (the $K_{2,n}$-metric) as an induced subgraph.
The $K_{2,n}$-metric is not embeddable into $(\ell_2)^2$ with low distortion.
We conclude that:
The edit metric is not embeddable into $(\ell_2)^2$ with distortion better than 3/2.
The edit metric is not embeddable into $\ell_1$ with distortion better than 3/2, since every $\ell_1$-metric embeds isometrically into $(\ell_2)^2$ [LLR94].
Finally, we show that the bound of 3/2 is tight for the considered graph.
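The general statement that $\ell_1$ embeds isometrically into $(\ell_2)^2$ is the result cited as [LLR94]. A simple special case is easy to check by hand. Integer points of $\ell_1$ can be written in unary, which turns them into 0/1 vectors. For coordinates in $\{0,1\}$ we have $|a-b| = (a-b)^2$, so the $\ell_1$ distance equals the squared $\ell_2$ distance of the images. The Python sketch below illustrates only this special case and is not the proof from [LLR94]. The helper names are hypothetical.

```python
import random


def unary(vec, bound):
    """Map an integer vector with entries in [0, bound] to a 0/1 vector.

    Each coordinate x becomes x ones followed by (bound - x) zeros.
    """
    out = []
    for x in vec:
        out.extend([1] * x + [0] * (bound - x))
    return out


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def sq_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# For integer points, the l1 distance equals the squared l2 distance of the unary images.
rng = random.Random(0)
for _ in range(1000):
    u = [rng.randint(0, 9) for _ in range(5)]
    v = [rng.randint(0, 9) for _ in range(5)]
    assert l1(u, v) == sq_l2(unary(u, 9), unary(v, 9))
print("isometric on all samples")
```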
8
$K_{2,n}$-metric: an induced subgraph of the edit metric
Vertices of the graph are $A_1, A_2, B_1, B_2, \dots, B_n$.
Edges are $(A_i, B_j)$ for $1\le i\le 2$, $1\le j\le n$.
The mapping:
$A_1 \mapsto (10)^n$
$A_2 \mapsto (10)^{n-1}$
$B_j \mapsto (10)^{j-1}\,1\,(10)^{n-j}$
[Figure, $n=4$: $A_1$ = 10101010, $A_2$ = 101010, $B_1$ = 1101010, $B_2$ = 1011010, $B_3$ = 1010110, $B_4$ = 1010101.]
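The claim that these strings realize the $K_{2,n}$-metric can be spot-checked mechanically. The Python sketch below computes edit distances between all assigned strings for small $n$ and compares them with shortest-path distances in $K_{2,n}$. In that graph, $A$-$B$ pairs are at distance 1 and pairs on the same side are at distance 2. The names `k2n_strings`, `k2n_distance` and `check` are hypothetical. The check is a finite sanity test for a few values of $n$, not a proof.

```python
def levenshtein(s, t):
    """Unit-cost edit distance (standard dynamic program)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def k2n_strings(n):
    """Strings assigned to the vertices of K_{2,n} by the mapping on this slide."""
    strings = {"A1": "10" * n, "A2": "10" * (n - 1)}
    for j in range(1, n + 1):
        strings[f"B{j}"] = "10" * (j - 1) + "1" + "10" * (n - j)
    return strings


def k2n_distance(u, v):
    """Shortest-path distance in K_{2,n} between distinct vertices u and v."""
    return 1 if u[0] != v[0] else 2  # A-B pairs are adjacent; same-side pairs are 2 apart


def check(n):
    s = k2n_strings(n)
    return all(levenshtein(s[u], s[v]) == k2n_distance(u, v) for u in s for v in s if u != v)


print(k2n_strings(4))
print(all(check(n) for n in range(1, 9)))  # True
```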
9
Lower bound for embedding the $K_{2,n}$ graph into $(\ell_2)^2$
Theorem 1: For any $\varepsilon>0$ there exists $n$ such that the $K_{2,n}$-metric cannot be embedded into $(\ell_2)^2$ with distortion less than $3/2-\varepsilon$.
10
Proof of Theorem 1
Let $B_{-1}=A_1$ and $B_0=A_2$, and let $f$ be an embedding of the $K_{2,n}$-metric into $(\ell_2)^2$ with distortion $c$.
The metric over the points $f(B_{-1}),\dots,f(B_n)$ must satisfy the negative type inequality: for any integers $b_{-1},\dots,b_n$ with $\sum_i b_i=0$,
$$\sum_{-1\le i<j\le n} b_i b_j\,\|f(B_i)-f(B_j)\|_2^2 \le 0.$$
With suitable values of $n$ and $b_i$, the inequality gives $c\ge 3/2-\varepsilon$.
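The slide leaves the choice of $n$ and $b_i$ implicit. One choice that works is sketched below. It is our reconstruction, not necessarily the values used in the paper.

Take $b_{-1}=b_0=n$ and $b_j=-2$ for $1\le j\le n$, so that $\sum_i b_i = 0$. Write $d_{ij}=\|f(B_i)-f(B_j)\|_2^2$ and assume $f$ is scaled so that $D\le d\le c\,D$, where $D$ is the $K_{2,n}$-metric. The terms with positive coefficients are bounded using $d\ge D$, and the terms with negative coefficients using $d\le cD$:

```latex
\begin{align*}
0 \;\ge\; \sum_{-1\le i<j\le n} b_i b_j\, d_{ij}
  &= n^2\, d_{-1,0} \;-\; 2n \sum_{i\in\{-1,0\}} \sum_{j=1}^{n} d_{ij}
     \;+\; 4 \sum_{1\le i<j\le n} d_{ij} \\
  &\ge 2n^2 \;-\; 4n^2 c \;+\; 4n(n-1),
\end{align*}
\[
\text{hence}\qquad c \;\ge\; \frac{6n^2-4n}{4n^2} \;=\; \frac{3}{2}-\frac{1}{n}.
\]
```

Any $n\ge 1/\varepsilon$ therefore gives $c\ge 3/2-\varepsilon$. The negative type inequality itself holds for squared Euclidean distances because, when $\sum_i b_i=0$, $\sum_{i<j} b_i b_j\|x_i-x_j\|_2^2 = -\|\sum_i b_i x_i\|_2^2$.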
11
3/2 is a tight bound
We will prove that 3/2 is a tight bound for embedding the $K_{2,n}$-metric into $\ell_1$.
Theorem 2: There exists an embedding $f$ of the $K_{2,n}$-metric into $\ell_1$ with distortion 3/2.
12
Proof of Theorem 2
We combine two embeddings $f_1$ and $f_2$.
$f_1$ is defined by:
$f_1(A_1) = (0,\dots,0)$
$f_1(A_2) = (1,\dots,1)/2^n$
$f_1(B_j) = (\mathrm{bin}(0)_j,\dots,\mathrm{bin}(2^n-1)_j)/2^n$, where $\mathrm{bin}(i)_j$ is the $j$-th bit of the binary representation of the integer $i$.
$f_1$ satisfies:
$\|f_1(A_1)-f_1(A_2)\|_1 = 1$
$\|f_1(A_i)-f_1(B_j)\|_1 = 1/2$ for $1\le i\le 2$, $1\le j\le n$
$\|f_1(B_i)-f_1(B_j)\|_1 = 1/2$ for $1\le i<j\le n$
13
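Proof of Theorem 2 (cont.)
$f_2$ is defined by:
$f_2(A_1) = f_2(A_2) = (0,\dots,0)$
$f_2(B_j) = e_j/2$, where $e_j$ is the vector with 1 in the $j$-th position and 0 elsewhere.
$f_2$ satisfies:
$\|f_2(A_1)-f_2(A_2)\|_1 = 0$
$\|f_2(A_i)-f_2(B_j)\|_1 = 1/2$ for $1\le i\le 2$, $1\le j\le n$
$\|f_2(B_i)-f_2(B_j)\|_1 = 1$ for $1\le i<j\le n$
If $f_1$ and $f_2$ induce the metrics $D_1$ and $D_2$, then $2D_1+D_2$ provides a distortion of 3/2. It is an $\ell_1$-metric, realized by concatenating $2f_1$ and $f_2$. Its values are 2 on $(A_1,A_2)$, 3/2 on every $(A_i,B_j)$, and 2 on every $(B_i,B_j)$, against graph distances 2, 1 and 2.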
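The construction in Theorem 2 can be checked exactly with rational arithmetic. The Python sketch below builds $f_1$ and $f_2$ and concatenates $2f_1$ with $f_2$, which makes the $\ell_1$ metric exactly $2D_1+D_2$. It then computes the distortion against the $K_{2,n}$-metric. Bits are numbered from the least significant end; this is an assumption, but any fixed order gives the same distances. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def f1(n):
    """First embedding: 2**n coordinates, one per integer i in [0, 2**n)."""
    scale = Fraction(1, 2 ** n)
    emb = {"A1": [Fraction(0)] * 2 ** n, "A2": [scale] * 2 ** n}
    for j in range(1, n + 1):
        # Coordinate i of f1(B_j) is (j-th bit of i) / 2**n, bits counted from the low end.
        emb[f"B{j}"] = [scale * ((i >> (j - 1)) & 1) for i in range(2 ** n)]
    return emb


def f2(n):
    """Second embedding: n coordinates, f2(A_i) = 0 and f2(B_j) = e_j / 2."""
    emb = {"A1": [Fraction(0)] * n, "A2": [Fraction(0)] * n}
    for j in range(1, n + 1):
        vec = [Fraction(0)] * n
        vec[j - 1] = Fraction(1, 2)
        emb[f"B{j}"] = vec
    return emb


def combined(n):
    """Concatenate 2*f1 and f2; the l1 metric of the result is 2*D1 + D2."""
    e1, e2 = f1(n), f2(n)
    return {v: [2 * x for x in e1[v]] + e2[v] for v in e1}


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def k2n_distance(u, v):
    """Shortest-path distance in K_{2,n} between distinct vertices."""
    return 1 if u[0] != v[0] else 2


def distortion(emb):
    ratios = [l1(emb[u], emb[v]) / k2n_distance(u, v) for u, v in combinations(emb, 2)]
    return max(ratios) / min(ratios)


print([distortion(combined(n)) for n in range(1, 7)])  # Fraction(3, 2) for every n
```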
14
Computational Experiments
Goal: raise the lower bound of 3/2.
We tried the following approaches:
Optimal embedding of strings of length up to $d$:
into $\ell_1$, using the cut-metric formulation;
into $(\ell_2)^2$, using semidefinite programming.
Lower bounds via expansion properties of the metric.
15
Optimal embedding into $\ell_1$
A metric is embeddable into $\ell_1$ if and only if it can be represented as a nonnegative combination of cut metrics.
The optimal distortion can then be computed by linear programming.
Deficiency: the number of variables is $2^{|X|-1}$, where $|X| = 2^{d+1}-1$.
This is infeasible for $d>3$.
For $d=3$, the optimal distortion is $4/3 < 3/2$.
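One way to write this LP is sketched below; the slide does not give the exact formulation. For a cut $S\subseteq X$, let $\delta_S(x,y)=1$ if exactly one of $x,y$ lies in $S$, and $0$ otherwise. The LP minimizes the distortion $c$ over nonnegative cut weights $\lambda_S$:

```latex
\begin{align*}
\text{minimize}\quad & c \\
\text{subject to}\quad & D(x,y) \;\le\; \sum_{S} \lambda_S\, \delta_S(x,y) \;\le\; c\, D(x,y)
  && \text{for all } x \ne y \in X, \\
& \lambda_S \ge 0 && \text{for all cuts } S .
\end{align*}
```

Both constraints are linear in $(\lambda, c)$. Since $\delta_S=\delta_{X\setminus S}$, one variable per pair $\{S, X\setminus S\}$ suffices, which gives the $2^{|X|-1}$ count. The value $|X| = 2^{d+1}-1 = \sum_{k=0}^{d}2^k$ is consistent with binary strings of length $0$ through $d$. On that reading, $d=3$ gives $|X|=15$ and $2^{14}=16384$ variables, while $d=4$ already gives $|X|=31$ and $2^{30}\approx 1.07\cdot 10^9$ variables.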
16
Optimal embedding into $(\ell_2)^2$
Formulated as a semidefinite programming problem.
For $d=5$, we obtained an optimal distortion of about 1.30 < 3/2.
We could not run it for $d=6$, since this would require about 2 GB of memory.
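The standard way to pose this as an SDP is sketched below; the slide does not spell out its formulation. It uses a Gram matrix $G\succeq 0$ indexed by $X$, whose entries determine the squared distances through $\|v_x-v_y\|_2^2 = G_{xx}+G_{yy}-2G_{xy}$:

```latex
\begin{align*}
\text{minimize}\quad & c \\
\text{subject to}\quad & D(x,y) \;\le\; G_{xx} + G_{yy} - 2G_{xy} \;\le\; c\, D(x,y)
  && \text{for all } x \ne y \in X, \\
& G \succeq 0 .
\end{align*}
```

Any feasible $G$ factors as $G=V^{\top}V$, and the columns $v_x$ of $V$ form an embedding into $(\ell_2)^2$ with distortion $c$. Assume the same set $X$ as in the $\ell_1$ experiment, so $|X| = 2^{d+1}-1$. Then $G$ is $63\times 63$ for $d=5$ and $127\times 127$ for $d=6$, and the latter case has $127\cdot 126/2 = 8001$ pairs of distance constraints.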
17
Lower bounds via expansion
Idea: show that the graph underlying the edit metric is a "good" expander.
We considered the "two-layer" graph $G$: the graph of all strings of length $d$ and $d-1$.
$G$ is made regular by adding self-loop edges (up to degree $\Delta = 3d-1$).
The shortest path metric over $G$ is an induced subgraph of the edit metric.
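The slide does not list the edge set of $G$. The Python sketch below rests on two assumptions. First, the alphabet is binary. Second, edges are single substitutions inside a layer plus single insertions or deletions between layers, counted with multiplicity (for example, inserting a symbol at two adjacent positions of a run yields the same string twice). Under these assumptions, a string of length $d-1$ has $2d$ insertions and $d-1$ substitutions, for $3d-1$ moves in total, matching $\Delta$. The sketch also checks that breadth-first-search distances in $G$ agree with edit distance. Self-loops are omitted because they do not affect distances. All names are hypothetical.

```python
from collections import deque
from itertools import product


def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def neighbors_with_multiplicity(s, d):
    """One-edit moves from s that stay inside layers d-1 and d (binary alphabet assumed)."""
    out = [s[:i] + ("1" if ch == "0" else "0") + s[i + 1:] for i, ch in enumerate(s)]  # substitutions
    if len(s) == d:
        out += [s[:i] + s[i + 1:] for i in range(d)]  # deletions into layer d-1
    else:
        out += [s[:i] + c + s[i:] for i in range(len(s) + 1) for c in "01"]  # insertions into layer d
    return out


def two_layer_graph(d):
    vertices = ["".join(p) for k in (d - 1, d) for p in product("01", repeat=k)]
    return {s: neighbors_with_multiplicity(s, d) for s in vertices}


def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def matches_edit_distance(graph):
    for s in graph:
        dist = bfs_distances(graph, s)
        if any(dist[t] != levenshtein(s, t) for t in graph):
            return False
    return True


d = 4
G = two_layer_graph(d)
print(max(len(nbrs) for nbrs in G.values()) == 3 * d - 1)  # True
print(matches_edit_distance(G))  # True
```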
18
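Expansion
Goal: find $C$ such that for every set $A$ of vertices,
$|e(A, V\setminus A)| \ge C\,|A|\,|V\setminus A|/n$,
where $e(A,B)$ is the set of edges between $A$ and $B$ and $n=|V|$.
Then: distortion $\ge S\cdot C\cdot\mathrm{avg}(G)/\Delta$, where $S$ is a constant and $\mathrm{avg}(G)$ is the average distance in $G$.
$C \ge$ "eigenvalue gap".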
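For tiny graphs, the best constant $C$ in the expansion condition can be found by brute force over all vertex subsets. The Python sketch below does this with exact fractions. The names `hypercube_edges` and `expansion_constant` are hypothetical. For a $\Delta$-regular graph, the standard spectral bound gives $C\ge\Delta-\lambda_2$, and on the 3-cube both quantities equal 2, consistent with the hypercube figure quoted on the next slide. The subset enumeration is exponential, which is why a spectral bound is needed for realistic $d$.

```python
from fractions import Fraction


def hypercube_edges(k):
    """Edges of the k-dimensional hypercube on vertices 0 .. 2**k - 1."""
    return [(u, u ^ (1 << b)) for u in range(2 ** k) for b in range(k) if u < u ^ (1 << b)]


def expansion_constant(num_vertices, edges):
    """Largest C with |e(A, V-A)| >= C |A| |V-A| / n for every nonempty proper subset A.

    Brute force over all 2**n - 2 subsets, so only usable for tiny graphs.
    Self-loops (u, u) never cross a cut, so they are ignored automatically.
    """
    best = None
    for mask in range(1, 2 ** num_vertices - 1):
        size_a = bin(mask).count("1")
        crossing = sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        ratio = Fraction(crossing * num_vertices, size_a * (num_vertices - size_a))
        if best is None or ratio < best:
            best = ratio
    return best


print(expansion_constant(8, hypercube_edges(3)))  # 2
```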
19
Eigenvalue gap
Eigenvalues can be computed efficiently.
The gap was not large enough: about 2.7 for $d = 4, 8, 12, 16$. For comparison, it is 2 for the hypercube, which embeds isometrically into $\ell_1$.
This yields lower bounds on the distortion below 3/2 for $d\le 16$.
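The slide does not define the "eigenvalue gap" precisely. The Python sketch below assumes it is $\lambda_1-\lambda_2$ of the adjacency matrix of a regular graph. It computes all eigenvalues of a symmetric matrix with the cyclic Jacobi rotation method, so the block stays self-contained without third-party libraries. In practice, a LAPACK-backed routine such as `numpy.linalg.eigvalsh` would be the usual choice. On the hypercube, the sketch recovers the gap of 2 quoted above. The function names are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues (descending) of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def hypercube_adjacency(k):
    size = 2 ** k
    return [[1 if bin(u ^ v).count("1") == 1 else 0 for v in range(size)] for u in range(size)]


eig = jacobi_eigenvalues(hypercube_adjacency(3))
print(eig[0] - eig[1])  # close to 2 (exact eigenvalues of the 3-cube: 3, 1, 1, 1, -1, -1, -1, -3)
```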
20
Conclusion
Lower bound of 3/2 on the distortion of embedding the edit metric into $\ell_1$ and $(\ell_2)^2$, using the $K_{2,n}$-metric.
The bound is tight for the $K_{2,n}$-metric.