Dynamic Programming Computation of Edit Distance
Definition of Edit Distance
Edit distance $D_E(X,Y)$ measures how close string $X$ is to string $Y$. $D_E(X,Y)$ is the cost of the minimum-cost transformation $t : X \rightarrow Y$, where $t$ is a sequence of operations (insertion, equal substitution, unequal substitution, and deletion). The cost of $t$ is the sum of its operation costs: every operation costs 1 except equal substitution, which costs 0.
[Figure: example transformation, marked A, B, C]
The cost of this transformation is 3, which happens to be minimal.
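The cost definition can be made concrete by applying a sequence of operations to $X$ and summing their costs. The following is a minimal Python sketch under an assumed encoding: the operation tuples `("D",)`, `("S", c)`, `("I", c)` and the name `apply_transformation` are hypothetical, and the example transformation is one constructed here for the strings used later in this section, not the one from the figure.

```python
def apply_transformation(x: str, ops: list[tuple[str, ...]]) -> tuple[str, int]:
    """Apply edit operations to x, scanning x left to right; return (result, cost).

    Hypothetical encoding (an assumption for illustration):
      ("D",)    delete the next character of x               cost 1
      ("S", c)  substitute c for the next character of x     cost 0 if equal, else 1
      ("I", c)  insert c before the next character of x      cost 1
    """
    out: list[str] = []
    pos = 0   # index of the next unconsumed character of x
    cost = 0
    for op in ops:
        kind = op[0]
        if kind in ("D", "S") and pos >= len(x):
            raise ValueError("operation past the end of x")
        if kind == "D":
            pos += 1
            cost += 1
        elif kind == "S":
            c = op[1]
            cost += 0 if x[pos] == c else 1  # equal substitution is free
            out.append(c)
            pos += 1
        elif kind == "I":
            out.append(op[1])
            cost += 1
        else:
            raise ValueError(f"unknown operation {kind!r}")
    if pos != len(x):
        raise ValueError("transformation did not consume all of x")
    return "".join(out), cost


# Delete the leading 'a', keep "baba", change 'a' to 'b', insert 'b', keep 'c'.
ops = [("D",), ("S", "b"), ("S", "a"), ("S", "b"), ("S", "a"),
       ("S", "b"), ("I", "b"), ("S", "c")]
print(apply_transformation("ababaac", ops))  # ('bababbc', 3)
```

This transformation uses one deletion, one unequal substitution and one insertion, so its cost is 3; the four equal substitutions contribute nothing.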
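Decomposition of Problem
Decomposition: last operation (delete, substitute, or insert)
Atomic problems: X prefix or Y prefix empty
Table: rows 0 .. M for the number of X prefix characters, columns 0 .. N for the number of Y prefix characters
Table entry: $D_E(X_i, Y_j)$, where $X_i$ is the prefix of $X$ consisting of its first $i$ characters (likewise $Y_j$)
Composition: let $\delta = \mathrm{cost}(\text{substitution}) = 1$ if $x_i \neq y_j$ and 0 otherwise. The three terms below correspond to a last operation of deleting $x_i$, substituting $y_j$ for $x_i$, and inserting $y_j$:
$D_E(X_i, Y_j) = \min\{\, D_E(X_{i-1}, Y_j) + 1,\; D_E(X_{i-1}, Y_{j-1}) + \delta,\; D_E(X_i, Y_{j-1}) + 1 \,\}$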
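The recurrence can be transcribed almost literally as a memoized recursive function. This is a minimal top-down sketch in Python; the names `edit_distance_topdown` and `d` are illustrative, and the atomic cases use the costs from the Atomic Problems slide ($j$ insertions or $i$ deletions). Python strings are 0-based, so $x_i$ is `x[i - 1]`.

```python
from functools import lru_cache


def edit_distance_topdown(x: str, y: str) -> int:
    """Edit distance via the last-operation decomposition, memoized top-down.

    d(i, j) computes D_E(X_i, Y_j), the cost of transforming the first i
    characters of x into the first j characters of y.
    """

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # Atomic problems: one of the two prefixes is empty.
        if i == 0:
            return j  # j insertions
        if j == 0:
            return i  # i deletions
        delta = 0 if x[i - 1] == y[j - 1] else 1  # substitution cost for x_i, y_j
        return min(
            d(i - 1, j) + 1,          # last operation: delete x_i
            d(i - 1, j - 1) + delta,  # last operation: substitute y_j for x_i
            d(i, j - 1) + 1,          # last operation: insert y_j
        )

    return d(len(x), len(y))


print(edit_distance_topdown("ababaac", "bababbc"))  # 3
```

Memoization ensures each of the $(M+1)(N+1)$ subproblems is solved once; the recursion depth grows with $M + N$, so for long strings the bottom-up table is the safer choice.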
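Atomic Problems
$D_E(\varepsilon, Y_j) = j$: transforming the empty string into a prefix $Y_j$ of $Y$ requires $j$ insertions, at a cost of $j$.
$D_E(X_i, \varepsilon) = i$: transforming a prefix $X_i$ of $X$ into the empty string requires $i$ deletions, at a cost of $i$.
These values fill row 0 and column 0 of the table.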
Computation of $D_E(\text{ababaac}, \text{bababbc})$
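The table for this example can be filled bottom-up, row by row, as in the following minimal Python sketch. The function names `edit_distance_table` and `print_table` are illustrative, not from the source; the code simply applies the atomic problems to row 0 and column 0 and the composition rule everywhere else.

```python
def edit_distance_table(x: str, y: str) -> list[list[int]]:
    """Fill the (M+1) x (N+1) table of D_E(X_i, Y_j) row by row."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    # Atomic problems: row 0 (X prefix empty) and column 0 (Y prefix empty).
    for j in range(n + 1):
        table[0][j] = j
    for i in range(m + 1):
        table[i][0] = i
    # Composition: each entry needs only its upper, upper-left and left
    # neighbours, which are already filled in row-major order.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            delta = 0 if x[i - 1] == y[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,          # delete x_i
                table[i - 1][j - 1] + delta,  # substitute y_j for x_i
                table[i][j - 1] + 1,          # insert y_j
            )
    return table


def print_table(x: str, y: str, table: list[list[int]]) -> None:
    """Print the table with Y across the top and X down the side."""
    print("  " + "   " + "".join(f"{c:>3}" for c in y))
    for i, row in enumerate(table):
        label = x[i - 1] if i > 0 else ""
        print(f"{label:>2}" + "".join(f"{v:>3}" for v in row))


X, Y = "ababaac", "bababbc"
D = edit_distance_table(X, Y)
print_table(X, Y, D)
print("D_E =", D[len(X)][len(Y)])  # D_E = 3
```

Filling the table takes $O(MN)$ time and space; the answer is the bottom-right entry $D_E(X_M, Y_N)$. If only the distance is needed, keeping two rows at a time is enough.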