1 String Matching with Errors The Theory and Computation of Evolutionary Distances: Pattern Recognition, Sellers, P. H., Journal of Algorithms, Vol. 20,

Slides:



Advertisements
Similar presentations
Advisor: Prof. R. C. T. Lee Speaker: L. C. Chen
Advertisements

Numbers Treasure Hunt Following each question, click on the answer. If correct, the next page will load with a graphic first – these can be used to check.
1
Feichter_DPG-SYKL03_Bild-01. Feichter_DPG-SYKL03_Bild-02.
1 Vorlesung Informatik 2 Algorithmen und Datenstrukturen (Parallel Algorithms) Robin Pomplun.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
Objectives: Generate and describe sequences. Vocabulary:
1 Fast Parallel and Serial Approximate String Matching Journal of Algorithms, Vol.10 (1989), pp G. Landau and U. Vishkin Advisor: Prof. R. C.
Speaker: C. C. Lin Adviser: R. C. T. Lee
UNITED NATIONS Shipment Details Report – January 2006.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 10 second questions
Polymer Supported Reagents3-1 Polymer Supported Reagents Nucleophiles Electrophiles Oxidants Reducing Agents Coupling & Dehydrating Reagents Halogen Carriers.
Solve Multi-step Equations
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
Break Time Remaining 10:00.
Turing Machines.
PP Test Review Sections 6-1 to 6-6
ABC Technology Project
EU market situation for eggs and poultry Management Committee 20 October 2011.
Bellwork Do the following problem on a ½ sheet of paper and turn in.
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
2 |SharePoint Saturday New York City
Green Eggs and Ham.
Exarte Bezoek aan de Mediacampus Bachelor in de grafische en digitale media April 2014.
VOORBLAD.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
1 public class Newton { public static double sqrt(double c) { double epsilon = 1E-15; if (c < 0) return Double.NaN; double t = c; while (Math.abs(t - c/t)
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
CONTROL VISION Set-up. Step 1 Step 2 Step 3 Step 5 Step 4.
© 2012 National Heart Foundation of Australia. Slide 2.
Adding Up In Chunks.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Januar MDMDFSSMDMDFSSS
Analyzing Genes and Genomes
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Immunobiology: The Immune System in Health & Disease Sixth Edition
Physics for Scientists & Engineers, 3rd Edition
1 Chapter 13 Nuclear Magnetic Resonance Spectroscopy.
Energy Generation in Mitochondria and Chlorplasts
Murach’s OS/390 and z/OS JCLChapter 16, Slide 1 © 2002, Mike Murach & Associates, Inc.
How to create Magic Squares
1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.
Presentation transcript:

1 String Matching with Errors The Theory and Computation of Evolutionary Distances: Pattern Recognition, Sellers, P. H., Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359~373. Speaker: C. C. Lin Adviser: R. C. T. Lee

2 In the following, we will present a problem related to the notion of edit distance. Next, let us introduce edit distance.

3 In edit distance, there are three types of differences between two strings X and Y: Insertion: a symbol of Y is missing in X at a corresponding position, with its cost being 1. Substitution: symbols at corresponding positions are distinct, with its cost being 1. Deletion: a symbol of X is missing in Y at a corresponding position, with its cost being 1. X: G C A Y: G A X : A C C Y : T C C X : A T Y : A G T

4 Given two strings X and Y, the edit distance between X and Y is the minimum number of insertions, deletions and substitutions needed to transform Y to X.

5 String X ATGAATCTTACCGCCTCG String Y ATGAGGCTCTGGCCCCTG Transformation (from string Y to string X) String X:A T G A A – – T C T T A C C G C C T C G String Y:A T G A G G C T C T G G C C – C C T – G EDIT(X, Y)=7 (2 insertions, 2 deletions and 3 changes).

6 Next, we will introduce a dynamic programming method to compute the edit distance between two strings X and Y.

7 Dynamic Programming for Edit Distance: (Delete) (Insert) (Substitute)

8 abcabba c b a b a c Given X=abcabba Y=cbabac

9 abcabba c b a b a c Given X=abcabba Y=cbabac

10 abcabba c b a b a c Given X=abcabba Y=cbabac

11 abcabba c b a b a c Given X=abcabba Y=cbabac

12 abcabba c b a b a c Given X=abcabba Y=cbabac

13 abcabba c b a b a c Given X=abcabba Y=cbabac

14 abcabba c b a b a c EDIT(X, Y)=4 a c Given X=abcabba Y=cbabac Substitute

15 abcabba c b a b a c EDIT(X, Y)=4 ba ac Given X=abcabba Y=cbabac Substitute

16 abcabba c b a b a c EDIT(X, Y)=4 bba bac Given X=abcabba Y=cbabac Match

17 abcabba c b a b a c EDIT(X, Y)=4 abba abac Given X=abcabba Y=cbabac Match

18 abcabba c b a b a c EDIT(X, Y)=4 cabba –abac Given X=abcabba Y=cbabac Insert

EDIT(X, Y)=4 bcabba b–abac Given X=abcabba Y=cbabac c a b a b c abbacba Match 0

EDIT(X, Y)=4 abcabba cb–abac Given X=abcabba Y=cbabac abcabba c b a b a c Substitute

21 abcabba c b a b a c Given X=abcabba Y=cbabac EDIT(X, Y)=4 abcabba- cb–ab-ac Substitute Match Insert Match Insert Match Delete

22 abcabba c b a b a c Given X=abcabba Y=cbabac EDIT(X, Y)=4 abcabba- cb–a-bac

23 We can recognize the time complexity of computing edit distance by the above algorithm to be O(mn) and space complexity O(mn) where n and m are the size of text and pattern, respectively.

24 In the following, we will introduce the topic, called the string matching with errors problem.

25 The definition of the problem: Given a pattern P of length m and a text T of length n, find a substring S of T such that EDIT(S, P) is minimal. Given: T=abcabba P=cbabac Find: S=cabba EDIT(S, P)=3 P= cbabac S= c–abba Given: T=abcabba P=cbabac Ts substring K=bcabb EDIT(K, P)=4 P= –cbabac K= bc–ab–b

26 Dynamic Programming for the String Matching with Error Problem:

27 The difference between EDIT[i, j] is that the EDIT[0, j]=j for the edit distance finding problem and SE[0,j]=0 for the string with error problem. The dynamic programming approach for the edit distance problem:

28 In the edit distance problem, we have EDIT[0, j]=j. In the string matching with error problem, we set SE[0, j]=0.

29 abcabba c b a b a c T=abcabba P=cbabac Since this path starts at the bottom row and ends at the top row with SE(0, j)=0, this shows that there exists a substring S in T such that EDIT(S, P)=3.

30 We find the lowest value of the last row and trace back from the point. Our output may be several strings.

31 abcabba c b a b a c S=cabba T=abcabba P=cbabac T: abc–abba P: cbabac

T=abcabba P=cbabac EDIT(S, P)=3 edit distance cabba c b a b a c S: c–abba P: cbabac

33 abcabba c b a b a c T=abcabba P=cbabac S: cabba– P: cbabac EDIT(S, P)=3

34 abcabba c b a b a c T=abcabba P=cbabac S: c-ab-- P: cbabac EDIT(S, P)=3

35 abcabba c b a b a c T=abcabba P=cbabac S: --ab-c P: cbabac EDIT(S, P)=3

36 References For Edit Distance Computation: [NW70] Neddleman, S.B., and Wunsch, C.D., A general method applicable to the search for similarities in the aminoacid sequence of two proteins, Journal of Molecular Biology 48 (1970): For String matching with error: [S80] The Theory and Computation of Evolutionary Distances: Pattern Recognition, Sellers, P. H., Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359~373.

37 Thank you