Speaker: C. C. Lin Adviser: R. C. T. Lee


String Matching with k Mismatches by Using Kangaroo Method
Landau, G.M., and Vishkin, U., Efficient string matching with k mismatches, Theoret. Comput. Sci. 43, 1986, pp. 239-249

Problem definition:
Input: A text T of length n, a pattern P of length m, and a mismatch threshold k.
Output: All substrings of T of length m that match P with at most k mismatches.
Example with k = 2: T = AGCTGCDCACGIAB..., P = AGCC. Aligning P at text positions 1, 2, 3 and 4 gives 1, 4, 3 and 2 mismatches respectively, so the windows at positions 1 and 4 are reported.
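As a point of comparison for the problem definition above, here is a minimal brute-force sketch that counts mismatches character by character in O(nm) time. It is not the Kangaroo method. The function names `count_mismatches` and `naive_k_mismatch` are illustrative, positions are 0-based, and the text is only the visible prefix of the slide's T, which the slide truncates.

```python
def count_mismatches(window: str, pattern: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(window, pattern))


def naive_k_mismatch(text: str, pattern: str, k: int) -> list:
    """0-based start positions where pattern matches text with at most k mismatches."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if count_mismatches(text[i:i + m], pattern) <= k
    ]


if __name__ == "__main__":
    text = "AGCTGCDCACGIAB"  # assumption: visible prefix of the slide's T
    pattern = "AGCC"
    # Mismatch counts for the first four alignments shown on the slide.
    print([count_mismatches(text[i:i + 4], pattern) for i in range(4)])
    print(naive_k_mismatch(text, pattern, 2))
```

Every alignment costs m comparisons here regardless of how many characters agree, which is the cost the Kangaroo method avoids.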

The concept of the Kangaroo method is illustrated by the following figure. Assume it is known beforehand that t_1 t_2 … t_a = p_1 p_2 … p_a and that t_{a+1} is not equal to p_{a+1}. Then we do not have to compare t_1 t_2 … t_{a+1} with p_1 p_2 … p_{a+1} character by character; we jump directly to matching the suffixes beginning at t_{a+2} and p_{a+2}.
Text:    t_1 t_2 … t_a t_{a+1} t_{a+2} t_{a+3} … t_k …
Pattern: p_1 p_2 … p_a p_{a+1} p_{a+2} p_{a+3} … p_k …
(mismatch at position a+1)

The Kangaroo method proceeds as follows.
P = ETBDBCCDFDC
T = ABCCABDADBDETADBAADFDAAEERDXTDADCT…

We continue the above process. Whenever it is known that a substring of T exactly matches a substring of P, we skip this substring. The process stops when k+1 mismatches have been found.
Input: T=ABAABBCCDD, P=ACDCB and k=2. Comparing the first window “ABAAB” with “ACDCB” yields a third mismatch (k+1 = 3), so we stop and discard “ABAAB”. We then start to compare the next window, “BAABB”, with “ACDCB”.
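The skip-and-stop rule above can be sketched as follows for a single window. This is a minimal illustration, not the authors' implementation. The helper `naive_lce` finds the matching run by plain scanning, standing in for the constant-time lookup the slides introduce later. Both function names are hypothetical, and the caller is assumed to pass a `start` for which the whole window fits inside the text.

```python
def naive_lce(a: str, i: int, b: str, j: int) -> int:
    """Length of the longest common prefix of a[i:] and b[j:] (plain scan)."""
    length = 0
    while (i + length < len(a) and j + length < len(b)
           and a[i + length] == b[j + length]):
        length += 1
    return length


def kangaroo_mismatches(text, pattern, start, k, lce=naive_lce):
    """Mismatch offsets inside the window text[start:start + len(pattern)].

    Assumes the window fits in the text. Stops once k + 1 mismatches are found.
    """
    m = len(pattern)
    found = []
    offset = 0
    while offset < m:
        # Jump over the run of matching characters in one step.
        offset += lce(text, start + offset, pattern, offset)
        if offset >= m:
            break
        found.append(offset)   # text[start + offset] != pattern[offset]
        if len(found) > k:
            break              # k + 1 mismatches: this window is rejected
        offset += 1            # step past the mismatching character
    return found


if __name__ == "__main__":
    # Window "ABAAB" against "ACDCB" with k = 2, as on the slide.
    print(kangaroo_mismatches("ABAABBCCDD", "ACDCB", 0, 2))
```

The `lce` parameter is left pluggable so that a faster longest-common-prefix routine can be substituted without changing the jump loop.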

Before we introduce the Kangaroo algorithm, we first introduce the suffix tree and the lowest common ancestor of two nodes. The Kangaroo algorithm relies on the properties of both.

The suffix tree of a string of length n can be constructed in O(n) time (Weiner, 1973; McCreight, 1976; Ukkonen, 1995).
Example string: S = ABCDEADDBE (suffix tree figure not reproduced).

After O(n) preprocessing, done when the tree is constructed, the lowest common ancestor of two leaf nodes can be found in O(1) time (Harel and Tarjan, 1984).

The Kangaroo method constructs a suffix tree for the text T and the pattern P. Let X be the leaf corresponding to the suffix of T that starts at the text location being verified, and let Y be the leaf corresponding to the pattern. By finding lowest common ancestors of such pairs of leaves, the Kangaroo method verifies a text location against the k-mismatch threshold in O(k) time. The next slide illustrates the idea.
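To make the constant-time lookup concrete without building a full suffix tree, the sketch below swaps in a different but equivalent tool: a suffix array with an LCP array (Kasai's algorithm) and a sparse table for range-minimum queries. This is not the suffix tree plus Harel-Tarjan structure from the slides, and the suffix array here is built by naive sorting, so preprocessing is not linear time. The name `build_lce` and the separator characters `#` and `$` are assumptions, and the separators must not occur in T or P.

```python
def build_lce(s: str):
    """Return lce(i, j): length of the longest common prefix of s[i:] and s[j:]."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive suffix array construction
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r

    # Kasai's algorithm: lcp[r] is the LCP of the suffixes sa[r - 1] and sa[r].
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    # Sparse table: table[level][x] = min(lcp[x : x + 2**level]).
    table = [lcp]
    width = 1
    while 2 * width <= n:
        prev = table[-1]
        table.append([min(prev[x], prev[x + width])
                      for x in range(n - 2 * width + 1)])
        width *= 2

    def lce(i: int, j: int) -> int:
        if i == j:
            return n - i
        lo, hi = sorted((rank[i], rank[j]))
        lo += 1  # the answer is min(lcp[lo + 1 .. hi]) in suffix-array order
        level = (hi - lo + 1).bit_length() - 1
        return min(table[level][lo], table[level][hi - (1 << level) + 1])

    return lce


if __name__ == "__main__":
    text, pattern = "ABCCBDCDBC", "ABCD"
    lce = build_lce(text + "#" + pattern + "$")
    pattern_base = len(text) + 1  # index where the pattern starts in the joined string
    print(lce(0, pattern_base))   # common prefix length of T[0:] and P[0:]
```

Each query performs two table lookups, which plays the same role as the O(1) lowest-common-ancestor query on the suffix tree.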

Two suffix strings: ANBECF$ and ANCEC$. They share the prefix “AN” and then mismatch at “B” versus “C”, so mismatches = 1. We now have to check whether there are any mismatches between the remaining parts, ECF and EC.

The remaining suffix strings are ECF$ and EC$. They share the prefix “EC”, and because we then reach $, the verification is finished. Thus the number of mismatches between “ANBECF” and “ANCEC” is 1.

By finding the lowest common ancestor of a text suffix and a pattern suffix in the suffix tree, we avoid comparing all characters. This is useful when the text and the pattern share many equal characters, because runs of equal characters are skipped rather than compared one by one. The lowest common ancestor of two suffixes spells out their longest common prefix, so finding it amounts to finding the next mismatch between the two strings.

Input: T=ABCCBDCDBC, P=ABCD and k=2. The suffix tree of T and P is: (figure not reproduced)

The lowest common ancestor of “ABCD” and “ABCCBDCDBC”: mismatches = 1; the pattern is exhausted, so return “ABCC”.

The lowest common ancestor of “ABCD” and “BCCBDCDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “CCBDCDBC”: mismatches = 2.
The lowest common ancestor of “CD” and “CBDCDBC”: mismatches = 3, discard “BCCB”.

The lowest common ancestor of “ABCD” and “CCBDCDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “CBDCDBC”: mismatches = 2.
The lowest common ancestor of “CD” and “BDCDBC”: mismatches = 3, discard “CCBD”.

The lowest common ancestor of “ABCD” and “CBDCDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “BDCDBC”: mismatches = 2.
The lowest common ancestor of “D” and “CDBC”: mismatches = 3, discard “CBDC”.

The lowest common ancestor of “ABCD” and “BDCDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “DCDBC”: mismatches = 2.
The lowest common ancestor of “CD” and “CDBC”: mismatches = 2; the pattern is exhausted, so return “BDCD”.

The lowest common ancestor of “ABCD” and “DCDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “CDBC”: mismatches = 2.
The lowest common ancestor of “CD” and “DBC”: mismatches = 3, discard “DCDB”.

The lowest common ancestor of “ABCD” and “CDBC”: mismatches = 1.
The lowest common ancestor of “BCD” and “DBC”: mismatches = 2.
The lowest common ancestor of “CD” and “BC”: mismatches = 3, discard “CDBC”.

Input: T=ABCCBDCDBC, P=ABCD and k=2. Output: “ABCC” and “BDCD”.
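The walk-through for this input can be replayed with a short sketch that slides the pattern over every text location and records each pair of suffixes whose common prefix is looked up. The common-prefix length is computed here by direct scanning, an assumption made for brevity, whereas the slides obtain it from a lowest-common-ancestor query on the suffix tree. The names `common_prefix_len` and `kangaroo_trace` are illustrative.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b (plain scan)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def kangaroo_trace(text: str, pattern: str, k: int):
    """Return (matching windows, list of (pattern suffix, text suffix) queries)."""
    m = len(pattern)
    matches, queries = [], []
    for start in range(len(text) - m + 1):
        offset = mismatches = 0
        while offset < m and mismatches <= k:
            p_suffix, t_suffix = pattern[offset:], text[start + offset:]
            queries.append((p_suffix, t_suffix))
            offset += common_prefix_len(p_suffix, t_suffix)  # skip the equal run
            if offset < m:
                mismatches += 1  # the next characters differ
                offset += 1      # continue just after the mismatch
        if mismatches <= k:
            matches.append(text[start:start + m])
    return matches, queries


if __name__ == "__main__":
    matches, queries = kangaroo_trace("ABCCBDCDBC", "ABCD", 2)
    for p_suffix, t_suffix in queries:
        print(p_suffix, t_suffix)
    print(matches)
```

Each recorded pair corresponds to one lookup in the trace, and no window ever needs more than k+1 lookups.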

To use the Kangaroo method, we construct a suffix tree for the text T of length n and the pattern P of length m in O(n+m) time. With the Kangaroo method, finding one mismatch takes O(1) time. At each text location we stop as soon as more than k mismatches have been found. Therefore, verifying one text location takes O(k) time.

Thus, the time complexity of finding all locations of the text T that match the pattern P with at most k mismatches is O(nk).
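The bound can be written out as a short sum, as a sketch under the assumptions that m ≤ n and k ≥ 1, and that each text location needs at most k+1 constant-time lowest-common-ancestor queries:

```latex
\[
\underbrace{O(n+m)}_{\text{suffix tree and LCA preprocessing}}
\;+\;
\underbrace{(n-m+1)}_{\text{text locations}}
\cdot
\underbrace{O(k+1)}_{\text{queries per location}}
\;=\; O(n + m + nk) \;=\; O(nk).
\]
```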

References
For Construction of Suffix Trees:
[M76] McCreight, E.M., A Space-Economical Suffix Tree Construction Algorithm, J. ACM 23 (1976): 262-272.
[U95] Ukkonen, E., On-line Construction of Suffix Trees, Algorithmica 14 (1995): 249-260.
For Finding Lowest Common Ancestor:
[HT84] Harel, D. and Tarjan, R.E., Fast Algorithms for Finding Nearest Common Ancestors, SIAM Journal on Computing 13 (1984): 338-355.

References
For String Matching with k Mismatches:
[LV86] Landau, G.M., and Vishkin, U., Efficient string matching with k mismatches, Theoret. Comput. Sci. 43 (1986): 239-249.

Thank you