Bioinformatics Techniques and Tools (Tècniques i Eines Bioinformàtiques)

1 Bioinformatics Techniques and Tools (Tècniques i Eines Bioinformàtiques)
22/02/2019
- Bioinformatics: Sequence and Genome Analysis, David W. Mount
- Flexible Pattern Matching in Strings (2002), Gonzalo Navarro and Mathieu Raffinot
- Algorithms on Strings (2001), M. Crochemore, C. Hancart and T. Lecroq

2 Efficient search algorithms and data structures
String matching: definition of the problem (text, pattern)
Exact matching: depends on what we have, the text or the patterns.
- The patterns: build a data structure for the patterns.
  - 1 pattern: the algorithm depends on |p| and |Σ|.
  - k patterns: the algorithm depends on k, |p| and |Σ|.
- The text: build a data structure for the text (suffix tree, ...).
Approximate matching:
- Dynamic programming
- Sequence alignment (pairwise and multiple)

3 Approximate string matching
For instance, given the sequence CTACTACTACGTGACTAATACTGATCGTAGCTAC…, search for the pattern ACTGA allowing one error…
… but what is the meaning of “one error”?
As you have seen this morning ....

4 Edit distance
We accept three types of errors:
1. Mismatch: ACCGTGAT → ACCGAGAT
2. Insertion: ACCGTGAT → ACCGATGAT
3. Deletion: ACCGTGAT → ACCGGAT
Insertions and deletions are jointly called indels.
The edit distance d between two strings is the minimum number of substitutions, insertions and deletions needed to transform the first string into the second one.
d(ACT,ACT)=
d(ACT,AC)=
d(ACT,C)=
d(ACT,ε)=
d(AC,ATC)=
d(ACTTG,ATCTG)=

5 Edit distance
With the three types of errors and the definition of d given above:
d(ACT,ACT)=0
d(ACT,AC)=1
d(ACT,C)=2
d(ACT,ε)=3
d(AC,ATC)=1
d(ACTTG,ATCTG)=2
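
The definition of the edit distance can be checked on the small examples of this slide. The following is a minimal sketch, not taken from the slides: the function name `edit_distance` is hypothetical, and it assumes the unit costs stated above (match=0, mismatch=1, indel=1).

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds d(a[:i-1], b[:j]); row 0 is d("", b[:j]) = j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # d(a[:i], "") = i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or mismatch (cost 1)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    pairs = [("ACT", "ACT"), ("ACT", "AC"), ("ACT", "C"),
             ("ACT", ""), ("AC", "ATC"), ("ACTTG", "ATCTG")]
    for a, b in pairs:
        print(f"d({a},{b}) = {edit_distance(a, b)}")
```

Only two rows of the table are kept in memory, which is enough when the distance alone is needed and no alignment has to be recovered.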

6 Edit distance and alignment of strings
The edit distance is related to the best alignment of the strings. Given d(ACT,ACT)=0, d(ACT,AT)=1 and d(ACTTG,ATCTG)=2, which is the best alignment in each case?
ACT and ACT:
ACT
ACT
ACT and AT:
ACT
A-T
ACTTG and ATCTG:
ACTTG
ATCTG
or:
ACT-TG
A-TCTG
The alignment thus indicates the substitutions, insertions and deletions that transform one string into the other.
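
The link between an alignment and the edit distance can be made concrete by counting the columns of an alignment that are not matches. This is a small sketch under the unit-cost assumption (mismatch=1, indel=1); the name `alignment_cost` and the use of "-" as gap symbol are choices made here for illustration.

```python
def alignment_cost(top: str, bottom: str, gap: str = "-") -> int:
    """Count the columns of a gapped alignment that are not matches."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for x, y in zip(top, bottom):
        if x == gap and y == gap:
            raise ValueError("a column cannot pair two gaps")
        if x != y:
            cost += 1  # a mismatch, or an indel when one side is a gap
    return cost


if __name__ == "__main__":
    print(alignment_cost("ACTTG", "ATCTG"))    # two mismatches
    print(alignment_cost("ACT-TG", "A-TCTG"))  # one deletion and one insertion
```

Both alignments of ACTTG and ATCTG cost 2, which equals the edit distance: a best alignment is one whose cost is minimal.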

7 Edit distance and alignment of strings
But what is the distance between the strings ACGCTATGCTATACG and ACGGTAGTGACGC? And what is the best alignment between them?
This problem was first discussed in 1966, and the algorithm was proposed in 1968, 1970, …, using the technique called “dynamic programming”.

8 Edit distance and alignment of strings
[Table: empty dynamic-programming matrix; columns labelled with the string C T A C T A C T A C G T, rows with the string A C T G.]

10 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G (rows), with one cell highlighted.]
The cell contains the distance between AC and CTACT.

11 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G (rows), with one cell marked “?”.]

13 Edit distance and alignment of strings
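[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G (rows); the first row starts with the values 0 and 1, and one cell is marked “?”.]
Alignment shown:
-
C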

14 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G (rows), with one cell marked “?”.]
Alignment shown:
--
CT

15 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G (rows), annotated with the prefix CTACTA.]

16 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G A (rows); the first-column cells of rows A, C and T are marked “?”.]

17 Edit distance and alignment of strings
[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G… (rows); the first-column cells of rows A, C and T hold 1, 2 and 3.]
Alignment shown:
ACT
---

18 Edit distance and alignment of strings
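[Table: dynamic-programming matrix for the strings C T A C T A C T A C G T (columns) and A C T G A (rows); the first-column cells of rows A, C and T hold 1, 2 and 3.]
BA(AC,CTAC) = best of:
- BA(AC,CTA) followed by the column -/C
- BA(A,CTA) followed by the column C/C
- BA(A,CTAC) followed by the column C/-
d(AC,CTAC) = min of:
- d(AC,CTA)+1
- d(A,CTA)
- d(A,CTAC)+1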
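
The three-way choice for each cell can be turned into a program that fills the whole table and then walks back from the last cell to recover one best alignment. This is a minimal sketch assuming unit costs; the function name `align` and the tie-breaking order (diagonal first, then a gap in the first string) are choices made here, so other best alignments may exist.

```python
def align(a: str, b: str) -> tuple:
    """Return (distance, gapped a, gapped b) for one best global alignment."""
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # first column: a[:i] against the empty string
    for j in range(1, m + 1):
        d[0][j] = j  # first row: the empty string against b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i][j - 1] + 1,                           # column -/b[j-1]
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # column a[i-1]/b[j-1]
                d[i - 1][j] + 1,                           # column a[i-1]/-
            )
    # Traceback: repeat, from the last cell, the choice that produced its value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
        else:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
    return d[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    dist, top, bottom = align("AC", "CTAC")
    print(dist)
    print(top)
    print(bottom)
```

For AC and CTAC the minimum is reached through the diagonal term d(A,CTA), since the last characters match.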

19 Edit distance and alignment of strings
Connect to and use the global method.

20 Edit distance and alignment of strings
How can this algorithm be applied to approximate search, that is, to k-approximate string searching?

21 K-approximate string searching
[Table: dynamic-programming matrix; labels as extracted: C T A C T A C T A C G T A C T G G T G A A … and A C T G.]
This cell …

22 K-approximate string searching
[Table: dynamic-programming matrix; labels as extracted: C T A C T A C T A C G T A C T G G T G A A … and A C T G; one cell is highlighted.]
This cell gives the distance between (ACTGA, CT…GTA)…
…but we are only interested in the last characters

24 K-approximate string searching
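[Table: dynamic-programming matrix; labels as extracted: * * * * * * C T A C G T A C T G G T G A A … and A C T G, with the first six characters of the text replaced by asterisks.]
This cell gives the distance between (ACTGA, CT…GTA)…
…but we are only interested in the last characters…
…no matter where they appear in the text, then…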

27 K-approximate string searching
[Table: dynamic-programming matrix; labels as extracted: C T A C T A C T A C G T A C T G G T G A A … and A C T G.]
This cell gives the distance between (ACTGA, CT…GTA)…
…but we are only interested in the last characters…
…no matter where they appear in the text, then
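
The transcript does not preserve how the slide ends. One standard way to act on the remark that only the last characters of the text matter, offered here as an assumption rather than as the slide's own conclusion, is to set the whole first row of the table to 0, so that an occurrence may start anywhere in the text at no cost, and to report every text position where the last row is at most k. The function name `approximate_search` is hypothetical.

```python
def approximate_search(pattern: str, text: str, k: int) -> list:
    """Return (end position, distance) for every text position (1-based)
    where an occurrence of pattern with at most k errors ends."""
    m = len(pattern)
    # Column for the empty text prefix: d(pattern[:i], "") = i.
    col = list(range(m + 1))
    hits = []
    for j, ct in enumerate(text, start=1):
        new = [0]  # first row is always 0: a match may start at any position
        for i, cp in enumerate(pattern, start=1):
            new.append(min(
                col[i] + 1,               # text character ct is extra
                new[i - 1] + 1,           # pattern character cp is missing
                col[i - 1] + (cp != ct),  # match or mismatch
            ))
        col = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


if __name__ == "__main__":
    text = "CTACTACTACGTGACTAATACTGATCGTAGCTAC"
    for end, dist in approximate_search("ACTGA", text, 1):
        print(end, dist)
```

With the sequence and pattern of slide 3, the exact occurrence of ACTGA ending at position 24 is reported with distance 0, along with the positions where an occurrence with one error ends.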

28 K-approximate string searching
Connect to and use the semi-global method.

29 Pairwise and multiple alignment

30 Pairwise alignment
Edit distance: match=0, mismatch=1, indel=1
d(AC,CTAC) = minimum of:
- d(A,CTAC)+1
- d(A,CTA)+0 on a match, +1 on a mismatch
- d(AC,CTA)+1
Similarity: match=1, mismatch=-1, indel=-2
s(AC,CTAC) = maximum of:
- s(A,CTAC)-2
- s(A,CTA)+1 on a match, -1 on a mismatch
- s(AC,CTA)-2
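
The similarity scheme changes only two things with respect to the edit distance: the scores and the use of a maximum instead of a minimum. The sketch below assumes a global alignment and uses the scores of this slide as default parameters; the function name `similarity` is hypothetical.

```python
def similarity(a: str, b: str, match: int = 1, mismatch: int = -1,
               indel: int = -2) -> int:
    """Best global alignment score of a and b under the given scoring scheme."""
    # prev[j] holds s(a[:i-1], b[:j]); row 0 is j indels.
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel]  # a[:i] against the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(max(
                prev[j] + indel,                                 # ca against a gap
                prev[j - 1] + (match if ca == cb else mismatch), # ca against cb
                curr[j - 1] + indel,                             # gap against cb
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(similarity("AC", "CTAC"))
    # With match=0, mismatch=-1, indel=-1 the score is minus the edit distance.
    print(similarity("ACTTG", "ATCTG", match=0, mismatch=-1, indel=-1))
```

The second call illustrates that the edit distance is a special case of the same recurrence.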

31 Pairwise alignment
Connect to http://alggen.lsi.upc.es and follow the links TEACHING, EMBER, LePA.

32 Pairwise to multiple alignment
What happens with three strings?
[Figure: one alignment column over the three strings S1, S2 and S3.]
Let n be their length; then the cost becomes O(n^3) · “O(2^3)” · “O(3^2)”.
And with k strings? O(n^k · 2^k · k^2)
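
The factors 2^k and k^2 can be read as follows, which is an interpretation since the slide does not spell it out: each of the n^k cells has up to 2^k - 1 predecessor cells, because every string either contributes a character or a gap to the last column, and scoring one column compares the k symbols pairwise. The sketch below assumes a sum-of-pairs column score with the similarity values of slide 30; both function names are hypothetical.

```python
from itertools import combinations, product


def predecessor_moves(k: int) -> list:
    """All ways to step back from a cell of the k-dimensional table:
    1 = the string contributes a character, 0 = it contributes a gap.
    The all-gap move is excluded because it would not advance any string."""
    return [move for move in product((0, 1), repeat=k) if any(move)]


def sum_of_pairs(column: str, match: int = 1, mismatch: int = -1,
                 indel: int = -2) -> int:
    """Score one alignment column by adding the scores of its k(k-1)/2 pairs."""
    score = 0
    for x, y in combinations(column, 2):
        if x == "-" and y == "-":
            continue  # two gaps add nothing (a common convention, assumed here)
        if x == "-" or y == "-":
            score += indel
        else:
            score += match if x == y else mismatch
    return score


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, len(predecessor_moves(k)))
    print(sum_of_pairs("AC-"))
```

The exponential growth in k is the reason multiple alignment programs rely on heuristics rather than on the exact table.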

33 Multiple alignment
Multiple alignment programs use different heuristics:
- Clustal (progressive alignment)
- TCoffee (progressive alignment + databases)
- HMM (Hidden Markov Models)

34 Multiple alignment
Connect to http://alggen.lsi.upc.es/ and follow the links TEACHING, EMBER.

