Published by Eric McKinney. Modified over 9 years ago.
1
Bioinformatics PhD. Course
1. Biological introduction
2. Comparison of short sequences (up to 10.000 bps): dot matrix, pairwise alignment, multiple alignment, hash algorithms
3. Comparison of large sequences (more than 10.000 bps): data structures, suffix trees, MUMs
4. String matching: exact, extended, approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, …
2
Comparison of large sequences First part: Alignment of large sequences
3
Dynamic programming
Short sequences (up to 10.000 bps) can be aligned using dynamic programming, at a quadratic cost in space and time.
acc.................................agt
|||.................................|xx
acc.................................a--
What about genomes?
[Figure: dynamic programming on the example sequences accaccacaccacaacgagcata … and acctgagcgatat]
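To make the quadratic cost concrete, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (+1 match, -1 mismatch, -1 gap) and the function name are assumptions chosen for illustration; the slides do not specify a scoring scheme.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b.

    The table has (len(a) + 1) * (len(b) + 1) cells and each cell is
    filled in constant time, hence the quadratic cost in space and time.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(diagonal,
                              table[i - 1][j] + gap,   # gap in b
                              table[i][j - 1] + gap)   # gap in a
    return table[-1][-1]


print(global_alignment_score("accagt", "acca"))
```

For two sequences of a million symbols each, the table alone would need about 10^12 cells, which is why this approach is reserved for short sequences.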
4
Genomic sequences
In which cases can dynamic programming be applied?
Genomic sequences have millions of base pairs: the length of the sequences is 1000 times longer.
Because the cost is quadratic, the running time is 1.000.000 times higher! (1 second becomes 11 days; 1 minute becomes 2 years.)
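The figures above follow directly from the quadratic cost. A small sketch of the arithmetic, assuming the running time grows exactly with the square of the length (the constant and function names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def quadratic_slowdown(length_factor: int) -> int:
    """How many times slower a quadratic algorithm gets when inputs grow by length_factor."""
    return length_factor ** 2


factor = quadratic_slowdown(1000)             # sequences 1000 times longer
days_for_one_second = factor / SECONDS_PER_DAY
years_for_one_minute = factor / MINUTES_PER_YEAR

print(factor, round(days_for_one_second, 1), round(years_for_one_minute, 1))
```

The results are roughly 11.6 days and 1.9 years, which the slide rounds to 11 days and 2 years.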
5
First assumption
[Figure: two diagrams of Genome A placed against Genome B]
6
Realistic assumption?
Unrealistic assumption! More realistic assumption.
[Figure: three diagrams of Genome A placed against Genome B]
7
Realistic assumptions?
Unrealistic assumption! More realistic assumption. But now, is it a real case?
[Figure: three diagrams of Genome A placed against Genome B]
8
Preview in a real case
Chlamydia muridarum: 1.084.689 bps
Chlamydia trachomatis: 1.057.413 bps
9
Preview in a real case
Pyrococcus abyssi: 1.790.334 bps
Pyrococcus horikoshii: 1.763.341 bps
10
Methodology of an alignment
1st: Make a preview.
2nd: Identify the portions that can be aligned.
3rd: Make the alignment.
[Slide annotations: "(Linear cost)" appears twice]
11
Methodology of an alignment
1st: Make a preview.
2nd: Identify the portions that can be aligned.
3rd: Make the alignment.
[Slide annotations: "(Linear cost)" and "?"]
12
Preview revisited
MUM: Maximal Unique Matching
… a a t g….c t g...
… c g t g….c c c...
Connect to MALGEN
13
Methodology of an alignment
1st: Make a preview. How can MUMs be found? Linear cost with suffix trees.
2nd: Identify the portions that can be aligned. How can these portions be determined?
3rd: Make the alignment. With CLUSTALW, TCOFFEE, …
14
Comparison of large sequences M-GCAT Todd Treangen
15
Homework
1. Javier
2. Dmitry
3. Ana Iris
4. David
5. Patricia
6. Rogeli
7. Atif
8. Aina
9. Isaac
10. Maria Merce
11. Romina
12. Guillem
13. Raul
14. Alexis
15. Ramon
16
Bioinformatics PhD. Course Second part: Introducing Suffix trees
17
Suffix trees
Given string ababaas, its suffixes are:
1: ababaas
2: babaas
3: abaas
4: baas
5: aas
6: as
7: s
Suffix tree (each leaf is labelled with the rest of the suffix and its starting position):
root
  a
    ba
      baas,1
      as,3
    as,5
    s,6
  ba
    baas,2
    as,4
  s,7
What kind of queries?
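The list of suffixes and the tree can be reproduced with a short sketch. It builds an uncompressed suffix trie (one node per character) rather than the compressed tree drawn on the slide, which keeps the code short at the price of quadratic space; the function names and the "leaf" marker key are illustrative assumptions.

```python
def suffixes(text: str) -> list[tuple[int, str]]:
    """All suffixes of text, paired with their 1-based starting position."""
    return [(i + 1, text[i:]) for i in range(len(text))]


def build_suffix_trie(text: str) -> dict:
    """Uncompressed suffix trie as nested dicts; "leaf" stores the start position."""
    root: dict = {}
    for start, suffix in suffixes(text):
        node = root
        for ch in suffix:
            node = node.setdefault(ch, {})
        node["leaf"] = start  # multi-character key, cannot clash with a symbol
    return root


for start, suffix in suffixes("ababaas"):
    print(f"{start}: {suffix}")

trie = build_suffix_trie("ababaas")
print(sorted(trie))  # first symbols leaving the root
```

Merging every chain of single-child nodes into one edge turns this trie into the suffix tree shown above.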
18
Applications of suffix trees
1. Exact string matching
Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]
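The question can be answered by walking down from the root, one pattern symbol at a time: the pattern occurs exactly when the walk never falls off the tree, and the leaves below the final node give the positions. The sketch below assumes an uncompressed suffix trie instead of the compressed tree; names are illustrative.

```python
def build_suffix_trie(text: str) -> dict:
    """Uncompressed suffix trie as nested dicts; "leaf" stores the 1-based start."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node["leaf"] = start + 1
    return root


def occurrences(trie: dict, pattern: str) -> list[int]:
    """Sorted 1-based positions where pattern occurs (empty list if none)."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    # Every leaf below the reached node is a suffix that starts with pattern.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, value in current.items():
            if key == "leaf":
                found.append(value)
            else:
                stack.append(value)
    return sorted(found)


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, occurrences(trie, pattern))
```

The walk itself costs time proportional to the pattern length, independently of the length of the indexed sequence.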
19
Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of suffixes from … have been inserted and the suffix-tree …
20
Quadratic insertion algorithm
Given the string ababaabbs, the suffixes are inserted one at a time, each starting from the root (slides 20 to 34 animate the steps):
ababaabbs,1: new leaf ababaabbs,1 under the root
babaabbs,2: new leaf babaabbs,2 under the root
abaabbs,3: the edge ababaabbs,1 is split after aba; new leaf abbs,3
baabbs,4: the edge babaabbs,2 is split after ba; new leaf abbs,4
aabbs,5: the edge aba is split after a; new leaf abbs,5
abbs,6: the edge ba below a is split after b; new leaf bs,6
bbs,7: the edge ba below the root is split after b; new leaf bs,7
bs,8: new leaf s,8 below b
s,9: new leaf s,9 under the root
Resulting suffix tree:
root
  a
    abbs,5
    b
      a
        abbs,3
        baabbs,1
      bs,6
  b
    a
      baabbs,2
      abbs,4
    bs,7
    s,8
  s,9
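The insertion steps walked through on these slides can be sketched as follows: each suffix is matched from the root, and the edge where the first mismatch occurs is split. Every insertion may scan a number of symbols proportional to the string length, which gives the quadratic total. The class and function names are illustrative assumptions, and the sketch assumes the last symbol of the string occurs nowhere else (as s does in ababaabbs).

```python
class Node:
    def __init__(self):
        self.children = {}        # first symbol -> (edge label, child Node)
        self.suffix_start = None  # 1-based start position, set on leaves only


def common_prefix_len(x: str, y: str) -> int:
    k = 0
    while k < len(x) and k < len(y) and x[k] == y[k]:
        k += 1
    return k


def insert_suffix(root: Node, suffix: str, start: int) -> None:
    node, rest = root, suffix
    while True:
        first = rest[0]
        if first not in node.children:
            leaf = Node()
            leaf.suffix_start = start
            node.children[first] = (rest, leaf)      # hang a new leaf
            return
        label, child = node.children[first]
        k = common_prefix_len(label, rest)
        if k == len(label):                          # whole edge matched: descend
            node, rest = child, rest[k:]
            continue
        middle = Node()                              # mismatch inside the edge: split it
        middle.children[label[k]] = (label[k:], child)
        leaf = Node()
        leaf.suffix_start = start
        middle.children[rest[k]] = (rest[k:], leaf)
        node.children[first] = (label[:k], middle)
        return


def build_suffix_tree(text: str) -> Node:
    assert text[-1] not in text[:-1], "assumes a unique final symbol"
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text[i:], i + 1)
    return root


def to_nested(node: Node):
    """Nested-dict view of the tree: edge label -> subtree, leaves as positions."""
    if node.suffix_start is not None:
        return node.suffix_start
    return {label: to_nested(child) for label, child in node.children.values()}


print(to_nested(build_suffix_tree("ababaabbs")))
```

The nested dictionary printed at the end mirrors the final tree of the example, with each leaf shown as its starting position.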
35
Generalized suffix tree
The suffix tree of many strings is called the generalized suffix tree, and it is the suffix tree of the concatenation of the strings.
For instance, the generalized suffix tree of ababaabb and aabaat is the suffix tree of ababaabbαaabaatβ.
36
Generalized suffix tree
Given the suffix tree of ababaabbα, construction of the suffix tree of ababaabbαaabaaβ. The suffixes of aabaaβ are inserted one at a time (slides 36 to 49 animate the steps):
aabaaβ,1: the edge abbα,5 is split after ab; new leaf aaβ,1
abaaβ,2: the edge abbα,3 is split after a; new leaf β,2
baaβ,3: the edge abbα,4 is split after a; new leaf β,3
aaβ,4: the edge ab is split after a; new leaf β,4
aβ,5: new leaf β,5 below a
β,6: new leaf β,6 under the root
Generalized suffix tree of ababaabbαaabaaβ:
root
  a
    a
      b
        bα,5
        aaβ,1
      β,4
    b
      a
        a
          bbα,3
          β,2
        baabbα,1
      bα,6
    β,5
  b
    a
      baabbα,2
      a
        bbα,4
        β,3
    bα,7
    α,8
  α,9
  β,6
50
Applications of suffix trees
1. Exact string matching
Does the sequence ababaas contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix tree of ababaas]
51
Applications of suffix trees
2. The substring problem for a database of strings DB
Does the DB contain any occurrence of the patterns abab, aab, and ab?
[Figure: generalized suffix tree of ababaabbαaabaaβ]
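A minimal sketch of the database query, assuming an uncompressed generalized suffix trie in which every node records which strings pass through it. Storing the string identifiers on the nodes replaces the terminators α and β of the slides; all names are illustrative.

```python
def build_generalized_trie(strings: list[str]) -> dict:
    """Trie of all suffixes of all strings; each node keeps the ids of its strings."""
    root = {"ids": set(), "next": {}}
    for string_id, text in enumerate(strings):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(string_id)
    return root


def strings_containing(trie: dict, pattern: str) -> set[int]:
    """Ids (0-based, in database order) of the strings that contain pattern."""
    node = trie
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])


database = ["ababaabb", "aabaa"]
trie = build_generalized_trie(database)
for pattern in ("abab", "aab", "ab"):
    print(pattern, sorted(strings_containing(trie, pattern)))
```

With the two example strings, abab is found only in the first one, while aab and ab occur in both.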
52
Applications of suffix trees
3. The longest common substring of two strings
[Figure: generalized suffix tree of ababaabbαaabaaβ]
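In a generalized suffix tree, the longest common substring is spelt by the deepest node that has suffixes of both strings below it. The sketch below applies that idea to an uncompressed generalized trie (quadratic space, so only suitable for small examples); the function name and node layout are illustrative assumptions.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest string occurring in both a and b (one of them if there are ties)."""
    root = {"ids": set(), "next": {}}
    for string_id, text in enumerate((a, b)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["next"].setdefault(ch, {"ids": set(), "next": {}})
                node["ids"].add(string_id)

    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node["next"].items():
            if len(child["ids"]) == 2:      # only descend while both strings agree
                stack.append((child, path + ch))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```

On a compressed generalized suffix tree the same traversal takes time linear in the total length of the two strings.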
53
Applications of suffix trees
4. Finding the maximal repeats
[Figure: generalized suffix tree of ababaabbαaabaaβ]
54
Applications of suffix trees
5. Finding MUMs
[Figure: generalized suffix tree of ababaabbαaabaaβ]
55
Bioinformatics PhD. Course Third part: Suffix links
56
Suffix links
[Figure: suffix tree of ababaabbα, annotated with the string aa and a question mark; slides 56 to 66 repeat this tree as animation frames]
67
Traversal using suffix links
Given S2 = a a b a a b b a, the suffix tree of ababaabbα is traversed (slides 67 to 78 animate the steps).
Unique matchings found during the first steps:
aa in S2[1]
aab in S2[1] = S1[5..6-7] in S2[1]
S1[3..6-…] in S2[2]
Unique matchings listed on the last slide:
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
79
From UMs to MUMs
Given S1 = a b a b a a b b α and S2 = a a b a a b b a
Unique matchings:
S1[3..6-8] in S2[2]
S1[4..6-8] in S2[3]
S1[5..8] in S2[4]
S1[6..8] in S2[5]
S1[7..8] in S2[6]
Array of UMs:
Position in S1:  1  2  3    4    5  6  7  8  9
UM:              -  -  6-8  6-8  8  8  8  -  -
MUM: S1[3..6-8] in S2[2]
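The MUM of this example can be cross-checked with a brute-force sketch that applies the definition directly: a match that cannot be extended to the left or to the right and that occurs exactly once in each string. This is not the suffix-tree traversal of the slides, only a slow reference for small inputs; the function names and the `min_len` parameter are illustrative assumptions.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def find_mums(s1: str, s2: str, min_len: int = 1) -> list[tuple[int, int, str]]:
    """Maximal unique matches as (1-based start in s1, 1-based start in s2, string)."""
    mums = []
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] != s2[j]:
                continue
            # Skip matches that could be extended one symbol to the left.
            if i > 0 and j > 0 and s1[i - 1] == s2[j - 1]:
                continue
            # Extend to the right as far as both strings agree.
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            match = s1[i:i + k]
            if (k >= min_len
                    and count_occurrences(s1, match) == 1
                    and count_occurrences(s2, match) == 1):
                mums.append((i + 1, j + 1, match))
    return mums


print(find_mums("ababaabb", "aabaabba"))
```

For S1 = ababaabb and S2 = aabaabba the only result is abaabb, starting at position 3 of S1 and position 2 of S2, in agreement with the MUM given on the slide.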
80
Bioinformatics PhD. Course Third part: Linear insertion algorithm
81
Quadratic insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of suffixes from … have been inserted and the suffix-tree …
82
Linear insertion algorithm
Given the string …
Invariant properties:
P1: the leaves of suffixes from … have been inserted and the suffix-tree …
P2: the string … is the longest string that can be spelt through the tree.
83
Linear insertion algorithm: example
Given the string ababaababb..., with the suffixes 1 to 5 already inserted:
root
  a
    ababb...,5
    ba
      ababb...,3
      baababb...,1
  ba
    baababb...,2
    ababb...,4
The suffixes 6 to 9 are then inserted (slides 83 to 105 animate the steps):
6: the edge baababb...,1 is split after b; new leaf b...,6
7: the edge baababb...,2 is split after b; new leaf b...,7
8: the edge ba below a is split after b; new leaf b...,8
9: the edge ba below the root is split after b; new leaf b...,9
Resulting tree:
root
  a
    ababb...,5
    b
      a
        ababb...,3
        b
          aababb...,1
          b...,6
      b...,8
  b
    a
      b
        aababb...,2
        b...,7
      ababb...,4
    b...,9
106
Index
Suffix arrays
"Suffix arrays: a new method for on-line string searches", G. Myers, U. Manber
107
Suffix arrays
Given string ababaa#, its suffixes are:
1: ababaa#
2: babaa#
3: abaa#
4: baa#
5: aa#
6: a#
7: #
… but lexicographically sorted.
[Figure: array holding the suffix indices 1 to 7 in sorted order]
Which is the cost? O(n log(n))
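A minimal sketch of the construction, assuming that # sorts before every letter (true for Python's default character ordering). It sorts the suffixes directly, so each comparison may scan many symbols and the worst case is slower than the O(n log(n)) quoted on the slide; reaching that bound needs a dedicated construction algorithm. The function name is illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """1-based starting positions of the suffixes of text, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


text = "ababaa#"
for start in suffix_array(text):
    print(f"{start}: {text[start - 1:]}")
```

Only the array of positions is stored; the suffixes themselves are read from the original string when needed.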
108
Applications of suffix arrays
1. Exact string matching
Does the sequence ababaa# contain any occurrence of the patterns abab, aab, and ab?
[Figure: suffix array of ababaa#]
Binary search: which is the cost? O(log(n) |P|)
Can it be improved to O(log(n)+|P|)?
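The O(log(n) |P|) search can be sketched as a plain binary search in which every step compares the pattern with the first |P| symbols of one suffix. The suffix array is built here by direct sorting, assuming # sorts before the letters; function names are illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """1-based suffix start positions in lexicographic order (built by direct sorting)."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


def find_occurrence(text: str, sa: list[int], pattern: str):
    """1-based position of one occurrence of pattern, or None if it does not occur."""
    lo, hi = 0, len(sa)
    while lo < hi:                       # O(log n) iterations
        mid = (lo + hi) // 2
        start = sa[mid] - 1
        # O(|P|) comparison against the first |P| symbols of the suffix.
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and text.startswith(pattern, sa[lo] - 1):
        return sa[lo]
    return None


text = "ababaa#"
sa = suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, find_occurrence(text, sa, pattern))
```

The position returned is the start of the lexicographically smallest suffix that begins with the pattern, so for ab it is 3 rather than 1.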
109
Fast search with cost O(log(n)+|P|)
Query: …
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
[Figure: suffix array with positions 1, 2, …, n and the pointers α and β]
110
Fast search with cost O(log(n)+|P|)
Query: …
Invariant properties:
P1: α < query ≤ β
P2: … matches pref(query)
Algorithm: if suff(γ) < suff(query) then α = γ else β = γ
[Figure: suffix array with positions 1, 2, …, n and the pointers α, γ and β]
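The loop above can be sketched with the two pointers α and β (here `left` and `right`) and the number of query symbols already known to match at each of them. Every comparison at γ (`mid`) then starts after the symbols that both ends already share with the query. This is only the simple variant of the idea: it skips redundant comparisons in practice, but the guaranteed O(log(n)+|P|) bound additionally needs precomputed longest-common-prefix values between suffixes, which the sketch does not include. All names are illustrative assumptions.

```python
def suffix_array(text: str) -> list[int]:
    """1-based suffix start positions in lexicographic order (built by direct sorting)."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


def fast_search(text: str, sa: list[int], query: str):
    """1-based position of one occurrence of query, or None if it does not occur."""
    n, m = len(sa), len(query)
    left, right = -1, n          # virtual ends: suff(left) < query <= suff(right)
    l = r = 0                    # query symbols matched at suff(left) and suff(right)
    while right - left > 1:
        mid = (left + right) // 2
        start = sa[mid] - 1
        k = min(l, r)            # these symbols are known to match at mid as well
        while k < m and start + k < len(text) and text[start + k] == query[k]:
            k += 1
        if k < m and (start + k == len(text) or text[start + k] < query[k]):
            left, l = mid, k     # suff(mid) < query
        else:
            right, r = mid, k    # query <= suff(mid)
    if right < n and r == m:
        return sa[right]
    return None


text = "ababaa#"
sa = suffix_array(text)
for query in ("abab", "aab", "ab"):
    print(query, fast_search(text, sa, query))
```

The answers agree with a plain binary search; only the number of symbol comparisons per step changes.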