Bioinformatics PhD. Course

1 Bioinformatics PhD. Course 1. Biological introduction 2. Comparison of short sequences (up to 10.000 bps): Dot Matrix, Pairwise align., Multiple align., Hash alg. 3. Comparison of large sequences (more than 10.000 bps): Data structures, Suffix trees, MUMs 4. String matching: Exact, Extended, Approximate 5. Sequence assembly 6. Projects: PROMO, MREPATT, …

2 Comparison of large sequences First part: Alignment of large sequences

3 Dynamic programming Short sequences (up to 10.000 bps) can be aligned using dynamic programming, at quadratic cost in space and time. acc.................................agt | | |.................................|xx acc.................................a-- What about genomes? accaccacaccacaacgagcata … acctgagcgatat acc..tacc..t
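
As a rough illustration of the quadratic cost mentioned on this slide, the sketch below fills a global-alignment score table with (n+1) x (m+1) cells. The scoring values (match +1, mismatch -1, gap -1) and the name `global_alignment_score` are assumptions chosen for the example, not taken from the course.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch style)."""
    n, m = len(a), len(b)
    # table[i][j] = best score aligning a[:i] with b[:j]; (n+1)*(m+1) cells.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        table[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,        # gap in b
                table[i][j - 1] + gap,        # gap in a
            )
    return table[n][m]


print(global_alignment_score("accagt", "acca"))
```

Both the nested loop and the table grow with the product of the two lengths, which is what makes this approach impractical for whole genomes.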

4 Genomic sequences In which cases can Dynamic Programming be applied? Genomic sequences have millions of base pairs. The sequences are 1000 times longer, so the running time is 1.000.000 times higher! (1 second becomes 11 days) (1 minute becomes 2 years)
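
The figures on this slide follow from the quadratic cost: a length factor of 1000 gives a time factor of 1000 squared. A small sketch of the arithmetic, where the helper name `scaled_seconds` is an assumption made for the example:

```python
def scaled_seconds(base_seconds, length_factor):
    # Quadratic cost: time grows with the square of the length factor.
    return base_seconds * length_factor ** 2


SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

one_second_case = scaled_seconds(1, 1000)    # 1 second on short sequences
one_minute_case = scaled_seconds(60, 1000)   # 1 minute on short sequences

print(one_second_case / SECONDS_PER_DAY, "days")
print(one_minute_case / SECONDS_PER_YEAR, "years")
```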

5 First assumption ………………………………………………………………. ………………………….………………...…………...…. Genome B Genome A …………………………………… Genome B ……………………………. Genome A

6 Realistic assumption? Unrealistic assumption! More realistic assumption ………………………………………………………………. ………………………….………………...…………...…. Genome B Genome A …………………………………………………………………. ………………………………………………...…………...…. Genome A Genome B ………………… ……………… Genome A Genome B

7 Realistic assumptions? But, now is it a real case? Unrealistic assumption! More realistic assumption ………………………………………………………………. ………………………….………………...…………...…. Genome B Genome A ………………………………………………………………… ………………………………………………...…………...…. Genome A Genome B ………………… ……………… Genome A Genome B

8 Preview in a real case Chlamydia muridarum: 1.084.689 bps Chlamydia trachomatis: 1.057.413 bps

9 Preview in a real case Pyrococcus abyssi: 1.790.334 bps Pyrococcus horikoshii: 1.763.341 bps

10 Methodology of an alignment 1st: Make a preview (linear cost): ……………………..…. …………………...…. 2nd: Identify the portions that can be aligned (linear cost). 3rd: Make the alignment: …..………… ……………….

11 Methodology of an alignment 1st: Make a preview (linear cost): ……………………..…. …………………...…. 2nd: Identify the portions that can be aligned. 3rd: Make the alignment: …..………… ………………. ?

12 Preview revisited … a a t g….c t g... … c g t g….c c c... Maximal Unique Matching (MUM) Connect to MALGEN

13 Methodology of an alignment 1st: Make a preview: ……………………..…. …………………...…. How can MUMs be found? Linear cost with Suffix trees. 2nd: Identify the portions that can be aligned. How can these portions be determined? 3rd: Make the alignment: …..………… ………………. With CLUSTALW, TCOFFEE, …

14 Comparison of large sequences M-GCAT Todd Treangen

15 Homework 1. Javier 2. Dmitry 3. Ana Iris 4. David 5. Patricia 6. Rogeli 7. Atif 8. Aina 9. Isaac 10. Maria Merce 11. Romina 12. Guillem 13. Raul 14. Alexis 15. Ramon

16 Bioinformatics PhD. Course Second part: Introducing Suffix trees

17 Suffix trees Given string ababaas: Suffixes: 1: ababaas 2: babaas 3: abaas 4: baas 5: aas 6: as 7: s Tree: a ba baas,1 as,3 ba baas,2 as,4 s,6 as,5 s,7 What kind of queries?
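
The numbered suffix list on this slide can be reproduced with a few lines. The function name `suffixes` is an assumption made for the example, and positions are 1-based to match the slide.

```python
def suffixes(text):
    """Return (1-based start position, suffix) pairs for every suffix of text."""
    return [(start + 1, text[start:]) for start in range(len(text))]


for position, suffix in suffixes("ababaas"):
    print(f"{position}: {suffix}")
```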

18 Applications of Suffix trees 1. Exact string matching Does the sequence ababaas contain any occurrence of patterns abab, aab, and ab? a ba baas,1 as,3 ba baas,2 as,4 s,6 as,5 s,7
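
The query on this slide can be tried out with a simplified stand-in for the suffix tree: an uncompressed suffix trie stored as nested dictionaries. This is a sketch under that assumption (it uses quadratic space, unlike a real suffix tree), and the names `build_suffix_trie` and `occurs` are hypothetical. A pattern occurs in the text exactly when it can be spelt from the root.

```python
def build_suffix_trie(text):
    """Insert every suffix of text, one character per edge."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def occurs(trie, pattern):
    """True if pattern can be spelt from the root, i.e. is a substring."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("ababaas")
for pattern in ("abab", "aab", "ab"):
    print(pattern, occurs(trie, pattern))
```

The search cost depends only on the pattern length, not on the length of the indexed sequence.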

19 Quadratic insertion algorithm Given the string …………………………...... P1: the leaves of suffixes from  have been inserted and the suffix-tree …...  Invariant Properties:

20 Quadratic insertion algorithm Given the string ababaabbs ababaabbs,1

21 Quadratic insertion algorithm Given the string ababaabbs babaabbs,2 ababaabbs,1

22 Quadratic insertion algorithm Given the string ababaabbs babaabbs,2 ababaabbs,1 aba baabbs,1

23 Quadratic insertion algorithm Given the string ababaabbs babaabbs,2 aba baabbs,1 abbs,3

24 Quadratic insertion algorithm Given the string ababaabbs babaabbs,2 aba baabbs,1 abbs,3 ba baabbs,2

25 Quadratic insertion algorithm Given the string ababaabbs aba baabbs,1 abbs,3 ba baabbs,2 abbs,4

26 Quadratic insertion algorithm Given the string ababaabbs aba baabbs,1 abbs,3 abbs,4 ba baabbs,2 abbs,4 abbs,3 ba a baabbs,1

27 Quadratic insertion algorithm Given the string ababaabbs abbs,4 ba baabbs,2 abbs,4 abbs,3 ba a baabbs,1 abbs,5

28 Quadratic insertion algorithm Given the string ababaabbs abbs,4 ba baabbs,2 abbs,4 abbs,3 ba a baabbs,1 abbs,5

29 Quadratic insertion algorithm Given the string ababaabbs abbs,4 ba baabbs,2 abbs,4 a abbs,5 b a abbs,3 baabbs,1

30 Quadratic insertion algorithm Given the string ababaabbs abbs,4 ba baabbs,2 abbs,4 a abbs,5 b a abbs,3 baabbs,1 bs,6

31 Quadratic insertion algorithm Given the string ababaabbs abbs,4 ba baabbs,2 abbs,4 a abbs,5 b a abbs,3 baabbs,1 bs,6

32 Quadratic insertion algorithm Given the string ababaabbs a abbs,5 b a abbs,3 baabbs,1 bs,6 a baabbs,2 b abbs,4 bs,7

33 Quadratic insertion algorithm Given the string ababaabbs a abbs,5 b a abbs,3 baabbs,1 bs,6 a baabbs,2 b abbs,4 bs,7 s,8

34 Quadratic insertion algorithm Given the string ababaabbs a abbs,5 b a abbs,3 baabbs,1 bs,6 a baabbs,2 b abbs,4 bs,7 s,8 s,9
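
The step-by-step insertion shown on the previous slides can be sketched as follows: each suffix is walked down from the root and an edge is split at the first mismatch. This is a minimal sketch, assuming the last character of the string is unique (as s is in ababaabbs); the class and function names are hypothetical, not the course's code.

```python
class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character -> (edge label, child node)
        self.leaf = leaf   # 1-based suffix index for leaves, None otherwise


def insert_suffix(root, suffix, index):
    node = root
    while True:
        first = suffix[0]
        if first not in node.edges:
            # No edge starts with this character: hang a new leaf here.
            node.edges[first] = (suffix, Node(leaf=index))
            return
        label, child = node.edges[first]
        k = 0
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k == len(label):
            # Whole edge matched: continue below it with the rest.
            node, suffix = child, suffix[k:]
            continue
        # Mismatch inside the edge: split it and add the new leaf.
        middle = Node()
        middle.edges[label[k]] = (label[k:], child)
        middle.edges[suffix[k]] = (suffix[k:], Node(leaf=index))
        node.edges[first] = (label[:k], middle)
        return


def build_suffix_tree(text):
    if not text or text.count(text[-1]) != 1:
        raise ValueError("the last character must be a unique terminator")
    root = Node()
    for start in range(len(text)):   # each insertion may cost O(n): O(n^2) total
        insert_suffix(root, text[start:], start + 1)
    return root


def leaf_indices(node):
    if node.leaf is not None:
        return [node.leaf]
    found = []
    for _, child in node.edges.values():
        found.extend(leaf_indices(child))
    return sorted(found)


tree = build_suffix_tree("ababaabbs")
print(sorted(tree.edges), leaf_indices(tree))
```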

35 Generalized suffix tree The suffix tree of many strings … is called the generalized suffix tree … and it is the suffix tree of the concatenation of strings. For instance, the generalized suffix tree of ababaabb and aabaat … is the suffix tree of ababaabbαaabaatβ.

36 Generalized suffix tree Given the suffix tree of ababaabbα : a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9 Construction of the suffix tree of ababaabbαaabaaβ :

37 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9

38 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9 ab aaβ,1

39 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9 ab aaβ,1

40 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9 ab aaβ,1 a β,2

41 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8 α,9 ab aaβ,1 a β,2

42 Generalized suffix tree a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 ab aaβ,1 a β,2 a β,3

43 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 ab aaβ,1 a β,2 a β,3

44 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4

45 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4

46 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5

47 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5

48 Generalized suffix tree Construction of the suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6

49 Generalized suffix tree Generalized suffix tree of ababaabbαaabaaβ : a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6
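
The leaf labels of the finished tree (for example bα,7 or β,6) name a suffix by its terminator and its 1-based position inside its own string. The sketch below lists those labelled suffixes for ababaabb and aabaa; it assumes each suffix is cut at its own terminator, and the function name `generalized_suffixes` is hypothetical.

```python
def generalized_suffixes(strings, terminators):
    """Return (suffix incl. terminator, terminator, 1-based position) triples."""
    labelled = []
    for text, end in zip(strings, terminators):
        # Position len(text) + 1 is the suffix made of the terminator alone.
        for start in range(len(text) + 1):
            labelled.append((text[start:] + end, end, start + 1))
    return labelled


for suffix, end, position in generalized_suffixes(["ababaabb", "aabaa"], ["α", "β"]):
    print(f"{suffix}  ->  {end},{position}")
```

Inserting these suffixes into one tree gives the 9 leaves marked α and the 6 leaves marked β seen on the slide.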

50 Applications of Suffix trees 1. Exact string matching Does the sequence ababaas contain any occurrence of patterns abab, aab, and ab? a ba baas,1 as,3 ba baas,2 as,4 s,6 as,5 s,7

51 Applications of Suffix trees 2. The substring problem for a database of strings DB Does the DB contain any occurrence of patterns abab, aab, and ab? a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6

52 Applications of Suffix trees 3. The longest common substring of two strings a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6
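
One way to read this application: in the generalized tree, a node whose subtree holds leaves from both strings spells a common substring, and the deepest such node spells the longest one. The sketch below follows that idea on an uncompressed generalized suffix trie, a simpler stand-in for the suffix tree that needs quadratic space. The function name and the node layout are assumptions made for the example.

```python
def longest_common_substring(s1, s2):
    def new_node():
        return {"kids": {}, "owners": set()}

    root = new_node()
    # Insert every suffix of both strings, recording which string passes each node.
    for owner, text in enumerate((s1, s2)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["kids"].setdefault(ch, new_node())
                node["owners"].add(owner)

    # Walk only through nodes reached by both strings; keep the deepest path.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["kids"].items():
            if len(child["owners"]) == 2:
                candidate = path + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


print(longest_common_substring("ababaabb", "aabaa"))
```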

53 Applications of Suffix trees 4. Finding the maximal repeats. a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6

54 Applications of Suffix trees 5. Finding MUMs. a bα,5 b a bbα,3 baabbα,1 bα,6 a baabbα,2 b bbα,4 bα,7 α,8 α,9 b aaβ,1 a β,2 a β,3 a β,4 β,5 β,6

55 Bioinformatics PhD. Course Third part: Suffix links

56 a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa 

57 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

58 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

59 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

60 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

61 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

62 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

63 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

64 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa  ?

65 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa 

66 Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 aa 

67 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a

68 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a

69 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a aa in S 2 [1] Unique matchings

70 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a aa in S 2 [1] Unique matchings aab in S 2 [1] = S 1 [5..6-7] in S 2 [1]

71 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a Unique matchings S 1 [5..6-7] in S 2 [1]

72 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a Unique matchings S 1 [5..6-7] in S 2 [1]

73 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..6-7] in S 2 [1] S 1 [3..6-…] in S 2 [2]

74 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..6-7] in S 2 [1] S 1 [3..6-…] in S 2 [2]

75 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..6-7] in S 2 [1] S 1 [3..6-…] in S 2 [2]

76 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..6-7] in S 2 [1] S 1 [3..6-…] in S 2 [2]

77 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..6-7] in S 2 [1] S 1 [3..6-8] in S 2 [2] S 1 [4..6-8] in S 2 [3]

78 Traversal using Suffix links a abbα,5 b a abbα,3 baabbα,1 bα,6 a baabbα,2 b abbα,4 bα,7 α,8α,9 Given S 2 = a a b a a b b a Unique matchings S 1 [5..8] in S 2 [4] S 1 [3..6-8] in S 2 [2] S 1 [4..6-8] in S 2 [3] S 1 [6..8] in S 2 [5] S 1 [7..8] in S 2 [6]

79 From UMs to MUMs Given S 2 = a a b a a b b a and S 1 = a b a b a a b b α Unique matchings: S 1 [3..6-8] in S 2 [2] S 1 [4..6-8] in S 2 [3] S 1 [5..8] in S 2 [4] S 1 [6..8] in S 2 [5] S 1 [7..8] in S 2 [6] Array of UMs: 1 2 3 6-8 4 6-8 5 8 6 8 7 8 8 9 MUM: S 1 [3..6-8] in S 2 [2]
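
For strings this small, the MUM on this slide can be checked by brute force rather than by suffix-link traversal. The sketch below assumes the usual definition (a match that occurs exactly once in each sequence and cannot be extended to the left or right). It runs in polynomial rather than linear time, positions are 1-based as on the slide, and the function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    return sum(
        1 for i in range(len(text) - len(pattern) + 1) if text.startswith(pattern, i)
    )


def find_mums(s1, s2, min_len=1):
    mums = []
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] != s2[j]:
                continue
            # Skip matches that could be extended to the left.
            if i > 0 and j > 0 and s1[i - 1] == s2[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            match = s1[i:i + k]
            unique = count_occurrences(s1, match) == 1 and count_occurrences(s2, match) == 1
            if k >= min_len and unique:
                mums.append((i + 1, j + 1, match))
    return mums


print(find_mums("ababaabb", "aabaabba"))
```

The single result starts at position 3 of S 1 and position 2 of S 2, in line with the MUM given on the slide.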

80 Bioinformatics PhD. Course Fourth part: Linear insertion algorithm

81 Quadratic insertion algorithm Given the string …………………………...... P1: the leaves of suffixes from  have been inserted and the suffix-tree …...  Invariant Properties:

82 Linear insertion algorithm Given the string …………………………...... P2: the string  is the longest string that can be spelt through the tree. P1: the leaves of suffixes from  have been inserted and the suffix-tree  …...   Invariant Properties:

83 Linear insertion algorithm: example Given the string ababaababb... ba baababb...,2 a ababb...,5 ba ababb...,3 baababb...,1 ababb...,4   aa 

84 Linear insertion algorithm: example Given the string ababaababb... ba baababb...,2 a ababb...,5 ba ababb...,3 baababb...,1 ababb...,4   6 7 8

85 Linear insertion algorithm: example ba baababb...,2 a ababb...,5 ba ababb...,3 baababb...,1 ababb...,4  6 7 8 Given the string ababaababb... 

86 Linear insertion algorithm: example ba baababb...,2 a ababb...,5 ba ababb...,3 baababb...,1 ababb...,4  6 7 89 Given the string ababaababb... 

87 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 baababb...,1 ba baababb...,2 ababb...,4 Given the string ababaababb...   6 7 89 baababb...,1 b b...,6 aababb...,1

88 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba baababb...,2 ababb...,4 Given the string ababaababb...   7 89 b b...,6 aababb...,1

89 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba baababb...,2 ababb...,4 Given the string ababaababb...   7 89 b b...,6 aababb...,1

90 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba baababb...,2 ababb...,4 Given the string ababaababb...   7 89 b b...,6 aababb...,1

91 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba baababb...,2 ababb...,4 Given the string ababaababb...   7 89 b b...,6 aababb...,1 baababb...,2 b aababb...,2

92 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba baababb...,2 ababb...,4 Given the string ababaababb...   7 8… b b...,6 aababb...,1 baababb...,2 b b...,7 aababb...,2

93 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba ababb...,4 Given the string ababaababb...   89 b b...,6 aababb...,1 b b...,7 aababb...,2

94 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba ababb...,4 Given the string ababaababb...   89 b b...,6 aababb...,1 b b...,7 aababb...,2

95 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba ababb...,4 Given the string ababaababb...   89 b b...,6 aababb...,1 b b...,7 aababb...,2

96 Linear insertion algorithm: example a ababb...,5 ba ababb...,3 ba ababb...,4 Given the string ababaababb...   89 b b...,6 aababb...,1 b b...,7 aababb...,2

97 Linear insertion algorithm: example a ababb...,5 b ba ababb...,4 Given the string ababaababb...   89 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a

98 Linear insertion algorithm: example a ababb...,5 b ba ababb...,4 Given the string ababaababb...   89 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a b...,8

99 Linear insertion algorithm: example a ababb...,5 b ba ababb...,4 Given the string ababaababb...   9 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a b...,8

100 Linear insertion algorithm: example a ababb...,5 b ba ababb...,4 Given the string ababaababb...   9 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a b...,8

101 Linear insertion algorithm: example a ababb...,5 b b ababb...,4 Given the string ababaababb...  9 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a b...,8 a 

102 Linear insertion algorithm: example a ababb...,5 b b ababb...,4 Given the string ababaababb...  9 ababb...,3 b b...,6 aababb...,1 b b...,7 aababb...,2 a b...,8 a b...,9 

103 Linear insertion algorithm: example a ababb...,5 b b ababb...,4 Given the string ababaababb...  9 ababb...,3 b b...,6 ababb...,1 b b...,7 aababb...,2 a b...,8 a b...,9

104 Linear insertion algorithm: example a ababb...,5 b b ababb...,4 Given the string ababaababb...  9 ababb...,3 b b...,6 ababb...,1 b b...,7 aababb...,2 a b...,8 a b...,9

105 Linear insertion algorithm: example a ababb...,5 b b ababb...,4 Given the string ababaababb...   9 ababb...,3 b b...,6 ababb...,1 b b...,7 aababb...,2 a b...,8 a b...,9

106 Index Suffix arrays U. Manber, G. Myers, "Suffix arrays: a new method for on-line string searches"

107 Suffix arrays Given string ababaa#: Suffixes: 1: ababaa# 2: babaa# 3: abaa# 4: baa# 5: aa# 6: a# 7: # … but lexicographically sorted Which is the cost? O(n log(n))
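
A minimal sketch of the suffix array for ababaa#, built by directly sorting the suffixes. Two assumptions are made here: # sorts before the letters (true for Python's default string ordering), and positions are 1-based as on the slide. Plain sorting of whole suffixes is simpler than the O(n log(n)) construction quoted on the slide and can be slower on long inputs.

```python
def suffix_array(text):
    """1-based start positions of the suffixes of text, in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda start: text[start - 1:])


text = "ababaa#"
for start in suffix_array(text):
    print(f"{start}: {text[start - 1:]}")
```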

108 Applications of suffix arrays 1. Exact string matching Does the sequence ababaas contain any occurrence of patterns abab, aab, and ab? 1: ababaa# 2: babaa# 3: abaa# 4: baa# 5: aa# 6: a# 7: # Binary search … which is the cost? O(log(n) |P|) Can it be improved to O(log(n)+|P|)?
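
The plain binary search with cost O(log(n) |P|) can be sketched as follows: each of the O(log(n)) steps compares the pattern with at most |P| characters of one suffix. The function names are assumptions for the example, positions are 0-based here for convenience, and the text is ababaa# as in the suffix list above.

```python
def suffix_array(text):
    """0-based start positions of the suffixes, in lexicographic order."""
    return sorted(range(len(text)), key=lambda start: text[start:])


def contains(text, sa, pattern):
    """Binary search for the first suffix whose prefix is >= pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + len(pattern)]   # at most |P| characters
        if prefix < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text.startswith(pattern, sa[lo])


text = "ababaa#"
sa = suffix_array(text)
for pattern in ("abab", "aab", "ab"):
    print(pattern, contains(text, sa, pattern))
```

The improvement to O(log(n)+|P|) asked about on the slide needs extra information about common prefixes between suffixes, which this sketch does not store.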

109 Fast search with cost O(log(n)+|P|) Query: Invariant Properties: P1: α < query ≤ β α β 1 2 … n Suffix array P2: matches pref( query)

110 Fast search with cost O(log(n)+|P|) Query: Invariant Properties: P1: α < query ≤ β α β γ Algorithm: 1 2 … n Suffix array P2: matches pref( query) If suff( γ )<suff(query) then α = γ else β = γ

