Download presentation
Presentation is loading. Please wait.
1
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and Genetics Department of Bioinformatics Tunis, March 2007
2
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching http://www.gobics.de/ burkhard/teaching/tunis_07.php
3
6/3/2015Burkhard Morgenstern, Tunis 2007 www.gobics.de/burkhard/ teaching/tunis_07.php
4
6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell
5
6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell Idea: Sequence -> Structure -> Function
6
6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell
7
6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell gap between sequence and structure/function data Lots of data available at the sequence level Fewer data at the structure and function level
8
6/3/2015Burkhard Morgenstern, Tunis 2007 Exponential growth of data bases
9
6/3/2015Burkhard Morgenstern, Tunis 2007 Major goal of bioinformatics: close the gap between sequence information and structure/function information Most important tool for sequence analysis: sequence comparison Simple approach: dot plot, more advanced approach: sequence alignment
10
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot
11
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Gibbs and McIntyre (1970)
12
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I V M R E Q Y Two sequences to be compared
13
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I V M R E Q Y Comparison matrix
14
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V M R E Q Y Search pairs of identical residues
15
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Dot plot: dot ( X ) for all pairs of identical residues
16
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X
17
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Homologies as diagonal lines from top-left to bottom-right corner
18
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Inversions as diagonals from bottom left to top right
19
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y Q E V R E Y Q E I C I X V X M R Y X X X Q X X X E X X X X Repeats as parallel diagonals
20
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y Q E V R E Y Q E I C I X V X M R Y X X X Q X X X E X X X X
21
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Advantages: 1. Various types of similarity detectable (repeats, inversions) 2. Useful for large-scale analysis Use filtering for long sequeces: dots represent matching segments instead of matching single residues
22
6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot
23
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Evolutionary or structurally related sequences: alignment possible Sequence homologies represented by inserting gaps
24
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Two input sequences
25
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Comparison matrix for two sequences
26
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X X A X Q X Y X X Dot plot for two sequences
27
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X X A X Q X Y X X Similarities in same relative order over entire seqences
28
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X A Q X Y X Global alignment of sequences possible
29
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Alignment corresponds to path through comparison matrix
30
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Matches (red), mis-matches (green), gaps (blue)
31
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Matches (red), mis-matches (green), gaps (blue)
32
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment (global) alignment: write sequences on top of each other, gaps represented by dash symbols
33
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Input sequences
34
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – alignment of input sequences
35
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y - alignment consists matches (red), mismatches (green) and gaps (blue)
36
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Basic task: Find ‘best’ alignment of two sequences = alignment that reflects structural and evolutionary relations
37
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Questions: 1. What is a good alignment? 2. How to find the best alignment?
38
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Idea: consider alignment as hypothesis about evolution of sequences. gaps correspond to insertions/deletions mismatches correspond to substitutions
39
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A - Q Y Problem: astronomical number of possible alignments
40
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I - V M R E A Q Y Problem: astronomical number of possible alignments
41
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – Problem: astronomical number of possible alignments stupid computer has to find out: which alignment is best ??
42
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches
43
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – General assumption: sequences not too distantly related. In this case: mismatches (substitutions) and gaps (insertions/deletions) unlikely Consequence: good alignment should reduce gaps and mismatches
44
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C I - V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches
45
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches
46
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches
47
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C I - V M R E A Q Y – Second (simplified) rule: minimize number of gaps
48
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V - A R E - Q Y E C I - V M - R E A Q Y – Second (simplified) rule: minimize number of gaps Parsimony principle: minimize number of evolutionary events
49
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment For protein sequences: different degrees of similarity among amino acids. counting matches/mismatches oversimplistic
50
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T L V Protein sequences to be aligned
51
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T L - V Possible alignment
52
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Alternative alignment
53
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Some amino acid residues are more similar to each other than others Therefore: similarity among amino acid residues has to be taken into account.
54
6/3/2015Burkhard Morgenstern, Tunis 2007
55
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V To assess quality of protein alignments: use similarity scores for amino acids s(a,b) similarity score for amino acids a and b
56
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Similarity measured by substitution matrices based on substitution probabilities Important substitution matrices: PAM (M. Dayhoff) BLOSUM (S. Henikoff / J. Henikoff)
57
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment The PAM matrix: Consider probability p a,b of substitution a → b (or b → a) for amino acids a and b Define for amino acids a and b similarity score S(a,b) based on probability p a,b First task: find out p a,b for every pair of amino acids a, b
58
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment The PAM matrix: Use closely related protein families – no alignment problem, no double substitutions Construct phylogenetic tree with parsimony method Count substitution frequencies/probabilities Normalize substitution probabilities Extrapolate probabilities for larger evolutionary distances
59
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Finally: define similarity score S(a,b) = log (p a,b / q a q b ) q a = (relative) frequency of amino acid a
60
6/3/2015Burkhard Morgenstern, Tunis 2007
61
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Given a similarity score s(a,b) for pairs of amino acids, define quality score of alignment as: sum of similarity values s(a,b) of aligned residues minus gap penalty g for each residue aligned with a gap
62
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Example: Score = s(T,T) + s(I,L) + s (V,V) - g
63
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Next question: find alignment with best score Dynamic-programming algorithm finds alignment with best score. (Needleman and Wunsch, 1970)
64
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E - C I V M R E - Q Y – Alignment corresponds to path through comparison matrix
65
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E C I X V X M R X E X X Q X Y X X
66
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E X X C X I X V X M X R X E X X Q X Y X X
67
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E - C I V M R E - Q Y – Alignment corresponds to path through comparison matrix
68
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V - R E A Q I - C I V M R E - H Y
69
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Score of alignment: Sum of similarity values of aligned residues minus gap penatly T W L V - R E A Q I - C I V M R E - H Y
70
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Example: S = - g + s(W,C) + s(L,L) + s(V,V) - g + s(R,R) … T W L V - R E A Q I - C I V M R E - H Y
71
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - H Y
72
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R E A Q Y I X X C X Alignment corresponds I X to path through V X comparison matrix M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - H Y
73
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X Dynamic programming: C X Calculate scores S(i,j) I X of optimal alignment of V X prefixes up to positions M X i and j. j R X E H Y T W L V - R - C I V M R
74
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X S(i,j) can be calculated from I X possible predecessors V X S(i-1,j-1), S(i,j-1), S(i-1,j). M X j R X E H Y T W L V - R - C I V M R
75
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from top left = V X M X S(i-1,j-1) + s(R,R) j R X E H Y T W L V - R - C I V M R
76
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from above = V X j-1 M X S(i,j-1) – g j R X E H Y T W L V R - - C I V M R
77
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i-1 i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from left = V X M X S(i-1,j) – g j R X X E H Y T W L - - V R - C I V M R -
78
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i-1 i T W L V R E A Q Y I X X C X Score of optimal path = I X V X Maximum of these three M X values j R X X E H Y T W L - - V R - C I V M R -
79
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for global alignment: For sequences x and y
80
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R C I V M R E H Y
81
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x V x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:
82
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:
83
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:
84
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:
85
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x H x x Y x x Fill matrix from top left to bottom right:
86
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x Y x x Fill matrix from top left to bottom right:
87
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x Fill matrix from top left to bottom right:
88
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:
89
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:
90
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:
91
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:
92
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x x V x x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:
93
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x x x x x x I x x x x x x V x x x x x x M x x x x x x R x x x x x x E x x x x x x H x x x x x x Y x x x x x x Fill matrix from top left to bottom right:
94
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x x x x x x I x x x x x x V x x x x x x M x x x x x x R x x x x x x E x x x x x x H x x x x x x Y x x x x x x Find optimal alignment by trace-back procedure
95
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x I x V x M x R x E x H x Y x Initial matrix entries?
96
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R X X C X Entries S(i,j) scores I X of optimal alignment of j V X prefixes up to positions M i and j. R E H Y T W L V - C I V
97
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R j X X X X X C Entries S(i,0) scores I of optimal alignment of V prefix up to positions M i and empty prefix. R E Score = - i* g H Y T W L V - - - -
98
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R C I V M R E H Y Initial matrix entries: Example, g = 2
99
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R 0 -2 -4 -6 -8 -10 C -2 I -4 V -6 M -8 R -10 E -12 H -14 Y -16 Initial matrix entries: Example, g = 2
100
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise global alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y
101
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise global alignment Computational complexity: how does program run time and memory depend on size of input data? l 1 and l 2 length of sequences: Computing time and memory proportional to l 1 * l 2 Time and memory complexity = O(l 1 * l 2 )
102
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment More realistic gap penalty: affine-linear instead of linear Penalty for gap of length l: c 0 + (l-1)* c 1 c 0 = ‘gap-opening penalty’ c 0 = ‘gap-extension penalty’
103
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment So far: global alignment considered: sequences aligned over their entire length. But: sequences often share only local sequence similarity (conserved genes or domains) Most important application: database searching
104
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - F Y
105
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y
106
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Problem: Find pair of segments with maximal alignment score (not necessarily part of optimal global alignment!) Equivalent: find path starting and ending anywhere in the matrix.
107
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y
108
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Recursion formula for global alignment: S(i,j) = max { S(i-1,j-i)+s(a i,b j ), S(i-1,j) – g, S(i,j-i) – g }
109
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Recursion formula for local alignment: S(i,j) = max { 0, S(i-1,j-i)+s(a i,b j ), S(i-1,j) – g, S(i,j-i) – g }
110
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R 0 0 0 0 0 0 C 0 I 0 V 0 M 0 R 0 E 0 H 0 Y 0 Initial matrix entries = 0
111
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R 0 0 0 0 0 0 C 0 0 I 0 V 0 M 0 R 0 E 0 H 0 Y 0 s(C,T) = -2
112
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for global alignment:
113
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for local alignment:
114
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment For trace-back: Store positions i max and j max with S(i max,jmax ) maximal
115
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y
116
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Algorithm by Smith and Waterman (1983) Implementation: e.g. BestFit in GCG package
117
6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Complexity: l 1 and l 2 length of sequences:computing time and memory proportional to l 1 * l 2 Time and space complexity = O(l 1 * l 2 ) Too slow for data base searching! Therefore tools like BLAST necessary for database searching
118
6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) New BLAST version (1997) Two-hit strategy Gapped BLAST Position-Specific Iterative BLAST (PSI BLAST)
119
6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) PSI BLAST: 1. search database with standard BLAST 2. take best hits and create multiple alignment 3. calculate profile from multiple alignment 4. search database again with profile as query
120
6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST)
121
6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) profile for sequence family or motif: table of amino acid/nucleotide frequencies at any position in alignment.
122
6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) Profile: frequencies of nucleotides at every position. seq1 A T T G – A T seq2 C T T G T A G seq3 A - - G T A T seq4 A T G G T G T seq5 A C T G T A C A 80 0 0 0 0 80 0 T 0 75 75 0 100 0 60 C 20 25 0 0 0 0 20 G 0 0 25 100 0 20 20
123
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 T Y I M R E A Q Y E S A Q s2 T C I V M R E A Y E s3 Y I M Q E V Q Q E R s4 W R Y I A M R E Q Y E
124
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -
125
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -
126
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -
127
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - General information in multiple alignment: Functionally important regions more conserved than non-functional regions Local sequence conservation indicates functionality!
128
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - For phylogeny reconstruction: Estimate pairwise distances between sequences (distance-based methods for tree reconstruction) Estimate evloutionary events in evolution (parsimony and maximum likelihood methods)
129
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!
130
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!
131
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Computer has to decide: which one is best??
132
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Questions in development of multiple-alignment programs (as in pairwise alignment): (1) What is a good alignment? → objective function (`score’) (2) How to find a good alignment? → optimization algorithm First question far more important !
133
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Traditional Objective functions: Define Score of alignments as Sum of individual similarity scores S(a,b) Gap penalties Needleman-Wunsch scoring system (1970)
134
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Traditional Objective functions Can be generalized to multiple alignment (e.g. sum-of-pair score, tree alignment) Needleman-Wunsch algorithm can also be generalized to multiple alignment, but: Very time and memory consuming! -> Heuristics needed
135
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment First question: how to score multiple alignments? Possible scoring scheme: Sum-of-pairs score
136
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......
137
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......
138
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......
139
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......
140
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: Use sum of scores of these p.a. 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......
141
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Needleman-Wunsch coring scheme can be generalized from pair-wise to multiple alignment
142
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment
143
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Complexity: For sequences of length l 1 * l 2 * l 3 O( l 1 * l 2 * l 3 ) For n sequences ( average length l ): O( l n ) Exponential complexity!
144
6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Needleman-Wunsch coring scheme can be generalized from pair-wise to multiple alignment Optimal solution not feasible: -> Heuristics necessary
145
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP
146
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Guide tree
147
6/3/2015Burkhard Morgenstern, Tunis 2007 ` Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Idea: align closely related sequences first!
148
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASFQPVAALERIN WLNYNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
149
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
150
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN- WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
151
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN-------- WW--RLNDKEGYVPRNLLGLYP-------- AVVIQDNSDIKVVP--KAKIIRD------- YAVESEA---SVQ--PVAALERIN------ WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
152
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment “Greedy” algorithm: Consider partial solution of bigger problem search best partial solution, fix solution search second-best partial solution that is consistent with first solution, fix solution Search third-best partial solution … etc. E.g.: Rucksack-Problem
153
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN-------- WW--RLNDKEGYVPRNLLGLYP-------- AVVIQDNSDIKVVP--KAKIIRD------- YAVESEA---SVQ--PVAALERIN------ WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”
154
6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment Most important software program: CLUSTAL W: J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nuc. Acids. Res. 22, 4673 - 4680 (~ 18.000 citations in the literature)
155
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Problems with traditional approach: Results depend on gap penalty Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction Algorithm produces global alignments.
156
6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Problems with traditional approach: But: Many sequence families share only local similarity E.g. sequences share one conserved motif
157
6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest EYENS ERYENS ERYAS
158
6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest E-YENS ERYENS ERYA-S
159
6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest – Local alignment E-YENS ERYENS ERYA-S
160
6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Important methods for local multiple alignment: PIMA MEME/MAST Idea: expectation maximation.
161
6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Traditional alignment approaches: Either global or local methods!
162
6/3/2015Burkhard Morgenstern, Tunis 2007 New question: sequence families with multiple local similarities Neither local nor global methods appliccable
163
6/3/2015Burkhard Morgenstern, Tunis 2007 New question: sequence families with multiple local similarities Alignment possible if order conserved
164
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103 Combination of global and local methods Assemble multiple alignment from gap-free local pair-wise alignments (,,fragments“)
165
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
166
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
167
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
168
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
169
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
170
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
171
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
172
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa
173
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
174
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
175
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa Consistency!
176
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg caaa--GAGTATCAcc----------CCTGaaTTGAATaa
177
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Define score of fragment f: l(f) = length of f s(f) = sum of matches (similarity values) P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences. Score w(f) = -ln P(f)
178
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Define score of fragment f: Define score of alignment as sum of scores of involved fragments No gap penalty!
179
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Goal in fragment-based alignment approach: find Consistent collection of fragments with maximum sum of weight scores
180
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment:
181
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment: recursive algorithm finds optimal chain of fragments.
182
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach ------atctaatagttaaaccccctcgtgcttag-------agatccaaac cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc-- Pair-wise alignment: recursive algorithm finds optimal chain of fragments.
183
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach ------atctaatagttaaaccccctcgtgcttag-------agatccaaac cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc-- Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming: Standard fragment-chaining algorithm Space-efficient algorithm
184
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
185
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaccctgaattgaagagtatcacataa (1) Calculate all optimal pair-wise alignments
186
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa (1) Calculate all optimal pair-wise alignments
187
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa (1) Calculate all optimal pair-wise alignments
188
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent
189
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
190
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
191
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
192
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa
193
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa
194
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
195
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent 1. Sort fragments according to scores 2. Include them one-by-one into growing multiple alignment – as long as they are consistent (greedy algorithm, comparable to knapsack problem)
196
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
197
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
198
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
199
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
200
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Consistency problem
201
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Consistency problem
202
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions
203
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions
204
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagt taaactcccccgtgcttag Cagtgcgtgtattact aacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions
205
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taata-----gttaaactcccccgtgcttag Cagtgcgtgtatta-----ctaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions
206
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions site x = [i,p] (sequence i, position p)
207
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions Calculate upper bound b l (x,i) and lower bound b u (x,i) for each x and sequence i
208
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions b l (x,i) and b u (x,i) updated for each new fragment in alignment
209
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Consistency bounds are to be updated for each new fragment that is included in to the growing Alignment Efficient algorithm (Abdeddaim and Morgenstern, 2002)
210
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Advantages of segment-based approach: Program can produce global and local alignments! Sequence families alignable that cannot be aligned with standard methods
211
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach DIALIGN is available Online at BiBiServ (Bielefeld Bioinformatics Server) Downloadable UNIX/LINUX executables at BiBiServ Source code (email to BM)
212
6/3/2015Burkhard Morgenstern, Tunis 2007 Program input Program usage: > dialign2-2 [options] = multi-sequence file in FASTA-format
213
6/3/2015Burkhard Morgenstern, Tunis 2007 Program output DIALIGN 2.2.1 ************* Program code written by Burkhard Morgenstern and Said Abdeddaim e-mail contact: bmorgen@gwdg.de Published research assisted by DIALIGN 2 should cite: Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218. For more information, please visit the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/ program call:./dialign2-2 -nt -anc s Aligned sequences: length: ================== ======= 1) dog_il4 300 2) bla 200 3) blu 200 Average seq. length: 233.3 Please note that only upper-case letters are considered to be aligned.
214
6/3/2015Burkhard Morgenstern, Tunis 2007 Program output Alignment (DIALIGN format): =========================== dog_il4 1 cagg------ ----GTTTGA atctgataca ttgc------ ---------- bla 1 ctga------ ---------- ---------- --------GC CAAGTGGGAA blu 1 ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG 0000000000 0000000000 0000000000 0000000011 1111111111 dog_il4 25 ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC bla 17 ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc---- blu 51 ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC 0000000000 0000000000 0000000000 0000000000 0000000000 dog_il4 63 GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT bla 63 ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG blu 101 CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ---------- 0000000000 0000000000 0009999999 9999999888 8888888888 dog_il4 113 TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG bla 90 ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC blu 140 ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC 8888888888 8888888800 0000000000 0007777777 7777777777
215
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
216
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
217
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
218
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
219
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
220
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
221
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
222
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa
223
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
224
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaac----------ggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
225
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
226
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag------ cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg caaa--GAGTATCAcc----------CCTGaaTTGAATaa--
227
6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa
228
6/3/2015Burkhard Morgenstern, Tunis 2007 Program output Alignment (DIALIGN format): =========================== dog_il4 1 cagg------ ----GTTTGA atctgataca ttgc------ ---------- bla 1 ctga------ ---------- ---------- --------GC CAAGTGGGAA blu 1 ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG 0000000000 0000000000 0000000000 0000000011 1111111111 dog_il4 25 ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC bla 17 ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc---- blu 51 ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC 0000000000 0000000000 0000000000 0000000000 0000000000 dog_il4 63 GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT bla 63 ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG blu 101 CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ---------- 0000000000 0000000000 0009999999 9999999888 8888888888 dog_il4 113 TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG bla 90 ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC blu 140 ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC 8888888888 8888888800 0000000000 0007777777 7777777777
229
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel algorithm for multiple sequence alignment, J. Mol. Biol.
230
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Problem with “progressive” approaches: Strictly global alignments Use only pair-wise comparison
231
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Idea: Start with local and global pair-wise alignments (“primary library” of alignments) Construct “scondary library” of residues that are indirectly aligned by primary library. Re-score residue pairs Construct final alignment with “progressive” method
232
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Advantage: Combination of local and global approaches Less sensitive against mis-alignments in progressive proceedure
233
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE
234
6/3/2015Burkhard Morgenstern, Tunis 2007
235
6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE T-COFFEE and DIALIGN: Less sensitive to spurious pairwise similarities Can handle local homologies better than CLUSTAL
236
6/3/2015Burkhard Morgenstern, Tunis 2007 Most multi-alignment approaches automated, i.e. based on algorithmic rules. Two components: Objective function: assess alignment quality Optimization algorithm: find optimal or near-optimal alignment
237
6/3/2015Burkhard Morgenstern, Tunis 2007 Fully automated alignment programs necessary f no expert knowledge available if large amounts of data to be analyzed But: Often no biologically reasonable results Often additional information about homologies etc. available
238
6/3/2015Burkhard Morgenstern, Tunis 2007 Idea for improved alignment Use expert knowledge to influence alignment procedure DIALIGN with user-defined anchor points
239
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences Alignment of large genomic sequences to identify functional elements (phylogenetic footprinting) Göttgens et al., 2000, 2001, 2002, … Pollard et al., 2004 DIALIGN, MGA, PipMaker, LAGAN, AVID, Mummer, WABA, …
240
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences Gene-regulatory sites identified by mulitple sequence alignment (phylogenetic footprinting)
241
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences
242
6/3/2015Burkhard Morgenstern, Tunis 2007 DIALIGN alignment of human and murine genomic sequences
243
6/3/2015Burkhard Morgenstern, Tunis 2007 DIALIGN alignment of tomato and Thaliana genomic sequences
244
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004)
245
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster:
246
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but
247
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but Entire genes totally mis-aligned
248
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but Entire genes totally mis-aligned Reason for mis-alignment: duplications !
249
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences The Hox gene cluster: 4 Hox gene clusters in pufferfish. 14 genes, different genes in different clusters!
250
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences The Hox gene cluster: Complete mis-alignment of entire genes!
251
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2
252
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Conserved motivs; no similarity outside motifs
253
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences
254
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences
255
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences
256
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Mis-alignment would have lower score!
257
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence
258
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence
259
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence Possible mis-alignment
260
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3
261
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3
262
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3
263
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3
264
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Consistency problem S3S3
265
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 More plausible alignment – and higher score : S3S3
266
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Consistency problem S3S3
267
6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Alternative alignment; probably biologically wrong; lower numerical score! S3S3
268
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment Biologically meaningful alignment not possible by automated approaches. Idea: use expert knowledge to guide alignment procedure User defines a set anchor points that are to be „respected“ by the alignment procedure
269
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
270
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT
271
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Use known homology as anchor point
272
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Use known homology as anchor point Anchor point = anchored fragment (gap-free pair of segments) Remainder of sequences aligned automatically
273
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Alignment of anchored positions a and b not enforced – a and b may be un-aligned –, but: a is only residue that can be aligned to b Residues left of a aligned with residues left of b
274
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment -------NLF VALYDFVASG DNTLSITKGE klrvlgynhn iihredkGVI YALWDYEPQN DDELPMKEGD cmt------- Anchored alignment
275
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS Anchor points in multiple alignment
276
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQND DELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS Anchor points in multiple alignment
277
6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment -------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT------- -------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS-- Anchored multiple alignment
278
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Goal: Find optimal alignment (=consistent set of fragments) under costraints given by user- specified anchor points!
279
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Algorithmic questions
280
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS
281
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Algorithmic questions
282
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences Algorithmic questions
283
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions Algorithmic questions
284
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions length Algorithmic questions
285
6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions length score Algorithmic questions
286
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Requirements: Anchor points need to be consistent! – if necessary: select consistent subset from user-specified anchor points
287
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
288
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
289
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Inconsistent anchor points!
290
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaat---agttaaactcccccgtgcttag Cagtgcgtgtattac-taacggttcaatcgcg caaagagtatcacccctgaattgaataa Inconsistent anchor points!
291
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Requirements: Anchor points need to be consistent! – if necessary: select consistent subset from user-specified anchor points Find alignment under constraints given by anchor points!
292
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Use data structures from multiple alignment
293
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa
294
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Greedy procedure for multiple alignment
295
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Greedy procedure for multiple alignment
296
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Question: which positions are still alignable ?
297
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x For each position x and each sequence S i exist an upper bound ub(x,i) and a lower bound lb(x,i) for residues y in S i that are alignable with x
298
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x For each position x and each sequence S i exist an upper bound ub(x,i) and a lower bound lb(x,i) for residues y in S i that are alignable with x
299
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure
300
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i)
301
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure
302
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure
303
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Anchor points treated like fragments in greedy algorithm: Sorted according to user-defined scores Accepted if consistent with previously accepted anchors ub(x,i) and lb(x,i) updated during greedy procedure Resulting values of ub(x,i) and lb(x,i) used as initial values for alignment procedure
304
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i)
305
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i) calculated using anchor points
306
6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Ranking of anchor points to prioritize anchor points, e.g. anchor points from verified homologies -- higher priority automatically created anchor points (using CHAOS, BLAST, … ) -- lower priority
307
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster
308
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Use gene boundaries as anchor points
309
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Use gene boundaries as anchor points + CHAOS / BLAST hits
310
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster no anchoring anchoring Ali. Columns 2 seq 2958 3674 3 seq 668 1091 4 seq 244 195 Score 1166 1007 CPU time 4:22 0:19
311
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Example: Teleost Hox gene cluster: Score of anchored alignment 15 % higher than score of non-anchored alignment ! Conclusion: Greedy optimization algorithm does a bad job!
312
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Improvement of Alignment programs Two possible reasons for mis-alignments: Wrong objective function: Biologically correct alignment gets bad numerical score Bad optimization algorithms: Biologically correct alignment gets best numerical score, but algorithm fails to find this alignment
313
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Improvement of Alignment programs Two possible reasons for mis-alignments: Anchored alignments can help to decide
314
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment
315
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aa----CCCC AGC---GUAa gucgcuaucc a cacucuCCCA AGC---GGAG Aac------- - ccg----CCA AaagauGGCG Acuuga---- - non-anchored alignment
316
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aa----CCCC AGC---GUAa gucgcuaucc a cacucuCCCA AGC---GGAG Aac------- - ccg----CCA AaagauGGCG Acuuga---- - structural motif mis-aligned
317
6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aaCCCCAGCG UAAGUCGCUA UCca-- --CACUCUCC CAAGCGGAGA AC---- ----CCGCCA AAAGAUGGCG ACuuga 3 conserved nucleotides as anchor points
318
6/3/2015Burkhard Morgenstern, Tunis 2007 WWW interface at GOBICS (Göttingen Bioinformatics Compute Server)
319
6/3/2015Burkhard Morgenstern, Tunis 2007 WWW interface at GOBICS (Göttingen Bioinformatics Compute Server)
320
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes
321
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes Goal: find location and structure of protein-coding genes in eukaryotic genome sequences.
322
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes attgccagtacgtagctagctacacgtatgctattacggatctgtagcttagcgtatct gtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgt agtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagctt cgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacg tacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagc atctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggc tagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttct aggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcttagtcgtgtagt cttgatctacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtag tcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtct atggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatt tttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtat gctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgta gtcgtagtcgttagcatctgtatgtacgtacgtatttttctagagcttcgtagtctatg gctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtattttt ctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgtt agctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgta gtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgta gtctatggctagtcgtagtcgtagtcgttagcatctgtatggtcgtagtcgttagcatc tgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctag
323
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes attgccagtacgtagctagctacacgtatgctattacggatctgtagcttagcgtatct gtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgt agtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagctt cgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacg tacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagc atctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggc tagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttct aggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcttagtcgtgtagt cttgatctacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtag tcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtct atggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatt tttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtat gctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgta gtcgtagtcgttagcatctgtatgtacgtacgtatttttctagagcttcgtagtctatg gctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtattttt ctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgtt agctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgta gtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgta gtctatggctagtcgtagtcgtagtcgttagcatctgtatggtcgtagtcgttagcatc tgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctag
324
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes
325
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes Three different approaches to computational gene- finding: Intrinsic: use statistical information about known genes (Hidden Markov Models) Extrinsic: compare genomic sequence with known proteins / genes Cross-species sequence comparison: search for similarities among genomes
326
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Generative probabilistic model for sequence of observations („symbols“). Finite set of states States can emit symbols Transitions between states possible Sequence generated by path between states
327
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Example: The occasionally dishonest casino. 3 5 6 6 6 4 6 5 1 6 5 1 2 F F U U U U U F F F F F F Possible states: fair (F); unfair (U); begin (B); end (E)
328
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Assumptions: Emission probabilities known; depend only on current state. Transition probabilities known, depend only on current state
329
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction F U EB
330
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction 3 5 6 6 6 4 6 5 1 6 5 1 2 s B F F U U U U U F F F F F F E φ For sequence s and parse φ: P(φ) probability of φ P(φ,s) joint probability of φ and s = P(φ) * P(s|φ) P(φ|s) a-posteriori probability of φ
331
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction 3 5 6 6 6 4 6 5 1 6 5 1 2 B F F U U U U U F F F F F F E Goal: find path φ with maximum a-posteriori probability P(φ|s) Idea: find path that maximizes joint probability P(φ,s) by dynamic programming (Viterbi algorithm)
332
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Application to gene prediction: A T A A T G C C T A G T C s (DNA) Z Z Z E E E E E E I I I I φ (parse) Introns, exons etc modeled as states in GHMM („generalized HMM“) Given sequence s, find parse that maximizes P(φ|s) (S. Karlin and C. Burge, 1997)
333
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Application to gene prediction: A T A A T G C C T A G T C s (DNA) Z Z Z E E E E E E I I I I φ (parse) Introns, exons etc modeled as states in GHMM („generalized HMM“) Given sequence s, find parse that maximizes P(φ|s) (S. Karlin and C. Burge, 1997)
334
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Basic model for GHMM-based intrinsic gene finding comparable to GenScan (M. Stanke)
335
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS
336
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS
337
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Features of AUGUSTUS: Intron length model Initial pattern for exons Similarity-based weighting for splice sites Interpolated HMM Internal 3’ content model
338
6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction A T A A T G C C T A G T C s (DNA) Z Z Z E E E E I I I I φ (parse) Explicit intron length model computationally expensive.
339
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Intron length model: Explicit length distribution for short introns Geometric tail for long introns Intron (fixed) Exon Intron (expl.) Exon Intron (geo.)
340
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS
341
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Extension of AUGUSTUS using include extrinsic information: Protein sequences EST sequences Syntenic genomic sequences User-defined constraints
342
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Comparison of genomic sequences (human and mouse)
343
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
344
6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting
345
6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting
346
6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Standard score: Consider length, # matches, compute probability of random occurrence Gene prediction by phylogenetic footprinting
347
6/3/2015Burkhard Morgenstern, Tunis 2007 Translation option: catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting
348
6/3/2015Burkhard Morgenstern, Tunis 2007 Translation option: L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I DNA segments translated to peptide segments; fragment score based on peptide similarity: Calculate probability of finding a fragment of the same length with (at least) the same sum of BLOSUM values Gene prediction by phylogenetic footprinting
349
6/3/2015Burkhard Morgenstern, Tunis 2007 P-fragment (in both orientations) L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I N-fragment catcatatc ttatcttacgtt aactcccccgtgct || | | | cagtgcgtg atagcccatatc cg For each fragment f three probability values calculated; Score of f based on smallest P value. Gene prediction by phylogenetic footprinting
350
6/3/2015Burkhard Morgenstern, Tunis 2007 P-fragment (in both orientations) L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I N-fragment catcatatc ttatcttacgtt aactcccccgtgct || | | | cagtgcgtg atagcccatatc cg P-fragments associated with strand and reading frame! Gene prediction by phylogenetic footprinting
351
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
352
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting AGenDA: Alignment-based Gene Detection Algorithm
353
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Fragments in DIALIGN alignment
354
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Build cluster of fragments
355
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Identify conserved splice sites
356
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Candidate exons bounded by conserved splice sites Find optimal chain of candidate exons
357
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
358
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
359
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
360
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting
361
6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting AGenDA GenScan 64 % 12 % 17 %
362
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Extended GHMM using extrinsic information Additional input data: collection h of `hints’ about possible gene structure φ for sequence s Consider s, φ and h result of random process. Define probability P(s,h,φ) Find parse φ that maximizes P(φ|s,h) for given s and h.
363
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints created using Alignments to EST sequences Alignments to protein sequences Combined EST and protein alignment (EST alignments supported by protein alignments) Alignments of genomic sequences User-defined hints
364
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment to EST: hint to (partial) exon EST G1
365
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ EST alignment supported by protein: hint to exon (part), start codon EST G1 Protein
366
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment to ESTs, Proteins: hints to introns, exons ESTs, Protein G1
367
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment of genomic sequences: hint to (partial) exon G2 G1
368
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Consider different types of hints: type of hints: start, stop, dss, ass, exonpart, exon, introns Hint associated with position i in s (exons etc. associated with right end position) max. one hint of each type allowed per position in s Each hint associated with a grade g that indicates its source or reliability.
369
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ h i,t = information about hint of type t at position i h i,t = $ if no hint of type t available at i h i,t = [grade, strand, (length, reading frame)] if hint available (hints created by protein alignments or DIALIGN contain information about reading frame)
370
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Standard program version, without hints A T A A T G C C T A G T C s (sequence) Z Z Z E E E E E E I I I I φ (parse) Find parse that maximizes P(φ|s)
371
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ AUGUSTUS+ using hints A T A A T G C C T A G T C s (sequence) $ $ $ $ $ $ $ X $ $ $ $ $ h (type 1) $ $ $ $ $ $ $ $ $ $ $ $ $ h (type 2) $ $ $ $ X $ $ $ $ $ $ $ $ h (type 3).. Z Z Z E E E E E E I I I I φ (parse) Find parse that maximizes P(φ|s,h)
372
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ As in standard HMM theory: maximize joint probability P(φ,s,h) How to define P(φ,s,h) ?
373
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).
374
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).
375
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).
376
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Assumption: P(h i,t |φ,s) depends on type t, grade g and whether h i,t is compatible with φ or s. Example: h i,t hint to exon E h i,t compatible with parse φ if E part of φ. h i,t compatible with sequence s if start and stop codons exist according to E and if no internal stop codon in E exists
377
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ For given g and t: 3 possible values for P(h i,t |φ,s) P(h i,t |φ,s) = q + (t,g) if h i,t compatible with φ P(h i,t |φ,s) = q - (t,g) if h i,t compatible with s but not compatible with φ P(h i,t |φ,s) = 0 if h i,t not compatible with s Values learned from training data
378
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Results: Gene (sub-)structures supported by hints receive bonus compared to non-supported structures Gene (sub-)structures not supported by hints receive malus (M. Stanke et al. 2006, BMC Bioinformatics)
379
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
380
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ h, h’ collections of hints; h’ i,t = h i,t for (i,t) ≠ (I,T) h’ I,T ≠ h I,T = $; g grade of h’ I,T φ +, φ - gene structures on s h’ IT compatible with φ +, but not with φ -
381
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
382
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
383
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
384
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
385
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
386
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
387
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+
388
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Result: i.e. structure φ +, which is compatible with additional hint h’ IT receives relative bonus
389
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Results (gene level) on data set sag178 % SN % SP Augustus 42 38 GenScan 18 14 GeneID 17 17 HMMGene 20 7 Aug. + EST 49 46 Aug. + prot 71 68 Aug. combined 68 65 Aug. all 82 79 GenomeScan 37 38 TwinScan 20 25
390
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Using hints from DIALIGN alignments: 1. Obtain large human/mouse sequence pairs (up to 50kb) from UCSC 2. Run CHAOS to find anchor points 3. Run DIALIGN using CHAOS anchor points 4. Create hints h from DIALIGN fragments 5. Run AUGUSTUS with hints
391
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints from DIALIGN fragments: Segment covered by peptide fragment minus 33 bp at both ends defines exon part hint on all 6 reading frames.
392
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints from DIALIGN fragments: Consider fragments with score ≥ 20 Distinguish high scores (≥ 45) from low scores Consider reading frame given by DIALIGN Consider strand given by DIALIGN => 2*2*2 = 8 grades
393
6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ AUGUSTUS best ab-initio method at EGASP
394
6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results
395
6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results
396
6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results
397
6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results
398
6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results
399
6/3/2015Burkhard Morgenstern, Tunis 2007 Ongoing projects Brugia malayi (TIGR) Aedes aegypti (TIGR) Schistosoma mansoni (TIGR) Tetrahymena thermophilia (TIGR) Galdieria Sulphuraria (Michigan State Univ.) Coprinus cinereus (Univ. Göttingen) Tribolium castaneum (Univ. Göttingen)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.