Presentation is loading. Please wait.

Presentation is loading. Please wait.

6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and.

Similar presentations


Presentation on theme: "6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and."— Presentation transcript:

1 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and Genetics Department of Bioinformatics Tunis, March 2007

2 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching http://www.gobics.de/ burkhard/teaching/tunis_07.php

3 6/3/2015Burkhard Morgenstern, Tunis 2007 www.gobics.de/burkhard/ teaching/tunis_07.php

4 6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell

5 6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell Idea: Sequence -> Structure -> Function

6 6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell

7 6/3/2015Burkhard Morgenstern, Tunis 2007 Information flow in the cell gap between sequence and structure/function data Lots of data available at the sequence level Fewer data at the structure and function level

8 6/3/2015Burkhard Morgenstern, Tunis 2007 Exponential growth of data bases

9 6/3/2015Burkhard Morgenstern, Tunis 2007 Major goal of bioinformatics: close the gap between sequence information and structure/function information Most important tool for sequence analysis: sequence comparison Simple approach: dot plot, more advanced approach: sequence alignment

10 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot

11 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Gibbs and McIntyre (1970)

12 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I V M R E Q Y Two sequences to be compared

13 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I V M R E Q Y Comparison matrix

14 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V M R E Q Y Search pairs of identical residues

15 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Dot plot: dot ( X ) for all pairs of identical residues

16 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X

17 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Homologies as diagonal lines from top-left to bottom-right corner

18 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y I V A R E A Q Y E C I X V X M R X E X X X Q X X Y X X Inversions as diagonals from bottom left to top right

19 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y Q E V R E Y Q E I C I X V X M R Y X X X Q X X X E X X X X Repeats as parallel diagonals

20 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Y Q E W T Y Q E V R E Y Q E I C I X V X M R Y X X X Q X X X E X X X X

21 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot Advantages: 1. Various types of similarity detectable (repeats, inversions) 2. Useful for large-scale analysis Use filtering for long sequeces: dots represent matching segments instead of matching single residues

22 6/3/2015Burkhard Morgenstern, Tunis 2007 The dot plot

23 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Evolutionary or structurally related sequences: alignment possible Sequence homologies represented by inserting gaps

24 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Two input sequences

25 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Comparison matrix for two sequences

26 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X X A X Q X Y X X Dot plot for two sequences

27 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X X A X Q X Y X X Similarities in same relative order over entire seqences

28 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I X V X M R X E X A Q X Y X Global alignment of sequences possible

29 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Alignment corresponds to path through comparison matrix

30 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Matches (red), mis-matches (green), gaps (blue)

31 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C X X I X V X M X R X E X A X Q X Y X X Matches (red), mis-matches (green), gaps (blue)

32 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment (global) alignment: write sequences on top of each other, gaps represented by dash symbols

33 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I V M R E A Q Y Input sequences

34 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – alignment of input sequences

35 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y - alignment consists matches (red), mismatches (green) and gaps (blue)

36 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Basic task: Find ‘best’ alignment of two sequences = alignment that reflects structural and evolutionary relations

37 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Questions: 1. What is a good alignment? 2. How to find the best alignment?

38 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A Q Y – Idea: consider alignment as hypothesis about evolution of sequences. gaps correspond to insertions/deletions mismatches correspond to substitutions

39 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C - I V M R E A - Q Y Problem: astronomical number of possible alignments

40 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E Q Y E C I - V M R E A Q Y Problem: astronomical number of possible alignments

41 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – Problem: astronomical number of possible alignments stupid computer has to find out: which alignment is best ??

42 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches

43 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – General assumption: sequences not too distantly related. In this case: mismatches (substitutions) and gaps (insertions/deletions) unlikely Consequence: good alignment should reduce gaps and mismatches

44 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C I - V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches

45 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches

46 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E - C I V M R E A Q Y – First (simplified) rules: 1. minimize number of mismatches 2. maximize number of matches

47 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E - Q Y E C I - V M R E A Q Y – Second (simplified) rule: minimize number of gaps

48 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V - A R E - Q Y E C I - V M - R E A Q Y – Second (simplified) rule: minimize number of gaps Parsimony principle: minimize number of evolutionary events

49 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment For protein sequences: different degrees of similarity among amino acids. counting matches/mismatches oversimplistic

50 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T L V Protein sequences to be aligned

51 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T L - V Possible alignment

52 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Alternative alignment

53 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Some amino acid residues are more similar to each other than others Therefore: similarity among amino acid residues has to be taken into account.

54 6/3/2015Burkhard Morgenstern, Tunis 2007

55 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V To assess quality of protein alignments: use similarity scores for amino acids s(a,b) similarity score for amino acids a and b

56 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Similarity measured by substitution matrices based on substitution probabilities Important substitution matrices: PAM (M. Dayhoff) BLOSUM (S. Henikoff / J. Henikoff)

57 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment The PAM matrix: Consider probability p a,b of substitution a → b (or b → a) for amino acids a and b Define for amino acids a and b similarity score S(a,b) based on probability p a,b First task: find out p a,b for every pair of amino acids a, b

58 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment The PAM matrix: Use closely related protein families – no alignment problem, no double substitutions Construct phylogenetic tree with parsimony method Count substitution frequencies/probabilities Normalize substitution probabilities Extrapolate probabilities for larger evolutionary distances

59 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Finally: define similarity score S(a,b) = log (p a,b / q a q b ) q a = (relative) frequency of amino acid a

60 6/3/2015Burkhard Morgenstern, Tunis 2007

61 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Given a similarity score s(a,b) for pairs of amino acids, define quality score of alignment as: sum of similarity values s(a,b) of aligned residues minus gap penalty g for each residue aligned with a gap

62 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Example: Score = s(T,T) + s(I,L) + s (V,V) - g

63 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V T - L V Next question: find alignment with best score Dynamic-programming algorithm finds alignment with best score. (Needleman and Wunsch, 1970)

64 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E - C I V M R E - Q Y – Alignment corresponds to path through comparison matrix

65 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E C I X V X M R X E X X Q X Y X X

66 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E X X C X I X V X M X R X E X X Q X Y X X

67 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T Y I V A R E A Q Y E - C I V M R E - Q Y – Alignment corresponds to path through comparison matrix

68 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V - R E A Q I - C I V M R E - H Y

69 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Score of alignment: Sum of similarity values of aligned residues minus gap penatly T W L V - R E A Q I - C I V M R E - H Y

70 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Example: S = - g + s(W,C) + s(L,L) + s(V,V) - g + s(R,R) … T W L V - R E A Q I - C I V M R E - H Y

71 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - H Y

72 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R E A Q Y I X X C X Alignment corresponds I X to path through V X comparison matrix M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - H Y

73 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X Dynamic programming: C X Calculate scores S(i,j) I X of optimal alignment of V X prefixes up to positions M X i and j. j R X E H Y T W L V - R - C I V M R

74 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X S(i,j) can be calculated from I X possible predecessors V X S(i-1,j-1), S(i,j-1), S(i-1,j). M X j R X E H Y T W L V - R - C I V M R

75 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from top left = V X M X S(i-1,j-1) + s(R,R) j R X E H Y T W L V - R - C I V M R

76 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from above = V X j-1 M X S(i,j-1) – g j R X E H Y T W L V R - - C I V M R

77 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i-1 i T W L V R E A Q Y I X X C X Score of optimal path that I X comes from left = V X M X S(i-1,j) – g j R X X E H Y T W L - - V R - C I V M R -

78 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i-1 i T W L V R E A Q Y I X X C X Score of optimal path = I X V X Maximum of these three M X values j R X X E H Y T W L - - V R - C I V M R -

79 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for global alignment: For sequences x and y

80 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R C I V M R E H Y

81 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x V x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:

82 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:

83 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:

84 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x E x x H x x Y x x Fill matrix from top left to bottom right:

85 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x H x x Y x x Fill matrix from top left to bottom right:

86 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x Y x x Fill matrix from top left to bottom right:

87 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x Fill matrix from top left to bottom right:

88 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:

89 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:

90 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:

91 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x x V x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:

92 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x C x x x x I x x x x V x x x x M x x x R x x x E x x x H x x x Y x x x Fill matrix from top left to bottom right:

93 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x x x x x x I x x x x x x V x x x x x x M x x x x x x R x x x x x x E x x x x x x H x x x x x x Y x x x x x x Fill matrix from top left to bottom right:

94 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x x x x x x I x x x x x x V x x x x x x M x x x x x x R x x x x x x E x x x x x x H x x x x x x Y x x x x x x Find optimal alignment by trace-back procedure

95 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R x x x x x x C x I x V x M x R x E x H x Y x Initial matrix entries?

96 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R X X C X Entries S(i,j) scores I X of optimal alignment of j V X prefixes up to positions M i and j. R E H Y T W L V - C I V

97 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment i T W L V R j X X X X X C Entries S(i,0) scores I of optimal alignment of V prefix up to positions M i and empty prefix. R E Score = - i* g H Y T W L V - - - -

98 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R C I V M R E H Y Initial matrix entries: Example, g = 2

99 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment T W L V R 0 -2 -4 -6 -8 -10 C -2 I -4 V -6 M -8 R -10 E -12 H -14 Y -16 Initial matrix entries: Example, g = 2

100 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise global alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y

101 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise global alignment Computational complexity: how does program run time and memory depend on size of input data? l 1 and l 2 length of sequences: Computing time and memory proportional to l 1 * l 2 Time and memory complexity = O(l 1 * l 2 )

102 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment More realistic gap penalty: affine-linear instead of linear Penalty for gap of length l: c 0 + (l-1)* c 1 c 0 = ‘gap-opening penalty’ c 0 = ‘gap-extension penalty’

103 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment So far: global alignment considered: sequences aligned over their entire length. But: sequences often share only local sequence similarity (conserved genes or domains) Most important application: database searching

104 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X H X Y X X T W L V - R E A Q I - C I V M R E - F Y

105 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y

106 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Problem: Find pair of segments with maximal alignment score (not necessarily part of optimal global alignment!) Equivalent: find path starting and ending anywhere in the matrix.

107 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y

108 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Recursion formula for global alignment: S(i,j) = max { S(i-1,j-i)+s(a i,b j ), S(i-1,j) – g, S(i,j-i) – g }

109 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Recursion formula for local alignment: S(i,j) = max { 0, S(i-1,j-i)+s(a i,b j ), S(i-1,j) – g, S(i,j-i) – g }

110 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R 0 0 0 0 0 0 C 0 I 0 V 0 M 0 R 0 E 0 H 0 Y 0 Initial matrix entries = 0

111 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R 0 0 0 0 0 0 C 0 0 I 0 V 0 M 0 R 0 E 0 H 0 Y 0 s(C,T) = -2

112 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for global alignment:

113 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment Recursion formula for local alignment:

114 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise sequence alignment For trace-back: Store positions i max and j max with S(i max,jmax ) maximal

115 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment T W L V R E A Q Y I X X C X I X V X M X R X E X X F X Y X X T W L V - R E A Q I - C I V M R E - F Y

116 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Algorithm by Smith and Waterman (1983) Implementation: e.g. BestFit in GCG package

117 6/3/2015Burkhard Morgenstern, Tunis 2007 Pair-wise local alignment Complexity: l 1 and l 2 length of sequences:computing time and memory proportional to l 1 * l 2 Time and space complexity = O(l 1 * l 2 ) Too slow for data base searching! Therefore tools like BLAST necessary for database searching

118 6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) New BLAST version (1997) Two-hit strategy Gapped BLAST Position-Specific Iterative BLAST (PSI BLAST)

119 6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) PSI BLAST: 1. search database with standard BLAST 2. take best hits and create multiple alignment 3. calculate profile from multiple alignment 4. search database again with profile as query

120 6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST)

121 6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) profile for sequence family or motif: table of amino acid/nucleotide frequencies at any position in alignment.

122 6/3/2015Burkhard Morgenstern, Tunis 2007 The Basic Local Alignment Search Tool (BLAST) Profile: frequencies of nucleotides at every position. seq1 A T T G – A T seq2 C T T G T A G seq3 A - - G T A T seq4 A T G G T G T seq5 A C T G T A C A 80 0 0 0 0 80 0 T 0 75 75 0 100 0 60 C 20 25 0 0 0 0 20 G 0 0 25 100 0 20 20

123 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 T Y I M R E A Q Y E S A Q s2 T C I V M R E A Y E s3 Y I M Q E V Q Q E R s4 W R Y I A M R E Q Y E

124 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -

125 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -

126 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -

127 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - General information in multiple alignment: Functionally important regions more conserved than non-functional regions Local sequence conservation indicates functionality!

128 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - For phylogeny reconstruction: Estimate pairwise distances between sequences (distance-based methods for tree reconstruction) Estimate evloutionary events in evolution (parsimony and maximum likelihood methods)

129 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!

130 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!

131 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment s1 - T Y I - M R E A Q Y E S A Q s2 - T C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Computer has to decide: which one is best??

132 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Questions in development of multiple-alignment programs (as in pairwise alignment): (1) What is a good alignment? → objective function (`score’) (2) How to find a good alignment? → optimization algorithm First question far more important !

133 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Traditional Objective functions: Define Score of alignments as Sum of individual similarity scores S(a,b) Gap penalties Needleman-Wunsch scoring system (1970)

134 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Traditional Objective functions Can be generalized to multiple alignment (e.g. sum-of-pair score, tree alignment) Needleman-Wunsch algorithm can also be generalized to multiple alignment, but: Very time and memory consuming! -> Heuristics needed

135 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment First question: how to score multiple alignments? Possible scoring scheme: Sum-of-pairs score

136 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......

137 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......

138 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......

139 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......

140 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Multiple alignment implies pairwise alignments: Use sum of scores of these p.a. 1aboA 36 WCEAQt..kngqGWVPSNYITPVN...... 1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP...... 1pht 51 WLNGYnettgerGDFPGTYVEYIGrkkisp 1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd..... 1vie 28 YAVESeahpgsvQIYPVAALERIN......

141 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Needleman-Wunsch coring scheme can be generalized from pair-wise to multiple alignment

142 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment

143 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Complexity: For sequences of length l 1 * l 2 * l 3 O( l 1 * l 2 * l 3 ) For n sequences ( average length l ): O( l n ) Exponential complexity!

144 6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple sequence alignment Needleman-Wunsch coring scheme can be generalized from pair-wise to multiple alignment Optimal solution not feasible: -> Heuristics necessary

145 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP

146 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Guide tree

147 6/3/2015Burkhard Morgenstern, Tunis 2007 ` Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Idea: align closely related sequences first!

148 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASFQPVAALERIN WLNYNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”

149 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”

150 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN- WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”

151 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN-------- WW--RLNDKEGYVPRNLLGLYP-------- AVVIQDNSDIKVVP--KAKIIRD------- YAVESEA---SVQ--PVAALERIN------ WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”

152 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment “Greedy” algorithm: Consider partial solution of bigger problem search best partial solution, fix solution search second-best partial solution that is consistent with first solution, fix solution Search third-best partial solution … etc. E.g.: Rucksack-Problem

153 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment WCEAQTKNGQGWVPSNYITPVN-------- WW--RLNDKEGYVPRNLLGLYP-------- AVVIQDNSDIKVVP--KAKIIRD------- YAVESEA---SVQ--PVAALERIN------ WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment, “once a gap - always a gap”

154 6/3/2015Burkhard Morgenstern, Tunis 2007 `Progressive´ Alignment Most important software program: CLUSTAL W: J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nuc. Acids. Res. 22, 4673 - 4680 (~ 18.000 citations in the literature)

155 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Problems with traditional approach: Results depend on gap penalty Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction Algorithm produces global alignments.

156 6/3/2015Burkhard Morgenstern, Tunis 2007 Tools for multiple sequence alignment Problems with traditional approach: But: Many sequence families share only local similarity E.g. sequences share one conserved motif

157 6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest EYENS ERYENS ERYAS

158 6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest E-YENS ERYENS ERYA-S

159 6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Find common motif in sequences; ignore the rest – Local alignment E-YENS ERYENS ERYA-S

160 6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Important methods for local multiple alignment: PIMA MEME/MAST Idea: expectation maximation.

161 6/3/2015Burkhard Morgenstern, Tunis 2007 Local sequence alignment Traditional alignment approaches: Either global or local methods!

162 6/3/2015Burkhard Morgenstern, Tunis 2007 New question: sequence families with multiple local similarities Neither local nor global methods appliccable

163 6/3/2015Burkhard Morgenstern, Tunis 2007 New question: sequence families with multiple local similarities Alignment possible if order conserved

164 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103 Combination of global and local methods Assemble multiple alignment from gap-free local pair-wise alignments (,,fragments“)

165 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

166 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

167 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

168 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

169 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

170 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

171 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

172 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

173 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

174 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

175 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa Consistency!

176 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg caaa--GAGTATCAcc----------CCTGaaTTGAATaa

177 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Define score of fragment f: l(f) = length of f s(f) = sum of matches (similarity values) P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences. Score w(f) = -ln P(f)

178 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Define score of fragment f: Define score of alignment as sum of scores of involved fragments No gap penalty!

179 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Score of an alignment: Goal in fragment-based alignment approach: find Consistent collection of fragments with maximum sum of weight scores

180 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment:

181 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment: recursive algorithm finds optimal chain of fragments.

182 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach ------atctaatagttaaaccccctcgtgcttag-------agatccaaac cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc-- Pair-wise alignment: recursive algorithm finds optimal chain of fragments.

183 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach ------atctaatagttaaaccccctcgtgcttag-------agatccaaac cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc-- Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming: Standard fragment-chaining algorithm Space-efficient algorithm

184 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

185 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaccctgaattgaagagtatcacataa (1) Calculate all optimal pair-wise alignments

186 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa (1) Calculate all optimal pair-wise alignments

187 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa (1) Calculate all optimal pair-wise alignments

188 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent

189 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

190 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

191 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

192 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

193 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

194 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

195 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent 1. Sort fragments according to scores 2. Include them one-by-one into growing multiple alignment – as long as they are consistent (greedy algorithm, comparable to knapsack problem)

196 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

197 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

198 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

199 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

200 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Consistency problem

201 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Consistency problem

202 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

203 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

204 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagt taaactcccccgtgcttag Cagtgcgtgtattact aacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

205 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taata-----gttaaactcccccgtgcttag Cagtgcgtgtatta-----ctaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

206 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions site x = [i,p] (sequence i, position p)

207 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions Calculate upper bound b l (x,i) and lower bound b u (x,i) for each x and sequence i

208 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions b l (x,i) and b u (x,i) updated for each new fragment in alignment

209 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Consistency bounds are to be updated for each new fragment that is included in to the growing Alignment Efficient algorithm (Abdeddaim and Morgenstern, 2002)

210 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach Advantages of segment-based approach: Program can produce global and local alignments! Sequence families alignable that cannot be aligned with standard methods

211 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach DIALIGN is available Online at BiBiServ (Bielefeld Bioinformatics Server) Downloadable UNIX/LINUX executables at BiBiServ Source code (email to BM)

212 6/3/2015Burkhard Morgenstern, Tunis 2007 Program input Program usage: > dialign2-2 [options] = multi-sequence file in FASTA-format

213 6/3/2015Burkhard Morgenstern, Tunis 2007 Program output DIALIGN 2.2.1 ************* Program code written by Burkhard Morgenstern and Said Abdeddaim e-mail contact: bmorgen@gwdg.de Published research assisted by DIALIGN 2 should cite: Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211 - 218. For more information, please visit the DIALIGN home page at http://bibiserv.techfak.uni-bielefeld.de/dialign/ program call:./dialign2-2 -nt -anc s Aligned sequences: length: ================== ======= 1) dog_il4 300 2) bla 200 3) blu 200 Average seq. length: 233.3 Please note that only upper-case letters are considered to be aligned.

214 6/3/2015Burkhard Morgenstern, Tunis 2007 Program output Alignment (DIALIGN format): =========================== dog_il4 1 cagg------ ----GTTTGA atctgataca ttgc------ ---------- bla 1 ctga------ ---------- ---------- --------GC CAAGTGGGAA blu 1 ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG 0000000000 0000000000 0000000000 0000000011 1111111111 dog_il4 25 ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC bla 17 ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc---- blu 51 ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC 0000000000 0000000000 0000000000 0000000000 0000000000 dog_il4 63 GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT bla 63 ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG blu 101 CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ---------- 0000000000 0000000000 0009999999 9999999888 8888888888 dog_il4 113 TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG bla 90 ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC blu 140 ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC 8888888888 8888888800 0000000000 0007777777 7777777777

215 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

216 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

217 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

218 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

219 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

220 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

221 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

222 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

223 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

224 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaac----------ggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

225 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

226 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag------ cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg caaa--GAGTATCAcc----------CCTGaaTTGAATaa--

227 6/3/2015Burkhard Morgenstern, Tunis 2007 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

228 6/3/2015Burkhard Morgenstern, Tunis 2007 Program output Alignment (DIALIGN format): =========================== dog_il4 1 cagg------ ----GTTTGA atctgataca ttgc------ ---------- bla 1 ctga------ ---------- ---------- --------GC CAAGTGGGAA blu 1 ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG 0000000000 0000000000 0000000000 0000000011 1111111111 dog_il4 25 ---------- --ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC bla 17 ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc---- blu 51 ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC 0000000000 0000000000 0000000000 0000000000 0000000000 dog_il4 63 GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT bla 63 ---------- ---------- ---TTTCCCA TGTGCTCCAT GGTGGAATGG blu 101 CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat- ---------- 0000000000 0000000000 0009999999 9999999888 8888888888 dog_il4 113 TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG bla 90 ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC blu 140 ---------- ---------- ---------- ---GGGGTGG CCTTAGGCTC 8888888888 8888888800 0000000000 0007777777 7777777777

229 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel algorithm for multiple sequence alignment, J. Mol. Biol.

230 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Problem with “progressive” approaches: Strictly global alignments Use only pair-wise comparison

231 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Idea: Start with local and global pair-wise alignments (“primary library” of alignments) Construct “scondary library” of residues that are indirectly aligned by primary library. Re-score residue pairs Construct final alignment with “progressive” method

232 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE Advantage: Combination of local and global approaches Less sensitive against mis-alignments in progressive proceedure

233 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE

234 6/3/2015Burkhard Morgenstern, Tunis 2007

235 6/3/2015Burkhard Morgenstern, Tunis 2007 T-COFFEE T-COFFEE and DIALIGN: Less sensitive to spurious pairwise similarities Can handle local homologies better than CLUSTAL

236 6/3/2015Burkhard Morgenstern, Tunis 2007 Most multi-alignment approaches automated, i.e. based on algorithmic rules. Two components: Objective function: assess alignment quality Optimization algorithm: find optimal or near-optimal alignment

237 6/3/2015Burkhard Morgenstern, Tunis 2007 Fully automated alignment programs necessary f no expert knowledge available if large amounts of data to be analyzed But: Often no biologically reasonable results Often additional information about homologies etc. available

238 6/3/2015Burkhard Morgenstern, Tunis 2007 Idea for improved alignment Use expert knowledge to influence alignment procedure DIALIGN with user-defined anchor points

239 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences Alignment of large genomic sequences to identify functional elements (phylogenetic footprinting) Göttgens et al., 2000, 2001, 2002, … Pollard et al., 2004 DIALIGN, MGA, PipMaker, LAGAN, AVID, Mummer, WABA, …

240 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences Gene-regulatory sites identified by mulitple sequence alignment (phylogenetic footprinting)

241 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences

242 6/3/2015Burkhard Morgenstern, Tunis 2007 DIALIGN alignment of human and murine genomic sequences

243 6/3/2015Burkhard Morgenstern, Tunis 2007 DIALIGN alignment of tomato and Thaliana genomic sequences

244 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004)

245 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster:

246 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but

247 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but Entire genes totally mis-aligned

248 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences DIALIGN used by tracker for phylogenetic footprinting (Prohaska et al., 2004) Alignment of Hox gene cluster: DIALIGN able to identify small regulatory elements, but Entire genes totally mis-aligned Reason for mis-alignment: duplications !

249 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences The Hox gene cluster: 4 Hox gene clusters in pufferfish. 14 genes, different genes in different clusters!

250 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of large genomic sequences The Hox gene cluster: Complete mis-alignment of entire genes!

251 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2

252 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Conserved motivs; no similarity outside motifs

253 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences

254 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences

255 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in two sequences

256 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Mis-alignment would have lower score!

257 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence

258 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence

259 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence Possible mis-alignment

260 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3

261 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3

262 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3

263 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Duplication in one sequence S3S3

264 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Consistency problem S3S3

265 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 More plausible alignment – and higher score : S3S3

266 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Consistency problem S3S3

267 6/3/2015Burkhard Morgenstern, Tunis 2007 Alignment of sequence duplications S1S1 S2S2 Alternative alignment; probably biologically wrong; lower numerical score! S3S3

268 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment Biologically meaningful alignment not possible by automated approaches. Idea: use expert knowledge to guide alignment procedure User defines a set anchor points that are to be „respected“ by the alignment procedure

269 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

270 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

271 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Use known homology as anchor point

272 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Use known homology as anchor point Anchor point = anchored fragment (gap-free pair of segments) Remainder of sequences aligned automatically

273 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT Alignment of anchored positions a and b not enforced – a and b may be un-aligned –, but: a is only residue that can be aligned to b Residues left of a aligned with residues left of b

274 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment -------NLF VALYDFVASG DNTLSITKGE klrvlgynhn iihredkGVI YALWDYEPQN DDELPMKEGD cmt------- Anchored alignment

275 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS Anchor points in multiple alignment

276 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQND DELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS Anchor points in multiple alignment

277 6/3/2015Burkhard Morgenstern, Tunis 2007 Anchored sequence alignment -------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT------- -------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS-- Anchored multiple alignment

278 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Goal: Find optimal alignment (=consistent set of fragments) under costraints given by user- specified anchor points!

279 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Algorithmic questions

280 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

281 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Algorithmic questions

282 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences Algorithmic questions

283 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions Algorithmic questions

284 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions length Algorithmic questions

285 6/3/2015Burkhard Morgenstern, Tunis 2007 Additional input file with anchor points: 1 3 215 231 5 4.5 2 3 34 78 23 1.23 1 4 317 402 8 8.5 Sequences start positions length score Algorithmic questions

286 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Requirements: Anchor points need to be consistent! – if necessary: select consistent subset from user-specified anchor points

287 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

288 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

289 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Inconsistent anchor points!

290 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaat---agttaaactcccccgtgcttag Cagtgcgtgtattac-taacggttcaatcgcg caaagagtatcacccctgaattgaataa Inconsistent anchor points!

291 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Requirements: Anchor points need to be consistent! – if necessary: select consistent subset from user-specified anchor points Find alignment under constraints given by anchor points!

292 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Use data structures from multiple alignment

293 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

294 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Greedy procedure for multiple alignment

295 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Greedy procedure for multiple alignment

296 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Question: which positions are still alignable ?

297 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x For each position x and each sequence S i exist an upper bound ub(x,i) and a lower bound lb(x,i) for residues y in S i that are alignable with x

298 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x For each position x and each sequence S i exist an upper bound ub(x,i) and a lower bound lb(x,i) for residues y in S i that are alignable with x

299 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure

300 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i)

301 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure

302 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x ub(x,i) and lb(x,i) updated during greedy procedure

303 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Anchor points treated like fragments in greedy algorithm: Sorted according to user-defined scores Accepted if consistent with previously accepted anchors ub(x,i) and lb(x,i) updated during greedy procedure Resulting values of ub(x,i) and lb(x,i) used as initial values for alignment procedure

304 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i)

305 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions atctaatagttaaactcccccgtgcttag S i cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa x Initial values of lb(x,i), ub(x,i) calculated using anchor points

306 6/3/2015Burkhard Morgenstern, Tunis 2007 Algorithmic questions Ranking of anchor points to prioritize anchor points, e.g. anchor points from verified homologies -- higher priority automatically created anchor points (using CHAOS, BLAST, … ) -- lower priority

307 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster

308 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Use gene boundaries as anchor points

309 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Use gene boundaries as anchor points + CHAOS / BLAST hits

310 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster no anchoring anchoring Ali. Columns 2 seq 2958 3674 3 seq 668 1091 4 seq 244 195 Score 1166 1007 CPU time 4:22 0:19

311 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Hox gene cluster Example: Teleost Hox gene cluster: Score of anchored alignment 15 % higher than score of non-anchored alignment ! Conclusion: Greedy optimization algorithm does a bad job!

312 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Improvement of Alignment programs Two possible reasons for mis-alignments: Wrong objective function: Biologically correct alignment gets bad numerical score Bad optimization algorithms: Biologically correct alignment gets best numerical score, but algorithm fails to find this alignment

313 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: Improvement of Alignment programs Two possible reasons for mis-alignments: Anchored alignments can help to decide

314 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment

315 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aa----CCCC AGC---GUAa gucgcuaucc a cacucuCCCA AGC---GGAG Aac------- - ccg----CCA AaagauGGCG Acuuga---- - non-anchored alignment

316 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aa----CCCC AGC---GUAa gucgcuaucc a cacucuCCCA AGC---GGAG Aac------- - ccg----CCA AaagauGGCG Acuuga---- - structural motif mis-aligned

317 6/3/2015Burkhard Morgenstern, Tunis 2007 Application: RNA alignment aaCCCCAGCG UAAGUCGCUA UCca-- --CACUCUCC CAAGCGGAGA AC---- ----CCGCCA AAAGAUGGCG ACuuga 3 conserved nucleotides as anchor points

318 6/3/2015Burkhard Morgenstern, Tunis 2007 WWW interface at GOBICS (Göttingen Bioinformatics Compute Server)

319 6/3/2015Burkhard Morgenstern, Tunis 2007 WWW interface at GOBICS (Göttingen Bioinformatics Compute Server)

320 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes

321 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes Goal: find location and structure of protein-coding genes in eukaryotic genome sequences.

322 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes attgccagtacgtagctagctacacgtatgctattacggatctgtagcttagcgtatct gtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgt agtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagctt cgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacg tacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagc atctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggc tagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttct aggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcttagtcgtgtagt cttgatctacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtag tcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtct atggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatt tttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtat gctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgta gtcgtagtcgttagcatctgtatgtacgtacgtatttttctagagcttcgtagtctatg gctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtattttt ctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgtt agctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgta gtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgta gtctatggctagtcgtagtcgtagtcgttagcatctgtatggtcgtagtcgttagcatc tgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctag

323 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes attgccagtacgtagctagctacacgtatgctattacggatctgtagcttagcgtatct gtatgctgttagctgtacgtacgtatttttctagagcttcgtagtctatggctagtcgt agtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagctt cgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacg tacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagc atctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggc tagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatttttct aggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcttagtcgtgtagt cttgatctacgtacgtatttttctagagcttcgtagtctatggctagtcgtagtcgtag tcgttagcatctgtatgctgttagctgtacgtacgtatttttctagagcttcgtagtct atggctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtatt tttctaggggagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtat gctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgta gtcgtagtcgttagcatctgtatgtacgtacgtatttttctagagcttcgtagtctatg gctagtcgtagtcgtagtcgttagcatctgtatgctgttagctgtacgtacgtattttt ctagagcttcgtagtctatggctagtcgtagtcgtagtcgttagcatctgtatgctgtt agctgtacgtacgtatttttctaggggagcttcgtagtctatggctagtcgtagtcgta gtcgttagcatctgtatgctgttagctgtacgtacgtatttttctaggggagcttcgta gtctatggctagtcgtagtcgtagtcgttagcatctgtatggtcgtagtcgttagcatc tgtatgctgttagctgtacgtacgtatttttctaggggagcttcgtagtctatggctag

324 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes

325 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene predictions for eukaryotes Three different approaches to computational gene- finding: Intrinsic: use statistical information about known genes (Hidden Markov Models) Extrinsic: compare genomic sequence with known proteins / genes Cross-species sequence comparison: search for similarities among genomes

326 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Generative probabilistic model for sequence of observations („symbols“). Finite set of states States can emit symbols Transitions between states possible Sequence generated by path between states

327 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Example: The occasionally dishonest casino. 3 5 6 6 6 4 6 5 1 6 5 1 2 F F U U U U U F F F F F F Possible states: fair (F); unfair (U); begin (B); end (E)

328 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Assumptions: Emission probabilities known; depend only on current state. Transition probabilities known, depend only on current state

329 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction F U EB

330 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction 3 5 6 6 6 4 6 5 1 6 5 1 2 s B F F U U U U U F F F F F F E φ For sequence s and parse φ: P(φ) probability of φ P(φ,s) joint probability of φ and s = P(φ) * P(s|φ) P(φ|s) a-posteriori probability of φ

331 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction 3 5 6 6 6 4 6 5 1 6 5 1 2 B F F U U U U U F F F F F F E Goal: find path φ with maximum a-posteriori probability P(φ|s) Idea: find path that maximizes joint probability P(φ,s) by dynamic programming (Viterbi algorithm)

332 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Application to gene prediction: A T A A T G C C T A G T C s (DNA) Z Z Z E E E E E E I I I I φ (parse) Introns, exons etc modeled as states in GHMM („generalized HMM“) Given sequence s, find parse that maximizes P(φ|s) (S. Karlin and C. Burge, 1997)

333 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction Application to gene prediction: A T A A T G C C T A G T C s (DNA) Z Z Z E E E E E E I I I I φ (parse) Introns, exons etc modeled as states in GHMM („generalized HMM“) Given sequence s, find parse that maximizes P(φ|s) (S. Karlin and C. Burge, 1997)

334 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Basic model for GHMM-based intrinsic gene finding comparable to GenScan (M. Stanke)

335 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS

336 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS

337 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Features of AUGUSTUS: Intron length model Initial pattern for exons Similarity-based weighting for splice sites Interpolated HMM Internal 3’ content model

338 6/3/2015Burkhard Morgenstern, Tunis 2007 Hidden-Markov-Models (HMM) for gene prediction A T A A T G C C T A G T C s (DNA) Z Z Z E E E E I I I I φ (parse) Explicit intron length model computationally expensive.

339 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Intron length model: Explicit length distribution for short introns Geometric tail for long introns Intron (fixed) Exon Intron (expl.) Exon Intron (geo.)

340 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS

341 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS Extension of AUGUSTUS using include extrinsic information: Protein sequences EST sequences Syntenic genomic sequences User-defined constraints

342 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Comparison of genomic sequences (human and mouse)

343 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

344 6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting

345 6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting

346 6/3/2015Burkhard Morgenstern, Tunis 2007 catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Standard score: Consider length, # matches, compute probability of random occurrence Gene prediction by phylogenetic footprinting

347 6/3/2015Burkhard Morgenstern, Tunis 2007 Translation option: catcatatcttatcttacgttaactcccccgt cagtgcgtgatagcccatatccgg Gene prediction by phylogenetic footprinting

348 6/3/2015Burkhard Morgenstern, Tunis 2007 Translation option: L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I DNA segments translated to peptide segments; fragment score based on peptide similarity: Calculate probability of finding a fragment of the same length with (at least) the same sum of BLOSUM values Gene prediction by phylogenetic footprinting

349 6/3/2015Burkhard Morgenstern, Tunis 2007 P-fragment (in both orientations) L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I N-fragment catcatatc ttatcttacgtt aactcccccgtgct || | | | cagtgcgtg atagcccatatc cg For each fragment f three probability values calculated; Score of f based on smallest P value. Gene prediction by phylogenetic footprinting

350 6/3/2015Burkhard Morgenstern, Tunis 2007 P-fragment (in both orientations) L S Y V catcatatc tta tct tac gtt aactcccccgt cagtgcgtg ata gcc cat atc cgg I A H I N-fragment catcatatc ttatcttacgtt aactcccccgtgct || | | | cagtgcgtg atagcccatatc cg P-fragments associated with strand and reading frame! Gene prediction by phylogenetic footprinting

351 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

352 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting AGenDA: Alignment-based Gene Detection Algorithm

353 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Fragments in DIALIGN alignment

354 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Build cluster of fragments

355 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Identify conserved splice sites

356 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting Candidate exons bounded by conserved splice sites Find optimal chain of candidate exons

357 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

358 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

359 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

360 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting

361 6/3/2015Burkhard Morgenstern, Tunis 2007 Gene prediction by phylogenetic footprinting AGenDA GenScan 64 % 12 % 17 %

362 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Extended GHMM using extrinsic information Additional input data: collection h of `hints’ about possible gene structure φ for sequence s Consider s, φ and h result of random process. Define probability P(s,h,φ) Find parse φ that maximizes P(φ|s,h) for given s and h.

363 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints created using Alignments to EST sequences Alignments to protein sequences Combined EST and protein alignment (EST alignments supported by protein alignments) Alignments of genomic sequences User-defined hints

364 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment to EST: hint to (partial) exon EST G1

365 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ EST alignment supported by protein: hint to exon (part), start codon EST G1 Protein

366 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment to ESTs, Proteins: hints to introns, exons ESTs, Protein G1

367 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Alignment of genomic sequences: hint to (partial) exon G2 G1

368 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Consider different types of hints: type of hints: start, stop, dss, ass, exonpart, exon, introns Hint associated with position i in s (exons etc. associated with right end position) max. one hint of each type allowed per position in s Each hint associated with a grade g that indicates its source or reliability.

369 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ h i,t = information about hint of type t at position i h i,t = $ if no hint of type t available at i h i,t = [grade, strand, (length, reading frame)] if hint available (hints created by protein alignments or DIALIGN contain information about reading frame)

370 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Standard program version, without hints A T A A T G C C T A G T C s (sequence) Z Z Z E E E E E E I I I I φ (parse) Find parse that maximizes P(φ|s)

371 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ AUGUSTUS+ using hints A T A A T G C C T A G T C s (sequence) $ $ $ $ $ $ $ X $ $ $ $ $ h (type 1) $ $ $ $ $ $ $ $ $ $ $ $ $ h (type 2) $ $ $ $ X $ $ $ $ $ $ $ $ h (type 3).. Z Z Z E E E E E E I I I I φ (parse) Find parse that maximizes P(φ|s,h)

372 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ As in standard HMM theory: maximize joint probability P(φ,s,h) How to define P(φ,s,h) ?

373 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).

374 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).

375 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ General assumption: Hints of different types t and at different positions i independent of each other (for redundant hints: ignore „weaker“ types).

376 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Assumption: P(h i,t |φ,s) depends on type t, grade g and whether h i,t is compatible with φ or s. Example: h i,t hint to exon E h i,t compatible with parse φ if E part of φ. h i,t compatible with sequence s if start and stop codons exist according to E and if no internal stop codon in E exists

377 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ For given g and t: 3 possible values for P(h i,t |φ,s) P(h i,t |φ,s) = q + (t,g) if h i,t compatible with φ P(h i,t |φ,s) = q - (t,g) if h i,t compatible with s but not compatible with φ P(h i,t |φ,s) = 0 if h i,t not compatible with s Values learned from training data

378 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Results: Gene (sub-)structures supported by hints receive bonus compared to non-supported structures Gene (sub-)structures not supported by hints receive malus (M. Stanke et al. 2006, BMC Bioinformatics)

379 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

380 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ h, h’ collections of hints; h’ i,t = h i,t for (i,t) ≠ (I,T) h’ I,T ≠ h I,T = $; g grade of h’ I,T φ +, φ - gene structures on s h’ IT compatible with φ +, but not with φ -

381 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

382 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

383 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

384 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

385 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

386 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

387 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+

388 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Result: i.e. structure φ +, which is compatible with additional hint h’ IT receives relative bonus

389 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Results (gene level) on data set sag178 % SN % SP Augustus 42 38 GenScan 18 14 GeneID 17 17 HMMGene 20 7 Aug. + EST 49 46 Aug. + prot 71 68 Aug. combined 68 65 Aug. all 82 79 GenomeScan 37 38 TwinScan 20 25

390 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Using hints from DIALIGN alignments: 1. Obtain large human/mouse sequence pairs (up to 50kb) from UCSC 2. Run CHAOS to find anchor points 3. Run DIALIGN using CHAOS anchor points 4. Create hints h from DIALIGN fragments 5. Run AUGUSTUS with hints

391 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints from DIALIGN fragments: Segment covered by peptide fragment minus 33 bp at both ends defines exon part hint on all 6 reading frames.

392 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ Hints from DIALIGN fragments: Consider fragments with score ≥ 20 Distinguish high scores (≥ 45) from low scores Consider reading frame given by DIALIGN Consider strand given by DIALIGN => 2*2*2 = 8 grades

393 6/3/2015Burkhard Morgenstern, Tunis 2007 AUGUSTUS+ AUGUSTUS best ab-initio method at EGASP

394 6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results

395 6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results

396 6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results

397 6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results

398 6/3/2015Burkhard Morgenstern, Tunis 2007 EGASP test results

399 6/3/2015Burkhard Morgenstern, Tunis 2007 Ongoing projects Brugia malayi (TIGR) Aedes aegypti (TIGR) Schistosoma mansoni (TIGR) Tetrahymena thermophilia (TIGR) Galdieria Sulphuraria (Michigan State Univ.) Coprinus cinereus (Univ. Göttingen) Tribolium castaneum (Univ. Göttingen)


Download ppt "6/3/2015Burkhard Morgenstern, Tunis 2007 Multiple Alignment and Motif Searching Burkhard Morgenstern Universität Göttingen Institute of Microbiology and."

Similar presentations


Ads by Google