Methods course Multiple sequence alignment and Reconstruction of phylogenetic trees Burkhard Morgenstern, Fabian Schreiber Göttingen, October/November.


1 Methods course Multiple sequence alignment and Reconstruction of phylogenetic trees Burkhard Morgenstern, Fabian Schreiber Göttingen, October/November 2007

2 Tools for multiple sequence alignment Multiple alignment basis of (almost) all methods for sequence analysis in bioinformatics

3 Tools for multiple sequence alignment T Y I M R E A Q Y E T C I V M R E A Y E

4 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V M R E A - Y E

5 Tools for multiple sequence alignment T Y I M R E A Q Y E T C I V M R E A Y E Y I M Q E V Q Q E Y I A M R E Q Y E

6 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V M R E A - Y E Y - I - M Q E V Q Q E Y - I A M R E - Q Y E

7 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V M R E A - Y E - Y I - M Q E V Q Q E Y - I A M R E - Q Y E Astronomical number of possible alignments!

8 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V - M R E A Y E - Y I - M Q E V Q Q E Y - I A M R E - Q Y E Astronomical number of possible alignments!

9 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V M R E A - Y E - Y I - M Q E V Q Q E Y - I A M R E - Q Y E Which one is the best?
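
The "astronomical number" from the preceding slides can be made concrete for the pairwise case. The following is a minimal sketch, assuming the usual convention that an alignment is a sequence of columns of three kinds (residue/residue, residue/gap, gap/residue); the function name `count_alignments` is illustrative and not part of the course material.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of distinct gapped alignments of two sequences of lengths m and n."""
    if m == 0 or n == 0:
        # Only one possibility left: put the remaining residues opposite gaps.
        return 1
    # The last column is residue/residue, residue/gap, or gap/residue.
    return (count_alignments(m - 1, n - 1)
            + count_alignments(m - 1, n)
            + count_alignments(m, n - 1))

# Two sequences of length 10, as on slide 3.
n_pairwise = count_alignments(10, 10)
```

Under this counting convention, two sequences of length 10 already allow more than eight million alignments, and the number grows much faster once more sequences are added.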

10 Tools for multiple sequence alignment Questions in development of alignment programs: (1) What is a good alignment? Objective function ("score") (2) How to find a good alignment? Optimization algorithm

11 Tools for multiple sequence alignment What is a biologically good alignment ??

12 Tools for multiple sequence alignment Criteria for alignment quality: 1. 3D-Structure: align residues at corresponding positions in 3D structure of protein! 2. Evolution: align residues with common ancestors!

13 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V M - R E A Y E - Y I - M Q E V Q Q E - Y I A M R E - Q Y E Alignment hypothesis about sequence evolution Search for most plausible hypothesis!

14 Tools for multiple sequence alignment T Y I - M R E A Q Y E T C I V - M R E A Y E - Y I - M Q E V Q Q E - Y I A M R E - Q Y E Alignment hypothesis about sequence evolution Search for most plausible hypothesis!

15 Tools for multiple sequence alignment Compute for amino acids a and b: probability p_{a,b} of substitution a → b (or b → a), frequency q_a of a. Define similarity score s(a,b) based on p_{a,b}, q_a. Result: similarity matrix (substitution matrix), e.g. PAM (Dayhoff matrix), BLOSUM, …
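
The slide does not spell out how s(a,b) is derived from these quantities. One common choice is a log-odds ratio, s(a,b) = log(p_{a,b} / (q_a q_b)); the sketch below assumes that form. All numbers are hypothetical toy values, not entries of PAM or BLOSUM.

```python
import math

def log_odds_score(p_ab: float, q_a: float, q_b: float) -> float:
    """Log-odds similarity score (base 2).

    Positive if a and b are found aligned more often than expected by chance,
    negative if less often.
    """
    return math.log2(p_ab / (q_a * q_b))

# Hypothetical toy values, for illustration only.
q = {"I": 0.06, "L": 0.10}   # assumed background frequencies
p_IL = 0.012                 # assumed frequency of the aligned pair (I, L)
score_IL = log_odds_score(p_IL, q["I"], q["L"])
```

With these toy values the pair is observed twice as often as chance would predict, so the score is about +1 bit.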

16

17 Tools for multiple sequence alignment

18 Traditional objective functions: Define score of an alignment as the sum of individual similarity scores s(a,b) of aligned amino acid residues, minus a gap penalty g for each gap in the alignment. An optimal alignment can be calculated for two sequences, but in practice not for > 8 sequences
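
For two sequences, the optimum of this objective function can be found by dynamic programming. The sketch below computes only the optimal score (no traceback) in the style of the Needleman-Wunsch algorithm, assuming a penalty g per gap position; `toy_s` is a hypothetical stand-in for a real substitution matrix.

```python
def nw_score(x: str, y: str, s, g: float) -> float:
    """Optimal global alignment score of x and y.

    s(a, b) is the similarity score, g the penalty per gap position.
    """
    # prev[j]: best score for aligning the empty prefix of x with y[:j]
    prev = [-g * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-g * i]  # x[:i] aligned against gaps only
        for j in range(1, len(y) + 1):
            cur.append(max(
                prev[j - 1] + s(x[i - 1], y[j - 1]),  # align x[i-1] with y[j-1]
                prev[j] - g,                          # x[i-1] opposite a gap
                cur[j - 1] - g,                       # y[j-1] opposite a gap
            ))
        prev = cur
    return prev[-1]

def toy_s(a: str, b: str) -> int:
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1

best = nw_score("TYWIV", "TLV", toy_s, g=1)
```

The table has (len(x)+1) by (len(y)+1) cells; the same idea applied to N sequences needs an N-dimensional table, which is why exact optimization becomes impractical for more than a handful of sequences.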

19 Example: alignment T Y W I V / T - - L V has Score = s(T,T) + s(I,L) + s(V,V) - 2g
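
The example score can be reproduced with a few lines of code. This is a sketch under assumptions: the values returned by `toy_s` and the penalty g = 3 are made up for illustration, and each gap character is charged g, as in the example.

```python
def alignment_score(row1: str, row2: str, s, g: float) -> float:
    """Sum of similarity scores of aligned residues minus g per gap character."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g          # column contains a gap
        else:
            total += s(a, b)    # column aligns two residues
    return total

def toy_s(a: str, b: str) -> int:
    """Hypothetical scores: identical 2, the similar pair I/L 1, anything else -1."""
    if a == b:
        return 2
    if {a, b} == {"I", "L"}:
        return 1
    return -1

# s(T,T) + s(I,L) + s(V,V) - 2g = 2 + 1 + 2 - 6
example = alignment_score("TYWIV", "T--LV", toy_s, g=3)
```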

20 Tools for multiple sequence alignment Most commonly used heuristic for multiple alignment: Progressive alignment (mid 1980s): Idea: calculate multiple alignment as series of pairwise alignments of sequences and profiles Use guide tree to determine order of pairwise alignments

21 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP

22 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Guide tree

23 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASFQPVAALERIN WLNYNEERGDFPGTYVEYIGRKKISP Profile alignment: once a gap, always a gap

24 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment: once a gap, always a gap

25 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN- WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASVQ--PVAALERIN------ WLN-YNEERGDFPGTYVEYIGRKKISP Profile alignment: once a gap, always a gap

26 "Progressive" alignment WCEAQTKNGQGWVPSNYITPVN-------- WW--RLNDKEGYVPRNLLGLYP-------- AVVIQDNSDIKVVP--KAKIIRD------- YAVESEA---SVQ--PVAALERIN------ WLN-YNE---ERGDFPGTYVEYIGRKKISP Profile alignment: once a gap, always a gap

27 CLUSTAL W Most important software program: CLUSTAL W: J. Thompson, T. Gibson, D. Higgins (1994, Nuc. Acids Res.) (22,327 citations in the literature, Oct 2007)

28 Tools for multiple sequence alignment Problems with traditional approach: Results depend on gap penalty Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction Algorithm produces global alignments.

29 Tools for multiple sequence alignment Problems with traditional approach: But: Many sequence families share only local similarity E.g. sequences share one conserved motif

30 Local sequence alignment Find common motif in sequences; ignore the rest EYENS ERYENS ERYAS

31 Local sequence alignment Find common motif in sequences; ignore the rest E-YENS ERYENS ERYA-S

32 Local sequence alignment Find common motif in sequences; ignore the rest – Local alignment E-YENS ERYENS ERYA-S

33 Gibbs Motif Sampler Local multiple alignment without gaps: E.g. Gibbs sampling C.E. Lawrence et al. (1993, Science)

34 Traditional alignment approaches: Either global or local methods!

35 New question: sequence families with multiple local similarities Neither local nor global methods applicable

36 New question: sequence families with multiple local similarities Alignment possible if order conserved

37 The DIALIGN approach Morgenstern, Dress, Werner (1996, Proc. Natl. Acad. Sci.) Combination of global and local methods Assemble multiple alignment from gap-free local pairwise alignments ("fragments")
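
To illustrate what a gap-free local pairwise alignment ("fragment") is, the sketch below scans every diagonal of the comparison matrix of two sequences and returns the best-scoring contiguous segment pair. It is not DIALIGN's method: the plain match/mismatch scores are an assumption made for brevity and do not reproduce the fragment weighting of the actual program, and `best_fragment` is a hypothetical name.

```python
def best_fragment(x: str, y: str, match: int = 1, mismatch: int = -1):
    """Return (score, i, j, length) of the best gap-free local alignment
    x[i:i+length] versus y[j:j+length]; (0, 0, 0, 0) if nothing scores > 0."""
    best = (0, 0, 0, 0)
    for d in range(-(len(y) - 1), len(x)):   # diagonal offset d = i - j
        i, j = max(d, 0), max(-d, 0)         # first cell on this diagonal
        run_score, run_start = 0, 0
        k = 0
        while i + k < len(x) and j + k < len(y):
            if run_score <= 0:
                # A non-positive prefix never helps: start a new segment here.
                run_score, run_start = 0, k
            run_score += match if x[i + k] == y[j + k] else mismatch
            if run_score > best[0]:
                best = (run_score, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best

score, i, j, length = best_fragment("aaacgtgctt", "ggcgtgcaa")
fragment = ("aaacgtgctt"[i:i + length], "ggcgtgcaa"[j:j + length])
```

In this toy input the shared segment "cgtgc" is recovered, while the unrelated flanking regions are left unaligned.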

38 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

39 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

40 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

41 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

42 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

43 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

44 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

45 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

46 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

47 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa

48 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac----------gg-ttcaatcgcg caaa--gagtatcacc----------cctgaattgaataa Consistency!

49 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg caaa--GAGTATCAcc----------CCTGaaTTGAATaa

50 The DIALIGN approach Advantages of segment-based approach: Program can produce global and local alignments! Sequence families alignable that cannot be aligned with standard methods

51 T-COFFEE C. Notredame, D. Higgins, J. Heringa (2000, J. Mol. Biol.) Combination of global and local methods

52 T-COFFEE SeqA GARFIELD THE LAST FAT CAT SeqB GARFIELD THE FAST CAT SeqC GARFIELD THE VERY FAST CAT SeqD THE FAT CAT

53 T-COFFEE SeqA GARFIELD THE LAST FAT CAT SeqB GARFIELD THE FAST CAT SeqC GARFIELD THE VERY FAST CAT SeqD THE FAT CAT

54 T-COFFEE

55 Mixing Heterogeneous Data With T-Coffee [Figure: inputs Local Alignment, Global Alignment, Multiple Alignment, Structural, Specialist are combined into one Multiple Sequence Alignment]

56

57 T-COFFEE Idea: 1. Build library of pairwise alignments 2. Alignment from seq i, j and seq j, k supports alignment from seq i, k.
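
Step 2 can be sketched as a "library extension": a pair of residues from sequences i and k gains weight whenever both are aligned to the same residue of a third sequence j. The sketch assumes the triplet weighting described for T-Coffee (add the smaller of the two link weights for every intermediate residue). The data layout, a dictionary keyed by pairs of (sequence name, position), is hypothetical and chosen only for this example.

```python
from collections import defaultdict

def extend_library(library):
    """library: {frozenset({(seq, pos), (seq, pos)}): weight} for residue pairs
    aligned in some pairwise alignment. Returns a new, extended library."""
    # For every residue, remember which residues it is linked to and how strongly.
    neighbours = defaultdict(dict)
    for pair, w in library.items():
        a, b = tuple(pair)
        neighbours[a][b] = w
        neighbours[b][a] = w

    extended = defaultdict(float, library)
    for b, links in neighbours.items():          # b is the intermediate residue
        residues = list(links)
        for idx, a in enumerate(residues):
            for c in residues[idx + 1:]:
                if a[0] != c[0]:                  # a and c from different sequences
                    extended[frozenset((a, c))] += min(links[a], links[c])
    return dict(extended)

# Hypothetical weights for the first residue of three sequences A, B, C.
library = {
    frozenset({("A", 0), ("B", 0)}): 88,
    frozenset({("B", 0), ("C", 0)}): 77,
    frozenset({("A", 0), ("C", 0)}): 10,
}
extended = extend_library(library)
```

Here the weak direct link between A and C (10) is reinforced by the path through B, which contributes min(88, 77) = 77.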

58 T-COFFEE T-COFFEE Less sensitive to spurious pairwise similarities Can handle local homologies better than CLUSTAL

59 Evaluation of multi-alignment methods Alignment evaluation by comparison to trusted benchmark alignments. "True" alignment known by information about structure or evolution.
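
One simple way to compare a program's output with a trusted reference is to ask which fraction of the residue pairs aligned in the reference is also aligned in the test alignment. The sketch below implements this simplified sum-of-pairs measure; it is an illustration and is not claimed to match the scoring program shipped with any particular benchmark.

```python
from itertools import combinations

def aligned_pairs(rows, gap_chars="-."):
    """Set of residue pairs ((seq_idx, residue_idx), (seq_idx, residue_idx))
    that share a column in the alignment given by rows."""
    counters = [0] * len(rows)          # next residue index for each sequence
    pairs = set()
    for column in zip(*rows):
        present = []
        for s, ch in enumerate(column):
            if ch not in gap_chars:
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs

def sum_of_pairs_score(test_rows, reference_rows) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    reference = aligned_pairs(reference_rows)
    return len(reference & aligned_pairs(test_rows)) / len(reference)

reference = ["ACG", "ACG"]
score_shifted = sum_of_pairs_score(["ACG-", "-ACG"], reference)
score_partial = sum_of_pairs_score(["AC-G", "ACG-"], reference)
```

Residues are identified by their index within the ungapped sequence, so alignments with different numbers of columns can be compared directly.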

60 Evaluation of multi-alignment methods BAliBASE reference alignments
1aboA  1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB  1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht   1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA  1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie   1 .drvrkksga.........awqGQIVGWYctnlt.............peG
1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
1pht  51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie  28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE

61 Result: DIALIGN best method for distantly related sequences, T-Coffee best for globally related proteins

62 Evaluation of multi-alignment methods Conclusion: no single best multi alignment program! Advice: try different methods!

63 Tools for phylogeny reconstruction Two approaches covered in this course: Distance methods, e.g. Neighbour-Joining Maximum Likelihood Other important methods (not covered in this course): Maximum parsimony Bayesian approaches

64 Tools for phylogeny reconstruction Phylogenetic trees: rooted trees unrooted trees Many methods produce unrooted trees: find root using outgroup!

65 Phylogenetic reconstruction: an example Biological question: Are sponges mono-/paraphyletic? Organism of interest: sponge

66 Build dataset Query sequence: DNA/protein sequence from sponge gene → search for homologs using e.g. BLAST → hits from search: putative homologs

67 Sequence alignment Dataset (hits from search: putative homologs) → sequence alignment Alignment tools: ClustalW, T-Coffee, DIALIGN, ... many more Used to relate the sequences to one another

68 Estimate phylogeny Alignment → phylogenetic tree Phylogeny methods: Distance-based: NJ, UPGMA Parsimony: maximum parsimony (PHYLIP/PAUP) Statistical: maximum likelihood (PhyML), Bayesian inference (MrBayes)

69 Interpret results Hypothesis: Sponges are monophyletic

70 Tools for phylogeny reconstruction Distance methods: For N sequences S_1, …, S_N: calculate distance d(i,j) for any two sequences S_i and S_j. Goal: find tree that represents all distances d(i,j) as closely as possible. To calculate distances d(i,j): construct multiple alignment of input sequences, consider substitutions implied by alignment
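
As a sketch of how d(i,j) might be obtained from an alignment: count the fraction of differing positions between two aligned rows, then correct for multiple substitutions at the same site. The slide does not name a correction; the Jukes-Cantor formula for DNA used here is an assumption, and the function names are illustrative.

```python
import math

def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions among columns without a gap in either row."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in columns) / len(columns)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for DNA; only defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(rows):
    """Matrix of pairwise corrected distances d(i, j) for aligned DNA rows."""
    n = len(rows)
    return [[0.0 if i == j else jukes_cantor(p_distance(rows[i], rows[j]))
             for j in range(n)] for i in range(n)]

d = distance_matrix(["ACGTACGT", "ACGTACGA", "ACCTACGA"])
```

The corrected distance is always at least as large as the observed fraction of differences, because some substitutions are hidden by later changes at the same position.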

71 Matrix of pairwise distances d(i,j)

72 Find tree that corresponds to distances d(i,j)
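
Neighbour-Joining, the distance method named earlier in the course, is one way to find such a tree. The sketch below is a compact, unoptimized version that repeatedly joins the pair minimizing the usual Q criterion. The internal node labels (`node0`, `node1`, ...) and the edge-list output format are choices made for this example; input labels are assumed not to collide with them.

```python
def neighbor_joining(names, d):
    """names: list of labels; d: symmetric distance matrix (list of lists).
    Returns the unrooted tree as a list of (node, neighbour, branch_length)."""
    nodes = list(names)
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    edges, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, b] for b in nodes) for a in nodes}   # row sums
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the new internal node u to a and b.
        length_a = 0.5 * dist[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, length_a), (u, b, dist[a, b] - length_a)]
        # Distances from u to every remaining node.
        for c in nodes:
            if c not in (a, b):
                dist[u, c] = dist[c, u] = 0.5 * (dist[a, c] + dist[b, c] - dist[a, b])
        dist[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    edges.append((nodes[0], nodes[1], dist[nodes[0], nodes[1]]))
    return edges

# Toy distances that fit a tree exactly: ((A:1, B:2) -3- (C:4, D:5)).
toy = [[0, 3, 8, 9],
       [3, 0, 9, 10],
       [8, 9, 0, 9],
       [9, 10, 9, 0]]
tree_edges = neighbor_joining(["A", "B", "C", "D"], toy)
```

For distances that fit a tree exactly, as in this toy matrix, the branch lengths are recovered without error; real data only fit approximately.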

73 Tools for phylogeny reconstruction Maximum likelihood: Consider evolution of sequences as random process. Stochastic model assigns probabilities to substitutions. Consider tree T as hypothesis about observed sequence data D Search tree with highest likelihood P(D|T)

74 Tools for phylogeny reconstruction Assumptions: Positions in sequences (columns in alignment) independent of each other Events on different branches of tree independent of each other Result: probabilities can be multiplied

75 Probability P(D|T) for given residues at internal nodes

76

77

78 Consider all possible residues for internal nodes
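
Summing over all possible residues at the internal nodes does not require enumerating every combination: the sums can be pushed down the tree (Felsenstein's pruning algorithm). The sketch below assumes DNA data without gaps and the Jukes-Cantor substitution model, neither of which is specified on the slides; the nested-tuple tree encoding and the leaf names are hypothetical.

```python
import math

BASES = "ACGT"

def jc_prob(x: str, y: str, t: float) -> float:
    """Jukes-Cantor probability of observing y after time t when starting from x."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partial(node, column):
    """node: leaf name (str) or tuple of (child, branch_length) pairs.
    Returns {base: P(leaves below node | node carries this base)}."""
    if isinstance(node, str):
        return {x: 1.0 if column[node] == x else 0.0 for x in BASES}
    result = dict.fromkeys(BASES, 1.0)
    for child, t in node:                       # branches are independent: multiply
        below = partial(child, column)
        for x in BASES:
            # Sum over every residue y the child node could carry.
            result[x] *= sum(jc_prob(x, y, t) * below[y] for y in BASES)
    return result

def column_likelihood(tree, column) -> float:
    """P(column | tree), with uniform base frequencies at the root."""
    root = partial(tree, column)
    return sum(0.25 * root[x] for x in BASES)

def log_likelihood(tree, alignment) -> float:
    """alignment: {leaf name: sequence}; columns are independent, so logs add up."""
    length = len(next(iter(alignment.values())))
    return sum(math.log(column_likelihood(tree, {k: s[i] for k, s in alignment.items()}))
               for i in range(length))

inner = (("human", 0.1), ("mouse", 0.2))
tree = ((inner, 0.05), ("sponge", 0.3))
logL = log_likelihood(tree, {"human": "ACGT", "mouse": "ACGA", "sponge": "TCGA"})
```

A maximum-likelihood program would evaluate this quantity for many candidate trees and branch lengths and keep the best one; that search is not shown here.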

79 Testing the reliability of a tree (or parts of it): the bootstrap approach Bootstrap in general: repeat a statistical analysis on random re-samples drawn with replacement from the observed data. In phylogeny: 1. Randomly draw columns from the alignment (with replacement, as many as the alignment has) and repeat tree reconstruction with the same method (e.g. 1000 times) 2. Calculate for every branch: how often is it observed in the newly constructed trees?
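
The two bootstrap steps can be sketched as follows. The tree-building step is passed in as a function, `build_splits`, which is a hypothetical placeholder for whichever reconstruction method is being tested; it is assumed to return the branches of the tree as a set of hashable objects (for example, frozensets of the taxa on one side of each branch).

```python
import random

def bootstrap_alignment(rows, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(rows[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in picks) for row in rows]

def bootstrap_support(rows, build_splits, replicates=1000, seed=0):
    """For each branch of the tree built from the original alignment, return
    the fraction of bootstrap replicates whose tree also contains it."""
    rng = random.Random(seed)
    original = build_splits(rows)
    counts = dict.fromkeys(original, 0)
    for _ in range(replicates):
        resampled_tree = build_splits(bootstrap_alignment(rows, rng))
        for split in resampled_tree & original:
            counts[split] += 1
    return {split: c / replicates for split, c in counts.items()}

rows = ["ACGT", "ACGA", "TCGA"]
resampled = bootstrap_alignment(rows, random.Random(1))
```

Each replicate keeps the rows intact and only resamples columns, so some columns appear several times and others not at all.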

80

