Burkhard Morgenstern, Institut für Mikrobiologie und Genetik. Foundations of Bioinformatics (Grundlagen der Bioinformatik): Multiple Sequence Alignment, June 2007.

1 Burkhard Morgenstern, Institut für Mikrobiologie und Genetik. Foundations of Bioinformatics (Grundlagen der Bioinformatik): Multiple Sequence Alignment, June 2007

2 'Progressive' Alignment. Most popular approach to (global) multiple sequence alignment: progressive alignment. In use since the mid-1980s: Feng/Doolittle, Higgins/Sharp, Taylor, …

3 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN
    WWRLNDKEGYVPRNLLGLYP
    AVVIQDNSDIKVVPKAKIIRD
    YAVESEAHPGSFQPVAALERIN
    WLNYNETTGERGDFPGTYVEYIGRKKISP

4 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN
    WWRLNDKEGYVPRNLLGLYP
    AVVIQDNSDIKVVPKAKIIRD
    YAVESEAHPGSFQPVAALERIN
    WLNYNETTGERGDFPGTYVEYIGRKKISP
Guide tree

5 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN
    WW--RLNDKEGYVPRNLLGLYP-
    AVVIQDNSDIKVVP--KAKIIRD
    YAVESEASFQPVAALERIN
    WLNYNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

6 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN
    WW--RLNDKEGYVPRNLLGLYP-
    AVVIQDNSDIKVVP--KAKIIRD
    YAVESEASVQ--PVAALERIN------
    WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

7 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN-
    WW--RLNDKEGYVPRNLLGLYP-
    AVVIQDNSDIKVVP--KAKIIRD
    YAVESEASVQ--PVAALERIN------
    WLN-YNEERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

8 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN--------
    WW--RLNDKEGYVPRNLLGLYP--------
    AVVIQDNSDIKVVP--KAKIIRD-------
    YAVESEA---SVQ--PVAALERIN------
    WLN-YNE---ERGDFPGTYVEYIGRKKISP
Profile alignment: "once a gap, always a gap"

9 'Progressive' Alignment
    WCEAQTKNGQGWVPSNYITPVN--------
    WW--RLNDKEGYVPRNLLGLYP--------
    AVVIQDNSDIKVVP--KAKIIRD-------
    YAVESEA---SVQ--PVAALERIN------
    WLN-YNE---ERGDFPGTYVEYIGRKKISP
Most important implementation: CLUSTAL W
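The "once a gap, always a gap" rule can be illustrated with a minimal Python sketch. It assumes the column path between two already-aligned groups has been computed elsewhere (for example by a profile-profile dynamic programming step); the function name `merge_profiles` and the path letters `M`, `L`, `R` are hypothetical choices for this illustration, not part of CLUSTAL W.

```python
def merge_profiles(left, right, ops):
    """Merge two aligned groups of sequences along a precomputed column path.

    left, right: lists of equal-length gapped strings (the two profiles).
    ops: one letter per output column (hypothetical encoding):
         'M' = a column of `left` is aligned with a column of `right`,
         'L' = a column of `left` faces a new gap column in `right`,
         'R' = a column of `right` faces a new gap column in `left`.
    Existing gaps are copied unchanged; a new gap is always inserted into
    every sequence of a group at once ("once a gap, always a gap").
    """
    out_left = [[] for _ in left]
    out_right = [[] for _ in right]
    li = ri = 0
    for op in ops:
        if op not in ("M", "L", "R"):
            raise ValueError(f"unknown path symbol: {op!r}")
        take_left = op in ("M", "L")
        take_right = op in ("M", "R")
        for row, seq in zip(out_left, left):
            row.append(seq[li] if take_left else "-")
        for row, seq in zip(out_right, right):
            row.append(seq[ri] if take_right else "-")
        li += int(take_left)
        ri += int(take_right)
    if li != len(left[0]) or ri != len(right[0]):
        raise ValueError("path does not consume both profiles completely")
    return ["".join(row) for row in out_left + out_right]


if __name__ == "__main__":
    for row in merge_profiles(["ACGT", "A-GT"], ["AGTT"], "MLMMR"):
        print(row)
```

Because the earlier gap in `A-GT` is only copied and never revisited, an early misplacement stays in every later stage of the progressive procedure.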

10 'Progressive' Alignment. CLUSTAL W (Thompson et al., 1994; ~17,000 citations): Pairwise distances as 1 - fraction of identical residues. Calculate unrooted tree with Neighbor Joining. Define root as central position in tree. Define sequence weights based on tree. Gap penalties calculated based on various parameters.
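The distance used in the first step can be sketched as follows. This is a minimal illustration, assuming the two sequences have already been aligned pairwise and that columns containing a gap are ignored when counting; CLUSTAL W's exact treatment of gaps may differ, and the function name `identity_distance` is hypothetical.

```python
def identity_distance(aligned_a, aligned_b, gap="-"):
    """Return 1 - (identical residues / compared columns) for one pairwise alignment.

    Assumption of this sketch: only columns where both sequences carry a
    residue are compared; gap columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        identical += int(x == y)
    if compared == 0:
        return 1.0  # nothing comparable: treat as maximally distant
    return 1.0 - identical / compared


if __name__ == "__main__":
    print(identity_distance("E-YENS", "ERYENS"))
```

A full distance matrix is obtained by calling the function for every pair of sequences; that matrix is what a tree-building method such as Neighbor Joining takes as input.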

11 Tools for multiple sequence alignment. Problems with traditional approach: Results depend on gap penalty. Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction. Algorithm produces global alignments.

12 Tools for multiple sequence alignment. Problems with traditional approach: But: many sequence families share only local similarity, e.g. sequences share one conserved motif.

13 Local sequence alignment. Find common motif in sequences; ignore the rest.
    EYENS
    ERYENS
    ERYAS

14 Local sequence alignment. Find common motif in sequences; ignore the rest.
    E-YENS
    ERYENS
    ERYA-S

15 Local sequence alignment. Find common motif in sequences; ignore the rest: local alignment.
    E-YENS
    ERYENS
    ERYA-S
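For two sequences, "find the common motif and ignore the rest" is commonly done with Smith-Waterman dynamic programming. The slides do not name a specific algorithm, so the following is only a sketch of that standard technique; the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best score, aligned part of a, aligned part of b).
    Scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_align("EYENS", "ERYENS"))
```

With these assumed scores, the first two sequences of the slide example align as `E-YENS` over `ERYENS`. Flanking residues that do not contribute to the score are left out, which is what distinguishes local from global alignment.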

16 Local sequence alignment. Traditional alignment approaches: either global or local methods!

17 New question: sequence families with multiple local similarities. Neither local nor global methods applicable.

18 New question: sequence families with multiple local similarities. Alignment possible if order conserved.

19 The DIALIGN approach

30 Consistency!

31 The DIALIGN approach

49 T-COFFEE. C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol. Problem: progressive alignment can go wrong if mistakes are made at an early stage. Example …

50 T-COFFEE
    SeqA GARFIELD THE LAST FAT CAT
    SeqB GARFIELD THE FAST CAT
    SeqC GARFIELD THE VERY FAST CAT
    SeqD THE FAT CAT

51 T-COFFEE
    SeqA GARFIELD THE LAST FAT CAT
    SeqB GARFIELD THE FAST CAT
    SeqC GARFIELD THE VERY FAST CAT
    SeqD THE FAT CAT

52 T-COFFEE

53 Idea: consider different pairwise alignments (local and global) and check how these alignments support each other.
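One way to make "support each other" concrete is the library extension step described in the T-Coffee paper: the weight of a residue pair is increased whenever a residue of a third sequence is aligned to both. The sketch below is a simplified illustration of that idea, not the T-Coffee code; the data layout (residues as `(sequence name, position)` tuples, a dictionary of pair weights) and the name `extend_library` are assumptions.

```python
from collections import defaultdict


def extend_library(primary):
    """Add consistency support to pairwise residue weights.

    primary: {((seq, pos), (seq, pos)): weight}, collected from pairwise
             alignments; each pair joins residues of different sequences.
    Returns {frozenset({res_x, res_y}): extended weight}, where the extended
    weight is the direct weight plus, for every residue z linked to both x
    and y, the smaller of the two link weights.
    """
    neighbours = defaultdict(dict)
    extended = defaultdict(int)
    for (x, y), weight in primary.items():
        neighbours[x][y] = weight
        neighbours[y][x] = weight
        extended[frozenset((x, y))] += weight

    # Every residue z acts as a possible intermediate between its neighbours.
    for z, linked in neighbours.items():
        residues = sorted(linked)
        for i, x in enumerate(residues):
            for y in residues[i + 1:]:
                if x[0] != y[0]:  # only pairs from different sequences
                    extended[frozenset((x, y))] += min(linked[x], linked[y])
    return dict(extended)


if __name__ == "__main__":
    library = {
        (("SeqA", 0), ("SeqB", 0)): 88,
        (("SeqA", 0), ("SeqC", 0)): 77,
        (("SeqC", 0), ("SeqB", 0)): 100,
    }
    for pair, weight in extend_library(library).items():
        print(sorted(pair), weight)
```

Pairs that are confirmed through other sequences gain weight, while an isolated, spurious pairwise match keeps only its direct weight, so it is less likely to win in the later progressive step.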

54 T-COFFEE

56 T-COFFEE. Less sensitive to spurious pairwise similarities. Can handle local homologies better than CLUSTAL.

57 Evaluation of multi-alignment methods. Alignment evaluation by comparison to trusted benchmark alignments. 'True' alignment known by information about structure or evolution.
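A common way to compare a program's output with a trusted reference is a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below assumes both alignments list the same sequences in the same order; the function names are hypothetical, and benchmark suites such as BAliBASE may restrict scoring to selected columns.

```python
def aligned_pairs(alignment, gap="-"):
    """Collect all residue pairs that share a column.

    Residues are identified as (sequence index, ungapped position).
    """
    pairs = set()
    position = [0] * len(alignment)
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, position[s]))
                position[s] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pairs.add((present[i], present[j]))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["E-YENS", "ERYENS"]
    print(sum_of_pairs_score(["EYENS-", "ERYENS"], reference))
```

A score of 1.0 means every reference pair was recovered; misplacing a single gap can lower the score sharply, because every residue after the gap is shifted.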

58 Evaluation of multi-alignment methods. For protein alignment:
    M. McClure et al. (1994): 4 protein families, known functional sites
    J. Thompson et al. (1999): benchmark database, 130 known 3D structures (BAliBASE)
    T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)

59 Evaluation of multi-alignment methods

60 Alignment evaluation by comparison to trusted benchmark alignments. 'True' alignment known by information about structure or evolution.

61 Evaluation of multi-alignment methods

62 Evaluation of multi-alignment methods: BAliBASE reference alignments
    1aboA  1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
    1ycsB  1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
    1pht   1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
    1ihvA  1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
    1vie   1 .drvrkksga.........awqGQIVGWYctnlt.............peG

    1aboA 36 WCEAQt..kngqGWVPSNYITPVN......
    1ycsB 39 WWWARl..ndkeGYVPRNLLGLYP......
    1pht  51 WLNGYnettgerGDFPGTYVEYIGrkkisp
    1ihvA 27 AVVIQd..nsdiKVVPRRKAKIIRd.....
    1vie  28 YAVESeahpgsvQIYPVAALERIN......
Key: alpha helix = red, beta strand = green, core blocks = underscore

63 5 categories of benchmark sequences (globally related, internal gaps, end gaps). CLUSTAL W and PRRP perform well on globally related sequences; DIALIGN is superior for local similarities. Conclusion: there is no single best multi-alignment program!

64 Evaluation of multi-alignment methods. T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)

66 Result: DIALIGN best for distantly related sequences, T-COFFEE best for closely related sequences.

