Published by Laurence Strickland. Modified over 9 years ago.
1
Multiple Alignment and Phylogenetic Trees
Csc 487/687: Computing for Bioinformatics
2
Multiple Sequence Alignment
One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud.
Multiple alignments are very informative.
3
Definition
A global alignment of a set of sequences is obtained by inserting gap characters into each sequence so that:
– the resulting sequences all have the same length, and
– no column consists only of gap characters.
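The definition can be checked mechanically. The following is a minimal sketch, assuming alignments are given as Python strings with "-" as the only gap character; the function name is hypothetical. It also checks that removing the gaps from each row gives back the original sequence, which the definition implies (gaps are only inserted, never residues changed).

```python
GAP = "-"  # assumed gap character


def is_global_alignment(rows: list[str], originals: list[str]) -> bool:
    """Check whether `rows` is a valid global alignment of `originals`."""
    if len(rows) != len(originals):
        return False
    # Only gaps may have been inserted: stripping them must restore each sequence.
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, originals)):
        return False
    # Condition 1: all aligned sequences have the same length.
    if len({len(row) for row in rows}) != 1:
        return False
    # Condition 2: no column consists only of gap characters.
    return all(any(row[i] != GAP for row in rows) for i in range(len(rows[0])))
```

For example, `["AC-GT", "A-CGT"]` is a valid alignment of `ACGT` with itself, while `["A-C", "A-G"]` is not, because its middle column contains only gaps.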
4
Example: Chromo domains aligned
5
Use of alignments
High sequence similarity usually means significant structural and/or functional similarity. The converse need not hold: homologous proteins (sharing a common ancestor) can differ substantially over large parts of their sequences yet still retain common 2D patterns (secondary structure), common 3D patterns, or a common active site or binding site.
Comparing several sequences in a family can reveal what is common to the family. A feature shared by several sequences can be significant when all of the sequences are considered together, but need not be when only two are compared.
Multiple alignments can also be used to infer evolutionary history.
6
Use of alignments
Predict features of aligned objects:
– conserved positions → structurally/functionally important
7
Conserved positions
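One simple way to flag conserved positions is to score each column by the fraction of sequences sharing its most common residue. The sketch below uses that measure as an assumption; entropy-based scores are a common alternative. Gaps are ignored when choosing the most common residue but still count in the denominator, so a gappy column cannot look fully conserved.

```python
from collections import Counter


def column_conservation(rows):
    """Per-column fraction of sequences carrying the column's most common residue."""
    scores = []
    for column in zip(*rows):  # zip(*rows) yields the alignment column by column
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


rows = ["ACDE", "ACDF", "AGD-"]
scores = column_conservation(rows)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
```

Here columns 0 and 2 (A and D in every sequence) are fully conserved, while column 3 scores only 1/3 because of the mismatch and the gap.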
8
Use of alignments
Predict features of aligned objects:
– conserved positions → structurally/functionally important
– patterns of hydrophobicity/hydrophilicity → secondary structure elements
9
Helix pattern
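A helix pattern shows up as hydrophobic columns recurring every 3 to 4 positions, because an α-helix has about 3.6 residues per turn (100° per residue). The sketch below assumes a simple binary classification (the residue set `AVILMFWC` and the majority threshold are illustrative choices, not from the slides): it labels each column H or P, then computes a hydrophobic moment with +1 for H and -1 for P. A large moment at 100° and a small one at 180° (the β-strand period) suggests an amphipathic helix.

```python
import math

HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic residues


def column_pattern(rows, threshold=0.5):
    """Label each alignment column H (mostly hydrophobic) or P (otherwise)."""
    labels = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        frac = sum(c in HYDROPHOBIC for c in residues) / len(residues) if residues else 0.0
        labels.append("H" if frac > threshold else "P")
    return "".join(labels)


def hydrophobic_moment(pattern, angle_deg=100.0):
    """Mean moment of an H/P string, rotating angle_deg per position."""
    if not pattern:
        return 0.0
    delta = math.radians(angle_deg)
    x = y = 0.0
    for k, c in enumerate(pattern):
        h = 1.0 if c == "H" else -1.0
        x += h * math.cos(k * delta)
        y += h * math.sin(k * delta)
    return math.hypot(x, y) / len(pattern)
```

For the pattern `HPPHHPPH` (hydrophobic at i, i+3, i+4, i+7) the moment is high at 100° and essentially zero at 180°.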
10
Use of alignments
Predict features of aligned objects:
– conserved positions → structurally/functionally important
– patterns of hydrophobicity/hydrophilicity → secondary structure elements
– "gappy" regions → loops/variable regions
11
Loop?
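Candidate loops can be located by scanning for runs of columns where many sequences have a gap. The following is a minimal sketch; the cut-offs (`min_gap_fraction`, `min_length`) are hypothetical parameters chosen for illustration. It returns half-open column ranges `(start, end)`.

```python
def gappy_regions(rows, min_gap_fraction=0.5, min_length=2):
    """Column ranges where at least min_gap_fraction of sequences have a gap."""
    n_seqs = len(rows)
    gappy = [col.count("-") / n_seqs >= min_gap_fraction for col in zip(*rows)]
    regions, start = [], None
    for i, flag in enumerate(gappy + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


rows = ["ACD---EF", "ACDGHKEF", "AC-----F"]
loops = gappy_regions(rows)
```

Columns 3 to 5 are gapped in two of the three sequences, so they are reported as one candidate loop/variable region.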
12
Use of alignments: make patterns/profiles
A profile or a pattern built from an alignment can be matched against a sequence database to identify new family members.
Profiles/patterns can be used to predict the family membership of new sequences.
Databases of profiles/patterns:
– PROSITE
– Pfam
– PRINTS
– ...
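A profile turns each alignment column into residue scores. The sketch below builds a simple log-odds profile and slides it along a query sequence without gaps. The uniform background (1/20 per amino acid), the pseudocount of 1, and the function names are assumptions for illustration; real profile tools use more elaborate background models and allow insertions and deletions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(rows, pseudocount=1.0):
    """One dict of log2-odds scores per alignment column (gaps skipped)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Best ungapped (score, offset) of the profile along the sequence."""
    width = len(profile)
    best = (float("-inf"), -1)
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        # Residues outside AMINO_ACIDS contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, offset))
    return best


profile = build_profile(["WKDE", "WRDE", "WKNE"])
score, offset = best_hit(profile, "AAAWKDEAAA")
```

The highest-scoring window starts at offset 3 (`WKDE`), where each position carries the residue most common in its column.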
13
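Prosite: Motifs for classification
(Diagram: a protein sequence is compared with Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, each identifying Family 1, Family 2, ..., Family n. A motif is described either as a pattern (a regular expression) or as a profile.)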
14
Pattern from alignment
[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
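A pattern like the one above is essentially a regular expression: `x` matches any residue, `[..]` lists allowed residues, `(n)` or `(n,m)` gives a repeat count, and elements are joined by `-`. The sketch below translates that syntax into a Python regex. It also assumes the PROSITE conventions `{..}` (forbidden residues) and `<`/`>` (N-/C-terminal anchors); a `>` inside brackets is not handled. The function name is hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        element = element.strip("<>")
        # Separate an optional repeat count such as (3) or (5,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core                     # single residue or allowed set [..]
        if count:
            piece += "{" + count + "}"
        if at_start:
            piece = "^" + piece
        if at_end:
            piece += "$"
        parts.append(piece)
    return "".join(parts)


regex = prosite_to_regex(
    "[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]"
)
match = re.search(regex, "MMMMFALKWAGFAAAAASWEPAAALGG")
```

The translated pattern is `[FYL].[LIVMC][KR]W.[GDNR][FYWLE].{5,6}[ST]W[ES][PSTDN].{3}[LIVMC]`, and `re.search` finds the made-up test motif starting at position 4.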
15
Alignment problem
Given a set of sequences, produce a multiple alignment that corresponds as closely as possible to the biological relationships between the biomolecules they represent.
16
For homologous proteins
Two residues should be aligned (placed on top of each other):
– if they are homologous (evolved from the same residue in a common ancestor protein)
– if they are structurally equivalent
17
Automatic approach
Need a way of scoring alignments:
– a fitness function that quantifies the "goodness" of an alignment
Need an algorithm for finding alignments with good scores.
Not all methods provide a scoring function for the final alignment!
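A widely used fitness function is the sum-of-pairs (SP) score: every pair of rows is scored in every column and the results are added. The sketch below assumes simple match/mismatch/gap values (1, -1, -2) and scores gap-against-gap as 0. Real tools normally use a substitution matrix such as BLOSUM62 and affine gap penalties instead.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of two characters placed in the same column (assumed values)."""
    if a == "-" and b == "-":
        return 0  # gap against gap is conventionally ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(rows, **scores):
    """Sum-of-pairs fitness of an alignment given as equal-length strings."""
    return sum(
        pair_score(a, b, **scores)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )


sp = sum_of_pairs(["AC-", "ACG", "A-G"])
```

Here the first column contributes 3 (three matching pairs) and each of the other two contributes 1 - 2 - 2 = -3, for a total of -3.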
18
Analysis of fitness function
One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences, for example when the structures of (some of) the proteins are known.
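One way to make this test concrete is to compare an alignment with a trusted reference alignment (for example a structure-based one) and count how many of the reference's aligned residue pairs it reproduces. The sketch below is an illustration under that assumption; residues are identified by (sequence index, position in the ungapped sequence), so two alignments of the same sequences can be compared directly. The function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(rows):
    """Set of residue pairs that share a column in the alignment."""
    positions = [0] * len(rows)  # next ungapped residue index per sequence
    pairs = set()
    for column in zip(*rows):
        present = []
        for s, char in enumerate(column):
            if char != "-":
                present.append((s, positions[s]))
                positions[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def pair_recovery(test_rows, reference_rows):
    """Fraction of the reference's aligned pairs also aligned in the test."""
    ref = aligned_pairs(reference_rows)
    if not ref:
        return 1.0
    return len(ref & aligned_pairs(test_rows)) / len(ref)


reference = ["AC-GT", "ACCGT"]
test = ["A-CGT", "ACCGT"]
accuracy = pair_recovery(test, reference)
```

The test alignment places the first `C` of `ACGT` against the second `C` of `ACCGT` instead of the first, so it recovers 3 of the 4 reference pairs.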
19
Align by use of dynamic programming
Dynamic programming finds the best alignment of k sequences under a given scoring scheme.
For two sequences there are three different column types; for three sequences there are seven (in general, 2^k - 1 for k sequences). Here x means an amino acid and - a blank (gap):

Sequence 1:  x  -  x  x  -  -  x
Sequence 2:  x  x  -  x  -  x  -
Sequence 3:  x  x  x  -  x  -  x

Time complexity: O(2^k · n^k), i.e. O(n^k) for a fixed number of sequences (all sequence lengths = n).
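The column types can be enumerated directly: each of the k rows holds either a residue (x) or a gap (-), and only the all-gap column is excluded, which gives 2^k - 1 types. A minimal sketch:

```python
from itertools import product


def column_types(k):
    """All column types for k sequences, excluding the all-gap column."""
    return ["".join(t) for t in product("x-", repeat=k) if "x" in t]


two = column_types(2)    # ['xx', 'x-', '-x']
three = column_types(3)  # 7 types
```

Each dynamic-programming cell must consider one predecessor per column type, which is where the 2^k factor in the running time comes from.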
20
Use of dynamic programming
Dynamic programming finds the best alignment of k sequences under a given scoring scheme.
21
Algorithm for dynamic programming
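The recurrence can be written for any k: the best score for the prefixes ending at (i1, ..., ik) is the maximum, over all 2^k - 1 column types, of the best score of the corresponding predecessor cell plus the score of that column. The sketch below computes only the optimal score (no traceback) and assumes the same illustrative sum-of-pairs values as before (match 1, mismatch -1, gap -2, gap/gap 0). It is exhaustive, so it is practical only for a few short sequences; memoized recursion depth grows with the total sequence length.

```python
from functools import lru_cache
from itertools import combinations, product


def msa_dp_score(seqs, match=1, mismatch=-1, gap=-2):
    """Optimal sum-of-pairs score of k sequences by exhaustive dynamic programming."""
    k = len(seqs)
    # Each move says which sequences contribute a residue to the next column.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    def column_score(column):
        total = 0
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
        return total

    @lru_cache(maxsize=None)
    def best(ends):
        # Optimal score for aligning the prefixes seqs[i][:ends[i]].
        if not any(ends):
            return 0
        result = float("-inf")
        for move in moves:
            if any(m > e for m, e in zip(move, ends)):
                continue  # the column would use a residue that does not exist
            prev = tuple(e - m for e, m in zip(ends, move))
            column = [s[e - 1] if m else "-" for s, e, m in zip(seqs, ends, move)]
            result = max(result, best(prev) + column_score(column))
        return result

    return best(tuple(len(s) for s in seqs))
```

With k = 2 this reduces to a global pairwise alignment; for example, `msa_dp_score(["AC", "A"])` is -1 (the alignment AC / A-).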