Multiple alignment, June 29, 2007

Presentation transcript:

Learning objectives: Review the answer to the sequence alignment workshop and answer any questions you may have. Understand how the E value may be altered by changing your search criteria. Understand the usefulness of multiple sequence alignment. Become familiar with Clustal W.

Answer to pairwise sequence alignment workshop

[Figure: alignment matrix with the two workshop sequences along the axes]

P-RCLCQRDNCBA
| || | |::|:|
PBRCKC-RNDCDA

Similarity score: 60
% similarity: 10/12 x 100 = 83.3%

P - R C L C Q R D N C B A
|   | |   |   | : : | : |
P B R C K C - R N D C D A

SUM = 60
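The percentage on the answer slide can be re-derived from the match line alone. The sketch below counts identities (`|`) and similar pairs (`:`) and divides by the ungapped sequence length. Using the ungapped length of the shorter sequence as the denominator is an assumption that happens to reproduce the slide's 10/12; the similarity score of 60 depends on a scoring matrix the slide does not name, so it is not recomputed here.

```python
# Alignment rows copied from the workshop answer.
seq_a = "P-RCLCQRDNCBA"
match = "| || | |::|:|"
seq_b = "PBRCKC-RNDCDA"


def similarity_summary(seq_a, match, seq_b):
    """Count identical and similar columns and return the percent similarity."""
    assert len(seq_a) == len(match) == len(seq_b)
    identical = match.count("|")   # identical residues
    similar = match.count(":")     # conservative substitutions
    # Assumption: the denominator is the ungapped length of the shorter sequence.
    length = min(len(seq_a.replace("-", "")), len(seq_b.replace("-", "")))
    percent = 100.0 * (identical + similar) / length
    return identical, similar, length, percent


identical, similar, length, percent = similarity_summary(seq_a, match, seq_b)
print(identical, similar, length, round(percent, 1))
```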

Multiple Sequence Alignment: A collection of three or more protein (or nucleic acid) sequences that are partially or completely aligned. Aligned residues tend to occupy corresponding positions in the 3-D structure of each aligned protein.

Practical use of MSA: Places proteins into a group of related proteins (paralogs and orthologs). Identifies conserved domains and motifs. Identifies sequencing errors in nucleotide sequences. Identifies important regulatory regions in the promoters of genes.

Practical uses: Create the alignment. If possible, edit the alignment to ensure that regions of functional or structural similarity are preserved. The edited alignment can then be used for phylogenetic analysis, structural analysis, finding conserved motifs to deduce function, and design of PCR primers.

Clustal W (Thompson et al., 1994). CLUSTAL = Cluster alignment. The underlying concept is that groups of sequences are phylogenetically related. If they can be aligned, then one can construct a tree. Step 1: pairwise alignments. Step 2: create a guide tree. Step 3: progressive alignment.

Flowchart of computation steps in Clustal W (Thompson et al., 1994): (1) Pairwise alignment: calculation of the distance matrix. (2) Creation of an unrooted neighbor-joining (NJ) tree. (3) Creation of a rooted NJ tree (the guide tree) and calculation of sequence weights. (4) Progressive alignment following the guide tree.

Step 1-Pairwise alignments: Compare each sequence with every other sequence and calculate pairwise alignment scores.

SeqA  Name   Len(aa)  SeqB  Name   Len(aa)  Score
1     human  60       2     dog
      human  60       3     mouse
      dog    60       3     mouse

Human  EYSGSSEKIDLLASDPHEALICKSERVHSKSVESNIEDKIFGKTYRKKASLPNLSHVTEN  480
Dog    EYSGSSEKIDLMASDPQDAFICESERVHTKPVGGNIEDKIFGKTYRRKASLPKVSHTTEV  477
Mouse  GGFSSSRKTDLVTPDPHHTLMCKSGRDFSKPVEDNISDKIFGKSYQRKGSRPHLNHVTE   476

Step 1-Pairwise alignments: Compare each sequence with every other sequence and calculate similarity scores.

     H    D    M
H    -
D    76   -
M

Rows and columns are the different sequences. The highest similarity score is between sequence H and sequence D.
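Step 1 can be sketched with the three fragments shown on the slide. The code below scores each pair by percent identity over position-by-position comparisons, with no gaps introduced. That is a simplifying assumption: Clustal W performs a real pairwise alignment before scoring. The human-dog value it produces lands near the 76 shown in the table, which suggests, but does not prove, that the slide's scores are percent identities.

```python
from itertools import combinations

# Fragments copied from the Step 1 slide (the mouse fragment has 59 residues).
fragments = {
    "human": "EYSGSSEKIDLLASDPHEALICKSERVHSKSVESNIEDKIFGKTYRKKASLPNLSHVTEN",
    "dog": "EYSGSSEKIDLMASDPQDAFICESERVHTKPVGGNIEDKIFGKTYRRKASLPKVSHTTEV",
    "mouse": "GGFSSSRKTDLVTPDPHHTLMCKSGRDFSKPVEDNISDKIFGKSYQRKGSRPHLNHVTE",
}


def percent_identity(a, b):
    """Percent of identical residues, compared position by position (no gaps)."""
    pairs = list(zip(a, b))  # zip stops at the end of the shorter sequence
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Score every pair of sequences once.
scores = {
    (a, b): percent_identity(fragments[a], fragments[b])
    for a, b in combinations(fragments, 2)
}

for pair, score in scores.items():
    print(pair, round(score, 1))

# The most similar pair is the one that would be aligned first.
best = max(scores, key=scores.get)
print("most similar pair:", best)
```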

Step 2-Create Guide Tree: Use the similarity scores to create a guide tree that determines the "order" in which the sequences are aligned.

     H    D    M
H    -
D    76   -
M

Rows and columns are the different sequences.

\[ D = -\ln(S_{\mathrm{eff}}), \qquad S_{\mathrm{eff}} = \frac{S_{\mathrm{real}(ij)} - S_{\mathrm{rand}(ij)}}{S_{\mathrm{ident}(ij)} - S_{\mathrm{rand}(ij)}} \times 100 \]

S_real(ij) is the similarity score for two aligned sequences i and j. S_ident(ij) is the average of the two scores obtained when each sequence is compared to itself. S_rand(ij) is the mean alignment score derived from many random shufflings. D ranges from 0 to 1.

Guide tree: ( human: , dog: , mouse: );
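The conversion from similarity to distance can be sketched as follows. This is a minimal illustration, assuming that S_eff is used as a fraction between 0 and 1 inside the logarithm (the slide's "x 100" expresses the same ratio as a percentage). The function names and the three input scores are hypothetical; they do not come from the slides.

```python
import math


def effective_similarity(s_real, s_rand, s_ident):
    """Normalized similarity as a fraction: 1.0 for identical sequences,
    near 0.0 when the real score is no better than a shuffled one."""
    return (s_real - s_rand) / (s_ident - s_rand)


def distance(s_real, s_rand, s_ident):
    """D = -ln(S_eff), with S_eff taken as a fraction (assumption)."""
    s_eff = effective_similarity(s_real, s_rand, s_ident)
    if s_eff <= 0:
        raise ValueError("real score is not above the random background")
    return -math.log(s_eff)


# Hypothetical scores, for illustration only.
print(distance(s_real=180.0, s_rand=20.0, s_ident=300.0))
```

Under this reading, two identical sequences give a distance of 0, and the distance grows as the real score falls toward the shuffled background.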

Step 2-Create Guide Tree: This branch length is proportional to the estimated divergence between the Dog sequence and an "average" sequence common to all three sequences.

Guide tree: ( human: , dog: , mouse:0.3494 );
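For exactly three sequences, the branch from each sequence to the shared internal node (the "average" sequence) follows directly from the three pairwise distances. The sketch below uses the standard three-point formula; the distances passed in are hypothetical and are not the values behind the slide's tree.

```python
def three_taxon_branch_lengths(d_ab, d_ac, d_bc):
    """Branch lengths a, b, c from taxa A, B, C to the single internal node,
    chosen so that a + b = d_ab, a + c = d_ac and b + c = d_bc."""
    a = (d_ab + d_ac - d_bc) / 2.0
    b = (d_ab + d_bc - d_ac) / 2.0
    c = (d_ac + d_bc - d_ab) / 2.0
    return a, b, c


# Hypothetical pairwise distances (A = human, B = dog, C = mouse).
a, b, c = three_taxon_branch_lengths(d_ab=0.25, d_ac=0.50, d_bc=0.45)
print(f"(human:{a:.4f}, dog:{b:.4f}, mouse:{c:.4f});")
```

The printed string has the same parenthesized shape as the guide tree on the slide, with one branch length per sequence.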

Step 3-Progressive Alignment: Align human and dog first. Then add mouse to the previous alignment. In closely aligned sequences, gaps are given a lower penalty than gaps in more divergent sequences. General rule: "once a gap, always a gap". Why is there a lower penalty for the closely aligned sequences? In closely aligned sequences, gaps suggest that they are located in areas between functional or structural domains. In more divergent sequences, gaps may be located wherever the sequences are dissimilar, regardless of whether they break up functional or structural domains.

Guide tree: ( human: , dog: , mouse:0.3494 );
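The "once a gap, always a gap" rule can be illustrated by the bookkeeping that happens when a new sequence joins an existing alignment. In the sketch below the existing alignment is only ever copied column by column or padded with new gap columns, so gaps already present are never removed. The `path` argument is a hypothetical stand-in for the result of a profile-to-sequence alignment, which is not computed here, and the toy sequences are not from the slides.

```python
GAP = "-"


def add_sequence(rows, new_seq, path):
    """Merge new_seq into an existing alignment.

    rows    : aligned sequences of equal length (the existing alignment)
    new_seq : ungapped sequence to add
    path    : one (col, res) pair per column of the enlarged alignment;
              col indexes a column of rows, or None for a new gap column;
              res indexes a residue of new_seq, or None for a gap in it
    """
    old_rows = [[] for _ in rows]
    new_row = []
    for col, res in path:
        for k, row in enumerate(rows):
            # Existing columns are copied whole, so old gaps are preserved.
            old_rows[k].append(GAP if col is None else row[col])
        new_row.append(GAP if res is None else new_seq[res])
    return ["".join(r) for r in old_rows] + ["".join(new_row)]


# Toy example: the first row already contains a gap from an earlier step.
existing = ["ACD-F", "ACDEF"]
path = [(0, 0), (1, 1), (None, 2), (2, 3), (3, 4), (4, 5)]
result = add_sequence(existing, "ACWDEF", path)
print("\n".join(result))
```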

Other gap treatment: Short stretches of 5 hydrophilic residues often indicate loop or random coil regions (not essential for structure), and therefore gap penalties are lowered where gaps separate such areas. Alignments of proteins of known structure show that gaps do not occur more frequently than every eight residues. Therefore the penalty increases for a gap that would fall within 8 residues of another gap. This gives a lower alignment score in that region. A gap penalty is assigned after each amino acid (aa) according to the frequency with which such a gap naturally occurs after that aa in alignments of known homologs.
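The first of these rules can be sketched as a per-position penalty table. The code below lowers the gap opening penalty inside any run of at least 5 hydrophilic residues. The hydrophilic residue set, the base penalty of 10 and the reduction factor of 0.5 are all illustrative assumptions, not values taken from the slides, and the 8-residue spacing rule is not modeled.

```python
# Assumed set of hydrophilic residues (illustrative).
HYDROPHILIC = set("DEGKNQPRS")


def position_gap_penalties(seq, base=10.0, run=5, factor=0.5):
    """Return one gap opening penalty per residue, reduced inside
    hydrophilic runs of at least `run` residues."""
    penalties = [base] * len(seq)
    i = 0
    while i < len(seq):
        if seq[i] in HYDROPHILIC:
            j = i
            while j < len(seq) and seq[j] in HYDROPHILIC:
                j += 1  # extend to the end of the hydrophilic run
            if j - i >= run:
                for k in range(i, j):
                    penalties[k] = base * factor
            i = j
        else:
            i += 1
    return penalties


# Hypothetical sequences: one with a 5-residue hydrophilic run, one without.
long_run = position_gap_penalties("LLLDEGKNLLL")
short_run = position_gap_penalties("LLDEGLL")
print(long_run)
print(short_run)
```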

Amino acid weight matrices: As we know, there are many scoring matrices that one can use, depending on the relatedness of the aligned proteins. In CLUSTAL W, as the alignment proceeds to longer branches, the aa scoring matrices are automatically changed to matrices suited to more divergent sequences (lower-numbered BLOSUM matrices). The length of the branch is used to determine which matrix to use.
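The idea of switching matrices with branch length can be sketched as a simple lookup. The distance thresholds below are hypothetical cut-offs chosen only to illustrate the mechanism; the slides do not give the values Clustal W uses.

```python
# Hypothetical cut-offs: (maximum distance, matrix name), closest first.
MATRIX_BY_MAX_DISTANCE = [
    (0.2, "BLOSUM80"),
    (0.4, "BLOSUM62"),
    (0.7, "BLOSUM45"),
]


def choose_matrix(distance):
    """Pick a BLOSUM matrix: longer branches get lower-numbered matrices."""
    for max_distance, name in MATRIX_BY_MAX_DISTANCE:
        if distance <= max_distance:
            return name
    return "BLOSUM30"  # fallback for the most divergent sequences


for d in (0.1, 0.3494, 0.9):
    print(d, choose_matrix(d))
```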

Example of Sequence Alignment using Clustal W

Multiple Alignment Considerations: The quality of the guide tree matters. It is good to have a set of closely related sequences in the alignment to set the pattern for the more divergent sequences. If the initial alignments have a problem, the problem is magnified in subsequent steps. CLUSTAL W works best when aligning sequences that are related to each other over their entire lengths. Do not use it when there are variable N- and C-terminal regions.