Sequence Alignments Revisited

Slides:



Advertisements
Similar presentations
Substitution matrices
Advertisements

Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Bayesian Evolutionary Distance P. Agarwal and D.J. States. Bayesian evolutionary distance. Journal of Computational Biology 3(1):1— 17, 1996.
Sources Page & Holmes Vladimir Likic presentation: 20show.pdf
Measuring the degree of similarity: PAM and blosum Matrix
DNA sequences alignment measurement
Sequence alignment & Substitution matrices By Thomas Nordahl & Morten Nielsen.
Dot plots Dynamic Programming
Sequences Alignment Statistics
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al.
Heuristic alignment algorithms and cost matrices
Sequence Alignment.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Scoring Matrices June 19, 2008 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Fa05CSE 182 CSE182-L4: Scoring matrices, Dictionary Matching.
Scoring Matrices June 22, 2006 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Introduction to bioinformatics
Sequence similarity.
Sequence alignment & Substitution matrices By Thomas Nordahl & Morten Nielsen.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
Scoring the Alignment of Amino Acid Sequences Constructing PAM and Blosum Matrices.
. Computational Genomics Lecture #3a (revised 24/3/09) This class has been edited from Nir Friedman’s lecture which is available at
Class 3: Estimating Scoring Rules for Sequence Alignment.
Introduction to Bioinformatics Algorithms Sequence Alignment.
1-month Practical Course Genome Analysis Lecture 3: Residue exchange matrices Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
Scoring matrices Identity PAM BLOSUM.
BLOSUM Information Resources Algorithms in Computational Biology Spring 2006 Created by Itai Sharon.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Alignment IV BLOSUM Matrices. 2 BLOSUM matrices Blocks Substitution Matrix. Scores for each position are obtained frequencies of substitutions in blocks.
Alignment III PAM Matrices. 2 PAM250 scoring matrix.
Dayhoff’s Markov Model of Evolution. Brands of Soup Revisited Brand A Brand B P(B|A) = 2/7 P(A|B) = 2/7.
Roadmap The topics:  basic concepts of molecular biology  more on Perl  overview of the field  biological databases and database searching  sequence.
1 CAP5510 – Bioinformatics Substitution Patterns Tamer Kahveci CISE Department University of Florida.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
An Introduction to Bioinformatics
Substitution Numbers and Scoring Matrices
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Evolution and Scoring Rules Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-7) x (total length of all gaps) Example Score = 5 x (# matches)
BLAST Workshop Maya Schushan June 2009.
Classifier Evaluation Vasileios Hatzivassiloglou University of Texas at Dallas.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Amino Acid Scoring Matrices Jason Davis. Overview Protein synthesis/evolution Protein synthesis/evolution Computational sequence alignment Computational.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Comp. Genomics Recitation 3 The statistics of database searching.
Construction of Substitution Matrices
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
The Blosum scoring matrices Morten Nielsen BioSys, DTU.
Pairwise Sequence Analysis-III
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Sequence Alignment.
Construction of Substitution matrices
Blosum matrices What are they? Morten Nielsen BioSys, DTU
Step 3: Tools Database Searching
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Scoring the Alignment of Amino Acid Sequences Constructing PAM and Blosum Matrices.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
Tutorial 3 – Protein Scoring Matrices PAM & BLOSUM
Alignment IV BLOSUM Matrices
Presentation transcript:

Sequence Alignments Revisited Scoring nucleotide sequence alignments was easier Match score Possibly different scores for transitions and transversions For amino acids, there are many more possible substitutions How do we score which substitutions are highly penalized and which are moderately penalized? Physical and chemical characteristics Empirical methods Protein-Related Algorithms

Scoring Mismatches Physical and chemical characteristics V  I – Both small, both hydrophobic, conservative substitution, small penalty V  K – Small  large, hydrophobic  charged, large penalty Requires some expert knowledge and judgement Empirical methods How often does the substitution V  I occur in proteins that are known to be related? Scoring matrices: PAM and BLOSUM Protein-Related Algorithms

PAM matrices PAM = “Point Accepted Mutation” interested only in mutations that have been “accepted” by natural selection Starts with a multiple sequence alignment of very similar (>85% identity) proteins. Assumed to be homologous Compute the relative mutability, mi, of each amino acid e.g. mA = how many times was alanine substituted with anything else? Protein-Related Algorithms

Relative mutability ACGCTAFKI GCGCTAFKI ACGCTAFKL GCGCTGFKI GCGCTLFKI ASGCTAFKL ACACTAFKL Across all pairs of sequences, there are 28 A  X substitutions There are 10 ALA residues, so mA = 2.8 Protein-Related Algorithms

Pam Matrices, cont’d Construct a phylogenetic tree for the sequences in the alignment Calculate substitution frequences FX,X Substitutions may have occurred either way, so A  G also counts as G  A. FG,A = 3 Protein-Related Algorithms

Mutation Probabilities Mi,j represents the probability of J  I substitution. = 2.025 Protein-Related Algorithms

The PAM matrix The entries, Ri,j are the Mi,j values divided by the frequency of occurrence, fi, of residue i. fG = 10 GLY / 63 residues = 0.1587 RG,A = log(2.025/0.1587) = log(12.760) = 1.106 The log is taken so that we can add, rather than multiply entries to get compound probabilities. Log-odds matrix Diagonal entries are 1– mj Protein-Related Algorithms

Interpretation of PAM matrices PAM-1 – one substitution per 100 residues (a PAM unit of time) Multiply them together to get PAM-100, etc. “Suppose I start with a given polypeptide sequence M at time t, and observe the evolutionary changes in the sequence until 1% of all amino acid residues have undergone substitutions at time t+n. Let the new sequence at time t+n be called M’. What is the probability that a residue of type j in M will be replaced by i in M’?” Protein-Related Algorithms

PAM matrix considerations If Mi,j is very small, we may not have a large enough sample to estimate the real probability. When we multiply the PAM matrices many times, the error is magnified. PAM-1 – similar sequences, PAM-1000 very dissimilar sequences Protein-Related Algorithms

BLOSUM matrix Starts by clustering proteins by similarity Avoids problems with small probabilities by using averages over clusters Numbering works opposite BLOSUM-62 is appropriate for sequences of about 62% identity, while BLOSUM-80 is appropriate for more similar sequences. Protein-Related Algorithms