Optimization of a New Score Function for the Detection of Remote Homologs Kann et al.


Introduction
- New method to calculate a score function, aiming to optimize the ability to discriminate between homologs and non-homologs
- Existing software uses the following to compute an alignment score:

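- Number of times amino acid (AA) i is aligned with AA j, weighted by the substitution matrix entry (the contribution to the score for that AA match or mismatch)
- Number of gaps in the alignment, weighted by the contribution to the score for gap initiation
- Number of residues in each gap beyond the first, weighted by the contribution to the score for gap extension
- The substitution matrix and the two gap contributions together make up the score function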
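The score described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `alignment_score`, the dictionary layout of `matrix`, and the toy matrix values are all assumptions, and the gap contributions are passed as negative numbers so that everything is simply summed.

```python
from collections import Counter


def alignment_score(aligned_a, aligned_b, matrix, gap_open, gap_extend, gap_char="-"):
    """Score a gapped pairwise alignment given as two equal-length strings.

    matrix maps an alphabetically sorted residue pair, e.g. ("A", "C"), to a score.
    gap_open and gap_extend are score contributions (normally negative).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pair_counts = Counter()   # times residue i is aligned with residue j
    n_gaps = 0                # number of gaps in the alignment
    n_extensions = 0          # residues in each gap beyond the first
    open_gap_side = None      # which sequence holds the currently open gap
    for a, b in zip(aligned_a, aligned_b):
        if a == gap_char and b == gap_char:
            raise ValueError("a column cannot contain two gap characters")
        if a == gap_char or b == gap_char:
            side = "a" if a == gap_char else "b"
            if side == open_gap_side:
                n_extensions += 1
            else:
                n_gaps += 1
                open_gap_side = side
        else:
            pair_counts[tuple(sorted((a, b)))] += 1
            open_gap_side = None
    substitution_total = sum(count * matrix[pair] for pair, count in pair_counts.items())
    return substitution_total + n_gaps * gap_open + n_extensions * gap_extend


# Toy two-letter matrix, for illustration only.
toy_matrix = {("A", "A"): 4, ("A", "C"): 0, ("C", "C"): 9}
example_score = alignment_score("AC--A", "ACCCA", toy_matrix, gap_open=-10, gap_extend=-1)
```

In the example, three aligned pairs contribute through the matrix, and the single two-residue gap contributes one gap initiation and one gap extension.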

Current Methods to Calculate Homology
- p(S_r > x): probability that a random pair of proteins of the same length would score higher than the observed score x
- E: expected number of random proteins in the database that would have at least that score
- P: probability that there is at least one random pair with a higher score
- As p(S_r > x), E, and P increase, the likelihood that the given pair is homologous decreases
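The three quantities are closely related. The sketch below shows one common way to connect them, under the assumption (not stated on the slide) that random hits are independent, so that the number of chance hits is approximately Poisson distributed. The function names and the example numbers are illustrative only.

```python
import math


def expected_random_hits(p_random, db_size):
    """E: expected number of random database proteins scoring above x,
    given the per-pair probability p(S_r > x) and the database size."""
    return p_random * db_size


def prob_at_least_one_hit(e_value):
    """P: probability of at least one random hit, assuming Poisson-distributed
    chance hits with mean E, i.e. P = 1 - exp(-E)."""
    return -math.expm1(-e_value)  # numerically stable form of 1 - exp(-E)


e_value = expected_random_hits(p_random=1e-6, db_size=100_000)
p_value = prob_at_least_one_hit(e_value)
```

For small E the two values are nearly equal, which is why either can serve as a significance threshold in that regime.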

Current Score Matrices
- PAM (percent accepted mutations): Dayhoff
- GCB, JTT: the same approach applied to larger sequence datasets
- BLOSUM62: Henikoff & Henikoff, constructed using a dataset of aligned sequence blocks
- STR: derived from protein sequences aligned based on their observed structures

Limitations of Current Score Functions
- Current score functions assume that each location evolves independently, overlooking correlations
- Score functions are derived from a database of properly aligned proteins, not from alignments between random sequences
- Gap penalties are set a priori

Theory
- Z score for the alignment: characterizes the significance of the alignment score by calculating the likelihood that this score or higher would be obtained by a random match
- Accounts for variations in E with the length of the proteins
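The slide's formula for Z is not preserved in this transcript. As a rough illustration, the sketch below uses the standard Z-score definition: the observed score minus the mean score of random matches, divided by their standard deviation. The function name and the sample numbers are assumptions, and the sketch does not include any length correction.

```python
import statistics


def z_score(score, random_scores):
    """Standard Z score of an alignment score against a sample of scores
    obtained from random (e.g. shuffled) sequence pairs."""
    mean_random = statistics.fmean(random_scores)
    sd_random = statistics.stdev(random_scores)  # sample standard deviation
    if sd_random == 0:
        raise ValueError("random scores have zero variance")
    return (score - mean_random) / sd_random


# Hypothetical numbers: one observed score and five random-match scores.
example_z = z_score(30, [10, 12, 14, 16, 18])
```

A larger Z means the observed score sits further above what random matches produce, so the alignment is less likely to have arisen by chance.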

Theory
- Score function is optimized by maximizing the confidence over the training set
- Avoids dependence on extreme E values (easily detected or overly distant homologies)
- Eliminates the contribution of falsely identified homologies (overly distant)

Database Preparation
- Use a set of known homologs whose homology cannot be reliably determined with standard pairwise comparison, in order to optimize the score function for detection of distant homologs
- Training set: 900 pairs of proteins in the same COG with < 25% sequence identity
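A filter of this kind might look like the following sketch. The slide does not say how sequence identity was computed, so the convention used here (identical residues divided by the number of columns where both sequences have a residue) is an assumption, as are the function names and the toy sequences.

```python
def percent_identity(aligned_a, aligned_b, gap_char="-"):
    """Percent identity over columns where neither sequence has a gap.
    Other denominators (e.g. shorter sequence length) are also in common use."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap_char and b != gap_char]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def select_remote_pairs(aligned_pairs, max_identity=25.0):
    """Keep only pairs whose identity is below the threshold."""
    return [(name, a, b) for name, a, b in aligned_pairs
            if percent_identity(a, b) < max_identity]


# Toy aligned pairs, for illustration only.
candidates = [
    ("close_pair", "ACDE", "ACDE"),
    ("remote_pair", "ACDEFGHI", "KLMNFPQR"),
]
remote = select_remote_pairs(candidates)
```

Only the second toy pair falls below the 25% threshold, so it is the only one kept.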

Optimization of Score Function
- Align using the BLOSUM62 matrix
- Calculate Z and C for each pair of homologs, then average over the pairs in the training set to yield the average confidence <C>
- Generate initial alignments using the gap penalties that yielded the highest C values
- ~10 cycles of optimization and realignment until the score function converged
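The alternation between realignment and optimization can be pictured as a simple loop. The skeleton below is only a sketch of that control flow: `align` and `refine` are hypothetical callables supplied by the caller, the parameters are held in a plain dictionary of floats, and the convergence test (largest parameter change below a tolerance) is an assumption rather than the criterion used in the original work.

```python
def optimize_score_function(pairs, params, align, refine, max_cycles=10, tol=1e-6):
    """Alternate realignment and parameter refinement until the parameters
    stop changing or max_cycles is reached.

    align(seq_a, seq_b, params) -> alignment   (hypothetical, caller-supplied)
    refine(alignments, params)  -> new params  (hypothetical, caller-supplied)
    Returns the final parameters and the number of cycles run.
    """
    if max_cycles < 1:
        raise ValueError("max_cycles must be at least 1")
    cycles_run = 0
    for cycles_run in range(1, max_cycles + 1):
        # Realign every training pair with the current score function.
        alignments = [align(a, b, params) for a, b in pairs]
        # Adjust the score function on the fixed alignments.
        new_params = refine(alignments, params)
        change = max(abs(new_params[key] - params[key]) for key in params)
        params = new_params
        if change < tol:
            break
    return params, cycles_run


# Toy stand-ins: alignment is a no-op, refinement moves halfway to a target.
def toy_align(seq_a, seq_b, params):
    return (seq_a, seq_b)


def toy_refine(alignments, params):
    return {key: (value - 12.0) / 2 for key, value in params.items()}


final_params, n_cycles = optimize_score_function(
    [("ACD", "ACE")], {"gap_open": -10.0}, toy_align, toy_refine, tol=0.01
)
```

Realignment is needed in each cycle because changing the score function can change which alignment is optimal for a given pair.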

Results
- Small changes in gap penalties: most of the improvement comes from refinements of the substitution matrix
- OPTIMA: the resulting score function
  - has a significantly improved average confidence value compared with other score matrices
  - has a significantly decreased average random-match probability <p(S_r > x)>

Summary
- Aim: optimize the score matrix to discriminate between homologs and non-homologs
- OPTIMA score function: more successful at discriminating between homologs and non-homologs than standard score matrices
- Gap penalties are treated as additional parameters to be optimized