Download presentation
Presentation is loading. Please wait.
1
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al
2
Introduction New method to calculate a score function, aiming to optimize the ability to discriminate between homologs and non- homologs Existing software uses the following to compute an alignment score:
3
Number of times AA i is aligned with AA j Number of gaps in alignment Number of residues in each gap beyond one Score function / Substitution matrix Contribution to score for AA match/mismatch Contribution to score for gap initialization Contribution to score for gap extension
4
Current Methods to Calculate Homology p(S r > x): probability that a random pair of proteins of the same length would have that score E: expected number of random proteins in the db that would have at least that score P: probability that there is at least one random pair with a higher score As p(S r > x), E, P increase, the likelihood that the given pair is homologous decreases
5
Current Score Matrices PAM (percent accepted mutations) – Dayhoff GCB, JTT: used to apply to larger sequence datasets BLOSUM62 – Henikoff & Henikoff, constructed using a dataset of aligned sequence blocks STR – protein sequences aligned based on their observed structures
6
Limitations of Current Score Functions Current score functions assume independent evolution of each location, overlooking correlations Score functions derived from a db of properly aligned proteins, not on alignments between random sequences Gap penalty a priori
7
Theory Z score for alignment: Characterize the significance of alignment score by calculating the likelihood that this score or higher would be obtained by a random match Account for variations in E with the length of the proteins
8
Theory Score function optimized by maximizing the confidence over the training set Avoids dependence on extreme E values (easily detected or overly distant homologies) Eliminates contribution of falsely identified homologies (overly distant)
9
Database Preparation Use set of known homologs whose homology cannot be reliably determined with standard pairwise comparison, in order to optimize score function for detection of distant homologs Training set: 900 pairs of protein in same COG with < 25% sequence identity
10
Optimization of Score Function Align using BLOSOM62 matrix Calculate Z and C for each pair of homologs, then averaged over pairs in training set to yield Generate initial alignments using gap penalties that yielded highest C values ~10 cycles of optimization and realignments until score function converged
11
Results Small changes in gap penalties: most of the improvement cones from refinements of OPTIMA: resulting score function –has significantly improved average confidence value compared with other score matrices – x)>, significantly decreased
12
Summary Aim: optimize score matrix to discriminate between homologs and non-homologs OPTIMA score function: more successful at discriminating between homologs and non- homologs compared with standard score matrices Gap penalties treated as additional parameters to be optimized
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.