
Optimization of a New Score Function for the Detection of Remote Homologs, Kann et al.


1 Optimization of a New Score Function for the Detection of Remote Homologs, Kann et al.

2 Introduction: A new method for calculating a score function, designed to optimize the ability to discriminate between homologs and non-homologs. Existing alignment software computes an alignment score from the following terms:

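3 Alignment score: $S = \sum_{i \le j} N_{ij} s_{ij} + N_g s_g + N_e s_e$, where $N_{ij}$ is the number of times amino acid i is aligned with amino acid j, $N_g$ is the number of gaps in the alignment, $N_e$ is the number of residues in each gap beyond the first, and $s_{ij}$ is the score function (substitution matrix). The $N_{ij} s_{ij}$ terms are the contribution to the score from amino-acid matches and mismatches, $N_g s_g$ is the contribution for gap initiation, and $N_e s_e$ is the contribution for gap extension.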
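As a minimal sketch of how the alignment-score terms combine, the Python below counts N_ij, N_g and N_e from a pairwise alignment written as two gapped strings ('-' marks a gap) and sums their contributions. The three-letter substitution table, the gap contributions (-10 for initiation, -1 per extension residue) and the helper names are hypothetical, chosen only to make the arithmetic easy to follow; they are not BLOSUM62 or OPTIMA values.

```python
from collections import Counter

# Toy substitution scores s_ij over a three-letter alphabet.
# HYPOTHETICAL values for illustration only; not BLOSUM62 or OPTIMA.
SUBST = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "S"): 1,
    ("G", "G"): 6, ("G", "S"): 0, ("S", "S"): 4,
}


def subst(a: str, b: str) -> int:
    """Symmetric lookup of s_ij (keys are stored in sorted order)."""
    return SUBST[tuple(sorted((a, b)))]


def alignment_terms(top: str, bottom: str):
    """Count N_ij, N_g and N_e for two equal-length gapped strings ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    n_ij = Counter()
    n_g = 0          # number of gaps (gap initiations)
    n_e = 0          # residues in each gap beyond the first
    prev_gap = None  # row (0 or 1) holding the gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            continue  # an all-gap column carries no information
        if a == "-" or b == "-":
            row = 0 if a == "-" else 1
            if prev_gap == row:
                n_e += 1  # the same gap continues: extension
            else:
                n_g += 1  # a new gap starts: initiation
            prev_gap = row
        else:
            n_ij[tuple(sorted((a, b)))] += 1
            prev_gap = None
    return n_ij, n_g, n_e


def alignment_score(top: str, bottom: str, s_g: float = -10, s_e: float = -1) -> float:
    """S = sum_ij N_ij s_ij + N_g s_g + N_e s_e (gap contributions are hypothetical)."""
    n_ij, n_g, n_e = alignment_terms(top, bottom)
    return sum(n * subst(a, b) for (a, b), n in n_ij.items()) + n_g * s_g + n_e * s_e


print(alignment_score("AGSSA", "AG--A"))  # 4 + 6 + 4 - 10 - 1 = 3
```

In the usage line, the two-residue gap contributes one initiation and one extension, so the gap terms subtract 11 from the 14 points earned by the three identical pairs.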

4 Current Methods to Assess Homology: p(S_r > x) is the probability that a random pair of proteins of the same lengths would score higher than the observed score x. E is the expected number of random proteins in the database that would have at least that score. P is the probability that there is at least one random pair with a higher score. As p(S_r > x), E, and P increase, the likelihood that the given pair is homologous decreases.
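The three significance measures are linked once one assumes that the database comparisons are independent. Under that assumption, and with hypothetical function names, the sketch below derives E and P from a per-pair probability p(S_r > x) for a database of N sequences; 1 - exp(-E) is the usual Poisson approximation to P when p is small.

```python
import math


def e_value(p_pair: float, db_size: int) -> float:
    """E: expected number of random database entries scoring at least x."""
    return db_size * p_pair


def p_value(p_pair: float, db_size: int) -> float:
    """P: probability that at least one of db_size independent random
    comparisons scores at least x."""
    return 1.0 - (1.0 - p_pair) ** db_size


p, n = 1e-4, 1000
print(e_value(p, n))                    # ~0.1
print(p_value(p, n))                    # ~0.0952
print(1.0 - math.exp(-e_value(p, n)))   # Poisson approximation, also ~0.0952
```

All three quantities grow with p(S_r > x), which is why larger values of any of them make homology less likely.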

5 Current Score Matrices: PAM (percent accepted mutations), Dayhoff. GCB and JTT: the PAM approach applied to larger sequence datasets. BLOSUM62, Henikoff & Henikoff: constructed from a dataset of aligned sequence blocks. STR: derived from protein sequences aligned on the basis of their observed structures.
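For readers unfamiliar with how block-based matrices such as BLOSUM are built, the following is a simplified log-odds sketch: it counts residue pairs within alignment columns, derives observed pair frequencies q_ij and background frequencies p_i, and scores each pair as 2 * log2(q_ij / e_ij) in half-bit units, where e_ij = p_i^2 for identical residues and 2 p_i p_j otherwise. It deliberately omits the sequence-clustering step that gives BLOSUM62 its number, and the function name and toy input are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns):
    """Simplified BLOSUM-style scores from alignment columns.

    columns: list of strings, each holding the residues of one block column.
    Returns {(a, b): score} with a <= b, in half-bit units.
    """
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):  # every pair of sequences in the column
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency p_i = q_ii + sum_{j != i} q_ij / 2.
    p = Counter()
    for (a, b), q_ab in q.items():
        p[a] += q_ab / 2
        p[b] += q_ab / 2

    scores = {}
    for (a, b), q_ab in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(q_ab / expected))
    return scores


print(log_odds_matrix(["AA", "AA", "AG", "GG"]))
# {('A', 'A'): 1, ('A', 'G'): -2, ('G', 'G'): 2}
```

Pairs observed more often than their background frequencies predict get positive scores, and under-represented substitutions get negative ones.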

6 Limitations of Current Score Functions: Current score functions assume that each position evolves independently, overlooking correlations. Score functions are derived from a database of properly aligned proteins, not from alignments between random sequences. Gap penalties are chosen a priori.

7 Theory: Z score for an alignment. The Z score characterizes the significance of an alignment score by calculating the likelihood that this score or higher would be obtained by a random match. It accounts for variations in E with the lengths of the proteins.
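One common way to estimate a Z score is to compare the observed score against scores of shuffled sequences, which keeps length and composition fixed. The sketch below assumes that approach and uses a toy ungapped identity score in place of a real alignment; the function names are hypothetical, and this is not necessarily how Kann et al. computed Z.

```python
import random
import statistics
from typing import Callable


def identity_score(a: str, b: str) -> int:
    """Toy ungapped score: number of identical positions (equal lengths assumed)."""
    return sum(x == y for x, y in zip(a, b))


def z_score(score_fn: Callable[[str, str], float], seq_a: str, seq_b: str,
            n_shuffles: int = 200, seed: int = 0) -> float:
    """Z = (S - mean_random) / sd_random, with random scores from shuffles of seq_b.

    Shuffling keeps the length and composition of seq_b fixed, so the reference
    distribution reflects the lengths of the proteins being compared.
    """
    rng = random.Random(seed)
    observed = score_fn(seq_a, seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        residues = list(seq_b)
        rng.shuffle(residues)
        random_scores.append(score_fn(seq_a, "".join(residues)))
    mu = statistics.mean(random_scores)
    sigma = statistics.stdev(random_scores)
    if sigma == 0:
        raise ValueError("random scores have zero spread; Z is undefined")
    return (observed - mu) / sigma


seq = "ACDEFGHIKLMNPQRSTVWY"
print(round(z_score(identity_score, seq, seq), 1))  # large positive Z
```

A real implementation would score each shuffle with the same gapped alignment procedure used for the observed pair, which is considerably more expensive.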

8 Theory: The score function is optimized by maximizing the confidence over the training set. This avoids dependence on extreme E values (easily detected or overly distant homologies) and eliminates the contribution of falsely identified homologies (overly distant pairs).
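The slide does not give the form of the confidence C. To illustrate why a bounded confidence avoids dependence on extreme E values, the sketch below uses a HYPOTHETICAL saturating function C = 1 / (1 + E / E_mid); both the formula and the midpoint E_mid = 1 are assumptions, not the definition used in the paper.

```python
def confidence(e_value: float, e_mid: float = 1.0) -> float:
    """HYPOTHETICAL bounded confidence in [0, 1]: ~1 when E << e_mid, ~0 when E >> e_mid."""
    return 1.0 / (1.0 + e_value / e_mid)


def mean_confidence(e_values) -> float:
    """<C>: average confidence over the training pairs."""
    e_values = list(e_values)
    return sum(confidence(e) for e in e_values) / len(e_values)


# An easily detected pair (E = 1e-50) and a moderately easy one (E = 1e-10) both
# saturate near 1, so ever-smaller E values cannot dominate <C>; a very distant
# or falsely identified pair (E = 1e6) contributes almost nothing.
print(mean_confidence([1e-50, 1e-10, 1.0, 1e6]))  # ~0.625
```

Averaging -log10(E) instead would let the single E = 1e-50 pair dominate the mean, which is the kind of dependence a bounded confidence avoids.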

9 Database Preparation: Use a set of known homologs whose homology cannot be reliably detected by standard pairwise comparison, so that the score function is optimized for detecting distant homologs. Training set: 900 protein pairs from the same COG (Clusters of Orthologous Groups) with < 25% sequence identity.
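The training-set criteria (same COG, < 25% sequence identity) amount to a simple filter. The sketch below assumes each candidate pair comes with a precomputed pairwise alignment and a protein-to-COG lookup; percent identity is taken over gap-free columns, which is only one of several conventions in use, and the protein and COG identifiers are made up.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Identity (%) over aligned columns where neither row has a gap ('-')."""
    aligned = [(a, b) for a, b in zip(top, bottom) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    return 100.0 * sum(a == b for a, b in aligned) / len(aligned)


def select_training_pairs(pairs, cog_of, max_identity: float = 25.0):
    """Keep pairs from the same COG whose alignment identity is below max_identity.

    pairs: iterable of (id1, id2, aligned_seq1, aligned_seq2)
    cog_of: dict mapping protein id -> COG id (HYPOTHETICAL input format)
    """
    return [
        (id1, id2)
        for id1, id2, top, bottom in pairs
        if cog_of[id1] == cog_of[id2] and percent_identity(top, bottom) < max_identity
    ]


cog_of = {"p1": "COG0001", "p2": "COG0001", "p3": "COG0001", "p4": "COG0002"}
candidates = [
    ("p1", "p2", "AAAA", "AAAC"),  # same COG but 75% identity: too similar
    ("p1", "p3", "ACDE", "WYKL"),  # same COG, 0% identity: kept
    ("p1", "p4", "ACDE", "WYKL"),  # different COGs: not homologs for training
]
print(select_training_pairs(candidates, cog_of))  # [('p1', 'p3')]
```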

10 Optimization of Score Function: Align using the BLOSUM62 matrix. Calculate Z and C for each pair of homologs, then average over the pairs in the training set to yield the average confidence ⟨C⟩. Generate initial alignments using the gap penalties that yielded the highest C values. Run ~10 cycles of optimization and realignment until the score function converges.
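The optimization procedure has two stages: choose starting gap penalties by the highest average confidence, then alternate realignment and optimization until the parameters stop changing. The sketch below shows only that control flow; avg_confidence, realign and optimize are hypothetical callables standing in for the real alignment and optimization code, and the toy objective and update rule are invented for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def best_gap_penalties(avg_confidence: Callable[[float, float], float],
                       open_grid: Iterable[float],
                       ext_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Grid search for the (s_g, s_e) pair with the highest average confidence <C>."""
    best = None
    for s_g, s_e in product(open_grid, ext_grid):
        c = avg_confidence(s_g, s_e)
        if best is None or c > best[2]:
            best = (s_g, s_e, c)
    return best


def optimize_until_converged(realign, optimize, params: Dict[str, float],
                             tol: float = 1e-4, max_cycles: int = 20):
    """Alternate realignment and parameter optimization until parameters stop changing.

    realign(params) -> alignments and optimize(params, alignments) -> new params
    are HYPOTHETICAL callables standing in for the real alignment and optimizer.
    Returns (final params, number of cycles run).
    """
    for cycle in range(1, max_cycles + 1):
        alignments = realign(params)
        new_params = optimize(params, alignments)
        change = max(abs(new_params[k] - params[k]) for k in params)
        params = new_params
        if change < tol:
            return params, cycle
    return params, max_cycles


def toy_objective(s_g: float, s_e: float) -> float:
    """Invented objective that peaks at s_g = -10, s_e = -1."""
    return -((s_g + 10) ** 2 + (s_e + 1) ** 2)


print(best_gap_penalties(toy_objective, [-8, -10, -12], [-0.5, -1, -2]))  # (-10, -1, 0)

# Toy update: each cycle moves the single parameter halfway toward 1.0.
final, cycles = optimize_until_converged(
    lambda p: None, lambda p, _: {"x": p["x"] + (1.0 - p["x"]) / 2}, {"x": 0.0})
print(cycles)  # 14
```

Capping the loop with max_cycles guards against an optimizer that oscillates instead of converging.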

11 Results: Only small changes in the gap penalties; most of the improvement comes from refinements of the substitution matrix. OPTIMA, the resulting score function, has a significantly improved average confidence value compared with other score matrices, and the average ⟨p(S_r > x)⟩ is significantly decreased.

12 Summary: Aim: optimize a score matrix to discriminate between homologs and non-homologs. The OPTIMA score function is more successful at discriminating between homologs and non-homologs than standard score matrices. Gap penalties are treated as additional parameters to be optimized.
