Advancement in Protein Inference from Shotgun Proteomics Using Peptide Detectability
Pedro Alves
Advisor: Predrag Radivojac
School of Informatics, Bloomington
Overview
- Shotgun Proteomics
- Protein Inference Problem
- Protein Identification Using Peptide Detectability
- Results
- Limitations and Improvements
Degenerate Peptides
[Figure: Rat Sample/Rat IPI Database; 60%]
Nesvizhskii, A.I. and Aebersold, R. (2005) Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell Proteomics, 4, 1419–1440.
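A degenerate peptide is one whose sequence occurs in more than one protein in the database. As a minimal sketch of how the share of such peptides could be counted, assuming a peptide-to-protein mapping is already available (the function name and the toy peptide and protein names below are hypothetical, not taken from the rat data):

```python
from typing import Dict, Set


def degenerate_fraction(peptide_to_proteins: Dict[str, Set[str]]) -> float:
    """Return the fraction of peptides that map to more than one protein."""
    if not peptide_to_proteins:
        return 0.0
    shared = sum(1 for proteins in peptide_to_proteins.values() if len(proteins) > 1)
    return shared / len(peptide_to_proteins)


# Toy mapping with hypothetical names: two of the four peptides are shared.
toy_mapping = {
    "pep1": {"A"},
    "pep2": {"A", "B"},
    "pep3": {"B", "C"},
    "pep4": {"C"},
}

fraction = degenerate_fraction(toy_mapping)
```

The toy numbers are illustrative only and do not reproduce the figure for the rat sample.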
Protein Inference Problem
Solution 1: (A, E)
Solution 2: (B, C, D)
Minimum Protein Set
11 Possible Solutions
Nesvizhskii, A.I. and Aebersold, R. (2005) Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell Proteomics, 4, 1419–1440.
GMPSA: Greedy Minimum Protein Set Algorithm
[Figure: Identified Peptides, Proteins]
Nesvizhskii, A.I. and Aebersold, R. (2005) Interpretation of shotgun proteomic data: the protein inference problem. Mol. Cell Proteomics, 4, 1419–1440.
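The slide names GMPSA without listing its steps. The sketch below assumes it follows the standard greedy set-cover heuristic: repeatedly pick the protein that explains the most still-unexplained identified peptides. The function name, the alphabetical tie-break, and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_min_protein_set(protein_to_peptides: Dict[str, Set[str]],
                           identified: Set[str]) -> List[str]:
    """Greedy set cover: choose proteins until all identified peptides are explained."""
    if not protein_to_peptides:
        return []
    uncovered = set(identified)
    chosen: List[str] = []
    while uncovered:
        # Iterating in sorted order makes ties resolve to the alphabetically
        # first protein; this tie-break is arbitrary (an assumption of the sketch).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        gain = protein_to_peptides[best] & uncovered
        if not gain:
            break  # the remaining peptides map to no protein in the database
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy example with hypothetical names: after A is chosen, B and C both
# explain the one remaining peptide (p4), so they are tied.
toy_db = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4"},
    "C": {"p4"},
}
selected = greedy_min_protein_set(toy_db, {"p1", "p2", "p3", "p4"})
```

In the toy example the choice between B and C is decided only by name order, which illustrates the kind of tie that a purely count-based greedy rule cannot resolve.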
Resolving Ambiguity
Detectability of a peptide: the probability that the peptide will be observed in a standard sample analyzed by a standard proteomics routine.
Tang, H., Arnold, R. J., Alves, P., Xun, Z., Clemmer, D. E., Novotny, M. V., Reilly, J. P. & Radivojac, P. (2006). A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics, 22(14): e481-e488.
Factors Affecting Peptide Detection
Four classes of factors:
1) Chemical properties of the peptide (and parent protein)
2) Limitations of the peptide identification protocol
3) Abundance of the peptide in the sample
4) Presence of other peptides that compete for detection

Peptide Detectability Prediction
- Mean accuracy: 71%
- Mean AUC: 78%
- Synthetic: ~30% of peptides identified
- Real: ~10% of peptides identified
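For readers unfamiliar with AUC: it can be read as the probability that a randomly chosen detected peptide receives a higher predicted score than a randomly chosen undetected one, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name and the toy scores are hypothetical and unrelated to the reported 78%.

```python
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Toy predicted detectabilities (hypothetical values).
detected = [0.9, 0.6, 0.4]
undetected = [0.5, 0.3]
toy_auc = pairwise_auc(detected, undetected)  # 5 of 6 pairs ranked correctly
```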
Minimum Missed Peptides
[Figure: Identified Peptides, Proteins; labels: Missed peptide, MDAP]
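The slide gives no formula for a missed peptide. One possible reading, stated here as an assumption, is that a peptide of a candidate protein counts as missed when it was not identified although its predicted detectability is higher than that of the protein's least detectable identified peptide. A minimal sketch under that assumption (function name and toy values are hypothetical):

```python
from typing import Dict, Set


def missed_peptides(detectability: Dict[str, float], identified: Set[str]) -> Set[str]:
    """For one protein, return unidentified peptides that are more detectable
    than the protein's least detectable identified peptide (assumed definition)."""
    seen = [d for pep, d in detectability.items() if pep in identified]
    if not seen:
        return set()  # no identified peptide, so no threshold can be set
    threshold = min(seen)
    return {pep for pep, d in detectability.items()
            if pep not in identified and d > threshold}


# Toy protein with hypothetical predicted detectabilities for its peptides.
toy_protein = {"p1": 0.9, "p2": 0.7, "p3": 0.4, "p4": 0.2}
toy_missed = missed_peptides(toy_protein, {"p1", "p3"})
```

Here p2 is missed (0.7 exceeds the threshold of 0.4 but p2 was not identified), while p4 falls below the threshold and is not counted.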
LDFA
[Figure: Identified Peptides, Proteins]
Results: Synthetic Sample with 12 Proteins
GMPSA                 LDFA
7 correct proteins    10 correct proteins
5 tied proteins       1 tied protein
                      1 incorrect tied protein
GMPSA vs LDFA in a R. norvegicus sample
[Figure: GMPSA and LDFA; Rat Sample/Rat IPI Database; Indistinguishable pairs]
GMPSA vs LDFA
                                                        GMPSA   LDFA
Total proteins identified
Percent of proteins assigned with no ties               62%     81%
Total assignments with no ties
Proteins assigned due to unique peptides                149
Total unambiguous assignments excluding the proteins
  with unique peptides                                  4       75
[Figure: Identified Proteins, Unambiguously Identified Proteins]
Limitations and Improvements
- Include missed-cleavage peptides
- Include lower-scoring peptides to help differentiate tied proteins
- Include peptides identified with charges +1 and +3
- Train on other analytical platforms
- Study the effects of detectability prediction on algorithm results
Publications
PSB 2007: Alves, P., Arnold, R., Novotny, M., Radivojac, P., Reilly, J., Tang, H. (2007). Advancement in Protein Inference from Shotgun Proteomics Using Peptide Detectability. Pac. Symp. Biocomput., 12:
ISMB 2006: Tang, H., Arnold, R. J., Alves, P., Xun, Z., Clemmer, D. E., Novotny, M. V., Reilly, J. P. & Radivojac, P. (2006). A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics, 22(14): e481-e488.
Acknowledgements
- Predrag Radivojac
- Haixu Tang
- Randy Arnold
- IU School of Informatics
- IU Chemistry Dept.