Minimize Database-Dependence in Proteome Informatics Apr. 28, 2009 Kyung-Hoon Kwon Korea Basic Science Institute.



A story from our lab
1. Sequest search for biomarker discovery after the 1st experiment on a sample
2. Decoy approach within a 5% error rate
3. Preliminary biological interpretation of the 1st dataset
4. We installed a high-performance Mascot server.
5. Mascot search of the 1st dataset
6. Very different protein list from the previous Sequest result → the presumed markers disappeared.
7. Need to change the biological interpretation for the Mascot result
8. Two more experiments for confirmation
9. We had to select which search engine to use for the analysis of the 2nd and 3rd datasets.
Which one is the correct interpretation?

Factors that affect proteome analysis
Experimental dependence
▫ Instrumentation: ESI/MS, MALDI/MS, …
▫ Reagent: enzyme for proteolysis, isotope tag for quantitation, affinity tag for enrichment, …
▫ Protocol: MudPIT, IPAS, …
Informatics dependence
▫ Software: Mascot, Sequest, PEAKS, X!Tandem, OMSSA, Lutefisk, …
▫ Data analysis protocol: decoy, PeptideProphet, …
▫ Sequence database: SwissProt, IPI, NCBI nr, OWL, …
Different methods give different results.

Different experiment P.A. Kirkland et al., J. Proteome Res. 2008, 7(11),

Search engines
[Figure: Venn diagram of the spectra identified by SEQUEST, X!Tandem, and Mascot; region labels: 9%, 19%, 7%, 34%, 5%, 4%, 22%]
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small. Different search engines match different spectra.
B.C. Searle, Improving Sensitivity by Combining Results from Multiple Search Methodologies, Proteome Software Inc.
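The overlap on this slide can be computed directly from the sets of spectra each engine matched. The following is a minimal sketch in Python, assuming each engine's accepted identifications are available as a set of spectrum IDs. The function name `venn_regions` and the toy IDs are illustrative assumptions; they are not taken from the slide or from Scaffold.

```python
def venn_regions(sequest, xtandem, mascot):
    """Return the percentage of identified spectra in each Venn region.

    Each key is a tuple naming the engines that matched the spectra in
    that region, e.g. ("SEQUEST",) for spectra matched by SEQUEST only.
    """
    engines = {
        "SEQUEST": set(sequest),
        "X!Tandem": set(xtandem),
        "Mascot": set(mascot),
    }
    all_spectra = set().union(*engines.values())
    if not all_spectra:
        return {}

    counts = {}
    for spectrum in all_spectra:
        # Region key: sorted tuple of the engines that matched this spectrum.
        key = tuple(sorted(name for name, ids in engines.items() if spectrum in ids))
        counts[key] = counts.get(key, 0) + 1

    total = len(all_spectra)
    return {key: 100.0 * n / total for key, n in counts.items()}


# Toy example with hypothetical spectrum IDs.
regions = venn_regions({1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8})
for engines_hit, pct in sorted(regions.items()):
    print(engines_hit, f"{pct:.1f}%")
```

A small three-engine region combined with large single-engine regions is the pattern the slide describes: the engines agree on few spectra even when each identifies a similar total.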

For each spectrum:
1. Get the Mascot IDs, SEQUEST IDs, and X!Tandem IDs
2. Calculate the Mascot, SEQUEST, and X!Tandem probabilities (Peptide Prophet*)
3. Calculate the combined peptide probability (Scaffold Merger)
4. Calculate the protein probabilities (Protein Prophet*)
…
Scaffold uses Nesvizhskii’s algorithm to convert SEQUEST and Mascot scores to peptide probabilities. Scaffold uses another algorithm by Nesvizhskii to combine peptide probabilities.
*Nesvizhskii, A. I. et al, Anal. Chem. 2003, 75,
B.C. Searle, Improving Sensitivity by Combining Results from Multiple Search Methodologies, Proteome Software Inc.

Factors that affect proteome analysis
Experimental dependence
▫ Instrumentation: ESI/MS, MALDI/MS, …
▫ Reagent: enzyme for proteolysis, isotope tag for quantitation, affinity tag for filtering, …
▫ Protocol: MudPIT, IPAS, …
Data analysis method dependence
▫ Software: Mascot, Sequest, PEAKS, X!Tandem, OMSSA, Lutefisk, …
▫ Data analysis protocol: decoy, PeptideProphet, …
▫ Sequence database: SwissProt, IPI, NCBI nr, OWL, …
Solution: integration
An expensive route for many small laboratories

Suggestion: group proteins

As usual: grouping the identified proteins
[Figure: Peptides 1 to 7 linked to the proteins they match; Proteins A, B, C and Proteins D, E, F]
X. Li, et al., ‘Comparison of alternative analytical techniques for the characterisation of the human serum proteome in HUPO Plasma Proteome Project’, Proteomics 2005, 5,
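One common way to group identified proteins is to treat proteins that share any identified peptide as one group, that is, as connected components of the peptide-protein graph. The sketch below assumes that rule; it is not necessarily the grouping used by Li et al., and the peptide-to-protein mapping is hypothetical rather than read from the figure.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Group proteins linked through shared peptides (connected components)."""
    parent = {}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for protein in proteins:
            parent.setdefault(protein, protein)
        # Every protein matched by the same peptide joins one group.
        for other in proteins[1:]:
            union(proteins[0], other)

    groups = defaultdict(set)
    for protein in parent:
        groups[find(protein)].add(protein)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical matches, for illustration only.
matches = {
    "Peptide 1": ["Protein A", "Protein B"],
    "Peptide 2": ["Protein B", "Protein C"],
    "Peptide 3": ["Protein D"],
    "Peptide 4": ["Protein E", "Protein F"],
}
protein_groups = group_proteins(matches)
print(protein_groups)
```

A stricter alternative groups only proteins with identical peptide sets; which rule fits depends on how conservatively the protein list should be reported.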

Idea: optimize the database size (tightrope walking)
▫ A smaller database is faster to search but may miss many true sequences → fewer true positives.
▫ A larger database includes more true sequences → but it is slower to search and may also yield many false positives.

Sequence databases
▫ IPI human, IPI mouse (EBI): HUPO recommendation
▫ SwissProt (EBI)
▫ nr (NCBI)
▫ EST database

Grouping the proteins of the sequence database before the database search
[Figure: similar sequences are grouped; the groups are then used in the 1st search, 2nd search, and 3rd search]
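The database used for the second search can be sketched as a lookup: take the representatives identified in the first search and collect every member of their groups. This Python sketch assumes the groups already exist as a mapping from each representative to its similar sequences. The slides do not say how that grouping is computed, and the function name and accession-like labels below are hypothetical.

```python
def build_group_database(clusters, identified_representatives):
    """Collect all members of the groups whose representative was identified.

    clusters: dict mapping a representative ID to the list of IDs in its group
              (the representative itself included).
    identified_representatives: IDs found in the search against the
              representative database.
    """
    selected = []
    seen = set()
    for rep in identified_representatives:
        # A representative missing from the mapping is kept as a group of one.
        for member in clusters.get(rep, [rep]):
            if member not in seen:
                seen.add(member)
                selected.append(member)
    return selected


# Hypothetical groups: representative -> similar sequences.
clusters = {
    "REP_GLOBIN": ["REP_GLOBIN", "GLOBIN_2", "GLOBIN_3"],
    "REP_MDH": ["REP_MDH"],
    "REP_KERATIN": ["REP_KERATIN", "KERATIN_2"],
}
# Suppose the 1st search against the representative DB identified these two.
first_hits = ["REP_GLOBIN", "REP_MDH"]
second_db = build_group_database(clusters, first_hits)
print(second_db)
```

The same lookup can feed a third search if the groups are defined across databases, for example by mapping each representative to its similar entries in a second database.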

1st search with IPI human representative database
▫ IPI kDa protein
▫ IPI Gamma-globin
▫ IPI Guanine deaminase
▫ IPI Guanine deaminase
▫ IPI kDa protein
▫ IPI Malate dehydrogenase

2nd search with groups of IPI human database selected from the IPI representative DB search
[Figure: IPI entries in the selected groups, including Beta-globin, Beta-globin gene, Gamma-globin, Gamma-G globin, Delta-hemoglobin, Hemoglobin Lepore-Baltimore, other hemoglobin chains, kDa proteins, Malate dehydrogenase, and Guanine deaminase]

3rd search with groups of NCBI nr human database selected from the IPI representative DB search
[Figure: the IPI representative hits (kDa protein, Gamma-globin, Guanine deaminase, Malate dehydrogenase), each shown with its group of NCBI nr entries (gi| accessions)]

1st search with representative DB → 2nd search with group DB of identified representative proteins
▫ Keratin
▫ Keratin: type I cytoskeletal, epidermal type I, type I cuticular
▫ Cell division protein kinase, tyrosine-protein kinase, serine/threonine-protein kinase, fibroblast growth factor receptor
▫ Guanine nucleotide binding protein
▫ Septin
▫ Ras-related proteins

Result of the iterative MS/MS ion search

Database                               IPI human   IPI human        IPI human         NCBI nr human
                                       database    representative   selected groups   selected groups
Number of proteins                     48,193      24,120           6,860             32,916
Redundant proteins identified          5,584       2,336            5,288             22,895
Non-redundant proteins identified      2,944       2,136            2,934             4,090
Redundant peptides identified          10,486      5,585            11,066            17,500
Non-redundant peptides identified      6,124       5,177            6,569             6,580

Material: membranous fraction of human brain temporal lobe tissue
Experimental methods: multidimensional separation / LTQ-MS/MS (ThermoFinnigan)
Database analysis: TurboSEQUEST (ThermoFinnigan), DTASelect (Scripps Institute)
Database: IPI.HUMAN.v , NCBI nr human (283,548 proteins)
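The trade-off in these results is easier to see as ratios against the full IPI human search. The short Python sketch below uses the database sizes and non-redundant peptide counts as read from the slide; the dictionary layout and the function name `relative_to_baseline` are illustrative assumptions.

```python
# Database size and non-redundant peptides identified, as read from the slide.
results = {
    "IPI human database":            {"db_size": 48193, "nr_peptides": 6124},
    "IPI human representative":      {"db_size": 24120, "nr_peptides": 5177},
    "IPI human selected groups":     {"db_size": 6860,  "nr_peptides": 6569},
    "NCBI nr human selected groups": {"db_size": 32916, "nr_peptides": 6580},
}


def relative_to_baseline(results, baseline="IPI human database"):
    """Express each search as a percentage of the baseline search."""
    base = results[baseline]
    summary = {}
    for name, row in results.items():
        summary[name] = {
            # Database size as a percentage of the baseline database size.
            "db_size_pct": 100.0 * row["db_size"] / base["db_size"],
            # Percentage change in non-redundant peptides versus the baseline.
            "peptide_change_pct": 100.0
            * (row["nr_peptides"] - base["nr_peptides"])
            / base["nr_peptides"],
        }
    return summary


summary = relative_to_baseline(results)
for name, row in summary.items():
    print(
        f"{name}: DB size {row['db_size_pct']:.1f}% of baseline, "
        f"non-redundant peptides {row['peptide_change_pct']:+.1f}%"
    )
```

On these numbers, the selected-groups IPI database is about one seventh the size of the full IPI human database and still yields more non-redundant peptides, which is the effect the iterative search is meant to show.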

Mascot vs. Sequest
Databases: IPI, Sprot, nr, IPI-representative, Sprot-representative, IPI-IDedGroup, Sprot-IDedGroup

Mascot = Sequest Mascot only

The representative DB approach works differently. Sequest only

Comparison of Mascot and Sequest with the IPI, Sprot, nr, and representative DBs (results from MudPIT analysis of one 1D-gel band of a human cell line)
[Figure labels: Lower Xcorr]

[Figure labels: Lower Xcorr]

Advantages of the representative DB approach
▫ It mines more peptide sequences without requiring more time or additional search engines.
▫ It can connect different databases.
▫ Additionally, we expect it to give more reliable PTM information, because the more probable proteins are selected before the PTM search.

Thank you. Why have I given this presentation at HUPO PSI?