Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center Application of meta-search, grid-computing, and machine-learning.


Improving the Sensitivity of Peptide Identification from Tandem Mass Spectra using Meta-Search, Grid-Computing, and Machine-Learning
Nathan J. Edwards, Georgetown University Medical Center
Annual Meeting, 2009

Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification. The PepArML meta-search engine is publicly available, free of charge, on-line.

Unified MS/MS Search Interface

Simple search description. Automatic search engine configuration and execution, parameterized by:
- Instrument & proteolytic agent
- Fixed and variable modifications
- Protein sequence database & MS/MS spectra file
- Peptide candidate selection

MS/MS Spectra Reformatting

- Charge and precursor enumeration for peptide candidate selection (for charge & 13C peak correction)
- Search engine formatting constraints (MGF/mzXML)
- Consistent MS/MS spectrum identifier tracking
- Spectrum file "chunking"

Peptide Candidate Selection

- Missed cleavages, specific or semi-specific proteolysis
- Precursor matching parameters, including:
  - Precursor mass tolerance & 13C peak correction
  - Charge state guessing and/or enumeration

Peptide Identification Meta-Search via Grid-Computing

- Meta-search with five search engines
- Automatic target & decoy searches
- Job management
- Result combining
- Secure communication
- Heterogeneous compute resources
- Scales to 250+ simultaneous searches
- Free, instant registration

[Diagram labels: NSF TeraGrid CPUs; Edwards Lab Scheduler & 48+ CPUs; Tandem, KScore, OMSSA; X!Tandem, KScore, OMSSA, MyriMatch, Mascot (1 core).]

PepArML – Unsupervised Machine-Learning Combiner

[Figure panels: Q-TOF, LTQ, MALDI. Plotted series: Heuristic, C-TMO, U-TMO, U*-TMO.]

Legend:
- T, M, O: Tandem, Mascot, OMSSA
- M*: Mascot w/ Peptide Prophet
- H: Heuristic
- C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO: Classifier w/ 5-fold-CV
- U-TMO: Unsupervised classifier w/ 5-fold-CV
- U*-TMO: Unsupervised classifier w/ no-CV

Real Data: Peptide Atlas – A8_IP

- Peptide Atlas A8_IP LTQ MS/MS dataset
- Tryptic search of Human ESTs using PepSeqDB [2]
- Spectra searched ~26 times: target + 2 decoys, 5 engines, 1+ vs 2+/3+ charge
- 8685 search jobs, 25.7 days of total CPU time
- TeraGrid TKO jobs in < 2 hours (143 machines)
- Total elapsed time (Mascot bottleneck): < 26 hours

Conclusions

The PepArML meta-search engine provides:
- A unified MS/MS search interface for Mascot, X!Tandem, KScore, OMSSA, and MyriMatch
- Search job scheduling on multiple large-scale heterogeneous computational grids
- Unsupervised, model-free result combining using machine-learning (PepArML [1])

The PepArML meta-search engine improves peptide identification sensitivity, significantly increasing the number of peptide ids at 10% FDR.

References

1. N. Edwards, X. Wu, and C.-W. Tseng. "An Unsupervised, Model-Free, Machine-Learning Combiner for Peptide Identifications from Tandem Mass Spectra." Clinical Proteomics 5.1 (2009).
2. N.J. Edwards. "Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression." Molecular Systems Biology (2007).
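
The poster lists charge and precursor enumeration (for charge & 13C peak correction) without showing how it might work. The following is a minimal Python sketch of the general idea, not the PepArML implementation. The function name `enumerate_precursors`, its parameters, and the default charge range are hypothetical, chosen only for illustration. For each assumed charge state and each assumed number of 13C atoms in the selected isotope peak, it computes a candidate neutral monoisotopic mass from the observed precursor m/z.

```python
from itertools import product

PROTON_MASS = 1.007276     # Da, mass of a proton
C13_MASS_SHIFT = 1.003355  # Da, mass difference between 13C and 12C


def enumerate_precursors(mz, charges=(1, 2, 3), max_c13=1):
    """Enumerate candidate neutral monoisotopic masses for one precursor m/z.

    Hypothetical helper: returns a list of (charge, c13_count, mass) tuples,
    one for every combination of assumed charge state and assumed number of
    13C atoms in the isotope peak that was selected for fragmentation.
    """
    candidates = []
    for z, k in product(charges, range(max_c13 + 1)):
        # Neutral mass if the observed peak were the monoisotopic peak.
        neutral = (mz - PROTON_MASS) * z
        # Correct for the instrument having picked the k-th 13C isotope peak.
        candidates.append((z, k, neutral - k * C13_MASS_SHIFT))
    return candidates


if __name__ == "__main__":
    for z, k, mass in enumerate_precursors(500.0):
        print(f"charge {z}+, {k} x 13C: {mass:.4f} Da")
```

Under these assumptions, each candidate mass would then be compared against peptide masses within the precursor mass tolerance, so a search engine that cannot guess charge state or isotope peak itself still sees every plausible precursor.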