Improving the Sensitivity of Peptide Identification With Meta-Search and Machine Learning

Nathan J. Edwards 1, Xue Wu 2, Chau-Wen Tseng 2
1 Georgetown University Medical Center; 2 University of Maryland, College Park
Annual Meeting, 2008

Introduction
We use a variety of techniques, from sequence enumeration and meta-search to machine learning, to increase the number of high-confidence peptide identifications from large tandem mass-spectra datasets. These techniques seek to increase the number of peptide identifications made at a given level of statistical significance. We show that these techniques can improve identification sensitivity significantly.

Peptide Sequence Databases
All peptide sequences from:
Six-frame translation of EST and HTC sequences;
Three-frame translation of mRNA sequences;
All IPI, RefSeq, Genbank, Vega, EMBL, HInvDB, SwissProt and TrEMBL proteins;
SwissProt variants, splices, conflicts, mature isoforms.
Grouped by gene-cluster and compressed, as FASTA.

PepSeqDB Release 1.2
Organism     Size (AA)   Size (Entries)
Human        209Mb       75,043
Mouse        151Mb       55,929
Rat          67Mb        43,211
Zebra-fish   90Mb        47,922
Schedule: Automated rebuild every few months.
Coming soon: Fast peptide to gene and source sequence mapping using suffix-trees and gene sequence-groups.

Peptide Identification Meta-Search
Compute resources: NSF TeraGrid, 1000+ CPUs; UMIACS, 250+ CPUs; Edwards Lab Scheduler & 48+ CPUs.
Meta-search with four search engines;
Target & decoy searches automatically;
Web-service API for all data;
Secure communication;
Heterogeneous compute resources;
Simple search description;
Scales to 100s of simultaneous searches;
Free, instant registration.

HMMatch Spectral Matching
[Figure: hidden Markov model state diagrams. State labels: Begin, End, b1 to b3, y1 to y3, I0 to I6, D1 to D4. Transition-probability labels are not recoverable from the extracted text.]

PepArML - Unsupervised Machine-Learning Combiner
Legend: Heuristic: H; Classifier w/ 5-fold-CV: C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO; Unsupervised classifier w/ 5-fold-CV: U-TMO; Unsupervised classifier w/ no-CV: U*-TMO.
[Figure: panels for Q-TOF, LTQ and MALDI datasets. Axis labels: False Positive Rate, Iteration. Curves: H, C-TMO, U-TMO, U*-TMO.]

Conclusions
Application of enumeration, meta-search, and machine-learning can significantly improve the sensitivity of peptide identification.
Peptide sequence databases, meta-search engine, machine-learning combiner available from: http://edwardslab.bmcb.georgetown.edu

References
1. Edwards. Novel Peptide Identification using Expressed Sequence Tags and Sequence Database Compression. Mol. Sys. Biol. 2007.
2. Wu, Tseng, Edwards. HMMatch: Peptide Identification by Spectral Matching of Tandem Mass Spectra using Hidden Markov Models. J. Comp. Biol. 2007.
3. Wu, Tseng, Rudnick, Balgley, Edwards. PepArML: An Unsupervised, Model-Free, Combining, Peptide Identification Arbiter for Tandem Mass Spectra via Machine Learning. In preparation.

Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center
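
The six-frame and three-frame translations listed under Peptide Sequence Databases can be illustrated with a minimal Python sketch. This is not the PepSeqDB build pipeline, which the poster does not describe; it assumes the standard genetic code, and the function names (`translate`, `three_frame_translation`, `six_frame_translation`) and the use of `X` for codons containing ambiguous bases are hypothetical choices made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard code is appropriate for the sequences being translated).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate one reading frame starting at `offset`.

    Stop codons are written as '*'; codons with non-ACGT bases become 'X'.
    A trailing partial codon is ignored.
    """
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


def three_frame_translation(dna):
    """Forward-strand frames only, as one might use for mRNA sequences."""
    dna = dna.upper()
    return [translate(dna, offset) for offset in range(3)]


def six_frame_translation(dna):
    """Three forward frames followed by three reverse-complement frames,
    as one might use for EST or HTC sequences of unknown orientation."""
    dna = dna.upper()
    return three_frame_translation(dna) + three_frame_translation(
        reverse_complement(dna)
    )


if __name__ == "__main__":
    for frame in six_frame_translation("ATGGCCTAA"):
        print(frame)
```

In a peptide-enumeration setting, each translated frame would typically be split at stop codons (`*`) before the resulting open stretches are written out as FASTA entries; that step is omitted here to keep the sketch short.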