ProReP - Protein Results Parser v3.0©
A Tool for Handling Tandem Mass Spectrometer Protein Database Search Results
Capstone Presentation
Kiran Annaiah (M.S. Bioinformatics)
Advisors: Dr. Randy Arnold, Dr. Haixu Tang
Outline
- Background
- Data generation from a mass spec experiment
- Mascot search engine
- Why parse Mascot results?
- Parser features
- Results
- Conclusions
- Acknowledgments
Background
- High-throughput “shotgun” proteomics mass spectrometry
  - Identify, characterize, and quantify all expressed proteins in a mixture simultaneously
- Mass spectrometry
  - Peptide mass fingerprinting
  - Collision-induced dissociation (CID) spectra from MS/MS analysis
- LC/MS/MS approach used to identify protein components in a complex mixture
- Tandem mass spectra help infer the amino acid sequences of peptides
Peptide Mass Fingerprinting vs. MS/MS Protein Identification
James S. Eddes et al., 2002, Proteomics
Database Searching
[Figure: fragmentation of the peptide L-M-G-S-E-I-P-K between its NH2 and CO2 termini, showing b ions (b1 to b7) and y ions (y7 to y1), with the fragment spectrum plotted against m/z. The spectrum is passed to database searching software (MASCOT®), which searches a sequence database (SwissProt) and returns results.]

Database (SwissProt)
- Actin: MYTCVPIASEQUENCEMIMEWTPQSDLIRPTVCIMNERCVGGPYILCMTEND
- Amylase: DSLIKRNYTIPMCSQIRECNHIPLMTRCHGYYKWSIALAINTQSFGIVRIVAMNKLPSSCRTIVGHWEDRICTMQNCISPPEKELIAVARGTSP
- …

Results: proteins found
Hemoglobin, beta chain

Pept. Mass   Score   Sequence
738.84       41      HLDNLK
912.01       61      VHLTDAEK
915.06       56      AAVNGLWGK
1090.24      41      VINAFNDGLK
1122.33      62      VVAGVASALAHK
1218.42      70      LVINAFNDGLK
…
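The b/y fragment ladder on this slide can be illustrated with a short calculation. The sketch below is not part of ProReP or Mascot; it assumes singly charged fragments and standard monoisotopic residue masses, and the function name `fragment_ladder` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in the slide's peptide only.
MONO = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276, "L": 113.08406,
    "I": 113.08406, "K": 128.09496, "E": 129.04259, "M": 131.04049,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of a proton


def fragment_ladder(peptide):
    """Return (b_ions, y_ions) as dicts of singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b_ions, y_ions = {}, {}

    # b ions grow from the N-terminus: residue sum plus one proton.
    running = 0.0
    for i in range(1, n):
        running += masses[i - 1]
        b_ions[f"b{i}"] = running + PROTON

    # y ions grow from the C-terminus: residue sum plus water plus one proton.
    running = 0.0
    for i in range(1, n):
        running += masses[n - i]
        y_ions[f"y{i}"] = running + WATER + PROTON

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ladder("LMGSEIPK")
    for name, mz in list(b.items()) + list(y.items()):
        print(f"{name}: {mz:.4f}")
```

A search engine compares ladders like this, computed for candidate database peptides, against the observed fragment peaks.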
Mascot Search Engine
- Uses mass spectrometry data to identify proteins from primary sequence databases
- MS/MS ion search
  - Enzyme cleavage rules applied to sequences in the protein databases
  - Experimental mass values compared with calculated fragment ion mass values
  - Scoring algorithm used to identify the closest match or matches
- Probability-based MOWSE scoring algorithm
- Databases
  - MSDB: non-identical protein sequence database
  - NCBInr
  - SwissProt
  - dbEST: “single-pass” cDNA sequences, or ESTs
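As an illustration of applying an enzyme cleavage rule to a database sequence, here is a minimal in-silico tryptic digest. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P) and is not Mascot's implementation; `tryptic_digest` and its `missed_cleavages` parameter are hypothetical names.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence into tryptic peptides.

    Simplified rule: cleave C-terminal to K or R, except before P.
    With missed_cleavages > 0, also return peptides that span up to
    that many uncleaved sites.
    """
    # Step 1: cut the sequence at every allowed cleavage site.
    pieces = []
    start = 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if aa in "KR" and not is_last and sequence[i + 1] != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    pieces.append(sequence[start:])

    # Step 2: join neighbouring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Start of the toy "Amylase" sequence from the database slide.
    print(tryptic_digest("DSLIKRNYTIPMCSQIR"))
```

The masses of the resulting peptides and their fragments are what get compared with the experimental values.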
A Typical Experiment
- Analysis of liver / brain tissue
- Digest with trypsin
- Liquid chromatography
- LC-eluting sample electrosprayed into the mass spec
- APAAIGAYSQAVLVDR from 14.5 kDa translational inhibitor protein
- MS/MS on an intense peak of a parent ion
- Raw data converted to a DTA file
- Mascot search
- Generates HTML file
Mascot output – HTML file (avg. size 5 MB)
Motivation
- Mass spectrometry generates an enormous amount of data
- Mascot returns, on average, hundreds of proteins matching the mass spectral data
- Analyzing the Mascot results manually is time-consuming
- Need different ways of looking at the data
- Comparison of various data sets (experiments)
- No tools were available in the public domain to analyze Mascot results
Protein Results Parser v3.0
Features
- Single-file parsing
- Sequence coverage (with single-file parsing)
- Two-file comparison
- Multiple files
  - Compare
  - Combine
- Tool was developed using Perl/Tk
- Windows application
Single File Parsing
Screened HTML Result (smaller file size)
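Screening a result by user-specified criteria can be sketched as a simple filter. The record layout, the function name `screen_hits`, and the two criteria shown (minimum peptide score, minimum peptides per protein) are assumptions for illustration, not ProReP's actual options; the data are the hemoglobin matches from the database searching slide.

```python
from collections import defaultdict

# Peptide matches from the hemoglobin example; the dict layout is hypothetical.
HITS = [
    {"protein": "Hemoglobin, beta chain", "peptide": "HLDNLK", "score": 41},
    {"protein": "Hemoglobin, beta chain", "peptide": "VHLTDAEK", "score": 61},
    {"protein": "Hemoglobin, beta chain", "peptide": "AAVNGLWGK", "score": 56},
    {"protein": "Hemoglobin, beta chain", "peptide": "VINAFNDGLK", "score": 41},
    {"protein": "Hemoglobin, beta chain", "peptide": "VVAGVASALAHK", "score": 62},
    {"protein": "Hemoglobin, beta chain", "peptide": "LVINAFNDGLK", "score": 70},
]


def screen_hits(hits, min_score, min_peptides=1):
    """Drop low-scoring peptide matches, then drop proteins left with
    fewer than min_peptides surviving matches."""
    by_protein = defaultdict(list)
    for hit in hits:
        if hit["score"] >= min_score:
            by_protein[hit["protein"]].append(hit)
    return {
        protein: kept
        for protein, kept in by_protein.items()
        if len(kept) >= min_peptides
    }


if __name__ == "__main__":
    for protein, kept in screen_hits(HITS, min_score=50).items():
        print(protein, [h["peptide"] for h in kept])
```

Because the filter runs on parsed results, the same file can be screened again with different thresholds without repeating the search.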
Sequence Coverage
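Sequence coverage is commonly defined as the fraction of a protein's residues that fall inside at least one matched peptide. The sketch below follows that definition; the function name and the toy sequence are hypothetical, and ProReP's own calculation may differ.

```python
def sequence_coverage(protein_sequence, peptides):
    """Return the fraction of residues covered by at least one peptide."""
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)
    return sum(covered) / len(protein_sequence)


if __name__ == "__main__":
    # Toy 10-residue sequence, made up for illustration.
    print(sequence_coverage("MKTAYIAKQR", ["MKTA", "AKQR"]))
```

Overlapping peptides are counted once per residue, so coverage never exceeds 1.0.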
Two file Comparison
Results – Comparison of Two Experiments
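A two-experiment comparison reduces to set operations on the proteins identified in each file. This is a minimal sketch under that assumption; `compare_results` and the example protein lists are hypothetical.

```python
def compare_results(proteins_a, proteins_b):
    """Split two protein lists into shared and experiment-specific proteins."""
    a, b = set(proteins_a), set(proteins_b)
    return {
        "common": sorted(a & b),   # found in both experiments
        "only_a": sorted(a - b),   # found only in the first
        "only_b": sorted(b - a),   # found only in the second
    }


if __name__ == "__main__":
    experiment_a = ["Actin", "Hemoglobin, beta chain"]
    experiment_b = ["Amylase", "Hemoglobin, beta chain"]
    for group, proteins in compare_results(experiment_a, experiment_b).items():
        print(group, proteins)
```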
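Combine and Compare Feature
[Diagram: two treatments (protein digests), Drug A and Drug B, are each separated into SCX fractions and run in triplicate by LC/MS/MS, giving 15 data files per treatment. The 15 files for each treatment are combined, and the two combined results are then compared.]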
Multiple File Comparison
Results – Multiple file comparison (sequential display)
Results – Multiple file comparison (tabular display)
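A tabular comparison of several files can be built as a presence/absence matrix with one row per protein and one column per file. The layout, the "x" marker, and the function name `presence_table` are assumptions for illustration and may not match ProReP's actual display.

```python
def presence_table(files):
    """Build a protein-by-file presence table.

    files maps a file label to the proteins identified in that file.
    Returns (header, rows); a cell holds "x" when the protein was found.
    """
    labels = list(files)
    protein_sets = {label: set(files[label]) for label in labels}
    all_proteins = sorted(set().union(*protein_sets.values()))
    header = ["Protein"] + labels
    rows = [
        [protein] + ["x" if protein in protein_sets[label] else "" for label in labels]
        for protein in all_proteins
    ]
    return header, rows


if __name__ == "__main__":
    header, rows = presence_table({
        "run1": ["Actin", "Amylase"],
        "run2": ["Amylase"],
        "run3": ["Actin", "Hemoglobin, beta chain"],
    })
    print("\t".join(header))
    for row in rows:
        print("\t".join(row))
```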
Combine – Merging of multiple experiments
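Merging several result files into one can be sketched as a union of peptide matches per protein. Keeping the best score when a peptide appears in more than one file is an assumption made here, not a documented ProReP rule; `combine_results`, the nested dict layout, and the score of 30 in the second file are made up for illustration.

```python
def combine_results(result_files):
    """Merge result files of the form {protein: {peptide: score}}.

    Peptides are pooled per protein; when a peptide occurs in several
    files, the highest score is kept (an illustrative choice).
    """
    combined = {}
    for result in result_files:
        for protein, peptides in result.items():
            merged = combined.setdefault(protein, {})
            for peptide, score in peptides.items():
                if score > merged.get(peptide, float("-inf")):
                    merged[peptide] = score
    return combined


if __name__ == "__main__":
    fraction_1 = {"Hemoglobin, beta chain": {"HLDNLK": 41, "VHLTDAEK": 61}}
    fraction_2 = {"Hemoglobin, beta chain": {"HLDNLK": 30, "AAVNGLWGK": 56}}
    print(combine_results([fraction_1, fraction_2]))
```

Two combined results produced this way can then be compared like any pair of single files.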
Results – Combining multiple experiments
Conclusion
- Decreased data analysis and processing time.
- Search results are reduced automatically using user-specified criteria.
- Removal of low-scoring peptide matches greatly improves the accuracy of data interpretation.
- A single result file can be processed multiple times, using a different set of parsing criteria each time, without the need to repeat the database search.
- The ability to compare two or more result files in an automated fashion makes determination of sample similarity a nearly effortless endeavor.
Acknowledgements
- Dr. Randy Arnold – Manager and Research Scientist (Proteomics Research and Development Facility, Dept. of Chemistry)
- Dr. Haixu Tang – Asst. Prof., School of Informatics
- Abhijit Mahabal – Grad Student, CS Dept.
- Kranthi Varala – Grad Student, Bioinformatics