ProReP - Protein Results Parser v3.0©

ProReP - Protein Results Parser v3.0©
A Tool for Handling Tandem Mass Spectrometer Protein Database Search Results
Capstone Presentation
Kiran Annaiah (M.S. Bioinformatics)
Advisors: Dr. Randy Arnold, Dr. Haixu Tang

Outline
Background
Data generation from a mass spectrometry experiment
Mascot search engine
Why parse Mascot results?
Parser features
Results
Conclusions
Acknowledgments

Background
High-throughput "shotgun" proteomics by mass spectrometry: identify, characterize and quantify all expressed proteins in a mixture simultaneously.
Mass spectrometry approaches:
Peptide mass fingerprinting
Collision-induced dissociation (CID) spectra from MS/MS analysis
The LC/MS/MS approach is used to identify the protein components of a complex mixture.
Tandem mass spectra help in inferring the amino acid sequences of peptides.

Peptide Mass Fingerprinting vs. MS/MS Protein Identification
(James S. Eddes et al., 2002, Proteomics)

Database Searching
[Figure: the peptide L M G S E I P K, from the N-terminus (NH2) to the C-terminus, fragmented into b-ions b1-b7 and y-ions y1-y7; the resulting MS/MS spectrum (m/z) is passed to database-searching software (MASCOT®)]
Results (MASCOT®), proteins found: Hemoglobin, beta chain
Pept. Mass   Score   Sequence
738.84       41      HLDNLK
912.01       61      VHLTDAEK
915.06       56      AAVNGLWGK
1090.24      41      VINAFNDGLK
1122.33      62      VVAGVASALAHK
1218.42      70      LVINAFNDGLK
…
Database (SwissProt)
Actin: MYTCVPIASEQUENCEMIMEWTPQSDLIRPTVCIMNERCVGGPYILCMTEND
Amylase: DSLIKRNYTIPMCSQIRECNHIPLMTRCHGYYKWSIALAINTQSFGIVRIVAMNKLPSSCRTIVGHWEDRICTMQNCISPPEKELIAVARGTSP
…
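The b- and y-ion ladder in the figure can be computed directly from residue masses. The Python sketch below is an illustration only (ProReP itself was written in Perl/Tk, and this is not its code). It assumes standard monoisotopic residue masses and singly charged ions, and `fragment_ions` is a hypothetical helper name.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # proton mass (Da)
WATER = 18.01056  # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide string."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: the first i residues (N-terminal fragment) plus one proton
    b = {f"b{i}": sum(masses[:i]) + PROTON for i in range(1, n)}
    # y_i: the last i residues (C-terminal fragment) plus water and one proton
    y = {f"y{i}": sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


b_ions, y_ions = fragment_ions("LMGSEIPK")
for name, mz in {**b_ions, **y_ions}.items():
    print(f"{name}\t{mz:.3f}")
```

For an n-residue peptide, each complementary pair b_i and y_(n-i) sums to the intact peptide mass plus two protons. This is why the two ladders cross-check each other in a spectrum.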

Mascot Search Engine
Uses mass spectrometry data to identify proteins from primary sequence databases.
MS/MS ion search:
Enzyme cleavage rules are applied to the sequences in the protein databases.
Experimental mass values are compared with calculated fragment ion mass values.
A scoring algorithm identifies the closest match or matches.
Probability-based MOWSE scoring algorithm
Databases:
MSDB: non-identical protein sequence database
NCBInr
SwissProt
dbEST: "single-pass" cDNA sequences, or ESTs
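The first two MS/MS ion-search steps can be illustrated with a small sketch. It makes several assumptions:
- the common trypsin rule applies (cleave after K or R, but not before P);
- a fixed ±0.5 Da fragment tolerance is used;
- the function names are hypothetical.

Mascot's MOWSE scoring is not reproduced here. The last function only shows the -10·log10(P) transform Mascot uses to report a probability P that a match is random as a score.

```python
import math
import re


def tryptic_digest(protein, missed_cleavages=0):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    # Zero-width split points: preceded by K/R and not followed by P (Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` adjacent pieces to allow missed cleavages.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def count_fragment_matches(observed_mz, theoretical_mz, tol=0.5):
    """Count theoretical fragment m/z values that have an observed peak within +/- tol."""
    return sum(any(abs(o - t) <= tol for o in observed_mz) for t in theoretical_mz)


def probability_to_score(p_random):
    """Convert a random-match probability P into a score of -10*log10(P)."""
    return -10 * math.log10(p_random)


print(tryptic_digest("MKPAKRLLR"))
print(tryptic_digest("MKPAKRLLR", missed_cleavages=1))
print(probability_to_score(0.05))
```

The "not before P" rule is why "MKPAK" stays in one piece in the example: the first K is followed by P.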

A Typical Experiment
Analysis of liver/brain tissue
Digest with trypsin
Liquid chromatography (LC)
LC-eluting sample electrosprayed into the mass spectrometer
MS/MS on an intense parent-ion peak, e.g. APAAIGAYSQAVLVDR from the 14.5 kDa translational inhibitor protein
Raw data converted to a DTA file
Mascot search generates an HTML file
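The DTA step can be sketched as follows. The sketch assumes the common SEQUEST-style DTA layout: a header line with the singly protonated precursor mass [M+H]+ and its charge, followed by one m/z-intensity pair per line. The example spectrum values are invented, and `parse_dta` and `precursor_mz` are hypothetical names.

```python
PROTON = 1.00728  # proton mass (Da)


def parse_dta(text):
    """Parse DTA text: header line '[M+H]+ charge', then one 'm/z intensity' pair per line."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    mh, charge = float(rows[0][0]), int(rows[0][1])
    peaks = [(float(mz), float(inten)) for mz, inten in rows[1:]]
    return mh, charge, peaks


def precursor_mz(mh, charge):
    """Convert a singly protonated mass [M+H]+ to the m/z observed at `charge`."""
    neutral = mh - PROTON
    return (neutral + charge * PROTON) / charge


# Invented example values, not data from the presentation.
example = """1000.50 2
175.12 1200.0
262.15 800.0
"""
mh, z, peaks = parse_dta(example)
print(z, len(peaks), round(precursor_mz(mh, z), 4))
```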

Mascot output – HTML file (average size 5 MB)

Motivation
Mass spectrometry generates an enormous amount of data.
On average, Mascot returns hundreds of proteins matching the mass spectral data.
Analyzing Mascot results manually is time-consuming.
Different ways of looking at the data are needed.
Various data sets (experiments) must be compared.
No tools were available in the public domain to analyze Mascot results.

Protein Results Parser v3.0 Features
Single-file parsing
Sequence coverage (with single-file parsing)
Two-file comparison
Multiple files: compare, combine
The tool was developed in Perl/Tk as a Windows application.

Single File Parsing

Screened HTML Result (smaller file size)
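Screening can be thought of as filtering peptide matches against user-chosen thresholds. The sketch below works in two passes:
1. keep peptide matches at or above a minimum ion score;
2. drop proteins left with too few distinct peptides.

It is hypothetical. The `PeptideHit` record, the `screen` function and the default thresholds are assumptions, not ProReP's actual criteria. The hits in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class PeptideHit:
    protein: str   # protein accession or description
    sequence: str  # matched peptide sequence
    score: float   # Mascot ion score for this match


def screen(hits, min_score=20.0, min_peptides=1):
    """Keep hits scoring >= min_score, then keep proteins with >= min_peptides distinct peptides."""
    kept = {}
    for hit in hits:
        if hit.score < min_score:
            continue  # drop low-scoring peptide matches
        peptides = kept.setdefault(hit.protein, {})
        # Record the best score seen for each distinct peptide sequence.
        peptides[hit.sequence] = max(hit.score, peptides.get(hit.sequence, hit.score))
    return {prot: peps for prot, peps in kept.items() if len(peps) >= min_peptides}


hits = [
    PeptideHit("HBB", "VHLTDAEK", 61.0),
    PeptideHit("HBB", "HLDNLK", 41.0),
    PeptideHit("PROT_X", "LLSSAK", 12.0),  # hypothetical low-scoring hit
]
print(screen(hits, min_score=20.0))
```

Because the original search results are only filtered, never modified, the same hits can be re-screened with different thresholds without repeating the database search.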

Sequence Coverage
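Sequence coverage is the fraction of a protein's residues spanned by at least one identified peptide. This minimal sketch assumes exact string matching of peptide sequences against the protein sequence, with no handling of I/L ambiguity. `sequence_coverage` is a hypothetical name.

```python
def sequence_coverage(protein, peptides):
    """Percent of protein residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        # Mark every occurrence of the peptide, not just the first one.
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein) if protein else 0.0


print(sequence_coverage("MKVLAAGIVK", ["VLA", "AAG"]))
```

Overlapping peptides (here "VLA" and "AAG" share one A) are counted once per residue, so coverage cannot exceed 100%.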

Two-File Comparison

Results – Comparison of Two Experiments
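Comparing two experiments at the protein level reduces to set operations on the identified proteins. The sketch below is hypothetical: `compare_two` and the use of the Jaccard index as a similarity measure are assumptions, not necessarily what ProReP reports. The protein labels are invented.

```python
def compare_two(proteins_a, proteins_b):
    """Split two lists of protein IDs into shared and file-specific sets."""
    a, b = set(proteins_a), set(proteins_b)
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        # Jaccard index: shared / all distinct proteins (1.0 = identical lists)
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


result = compare_two(["HBB", "ALB", "ACTB"], ["HBB", "ACTB", "AMY1"])
print(sorted(result["shared"]), result["jaccard"])
```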

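Combine and Compare Feature
[Figure: workflow for two treatments, Drug A and Drug B. Each is processed as protein digest, SCX fractions, then triplicate LC/MS/MS runs, giving 15 data files per drug. Each set of 15 files is combined, and the two combined results are compared.]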

Multiple File Comparison

Results – Multiple file comparison (sequential display)

Results – Multiple file comparison (tabular display)
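A tabular multi-file display can be built as a presence/absence matrix, with one row per protein and one column per file. This is a hypothetical sketch. The input mapping, the `presence_table` name, the `X`/`-` markers and the file names are assumptions.

```python
def presence_table(files):
    """files: mapping of file name -> iterable of protein IDs found in that file."""
    names = list(files)
    found = {name: set(files[name]) for name in names}
    proteins = sorted(set().union(*found.values()))
    # One row per protein: an 'X' marks the files in which it was identified.
    rows = [[p] + ["X" if p in found[n] else "-" for n in names] for p in proteins]
    return ["Protein"] + names, rows


header, rows = presence_table({
    "run1.html": ["HBB", "ALB", "ACTB"],
    "run2.html": ["HBB", "ACTB"],
    "run3.html": ["HBB", "AMY1"],
})
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```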

Combine – Merging of multiple experiments

Results – Combining Multiple Experiments
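Combining replicate runs or fractions can be sketched as a union of their peptide matches. This hypothetical `combine` assumes each run is a dict `{protein: {peptide: score}}`. It keeps each peptide's best score together with the number of runs in which it appeared; ProReP's actual merge rules may differ. The scores in the example are invented.

```python
def combine(runs):
    """Merge runs shaped {protein: {peptide: score}} into
    {protein: {peptide: (best_score, n_runs_seen)}}."""
    merged = {}
    for run in runs:
        for protein, peptides in run.items():
            target = merged.setdefault(protein, {})
            for peptide, score in peptides.items():
                best, seen = target.get(peptide, (float("-inf"), 0))
                target[peptide] = (max(best, score), seen + 1)
    return merged


runs = [
    {"HBB": {"VHLTDAEK": 61.0}},
    {"HBB": {"VHLTDAEK": 55.0, "HLDNLK": 41.0}, "ALB": {"LVNEVTEFAK": 48.0}},
]
print(combine(runs))
```

Keeping the per-peptide run count alongside the best score lets a combined result (for example, the 15 files per drug) still show how reproducibly each peptide was seen.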

Conclusion
Data analysis and processing time are decreased.
Search results are reduced automatically using user-specified criteria.
Removal of low-scoring peptide matches greatly improves the accuracy of data interpretation.
A single result file can be processed multiple times, with a different set of parsing criteria each time, without repeating the database search.
The ability to compare two or more result files automatically makes determining sample similarity nearly effortless.

Acknowledgements
Dr. Randy Arnold – Manager and Research Scientist (Proteomics Research and Development Facility, Dept. of Chemistry)
Dr. Haixu Tang – Asst. Prof., School of Informatics
Abhijit Mahabal – Grad Student, CS Dept.
Kranthi Varala – Grad Student, Bioinformatics