Common parameters At the beginning one needs to set up the parameters.

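Common parameters Most important: the input experimental spectra – Self-explanatory. [Table: input file formats supported by Sequest, Mascot and X!Tandem. .DTA, .MGF, .mzData and .mzXML are marked for all three engines; .PKL has two marks; .RAW, .PKS and .mzML have one mark each.]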

Common parameters Taxon and database – Self-explanatory. – E.g. samples from human cells should be queried against a human protein database. – Sometimes protein sequence libraries are available.

Common parameters Parent mass tolerance If it is much smaller than the optimum: – the correct peptide can be eliminated from the search space, – execution time decreases.

Common parameters Parent mass tolerance If it is much larger than the optimum: – the significance of the scores decreases, – execution time increases.

Common parameters Parent mass tolerance It is usually around 1 Da.
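The effect of the parent mass tolerance on the search space can be illustrated with a minimal sketch. It assumes monoisotopic residue masses, a neutral precursor mass and a tolerance given in Da; the function names and the observed mass are hypothetical, and the peptides are taken from the scoring example later in these slides.

```python
# Monoisotopic residue masses in Da (only the residues needed here).
MONO = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "G": 57.02146,
    "H": 137.05891, "K": 128.09496, "L": 113.08406, "M": 131.04049,
    "N": 114.04293, "P": 97.05276, "R": 156.10111, "S": 87.03203,
    "T": 101.04768,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def candidates(observed_mass, peptides, tol_da):
    """Keep the peptides whose mass lies within +/- tol_da of the observed mass."""
    return [p for p in peptides if abs(peptide_mass(p) - observed_mass) <= tol_da]


peptides = ["AELDLNMTR", "LLHGDPGEEDK", "SAEDLEADK", "MDHPEDESHSEK"]
# Hypothetical measurement: the true peptide is AELDLNMTR, measured 0.3 Da off.
observed = peptide_mass("AELDLNMTR") + 0.3

for tol in (0.1, 1.0, 200.0):
    print(tol, candidates(observed, peptides, tol))
```

With the tolerance set too small the correct peptide drops out of the candidate list, and with a very large tolerance unrelated peptides enter it and have to be scored as well.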

Common parameters Fragment ion match tolerance – Depends on the instrument accuracy. – If it is much smaller than the optimum, matches will be lost.

Common parameters Fragment ion match tolerance – If it is much smaller than the optimum: correctly matched peaks will be lost. This increases the FDR, increases the false negatives and decreases the sensitivity.

Common parameters Fragment ion match tolerance – If it is much larger than the optimum: many theoretical peaks will match an experimental peak by chance. – This increases the scores of random matches and decreases the statistical significance.
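A small sketch of peak matching shows both failure modes. All m/z values below are made up for illustration, and counting matched theoretical peaks is only a stand-in for a real scoring function.

```python
def count_matches(theoretical, observed, tol):
    """Count theoretical peaks that have at least one observed peak within +/- tol."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


# Hypothetical fragment m/z values of the correct peptide and of a wrong one.
correct = [175.1, 246.2, 345.2, 444.0, 572.3]
wrong = [200.0, 299.0, 400.0, 511.5, 600.0]

# Hypothetical experimental peaks: the correct fragments shifted by 0.2-0.3 Da,
# plus two noise peaks (300.0 and 510.0).
observed = [175.3, 246.0, 300.0, 345.4, 443.7, 510.0, 572.5]

for tol in (0.05, 0.4, 2.0):
    print(tol, count_matches(correct, observed, tol), count_matches(wrong, observed, tol))
```

In this toy data a tolerance of 0.05 loses every correct match, 0.4 recovers all five without matching the wrong peptide, and 2.0 lets the wrong peptide pick up matches to noise peaks.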

Common parameters

Fragment ion tolerance (T) T = 0.4 Da (correct), T = 0.05 Da (too small), T = 2.0 Da (too large)

Fragment ion tolerance (T)
T = 0.4 Da (correct): 217 proteins + 713 homologs = 930 proteins
T = 0.05 Da (too small): 132 proteins + 406 homologs = 538 proteins
T = 2.0 Da (too large): 197 proteins + 589 homologs = 786 proteins

Common parameters Instrument – Some database search programs allow you to select the instrument type, such as ESI-QUAD or Quad-TOF. – This fine-tunes the search engine by determining which fragment ion series will be used for scoring. – E.g. immonium ions, a-, b-, c- and x-series ions, a-NH3, z+H series, y-H2O, etc.

Common parameters Enzyme – The enzyme used for enzymatic digestion during preparation of the biological sample. – It will be used for the in silico digestion of protein sequences to generate peptides.
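In silico digestion can be sketched as follows. The sketch assumes trypsin with the commonly used rule "cleave after K or R unless the next residue is P"; that rule, the function names and the `missed_cleavages` parameter are assumptions for illustration and are not taken from any particular search engine. The sequence is the start of the P01127 entry shown later in these slides.

```python
import re


def cleavage_boundaries(protein):
    """Peptide boundaries for the assumed trypsin rule: after K/R, not before P."""
    ends = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    if not ends or ends[-1] != len(protein):
        ends.append(len(protein))  # the C-terminal peptide ends at the protein end
    return [0] + ends


def digest(protein, missed_cleavages=0):
    """All peptides spanning up to `missed_cleavages` uncleaved sites."""
    b = cleavage_boundaries(protein)
    peptides = []
    for i in range(len(b) - 1):
        last = min(i + 2 + missed_cleavages, len(b))
        for j in range(i + 1, last):
            peptides.append(protein[b[i]:b[j]])
    return peptides


protein = "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR"
print(digest(protein))                      # perfect digestion
print(digest(protein, missed_cleavages=1))  # also allows one missed cleavage
```

Allowing missed cleavages adds longer peptides to the search space, which is why it is usually a separate, bounded parameter.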

Common parameters E-value cut off

Common parameters Ion mass search type – Monoisotopic (default): more accurate. – Average: might need a larger fragment ion tolerance.
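The difference between the two mass types can be made concrete with a short calculation. This is a minimal sketch using standard residue masses for the six residues of one example peptide; the helper name `mass` is hypothetical.

```python
# Residue masses in Da: (monoisotopic, average), only for the residues used below.
RESIDUES = {
    "S": (87.03203, 87.0782),
    "A": (71.03711, 71.0788),
    "E": (129.04259, 129.1155),
    "D": (115.02694, 115.0886),
    "L": (113.08406, 113.1594),
    "K": (128.09496, 128.1741),
}
WATER = (18.01056, 18.01528)  # (monoisotopic, average)


def mass(seq, average=False):
    """Neutral peptide mass using monoisotopic or average residue masses."""
    k = 1 if average else 0
    return sum(RESIDUES[aa][k] for aa in seq) + WATER[k]


mono = mass("SAEDLEADK")
avg = mass("SAEDLEADK", average=True)
print(round(mono, 4), round(avg, 4), round(avg - mono, 4))
```

For this nine-residue peptide the two values differ by roughly half a dalton, and the gap grows with peptide size, which is consistent with the need for a wider tolerance when average masses are used.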

Common parameters Charge state – Allowing too high a charge state increases the FDR.

Common parameters Decoy search – Includes a reversed copy of the sequence database in the peptide identification. – Provides more accurate p-value and FDR estimation. – Can double the search time.
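A decoy search can be sketched in two steps: build reversed sequences, then count decoy hits above a score threshold. The `DECOY_` prefix, the function names, the example scores and the simple estimate "decoy hits / target hits" are assumptions for illustration; individual search engines may compute the FDR differently.

```python
def reversed_decoys(proteins):
    """Build a decoy entry for each protein by reversing its sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above the threshold.

    `psms` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical search result: scores with a flag marking decoy matches.
psms = [(32, False), (15, False), (5, False), (4, True),
        (4, False), (3, True), (3, False)]

print(reversed_decoys({"P1": "MNRTFGQVVAR"}))
for threshold in (3, 4, 10):
    print(threshold, estimate_fdr(psms, threshold))
```

Because every target sequence gains a decoy counterpart, the database roughly doubles in size, which matches the remark about search time.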

Common parameters Error tolerant search. A large number of spectra remain without a significant score, and a reasonable number of fragment ion peaks may not have matched. Possible reasons: – Underestimated mass measurement error (should be visible in the peptide view graphs), – Incorrect determination of the precursor charge state, – The peptide sequence is not in the database, – Missed cleavage and unexpected cleavage, – Unexpected chemical and post-translational modifications.

Workflow steps: Input data, Peptide assignment, Validation, Protein inference, Quantitation, Interpretation
Input data: experimental spectra, protein sequence DB
Candidate peptides and scores:
Score: 4 Peptide: AELDLNMTR
Score: 32 Peptide: SHLITLLLFLFHSETICR
Score: 3 Peptide: MEICRGLR
Score: 15 Peptide: LLHGDPGEEDK
Score: 4 Peptide: MDHPEDESHSEK
Score: 5 Peptide: SAEDLEADK
Score: 3 Peptide: SIEAKLTLR
ΔCn = (32-4)/32 = 0.875
ΔCn = (4-4)/4 = 0
ΔCn = (3-3)/3 = 0
ΔCn = (15-4)/15 = 0.733
Keep the peptide assignments whose ΔCn exceeds a certain limit.
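The quantity computed on this slide is the normalized difference between the best and the second-best score of a spectrum, (best - second best) / best, commonly written ΔCn. A minimal sketch follows; the function name and the cutoff of 0.1 are hypothetical, and the score pairs are the ones used in the four calculations on the slide.

```python
def delta_cn(scores):
    """(best - second best) / best for the candidate scores of one spectrum.

    Expects at least two scores and a positive best score.
    """
    ranked = sorted(scores, reverse=True)
    best, second = ranked[0], ranked[1]
    return (best - second) / best


# Best and second-best scores from the four calculations on the slide.
examples = [[32, 4], [4, 4], [3, 3], [15, 4]]

CUTOFF = 0.1  # hypothetical limit; real cutoffs depend on the engine and data
for scores in examples:
    value = delta_cn(scores)
    print(scores, round(value, 3), "keep" if value > CUTOFF else "discard")
```

A ΔCn of 0 means the top two candidates are indistinguishable by score, so the assignment is ambiguous even if the raw score looks acceptable.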

>IPI:IPI |SWISS-PROT:P01127
MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSG
GELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNR
NVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKT
PQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
Spectra comparison against the protein sequence DB: unexpected cleavages
TFGQVVAR FGQVVAR GQVVAR QVVAR VVAR VAR AR TFGQVVA TFGQVV TFGQV TFGQ TFG TF
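The list of unexpected-cleavage candidates on this slide can be generated from the tryptic peptide TFGQVVAR by truncating it from either end. A minimal sketch, assuming a minimum peptide length of two residues as in the slide; the function name is hypothetical.

```python
def unexpected_cleavage_candidates(peptide, min_len=2):
    """Peptides that keep one terminus of `peptide` and vary the other."""
    # Keep the C-terminus, remove residues from the N-terminus
    # (the first entry is the full peptide itself).
    n_truncated = [peptide[i:] for i in range(len(peptide) - min_len + 1)]
    # Keep the N-terminus, remove residues from the C-terminus.
    c_truncated = [peptide[:j] for j in range(len(peptide) - 1, min_len - 1, -1)]
    return n_truncated + c_truncated


print(" ".join(unexpected_cleavage_candidates("TFGQVVAR")))
```

Each fully specific peptide of length n expands to roughly 2n candidates, which is one reason error tolerant searches are slower.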

>IPI:IPI |SWISS-PROT:P01127
MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTR
SHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGC
CNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQE
QRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
Spectra comparison against the protein sequence DB: missed cleavages

Common parameters Automatic error tolerant search. Chemical and post-translational modifications (PTMs) – Fixed modifications (always change the mass of the amino acid). – Variable modifications (may or may not change the mass). Search engines iteratively insert all combinations of the possible PTMs.
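The growth of the search space with variable modifications can be sketched by enumerating every combination of modified and unmodified sites. The choice of modifications (oxidation of M, phosphorylation of S and T) and the function name are assumptions for illustration; the peptides come from the scoring example in these slides.

```python
from itertools import product

# Hypothetical variable modifications: residue -> monoisotopic mass shift in Da.
VARIABLE_MODS = {
    "M": 15.994915,  # oxidation
    "S": 79.966331,  # phosphorylation
    "T": 79.966331,  # phosphorylation
}


def modified_forms(peptide, variable_mods):
    """All combinations of per-residue mass shifts (0.0 means unmodified)."""
    options = [
        (0.0, variable_mods[aa]) if aa in variable_mods else (0.0,)
        for aa in peptide
    ]
    return list(product(*options))


for peptide in ("AELDLNMTR", "MDHPEDESHSEK"):
    forms = modified_forms(peptide, VARIABLE_MODS)
    total_shifts = sorted({round(sum(f), 4) for f in forms})
    print(peptide, len(forms), total_shifts)
```

Every modifiable site doubles the number of forms, so a peptide with k such sites yields 2^k candidates. A fixed modification, by contrast, just replaces the residue mass and adds no candidates.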

Common parameters Automatic error tolerant search. – More peptides can be identified. – It enlarges the search space considerably, which increases the execution time, decreases the statistical significance and increases the FDR.

Common parameters Automatic error tolerant search. To reduce the search space, a two-pass approach is applied. – 1st pass: identification of perfect peptides (no PTMs, perfect digestion). – 2nd pass: only the proteins with at least one peptide identified in the 1st pass are passed on. An extensive search is run on this reduced set of protein sequences, including missed and unexpected cleavages, PTMs, point mutations, etc.
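The reduction step between the two passes can be sketched as a simple filter. This is only an outline under stated assumptions: the first-pass identifications are given as a list of peptide strings, the protein names and the second sequence are made up, and the actual second-pass search is not shown.

```python
def proteins_for_second_pass(proteins, first_pass_peptides):
    """Keep only the proteins that contain at least one first-pass peptide."""
    return {
        name: seq
        for name, seq in proteins.items()
        if any(peptide in seq for peptide in first_pass_peptides)
    }


# Hypothetical database: one entry from the slides (truncated) and one made-up entry.
proteins = {
    "P01127": "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR",
    "OTHER": "MKTAYIAKQR",
}
first_pass_peptides = ["TFGQVVAR"]  # identified with strict settings in the 1st pass

reduced = proteins_for_second_pass(proteins, first_pass_peptides)
print(sorted(reduced))
```

The expensive error tolerant search then runs only against `reduced`, which keeps the enlarged per-protein search space manageable.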

Common parameters Output parameters – Mainly about formatting the result files: which details you want to see, and how many.

Common parameters Other program-specific parameters. These differ between X!Tandem, Mascot, Sequest, etc.

X!Tandem

Outputs – Browsing the results

OMSSA’s search engine

OMSSA’s output

OMSSA’s result

Good spectrum, good score, bad annotation – Rare if the p-value is significant. Good spectrum, bad score, bad annotation – The peptide might be modified, imperfectly digested, or not in the database.

Bad spectrum, bad score, bad annotation

Good spectrum, good score, good annotation

Trans-Proteomic Pipeline (TPP) The Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types, with peptides assigned by a wide variety of database search engines.

Trans-Proteomic Pipeline (TPP)

Summary Protein identification from MS/MS data is not a black box. Always look at the results and understand how it