Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)

Peptide Mapping - Mass Accuracy

Peptide Mapping - Database Size
[Figure panels: Human, C. elegans, S. cerevisiae]

Peptide Mapping - Cys-Containing Peptides
[Figure panels: Human, C. elegans, S. cerevisiae]

Identification – Peptide Mass Fingerprinting
Sequence DB → Pick Protein → Digestion → All Peptide Masses (repeat for each protein)
MS → measured peptide masses
Compare, Score, Test Significance → Identified Proteins
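The fingerprinting loop above can be sketched in a few lines. This is a minimal illustration, not the scoring used by ProFound or any other real engine: it assumes unmodified peptides, trypsin cleavage after K or R except before P, no missed cleavages, singly protonated [M+H]+ masses, and a plain count of matched masses as the score. The function names, tolerance and the two-entry protein database are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier of a singly protonated [M+H]+ ion


def tryptic_peptides(protein: str) -> list[str]:
    """In silico trypsin digest: cleave after K or R unless the next residue is P.

    No missed cleavages are allowed in this simplified sketch.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide: str) -> float:
    """[M+H]+ of an unmodified peptide: residue masses + water + proton."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def pmf_score(measured: list[float], protein: str, tol: float = 0.5) -> int:
    """Count measured masses that match a theoretical tryptic peptide within tol (Da)."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)


def rank_proteins(measured: list[float], database: dict[str, str], tol: float = 0.5):
    """Repeat the comparison for each protein in the database; best match first."""
    scores = [(name, pmf_score(measured, seq, tol)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical database and measured masses, for illustration only.
database = {"protein_A": "SGFLEEDELKAVLR", "protein_B": "GGGGK"}
print(rank_proteins([1166.56, 458.31], database))  # [('protein_A', 2), ('protein_B', 0)]
```

A matched-peak count ignores how many matches are expected by chance in a large database, which is why the workflow ends with a significance test rather than with the raw score.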

ProFound Results

Database size

Mixtures

Peptide Fragmentation
[Figure: Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector; fragment labels: b and y ions]

Identification – Tandem MS

Tandem MS – Sequence Confirmation
[Figure: MS/MS spectrum (% relative abundance vs. m/z, 100 to 1000) of the peptide SGFLEEDELK, with the [M+2H]2+ precursor labeled]
Fragment ions (m/z):
Residue   b ion   y ion
S         88      -
G         145     1080
F         292     1022
L         405     875
E         534     762
E         663     633
D         778     504
E         907     389
L         1020    260
K         1166*   147
* [M+H]+ of the intact peptide
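The b and y values in this table follow directly from residue masses, which is what makes sequence confirmation possible. A minimal sketch, assuming monoisotopic masses, singly charged ions and no modifications; the function names are hypothetical, and the 1 Da tolerance is chosen only because the peak list read off the slide is nominal (integer m/z).

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1): N-terminal fragments, residue masses + proton."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        ions.append(total)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1): C-terminal fragments, residue masses + water + proton."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        ions.append(total)
    return ions


def explained_ions(peptide: str, peaks: list[float], tol: float = 1.0) -> list[str]:
    """Names of theoretical b/y ions that lie within tol (Da) of an observed peak."""
    theoretical = [(f"b{i}", m) for i, m in enumerate(b_ions(peptide), start=1)]
    theoretical += [(f"y{i}", m) for i, m in enumerate(y_ions(peptide), start=1)]
    return [name for name, m in theoretical if any(abs(m - p) <= tol for p in peaks)]


# Peak list read off the slide (nominal m/z values).
peaks = [88, 145, 292, 405, 534, 663, 778, 907, 1020,
         147, 260, 389, 504, 633, 762, 875, 1022, 1080]
print(len(explained_ions("SGFLEEDELK", peaks)))  # 18: every b and y ion is explained
```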

Tandem MS – Sequence Confirmation
Consecutive ions in each series differ by one residue mass. For example, b4 - b3 = 405 - 292 = 113 and y7 - y6 = 875 - 762 = 113, the residue mass of L (or its isomer I), so both series place L at position 4.

Tandem MS – Sequence Confirmation
For example, b5 - b4 = 534 - 405 = 129 and y6 - y5 = 762 - 633 = 129, the residue mass of E, so both series place E at position 5.

Tandem MS – de novo Sequencing
[Figure: MS/MS spectrum (% relative abundance vs. m/z, 250 to 1000) with peaks at m/z 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080 and the [M+2H]2+ precursor labeled; panels: Amino acid masses, Mass Differences, Sequences consistent with spectrum]
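Reading a sequence from mass differences can be sketched as a lookup of each gap in a residue-mass table. This minimal version assumes the peaks have already been assigned to a single ion series (here the b ions from the spectrum), singly charged ions, and a 0.5 Da tolerance; it ignores the complications of real spectra (neutral losses, modifications, background peaks, missing ions), and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da). I and L have identical masses, so they share
# one entry and cannot be told apart from mass differences alone.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "I/L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(delta: float, tol: float = 0.5) -> list[str]:
    """All residues whose mass lies within tol (Da) of a mass difference."""
    return [aa for aa, mass in RESIDUE.items() if abs(mass - delta) <= tol]


def read_ladder(peaks: list[float], tol: float = 0.5) -> list[list[str]]:
    """Residue candidates for each gap between consecutive peaks of one ion series."""
    ladder = sorted(peaks)
    return [residues_for_gap(hi - lo, tol) for lo, hi in zip(ladder, ladder[1:])]


def format_candidates(candidates: list[list[str]]) -> str:
    """Write unique residues as letters and ambiguous ones as (X/Y); '?' marks no match."""
    parts = []
    for options in candidates:
        label = "/".join(options) if options else "?"
        parts.append(label if len(label) == 1 else f"({label})")
    return "".join(parts)


# b-ion ladder taken from the spectrum (nominal m/z).
b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(format_candidates(read_ladder(b_ladder)))  # GF(I/L)EEDE(I/L)
```

Reading the y-ion ladder the same way gives the residues in the opposite order, which is why the spectrum alone is consistent with a sequence and its reverse until the termini are pinned down.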

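Tandem MS – de novo Sequencing
The mass differences give a partial sequence that can be read in either direction: …GF(I/L)EEDE(I/L)… or …(I/L)EDEE(I/L)FG…
Peptide M+H = 1166
N-terminus: the largest y ion (1079) contains every residue except the first, so 1166 - 1079 = 87 => S, giving SGF(I/L)EEDE(I/L)…
C-terminus: the largest b ion (1020) lacks the last residue and the water (18) of the free C-terminus, so 1166 - 1020 - 18 = 128 => K or Q, giving SGF(I/L)EEDE(I/L)(K/Q)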
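The terminal-residue arithmetic above can be written as a small function. This is a sketch assuming singly charged ions and monoisotopic masses; the function names and the tolerance values are illustrative choices, not part of the original method.

```python
# Monoisotopic residue masses (Da); I and L share one entry.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "I/L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def matching_residues(delta: float, tol: float) -> list[str]:
    """Residues whose mass is within tol (Da) of delta, in alphabetical order."""
    return sorted(aa for aa, mass in RESIDUE.items() if abs(mass - delta) <= tol)


def terminal_residues(mh: float, largest_b: float, largest_y: float, tol: float = 0.5):
    """Infer the N- and C-terminal residues from [M+H]+ and the largest b and y ions.

    y(n-1) contains every residue except the first, so [M+H]+ - y(n-1) is the
    N-terminal residue; b(n-1) lacks the last residue and the water of the free
    C-terminus, so [M+H]+ - b(n-1) - H2O is the C-terminal residue.
    """
    n_term = matching_residues(mh - largest_y, tol)
    c_term = matching_residues(mh - largest_b - WATER, tol)
    return n_term, c_term


# Nominal masses from the slide: K and Q cannot be separated.
print(terminal_residues(1166, 1020, 1079))  # (['S'], ['K', 'Q'])
# Accurate masses with a 0.01 Da tolerance: K (128.095 Da) and Q (128.059 Da) separate.
print(terminal_residues(1166.5575, 1020.4520, 1079.5255, tol=0.01))  # (['S'], ['K'])
```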

Tandem MS – de novo Sequencing
Challenges in de novo sequencing:
- Neutral loss (-H2O, -NH3)
- Modifications
- Background peaks
- Incomplete information

Tandem MS – Database Search
Sample: Lysis → Fractionation → Digestion → LC-MS → Pick Peptide → MS/MS (repeat for all peptides)
Sequence DB: Pick Protein → Digestion → All Fragment Masses (repeat for all proteins)
Compare, Score, Test Significance
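The compare-and-score step can be sketched as a precursor-mass filter followed by a fragment-matching score. This is a minimal illustration under stated assumptions: unmodified peptides, singly charged b and y ions, and a score that simply counts observed peaks within a fragment tolerance of a theoretical ion, which is not the scoring function of any particular search engine. The candidate list, including the isobaric variant GSFLEEDELK, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_masses(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) followed by y1..y(n-1)."""
    ions, b, y = [], PROTON, WATER + PROTON
    for aa in peptide[:-1]:
        b += RESIDUE[aa]
        ions.append(b)
    for aa in reversed(peptide[1:]):
        y += RESIDUE[aa]
        ions.append(y)
    return ions


def precursor_mh(peptide: str) -> float:
    """[M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def search(precursor, peaks, candidates, precursor_tol=1.0, fragment_tol=0.5):
    """Score candidates within precursor_tol of the measured [M+H]+, best first.

    Score = number of observed peaks within fragment_tol of a theoretical b/y ion.
    """
    results = []
    for peptide in candidates:
        if abs(precursor_mh(peptide) - precursor) > precursor_tol:
            continue  # precursor mass filter
        theoretical = fragment_masses(peptide)
        score = sum(any(abs(p - t) <= fragment_tol for t in theoretical) for p in peaks)
        results.append((score, peptide))
    return sorted(results, reverse=True)


candidates = ["SGFLEEDELK", "GSFLEEDELK", "AVLR"]
peaks = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(search(1166.56, peaks, candidates))  # [(9, 'SGFLEEDELK'), (8, 'GSFLEEDELK')]
```

The two isobaric candidates pass the precursor filter together and differ only by the b1 ion, which illustrates why fragment masses, not the precursor mass alone, decide the identification.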

Search Results

Significance Testing
False protein identifications are caused by random matching. An objective criterion for testing the significance of protein identification results is therefore necessary. The significance of a protein identification can be tested once the distribution of scores for false results is known.

Significance Testing - Expectation Values
Most sequences in a collection that are scored against a spectrum match it only by chance, so their scores sample the distribution of scores for random matching.

Significance Testing - Expectation Values
1. Database search produces a list of candidates for each spectrum.
2. Build the distribution of scores for random and false identifications.
3. Extrapolate the distribution and calculate expectation values.
4. Result: a list of candidates with expectation values.
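The slide does not specify how the extrapolation is done. One common approach, assumed here, is to treat the upper tail of the random-score distribution as exponential, so that the logarithm of the number of candidates scoring at or above s is linear in s; fitting that line and extending it gives the expected number of random candidates reaching a given score. The tail fraction, the function names and the synthetic exponential scores are all illustrative assumptions.

```python
import math


def survival_points(scores: list[float]) -> list[tuple[float, float]]:
    """(s, log10 of the number of candidates scoring >= s) for each distinct score s."""
    ordered = sorted(scores)
    n = len(ordered)
    points = []
    for i, s in enumerate(ordered):
        if i == 0 or s != ordered[i - 1]:
            points.append((s, math.log10(n - i)))  # n - i scores are >= s
    return points


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def expectation_value(random_scores, score, tail_fraction=0.5):
    """Expected number of random candidates scoring >= score.

    Assumes the upper tail of the random-score survival function is log-linear,
    fits it, and extrapolates to score (which may lie beyond all random scores).
    """
    points = survival_points(random_scores)
    tail = points[int(len(points) * (1 - tail_fraction)):]
    a, b = fit_line(tail)
    return 10 ** (a + b * score)


# Synthetic random scores: quantiles of an exponential distribution (illustration only).
random_scores = [-math.log(1 - (k + 0.5) / 1000) for k in range(1000)]
print(expectation_value(random_scores, 10.0))
```

For these synthetic scores the exact answer is 1000·e^-10 ≈ 0.045, and the extrapolated value lands close to it; with real search scores the log-linear assumption should be checked on the data.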

Rho-diagrams: Overall Quality of a Data Set
Expectation values as a function of score for random matching:
Definition: E_i (i = 0, -1, -2, …) is the number of spectra that have been assigned an expectation value between exp(i-1) and exp(i).
For random matching:
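The E_i counts defined above can be computed by binning each spectrum's expectation value on a natural-log scale. A minimal sketch; the half-open bin convention exp(i-1) < E ≤ exp(i), the lower cut-off i_min and the function name are assumptions, since the definition only says "between".

```python
import math
from collections import Counter


def rho_counts(expectation_values, i_min=-10):
    """E_i for i = 0, -1, ..., i_min: number of spectra with exp(i-1) < E <= exp(i).

    Spectra with E > exp(0) = 1, E <= exp(i_min - 1), or non-positive E are skipped.
    """
    counts = Counter()
    for e in expectation_values:
        if e <= 0:
            continue
        i = math.ceil(math.log(e))  # the integer i with i - 1 < ln(E) <= i
        if i_min <= i <= 0:
            counts[i] += 1
    return {i: counts[i] for i in range(0, i_min - 1, -1)}


print(rho_counts([0.9, 0.5, 0.2, 0.01, 2.0], i_min=-5))
# {0: 2, -1: 1, -2: 0, -3: 0, -4: 1, -5: 0}
```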

Rho-diagram Random Matching

Rho-diagram Data Quality

Rho-diagram Parameters

How many fragments are sufficient?
- To identify an unmodified peptide?
- To identify a modified peptide?
- To localize a modification on a peptide?

How many fragments are sufficient? How does it depend on different parameters?
- Precursor mass
- Precursor mass error
- Fragment mass error
- Background peaks

Simulations using synthetic spectra
1. Select a peptide sequence from the sequence database (example: LSDPGVSPAVLSLEMLTDR).
2. Calculate the possible fragment ion masses.
3. Choose the number of fragment ions to select.
4. Randomly select that many fragment ions.
5. Search and store the result.
6. Average over peptides.

Simulations using synthetic spectra
Step 2, calculate the possible fragment ion masses of LSDPGVSPAVLSLEMLTDR:
b ions: 114.09 201.12 316.15 413.20 470.22 569.29 656.32 753.37 824.41 923.48 1036.56 1123.59 1236.68 1365.72 1496.76 1609.84 1710.89 1825.92
y ions: 175.12 290.15 391.19 504.28 635.32 764.36 877.44 964.48 1077.56 1176.63 1247.67 1344.72 1431.75 1530.82 1587.84 1684.89 1799.92 1886.95

Simulations using synthetic spectra
Step 3, choose the number of fragment ions to select (8 in this example).

Simulations using synthetic spectra
Step 4, randomly select the fragment ions (here 8 of them): 201.12 504.28 964.48 1123.59 1247.67 1496.76 1530.82 1710.89

Simulations using synthetic spectra
Step 5, search the sequence database with the synthetic spectrum (201.12 504.28 964.48 1123.59 1247.67 1496.76 1530.82 1710.89) and store the result:
- Is the identified sequence identical to the one used to generate the synthetic data (LSDPGVSPAVLSLEMLTDR)?
- Is the identification significant?

Simulations using synthetic spectra
Each point is an average over 50 peptides; for each peptide, the result is averaged over searches with 20 randomly generated synthetic fragment mass spectra. The identification threshold is marked on the plot.
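The simulation loop can be sketched end to end with a toy search engine. This is not the search engine or scoring used for the slides: it assumes unmodified peptides, singly charged b and y ions, a matched-peak count as the score, counts a tie for the best score as a failed identification, and skips significance testing and background peaks. The two-peptide database (the slide's peptide plus a hypothetical isobaric variant with the first two residues swapped) and all function names are illustrative; 20 spectra per peptide follows the slide.

```python
import random

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_masses(peptide):
    """All singly charged b and y ions of an unmodified peptide."""
    ions, b, y = [], PROTON, WATER + PROTON
    for aa in peptide[:-1]:
        b += RESIDUE[aa]
        ions.append(b)
    for aa in reversed(peptide[1:]):
        y += RESIDUE[aa]
        ions.append(y)
    return ions


def mh(peptide):
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def identify(peaks, precursor, database, precursor_tol=1.0, fragment_tol=0.5):
    """Toy search engine: the peptide with the most matched peaks, or None on a tie."""
    scored = []
    for peptide in database:
        if abs(mh(peptide) - precursor) <= precursor_tol:
            theoretical = fragment_masses(peptide)
            score = sum(any(abs(p - t) <= fragment_tol for t in theoretical) for p in peaks)
            scored.append((score, peptide))
    scored.sort(reverse=True)
    if not scored or (len(scored) > 1 and scored[0][0] == scored[1][0]):
        return None  # nothing to report, or ambiguous
    return scored[0][1]


def identification_rate(database, n_fragments, spectra_per_peptide=20, seed=0):
    """Fraction of synthetic spectra whose search returns the generating peptide."""
    rng = random.Random(seed)
    correct = total = 0
    for peptide in database:                      # average over peptides
        fragments = fragment_masses(peptide)      # possible fragment ion masses
        for _ in range(spectra_per_peptide):      # several synthetic spectra each
            peaks = rng.sample(fragments, min(n_fragments, len(fragments)))
            correct += identify(peaks, mh(peptide), database) == peptide
            total += 1
    return correct / total


# Hypothetical database: the slide's peptide plus an isobaric variant with the
# first two residues swapped, so only two of 36 fragments distinguish them.
database = ["LSDPGVSPAVLSLEMLTDR", "SLDPGVSPAVLSLEMLTDR"]
for k in (1, 4, 8, 16, 36):
    print(k, identification_rate(database, k))
```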

Critical number of fragment masses

Small peptides are slightly more difficult to identify
Varied: m_precursor. Fixed: Δm_precursor = 1 Da, Δm_fragment = 0.5 Da, no modification.

A lower precursor mass error requires fewer fragment masses for identification of unmodified peptides
Varied: Δm_precursor. Fixed: m_precursor = 2000 Da, Δm_fragment = 0.5 Da, no modification.

The dependence on the fragment mass error is weak below a threshold for identification of unmodified peptides
Varied: Δm_fragment. Fixed: m_precursor = 2000 Da, Δm_precursor = 1 Da, no modification.

A moderate number of background peaks can be tolerated when identifying unmodified peptides
Varied: number of background peaks. Fixed: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da, no modification.

A large number of background peaks can be tolerated if the fragment mass is accurate
Varied: number of background peaks. Fixed: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.01 Da, no modification.
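A back-of-the-envelope model helps explain why accurate fragment masses make background peaks less harmful. This is a simplifying assumption, not the simulation behind these plots: background peaks and theoretical fragment masses are taken as independent and uniformly spread over the mass range, and a background peak counts as a spurious match if it falls within ±Δm_fragment of any theoretical fragment. The 36 fragments and the 2000 Da range echo the LSDPGVSPAVLSLEMLTDR example; the 100 background peaks and the function name are hypothetical.

```python
def expected_random_matches(n_background, n_theoretical, fragment_tol, mass_range):
    """Expected number of background peaks that match some theoretical fragment by chance.

    Simplifying assumption: background peaks and theoretical fragment masses are
    independent and uniformly spread over mass_range (Da); a peak matches if it lies
    within +/- fragment_tol of at least one of the n_theoretical fragments.
    """
    window = min(1.0, 2 * fragment_tol / mass_range)  # chance of hitting one fragment
    p_match = 1 - (1 - window) ** n_theoretical       # chance of hitting any fragment
    return n_background * p_match


# 100 background peaks, 36 b/y ions of a ~2000 Da peptide spread over 2000 Da.
for tol in (0.5, 0.01):
    print(tol, round(expected_random_matches(100, 36, tol, 2000.0), 3))
```

Under this model, about 1.8 of the 100 background peaks are expected to match by chance at Δm_fragment = 0.5 Da, but only about 0.04 at 0.01 Da, so spurious matches grow roughly in proportion to the fragment mass error.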

Identification of phosphopeptides is only slightly more difficult
Fixed: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da.
