Published by Andra Hawkins. Modified over 9 years ago.
1
Analysis of Complex Proteomic Datasets Using Scaffold
Free Scaffold Viewer can be downloaded at: www.proteomesoftware.com
2
Scaffold: Why do we need it?
- Beyond the realm of manual interpretation
- How do we determine what is a valid protein identification?
- Shotgun proteomics: analysis of complex mixtures
- 1.2 million spectra!
- Whole cell extract: 10,000+ proteins, 600,000 peptides
3
Statistical Analysis Using Scaffold
- All search engines use different scoring algorithms, so their results cannot be compared directly
- Many search engine results are described by more than one value
  Examples: Mascot Ion Score and Identity Score; SEQUEST XCorr and DeltaCn
4
Statistical Analysis Using Scaffold
Peptide Prophet*
- Creates a universal score (discriminant score) for each search engine result (e.g. XCorr and DeltaCn are compressed into one score for SEQUEST results; Ion Score and Identity Score are compressed into one score for Mascot results)
- Plots a histogram of the discriminant scores and fits a bimodal distribution, built from standard statistical distributions, to differentiate between correct and incorrect hits
- Computes the probability that the match is correct at a given discriminant score
*Nesvizhskii, A. I. et al., Anal. Chem. 2003, 75, 4646-4658
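The two steps above (compress several scores into one discriminant score, then convert that score into a probability of being correct) can be sketched in Python. This is a minimal illustration under stated assumptions, not Peptide Prophet's implementation: the weights `W0`, `W_XCORR` and `W_DELTACN`, the Gaussian shape of both components, and their means, standard deviations and prior are all hypothetical values chosen only to show how Bayes' rule turns a score D into P(correct | D).

```python
import math

# Hypothetical weights; Peptide Prophet uses its own coefficients.
W0, W_XCORR, W_DELTACN = -3.0, 1.5, 8.0


def discriminant_score(xcorr: float, delta_cn: float) -> float:
    """Compress two SEQUEST values into one discriminant score D (illustrative weights)."""
    return W0 + W_XCORR * xcorr + W_DELTACN * delta_cn


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def prob_correct(d: float, prior_correct: float = 0.3,
                 correct: tuple = (3.0, 1.0),
                 incorrect: tuple = (-1.0, 1.0)) -> float:
    """Posterior P(correct | D) for a two-component mixture (hypothetical parameters).

    `correct` and `incorrect` are (mean, standard deviation) pairs.
    """
    log_cor = math.log(prior_correct) + log_normal_pdf(d, *correct)
    log_inc = math.log(1.0 - prior_correct) + log_normal_pdf(d, *incorrect)
    # Numerically stable logistic function of the log-odds.
    log_odds = log_cor - log_inc
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    d = discriminant_score(2.61, 0.4)  # SEQUEST values of the "Good Spectrum" slide
    print(f"D = {d:.3f}, P(correct | D) = {prob_correct(d):.4f}")
```

Working in log space keeps the posterior well defined even for scores far from both component means, where the raw densities would underflow to zero.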
5
Statistical Analysis Using Scaffold
[Figure: Histogram of discriminant scores; x-axis: discriminant score (D); y-axis: number of spectra in each bin]
6
Statistical Analysis Using Scaffold
Assumes a mixture of standard statistical distributions
[Figure: fitted “incorrect” and “correct” distributions]
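Fitting "a mixture of standard statistical distributions" to the histogram is commonly done with expectation-maximization (EM). The sketch below fits two Gaussians to a list of discriminant scores. Using a Gaussian for both components is a simplification chosen for brevity; Peptide Prophet's own choice of distributions differs, and the synthetic scores in the usage block are made up. The sketch also shows why the statistics need enough data: the fitted means and widths are only stable when both populations are well represented.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iterations=100):
    """Fit 'incorrect' and 'correct' Gaussian components to discriminant scores with EM.

    Returns (weight_correct, (mean_inc, sd_inc), (mean_cor, sd_cor)).
    """
    n = len(scores)
    lo, hi = min(scores), max(scores)
    spread = (hi - lo) / 4 or 1.0
    # Start the 'incorrect' component low and the 'correct' component high.
    w, inc, cor = 0.5, [lo + spread, spread], [hi - spread, spread]
    for _ in range(iterations):
        # E-step: responsibility of the 'correct' component for each score.
        resp = []
        for x in scores:
            p_cor = w * normal_pdf(x, *cor)
            p_inc = (1 - w) * normal_pdf(x, *inc)
            total = p_cor + p_inc
            resp.append(p_cor / total if total > 0 else 0.5)
        # M-step: update the mixing weight, means and standard deviations.
        n_cor = sum(resp)
        n_inc = n - n_cor
        w = n_cor / n
        cor[0] = sum(r * x for r, x in zip(resp, scores)) / n_cor
        inc[0] = sum((1 - r) * x for r, x in zip(resp, scores)) / n_inc
        var_cor = sum(r * (x - cor[0]) ** 2 for r, x in zip(resp, scores)) / n_cor
        var_inc = sum((1 - r) * (x - inc[0]) ** 2 for r, x in zip(resp, scores)) / n_inc
        cor[1] = max(math.sqrt(var_cor), 1e-3)
        inc[1] = max(math.sqrt(var_inc), 1e-3)
    return w, tuple(inc), tuple(cor)


if __name__ == "__main__":
    # Synthetic scores: 1000 'incorrect' around -2 and 500 'correct' around 4.
    random.seed(0)
    scores = [random.gauss(-2.0, 1.0) for _ in range(1000)]
    scores += [random.gauss(4.0, 1.0) for _ in range(500)]
    w, inc, cor = fit_two_gaussians(scores)
    print(f"weight_correct={w:.2f} incorrect={inc} correct={cor}")
```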
7
Statistical Analysis Using Scaffold
[Figure: “incorrect” and “correct” distributions with the peptide probability threshold]
8
Statistical Analysis Using Scaffold
One search engine may not be enough
[Figure: Venn diagram of identifications from SEQUEST, Mascot, and X!Tandem; region percentages 9%, 19%, 7%, 34%, 5%, 4%, 22%]
www.proteomesoftware.com
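A Venn diagram like the one on this slide can be computed by grouping each identification according to the exact combination of engines that reported it. The sketch below does this with plain Python sets; the identification names are hypothetical and do not reproduce the slide's percentages.

```python
from itertools import chain


def venn_regions(engines: dict) -> dict:
    """Group identifications by the exact combination of engines that found them.

    `engines` maps an engine name to the set of identifications it reported.
    Each key of the result is a tuple of engine names (one Venn region).
    """
    regions = {}
    for ident in set(chain.from_iterable(engines.values())):
        found_by = tuple(name for name, ids in engines.items() if ident in ids)
        regions.setdefault(found_by, set()).add(ident)
    return regions


def region_percentages(regions: dict) -> dict:
    """Express each region as a percentage of all distinct identifications."""
    total = sum(len(ids) for ids in regions.values())
    return {region: 100.0 * len(ids) / total for region, ids in regions.items()}


if __name__ == "__main__":
    # Hypothetical identifications, not data from the slide.
    engines = {
        "SEQUEST": {"P1", "P2", "P3"},
        "Mascot": {"P2", "P3", "P4"},
        "X!Tandem": {"P3", "P5"},
    }
    for region, pct in sorted(region_percentages(venn_regions(engines)).items()):
        print(" & ".join(region), f"{pct:.0f}%")
```

Every identification that appears in only one engine's region is one that would be missed if that engine were left out.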
9
Statistical Analysis Using Scaffold
- Peptide Prophet statistics are applied separately to each search engine result (i.e. Mascot, SEQUEST, and X!Tandem)
- Scaffold Merger combines the peptide probabilities from each search engine to generate a protein probability
The probability of identifying a spectrum + the probability of agreement between search engines → protein probability
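To make the idea of rolling peptide probabilities up to a protein probability concrete, the sketch below uses a commonly cited simplified formula that treats each peptide as independent evidence: P(protein) = 1 − Π(1 − p_i). It is a stand-in, not Scaffold Merger: taking the best probability per peptide across engines is a hypothetical simplification that ignores the agreement term described above, and with this formula a single 95% peptide gives a protein probability of 0.95, whereas the Statistics view slides show Scaffold's model staying below 90% in that case. The peptide "PEPTIDEK" and all probabilities are made up.

```python
from math import prod


def best_across_engines(per_engine: dict) -> dict:
    """For each peptide, keep the highest probability reported by any engine (an assumption)."""
    best = {}
    for peptide_probs in per_engine.values():
        for peptide, p in peptide_probs.items():
            best[peptide] = max(best.get(peptide, 0.0), p)
    return best


def protein_probability(peptide_probs) -> float:
    """P(protein) = 1 - prod(1 - p_i), treating peptide evidence as independent."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


if __name__ == "__main__":
    # Hypothetical probabilities; "PEPTIDEK" is a made-up peptide.
    per_engine = {
        "Mascot": {"IAELAGFSVPENTK": 0.95},
        "SEQUEST": {"IAELAGFSVPENTK": 0.90, "PEPTIDEK": 0.60},
    }
    best = best_across_engines(per_engine)
    print(best, round(protein_probability(best.values()), 3))
```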
10
Statistical Analysis Using Scaffold
Advantages of using Scaffold
- Allows you to choose a statistical error rate by setting probability thresholds
- Allows you to compare and combine results from different experiments and different search engines
- Allows sharing of raw data and search results
- Accepted as a suitable statistical method to validate large datasets
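Because every identification carries a probability, a threshold maps to an estimated error rate: among the accepted identifications, the expected fraction that is incorrect is the mean of (1 − p). The sketch below shows that calculation as a general technique; it is not a description of Scaffold's internal computation, and the probabilities are hypothetical.

```python
def accept(probabilities, threshold):
    """Keep identifications whose probability is at or above the threshold."""
    return [p for p in probabilities if p >= threshold]


def estimated_error_rate(accepted):
    """Expected fraction of incorrect identifications: the mean of (1 - p)."""
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


if __name__ == "__main__":
    # Hypothetical peptide probabilities.
    probs = [0.99, 0.98, 0.95, 0.90, 0.60, 0.30]
    for threshold in (0.95, 0.90, 0.50):
        kept = accept(probs, threshold)
        print(threshold, len(kept), f"{estimated_error_rate(kept):.3f}")
```

Lowering the threshold keeps more identifications but raises the estimated error rate, which is the trade-off the threshold setting controls.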
11
This is the Samples view
12
List of all the proteins found in your samples
- Homologous proteins (proteins matched to the same peptides) are shown
- You can link out directly to database entries
13
How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule: explain the spectral data with the smallest set of proteins
Protein A and protein B share all the same peptides, so they will be grouped together
14
How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule: explain the spectral data with the smallest set of proteins
Protein A and protein B each have one unique peptide; they will be listed separately only if the peptide probability is > 50%
15
How does Scaffold deal with peptides that can be assigned to more than one protein?
General rule: explain the spectral data with the smallest set of proteins
Protein B has two unique peptides, so it will be listed separately
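The general rule on these three slides is a parsimony problem. A minimal sketch, assuming a simple dictionary from protein name to the set of peptides it matches: proteins with identical peptide sets are merged into one group (the first slide), then a greedy set cover picks groups until every peptide is explained. Greedy set cover is an approximation to the true smallest set, the peptide-probability condition from the second slide is not modeled, and the protein and peptide names are hypothetical. This is an illustration of the rule, not Scaffold's algorithm.

```python
def group_indistinguishable(protein_peptides: dict) -> dict:
    """Merge proteins that are matched by exactly the same peptide set."""
    groups = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    # Name each group by its sorted members, e.g. "A/B".
    return {"/".join(sorted(members)): set(peps) for peps, members in groups.items()}


def minimal_protein_set(groups: dict) -> list:
    """Greedy set cover: repeatedly pick the group explaining the most unexplained peptides."""
    unexplained = set().union(*groups.values()) if groups else set()
    chosen = []
    while unexplained:
        # Ties go to the alphabetically first group name.
        best = max(sorted(groups), key=lambda g: len(groups[g] & unexplained))
        gain = groups[best] & unexplained
        if not gain:
            break
        chosen.append(best)
        unexplained -= gain
    return chosen


if __name__ == "__main__":
    # Hypothetical proteins: A and B share all peptides; C is explained by A/B.
    proteins = {
        "A": {"p1", "p2"},
        "B": {"p1", "p2"},
        "C": {"p2"},
        "D": {"p3", "p4"},
    }
    groups = group_indistinguishable(proteins)
    print(groups)
    print(minimal_protein_set(groups))
```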
16
Scaffold will extract GO terms from NCBI annotations
17
Gene Ontology “GO” terms
- Controlled vocabulary containing consistent descriptions of gene products in different databases
- Describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner
Gene Ontology Project: http://www.geneontology.org/GO.doc.shtml
18
List of samples
19
- Color coded to represent the probability that the protein identification is correct
- Probability thresholds for peptide and protein identifications, and the required number of unique peptides, can be defined
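The three settings on this slide act as a combined filter. The sketch below applies them to one protein; the `ProteinHit` record and the default values (95% protein probability, 95% peptide probability, 2 unique peptides) are hypothetical choices for illustration, not Scaffold's defaults or data model.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinHit:
    """Hypothetical record for one protein identification."""
    name: str
    protein_prob: float
    # Peptide sequence -> peptide probability, for peptides unique to this protein.
    unique_peptides: dict = field(default_factory=dict)


def passes_filters(hit: ProteinHit, min_protein_prob: float = 0.95,
                   min_peptide_prob: float = 0.95, min_unique_peptides: int = 2) -> bool:
    """True if the protein meets the protein, peptide and peptide-count thresholds."""
    confident = [seq for seq, p in hit.unique_peptides.items() if p >= min_peptide_prob]
    return hit.protein_prob >= min_protein_prob and len(confident) >= min_unique_peptides


if __name__ == "__main__":
    hits = [
        ProteinHit("A", 0.99, {"PEPA": 0.97, "PEPB": 0.96}),
        ProteinHit("B", 0.99, {"PEPC": 0.97, "PEPD": 0.80}),
    ]
    for hit in hits:
        print(hit.name, passes_filters(hit))
```

Protein B fails only because its second peptide is below the peptide threshold, which is how changing one setting can change which proteins are reported.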
20
This is the Proteins view
21
Spectrum of each peptide, labeled with y and b ions, which can be used for manual validation
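The b and y labels come from simple mass arithmetic: b_i is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water and a proton. The sketch below computes singly charged b and y ions from standard monoisotopic residue masses. It assumes unmodified residues (no fixed cysteine modification) and charge +1 only, so it is a starting point for checking labels rather than a full annotation tool.

```python
# Monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da


def b_y_ions(peptide: str):
    """Return lists of singly charged b and y ion m/z values.

    b[i - 1] is b_i (first i residues + proton);
    y[i - 1] is y_i (last i residues + water + proton).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


if __name__ == "__main__":
    seq = "IAELAGFSVPENTK"  # peptide from the "Good Spectrum" slide
    b, y = b_y_ions(seq)
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i:<2} {b_mz:9.4f}   y{i:<2} {y_mz:9.4f}")
```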
22
Manual Spectrum Evaluation
- Search engine scores
  - Is the peptide found by more than one search engine?
  - Mascot ion score > 40
  - SEQUEST XCorr > 2 (+2 ion) or > 2.5 (+3 ion); deltaCn > 0.2
- Good signal-to-noise
- Long stretches of y and/or b ions
- All dominant peaks are assigned as y or b ions
- Fragmentation chemistry
  - Cleavage N-terminal to P gives dominant y ions
  - Cleavage C-terminal to D and E gives dominant b ions
  - Peptides containing W give abundant y ions
  - S and T tend to lose water (-18 Da)
  - R, N, and Q tend to lose ammonia (-17 Da)
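The fragmentation-chemistry rules above can be turned into a checklist of what to look for at each backbone cleavage. The sketch below encodes only the proline, D/E and neutral-loss rules exactly as listed on the slide (the W rule is not encoded); it flags expectations and does not predict intensities.

```python
WATER_LOSS_RESIDUES = set("ST")      # S and T tend to lose water (-18 Da)
AMMONIA_LOSS_RESIDUES = set("RNQ")   # R, N and Q tend to lose ammonia (-17 Da)


def neutral_losses(fragment: str) -> list:
    """Neutral losses that the slide's rules predict for a fragment's residues."""
    losses = []
    if WATER_LOSS_RESIDUES & set(fragment):
        losses.append("-H2O (-18 Da)")
    if AMMONIA_LOSS_RESIDUES & set(fragment):
        losses.append("-NH3 (-17 Da)")
    return losses


def annotate_cleavages(peptide: str) -> list:
    """For each backbone cleavage, name the b/y pair and list the expected chemistry."""
    n = len(peptide)
    rows = []
    for i in range(1, n):
        n_term, c_term = peptide[:i], peptide[i:]
        notes = []
        if c_term[0] == "P":
            notes.append("N-terminal to P: dominant y ion")
        if n_term[-1] in "DE":
            notes.append("C-terminal to D/E: dominant b ion")
        rows.append({
            "b": f"b{i}",
            "y": f"y{n - i}",
            "b_losses": neutral_losses(n_term),
            "y_losses": neutral_losses(c_term),
            "notes": notes,
        })
    return rows


if __name__ == "__main__":
    for row in annotate_cleavages("IAELAGFSVPENTK"):
        if row["notes"]:
            print(row["b"], row["y"], "; ".join(row["notes"]))
```

For IAELAGFSVPENTK the proline rule flags the b9/y5 pair, matching the dominant y ion noted on the "Good Spectrum" slide.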
23
Good Spectrum
Peptide sequence: IAELAGFSVPENTK (+2 charge on parent peptide)
- SEQUEST: XCorr = 2.61, deltaCn = 0.4
- Mascot: Ion Score = 60.1, Identity Score = 37.3
- Dominant y ion from cleavage N-terminal to P
- Good coverage of y and b ion series
- Good signal-to-noise
24
Bad Spectrum
Peptide sequence: YPLADYALTPDMAIVDANLVMDMPK (+3 charge on parent peptide)
- SEQUEST: XCorr = 2.26, deltaCn = 0.2
- Mascot: Ion Score = 9.93, Identity Score = 37.3
- Poor signal-to-noise
- Poor coverage of y and b ion series
- Multiple unassigned peaks
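The score cut-offs from the "Manual Spectrum Evaluation" slide can be checked mechanically against the two example spectra. The sketch below encodes only those rules of thumb; the slide gives XCorr cut-offs for +2 and +3 precursors only, so other charges are rejected rather than guessed. Passing these numeric checks does not replace inspecting signal-to-noise and ion-series coverage.

```python
# Rule-of-thumb cut-offs from the "Manual Spectrum Evaluation" slide.
XCORR_CUTOFF = {2: 2.0, 3: 2.5}   # keyed by precursor charge
DELTACN_CUTOFF = 0.2
MASCOT_ION_CUTOFF = 40.0


def score_checks(charge: int, xcorr: float, delta_cn: float, mascot_ion: float) -> dict:
    """Return which score rules a peptide-spectrum match passes."""
    if charge not in XCORR_CUTOFF:
        raise ValueError(f"no XCorr cut-off given for charge +{charge}")
    return {
        "XCorr": xcorr > XCORR_CUTOFF[charge],
        "deltaCn": delta_cn > DELTACN_CUTOFF,
        "Mascot ion score": mascot_ion > MASCOT_ION_CUTOFF,
    }


if __name__ == "__main__":
    print("good:", score_checks(2, 2.61, 0.4, 60.1))
    print("bad: ", score_checks(3, 2.26, 0.2, 9.93))
```

The good spectrum passes every rule, while the bad spectrum fails all three (its deltaCn of 0.2 does not exceed the 0.2 cut-off).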
25
This is the Statistics view
26
Scaffold Statistics View
Score histogram
- Blue indicates “incorrect” proteins; red indicates “correct” proteins
- A protein is “correct” if it passes the peptide probability, protein probability, and minimum number of peptides filters
Important! There must be enough data to fit two distributions for the statistics to be valid.
27
Scaffold Statistics View
- With only 1 unique peptide (95% peptide probability), the maximum protein probability is <90%.
- With at least 2 unique peptides (95% peptide probability), the maximum protein probability is ~100%.
28
Scaffold Statistics View
SEQUEST only: missed IDs
29
Scaffold Statistics View
Mascot only: missed IDs
30
Scaffold Statistics View
Using both Mascot and SEQUEST results in more “correct” protein identifications
[Figure panels: Mascot only; SEQUEST only; Both]
31
This is the Publish View
32
Publication Guidelines for Proteomic Data
Molecular & Cellular Proteomics: http://www.mcponline.org/misc/ParisReport_Final.shtml
33
Publication Guidelines for Proteomic Data
Data Analysis
- Name and version of the software used to extract the peak list
- Name and version of the database searching software (Mascot, SEQUEST, Spectrum Mill, or X!Tandem)
- Values of all search parameters used (enzyme, modifications, mass tolerance, etc.)
- Name and size of the database searched (Swiss-Prot or NCBI, and the number of sequence entries)
- Name and version of any additional software used for statistical analysis, and an explanation of the analysis (Scaffold, number-of-peptide requirements, probability settings)
34
Publication Guidelines for Proteomic Data
Each protein identified
- Accession number
- Sequence coverage and total number of unique peptides
Each peptide identified
- Peptide sequence, noting any modifications or missed cleavages
- Parent peptide ion mass and charge
- All search engine scores
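Several of the reported values are simple to compute. The sketch below derives sequence coverage, tryptic missed cleavages, and the parent ion m/z from a neutral mass and charge. It assumes trypsin's usual rule (cleave after K or R unless followed by P), and the protein sequence in the usage block is made up for illustration.

```python
PROTON = 1.007276  # Da


def missed_cleavages(peptide: str) -> int:
    """Count internal K/R not followed by P (assumes trypsin)."""
    return sum(1 for i, aa in enumerate(peptide[:-1])
               if aa in "KR" and peptide[i + 1] != "P")


def sequence_coverage(protein: str, peptides) -> float:
    """Percentage of protein residues covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for j in range(start, start + len(pep)):
                covered[j] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the parent ion [M + zH]^z+."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    protein = "MAAAKIAELAGFSVPENTKR"  # hypothetical protein sequence
    print(missed_cleavages("IAELAGFSVPENTK"))
    print(f"{sequence_coverage(protein, ['IAELAGFSVPENTK']):.1f}% coverage")
    print(f"{precursor_mz(1000.0, 2):.4f}")
```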