AnalysisXML Results Design


1 AnalysisXML Results Design
Sean L. Seymour April 29, 2008

2 Principles Results should have a constant general framework.
Handling use case-specific content via cvParams as much as possible reduces the need to spawn subschemas. Even if many subschemas must be defined, parallel structure and language are valuable: they ease implementation and reduce the number of dialects. It is important to define an adequately flexible frame for results in the first AnalysisXML. The time saved by being very specific now will cost us later when we try to extend the format.
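
As a rough illustration of the first two principles, the sketch below keeps one fixed element hierarchy and pushes every use case-specific value into cvParam-like entries. The class names, the `shape` helper, and the example accessions (`EX:0001`, `EX:0002`) are assumptions made for illustration; they are not part of AnalysisXML.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CvParam:
    """Controlled-vocabulary term plus value (hypothetical shape)."""
    accession: str
    name: str
    value: str = ""


@dataclass
class FrameElement:
    """One element of the constant frame; specifics live only in cv_params."""
    tag: str
    cv_params: List[CvParam] = field(default_factory=list)
    children: List["FrameElement"] = field(default_factory=list)

    def shape(self) -> Tuple:
        """Return the element hierarchy with all cvParam content ignored."""
        return (self.tag, tuple(child.shape() for child in self.children))


def detection_with_hypothesis(params: List[CvParam]) -> FrameElement:
    """Build the same frame every time; only the cvParams differ."""
    hypothesis = FrameElement("IdentificationHypothesis", cv_params=params)
    id_result = FrameElement("IdentificationResult", children=[hypothesis])
    return FrameElement("AnalyteDetectionResult", children=[id_result])


# Two use cases with different content (accessions are made up).
database_search = detection_with_hypothesis(
    [CvParam("EX:0001", "search engine score", "57.2")]
)
tag_search = detection_with_hypothesis(
    [CvParam("EX:0002", "sequence tag", "PEPT")]
)

# Different content, identical structure: no subschema was needed.
print(database_search.shape() == tag_search.shape())
```

A consumer that only understands the frame can still walk both results; it simply skips cvParams it does not recognize.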

3 Why These ‘Pseudo Instance Documents’?
The goal is communication – capture use cases clearly and concisely in a way that lets everyone contribute to the design. Shows hierarchy of information only. Not proper XML, although intended to show the core framework of elements. These parts are indicated in black font. Whether something is an attribute or a child element doesn’t matter. Plumbing such as identifiers is assumed (not indicated). Content specific to each use case variation is indicated in white font at the same level. This content would generally be handled as cvParams as much as possible. The point is to see the big picture, free of the ‘syntax fat’ of a real example, but with specific detail not present in a schema or UML diagram.

4 Identification-Only Use Cases

5 Use case: Peptide ID result from MS/MS spectra by database search
Description: This is basic bottom-up peptide ID. Typically, it is assumed that a single analyte (peptide) is detected from a single spectrum, and there are generally one or more hypotheses (‘hits’ or ‘matches’ in vernacular) which are reported as alternate explanations for the spectrum. Use case: Peptide ID result from MS/MS spectra by database search Analyte Detection Result Specifics: One peptide detected, generally assumed to be one peptide detected per MS/MS spectrum. Specifics: Probably nothing here, but could capture pointers to spectra evidence here for use cases where both quant and ID are from the same set of spectra. Probably better not to do this, since there will be cases where this isn’t true and consistency giving few subschemas is probably worth more than the file size gain. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details if anything is specific to this result – generally nothing, but might capture sequence tag(s) here? Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the peptide hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum (evidence common to all hypotheses) Identification Hypothesis Specifics: PEPTIDE 1 - Pointer to a peptide in the conceptual molecule table or a description of the peptide. Properties of the ID hypothesis Specifics: Search engine-specific scores, aligned scores, ‘semitryptic’ ID Hypothesis Evidence Specifics: Fragmentation evidence in standard syntax to indicate matched ions for this peptide Identification Hypothesis Specifics: PEPTIDE 2 - … Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: PEPTIDE 3 - … Properties of the ID hypothesis ID Hypothesis Evidence
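
The hierarchy on this slide can be sketched as a small builder that emits one detection with shared evidence and several alternate hypotheses. The element names follow the model slide later in this deck; the attribute names (`spectrumRef`, `peptideRef`), the `cvParam` child layout, and the sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_peptide_detection(spectrum_ref, hypotheses):
    """Build one analyte detection with alternate peptide hypotheses.

    `hypotheses` is a list of (peptide_ref, scores) pairs, where `scores`
    maps a score name to a value. All attribute names are hypothetical.
    """
    detection = ET.Element("AnalyteDetectionResult")
    id_result = ET.SubElement(detection, "IdentificationResult")

    # Evidence common to all hypotheses: the pointer to the MS/MS spectrum.
    evidence = ET.SubElement(id_result, "IdentificationEvidence")
    evidence.set("spectrumRef", spectrum_ref)

    # One element per alternate explanation of the same spectrum.
    for peptide_ref, scores in hypotheses:
        hypothesis = ET.SubElement(
            id_result, "IdentificationHypothesis", peptideRef=peptide_ref
        )
        # Engine-specific scores are carried as cvParam-like children.
        for name, value in scores.items():
            ET.SubElement(hypothesis, "cvParam", name=name, value=str(value))
    return detection


detection = build_peptide_detection(
    "scan=1201",
    [("PEPTIDE_1", {"score": 57.2}), ("PEPTIDE_2", {"score": 31.0})],
)
print(ET.tostring(detection, encoding="unicode"))
```

The sequence tag variant would reuse the same builder and add the tag(s) as extra cvParam-like children on the identification result.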

6 Use case: Peptide ID result from MS/MS spectra by sequence tag search
Description: This is a variant of database search that has several variations itself. The main difference is the use of one or more sequence tags as filters to constrain the database search to only results containing those sequences. Thus, the key difference may be that each analyte detection has some specific parameters beyond the general method – the tags. Use case: Peptide ID result from MS/MS spectra by sequence tag search Analyte Detection Result Specifics: One peptide detected, generally assumed to be one peptide detected per MS/MS spectrum. Specifics: Probably nothing here, but could capture pointers to spectra evidence here for use cases where both quant and ID are from the same set of spectra. Probably better not to do this, since there will be cases where this isn’t true and consistency giving few subschemas is probably worth more than the file size gain. Properties of the overall detection Identification Result Specifics: Capture sequence tag(s) here as the only parameters specific to this analyte detection. Properties of the overall ID result Specifics: Here you could capture a probability at least one of the peptide hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum (evidence common to all hypotheses) Identification Hypothesis Specifics: PEPTIDE 1 - Pointer to molecule or special syntax to indicate combinations of sequence and mass tags. Properties of the ID hypothesis Specifics: Search engine-specific scores, aligned scores, ‘semitryptic’ ID Hypothesis Evidence Specifics: Fragmentation evidence in standard syntax to indicate matched ions for this peptide Identification Hypothesis Specifics: PEPTIDE 2 - … Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: PEPTIDE 3 - … Properties of the ID hypothesis ID Hypothesis Evidence

7 Use case: Protein detection by protein inference from peptide IDs from MS/MS
Description: This is the basic ‘protein ID’ half of bottom up proteomics. It is assumed that peptides exist elsewhere (same or different AnalysisXML file) as AnalyteDetections. Protein inference asserts a protein has been detected based on these peptides and hypothesizes one or more possible identifications of the detected protein species. Use case: Protein detection by protein inference from peptide IDs from MS/MS Analyte Detection Result Specifics: One protein species detected. This is one ‘protein group’. Specifics: Probably nothing here. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details (protocol application) – could be the same as the peptide search engine or a separate tool like Protein Prophet. Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the protein hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum Identification Hypothesis Specifics: ACCESSION 1 - Pointer to a protein in the conceptual molecule table. Properties of the ID hypothesis Specifics: Scores for this accession, role of accession – ex. equivalent top-ranked, subset, etc. Specifics: Pointers to specific peptide hypotheses in specific peptide analyte detections, these having attributes as well – ex. ‘is bold red’ in Mascot ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 2… Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 3… Properties of the ID hypothesis ID Hypothesis Evidence
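
One way to picture the pointers from protein hypotheses to peptide hypotheses is shown below. This is a minimal sketch: the class names, the `(detection id, hypothesis index)` pointer format, and the placeholder sequences are all assumptions, since the slide does not fix how such references are encoded.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PeptideDetection:
    """A peptide-level analyte detection with its alternate hypotheses."""
    detection_id: str
    hypotheses: List[str]  # placeholder sequences, best hypothesis first


@dataclass
class ProteinHypothesis:
    """One accession in a protein group; peptide evidence is held as pointers."""
    accession: str
    role: str  # e.g. "equivalent top-ranked" or "subset"
    # Each pointer is (peptide detection id, index of the hypothesis in it).
    peptide_refs: List[Tuple[str, int]] = field(default_factory=list)


def resolve_peptides(hypothesis, detections):
    """Follow the pointers and return the referenced peptide hypotheses."""
    by_id = {d.detection_id: d for d in detections}
    return [by_id[det_id].hypotheses[i] for det_id, i in hypothesis.peptide_refs]


detections = [
    PeptideDetection("pep_det_1", ["AAAK", "AAGK"]),
    PeptideDetection("pep_det_2", ["CCCR"]),
]
accession_1 = ProteinHypothesis(
    "ACCESSION_1", "equivalent top-ranked", [("pep_det_1", 0), ("pep_det_2", 0)]
)
print(resolve_peptides(accession_1, detections))
```

Because the protein result stores only pointers, the peptide detections can live in the same file or a different one without changing the protein-level structure.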

8 Use case: Protein detection by PMF or mixed PMF and MS/MS fragmentation
Description: PMF uses lists of observed peptide masses from MS spectra to directly search for matches (no peptides are fragmented), while the mixed PMF/fragment search uses PMF with the additional input of MS/MS fragmentation evidence for one or more of the peptides. The use case is actually closest to the protein inference use case. Use case: Protein detection by PMF or mixed PMF and MS/MS fragmentation Analyte Detection Result Specifics: One protein species detected. This is one ‘protein group’. Specifics: Probably nothing here. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details (protocol application). Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the protein hypotheses is correct, for example. ID Evidence Specifics: Pointer to MS spectrum, pointer to any MS/MS spectra in the case of fragmentation or push this to the specific proteins this fragmentation favors? Identification Hypothesis Specifics: ACCESSION 1 - Pointer to a protein in the conceptual molecule table. Properties of the ID hypothesis Specifics: Scores for this accession, role of accession – ex. equivalent top-ranked, subset, etc. Specifics: Pointers directly to conceptual peptide molecules or to specific peptide hypotheses in specific peptide analyte detections for MS/MS cases. ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 2… Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 3… Properties of the ID hypothesis ID Hypothesis Evidence

9 INCOMPLETE Use case: Metabolite ID result from MS/MS spectra
Description: This is… Use case: Metabolite ID result from MS/MS spectra Analyte Detection Result Specifics: Specifics: Properties of the overall detection Identification Result Specifics: Properties of the overall ID result Specifics: ID Evidence INCOMPLETE Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis Specifics: ID Hypothesis Evidence Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence

10 INCOMPLETE Use case: Glycan sequencing
Description: This is… Use case: Glycan sequencing Analyte Detection Result Specifics: Specifics: Properties of the overall detection Identification Result Specifics: Properties of the overall ID result Specifics: ID Evidence INCOMPLETE Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis Specifics: ID Hypothesis Evidence Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence

11 INCOMPLETE Use case: Lipid identification
Description: This is… Use case: Lipid identification Analyte Detection Result Specifics: Specifics: Properties of the overall detection Identification Result Specifics: Properties of the overall ID result Specifics: ID Evidence INCOMPLETE Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis Specifics: ID Hypothesis Evidence Specifics: Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Specifics: Properties of the ID hypothesis ID Hypothesis Evidence

12 Quantitation-Only Use Cases

13 Use case: Label-Free Quantitation by MS Feature Maps
Description: This result captures only the detection of one feature in one MS map. This feature would then be related to features in other MS maps by alignment in a different result type. The feature does not have an identification, but could later be linked to one. Use case: Label-Free Quantitation by MS Feature Maps Specifics: This could be either an isotope cluster in m/z space or a mass feature if mass reconstruction is done to establish charge series relationships between isotopic clusters. Analyte Detection Result Properties of the overall detection Specifics: Probably nothing here other than the identifier. Quantification Result Specifics: Pointer to separate quant protocol details if specific to this detection – generally not. Properties of the overall quant result Specifics: If quant data is derived from more than one method, you could report average or final values at this level. Quant Measurement Evidence Specifics: Pointer to set of MS spectra or zoomed subset spectrum to focus on the quant evidence. Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Intensities, errors, S/N, etc. Some content specific to tools, some common (hopefully a lot). Quantitative Measurement Specifics: If multiple methods, point to specific one here Specifics: If isotope cluster done separately and then combined, each quant measurement captures one cluster and the top level captures the combination. Properties of the quant measurement

14 Use case: Intact protein analysis
Description: Intact proteins measured by MS spectra yield intensity measurements and intact mass measurements. There can be subsequent analyses, but this result should be captured. Use case: Intact protein analysis Specifics: This typically comes from a single isotopic cluster or blob via MALDI or a charge series deconvolution from ESI. Analyte Detection Result Properties of the overall detection Specifics: Intact mass measurements could be reported here including errors. If there are any specific parameter settings for this analyte, they would go here. There may be some in this case. Quantification Result Specifics: Any specific quant parameter settings for this detection? Properties of the overall quant result Specifics: If quant data is derived from more than one method, you could report average or final values at this level. Quant Measurement Evidence Specifics: Pointer to the MS spectrum, or to the set of spectra if several. Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Integration intensities, errors, S/N, etc. Some content specific to tools, some common (hopefully a lot). Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Integration intensities, errors, S/N, etc. Some content specific to tools, some common (hopefully a lot).

15 Mixed ID/Quant Use Cases

16 Use case: Peptide ID result from MS/MS spectra with isobaric label quantitation
Description: This is almost the same as without quantitation. Assuming it is reasonable to consider isobaric variant peptides as detection of essentially one analyte, the quantitative information is attained at the same time from part of the same MS/MS spectra that support the ID process. This is in contrast to MS-based quantitation where the heavy and light peptides clearly cannot be treated as the same analyte. Use case: Peptide ID result from MS/MS spectra with isobaric label quantitation Specifics: One peptide detected, generally assumed to be one peptide detected per MS/MS spectrum. Analyte Detection Result Specifics: Probably nothing here, but could capture pointers to spectra evidence here for use cases where both quant and ID are from the same set of spectra. Probably better not to do this, since there will be cases where this isn’t true and consistency giving few subschemas is probably worth more than the file size gain. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details (protocol application) Properties of the overall ID result Specifics: Here you could capture a probability at least one of the peptide hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum (evidence common to all hypotheses) Identification Hypothesis Specifics: PEPTIDE 1 - Pointer to a peptide in the conceptual molecule table or a description of the peptide. Properties of the ID hypothesis Specifics: Search engine-specific scores, aligned scores, ‘semitryptic’ ID Hypothesis Evidence Specifics: Fragmentation evidence in standard syntax to indicate matched ions for this peptide Identification Hypothesis… Specifics: PEPTIDE 2 - … Quantification Result Specifics: Pointer to separate quant protocol details if needed. Properties of the overall quant result Specifics: If quant data is derived from more than one method, you could report average or final values at this level. 
Quant Measurement Evidence Specifics: Pointer to same spectrum as ID or zoomed subset spectrum to focus on the quant evidence Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Intensities, ratios, errors, P-value for differential expression, etc. Some content specific to tools, some common (hopefully a lot).

17 Use case: Peptide-level aggregation of ID and/or quantitation
Description: The fundamental ID and peptide quant from isobaric label approaches stem from single spectra. This use case combines multiple observations of the same or similar things to produce aggregate results. This is valuable because multiple spectra of the same thing are not truly independent observations. This could aggregate to the independent unit. Use case: Peptide-level aggregation of ID and/or quantitation Specifics: One distinct peptide detected (not one spectrum instance) Analyte Detection Result Specifics: Could capture statistics that span the observations such as observed counts of spectra or different modified forms related to the peptide. Properties of the overall detection Identification Result Specifics: Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the peptide hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum (evidence common to all hypotheses) Identification Hypothesis Specifics: PEPTIDE 1 - Pointer to a peptide in the conceptual molecule table or a description of the peptide. Properties of the ID hypothesis Specifics: Combined scores, probabilities, etc. Specifics: Pointers to all supporting peptide hypothesis instances in prior detections from single spectra. ID Hypothesis Evidence Identification Hypothesis… Specifics: PEPTIDE 2 - … Quantification Result Specifics: Specifics: Quant is derived from several measurements, so report final values here – Intensities, ratios, errors, P-values, etc. Properties of the overall quant result Quant Measurement Evidence Specifics: nothing here? Quantitative Measurement Specifics: Properties of the quant measurement Specifics: Pointers to each separate observation in prior peptide detections (quant for each reported there, so don’t duplicate here)
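
The aggregation idea (final values at the top level, pointers instead of duplicated per-spectrum quant) might look like the sketch below. The class and field names are hypothetical, and the arithmetic mean is only a stand-in: the slide does not say how observations should be combined.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class SpectrumObservation:
    """One prior single-spectrum peptide detection with its own quant value."""
    detection_id: str
    peptide: str
    ratio: float


@dataclass
class AggregatedPeptide:
    """Peptide-level result: final values plus pointers to the observations."""
    peptide: str
    spectrum_count: int
    combined_ratio: float
    observation_refs: List[str]


def aggregate(peptide, observations):
    """Combine all observations of one distinct peptide.

    The per-spectrum ratios stay in the prior detections; only their ids
    are kept here, so nothing is duplicated at the aggregate level.
    """
    matching = [o for o in observations if o.peptide == peptide]
    if not matching:
        raise ValueError(f"no observations for peptide {peptide!r}")
    return AggregatedPeptide(
        peptide=peptide,
        spectrum_count=len(matching),
        # Assumption: a plain mean stands in for the real combination rule.
        combined_ratio=mean(o.ratio for o in matching),
        observation_refs=[o.detection_id for o in matching],
    )


observations = [
    SpectrumObservation("det_1", "PEPTIDE_1", 1.0),
    SpectrumObservation("det_2", "PEPTIDE_1", 2.0),
    SpectrumObservation("det_3", "PEPTIDE_2", 5.0),
    SpectrumObservation("det_4", "PEPTIDE_1", 3.0),
]
result = aggregate("PEPTIDE_1", observations)
print(result.spectrum_count, result.combined_ratio, result.observation_refs)
```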

18 Use case: Protein detection by protein inference with quantitation by spectral counting or similar
Use case: Protein detection by protein inference from peptide IDs from MS/MS with quantitation by spectral counting or similar. Description: This is essentially the same as the basic protein inference use case, but one of several methods for estimating rough quantitation is used at the same time, relying on the same evidence as . Analyte Detection Result Specifics: One protein species detected. This is one ‘protein group’. Specifics: Probably nothing here. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details if anything is specific to this result – generally nothing. Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the protein hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum Identification Hypothesis Specifics: ACCESSION 1 - Pointer to a protein in the conceptual molecule table. Properties of the ID hypothesis Specifics: Scores for this accession, role of accession – ex. equivalent top-ranked, subset, etc. Specifics: Pointers to specific peptide hypotheses in specific peptide analyte detections, these having attributes as well – ex. ‘is bold red’ in Mascot ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 2… Quantification Result Specifics: Pointer to separate quant protocol details if needed. Properties of the overall quant result Specifics: If quant data is derived from more than one method, you could report average or final values at this level. Quant Measurement Evidence Specifics: Not clear what to do here – depends on exact method. Point to spectra directly or peptide analyte detections? Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Absolute or relative quant intensity estimate for protein, units, error

19 Use case: Protein detection by protein inference with quantitation by isobaric labels
Use case: Protein detection by protein inference from peptide IDs from MS/MS with quantitation by isobaric labels. Description: This is the basic ‘protein ID’ half of bottom-up proteomics. It is assumed that peptides exist elsewhere (same or different AnalysisXML file) as AnalyteDetections. Protein inference asserts a protein has been detected based on these peptides and hypothesizes one or more possible identifications of the detected protein species. Analyte Detection Result Specifics: One protein species detected. This is one ‘protein group’. Specifics: Probably nothing here. Properties of the overall detection Identification Result Specifics: Pointer to the search parameter details if anything is specific to this result – generally nothing. Properties of the overall ID result Specifics: Here you could capture a probability that at least one of the protein hypotheses is correct, for example. ID Evidence Specifics: Pointer to MSMS spectrum, pointer to survey MS spectrum Identification Hypothesis Specifics: ACCESSION 1 - Pointer to a protein in the conceptual molecule table. Properties of the ID hypothesis Specifics: Scores for this accession, role of accession – ex. equivalent top-ranked, subset, etc. Specifics: Pointers to specific peptide hypotheses in specific peptide analyte detections, these having attributes as well – ex. ‘is bold red’ in Mascot ID Hypothesis Evidence Identification Hypothesis Specifics: ACCESSION 2… Quantification Result Specifics: Pointer to analyte-specific quant protocol details if needed. Properties of the overall quant result Specifics: If quant data is derived from more than one method, you could report average or final values at this level. Quant Measurement Evidence Specifics: Pointers to the detected peptides, which have the quant evidence that is combined to report protein values. 
Quantitative Measurement Specifics: If multiple methods, point to specific one here Properties of the quant measurement Specifics: Intensities, ratios, errors, P-value for differential expression, etc. Some content specific to tools, some common (hopefully a lot).

20 Use case: MIDAS – peptide ID by MS/MS fragmentation triggered by MRM scans, which give quantitation
Description: MIDAS is MRM Initiated Detection And Sequencing. Essentially it is an MRM workflow, but the identification is confirmed or rejected by triggering an MS/MS scan when the MRM has good signal. There are other approaches to confirm the ID without triggering an MS/MS that would align with this use case. Analyte Detection Result Properties of the overall detection Identification Result Properties of the overall ID result ID Evidence INCOMPLETE Identification Hypothesis Properties of the ID hypothesis ID Hypothesis Evidence Identification Hypothesis Quantification Result Properties of the overall quant result Quant Measurement Evidence Quantitative Measurement Properties of the quant measurement

21 Quantitation Use Cases Requiring Comparison to an Internal or External Standard

22 Things that fall into this category:
MS-based quantitation with isotopic labeling: ICAT, ICPL, SILAC, AQUA, mTRAQ, O-18/O-16, and many others like these.
Absolute quantitation using a calibration curve: metabolite quantitation via MRM scans, peptide quantitation via MRM scans.

23 What’s different about quantitation in these cases?
The quantitative result is the property of a comparison of two or more analyte detection results. Strictly speaking, you could argue this is true of isobaric labeling schemes as well; however, it may be preferable to accept the slight inaccuracy of considering all isobaric labeled variants as the same analyte detection for efficiency reasons. The key issue not present in simple signal or isobaric quantitation is the need to associate detected analytes. The association act can have uncertainty that must be captured. Ex. How sure are you that two clusters are really the heavy and light SILAC labeled versions of the same peptide? Ex. How sure are you that two features in two different MS maps are properly aligned (are the same thing)? (label-free quantitation) There may be known concentrations associated with detections, e.g. standard curve points.

24 Proposed solution: <AssociationResult>
There is no quantitative result without the comparison of two or more analyte detection results. Strictly speaking, you could argue this is true of isobaric labeling schemes as well; however, it will probably be preferable to accept the slight inaccuracy of considering all isobaric labeled variants as the same analyte detection for efficiency. The key issue not present in simple signal or isobaric quantitation is the need to associate detected analytes. The association act may not be perfect, and thus we must enable capturing uncertainty in association. Ex. How sure are you that two clusters are really the heavy and light SILAC labeled versions of the same peptide?

25 Analyte Detection Result Model Association Result Proposal
Analyte Detection Result Model:
<AnalyteDetectionResultSet>
  <AnalyteDetectionResult>
    <IdentificationResult>
      <IdentificationEvidence>
      <IdentificationHypothesis>
      <IdentificationHypothesis>
    <QuantitationResult>
      <QuantitationEvidence>
      <QuantitationMeasurement>
      <QuantitationMeasurement>

Association Result Proposal:
<AssociationResultSet>
  <AssociationResult>
    <AssociationQualitativeResult>
      <AssociationEvidence>
      <AssociationHypothesis>
      <AssociationHypothesis>
    <AssociationQuantitativeResult>
      <QuantitationEvidence>
      <QuantitationMeasurement>
      <QuantitationMeasurement>
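
To make the proposal concrete, here is a minimal sketch of an association between two analyte detections that carries its own uncertainty and yields a ratio only through the comparison. The field names, the `probability` attribute, and the intensity values are assumptions for illustration; the proposal itself names only the elements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssociationHypothesis:
    """Claim that several analyte detections are variants of one analyte."""
    detection_ids: List[str]
    probability: float  # how sure we are that the association is correct


def association_ratio(hypothesis, intensities, numerator_id, denominator_id):
    """Compute a quantitative result from two associated detections.

    `intensities` maps a detection id to its measured intensity. The ratio
    only exists because the two detections were associated first.
    """
    for det_id in (numerator_id, denominator_id):
        if det_id not in hypothesis.detection_ids:
            raise ValueError(f"{det_id!r} is not part of this association")
    return intensities[numerator_id] / intensities[denominator_id]


# Heavy and light SILAC clusters detected separately, then associated.
pair = AssociationHypothesis(["heavy_cluster", "light_cluster"], probability=0.92)
intensities = {"heavy_cluster": 200.0, "light_cluster": 100.0}

ratio = association_ratio(pair, intensities, "heavy_cluster", "light_cluster")
print(ratio, pair.probability)
```

Keeping the probability on the association rather than on either detection mirrors the split between the qualitative and quantitative halves of the proposed result.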

