1
Proteogenomic Novelty in 105 TCGA Breast Tumors
Karl Clauser
CPTAC Breast Cancer Analysis Group
Broad Institute of MIT and Harvard
Fred Hutchinson Cancer Research Center
Washington University
New York University
CPTAC Data Jamboree, April 16, 2014
National Institutes of Health, Bethesda, Maryland
2
Tumor-specific protein databases for MS/MS-spectra searches
Kelly Ruggles, David Fenyo, NYU
3
QUILTS: Treatment of different variant types
Variants (in variants db): 1-frame translation
Unannotated alternative splicing (in alternates db): 1-frame translation
Frameshifts (in frameshifts db): 1-frame translation
Partially novel splicing, novel downstream: 1-frame translation
Partially novel splicing, novel upstream: 6-frame translation
Completely novel expression (in other db): 6-frame translation
Fusion genes (in other db): 6-frame translation
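The difference between 1-frame and 6-frame translation on this slide can be illustrated with a minimal Python sketch. This is not QUILTS code: the function names and the toy sequence are assumptions made for illustration, and the only domain knowledge used is the standard genetic code. Input is assumed to be uppercase DNA containing only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """1-frame translation: read codons starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frame(dna):
    """6-frame translation: three offsets on each strand."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCCTAA").items():
        print(name, protein)
```

When the reading frame is known from an annotated neighbor (for example a known upstream exon), a single call to `translate` is enough. When neither strand nor frame is known, all six candidates go into the search database, which enlarges the search space accordingly.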
4
Proteogenomic mapping: Genetic alterations can be observed on protein level (105 tumors)
Work in progress
Low confidence thresholds applied to genome calls:
Variants: >2 QUAL (phred-scaled quality score for the assertion made in ALT)
Alternative splices: >1 read (RNA-seq)
High thresholds applied to proteome calls (<0.1% FDR)
The VCF specification defines the quality value as: "QUAL phred-scaled quality score for the assertion made in ALT. i.e. give -10 log_10 prob(call in ALT is wrong). If ALT is "." (no variant) then this is -10 log_10 p(variant), and if ALT is not "." this is -10 log_10 p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. (Numeric)"
% of frameshifts, alternative splices & single AA variants observable by proteomics
mRNA may not be translated, or may be at low abundance
Proteome coverage is incomplete
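To make the phred-scaled QUAL threshold concrete, here is a small Python sketch of the conversion between QUAL and the probability that a call is wrong, plus the two genome-side filters quoted on the slide (>2 QUAL, >1 read). The function names are illustrative assumptions, not part of any pipeline described in the talk.

```python
import math


def qual_to_error_prob(qual):
    """Phred scale: QUAL = -10 * log10(p_wrong), so p_wrong = 10 ** (-QUAL / 10)."""
    return 10 ** (-qual / 10)


def error_prob_to_qual(p_wrong):
    """Inverse conversion, from error probability back to a phred-scaled score."""
    return -10 * math.log10(p_wrong)


def keep_variant(qual, min_qual=2.0):
    """Slide threshold for variant calls: QUAL strictly greater than 2."""
    return qual > min_qual


def keep_splice(read_count, min_reads=1):
    """Slide threshold for alternative splices: strictly more than 1 read."""
    return read_count > min_reads


if __name__ == "__main__":
    for qual in (2, 10, 20, 30):
        print(f"QUAL {qual:>2}: p(call is wrong) = {qual_to_error_prob(qual):.3f}")
```

A QUAL of 2 corresponds to an error probability of about 0.63, which shows how permissive the genome-side threshold is compared with the <0.1% FDR required on the proteome side.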
5
Global proteome and phosphoproteome discovery workflow for TCGA breast tumors
1 mg total protein per tumor
Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes
6
Serial Search Strategy with Personalized Databases
Input: 25,776,160 spectra (105 patients) (36 iTRAQ experiments) (25 LC-MS/MS runs / experiment)

RefSeq-Hs-7/2013: 31,852
>Canonical Protein
SIGNALINGPATHWAYREGULATOR
11,328,955 matched spectra (44% of total) (1% FDR)
14,447,205 leftover spectra

Personalized databases: concatenated FASTA files, 105 patients; altered proteins; redundant entries removed

Variants: 133,241
>Canonical – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
>Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR
3247 variants matched

Alternate spliceforms: 67,853
>Canonical – Alternate splice Patient 1
SIGNALINGREGULATOR
>Canonical – Alternate splice Patient 2
SIGNALINGPATHREGULATOR
197 splice junctions matched

Frameshifts: 19,944
>Canonical – Truncation Patient 1
SIGNALINGPATFRAMESHIF
>Canonical – Novel Exon Insert Patient 2
SIGNALINGPATHWAYINSERTREGULATOR
>Canonical – Partial Exon Deletion Patient 3
SIGNALINGPATHWAYULATOR
22 truncation overlaps matched
11 insertion overlaps matched
49 deletion junctions matched

Concatenated: 252,890

Low confidence thresholds for genome calls
Variants: >2 QUAL score (phred-scaled)
Alternative splices, frameshifts: >1 read
High confidence for proteome IDs
<0.1% FDR peptide spectrum match
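The serial strategy on this slide (search the canonical database first, then send only the leftover spectra to the personalized entries) can be sketched in Python as below. This is a toy under loud assumptions: spectra are stood in for by already-known peptide strings and "matching" is plain substring lookup, whereas a real search engine scores MS/MS spectra and applies an FDR cutoff. The function names and the test peptides are hypothetical; the protein sequences are the illustrative ones from the slide.

```python
def build_nonredundant_db(entries, exclude=()):
    """Concatenate (header, sequence) entries, dropping duplicate sequences
    and any sequence already present in `exclude` (e.g. the canonical db)."""
    seen = set(exclude)
    db = {}
    for header, sequence in entries:
        if sequence not in seen:
            seen.add(sequence)
            db[header] = sequence
    return db


def search(peptides, db):
    """Toy search: a peptide 'matches' every entry that contains it."""
    matched, leftover = {}, []
    for peptide in peptides:
        hits = [header for header, sequence in db.items() if peptide in sequence]
        if hits:
            matched[peptide] = hits
        else:
            leftover.append(peptide)
    return matched, leftover


canonical = {"Canonical Protein": "SIGNALINGPATHWAYREGULATOR"}
personalized = build_nonredundant_db(
    [
        ("Variant Patient 1", "SIGNALINGPATHWAHREGULATOR"),
        ("Variant Patient 2", "SIKNALINGPATHWAYREGULATOR"),
        ("Alternate splice Patient 1", "SIGNALINGREGULATOR"),
        ("Variant Patient 7 (same sequence as Patient 1)", "SIGNALINGPATHWAHREGULATOR"),
    ],
    exclude=canonical.values(),
)

# Hypothetical peptides standing in for spectra.
peptides = ["PATHWAYREG", "PATHWAHREG", "NALINGREGU", "XXXX"]

# Pass 1: canonical database only.
canonical_hits, leftover = search(peptides, canonical)
# Pass 2: only the leftovers are searched against the personalized entries.
novel_hits, unassigned = search(leftover, personalized)

if __name__ == "__main__":
    print("canonical:", canonical_hits)
    print("personalized:", novel_hits)
    print("unassigned:", unassigned)
```

Note that `PATHWAYREG` also occurs in the Patient 2 variant entry, but because the canonical pass explains it first, it never reaches the personalized search. In this sketch, that ordering is what restricts variant, splice and frameshift assignments to evidence the canonical proteome cannot account for.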
7
Frequency of Single AA Variants, Alternative Splices, Frameshifts Across Patients
Somatic variants are less frequent than germline variants
Some germline variants are very common
Rare germline variants present in RefSeq
Some alternative splice forms and frameshifts are very common
Should be in RefSeq
Genome & Transcriptome Data
8
1 experiment: 3 individual patients + 1 Common control (40 patients)
How many RNA-seq reads to yield a proteomics observation of an alternate splice or frameshift?
197 Alternative splices
82 Frameshifts
Max # Reads 17 observed in >1 Expmt
Max # Reads 19 observed in >1 Expmt
9
Present in only 1 Common control member
Frameshift Truncation: Ras-related protein Rab-15
Observed only in Proteomics Exp 3
E159
Max RNA-Seq Reads: 1
10
Present in only 1 Common control member
Frameshift Truncation: Cysteine-rich protein 1
Observed in 9 Proteomics Experiments
E159
Max RNA-Seq Reads: 1
11
Present in only 1 Common control member
Frameshift Truncation: Cullin-2 isoform a
Observed in 3 Proteomics Experiments
E159
Max RNA-Seq Reads: 1
12
1 experiment: 3 individual patients + 1 Common control (40 patients)
Many missing observations even when transcript present in many common control members
Panels: Alternative splices; Frameshifts
13
Majority of Alternative Splice Junctions and Frameshifts observed in >1 Proteomics Experiment
Pie charts (1 experiment: 3 individual patients + 1 Common control (40 patients))
Alternative splices: 150/197 observed in >1 experiment
Frameshifts: 44/82 observed in >1 experiment
14
Next steps
Examine "other" category
Fusion genes (junction-spanning)
Novel exon splicing (2 sides)
Completely novel gene
Use updated somatic variants from QUILTS
Define genomic data thresholds suitable for proteomic observations
RNA-seq: min read count
Variant calling: phred-scaled QUAL score
Sort out Germline/Somatic variant call mix status across patients
15
Summary of Proteome Re-processing: 105 TCGA patients, 36 iTRAQ experiments
16
Changes in Re-processing of TCGA data
Extraction
Centroiding: use Xcalibur instead of SM. iTRAQ ratios are little changed; intensities are lower by ~5x (will more closely match the NIST central analysis pipeline).
Precursor MH+ range expanded from to
Searches
Replace the database with the RefSeq version used as reference for the personalized database generation. Database content/size is very similar; protein identifiers change from gi numbers to RefSeq numbers.
Allowed modifications will be expanded, which increases the # of identified spectra by ~10%.
From: Full iTRAQ, M-ox, N-deam, q-pyro
To: iTRAQ-Full-Lys-only, M-ox, N-deam, q-pyro, c-pyro, Ac-nTermProt
Autovalidation
Proteome initial processing, peptide FDR per experiment: %, but overall peptide FDR across all 36 experiments: ~5.5%
Phosphoproteome initial processing, peptide FDR per experiment: %, but overall peptide FDR across all 36 experiments: ~7.2%
Changes will seek to bring the overall peptide FDRs down to ~1%:
require multiple observations (protein, P-site) across experiments
raise score thresholds
Quantitation
Will use PIP (precursor ion purity) filtering to exclude spectra from quantitation but not identification. PIP > 50% excludes ~7.8% of spectra. Filtering reduces standard deviations on protein & phosphosite level iTRAQ ratios.
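The gap between the per-experiment FDR and the overall FDR across 36 experiments is commonly explained by false identifications being essentially random, and therefore mostly different in each experiment, while true identifications recur. That explanation is an assumption here, not something stated on the slide. The Python sketch below works through the arithmetic with an idealized model; every count in it is an arbitrary illustrative value, not a number from the TCGA data.

```python
def fdr(n_true, n_false):
    """False discovery rate among accepted identifications."""
    return n_false / (n_true + n_false)


def union_fdr(n_experiments, union_true, false_per_experiment):
    """Idealized model: true peptides overlap across experiments, so their
    union saturates at `union_true`, while every experiment contributes
    `false_per_experiment` false peptides that no other experiment shares."""
    union_false = n_experiments * false_per_experiment
    return fdr(union_true, union_false)


# Hypothetical numbers chosen only for illustration.
TRUE_PER_EXPERIMENT = 9_900
FALSE_PER_EXPERIMENT = 100
UNION_TRUE = 60_000      # assumed distinct true peptides over all experiments
N_EXPERIMENTS = 36       # from the slide

per_experiment = fdr(TRUE_PER_EXPERIMENT, FALSE_PER_EXPERIMENT)
overall = union_fdr(N_EXPERIMENTS, UNION_TRUE, FALSE_PER_EXPERIMENT)

if __name__ == "__main__":
    print(f"per-experiment FDR: {per_experiment:.2%}")
    print(f"overall FDR across {N_EXPERIMENTS} experiments: {overall:.2%}")
```

In this model a 1% per-experiment FDR grows to several percent once results are pooled. It also suggests why requiring multiple observations across experiments helps: a false hit unique to one experiment fails that requirement, while a recurring true hit passes it.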
17
Transcript present in 18/40 Common Control Members
Y Chromosome Frameshift - CD99 antigen
Observed in 36 Proteomics Experiments
E159
Partial exon deletion splice, plus frameshift truncation
Max RNA-Seq Reads: 12
18
Acknowledgments
Broad Institute/FHCRC: Steve Carr, Karl Clauser, Michael Gillette, Jana Qiao, Philipp Mertins, DR Mani, Eric Kuhn, Sue Abbatiello, Amanda Paulovich, Pei Wang, Sean Wang, Ping Yan
Washington U./MD Anderson/NYU: Sherri Davies, Matthew Ellis, David Fenyo, Kelly Ruggles, Reid Townsend, Li Ding
NCI Staff: Emily Boja, Mehdi Mesri, Rob Rivers, Chris Kinsinger, Henry Rodriguez
Funding: National Cancer Institute
19
Single AA Variants may be Somatic in Some Patients, Germline in Others
Nov 2013
Genomic
Highly interesting, should correlate with prognosis and/or subtype.
May correlate with prognosis?
Might as well be canonical isoforms?
Detectable, but too rare to indicate biology.
Proteomic
G&S mix genomic variants have the highest observation rate by proteomics.
Genomic variants present in only a single patient are observable by proteomics.
20
Not all Germline & Somatic mix Single AA Variants are "Essentially" Germline
81 Patients, Nov 2013
Panels: Genomic; Proteomic
Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
Is there some cancer biology involved for high S/G ratio variants?
Are patients with germline form more cancer prone?
Does somatic form correlate with prognosis, development of drug-resistance?
21
Wide Range of Somatic Single AA Variants/Patient
Skip
Low confidence thresholds applied to calls
Variants: >2 QUAL score (phred-scaled)
Alternative splices: >1 read