Presentation is loading. Please wait.

Presentation is loading. Please wait.

Collin Tokheim, Rachel Karchin  Cell Systems 

Similar presentations


Presentation on theme: "Collin Tokheim, Rachel Karchin  Cell Systems "— Presentation transcript:

1 CHASMplus Reveals the Scope of Somatic Missense Mutations Driving Human Cancers 
Collin Tokheim, Rachel Karchin  Cell Systems  Volume 9, Issue 1, Pages 9-23.e8 (July 2019) DOI: /j.cels Copyright © 2019 The Authors Terms and Conditions

2 Cell Systems 2019 9, 9-23.e8DOI: (10.1016/j.cels.2019.05.005)
Copyright © 2019 The Authors Terms and Conditions

3 Figure 1 Overview of CHASMplus See also Figure S1 and Table S1.
(A) CHASMplus, a machine learning approach, was applied to somatic missense mutations in tumors from 32 different cancer types found in The Cancer Genome Atlas (TCGA). High scores predicted by CHASMplus indicate stronger evidence for a mutation being a cancer driver. (B) CHASMplus has a model for each cancer type in the TCGA to identify driver missense mutations specific to a cancer type. Putative driver missense mutations are identified at a false discovery rate (FDR) threshold of 1%. In the example, the mutation was statistically significant for the BRCA model but not for the UVM model. Abbreviations: breast invasive carcinoma (BRCA), uveal melanoma (UVM), and cumulative distribution function (CDF). Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

4 Figure 2 Assessment of Pan-cancer and Cancer-Type-Specific Predictions
See also Figure S2 and Table S2. (A) Receiver operating characteristic (ROC) curve for identifying activating mutations in MCF10A cells, a breast epithelium line, with breast cancer-specific models. (B) ROC curve for discerning literature annotated oncogenic mutations from OncoKB for specific cancer types: breast invasive ductal carcinoma (BRCA), glioblastoma multiforme (GBM), high-grade serous ovarian cancer (OV), and colon adenocarcinoma (COAD). (C) Score distribution for cancer-type-specific models (labeled by title) for TCGA missense mutations in EGFR, found in either lung adenocarcinoma (LUAD; typically, kinase domain mutation) or glioblastoma multiforme (GBM; typically, extracellular domain mutations). Box plots show quartiles, with whiskers defined according to Tukey's criterion. (D) Overview of 5 pan-cancer benchmarks by scale of evaluation and type of supportive evidence. (E) Pan-cancer performance measured by the area under the receiver operating characteristic curve (auROC) on the 5 mutation-level benchmarks (shown in text). The color scale from red to blue indicates methods ranked from high to low performance. Benchmarks are categorized by in vitro (green), in vivo (yellow), and literature-based benchmarks (turquoise). Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

5 Figure 3 Frequency Spectrum of Driver Missense Mutations
See also Figures S3 and S4 and Tables S3, S4, and S5. (A) Proportion of the overall frequency of driver missense mutations found to be rare (<1% of samples or singleton mutations), intermediate (1%–5%), and common (>5%), correspondingly shown as light to dark blue. (B) Correlation between tumor mutation burden and overall driver prevalence (number of driver mutations per tumor) across the frequency spectrum of drivers (rare, intermediate, and common). Shaded area indicates the 95% confidence interval. (C) Analysis of genes containing predicted driver missense mutations that preferentially occurred in a subtype-specific manner (chi-square test; q < 0.1). The pie chart illustrates the percentage of genes for all cancer types with a significant subtype-specific pattern, while the heatmap illustrates significant genes for breast invasive carcinoma. (D) Structure of the phosphatase 2A holoenzyme (PDB: 2IAE). (E) Structures of the ERBB2 extracellular domain (left; PDB: 2A91) and kinase domain (right; PDB: 3PP0). (F) Lollipop plot of driver missense mutations identified by CHASMplus (yellow), and likely truncating variants (frameshift insertion or deletion: purple; nonsense mutation: red; and splice site mutation: orange) in CASP8 for head and neck squamous cell carcinoma (HNSC). TCGA acronyms for cancer types are listed in STAR Methods. Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

6 Figure 4 Inferred Immune Cell Content and Gene Expression Correlate with Presence of Predicted Driver Missense Mutations in CASP8 See also Figure S5. Twelve immune-related biomarkers are shown, as estimated in head and neck squamous cell carcinoma (HNSC) tumor samples from TCGA (Thorsson et al., 2018). Each panel compares the distribution of a marker in samples harboring a rare driver missense mutation (orange), a truncating mutation (purple), and control samples with no CASP8 mutations (green). Both tumor samples with rare driver missense mutations and truncating mutations showed a similar significantly elevated inferred immune cell infiltration. Top row: inferred immune infiltrates from DNA methylation or gene expression from TCGA HNSC samples (Thorsson et al., 2018). Bottom row: gene expression values from RNA-Seq for several important immune-related genes reported in (Thorsson et al., 2018). “mis” indicates samples with driver missense mutations identified by CHASMplus, and “lof” is likely loss-of-function variants (nonsense, frameshift insertion or deletions, splice site, translation start site, and nonstop mutations). Mann-Whitney U test, adjusted p value (Benjamini-Hochberg method): ∗ < 0.05, ∗∗ < 0.01, and ∗∗∗ < Box plots show quartiles, with whiskers defined according to Tukey's criterion. Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

7 Figure 5 CHASMplus Predictions Correlate with Multiplexed Functional Assays in PTEN See also Table S6. (A) Heatmap displaying gene-weighted CHASMplus scores (gwCHASMplus) of PTEN missense mutations. (B and C) (B) Scores are negatively correlated with PTEN lipid phosphatase activity (Spearman rho = −0.520) (Mighell et al., 2018) and (C) PTEN protein abundance (Spearman rho = −0.428) (Matreyek et al., 2018). (D) Both gwCHASMplus correlations had a larger absolute value than the correlation between the two experiments (Spearman rho = 0.351). ∗∗∗ = p < (E and F) Distribution of (E) PTEN lipid phosphatase activity or (F) protein abundance in predicted driver missense mutations from TCGA (common: >5% of tumor samples; intermediate: 1%–5%; and rare: <1%), all other missense mutations and truncating mutations. Box plots show quartiles, with whiskers defined according to Tukey's criterion. (G) Comparison of CHASMplus to the 2nd and 3rd ranked methods in Figure 2E. Left: specificity of methods at identifying PTEN missense mutations that do not lower lipid phosphatase activity. Right: sensitivity (recall), precision, and F1 score for identifying missense mutations that lower lipid phosphatase activity. CHASMplus had the highest specificity, precision, and F1 score. Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

8 Figure 6 Characteristics and Trajectory of Missense Mutation Driver Discovery See also Figure S7 and Table S7. (A) Plot displaying normalized driver diversity and driver prevalence (fraction of tumor samples mutated) for driver missense mutations in 32 cancer types. K-means clustering identified 5 clusters with centroids shown as numerically designated circles. (B) Prevalence of driver missense mutations identified by CHASMplus as a function of sample size. Lines represent LOWESS fit to different rarities of driver missense mutations. TCGA acronyms for cancer types are listed in the STAR Methods. Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions

9 Figure 7 Hotspot Detection Alone Has Limited Statistical Power to Identify Driver Mutations (A) Statistical power to detect a significantly elevated number of non-silent mutations in an individual codon, as a function of sample size and mutation rate. Circles represent each cancer type from the TCGA and are placed by sample size and median mutation rate. Curves are colored by the frequency of driver mutations (fraction of non-silent mutated cancer samples above background). If a circle is below a curve, then hotspot detection is not yet sufficiently powerful to detect driver mutations of that frequency. (B) Bar graph comparing power (sensitivity) to detect labeled oncogenic driver missense mutations from OncoKB between CHASMplus and the cancer hotspots method (Chang et al., 2016). Stratification by TP53 suggests that the increased power provided by CHASMplus is not solely a result of high performance on oncogenic TP53 mutations. Cell Systems 2019 9, 9-23.e8DOI: ( /j.cels ) Copyright © 2019 The Authors Terms and Conditions


Download ppt "Collin Tokheim, Rachel Karchin  Cell Systems "

Similar presentations


Ads by Google