Transcriptional heterogeneity of breast cancer subtypes, a novel chemotherapy sensitivity marker

Tingting Jiang, Weiwei Shi, Sophia Ononye, Gang Han, Vikram Wali, Lajos Pusztai, Christos Hatzis
Yale Cancer Center, Yale School of Medicine, Yale University, New Haven, CT 06520

Transcriptional Heterogeneity Within Breast Cancer Subtypes and Chemotherapy Response Groups

Abstract

Intrinsic breast cancer subtypes, although transcriptionally uniform, are genomically and clinically heterogeneous. However, no formal statistical methods have been applied to measure this heterogeneity. Here we used the mean pairwise dispersion distance to capture the global transcriptional heterogeneity of different groups of tumors, and to assess whether it differs between subtypes and whether it correlates with chemotherapy response. We found that basal-like is the most heterogeneous subtype and luminal A the least, and that triple negative tumors with residual disease (RD) are more heterogeneous than those with pathological complete response (pCR). Furthermore, we evaluated the heterogeneity of genomic aberrations within subtypes and found that it was highly concordant with the transcriptional results.

Background

Transcriptional profiles define four major, clinically relevant breast cancer subtypes: luminal A, luminal B, basal-like and HER2-positive. These subtypes differ in the expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2), and also show significant differences in the distribution of various mutations and other DNA sequence abnormalities. However, further clinical and molecular heterogeneity exists within each of these subtypes. We assume that the molecular complexity of a tumor biopsy is reflected in its gene expression measurements, since most copy number alterations and functionally active mutations lead to alterations in gene expression. The goal of this study is to develop a metric that captures the global transcriptional heterogeneity of a group of tumors, and to compare it across subtypes and clinical outcome groups.

Materials and Methods

Gene expression profiles from 923 breast tumors generated on the Affymetrix U133A platform were compiled from public databases, normalized by MAS5, and processed by surrogate variable analysis (SVA) to remove batch effects. Probe sets in the lowest expression quartile and with < 10% median absolute deviation were removed. We also collected three processed datasets of different types for breast cancer patients from the TCGA website (https://tcga-data.nci.nih.gov/docs/publications/brca_2012): gene expression (Agilent 244k array), somatic mutations (exome sequencing) and copy number variation (Affymetrix 6.0 SNP arrays). The sample sizes of the three datasets were 547, 466 and 463, respectively, and 463 samples are available on all three platforms. Subtype was assigned by the PAM50 classifier.

Inter-tumor transcriptional heterogeneity within a group of tumors was estimated by the mean or the median of all pairwise distances in the group. Pairwise dissimilarity of tumor profiles was assessed by three distance metrics (Pearson, dispersion and cosine). Distances were calculated based on all probe sets or on the probe sets within a specific pathway. We bootstrapped within each group to estimate the distribution of the heterogeneity index and compared these distributions between subtypes and between chemotherapy response groups.

Simulation Study For Metric Selection

Eight predefined scenarios representing different transcriptional profile complexity patterns and subgroup compositions were generated using the R package UMPIRE 1.2.4 (Figure 1A). In Figure 1A, from left to right, each heatmap represents 40 samples with 1, 2, 4 or 40 even-sized subgroups generated from the same gene pattern with either strong (S) or weak (U) within-group inter-gene correlation. For each scenario, the mean and median pairwise distances for the three metrics (Pearson, cosine and dispersion) were calculated 500 times from independently generated simulated data. Figure 1B shows that the mean pairwise dispersion distance tracks increasing group heterogeneity with an increasing number of latent groups, a lower level of inter-gene correlation and an increasing mixing proportion of 2 latent groups. Based on these results, we selected the mean pairwise dispersion distance of gene expression profiles in a group of cancers as the estimator of within-group transcriptional heterogeneity.

Figure 1A. Heatmaps of eight scenarios with different intra-tumor complexity and population composition (rows are genes and columns are tumors).
Figure 1B. Evaluation of different heterogeneity metrics with different numbers of latent groups and different mixing proportions of 2 latent groups.
[Figure 1B panels: Mean of Pearson, Mean of Cosine, Mean of Dispersion; scenarios 1-S, 1-U, 2-S, 2-U, 4-S, 4-U, 40-S, 40-U; mixing proportions 100-0, 90-10, 80-20, 70-30, 60-40, 50-50]

Results

As Figure 2A shows, basal-like breast cancers have the highest intertumor transcriptional heterogeneity, while luminal A tumors are significantly more homogeneous as a group (p < 2.2 × 10^-16). Within TNBC, tumors with residual disease (RD) after taxane-containing treatment were significantly more diverse transcriptionally than those that achieved pathological complete response (pCR) (p < 2.2 × 10^-16) (Figure 2B).

Figure 2A. Transcriptional heterogeneity of different BC subtypes based on the mean pairwise dispersion distance. One hundred tumors from each subtype were bootstrapped with 500 replications.
Figure 2B. Transcriptional heterogeneity in triple negative breast tumors (TNBC) with complete (pCR) or partial or no (RD) response after neoadjuvant chemotherapy. Thirty cases from each group were bootstrapped with 500 replications.
[Figure 2 panels: A) Breast Cancer Subtype (Basal, HER2, LumA, LumB); B) Chemotherapy response in TNBC (RD, pCR)]

Pathway-Specific Intertumor Transcriptional Heterogeneity

Through the same bootstrapping procedure, we calculated the mean pairwise dispersion distance of each subtype using only the genes within each pathway. When considering the four subtypes together, most pathways showed heterogeneity similar to that of the global transcription profile (Figure 3H). In general, basic biological processes such as the ribosome pathway (Figure 3F) were homogeneous among all subtypes, while signaling and metabolism pathways, such as linoleic acid metabolism in basal-like breast cancer, were the most heterogeneous (Figure 3G). Similarly, most pathways (35 out of 50) were transcriptionally more heterogeneous in RD than in pCR, as observed with the global transcription profiles.

Figure 3A-3D. Distribution of transcriptional heterogeneity within KEGG pathways for different BC subtypes: basal, HER2, luminal B and luminal A.
Figure 3E. Mean dispersion distance of KEGG pathways in different chemotherapy response groups within TNBC.
Figure 3F. Heat-map of the least heterogeneous pathway: Ribosome metabolism in basal-like breast cancer.
Figure 3G. Heat-map of the most heterogeneous pathway: Linoleic acid metabolism in basal-like breast cancer.
Figure 3H. Heat-map of the mean pairwise dispersion distance of KEGG pathways in different subtypes.

Genomic Heterogeneity with TCGA Datasets

To compare intertumor heterogeneity in genomic aberrations within subtypes, we used the Hamming distance, after evaluating it on simulated data to show that it successfully reflects population complexity for binarized data. Both genomic aberration profiles confirmed that luminal A is the most homogeneous subtype. Basal tumors appear the most heterogeneous in terms of CNV (Figure 4B), with the relative effects resembling those from transcriptional profiles (Figure 2A). HER2 tumors appear to be the most diverse in terms of their somatic mutation patterns.

Figure 4. Genomic heterogeneity within BC subtypes based on the mean pairwise Hamming distance of somatic mutation and copy number variation (CNV) profiles of tumors from TCGA. To calculate the distribution of intertumor heterogeneity within subtypes, 30 tumors from each subtype were bootstrapped with 500 replications.
[Figure 4 panels: A) Somatic Mutations; B) Copy Number Variation; subtypes Basal, HER2, LumA, LumB]

Conclusions

The mean pairwise dispersion distance is able to characterize within-group transcriptional heterogeneity.
BC subtypes exhibit a consistent order of inter-tumor heterogeneity of the global transcriptome on two platforms: Basal > HER2-positive > Luminal B > Luminal A.
Chemosensitive TNBC tumors appear to be transcriptionally more homogeneous than those with partial or no response.
There exist different degrees of heterogeneity in pathway-related expression in different subtypes and in different chemotherapy response groups.
The relative intertumor heterogeneity in the different BC subtypes is similar when based on transcriptional profiles, CNV or somatic mutations.
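
The heterogeneity index described in Materials and Methods (the mean of all pairwise distances in a group, with a bootstrap to estimate its distribution) can be illustrated with a minimal sketch. It rests on several assumptions. The poster does not define the dispersion distance, so only the Pearson and cosine distances are shown, and any other metric can be passed in as a function. The function names are hypothetical. Resampling with replacement is assumed, since the poster does not state how bootstrap samples were drawn. Profiles are assumed to be non-constant lists of expression values of equal length.

```python
import math
import random
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """Pearson distance, 1 - r, between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def cosine_distance(x, y):
    """Cosine distance, 1 - cos(angle), between two expression profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def mean_pairwise_distance(profiles, distance):
    """Heterogeneity index: mean distance over all unordered tumor pairs."""
    return mean(distance(x, y) for x, y in combinations(profiles, 2))


def bootstrap_heterogeneity(profiles, distance, n_draw, n_rep, seed=0):
    """Distribution of the index over n_rep bootstrap samples of size n_draw.

    Assumption: tumors are resampled with replacement within the group.
    """
    rng = random.Random(seed)
    indices = []
    for _ in range(n_rep):
        sample = rng.choices(profiles, k=n_draw)  # draw with replacement
        indices.append(mean_pairwise_distance(sample, distance))
    return indices
```

Under these assumptions, a call such as `bootstrap_heterogeneity(basal_profiles, pearson_distance, n_draw=100, n_rep=500)` would mirror the sample size and replication count quoted for Figure 2A, where `basal_profiles` is a hypothetical list of per-tumor expression vectors. Restricting each vector to the probe sets of one pathway gives the pathway-specific version of the index.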
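
For the genomic comparison, the mean pairwise Hamming distance of binarized aberration profiles can be sketched in the same way. This is an illustration under stated assumptions, not the authors' code. Each tumor is assumed to be a list of 0/1 flags (for example, gene mutated or not), the distance is left as a raw count because the poster does not say whether it was normalized by profile length, and the function names are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)


def mean_pairwise_hamming(profiles):
    """Mean Hamming distance over all unordered pairs of tumors in a group."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)
```

A group of tumors that share the same aberrations scores 0, and the score grows as tumors carry increasingly distinct sets of mutations or copy number changes.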