Tumor Heterogeneity: From biological concepts to computational methods
Bo Li, PhD
Dana-Farber Cancer Institute
Harvard Statistics Department
Background
Tumor heterogeneity: differences between tumors and among the cells within a tumor
What it affects:
– Diagnosis
– Prognosis
– Selection of treatment
– Drug resistance

Levels of Tumor Heterogeneity
Attolini et al., 2010; Burrell et al., 2013
[Figure labels: tumor/normal mixing; inter-tumor heterogeneity; intra-tumor heterogeneity; tumor subclones]
Tumor microenvironment Junttila et al., 2013, Nature
Tumor Evolution as a Darwinian Process
Greaves and Maley, 2012, Nature
Key facts:
– The tumor cell population is heterogeneous
– The tumor genome harbors somatic aberrations
[Figure: Darwin’s notebook, 1837]
Clonal expansion model
Nowell, 1976, Science
Vogelstein model Fearon and Vogelstein, 1990
Key factors to study tumor heterogeneity
Sampling procedures
– Ideal but expensive: single cell profiling
– Practical: multi-regional sampling or longitudinal sampling
– Most commonly used: bulk tissue collected from end-stage tumor
Data types
– Sequencing data on DNA or RNA
– SNP array data
– mRNA expression profiling
– DNA methylation array, etc.
Examples of large cancer studies
– The Cancer Genome Atlas (TCGA): ~10,000 samples collected from over 30 types of cancer, mostly in the US
– International Cancer Genome Consortium (ICGC): ~11,000 samples from 50 types of cancer, worldwide

Sampling procedures
– Single cell sequencing
– Multi-region sampling
Gerlinger et al., 2012; Nawy et al., 2014

Computational inference of tumor purity
Pathological inference is semi-quantitative, empirical and low throughput.
Tumor purity inference
– DNA copy number variation based
– DNA methylation based
– Gene expression based (ESTIMATE, Yoshihara et al., 2013)
DNA COPY NUMBER VARIATION BASED PURITY ESTIMATION
Two-way mixing hypothesis
The tumor sample is modeled as a mixture of euploid cells and aneuploid cells.
AGP = Aneuploid Genome Proportion, a surrogate for tumor purity
[Figure: mixtures of euploid and aneuploid cells carrying amplification (Amp), deletion (Del) and CN-LOH events; example AGP values of about 7/9, 9/9, 6/9, 7/9, 7/9 and 3/9]

Using allele-specific SNP array data to infer AGP
Data: Illumina 550K data on tumor/normal pairs
Total copy number: $n_T = n_A + n_B$
Allelic states:
– Normal (AB)
– Amp (AAB)
– Deletion (A or B)
– Copy Neutral LOH (AA or BB)
– Homozygous deletion (0)
– Balanced amplification (AABB)
– High-fold amplification (AAAB)
– High-fold amplification (AAAAB)
[Figure: BAF-LRR plot with regions I to IV. Axes: intensity (logR), $\log_2(n_T) - 1$, against B Allele Frequency (BAF), $n_B/n_T$, folded as $|0.5 - n_B/n_T|$. Panels: 0% euploid mixing (AGP = 1) and 40% euploid mixing (AGP = 0.6).]
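The two-way mixing hypothesis and the BAF-LRR axes above can be sketched numerically. The following is a minimal illustration, not the published inference procedure: it assumes an idealized array with no noise or signal compression, assumes euploid cells are heterozygous AB at the locus, and uses a hypothetical function name `expected_signal`.

```python
import math


def expected_signal(agp, n_a, n_b):
    """Expected (BAF, folded BAF, LRR) of one region in a two-way mixture.

    agp      : Aneuploid Genome Proportion, between 0 and 1
    n_a, n_b : allele copy numbers of the region in the aneuploid cells
    Assumption: euploid cells carry one A and one B copy (AB).
    """
    if not 0.0 <= agp <= 1.0:
        raise ValueError("agp must be between 0 and 1")
    # Average B-allele and total copy number over the mixed population.
    mix_b = agp * n_b + (1.0 - agp) * 1.0
    mix_t = agp * (n_a + n_b) + (1.0 - agp) * 2.0
    if mix_t <= 0.0:
        raise ValueError("no DNA left at this locus (homozygous deletion at AGP = 1)")
    baf = mix_b / mix_t              # n_B / n_T in the mixture
    folded = abs(0.5 - baf)          # |0.5 - n_B / n_T|
    lrr = math.log2(mix_t) - 1.0     # log2(n_T) - 1, zero for two copies
    return baf, folded, lrr


# Allelic states from the slide, written as (n_A, n_B) in aneuploid cells.
states = {
    "Normal (AB)": (1, 1),
    "Amp (AAB)": (2, 1),
    "Deletion (A)": (1, 0),
    "CN-LOH (AA)": (2, 0),
}

for agp in (1.0, 0.6):
    for name, (n_a, n_b) in states.items():
        baf, folded, lrr = expected_signal(agp, n_a, n_b)
        print(f"AGP={agp:.1f} {name:14s} folded BAF={folded:.3f} LRR={lrr:+.3f}")
```

Under these assumptions, euploid mixing pulls every cluster toward the normal point (folded BAF 0, LRR 0). For copy-neutral LOH the folded BAF equals AGP/2, so the position of that cluster alone would determine AGP in this idealized setting.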
GBM Molecular Subtypes
Glioblastoma Multiforme (GBM) is a malignant brain cancer, with a median survival time of ~18 months.
GBM is heterogeneous among patients in histology, molecular signatures and clinical outcome.
Two studies have attempted to classify GBM tumors into molecular subtypes:
– Phillips et al., 2006: Proneural, Proliferative, Mesenchymal
– Verhaak et al., 2010: Proneural, Neural, Classical, Mesenchymal
[Table: median age of onset and survival (months) for each subtype]
The two classification schemes are not consistent:
– The Proneural subtypes have different clinical and demographic features.
– The subtypes do not show a significant survival difference in Verhaak et al.

AGP correlates with gene expression pattern
The samples are those analyzed by Verhaak et al., 2010; the 128 with both gene expression and SNP array data available were used.
Discovery of a new GBM subtype
Li et al., 2012, Clin Cancer Res

Revised Classification for GBM
For the remaining 108 ‘Typical’ GBMs, we performed a two-step consensus clustering and identified three subtypes with a significant survival difference. P=
Integer rounding ASCAT ABSOLUTE Bayesian inference HMM PICNIC oncoSNP MixHMM GPHMM PSCN pennCNV tumor LOH-based inference BACOM qpure Pattern recognition Attiyeh GAP TAPS CHAT GenoCNA
Discussion
Multiple subclones coexist in the tumor cell population.
– How to estimate tumor purity?
– What happens if a subclone has more events than the dominant clone?
DNA methylation based purity estimation Rationale: DNA methylation is dichotomized at each locus. Sites differentially methylated in tumor or normal cells are informative for purity estimation. Zheng et al., 2014, Genome Biology
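The rationale above can be illustrated with a small calculation. This is a simplified sketch and not the method of Zheng et al.: it assumes each informative site is fully methylated in one cell type and fully unmethylated in the other, so that the observed beta value at a tumor-hypermethylated site approximates purity and at a tumor-hypomethylated site approximates one minus purity. The function name `purity_from_methylation` and the example values are hypothetical.

```python
from statistics import median


def purity_from_methylation(betas, tumor_hyper):
    """Estimate tumor purity from beta values at informative sites.

    betas       : observed methylation fractions (0..1) in the bulk sample
    tumor_hyper : True where the site is methylated in tumor cells only,
                  False where it is methylated in normal cells only
    Assumption: methylation is all-or-none within each cell type.
    """
    if len(betas) != len(tumor_hyper):
        raise ValueError("betas and tumor_hyper must have the same length")
    if not betas:
        raise ValueError("at least one informative site is required")
    # Each site gives its own purity estimate; hypomethylated sites are flipped.
    per_site = [b if hyper else 1.0 - b for b, hyper in zip(betas, tumor_hyper)]
    # The median limits the influence of sites that violate the assumption.
    return median(per_site)


# Made-up beta values for four informative sites in one bulk sample.
example_betas = [0.70, 0.32, 0.68, 0.30]
example_hyper = [True, False, True, False]
print(purity_from_methylation(example_betas, example_hyper))
```

The median is used here only as a simple robust summary; other summaries of the per-site estimates would fit the same rationale.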
Single cell analysis
Low throughput:
– FISH (spectral karyotyping)
– FACS (CyTOF)
High throughput (NGS):
– DNA: genome evolution, subclonality, etc.
– RNA: gene expression heterogeneity
DNA sequencing Navin et al., 2011, Nature
RNA sequencing Subtype clustering Patel et al., 2014, Science
Future directions
Experimental design is critical.
– Longitudinal + multi-regional sampling
Heterogeneity of the tumor microenvironment
– Immune infiltration
– Fibroblasts
– Endothelial cells