Head and Neck Cancer: microRNA Analysis
Amy Li
Monti Lab Rotation, Boston University
11/25/13
Dataset
- Head and Neck Cancer Dataset from The Cancer Genome Atlas (TCGA)
  - Contains large and well-documented data in many cancer subtypes
  - DNA-methylation, SNP Array, RNA-seq, miRNA-seq, low pass DNA-seq, Reverse Phase Protein Array
- Normalized miRNAseq data
  - 463 samples
    - 39 patients: tumor tissue and adjacent normal tissue
    - 385 patients: tumor tissue only
  - 1046 miRNAs
- Clinical information
  - 360 samples, 71 clinical attributes (e.g. anatomic subdivision, gender, grade, race, stage)
Goals
- Identify miRNA markers of:
  - Cancer status: normal vs. tumor
  - Cancer progression: differentially expressed in each stage or grade
- Integrate miRNA expression with gene expression data
  - Gene set enrichment in miRNA targets
  - mRNA data (Vinay)
Tumor Progression Classification
- Tumor Grade:
  - Assigned based on how abnormal the tumor cells look under a microscope
  - Ranges from G1 (well-differentiated) to G4 (undifferentiated)
  - Well-differentiated tumor cells from a lower grade resemble normal cells, tend to spread slowly, and are generally indicative of a better prognosis
- Tumor Stage:
  - Based on size or extent (reach) of the primary tumor
- http://www.cancer.gov/cancertopics/factsheet/detection/tumor-grade
Analysis Overview
- Exploratory data analysis
  - Mean vs. standard deviation
  - Boxplots of miRNA expression
  - Data filtering
  - Clinical demographics
- Unsupervised clustering
  - Heatmaps
  - Fisher test for association between clusters and sample attribute assignment (tumor status, grade, stage, etc.)
- Tests for confounders
  - Association between grade and other attributes, e.g. ethnicity, gender, smoking history, age, alcohol consumption
- Differential analysis
  - Look for differentially expressed genes with respect to grade or stage
- miRNA targets and Gene Set Enrichment Analysis
  - Identify sets of miRNA targets and see whether such gene sets are enriched with respect to disease phenotype
Exploratory Data Analysis: Mean vs. standard deviation
- For each gene: plot mean vs. standard deviation
- Result: linear relationship between mean and standard deviation
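The mean vs. standard deviation check can be sketched in a few lines. This is a minimal illustration, not the code used for the slides: the toy matrix, the function names, and the use of a Pearson correlation to summarize linearity are all assumptions.

```python
from statistics import mean, stdev


def mean_sd_per_row(matrix):
    """Return (mean, sd) for each row (one row per miRNA, one column per sample)."""
    return [(mean(row), stdev(row)) for row in matrix]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy matrix (not TCGA data): sd grows in proportion to the mean
toy = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [100.0, 200.0, 300.0],
]
row_stats = mean_sd_per_row(toy)
means = [m for m, _ in row_stats]
sds = [s for _, s in row_stats]
r = pearson(means, sds)  # close to 1 when sd scales linearly with the mean
```

A strong linear mean-sd relationship is a common reason to log-transform expression values before filtering on standard deviation.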
Exploratory Data Analysis: Boxplot of Expression
- For each gene: made a boxplot of expression across the samples
- Boxplots reordered by median
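The reordering step can be sketched as follows. The miRNA names and values are hypothetical placeholders, and `order_by_median` is an illustrative helper, not a function from the original analysis.

```python
from statistics import median


def order_by_median(expression):
    """Return miRNA names sorted by their median expression across samples."""
    return sorted(expression, key=lambda name: median(expression[name]))


# Hypothetical toy data: miRNA name -> expression values across samples
toy = {
    "mir-a": [5, 7, 9],
    "mir-b": [1, 2, 3],
    "mir-c": [4, 4, 10],
}
ordered = order_by_median(toy)  # names in the order the boxplots would be drawn
```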
Exploratory Data Analysis: Data filtering
- Sample filtering:
  - Samples without clinical labels
- Gene filtering:
  - Lowly expressed genes: row maximum
  - Genes with constant expression: standard deviation
- Full matrix: 1046 × 463
- Filtered matrix: 692 × 393
Exploratory Data Analysis: Clinical Demographics
- Grade: G1 = well-differentiated; G4 = undifferentiated
Exploratory Data Analysis: Clinical Demographics
- Oral: oral cavity, oral tongue, buccal mucosa
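Unsupervised Clustering: Paired samples: Tumor vs. Adjacent Normal
- Cluster assignment vs. actual class label:

  Adjacent Normal   38    1
  Tumor              2   37

- Fisher test: test for association between cluster assignment and actual class label
- P-value ≈ 0
- Heatmap clustering (distance, linkage): rows Pearson, Ward; columns Pearson, Ward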
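To make "P-value ≈ 0" concrete, here is a sketch of a two-sided Fisher exact test on the 2×2 counts from the slide (38, 1, 2, 37). The function name and the two-sided convention (summing all tables no more probable than the observed one) are assumptions; the original analysis most likely used a library routine.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)
    )


# Counts from the slide: rows = actual class, columns = cluster assignment
p_value = fisher_exact_two_sided(38, 1, 2, 37)
```

With only 3 of 78 samples falling in the "wrong" cluster, the p-value is many orders of magnitude below any conventional threshold.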
Unsupervised Clustering: Grades: G1, G2, G3, G4
- Fisher test: tested for association between grades and cluster assignments, with the total number of clusters ranging from 2 to 5
- P-values were not significant in any case
- Heatmap clustering (distance, linkage): rows Pearson, Ward; columns Euclidean, Ward
Tests for confounders
- Tested for association between grade and each putative confounding variable using the Fisher test (discrete variables) or ANOVA (continuous variables)
- Ethnicity (p=0.57), race (p=0.84), gender (p=0.09), age (p=0.55), alcohol consumption (p=0.63)
- Gender had the lowest p-value: correct gene expression for gender using a linear regression model prior to performing differential analysis
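The gender correction can be illustrated with a one-covariate least-squares fit. This is a sketch under stated assumptions: gender is encoded as 0/1, the model is fitted separately for each miRNA, and the corrected value is the residual plus the overall mean. The slides do not specify these details, and `regress_out` is a hypothetical helper.

```python
from statistics import mean


def regress_out(expression, covariate):
    """Fit y = b0 + b1*x by least squares and return residuals plus the mean of y."""
    mx, my = mean(covariate), mean(expression)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        # Covariate is constant, so there is nothing to correct for
        return list(expression)
    b1 = sum((x - mx) * (y - my) for x, y in zip(covariate, expression)) / sxx
    # residual + mean(y) simplifies to y - b1 * (x - mean(x))
    return [y - b1 * (x - mx) for x, y in zip(covariate, expression)]


# Hypothetical toy data for one miRNA: gender coded 0/1 (an assumed encoding)
gender = [0, 0, 1, 1]
expr = [1.0, 3.0, 5.0, 7.0]
corrected = regress_out(expr, gender)
```

After the correction both gender groups have the same mean, so a downstream test on grade cannot pick up a difference that is driven by gender alone.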
Data Processing for Differential Analysis
- Sample filtering
  - Removed samples without clinical labels
  - Removed samples sequenced on IlluminaGA (kept IlluminaHiseq samples)
  - Removed samples with minority races (kept “white”)
- Gene filtering
  - Removed miRNAs with low expression (90% quantile < 100)
  - Removed miRNAs with constant expression (sd < 0.1)
- Attribute filtering
  - Grade: removed GX and “Not Available”; kept G1, G2, G3, G4
  - Stage: removed “Not Available”; kept S1, S2, S3, S4A, S4B
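The two gene filters can be sketched as below, using the thresholds from the slide. The quantile definition (linear interpolation between order statistics) and the scale on which the standard deviation is computed are assumptions, since the slides do not state them; the miRNA names and values are hypothetical.

```python
from statistics import stdev


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed definition)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def filter_mirnas(expression, min_q90=100, min_sd=0.1):
    """Keep miRNAs whose 90% quantile is >= min_q90 and whose sd is >= min_sd."""
    return [
        name
        for name, values in expression.items()
        if quantile(values, 0.9) >= min_q90 and stdev(values) >= min_sd
    ]


# Hypothetical toy data: miRNA name -> expression across samples
toy = {
    "mir-high": [150, 200, 250, 300],       # passes both filters
    "mir-low": [1, 2, 3, 4],                # removed: 90% quantile < 100
    "mir-flat": [120.0, 120.0, 120.0, 120.0],  # removed: sd < 0.1
}
kept = filter_mirnas(toy)
```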
Differential Analysis: Grade
- diffAnal.R
  - Performs permutation tests to identify genes that are significantly differentially regulated in one of two classes
  - Input: normalized expression matrix corrected for gender
  - Class label: grade attribute binarized to “low” vs. “high”
- Run diffAnal for each high vs. low cutoff:
  - G0 (adjacent normal) vs. G1-G4 (tumor)
  - G1 vs. G2-G4
  - G1-G2 vs. G3-G4
  - G1-G3 vs. G4
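The idea of binarizing grade at a cutoff and running a permutation test can be sketched as follows. This is not the implementation in diffAnal.R, whose test statistic and options are not shown on the slides; the difference in class means, the number of permutations, and all names and data here are assumptions.

```python
import random
from statistics import mean

GRADE_ORDER = ["G0", "G1", "G2", "G3", "G4"]  # G0 = adjacent normal


def binarize(grades, first_high):
    """Label each grade 'high' if it is at or above first_high, else 'low'."""
    cut = GRADE_ORDER.index(first_high)
    return ["high" if GRADE_ORDER.index(g) >= cut else "low" for g in grades]


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in class means."""
    rng = random.Random(seed)

    def diff(lbls):
        high = [v for v, l in zip(values, lbls) if l == "high"]
        low = [v for v, l in zip(values, lbls) if l == "low"]
        return mean(high) - mean(low)

    observed = abs(diff(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling labels
        if abs(diff(shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data for one miRNA
grades = ["G1", "G1", "G1", "G2", "G3", "G3", "G4", "G4"]
values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = binarize(grades, "G3")  # the G1-G2 vs. G3-G4 cutoff
p = permutation_pvalue(values, labels)
```

Calling `binarize` with `"G1"`, `"G2"`, `"G3"`, and `"G4"` in turn reproduces the four cutoffs listed on the slide.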
Differential Analysis: Grade: G0 vs. G1-G4
Differential Analysis: Grade: G1 vs. G2-G4
Differential Analysis: Grade: G1-G2 vs. G3-G4
Differential Analysis: Grade: G1-G3 vs. G4
Differential Analysis: Trends
- Found more significant markers for tumor vs. normal than for distinguishing between low and high grades
- Performed the same analysis for stage; significant markers for stage are weaker than those for grade
- For both grade and stage, most significant markers found by diffAnal show upregulation in the later disease state
Differential Analysis: Tumor Classification Marker
- 148 total genes (90% quantile > 100) used for diffAnal
- 65 significant genes upregulated in tumors
- 37 significant genes downregulated in tumors
- Cutoff: FDR < 0.01
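For readers unfamiliar with FDR cutoffs, the sketch below shows one common way to turn p-values into FDR-adjusted values, the Benjamini-Hochberg procedure. This is an assumption for illustration only: the slides do not say how diffAnal computes its FDR, and it may use a different (for example permutation-based) estimate. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four miRNAs
pvals = [0.001, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(fdr) if q < 0.01]  # slide cutoff: FDR < 0.01
```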
Differential Analysis: Cancer Progression Marker for Grades
- A cancer progression marker must satisfy ALL of:
  - Tumor classification marker
  - Significant FDR in 2 of 3 runs of diffAnal
  - Monotonic increase or decrease across grades
- 4 miRNA markers identified (all are upregulated with increasing grade)
- FDR columns: G1-_vs_G2+_fdr, G2-_vs_G3+_fdr, G3-_vs_G4+_fdr

  hsa-mir-106b    0.03  0.01  0.04
  hsa-mir-15b     0.05  0.81
  hsa-mir-582
  hsa-mir-151     0.61  -0.25
  hsa-mir-196b    0.18
  hsa-mir-10a     0.32  -0.96
  hsa-mir-374a    -0.6
  hsa-mir-128-2   0.26
  hsa-mir-25      0.02
  hsa-mir-128-1   0.47  0.17
  hsa-mir-28      0.44
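The three marker criteria can be written as a small predicate. This is a sketch: the FDR cutoff of 0.05, the reading of "2/3 runs" as "at least 2 of 3", and the per-grade mean expression values are assumptions. Only the three FDR values for hsa-mir-106b come from the slide.

```python
def is_monotonic(values):
    """True if values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def is_progression_marker(is_tumor_marker, fdrs, grade_means,
                          fdr_cutoff=0.05, min_runs=2):
    """Apply the three criteria: tumor marker, enough significant runs, monotonic trend."""
    n_significant = sum(1 for f in fdrs if f < fdr_cutoff)
    return (
        is_tumor_marker
        and n_significant >= min_runs
        and is_monotonic(grade_means)
    )


# FDRs for hsa-mir-106b are from the slide; the grade means are hypothetical
mir106b_fdrs = [0.03, 0.01, 0.04]
hypothetical_grade_means = [5.1, 5.6, 6.0, 6.4]  # G1..G4, assumed values
result = is_progression_marker(True, mir106b_fdrs, hypothetical_grade_means)

# A candidate with a non-monotonic trend fails even if its FDRs pass
rejected = is_progression_marker(True, [0.01, 0.02, 0.03], [5.0, 6.0, 5.5, 6.5])
```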
Finding miRNA Targets
- miRWalk
- TargetScan
- miRBase
Finding miRNA Targets
- miRWalk “Validated targets” module
  - Targets for differentially expressed miRNAs: 162 targets
- Intersect targets found by miRWalk with AhR targets
  - 6 matches: NQO1, NFE2, IL1B, TNF, TGFB1, MYC
- Venn diagram: miRNA markers (4); miRNA targets (162); AhR targets (54); overlap (6)
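The intersection step amounts to a set operation. In the sketch below the six shared genes are the matches reported on the slide; the `PLACEHOLDER_*` entries are hypothetical stand-ins for the remaining members of each list, which the slides do not enumerate.

```python
# The six matches are from the slide; PLACEHOLDER_* names are hypothetical
mirna_targets = {
    "NQO1", "NFE2", "IL1B", "TNF", "TGFB1", "MYC",
    "PLACEHOLDER_1", "PLACEHOLDER_2",
}
ahr_targets = {
    "NQO1", "NFE2", "IL1B", "TNF", "TGFB1", "MYC",
    "PLACEHOLDER_3",
}

# Genes present in both lists, sorted for stable output
shared = sorted(mirna_targets & ahr_targets)
```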
Work in Progress
- Gene set enrichment analysis:
  - Consider targets of strong miRNA markers as a gene set
  - Is this gene set enriched in certain disease phenotypes, e.g. high grade?
- Pathway analysis:
  - Which pathways are these miRNA markers involved in?
- Modeling tumor progression:
  - Explore other definitions of tumor progression markers
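One simple way to ask whether a target gene set is enriched among phenotype-associated genes is a hypergeometric over-representation test. This is a simpler stand-in for full Gene Set Enrichment Analysis, not the planned method itself, and every number in the example is a hypothetical toy value.

```python
from math import comb


def hypergeom_upper_tail(k, universe, set_size, n_drawn):
    """P(X >= k): overlap of at least k when n_drawn genes are picked at random
    from a universe that contains set_size genes belonging to the gene set."""
    total = comb(universe, n_drawn)
    upper = min(set_size, n_drawn)
    favorable = sum(
        comb(set_size, x) * comb(universe - set_size, n_drawn - x)
        for x in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical toy numbers: 10 genes in the universe, 4 in the target gene set,
# 3 genes associated with high grade, 2 of which are in the gene set
p_enrich = hypergeom_upper_tail(k=2, universe=10, set_size=4, n_drawn=3)
```

A small p-value would suggest that the miRNA targets overlap the phenotype-associated genes more than expected by chance; the result depends strongly on how the gene universe is defined.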