Issues in Bayesian Tree Modeling of Clinical and Gene Expression Data. Predictive sub-typing of subjects. Retrospective and prospective studies. Exploration of clinico-genomic data. Identification of relevant gene expression patterns.
Current Areas of Application. Breast cancer: lymph node status, disease recurrence. Ovarian cancer: tumor location.
Lymph Node Involvement Is a Key Breast Cancer Risk Factor. But lymph node dissection also carries morbidity and can be inaccurate.
Identifying Metagenes Associated With Lymph Node Status. [Figure; labels: Tumor, Sample, Gene]
Metagenes / Expression Signatures. Dimension reduction: signal improvement. Clustering. Singular value decomposition. Empirical or model-based factor analysis. Characterize patterns in data.
Gene Clustering
Gene Clustering (cont’d)
Factor extraction (SVD)
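As a rough illustration of the factor-extraction step, the sketch below summarizes one gene cluster by its dominant singular factor across samples. It is not the authors' implementation: the function name `first_metagene`, the per-gene centering, and the pure-Python power iteration used in place of a library SVD routine are all assumptions made to keep the example self-contained.

```python
import math

def first_metagene(cluster, n_iter=200):
    """Return the leading right singular vector of a gene cluster.

    cluster: list of gene rows, each a list of expression values,
    one per sample (hypothetical input layout).
    """
    # Center each gene across samples so the factor tracks variation, not level.
    centered = []
    for row in cluster:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    n_genes, n_samples = len(centered), len(centered[0])

    # Non-constant start vector: centered rows are orthogonal to a constant one.
    v = [float(j + 1) for j in range(n_samples)]
    for _ in range(n_iter):
        # Power iteration on X^T X: u = X v, then w = X^T u.
        u = [sum(g[j] * v[j] for j in range(n_samples)) for g in centered]
        w = [sum(centered[i][j] * u[i] for i in range(n_genes))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("cluster has no variation across samples")
        v = [x / norm for x in w]

    # Sign convention (an assumption): gene loadings sum to a non-negative value.
    u = [sum(g[j] * v[j] for j in range(n_samples)) for g in centered]
    if sum(u) < 0:
        v = [-x for x in v]
    return v
```

Each entry of the returned vector is the metagene level for one sample, so one value per tumor can then serve as a candidate predictor in the tree models.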
Differential Gene Expression
Differential Gene Expression (Threshold 1)
Differential Gene Expression (Threshold 2)
Differential Gene Expression (Threshold 3)
Nonlinear Expression
Nonlinear Expression (Threshold 1)
Nonlinear Expression (Threshold 2)
Lymph Node Metastasis Metagenes
Ovarian Tumor Site Genes
Statistical Tree Models for Clinico-Genomic Prediction. Regression trees: non-linear, interactions. Recursive partitioning. Retrospective studies. Many trees: model uncertainty. Predictions average across trees.
Binary Outcomes: Retrospective Sampling. [Figure; label: LN+]
Binary Outcomes: Prospective Inference from Retrospective Model
Binary Outcomes: Retrospective Model. Model conditionals for predictors. Nonparametric Bayes: Dirichlet model. Modeling in x space: joint structure. Implies Beta priors on [...]
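The step from a retrospective model to a prospective prediction can be sketched with Bayes' theorem: the model supplies the probability of a predictor event given each outcome, and an outcome prevalence converts these into Pr(Y=1 | event). The function name and the externally supplied `prior_y1` are assumptions for illustration; the slides do not say how the prevalence is chosen.

```python
def prospective_probability(p_event_given_y1, p_event_given_y0, prior_y1):
    """Pr(Y=1 | event) from retrospective conditionals via Bayes' theorem.

    p_event_given_y1, p_event_given_y0: Pr(event | Y=1) and Pr(event | Y=0),
    e.g. the probability that a predictor falls on one side of a threshold.
    prior_y1: assumed population prevalence Pr(Y=1), supplied by the analyst.
    """
    numerator = prior_y1 * p_event_given_y1
    denominator = numerator + (1.0 - prior_y1) * p_event_given_y0
    return numerator / denominator
```

With equal prevalence, `prospective_probability(0.8, 0.2, 0.5)` gives 0.8; lowering the prevalence to 0.2 pulls the same evidence down to 0.5, which is why the sampling design matters for prediction.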
Growing Binary Trees. Node split: each candidate predictor:threshold pair. 2x2 table: 2 Bernoullis, fixed columns (Y=0/1). Assess and select split, or stop. Conservative Bayesian tests. Multiple trees: multiple splits at any node.
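One way to assess a candidate split from its 2x2 table is a Bayes factor comparing "the two outcome columns have different Bernoulli probabilities" against "they share one probability". The sketch below assumes independent Beta(a, b) priors on the two probabilities and the same Beta(a, b) prior on the shared one; the slides mention Beta priors and conservative Bayesian tests but do not give the exact test, so the function names and prior settings here are illustrative.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_bayes_factor(below0, above0, below1, above1, a=1.0, b=1.0):
    """Log Bayes factor for a predictor:threshold split at a node.

    below_z / above_z: counts of Y=z samples at or below / above the threshold.
    Column totals are fixed by the retrospective design, so binomial
    coefficients cancel between the two hypotheses.
    """
    # H1: separate Bernoulli probabilities for the Y=0 and Y=1 columns.
    log_m1 = (log_beta(a + below0, b + above0)
              + log_beta(a + below1, b + above1)
              - 2.0 * log_beta(a, b))
    # H0: one shared probability, so the split carries no information about Y.
    log_m0 = (log_beta(a + below0 + below1, b + above0 + above1)
              - log_beta(a, b))
    return log_m1 - log_m0
```

A node would be split on the pair with the largest value, provided it exceeds a chosen threshold; a demanding threshold gives a conservative test, and several pairs that pass can each seed a separate tree.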
Inference with Many Binary Trees. Within-tree inference & prediction: sequences of beta posteriors for [...]; simulate to impute Pr(Y=1|leaf). Multiple trees: likelihood across trees; average predictions across trees; uncertainty about model (predictor:threshold) choices; “smoothing” of classification boundaries.
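The two levels of inference can be sketched as follows, under stated assumptions. Within a tree, each split on the path to a leaf contributes one Beta posterior per outcome class; drawing from these, multiplying along the path, and applying Bayes' theorem yields simulated values of Pr(Y=1 | leaf). Across trees, predictions are averaged with weights taken here to be proportional to each tree's likelihood. The function names, the Beta(a, b) prior, the path-count layout, and the weighting rule are illustrative rather than taken from the slides.

```python
import math
import random

def simulate_leaf_probability(path_counts, prior_y1, n_draws=2000,
                              a=1.0, b=1.0, seed=0):
    """Simulate Pr(Y=1 | leaf) for one leaf of one tree.

    path_counts: one (k0, n0, k1, n1) tuple per split on the path to the leaf;
    k_z of the n_z class-z samples at that node follow the branch to the leaf.
    prior_y1: assumed prevalence Pr(Y=1).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        lik0 = lik1 = 1.0
        for k0, n0, k1, n1 in path_counts:
            # One Beta posterior per class and per split along the path.
            lik0 *= rng.betavariate(a + k0, b + n0 - k0)
            lik1 *= rng.betavariate(a + k1, b + n1 - k1)
        num = prior_y1 * lik1
        draws.append(num / (num + (1.0 - prior_y1) * lik0))
    return draws

def average_over_trees(tree_predictions, tree_log_likelihoods):
    """Average predictions with weights proportional to tree likelihoods."""
    top = max(tree_log_likelihoods)  # subtract the max to avoid underflow
    weights = [math.exp(ll - top) for ll in tree_log_likelihoods]
    total = sum(weights)
    weights = [w / total for w in weights]
    prediction = sum(w * p for w, p in zip(weights, tree_predictions))
    return prediction, weights
```

The spread of the simulated draws gives an uncertainty interval for a single tree's prediction, while averaging over trees that split on different thresholds is what smooths the otherwise step-like classification boundary.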
Binary Outcome: Lymph Node Metastasis. Predictive trees: nonparametric Bayes, metagene expression, retrospective sampling. Lancet 2003 (Huang, West et al.). [Figure; labels: Tumor, Sample, Gene]
Predicting Lymph Node Status With Metagenes. [Figure: probability of LN+ by sample, LN+ and LN- cases; out-of-sample cross-validation]
Forests of Clinico-Genomic Trees. Select from potential clinical and genomic predictor variables. Multiple trees. Variable combination: co-occurrence. Multiple subtypes.
… With Metagenes and Clinical Predictors. [Figure: probability of LN+ by sample, LN+ and LN- cases; out-of-sample cross-validation]
Lymph Node Clinico-Genomic Predictors
Predicting Ovarian Tumor Site. [Figure: probability of omentum by sample, omentum and ovary cases; out-of-sample cross-validation]
Gene Identification. Implicated metagenes: gene subsets. Genes correlated with key metagenes. Breast cancer, nodal metastasis: interferon pathway/inducible gene subset; interferons mediate anti-tumor response; evidence of dysfunction of normal anti-tumor response? Ovarian cancer, tumor site: growth regulatory pathway/inducible gene subset; evidence of dysfunction of normal cell growth?
Ongoing Research. Stochastic search (sequential, annealing). Representation of tree ‘forest’. Metagene definition/creation. Cluster implementation of tree models.
Computational & Applied Genomics Program: Joseph Nevins, Mike West, Erich Huang, Ed Iversen, Holly Dressman. Duke University. Koo Foundation-Sun Yat Sen Cancer Center: Andrew Huang, Skye Cheng, Mei-Hua Tsou. Department of Obstetrics and Gynecology: John Lancaster, Andrew Berchuck.
Growing Binary Trees (2x2) ?