Transcriptional Diagnosis by Bayesian Network


1 Transcriptional Diagnosis by Bayesian Network
Hsun-Hsien Chang and Marco F. Ramoni
Children’s Hospital Informatics Program
Harvard-MIT Division of Health Sciences and Technology
Harvard Medical School
March 17, 2009

2 Background
Microarray technology enables profiling the expression of thousands of genes in parallel on a single chip.
Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
Challenge:
– The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which leads to overfitting.
Existing methods:
– Gene selection: compute statistics (e.g., t-statistics, SNR, PCA) of individual genes and select high-ranking genes.
– Classification model: create a classification function of the selected genes.
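The gene-by-gene ranking used by existing methods can be sketched as follows. This is only an illustration: it assumes the common signal-to-noise definition (difference of class means divided by the sum of class standard deviations), and the gene names, expression values, and `top_k` cut-off are made up.

```python
from statistics import mean, stdev

def snr(group_a, group_b):
    """Signal-to-noise ratio, assumed here as (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))

def rank_genes(expr_a, expr_b, top_k):
    """Score every gene independently and keep the top_k by absolute SNR.

    expr_a and expr_b map gene name -> expression values in each tissue state.
    """
    scores = {gene: abs(snr(expr_a[gene], expr_b[gene])) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (hypothetical): three genes, three samples per tissue state.
expr_state1 = {
    "geneA": [1.0, 1.2, 0.9],
    "geneB": [5.0, 5.1, 4.9],
    "geneC": [2.0, 3.0, 2.5],
}
expr_state2 = {
    "geneA": [1.1, 1.0, 1.05],
    "geneB": [9.0, 9.2, 8.8],
    "geneC": [2.4, 2.1, 2.9],
}

top = rank_genes(expr_state1, expr_state2, top_k=2)
print(top)
```

Each gene is scored in isolation and `top_k` is an arbitrary cut-off, so dependencies between genes play no role in the selection.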

3 Proposed Approach
Issues:
– The assumption of gene independence is inadequate.
– Other genes may be collinearly expressed with the signature.
– Selection and classification are two non-integrated steps, and a cut-off threshold is needed to select high-ranking genes.
Proposed strategies:
– Adopt a systems biology approach to infer the functional dependence among genes.
– Use the dependence network for tissue discrimination.
– Integrate gene selection and the classification model in a Bayesian network framework.

4 Data Representation by Bayesian Network
[Figure: data matrix of cases 1 to M by genes 1 to N, with cases labeled tissue state 1 or tissue state 2, shown alongside a network with nodes Pheno, G1, G2, ..., GN.]
Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables.
– Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
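A minimal sketch of how such a graph encodes a joint distribution: the joint probability is the product of each node's conditional probability given its parents. The three-node structure, the state names, and all probabilities below are invented for illustration and are not taken from the slides.

```python
from itertools import product

# Hypothetical network: Pheno -> G1, Pheno -> G2, G1 -> G2.
parents = {"Pheno": (), "G1": ("Pheno",), "G2": ("Pheno", "G1")}
states = {
    "Pheno": ("state1", "state2"),
    "G1": ("low", "high"),
    "G2": ("low", "high"),
}

# cpt[node][parent values][node value] = conditional probability (made-up numbers).
cpt = {
    "Pheno": {(): {"state1": 0.5, "state2": 0.5}},
    "G1": {
        ("state1",): {"low": 0.8, "high": 0.2},
        ("state2",): {"low": 0.1, "high": 0.9},
    },
    "G2": {
        ("state1", "low"): {"low": 0.7, "high": 0.3},
        ("state1", "high"): {"low": 0.4, "high": 0.6},
        ("state2", "low"): {"low": 0.5, "high": 0.5},
        ("state2", "high"): {"low": 0.2, "high": 0.8},
    },
}

def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes of the directed acyclic graph."""
    p = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[q] for q in node_parents)
        p *= cpt[node][parent_values][assignment[node]]
    return p

# Sanity check: the joint distribution sums to 1 over all assignments.
total = sum(
    joint_probability(dict(zip(states, values)))
    for values in product(*states.values())
)
print(total)
```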

5 Gene Selection by Bayes Factor
[Figure: network with nodes Pheno, G1, G2, ..., GN, Gp, Gq, illustrating gene selection by Bayes factor.]
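The slide does not give the scoring formula, so the following is only one plausible sketch of a Bayes factor for a single gene. It assumes discretized expression values and a uniform Dirichlet prior, and compares a model in which the gene depends on the phenotype against one in which it does not. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_marginal(counts, n_states, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a Dirichlet(alpha) prior."""
    total = sum(counts)
    out = lgamma(n_states * alpha) - lgamma(n_states * alpha + total)
    for n in counts:
        out += lgamma(alpha + n) - lgamma(alpha)
    return out

def log_bayes_factor(gene_values, phenotypes, n_states):
    """log P(data | gene depends on phenotype) - log P(data | gene independent)."""
    # Independent model: one distribution shared by all samples.
    independent = log_marginal(list(Counter(gene_values).values()), n_states)
    # Dependent model: a separate distribution for each phenotype.
    dependent = 0.0
    for ph in set(phenotypes):
        subset = [g for g, p in zip(gene_values, phenotypes) if p == ph]
        dependent += log_marginal(list(Counter(subset).values()), n_states)
    return dependent - independent

phenotypes = ["AC"] * 6 + ["SCC"] * 6
informative = ["low"] * 6 + ["high"] * 6   # tracks the phenotype
uninformative = ["low", "high"] * 6        # unrelated to the phenotype

lbf_informative = log_bayes_factor(informative, phenotypes, n_states=2)
lbf_uninformative = log_bayes_factor(uninformative, phenotypes, n_states=2)
print(lbf_informative, lbf_uninformative)
```

Under these assumptions, a positive log Bayes factor favors linking the gene to the phenotype, which gives a selection rule without a hand-picked rank cut-off.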

6 Collinearity Elimination via Network Learning
[Figure: network with nodes Pheno, G1, G2, Gp, Gq, ..., GN, illustrating collinearity elimination.]

7 Sample Classification
The phenotype variable is independent of the blue genes, given the green genes. Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used for phenotype determination.
Tissue classification: [Figure: network with nodes Pheno, G1, G2, Gp, Gq, GN.]
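As a sketch of the Markov blanket idea, the function below collects a node's parents, children, and the other parents of its children from a directed acyclic graph. The edges in the example are hypothetical and do not reproduce the network in the figure.

```python
def markov_blanket(parents, target):
    """Return parents, children, and children's other parents of target.

    parents maps each node to a tuple of its parent nodes.
    """
    children = {node for node, ps in parents.items() if target in ps}
    blanket = set(parents[target]) | children
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(target)
    return blanket

# Hypothetical edges: Pheno -> G1, Pheno -> G2, Gp -> G2, G1 -> Gq, Gq -> GN.
dag = {
    "Pheno": (),
    "G1": ("Pheno",),
    "G2": ("Pheno", "Gp"),
    "Gp": (),
    "Gq": ("G1",),
    "GN": ("Gq",),
}

blanket = markov_blanket(dag, "Pheno")
print(sorted(blanket))
```

In this toy graph, Gq and GN fall outside the blanket: once G1, G2, and Gp are observed, they add no further information about Pheno.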

8 Algorithm Summary
[Flowchart: Gene Selection by Bayes Factor, Collinearity Elimination, Sample Classification, Optimize Performance, Optimize Hyperparameters (sensitivity analysis).]

9 Discriminate Lung Carcinoma Subtypes
Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
– AC and SCC differ in survival, chance of metastasis, and response to chemotherapy and targeted therapy.
– Physicians lack confidence in correctly recognizing the subtype when there are multiple primary carcinomas.
Training:
– 58 ACs and 53 SCCs.
– 77 genes selected in the network.
– 25 signature genes.

10 Bayesian Network for Lung Carcinoma

11 Large-Scale Testing on Independent Samples
422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, Chinese).
Accuracy: AUROC = 95.2%.
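For readers unfamiliar with AUROC, a minimal sketch: it equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The scores below are invented and unrelated to the reported result.

```python
def auroc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier scores, e.g. P(SCC | signature genes).
scc_scores = [0.9, 0.8, 0.4]
ac_scores = [0.5, 0.3, 0.2]

area = auroc(scc_scores, ac_scores)
print(area)
```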

12 Comparisons with Other Popular Methods
Higher classification accuracy.
Small-sized signature to avoid overfitting.
Method                                         Testing AUROC   p-value   # signature genes
Bayesian Network                               95.2%           ---       25
PCA/LDA                                        91.2%           0.0047    13
PAM (Tibshirani et al., PNAS 2002)             91.0%           0.0014    77
Weighted Voting (Golub et al., Science 1999)   93.4%           0.6240    800

13 KRT6 Family Characterizes the Lung Carcinoma Discrimination

14 KRT6 Family Characterizes the Lung Carcinoma Discrimination
Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing lung cancer subtypes.
– Accounting for 95% of the accuracy of the whole 25-gene signature.
– Located on chromosome 12q12-q13.
– A nonlinear, concave discriminative surface.

15 Verification by Chr12q12-q13 Aberrations
Investigate DNA copy number changes using comparative genomic hybridization (CGH) arrays.
– 12 ACs and 13 SCCs from Vrije University Medical Center, the Netherlands.
– A dumbbell discriminative surface achieves 80% classification accuracy.
– Treat the average CGH values of genes occupying q12, q13, and q12-13, respectively, as three features to construct a Naïve Bayes classifier.
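A three-feature Naïve Bayes classifier of this kind might look like the sketch below. The slide does not state the feature distributions, so Gaussian class-conditional densities are an assumption here, and the CGH values are invented toy numbers.

```python
from math import log, pi
from statistics import mean, pvariance

def fit(samples, labels):
    """Estimate a class prior and a per-feature mean and variance for each class."""
    model = {}
    for c in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == c]
        # Small constant keeps variances strictly positive (an assumption).
        stats = [(mean(col), pvariance(col) + 1e-6) for col in zip(*rows)]
        model[c] = (len(rows) / len(samples), stats)
    return model

def predict(model, x):
    """Pick the class with the highest log posterior, treating features as independent."""
    def log_posterior(c):
        prior, stats = model[c]
        lp = log(prior)
        for value, (m, var) in zip(x, stats):
            lp += -0.5 * log(2 * pi * var) - (value - m) ** 2 / (2 * var)
        return lp
    return max(model, key=log_posterior)

# Toy features (hypothetical): average CGH value over q12, q13, and q12-13.
samples = [
    (0.1, 0.0, 0.05), (0.2, 0.1, 0.15), (0.0, -0.1, -0.05),  # AC
    (0.8, 0.9, 0.85), (1.0, 1.1, 1.05), (0.9, 1.0, 0.95),    # SCC
]
labels = ["AC", "AC", "AC", "SCC", "SCC", "SCC"]

model = fit(samples, labels)
print(predict(model, (0.95, 1.0, 0.97)))
print(predict(model, (0.05, 0.0, 0.02)))
```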

16 Conclusion
Reverse engineer regulatory network information for tissue classification.
Adopt a systems biology approach to infer the gene dependency network.
– Select genes by Bayes factor.
– Eliminate collinearity via network learning.
– Integrate gene selection and the classification model in a single Bayesian network framework.
Demonstrate the promising translational value of the systems biology approach in clinical studies.

