
1 Harvard Medical School Transcriptional Diagnosis by Bayesian Network Hsun-Hsien Chang and Marco F. Ramoni Children’s Hospital Informatics Program Harvard-MIT Division of Health Sciences and Technology Harvard Medical School March 17, 2009

2 Background
Microarray technology enables profiling the expression of thousands of genes in parallel on a single chip. Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
Challenge:
– The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which leads to overfitting.
Existing methods:
– Gene selection: compute statistics of individual genes (e.g., t-statistics, signal-to-noise ratio (SNR), PCA) and select the highest-ranked genes.
– Classification model: build a classification function on the selected genes.
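
For reference, the single-gene ranking used by the existing methods can be sketched as follows. This minimal sketch scores each gene by its absolute signal-to-noise ratio, taken here as (mean_1 - mean_0) / (sd_1 + sd_0), and keeps the top k. The function names, the 0/1 label encoding, the toy data, and the choice of k are illustrative assumptions. Each gene is scored in isolation, which is the independence assumption questioned on the next slide.

```python
from statistics import mean, stdev


def snr(group1, group0):
    """Signal-to-noise ratio of one gene: (mean1 - mean0) / (sd1 + sd0)."""
    # Raises ZeroDivisionError if both groups have zero spread.
    return (mean(group1) - mean(group0)) / (stdev(group1) + stdev(group0))


def top_k_genes(expr, labels, k):
    """Rank genes by |SNR| and return the k highest-ranked gene names.

    expr: dict mapping gene name -> expression values, one per sample.
    labels: 0/1 tissue-state label per sample (hypothetical encoding).
    """
    scores = {}
    for gene, values in expr.items():
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = abs(snr(group1, group0))
    # Each gene is scored on its own, ignoring dependencies between genes.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data, invented for illustration.
expr = {"g1": [5, 6, 5, 1, 1, 2], "g2": [1, 2, 1, 2, 1, 2], "g3": [3, 3, 4, 1, 2, 1]}
labels = [1, 1, 1, 0, 0, 0]
print(top_k_genes(expr, labels, k=2))  # ['g1', 'g3']
```

The cut-off k is exactly the kind of threshold the proposed approach tries to avoid.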

3 Proposed Approach
Issues:
– The assumption that genes are independent is inadequate.
– Other genes may be collinearly expressed with the signature genes.
– Selection and classification are two separate, non-integrated steps, and a cut-off threshold is needed to select the high-ranked genes.
Proposed strategies:
– Adopt a systems biology approach to infer the functional dependencies among genes.
– Use the dependency network for tissue discrimination.
– Integrate gene selection and the classification model in a Bayesian network framework.

4 Data Representation by Bayesian Network
[Figure: expression data (Gene 1 … Gene N measured in Case 1 … Case M, each case labeled with tissue state 1 or 2) represented as a Bayesian network with gene nodes G1, G2, …, GN and a phenotype node Pheno.]
Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables.
– Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
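
A toy example makes the factorization implied by this definition concrete. The joint probability of a full assignment is the product of each node's probability given its parents. The three-node binary network and its probabilities below are hypothetical and only illustrate the factorization. They are not taken from the study.

```python
from itertools import product

# Hypothetical toy network (not the study's): Pheno -> G1, Pheno -> G2, G1 -> G2.
# All variables are binary (0/1).
parents = {"Pheno": [], "G1": ["Pheno"], "G2": ["Pheno", "G1"]}

# P(node = 1 | parent values), keyed by the tuple of parent values; illustrative numbers only.
p_one = {
    "Pheno": {(): 0.5},
    "G1": {(0,): 0.2, (1,): 0.7},
    "G2": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
}


def joint(assignment):
    """Joint probability as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_one[node][tuple(assignment[p] for p in pa)]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


# Summing the factorized joint over all 2**3 assignments should give 1.
total = sum(joint(dict(zip(parents, values)))
            for values in product([0, 1], repeat=len(parents)))
print(total)
print(joint({"Pheno": 1, "G1": 1, "G2": 1}))  # 0.5 * 0.7 * 0.9
```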

5 Gene Selection by Bayes Factor
[Figure: gene selection by Bayes factor over the phenotype node Pheno and genes G1, G2, …, Gp, Gq, …, GN.]
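
The slides do not give the scoring details, so the following is only a sketch under stated assumptions. Expression values are assumed to be discretized into a few levels. Each gene is scored by the log Bayes factor that compares two models, each with a symmetric Dirichlet(alpha) prior: one in which the gene's level distribution depends on the phenotype, and one in which it does not. The phenotype's own marginal likelihood is identical in both models and cancels. The function names, the discretization, and alpha = 1 are hypothetical choices.

```python
import math
from collections import Counter


def log_marginal_likelihood(counts, alpha=1.0):
    """Log Dirichlet-multinomial marginal likelihood of category counts
    under a symmetric Dirichlet(alpha) prior."""
    n = sum(counts)
    a = alpha * len(counts)
    return (math.lgamma(a) - math.lgamma(a + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))


def log_bayes_factor(levels, labels, n_levels, alpha=1.0):
    """log BF of 'gene level depends on phenotype' vs 'independent of phenotype'.

    levels: discretized expression level (0..n_levels-1) per sample (assumed).
    labels: phenotype label per sample.
    """
    def counts(values):
        tally = Counter(values)
        return [tally.get(k, 0) for k in range(n_levels)]

    # Dependent model: a separate level distribution for each phenotype class.
    dependent = sum(
        log_marginal_likelihood(
            counts([v for v, lab in zip(levels, labels) if lab == c]), alpha)
        for c in set(labels))
    # Independent model: one level distribution shared by all samples.
    independent = log_marginal_likelihood(counts(levels), alpha)
    return dependent - independent


# Gene level perfectly tracks phenotype -> positive log BF (here log 77).
print(log_bayes_factor([0] * 5 + [1] * 5, [0] * 5 + [1] * 5, n_levels=2))
# Same level distribution in both classes -> negative log BF (here log 0.7).
print(log_bayes_factor([0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 1, 1, 1, 1], n_levels=2))
```

Under this sketch, a gene would be kept when its log Bayes factor exceeds some threshold. That threshold is itself a hyperparameter.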

6 Collinearity Elimination via Network Learning
[Figure: collinearity elimination via network learning over Pheno and the selected genes G1, G2, Gp, Gq, …, GN.]
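
The slides perform this step with network learning, which is not reproduced here. The sketch below is a much simpler stand-in that conveys the intent. It greedily keeps genes in order of their selection score and drops any gene whose absolute Pearson correlation with an already kept gene reaches a threshold. This correlation heuristic, the 0.9 threshold, the function names, and the toy data are assumptions, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (fails on constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_collinear(expr, scores, max_abs_r=0.9):
    """Keep genes best-score-first, skipping any gene too correlated with a kept one.

    expr: dict gene -> expression values; scores: dict gene -> selection score
    (e.g., a log Bayes factor from the previous step; hypothetical input).
    """
    kept = []
    for gene in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(expr[gene], expr[k])) < max_abs_r for k in kept):
            kept.append(gene)
    return kept


# Toy data: g2 is nearly a rescaled copy of g1, so it is dropped.
expr = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 10.5], "g3": [5, 1, 4, 2, 3]}
scores = {"g1": 10.0, "g2": 8.0, "g3": 3.0}
print(prune_collinear(expr, scores))  # ['g1', 'g3']
```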

7 Sample Classification
The phenotype variable is independent of the blue genes, given the green genes. Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used to determine the phenotype.
[Figure: tissue classification network over Pheno and genes G1, G2, Gp, Gq, …, GN, with the Markov blanket genes shown in green.]
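
In a DAG, the Markov blanket of a node consists of its parents, its children, and its children's other parents. The following minimal sketch assumes the network is stored as a dictionary that maps each node to its list of parents. The example structure is hypothetical and is not the network in the figure.

```python
def markov_blanket(parents, target):
    """Parents, children, and children's other parents of `target` in a DAG.

    parents: dict mapping every node to the list of its parent nodes.
    """
    children = [node for node, pa in parents.items() if target in pa]
    blanket = set(parents[target]) | set(children)
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(target)
    return blanket


# Hypothetical structure for illustration (not the network in the figure).
parents = {
    "Pheno": [],
    "G1": ["Pheno"],
    "G2": ["Pheno", "Gp"],
    "Gp": [],
    "Gq": ["G1"],
    "GN": ["Gq"],
}
print(sorted(markov_blanket(parents, "Pheno")))  # ['G1', 'G2', 'Gp']
```

Here Gq and GN fall outside the blanket. Given G1, G2, and Gp, Pheno is therefore independent of them, and only the blanket genes are needed to classify a sample.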

8 Algorithm Summary
– Gene selection by Bayes factor.
– Collinearity elimination.
– Sample classification.
– Optimize performance: optimize hyperparameters (sensitivity analysis).
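
One way to read the optimization loop is as a sweep over a hyperparameter that records performance at every candidate value. The recorded scores also show how sensitive performance is to that setting. In the sketch below, the candidate values and the `evaluate` callback are hypothetical. In practice, `evaluate` would run gene selection, collinearity elimination, and classification for one setting and return a performance estimate such as cross-validated accuracy.

```python
def sweep(candidates, evaluate):
    """Evaluate every candidate hyperparameter value and return the best one.

    evaluate: callable mapping a hyperparameter value to a performance score
    (higher is better). Returns (best_value, best_score, all_scores).
    """
    all_scores = {value: evaluate(value) for value in candidates}
    best_value = max(all_scores, key=all_scores.get)  # first maximum wins ties
    return best_value, all_scores[best_value], all_scores


# Hypothetical stand-in for "run the pipeline at this threshold and return accuracy".
def toy_accuracy(threshold):
    return 0.95 - 0.01 * (threshold - 3) ** 2


best, score, table = sweep([1, 2, 3, 4, 5], toy_accuracy)
print(best, score)  # 3 0.95
print(table)        # full score profile, usable as a simple sensitivity analysis
```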

9 Discriminate Lung Carcinoma Subtypes
Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
– AC and SCC differ in survival, likelihood of metastasis, and response to chemotherapy and targeted therapy.
– Physicians lack confidence in recognizing the subtype correctly when multiple primary carcinomas are present.
Training:
– 58 ACs and 53 SCCs.
– 77 genes selected in the network.
– 25 signature genes.

10 Bayesian Network for Lung Carcinoma

11 Large-Scale Testing on Independent Samples
– 422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, and Chinese).
– Accuracy: AUROC = 95.2%.
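
AUROC can be computed without tracing the ROC curve by using its rank interpretation. It equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The following minimal sketch assumes that higher scores indicate the positive subtype. Which subtype counts as positive is an arbitrary choice here, and the scores in the example are invented.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as P(score of random positive > score of random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented scores: one of the four positive/negative pairs is misordered.
print(auroc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

The pairwise loop is O(n·m), which is negligible at the scale of a few hundred test samples.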

12 Comparisons with Other Popular Methods
– Higher classification accuracy.
– Small signature size to avoid overfitting.

Method                                         Testing AUROC   p-value   # signature genes
Bayesian Network                               95.2%           ---       25
PCA/LDA                                        91.2%
PAM (Tibshirani et al., PNAS 2002)             91.0%
Weighted Voting (Golub et al., Science 1999)   93.4%

13 KRT6 Family Characterizes the Lung Carcinoma Discrimination

14 KRT6 Family Characterizes the Lung Carcinoma Discrimination
Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing the lung cancer subtypes:
– They account for 95% of the accuracy of the whole 25-gene signature.
– They are located on chromosome 12q12-q13.
– They yield a nonlinear, concave discriminative surface.

15 Verification by Chr12q12-q13 Aberrations
Investigate DNA copy number changes with array comparative genomic hybridization (CGH):
– 12 ACs and 13 SCCs from Vrije University Medical Center, Netherlands.
– Treat the average CGH values of the genes occupying q12, q13, and q12-13, respectively, as three features to construct a Naïve Bayes classifier.
– A dumbbell discriminative surface achieves 80% classification accuracy.
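
The slide does not specify the form of the class-conditional densities. The sketch below assumes a Gaussian naive Bayes model. Each of the three averaged CGH features (q12, q13, q12-13) gets a per-class normal density, and a sample is assigned to the class with the highest log prior plus summed log likelihoods. The function names, the variance floor, and the toy feature values are hypothetical. The values are invented and are not the Vrije University data.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Per-class log prior, feature means, and feature variances (floored)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        model[label] = {
            "log_prior": math.log(len(rows) / len(y)),
            "means": [mean(c) for c in cols],
            "vars": [max(pvariance(c), var_floor) for c in cols],
        }
    return model


def predict_gaussian_nb(model, x):
    """Class maximizing log prior + sum of per-feature Gaussian log densities."""
    best_label, best_score = None, -math.inf
    for label, p in model.items():
        score = p["log_prior"]
        for xi, mu, var in zip(x, p["means"], p["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented features: (avg CGH over q12, over q13, over q12-13) per sample.
X = [(0.3, 0.2, 0.25), (0.4, 0.1, 0.3), (0.2, 0.3, 0.2),
     (-0.2, -0.3, -0.25), (-0.3, -0.2, -0.2), (-0.1, -0.4, -0.3)]
y = ["AC", "AC", "AC", "SCC", "SCC", "SCC"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (0.35, 0.25, 0.2)))  # AC
```

Naive Bayes treats the three regions as conditionally independent given the subtype. The model therefore stays very small, which suits a 25-sample data set.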

16 Conclusion
Reverse-engineer regulatory network information for tissue classification.
Adopt a systems biology approach to infer the gene dependency network:
– Select genes by Bayes factor.
– Eliminate collinearity via network learning.
– Integrate gene selection and the classification model in a single Bayesian network framework.
Demonstrate the promising translational value of the systems biology approach in clinical studies.