Analysing Microarray Data Using Bayesian Network Learning Name: Phirun Son Supervisor: Dr. Lin Liu.



Contents Aims Microarrays Bayesian Networks Classification Methodology Results

Aims and Goals
Investigate suitability of Bayesian Networks for analysis of Microarray data
Apply Bayesian learning on Microarray data for classification
Comparison with other classification techniques

Microarrays
Array of microscopic dots representing gene expression levels
Gene expression is the process of DNA genes being transcribed into RNA
Short sections of genes attached to a surface such as glass or silicon
Treated with dyes to obtain expression level

Challenges of Microarray Data
Very large number of variables, low number of samples
Data is noisy and incomplete
Standardisation of data format
◦ MGED – MIAME, MAGE-ML, MAGE-TAB
◦ ArrayExpress, GEO, CIBEX

Bayesian Networks
Represents conditional independencies of random variables
Two components:
◦ Directed Acyclic Graph (DAG)
◦ Conditional probability table for each node
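To illustrate how the two components work together, here is a minimal Python sketch in which the joint probability of a full assignment is the product of each node's table entry given its parents. The three-node network, its variable names, and all probabilities are made up for illustration and do not come from the presentation.

```python
# Toy network: rain -> wet <- sprinkler.
# All names and numbers are illustrative assumptions.
parents = {"rain": [], "sprinkler": [], "wet": ["rain", "sprinkler"]}

# cpt[node][tuple of parent values] = P(node is True | parents)
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Multiply one table entry per node, following the DAG factorisation."""
    p = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p_true = cpt[node][key]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p
```

For example, `joint_probability({"rain": True, "sprinkler": False, "wet": True})` multiplies 0.2, 0.9, and 0.9.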

Methodology
Create a program to test accuracy of classification
◦ Written in MATLAB using Bayes Net Toolbox (Murphy, 2001), and Structure Learning Package (Leray, 2004)
◦ Uses Naive network structure, K2 structure learning, and pre-determined structure
Test program on synthetic data
Test program using real data
Comparison of Bayes Net and Decision Tree
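The K2 structure learning named above can be sketched as follows. K2 takes a fixed node ordering and greedily adds, for each node, the preceding variable that most improves the Cooper-Herskovits score. This is an illustrative Python sketch, not the MATLAB Structure Learning Package code used in the project. The function names, the data layout (a list of dicts of integer states), and the `max_parents` default are assumptions.

```python
from collections import Counter
from math import lgamma


def log_k2_score(data, child, parents, arity):
    """Log Cooper-Herskovits (K2) score of `child` given `parents`.

    data: list of dicts mapping variable name -> state in range(arity[var]).
    """
    r = arity[child]
    counts = Counter()         # (parent config, child state) -> N_ijk
    config_totals = Counter()  # parent config -> N_ij
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
        config_totals[config] += 1
    score = 0.0
    for config, n_ij in config_totals.items():
        # log[(r - 1)! / (N_ij + r - 1)!]
        score += lgamma(r) - lgamma(n_ij + r)
        for k in range(r):
            # log(N_ijk!)
            score += lgamma(counts[(config, k)] + 1)
    return score


def k2_parents(data, child, predecessors, arity, max_parents=2):
    """Greedily pick parents for `child` from the nodes that precede it."""
    parents = []
    best = log_k2_score(data, child, parents, arity)
    improved = True
    while improved and len(parents) < max_parents:
        improved = False
        candidates = [v for v in predecessors if v not in parents]
        scored = [(log_k2_score(data, child, parents + [v], arity), v)
                  for v in candidates]
        if scored:
            top_score, top_var = max(scored)
            if top_score > best:
                best, parents, improved = top_score, parents + [top_var], True
    return parents
```

Running `k2_parents` once per node, in order, yields the full learned graph. The result depends on the ordering supplied.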

Synthetic Data
Data created from well-known Bayesian Network examples
◦ Asia network, car network, and alarm network
Samples generated from each network
Tested with naive, pre-known structure, and with structure learning
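One common way to generate samples from a known network is forward (ancestral) sampling: visit the nodes in topological order and draw each value from its table given the parents already sampled. The sketch below assumes this approach; the presentation does not say how its samples were drawn. The three-node chain and its probabilities are illustrative only and are not the Asia, car, or ALARM parameters.

```python
import random

# Nodes listed in topological order: (name, parents, P(True | parent values)).
# Names and probabilities are illustrative assumptions.
network = [
    ("smoker", [], {(): 0.3}),
    ("bronchitis", ["smoker"], {(True,): 0.6, (False,): 0.2}),
    ("dyspnoea", ["bronchitis"], {(True,): 0.8, (False,): 0.1}),
]


def sample_once(rng):
    """Draw one full assignment, parents before children."""
    sample = {}
    for node, pars, cpt in network:
        p_true = cpt[tuple(sample[p] for p in pars)]
        sample[node] = rng.random() < p_true
    return sample


def generate(n, seed=0):
    """Return n independent samples from the network."""
    rng = random.Random(seed)
    return [sample_once(rng) for _ in range(n)]
```

Calling `generate(50)` would produce a data set of the same size as the 50-sample runs reported in the results.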

Synthetic Data - Results: Asia Network
Lauritzen and Spiegelhalter, ‘Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems’, 1988, pg 164

50 Samples, 10 Folds, 100 Iterations; Class Node: Dyspnoea
  Structure      Correct
  Naive          81.0%
  K2 Learning    83.4%
  Known Graph    85.0%

100 Samples, 10 Folds, 50 Iterations; Class Node: Dyspnoea
  Structure      Correct
  Naive          83.1%
  K2 Learning    84.3%
  Known Graph    85.1%

Synthetic Data - Results: Car Network
Heckerman, et al, ‘Troubleshooting under Uncertainty’, 1994, pg

Samples, 10 Folds, 100 Iterations; Class Node: Engine Starts
  Structure      Correct
  Naive          53.5%
  K2 Learning    58.3%
  Known Graph    62.4%

100 Samples, 10 Folds, 50 Iterations; Class Node: Engine Starts
  Structure      Correct
  Naive          56.5%
  K2 Learning    58.7%
  Known Graph    61.2%

Synthetic Data - Results: ALARM Network
37 Nodes, 46 Connections
Beinlich et al, ‘The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks’, 1989

50 Samples, 10 Folds, 10 Iterations; Class Node: InsufAnesth
  Structure      Correct
  Naive          72.4%
  K2 Learning    78.7%
  Known Graph    89.6%

50 Samples, 10 Folds, 10 Iterations; Class Node: Hypovolemia
  Structure      Correct
  Naive          69.0%
  K2 Learning    77.8%
  Known Graph    93.6%
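The "Folds" and "Iterations" figures in the synthetic results suggest repeated k-fold cross-validation. The sketch below shows one way such a protocol could be run with a naive Bayes classifier on discrete data. It is a Python illustration under assumptions, not the project's MATLAB program: the function names, the Laplace smoothing, and the binary-feature default `n_values=2` are choices made here.

```python
import random
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def predict(model, row, n_values=2):
    """Return the class with the highest smoothed naive Bayes score."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())

    def score(label):
        p = class_counts[label] / total
        for i, value in enumerate(row):
            # Laplace smoothing avoids zero probabilities for unseen values
            p *= (feature_counts[(i, label)][value] + 1) / (class_counts[label] + n_values)
        return p

    return max(class_counts, key=score)


def repeated_kfold_accuracy(rows, labels, folds=10, iterations=5, seed=0):
    """Average accuracy over `iterations` reshuffled k-fold splits."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    correct = tested = 0
    for _ in range(iterations):
        rng.shuffle(indices)
        for f in range(folds):
            test_idx = indices[f::folds]  # every folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            model = train_naive_bayes([rows[i] for i in train_idx],
                                      [labels[i] for i in train_idx])
            for i in test_idx:
                correct += predict(model, rows[i]) == labels[i]
                tested += 1
    return correct / tested
```

Repeating the split with fresh shuffles reduces the variance that a single partition of only 50 or 100 samples would introduce.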

Lung Cancer Data Set
Publicly available data sets:
◦ Harvard: Bhattacharjee et al, ‘Classification of Human Lung Carcinomas by mRNA Expression Profiling Reveals Distinct Adenocarcinoma Subclasses’, 2001
    11,657 attributes, 156 instances, Affymetrix
◦ Michigan: Beer et al, ‘Gene-Expression Profiles Predict Survival of Patients with Lung Adenocarcinoma’, 2002
    6,357 attributes, 96 instances, Affymetrix
◦ Stanford: Garber et al, ‘Diversity of Gene Expression in Adenocarcinoma of the Lung’, 2001
    11,985 attributes, 46 instances, cDNA
    Contains missing values

Feature Selection
Li (2009) provides a feature-selected set of 90 attributes
◦ Using WEKA feature selection
◦ Also allows comparison with Decision Tree based classification
Discretised data in 3 forms
◦ Undetermined values left unknown
◦ Undetermined values put into either category – two category
◦ Undetermined values put into another category – three category
WEKA: Ian H. Witten and Eibe Frank, ‘Data Mining: Practical machine learning tools and techniques’, 2005.
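The three discretised forms above can be sketched as a single function. The slides do not state how "undetermined" was defined or how such values were assigned in the two-category form. This sketch assumes two hypothetical thresholds, `low` and `high`, treats values strictly between them as undetermined, and in the two-category form assigns them to the nearer side of the midpoint.

```python
def discretise(value, low, high, mode):
    """Map an expression value to a category.

    0 = at or below `low`, 1 = at or above `high`.
    Values strictly between the (hypothetical) thresholds are undetermined
    and handled according to `mode`:
      "unknown": left as None (missing)
      "two":     pushed to 0 or 1 by the midpoint (an assumption)
      "three":   given their own category, 2
    """
    if value <= low:
        return 0
    if value >= high:
        return 1
    if mode == "unknown":
        return None
    if mode == "two":
        return 0 if value < (low + high) / 2 else 1
    if mode == "three":
        return 2
    raise ValueError(f"unknown mode: {mode}")
```

For instance, with `low=1.0` and `high=2.0`, the value 1.2 becomes `None`, 0, or 2 depending on the mode, while 0.5 and 2.5 map to 0 and 1 in every mode.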

Harvard Set

Harvard Training on Michigan
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   95 (99.0%)    not given     not given
  2-Cat -> 2-Cat F    94 (97.9%)    93 (96.9%)    92 (95.8%)
  3-Cat -> 3-Cat NF   94 (97.9%)    95 (99.0%)    94 (97.9%)
  3-Cat -> 3-Cat F    88 (91.7%)    95 (99.0%)    94 (97.9%)

Harvard Training on Stanford
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   41 (89.1%)    46 (100%)     43 (93.5%)
  2-Cat -> 2-Cat F    41 (89.1%)    45 (97.8%)    36 (78.3%)
  3-Cat -> 3-Cat NF   41 (89.1%)    46 (100%)     42 (91.3%)
  3-Cat -> 3-Cat F    41 (89.1%)    46 (100%)     42 (91.3%)

Michigan Set

Michigan Training on Harvard
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   150 (96.2%)   154 (98.7%)   153 (98.1%)
  2-Cat -> 2-Cat F    144 (92.3%)   153 (98.1%)   150 (96.2%)
  3-Cat -> 3-Cat NF   145 (92.9%)   153 (98.1%)   not given
  3-Cat -> 3-Cat F    140 (89.7%)   152 (97.4%)   153 (98.1%)

Michigan Training on Stanford
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   41 (89.1%)    46 (100%)     41 (89.1%)
  2-Cat -> 2-Cat F    41 (89.1%)    46 (100%)     40 (87.0%)
  3-Cat -> 3-Cat NF   41 (89.1%)    45 (97.8%)    39 (84.8%)
  3-Cat -> 3-Cat F    41 (89.1%)    46 (100%)     39 (84.8%)

Stanford Set

Stanford Training on Harvard
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   139 (89.1%)   153 (98.1%)   139 (89.1%)
  2-Cat -> 2-Cat F    139 (89.1%)   150 (96.2%)   124 (79.5%)
  3-Cat -> 3-Cat NF   139 (89.1%)   150 (96.2%)   154 (98.7%)
  3-Cat -> 3-Cat F    139 (89.1%)   150 (96.2%)   152 (97.4%)

Stanford Training on Michigan
                      MATLAB        WEKA          DT
  2-Cat -> 2-Cat NF   86 (89.6%)    95 (99.0%)    86 (89.6%)
  2-Cat -> 2-Cat F    86 (89.6%)    92 (95.8%)    72 (75.0%)
  3-Cat -> 3-Cat NF   86 (89.6%)    95 (99.0%)    94 (97.9%)
  3-Cat -> 3-Cat F    86 (89.6%)    95 (99.0%)    91 (94.8%)

Future Work
Use structure learning for Bayesian Classifiers
Increase of homogeneous data
Other methods of classification