Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005
(c) 2005 SNU CSE Biointelligence Lab2 Goals of the Project Analysis of the influence of network size and data size on structural learning of Bayesian networks Six Bayesian networks of various sizes are given. Generate data examples from each Bayesian network. Learn Bayesian network structures from the generated data. Analyze the learned results according to the network size and the data size. Classification using Bayesian networks A microarray dataset consisting of two classes of samples is given. Learn Bayesian network classifiers from the dataset. Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.
(c) 2005 SNU CSE Biointelligence Lab3 Given Bayesian Networks Randomly generated Network structure: scale-free and modular # of variables: 10, 30, and 45 All variables are binary Network file format: *.dsc for MSBNX (
(c) 2005 SNU CSE Biointelligence Lab4 Example Bayesian Network Structure I
(c) 2005 SNU CSE Biointelligence Lab5 Example Bayesian Network Structure II
(c) 2005 SNU CSE Biointelligence Lab6 *.dsc Files Node name Possible states Parents Child Conditional probability distribution
(c) 2005 SNU CSE Biointelligence Lab7 Data Generation X1X1 X3X3 X4X4 X2X2 X5X5 X6X6 1.Sample X 1 from P(X 1 ) 2.Sample X 2 from P(X 2 ) 3.Sample X 3 from P(X 3 | X 1 ) 4.Sample X 4 from P(X 4 | X 1, X 2 ) 5.Sample X 5 from P(X 5 | X 3 ) 6.Sample X 6 from P(X 6 | X 4 )
(c) 2005 SNU CSE Biointelligence Lab8 Data Generation Tool data_generator Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...
(c) 2005 SNU CSE Biointelligence Lab9 Structural Learning of Bayesian Networks Using WEKA software (
(c) 2005 SNU CSE Biointelligence Lab10 Learning Example The original network structure Learned network structure
(c) 2005 SNU CSE Biointelligence Lab11 Materials for the First One Given Bayesian networks sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc Data generation tool data_generator.exe [for Windows], data_generator [for Linux] Downloadable MSBNX ( WEKA ( You should write your own code for comparing Bayesian network structures.
(c) 2005 SNU CSE Biointelligence Lab12 Study Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, leukemia patients Bone marrow samples Affymetrix GeneChip arrays Gene expression data
(c) 2005 SNU CSE Biointelligence Lab13 Gene Expression Data # of data examples 120 (60: before treatment, 60: after treatment) # of genes measured (Affymetrix HG-U95A array) Task Classification between “before treatment” and “after treatment” based on gene expression pattern
(c) 2005 SNU CSE Biointelligence Lab14 Affymetrix GeneChip Arrays Use short oligos to detect gene expression level. Each gene is probed by a set of short oligos. Each gene expression level is summarized by Signal: numerical value describing the abundance of mRNA A/P call: denotes the statistical significance of signal
(c) 2005 SNU CSE Biointelligence Lab15 Preprocessing Remove the genes having more than 60 ‘A’ calls # of genes: 3190 Discretization of gene expression level Criterion: median gene expression value of each sample 0 (low) and 1 (high)
(c) 2005 SNU CSE Biointelligence Lab16 Gene Filtering Using mutual information Estimated probabilities were used. # of genes: 3190 1000 Final dataset # of attributes: 1001 (one for the class) Class: 0 (after treatment), 1 (before treatment) # of data examples: 120
(c) 2005 SNU CSE Biointelligence Lab17 Final Dataset
(c) 2005 SNU CSE Biointelligence Lab18 Materials for the Second One Given Preprocessed microarray data file: data2.txt Downloadable WEKA (
(c) 2005 SNU CSE Biointelligence Lab19 Due: June 16, 2005 Analysis of the influence of network size and data size on structural learning of Bayesian networks Six Bayesian networks of various sizes are given. Generate data examples from each Bayesian network. Learn Bayesian network structures from the generated data. Analyze the learned results according to the network size and the data size. Classification using Bayesian networks A microarray dataset consisting of two classes of samples is given. Learn Bayesian network classifiers from the dataset. Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.