Presentation is loading. Please wait.

Presentation is loading. Please wait.

Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005.

Similar presentations


Presentation on theme: "Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005."— Presentation transcript:

1 Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005

2 (c) 2005 SNU CSE Biointelligence Lab2 Goals of the Project Analysis of the influence of network size and data size on structural learning of Bayesian networks  Six Bayesian networks of various sizes are given.  Generate data examples from each Bayesian network.  Learn Bayesian network structures from the generated data.  Analyze the learned results according to the network size and the data size. Classification using Bayesian networks  A microarray dataset consisting of two classes of samples is given.  Learn Bayesian network classifiers from the dataset.  Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.

3 (c) 2005 SNU CSE Biointelligence Lab3 Given Bayesian Networks Randomly generated Network structure: scale-free and modular # of variables: 10, 30, and 45 All variables are binary Network file format: *.dsc for MSBNX (http://research.microsoft.com/adapt/MSBNx/)http://research.microsoft.com/adapt/MSBNx/

4 (c) 2005 SNU CSE Biointelligence Lab4 Example Bayesian Network Structure I

5 (c) 2005 SNU CSE Biointelligence Lab5 Example Bayesian Network Structure II

6 (c) 2005 SNU CSE Biointelligence Lab6 *.dsc Files Node name Possible states Parents Child Conditional probability distribution

7 (c) 2005 SNU CSE Biointelligence Lab7 Data Generation X1X1 X3X3 X4X4 X2X2 X5X5 X6X6 1.Sample X 1 from P(X 1 ) 2.Sample X 2 from P(X 2 ) 3.Sample X 3 from P(X 3 | X 1 ) 4.Sample X 4 from P(X 4 | X 1, X 2 ) 5.Sample X 5 from P(X 5 | X 3 ) 6.Sample X 6 from P(X 6 | X 4 )

8 (c) 2005 SNU CSE Biointelligence Lab8 Data Generation Tool data_generator  Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...

9 (c) 2005 SNU CSE Biointelligence Lab9 Structural Learning of Bayesian Networks Using WEKA software (http://www.cs.waikato.ac.nz/ml/weka/)http://www.cs.waikato.ac.nz/ml/weka/

10 (c) 2005 SNU CSE Biointelligence Lab10 Learning Example The original network structure Learned network structure

11 (c) 2005 SNU CSE Biointelligence Lab11 Materials for the First One Given  Bayesian networks  sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc  Data generation tool  data_generator.exe [for Windows], data_generator [for Linux] Downloadable  MSBNX (http://research.microsoft.com/adapt/MSBNx/)http://research.microsoft.com/adapt/MSBNx/  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)http://www.cs.waikato.ac.nz/ml/weka/ You should write your own code for comparing Bayesian network structures.

12 (c) 2005 SNU CSE Biointelligence Lab12 Study Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003. 60 leukemia patients Bone marrow samples Affymetrix GeneChip arrays Gene expression data

13 (c) 2005 SNU CSE Biointelligence Lab13 Gene Expression Data # of data examples  120 (60: before treatment, 60: after treatment) # of genes measured  12600 (Affymetrix HG-U95A array) Task  Classification between “before treatment” and “after treatment” based on gene expression pattern

14 (c) 2005 SNU CSE Biointelligence Lab14 Affymetrix GeneChip Arrays Use short oligos to detect gene expression level. Each gene is probed by a set of short oligos. Each gene expression level is summarized by  Signal: numerical value describing the abundance of mRNA  A/P call: denotes the statistical significance of signal

15 (c) 2005 SNU CSE Biointelligence Lab15 Preprocessing Remove the genes having more than 60 ‘A’ calls  # of genes: 12600  3190 Discretization of gene expression level  Criterion: median gene expression value of each sample  0 (low) and 1 (high)

16 (c) 2005 SNU CSE Biointelligence Lab16 Gene Filtering Using mutual information  Estimated probabilities were used.  # of genes: 3190  1000 Final dataset  # of attributes: 1001 (one for the class)  Class: 0 (after treatment), 1 (before treatment)  # of data examples: 120

17 (c) 2005 SNU CSE Biointelligence Lab17 Final Dataset 120 1000

18 (c) 2005 SNU CSE Biointelligence Lab18 Materials for the Second One Given  Preprocessed microarray data file: data2.txt Downloadable  WEKA (http://www.cs.waikato.ac.nz/ml/weka/)http://www.cs.waikato.ac.nz/ml/weka/

19 (c) 2005 SNU CSE Biointelligence Lab19 Due: June 16, 2005 Analysis of the influence of network size and data size on structural learning of Bayesian networks  Six Bayesian networks of various sizes are given.  Generate data examples from each Bayesian network.  Learn Bayesian network structures from the generated data.  Analyze the learned results according to the network size and the data size. Classification using Bayesian networks  A microarray dataset consisting of two classes of samples is given.  Learn Bayesian network classifiers from the dataset.  Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.


Download ppt "Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005."

Similar presentations


Ads by Google