Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005.

Presentation transcript:

Artificial Intelligence Project #3 : Diagnosis Using Bayesian Networks May 19, 2005

(c) 2005 SNU CSE Biointelligence Lab

Goals of the Project

Analysis of the influence of network size and data size on structural learning of Bayesian networks
  - Six Bayesian networks of various sizes are given.
  - Generate data examples from each Bayesian network.
  - Learn Bayesian network structures from the generated data.
  - Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks
  - A microarray dataset consisting of two classes of samples is given.
  - Learn Bayesian network classifiers from the dataset.
  - Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.

Given Bayesian Networks
  - Randomly generated
  - Network structure: scale-free and modular
  - # of variables: 10, 30, and 45
  - All variables are binary
  - Network file format: *.dsc for MSBNX

Example Bayesian Network Structure I

Example Bayesian Network Structure II

*.dsc Files
[Figure: annotated *.dsc file showing node name, possible states, parents, child, and conditional probability distribution]

Data Generation
[Figure: example network over X1, ..., X6]
  1. Sample X1 from P(X1)
  2. Sample X2 from P(X2)
  3. Sample X3 from P(X3 | X1)
  4. Sample X4 from P(X4 | X1, X2)
  5. Sample X5 from P(X5 | X3)
  6. Sample X6 from P(X6 | X4)
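
The six sampling steps above can be sketched in Python as follows. This is only an illustration of sampling each node after its parents, not the provided data_generator tool. The parent sets follow the conditionals on the slide, but the probability values in CPT are made up for the example, and the names PARENTS, CPT, sample_once, and generate are hypothetical.

```python
import random

# Parents of each node, listed in the sampling order from the slide
# (every node appears after all of its parents).
PARENTS = {
    "X1": [],
    "X2": [],
    "X3": ["X1"],
    "X4": ["X1", "X2"],
    "X5": ["X3"],
    "X6": ["X4"],
}

# ASSUMPTION: illustrative numbers only. Each entry is
# P(node = 1 | parent values), keyed by the tuple of parent values.
CPT = {
    "X1": {(): 0.6},
    "X2": {(): 0.3},
    "X3": {(0,): 0.2, (1,): 0.7},
    "X4": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "X5": {(0,): 0.3, (1,): 0.8},
    "X6": {(0,): 0.25, (1,): 0.65},
}


def sample_once(rng):
    """Draw one joint sample by visiting nodes in parent-first order."""
    sample = {}
    for node, parents in PARENTS.items():
        key = tuple(sample[p] for p in parents)
        sample[node] = 1 if rng.random() < CPT[node][key] else 0
    return sample


def generate(n, seed=0):
    """Generate n independent samples with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [sample_once(rng) for _ in range(n)]


data = generate(1000)
```

Because each node is drawn only after its parents have values, every sample is a draw from the joint distribution defined by the network.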

Data Generation Tool
  - data_generator
  - Usage: data_generator [network file style] [# of nodes] [# of data samples] [input file] [output file]...

Structural Learning of Bayesian Networks
  - Using WEKA software

Learning Example
[Figure: the original network structure and the learned network structure]

Materials for the First One
Given
  - Bayesian networks: sf_10.dsc, sf_30.dsc, sf_45.dsc, md_10.dsc, md_30.dsc, md_45.dsc
  - Data generation tool: data_generator.exe [for Windows], data_generator [for Linux]
Downloadable
  - MSBNX
  - WEKA
You should write your own code for comparing Bayesian network structures.
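
One possible starting point for the structure comparison is to treat each network as a set of directed edges and count the differences. The sketch below is an assumption about how such a comparison could be done, not a prescribed method. The function name compare_structures, the edge-list representation, and the choice to count a reversed edge as a single error are all hypothetical, and reading edges out of *.dsc or WEKA output is not covered.

```python
def compare_structures(true_edges, learned_edges):
    """Compare two directed graphs given as iterables of (parent, child) pairs.

    Returns counts of correct, reversed, extra, and missing edges, plus a
    structural-Hamming-style distance in which a reversed edge counts once.
    """
    true_set = set(true_edges)
    learned_set = set(learned_edges)

    # Edges learned with the right direction.
    correct = true_set & learned_set
    # Edges learned between the right pair of nodes but pointing the wrong way.
    reversed_edges = {
        (a, b) for (a, b) in learned_set
        if (b, a) in true_set and (a, b) not in true_set
    }
    # Learned edges with no counterpart in the original network.
    extra = learned_set - true_set - reversed_edges
    # Original edges absent from the learned network in either direction.
    missing = {
        (a, b) for (a, b) in true_set
        if (a, b) not in learned_set and (b, a) not in learned_set
    }
    return {
        "correct": len(correct),
        "reversed": len(reversed_edges),
        "extra": len(extra),
        "missing": len(missing),
        "distance": len(reversed_edges) + len(extra) + len(missing),
    }


# Original structure taken from the data generation slide.
original = [("X1", "X3"), ("X1", "X4"), ("X2", "X4"), ("X3", "X5"), ("X4", "X6")]
# A made-up learned structure: one reversed, one extra, one missing edge.
learned = [("X1", "X3"), ("X4", "X1"), ("X2", "X4"), ("X3", "X5"), ("X2", "X6")]

result = compare_structures(original, learned)
```

Reporting the four counts separately, and not only the total, makes it easier to see how errors change with network size and data size.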

Study
Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35,
  - leukemia patients
  - Bone marrow samples
  - Affymetrix GeneChip arrays
  - Gene expression data

Gene Expression Data
  - # of data examples: 120 (60: before treatment, 60: after treatment)
  - # of genes measured: (Affymetrix HG-U95A array)
  - Task: classification between “before treatment” and “after treatment” based on gene expression pattern

Affymetrix GeneChip Arrays
  - Use short oligos to detect gene expression level.
  - Each gene is probed by a set of short oligos.
  - Each gene expression level is summarized by
      - Signal: numerical value describing the abundance of mRNA
      - A/P call: denotes the statistical significance of signal

Preprocessing
  - Remove the genes having more than 60 ‘A’ calls
      - # of genes remaining: 3190
  - Discretization of gene expression level
      - Criterion: median gene expression value of each sample
      - 0 (low) and 1 (high)
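
The two preprocessing steps can be sketched as below. The provided dataset is already preprocessed, so this is only an illustration on toy data. The function names, the genes-by-samples list layout, and the use of a strict "greater than the median" test for the high state are assumptions, since the slides do not specify them.

```python
from statistics import median


def filter_by_absent_calls(calls, max_absent=60):
    """Return indices of genes that have at most max_absent 'A' calls.

    calls[g][s] is the A/P call ('A' or 'P') of gene g in sample s.
    """
    return [
        g for g, row in enumerate(calls)
        if sum(c == "A" for c in row) <= max_absent
    ]


def discretize_by_sample_median(signal):
    """Binarize signal[g][s] using the median over genes within each sample.

    ASSUMPTION: values strictly above the sample median map to 1 (high),
    all others to 0 (low).
    """
    n_samples = len(signal[0])
    thresholds = [median(row[s] for row in signal) for s in range(n_samples)]
    return [
        [1 if row[s] > thresholds[s] else 0 for s in range(n_samples)]
        for row in signal
    ]


# Toy data: 3 genes x 4 samples (made-up values).
toy_calls = [
    ["P", "P", "A", "P"],
    ["A", "A", "A", "P"],
    ["P", "A", "A", "P"],
]
toy_signal = [
    [10, 200, 30, 5],
    [50, 20, 60, 7],
    [90, 80, 10, 9],
]

kept = filter_by_absent_calls(toy_calls, max_absent=2)
binary = discretize_by_sample_median(toy_signal)
```

In the project setting the threshold of 60 'A' calls corresponds to half of the 120 samples; the toy example uses max_absent=2 for its 4 samples.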

Gene Filtering
  - Using mutual information
      - Estimated probabilities were used.
      - # of genes: 3190 -> 1000
  - Final dataset
      - # of attributes: 1001 (one for the class)
      - Class: 0 (after treatment), 1 (before treatment)
      - # of data examples: 120
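
A minimal sketch of mutual-information filtering, assuming probabilities are estimated as plain relative frequencies and that genes are ranked by their mutual information with the class label. The slides do not give the exact estimator or tie-breaking rule, so mutual_information and top_k_genes are hypothetical helpers and the data is a toy example.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences,
    with probabilities estimated as relative frequencies."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def top_k_genes(genes, labels, k):
    """Indices of the k genes with the highest mutual information with labels.

    genes[g] is the discretized expression of gene g across all samples.
    Ties are broken by the lower gene index (an arbitrary choice).
    """
    scored = [(mutual_information(g, labels), i) for i, g in enumerate(genes)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Toy data: 3 binary genes over 4 samples, with class labels.
toy_labels = [0, 0, 1, 1]
toy_genes = [
    [0, 0, 1, 1],  # perfectly informative about the class
    [0, 1, 0, 1],  # independent of the class
    [1, 1, 0, 1],  # partially informative
]

selected = top_k_genes(toy_genes, toy_labels, k=2)
```

In the project the same ranking would be applied to the 3190 discretized genes with k = 1000.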

Final Dataset

Materials for the Second One
Given
  - Preprocessed microarray data file: data2.txt
Downloadable
  - WEKA

Due: June 16, 2005

Analysis of the influence of network size and data size on structural learning of Bayesian networks
  - Six Bayesian networks of various sizes are given.
  - Generate data examples from each Bayesian network.
  - Learn Bayesian network structures from the generated data.
  - Analyze the learned results according to the network size and the data size.

Classification using Bayesian networks
  - A microarray dataset consisting of two classes of samples is given.
  - Learn Bayesian network classifiers from the dataset.
  - Compare the classification accuracy of Bayesian network classifiers with that of other classifiers such as neural networks.