Classification and Diagnostic Prediction of Cancers using Gene Expression Profiling and Artificial Neural Networks
Javed Khan et al. – Nature Medicine – Volume 7 – Number 6 – June 2001

The Small, Round Blue Cell Tumors (SRBCTs) of Childhood. Four categories: neuroblastoma (NB), rhabdomyosarcoma (RMS), Burkitt lymphoma (BL, a subset of non-Hodgkin lymphoma) and the Ewing family of tumors (EWS). All are similar in appearance on routine histology, yet accurate diagnosis is essential because treatment options and response to therapy vary by category. No single test can precisely distinguish SRBCTs; diagnosis currently relies on a combination of immunohistochemistry, cytogenetics, interphase fluorescence in situ hybridization and reverse transcription-PCR.

Gene Expression Profiling using cDNA Microarrays. Microarrays measure the activities of several thousand genes simultaneously and can be used for cancer classification. Diagnosing cancer types with improved accuracy should lead to better therapeutic decisions for cancer patients, particularly for cancers spanning several diagnostic categories, such as the SRBCTs.

Artificial Neural Networks (ANNs) put to the task. ANNs are modeled on the structure and behavior of neurons in the human brain and can be trained to recognize and categorize complex patterns. Pattern recognition is achieved by adjusting the weights of the ANN through a process of error minimization, i.e., by learning from experience. Here, ANNs were applied to decipher the gene-expression signatures of SRBCTs and were then used for diagnostic classification.

Error Minimization. Calibration minimizes the discrepancy between the target outputs t_n and the network outputs o_n, measured either by the mean squared error, $E = \frac{1}{N}\sum_{n=1}^{N}(t_n - o_n)^2$, or by the summed square error, $E = \frac{1}{2}\sum_{n}(t_n - o_n)^2$.

Network Architecture and Parameters. Due to the limited amount of calibration data and the need for four output nodes, the network architecture was restricted to linear perceptrons: 10 input nodes, representing the 10 PCA components described later, and 4 output nodes modeled by the sigmoid function. Calibration is performed using JETNET with learning rate η = 0.7 and momentum coefficient α = 0.3; the learning rate is decreased by a factor of 0.99 after each iteration. Initial weight values are chosen randomly from [-r, r], where r = 0.1/max(F_i) and F_i is the number of nodes connecting to node i. Weight values are updated after every 10 samples.
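To make the setup concrete, here is a minimal NumPy sketch of the architecture just described: a single weight layer mapping the 10 PCA inputs to 4 sigmoid outputs, using the slide's initialization scheme. This is an illustrative reconstruction, not the original JETNET (Fortran) code; the function names are ours.

```python
import numpy as np

# Illustrative sketch of the slide's setup; not the original JETNET code.
rng = np.random.default_rng(0)

n_inputs, n_outputs = 10, 4          # 10 PCA components in, 4 cancer types out
eta, alpha, decay = 0.7, 0.3, 0.99   # learning rate, momentum, per-epoch decay

# Each output node connects to the 10 inputs plus a bias, so F_i = 11 and the
# initial weights are drawn uniformly from [-r, r] with r = 0.1 / max(F_i).
fan_in = n_inputs + 1
r = 0.1 / fan_in
W = rng.uniform(-r, r, size=(n_outputs, fan_in))

def forward(X, W):
    """Map (batch, 10) PCA inputs to (batch, 4) sigmoid outputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias input
    return 1.0 / (1.0 + np.exp(-(Xb @ W.T)))
```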

Back-propagation. Minimizing the error by gradient descent is the least sophisticated but nevertheless in many cases a sufficient method. It amounts to updating the weights according to the back-propagation learning rule, the delta rule: $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$. The partial derivative ∂E/∂w represents a sensitivity factor, determining the direction of search in weight space for the synaptic weights.

….continued. A momentum term is often added to stabilize the learning: $\Delta w(t+1) = -\eta \frac{\partial E}{\partial w} + \alpha\,\Delta w(t)$, where α < 1.
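For the summed square error and sigmoid outputs above, the delta rule with momentum reduces to the following update, shown as a hedged NumPy sketch (the slide says weights are updated after every 10 samples, so X would be a 10-sample mini-batch):

```python
import numpy as np

def update(W, dW_prev, X, T, eta=0.7, alpha=0.3):
    """One delta-rule step with momentum for the single-layer sigmoid net.
    Sketch only; shapes follow the perceptron sketched earlier."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    O = 1.0 / (1.0 + np.exp(-(Xb @ W.T)))      # outputs, shape (batch, 4)
    # For E = 1/2 * sum (t - o)^2, dE/dnet = (o - t) * o * (1 - o):
    delta = (O - T) * O * (1.0 - O)
    grad = delta.T @ Xb                        # dE/dW, shape (4, 11)
    dW = -eta * grad + alpha * dW_prev         # delta rule + momentum term
    return W + dW, dW
```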

Calibration and Validation of the ANN Models. cDNA microarrays containing 6567 genes were used. The 63 training samples comprised 13 EWS and 10 RMS from tumor biopsies, plus 10 EWS, 10 RMS, 12 NB and 8 BL from cell lines. The 25 test samples comprised 5 EWS, 5 RMS and 4 NB from tumors, plus 1 EWS, 2 NB and 3 BL from cell lines, as well as 5 non-SRBCT samples (to test the ability to reject a diagnosis). Filtering for a minimal level of expression reduced the genes to 2308. Principal Component Analysis (PCA) further reduced the dimensionality.
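A minimal sketch of the PCA step, assuming a 63 × 2308 expression matrix after filtering. The SVD-based projection is a standard way to obtain the dominant components; the paper does not specify its implementation, and the data below is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_reduce(X, n_components=10):
    """Project (n_samples, n_genes) expression data onto the top components."""
    Xc = X - X.mean(axis=0)                        # centre each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # (n_samples, n_components)

expression = rng.random((63, 2308))                # placeholder for real data
inputs = pca_reduce(expression)                    # (63, 10) ANN inputs
```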

….continued. The 10 dominant PCA components per sample were used as inputs, with four outputs (EWS, RMS, NB, BL). A three-fold cross-validation procedure was used and 3750 ANN models were produced (Figure 1). There was no sign of "over-training" of the models, which would show up as a rise in the summed square error for the validation set with increasing iterations (epochs); see Figure 2.

The Artificial Neural Network (Figure 1: quality filtering, PCA, ANN). The 25 test samples are set aside and the 63 training samples are randomly partitioned into 3 groups. One group is reserved for validation and the other two are used for calibration. For each model the calibration was optimized with 100 iterative cycles (epochs). This was repeated using each of the three groups for validation; the samples were then randomly partitioned again and the entire training process repeated. For each selection of a validation group one model was calibrated, resulting in a total of 3750 trained models. Once the models were calibrated, they were used to rank the genes according to their importance for classification, and the entire process was repeated using only the top-ranked genes.
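The resampling scheme can be sketched as follows: 1250 random three-fold partitions of the 63 training samples, one model calibrated per validation fold, giving 3750 models in total. Here train_model is a placeholder standing in for 100 epochs of the update rule sketched earlier, and targets would be the 63 × 4 matrix of ideal outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

models = []                          # (trained weights, validation indices)
for shuffle in range(1250):          # 1250 random partitions of 63 samples
    folds = np.array_split(rng.permutation(63), 3)
    for k in range(3):               # each fold serves once as validation
        calib = np.concatenate([folds[j] for j in range(3) if j != k])
        # train_model: placeholder for 100 epochs of the delta-rule update
        W = train_model(inputs[calib], targets[calib], epochs=100)
        models.append((W, folds[k]))
# len(models) == 3750; each sample sits in the validation fold of 1250 models.
```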

….continued: Validation. Each validation sample is passed through the 1250 models for which it sat in the validation group, producing 1250 predictions per sample. Each ANN model gives, for each cancer type, an output between 0 (not this cancer type) and 1 (this cancer type). The average of all model outputs for every validation sample is then computed (denoted the average committee vote), and each sample is classified as belonging to the cancer type with the largest committee vote. Using these ANN models, all 63 training samples were correctly classified to their respective categories.
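A sketch of the average committee vote, assuming the models list from the previous sketch and the forward function from the architecture sketch:

```python
import numpy as np

votes = np.zeros((63, 4))
counts = np.zeros(63)
for W, valid in models:                        # use validation-fold outputs only
    votes[valid] += forward(inputs[valid], W)
    counts[valid] += 1
avg_vote = votes / counts[:, None]             # average committee vote per sample
predicted = avg_vote.argmax(axis=1)            # largest vote wins
```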

Optimization of Genes used for Classification. The contribution of each gene to the classification by the ANN models was then assessed. Because there were relatively few samples, feature extraction was performed in a model-dependent way: the sensitivity of the classification to a change in the expression level of each gene was monitored, using the 3750 previously calibrated models.

The sensitivity S_k of the outputs (o) with respect to each of the 2308 input variables (x_k) is defined as
$$S_k = \frac{1}{N_s}\,\frac{1}{N_o}\sum_{s=1}^{N_s}\sum_{o=1}^{N_o}\left|\frac{\partial o}{\partial x_k}\right|,$$
where N_s is the number of samples (63) and N_o is the number of outputs (4). The procedure for computing S_k involves the committee of 3750 models.
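The trained models can be differentiated analytically; a simple substitute in a sketch is a central finite difference through the whole pipeline (the fixed PCA projection followed by the ANN), averaging the absolute output change over samples, outputs and committee models. pipeline_forward is a hypothetical helper mapping gene expression to the 4 outputs.

```python
import numpy as np

def sensitivity(models, X, k, h=1e-4):
    """Approximate S_k for gene k by central finite differences (sketch)."""
    Xp, Xm = X.copy(), X.copy()
    Xp[:, k] += h                              # perturb gene k upward
    Xm[:, k] -= h                              # and downward
    S = 0.0
    for W, _ in models:
        # pipeline_forward: hypothetical helper = fixed PCA projection + ANN
        dO = (pipeline_forward(Xp, W) - pipeline_forward(Xm, W)) / (2 * h)
        S += np.abs(dO).mean()                 # mean over N_s samples, N_o outputs
    return S / len(models)                     # average over the committee
```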

….continued. In this way genes were ranked according to their significance for classification, and the classification error rate was determined using increasing numbers of these ranked genes. The classification error rate reached its minimum of 0% at 96 genes. Using only these 96 genes, the ANN models were recalibrated and again all 63 samples were correctly classified.

Assessing the Quality of Classification: Diagnosis. The aim of diagnosis is to be able to reject test samples that do not belong to any of the four categories. To do this, a distance d_c from a sample to the ideal vote for each cancer type c was calculated:
$$d_c = \sqrt{\tfrac{1}{2}\sum_{i=1}^{4}\left(o_i - \delta_{i,c}\right)^2}$$

….continued. Here c is the cancer type, o_i is the average committee vote for cancer i, and δ_i,c is unity if i corresponds to cancer type c and zero otherwise. The distance is normalized such that the distance between two ideal samples belonging to different disease categories is unity. Based on the validation set, an empirical probability distribution of distances for each cancer type was generated. These distributions are built using each ANN model independently; thus the number of entries in each distribution is 1250 multiplied by the number of samples belonging to the cancer type.

….continued. For a given test sample it is thus possible to reject candidate classifications based on these probability distributions. For each disease category a cutoff distance from the ideal sample was defined, within which a sample of that category is expected to fall; the distance given by the 95th percentile of the probability distribution was chosen. This is the basis of diagnosis: a sample that falls outside the cutoff distance cannot be confidently diagnosed.
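Putting the distance and the percentile cutoff together, the rejection rule might look like this sketch, where cutoffs[c] is assumed to hold the 95th-percentile distance for category c, estimated from the validation samples:

```python
import numpy as np

def distance(avg_vote, c, n_outputs=4):
    """d_c: distance to the ideal vote for category c, normalised so two ideal
    samples of different categories are at distance 1."""
    ideal = np.eye(n_outputs)[c]               # delta_{i,c}
    return np.sqrt(0.5 * ((avg_vote - ideal) ** 2).sum())

def diagnose(avg_vote, cutoffs):
    c = int(avg_vote.argmax())                 # largest committee vote
    if distance(avg_vote, c) > cutoffs[c]:     # outside the 95th percentile
        return None                            # diagnosis rejected
    return c                                   # confident diagnosis
```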

Diagnostic Classification and Hierarchical Clustering. The diagnostic capabilities of all 3750 ANN models were then tested using the 25 blinded test samples. A sample is classified to a diagnostic category if it receives the highest vote for that category; because this classifier has only four possible outputs, every sample is classified to one of the four categories. However, if a sample falls outside the 95th percentile of the probability distribution of distances between samples and their ideal output (for EWS, for example, the ideal output is EWS = 1, RMS = NB = BL = 0), its diagnosis is rejected. Using the 3750 ANN models calibrated with the 96 genes, 100% classification accuracy was achieved for the 20 SRBCT test samples, and furthermore all 5 of the non-SRBCT samples were excluded from the four diagnostic categories, since they fell outside the 95th percentile.

….continued. Hierarchical clustering using the 96 genes identified by the ANN models correctly clustered all 20 of the test samples.