Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks From Nature Medicine 7(6) 2001 By Javed Khan et al.

Presentation transcript:

Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks From Nature Medicine 7(6) 2001 By Javed Khan et al. (Summarized by Marcílio Souto – ICMC/USP- São Carlos)

2 Abstract
- Small, round blue-cell tumors (SRBCTs)
- Four distinct categories that are hard to discriminate
- cDNA microarrays and artificial neural networks (ANNs)
- Tumor diagnosis and the identification of candidate targets for therapy

3 The Problem
- SRBCTs of childhood
  - Neuroblastoma (NB)
  - Rhabdomyosarcoma (RMS)
  - Non-Hodgkin lymphoma (NHL)
  - The Ewing family of tumors (EWS)
- All four categories look similar in routine histology
- Accurate diagnosis is essential
- In clinical practice
  - Immunohistochemistry: detection of protein expression
  - Reverse transcription-PCR: detection of tumor-specific translocations, such as EWS-FLI1 in EWS and PAX3-FKHR in alveolar RMS (ARMS)

4 The Approach
- Gene-expression profiling using cDNA microarrays
  - Simultaneous analysis of multiple markers
  - Multiple categorical distinctions
- Artificial neural networks (ANNs), with earlier clinical applications in
  - Diagnosing myocardial infarcts
  - Diagnosing arrhythmias from electrocardiograms
  - Interpreting radiographs
  - Interpreting magnetic resonance images

5 The Experiment
- cDNA microarrays with 6,567 genes
- 63 training examples
  - Tumor biopsy material
  - Cell lines
- Filtering for a minimal level of expression left 2,308 genes
- PCA further reduced the dimensionality: the 10 dominant principal components were used (63% of the variance in the data matrix)
- Three-fold cross-validation
- 3,750 ANNs were constructed (average vote)
- No overfitting and zero classification error on the training samples

6 Data Sets

Number of samples            Train and validation   Test
Cancer I (EWS)               23                     6
Cancer II (RMS)              20                     5
Cancer III (NB)              12                     6
Cancer IV (BL)               8                      3
Unlabeled (non-SRBCT)        0                      5
Total                        63                     25

7 The Schematic View of the Analysis Process

8 Data Analysis
- Initial cuts
- Principal components analysis
- Artificial neural network prediction
- Extraction of relevant genes

9 Data Analysis: Initial Cuts and PCA
- Initial cuts
  - A gene is omitted if, for any of the samples, its red intensity (ri) is less than 20
  - From 6,567 to 2,308 genes
- Principal components analysis (PCA)
  - Reduces the dimensionality of the data to 10 components: 2,308 genes become 10 inputs
  - This number (10) was found by means of preliminary experiments
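
The initial cut on this slide can be sketched in plain Python. This is only a minimal illustration: the function name `initial_cut`, the dict-of-lists data layout, and the toy gene names and values are hypothetical and not taken from the paper.

```python
def initial_cut(red_intensity, threshold=20):
    """Keep only genes whose red intensity is >= threshold in every sample.

    red_intensity: dict mapping gene name -> list of red intensities,
    one value per sample (hypothetical layout, for illustration only).
    """
    return {
        gene: values
        for gene, values in red_intensity.items()
        if min(values) >= threshold  # drop the gene if any sample is below the cut
    }


# Toy data: three made-up genes measured in four samples.
toy = {
    "geneA": [150, 80, 42, 21],
    "geneB": [300, 19, 55, 60],   # one sample below 20, so the gene is dropped
    "geneC": [20, 20, 25, 1000],  # exactly 20 is not "less than 20", so it is kept
}
kept = initial_cut(toy)
print(sorted(kept))
```

Applied to the real data, a filter of this kind is what takes the 6,567 genes on the array down to 2,308 before PCA.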

10 Data Analysis: Artificial Neural Network (1/3)
- Architecture and parameters
  - Linear perceptron (LP)
  - 10 inputs representing the PCA components
  - 4 output nodes, one for each class of tumor (EWS, BL, NB and RMS)
  - 44 free parameters, including four threshold units
- Calibration (training) was performed using JETNET
  - Learning rate η = 0.7; momentum = 0.3
  - Learning rate decreased after each epoch (factor 0.99)
  - Initial weights randomly chosen from [-r, r], with r = 0.1/F
  - Weights updated after every 10 epochs
  - At most 100 epochs
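
The figure of 44 free parameters follows from the architecture: 10 inputs times 4 outputs gives 40 weights, plus 4 thresholds. The sketch below checks that arithmetic and runs one forward pass. It is not JETNET code. Two things are assumptions of this sketch rather than statements from the slides: F is read as the fan-in of an output node (taken as 10), and the outputs are passed through a sigmoid.

```python
import math
import random

N_INPUTS = 10    # PCA components fed to the network
N_OUTPUTS = 4    # one output node per tumor class: EWS, BL, NB, RMS

# 10 x 4 connection weights plus one threshold per output node.
n_free = N_INPUTS * N_OUTPUTS + N_OUTPUTS
print(n_free)  # 44

# Assumption: F is read here as the fan-in of an output node (10 inputs).
r = 0.1 / N_INPUTS
rng = random.Random(0)
weights = [[rng.uniform(-r, r) for _ in range(N_INPUTS)] for _ in range(N_OUTPUTS)]
thresholds = [rng.uniform(-r, r) for _ in range(N_OUTPUTS)]


def forward(x):
    """One output per class; the sigmoid squashing is an assumption of this sketch."""
    outputs = []
    for w_row, t in zip(weights, thresholds):
        activation = sum(w * xi for w, xi in zip(w_row, x)) + t
        outputs.append(1.0 / (1.0 + math.exp(-activation)))
    return outputs


example_outputs = forward([0.5] * N_INPUTS)
```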

11 Data Analysis: Artificial Neural Network (2/3)
- Calibration and validation
  - 3-fold cross-validation
  - The 63 labeled samples are randomly shuffled and split into 3 equally sized groups
  - The network is trained with two of these groups and the third is used for validation
  - This procedure is repeated 3 times, so that each group is used once for validation
  - The random shuffling is redone 1,250 times, giving 3,750 networks
- For validation, the output for a sample is the average over the 1,250 networks that held it out (the committee)
- For test samples, the committee is formed with all 3,750 networks
  - 25 samples in the test set
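
The bookkeeping behind the 3,750 networks can be sketched as follows. The code only generates the training and validation index sets and counts them; no network is trained. The function names and the fixed seed are hypothetical choices made for this illustration.

```python
import random


def three_fold_splits(n_samples, rng):
    """Shuffle the sample indices and cut them into 3 equally sized groups."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    fold_size = n_samples // 3  # 63 samples divide evenly into groups of 21
    return [indices[i * fold_size:(i + 1) * fold_size] for i in range(3)]


def cv_schedule(n_samples=63, n_shuffles=1250, seed=0):
    """Yield (training, validation) index lists, one pair per network."""
    rng = random.Random(seed)
    for _ in range(n_shuffles):
        folds = three_fold_splits(n_samples, rng)
        for k in range(3):
            validation = folds[k]
            training = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield training, validation


n_networks = 0
validation_count = [0] * 63
for training, validation in cv_schedule():
    n_networks += 1
    for i in validation:
        validation_count[i] += 1

print(n_networks)             # 3750
print(set(validation_count))  # {1250}
```

Each sample lands in the validation group exactly once per shuffle, which is why its validation committee has 1,250 members while the test committee uses all 3,750 networks.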

12 Data Analysis: Artificial Neural Network (3/3)
- Assessing the quality of classifications
  - Each sample is classified as belonging to the cancer type with the largest average committee vote
  - The system must be able to reject the second-largest vote, as well as samples that do not belong to any of the classes
  - A distance is defined from a sample to the ideal vote for each cancer type
  - Based on the validation set, an empirical distribution of this distance is generated for each type of cancer
  - For a given test sample, the system can reject a possible classification based on these probability distributions
- Note: the classification, as well as the extraction of important genes, converges using fewer than 100 networks
  - The only reason 3,750 networks were used is to have sufficient statistics for these empirical probability distributions
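
A minimal sketch of the vote, distance, and rejection logic is given below. Several details are assumptions made for illustration and are not stated on the slides: the distance is taken as half the squared Euclidean distance to the ideal vote (1 for the predicted class, 0 elsewhere), the 95th percentile is computed with a simple nearest-rank rule, and all function names and toy numbers are hypothetical.

```python
CLASSES = ["EWS", "BL", "NB", "RMS"]


def committee_vote(outputs_per_network):
    """Average the output vectors of all networks in the committee."""
    n = len(outputs_per_network)
    return [sum(o[i] for o in outputs_per_network) / n for i in range(len(CLASSES))]


def distance_to_ideal(vote, class_index):
    """Assumed distance: half the squared Euclidean distance to the ideal vote."""
    return 0.5 * sum(
        (v - (1.0 if i == class_index else 0.0)) ** 2 for i, v in enumerate(vote)
    )


def percentile_95(values):
    """Nearest-rank 95th percentile (one simple choice among several)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def diagnose(vote, validation_distances):
    """Return (predicted class, accepted?) for one committee vote."""
    best = max(range(len(CLASSES)), key=lambda i: vote[i])
    cutoff = percentile_95(validation_distances[CLASSES[best]])
    return CLASSES[best], distance_to_ideal(vote, best) <= cutoff


# Toy validation distances: 20 made-up values per class.
toy_distances = {name: [k / 100 for k in range(1, 21)] for name in CLASSES}

confident_vote = committee_vote([[0.9, 0.05, 0.1, 0.0], [0.8, 0.1, 0.0, 0.1]])
ambiguous_vote = committee_vote([[0.4, 0.3, 0.2, 0.3]])

confident_result = diagnose(confident_vote, toy_distances)
ambiguous_result = diagnose(ambiguous_vote, toy_distances)
```

In this toy setup the confident vote lies close to the ideal EWS vote and is accepted, while the ambiguous vote still has EWS as its largest component but lies too far from the ideal vote, so its diagnosis is rejected.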

13 Relevant Gene Extraction
- To select relevant genes, the authors proposed a sensitivity measure (S) of the outputs (o) with respect to each of the 2,308 input variables, summed over the samples and outputs
  - All 3,750 networks are involved
- They also proposed a related measure for a single output
- Thus, the genes can be ranked according to their importance for the total classification, and also according to their importance for each disease separately
- They explored sets of 6, 12, 24, 48, 96, 192, 384, 768 and 1,536 genes
  - For each choice, training (calibration) was redone
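
The idea of ranking inputs by output sensitivity can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: derivatives are estimated by central finite differences, their absolute values are summed over samples and outputs, and a single toy linear model stands in for the committee of 3,750 networks. The names `sensitivities` and `toy_model` and all numbers are hypothetical.

```python
def sensitivities(model, samples, h=1e-4):
    """Sum |d o_i / d x_k| over all samples and outputs, for each input k.

    model: function mapping an input list to a list of outputs.
    Derivatives are estimated with central finite differences.
    """
    n_inputs = len(samples[0])
    totals = [0.0] * n_inputs
    for x in samples:
        for k in range(n_inputs):
            up = list(x)
            up[k] += h
            down = list(x)
            down[k] -= h
            for o_up, o_down in zip(model(up), model(down)):
                totals[k] += abs((o_up - o_down) / (2 * h))
    return totals


# Toy linear model with 3 inputs ("genes") and 2 outputs.
TOY_WEIGHTS = [[1.0, 0.0, -2.0],
               [0.5, 0.0, 0.5]]


def toy_model(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in TOY_WEIGHTS]


toy_samples = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
S = sensitivities(toy_model, toy_samples)

# Rank inputs from most to least sensitive.
ranking = sorted(range(len(S)), key=lambda k: S[k], reverse=True)
print(ranking)
```

For a linear model the derivative with respect to input k is just the weight, so input 1, whose weights are all zero, ranks last. Restricting the sum to one output would give the per-disease ranking mentioned on the slide.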

14 Summed Square Error Graph

15 Optimization of Genes Utilized for Classification
- Using the 3,750 trained models, rank all genes according to their significance for the classification
- Determine the classification error rate using an increasing number of these ranked genes

16 Recalibrating the ANNs
- Using only 96 genes, the analysis process was repeated
- Zero classification error

17 Diagnostic Classification
- 25 test examples (5 non-SRBCTs)
- If a sample falls outside the 95th percentile of the probability distribution of distances between samples and their ideal output, its diagnosis is rejected

18 Multi-Dimensional Scaling (MDS) Using 96 genes

19 Hierarchical Clustering of 96 Genes
- 93 unique genes (3 IGF2 and 2 MYC)
- 13 ESTs
- 41 genes have not been reported as associated with these diseases
- Perfect clustering of the four categories

20 Expression of FGFR4 on SRBCT Tissue Array
- At the protein level: immunohistochemistry on SRBCT tissue arrays for the expression of fibroblast growth factor receptor 4 (FGFR4)
- FGFR4
  - Expressed during myogenesis (not in adult muscle)
  - Potential role in tumor growth
  - Prevention of terminal differentiation in muscle
- Strong cytoplasmic immunostaining for FGFR4 was seen in all 26 RMSs tested

21 Discussion
- Current diagnoses of tumors rely on histology (morphology) and immunohistochemistry (protein expression)
- Using cDNA microarrays
  - Multiple markers (robust)
  - Reveal the underlying genetic aberrations or biological processes
- Tumors and cell lines
  - Cell lines for ANN calibration

22 Reference
- J. Khan et al., "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks", Nature Medicine, Vol. 7, Number 6, June 2001, and the references therein.
- Analysis Methods Supplement for Nature Medicine, Vol. 7, Number 6, June
- M. Ringner, C. Peterson and J. Khan, "Analyzing array data using supervised methods", Pharmacogenomics, Vol. 3, Number 3
- NIH News Release: "Gene Chips Accurately Diagnose Four Complex Childhood Cancers: Artificial Intelligence Used With Gene Expression Microarrays for the First Time"