In Search of the Optimal Set of Indicators when Classifying Histopathological Images
Catalin Stoean, University of Craiova, Romania
Catalin.Stoean@inf.ucv.ro

Histopathological images
Histology is the study of the microscopic anatomy of the cells and tissues of organisms; histological analysis is the examination of a thin slice of tissue under the microscope.
Histopathology is the microscopic examination of biopsies in order to locate and classify disease. The clinical diagnosis of cancer and the identification of the malignancy level rest on the study of histological images.
Pathologists search for regularities in cell shapes and for changes in the distribution of cells across the tissue. Based on their expertise and personal experience, they decide whether the examined tissue regions are cancerous and also determine the malignancy level.
80% of the 1 million prostate biopsies performed yearly in the U.S. are benign, so a pathologist spends only 20% of the time grading malignant tissues. The distribution is not as favorable in the colorectal cancer cases, but a significant amount of the pathologists' work could still be saved.

Histopathological images
The judgment of pathologists, although educated and often based on vast experience, is subjective and leads to considerable variability:
– variation in the classification of the same image by the same pathologist at different times (intra-observer);
– variation in the classification of the same image by different pathologists (inter-observer).
Quantitative image-based evaluation is necessary for reducing or eliminating intra- and inter-observer variation in diagnosis.

Histological images – healthy tissues, 10x

Histological images – grade 1, 10x

Histological images – grade 2, 10x

Histological images – grade 3, 10x

Normal tissue Grade 1 Grade 2 Grade 3

The main computational steps on histopathological images
The tissue sample is first dissected and fixed to preserve its structure. It is then dehydrated, cleared and embedded. Next, it is sectioned into very thin slices (e.g. 5 µm), and the sections are mounted on glass slides for staining with hematoxylin and eosin.

Feature extraction
Both glands and nuclei are considered for the information retrieval process.
For glands†: thresholding, watershed, erosion, dilation.
For nuclei: grayscale -> normalized box filtering -> thresholding -> distance transform -> image normalization -> binary image.
† ACM GECCO 2015, Madrid
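The nuclei chain above can be sketched with scipy's image tools; this is a minimal illustration of the listed steps, not the talk's implementation, and the box size and threshold values are assumed for the toy image:

```python
import numpy as np
from scipy import ndimage

def nuclei_mask(gray, box_size=5, thresh=0.5):
    """Sketch of the nuclei pipeline: grayscale -> normalized box filter
    -> thresholding -> distance transform -> normalization -> binary image.
    box_size and thresh are illustrative values, not those of the talk."""
    # Normalized box (mean) filter smooths staining noise.
    smooth = ndimage.uniform_filter(gray.astype(float), size=box_size)
    # Nuclei are darker than the background after H&E staining,
    # so thresholding keeps the pixels below the cut-off.
    binary = smooth < thresh
    # Distance transform: each foreground pixel gets its distance
    # to the nearest background pixel (peaks mark nucleus centers).
    dist = ndimage.distance_transform_edt(binary)
    # Normalize to [0, 1] and re-binarize to keep only the cores.
    norm = dist / dist.max() if dist.max() > 0 else dist
    return norm > 0.5

# Toy image: two dark blobs on a bright background.
img = np.ones((40, 40))
img[5:15, 5:15] = 0.2
img[25:35, 25:35] = 0.2
mask = nuclei_mask(img)
labels, n = ndimage.label(mask)
print(n)  # two nucleus cores detected
```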

Feature extraction
Both glands and nuclei are considered for the information retrieval process.
The identified structures (glands, nuclei) are counted.
Delaunay triangles and Voronoi diagrams are obtained from the central points of the contours enclosing the detected structures.
For each structure, the area, the perimeter and the radius of the enclosing circle are computed.
For each measure, the mean, median, minimum/maximum and standard deviation are calculated.
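As a small sketch of the topological step, assuming the structure centers are already available as 2-D points: build the Delaunay triangulation with scipy and summarize the triangle areas with the statistics listed above.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_areas(points):
    """Areas of all Delaunay triangles over the given 2-D points."""
    tri = Delaunay(points)
    a, b, c = (points[tri.simplices[:, i]] for i in range(3))
    # Shoelace formula for the area of each triangle.
    return 0.5 * np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                        - (c[:, 0] - a[:, 0]) * (b[:, 1] - a[:, 1]))

def summary_stats(values):
    """The five statistics used per measure in the talk."""
    return {"mean": np.mean(values), "median": np.median(values),
            "std": np.std(values), "min": np.min(values), "max": np.max(values)}

# Toy centers: the corners of a unit square give two triangles of area 0.5.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
areas = triangle_areas(pts)
stats = summary_stats(areas)
print(len(areas), stats["mean"])  # 2 triangles, mean area 0.5
```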

76 features extracted

Feature       | Measures                                                   | Statistics                                   | Total
Morphological | Area, perimeter, radius for glands                         | Average, median, standard deviation, min/max | 12
Morphological | Area, perimeter, radius for nuclei                         | Average, median, standard deviation, min/max | 12
Morphological | Number of glands and of nuclei                             | -                                            | 2
Topological   | Area, perimeter, radius of the Delaunay triangles (glands) | Average, median, standard deviation, min/max | 12
Topological   | Area, perimeter, radius of the Voronoi polygons (glands)   | Average, median, standard deviation, min/max | 12
Topological   | Area, perimeter, radius of the Delaunay triangles (nuclei) | Average, median, standard deviation, min/max | 12
Topological   | Area, perimeter, radius of the Voronoi polygons (nuclei)   | Average, median, standard deviation, min/max | 12
Topological   | Number of Delaunay triangles for glands and nuclei         | -                                            | 2

Data set & Classification
The data set† contains 357 samples described by 76 indicators; all images are 800x600 pixels.
On these data, support vector machines with a linear kernel proved better than the radial one.
Previously*, PCA was also applied: it reduced the number of attributes from 76 to only 13, as this was the number of components required to explain at least 95% of the variance. On the PCA-reduced data, the radial SVM was better than the linear one (by 4%).
Training data: 2/3 of the entire data, 30 repeated runs.
SVM: 79.89%
SVM + PCA: 74.85%
* C. Stoean et al., SVM-Based Cancer Grading from Histopathological Images using Morphological and Topological Features of Glands and Nuclei, Intelligent Interactive Multimedia Systems and Services, 55, Smart Innovation, Systems and Technologies, Springer, pp. 145-155, 2016.
† Dataset available for download at https://sites.google.com/site/imediatreat/
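The "components required to explain at least 95% of the variance" criterion can be illustrated with plain numpy; the synthetic low-rank data below is only a stand-in for the real 76-feature set (where the criterion gave 13 components):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 357, 76
# Synthetic data with 10 latent factors plus small noise,
# mimicking the shape of the real feature matrix.
latent = rng.normal(size=(n_samples, 10))
mixing = rng.normal(size=(10, n_features))
X = latent @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

Xc = X - X.mean(axis=0)                      # center the features
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / np.sum(s ** 2)        # variance ratio per component
# Smallest number of components whose cumulative ratio reaches 0.95.
n_components = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
print(n_components)  # at most the 10 latent factors for this toy data
```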

Feature selection
Two filter methods are tried (via the FSelector R package):
– consistency-based filter
– correlation-based feature selection (CFS)
One wrapper method – a GA (via the genalg R package):
– binary representation, individual size: 76
– fitness function: SVM with linear/radial kernel, 2/3 training and 1/3 test data, 30 repeats
GA parameter tuning: the choices comprised a population size of 50 or 100, a number of iterations for the stop condition of 30, 40 or 50, and a mutation probability in {0.005, 0.01, 0.1}. During the tuning phase, the number of repeated SVM runs was reduced to 10 and the accuracy was computed as the average of the 10 classification accuracies.

Fitness    | Population size | Iterations | Mutation probability
SVM linear | 100             | 50         | 0.01
SVM radial | 100             | 50         | 0.1
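The GA wrapper can be sketched as follows. In the talk the fitness is the average SVM accuracy over repeated runs; to keep the example self-contained, a stand-in fitness rewards a hypothetical set of "informative" features instead of training a classifier, and the GA operators are a generic choice rather than genalg's exact ones.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, POP_SIZE, ITERATIONS, P_MUT = 76, 50, 40, 0.01
# Hypothetical ground truth standing in for "features the SVM finds useful".
informative = rng.choice(N_FEATURES, size=20, replace=False)

def fitness(chrom):
    hits = chrom[informative].sum()          # informative features selected
    noise = chrom.sum() - hits               # uninformative ones selected
    return hits - 0.5 * noise                # accuracy proxy, not a real SVM

# Binary chromosomes: bit i says whether feature i is kept.
pop = rng.integers(0, 2, size=(POP_SIZE, N_FEATURES))
for _ in range(ITERATIONS):
    scores = np.array([fitness(c) for c in pop])
    # Tournament selection: the better of two random individuals survives.
    i, j = rng.integers(POP_SIZE, size=(2, POP_SIZE))
    parents = pop[np.where(scores[i] > scores[j], i, j)]
    # One-point crossover between consecutive parents.
    cut = rng.integers(1, N_FEATURES, size=POP_SIZE)
    mask = np.arange(N_FEATURES) < cut[:, None]
    children = np.where(mask, parents, np.roll(parents, -1, axis=0))
    # Bit-flip mutation with the tuned probability.
    flip = rng.random(children.shape) < P_MUT
    pop = np.where(flip, 1 - children, children)

best = pop[np.argmax([fitness(c) for c in pop])]
print(best.sum())  # size of the selected feature subset
```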

Results
CFS selects only 3 features; consistency appoints 15 features.
GA & SVM linear: 20 features. GA & SVM radial: 24 features.

Classifier | GA     | CFS    | Consistency
SVM linear | 83.94% | 49.59% | 69.97%
SVM radial | 81.9%  | 53.38% | 77.08%

Most selected features by the GA
[Charts: SVM linear kernel vs. SVM radial kernel]

Most selected features
34 features are not selected by any of the methods.

Conclusions & Future Work
A data set of 357 histopathological images, 800x600 pixels, separated into 4 grades: https://sites.google.com/site/imediatreat/
Diagnosis accuracy over the four classes: 83.94%
Future work:
– more options in the computational steps remain to be tried (only thresholding was considered so far)
– further parameter tuning

Q&A