Prediction model building and feature selection with SVM in breast cancer diagnosis Cheng-Lung Huang, Hung-Chang Liao, Mu- Chen Chen Expert Systems with.

Slides:



Advertisements
Similar presentations
AIME03, Oct 21, 2003 Classification of Ovarian Tumors Using Bayesian Least Squares Support Vector Machines C. Lu 1, T. Van Gestel 1, J. A. K. Suykens.
Advertisements

ECG Signal processing (2)
Evolutionary Neural Logic Networks for Breast Cancer Diagnosis A.Tsakonas 1, G. Dounias 2, E.Panourgias 3, G.Panagi 4 1 Aristotle University of Thessaloniki,
My name is Dustin Boswell and I will be presenting: Ensemble Methods in Machine Learning by Thomas G. Dietterich Oregon State University, Corvallis, Oregon.
CHAPTER 10: Linear Discrimination
Support Vector Machines
Particle swarm optimization for parameter determination and feature selection of support vector machines Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen,
COMPUTER AIDED DIAGNOSIS: FEATURE SELECTION Prof. Yasser Mostafa Kadah –
Face Recognition & Biometric Systems Support Vector Machines (part 2)
Gene selection using Random Voronoi Ensembles Stefano Rovetta Department of Computer and Information Sciences, University of Genoa, Italy Francesco masulli.
Robust Multi-Kernel Classification of Uncertain and Imbalanced Data
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
4 th NETTAB Workshop Camerino, 5 th -7 th September 2004 Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini
Reduced Support Vector Machine
Introduction to Hypothesis Testing CJ 526 Statistical Analysis in Criminal Justice.
Bioinformatics Challenge  Learning in very high dimensions with very few samples  Acute leukemia dataset: 7129 # of gene vs. 72 samples  Colon cancer.
1 Automated Feature Abstraction of the fMRI Signal using Neural Network Clustering Techniques Stefan Niculescu and Tom Mitchell Siemens Medical Solutions,
CIBB-WIRN 2004 Perugia, 14 th -17 th September 2004 Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini Feature.
1 Diagnosing Breast Cancer with Ensemble Strategies for a Medical Diagnostic Decision Support System David West East Carolina University Paul Mangiameli.
Statistical Comparison of Two Learning Algorithms Presented by: Payam Refaeilzadeh.
Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.
A Multivariate Biomarker for Parkinson’s Disease M. Coakley, G. Crocetti, P. Dressner, W. Kellum, T. Lamin The Michael L. Gargano 12 th Annual Research.
Predicting Income from Census Data using Multiple Classifiers Presented By: Arghya Kusum Das Arnab Ganguly Manohar Karki Saikat Basu Subhajit Sidhanta.
Efficient Model Selection for Support Vector Machines
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
Fuzzy Entropy based feature selection for classification of hyperspectral data Mahesh Pal Department of Civil Engineering National Institute of Technology.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting Huang, C. L. & Tsai, C. Y. Expert Systems with Applications 2008.
EMBC2001 Using Artificial Neural Networks to Predict Malignancy of Ovarian Tumors C. Lu 1, J. De Brabanter 1, S. Van Huffel 1, I. Vergote 2, D. Timmerman.
Classification of microarray samples Tim Beißbarth Mini-Group Meeting
Breast Cancer Diagnosis via Neural Network Classification Jing Jiang May 10, 2000.
Evolutionary Algorithms for Finding Optimal Gene Sets in Micro array Prediction. J. M. Deutsch Presented by: Shruti Sharma.
A Simulated-annealing-based Approach for Simultaneous Parameter Optimization and Feature Selection of Back-Propagation Networks (BPN) Shih-Wei Lin, Tsung-Yuan.
Bing LiuCS Department, UIC1 Chapter 8: Semi-supervised learning.
컴퓨터 과학부 김명재.  Introduction  Data Preprocessing  Model Selection  Experiments.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Support Vector Machines and Gene Function Prediction Brown et al PNAS. CS 466 Saurabh Sinha.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
Data Mining and Decision Support
Cheng-Lung Huang Mu-Chen Chen Chieh-Jen Wang
Combining Evolutionary Information Extracted From Frequency Profiles With Sequence-based Kernels For Protein Remote Homology Detection Name: ZhuFangzhi.
Final Report (30% final score) Bin Liu, PhD, Associate Professor.
Blackbox classifiers for preoperative discrimination between malignant and benign ovarian tumors C. Lu 1, T. Van Gestel 1, J. A. K. Suykens 1, S. Van Huffel.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Prediction model building and feature selection with support.
A distributed PSO – SVM hybrid system with feature selection and parameter optimization Cheng-Lung Huang & Jian-Fan Dun Soft Computing 2008.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss Pedro Domingos, Michael Pazzani Presented by Lu Ren Oct. 1, 2007.
Classification of Breast Cancer Cells Using Artificial Neural Networks and Support Vector Machines Emmanuel Contreras Guzman.
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
Constructing a Predictor to Identify Drug and Adverse Event Pairs
Mammogram Analysis – Tumor classification
An Artificial Intelligence Approach to Precision Oncology
Rule Induction for Classification Using
Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi
Project 4: Facial Image Analysis with Support Vector Machines
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
Support Vector Machines (SVM)
Introduction Feature Extraction Discussions Conclusions Results
An Introduction to Support Vector Machines
Shih-Wei Lin, Kuo-Ching Ying, Shih-Chieh Chen, Zne-Jung Lee
Somi Jacob and Christian Bach
Model generalization Brief summary of methods
Feature Selection Methods
CAMCOS Report Day December 9th, 2015 San Jose State University
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Presentation transcript:

Prediction model building and feature selection with SVM in breast cancer diagnosis Cheng-Lung Huang, Hung-Chang Liao, Mu- Chen Chen Expert Systems with Applications 2008

Introduction  Breast cancer is a serious problem for the young women of Taiwan.  Almost 64.1% of women with breast cancer are diagnosed before the age of 50 and 29.3% of women with breast cancer are diagnosed before the age of 40.  However, the causes are still unknown.

Introduction  This study (Ziegler et al., 1993) shows that fibroadenoma shared some risk factors with breast cancer.  HSV-1 (herpes simplex virus type 1)  EBV (Epstein-Barr virus)  CMV (cytomegalovirus)  HPV (human papillomavirus)  HHV-8 (human herpesvirus-8)

Introduction  DNA viruses, as causes, are closely related to the human cancers as part of the high-risk factors.  In order to obtain the relationship between DNA viruses and breast tumors.  This paper uses the support vector machines (SVM) to find the pertinent bioinformatics.

Two Important Challenge  When using SVM, two problems are confronted:  How to choose the optimal input feature subset for SVM.  How to set the best kernel parameters.  These two problems are crucial because the feature subset choice influences the appropriate kernel parameters and vice versa.

Feature Selection  Feature selection is an important issue in building classification systems.  It is advantageous to limit the number of input features in a classifier in order to have a good predictive and less computationally intensive model.  This study tried F-score calculation to select input features.

F-Score

F-Score Algorithm

Parameters Optimization  To design a SVM, one must choose a kernel function,set the kernel parameters and determine a soft margin constant C.  The grid algorithm is an alternative to finding the best C and gamma when using the RBF kernel function.  This study tried grid search to find the best SVM model parameters.

Grid-Search Algorithm

Data collection  The source of 80 data points (tissue samples)  52 specimens of non-familial invasive ductal breast cancer.  28 mammary fibroadenomas.  (From Chung-Shan Medical University Hospital )

Data partition  Data set is further randomly partitioned into training and independent testing sets via a stratified 5-fold cross validation.

SVM-based optimize parameters and feature selection

The relative feature importance with F-score

The relative importance of DNA virus based on the F-score

The five feature subsets based on the F-score

Overall training and testing accuracy for each feature subset

Type I and type II errors  Type I errors (the "false positive"): the error of rejecting the null hypothesis given that it is actually true  Type II errors (the "false negative"): the error of failing to reject the null hypothesis given that the alternative hypothesis is actually true

Detail testing accuracy for feature subset of size 2 and 3

Linear discriminate analysis (LDA)  Originally developed in 1936 by R.A. Fisher, Discriminate Analysis is a classic method of classification.  Discriminate analysis can be used only for classification  Linear discriminant analysis finds a linear transformation ("discriminant function") of the two predictors, X and Y, that yields a new set of transformed values that provides a more accurate discrimination than either predictor alone:  Transformed Target = C1*X + C2*Y

The P-level of each attribute for LDA Selection criteria: P-level value < 0.05

Training and testing accuracy for LDA

Comparison summary between SVM and LDA

Conclusion  In order to find the correlation DNA viruses with breast tumor, and to achieve a high classificatory accuracy.  F-score is adapted to find the important features.  grid search approach is used to search the optimal SVM parameters.  The results revealed that the SVM-based model has good performance in diagnosing breast cancer according to our data set.

Conclusion  The present study’s results also show that the attributes{HSV-1, HHV-8} or {HSV-1, HHV-8, CMV} can achieve identical high accuracy, at 86% of average overall hit rate.  This study suggests simultaneously considering HSV-1 and HHV-8 is feasible; however, only considering HHV-8 or HSV-1 is less accurate.

Future Work  The practical obstacle of the SVM-based (as well as neural networks) classification model is its black-box nature.  A possible solution for this issue is the use of SVM rule extraction techniques or the use of hybrid-SVM model combined with other more interpretable models.

Thank You