
1 Diagnosing Breast Cancer with Ensemble Strategies for a Medical Diagnostic Decision Support System David West East Carolina University Paul Mangiameli University of Rhode Island Rohit Rampal Portland State University

2 Introduction Breast cancer is one of the most prevalent illnesses among women over 40 Society must do all it can to reduce the frequency and severity of this disease Early and accurate detection is critical Our MDSS aids physicians in this task

3 Introduction Critical to the performance and acceptance of any MDSS is the selection of the underlying models This paper investigates the performance of MDSS comprised of: –single (individual) models and –ensembles (groups) of models How should ensembles be formed?

4 Overview of Presentation Discussion of Literature on Model Selection for MDSS Research Methodology –Breast cancer data sets –Model description Experimental Design Results of Generalized Classification Errors Greedy Model Selection Strategy Conclusion

5 Literature Review Most MDSS studies use single best model strategy –Parametric: Linear discriminant analysis Logistic regression –Non-parametric: K nearest neighbor Kernel density

6 Literature Review –Neural Networks Multilayer perceptron Radial Basis Function Mixture of experts –Classification and Regression Trees (CART)

7 Literature Review A very limited number of studies use ensembles of models The ensembles are comprised of one type of model – we call these baseline ensembles These ensembles usually use bagging (bootstrap aggregating) –Bootstrapping perturbs the training data so as to create diversity in the decision making of each model in the ensemble
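The bootstrap perturbation described above can be sketched in a few lines of Python. This is not code from the paper; `bootstrap_sample` is a name chosen here for illustration:

```python
import random

def bootstrap_sample(records, rng):
    """Draw a bootstrap replicate: sample len(records) items with
    replacement. On average a replicate omits about 36.8% of the
    original records, so learners trained on different replicates
    see different data and disagree more often."""
    return [rng.choice(records) for _ in records]

rng = random.Random(1)
train = list(range(100))
replicate = bootstrap_sample(train, rng)
```

Training one copy of the base model per replicate is what gives the baseline bagging ensemble its diversity.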

8 Research Methodology Breast Cancer Data Sets Model Descriptions Experimental Design

9 Breast Cancer Data Sets Cytology Data –699 Records of fine needle aspirates –Biopsied tumor mass is benign or malignant 458 benign and 241 malignant Prognostic Data –198 Records of invasive, non-metastasized, malignant tumors –Tumor is either recurrent or non-recurrent 151 non-recurrent and 47 recurrent

10 Model Description 24 Models in Total Linear Discriminant Analysis (LDA) - 1 Logistic Regression (LR) - 1 Nearest Neighbor Classifiers (KNN) - 4 Kernel Density (KD) - 4 Multilayer Perceptron (MLP) - 4 Mixture of Experts (MOE) - 4 Radial Basis Function (RBF) - 4 Classification and Regression Trees (CART) - 2

11 Experimental Design Single Best Model Generalization Error Split data into 3 partitions –Training set –Validation set –Independent test set - test set is randomly selected and equals 10% of data Partition remaining data into 10 mutually exclusive sets – ten fold cross validation One partition is the validation set

12 Experimental Design Single Best Model Generalization Error Collapse other 9 partitions and use as training set Train each of the 24 models and then compute error on the validation set Repeat 10 times using each partition as the validation set

13 Experimental Design Single Best Model Generalization Error Model with the lowest error across the 10 fold cross validation runs is called the single best model Test the single best model against the hold out test set and compute the error Repeat previous steps 100 times Determine Single Best Model generalized error over all 100 runs
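Slides 11-13 describe a standard cross-validated model-selection loop. A minimal Python sketch, under toy assumptions (the `fit` callables stand in for the paper's 24 model types; each takes training data and returns a predictor), might look like:

```python
import random
from statistics import mean

def cross_val_error(fit, data, labels, k, rng):
    """Mean validation error of one model family over k folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for f, val in enumerate(folds):
        # Collapse the other k-1 folds into the training set.
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        predict = fit([data[i] for i in train], [labels[i] for i in train])
        wrong = sum(predict(data[i]) != labels[i] for i in val)
        errors.append(wrong / len(val))
    return mean(errors)

def select_single_best(model_fits, data, labels, k=10, seed=0):
    """Score each model family by k-fold cross-validation and return
    the name of the family with the lowest mean validation error."""
    rng = random.Random(seed)
    scores = {name: cross_val_error(fit, data, labels, k, rng)
              for name, fit in model_fits.items()}
    return min(scores, key=scores.get), scores
```

The paper repeats this whole procedure 100 times with fresh random 10% test sets to estimate the single best model's generalized error.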

14 Experimental Design Baseline Bagging Ensembles Bagging means Bootstrap Aggregating Baseline means comprised of just one of the 24 model types There are 24 baseline bagging ensembles – one for each model

15 Experimental Design Baseline Bagging Ensembles Essentially, the same ten fold cross validation approach is used Perturb the training data (sample with replacement) for each of the 24 models in the ensemble –Creates 24 uniquely weighted models with the same general architecture within the ensemble Use majority vote to determine aggregate decisions for the test set Use 500 runs to determine mean generalization error for each ensemble
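The baseline bagging procedure above can be sketched as follows. This is an illustrative sketch, not the paper's code: the 1-nearest-neighbour base learner is a toy stand-in for any one of the 24 model types, and 25 members is an arbitrary choice:

```python
import random
from collections import Counter

def one_nn(train_X, train_y):
    """Toy base learner: 1-nearest-neighbour on scalar features."""
    def predict(q):
        i = min(range(len(train_X)), key=lambda j: abs(train_X[j] - q))
        return train_y[i]
    return predict

def bagged_predict(fit, train_X, train_y, test_X, n_members=25, seed=0):
    """Train n_members copies of one base learner, each on a bootstrap
    replicate of the training data, then aggregate test predictions
    by majority vote."""
    rng = random.Random(seed)
    n = len(train_X)
    members = []
    for _ in range(n_members):
        pick = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        members.append(fit([train_X[i] for i in pick],
                           [train_y[i] for i in pick]))
    return [Counter(m(x) for m in members).most_common(1)[0][0]
            for x in test_X]
```

Each member shares the base learner's architecture but is fit to a different bootstrap replicate, which is exactly the "uniquely weighted models with the same general architecture" idea on the slide.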

16 Experimental Design Diverse Bagging Ensembles Same as the baseline bagging procedure but now pre-select the models in the ensemble Of the 24 models, randomly choose: 24, 12, 8, 6, 4, 3, and 2 different models to make up the ensemble. Also, restrict the choice of models for the ensemble to the top 50% and top 25% of the single best models
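Selecting the model types for a diverse ensemble, with the optional top-50%/top-25% restriction, can be sketched as below. The function name and the rank-then-sample logic are illustrative assumptions, not taken from the paper:

```python
import random

def diverse_ensemble(cv_errors, m, top_frac=1.0, seed=0):
    """Choose m distinct model types at random for a diverse bagging
    ensemble. cv_errors maps model name -> single-model CV error.
    top_frac restricts the candidate pool to the best fraction of
    model types (e.g. 0.5 or 0.25), ranked by that error."""
    rng = random.Random(seed)
    ranked = sorted(cv_errors, key=cv_errors.get)      # best first
    pool = ranked[:max(m, round(len(ranked) * top_frac))]
    return rng.sample(pool, m)                          # m distinct types
```

With `top_frac=1.0` this gives the purely random diverse ensembles; lowering it to 0.5 or 0.25 reproduces the restricted variants described on the slide.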

17 Results Generalized Error Initial Results Design and Results of Greedy Algorithm

18 Initial Results Comparison of Generalized Error

Model Selection Strategy          | Prognostic Data | Cytology Data
Single Best Model                 |                 |
Baseline Bagging Ensemble         |                 |
Diverse Random Ensembles          |                 |
Top 50% Diverse Bagging Ensemble  |                 |
Top 25% Diverse Bagging Ensemble  |                 |

(numeric error values appeared on the slide but are not captured in the transcript)

19 Discussion of Initial Results The existing literature uses either the single best or baseline bagging model selection strategy By starting from a diverse initial pool of 24 models and then creating diverse ensembles comprised of the top 25% of the single best models, the generalized errors were significantly lowered

20 Some Observations about Initial Results A large number of single models were the best at least once over the 100 runs Baseline bagging ensembles did poorly on the cytology data compared to the single best model –We found near unanimous consensus (indicated by high error correlations) among models –Indicates that bootstrapping may not create model instability The best diverse ensembles were comprised of 3 different models and the worst all had more than 6

21 Development of Greedy Ensemble Algorithm Create ensembles with: Diversity of models (3 - 6 different models in the ensemble) Each model has a low generalization error in the single best model results Each model has a low error correlation level (i.e. high model instability) in the baseline bagging results

22 Model Selection for Greedy Algorithm Regressing generalized error as a function of model instability, we found a few models with high instability and low generalized error for each data set We chose 3 different models for the greedy algorithm ensemble Prognostic - RBFd, LR, CARTb Cytology – RBFc, RBFb, MOEa
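The slide screens models by regressing generalized error on instability; a simpler rank-sum stand-in for that screening (an assumption made here for illustration, not the paper's method) can be sketched as:

```python
def greedy_pick(stats, k=3):
    """stats maps model name -> (cv_error, error_correlation).
    Low error correlation across bootstrap replicates is read as high
    instability, which the baseline bagging results link to better
    ensembles. Rank models on each criterion separately and keep the
    k with the best combined rank-sum score."""
    by_err = sorted(stats, key=lambda n: stats[n][0])
    by_corr = sorted(stats, key=lambda n: stats[n][1])
    score = {n: by_err.index(n) + by_corr.index(n) for n in stats}
    return sorted(stats, key=lambda n: score[n])[:k]
```

Keeping k=3 matches the observation that the best diverse ensembles had 3 different model types.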

23 Results Comparison of Greedy Algorithm Ensemble to Top 25% Ensemble

Model Selection Strategy          | Prognostic Data | Cytology Data
Top 25% Diverse Bagging Ensemble  |                 |
Greedy Algorithm Ensemble         |                 |

(numeric error values appeared on the slide but are not captured in the transcript)

24 Concluding Discussion

25 Investigated the Performance of MDSS Comprised of: Single models Baseline bagging ensembles Diverse random bagging ensembles Selected bagging ensembles Greedy Algorithm bagging ensembles

26 We Found that –MDSS comprised of a single model or a baseline ensemble of one model type do not perform well –Diverse ensembles are better –Diverse ensembles comprised of models with low generalized error from the single best model results are better still –The best results are achieved by diverse ensembles comprised of models with low generalized error from the single best model results and high instability from the baseline bagging results

27 Limitations Limitations of the Methodology –Use of plurality voting –Use of only bootstrap resampling Limitations of Applicability –Only 2 data sets, both regarding breast cancer –MDSS is crude – not for commercial applications