Slide 1 – Title

Class Noise and Supervised Learning in Medical Domains: The Effect of Feature Extraction

Mykola Pechenizkiy, Dept. of Mathematical IT, University of Jyväskylä, Finland
Seppo Puuronen & Oleksandr Pechenizkiy, Dept. of CS and IS, University of Jyväskylä, Finland
Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland

IEEE CBMS’06, DM Track, Salt Lake City, Utah, USA, June 21-23, 2006

Slide 2 – Outline

• DM and KDD background
  – KDD as a process, DM strategy
  – Supervised Learning (SL)
• Noise in data
  – Types and sources of noise
• Feature Extraction approaches used:
  – Conventional Principal Component Analysis
  – Class-conditional FE: parametric and non-parametric
• Experiment design
  – Impact of class noise on SL and the effect of FE
  – Dataset characteristics
• Results and Conclusions

Slide 3 – Knowledge discovery as a process

Reference: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.

[Figure: the KDD process, annotated with the components of this study – class noise is introduced into the training datasets, PCA and LDA are used for feature extraction, and kNN, Naïve Bayes and C4.5 are the classifiers.]

Slide 4 – The task of classification

Given n training instances (x_i, y_i), where x_i are the attribute values and y_i is the class label (J classes, n training observations, p features). Goal: given a new instance x_0, predict its class y_0.

Examples: diagnosis of thyroid diseases; heart attack prediction, etc.

[Figure: a classifier is built from the training set and assigns class membership to a new instance to be classified.]
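As an illustration of the task on this slide, here is a minimal sketch (not the authors' WEKA setup; the data are synthetic and only assumed to resemble a small medical dataset): a classifier is learned from n labeled instances and then predicts the class of a new instance x_0.

```python
# Sketch of the supervised classification task: learn from n labeled
# training instances (x_i, y_i), then predict y_0 for a new instance x_0.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic toy data: n = 100 instances, p = 4 features, J = 2 classes.
n, p = 100, 4
X_train = rng.normal(size=(n, p))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Train a classifier (kNN, one of the three learners used in the study).
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Predict the class membership of a new instance x_0.
x_0 = rng.normal(size=(1, p))
print("predicted class y_0:", clf.predict(x_0)[0])
```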

Slide 5 – Noise in data

Data may contain various types of errors:
• random or systematic
  – random errors are often referred to as noise;
  – some authors regard as noise both mislabeled examples and outliers, which are correctly labeled but relatively rare instances (exceptions).
• Quality of a dataset in SL is characterized by two information parts of instances:
  – the quality of the attributes indicates how well they characterize instances for classification purposes, and
  – the quality of the class labels indicates the correctness of class-label assignments.
• Noise is often similarly divided into two major categories:
  – class noise (misclassifications or mislabelings): contradictory instances (instances with the same attribute values but different class labels, forming the so-called irreducible or Bayes error) and wrongly labeled instances, i.e. misclassifications (mislabelings);
  – attribute noise (errors introduced into attribute values): erroneous attribute values, missing or so-called 'don't know' values, and incomplete or so-called 'don't care' values.

Slide 6 – Sources of class noise

• The major factors that impact the amount of mislabeled instances in a dataset:
  – data-entry errors;
  – errors of devices used for automatic classification;
  – the subjectivity and the inadequacy of the information used to label each instance.
• Domains in which medical experts may disagree are natural ones for subjective labeling errors:
  – if the absolute ground truth is unknown, experts must subjectively provide labels, and mislabeled instances naturally appear;
  – if an observation needs to be ranked according to disease severity.
• If the information used to label an instance is different from the information to which the learning algorithm will have access:
  – e.g., if an expert relies on visual input rather than the numeric values of the attributes.
• If the results of some tests (attribute values) are unknown – impossible or difficult to obtain:
  – e.g., because of cost or time considerations.

Slide 7 – Handling class noise

• Noise-tolerant techniques – try to avoid overfitting the possibly noisy training set during SL:
  – handle noise implicitly;
  – the noise-handling mechanism is often embedded into the search heuristics and stopping criteria used in model construction; into post-processing such as decision-tree post-pruning; or into a model selection mechanism based, e.g., on the MDL principle.
• Filtering techniques – detect and eliminate noisy instances before SL (a sketch follows below):
  – handle noise explicitly;
  – the noise-handling mechanism is often implemented as a filter that is applied before SL;
  – results in a reduced training set if the noisy instances are not corrected but deleted;
  – single-algorithm filters and ensemble filters.
• A brief review of these approaches and their pros and cons can be found in the paper; we omit this discussion due to time constraints.

It is often hard to distinguish noise from exceptions (outliers) without the help of an expert, especially if the noise is systematic.
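For concreteness, below is a minimal sketch of a single-algorithm classification filter of the kind mentioned above (in the spirit of Brodley and Friedl's filtering work). It is not the approach taken in this paper, and the helper name, toy data and 10% noise rate are assumptions for illustration only.

```python
# Single-algorithm classification filter: instances misclassified by a model
# trained on the other folds are treated as likely class noise and removed
# before supervised learning.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

def classification_filter(X, y, n_folds=10):
    """Return a boolean mask of instances kept after filtering."""
    # Out-of-fold predictions: each instance is predicted by a tree
    # that never saw it during training.
    y_oof = cross_val_predict(DecisionTreeClassifier(random_state=0),
                              X, y, cv=n_folds)
    return y_oof == y          # suspected mislabelings are dropped

# Usage on synthetic data with 10% of the class labels flipped:
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)
noisy = rng.choice(len(y), size=30, replace=False)
y[noisy] = 1 - y[noisy]
mask = classification_filter(X, y)
print("kept", mask.sum(), "of", len(y), "training instances")
```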

Slide 8 – Focus of this study

• To apply Feature Extraction (FE) techniques to eliminate the effect of class noise on SL.
• This approach fits better into the category of noise-tolerant techniques, as:
  – it helps to avoid overfitting implicitly within the learning techniques.
• However, it also has some similarity with the filtering approach, as:
  – it has a clearly separate phase of dimensionality reduction which is undertaken before the SL process.
• Brief background on the FE techniques used in this study – in the next few slides.

Slide 9 – Feature Extraction

Feature extraction (FE) is a dimensionality reduction technique that extracts a set of new features from the original set by means of some functional mapping, keeping as much information in the data as possible (Fukunaga 1990).

Conventional Principal Component Analysis (PCA) is one of the most commonly used feature extraction techniques. It is based on extracting the axes along which the data shows the highest variability (Jolliffe 1986). PCA has the following properties:
(1) it maximizes the variance of the extracted features;
(2) the extracted features are uncorrelated;
(3) it finds the best linear approximation in the mean-squares sense;
(4) it maximizes the information contained in the extracted features.
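A minimal sketch of conventional PCA-based FE as characterized above, written in plain NumPy on assumed toy data: the components are the eigenvectors of the covariance matrix, ordered by the share of variance they cover, and the number of extracted features is chosen by a variance-covered threshold (0.85 here, mirroring the experiment setup later in the talk). The function name is mine, not from the paper.

```python
# PCA-based feature extraction: project onto the leading eigenvectors of
# the covariance matrix until the chosen share of variance is covered.
import numpy as np

def pca_transform(X, var_threshold=0.85):
    Xc = X - X.mean(axis=0)                     # center the data
    cov = np.cov(Xc, rowvar=False)              # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    covered = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(covered, var_threshold)) + 1
    return Xc @ eigvecs[:, :k], covered[:k]     # projected data, variance covered

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated features
Z, covered = pca_transform(X)
print("extracted features:", Z.shape[1], "variance covered:", round(covered[-1], 2))
```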

Slide 10 – FE example: “Heart Disease”

[Figure: the extracted features are linear combinations of the original attributes, shown together with the percentage of variance covered (100%, 87%, 60%, 67% on the slide):
0.1·Age − 0.6·Sex − 0.73·RestBP − 0.33·MaxHeartRate
−0.01·Age + 0.78·Sex − 0.42·RestBP − 0.47·MaxHeartRate
−0.7·Age + 0.1·Sex − 0.43·RestBP + 0.57·MaxHeartRate]

Slide 11 – PCA- and LDA-based Feature Extraction

• Experimental studies with these FE techniques and basic SL techniques: Tsymbal et al., FLAIRS’02; Pechenizkiy et al., AI’05.
• Use of class information in the FE process is crucial for many datasets: class-conditional FE can result in better classification accuracy, while solely variance-based FE has no effect on, or deteriorates, the accuracy.
• No single superior technique, but nonparametric approaches are more stable with respect to various dataset characteristics.
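A hedged sketch of class-conditional FE in the spirit of Fisher's LDA: maximize the between-class scatter S_b relative to the within-class scatter S_w via the generalized eigenproblem S_b v = λ S_w v. The paper's parametric and nonparametric criteria follow Fukunaga, so treat this as the standard parametric variant only; the function name, toy data and small regularization term are assumptions.

```python
# Parametric class-conditional feature extraction (Fisher-LDA style).
import numpy as np
from scipy.linalg import eigh

def class_conditional_fe(X, y, n_components):
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)      # within-class scatter
        d = (mean_c - mean_total).reshape(-1, 1)
        S_b += len(Xc) * (d @ d.T)                  # between-class scatter
    # Generalized eigenproblem; S_w is lightly regularized for stability.
    vals, vecs = eigh(S_b, S_w + 1e-6 * np.eye(p))
    order = np.argsort(vals)[::-1]
    W = vecs[:, order[:n_components]]
    return X @ W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(1.5, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
Z = class_conditional_fe(X, y, n_components=1)
print("transformed shape:", Z.shape)
```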

Slide 12 – Experiment design

• WEKA 3 environment: Data Mining Software in Java.
• 10 medical datasets (next slide).
• Classification algorithms: kNN, Naïve Bayes, C4.5.
• Feature Extraction techniques: PCA, PAR, NPAR – 0.85 (85%) variance threshold.
• Artificially imputed class noise: 0%–20%, in 2% steps.
• Evaluation (see the sketch below):
  – accuracy averaged over 30 test runs of Monte-Carlo cross-validation for each sample;
  – 30% test set; 70% used for forming a training set, in which 0%–20% of the instances have an artificially corrupted class label.
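To make the noise-injection and Monte-Carlo evaluation loop concrete, here is a sketch of this protocol re-created in Python/scikit-learn rather than the authors' WEKA environment: 70/30 splits, class noise injected into the training part only, accuracy averaged over 30 runs per noise level. The synthetic data, the single Naïve Bayes learner, the omission of the FE step, and the flip_labels helper are all assumptions for illustration.

```python
# Monte-Carlo cross-validation with artificially imputed class noise.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def flip_labels(y, rate, rng):
    """Corrupt `rate` of the class labels (binary case for simplicity)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # assumed toy data

for noise in np.arange(0.0, 0.21, 0.02):          # 0% .. 20% in 2% steps
    accs = []
    for run in range(30):                         # 30 Monte-Carlo runs
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.30, random_state=run)
        y_tr_noisy = flip_labels(y_tr, noise, rng)
        acc = GaussianNB().fit(X_tr, y_tr_noisy).score(X_te, y_te)
        accs.append(acc)
    print(f"noise {noise:.2f}: mean accuracy {np.mean(accs):.3f}")
```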

Slide 13 – Dataset characteristics

[Table: number of instances, features and classes for each dataset – contractions, laryngeal (three datasets), rds, weaning, voice (two datasets); the numeric values are given in the paper.]

Further information on these datasets, and the datasets themselves, are available at:

Slide 14 – Classification Accuracy vs. Imputed Class Noise

[Figure: classification accuracy plotted against the level of imputed class noise.]

Slide 15 – Classification Error Increase due to Class Noise

[Figure: three panels – kNN (k Nearest Neighbour), Naïve Bayes, C4.5 decision tree.]

Slide 16 – Typical behaviour of SL with and without FE (laryngeal1 dataset)

[Figure: three panels – kNN (k Nearest Neighbour), Naïve Bayes, C4.5 decision tree.]

Slide 17 – Summary and Conclusions

• Class noise affects SL on most of the considered datasets.
• FE can significantly increase the accuracy of SL:
  – by producing a better feature space and fighting “the curse of dimensionality”.
• In this study we showed that applying FE for SL:
  – decreases the negative effect of class noise in the data.
• Directions of further research:
  – comparison of FE techniques with other dimensionality reduction and instance selection techniques;
  – comparison of FE with filter approaches for class-noise elimination.

Slide 18 – Contact Info

Mykola Pechenizkiy
Department of Mathematical Information Technology, University of Jyväskylä, FINLAND

THANK YOU!

MS PowerPoint slides of this and other recent talks and full texts of selected publications are available online at: