ACM SAC’06, DM Track, Dijon, France, 27.04.2006: “The Impact of Sample Reduction on PCA-based Feature Extraction for Supervised Learning” by M. Pechenizkiy, S. Puuronen and A. Tsymbal


Slide 1: The Impact of Sample Reduction on PCA-based Feature Extraction for Supervised Learning
Mykola Pechenizkiy, Dept. of Mathematical IT, University of Jyväskylä, Finland
Seppo Puuronen, Dept. of CS and IS, University of Jyväskylä, Finland
Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland
ACM SAC’06: DM Track, Dijon, France, April 23-27, 2006

Slide 2: Outline
- DM and KDD background: KDD as a process, DM strategy
- Supervised Learning (SL): curse of dimensionality and indirectly relevant features; feature extraction (FE) as dimensionality reduction
- Feature extraction approaches used: conventional Principal Component Analysis; class-conditional FE, parametric and non-parametric
- Sampling approaches used: random, stratified random, kd-tree-based selective
- Experiment design: impact of sample reduction on FE for SL
- Results and conclusions

Slide 3: Knowledge discovery as a process
[Figure: the KDD process of Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996, annotated with the choices made in this study: instance selection (random, stratified and kd-tree-based), feature extraction (PCA and LDA), and Naïve Bayes as the learner.]

Slide 4: The task of classification
Given J classes, n training observations and p features: from n training instances (x_i, y_i), where x_i holds the attribute values and y_i is the class label, learn a classifier; the goal is, given a new instance x_0, to predict its class membership y_0.
Examples: diagnosis of thyroid diseases; heart-attack prediction, etc.
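A minimal sketch of this task in scikit-learn, using Gaussian Naïve Bayes (the learner used in the experiments later in the talk); the data here is synthetic and purely illustrative:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 4))                        # n = 100 observations, p = 4 features
    y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # J = 2 classes

    clf = GaussianNB().fit(X_train, y_train)                   # learn from (x_i, y_i)
    x0 = rng.normal(size=(1, 4))                               # new instance to be classified
    print("predicted class membership:", clf.predict(x0)[0])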

Slide 5: Improvement of the representation space
- Curse of dimensionality: a drastic increase in computational complexity and classification error for data with a large number of dimensions
- Indirectly relevant features

Slide 6: [Figure: original features mapped to extracted features.]
How to construct a good representation space (RS) for SL? What is the effect of sample reduction on the performance of FE for SL?

Slide 7: FE example, “Heart Disease”
Extracted features as linear combinations of the original ones:
  0.1·Age - 0.6·Sex - 0.73·RestBP - 0.33·MaxHeartRate
  -0.01·Age + 0.78·Sex - 0.42·RestBP - 0.47·MaxHeartRate
  -0.7·Age + 0.1·Sex - 0.43·RestBP + 0.57·MaxHeartRate
[Figure: variance covered by the components: 100%, 87%, 60%, 67%.]
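A sketch of how such loadings can be obtained with PCA; the feature names follow the example above, but the data is synthetic, so the weights and variance figures will differ from the slide:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))           # stand-in for Age, Sex, RestBP, MaxHeartRate
    X = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scales

    pca = PCA(n_components=3)
    Z = pca.fit_transform(X)                # the extracted features
    names = ["Age", "Sex", "RestBP", "MaxHeartRate"]
    for comp, var in zip(pca.components_, pca.explained_variance_ratio_):
        combo = " ".join(f"{w:+.2f}·{f}" for w, f in zip(comp, names))
        print(f"{combo}   (variance covered: {var:.0%})")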

Slide 8: PCA- and LDA-based feature extraction
- Experimental studies with these FE techniques and basic SL techniques: Tsymbal et al., FLAIRS’02; Pechenizkiy et al., AI’05.
- Use of class information in the FE process is crucial for many datasets: class-conditional FE can result in better classification accuracy, while solely variance-based FE has no effect on the accuracy or deteriorates it.
- There is no superior technique, but nonparametric approaches are more stable across various dataset characteristics.
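A minimal sketch contrasting solely variance-based FE with class-conditional FE; sklearn's LinearDiscriminantAnalysis stands in for the parametric approach, while the nonparametric variant (which uses local, boundary-sensitive scatter estimates) is not shown:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = make_classification(n_samples=200, n_features=10,
                               n_informative=3, random_state=0)
    Z_pca = PCA(n_components=1).fit_transform(X)                            # ignores class labels
    Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # uses them
    print(Z_pca.shape, Z_lda.shape)  # one extracted feature each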

Slide 9: What is the effect of sample reduction?
Sampling approaches used:
- Random sampling
- Stratified random sampling
- kd-tree-based sampling
- Stratified kd-tree-based sampling

Slide 10: Stratified random sampling [figure omitted]
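A minimal sketch of stratified random sampling, using a hypothetical helper stratified_sample: instances are drawn per class so that the selected sample preserves the class distribution of the full data:

    import numpy as np

    def stratified_sample(X, y, fraction, seed=0):
        """Randomly select `fraction` of the instances within each class."""
        rng = np.random.default_rng(seed)
        idx = []
        for cls in np.unique(y):
            members = np.flatnonzero(y == cls)
            k = max(1, int(round(fraction * len(members))))
            idx.extend(rng.choice(members, size=k, replace=False))
        idx = np.asarray(idx)
        return X[idx], y[idx]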

Slide 11: Stratified sampling with kd-tree-based selection [figure omitted]
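A hedged sketch of kd-tree-based selective sampling (the paper's exact selection rule may differ): the data is split recursively at the median of the widest dimension, and one representative per leaf is kept, so the sample covers all regions of the input space; running this per class gives the stratified variant:

    import numpy as np

    def kdtree_sample(X, depth, seed=0):
        """Return the index of one randomly chosen instance per kd-tree leaf."""
        rng = np.random.default_rng(seed)
        def recurse(idx, d):
            if d == 0 or len(idx) <= 1:
                return [rng.choice(idx)]
            dim = np.argmax(X[idx].max(axis=0) - X[idx].min(axis=0))  # widest dimension
            order = idx[np.argsort(X[idx, dim])]
            mid = len(order) // 2                                     # median split
            return recurse(order[:mid], d - 1) + recurse(order[mid:], d - 1)
        return np.asarray(recurse(np.arange(len(X)), depth))

For example, kdtree_sample(X, depth=5) keeps roughly 2**5 = 32 representatives spread over the input space.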

Slide 12: Experiment design
- WEKA environment; 10 UCI datasets
- SL: Naïve Bayes
- FE: PCA, PAR, NPAR, with a 0.85 variance threshold
- Sampling: RS, stratified RS, kd-tree, stratified kd-tree
- Evaluation: accuracy averaged over 30 test runs of Monte-Carlo cross-validation for each sample; 20% of the data forms the test set, and from the remaining 80% a training set is formed by selecting 10%-100% of the instances with one of the 4 sampling approaches (RS, stratified RS, kd-tree, stratified kd-tree).
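A minimal sketch of this evaluation protocol, assuming plain random sampling at the selection step (the other three approaches would plug in there) and using sklearn's option of choosing PCA components by a variance fraction:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.decomposition import PCA
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    def monte_carlo_accuracy(X, y, sample_fraction, runs=30, seed=0):
        accs = []
        for run in range(runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.2, random_state=seed + run)  # 20% test set
            rng = np.random.default_rng(seed + run)            # random sampling (RS)
            k = max(1, int(sample_fraction * len(y_tr)))
            sel = rng.choice(len(y_tr), size=k, replace=False)
            fe = PCA(n_components=0.85).fit(X_tr[sel])         # keep 85% of the variance
            clf = GaussianNB().fit(fe.transform(X_tr[sel]), y_tr[sel])
            accs.append(accuracy_score(y_te, clf.predict(fe.transform(X_te))))
        return float(np.mean(accs))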

Slide 13: Accuracy results
- If the sample size p ≥ 20%, NPAR outperforms the other methods; if p ≥ 30%, NPAR outperforms the others even when they use p = 100%.
- The best p for NPAR depends on the sampling method: stratified and plain RS, p = 70%; kd-tree, p = 80%; stratified kd-tree, p = 60%.
- PCA is the worst when p is relatively small, especially with stratification and kd-tree indexing.
- PAR and Plain behave similarly with every sampling approach.
- In general, for p > 30% the different sampling approaches have very similar effects.

Slide 14: Results, kd-tree sampling with/without stratification
Stratification improves kd-tree sampling with respect to FE for SL.
[Figure, two panels: the left shows the difference in NB accuracy between RS and kd-tree-based sampling (RS - kd-tree); the right shows the difference between RS and kd-tree-based sampling with stratification (RS - stratified kd-tree).]

Slide 15: Summary and conclusions
- FE techniques can significantly increase the accuracy of SL by producing a better feature space and fighting “the curse of dimensionality”.
- With large datasets only a part of the instances is selected for SL; we analyzed the impact of sample reduction on the process of FE for SL.
- The results of our study show that:
  - it is important to take into account both class information and information about the data distribution when the sample size to be selected is small; but
  - the type of sampling approach is not that important when a large proportion of instances remains for FE and SL;
  - the NPAR approach extracts good features for SL from a small number of instances (except in the RS case), in contrast with the PCA and PAR approaches.
- Limitations of our experimental study:
  - fairly small datasets, although we think that the comparative behavior of the sampling and FE techniques won’t change dramatically;
  - experiments only with Naïve Bayes; it is not obvious that the comparative behavior of the techniques would be similar with other SL techniques;
  - no analysis of complexity issues, of the selected instances and the number of extracted features, or of the effect of noise in the attributes and class information.

Slide 16: Contact info
Mykola Pechenizkiy, Department of Mathematical Information Technology, University of Jyväskylä, FINLAND
THANK YOU!
MS PowerPoint slides of this and other recent talks and full texts of selected publications are available online at:

Slide 17: Extra slides

Slide 18: Datasets characteristics [table omitted]

Slide 19: Framework for DM strategy selection
Pechenizkiy, M., DM strategy selection via empirical and constructive induction, DBA’05.

Slide 20: Meta-Learning