Feature Subset Selection using Minimum Cost Spanning Trees
Mike Farah
Supervisor: Dr. Sid Ray
Outline
- Introduction: pattern recognition and feature subset selection
- Current methods
- Proposed method
- IFS
- Results
- Conclusion
Introduction: Pattern Recognition
- The classification of objects into groups, learned from a small sample of objects
- Example: apples and strawberries
  - Classes: apples and strawberries
  - Features: colour, size, weight, texture
- Applications: character recognition, voice recognition, oil mining, weather prediction, …
Introduction: Pattern Recognition
- Pattern representation
  - Measuring and recording features: size, colour, weight, texture, …
- Feature set reduction
  - Reducing the number of features used, either by selecting a subset or by transforming the features
- Classification
  - The resulting features are used to classify unknown objects
Introduction: Feature subset selection
- Can be split into two processes: feature subset searching and the criterion function
- Feature subset searching
  - It is not usually feasible to try all feature subset combinations exhaustively: d features give 2^d - 1 non-empty subsets, already over a million for d = 20
- Criterion function
  - The main issue in feature subset selection (Jain et al. 2000)
  - The focus of our research
Current methods
- Euclidean distance
  - Statistical properties of the classes are not considered
- Mahalanobis distance
  - Variances and covariances of the classes are taken into account
- (a sketch of both distance criteria follows below)
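As a concrete illustration, here is a minimal sketch of the two distance-based criteria for a two-class problem, assuming the distances are measured between the class means; the function names and toy data are ours, not the project's code.

```python
# A sketch (not the project's implementation) of the two distance-based
# criterion functions for a two-class problem: the Euclidean distance between
# class means ignores class scatter, while the Mahalanobis distance scales it
# by the pooled covariance of the classes.
import numpy as np

def euclidean_criterion(X1, X2):
    """Distance between class means; the spread of the classes is ignored."""
    return float(np.linalg.norm(X1.mean(axis=0) - X2.mean(axis=0)))

def mahalanobis_criterion(X1, X2):
    """Distance between class means scaled by the pooled covariance."""
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    pooled = ((len(X1) - 1) * np.cov(X1, rowvar=False) +
              (len(X2) - 1) * np.cov(X2, rowvar=False)) / (len(X1) + len(X2) - 2)
    return float(np.sqrt(diff @ np.linalg.inv(pooled) @ diff))

# Toy usage: two Gaussian classes over a candidate two-feature subset.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], [1.0, 3.0], size=(50, 2))
X2 = rng.normal([1.0, 1.0], [1.0, 3.0], size=(50, 2))
print(euclidean_criterion(X1, X2), mahalanobis_criterion(X1, X2))
```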
Limitations of Current Methods
Friedman and Rafsky’s two sample test
- A minimum spanning tree approach for determining whether two sets of data originate from the same source
- An MST is built across the pooled data from the two sources, and edges connecting samples of different data sets are removed
- If many edges are removed, the two sets of data are likely to originate from the same source
Friedman and Rafsky’s two sample test
- The method can be used as a criterion function
  - An MST is built across the sample points
  - Edges connecting samples of different classes are removed
- A good subset is one that provides discriminatory information about the classes; therefore, the fewer edges removed, the better (see the sketch below)
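The sketch below shows an MST-based criterion in the spirit of Friedman and Rafsky's test: build an MST over all samples, restricted to a candidate feature subset, and count the edges joining samples of different classes. The helper names and toy data are illustrative, not taken from the project.

```python
# MST cross-class edge count as a criterion function: fewer cross-class edges
# suggests the candidate feature subset separates the classes well.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def cross_class_edges(X, y, feature_subset):
    """Number of MST edges connecting samples with different class labels."""
    D = squareform(pdist(X[:, feature_subset]))   # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).tocoo()        # n-1 edges as a sparse matrix
    return int(np.sum(y[mst.row] != y[mst.col]))

# Toy usage: compare two candidate subsets of a 3-feature data set.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 1, (60, 2)), rng.normal(0, 1, (60, 1))])
X[:30, 0] += 4                                    # feature 0 separates the classes
y = np.array([0] * 30 + [1] * 30)
print(cross_class_edges(X, y, [0, 1]))            # informative subset: few cross edges
print(cross_class_edges(X, y, [2]))               # noise feature only: many cross edges
```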
Limitations of Friedman and Rafsky’s technique
Our Proposed Method
- Use both the number of edges and the edge lengths to judge the suitability of a subset
- A good subset will have:
  - a large number of short edges connecting samples of the same class
  - a small number of long edges connecting samples of different classes
Our Proposed Method
- We experimented with the plain average edge length and a weighted average; the weighted average was expected to perform better (one possible formulation is sketched below)
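The exact scoring used in the project is not reproduced in these slides; the sketch below only illustrates one plausible way edge counts and edge lengths could be combined, under the assumption stated above (reward many short within-class edges, penalise long and frequent between-class edges). The function name and the formula are ours.

```python
# One hypothetical edge-length-based criterion over the MST of the samples.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def edge_length_criterion(X, y, feature_subset):
    """Higher is better: within-class MST edges should be short and numerous,
    between-class edges long and rare (one possible formulation, not the
    project's exact score)."""
    D = squareform(pdist(X[:, feature_subset]))
    mst = minimum_spanning_tree(D).tocoo()
    same = y[mst.row] == y[mst.col]
    within, between = mst.data[same], mst.data[~same]
    if between.size == 0:
        return np.inf                     # no cross-class edges: perfect separation
    if within.size == 0:
        return 0.0                        # every edge crosses classes: worst case
    # ratio of mean edge lengths, weighted by the ratio of edge counts
    return (between.mean() / within.mean()) * (within.size / between.size)
```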
IFS - Interactive Feature Selector
- Developed to allow users to experiment with various feature selection methods
- Automates the execution of experiments
- Allows visualisation of data sets and results
- Extensible: developers can easily add criterion functions, feature selectors and classifiers to the system
IFS - Screenshot
Experimental Framework

Data set              No. Samples   No. Feats   No. Classes
Iris                  150           4           3
Crab                  200           7           2
Forensic Glass        214           9           7
Diabetes              332           8           2
Character Synthetic   750           7           5
Experimental Framework
- Spearman’s rank correlation
  - A good criterion function will correlate well with the classifier: subsets it ranks highly should achieve high accuracy (see the sketch below)
- Subset chosen
  - The final subsets selected by the criterion functions are compared with the optimal subset chosen by the classifier
- Time
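A minimal sketch of the rank-correlation evaluation, assuming that for each candidate subset we already have a criterion score and a classifier accuracy; the variable names and numbers below are illustrative only.

```python
# Spearman's rank correlation between criterion scores and classifier accuracy.
from scipy.stats import spearmanr

criterion_scores    = [0.91, 0.40, 0.75, 0.62, 0.20]   # one value per candidate subset
classifier_accuracy = [0.95, 0.55, 0.80, 0.70, 0.35]   # e.g. K-NN accuracy on the same subsets

rho, p_value = spearmanr(criterion_scores, classifier_accuracy)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
# rho close to 1 means subsets ranked highly by the criterion also achieve
# high classification accuracy, i.e. the criterion is a good surrogate.
```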
Forensic glass data set results
Synthetic data set
Algorithm completion times
Algorithm complexities
- K-NN
- MST criterion functions
- Mahalanobis distance
- Euclidean distance
Conclusion
- MST-based approaches generally achieved higher accuracy and rank correlation, particularly with the K-NN classifier
- The criterion function based on Friedman and Rafsky’s two sample test performed best
Conclusion
- MST approaches are closely related to the K-NN classifier
- The Mahalanobis criterion function is suited to data sets with Gaussian distributions and strong feature interdependence
- Future work:
  - Construct a classifier based on K-NN that gives closer neighbours higher priority (a generic sketch follows)
  - Improve IFS
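The slide only states the idea; the sketch below is a generic distance-weighted K-NN vote, not the classifier the project plans to build, and all names in it are ours.

```python
# Distance-weighted K-NN: each of the k nearest neighbours votes with weight
# 1/distance, so closer neighbours get higher priority.
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Predict the class of sample x from its k nearest training neighbours."""
    d = np.linalg.norm(X_train - x, axis=1)        # distances to all training samples
    nearest = np.argsort(d)[:k]                    # indices of the k nearest neighbours
    weights = 1.0 / (d[nearest] + eps)             # closer neighbours weigh more
    classes = np.unique(y_train[nearest])
    votes = [weights[y_train[nearest] == c].sum() for c in classes]
    return classes[int(np.argmax(votes))]
```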