Information Management course


Information Management course
Università degli Studi di Milano, Master Degree in Computer Science
Teacher: Alberto Ceselli
Lecture 07: 06/11/2012

L. C. Molina, L. Belanche, A. Nebot, "Feature Selection Algorithms: A Survey and Experimental Evaluation", IEEE ICDM (2002)
L. Belanche, F. González, "Review and Evaluation of Feature Selection Algorithms in Synthetic Problems", arXiv, available online (2011)

Feature Selection Algorithms
- Introduction
- Relevance of a feature
- Algorithms
- Description of fundamental FSAs
- Empirical evaluation
- Experimental evaluation

Introduction
The feature selection problem: given a set of candidate features, select a subset defined by one of the following approaches:
1. a subset of fixed size maximizing an evaluation measure (see the sketch after this slide);
2. a subset of smaller size that satisfies a constraint on an evaluation measure;
3. the subset achieving the best tradeoff between size and evaluation measure.
FSAs are motivated by a definition of relevance (which is not obvious).
FSAs can also be classified according to their output:
1. giving a weighted linear order of features;
2. giving a subset of the original features (the one we focus on).
N.B. (2) is (1) with binary weighting.
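A minimal sketch (not from the slides) of formulation (1): exhaustive search for the fixed-size subset maximizing an evaluation measure J. The measure J is a hypothetical placeholder (in practice e.g. cross-validated accuracy of an inducer); names and data are illustrative.

```python
from itertools import combinations

def select_fixed_size(features, k, J):
    """Return the size-k subset of `features` maximizing the measure J.

    J maps a tuple of features to a score. Exhaustive search is
    exponential in len(features), which is why practical FSAs rely
    on heuristic search strategies instead.
    """
    return max(combinations(features, k), key=J)

# Toy usage: score a subset by how many "informative" features it contains.
informative = {"x1", "x3"}
best = select_fixed_size(["x1", "x2", "x3", "x4"], 2,
                         J=lambda s: len(informative & set(s)))
print(best)  # ('x1', 'x3')
```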

Feature Selection Algorithms
- Introduction
- Relevance of a feature
- Algorithms
- Description of fundamental FSAs
- Empirical evaluation
- Experimental evaluation

Relevance with respect to an objective
Relevance must be defined with respect to an objective. Assuming the objective is classification and the set of features is X:
- A feature x ∊ X is relevant to an objective c() if there exist two examples A and B that differ only in the value of x and c(A) ≠ c(B), i.e. there are two elements that can be classified correctly only thanks to x.
However, our datasets are samples in the feature space:
- A feature x ∊ X is strongly relevant to the sample S for an objective c() if there exist two elements A and B of S that differ only in the value of x and c(A) ≠ c(B) (this test is sketched in code below).
- A feature x is weakly relevant if it is not strongly relevant, but there exists a subset X' ⊂ X with x ∊ X' in which x is strongly relevant with respect to S.
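A hedged sketch of the sample-based strong-relevance test above: x is strongly relevant to sample S if two examples of S differ only in the value of x yet receive different class labels. Data layout and names are made up for illustration.

```python
def strongly_relevant(S, labels, x):
    """S: list of dicts mapping feature name -> value; labels: class of each example."""
    n = len(S)
    for i in range(n):
        for j in range(i + 1, n):
            # "Differ only in x": every other feature agrees, x itself does not.
            differ_only_in_x = all(
                S[i][f] == S[j][f] for f in S[i] if f != x
            ) and S[i][x] != S[j][x]
            if differ_only_in_x and labels[i] != labels[j]:
                return True
    return False

# Toy sample: the two examples differ only in "x1" and carry different labels.
S = [{"x1": 0, "x2": 1}, {"x1": 1, "x2": 1}]
labels = ["A", "B"]
print(strongly_relevant(S, labels, "x1"))  # True
print(strongly_relevant(S, labels, "x2"))  # False
```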

Relevance as a complexity measure
Idea: given a data sample S and an objective c(), define r(S,c) as the smallest number of features relevant to c() such that the error in S is the least possible for the inducer, i.e. the smallest number of features required by a specific inducer to reach optimum performance in modeling c() using S.
Examples of such complexity measures:
- Incremental usefulness: after choosing X', a feature x is useful if the accuracy in computing c() is higher on X' ∪ {x} than on X' alone;
- Entropic relevance: compute the amount of (Shannon) entropy in the dataset before and after the removal of a feature (sketched below).
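One possible instantiation of entropic relevance, offered as an illustrative sketch rather than the slides' exact measure: quantify how much the Shannon entropy of the class variable drops once a feature is known, i.e. the information gain H(C) − H(C|x). Feature values and names are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """H(C) - H(C|x): entropy of `labels` minus its conditional entropy
    given the feature values `values` (one value per example)."""
    n = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

x1 = [0, 0, 1, 1]          # perfectly predicts the class: gain = 1 bit
x2 = [0, 1, 0, 1]          # independent of the class:     gain = 0 bits
y  = ["A", "A", "B", "B"]
print(information_gain(x1, y), information_gain(x2, y))  # 1.0 0.0
```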