NIPS 2001 Workshop on Feature/Variable Selection
Isabelle Guyon, BIOwulf Technologies

Schedule
7:30-8:00 a.m. Welcome and introduction to the problem of feature/variable selection - Isabelle Guyon
8:00-8:20 a.m. Dimensionality Reduction via Sparse Support Vector Machines - Jinbo Bi, Kristin P. Bennett, Mark Embrechts, and Curt Breneman
8:20-8:40 a.m. Feature selection for non-linear SVMs using a gradient descent algorithm - Olivier Chapelle and Jason Weston
8:40-9:00 a.m. When Rather Than Whether: Developmental Variable Selection - Melissa Dominguez
9:00-9:20 a.m. Pause, free discussions
9:20-9:40 a.m. How to recycle your SVM code to do feature selection - Andre Elisseeff and Jason Weston
9:40-10:00 a.m. Lasso-type estimators for variable selection - Yves Grandvalet and Stéphane Canu
10:00-10:30 a.m. Discussion: What are the various statements of the variable selection problem?
4:00-4:20 p.m. Using DRCA to see the effects of variable combinations on classifiers - Ofer Melnik
4:20-4:40 p.m. Feature selection in the setting of many irrelevant features - Andrew Y. Ng and Michael I. Jordan
4:40-5:00 p.m. Relevant coding and information bottlenecks: A principled approach to multivariate feature selection - Naftali Tishby
5:00-5:20 p.m. Learning discriminative feature transforms may be an easier problem than feature selection - Kari Torkkola
5:20-5:30 p.m. Pause
5:30-6:30 p.m. Discussion: Organization of a future workshop with benchmark
6:30-7:00 p.m. Impromptu talks

Outline
- Relevance to the "concept"
- Usefulness to the predictor
- Vocabulary: variable vs. feature

Relevance to the concept
Objectives:
1. Eliminate distracters.
2. Rank (combinations of) relevant variables.
(Slide diagram: a "system" or "concept" box mapping input variables to an output.)

A big search problem
- Definition of a distracter: if tweaked, no change in the input/output relationship, for any position of all the other knobs.
- "Exhaustive search": check all knob positions.
- One knob at a time does not work if one variable alone does not control the output.
- For continuous variables: need experimental design.
- Greedy "query" strategies.
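As an illustration of the exhaustive check described above, here is a minimal sketch (Python, not from the original slides; names are illustrative) that tests whether a knob is a distracter by trying every position of all the other knobs. The AND example also shows why probing one knob at a time from a single base setting can be misleading.

```python
from itertools import product

def is_distracter(f, knob, n_knobs, positions):
    """Exhaustive check: a knob is a distracter if, for every setting of all
    the other knobs, moving this knob never changes the output."""
    others = [i for i in range(n_knobs) if i != knob]
    for setting in product(positions, repeat=len(others)):
        x = [None] * n_knobs
        for i, v in zip(others, setting):
            x[i] = v
        outputs = {f(x[:knob] + [p] + x[knob + 1:]) for p in positions}
        if len(outputs) > 1:
            return False
    return True

# Toy system: the output is x0 AND x1; x2 is a distracter.
f = lambda x: x[0] & x[1]
print([is_distracter(f, k, 3, [0, 1]) for k in range(3)])  # [False, False, True]
# Probing one knob at a time from the base setting (0, 0, 0) would wrongly flag
# x0 and x1 as distracters, since neither changes the output on its own there.
```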

More difficulties
- Noisy/bad data (imprecise knobs, reading errors, systematic errors).
- Lack of data: cannot perform optimum experimental design.
- Probabilistic definition of a distracter: P(distracter) = the fraction of times that, everything else being equal, a change in the position of the knob does not result in a change in output.
- Continuous case: need to measure the state-space areas in which a knob has little or no effect.
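A hedged sketch of the probabilistic definition above: estimate P(distracter) by Monte Carlo, sampling random settings of the other knobs and counting how often moving the knob leaves the output unchanged (function and parameter names are assumptions for illustration).

```python
import random

def prob_distracter(f, knob, n_knobs, positions, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(distracter): the fraction of random settings
    of the other knobs for which moving this knob does not change the output."""
    rng = random.Random(seed)
    unchanged = 0
    for _ in range(n_samples):
        x = [rng.choice(positions) for _ in range(n_knobs)]
        outputs = {f(x[:knob] + [p] + x[knob + 1:]) for p in positions}
        unchanged += (len(outputs) == 1)
    return unchanged / n_samples

# With the AND system from the previous sketch, x1 masks x0 half of the time:
f = lambda x: x[0] & x[1]
print(prob_distracter(f, 0, 3, [0, 1]))  # close to 0.5
print(prob_distracter(f, 2, 3, [0, 1]))  # 1.0: x2 is a pure distracter
```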

Yet harder
(Slide diagram: the output depends on controllable, uncontrollable, and unobservable variables.)

Tiny example
(Slide figure: a toy system with output y = [x1 + 2(x2 - 1)]·φ(x1)·φ(x2) for some function φ, shown with tables of values over the variables x1, x2, x3.)

Theory and practice
(Slide figure: plots of y against x1, x2, and x3, and a grid of the (x1, x2) combinations with x1, x2 ∈ {0, 1, 2} and any x3.)

Use of a predictor
- If the system is observed only through given examples of inputs/outputs, or if it is expensive to get a lot of data points: build a predictor.
- Define a criterion of relevance, e.g. ∫ (f(x1, x2, x3) - f(x1, x2))² dP(x3 | x1, x2), and approximate it using empirical data.
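A minimal sketch of approximating such a criterion from empirical data: fit one predictor on all variables and one without the candidate variable, then average the squared difference of their predictions over the sample. The random-forest regressor is only a stand-in (any predictor could be used), and the scikit-learn names are an assumption, not the method of the slides.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def empirical_relevance(X, y, drop_idx, seed=0):
    """Estimate the mean of (f_full(x) - f_reduced(x))^2 over the empirical
    distribution, where the 'reduced' predictor ignores column `drop_idx`."""
    keep = [j for j in range(X.shape[1]) if j != drop_idx]
    f_full = RandomForestRegressor(random_state=seed).fit(X, y)
    f_reduced = RandomForestRegressor(random_state=seed).fit(X[:, keep], y)
    return float(np.mean((f_full.predict(X) - f_reduced.predict(X[:, keep])) ** 2))

# Toy data: y depends on columns 0 and 1; column 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 2 * X[:, 1]
print(empirical_relevance(X, y, drop_idx=2))  # near 0: the noise column changes little
print(empirical_relevance(X, y, drop_idx=1))  # large: column 1 is relevant
```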

Relevance to the concept: weak and strong relevance
Kohavi et al. (classification problem):
- xi is strongly relevant if its removal yields a deterioration of the performance of the Bayes-optimal classifier.
- xi is weakly relevant if it is not strongly relevant and there exists a subset of variables S such that the performance on S ∪ {xi} is better than the performance on S.
- Features that are neither strongly nor weakly relevant are irrelevant.
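The distinction can be made concrete on a toy joint distribution that is fully specified, so the Bayes-optimal error is computable exactly. In the sketch below (an illustration, not from the slides), x2 is an exact copy of the predictive variable x1 and x3 is noise: removing x1 does not hurt the Bayes-optimal classifier, so x1 is not strongly relevant, yet adding x1 to the subset {x3} helps, so x1 is weakly relevant.

```python
from itertools import product

def joint():
    """Toy distribution over binary (x1, x2, x3, y): x2 is a copy of x1,
    x3 is independent noise, and P(y = x1) = 0.9."""
    for x1, x3, y in product([0, 1], repeat=3):
        p_y = 0.9 if y == x1 else 0.1
        yield (x1, x1, x3), y, 0.5 * 0.5 * p_y  # features are (x1, x2=x1, x3)

def bayes_error(feature_idx):
    """Exact error of the Bayes-optimal classifier observing only feature_idx."""
    cells = {}
    for x, y, p in joint():
        key = tuple(x[i] for i in feature_idx)
        cells.setdefault(key, [0.0, 0.0])[y] += p
    # Predict the majority class in each cell; the error is the minority mass.
    return sum(min(masses) for masses in cells.values())

print(bayes_error([0, 1, 2]))  # 0.1 with all variables
print(bayes_error([1, 2]))     # 0.1 without x1 -> x1 is not strongly relevant
print(bayes_error([2]))        # 0.5 with S = {x3} only
print(bayes_error([0, 2]))     # 0.1 adding x1 to S helps -> x1 is weakly relevant
```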

Usefulness to the predictor
New objective: make good predictions.
- Find a subset of variables that minimizes an estimate of the generalization error E.
- Find a subset of size n or less that minimizes E.
- Find a subset of minimum size for which E ≤ E_all_var + ε.
Model selection problem: cross-validation, performance bounds, etc.
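A hedged sketch of this "usefulness" view: greedy forward selection driven by a cross-validated estimate of the generalization error, stopping when the gain drops below a tolerance. The scikit-learn estimator and the stopping rule are illustrative assumptions, not the method of the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, tol=1e-3, cv=5):
    """Greedy forward selection: repeatedly add the variable that most improves
    cross-validated accuracy; stop when the improvement falls below `tol`."""
    estimator = LogisticRegression(max_iter=1000)
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining:
        score, j = max(
            (np.mean(cross_val_score(estimator, X[:, selected + [j]], y, cv=cv)), j)
            for j in remaining
        )
        if score - best < tol:
            break
        best, selected = score, selected + [j]
        remaining.remove(j)
    return selected
```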

Relevance to the concept vs. usefulness to the predictor
- A relevant variable may not contribute to getting a better predictor (e.g. the case of redundant variables).
- Conversely, a variable that helps improve the performance of a predictor may be irrelevant (e.g. a bias value).

Algorithms
- Filters vs. wrappers.
- Exhaustive search.
- Backward elimination vs. forward selection.
- Other greedy search methods (e.g. best-first, beam search, compound operators).
- Organization of results.
- Overfitting problems.
(Slide figure: example lists of selected gene accession numbers, e.g. H64807, R55310, T62947, ...)
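Two of the algorithm families above, sketched side by side under stated assumptions (binary labels, a linear SVM as the wrapper's base learner; all names are illustrative): a correlation-based filter that ranks variables independently of any predictor, and an RFE-style backward elimination that repeatedly drops the variable with the smallest weight.

```python
import numpy as np
from sklearn.svm import LinearSVC

def filter_rank(X, y):
    """Filter: rank variables by absolute Pearson correlation with the target."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-np.abs(corr))

def backward_eliminate(X, y, n_keep):
    """Wrapper-style backward elimination (RFE-like, binary classification):
    retrain a linear SVM and drop the variable with the smallest |weight|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = LinearSVC(dual=False).fit(X[:, remaining], y).coef_.ravel()
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining
```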

Validation
- Classical statistics (compare with random data).
- Machine learning (predict accuracy with test data).
- Validation with other data (e.g. the medical literature).
(Slide plot: estimated falsely significant genes vs. genes called significant.)
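A sketch of the "compare with random data" idea, which is also what the plot on this slide suggests: count how many variables pass a significance threshold on the real labels, and estimate how many would pass by chance using permuted labels. The correlation statistic and threshold are illustrative assumptions.

```python
import numpy as np

def significant_vs_chance(X, y, threshold, n_perm=100, seed=0):
    """Return (genes called significant, estimated falsely significant genes):
    the number of variables whose |correlation| with the labels exceeds
    `threshold`, compared with the average count on permuted labels."""
    rng = np.random.default_rng(seed)

    def n_called(labels):
        Xc, yc = X - X.mean(axis=0), labels - labels.mean()
        corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
        return int(np.sum(np.abs(corr) > threshold))

    called = n_called(y)
    falsely = float(np.mean([n_called(rng.permutation(y)) for _ in range(n_perm)]))
    return called, falsely
```

Sweeping the threshold and plotting the second count against the first reproduces the kind of curve the slide refers to.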

Epilogue
- Do not confuse relevance to the concept and usefulness to the predictor.
- Do not confuse correlation and causality.
- Q1: What are good statements of the variable/feature selection problem?
- Q2: What are good benchmarks?