Usman Roshan Machine Learning, CS 698


Feature selection. Usman Roshan, Machine Learning, CS 698.

What is feature selection? Consider our training data as a matrix where each row is a vector (one data point) and each column is a dimension. For example, the points x1 = (1, 10, 2), x2 = (2, 8, 0), and x3 = (1, 9, 1) form a 3-by-3 matrix whose rows are the points. We call each dimension, i.e. each column of the matrix, a feature.
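As a small illustration (a sketch, not part of the original slides), the three example points can be stored as a matrix whose rows are points and whose columns are features, e.g. with NumPy:

```python
import numpy as np

# Rows are data points, columns are features (dimensions).
X = np.array([[1, 10, 2],
              [2,  8, 0],
              [1,  9, 1]])

print(X.shape)   # (3, 3): three points, three features
print(X[:, 1])   # the second feature (column) across all points: [10  8  9]
```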

Feature selection. Useful for high-dimensional data such as genomic DNA and text documents. Methods:
- Univariate (looks at each feature independently of the others):
  - Pearson correlation coefficient
  - F-score
  - Chi-square
  - Signal-to-noise ratio
  - and more, such as mutual information and Relief
- Multivariate (considers all features simultaneously):
  - Dimensionality reduction algorithms
  - Linear classifiers such as the support vector machine
  - Recursive feature elimination

Feature selection. These methods are used to rank features by importance; the ranking cut-off is determined by the user. Univariate methods measure some type of correlation between two random variables. We apply them to machine learning by setting one variable to be the label (yi) and the other to be a fixed feature (xij for a fixed j).

Pearson correlation coefficient. Measures the linear correlation between two variables. Formulas: Covariance(X,Y) = E((X−μX)(Y−μY)) and Correlation(X,Y) = Covariance(X,Y)/(σX σY). The sample Pearson correlation is r = Σi (xi−x̄)(yi−ȳ) / ( √(Σi (xi−x̄)²) √(Σi (yi−ȳ)²) ). The correlation r is between −1 and 1: a value of 1 means perfect positive correlation and −1 means perfect negative correlation.
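A minimal sketch (an illustration, not from the slides) of ranking features by the absolute Pearson correlation between each feature column of a data matrix X and a label vector y:

```python
import numpy as np

def pearson_scores(X, y):
    """Absolute Pearson correlation of each feature (column of X) with the label y."""
    Xc = X - X.mean(axis=0)           # center each feature
    yc = y - y.mean()                 # center the label
    cov = Xc.T @ yc / len(y)          # covariance of each feature with y
    denom = X.std(axis=0) * y.std()   # product of standard deviations
    return np.abs(cov / denom)

X = np.array([[1.0, 10, 2], [2, 8, 0], [1.5, 9, 1], [2.5, 7, 0.5]])
y = np.array([0, 1, 0, 1])
scores = pearson_scores(X, y)
print(scores, np.argsort(-scores))    # scores and features ranked best first
```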

Pearson correlation coefficient. [Figure from Wikipedia illustrating data sets with different values of the correlation coefficient; not reproduced in this transcript.]

F-score. [F-score formula from Lin and Chen, Feature Extraction, 2006; figure not reproduced in this transcript.]
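As a hedged illustration, assuming the usual Chen and Lin definition of the F-score (the separation of the two class means from the overall mean divided by the within-class sample variances), a per-feature score could be computed like this:

```python
import numpy as np

def f_score(x, y):
    """F-score of a single feature x against binary labels y."""
    xp, xn = x[y == 1], x[y == 0]           # feature values in the two classes
    num = (xp.mean() - x.mean())**2 + (xn.mean() - x.mean())**2
    den = xp.var(ddof=1) + xn.var(ddof=1)   # within-class sample variances
    return num / den

X = np.array([[1.0, 10, 2], [2, 8, 0], [1.5, 9, 1], [2.5, 7, 0.5]])
y = np.array([0, 1, 0, 1])
print([round(f_score(X[:, j], y), 3) for j in range(X.shape[1])])
```

Larger F-scores indicate features whose values separate the two classes more cleanly.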

Chi-square test. Contingency table. We have two random variables:
- Label (L): 0 or 1
- Feature (F): categorical, here with values A and B
Null hypothesis: the two variables are independent of each other (unrelated). Under independence P(L,F) = P(L)P(F), so P(L=0) = (c1+c2)/n and P(F=A) = (c1+c3)/n. Expected values: E(X1) = P(L=0)P(F=A)n, and similarly for X2, X3, X4. We can calculate the chi-square statistic for a given feature and the probability that it is independent of the label (the p-value). Features with very small probabilities deviate significantly from the independence assumption and are therefore considered important.

            Feature=A                  Feature=B
Label=0     Observed=c1, Expected=X1   Observed=c2, Expected=X2
Label=1     Observed=c3, Expected=X3   Observed=c4, Expected=X4
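A minimal sketch of the test for one categorical feature against a binary label, assuming SciPy's chi2_contingency (the feature and label values here are made up for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up binary label and categorical feature with values A and B.
label   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
feature = np.array(['A', 'A', 'B', 'A', 'B', 'B', 'B', 'A'])

# Observed counts c1..c4 arranged as in the contingency table above.
table = np.array([[np.sum((label == 0) & (feature == v)) for v in ('A', 'B')],
                  [np.sum((label == 1) & (feature == v)) for v in ('A', 'B')]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(table)            # observed counts
print(expected)         # expected counts under independence
print(chi2, p_value)    # a small p-value suggests the feature is related to the label
```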

Signal-to-noise ratio. Difference in means divided by the sum of the standard deviations of the two classes: S2N(X,Y) = (μX − μY)/(σX + σY). Large absolute values indicate a strong association with the class.
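A small sketch of the signal-to-noise score for each feature of a two-class data set (assuming the sum-of-standard-deviations form above; the data is made up):

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise score per feature: difference of class means over sum of stds."""
    Xp, Xn = X[y == 1], X[y == 0]
    return (Xp.mean(axis=0) - Xn.mean(axis=0)) / (Xp.std(axis=0) + Xn.std(axis=0))

X = np.array([[1.0, 10, 2], [2, 8, 0], [1.5, 9, 1], [2.5, 7, 0.5]])
y = np.array([0, 1, 0, 1])
print(np.argsort(-np.abs(s2n_scores(X, y))))   # features ranked by |S2N|
```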

Multivariate feature selection. Consider the weight vector w of any linear classifier; classification of a point x is given by the sign of wTx + w0. Entries of w that are small in absolute value have little effect on the dot product, so the corresponding features are less relevant. For example, if w = (10, 0.01, −9) then features 0 and 2 contribute more to the dot product than feature 1, and the ranking of features given by this w is 0, 2, 1.
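A minimal sketch (assuming scikit-learn's LinearSVC as the linear classifier; the data is made up) of ranking features by the absolute value of the learned weight vector w:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up two-class data; rows are points, columns are features.
X = np.array([[1.0, 10, 2], [2, 8, 0], [1.5, 9, 1],
              [2.5, 7, 0.5], [1.2, 10, 2], [2.2, 7, 0]])
y = np.array([0, 1, 0, 1, 0, 1])

clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
w = clf.coef_.ravel()               # one weight per feature
print(w, np.argsort(-np.abs(w)))    # weights and features ranked by |w_i|
```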

Multivariate feature selection. The w can be obtained from any of the linear classifiers we have seen in class so far. A variant of this approach is called recursive feature elimination (RFE), sketched in code below:
1. Compute w on all features.
2. Remove the feature with the smallest |wi|.
3. Recompute w on the reduced data.
4. If the stopping criterion is not met, go to step 2.
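A compact sketch of recursive feature elimination, here using scikit-learn's RFE wrapper around a linear SVM rather than a hand-rolled loop over the steps above; the data is again made up for illustration:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X = np.array([[1.0, 10, 2, 5], [2, 8, 0, 5], [1.5, 9, 1, 5],
              [2.5, 7, 0.5, 5], [1.2, 10, 2, 5], [2.2, 7, 0, 5]])
y = np.array([0, 1, 0, 1, 0, 1])

# Repeatedly fit the linear SVM and drop the feature with the smallest |w_i|
# until only n_features_to_select features remain.
selector = RFE(LinearSVC(max_iter=10000), n_features_to_select=2, step=1).fit(X, y)
print(selector.support_)    # boolean mask of the selected features
print(selector.ranking_)    # rank 1 = selected; higher ranks were eliminated earlier
```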

Feature selection in practice:
- NIPS 2003 feature selection contest: contest results, and reproduced results with feature selection plus SVM
- Effect of feature selection on SVM
- Comprehensive gene selection study comparing feature selection methods
- Ranking genomic causal variants with SVM and chi-square

Limitations. It is unclear how to tell in advance whether feature selection will work; the only known way is to check, but for very high-dimensional data (at least half a million features) it helps most of the time. How many features to select? Perform cross-validation over different cut-offs, as in the sketch below.
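A minimal sketch of choosing the cut-off by cross-validation, assuming scikit-learn's SelectKBest and cross_val_score on synthetic data; the selector is fit inside each cross-validation fold (via the pipeline) so that the feature ranking does not leak information from the held-out fold:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic high-dimensional data purely for illustration.
X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

# Try several cut-offs k and keep the one with the best cross-validated accuracy.
for k in (10, 50, 100, 500):
    pipe = make_pipeline(SelectKBest(f_classif, k=k), LinearSVC(max_iter=10000))
    print(k, round(cross_val_score(pipe, X, y, cv=5).mean(), 3))
```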