Feature Selection

Which features work best? One way to rank features:
–Make a 2x2 contingency table for each feature F
–Compute abs(log(ad / bc))
–Rank features by this value

Contingency table for feature F:

          Madison   Hamilton
  F          a         b
  Not F      c         d
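As a concrete illustration, here is a minimal R sketch of this ranking. The counts data frame, its column names, and all of its numbers are hypothetical placeholders, not values from the study.

# Score each candidate feature by abs(log(ad/bc)), the absolute log
# odds ratio of its 2x2 contingency table:
#   a = Madison papers containing F,  b = Hamilton papers containing F
#   c = Madison papers without F,     d = Hamilton papers without F
# All numbers below are made up for illustration.
counts <- data.frame(
  feature = c("upon", "there", "on"),
  a = c(2, 30, 25),  b = c(45, 40, 26),
  c = c(48, 20, 25), d = c(5, 10, 24)
)
counts$score <- abs(log((counts$a * counts$d) / (counts$b * counts$c)))
counts[order(counts$score, decreasing = TRUE), ]  # most discriminating first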
49 Ranked Features
Linear Discriminant Analysis

A technique for classifying data, available in the R statistics package.
Input:
–Table of training data
–Table of test data
Output:
–Classification of test data
Linear Discriminant Analysis: example

Input training data: a table with columns upon, 2-letter, 3-letter, one row per paper, labeled M M M M M H H H H H.
Input test data: a table with the same columns (upon, 2-letter, 3-letter), one row per disputed paper.
Output: m m m m h
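A minimal R sketch of this workflow using lda() from the MASS package. The feature values below are invented for illustration, and the column names two_ltr and three_ltr are stand-ins for the slide's 2-letter and 3-letter features.

library(MASS)  # provides lda()

# Hypothetical per-paper feature values; the numbers are made up.
train <- data.frame(
  author    = factor(c(rep("M", 5), rep("H", 5))),
  upon      = c(1, 2, 0, 1, 2, 6, 5, 7, 6, 5),
  two_ltr   = c(180, 175, 182, 178, 181, 160, 158, 163, 161, 159),
  three_ltr = c(210, 208, 215, 211, 209, 230, 228, 233, 231, 229)
)
test <- data.frame(
  upon      = c(1, 0, 2, 1, 5),
  two_ltr   = c(179, 181, 177, 180, 161),
  three_ltr = c(212, 209, 214, 210, 229)
)

fit <- lda(author ~ upon + two_ltr + three_ltr, data = train)
predict(fit, test)$class   # one M/H label per test paper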
Some more LDA results

12 to Madison:
–upon, 1-letter, 2-letter
–upon, enough, there
–upon, there
11 to Madison:
–upon, 2-letter, 3-letter
< 6 to Madison:
–2-letter, 3-letter
–there, 1-letter, 2-letter
Some more LDA results

  Class   Output of lda (12 disputed papers)   Features tested
  M       m m m m m m ...                      upon, apt
  M       m m m m m m ...                      to, upon
  M       m m m m m m h m m m m m             on, there
  M       h m m m m m m m m m m m             an, by
  M       m m m m m m h m m m h m             particularly, probability
  M       m m m m m m h h h m h m             also, of
  M       m m m h m m h h m m h m             always, of
  M       h m m h m h h m h m m m             of, work
  M       m m h m m m h h m h h h             there, language
  M       m h m h h m h h h m m h             consequently, direction
Feature Selection Part II

Which combinations of features are best for LDA? Are the features independent?
We did some random sampling (sketched below):
–Choose features a, b, c, d
–Compute x = log a + log b + log c + log d
–Compute y = log(a + b + c + d)
–Plot x versus y
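A minimal R sketch of this sampling experiment. The scores vector is a hypothetical stand-in for the real per-feature values, which the slides do not give; the uniform draws are for illustration only.

# Probe feature independence by comparing the sum of logs with the log
# of the sum for randomly chosen 4-tuples of feature values.
set.seed(1)
scores <- runif(49, min = 0.1, max = 3)   # one value per ranked feature

n <- 1000
xy <- t(replicate(n, {
  s <- sample(scores, 4)                  # choose features a, b, c, d
  c(x = sum(log(s)), y = log(sum(s)))     # x = sum of logs, y = log of sum
}))
plot(xy[, "x"], xy[, "y"],
     xlab = "log a + log b + log c + log d",
     ylab = "log(a + b + c + d)")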
Selecting more features

What happens when more than 4 features are used for the LDA?
Greedy approach (sketched below):
–Add features one at a time from two lists
–Perform LDA on all features chosen so far
Is overfitting a problem?
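A sketch of the greedy loop in R, simplified to a single candidate list (the slides draw from two). The train data frame and its author column are assumed to be set up as in the earlier example; leave-one-out cross-validation (lda's CV = TRUE) is one standard way to watch for the overfitting the slide asks about.

library(MASS)

# Greedy forward selection: at each iteration, try adding every
# remaining candidate feature, keep the one with the best
# leave-one-out cross-validated accuracy.
greedy_lda <- function(train, candidates, max_feats = 10) {
  chosen <- character(0)
  for (k in seq_len(max_feats)) {
    best <- NULL; best_acc <- -Inf
    for (f in setdiff(candidates, chosen)) {
      fml <- reformulate(c(chosen, f), response = "author")
      cv  <- lda(fml, data = train, CV = TRUE)   # leave-one-out CV
      acc <- mean(cv$class == train$author)
      if (acc > best_acc) { best <- f; best_acc <- acc }
    }
    chosen <- c(chosen, best)
    cat(sprintf("iter %d: added %-15s CV accuracy %.2f\n", k, best, best_acc))
  }
  chosen
}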
First few greedy iterations

  Result       Output of lda                     Feature added
  6 M,  6 H    h m h h m h m m h m h m           2-letter words
  12 M, 0 H    m m m m m m m m m m m m           upon
  12 M, 0 H    m m m m m m m m m m m m           1-letter words
  12 M, 0 H    m m m m m m m m m m m m           5-letter words
  11 M, 1 H    m m m m m h m m m m m m           4-letter words
  12 M, 0 H    m m m m m m m m m m m m           there
  12 M, 0 H    m m m m m m m m m m m m           enough
  11 M, 1 H    m m m m m m h m m m m m           whilst
  12 M, 0 H    m m m m m m m m m m m m           3-letter words
  11 M, 1 H    m m m m m m h m m m m m           15-letter words