1
Identifying Feature Relevance Using a Random Forest
Jeremy Rogers & Steve Gunn
http://www.isis.ecs.soton.ac.uk
2
Overview
What is a Random Forest?
Why do Relevance Identification?
Estimating Feature Importance with a Random Forest
Node Complexity Compensation
Employing Feature Relevance
Extension to Feature Selection
3
Random Forest
Combination of base learners using Bagging
Uses CART-based decision trees
4
Random Forest (cont...)
Optimises each split using Information Gain
Selects a feature at random to perform each split
The implicit feature selection of CART is removed
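The sketch below is a minimal illustration of this forest variant, not the authors' implementation: bagged CART-style trees in which each split uses a single randomly chosen feature, with the threshold selected by information gain. Binary 0/1 labels, numeric features, and the hyperparameters (max_depth, n_trees) are illustrative assumptions.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def best_split(x, y):
    """Best threshold on a single feature column x, scored by information gain."""
    best_gain, best_t = 0.0, None
    for t in np.unique(x)[:-1]:          # candidate thresholds between distinct values
        left, right = y[x <= t], y[x > t]
        gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

def grow_tree(X, y, rng, depth=0, max_depth=8):
    """CART-style tree, but the split feature is chosen uniformly at random."""
    if depth == max_depth or entropy(y) == 0.0:
        return ("leaf", int(np.mean(y) >= 0.5))
    f = int(rng.integers(X.shape[1]))    # random feature: no implicit feature selection
    _, t = best_split(X[:, f], y)
    if t is None:                        # no useful split on the chosen feature
        return ("leaf", int(np.mean(y) >= 0.5))
    mask = X[:, f] <= t
    return ("node", f, t,
            grow_tree(X[mask], y[mask], rng, depth + 1, max_depth),
            grow_tree(X[~mask], y[~mask], rng, depth + 1, max_depth))

def grow_forest(X, y, n_trees=100, seed=0):
    """Bagging: each tree sees a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    return [grow_tree(*_bootstrap(X, y, rng), rng) for _ in range(n_trees)]

def _bootstrap(X, y, rng):
    idx = rng.integers(len(y), size=len(y))
    return X[idx], y[idx]

def predict(forest, x):
    """Majority vote over the trees."""
    votes = []
    for tree in forest:
        while tree[0] == "node":
            _, f, t, left, right = tree
            tree = left if x[f] <= t else right
        votes.append(tree[1])
    return int(np.mean(votes) >= 0.5)
```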
5
Feature Relevance: Ranking
Analyse features individually
Measures of correlation to the target
Feature is relevant if:
Assumes no feature interaction
Fails to identify relevant features in the parity problem
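To make the parity point concrete, the short example below (illustrative, not from the slides) builds an XOR-style target from two binary features plus one noise feature. Both informative features have near-zero individual correlation with the target, so a univariate ranking cannot distinguish them from the irrelevant one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.integers(0, 2, size=n)          # relevant (only jointly with x2)
x2 = rng.integers(0, 2, size=n)          # relevant (only jointly with x1)
x3 = rng.integers(0, 2, size=n)          # irrelevant noise
y = x1 ^ x2                              # parity (XOR) target

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"corr({name}, y) = {r:+.3f}")   # all three are close to zero
```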
6
Feature Relevance: Subset Methods
Use the implicit feature selection of decision tree induction
Wrapper methods
Subset search methods
Identifying Markov Blankets
Feature is relevant if:
7
Relevance Identification using Average Information Gain
Can identify feature interaction
Reliability is dependent upon node composition
Irrelevant features give non-zero relevance
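A simplified sketch of this relevance measure follows (illustrative, not the authors' code): random-split trees are grown on bootstrap samples, the information gain achieved by the randomly chosen feature at each node is recorded, and each feature is scored by its average recorded IG. On a parity-style problem the interacting features typically score higher than the noise feature, but the noise feature's average is still positive, which is the weakness noted above.

```python
import numpy as np

def entropy(y):
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def split_and_record(X, y, rng, gains, depth=0, max_depth=5):
    """Grow one random-split tree, recording the best-split IG of each feature used."""
    if depth == max_depth or len(y) < 4 or entropy(y) == 0.0:
        return
    f = int(rng.integers(X.shape[1]))                 # feature chosen at random
    best_gain, best_t = 0.0, None
    for t in np.unique(X[:, f])[:-1]:
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    gains[f].append(best_gain)                        # irrelevant features still record gains >= 0
    if best_t is None:
        return
    mask = X[:, f] <= best_t
    split_and_record(X[mask], y[mask], rng, gains, depth + 1, max_depth)
    split_and_record(X[~mask], y[~mask], rng, gains, depth + 1, max_depth)

def average_information_gain(X, y, n_trees=50, seed=0):
    """Relevance estimate: mean IG per feature over all splits in the forest."""
    rng = np.random.default_rng(seed)
    gains = [[] for _ in range(X.shape[1])]
    for _ in range(n_trees):
        idx = rng.integers(len(y), size=len(y))       # bootstrap sample (bagging)
        split_and_record(X[idx], y[idx], rng, gains)
    return [float(np.mean(g)) if g else 0.0 for g in gains]

# Parity-style example: x0 and x1 only matter jointly; x2 is pure noise.
rng = np.random.default_rng(1)
X = rng.random((600, 3))
y = ((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5)).astype(int)
print(average_information_gain(X, y))
```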
8
Node Complexity Compensation
Some nodes are easier to split than others
Requires each sample to be weighted by some measure of node complexity
Data are projected onto a one-dimensional space
For binary classification:
9
Unique & Non-Unique Arrangements
Some arrangements are reflections of one another (non-unique)
Some arrangements are symmetrical about their centre (unique)
10
Node Complexity Compensation (cont…)
[Table from the original slide: the number of unique arrangements A_u for each parity combination (odd/even) of the node size n and the number of positive examples i, with A_u = 0 in the even-n, odd-i case.]
A_u - No. of unique arrangements
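The exact entries of the slide's table are not recoverable from this transcript. The following is a plausible reconstruction derived from the definitions above (an arrangement of i positive among n examples on the one-dimensional projection is "unique" when it is symmetric about its centre); it is standard combinatorics, not taken from the original slide.

```latex
% Reconstruction (not from the original slide): A_u counts the arrangements of
% i positive examples among n that are symmetric about their centre, by parity of n and i.
A_u(n, i) =
\begin{cases}
  \binom{(n-1)/2}{(i-1)/2} & n \text{ odd},\ i \text{ odd} \\
  \binom{(n-1)/2}{i/2}     & n \text{ odd},\ i \text{ even} \\
  0                        & n \text{ even},\ i \text{ odd} \\
  \binom{n/2}{i/2}         & n \text{ even},\ i \text{ even}
\end{cases}
% The number of arrangements distinct up to reflection is then
% \tfrac{1}{2}\bigl(\binom{n}{i} + A_u(n,i)\bigr).
```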
11
Information Gain Density Functions
Node complexity compensation improves the measure of average IG
The effect is visible when examining the IG density functions for each feature
These are constructed by building a forest and recording the frequencies of the IG values achieved by each feature
12
Information Gain Density Functions
RF used to construct 500 trees on an artificial dataset
IG density functions recorded for each feature
13
Employing Feature Relevance
Feature Selection
Feature Weighting
The Random Forest uses a feature sampling distribution to select each feature
The distribution can be altered in two ways:
Parallel: updated during forest construction
Two-stage: fixed prior to forest construction
14
Parallel
Control the update rate using confidence intervals
Assume the Information Gain values have a normal distribution
The statistic then has a Student's t distribution with n-1 degrees of freedom
Maintain the most uniform sampling distribution within the confidence bounds
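As a rough illustration of the parallel update (an interpretation of the slide, not the authors' exact rule; using the normalised confidence bounds as a box around the uniform weights is an assumption), the sketch below computes a Student's t confidence interval for each feature's mean IG and keeps the sampling distribution as uniform as those bounds allow. Features with few recorded IG values get wide intervals and therefore stay near uniform, which controls the update rate.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(samples, alpha=0.05):
    """Two-sided (1 - alpha) CI for the mean, assuming normally distributed IG values."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    mean = samples.mean()
    sem = samples.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1) * sem
    return mean - half, mean + half

def update_sampling_distribution(ig_samples_per_feature, alpha=0.05):
    """One possible 'most uniform within the confidence bounds' update:
    normalise the per-feature CI bounds and clip the uniform weight into them."""
    d = len(ig_samples_per_feature)
    bounds = np.array([t_confidence_interval(s, alpha) for s in ig_samples_per_feature])
    bounds = np.clip(bounds, 1e-12, None)            # IG is non-negative
    total = bounds.mean(axis=1).sum()                # rough normalising constant
    lo, hi = bounds[:, 0] / total, bounds[:, 1] / total
    w = np.clip(np.full(d, 1.0 / d), lo, hi)         # stay as close to uniform as allowed
    return w / w.sum()

# Toy usage: IG samples recorded so far for three features during construction.
rng = np.random.default_rng(0)
samples = [np.clip(rng.normal(0.20, 0.05, 30), 0, None),   # clearly useful feature
           np.clip(rng.normal(0.05, 0.05, 30), 0, None),   # weaker feature
           np.clip(rng.normal(0.05, 0.05, 5), 0, None)]    # weak, but too few samples to be sure
print(update_sampling_distribution(samples))
```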
15
Convergence Rates
Data Set     No. Features   Av. Tree Size
WBC          9              58.3
Votes        16             40.4
Ionosphere   34             70.9
Friedman     10             77.2
Pima         8              269.3
Sonar        60             81.6
Simple       9              57.3
16
Results
Data Set     RF       CI       2S CART
WBC          0.0226   0.0259   0.0226
Sonar        0.1657   0.1462   0.1710
Votes        0.0650   0.0493   0.0432
Pima         0.2343   0.2394   0.2474
Ionosphere   0.0725   0.0681   0.0661
Friedman     0.1865   0.1690   0.1490
Simple       0.0937   0.0450   0.0270
90% of data used for training, 10% for testing
Forests of 100 trees were tested and averaged over 100 trials
17
Irrelevant Features
Average IG is the mean of a non-negative sample
The expected IG of an irrelevant feature is therefore non-zero
Performance is degraded when there is a high proportion of irrelevant features
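The sketch below is an illustrative Monte Carlo check of this point, not the closed-form calculation developed on the following slides: it estimates the expected best-split IG of a feature that is completely independent of the labels, at a node with n examples of which i are positive. Because the best of many candidate thresholds is taken, the expectation is strictly positive, and it is largest for small nodes.

```python
import numpy as np

def entropy(y):
    p = float(np.mean(y)) if len(y) else 0.0
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def best_split_ig(x, y):
    """IG of the best threshold split on a single feature column x."""
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        best = max(best, entropy(y) - (len(left) * entropy(left)
                                       + len(right) * entropy(right)) / len(y))
    return best

def expected_ig_irrelevant(n, i, trials=500, seed=0):
    """Monte Carlo estimate of E[best-split IG] for a label-independent feature
    at a node containing n examples, i of them positive."""
    rng = np.random.default_rng(seed)
    y = np.array([1] * i + [0] * (n - i))
    return float(np.mean([best_split_ig(rng.random(n), y) for _ in range(trials)]))

for n, i in [(10, 5), (40, 20), (160, 80)]:
    print(f"n={n:4d}  E[IG] ~ {expected_ig_irrelevant(n, i):.4f}")   # positive, shrinking with n
```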
18
Expected Information Gain
n_L - No. of examples in the left descendant
i_L - No. of positive examples in the left descendant
19
Expected Information Gain
[Figure from the original slide: expected IG plotted against the number of positive and the number of negative examples.]
20
Bounds on Expected Information Gain
The upper bound can be approximated as:
The lower bound is given by:
21
Irrelevant Features: Bounds
100 trees built on an artificial dataset
Average IG recorded and bounds calculated
22
Friedman
[Original slide: features selected by FS and by CFS.]
23
Simple
[Original slide: features selected by FS and by CFS.]
24
Results
Data Set     CFS      FW       FS       FW & FS
WBC          0.0235   0.0249   0.0245   0.0249
Sonar        0.2271   0.1757   0.1629   0.1643
Votes        0.0398   0.0464   0.0650   0.0439
Pima         0.2523   0.2312   0.2492   0.2486
Ionosphere   0.0650   0.0683   0.0747   0.0653
Friedman     0.1685   0.1555   0.1420   0.1370
Simple       0.1653   0.0393   0.0283   0.0303
90% of data used for training, 10% for testing
Forests of 100 trees were tested and averaged over 100 trials
100 trees constructed for feature evaluation in each trial
25
Summary
Node complexity compensation improves the measure of feature relevance by examining node composition
The feature sampling distribution can be updated using confidence intervals to control the update rate
Irrelevant features can be removed by calculating their expected performance