1
Identifying Feature Relevance Using a Random Forest
Jeremy Rogers & Steve Gunn
http://www.isis.ecs.soton.ac.uk
2
Overview
What is a Random Forest?
Why do Relevance Identification?
Estimating Feature Importance with a Random Forest
Node Complexity Compensation
Employing Feature Relevance
Extension to Feature Selection
3
Random Forest
Combination of base learners using Bagging
Uses CART-based decision trees
4
Random Forest (cont...)
Optimises each split using Information Gain
Selects a feature at random to perform each split
The implicit feature selection of CART is removed
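The sketch below is a minimal illustration of this forest variant, not the authors' implementation: bagged CART-style trees in which each split uses a single randomly chosen feature, with the threshold selected by information gain. Binary 0/1 labels, numeric features, and the hyperparameters (max_depth, n_trees) are illustrative assumptions.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def best_split(x, y):
    """Best threshold on a single feature column x, scored by information gain."""
    best_gain, best_t = 0.0, None
    for t in np.unique(x)[:-1]:          # candidate thresholds between distinct values
        left, right = y[x <= t], y[x > t]
        gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

def grow_tree(X, y, rng, depth=0, max_depth=8):
    """CART-style tree, but the split feature is chosen uniformly at random."""
    if depth == max_depth or entropy(y) == 0.0:
        return ("leaf", int(np.mean(y) >= 0.5))
    f = int(rng.integers(X.shape[1]))    # random feature: no implicit feature selection
    _, t = best_split(X[:, f], y)
    if t is None:                        # no useful split on the chosen feature
        return ("leaf", int(np.mean(y) >= 0.5))
    mask = X[:, f] <= t
    return ("node", f, t,
            grow_tree(X[mask], y[mask], rng, depth + 1, max_depth),
            grow_tree(X[~mask], y[~mask], rng, depth + 1, max_depth))

def grow_forest(X, y, n_trees=100, seed=0):
    """Bagging: each tree sees a bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    return [grow_tree(*_bootstrap(X, y, rng), rng) for _ in range(n_trees)]

def _bootstrap(X, y, rng):
    idx = rng.integers(len(y), size=len(y))
    return X[idx], y[idx]

def predict(forest, x):
    """Majority vote over the trees."""
    votes = []
    for tree in forest:
        while tree[0] == "node":
            _, f, t, left, right = tree
            tree = left if x[f] <= t else right
        votes.append(tree[1])
    return int(np.mean(votes) >= 0.5)
```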
5
Feature Relevance: Ranking
Analyse features individually
Measures of correlation to the target
Feature is relevant if:
Assumes no feature interaction
Fails to identify relevant features in the parity problem
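To make the parity point concrete, the short example below (illustrative, not from the slides) builds an XOR-style target from two binary features plus one noise feature. Both informative features have near-zero individual correlation with the target, so a univariate ranking cannot distinguish them from the irrelevant one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.integers(0, 2, size=n)          # relevant (only jointly with x2)
x2 = rng.integers(0, 2, size=n)          # relevant (only jointly with x1)
x3 = rng.integers(0, 2, size=n)          # irrelevant noise
y = x1 ^ x2                              # parity (XOR) target

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"corr({name}, y) = {r:+.3f}")   # all three are close to zero
```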
6
Feature Relevance: Subset Methods
Use the implicit feature selection of decision tree induction
Wrapper methods
Subset search methods
Identifying Markov Blankets
Feature is relevant if:
7
Relevance Identification using Average Information Gain
Can identify feature interaction
Reliability is dependent upon node composition
Irrelevant features give non-zero relevance
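A simplified sketch of this relevance measure follows (illustrative, not the authors' code): random-split trees are grown on bootstrap samples, the information gain achieved by the randomly chosen feature at each node is recorded, and each feature is scored by its average recorded IG. On a parity-style problem the interacting features typically score higher than the noise feature, but the noise feature's average is still positive, which is the weakness noted above.

```python
import numpy as np

def entropy(y):
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def split_and_record(X, y, rng, gains, depth=0, max_depth=5):
    """Grow one random-split tree, recording the best-split IG of each feature used."""
    if depth == max_depth or len(y) < 4 or entropy(y) == 0.0:
        return
    f = int(rng.integers(X.shape[1]))                 # feature chosen at random
    best_gain, best_t = 0.0, None
    for t in np.unique(X[:, f])[:-1]:
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        gain = entropy(y) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    gains[f].append(best_gain)                        # irrelevant features still record gains >= 0
    if best_t is None:
        return
    mask = X[:, f] <= best_t
    split_and_record(X[mask], y[mask], rng, gains, depth + 1, max_depth)
    split_and_record(X[~mask], y[~mask], rng, gains, depth + 1, max_depth)

def average_information_gain(X, y, n_trees=50, seed=0):
    """Relevance estimate: mean IG per feature over all splits in the forest."""
    rng = np.random.default_rng(seed)
    gains = [[] for _ in range(X.shape[1])]
    for _ in range(n_trees):
        idx = rng.integers(len(y), size=len(y))       # bootstrap sample (bagging)
        split_and_record(X[idx], y[idx], rng, gains)
    return [float(np.mean(g)) if g else 0.0 for g in gains]

# Parity-style example: x0 and x1 only matter jointly; x2 is pure noise.
rng = np.random.default_rng(1)
X = rng.random((600, 3))
y = ((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5)).astype(int)
print(average_information_gain(X, y))
```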
8
Node Complexity Compensation
Some nodes are easier to split than others
Requires each sample to be weighted by some measure of node complexity
Data are projected onto a one-dimensional space
For binary classification:
9
Unique & Non-Unique Arrangements
Some arrangements are reflections of one another (non-unique)
Some arrangements are symmetrical about their centre (unique)
10
Node Complexity Compensation (cont…)
[Table from the original slide: the number of unique arrangements A_u for each parity combination (odd/even) of the node size n and the number of positive examples i, with A_u = 0 in the even-n, odd-i case.]
A_u - No. of unique arrangements
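The exact entries of the slide's table are not recoverable from this transcript. The following is a plausible reconstruction derived from the definitions above (an arrangement of i positive among n examples on the one-dimensional projection is "unique" when it is symmetric about its centre); it is standard combinatorics, not taken from the original slide.

```latex
% Reconstruction (not from the original slide): A_u counts the arrangements of
% i positive examples among n that are symmetric about their centre, by parity of n and i.
A_u(n, i) =
\begin{cases}
  \binom{(n-1)/2}{(i-1)/2} & n \text{ odd},\ i \text{ odd} \\
  \binom{(n-1)/2}{i/2}     & n \text{ odd},\ i \text{ even} \\
  0                        & n \text{ even},\ i \text{ odd} \\
  \binom{n/2}{i/2}         & n \text{ even},\ i \text{ even}
\end{cases}
% The number of arrangements distinct up to reflection is then
% \tfrac{1}{2}\bigl(\binom{n}{i} + A_u(n,i)\bigr).
```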
11
Information Gain Density Functions
Node complexity compensation improves the measure of average IG
The effect is visible when examining the IG density functions for each feature
These are constructed by building a forest and recording the frequencies of the IG values achieved by each feature
12
Information Gain Density Functions
RF used to construct 500 trees on an artificial dataset
IG density functions recorded for each feature
13
Employing Feature Relevance
Feature Selection
Feature Weighting
The Random Forest uses a feature sampling distribution to select each feature
The distribution can be altered in two ways:
Parallel: updated during forest construction
Two-stage: fixed prior to forest construction
14
Parallel
Control the update rate using confidence intervals
Assume the Information Gain values have a normal distribution
The statistic then has a Student's t distribution with n-1 degrees of freedom
Maintain the most uniform sampling distribution within the confidence bounds
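As a rough illustration of the parallel update (an interpretation of the slide, not the authors' exact rule; using the normalised confidence bounds as a box around the uniform weights is an assumption), the sketch below computes a Student's t confidence interval for each feature's mean IG and keeps the sampling distribution as uniform as those bounds allow. Features with few recorded IG values get wide intervals and therefore stay near uniform, which controls the update rate.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(samples, alpha=0.05):
    """Two-sided (1 - alpha) CI for the mean, assuming normally distributed IG values."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    mean = samples.mean()
    sem = samples.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1) * sem
    return mean - half, mean + half

def update_sampling_distribution(ig_samples_per_feature, alpha=0.05):
    """One possible 'most uniform within the confidence bounds' update:
    normalise the per-feature CI bounds and clip the uniform weight into them."""
    d = len(ig_samples_per_feature)
    bounds = np.array([t_confidence_interval(s, alpha) for s in ig_samples_per_feature])
    bounds = np.clip(bounds, 1e-12, None)            # IG is non-negative
    total = bounds.mean(axis=1).sum()                # rough normalising constant
    lo, hi = bounds[:, 0] / total, bounds[:, 1] / total
    w = np.clip(np.full(d, 1.0 / d), lo, hi)         # stay as close to uniform as allowed
    return w / w.sum()

# Toy usage: IG samples recorded so far for three features during construction.
rng = np.random.default_rng(0)
samples = [np.clip(rng.normal(0.20, 0.05, 30), 0, None),   # clearly useful feature
           np.clip(rng.normal(0.05, 0.05, 30), 0, None),   # weaker feature
           np.clip(rng.normal(0.05, 0.05, 5), 0, None)]    # weak, but too few samples to be sure
print(update_sampling_distribution(samples))
```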
15
Convergence Rates
Data Set     No. Features   Av. Tree Size
WBC          9              58.3
Votes        16             40.4
Ionosphere   34             70.9
Friedman     10             77.2
Pima         8              269.3
Sonar        60             81.6
Simple       9              57.3
16
Results
Data Set     RF       CI       2S CART
WBC          0.0226   0.0259   0.0226
Sonar        0.1657   0.1462   0.1710
Votes        0.0650   0.0493   0.0432
Pima         0.2343   0.2394   0.2474
Ionosphere   0.0725   0.0681   0.0661
Friedman     0.1865   0.1690   0.1490
Simple       0.0937   0.0450   0.0270
90% of data used for training, 10% for testing
Forests of 100 trees were tested and averaged over 100 trials
17
Irrelevant Features
Average IG is the mean of a non-negative sample
The expected IG of an irrelevant feature is therefore non-zero
Performance is degraded when there is a high proportion of irrelevant features
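The sketch below is an illustrative Monte Carlo check of this point, not the closed-form calculation developed on the following slides: it estimates the expected best-split IG of a feature that is completely independent of the labels, at a node with n examples of which i are positive. Because the best of many candidate thresholds is taken, the expectation is strictly positive, and it is largest for small nodes.

```python
import numpy as np

def entropy(y):
    p = float(np.mean(y)) if len(y) else 0.0
    return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def best_split_ig(x, y):
    """IG of the best threshold split on a single feature column x."""
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        best = max(best, entropy(y) - (len(left) * entropy(left)
                                       + len(right) * entropy(right)) / len(y))
    return best

def expected_ig_irrelevant(n, i, trials=500, seed=0):
    """Monte Carlo estimate of E[best-split IG] for a label-independent feature
    at a node containing n examples, i of them positive."""
    rng = np.random.default_rng(seed)
    y = np.array([1] * i + [0] * (n - i))
    return float(np.mean([best_split_ig(rng.random(n), y) for _ in range(trials)]))

for n, i in [(10, 5), (40, 20), (160, 80)]:
    print(f"n={n:4d}  E[IG] ~ {expected_ig_irrelevant(n, i):.4f}")   # positive, shrinking with n
```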
18
Expected Information Gain
n_L - No. of examples in the left descendant
i_L - No. of positive examples in the left descendant
19
Expected Information Gain
[Figure from the original slide: expected IG plotted against the number of positive and the number of negative examples.]
20
Bounds on Expected Information Gain
The upper bound can be approximated as:
The lower bound is given by:
21
Irrelevant Features: Bounds
100 trees built on an artificial dataset
Average IG recorded and bounds calculated
22
Friedman
[Original slide: features selected by FS and by CFS.]
23
Simple
[Original slide: features selected by FS and by CFS.]
24
Results
Data Set     CFS      FW       FS       FW & FS
WBC          0.0235   0.0249   0.0245   0.0249
Sonar        0.2271   0.1757   0.1629   0.1643
Votes        0.0398   0.0464   0.0650   0.0439
Pima         0.2523   0.2312   0.2492   0.2486
Ionosphere   0.0650   0.0683   0.0747   0.0653
Friedman     0.1685   0.1555   0.1420   0.1370
Simple       0.1653   0.0393   0.0283   0.0303
90% of data used for training, 10% for testing
Forests of 100 trees were tested and averaged over 100 trials
100 trees constructed for feature evaluation in each trial
25
Summary
Node complexity compensation improves the measure of feature relevance by examining node composition
The feature sampling distribution can be updated using confidence intervals to control the update rate
Irrelevant features can be removed by calculating their expected performance