1
Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction
David Page and Soumya Ray
Department of Biostatistics and Medical Informatics / Department of Computer Sciences
University of Wisconsin, Madison, USA
2
Main Contribution
Greedy tree learning algorithms suffer from myopia.
This is remedied by lookahead, which is computationally very expensive.
We present an approach to efficiently address the myopia of tree learners.
3
Task Setting
Given: m examples over n Boolean attributes each, labeled according to a function f defined over some subset of the n attributes.
Do: Learn the Boolean function f.
4
TDIDT Algorithm
Top-Down Induction of Decision Trees.
A greedy algorithm: at each node it chooses the feature that locally optimizes some measure of "purity" of the class labels, e.g. Information Gain or the Gini Index (sketched below).
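To make the split criterion concrete, here is a minimal, self-contained sketch (illustrative only, not the authors' code) of the two purity measures named above and of information gain for a Boolean split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from splitting on one Boolean attribute.

    examples: list of dicts mapping attribute name -> 0/1
    labels:   class labels, parallel to examples
    """
    gain = entropy(labels)
    n = len(labels)
    for value in (0, 1):
        part = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        if part:
            gain -= (len(part) / n) * entropy(part)
    return gain
```

TDIDT greedily splits on the attribute that maximizes information gain (or, equivalently in spirit, minimizes weighted Gini impurity) at each node.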
5
TDIDT Example
x1  x2  x3 | Label
 0   1   1 |   +
 0   1   0 |   −
 1   0   1 |   −
 0   0   0 |   +
6
TDIDT Example
Split on x1:
  x1 = 1: (1−), leaf labeled −
  x1 = 0: (2+, 1−), split on x2:
    x2 = 0: (1+)
    x2 = 1: (1+, 1−)
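For completeness, a greedy recursive induction sketch in the same spirit (again illustrative, not the authors' implementation; it assumes the information_gain helper from the previous sketch). On the four examples above it first splits on x1 and then, in the x1 = 0 branch, on x2, as in the tree above:

```python
def build_tree(examples, labels, attributes):
    """Greedy TDIDT sketch: recursively split on the highest-gain attribute."""
    if len(set(labels)) == 1:           # pure node -> leaf
        return labels[0]
    if not attributes:                  # nothing left to split on -> majority leaf
        return max(set(labels), key=labels.count)
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {"split": best, "children": {}}
    for v in (0, 1):
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        if idx:
            tree["children"][v] = build_tree(
                [examples[i] for i in idx],
                [labels[i] for i in idx],
                [a for a in attributes if a != best])
    return tree

# The four examples from the previous slide (x1, x2, x3 -> label).
examples = [{"x1": 0, "x2": 1, "x3": 1}, {"x1": 0, "x2": 1, "x3": 0},
            {"x1": 1, "x2": 0, "x3": 1}, {"x1": 0, "x2": 0, "x3": 0}]
labels = ["+", "-", "-", "+"]
print(build_tree(examples, labels, ["x1", "x2", "x3"]))
```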
7
Outline
Introduction to the TDIDT algorithm
Myopia and "Hard" Functions
Skewing
Experiments with the Skewing Algorithm
Sequential Skewing
Experiments with Sequential Skewing
Conclusions and Future Work
8
Myopia and Correlation Immunity
For certain Boolean functions, no variable has "gain" according to standard purity measures (e.g., entropy, Gini): no single variable is correlated with the class.
In cryptography, such functions are called correlation immune.
Given such a target function, every variable looks equally good (or equally bad).
In an application, the learner will therefore be unable to differentiate between relevant and irrelevant variables.
9
A Correlation Immune Function: f = x1 ⊕ x2 (exclusive-or)
x1  x2 | f
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
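A quick, self-contained check (illustrative) that neither variable alone carries any information gain on this truth table:

```python
from math import log2

def H(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

# Truth table of f = x1 XOR x2.
rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
labels = [f for _, _, f in rows]

for i, name in enumerate(("x1", "x2")):
    gain = H(labels)
    for v in (0, 1):
        part = [row[2] for row in rows if row[i] == v]
        gain -= (len(part) / len(rows)) * H(part)
    print(name, gain)  # both print 0.0: neither variable alone is informative
```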
10
Examples
In Drosophila, Survival is an exclusive-or function of Gender and the expression of the SxL gene.
In drug binding (ligand-domain), binding may have an exclusive-or subfunction of Ligand Charge and Domain Charge.
11
Learning Hard Functions
Standard method of learning hard functions with TDIDT: depth-k lookahead, which takes O(m · n^(2^(k+1) − 1)) time for m examples over n variables.
Can we devise a technique that allows TDIDT algorithms to efficiently learn hard functions?
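As a rough illustration of that cost (my arithmetic, not a number from the talk): even depth-2 lookahead (k = 2) over n = 30 Boolean variables scales as m · n^(2^3 − 1) = m · 30^7, on the order of 10^10 · m gain evaluations, versus roughly m · n for a single greedy split choice.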
12
Key Idea
Correlation immune functions aren't hard if the data distribution is significantly different from uniform.
13
Example
The uniform distribution can be sampled by setting each variable (feature) independently of all others, with probability 0.5 of being set to 1.
Consider instead a distribution where each variable has probability 0.75 of being set to 1.
14
Example
[Table: examples over x1, x2, x3 labeled by f = x1 ⊕ x2; x3 is irrelevant to f.]
15
[Table: the same examples with Weight and Sum columns added under the skewed distribution; the numeric values are not preserved here.]
16
Example
[Table: weighted sums for the examples under the skewed distribution.]
17
Example
[Table: one partition of the examples with its weighted sums.]
18
Example
[Table: the other partition of the examples with its weighted sums.]
19
Example
[Table: the weight assigned to each example under the skewed distribution.]
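The numeric weights and sums in the preceding tables did not survive extraction, so here is a small reconstruction (my illustration under the 0.75 skew described earlier, not the original slides' numbers) of the effect they demonstrate: under uniform weights no variable has gain on f = x1 ⊕ x2, but under the skewed weighting the relevant variables x1 and x2 show gain while the irrelevant x3 does not.

```python
from math import log2

def w_entropy(pairs):
    """Entropy of (weight, label) pairs."""
    total = sum(w for w, _ in pairs)
    ent = 0.0
    for label in set(lab for _, lab in pairs):
        p = sum(w for w, lab in pairs if lab == label) / total
        if p > 0:
            ent -= p * log2(p)
    return ent

def w_gain(examples, labels, weights, attr):
    """Weighted information gain of splitting on a Boolean attribute."""
    total_w = sum(weights)
    gain = w_entropy(list(zip(weights, labels)))
    for v in (0, 1):
        part = [(w, lab) for w, lab, ex in zip(weights, labels, examples)
                if ex[attr] == v]
        if part:
            gain -= (sum(w for w, _ in part) / total_w) * w_entropy(part)
    return gain

# f = x1 XOR x2; x3 is irrelevant.
examples = [{"x1": a, "x2": b, "x3": c}
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [ex["x1"] ^ ex["x2"] for ex in examples]

uniform = [1.0] * len(examples)
# Skew: favor the value 1 for every variable with probability 0.75.
skewed = []
for ex in examples:
    w = 1.0
    for v in ("x1", "x2", "x3"):
        w *= 0.75 if ex[v] == 1 else 0.25
    skewed.append(w)

for attr in ("x1", "x2", "x3"):
    # x1 and x2 show positive gain under the skew; x3 stays at 0.
    print(attr,
          round(w_gain(examples, labels, uniform, attr), 3),
          round(w_gain(examples, labels, skewed, attr), 3))
```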
20
Key Idea
Given a large enough sample and a second distribution sufficiently different from the first, we can learn functions that are hard for TDIDT algorithms under the original distribution.
21
Issues to Address
How can we get a "sufficiently different" distribution? Our approach: "skew" the given sample by choosing "favored settings" for the variables.
What about the effects of a not-large-enough sample? Our approach: average the "goodness" of each variable over multiple skews.
22
Skewing Algorithm
For T trials do:
  Choose a favored setting for each variable.
  Reweight the sample accordingly.
  Calculate the entropy of each variable's split under this weighting.
  For each variable that has sufficient gain, increment a counter.
Split on the variable with the highest count (see the sketch below).
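A minimal sketch of this loop (illustrative: the reweighting scheme, trial count, and gain threshold are assumptions rather than the paper's exact settings; it reuses the w_gain helper from the previous sketch):

```python
import random

def skewed_split(examples, labels, attributes, trials=30,
                 skew=0.75, gain_threshold=0.05):
    """Pick a split variable by counting, over several random skews,
    how often each attribute shows sufficient weighted gain."""
    counts = {a: 0 for a in attributes}
    for _ in range(trials):
        # Choose a favored setting (0 or 1) for every variable.
        favored = {a: random.randint(0, 1) for a in attributes}
        # Reweight: examples matching more favored settings get more weight.
        weights = []
        for ex in examples:
            w = 1.0
            for a in attributes:
                w *= skew if ex[a] == favored[a] else (1.0 - skew)
            weights.append(w)
        # Credit every attribute whose weighted gain clears the threshold.
        for a in attributes:
            if w_gain(examples, labels, weights, a) > gain_threshold:
                counts[a] += 1
    # Split on the attribute that had gain under the most skews.
    return max(counts, key=counts.get)
```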
23
Experiments
ID3 vs. ID3 with Skewing (ID3 chosen to avoid issues with parameters, pruning, etc.).
Synthetic propositional data: examples of 30 Boolean variables; target Boolean functions of 2-6 of these variables; randomly chosen targets and randomly chosen hard targets.
UCI datasets (Perlich et al., JMLR 2003).
10-fold cross validation.
Evaluation metric: Weighted Accuracy = average of accuracy over positives and accuracy over negatives (see the snippet below).
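For reference, a small sketch of the evaluation metric as defined above (assuming 0/1 labels; not the authors' evaluation code):

```python
def weighted_accuracy(y_true, y_pred):
    """Average of accuracy on positive examples and accuracy on negative examples."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    acc_pos = sum(t == p for t, p in pos) / len(pos)
    acc_neg = sum(t == p for t, p in neg) / len(neg)
    return (acc_pos + acc_neg) / 2.0

# weighted_accuracy([1, 1, 0, 0], [1, 0, 0, 0]) == 0.75
```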
24
Results (3-variable Boolean functions)
[Plots: random functions; hard functions]
25
Results (4-variable Boolean functions)
[Plots: random functions; hard functions]
26
Results (5-variable Boolean functions)
[Plots: random functions; hard functions]
27
Results (6-variable Boolean functions)
[Plots: random functions; hard functions]
28
Current Shortcomings
Sensitive to noise and to high-dimensional data.
Very small signal on the hardest correlation immune functions (parity) given more than 3 relevant variables.
Only very small gains on the real-world datasets attempted so far. Few correlation immune functions in practice? Noise, dimensionality, not enough examples?