Experience with Simple Approaches

Wei Fan (IBM T. J. Watson Research Center), Erheng Zhong, Sihong Xie, Yuzhao Huang, Jiangtao Ren (Sun Yat-sen University), Kun Zhang (Xavier University of Louisiana), Jing Peng (Montclair State University)
RDT: Random Decision Tree (Fan et al. 03)

Encoding data in trees:
- At each node, an unused feature is chosen randomly.
- A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node.
- A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen.
- Stop when one of the following happens: a node becomes too small or contains only one class, or the total height of the tree exceeds some limit (see the sketch after this list).
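The growing rule above can be made concrete with a short sketch. This is not the authors' code; it is a minimal illustration assuming features are declared as discrete value sets or continuous ranges, and the names (build_rdt, max_depth, min_size) are placeholders.

```python
import random

def build_rdt(data, features, used_discrete=frozenset(),
              depth=0, max_depth=10, min_size=4):
    """Grow one random tree over data = [(feature_dict, label), ...].

    features maps a name to ("discrete", values) or ("continuous", (lo, hi)).
    """
    labels = [y for _, y in data]
    # Stop: node too small, node pure, or height limit reached.
    if len(data) < min_size or len(set(labels)) <= 1 or depth >= max_depth:
        return {"leaf": True, "counts": {c: labels.count(c) for c in set(labels)}}

    # A discrete feature may appear at most once on a root-to-node path;
    # a continuous feature may be reused with a fresh random threshold.
    candidates = [f for f in features if f not in used_discrete]
    if not candidates:
        return {"leaf": True, "counts": {c: labels.count(c) for c in set(labels)}}

    f = random.choice(candidates)  # feature chosen at random, no split criterion
    kind, spec = features[f]
    if kind == "discrete":
        used = used_discrete | {f}
        children = {v: build_rdt([(x, y) for x, y in data if x[f] == v],
                                 features, used, depth + 1, max_depth, min_size)
                    for v in spec}
        return {"leaf": False, "feature": f, "children": children}
    lo, hi = spec
    t = random.uniform(lo, hi)     # a new threshold is drawn each time
    left = [(x, y) for x, y in data if x[f] < t]
    right = [(x, y) for x, y in data if x[f] >= t]
    return {"leaf": False, "feature": f, "threshold": t,
            "children": {"lt": build_rdt(left, features, used_discrete,
                                         depth + 1, max_depth, min_size),
                         "ge": build_rdt(right, features, used_discrete,
                                         depth + 1, max_depth, min_size)}}
```

Note that no training statistic (gain ratio, Gini, etc.) drives the split; randomness alone supplies the diversity that the ensemble later averages out.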
Illustration of RDT

[Figure: an example random tree over features B1: {0,1}, B2: {0,1}, B3: continuous. B1 is chosen randomly at the root, B2 is chosen randomly below it, and B3, being continuous, is chosen with random thresholds 0.3 and 0.6 and can appear more than once on a path.]
Probabilistic view of decision trees – PETs

Given an example x, a tree θ (e.g. from C4.5 or CART) provides confidences in the predicted labels, P(y|x,θ). For example, in the iris tree that splits on Petal.Length < 2.45 and then Petal.Width < 1.75, with leaf class counts setosa 50/0/0, versicolor 0/49/5, virginica 0/1/45, an example x falling into the versicolor leaf receives:
P(setosa|x,θ) = 0
P(versicolor|x,θ) = 49/54
P(virginica|x,θ) = 5/54
The dependence of P(y|x,θ) on θ is non-trivial.
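As a small check of the frequency estimate above, the versicolor leaf's counts reproduce the three probabilities (the helper name is illustrative, not from the slides):

```python
# Frequency-based leaf estimate: normalize the class counts at the leaf.
def leaf_probabilities(counts):
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

probs = leaf_probabilities({"setosa": 0, "versicolor": 49, "virginica": 5})
print(probs)  # setosa 0.0, versicolor ~0.907 (49/54), virginica ~0.093 (5/54)
```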
Problems of probability estimation via conventional DTs

1. Probability estimates tend to approach the extremes of 1 and 0.
2. Additional inaccuracies result from the small number of examples at a leaf.
3. The same probability is assigned to the entire region of space defined by a given leaf.

Algorithms that address these problems: C4.4 (Provost, 03), BC44 (Zhang, 06), RDT (Fan, 03).
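Problems 1 and 2 are what the Laplace correction in C4.4 addresses: it replaces the raw leaf frequency n_c/n with (n_c + 1)/(n + C) for C classes, pulling small leaves away from the extremes. A minimal comparison:

```python
# Raw frequency vs. the Laplace-corrected estimate used by C4.4.
def frequency(n_c, n):
    return n_c / n

def laplace(n_c, n, num_classes=2):
    return (n_c + 1) / (n + num_classes)

print(frequency(3, 3))  # 1.0 -> an extreme estimate from only 3 examples
print(laplace(3, 3))    # 0.8 -> smoothed toward the uniform prior
```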
Popular PET Algorithms

Algorithm                | Model(s) | Splitting Criterion | Probability Estimation | Pruning Strategy        | Diversity Acquisition
C4.5 (Quinlan, 93)       | Single   | Gain Ratio          | Frequency Estimation   | Error-based Pruning     | N/A
C4.4 (Provost, 03)       | Single   | Gain Ratio          | Laplace Correction     | No                      | N/A
RDT (Fan, 03)            | Multiple | Randomly Chosen     | Bayesian Averaging     | No, or Depth Constraint | Random manipulation of feature set
BaggingPET (Breiman, 96) | Multiple | Gain Ratio          | Bayesian Averaging     | No                      | Random manipulation of training set
bRDT

bRDT is the average of RDT and BC44, where RDT is the Random Decision Tree and BC44 is bagged C4.4: the two models' probability estimates are averaged.
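A minimal sketch of that combination, assuming each component model exposes its class-probability estimates as a dict (the names rdt_proba and bc44_proba are placeholders, not an API from the slides):

```python
# bRDT: average the posterior estimates of the two component models.
def brdt_proba(rdt_proba, bc44_proba):
    labels = set(rdt_proba) | set(bc44_proba)
    return {y: (rdt_proba.get(y, 0.0) + bc44_proba.get(y, 0.0)) / 2
            for y in labels}

p = brdt_proba({"pos": 0.7, "neg": 0.3}, {"pos": 0.5, "neg": 0.5})
print(p)  # pos -> 0.6, neg -> 0.4
```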
Sampling strategy for Tasks 1 & 2

For station Z, negative instances are partitioned into blocks such that the size of each block is approximately 3 times that of the positive set.

[Figure: the positive set alongside negative Blocks 1 through n.]
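A sketch of that partitioning step, assuming the negatives are shuffled before being cut into blocks; the 3:1 ratio is from the slide, everything else (seed, helper name) is illustrative:

```python
import random

# Cut the negative set into blocks of roughly 3x the positive-set size.
def make_blocks(positives, negatives, ratio=3, seed=0):
    rng = random.Random(seed)
    negs = list(negatives)
    rng.shuffle(negs)
    block_size = ratio * len(positives)
    return [negs[i:i + block_size] for i in range(0, len(negs), block_size)]

pos = list(range(10))         # 10 toy positive ids
neg = list(range(100, 200))   # 100 toy negative ids
print([len(b) for b in make_blocks(pos, neg)])  # [30, 30, 30, 10]
```

Each block can then be paired with the full positive set so that every training run sees a roughly 3:1 class ratio.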
Tasks 1 & 2 – Results

For station V, rows 2 and 3 correspond to tasks 1 and 2. The optimal classifiers of tasks 1 and 2 for stations W, X, Y, and Z are the same, so there is only one row for these 4 stations.
Task 1 – ROC

[Figure: ROC curves for Task 1.]
Task 2 – ROC

[Figure: ROC curves for Task 2.]
Task 3 – Feature Expansion Example

Three instances with only one feature; A and B are positive while C is negative: A(0.9), B(1.0), C(1.1).
Before expansion: Distance(A, B) = Distance(B, C), 0.01 vs 0.01 (squared distance), so the negative C is exactly as close to B as the positive A is.
Expand each instance's feature vector: A(0.9, 0.81, 0.64), B(1.0, 1.0, 0.69), C(1.1, 1.21, 0.74).
After expansion: Distance(A, B) < Distance(B, C), about 0.0486 vs 0.0566, so the two positives are now closer to each other than to the negative.
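The arithmetic above can be verified in a few lines, assuming squared Euclidean distance (consistent with the slide's 0.01 for points 0.1 apart):

```python
# Squared Euclidean distance between two equal-length vectors.
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

A, B, C = (0.9,), (1.0,), (1.1,)  # one original feature
print(sq_dist(A, B), sq_dist(B, C))      # 0.01 vs 0.01: a tie

# Expanded vectors as listed on the slide (second component is x^2).
A2, B2, C2 = (0.9, 0.81, 0.64), (1.0, 1.0, 0.69), (1.1, 1.21, 0.74)
print(sq_dist(A2, B2), sq_dist(B2, C2))  # ~0.0486 < ~0.0566
```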
Task 3 – Result of test 3

Parameter-free.