Presentation transcript:

Experience with Simple Approaches
Wei Fan, Erheng Zhong, Sihong Xie, Yuzhao Huang, Kun Zhang $, Jing Peng #, Jiangtao Ren
IBM T. J. Watson Research Center; Sun Yat-sen University; $ Xavier University of Louisiana; # Montclair State University

RDT: Random Decision Tree (Fan et al., 03)
Encoding data in trees. At each node, an unused feature is chosen randomly:
- A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node.
- A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen.
Stop when one of the following happens:
- A node becomes too small or all its examples belong to the same class.
- The total height of the tree exceeds some limit.
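The construction rule is simple enough to sketch directly. Below is a minimal Python sketch following the slide's description; the function name, the constants, and the choice to sample continuous thresholds from observed values are my assumptions, not the authors' code.

```python
import random

MAX_DEPTH = 10      # assumed height limit; the slides leave the constant unspecified
MIN_NODE_SIZE = 4   # assumed threshold for "a node becomes too small"

def build_rdt(examples, features, depth=0, used_discrete=frozenset()):
    """examples: list of (x, y) with x a dict of feature -> value;
    features: dict of feature -> 'discrete' or 'continuous'."""
    labels = [y for _, y in examples]
    candidates = [f for f, kind in features.items()
                  if kind == 'continuous' or f not in used_discrete]
    # Stopping rules from the slide: node too small, node pure,
    # height limit reached (plus: no usable feature left).
    if (len(examples) < MIN_NODE_SIZE or len(set(labels)) == 1
            or depth >= MAX_DEPTH or not candidates):
        return {'counts': {y: labels.count(y) for y in set(labels)}}

    f = random.choice(candidates)        # no gain computation: purely random choice
    if features[f] == 'continuous':
        # Continuous features may recur on a path, each time with a new threshold.
        t = random.choice([x[f] for x, _ in examples])
        left = [(x, y) for x, y in examples if x[f] < t]
        right = [(x, y) for x, y in examples if x[f] >= t]
        if not left or not right:        # degenerate split: make a leaf instead
            return {'counts': {y: labels.count(y) for y in set(labels)}}
        return {'feature': f, 'threshold': t,
                'children': [build_rdt(left, features, depth + 1, used_discrete),
                             build_rdt(right, features, depth + 1, used_discrete)]}
    # A discrete feature is used at most once per decision path.
    branches = {}
    for x, y in examples:
        branches.setdefault(x[f], []).append((x, y))
    return {'feature': f,
            'children': {v: build_rdt(g, features, depth + 1, used_discrete | {f})
                         for v, g in branches.items()}}
```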

Illustration of RDT
[Tree diagram: features B1 ∈ {0,1}, B2 ∈ {0,1}, B3 continuous. The root splits on B1 (chosen randomly); lower nodes split on B2 and B3 (chosen randomly); B3, being continuous, appears more than once on the same path with random thresholds 0.3 and 0.6.]

Probabilistic view of decision trees - PETs
Given an example x, a probability estimation tree θ (e.g., C4.5, CART) outputs confidences in the predicted labels; the dependence of P(y|x,θ) on θ is non-trivial.
[Figure: an iris decision tree splitting on Petal.Length < 2.45 and then Petal.Width < 1.75, with leaf class counts (setosa/versicolor/virginica) of 50/0/0, 0/49/5, and 0/1/45. For an example x reaching the 0/49/5 leaf: P(setosa|x,θ) = 0, P(versicolor|x,θ) = 49/54, P(virginica|x,θ) = 5/54.]
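In code, the leaf estimates above are just normalized class counts. A minimal sketch using the counts from the slide (the function name is mine):

```python
def leaf_probabilities(counts):
    """Frequency estimate of P(y | x, theta) from the class counts
    at the leaf that x falls into."""
    total = sum(counts.values())
    return {y: n / total for y, n in counts.items()}

# The 0/49/5 leaf from the slide:
print(leaf_probabilities({'setosa': 0, 'versicolor': 49, 'virginica': 5}))
# -> {'setosa': 0.0, 'versicolor': 0.907..., 'virginica': 0.092...}
```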

Problems of probability estimation via conventional DTs
1. Probability estimates tend to approach the extremes of 1 and 0.
2. Additional inaccuracies result from the small number of examples at a leaf.
3. The same probability is assigned to the entire region of space defined by a given leaf.
Remedies: C4.4 (Provost, 03), BC44 (Zhang, 06), RDT (Fan, 03).

Popular PET Algorithms

Algorithm                 | Model(s) | Splitting Criterion | Probability Estimation | Pruning Strategy        | Diversity Acquisition
C4.5 (Quinlan, 93)        | Single   | Gain Ratio          | Frequency Estimation   | Error-based Pruning     | N/A
C4.4 (Provost, 03)        | Single   | Gain Ratio          | Laplace Correction     | No                      | N/A
RDT (Fan, 03)             | Multiple | Randomly Chosen     | Bayesian Averaging     | No, or depth constraint | Random manipulation of feature set
Bagging PET (Breiman, 96) | Multiple | Gain Ratio          | Bayesian Averaging     | No                      | Random manipulation of training set
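For the "Laplace Correction" entry, here is a hedged sketch of the standard k-class Laplace estimate, which directly addresses problems 1 and 2 above; the slides do not spell out the formula, so the usual form is assumed:

```python
def laplace_probabilities(counts):
    """Laplace-corrected estimate: (n_y + 1) / (N + k) for k classes,
    pulling small-leaf estimates away from the extremes 0 and 1."""
    total, k = sum(counts.values()), len(counts)
    return {y: (n + 1) / (total + k) for y, n in counts.items()}

# A pure two-class leaf with 10 examples: frequency estimation gives 1.0 and 0.0,
# while the Laplace correction gives 11/12 and 1/12.
print(laplace_probabilities({'pos': 10, 'neg': 0}))
```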

bRDT
bRDT is the average of RDT and BC44, where RDT is Random Decision Tree and BC44 is bagged C4.4.
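Under that description, bRDT is an unweighted average of the two models' posterior estimates. A minimal sketch (the predict_proba interface is an assumption, not the authors' API):

```python
def brdt_predict_proba(x, rdt, bc44):
    """bRDT posterior: the plain average of the RDT ensemble's and the
    bagged-C4.4 ensemble's probability estimates for example x."""
    p_rdt, p_bc44 = rdt.predict_proba(x), bc44.predict_proba(x)
    return {y: (p_rdt[y] + p_bc44[y]) / 2 for y in p_rdt}
```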

Sampling strategy for Tasks 1 & 2
For station Z, the negative instances are partitioned into blocks such that each block is approximately 3 times the size of the positive set.
[Diagram: the positive set alongside negative blocks 1 through n.]
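A sketch of that partition (the shuffle and the seed are my additions; the slide only fixes the roughly 3:1 block-to-positive ratio):

```python
import random

def partition_negatives(positives, negatives, ratio=3, seed=0):
    """Split the negatives into blocks of roughly ratio * len(positives) each;
    each block can then be paired with the full positive set for training."""
    neg = list(negatives)
    random.Random(seed).shuffle(neg)
    block = ratio * len(positives)
    return [neg[i:i + block] for i in range(0, len(neg), block)]
```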

Tasks 1 & 2 - Results
For station V, rows 2 and 3 correspond to tasks 1 and 2. The optimal classifiers for tasks 1 and 2 are the same for stations W, X, Y, and Z, so there is only one row for each of these four stations.

Task 1 - ROC

Task 2 - ROC

Task 3 – Feature Expansion Example
Three instances with only one feature; A and B are positive, while C is negative: A(0.9), B(1.0), C(1.1).
Before expansion: Distance(A, B) = Distance(B, C) (0.01 vs. 0.01, squared Euclidean), so the negative C is just as close to B as the positive A is.
Expand each instance: A(0.9, 0.81, 0.64), B(1.0, 1.0, 0.69), C(1.1, 1.21, 0.74).
After expansion: Distance(A, B) < Distance(B, C) (≈0.049 vs. ≈0.056).
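The expanded triples match (x, x², ln(1+x)) up to the rounding shown, though the slide does not name the mapping; under that reading, and assuming squared Euclidean distance, the numbers check out:

```python
import math

def expand(x):
    # Reproduces the slide's triples up to rounding: 0.9 -> (0.9, 0.81, 0.64), etc.
    return (x, x * x, math.log(1 + x))

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

A, B, C = 0.9, 1.0, 1.1
print(sqdist((A,), (B,)), sqdist((B,), (C,)))                      # 0.01 vs 0.01
print(sqdist(expand(A), expand(B)), sqdist(expand(B), expand(C)))  # ~0.0487 vs ~0.0565
```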

Task 3 – Result of Test 3
Parameter-free.