Recursive Binary Partitioning: Old Dogs with New Tricks. KDD Conference 2009. David J. Slate and Peter W. Frey.


Background
- D. J. Slate and L. R. Atkin, "Chess 4.5 – The Northwestern University Chess Program". In P. W. Frey (Ed.), Chess Skill in Man and Machine, Springer-Verlag, 1977, 1978.
- P. W. Frey, "Algorithmic Strategies for Improving the Performance of Game-Playing Programs". In D. Farmer, A. Lapedes, N. Packard and B. Wendroff (Eds.), Evolution, Games and Learning, North-Holland Physics Publishing, Amsterdam.
- P. W. Frey and D. J. Slate, "Letter Recognition Using Holland-Style Adaptive Classifiers", Machine Learning, 6, 1991.

Database Characteristics
- Hundreds of Thousands of Records
- Missing Data
- Erroneous Data Entries

Forecasting Challenges
- Categorical Attributes and/or Outcomes
- Non-Monotonic Relationships between Attributes and the Outcome
- Skewed or Bimodal Numerical Distributions
- Non-Additive Attribute Influence on Outcomes
- Multiple Attribute Combinations that Produce Desirable Outcomes

Recursive Binary Partitioning
- J. A. Sonquist and J. N. Morgan, "The Detection of Interaction Effects", Institute for Social Research Monograph No. 35, Ann Arbor: University of Michigan, 1964.
- G. V. Kass, "An Exploratory Technique for Investigating Large Quantities of Categorical Data", Applied Statistics, 29:2, 1980.
- L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Pacific Grove, CA: Wadsworth, 1984.
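As a concrete illustration of the CART-style procedure these references describe, here is a minimal sketch of recursive binary partitioning on numeric features with 0/1 labels. All function names are illustrative, not from the authors' system, and Gini impurity is one possible split score among several.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Exhaustively search every (feature, threshold) pair for the split
    that minimizes weighted Gini impurity of the two children."""
    best = None  # (impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, min_leaf=1):
    """Recursively partition until a node is pure or too small."""
    if len(set(labels)) == 1 or len(rows) <= min_leaf:
        return {"leaf": sum(labels) / len(labels)}  # class-1 fraction
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": sum(labels) / len(labels)}
    _, f, t = split
    li = [i for i, row in enumerate(rows) if row[f] <= t]
    ri = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": build_tree([rows[i] for i in li], [labels[i] for i in li], min_leaf),
        "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri], min_leaf),
    }

def predict(tree, row):
    """Walk one record down to a leaf and return its class-1 fraction."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

Note that a depth-2 tree built this way separates XOR-like data, the kind of non-additive relationship the earlier slide flags as hard for linear methods.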

Advantages of RBP
- Rational Treatment of Missing Data
- Numerical Distribution Is Not Relevant
- Monotonic Relationship Not Required
- Okay with Multiple "Flavors" of a Good Outcome
- Non-Additive Relationships Are Not a Problem
- Large Data Sets Are an Advantage
- Computational Time Is Reasonable
- Methodological Transparency

Problems With RBP
- A Greedy, Myopic Algorithm
- Overfits the Training Sample
- Overshadowing of Useful Attributes

Attacking the Problems
- Look-Ahead Search
- Minimum Record Count for Leaf Node
- Minimum Split Score for Leaf Node
- Random Perturbation of Attribute Availability at Each Node
- Random Perturbation of Record Availability at Each Node
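One way to read "random perturbation of attribute availability at each node" is a per-node random feature subset, as in random-subspace methods: each node is only allowed to split on a freshly drawn sample of the attributes, which lets otherwise-overshadowed attributes get used. This sketch assumes that interpretation; the sampling scheme and names are illustrative, not the authors' exact rule.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split_restricted(rows, labels, features):
    """Score candidate splits, but only over the features this node
    was randomly granted access to."""
    best = None  # (impurity, feature_index, threshold)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def node_split(rows, labels, n_available, rng):
    """At each node, draw a fresh random subset of the attributes
    before searching for the best split."""
    features = rng.sample(range(len(rows[0])), n_available)
    return best_split_restricted(rows, labels, features)
```

The record-availability perturbation on the same slide would work analogously, subsampling row indices instead of feature indices before scoring.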

Ensemble RBP
- Split Rule
- Terminal Nodes
- Leaf Node Values
- Missing Values
- Ensemble of Decision Trees
- Parameter Tuning
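Assuming each tree's leaf stores the fraction of class-1 training records that reached it (the leaf encoding here is an assumption, not stated on the slide), an ensemble of trees can be combined by averaging the per-tree leaf values for each record:

```python
def predict_one(tree, row):
    """Walk a single tree down to a leaf; leaves store class-1 fractions.
    Trees are nested dicts with "feature"/"threshold"/"left"/"right" keys."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

def ensemble_predict(trees, row):
    """Average the per-tree leaf values; thresholding the average at 0.5
    yields a hard class label if one is needed."""
    return sum(predict_one(t, row) for t in trees) / len(trees)
```

Because each tree sees a different random perturbation of attributes and records, the averaged output is typically more stable than any single tree's.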

KDD Cup: Preprocessing
- Removed Attributes with a Constant Value
- No Normalization
- Retained Missing Values
- No Limit on Range of Numerical Attributes
- Retained Duplicate Attributes
- No Generation of Additional Features
- No Modification of Categoric Attributes
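The only transformation the slide describes is dropping attributes whose value is constant across all records, leaving everything else (including missing values) untouched. A minimal sketch, with missing values represented as None (an illustrative choice):

```python
def drop_constant_attributes(rows):
    """Return (kept_indices, filtered_rows) with constant-valued
    columns removed; all other columns pass through unchanged."""
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return kept, [[row[j] for j in kept] for row in rows]
```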

KDD Cup: Attribute Selection
- Preliminary Ensemble Construction for Selection of Attributes
- Preliminary Traditional RBP for Selection of Attributes
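One plausible reading of "preliminary ensemble construction for selection of attributes" is to build a rough ensemble first, count how often each attribute is actually used in a split, and keep only the most frequently used ones for the final models. This sketch assumes that interpretation and a nested-dict tree representation; the slide does not specify the selection criterion.

```python
from collections import Counter

def split_counts(tree, counts):
    """Accumulate split-usage counts over one tree (dicts with
    "feature"/"left"/"right" keys; leaves hold a "leaf" key)."""
    if "leaf" in tree:
        return
    counts[tree["feature"]] += 1
    split_counts(tree["left"], counts)
    split_counts(tree["right"], counts)

def select_attributes(trees, top_k):
    """Keep the top_k attributes used most often across the ensemble."""
    counts = Counter()
    for t in trees:
        split_counts(t, counts)
    return [f for f, _ in counts.most_common(top_k)]
```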

KDD Cup: Model Building
- Ensemble RBP Methodology Using Random Attribute Omission at Each Node
- 40,000 Record Construction Set
- 10,000 Record Test Set
- 5-Fold Cross-Validation to Select Parameters
- Final Models Built on 50,000 Records
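The 5-fold cross-validation step for parameter selection can be sketched generically. Here `cv_score(param, train_idx, test_idx)` stands in for fitting a model on the training fold and scoring it on the held-out fold; it is supplied by the caller, and all names are illustrative rather than the authors' code.

```python
def k_fold_indices(n, k):
    """Split record indices 0..n-1 into k near-equal contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def select_parameter(candidates, n_records, k, cv_score):
    """Return the candidate value with the best average held-out score,
    where higher scores are better."""
    folds = k_fold_indices(n_records, k)
    best_param, best_avg = None, None
    for param in candidates:
        total = 0.0
        for i, test_idx in enumerate(folds):
            train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            total += cv_score(param, train_idx, test_idx)
        avg = total / k
        if best_avg is None or avg > best_avg:
            best_param, best_avg = param, avg
    return best_param
```

In practice the fold assignment would usually be shuffled rather than contiguous; contiguous folds keep the sketch deterministic.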

Observations
- 15,000 Attributes and 50,000 Records
- Binary rather than Numeric Outcomes
- Categoric Attributes without Identifying Information