Detecting Statistical Interactions with Additive Groves of Trees


Detecting Statistical Interactions with Additive Groves of Trees Daria Sorokina, Rich Caruana, Mirek Riedewald, Daniel Fink

Domain Knowledge Questions
Which features are important?
What effects do they have on the response variable?
Effect visualization techniques: is it always possible to visualize the effect of a single variable?
Toy example: seasonal effect on bird abundance.
[Figure: # Birds plotted against Season]

Visualizing effects of features
Toy example 1: # Birds = F(season, #trees)
[Figure: # Birds vs. Season in three panels: "Many trees", "Few trees", "Averaged seasonal effect"]
Toy example 2: # Birds = F(season, latitude)
[Figure: # Birds vs. Season in three panels: "South", "North", "Averaged seasonal effect ?"; the example is labeled "Interaction"]

Statistical interactions are NOT correlations!

Statistical Interactions
Statistical interactions ≡ non-additive effects among two or more variables in a function.
F(x_1,…,x_n) shows no interaction between x_i and x_j when
F(x_1,x_2,…,x_n) = G(x_1,…,x_{i-1},x_{i+1},…,x_n) + H(x_1,…,x_{j-1},x_{j+1},…,x_n),
i.e., G does not depend on x_i and H does not depend on x_j.
Example: F(x_1,x_2,x_3) = sin(x_1+x_2) + x_2·x_3
x_1, x_2 interact; x_2, x_3 interact; x_1, x_3 do not interact.
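
The definition above can be probed numerically. The sketch below is not the detection method from the talk; it is a hypothetical illustration that relies on one consequence of the definition: if F splits into a part that ignores one variable and a part that ignores the other, the mixed second difference of F in those two variables is exactly zero. The helper names, the step size `delta`, and the probe points are all assumptions made for illustration. A zero result at a few probe points is only evidence of additivity, not a proof.

```python
import math

def F(x1, x2, x3):
    """The example function from the slide: sin(x1 + x2) + x2 * x3."""
    return math.sin(x1 + x2) + x2 * x3

def mixed_difference(f, point, i, j, delta=0.7):
    """Mixed second difference of f in coordinates i and j at `point`.

    If f = G + H where G ignores coordinate i and H ignores coordinate j,
    the four terms cancel and the result is 0 (up to rounding).
    """
    def shifted(di, dj):
        p = list(point)
        p[i] += di
        p[j] += dj
        return f(*p)
    return shifted(delta, delta) - shifted(delta, 0) - shifted(0, delta) + shifted(0, 0)

def interacts(f, i, j, points, tol=1e-9):
    """True if the mixed difference is non-zero at any probe point."""
    return any(abs(mixed_difference(f, p, i, j)) > tol for p in points)

# Arbitrary probe points (an assumption; any spread of points would do).
probe_points = [(0.1, 0.4, -0.3), (1.2, -0.5, 0.8), (-0.9, 2.0, 1.5)]
pairs = {(i + 1, j + 1): interacts(F, i, j, probe_points)
         for i in range(3) for j in range(i + 1, 3)}
print(pairs)
```

The resulting dictionary flags the pairs (x_1, x_2) and (x_2, x_3) and leaves (x_1, x_3) unflagged, matching the slide.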

Interaction Detection Approach
How to test for an interaction:
Build a model from the data (no restrictions).
Build a restricted model that does not allow the interaction of interest.
Compare their predictive performance.
If the restricted model is as good as the unrestricted one, there is no interaction.
If it fails to represent the data with the same quality, there is an interaction.
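
A minimal sketch of this restricted-versus-unrestricted comparison, under loud assumptions: it does not use Additive Groves. As stand-ins it uses two toy binned models on two features in [0, 1): a 2-D grid of cell means (free to represent an interaction) and an additive sum of 1-D bin effects fitted by backfitting (unable to represent one). `BINS`, the sample sizes, and all function names are hypothetical.

```python
import random

BINS = 8  # hypothetical resolution of the toy binned models

def bin_of(v):
    """Map a value in [0, 1) to one of BINS equal-width bins."""
    return min(int(v * BINS), BINS - 1)

def fit_unrestricted(data):
    """Mean response per 2-D cell: free to represent an x1-x2 interaction."""
    sums, counts = {}, {}
    for x1, x2, y in data:
        key = (bin_of(x1), bin_of(x2))
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    overall = sum(y for _, _, y in data) / len(data)

    def predict(x1, x2):
        key = (bin_of(x1), bin_of(x2))
        return sums[key] / counts[key] if key in sums else overall
    return predict

def fit_restricted(data, rounds=20):
    """Additive model mean + g1(x1) + g2(x2): no x1-x2 interaction allowed."""
    mean = sum(y for _, _, y in data) / len(data)
    g = [[0.0] * BINS, [0.0] * BINS]
    for _ in range(rounds):  # simple backfitting over the two components
        for axis in (0, 1):
            sums, counts = [0.0] * BINS, [0] * BINS
            for row in data:
                b, other = bin_of(row[axis]), bin_of(row[1 - axis])
                sums[b] += row[2] - mean - g[1 - axis][other]
                counts[b] += 1
            g[axis] = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def predict(x1, x2):
        return mean + g[0][bin_of(x1)] + g[1][bin_of(x2)]
    return predict

def rmse(model, data):
    return (sum((model(x1, x2) - y) ** 2 for x1, x2, y in data) / len(data)) ** 0.5

def sample(f, n, rng):
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        rows.append((x1, x2, f(x1, x2)))
    return rows

def performance_gap(f, seed=0):
    """RMSE(restricted) - RMSE(unrestricted) on held-out data."""
    rng = random.Random(seed)
    train, valid = sample(f, 4000, rng), sample(f, 2000, rng)
    return rmse(fit_restricted(train), valid) - rmse(fit_unrestricted(train), valid)

gap_additive = performance_gap(lambda a, b: a + b)       # no interaction
gap_product = performance_gap(lambda a, b: 10 * a * b)   # x1 and x2 interact
print(gap_additive, gap_product)
```

For the additive response the two models perform about equally, so the gap stays near zero; for the product response the restricted model is clearly worse, which is the signal the approach looks for.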

Learning Method Requirements
Non-linearity: if the unrestricted model does not capture interactions, there is no chance to detect them.
Restriction capability (additive structure): the performance should not decrease after restriction when there are no interactions.
Most existing prediction models do not meet both requirements at the same time.
We had to invent our own algorithm that does.

Additive Groves of Regression Trees (Sorokina, Caruana, Riedewald; ECML’07)
New regression algorithm: an ensemble of regression trees.
Based on bagging and additive models: a combination of large trees and additive structure.
Useful properties: high predictive performance; captures interactions; easy to restrict specific interactions.

Additive Groves
Additive models fit additive components of the response function.
A Grove is an additive model where every single model is a tree.
Additive Groves applies bagging on top of single Groves.
[Figure: N Groves, each drawn as a sum of trees (tree +…+ tree), averaged: (1/N)·Grove + (1/N)·Grove +…+ (1/N)·Grove]
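
The structure described on this slide (a sum of trees inside each Grove, an average across Groves) can be written down in a few lines. This is only a sketch of the prediction step; training is not shown. The "trees" here are hand-written step functions standing in for fitted regression trees, and every name is hypothetical.

```python
from statistics import mean

def grove_predict(trees, x):
    """A Grove is additive: its prediction is the sum of its trees' outputs."""
    return sum(tree(x) for tree in trees)

def additive_groves_predict(groves, x):
    """Bagging on top: average the predictions of the N Groves."""
    return mean(grove_predict(trees, x) for trees in groves)

# Hand-written stand-ins for fitted trees (assumption: real trees would be
# learned from bootstrap samples of the training data).
grove_a = [lambda x: 1.0 if x[0] < 0.5 else 3.0,
           lambda x: 0.5 if x[1] < 0.5 else -0.5]
grove_b = [lambda x: 2.0 if x[0] < 0.4 else 4.0,
           lambda x: 0.0 if x[1] < 0.6 else -1.0]

x = (0.45, 0.7)
prediction = additive_groves_predict([grove_a, grove_b], x)
print(prediction)
```

With two Groves the ensemble prediction is (1/2)·Grove A + (1/2)·Grove B, mirroring the (1/N) weights in the figure.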

Interaction Detection Approach
How to test for an interaction:
Build a model from the data (no restrictions).
Build a restricted model that does not allow the interaction of interest.
Compare their predictive performance.
If the restricted model is as good as the unrestricted one, there is no interaction.
If it fails to represent the data with the same quality, there is an interaction.

Training Restricted Grove of Trees
The model is not allowed to have interactions between features A and B.
Every single tree in the model should either not use A or not use B.
[Figure: a Grove drawn as tree + tree + tree]

Training Restricted Grove of Trees
The model is not allowed to have interactions between attributes A and B.
Every single tree in the model should either not use A or not use B.
Evaluation on the separate validation set.
[Figure: candidate trees "no A" vs. "no B" for the position marked "?" in the Grove]

Training Restricted Grove of Trees
The model is not allowed to have interactions between attributes A and B.
Every single tree in the model should either not use A or not use B.
[Figure: the "no A" vs. "no B" choice repeated for each tree in the Grove]
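
The per-tree choice between a "no A" and a "no B" candidate can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' training procedure: trees are added one at a time to the residuals of the trees already chosen, `fit_tree` is a tiny depth-limited stand-in for a real regression tree, and all names, sizes, and the synthetic data are hypothetical.

```python
import random

def fit_tree(rows, targets, allowed, depth=2):
    """Tiny regression tree that may split only on features in `allowed`."""
    leaf_value = sum(targets) / len(targets)
    best = None
    if depth > 0:
        for f in allowed:
            for thr in sorted({r[f] for r in rows})[:-1]:
                left = [t for r, t in zip(rows, targets) if r[f] <= thr]
                right = [t for r, t in zip(rows, targets) if r[f] > thr]
                ml, mr = sum(left) / len(left), sum(right) / len(right)
                sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
                if best is None or sse < best[0]:
                    best = (sse, f, thr)
    if best is None:
        return lambda r: leaf_value
    _, f, thr = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    left_child = fit_tree([r for r, _ in lo], [t for _, t in lo], allowed, depth - 1)
    right_child = fit_tree([r for r, _ in hi], [t for _, t in hi], allowed, depth - 1)
    return lambda r: left_child(r) if r[f] <= thr else right_child(r)

def grove_predict(trees, row):
    return sum(tree(row) for tree in trees)

def train_restricted_grove(train, valid, a, b, n_features, n_trees=4):
    """Grow a Grove in which every tree avoids feature a or avoids feature b."""
    trees, banned_per_tree = [], []
    rows = [r for r, _ in train]
    for _ in range(n_trees):
        residuals = [y - grove_predict(trees, r) for r, y in train]
        candidates = []
        for banned in (a, b):  # one candidate without A, one without B
            allowed = [f for f in range(n_features) if f != banned]
            tree = fit_tree(rows, residuals, allowed)
            # Score the candidate on the separate validation set.
            error = sum((y - grove_predict(trees + [tree], r)) ** 2 for r, y in valid)
            candidates.append((error, banned, tree))
        _, banned, tree = min(candidates, key=lambda c: c[0])
        trees.append(tree)
        banned_per_tree.append(banned)
    return trees, banned_per_tree

rng = random.Random(1)

def make(n):
    """Synthetic data (assumption): features 0 and 1 interact, feature 2 is additive."""
    rows = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    return [(r, r[0] * r[1] + r[2]) for r in rows]

train, valid = make(150), make(100)
trees, banned = train_restricted_grove(train, valid, a=0, b=1, n_features=3)
print(banned)
```

Because each kept tree never splits on its banned feature, no single tree can combine A and B, so the sum of trees cannot represent an A-B interaction.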

Higher-Order Interactions
F(x) shows no K-way interaction between x_1, x_2, …, x_K when
F(x) = F_1(x\1) + F_2(x\2) + … + F_K(x\K), where each F_i does not depend on x_i.
Examples:
(x_1+x_2+x_3)^(-1) has a 3-way interaction.
x_1+x_2+x_3 has no interactions (neither 2-way nor 3-way).
x_1·x_2 + x_2·x_3 + x_1·x_3 has all 2-way interactions, but no 3-way interaction.
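
The three examples can be checked with a K-way generalization of a mixed difference. This is a hypothetical numeric illustration, not the detection method of the talk: if F is a sum of terms that each ignore one of the K variables, the alternating sum of F over the 2^K corners of a box cancels to zero. The function name, the probe point, and the step `delta` are assumptions.

```python
from itertools import product

def k_way_difference(f, point, delta=0.5):
    """Alternating sum of f over the 2^K corners of a box starting at `point`.

    Each term that ignores some coordinate i cancels between the two corners
    that differ only in coordinate i, so a function with no K-way interaction
    yields 0 (up to rounding).
    """
    k = len(point)
    total = 0.0
    for corner in product((0, 1), repeat=k):
        shifted = [p + delta * c for p, c in zip(point, corner)]
        sign = (-1) ** (k - sum(corner))
        total += sign * f(*shifted)
    return total

point = (0.2, 0.3, 0.5)  # arbitrary probe point (assumption)

reciprocal = k_way_difference(lambda a, b, c: 1.0 / (a + b + c), point)
plain_sum = k_way_difference(lambda a, b, c: a + b + c, point)
pairwise = k_way_difference(lambda a, b, c: a * b + b * c + a * c, point)
print(reciprocal, plain_sum, pairwise)
```

Only the reciprocal gives a clearly non-zero value; the plain sum and the pairwise products both cancel, in line with the slide's claim that pairwise products carry no 3-way interaction.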

Higher-Order Interactions
F(x) shows no K-way interaction between x_1, x_2, …, x_K when
F(x) = F_1(x\1) + F_2(x\2) + … + F_K(x\K), where each F_i does not depend on x_i.
K-way restricted Grove: K candidates for each tree.
[Figure: candidate trees "no x_1" vs. "no x_2" vs. … vs. "no x_K" for the position marked "?" in the Grove]

Quantifying Interaction Strength
Performance measure: standardized root mean squared error.
Interaction strength: difference in performance between the restricted and unrestricted models.
Significance threshold: 3 standard deviations of the unrestricted performance.
Randomization comes from different data samples (folds, bootstraps, …).
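
A small sketch of how these quantities might be computed. Two assumptions are made loudly: "standardized" is taken to mean RMSE divided by the standard deviation of the response, which the slide does not spell out, and the per-fold scores below are invented illustrative numbers, not results from the talk. Function names are hypothetical.

```python
from statistics import mean, pstdev, stdev

def standardized_rmse(predictions, targets):
    """RMSE divided by the standard deviation of the targets (assumed definition)."""
    rmse = mean((p - t) ** 2 for p, t in zip(predictions, targets)) ** 0.5
    return rmse / pstdev(targets)

def interaction_strength(restricted_scores, unrestricted_scores):
    """Performance drop caused by the restriction, plus the 3-sigma verdict.

    Each list holds one score per data sample (fold or bootstrap).
    """
    strength = mean(restricted_scores) - mean(unrestricted_scores)
    threshold = 3 * stdev(unrestricted_scores)
    return strength, strength > threshold

# Invented per-fold scores, for illustration only.
unrestricted = [0.30, 0.31, 0.29, 0.30, 0.30]
restricted_ab = [0.45, 0.47, 0.44, 0.46, 0.45]  # restricting (A, B) hurts a lot
restricted_ac = [0.31, 0.30, 0.30, 0.31, 0.29]  # restricting (A, C) barely matters

strength_ab, significant_ab = interaction_strength(restricted_ab, unrestricted)
strength_ac, significant_ac = interaction_strength(restricted_ac, unrestricted)
print(strength_ab, significant_ab)
print(strength_ac, significant_ac)
```

Under the assumed definition, a model that always predicts the mean response scores 1.0, so values well below 1.0 indicate a useful model.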

Correlations and Feature Selection
Correlations between the variables hurt interaction detection.
Solution: feature selection. Correlated features will be removed.
Feature selection also leaves only a few variable pairs to check for interactions, as opposed to N^2.

Experiments: Synthetic Data
[Figure label: Interactions]

Experiments: Synthetic Data
[Figure labels: 1,2; 1,2,3; 2,3; 1,3]

Experiments: Synthetic Data
[Figure labels: 2,7; 7,9]

Experiments: Synthetic Data
x5, x8, x10 have small ranges by construction and do not influence the response much.
Interactions of all other variables are detected.
[Figure labels: 9,10; 7,10; 3,5]

Experiments: Synthetic Data
x4 is not involved in any interactions.

Experiments: Elevators
Airplane control data set: predict the required position of the elevators.
1 strong 3-way interaction:
absRoll: absolute value of the roll angle
diffRollRate: roll angular acceleration
SaTime4: position of ailerons 4 time steps ago

Experiments: CompAct
Predict CPU activity from other computer system parameters.
A very additive, almost linear data set.
All detected interactions were fairly small and unstable.

Experiments: Kinematics
Simulation of the movements of an 8-link robot arm.
Predict the distance between the end of the arm and the origin from the values of the joint angles.
Highly non-linear data set; contains a 6-way interaction.

Experiments: House Finch Abundance Data
The interaction (year, latitude) corresponds to an eye disease that affected house finches during the decade covered by the data set.

Summary
Statistical interaction detection shows which features should be analyzed in groups.
We presented a novel technique based on comparing restricted and unrestricted models.
Additive Groves is an appropriate learning method for this framework.

Acknowledgements
Our collaborators in the Computer Science department and the Cornell Lab of Ornithology: Wes Hochachka, Steve Kelling, Art Munson.

Appendix
Related work: statistical methods; (Friedman & Popescu, 2005); (Hooker, 2007)
Regression trees
Trying to restrict bagged trees

Regression trees used in Groves
Each split optimizes RMSE.
Parameter α controls the size of the tree: a node becomes a leaf if it contains ≤ α·|trainset| cases.
0 ≤ α ≤ 1; the smaller α, the larger the tree.
(Any other type of regression tree could be used.)
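
The stopping rule can be made concrete with a small regression tree. This is a minimal sketch, not the authors' implementation: splits are chosen by minimum sum of squared errors (which picks the same split as minimum RMSE), the tree is stored as nested dictionaries, and the function names and toy data are hypothetical.

```python
def build_tree(rows, ys, alpha, train_size=None):
    """Regression tree; a node becomes a leaf if it holds <= alpha * |trainset| cases."""
    if train_size is None:
        train_size = len(rows)  # |trainset| is fixed at the root
    leaf = {"value": sum(ys) / len(ys)}
    if len(rows) <= alpha * train_size:
        return leaf
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= thr]
            right = [y for r, y in zip(rows, ys) if r[f] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # identical rows cannot be split further
        return leaf
    _, f, thr = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] <= thr]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([r for r, _ in lo], [y for _, y in lo], alpha, train_size),
        "right": build_tree([r for r, _ in hi], [y for _, y in hi], alpha, train_size),
    }

def predict(node, row):
    while "value" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def count_leaves(node):
    if "value" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Toy data (assumption): one feature, quadratic response.
rows = [(i / 40,) for i in range(40)]
ys = [r[0] ** 2 for r in rows]
leaves = {alpha: count_leaves(build_tree(rows, ys, alpha)) for alpha in (1.0, 0.5, 0.1)}
print(leaves)
```

With α = 1 the root itself is a leaf; as α shrinks, nodes keep splitting until they are small enough, so the leaf count grows. With α = 0 the tree splits until every leaf holds a single distinct case.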

Related work: early statistical methods (Neter et al., 1996) (Ott & Longnecker, 2001)
Build a linear model with an interaction term: c1·x1 + c2·x2 + … + cn·xn + β·x1·x2, where c1, …, cn are the main-effect coefficients.
Test whether β is significantly different from 0.
Problem: limited types of interaction.
Collect data for all combinations of parameter values.
Find the value of the interaction term for each combination.
Test whether the interaction is significant.
Problem: not useful for high-dimensional data sets.

Related Work: Partial Dependence Functions (Friedman & Popescu, 2005)
No interaction ≡ E\x(F()) + E\z(F()) = E\{x,z}(F()) + E(F())
But this holds only if x and z are distributed independently.
Create “fake” data points in the data set.
Check for interactions in the resulting data.
Problem: fake interactions in the fake data (Hooker, Generalized Functional ANOVA Diagnostics, 2007).
[Figure: two panels, "Real data" and "Real and fake data"]

Related Work: Generalized Functional ANOVA Diagnostics (Hooker, 2007)
Improvement on the partial dependence functions algorithm.
Estimates the joint distribution and penalizes areas with small density.
Produces results based on real data.
High complexity: dense grid, external density estimation.

Would other ensembles work?
Let’s try to restrict bagging in the same way.
Assume that A and B are both important, that A is more important than B, and that there is no interaction between A and B.
First tree: A is more important, so the tree without B performs better. Choose the tree without B.
Second tree, …, N-th tree: the same choice is made each time.
Now the whole ensemble consists of trees without B!
B is important, so the performance dropped. But there was no interaction…