Introduction to Machine Learning and Tree Based Methods


1 Introduction to Machine Learning and Tree Based Methods
Ben Hansen, Phoenix R User Group

2–5 What is the difference between statistics and machine learning?
Larry Wasserman, professor of statistics and machine learning at Carnegie Mellon University:
"None. They are both concerned with the same question: how do we learn from data." Machine Learning = Better Marketing
"Statistics emphasizes formal statistical inference (confidence intervals, hypothesis tests, optimal estimators) in low dimensional problems." Statistics is more concerned with explaining the outcome.
"Machine Learning emphasizes high dimensional prediction problems." Machine Learning is more concerned with predicting the outcome.
"Overall, the two fields are blending together more and more and I think this is a good thing."

6 Linear Regression
Pros: Easy to interpret; gives clear estimates of the relationship between the predictors and the outcome.
Cons: Not the strongest predictor; inflexible; relies on strict and often impractical assumptions about the data.
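The interpretability the slide credits to linear regression comes from its closed-form coefficients. A minimal pure-Python sketch for a single predictor (the function name and toy data are made up for illustration):

```python
def fit_ols(x, y):
    # Closed-form simple linear regression: the slope and intercept
    # that minimize the sum of squared residuals for one predictor.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # exactly y = 1 + 2x
b0, b1 = fit_ols(x, y)
print(b0, b1)             # 1.0 2.0
```

The fitted slope reads directly as "one unit of x adds b1 to the prediction", which is the clear estimate of the predictor-outcome relationship the slide refers to.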

7 Classification and Regression Tree (CART) or Decision Tree
Works by partitioning the predictor space into regions within which the target variable is homogeneous. Very simple approach to prediction and easy to interpret. Can be displayed graphically, even with multiple dimensions. No need to create dummy variables when dealing with qualitative predictors. However, like linear regression, it does not reach the prediction accuracy of other machine learning techniques.
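The "homogeneous regions" idea can be made concrete with a depth-one tree (a stump): a single split partitions the predictor space into two regions, and each region predicts the mean outcome of its training points. The threshold and leaf means below are made-up illustrative values, not fitted ones:

```python
def stump_predict(x, threshold=5.0, left_mean=10.0, right_mean=30.0):
    # One split creates two regions of the predictor space; each
    # region predicts the mean outcome of its training points.
    # (Threshold and leaf means here are illustrative, not fitted.)
    return left_mean if x < threshold else right_mean

print(stump_predict(2.0))   # falls in the left region
print(stump_predict(8.0))   # falls in the right region
```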

8 Regression Tree

9 Regression Tree

10 Classification Tree

11 Classification and Regression Tree Algorithm
Top-down greedy approach: all variables are tried at each step and the best split is chosen. The process repeats at each resulting branch of the tree until the tree is fully grown. Stopping criteria can be set using either a cost-complexity parameter or a minimum node size.
Regression tree: the goal is to minimize the sum of squared errors in each leaf node, sum((y - prediction)^2).
Classification tree: the goal is to create pure leaf nodes using the Gini index, G = sum(pk * (1 - pk)), where pk is the proportion of training instances of class k in the leaf node of interest.
Other resources: Elements of Statistical Learning; Introduction to Statistical Learning; Statistical Learning MOOC; Lecture Videos
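A minimal pure-Python sketch of the greedy split search and the two criteria above (toy data and function names are made up; a real implementation recurses over many predictors and branches):

```python
def sse(ys):
    # Regression criterion: sum((y - prediction)^2), where the
    # prediction in a leaf is the mean of its y values.
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(x, y):
    # One greedy step: try every threshold on a single predictor and
    # keep the split that minimizes the combined SSE of the children.
    best_total, best_t = float("inf"), None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best_total:
            best_total, best_t = total, t
    return best_total, best_t

def gini(labels):
    # Classification criterion: G = sum(pk * (1 - pk)), where pk is
    # the proportion (not the count) of class k in the node.
    n = len(labels)
    return sum((labels.count(c) / n) * (1 - labels.count(c) / n)
               for c in set(labels))

total, t = best_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20])
print(total, t)                    # 0.0 3 -- a perfectly pure split
print(gini(["a", "a", "b", "b"]))  # 0.5 -- maximally impure two-class node
```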

12 Random Forest

13 Random Forest
Builds many decision trees. Each decision tree is built differently through a two-step random selection process: a different subset of the data is randomly sampled for each decision tree, and trees are decorrelated by randomly sampling the variables considered for each split at each level of each tree. Predictions from all of the decision trees are then aggregated into a single prediction. Greatly improves prediction accuracy, but at the cost of interpretability.
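The two-step randomization can be sketched with Python's standard random module (the sizes are made up; considering sqrt(p) of the p variables per split is a common classification default, p/3 for regression):

```python
import math
import random

def bootstrap_rows(n, rng):
    # Step 1: each tree sees a different random subset of the data --
    # n row indices drawn with replacement (a bootstrap sample).
    return [rng.randrange(n) for _ in range(n)]

def split_candidates(p, rng):
    # Step 2: at each split, only a random subset of the p variables
    # is considered, which decorrelates the trees.
    m = max(1, int(math.sqrt(p)))
    return rng.sample(range(p), m)

rng = random.Random(42)
rows = bootstrap_rows(6, rng)
feats = split_candidates(9, rng)
print(len(rows), len(feats))  # 6 3: six rows with replacement, 3 of 9 variables
```

The final forest prediction then aggregates the per-tree predictions: an average for regression, a majority vote for classification.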

14

15 Random Forest Variable Importance
Regression: sum the total amount that the RSS is decreased by splits on each variable, across all of the trees.
Classification: sum the total amount that the Gini index is decreased by splits on each variable, across all of the trees.
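The summation can be sketched in a few lines. The split records below are hypothetical (variable names and decrease values are invented): each entry is a (variable, impurity decrease) pair collected from every split in every tree, where the decrease is in RSS for regression trees and in the Gini index for classification trees:

```python
from collections import defaultdict

# Hypothetical split log from a fitted forest (invented values).
splits = [
    ("age", 4.0), ("income", 2.5), ("age", 1.5),
    ("income", 0.5), ("region", 0.3),
]

importance = defaultdict(float)
for var, decrease in splits:
    importance[var] += decrease  # sum decreases across all trees

ranked = sorted(importance.items(), key=lambda kv: -kv[1])
print(ranked)  # [('age', 5.5), ('income', 3.0), ('region', 0.3)]
```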

