Slide 1: Big Data: New Tricks for Econometrics
Varian, Hal R. "Big Data: New Tricks for Econometrics." The Journal of Economic Perspectives (2014): 3-27.
Konstantina Christakopoulou, Liang Zeng (Group G21)
Related to Chapter 28: Data Mining
Slide 2: Motivation: Machine Learning for Economic Transactions
Linear regression is not enough:
- Big data sizes
- Many features: variables must be chosen
- Relationships are not only linear!
Slide 3: Connection to the Course: Decision Trees (e.g., ID3)
Challenges of ID3:
- Cannot handle continuous attributes
- Prone to outliers
1. C4.5 and Classification and Regression Trees (CART) can handle:
   + continuous and discrete attributes
   + missing attribute values
   + over-fitting, via post-pruning
2. Random Forests: an ensemble of decision trees. Randomization (sampling the observations and sampling the attributes) leads to better accuracy!
Slide 4: ID3 Decision Tree
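As a reminder of how ID3 picks its splits, here is a minimal sketch, in plain Python with made-up toy data, of the entropy-based information gain that ID3 maximizes when choosing the attribute to split on:

```python
# Minimal sketch of ID3's splitting criterion: information gain.
# The toy rows below are made up for illustration; ID3 itself only
# handles discrete attributes, as noted on the previous slide.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting `rows` on discrete attribute `attr`."""
    total = entropy(labels)
    n = len(rows)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in by_value.values())
    return total - remainder

# ID3 would pick the attribute with the highest gain at each node.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0: a perfect split
```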
Slide 5: Classification and Regression Trees (CART)
A classification tree is one where the predicted outcome is the class to which the data belongs. A regression tree is one where the predicted outcome can be considered a real number (e.g., the age of a house, or a patient's length of stay in a hospital).
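As a minimal sketch of the distinction, assuming Python with scikit-learn (the deck names no library here) and made-up toy values:

```python
# The same CART machinery drives both tree types in scikit-learn.
# Feature and target values below are made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[5.0], [12.0], [30.0], [55.0], [80.0]]  # e.g., age of a house, in years

# Classification tree: the predicted outcome is a class label.
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(X, ["new", "new", "mid", "old", "old"])
print(clf.predict([[10.0]]))  # -> ['new']

# Regression tree: the predicted outcome is a real number
# (e.g., a patient's length of stay in hospital, in days).
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, [1.5, 2.0, 4.5, 7.0, 9.5])
print(reg.predict([[10.0]]))  # a real-valued prediction
```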
Slide 6: Classification and Regression Trees (CART)
Predict Titanic survivors using age and class.
Slide 7: Classification and Regression Trees (CART)
A CART for survivors of the Titanic, built in R.
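The slide's figure was built in R; as a rough counterpart, here is a sketch assuming Python with scikit-learn and seaborn's sample "titanic" dataset (fetched on first use), fitting a small classification tree on age and passenger class:

```python
# Sketch: a CART for Titanic survival from age and class - a Python
# stand-in for the slide's R figure. seaborn downloads its sample
# data on first use; rows with a missing age are dropped.
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier, export_text

titanic = sns.load_dataset("titanic").dropna(subset=["age"])
X = titanic[["age", "pclass"]]
y = titanic["survived"]

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(cart, feature_names=["age", "pclass"]))
```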
Slide 8: Random Forests
Slide 9: Random Forests

Growing a forest:
- Choose a bootstrap sample and start to grow a tree.
- At each node, choose a random sample of the predictors to make the next split.
- Repeat many times to grow a forest of trees.
- For prediction, have each tree make its prediction, then take a majority vote.

Random Forest vs. Decision Tree Learning:
- Random Forest: many decision trees; each fit on a random subset of the samples; reduces the effect of outliers (less overfitting).
- Decision Tree Learning: one tree; fit on all the learning samples; prone to distortions, e.g., outliers.
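A minimal sketch of this recipe, assuming scikit-learn (our choice of library, not the deck's): bootstrap=True gives each tree its own resample, max_features sets the random subset of predictors tried at each split, and prediction aggregates the trees. The iris dataset is used only to keep the example runnable.

```python
# Sketch of the recipe above with scikit-learn's RandomForestClassifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,     # "repeat many times to grow a forest of trees"
    bootstrap=True,       # each tree is grown on its own bootstrap sample
    max_features="sqrt",  # random subset of predictors tried at each split
    random_state=0,
).fit(X, y)

# For classification, predict() aggregates the trees by (soft) majority vote.
print(forest.predict(X[:3]))
```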
Slide 10: Boosting, Bagging, Bootstrap

Randomization can help!
- Bootstrap: choose (with replacement) a sample of the observations.
- Bagging: average across models estimated from several bootstrap samples.
- Boosting: repeated estimation where misclassified observations are given increasing weight; the final prediction is an average of the models.
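To make the three terms concrete, here is a minimal sketch, assuming numpy and scikit-learn (the deck names no tooling): a bootstrap sample drawn by hand, then bagging and boosting via off-the-shelf ensembles. AdaBoost is used as the boosting example because it reweights misclassified observations exactly as described above.

```python
# Sketch: bootstrap, bagging, and boosting side by side.
# Dataset (iris) and library choices are ours, for a runnable example.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bootstrap: draw a sample of the same size, with replacement.
rng = np.random.default_rng(0)
idx = rng.integers(0, len(X), size=len(X))
tree_on_bootstrap = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])

# Bagging: average/vote across trees fit on many such bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X, y)

# Boosting: refit repeatedly, upweighting misclassified observations,
# then combine the fits into a weighted average.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))
```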
Slide 11: Thank you!