Slide 1: Big Data: New Tricks for Econometrics
Varian, Hal R. "Big Data: New Tricks for Econometrics." The Journal of Economic Perspectives (2014): 3-27.
Konstantina Christakopoulou, Liang Zeng (Group G21)
Related to Chapter 28: Data Mining
Slide 2: Motivation: Machine Learning for Economic Transactions
Linear regression is not enough:
- Big data sizes
- Many features: variables must be chosen
- Relationships are not only linear!
Slide 3: Connection to the Course: Decision Trees (e.g., ID3)
Challenges of ID3:
- Cannot handle continuous attributes
- Prone to outliers
1. C4.5 and Classification and Regression Trees (CART) can handle:
   + continuous and discrete attributes
   + missing attribute values
   + over-fitting, via post-pruning
2. Random Forests: an ensemble of decision trees. Randomization (sampling the observations and sampling the attributes) leads to better accuracy!
Slide 4: ID3 Decision Tree
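As a reminder of how ID3 picks its splits, here is a minimal sketch, in plain Python with made-up toy data, of the entropy-based information gain that ID3 maximizes when choosing the attribute to split on:

```python
# Minimal sketch of ID3's splitting criterion: information gain.
# The toy rows below are made up for illustration; ID3 itself only
# handles discrete attributes, as noted on the previous slide.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting `rows` on discrete attribute `attr`."""
    total = entropy(labels)
    n = len(rows)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in by_value.values())
    return total - remainder

# ID3 would pick the attribute with the highest gain at each node.
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))  # 1.0: a perfect split
```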
Slide 5: Classification and Regression Trees (CART)
A classification tree is one where the predicted outcome is the class to which the data belongs. A regression tree is one where the predicted outcome can be considered a real number (e.g., the age of a house, or a patient's length of stay in a hospital).
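As a minimal sketch of the distinction, assuming Python with scikit-learn (the deck names no library here) and made-up toy values:

```python
# The same CART machinery drives both tree types in scikit-learn.
# Feature and target values below are made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[5.0], [12.0], [30.0], [55.0], [80.0]]  # e.g., age of a house, in years

# Classification tree: the predicted outcome is a class label.
clf = DecisionTreeClassifier(max_depth=2)
clf.fit(X, ["new", "new", "mid", "old", "old"])
print(clf.predict([[10.0]]))  # -> ['new']

# Regression tree: the predicted outcome is a real number
# (e.g., a patient's length of stay in hospital, in days).
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, [1.5, 2.0, 4.5, 7.0, 9.5])
print(reg.predict([[10.0]]))  # a real-valued prediction
```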
Slide 6: Classification and Regression Trees (CART)
Predict Titanic survivors using age and class.
Slide 7: Classification and Regression Trees (CART)
A CART for survivors of the Titanic, built in R.
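The slide's figure was built in R; as a rough counterpart, here is a sketch assuming Python with scikit-learn and seaborn's sample "titanic" dataset (fetched on first use), fitting a small classification tree on age and passenger class:

```python
# Sketch: a CART for Titanic survival from age and class - a Python
# stand-in for the slide's R figure. seaborn downloads its sample
# data on first use; rows with a missing age are dropped.
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier, export_text

titanic = sns.load_dataset("titanic").dropna(subset=["age"])
X = titanic[["age", "pclass"]]
y = titanic["survived"]

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(cart, feature_names=["age", "pclass"]))
```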
Slide 8: Random Forests
Slide 9: Random Forests

Growing a forest:
- Choose a bootstrap sample and start to grow a tree.
- At each node, choose a random sample of the predictors to make the next split.
- Repeat many times to grow a forest of trees.
- For prediction, have each tree make its prediction, then take a majority vote.

Random Forest vs. Decision Tree Learning:
- Random Forest: many decision trees; each fit on a random subset of the samples; reduces the effect of outliers (less overfitting).
- Decision Tree Learning: one tree; fit on all the learning samples; prone to distortions, e.g., outliers.
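A minimal sketch of this recipe, assuming scikit-learn (our choice of library, not the deck's): bootstrap=True gives each tree its own resample, max_features sets the random subset of predictors tried at each split, and prediction aggregates the trees. The iris dataset is used only to keep the example runnable.

```python
# Sketch of the recipe above with scikit-learn's RandomForestClassifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,     # "repeat many times to grow a forest of trees"
    bootstrap=True,       # each tree is grown on its own bootstrap sample
    max_features="sqrt",  # random subset of predictors tried at each split
    random_state=0,
).fit(X, y)

# For classification, predict() aggregates the trees by (soft) majority vote.
print(forest.predict(X[:3]))
```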
Slide 10: Boosting, Bagging, Bootstrap

Randomization can help!
- Bootstrap: choose (with replacement) a sample of the observations.
- Bagging: average across models estimated from several bootstrap samples.
- Boosting: repeated estimation where misclassified observations are given increasing weight; the final prediction is an average of the models.
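To make the three terms concrete, here is a minimal sketch, assuming numpy and scikit-learn (the deck names no tooling): a bootstrap sample drawn by hand, then bagging and boosting via off-the-shelf ensembles. AdaBoost is used as the boosting example because it reweights misclassified observations exactly as described above.

```python
# Sketch: bootstrap, bagging, and boosting side by side.
# Dataset (iris) and library choices are ours, for a runnable example.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bootstrap: draw a sample of the same size, with replacement.
rng = np.random.default_rng(0)
idx = rng.integers(0, len(X), size=len(X))
tree_on_bootstrap = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])

# Bagging: average/vote across trees fit on many such bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X, y)

# Boosting: refit repeatedly, upweighting misclassified observations,
# then combine the fits into a weighted average.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))
```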
Slide 11: Thank you!