Prelude of Machine Learning 202: Statistical Data Analysis in the Computer Age (1991), Bradley Efron and Robert Tibshirani.


Prelude of Machine Learning 202: Statistical Data Analysis in the Computer Age (1991), Bradley Efron and Robert Tibshirani

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Overview Classical statistical methods, developed before the computer era: – Linear regression, hypothesis testing, standard errors, confidence intervals, etc. New statistical methods, post-1980: – Based on the power of electronic computation – Require fewer distributional assumptions than their predecessors How to spend this computational wealth wisely?

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Bootstrap A random sample x of 164 data points; a statistic t(x). How accurate is t(x)? The bootstrap is a device for extending standard-error formulas to estimators other than the mean. Suppose t(x) is the 25% trimmed mean.

Bootstrap Why use a trimmed mean rather than mean(x)? If the data come from a long-tailed probability distribution, the trimmed mean can be substantially more accurate than mean(x). In practice, one does not know a priori whether the true distribution is long-tailed; the bootstrap can help answer this question.
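The idea on this slide can be sketched in a few lines of numpy. The data below are simulated from a long-tailed t(3) distribution (the paper's 164-point dataset is not reproduced here), and B is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def trimmed_mean(x, trim=0.25):
    """25% trimmed mean: drop the lowest and highest 25% of the sorted values."""
    x = np.sort(x)
    k = int(len(x) * trim)
    return x[k:len(x) - k].mean()

def bootstrap_se(x, stat, B=200):
    """Standard error of stat(x) estimated from B bootstrap resamples."""
    reps = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(B)]
    return np.std(reps, ddof=1)

x = rng.standard_t(df=3, size=164)      # long-tailed sample of 164 points
se_trim = bootstrap_se(x, trimmed_mean)  # SE of the 25% trimmed mean
se_mean = bootstrap_se(x, np.mean)       # SE of the ordinary mean, for comparison
```

Comparing the two bootstrap SEs is exactly how one can check, from the data alone, whether the trimmed mean is the more accurate estimator here.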

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Nonparametric Regression Quadratic regression curve: estimate at 60% compliance ± 3.08 (SE)

Nonparametric Regression Loess, i.e. – windowing with the nearest 20% of data points – a smooth weight function – weighted linear regression Loess estimate at 60% compliance ± ? How to find the SE?

Nonparametric Regression How to find the SE? Bootstrap with B = 50 replications. At 60% compliance: QR: … NPR: … On balance, the quadratic estimate should probably be preferred in this case: it would have to have an unusually large bias to undo its superiority in SE.
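The bootstrap-SE recipe for a regression curve can be sketched as follows: resample (x, y) pairs, refit, and take the standard deviation of the refitted estimates at the point of interest. The data are synthetic stand-ins (the paper's compliance data are not reproduced here), and the quadratic fit plays the role of the QR curve:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (compliance, response) data -- illustrative only.
n = 164
x = rng.uniform(0, 1, n)
y = 10 * x + 20 * x**2 + rng.normal(0, 5, n)

def quad_fit_at(x, y, x0):
    """Least-squares quadratic fit, evaluated at x0."""
    coefs = np.polyfit(x, y, deg=2)
    return np.polyval(coefs, x0)

B = 50  # number of bootstrap replications, as on the slide
reps = []
for _ in range(B):
    b = rng.choice(np.arange(n), size=n, replace=True)  # resample pairs
    reps.append(quad_fit_at(x[b], y[b], 0.6))           # refit, evaluate at 60%
se = np.std(reps, ddof=1)                               # bootstrap SE at 60%
```

The same loop with a loess fit in place of `quad_fit_at` gives the NPR standard error, which is what makes the two curves directly comparable.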

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Generalized Additive Models Generalized linear model (GLM): – generalizes linear regression – the linear predictor is related to the response through a link function: Y = g(b0 + b1·X1 + … + bm·Xm) Additive model: – a nonparametric regression method – estimate a nonparametric function for each predictor – combine all predictor functions to predict the dependent variable Generalized additive model (GAM): – blends properties of additive models with the GLM – each predictor function fi(xi) is fit by parametric or nonparametric means – provides good fits to training data at the expense of interpretability
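The additive-model idea (one smooth function per predictor, combined by summation) is usually fit by backfitting. Below is a toy sketch: the crude nearest-neighbor smoother and the simulated two-predictor data are illustrative assumptions, not the paper's method or data:

```python
import numpy as np

rng = np.random.default_rng(2)

def smooth(x, r, frac=0.2):
    """Crude scatterplot smoother: mean of r over the nearest frac of x-values."""
    k = max(2, int(len(x) * frac))
    out = np.empty(len(r))
    for i, xi in enumerate(x):
        nn = np.argsort(np.abs(x - xi))[:k]
        out[i] = r[nn].mean()
    return out

def backfit(X, y, iters=20):
    """Additive model y ~ alpha + f1(x1) + f2(x2) + ... fit by backfitting."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((p, n))
    for _ in range(iters):
        for j in range(p):
            partial = y - alpha - f.sum(axis=0) + f[j]  # partial residual for f_j
            f[j] = smooth(X[:, j], partial)
            f[j] -= f[j].mean()                         # center for identifiability
    return alpha, f

n = 200
X = rng.uniform(-1, 1, (n, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, n)
alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=0)
```

Each `f[j]` is an estimated curve that can be plotted against its predictor, which is exactly the interpretability win the GAM case study below relies on.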

GAM Case Study Analyze survival of infants after cardiac surgery for heart defects Dataset: 497 infant records Explanatory variables: – Age (days) – Weight (kg) – Whether warm-blood cardioplegia (WBC) was applied WBC support data: – Of 57 infants who received the WBC procedure, 7 died – Of 440 infants who received the standard procedure, 133 died

GAM Case Study: Logistic regression results Three-parameter regression model – Age, Weight: continuous variables – WBC applied: binary variable Results: – WBC has a strong beneficial effect: odds ratio of 3.8:1 – Higher weight => lower risk of death – Age has no significant effect
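The logistic-regression step can be sketched with a small Newton-Raphson fit on simulated data (the infant dataset is not public here, so treatment/weight effects below are assumed for illustration; only the odds-ratio computation mirrors the slide):

```python
import numpy as np

rng = np.random.default_rng(3)

def logistic_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                      # IRLS weights
        H = X.T @ (X * W[:, None])           # Hessian of the log-likelihood
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta

# Simulated stand-in: binary treatment, continuous weight, binary death outcome.
n = 500
treat = rng.integers(0, 2, n)
weight = rng.normal(3.0, 0.6, n)
logit = 0.5 - np.log(3.8) * treat - 0.8 * (weight - 3.0)
death = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), treat, weight - 3.0])
beta = logistic_fit(X, death)
odds_ratio = np.exp(-beta[1])   # odds of death, untreated vs. treated
```

Exponentiating the treatment coefficient is how the "3.8:1" figure on the slide is read off a fitted logistic model.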

GAM Case Study: GAM Analysis Add three individual smooth functions – use locally weighted scatterplot smoothing (loess) Results: – WBC has a strong beneficial effect: odds ratio of 4.2:1 – Lighter infants are 55 times more likely to die than heavier infants – Surprising findings from the log-odds curve for age!

GAM Case Study: Conclusion Traditional regression models may lead to oversimplification – linear logistic regression forces curves to be straight lines – vital information on the effect of age is lost in a linear model – the problem is more acute with a large number of explanatory variables GAM analysis exploits computational power to achieve a new level of analytic flexibility – a personal computer can now do what required a mainframe 10 years ago

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Classification and Regression Trees A nonparametric technique, ideally suited to computer algorithms Splits are chosen by how well they explain variability Once a node is split, the procedure is applied recursively to each child node
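The split-selection step can be sketched concretely: scan every candidate threshold on a predictor and keep the one that most reduces impurity (Gini impurity is used here as the variability measure; the toy data are illustrative):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(x, y):
    """Threshold on x that most reduces the weighted Gini impurity."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (None, gini(y))                   # (threshold, impurity after split)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                         # no threshold between equal values
        left, right = y[:i], y[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = ((x[i - 1] + x[i]) / 2, score)
    return best

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
thr, score = best_split(x, y)   # the gap between 3.0 and 10.0 separates the classes
```

A full tree simply applies `best_split` recursively to the left and right subsets, as the slide describes; skewed misclassification costs (next slide) would enter by reweighting the impurity terms.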

CART Case Study Gain insight into the causes of duodenal ulcers – use a sample of 745 rats – one of 56 different alkyl nucleophiles administered to each rat – response: one of three severity levels (1, 2, 3), 3 being the most severe Skewed misclassification costs – misclassifying a severe ulcer is more expensive than misclassifying a mild one Analysis tree construction: – use all 745 observations as the training data – compute 'apparent' misclassification rates – the training-data misclassification rate is biased downward

CART Case study Classification tree

CART Case Study: Observations The optimal size of the classification tree is a tradeoff – higher training error versus overfitting It is usually better to grow a large tree and prune from the bottom How to choose the optimal tree size? – use test data on different tree models to estimate each tree's misclassification rate – in the absence of test data, use cross-validation

CART: Cross-validation Mimics the use of a test sample Standard cross-validation approach: – divide the dataset into 10 equal partitions – use 90% of the data for training and the remaining 10% as test data – repeat, holding out each partition in turn Cross-validation misclassification errors were found to be 10% higher than the apparent rates Cross-validation and bootstrapping are closely related – research on hybrid approaches is in progress
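The 10-fold procedure above can be sketched generically; the one-split "stump" classifier and the simulated two-class data below are stand-ins (the ulcer data and the paper's tree are not reproduced), but the fold bookkeeping is the standard recipe:

```python
import numpy as np

def kfold_error(x, y, fit, predict, k=10, seed=0):
    """k-fold cross-validated misclassification rate."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[train], y[train])                 # train on 90%
        errs.append(np.mean(predict(model, x[test]) != y[test]))  # test on 10%
    return float(np.mean(errs))

def fit(x, y):
    """Toy stump: split at the mean, note which side is majority class 1."""
    thr = x.mean()
    return thr, y[x > thr].mean() > 0.5

def predict(model, x):
    thr, majority_right = model
    return ((x > thr) == majority_right).astype(int)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
y = np.array([0] * 100 + [1] * 100)
cv_err = kfold_error(x, y, fit, predict, k=10)
```

Replacing the stump with trees of different sizes and comparing their `cv_err` values is exactly how the optimal tree size on the previous slide is chosen.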

Agenda Overview Bootstrap Nonparametric Regression Generalized Additive Models Classification and Regression Trees Conclusion

Conclusion Computers have enabled a new generation of statistical methods and tools – traditional mathematical derivations are replaced by computer algorithms – freedom from the bell-shaped-curve assumptions of the traditional approach Modern statisticians need to understand: – mathematical tractability is not required for computer-based methods – which computer-based methods to use – when to use each method