A Method for the More Accurate Measurement and Communication of Model Error
Scott Fortmann-Roe
University of California, Berkeley

Predictions
1) More accurate assessment of prediction error
2) More accurate models
Inferences
3) More accurate measures of significance
4) Altered inferences and conclusions

Issues with Current Approaches

Measure (R², p-value, AIC): Accuracy, Accessibility, Adaptability

Measure: Accuracy (R²), Accessibility, Adaptability

[Figure: House Area and House Price]

Measure: Accuracy, Accessibility (p-values), Adaptability

“[Given a p-value from an experiment] you have found the probability of the null hypothesis being true.”

Measure: Accuracy, Accessibility, Adaptability (AIC, BIC, …)

The Method: A³

Does X significantly affect Y? Does the inclusion of X in a model increase our ability to predict Y?
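
One way to make the second question concrete is sketched below. It assumes squared-error loss, K-fold cross-validation, and a mean-only baseline model; none of these choices is stated on the slide, and the symbols k(i), M, and Δ_X are introduced here only for illustration.

```latex
% k(i): the fold that contains observation i (assumed K-fold split)
% \hat{y}^{M}_{-k(i)}: model M fitted with fold k(i) held out
% \bar{y}_{-k(i)}: mean response of the same training folds (assumed baseline)
\mathrm{CrVa}\,R^2(M) = 1 -
  \frac{\sum_{i=1}^{n} \bigl(y_i - \hat{y}^{M}_{-k(i)}(x_i)\bigr)^2}
       {\sum_{i=1}^{n} \bigl(y_i - \bar{y}_{-k(i)}\bigr)^2},
\qquad
\Delta_X = \mathrm{CrVa}\,R^2\bigl(M_{\text{with } X}\bigr)
         - \mathrm{CrVa}\,R^2\bigl(M_{\text{without } X}\bigr)
```

Under this reading, a positive Δ_X means that including X improves out-of-sample prediction of Y, and a negative Δ_X means that it hurts.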

High-Level Statistical Overview
- Wraps around any predictive algorithm
  - Linear Regression, Logistic Regression, Random Forests, …
- Cross-validation is used to obtain accurate measure of error
- Exact test is used to obtain accurate p-values
- No parametric assumptions (other than independence between observations)
  - (Even independence may be violated if compensated for)
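
The overview above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation. It makes three assumptions: the error measure is a cross-validated R² against a mean-only baseline, the slides' exact test is approximated by a Monte Carlo permutation test, and the wrapped algorithm is any function that takes training data and returns a predictor. The names `cv_r2`, `permutation_p_value`, and `fit_line` are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_r2(fit, X, y, folds):
    """Out-of-fold R^2: 1 - SSE(model) / SSE(mean-only baseline)."""
    sse_model = sse_null = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        baseline = mean(y[i] for i in train)  # assumed null model
        for i in test:
            sse_model += (y[i] - predict(X[i])) ** 2
            sse_null += (y[i] - baseline) ** 2
    return 1.0 - sse_model / sse_null

def permutation_p_value(fit, X, y, k=5, n_perm=200, seed=0):
    """Share of label permutations whose CV R^2 matches or beats the real one.

    Shuffling y breaks any link between X and y, so no distributional
    assumption is needed. This is a Monte Carlo stand-in for an exact test.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(len(y), k, rng)
    observed = cv_r2(fit, X, y, folds)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if cv_r2(fit, X, shuffled, folds) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def fit_line(xs, ys):
    """One-predictor least squares; any fit function could be swapped in."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

# Synthetic data, made up for this example only.
rng = random.Random(1)
X = [rng.uniform(0, 10) for _ in range(60)]
y = [2.0 * x + rng.gauss(0, 1) for x in X]
r2, p = permutation_p_value(fit_line, X, y)
print(f"CrVa R^2 = {r2:.3f}, p = {p:.3f}")
```

Because `cv_r2` only calls `fit` and the predictor it returns, replacing `fit_line` with another learner leaves the rest of the procedure unchanged, which is the sense in which the method wraps around an algorithm.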

Applications

Housing Market
Predicting housing price based on house and market attributes
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management 5: 81–102.

                Coefficient   Std. Error   t-Value   p-Value
(Intercept)
AGE
ROOMS                                                < 0.01
NOX                                                  < 0.01
PUPIL/TEACHER                                        < 0.01
HIGHWAY
Adjusted R²: 0.60; p-Value < 0.01

A³: Linear Model
                Coefficient   CrVa R² (%)   p-Value
-Full Model-                                < 0.01
(Intercept)                                 0.39
AGE                                         0.22
ROOMS                                       < 0.01
NOX                                         < 0.01
PUPIL/TEACHER                               < 0.01
HIGHWAY                                     1.00

A³: Random Forest Model
                CrVa R² (%)   p-Value
-Full Model-    74.3          < 0.01
AGE             -1.5          0.01
ROOMS                         < 0.01
NOX             +6.3          < 0.01
PUPIL/TEACHER                 < 0.01
HIGHWAY         -2.6          0.03
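
The signed per-feature entries in a table like this one can be read as the change in cross-validated R² when a feature is included. The sketch below shows one way such a number and its p-value could be computed. It is an assumption-laden illustration rather than the A³ procedure itself: a small k-nearest-neighbour regressor stands in for the random forest, the p-value comes from permuting the feature's column, and the names `knn_fit`, `cv_r2`, and `feature_contribution` are hypothetical.

```python
import random
from statistics import mean

def knn_fit(rows, ys, k=3):
    """k-nearest-neighbour regression, standing in for any learner."""
    def predict(row):
        nearest = sorted(
            range(len(rows)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], row)),
        )[:k]
        return mean(ys[i] for i in nearest)
    return predict

def cv_r2(rows, ys, folds):
    """Out-of-fold R^2 against a mean-only baseline (assumed null model)."""
    sse_model = sse_null = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        predict = knn_fit([rows[i] for i in train], [ys[i] for i in train])
        baseline = mean(ys[i] for i in train)
        for i in test:
            sse_model += (ys[i] - predict(rows[i])) ** 2
            sse_null += (ys[i] - baseline) ** 2
    return 1.0 - sse_model / sse_null

def feature_contribution(rows, ys, col, folds, n_perm, rng):
    """Change in CV R^2 from including column `col`, with a permutation p-value."""
    full = cv_r2(rows, ys, folds)
    without = cv_r2([r[:col] + r[col + 1:] for r in rows], ys, folds)
    values = [r[col] for r in rows]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # scramble only this feature's column
        scrambled = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, values)]
        if cv_r2(scrambled, ys, folds) >= full:
            hits += 1
    return full - without, (hits + 1) / (n_perm + 1)

# Synthetic data, made up for this example: column 0 drives y, column 1 is noise.
rng = random.Random(0)
n = 60
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
ys = [3.0 * r[0] + rng.gauss(0, 0.5) for r in rows]
order = list(range(n))
rng.shuffle(order)
folds = [order[i::5] for i in range(5)]

gain_signal, p_signal = feature_contribution(rows, ys, 0, folds, 50, rng)
gain_noise, p_noise = feature_contribution(rows, ys, 1, folds, 50, rng)
print(f"signal feature: {gain_signal:+.3f} (p = {p_signal:.3f})")
print(f"noise feature:  {gain_noise:+.3f} (p = {p_noise:.3f})")
```

A feature that carries no information can come out with a negative contribution, since it adds noise to the out-of-sample predictions. The negative entries in the table may reflect this effect.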

CrVa R² by model (Linear Regression, Random Forest, Support Vector Machines):
Significant at p = 0.05
  Linear Regression: ROOMS, NOX, PUPIL/TEACHER
  Random Forest: AGE, ROOMS, NOX, PUPIL/TEACHER, HIGHWAY
  Support Vector Machines: AGE, ROOMS, NOX, PUPIL/TEACHER
Not Significant at p = 0.05
  AGE, HIGHWAY

Environmental Productivity
Measure utility of an ecosystem based on different physical attributes
Maestre FT, Quero JL, Gotelli NJ, Escudero A, Ochoa V, et al. (2012) Plant Species Richness and Ecosystem Multifunctionality in Global Drylands. Science 335: 214–218.

                Coefficient   Std. Error   t-Value   p-Value
(Intercept)                                          < 0.01
SR
SLO                                                  < 0.01
SAC                                                  < 0.01
C
C
C
C                                                    < 0.01
LAT
LONG                                                 < 0.01
ELE                                                  < 0.01
Adjusted R² = 0.56; p-Value < 0.01

A³: Linear Model
                Coefficient   CrVa R² (%)   p-Value
-Full Model-                                < 0.01
(Intercept)                                 < 0.01
SR                                          0.01
SLO                                         0.01
SAC                                         < 0.01
C                                           0.91
C                                           0.15
C                                           0.28
C                                           < 0.01
LAT                                         0.09
LONG                                        < 0.01
ELE                                         < 0.01

A³: Random Forest Model
                CrVa R² (%)   p-Value
-Full Model-    68.3          < 0.01
SR              +1.2          < 0.01
SLO             -1.3          0.95
SAC             +4.0          < 0.01
C                             < 0.01
C                             0.02
C                             0.16
C                             < 0.01
LAT             +0.5          < 0.01
LONG            +0.2          0.02
ELE             +0.4          0.02

Applications Recap
- Explained an additional 15-16% of the squared error
- Significantly altered inferences and conclusions about the underlying systems

Summary

Method: Accuracy, Accessibility, Adaptability
R²: ★☆☆ ★★★
Adjusted R²: ★★☆ ★★★ ★☆☆
p-Values: ★★★ ★★☆
AIC, BIC and Information Theoretic Techniques: ★★★ ★☆☆ ★★☆
A³: ★★★

Questions…