Exploring Ridge Regression Case Study: Predicting Mortality Rates

Exploring Ridge Regression Case Study: Predicting Mortality Rates
Group E: Anna Cojocaru, Ellen Bryer, Irina Matcas

Project Overview and Data
Purpose: Understand the use of ridge regression in comparison with multiple linear regression (MLR).
Response variable: mortality rate.
Explanatory variables: socioeconomic, weather, and pollution-related factors.
Observational units: 60 US cities in 1963.
Data: from McDonald, G.C. and Schwing, R.C. (1973), "Instabilities of regression estimates relating air pollution to mortality."
We use this example to introduce ridge regression and to explain why it is a good technique for data like these.

Problems with the Data: Multicollinearity
The data contain several highly correlated explanatory variables, which we check with a VIF test (see the sketch after this slide).
Because there is a lot of correlation in pollution data, it is often valuable to investigate several highly correlated (non-orthogonal) variables simultaneously. This study addresses the chronic effects of pollution, measured with hydrocarbons, NO, and sulfur dioxide.
Note: there are other pollutants highly correlated with each of these, so cause and effect is not the goal.
The study uses observational data because of sampling issues and because the distribution of all exposures cannot be characterized by a single variable.
The data include relative pollution potentials (emission amounts, which change by city, multiplied by dispersion factors, which are constant across cities) for each metropolitan area, as well as weather factors.
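A minimal sketch of the VIF check using statsmodels, on synthetic stand-in data (the column names and values here are hypothetical, not the study's); a VIF above roughly 10 is a common warning sign of multicollinearity:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the 60-city data: HC, NOX, and SO2 are constructed
# to be highly correlated, mimicking the pollution variables in the study.
rng = np.random.default_rng(0)
n = 60
hc = rng.normal(size=n)
X = pd.DataFrame({
    "PREC": rng.normal(size=n),
    "EDUC": rng.normal(size=n),
    "HC":   hc,
    "NOX":  hc + rng.normal(scale=0.1, size=n),  # nearly collinear with HC
    "SO2":  hc + rng.normal(scale=0.2, size=n),  # also collinear with HC
})

Xc = sm.add_constant(X)  # include an intercept column before computing VIFs
vifs = pd.Series(
    [variance_inflation_factor(Xc.values, i) for i in range(Xc.shape[1])],
    index=Xc.columns,
)
print(vifs.round(1))  # VIF > 10 is a common rule of thumb for trouble
```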

Results: Multiple Linear Regression
Best subsets using Mallows Cp (Cp = 3.62) selects: precipitation, average January temperature, average July temperature, years of education, percentage non-white, and sulfur dioxide pollution. R² = 73.48%. (A sketch of the Cp computation follows this slide.)
If multicollinearity is a problem, OLS regression will not give proper weight to the explanatory variables used as predictors: the OLS technique yields coefficient estimates that are too large or that have the wrong sign. Ridge regression avoids these distortions by adjusting the weights of the coefficients according to their stability; it is a way to look at relationships between variables when several are highly correlated.
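For reference, Mallows Cp for a subset model with p coefficients (intercept included) is Cp = SSE_p / MSE_full - (n - 2p), where MSE_full is the residual mean square of the full model. A minimal sketch with purely hypothetical numbers, not the study's:

```python
def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows Cp for a subset model.

    sse_p    -- residual sum of squares of the subset model
    mse_full -- residual mean square of the full model (all predictors)
    n        -- number of observations
    p        -- number of coefficients in the subset, intercept included
    """
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical numbers for a 7-coefficient subset fit to 60 observations.
print(round(mallows_cp(sse_p=56_000.0, mse_full=1_250.0, n=60, p=7), 2))
```

A subset with Cp close to p (and small) is considered well fitting; the slide's selected model has Cp = 3.62.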

Ridge Regression Technique
Ridge regression (RR) minimizes the RSS plus a penalty on the squared size of the coefficients (see the formula after this slide). RR uses a ridge trace to show how correlation between predictors affects the coefficient estimates as the tuning parameter k varies.
k is a tuning parameter that adjusts the coefficients of the predictors according to their stability; k is a constant in [0, 1]. The optimal k gives more reliable coefficients: if k = 0, the estimators are unbiased and identical to those determined by OLS; if k > 0, the coefficients that OLS overestimated are shrunk by RR.
By applying ridge regression we shrink the estimated coefficients towards zero; this introduces bias but reduces the variance of the estimated coefficients. As k increases, the bias increases and the variance decreases.
Ridge regression performs particularly well when there is a subset of true coefficients that are small or even zero. It does not do as well when all of the true coefficients are moderately large; however, in that case it can still outperform linear regression over a fairly narrow range of small k values.
Ridge regression resolves the issue of multicollinearity.
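In symbols (standard notation, not spelled out on the slide; predictors are assumed centered and scaled so the intercept can be handled separately), ridge regression solves

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + k \sum_{j=1}^{p} \beta_j^{2}
  = (X^{\top}X + kI)^{-1} X^{\top} y .
```

Setting k = 0 recovers the OLS estimator; as k grows, the coefficients shrink toward zero.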

Applying Ridge to Our Data
Determine the optimal k: k ≈ 0.14.
Perform variable selection using the ridge trace (a sketch of the trace follows this slide):
Eliminate stable variables with the least predicting power.
Eliminate two highly unstable variables.
Eliminate further unstable variables (including July temperature).
Variables 12 and 13 are unstable, so they cannot hold their predicting power and need to be eliminated.
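A minimal sketch of a ridge trace using scikit-learn, on synthetic stand-in data; sklearn's alpha plays the role of k here, though its scaling differs from the classical Hoerl-Kennard k, so the values are not directly comparable to k ≈ 0.14:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; ridge traces are drawn on standardized predictors
# so that the coefficient paths are on a comparable scale.
rng = np.random.default_rng(1)
n, p = 60, 6
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=n)  # induce collinearity
y = X @ np.array([3.0, 0.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(size=n)

Xs = StandardScaler().fit_transform(X)
ks = np.linspace(0.001, 1.0, 100)
paths = np.array([Ridge(alpha=k).fit(Xs, y).coef_ for k in ks])

# Unstable (collinear) coefficients swing wildly near k = 0 and settle as
# k grows; stable coefficients barely move.
for j in range(p):
    plt.plot(ks, paths[:, j], label=f"beta_{j + 1}")
plt.xlabel("k (ridge penalty)")
plt.ylabel("standardized coefficient")
plt.legend()
plt.show()
```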

Results: Ridge Regression
The ridge trace suggests: precipitation, average January temperature, population density, years of education, percentage non-white, and sulfur dioxide pollution. R² = 72.43%.

MLR vs. Ridge Regression: Comparing Model Coefficients

Coefficient    OLSR        Ridge
Intercept      1180.356    988.408
PREC              1.797      1.487
JANT             -1.484     -1.633
JULT             -2.355    (dropped)
EDUC            -13.619    -11.533
DENS           (dropped)     0.004
NONW              4.585      4.145
SO2               0.260      0.245

Similarities: the coefficients in both models have the same signs.
Differences: the coefficient estimates in MLR are larger than the ones estimated by ridge regression (due to multicollinearity). Ridge minimizes the risk of over-predicting, especially as the mortality rate increases.
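A minimal sketch of the kind of side-by-side comparison behind this table, on synthetic stand-in data (the study's actual numbers cannot be reproduced without the original 60-city dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in data with a nearly collinear pair of predictors.
rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([
    x1,
    x1 + rng.normal(scale=0.05, size=n),  # nearly collinear with x1
    rng.normal(size=n),
])
y = 2.0 * x1 - 1.0 * X[:, 2] + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=0.14).fit(X, y)  # echoes the slide's k = 0.14, though
                                     # sklearn's alpha is scaled differently

table = pd.DataFrame(
    {"OLSR": ols.coef_, "Ridge": ridge.coef_},
    index=["x1", "x2 (collinear)", "x3"],
)
print(table.round(3))  # ridge shrinks the inflated collinear pair
```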