Regression using lm
lmRegression.R
Basics
Prediction
World Bank CO2 Data
Simple linear regression
Simple linear model: y = b1 + x b2 + error
y: the dependent variable
x: the independent variable
b1, b2: intercept and slope coefficients
error: random departures between the model and the response
Coefficients are estimated by least squares.
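As a small illustration of "estimated by least squares", the sketch below computes the slope and intercept from the closed-form formulas for one predictor and compares them with lm. The data are synthetic and the values 2 and 0.5 are arbitrary choices for this example, not numbers from the lecture.

```r
# Synthetic data for illustration only (assumed, not from the lecture)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Closed-form least squares estimates for a single predictor
b2 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b1 <- mean(y) - b2 * mean(x)                                     # intercept

# lm() minimizes the same sum of squared errors
fit <- lm(y ~ x)

print(c(b1 = b1, b2 = b2))
print(coef(fit))
```

The two printed vectors should agree up to rounding, since both minimize the sum of squared errors.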
Multiple regression
y = b0 + x1 b1 + x2 b2 + x3 b3 + … + error
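With several predictors the same least squares idea can be written in matrix form. The sketch below is one way to see this, assuming three synthetic predictors and made-up coefficients: it solves the normal equations directly and checks the result against lm.

```r
# Synthetic data for illustration only (assumed, not from the lecture)
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + 0.5 * x3 + rnorm(n, sd = 0.3)

# Design matrix: a column of ones for the intercept b0, then the predictors
X <- cbind(1, x1, x2, x3)

# Least squares estimates from the normal equations (X'X) b = X'y
bHat <- solve(crossprod(X), crossprod(X, y))

# The same fit using lm
fit <- lm(y ~ x1 + x2 + x3)

print(cbind(normalEq = drop(bHat), lm = coef(fit)))
```

In practice lm is preferable to solving the normal equations by hand because it uses a numerically more stable QR decomposition.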
Annual Boulder Temperatures
Temperature is the dependent variable; Year is the independent variable.
Errors = ?
Linear = ?
CO2 Emissions by Country
Independent: GDP per capita
Dependent: CO2 emissions
Linear?
Errors?
The R lm function
Takes a formula to describe the regression, where ~ plays the role of the equals sign in the model (read it as "is modeled by").
Works best when the data set is a data frame.
Returns a complicated list that can be passed to summary, predict, print, and plot.
lmFit <- lm(y ~ x1 + x2)
Or, more generally, using a data frame:
lmFit <- lm(y ~ x1 + x2, data = dataset)
The variables in the formula are then taken from the data frame: dataset$y, dataset$x1, dataset$x2.
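A minimal sketch of the data frame workflow follows, showing the fitted object being passed to summary and predict. The data frame contents are synthetic and the new x1, x2 values are arbitrary; only the names dataset, lmFit, y, x1, and x2 come from the slides.

```r
# Synthetic data frame for illustration only (assumed, not from the lecture)
set.seed(3)
dataset   <- data.frame(x1 = runif(40), x2 = runif(40))
dataset$y <- 1 + 2 * dataset$x1 - 3 * dataset$x2 + rnorm(40, sd = 0.2)

# Fit using the data frame; the formula refers to its columns by name
lmFit <- lm(y ~ x1 + x2, data = dataset)

# Coefficient table, standard errors, and R-squared
print(summary(lmFit))

# Predict at new values; newdata must use the same column names
newData <- data.frame(x1 = c(0.2, 0.8), x2 = c(0.5, 0.5))
pred    <- predict(lmFit, newdata = newData, interval = "prediction")
print(pred)
```

The predict call returns a matrix with columns fit, lwr, and upr, one row per row of newData.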
Analysis of World Bank data set
It is best to work on a log scale; GDP has the strongest linear relationship.
Some additional pattern is left over in the residuals.
Try other variables.
Try a more complex curve.
Check the predictions using cross-validation.
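The sketch below shows what "work on a log scale" and "try a more complex curve" might look like in lm. It does not use the World Bank data: the data frame wb and its columns GDP and CO2 are hypothetical stand-ins filled with simulated values, so the printed numbers say nothing about the real data.

```r
# Simulated stand-in data; wb, GDP, and CO2 are hypothetical names,
# not the actual World Bank data set
set.seed(4)
n      <- 120
wb     <- data.frame(GDP = exp(runif(n, log(500), log(60000))))
wb$CO2 <- exp(-7 + 0.9 * log(wb$GDP) + rnorm(n, sd = 0.5))

# Straight line on the log-log scale
fitLin <- lm(log(CO2) ~ log(GDP), data = wb)

# A more complex curve: add a quadratic term in log(GDP)
fitQuad <- lm(log(CO2) ~ log(GDP) + I(log(GDP)^2), data = wb)

# F-test comparing the two nested models
print(anova(fitLin, fitQuad))

# Residuals from the straight-line fit, to inspect for leftover pattern
print(summary(resid(fitLin)))
```

A plot of resid(fitLin) against fitted(fitLin) is the usual visual check for leftover pattern in the residuals.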
Leave-one-out Cross-validation
A robust way to check a model's predictions and the uncertainty measure.
Four steps:
1. Sequentially leave out each observation
2. Refit the model with the remaining data
3. Predict the omitted observation
4. Compare the prediction and confidence interval to the actual observation
This is a check on the consistency of the statistical model, because the omitted observation is not used to make its own prediction.
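The four steps can be sketched as a short loop. This is a minimal example on synthetic data, not the lecture's own script; it uses predict with interval = "prediction" as an assumed choice, since that is the interval meant to cover a new observation, and the 95% level is also an assumption.

```r
# Synthetic data for illustration only (assumed, not from the lecture)
set.seed(5)
n     <- 30
dat   <- data.frame(x = runif(n, 0, 10))
dat$y <- 1 + 0.8 * dat$x + rnorm(n, sd = 1)

# One row per omitted observation: prediction and interval limits
looPred <- matrix(NA, nrow = n, ncol = 3,
                  dimnames = list(NULL, c("fit", "lwr", "upr")))

for (k in 1:n) {
  # Steps 1 and 2: leave out observation k and refit with the rest
  fitK <- lm(y ~ x, data = dat[-k, ])
  # Step 3: predict the omitted observation, with a 95% prediction interval
  looPred[k, ] <- predict(fitK, newdata = dat[k, , drop = FALSE],
                          interval = "prediction", level = 0.95)
}

# Step 4: compare predictions and intervals to the actual observations
rmse    <- sqrt(mean((dat$y - looPred[, "fit"])^2))
covered <- dat$y >= looPred[, "lwr"] & dat$y <= looPred[, "upr"]

print(rmse)           # typical size of the out-of-sample prediction error
print(mean(covered))  # fraction of observations inside their interval
```

If the statistical model is consistent with the data, the coverage fraction should be close to the nominal level of the interval.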