Department of Cognitive Science Michael J. Kalsher Adv. Experimental Methods & Statistics PSYC 4310 / COGS 6310 Regression 1 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2012, Michael Kalsher
Introduction to Regression
If two variables covary, we should be able to predict the value of one variable from the other. Correlation only tells us how much two variables covary. In regression, we construct an equation that uses one or more variables (the IVs, or predictor variables) to predict another variable (the DV, or outcome variable).
–Predicting from one IV = Simple Regression
–Predicting from multiple IVs = Multiple Regression
Simple Regression: The Model
The general equation: $\text{Outcome}_i = (\text{model}) + \text{error}_i$
In regression the model is linear, and we summarize a data set with a straight line. The regression line is determined through the method of least squares.
The Regression Line
Model = “things that define the line we fit to the data”
Any straight line can be defined by:
-The slope of the line ($b_1$)
-The point at which the line crosses the ordinate, termed the intercept of the line ($b_0$)
The general equation $\text{Outcome}_i = (\text{model}) + \text{error}_i$ becomes $Y_i = (b_0 + b_1 X_i) + \varepsilon_i$
–$b_1$ and $b_0$ are termed regression coefficients
–$b_1$ tells us what the model looks like (its shape)
–$b_0$ tells us where the model is in geometric space
–$\varepsilon_i$ is the residual term and represents the difference between participant i’s obtained and predicted scores
Method of Least Squares: Finding the line of best fit
The method of least squares selects the line (the regression line) that has the lowest sum of squared differences between the observed data points and the line, and therefore best represents the observed data. Once we determine the slope ($b_1$) and intercept ($b_0$) of the line, we can insert different values of our predictor variable into the model to estimate the value of the outcome variable.
[Figure labels: Regression Line; Slope = $b_1 = dy/dx$; Individual Data Points; Residual (Error in Prediction); Sum of residuals = 0; Intercept (Constant) $b_0$]
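The least-squares idea above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the course: the function name `fit_line` and the five data points are made up for the example, and the closed-form slope and intercept formulas are the standard ones for a single predictor.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares of the predictor
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Made-up illustrative data (NOT from Record1.sav)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print("intercept b0:", round(b0, 3))
print("slope b1:", round(b1, 3))
print("sum of residuals:", round(sum(residuals), 10))
```

For these points the fitted slope is 0.6 and the intercept is 2.2, and the residuals sum to zero (up to floating-point error), matching the "Sum of residuals = 0" label on the slide.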
Assessing Goodness of Fit
Even the best-fitting line can be a lousy fit to the data, so we need to assess the goodness of fit of the model against our best estimate in the absence of a model: the mean.
Let’s consider an example (see Field, p. 201):
–A music mogul wants to know how many records her company will sell if she spends £100,000 on advertising.
–In the absence of a model of the relationship between advertising and sales, the best guess would be the mean number of record sales (say 200,000), regardless of the amount of advertising.
–So, as a basic strategy for predicting the outcome, we could use the mean, because on average it is a good guess.
Assessing Goodness of Fit
–$SS_T$ (total sum of squares): uses the differences between the observed data and the mean value of Y. Represents the total amount of differences present when the most basic model is applied to the data.
–$SS_R$ (residual sum of squares): uses the differences between the observed data and the regression line. Represents the degree of inaccuracy when the best model is fitted to the data.
–$SS_M$ (model sum of squares): uses the differences between the mean value of Y and the regression line. Shows the reduction in inaccuracy resulting from fitting the regression model to the data.
A large $SS_M$ implies the regression model predicts the outcome variable better than the mean.
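The three sums of squares can be computed directly from their definitions. The sketch below uses made-up data (not the record sales file) and the standard single-predictor least-squares formulas; the variable names `ss_t`, `ss_r`, and `ss_m` are chosen for this illustration.

```python
from statistics import mean

# Made-up illustrative data (NOT from the course data files)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept for one predictor
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]  # values on the regression line

ss_t = sum((y - y_bar) ** 2 for y in ys)             # observed vs. mean
ss_r = sum((y - p) ** 2 for y, p in zip(ys, y_hat))  # observed vs. line
ss_m = sum((p - y_bar) ** 2 for p in y_hat)          # line vs. mean

print("SS_T:", round(ss_t, 3))
print("SS_R:", round(ss_r, 3))
print("SS_M:", round(ss_m, 3))
```

Here SS_T = 6.0, SS_R = 2.4, and SS_M = 3.6, so SS_T = SS_M + SS_R: the total variation splits into the part the line accounts for and the part it leaves unexplained.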
Assessing Goodness of Fit
A large $SS_M$ implies the regression model is much better than using the mean to predict the outcome variable. How big is big? Assessed in two ways: (1) via $R^2$ and (2) via the F-test (which assesses the ratio of systematic to unsystematic variance).
$R^2 = \frac{SS_M}{SS_T}$
Represents the amount of variance in the outcome explained by the model relative to how much variance there was to explain.
$F = \frac{SS_M / df_M}{SS_R / df_R} = \frac{MS_M}{MS_R}$
–df for $SS_M$ = number of variables in the model
–df for $SS_R$ = number of observations minus number of parameters being estimated
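As a rough sketch of how these two statistics follow from the sums of squares, the helper functions below (hypothetical names `r_squared` and `f_ratio`) apply the formulas on this slide. The sums of squares passed in are made-up values for illustration, and the count of estimated parameters is assumed to be the number of predictors plus one for the intercept.

```python
def r_squared(ss_m, ss_t):
    """Proportion of variance in the outcome explained by the model."""
    return ss_m / ss_t


def f_ratio(ss_m, ss_r, n_obs, n_predictors):
    """Return (F, df_m, df_r) from the model and residual sums of squares."""
    df_m = n_predictors
    # Parameters estimated: one slope per predictor plus the intercept
    df_r = n_obs - (n_predictors + 1)
    ms_m = ss_m / df_m  # mean square for the model
    ms_r = ss_r / df_r  # mean square for the residuals
    return ms_m / ms_r, df_m, df_r


# Made-up sums of squares (NOT the record sales results)
ss_m, ss_r = 3.6, 2.4
ss_t = ss_m + ss_r

r2 = r_squared(ss_m, ss_t)
f_value, df_m, df_r = f_ratio(ss_m, ss_r, n_obs=5, n_predictors=1)

print("R^2:", round(r2, 3))
print("F:", round(f_value, 3), "with df =", (df_m, df_r))
```

With these values R² is 0.6 and F is 4.5 on (1, 3) degrees of freedom. Under the same df rule, a simple regression on 200 observations would have df = (1, 198).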
Simple Regression Using SPSS: Predicting Record Sales (Y) from Advertising Budget (X)
Data file: Record1.sav
What’s the overall relationship between record sales and advertising budget?
Interpreting a Simple Regression: Overall Fit of the Model
[Output table labels: $SS_M$, $SS_R$, $SS_T$, $MS_M$, $MS_R$]
The significant F-test allows us to conclude that the regression model results in significantly better prediction of record sales than the mean value of record sales.
Advertising expenditure accounts for 33.5% of the variation in record sales.
df = 1, 198; F = 99.587
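As a cross-check, the reported F can be approximately recovered from the reported proportion of variance explained (33.5%, i.e. $R^2 \approx .335$). Dividing the numerator and denominator of the F formula by $SS_T$ expresses F in terms of $R^2$. The worked numbers below use only the rounded values quoted on the slides, so the result is approximate.

```latex
F = \frac{SS_M / df_M}{SS_R / df_R}
  = \frac{R^2 / df_M}{(1 - R^2) / df_R}
  \approx \frac{0.335 / 1}{0.665 / 198}
  \approx 99.7
```

The small gap between this value and the reported 99.587 is consistent with $R^2$ having been rounded to three decimal places.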
Critical Values for F
Interpreting a Simple Regression: Model Parameters
The ANOVA tells us whether the overall model results in a significantly good prediction of the outcome variable, not about the individual contribution of variables in the model.
–$b_0$, the Y intercept: tells us that when no money is spent on ads, the model predicts 134,140 records will be sold.
–$b_1$, the slope, or the change in the outcome associated with a unit change in the predictor: $b_1 = .096$. Thus, we can predict 96 extra record sales for every £1,000 in advertising.
Regression coefficients should be significantly different from 0 and large relative to their standard errors.
Interpreting a Simple Regression: Model Parameters
Unstandardized Regression Weights
$Y_{\text{pred}} = b_0 + b_1 X$
The intercept and slope are in the original units of X and Y and so aren’t directly comparable.
Standardized Regression Weights
$Z_{y(\text{pred})} = \beta Z_x$
Standardized regression weights tell us the number of standard deviations that the outcome will change as a result of a one standard deviation change in the predictor.
Richards. (1982). Standardized versus Unstandardized Regression Weights. Applied Psychological Measurement, 6,
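To illustrate the relationship between the two kinds of weights, the sketch below converts an unstandardized slope to a standardized one by rescaling with the standard deviations of X and Y. The data are made up, and the conversion $\beta = b_1 \cdot s_x / s_y$ is the standard one for a single predictor, where the standardized weight also equals Pearson's r.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustrative data (NOT from the course data files)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx                     # unstandardized slope, in units of Y per unit of X
beta = b1 * stdev(xs) / stdev(ys)  # standardized slope, in SDs of Y per SD of X
r = sxy / sqrt(sxx * syy)          # Pearson correlation

print("b1:", round(b1, 4))
print("beta:", round(beta, 4))
print("r:", round(r, 4))
```

For these points the unstandardized slope is 0.6, while the standardized weight and the correlation are both about 0.7746.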
Interpreting a Simple Regression: Using the Model
Since we’ve demonstrated that the model significantly improves our ability to predict the outcome variable (record sales), we can plug in different values of the predictor variable(s).
$\text{record sales}_i = b_0 + b_1\,\text{advertising budget}_i = 134.14 + (0.096 \times \text{advertising budget}_i)$, with record sales and advertising budget both in thousands.
What could the record executive expect if she spent £500,000 in advertising? How about £1,000,000?
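The two questions above can be answered by plugging the budgets into the fitted model. The sketch below assumes the coefficients from the model-parameters slide: an intercept of 134.14 (i.e. 134,140 records at zero advertising) and a slope of 0.096, with both variables in thousands. The function name `predict_sales` is hypothetical.

```python
B0 = 134.14  # intercept, in thousands of records (134,140 records at zero ad spend)
B1 = 0.096   # slope: thousands of records per £1,000 of advertising


def predict_sales(advertising_pounds):
    """Predict record sales (in records) from an advertising budget in pounds."""
    budget_thousands = advertising_pounds / 1000
    sales_thousands = B0 + B1 * budget_thousands
    return sales_thousands * 1000


for budget in (500_000, 1_000_000):
    print(f"£{budget:,} in advertising -> about {round(predict_sales(budget)):,} records")
```

Under these assumptions the model predicts about 182,140 records for £500,000 and about 230,140 records for £1,000,000.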
Simple Regression: Supermodel.sav
A fashion student interested in the factors that predict salaries of catwalk models collects data from 231 models. For each model, she records their salary per day on days they work (salary), their age (age), and the number of years they have worked as a model (years). She then gets a panel of experts from modeling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty).
Use simple regression to assess how well each of the potential predictor variables (i.e., age, years, beauty) predicts a model’s salary.
[Slide image not recovered; panel labels: Attractiveness, Age, Years]
[Slide image not recovered; label: Attractiveness]
[Slide image not recovered; label: Age]
[Slide image not recovered; label: Years]