Correlation and Simple Linear Regression PSY440 June 10, 2008.


1 Correlation and Simple Linear Regression PSY440 June 10, 2008

2 A few points of clarification For the chi-squared test, the results are unreliable if the expected frequency in too many of your cells is too low. A rule of thumb is that the minimum expected frequency should be 5 (i.e., no cells with expected counts less than 5). A more conservative rule recommended by some is a minimum expected frequency of 10. If your minimum is too low, you need a larger sample! The more categories you have the larger your sample must be. SPSS will warn you if you have any cells with expected frequency less than 5.
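As a rough illustration of the rule of thumb, the expected frequency for a cell is (row total × column total) / N. The Python sketch below uses a made-up 2 × 3 table of counts; the numbers and the function names `expected_counts` and `cells_below` are assumptions for illustration, not course data or SPSS output. It flags the cells whose expected count falls below 5.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(expected, minimum=5):
    """Return (row, column) positions whose expected count is below the minimum."""
    return [(i, j)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Hypothetical 2 x 3 table of observed counts (illustration only)
observed = [[12, 5, 3],
            [18, 10, 2]]

expected = expected_counts(observed)
low = cells_below(expected)
print(expected)  # expected counts for each cell
print(low)       # cells that violate the "at least 5" rule of thumb
```

In this made-up table the third category is too sparse, so the options would be a larger sample or fewer categories.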

3 Regarding threats to internal validity One of the strengths of well-designed single-subject research is the use of repeated observations during each phase. Repeated observations during baseline and intervention (during an AB study, e.g.) help rule out testing, instrumentation (somewhat), and regression. These effects would be unlikely to result in a marked change between experimental phases that is not apparent during repeated observations before and after the phase change.

4 Regarding histograms The difference between a histogram and a bar graph is that the variable on the x axis (which represents the score on the variable being graphed, as opposed to the frequency of observations) is conceptualized as being continuous in a histogram, whereas a bar graph represents discrete categories along the x axis.

5 About the exam…. Exam on Thursday will cover material from the first three weeks of class (lectures 1-6, or everything through Chi-Squared tests). Emphasis of exam will be on generating results with computers (calculations by hand will not be emphasized), and interpreting the results. Exam questions will be based mainly on lecture material and modeled on previous active learning experiences (homework and in-class demonstrations and exercises). Knowledge of material on qualitative methods and experimental & single-subject design is expected.

6 Before we move on….. Any questions?

7 Today’s lecture and next homework Today’s lecture will cover correlation and simple (bivariate) regression. Homework based on today’s lecture will be distributed on Thursday and due on Tuesday (June 17).

8 Correlation A correlation is the association between scores on two variables –Age and coordination skills in children: as kids get older, their motor coordination tends to improve –Price and quality: generally, the more expensive something is, the higher in quality it is

9 Correlation and Causality Correlational research –Correlation as a statistical procedure is generally used to measure the association between two (or more) continuous variables –Correlation as a kind of research design refers to observational studies in which there is no experimental manipulation.

10 Correlation and Causality Correlational research –Not all “correlational” (i.e., observational) research designs use correlation as the statistical procedure for analyzing the data (example: comparison of verbal abilities between boys and girls - observational study - don’t manipulate gender - but probably analyze mean differences with t-tests). –But: Virtually all of the inferential statistical methods (including t-tests, ANOVA, ANCOVA) covered in 440 can be represented in terms of correlational/regression models (general linear model - we’ll talk more about this later). –Bottom line: Don’t confuse design with analytic strategy.

11 Correlation and Causality Correlations (like other linear statistical models) describe relationships between variables, but DO NOT explain why the variables are related. Suppose that Dr. Steward finds that rates of spilled coffee and severity of plane turbulence are strongly positively correlated. One might argue that turbulence causes coffee spills. One might argue that spilling coffee causes turbulence.

12 Correlation and Causation Suppose that Dr. Cranium finds a positive correlation between head size and digit span (roughly the number of digits you can remember). One might argue that the bigger your head, the larger your digit span. One might argue that head size and digit span both increase with age (but head size and digit span aren’t directly related).

13 Correlation and Causation Observational research and correlational statistical methods (including regression and path analysis) can be used to compare competing models of causation, to see which model fits the data best. One might argue that the bigger your head, the larger your digit span. One might argue that head size and digit span both increase with age (but head size and digit span aren’t directly related).

14 Relationships between variables Properties of a statistical correlation –Form (linear or non-linear) –Direction (positive or negative) –Strength (none, weak, strong, perfect) To examine this relationship you should : –Make a scatterplot - a picture of the relationship –Compute the Correlation Coefficient - a numerical description of the relationship

15 Graphing Correlations Steps for making a scatterplot (scatter diagram) 1. Draw axes and assign variables to them 2. Determine range of values for each variable and mark on axes 3. Mark a dot for each person’s pair of scores

16 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. [Figure: scatterplot, X and Y axes from 1 to 6] Points (X, Y): A (6, 6)

17 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Points (X, Y): A (6, 6), B (1, 2)

18 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Points (X, Y): A (6, 6), B (1, 2), C (5, 6)

19 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Points (X, Y): A (6, 6), B (1, 2), C (5, 6), D (3, 4)

20 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Points (X, Y): A (6, 6), B (1, 2), C (5, 6), D (3, 4), E (3, 2)

21 Scatterplot Plots one variable against the other. Each point corresponds to a different individual. Points (X, Y): A (6, 6), B (1, 2), C (5, 6), D (3, 4), E (3, 2) Imagine a line through the data points. Useful for “seeing” the relationship –Form, Direction, and Strength

22 Scatterplots with Excel and SPSS In SPSS, Graphs menu => Legacy Dialogs => Scatter/Dot => Simple Scatter. Click on Define, and select which variable you want on the x axis and which on the y axis. In Excel, Insert menu => Chart => XY (Scatter). Specify if variables are arranged in rows or columns and select the cells with the relevant data.

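23 Form [Figure: two scatterplots, one showing a linear form and one a non-linear form]

24 Direction Positive: X & Y vary in the same direction. As X goes up, Y goes up: positive Pearson’s r. Negative: X & Y vary in opposite directions. As X goes up, Y goes down: negative Pearson’s r. [Figure: two scatterplots of Y against X, one positive and one negative]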

25 Strength The strength of the relationship –Spread around the line (note the axis scales) –Correlation coefficient will range from -1 to +1. Zero means “no (linear) relationship”. The farther the r is from zero, the stronger the relationship –In general when we talk about correlation coefficients: Correlation coefficient = Pearson’s product moment coefficient = Pearson’s r = r.

26 Strength r = -1.0: “perfect negative corr.”, r² = 100%. r = 0.0: “no relationship”, r² = 0.0. r = +1.0: “perfect positive corr.”, r² = 100%. [Figure: number line from -1.0 through 0.0 to +1.0] The farther from zero, the stronger the relationship

27 The Correlation Coefficient Formulas for the correlation coefficient: a conceptual formula and a common alternative

28 The Correlation Coefficient Formulas for the correlation coefficient: a conceptual formula and a common alternative
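For reference, the two forms used in the worked examples that follow can be written as below. The notation (SP, SS_X, SS_Y, z_X, z_Y, n) is taken from the later slides; the z-score form assumes population standard deviations (dividing by n), as the z-score example does. Which of the two the slide labels “conceptual” is not assumed here.

```latex
r = \frac{SP}{\sqrt{SS_X \, SS_Y}},
\qquad
SP = \sum (X - \bar{X})(Y - \bar{Y}),
\quad
SS_X = \sum (X - \bar{X})^2,
\quad
SS_Y = \sum (Y - \bar{Y})^2

r = \frac{\sum z_X \, z_Y}{n}
```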

29 Computing Pearson’s r (using SP) Step 1: SP (Sum of the Products) X: 6, 1, 5, 3, 3 (mean 3.6); Y: 6, 2, 6, 4, 2 (mean 4.0)

30 Computing Pearson’s r (using SP) Step 1: SP (Sum of the Products) Deviations of X from its mean: 2.4 = 6 - 3.6; -2.6 = 1 - 3.6; 1.4 = 5 - 3.6; -0.6 = 3 - 3.6; -0.6 = 3 - 3.6. Quick check: the deviations sum to 0.0

31 Computing Pearson’s r (using SP) Step 1: SP (Sum of the Products) Deviations of Y from its mean: 2.0 = 6 - 4.0; -2.0 = 2 - 4.0; 2.0 = 6 - 4.0; 0.0 = 4 - 4.0; -2.0 = 2 - 4.0. Quick check: the deviations sum to 0.0

32 Computing Pearson’s r (using SP) Step 1: SP (Sum of the Products) Products of the paired deviations: 2.4 * 2.0 = 4.8; -2.6 * -2.0 = 5.2; 1.4 * 2.0 = 2.8; -0.6 * 0.0 = 0.0; -0.6 * -2.0 = 1.2. SP = 14.0

33 Computing Pearson’s r (using SP) Step 2: SS_X & SS_Y

34 Computing Pearson’s r (using SP) Step 2: SS_X & SS_Y Squared deviations of X: 2.4² = 5.76; (-2.6)² = 6.76; 1.4² = 1.96; (-0.6)² = 0.36; (-0.6)² = 0.36. SS_X = 15.20

35 Computing Pearson’s r (using SP) Step 2: SS_X & SS_Y Squared deviations of Y: 2.0² = 4.0; (-2.0)² = 4.0; 2.0² = 4.0; 0.0² = 0.0; (-2.0)² = 4.0. SS_Y = 16.0

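36 Computing Pearson’s r (using SP) Step 3: compute r = SP / sqrt(SS_X * SS_Y)

37 Computing Pearson’s r (using SP) Step 3: compute r
        X      Y     X - mean   Y - mean   product   (X - mean)²   (Y - mean)²
        6      6       2.4        2.0        4.8        5.76          4.0
        1      2      -2.6       -2.0        5.2        6.76          4.0
        5      6       1.4        2.0        2.8        1.96          4.0
        3      4      -0.6        0.0        0.0        0.36          0.0
        3      2      -0.6       -2.0        1.2        0.36          4.0
mean    3.6    4.0
sum                    0.0        0.0       14.0       15.20         16.0
                                            (SP)       (SS_X)        (SS_Y)

38 Computing Pearson’s r Step 3: compute r, with SP = 14.0, SS_X = 15.20, SS_Y = 16.0

39 Computing Pearson’s r Step 3: compute r. Substitute SP: r = 14.0 / sqrt(SS_X * SS_Y)

40 Computing Pearson’s r Step 3: compute r. Substitute SS_Y: r = 14.0 / sqrt(SS_X * 16.0)

41 Computing Pearson’s r Step 3: compute r. Substitute SS_X: r = 14.0 / sqrt(15.20 * 16.0) = 14.0 / 15.59 = 0.898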

42 Computing Pearson’s r Step 3: compute r [Figure: scatterplot of the five points, X and Y axes from 1 to 6] Appears linear. Positive relationship. Fairly strong relationship: r ≈ .90 is far from 0, near +1
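The three steps can be checked with a short script. This is a minimal sketch in plain Python using the five (X, Y) pairs from the slides; the function name `pearson_r_sp` is made up for illustration.

```python
from math import sqrt

# The five (X, Y) pairs from the worked example
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]


def pearson_r_sp(x, y):
    """Return SP, SS_X, SS_Y and r = SP / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [xi - mean_x for xi in x]            # Step 1: deviations from the mean
    dev_y = [yi - mean_y for yi in y]
    sp = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # sum of the products
    ss_x = sum(dx ** 2 for dx in dev_x)          # Step 2: sums of squares
    ss_y = sum(dy ** 2 for dy in dev_y)
    r = sp / sqrt(ss_x * ss_y)                   # Step 3: compute r
    return sp, ss_x, ss_y, r


sp, ss_x, ss_y, r = pearson_r_sp(x, y)
print(round(sp, 2), round(ss_x, 2), round(ss_y, 2), round(r, 3))
# prints: 14.0 15.2 16.0 0.898
```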

43 The Correlation Coefficient Formulas for the correlation coefficient: a conceptual formula and a common alternative

44 Computing Pearson’s r (using z-scores) Step 1: compute standard deviation for X and Y (note: keep track of sample or population) X: 6, 1, 5, 3, 3; Y: 6, 2, 6, 4, 2. For this example we will assume the data is from a population

45 Computing Pearson’s r (using z-scores) Step 1: compute standard deviation for X and Y. For X: mean = 3.6; deviations 2.4, -2.6, 1.4, -0.6, -0.6 (sum 0.0); squared deviations 5.76, 6.76, 1.96, 0.36, 0.36; SS_X = 15.20; std dev = sqrt(15.20 / 5) = 1.74

46 Computing Pearson’s r (using z-scores) Step 1: compute standard deviation for X and Y. For Y: mean = 4.0; deviations 2.0, -2.0, 2.0, 0.0, -2.0 (sum 0.0); squared deviations 4.0, 4.0, 4.0, 0.0, 4.0; SS_Y = 16.0; std dev = sqrt(16.0 / 5) = 1.79

47 Computing Pearson’s r (using z-scores) Step 2: compute z-scores (deviation divided by std dev). First z-score for X: 2.4 / 1.74 = 1.38

48 Computing Pearson’s r (using z-scores) Step 2: compute z-scores. z-scores for X: 1.38, -1.49, 0.8, -0.34, -0.34. Quick check: the z-scores sum to 0.0 (within rounding)

49 Computing Pearson’s r (using z-scores) Step 2: compute z-scores. First z-score for Y: 2.0 / 1.79 = 1.1

50 Computing Pearson’s r (using z-scores) Step 2: compute z-scores. z-scores for Y: 1.1, -1.1, 1.1, 0.0, -1.1. Quick check: the z-scores sum to 0.0

51 Computing Pearson’s r (using z-scores) Step 3: compute r = (sum of z_X * z_Y) / n. First product: 1.38 * 1.1 = 1.52

52 Computing Pearson’s r (using z-scores) Step 3: compute r. Products of the paired z-scores: 1.52, 1.64, 0.88, 0.0, 0.37; sum = 4.41; r = 4.41 / 5 = 0.88

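53 Computing Pearson’s r (using z-scores) Step 3: compute r [Figure: scatterplot of the five points, X and Y axes from 1 to 6] Appears linear. Positive relationship. Fairly strong relationship: r = .88 (slightly below the SP result only because the z-scores were rounded) is far from 0, near +1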
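The z-score route can be sketched the same way. The code below treats the five pairs as a population (dividing by n), as the slides do; the function names are made up for illustration. Because it does not round the z-scores along the way, it should land on about 0.898 rather than the hand-computed .88.

```python
from math import sqrt

x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]


def z_scores(values):
    """z = (score - mean) / population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD: divide by n
    return [(v - mean) / sd for v in values]


def pearson_r_z(x, y):
    """r = (sum of z_X * z_Y) / n."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


r_z = pearson_r_z(x, y)
print(round(r_z, 3))
```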

54 Correlation in Research Articles Correlation matrix –A display of the correlations between more than two variables Acculturation Why have a “-”? Why only half the table filled with numbers?

55 Correlations with SPSS & Excel SPSS: Analyze => correlate=> bivariate Then select the variables you want correlation(s) for (can select just one pair, or many variables to get a correlation matrix) Try this with height and shoe size in our data. Now try with height, shoe size, mother’s height, and number of shoes owned. Excel: Arrange data for two variables in two columns or rows & use formula bar to request a correlation: =correl(array1,array2)

56 SPSS correlation output

57 Invalid inferences from correlations Why you should always look at the scatter plot before computing (and certainly before interpreting) Pearson’s r: –Correlations are greatly affected by the range of scores in the data. Consider the height and age relationship, and the restricted range example from the text (SAT and GPA). –Extreme scores can have dramatic effects on correlations. A single extreme score can radically change r, especially when your sample is small. –Relations between variables may differ for subgroups, resulting in misleading r values for aggregate data. –Curvilinear relations are not captured by Pearson’s r.

58 What to do about a curvilinear pattern If the pattern is monotonically increasing or decreasing, convert scores to ranks and compute r (using the same formula) based on the rank scores. The result is called Spearman’s Rank Correlation Coefficient or Spearman’s Rho and can be requested in your SPSS output by checking the appropriate box when you select the variables for which you want correlations. If the pattern is more complicated (U-shaped or S-shaped, for example), consult more advanced statistics resources.
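A small sketch of the rank-then-correlate idea, in plain Python. The data are made up (Y is X cubed, so the pattern is monotonic but curved), and the helper names are assumptions for illustration. Tied scores are given the average of the ranks they would occupy, which is a common convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from SP, SS_X and SS_Y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank scores."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but curved data (illustration only)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(round(pearson_r(x, y), 3))     # below 1 because the pattern is curved
print(round(spearman_rho(x, y), 3))  # 1.0 because the ranks agree perfectly
```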

59 Coefficient of determination When considering "how good" a relationship is, we really should consider r² (coefficient of determination), not just r. This coefficient tells you the percent of the variance in one variable that is explained or accounted for by the other variable.

60 From Correlation to Regression With correlation, we can examine whether variables X & Y are related With regression, we try to predict the value of one variable given what we know about the other variable and the relationship between the two.

61 Regression Last time: “it doesn’t matter which variable goes on the X-axis or the Y-axis.” For regression this is NOT the case. The variable that you are predicting goes on the Y-axis (criterion or “dependent” variable). The variable that you are making the prediction based on goes on the X-axis (predictor or “independent” variable). [Figure: scatterplot with the predicted variable, quiz performance, on the Y-axis and the predicting variable, hours of study, on the X-axis]

62 Regression Correlation: “Imagine a line through the points.” But there are lots of possible lines. One line is the “best fitting line.” Regression: compute the equation corresponding to this “best fitting line” [Figure: scatterplot of quiz performance (Y) against hours of study (X)]

63 The equation for a line A brief review of geometry Y = (X)(slope) + (intercept) The intercept is the value of Y when X = 0 (here, 2.0) [Figure: a line on X and Y axes from 0 to 6]

64 The equation for a line A brief review of geometry Y = (X)(slope) + (intercept) slope = (change in Y) / (change in X) = 1 / 2 = 0.5; intercept = 2.0

65 The equation for a line A brief review of geometry Y = (X)(slope) + (intercept) Y = (X)(0.5) + 2.0

66 Regression A brief review of geometry Consider a perfect correlation: Y = (X)(0.5) + (2.0) Can make specific predictions about Y based on X. X = 5, Y = ? Y = (5)(0.5) + (2.0) = 2.5 + 2 = 4.5

67 Regression Consider a less than perfect correlation. The line still represents the predicted values of Y given X: Y = (X)(0.5) + (2.0) X = 5, Y = ? Y = (5)(0.5) + (2.0) = 2.5 + 2 = 4.5

68 Regression The “best fitting line” is the one that minimizes the error (differences) between the predicted scores (the line) and the actual scores (the points). Rather than comparing the errors from different lines and picking the best, we will directly compute the equation for the best fitting line

69 Regression The linear model Y = intercept + slope (X) + error. Betas (β) are sometimes called parameters. They come in two types: standardized and unstandardized. Now let’s go through an example computing these things

70 Scatterplot Using the dataset from our correlation example X: 6, 1, 5, 3, 3; Y: 6, 2, 6, 4, 2 [Figure: scatterplot, X and Y axes from 1 to 6]

71 From when we computed Pearson’s r: mean of X = 3.6, mean of Y = 4.0; products of deviations 4.8, 5.2, 2.8, 0.0, 1.2, so SP = 14.0; squared deviations of X 5.76, 6.76, 1.96, 0.36, 0.36, so SS_X = 15.20; squared deviations of Y 4.0, 4.0, 4.0, 0.0, 4.0, so SS_Y = 16.0

72 Computing regression line (with raw scores) From SP = 14.0, SS_X = 15.20, mean of X = 3.6, mean of Y = 4.0: slope = SP / SS_X = 14.0 / 15.20 = 0.92; intercept = (mean of Y) - (slope)(mean of X) = 4.0 - (0.92)(3.6) = 0.688

73 Computing regression line (with raw scores) Predicted Y = (0.92)(X) + 0.688 [Figure: scatterplot of the five points with the regression line drawn]

74 Computing regression line (with raw scores) The two means will be on the line: (0.92)(3.6) + 0.688 = 4.0
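A minimal Python sketch of the raw-score regression line for the same five points, using slope = SP / SS_X and intercept = (mean of Y) - (slope)(mean of X). The function name is made up for illustration. The unrounded intercept comes out near 0.684; the 0.688 used on the slides follows from rounding the slope to 0.92 first.

```python
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]


def least_squares_line(x, y):
    """Return (slope, intercept) of the best fitting line for Y predicted from X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x  # forces the line through the two means
    return slope, intercept


slope, intercept = least_squares_line(x, y)
print(round(slope, 3), round(intercept, 3))

# The point (mean of X, mean of Y) lies on the line
print(round(slope * 3.6 + intercept, 3))
```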

75 Computing regression line (standardized, using z-scores) Sometimes the regression equation is standardized. –Computed based on z-scores rather than with raw scores. z-scores for X (std dev 1.74): 1.38, -1.49, 0.8, -0.34, -0.34; z-scores for Y (std dev 1.79): 1.1, -1.1, 1.1, 0.0, -1.1

76 Computing regression line (standardized, using z-scores) Sometimes the regression equation is standardized. –Computed based on z-scores rather than with raw scores Prediction model –Predicted Z score (on criterion variable) = standardized regression coefficient multiplied by Z score on predictor variable –Formula: predicted Z_Y = (β)(Z_X) –The standardized regression coefficient (β): in bivariate prediction, β = r

77 Computing regression line (with z-scores) [Figure: plot of Z_Y against Z_X, axes from -2 to 2, with the standardized regression line; both means are 0]

78 Regression Also need a measure of error: Y = X(.5) + (2.0) + error. Same line, but different relationships (strength difference). Y = intercept + slope (X) + error. The linear equation isn’t the whole thing [Figure: two scatterplots with the same line but different spread of points around it]

79 Regression Error –Actual score minus the predicted score Measures of error –r² (r-squared) –Proportionate reduction in error Note: –Total squared error when predicting from the mean = SS_total = SS_Y –Squared error using prediction model = Sum of the squared residuals = SS_residual = SS_error

80 R-squared r² represents the percent variance in Y accounted for by X. r = 0.8: r² = 0.64, 64% variance explained. r = 0.5: r² = 0.25, 25% variance explained [Figure: two scatterplots, one for each value of r]

81 Computing Error around the line Compute the difference between the predicted values and the observed values (“residuals”). Square the differences. Add up the squared differences. Sum of the squared residuals = SS_residual = SS_error [Figure: scatterplot with the regression line and the vertical distances from each point to the line]

82 Computing Error around the line X: 6, 1, 5, 3, 3 (mean 3.6); Y: 6, 2, 6, 4, 2 (mean 4.0). Predicted values of Y (points on the line). Sum of the squared residuals = SS_residual = SS_error

83 Computing Error around the line Predicted values of Y (points on the line): 6.2 = (0.92)(6) + 0.688

84 Computing Error around the line Predicted values of Y: 6.2 = (0.92)(6) + 0.688; 1.6 = (0.92)(1) + 0.688; 5.3 = (0.92)(5) + 0.688; 3.45 = (0.92)(3) + 0.688; 3.45 = (0.92)(3) + 0.688

85 Computing Error around the line [Figure: scatterplot of the observed points with the predicted values 6.2, 1.6, 5.3, 3.45, 3.45 marked on the regression line]

86 Computing Error around the line Residuals (observed Y minus predicted Y): 6 - 6.2 = -0.20; 2 - 1.6 = 0.40; 6 - 5.3 = 0.70; 4 - 3.45 = 0.55; 2 - 3.45 = -1.45. Quick check: the residuals sum to 0.00

87 Computing Error around the line Squared residuals: 0.04, 0.16, 0.49, 0.30, 2.10. SS_error = 3.09

88 Computing Error around the line
        X      Y     predicted Y   residual   residual²   (Y - mean)²
        6      6        6.2         -0.20       0.04         4.0
        1      2        1.6          0.40       0.16         4.0
        5      6        5.3          0.70       0.49         4.0
        3      4        3.45         0.55       0.30         0.0
        3      2        3.45        -1.45       2.10         4.0
mean    3.6    4.0
sum                                  0.00       3.09        16.0
                                              (SS_error)    (SS_Y)

89 Computing Error around the line Sum of the squared residuals = SS_residual = SS_error. Standard error of estimate (from textbook) is analogous to standard deviation. It is the square root of the average squared error: s_x.y = sqrt(SS_error / df). Also, the standard error of estimate is related to r² and to the standard deviation of y: s_x.y = s_y * sqrt(1 - r²)

90 Computing Error around the line Sum of the squared residuals = SS_residual = SS_error = 3.09; SS_Y = 16.0 –Proportionate reduction in error = (SS_Y - SS_error) / SS_Y = (16.0 - 3.09) / 16.0 = 0.81. Also (like r²) represents the percent variance in Y accounted for by X. In fact, it is mathematically identical to r²
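The error calculations can be sketched in Python for the same data. Without rounding the slope and the predicted values, SS_error comes out near 3.11 rather than the hand-computed 3.09, and the proportionate reduction in error matches r² exactly. The choice df = n - 2 for the standard error of estimate is an assumption here (the slides only say “df”), and the variable names are made up for illustration.

```python
from math import sqrt

x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)  # total squared error predicting from the mean

# Least squares line
slope = sp / ss_x
intercept = mean_y - slope * mean_x

# Residuals: actual score minus predicted score
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
ss_error = sum(e ** 2 for e in residuals)

# Proportionate reduction in error, compared with r squared
pre = (ss_y - ss_error) / ss_y
r_squared = sp ** 2 / (ss_x * ss_y)

# Standard error of estimate, assuming df = n - 2 (one predictor plus intercept)
std_error_estimate = sqrt(ss_error / (n - 2))

print(round(sum(residuals), 6))   # residuals sum to (essentially) zero
print(round(ss_error, 3))
print(round(pre, 3), round(r_squared, 3))
print(round(std_error_estimate, 3))
```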

91 Seeing patterns in the error Residual plots The sum of the residuals should always equal 0 (as should the mean). –The least squares regression line balances the error: the positive residuals (points above the line) and the negative residuals (points below the line) cancel out. In addition to summing to zero, we also want the residuals to be randomly distributed. –That is, there should be no pattern to the residuals. –If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables. Residual plots are very useful tools to examine the relationship even further. –These are basically scatterplots of the residuals (Y obs - Y pred) against the explanatory (X) variable (note: the examples actually plot residuals that have been transformed into z-scores).

92 Seeing patterns in the error [Figure: scatter plot and residual plot] The scatterplot shows a nice linear relationship. The residual plot shows that the residuals fall randomly above and below the line. Critically, there doesn't seem to be a discernible pattern to the residuals.

93 Seeing patterns in the error [Figure: scatter plot and residual plot] The scatterplot also shows a nice linear relationship. The residual plot shows that the residuals get larger as X increases. This suggests that the variability around the line is not constant across values of X. This is referred to as a violation of homogeneity of variance.

94 Seeing patterns in the error [Figure: scatter plot and residual plot] The scatterplot shows what may be a linear relationship. The residual plot suggests that a non-linear relationship may be more appropriate (see how a curved pattern appears in the residual plot).

95 Regression in SPSS Running the analysis in SPSS is pretty easy –Analyze: Regression: Linear –X or predictor variable(s) go into the ‘independent variable’ field –Y or predicted variable goes into the ‘dependent variable’ field –You can save the residuals as a new variable to plot the residuals against x as shown in the previous slide. You get a lot of output

96 Regression in SPSS [Figure: annotated SPSS output] The variables in the model; r; r²; Unstandardized coefficients: Slope (indep var name) and Intercept (constant); Standardized coefficients. We’ll get back to these numbers in a few weeks

97 In Excel With the Data Analysis “ToolPak” you can perform regression analysis. With the standard software package, you can get the bivariate correlation (which is the same as the standardized regression coefficient), you can create a scatterplot, and you can request a trend line (as we did when plotting data for single-subject research), which is a regression line (what is y and what is x in that case?)

98 Considerations: Slope is dependent on the variance of x and y. Standardized slope = r (weaker associations between x and y result in flatter slopes). This means that as the association becomes weaker, your prediction of y is more influenced by the mean of y than by changes in x. Regression to the mean is a special case of this…..

99 Regression to the mean Sometimes reliability is represented as r values (test-retest, split-half). If you have a test with low test-retest reliability, your score on the first administration is only weakly related to your score on the second administration. It is influenced by a considerable amount of error variance. Score(true) = Score(observed) +/- Error; equivalently, Score(true) -/+ Error = Score(observed). Any time you take a measurement, the observed score reflects your true score plus error. The further away your observed score gets from the mean score for the test, the more likely it is that the distance from the mean is due at least in part to error. If error is randomly distributed, then your next observed score is more likely to be closer to the mean than farther from the mean.

100 Regression to the mean If x = obs1 and y = obs2, and the test-retest reliability of your measure is relatively low (say, r = .5), then your first score only helps predict your second score somewhat. Standardized regression equation is y = .5x + error. On a standardized test with mean = 0 and sd = 1, if you get a score above the mean, say 1.2, the first time you take the test (obs1 = x = 1.2), and the test-retest reliability is only .5, your predicted score the next time you take the test is .5 * 1.2 = .6. You are more likely to score closer to the mean. This doesn’t mean that you will definitely score closer to the mean, it just means that on average, people who score 1.2 sd above the mean the first time tend to have scores closer to .6 the next time they are tested. This is because the test isn’t that reliable, and the original observation of 1.2 includes error. For the average person with that score (but not for everyone), the error is part of what accounts for the difference between the score and the mean. If your test has higher reliability, then the regression to the mean effect is reduced.
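The effect can be seen in a small simulation. This is a sketch under stated assumptions, not part of the lecture: each observed score is a true score plus independent random error, both normally distributed, with variances chosen so that observed scores have sd = 1 and the test-retest correlation equals the reliability (.5). The function name, sample size, seed, and the 1.1 to 1.3 selection window are all made up for illustration.

```python
import random
from math import sqrt


def simulate_retest(n=200_000, reliability=0.5, seed=440):
    """Simulate (obs1, obs2) pairs: observed = true score + independent error."""
    rng = random.Random(seed)
    sd_true = sqrt(reliability)       # variance of true scores = reliability
    sd_error = sqrt(1 - reliability)  # so observed scores have variance 1
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(0, sd_true)
        obs1 = true_score + rng.gauss(0, sd_error)
        obs2 = true_score + rng.gauss(0, sd_error)
        pairs.append((obs1, obs2))
    return pairs


pairs = simulate_retest()

# People who scored about 1.2 sd above the mean on the first administration
selected = [obs2 for obs1, obs2 in pairs if 1.1 <= obs1 <= 1.3]
mean_retest = sum(selected) / len(selected)

# The average retest score should sit near .5 * 1.2 = .6, not near 1.2
print(len(selected), round(mean_retest, 2))
```

Raising `reliability` toward 1 in this sketch moves the average retest score back toward 1.2, which is the reduced regression effect described above.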

101 Multiple Regression Multiple regression prediction models [Equation with its “fit” and “residual” parts labeled]

102 Prediction in Research Articles Bivariate prediction models rarely reported Multiple regression results commonly reported

