Statistics for the Social Sciences Psychology 340 Fall 2013 Tuesday, November 12, 2013 Correlation and Regression.

Homework #11 due 11/14: Ch 15 # 1, 2, 5, 6, 8, 9, 14

Homework #12 due 11/19: Chapter 16: 1, 2, 7, 8, 10, 22 (use SPSS), 24 (HW #13 – the last homework – is due on 11/21)

Comparing computational and definitional formulas for SP. The way we calculated it last time (definitional): SP = Σ(X − M_X)(Y − M_Y). Computational formula (from textbook): SP = ΣXY − (ΣX)(ΣY)/n. Either way, the formula for r is the same: r = SP / √(SS_X · SS_Y).
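
As a concrete check, here is a minimal Python sketch of both routes (no SPSS or Excel needed); the five X and Y scores are reconstructed from the worked regression example later in these slides, so treat the values as illustrative.

```python
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Definitional formula: SP = sum of (X - M_X)(Y - M_Y)
sp_def = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# Computational formula: SP = sum(XY) - sum(X) * sum(Y) / n
sp_comp = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Either way, r = SP / sqrt(SS_X * SS_Y)
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
r = sp_def / (ss_x * ss_y) ** 0.5
print(round(sp_def, 1), round(sp_comp, 1), round(r, 3))  # 14.0 14.0 0.898
```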

Computing r using z-scores. If z-scores were computed using population standard deviations (σ): r = Σ(z_X · z_Y) / N. If z-scores were computed using sample standard deviations (s): r = Σ(z_X · z_Y) / (n − 1).
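
A sketch of the z-score route in Python, using the same illustrative scores as above; note that both versions land on the same r.

```python
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

def z_scores(scores, mean, denom):
    # denom = N gives population SDs (sigma); denom = n - 1 gives sample SDs (s)
    sd = (sum((s - mean) ** 2 for s in scores) / denom) ** 0.5
    return [(s - mean) / sd for s in scores]

# Population z-scores: r = sum(z_x * z_y) / N
zx, zy = z_scores(x, mx, n), z_scores(y, my, n)
r_pop = sum(a * b for a, b in zip(zx, zy)) / n

# Sample z-scores: r = sum(z_x * z_y) / (n - 1)
zx, zy = z_scores(x, mx, n - 1), z_scores(y, my, n - 1)
r_samp = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(round(r_pop, 3), round(r_samp, 3))  # 0.898 0.898
```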

Hypothesis testing with correlation. H0: no population correlation (ρ = 0). HA: there is a population correlation (ρ ≠ 0); can also have a directional hypothesis. Use table B6 in appendix B, which gives critical values for r at different sample sizes (based on the t/F statistic that can be calculated).
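
If table B6 isn't handy, statistical software gives an equivalent decision via a p-value. A sketch assuming SciPy is available (not part of the course software):

```python
from scipy import stats

x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]

# pearsonr returns r plus the two-tailed p-value for H0: rho = 0
r, p = stats.pearsonr(x, y)
print(round(r, 3), round(p, 3))

# Reject H0 at alpha = .05 only if p < .05; this is equivalent to
# checking r against the critical value in table B6 for this sample size
```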

Scatterplots with Excel & SPSS. In SPSS: Graphs menu => Legacy Dialogs => Scatter/Dot => Simple Scatter. Click Define, and select which variable you want on the x-axis and which on the y-axis. In Excel: Insert menu => Chart => (choose chart type and specify data range).

Correlations with SPSS & Excel. SPSS: Analyze => Correlate => Bivariate. Then select the variables you want correlation(s) for (you can select just one pair, or many variables to get a correlation matrix). Try this with height and shoe size in our data. Now try with height, shoe size, mother's height, and number of shoes owned. Excel: arrange the data for the two variables in two columns or rows and use the formula bar to request a correlation: =CORREL(array1, array2).

Invalid inferences from correlations. Why you should always look at the scatterplot before computing (and certainly before interpreting) Pearson's r:
– Correlations are greatly affected by the range of scores in the data. Consider the height and age relationship, the restricted-range example from the text (IQ and creativity), or SAT and GPA.
– Extreme scores can have dramatic effects on correlations. A single extreme score can radically change r, especially when your sample is small.
– Relations between variables may differ for subgroups, resulting in misleading r values for aggregate data.
– Curvilinear relations are not captured by Pearson's r.

What to do about a curvilinear pattern. If the pattern is monotonically increasing or decreasing, convert the scores to ranks and compute r (using the same formula) on the rank scores. The result is called Spearman's rank correlation coefficient (Spearman's rho) and can be requested in your SPSS output by checking the appropriate box when you select the variables for which you want correlations. If the pattern is more complicated (u-shaped or s-shaped, for example), consult more advanced statistics resources.
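
A sketch of both routes, assuming SciPy is available: the built-in spearmanr, and the "rank first, then apply the Pearson formula" recipe described above, on made-up monotonic data.

```python
from scipy import stats

# Made-up monotonic but curved data: y increases with x, just not linearly
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]  # y = x ** 3

print(round(stats.pearsonr(x, y)[0], 3))   # below 1: Pearson misses the bend
print(round(stats.spearmanr(x, y)[0], 3))  # 1.0: perfect monotonic relation

# Same thing by hand: convert to ranks, then use the ordinary r formula
rx = [sorted(x).index(v) + 1 for v in x]   # no ties in this toy data
ry = [sorted(y).index(v) + 1 for v in y]
print(round(stats.pearsonr(rx, ry)[0], 3))  # also 1.0
```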

From Correlation to Regression. With correlation, we can examine whether variables X & Y are related. With regression, we try to predict the value of one variable given what we know about the other variable and the relationship between the two.

Regression. In correlation it doesn't matter which variable goes on the X-axis and which on the Y-axis. For regression this is NOT the case. The variable that you are predicting goes on the Y-axis (the criterion or "dependent" variable); the variable that you are making the prediction from goes on the X-axis (the predictor or "independent" variable). Example: predicting quiz performance (Y) from hours of study (X).

Regression. Correlation: "imagine a line through the points." But there are lots of possible lines, and one line is the "best fitting line." Regression: compute the equation corresponding to this "best fitting line" (e.g., for quiz performance as a function of hours of study).

The equation for a line: a brief review of geometry. Y = (X)(slope) + (intercept). The intercept is the value of Y when X = 0 (in the plotted example, 2.0).

The equation for a line: a brief review of geometry. Y = (X)(slope) + (intercept). The slope is the change in Y divided by the change in X (in the plotted example, 0.5).

The equation for a line: a brief review of geometry. Putting the pieces together for the plotted example: Y = (X)(0.5) + 2.0.

Regression: a brief review of geometry. Consider a perfect correlation with Y = (X)(0.5) + (2.0). We can make specific predictions about Y based on X. If X = 5, what is Y? Y = (5)(0.5) + (2.0) = 2.5 + 2.0 = 4.5.

Regression. Consider a less than perfect correlation. The line still represents the predicted values of Y given X: with Y = (X)(0.5) + (2.0) and X = 5, the predicted Y = (5)(0.5) + (2.0) = 2.5 + 2.0 = 4.5.

Regression. The "best fitting line" is the one that minimizes the error (differences) between the predicted scores (the line) and the actual scores (the points). Rather than comparing the errors from different lines and picking the best, we will directly compute the equation for the best fitting line.

Regression: the linear model. Y = intercept + slope(X) + error. The coefficients (betas, β) are sometimes called parameters and come in two types: standardized and unstandardized. Now let's go through an example computing these things.

Scatterplot. Using the dataset from our correlation example: X: 6, 1, 5, 3, 3; Y: 6, 2, 6, 4, 2.

From when we computed Pearson's r: mean of X = 3.6, mean of Y = 4.0, SS_X = 15.2, SS_Y = 16.0, SP = 14.0.

Computing regression line (with raw scores). Slope: b = SP / SS_X = 14.0 / 15.2 = 0.92. Intercept: a = M_Y − (b)(M_X) = 4.0 − (0.92)(3.6) = 0.69. So the regression equation is Ŷ = 0.92X + 0.69.

Computing regression line (with raw scores): plotting the line Ŷ = 0.92X + 0.69 through the scatterplot of the X and Y scores.

Computing regression line (with raw scores): note that the point defined by the two means, (M_X, M_Y) = (3.6, 4.0), will always be on the line.
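
A Python sketch of the whole slope/intercept computation; the raw scores are reconstructed from the SP, SS, and mean values shown above, so they are illustrative rather than authoritative.

```python
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n                          # 3.6 and 4.0

sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # SP = 14.0
ss_x = sum((xi - mx) ** 2 for xi in x)                   # SS_X = 15.2

slope = sp / ss_x            # b = SP / SS_X
intercept = my - slope * mx  # a = M_Y - b * M_X
print(round(slope, 3), round(intercept, 3))
# 0.921 and 0.684; the slides use the rounded values 0.92 and 0.69

# The point (M_X, M_Y) falls exactly on the line
print(abs((intercept + slope * mx) - my) < 1e-9)  # True
```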

Computing regression line (standardized, using z-scores). Sometimes the regression equation is standardized: it is computed from z-scores rather than from raw scores (using the mean and standard deviation of X and of Y).

Computing regression line (standardized, using z-scores). Prediction model: the predicted z-score (on the criterion variable) equals the standardized regression coefficient multiplied by the z-score on the predictor variable; as a formula, predicted z_Y = (β)(z_X). The standardized regression coefficient is β; in bivariate prediction, β = r.

Computing regression line (with z-scores): since both z distributions have a mean of 0, the standardized line, predicted z_Y = (r)(z_X), passes through the origin.

Regression. The linear equation isn't the whole thing: Y = intercept + slope(X) + error. We also need a measure of error. Two data sets can share the same best-fitting line, e.g. Y = X(0.5) + (2.0) + error, yet have different relationships between the points and the line (a strength difference).

Regression. Error: the actual score minus the predicted score. Measures of error: r² (r-squared), the proportionate reduction in error. Note: the total squared error when predicting from the mean = SS_Total = SS_Y; the squared error using the prediction model = sum of the squared residuals = SS_residual = SS_error.

Exam III Results: a bimodal distribution. Stem-and-leaf: 9 | 6; 8 | 2 4 4 6 7; 7 | 1 2 3 3 3 4 5 7 8 9; 6 | 6; 5 | 0 1 2 7; 4 | 6 9. M = , s = . If you scored at least 70 you are "keeping up with the pack." If you scored below 70, you need to put forth more effort (please see me if you want or need help!)

R-squared. r² represents the proportion of variance in Y accounted for by X. For example: r = 0.8 gives r² = 0.64, so 64% of the variance is explained; r = 0.5 gives r² = 0.25, so 25% of the variance is explained.

Computing Error around the line. Compute the difference between the predicted values and the observed values (the "residuals"), square the differences, and add up the squared differences. Sum of the squared residuals = SS_residual = SS_error.

Computing Error around the line. Start with the predicted values of Y (the points on the line): Ŷ = 0.92X + 0.69. Sum of the squared residuals = SS_residual = SS_error.

Computing Error around the line. For the first score (X = 6), the predicted value of Y is Ŷ = 0.69 + (0.92)(6) = 6.21.

Computing Error around the line: the predicted values for all five scores. Ŷ = 0.69 + (0.92)(6) = 6.21; Ŷ = 0.69 + (0.92)(1) = 1.61; Ŷ = 0.69 + (0.92)(5) = 5.29; Ŷ = 0.69 + (0.92)(3) = 3.45; Ŷ = 0.69 + (0.92)(3) = 3.45.

Computing Error around the line. The residuals are the vertical distances between the observed Y values (the points) and the predicted Y values (the line). Sum of the squared residuals = SS_residual = SS_error.

Computing Error around the line: the residuals. 6 − 6.21 = −0.21; 2 − 1.61 = 0.39; 6 − 5.29 = 0.71; 4 − 3.45 = 0.55; 2 − 3.45 = −1.45. Quick check: the residuals sum to (approximately) zero.

Computing Error around the line. Squaring each residual and summing gives the sum of the squared residuals: SS_error = 3.09.

Computing Error around the line. Compare the sum of the squared residuals (SS_error = 3.09) with the total variability in Y (SS_Y = 16.0).

Computing Error around the line: the proportionate reduction in error, (SS_Y − SS_error) / SS_Y = (16.0 − 3.09) / 16.0 ≈ 0.81, also (like r²) represents the proportion of variance in Y accounted for by X. In fact, in bivariate regression it is mathematically identical to r².
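
A Python sketch tying the error computation together, using the same reconstructed scores as before; small differences from the slide's 3.09 reflect rounding of the slope and intercept.

```python
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)
my = sum(y) / n
b, a = 0.92, 0.69   # rounded slope and intercept from the slides

# Residual = actual Y minus predicted Y; residuals sum to roughly zero
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in resid])  # [-0.21, 0.39, 0.71, 0.55, -1.45]

ss_error = sum(e ** 2 for e in resid)     # sum of squared residuals
ss_y = sum((yi - my) ** 2 for yi in y)    # total variability, 16.0
pre = (ss_y - ss_error) / ss_y            # proportionate reduction in error
print(round(ss_error, 2), round(pre, 2))  # about 3.1 and 0.81

# PRE matches r squared: r = 0.898, so r ** 2 = 0.81 (up to rounding)
```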

Regression in SPSS. Running the analysis in SPSS is pretty easy: Analyze => Regression => Linear. The X or predictor variable(s) go into the 'Independent Variable' field; the Y or predicted variable goes into the 'Dependent Variable' field. You get a lot of output.

Regression in SPSS. The output lists the variables in the model, r, and r². The unstandardized coefficients are the slope (labeled with the independent variable's name) and the intercept (labeled "constant"); standardized coefficients are also reported. We'll get back to the other numbers in a few weeks.

In Excel. With the Data Analysis "ToolPak" add-in you can perform regression analysis. With the standard software package, you can get the bivariate correlation (which is the same as the standardized regression coefficient), you can create a scatterplot, and you can request a trend line, which is a regression line (what is y and what is x in that case?).

Multiple Regression. Multiple regression prediction models: Y = intercept + b1(X1) + b2(X2) + ... + error, where the predictor terms are the "fit" and the error term is the "residual".

Prediction in Research Articles. Bivariate prediction models are rarely reported; multiple regression results are commonly reported.

Multiple Regression. Typically researchers are interested in predicting with more than one explanatory variable. In multiple regression, an additional predictor variable (or set of variables) is used to predict the residuals left over from the first predictor.

Multiple Regression. Bivariate regression prediction model: Y = intercept + slope(X) + error.

Multiple Regression. Bivariate regression prediction model: Y = intercept + slope(X) + error. Multiple regression prediction model: Y = intercept + b1(X1) + b2(X2) + ... + error, where the predictor terms make up the "fit" and the error term is the "residual".

Multiple Regression. Multiple regression prediction models: Y = intercept + b1(first explanatory variable) + b2(second explanatory variable) + b3(third explanatory variable) + b4(fourth explanatory variable) + whatever variability is left over.

Multiple Regression. Predict test performance based on: study time (first explanatory variable), test time (second), what you eat for breakfast (third), hours of sleep (fourth), plus whatever variability is left over.

Multiple Regression. Predict test performance based on study time, test time, what you eat for breakfast, and hours of sleep. Typically your analysis consists of testing multiple regression models to see which "fits" best (comparing the R²s of the models): for example, a model with study time alone versus a model that adds the other predictors.

Multiple Regression, Model #1. Response variable: total variability in test performance. Predictor: total study time, r = .6. There is some covariance between the two variables: R² for the model = .36, leaving 64% of the variance unexplained. If we know the total study time, we can predict 36% of the variance in test performance.

Multiple Regression, Model #2: add test time to the model (total study time r = .6, test time r = .1). There is little covariance between test performance and test time. R² for the model = .49, leaving 51% of the variance unexplained: we can explain more of the variance in test performance.

Multiple Regression, Model #3: add breakfast food to the model (r = .0). There is no covariance between test performance and breakfast food: R² for the model stays at .49, with 51% of the variance unexplained. Not related, so we can NOT explain more of the variance in test performance.

Multiple Regression, Model #4: add hours of sleep (r = .45). There is some covariance between test performance and hours of sleep, so we can explain more of the variance: R² for the model = .60, leaving 40% unexplained. But notice what happens with the overlap (covariation between explanatory variables): you can't just add r's or r²'s.
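
A sketch of why the r²'s don't simply add when predictors overlap, using simulated data; the variable names and numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Made-up example: study time and hours of sleep are themselves correlated
study = rng.normal(size=n)
sleep = 0.5 * study + rng.normal(size=n)       # overlap between predictors
perf = 0.6 * study + 0.3 * sleep + rng.normal(size=n)

def model_r2(predictors):
    # R**2 for a least-squares model with an intercept
    X = np.column_stack([np.ones(n)] + predictors)
    yhat = X @ np.linalg.lstsq(X, perf, rcond=None)[0]
    return 1 - np.sum((perf - yhat) ** 2) / np.sum((perf - perf.mean()) ** 2)

r_study = np.corrcoef(study, perf)[0, 1]
r_sleep = np.corrcoef(sleep, perf)[0, 1]
print(round(r_study ** 2 + r_sleep ** 2, 2))  # naive sum of the two r**2's
print(round(model_r2([study, sleep]), 2))     # actual model R**2 is smaller
```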

Multiple Regression. The "least squares" regression equation when there are multiple intercorrelated predictor (x) variables is found by calculating "partial regression coefficients" for each x. A partial regression coefficient for x1 shows the relationship between y and x1 while statistically controlling for the other x variables (or holding the other x variables constant).

Multiple Regression. The formula for the partial regression coefficient is: b1 = [(r_Y1 − r_Y2 · r_12) / (1 − r_12²)] · (s_Y / s_1), where r_Y1 = correlation of x1 and y, r_Y2 = correlation of x2 and y, r_12 = correlation of x1 and x2, s_Y = standard deviation of y, and s_1 = standard deviation of x1.
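
A sketch verifying this formula against an ordinary least-squares fit on simulated data; the variable names and values are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # intercorrelated predictors
y = 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

# The pieces the formula needs
r_y1 = np.corrcoef(y, x1)[0, 1]   # correlation of x1 and y
r_y2 = np.corrcoef(y, x2)[0, 1]   # correlation of x2 and y
r_12 = np.corrcoef(x1, x2)[0, 1]  # correlation of x1 and x2
s_y, s_1 = y.std(ddof=1), x1.std(ddof=1)

# b1 = [(r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)] * (s_y / s_1)
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2) * (s_y / s_1)

# Cross-check against a least-squares fit with both predictors
X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b1, 3), round(coef[1], 3))  # the two estimates match
```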

Multiple Regression. The multiple correlation coefficient (R) is an estimate of the relationship between the dependent variable (y) and the best linear combination of predictor variables (the correlation of y and y-pred): R = Cov(y, y-pred) / (s_y · s_y-pred). R² tells you the amount of variance in y explained by the particular multiple regression model being tested.
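
A sketch, on simulated data, showing that R computed as the correlation of y and y-pred squares to the same R² as the variance-explained definition.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.7 * x1 + 0.4 * x2 + rng.normal(size=n)

# Fit the model and form the predicted scores (y-pred)
X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

# R = correlation of y and y-pred
R = np.corrcoef(y, yhat)[0, 1]

# R**2 agrees with the variance-explained definition
ss_res = np.sum((y - yhat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(round(R ** 2, 3), round(1 - ss_res / ss_tot, 3))  # same number twice
```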

Multiple Regression in SPSS. Setup as before: variables (explanatory and response) are entered into columns. There are a couple of different ways to use SPSS to compare different models.

Regression in SPSS: Analyze => Regression => Linear.

Multiple Regression in SPSS, Method 1: enter all the explanatory variables together. Enter all of the predictor variables into the Independent Variable field and the predicted (criterion) variable into the Dependent Variable field.

Multiple Regression in SPSS. The output lists the variables in the model, r for the entire model, and r² for the entire model, along with the unstandardized coefficients: one coefficient for each variable, labeled with its name.

Multiple Regression in SPSS. The output also gives the standardized coefficients: again, one coefficient for each variable (labeled with its name), alongside r and r² for the entire model.

Multiple Regression. Which coefficient should you use, standardized or unstandardized? Unstandardized coefficients are easier to use if you want to predict a raw score from raw scores (no z-scores needed). Standardized coefficients make it easy to compare directly which variable is most "important" in the equation.

Multiple Regression in SPSS, Method 2: enter the first model, then add another variable for the second model, etc. Enter the predicted (criterion) variable into the Dependent Variable field and the first predictor variable into the Independent Variable field, then click the Next button.

Multiple Regression in SPSS, Method 2 continued: enter the second predictor variable into the Independent Variable field, then click Statistics.

Multiple Regression in SPSS: click the 'R squared change' box.

Multiple Regression in SPSS. The output shows the results of two models: the variables in the first model (math SAT) and the variables in the second model (math and verbal SAT).

Multiple Regression in SPSS, Model 1: the output shows the variables in the first model (math SAT), the r² for the first model, and the coefficient for each variable (labeled with its name).

Multiple Regression in SPSS, Model 2: the output shows the variables in the second model (math and verbal SAT), the r² for the second model, and the coefficients for both variables (labeled with their names).

Multiple Regression in SPSS. The change statistics answer the question: is the change in r² from Model 1 to Model 2 statistically significant?

Cautions in Multiple Regression. We can use as many predictors as we wish, but we should be careful not to use more predictors than is warranted. Simpler models are more likely to generalize to other samples. If you use as many predictors as you have participants in your study, you can predict 100% of the variance; although this may seem like a good thing, it is unlikely that your results would generalize to any other sample, and thus they are not valid. You probably should have at least 10 participants per predictor variable (and probably should aim for about 30). The sketch below illustrates the problem.
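
A sketch of the overfitting caution on simulated data: regressing random noise on increasing numbers of pure-noise predictors drives R² toward 1.0 even though nothing real is being predicted.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10                       # tiny sample, like 10 participants
y = rng.normal(size=n)       # outcome is pure noise

for k in (2, 5, 9):          # number of (noise) predictor variables
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    print(k, round(r2, 3))   # R**2 climbs toward 1.0 with more predictors
```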