Statistics for the Social Sciences


Statistics for the Social Sciences Psychology 340 Spring 2005 Prediction

Outline (for the week): simple bivariate regression and the least-squares fit line; the general linear model; residual plots; using SPSS; multiple regression; comparing models (Δr²).

Regression. Last time, with correlation, we examined whether variables X and Y are related. This time, with regression, we try to predict the value of one variable from what we know about the other variable and the relationship between the two.

Regression. Last time: “it doesn’t matter which variable goes on the X-axis or the Y-axis.” For regression this is NOT the case. The variable that you are predicting goes on the Y-axis (the criterion, or predicted, variable; here, quiz performance). The variable that you are making the prediction from goes on the X-axis (the predictor variable; here, hours of study). [Figure: scatterplot of quiz performance (Y) against hours of study (X)]

Regression. Last time: “imagine a line through the points.” But there are lots of possible lines; one of them is the “best fitting line.” Today: learn how to compute the equation corresponding to this best fitting line. [Figure: scatterplot of quiz performance against hours of study with a candidate line drawn through the points]

The equation for a line (a brief review of geometry): Y = (X)(slope) + (intercept). The intercept is the value of Y when X = 0 (here, 2.0).

The equation for a line (a brief review of geometry): Y = (X)(slope) + (intercept). The slope is the change in Y divided by the change in X (here, Y changes by 1 when X changes by 2, so the slope is 0.5; the intercept is 2.0).

The equation for a line (a brief review of geometry): with slope = 0.5 and intercept = 2.0, the line is Y = (X)(0.5) + 2.0.

Regression (a brief review of geometry). Consider a perfect correlation. With the line Y = (X)(0.5) + (2.0), what is Y when X = 5? Y = (5)(0.5) + (2.0) = 2.5 + 2 = 4.5. We can make specific predictions about Y based on X.
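The prediction step above can be sketched in a couple of lines of Python; the slope and intercept are the lecture's example values:

```python
# Predict Y from X on a line: predicted Y = slope * X + intercept.
# Slope (0.5) and intercept (2.0) are the example values from the slide.
def predict(x, slope=0.5, intercept=2.0):
    return slope * x + intercept

print(predict(5))  # (5)(0.5) + 2.0 = 4.5
```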

Regression. Consider a less than perfect correlation. The line still represents the predicted values of Y given X: for X = 5, Y = (5)(0.5) + (2.0) = 2.5 + 2 = 4.5, even though the points no longer fall exactly on the line.

Regression. The “best fitting line” is the one that minimizes the error (the differences) between the predicted scores (the line) and the actual scores (the points). Rather than comparing the errors from different lines and picking the best, we will directly compute the equation for the best fitting line.

Regression. The linear model: Y = intercept + slope(X) + error. The betas (β) are sometimes called parameters, and come in two types: standardized and unstandardized. Now let’s go through an example computing these things.

Scatterplot. Using the dataset from our correlation lecture: the (X, Y) pairs are (6, 6), (1, 2), (5, 6), (3, 4), and (3, 2). [Figure: scatterplot of these five points]

From the Computing Pearson’s r lecture:

  X    Y   X-mean  Y-mean  (X-mean)(Y-mean)  (X-mean)²  (Y-mean)²
  6    6    2.4     2.0          4.8            5.76       4.0
  1    2   -2.6    -2.0          5.2            6.76       4.0
  5    6    1.4     2.0          2.8            1.96       4.0
  3    4   -0.6     0.0          0.0            0.36       0.0
  3    2   -0.6    -2.0          1.2            0.36       4.0

means: X = 3.6, Y = 4.0; sums: SP = 14.0, SSX = 15.20, SSY = 16.0

Computing the regression line (with raw scores). From the table: mean X = 3.6, mean Y = 4.0, SP = 14.0, SSX = 15.20, SSY = 16.0. The least-squares slope is b = SP / SSX = 14.0 / 15.2 ≈ 0.92, and the intercept is a = mean Y − b(mean X) = 4.0 − (0.92)(3.6) = 0.688, giving the prediction line Ŷ = (0.92)X + 0.688.
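As a sketch, the same computation in Python, using the five data points from the lecture (note that the slides round the slope to 0.92 before computing the intercept, which gives 0.688; computing without rounding gives 0.684):

```python
# Least-squares line from deviation sums: slope = SP / SSX,
# intercept = mean(Y) - slope * mean(X).
X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n                        # 3.6, 4.0
SP  = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # 14.0
SSX = sum((x - mx) ** 2 for x in X)                    # 15.2
slope = SP / SSX                                       # ~0.92
intercept = my - slope * mx                            # ~0.68
```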

Computing the regression line (with raw scores). [Figure: scatterplot of the five data points (means: X = 3.6, Y = 4.0) with the regression line drawn in]

Computing the regression line (with raw scores). The two means will be on the line: the point (mean X, mean Y) = (3.6, 4.0) always falls on the least-squares line. [Figure: scatterplot with the regression line passing through the point of means]

Computing the regression line (standardized, using z-scores). Sometimes the regression equation is standardized: computed from z-scores rather than from raw scores.

  X    Y   X-mean  (X-mean)²  Y-mean  (Y-mean)²    zX      zY
  6    6    2.4      5.76      2.0      4.0       1.38     1.1
  1    2   -2.6      6.76     -2.0      4.0      -1.49    -1.1
  5    6    1.4      1.96      2.0      4.0       0.80     1.1
  3    4   -0.6      0.36      0.0      0.0      -0.34     0.0
  3    2   -0.6      0.36     -2.0      4.0      -0.34    -1.1

means: X = 3.6, Y = 4.0; sums: 0.0, 15.20, 0.0, 16.0, 0.0, 0.0; std dev: X = 1.74, Y = 1.79

Computing the regression line (standardized, using z-scores). Sometimes the regression equation is standardized, computed from z-scores rather than raw scores. Prediction model: the predicted z-score on the criterion variable equals the standardized regression coefficient multiplied by the z-score on the predictor variable, i.e. predicted zY = (β)(zX). The standardized regression coefficient is β; in bivariate prediction, β = r.
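A sketch of the standardized version in Python, verifying that the mean cross-product of the z-scores reproduces r (so β = r in the bivariate case):

```python
import math

X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
# Population standard deviations (divide by n), matching the slide's 1.74 and 1.79
sx = math.sqrt(sum((x - mx) ** 2 for x in X) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in Y) / n)
zX = [(x - mx) / sx for x in X]
zY = [(y - my) / sy for y in Y]
# Pearson r as the average cross-product of z-scores; this is also beta,
# so the standardized prediction model is: predicted zY = r * zX
r = sum(a * b for a, b in zip(zX, zY)) / n
```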

Computing the regression line (with z-scores). [Figure: scatterplot of the z-scores — (zX, zY) pairs (1.38, 1.1), (-1.49, -1.1), (0.80, 1.1), (-0.34, 0.0), (-0.34, -1.1) — with the standardized regression line passing through the mean point (0, 0)]

Regression. The linear equation isn’t the whole thing: we also need a measure of error. Y = X(0.5) + (2.0) + error. Two datasets can have the same line but different relationships (a difference in strength, i.e. in how tightly the points cluster around the line). [Figure: two scatterplots with the same regression line but different amounts of scatter]

Regression. Error is the actual score minus the predicted score. Measure of error: r² (r-squared), the proportionate reduction in error. Note: the total squared error when predicting from the mean is SSTotal = SSY; the squared error using the prediction model is the sum of the squared residuals, SSresidual = SSerror.

R-squared. r² represents the percent of the variance in Y accounted for by X. [Figure: two scatterplots, one with 64% of the variance explained and one with 25% explained]

Computing the error around the line. Sum of the squared residuals = SSresidual = SSerror. Steps: (1) compute the difference between the predicted values and the observed values (the “residuals”); (2) square the differences; (3) add up the squared differences.

Computing the error around the line. Sum of the squared residuals = SSresidual = SSerror. First find the predicted values of Y (the points on the line), using Ŷ = (0.92)X + 0.688:

  X    Y     Ŷ
  6    6    6.2  = (0.92)(6) + 0.688
  1    2    1.6  = (0.92)(1) + 0.688
  5    6    5.3  = (0.92)(5) + 0.688
  3    4    3.45 = (0.92)(3) + 0.688
  3    2    3.45 = (0.92)(3) + 0.688

means: X = 3.6, Y = 4.0

Computing the error around the line. [Figure: scatterplot of the data with the regression line, marking the predicted values 6.2, 1.6, 5.3, 3.45, and 3.45 on the line]

Computing the error around the line. Next compute the residuals (observed Y minus predicted Ŷ):

  X    Y     Ŷ      residual
  6    6    6.2    6 − 6.2  = -0.20
  1    2    1.6    2 − 1.6  =  0.40
  5    6    5.3    6 − 5.3  =  0.70
  3    4    3.45   4 − 3.45 =  0.55
  3    2    3.45   2 − 3.45 = -1.45

means: X = 3.6, Y = 4.0; quick check: the residuals sum to 0.00

Computing the error around the line. Then square the residuals and sum them:

  X    Y     Ŷ      residual   residual²
  6    6    6.2     -0.20       0.04
  1    2    1.6      0.40       0.16
  5    6    5.3      0.70       0.49
  3    4    3.45     0.55       0.30
  3    2    3.45    -1.45       2.10

means: X = 3.6, Y = 4.0; residuals sum to 0.00; SSerror = 3.09. For comparison, SSY = 16.0 (the total squared error when predicting from the mean).
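The residual computation can be sketched in Python, using the rounded slope and intercept from the slides (so SSerror comes out near, but not exactly at, the hand-computed 3.09):

```python
X = [6, 1, 5, 3, 3]
Y = [6, 2, 6, 4, 2]
slope, intercept = 0.92, 0.688                    # rounded values from the lecture
predicted = [slope * x + intercept for x in X]    # ~6.2, 1.6, 5.3, 3.45, 3.45
residuals = [y - p for y, p in zip(Y, predicted)]
# Quick check: the residuals sum to zero
total = sum(residuals)
# Square and sum the residuals to get SSerror (~3.1)
SSerror = sum(e ** 2 for e in residuals)
```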

Computing the error around the line. The proportionate reduction in error is (SSY − SSerror) / SSY = (16.0 − 3.09) / 16.0 ≈ .81. Like r², this represents the percent of the variance in Y accounted for by X; in fact, it is mathematically identical to r².
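As a quick check that the proportionate reduction in error and r² agree (up to rounding in the hand-computed residuals), a sketch:

```python
import math

SSY = 16.0       # total squared error predicting from the mean
SSerror = 3.09   # squared error using the regression line (from the slides)
PRE = (SSY - SSerror) / SSY        # proportionate reduction in error
r = 14.0 / math.sqrt(15.2 * 16.0)  # r = SP / sqrt(SSX * SSY)
# Both PRE and r squared round to about .81
```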

Seeing patterns in the error: residual plots. The sum of the residuals should always equal 0 (as should their mean): the least-squares regression line splits the data in half, with half of the error above the line and half below. In addition to summing to zero, we also want the residuals to be randomly distributed; there should be no pattern to them. If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables. Residual plots are very useful tools for examining the relationship further. These are basically scatterplots of the residuals against the explanatory (X) variable (note: the examples that follow actually plot residuals that have been transformed into z-scores).

Seeing patterns in the error: scatterplot and residual plot. The scatterplot shows a nice linear relationship. The residual plot shows that the residuals fall randomly above and below the line; critically, there doesn’t seem to be any discernible pattern to the residuals.

Seeing patterns in the error: scatterplot and residual plot. The scatterplot again shows a nice linear relationship, but the residual plot shows that the residuals get larger as X increases. This suggests that the variability around the line is not constant across values of X, which is referred to as a violation of homogeneity of variance.

Seeing patterns in the error: scatterplot and residual plot. The scatterplot shows what may be a linear relationship. The residual plot, however, suggests that a non-linear relationship may be more appropriate (note how a curved pattern appears in the residual plot).

Regression in SPSS. Running the analysis is pretty easy: Analyze → Regression → Linear. Predictor variables go into the “independent variable” field; the predicted (criterion) variable goes into the “dependent variable” field. You get a lot of output.

Regression in SPSS. The output shows the variables in the model, r and r², the unstandardized coefficients (the slope, labeled with the independent variable’s name, and the intercept, labeled “constant”), and the standardized coefficients. We’ll get back to the other numbers in the output in a few weeks.

Multiple Regression. Multiple regression prediction models extend the same idea to several predictors: the “fit” is Ŷ = a + b₁X₁ + b₂X₂ + … + bₖXₖ, and the “residual” is Y − Ŷ.
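A minimal sketch of a two-predictor model in pure Python, solving the normal equations for the slopes. The X2 values here are made up for illustration; X1 and Y are the lecture's data:

```python
# Two-predictor multiple regression: fit = a0 + b1*X1 + b2*X2,
# residual = Y - fit. X2 is a hypothetical second predictor.
X1 = [6, 1, 5, 3, 3]
X2 = [2, 1, 4, 2, 1]
Y  = [6, 2, 6, 4, 2]
n = len(Y)
m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n
# Centered sums of squares and cross-products
S11 = sum((x - m1) ** 2 for x in X1)
S22 = sum((x - m2) ** 2 for x in X2)
S12 = sum((x - m1) * (w - m2) for x, w in zip(X1, X2))
S1y = sum((x - m1) * (y - my) for x, y in zip(X1, Y))
S2y = sum((x - m2) * (y - my) for x, y in zip(X2, Y))
# Solve the 2x2 normal equations for the two slopes
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
a0 = my - b1 * m1 - b2 * m2
fit = [a0 + b1 * x1 + b2 * x2 for x1, x2 in zip(X1, X2)]
residual = [y - f for y, f in zip(Y, fit)]   # sums to zero, as in the bivariate case
```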

Prediction in Research Articles. Bivariate prediction models are rarely reported; multiple regression results are commonly reported.