Theme 6. Linear regression


Theme 6. Linear regression
1. Introduction
2. The equation of the line
3. The least squares criterion
4. Graphical representation
5. Standardized regression coefficients
6. The coefficient of determination
7. Introduction to multiple regression

Introduction
Establishing a correlation between two variables is important, but it is only a first step toward predicting one variable from the other (or from several others, in the case of multiple regression, i.e., multiple predictors). If we know that the variable X is closely related to Y, we can predict Y from X: we are now in the field of prediction. (Obviously, if X is unrelated to Y, X is of no use as a predictor of Y.) Note: we will use the terms "regression" and "prediction" almost as synonyms. (The term "regression" is historical, and has simply remained in use.)

Introduction
For simplicity, we will focus on the case in which the relationship between X and Y is linear. [Figure: scatterplot of Performance (Y) against IQ (X) with a fitted line.] The issue now is how to obtain the "best" line connecting the points: we need a criterion. While other criteria exist, the most commonly used, and the one we will use here, is the least squares criterion: it minimizes the sum of the squared (vertical) distances of the points from the line.

Review of the equation of a line
Y = A + B·X
A is the intercept (the point where the line cuts the Y axis). B is the slope (note that for a positive relationship B is positive, for a negative relationship B is negative, and if there is no relationship B will be approximately 0). [Figure: scatterplot of Performance against IQ.] If we want to predict Y from X, we need to compute (in the case of a linear relationship) the regression of Y on (i.e., from) X.

Calculation of the linear regression equation (Y on X)
The least squares criterion gives us the values of A and B such that the sum of squared prediction errors

Σ (Y − Y')² = Σ (Y − (A + B·X))²

is minimal, where Y' = A + B·X is the Performance (Y) predicted from IQ (X).

Calculation of the linear regression equation (Y on X)
Example data:

IQ (X)   Performance (Y)
120      10
100       9
 90       4
110       6

Calculation of the linear regression equation (Y on X)
For these data, the least-squares line is:

Y' = -8.5 + 0.15·X

and the sum of squared errors, Σ (Y − Y')², is minimal: it equals 11.5 in this case.
Important notes:
- Each additional unit of IQ increases the predicted grade by 0.15.
- Although in this case it does not make sense, a person with an IQ of 0 would be predicted a grade of -8.5.
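The intercept, slope, and minimal sum of squared errors for this four-subject example can be reproduced with a short script (a sketch using Python/NumPy, which is not part of the original course material; the variable names are ours):

```python
import numpy as np

# Example data from the slides: IQ (X) and Performance/grade (Y)
X = np.array([120.0, 100.0, 90.0, 110.0])
Y = np.array([10.0, 9.0, 4.0, 6.0])

# Least-squares slope and intercept: B = S_XY / S_X^2, A = mean(Y) - B * mean(X)
B = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
A = Y.mean() - B * X.mean()
print(round(A, 10), round(B, 10))  # -8.5 0.15

# Sum of squared prediction errors at the least-squares solution
Y_pred = A + B * X
sse = np.sum((Y - Y_pred) ** 2)
print(round(sse, 10))  # 11.5
```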

Calculation of the linear regression equation (Y on X)
Formulae in direct (raw) scores:

slope:     B = Σ (X − X̄)(Y − Ȳ) / Σ (X − X̄)²
intercept: A = Ȳ − B·X̄

Note: both A and B can easily be obtained on any calculator with an "LR" (Linear Regression) option.

Calculation of the linear regression equation (Y on X)
Applying these formulae to our data gives Y' = -8.5 + 0.15·X.

Calculation of the linear regression equation (Y on X)
The formulae in differential (deviation) scores: notice that the means of the deviation scores x and y are 0, so the intercept is a = 0.
IMPORTANT: b = B. That is, the slope in deviation scores is the same as the slope in raw scores. Therefore, the regression equation in deviation scores is, in our case:

y' = 0.15·x

Calculation of the linear regression equation (Y on X)
Formulae in standardized scores: as in deviation scores, the intercept is 0. The slope is actually Pearson's correlation coefficient, r_XY. Therefore, the regression line in standard scores is, in our case:

z_Y' = 0.703·z_X
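The standardized-score slope can be checked numerically on the same example (again a Python/NumPy sketch, not from the slides; population standard deviations, i.e., dividing by N, are used as in the course formulas):

```python
import numpy as np

X = np.array([120.0, 100.0, 90.0, 110.0])  # IQ
Y = np.array([10.0, 9.0, 4.0, 6.0])        # Performance

# Standardize with population standard deviations (NumPy's default, ddof=0)
zx = (X - X.mean()) / X.std()
zy = (Y - Y.mean()) / Y.std()

# Regressing z_Y on z_X: the intercept is 0 and the slope equals Pearson's r
slope = np.sum(zx * zy) / np.sum(zx ** 2)
print(round(slope, 3))  # 0.703

# The raw-score slope is recovered as B = r * (S_Y / S_X)
B = slope * Y.std() / X.std()
print(round(B, 3))  # 0.15
```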

Calculation of the linear regression equation (Y on X)
OUTPUT from SPSS: [coefficients table showing the intercept and slope in raw scores, and the slope in standardized scores]
Note: in standardized scores, the slope matches Pearson's r coefficient.

Calculation of the linear regression equation (Y on X)
We know that B = S_XY / S_X², and from Theme 5 we know that r_XY = S_XY / (S_X·S_Y), so S_XY = r_XY·S_X·S_Y; therefore

B = r_XY · (S_Y / S_X)

Calculation of the linear regression equation (Y on X)
Therefore,

B = r_XY · (S_Y / S_X)   and   A = Ȳ − B·X̄

Prediction errors in the regression line of Y on X
Observed scores: Y. Predicted scores: Y' = A + B·X. Prediction errors with the equation: e = Y − Y'. The question now is how much the error variance is reduced by using the regression of Y on X (i.e., by having X as a predictor), compared with the case in which we do not have the regression line.

Prediction errors in the regression of Y on X
If we had no predictors, what could we predict for the scores of Y? In that case, by the least squares criterion, if we want to predict Y and lack data on X, our best estimate of Y is its mean, Ȳ. Recall that the mean minimizes the sum of squared differences:

Σ (Y − Ȳ)²   is minimal.

If we use the mean as the predictor, the variance of the prediction errors is simply the variance of Y, S_Y².

Prediction errors in the regression of Y on X
But if we have a predictor X, the error variance is

S²_Y·X = Σ (Y − Y')² / N

This is the variance of Y not explained by X. It can be proven that

S²_Y·X = S_Y² · (1 − r_XY²)

and, as a result,

r_XY² = (S_Y² − S²_Y·X) / S_Y²

How good is the prediction of the regression line?
The coefficient of determination is an index of the goodness of fit of our model (the regression line). We have just shown that

r_XY² = (S_Y² − S²_Y·X) / S_Y²

This is called the coefficient of determination. It indicates how good the fit of the regression line (or, in general, of the linear model) is, and it is bounded between 0 and 1. If all the points in the scatterplot lie exactly on the line (with a slope different from 0), then the error variance S²_Y·X is 0 and the coefficient of determination is 1.

The coefficient of determination and the proportion of associated / explained / common variance (1)
Let's start with a tautology:

Y_i = Y'_i + e_i

This expression indicates that the observed score of the i-th subject equals the predicted score for that subject plus a prediction error. It can be shown that the predicted scores and the prediction errors are independent, so we can write:

S_Y² = S²_Y' + S²_Y·X

(total variance of Y = variance of the predicted scores + variance of the residuals (errors) when using the equation to predict Y).

The coefficient of determination and the proportion of associated / explained / common variance (2)
From the previous slide, we have S_Y² = S²_Y' + S²_Y·X. As we know that S²_Y·X = S_Y²·(1 − r_XY²), it follows that

S²_Y' = S_Y² · r_XY²,   that is,   r_XY² = S²_Y' / S_Y²

In short, the coefficient of determination measures the proportion of the variance of Y that is associated with / explained by the predictor X.
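The variance decomposition and the two expressions for the coefficient of determination can be verified on the four-subject example (a Python/NumPy sketch, not part of the slides; population variances, ddof=0):

```python
import numpy as np

X = np.array([120.0, 100.0, 90.0, 110.0])  # IQ
Y = np.array([10.0, 9.0, 4.0, 6.0])        # Performance

# Least-squares fit, as in the earlier slides
B = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
A = Y.mean() - B * X.mean()
Y_pred = A + B * X
errors = Y - Y_pred

# Decomposition: S_Y^2 = S_Y'^2 + S_{Y.X}^2 (population variances, ddof=0)
var_total = Y.var()
var_pred = Y_pred.var()
var_error = errors.var()
print(round(var_total, 4), round(var_pred + var_error, 4))  # 5.6875 5.6875

# Coefficient of determination two ways: S_Y'^2 / S_Y^2 and r^2
r = np.corrcoef(X, Y)[0, 1]
print(round(var_pred / var_total, 4), round(r ** 2, 4))  # 0.4945 0.4945
```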

Introduction to multiple linear regression (1)
We have seen the case of one predictor (X) and one predicted variable (Y), and obtained the regression of Y on X by the least squares method. Given the nature of human behavior, in which each observed behavior can be influenced by several variables, we can have several predictors X1, X2, ... to predict Y (or, if you prefer, several predictors X2, X3, ..., to predict X1). This is the case of multiple regression. So far we had one dependent variable and one predictor (independent variable); now we will have k predictors:

Y' = A + B1·X1 + B2·X2 + ... + Bk·Xk

Introduction to multiple linear regression (2)
It is important to realize that the weights B1, B2, ..., Bk are analogous to the slope B we saw in the case of the simple regression line. These coefficients represent how important each respective predictor variable is in the regression equation. As in the simple regression line (notice that the one-predictor case is a particular case of multiple regression), A represents the point where the multiple regression hyperplane cuts the axis of the predicted variable. For simplicity, the whole process is usually done by computer, so we will not go through the formulas...

Introduction to multiple linear regression (3)
In raw scores, the regression equation is the one we already know:

Y' = A + B1·X1 + B2·X2 + ... + Bk·Xk

In deviation scores, remember that A was 0 in the simple regression; the same applies in the multiple regression equation. And, applying the same logic, the value of the weights is the same as in raw scores:

y' = B1·x1 + B2·x2 + ... + Bk·xk

Introduction to multiple linear regression (4)
Example data (N = 5):

Performance   Anxiety   Neuroticism
9              3         5
3             12        15
6              8         8
2              9         7
7              7         6

As in the case of one predictor, the least squares criterion gives the weights A, B1, B2 that minimize the sum of squared prediction errors.
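Although the slides leave the computation to the computer (SPSS), the multiple regression weights for this small data set can be obtained by least squares in a few lines (a Python/NumPy sketch; the printed coefficients are what this code computes, not figures taken from the slides):

```python
import numpy as np

# Data (N = 5): Performance (Y), Anxiety (X1), Neuroticism (X2)
Y  = np.array([9.0, 3.0, 6.0, 2.0, 7.0])
X1 = np.array([3.0, 12.0, 8.0, 9.0, 7.0])
X2 = np.array([5.0, 15.0, 8.0, 7.0, 6.0])

# Design matrix with a leading column of ones for the intercept A
D = np.column_stack([np.ones_like(Y), X1, X2])

# Least-squares solution of Y' = A + B1*X1 + B2*X2
(A, B1, B2), *_ = np.linalg.lstsq(D, Y, rcond=None)
print(round(A, 3), round(B1, 3), round(B2, 3))  # approx. 11.288 -1.139 0.365

# With an intercept in the model, the residuals sum to (essentially) zero
Y_pred = D @ np.array([A, B1, B2])
print(round(abs(np.sum(Y - Y_pred)), 6))  # 0.0
```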

The general linear model
The general linear model underlies most of the statistical tests conducted in psychology and other social sciences. To name a few:
- Regression analysis (already seen)
- Analysis of variance (2nd semester)
- t-test (2nd semester)
- Analysis of covariance
- Cluster analysis
- Factor analysis
- Discriminant analysis
- ...

The general linear model (2)
Clearly, the regression analyses we have seen are a particular case of the general linear model:

Observed = Predicted + prediction error

or, in general terms,

Y_i = Y'_i + e_i

The general linear model (3)
The general expression is

Y = A + B1·X1 + B2·X2 + ... + Bk·Xk + e

Y: dependent variable
X1, X2, ..., Xk: independent variables (predictors of Y)
e: random error
B1, B2, ..., Bk: the weights that determine the contribution of each independent variable.
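As a minimal numerical illustration of this expression, we can simulate data from the model and recover the weights by least squares (a Python/NumPy sketch with simulated data; the weight values 2.0, 1.5 and -0.8 are arbitrary choices for the demo, not course data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the general linear model: Y = A + B1*X1 + B2*X2 + e
# (A = 2.0, B1 = 1.5, B2 = -0.8 are arbitrary values chosen for this demo)
N = 100
X1 = rng.normal(size=N)
X2 = rng.normal(size=N)
e = rng.normal(scale=0.5, size=N)
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + e

# Recover the weights by least squares
D = np.column_stack([np.ones(N), X1, X2])
coefs, *_ = np.linalg.lstsq(D, Y, rcond=None)
print(np.round(coefs, 2))  # close to [2.0, 1.5, -0.8]

# Observed = Predicted + prediction error
residuals = Y - D @ coefs
print(np.allclose(Y, D @ coefs + residuals))  # True
```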