Review

[Scatter plot of paired scores]

Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)

Computational formula
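The formula image on this slide did not survive the transcript. In the notation this deck uses later (COV, Sx, Sy), and in raw-score form, the usual computational formulas for the correlation are, as a sketch:

r = \frac{\mathrm{COV}_{XY}}{S_X S_Y}, \qquad r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}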

Correlation

Hypothesis testing of r Is there a significant relationship between X and Y (or are they independent)? Are two independent correlations significantly different than each other?
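The test statistics themselves are not shown in the transcript; the standard procedures for these two questions (a sketch, not necessarily the deck's exact slides) are a t test for a single r,

t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}, \qquad df = N - 2,

and, for comparing two independent correlations, converting each r to Fisher's z' = \tfrac{1}{2}\ln\frac{1+r}{1-r} and testing z = \frac{z'_1 - z'_2}{\sqrt{1/(N_1-3) + 1/(N_2-3)}}.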

Statistics Needed Need to find the best place to draw the regression line on a scatter plot Need to quantify the cluster of scores around this regression line (i.e., the correlation coefficient)

[Scatter plot of paired scores]

Regression Equation Y = a + bX Where: Y = value predicted from a particular X value a = point at which the regression line intersects the Y axis b = slope of the regression line X = X value for which you wish to predict a Y value
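The slide defines the terms; the least-squares solutions for the slope and intercept (standard formulas, consistent with the worked example later in the deck) are

b = \frac{\mathrm{COV}_{XY}}{S_X^2} = r\,\frac{S_Y}{S_X}, \qquad a = \bar{Y} - b\,\bar{X}.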

Regression

How to draw the regression line [scatter plot with fitted regression line]

Hypothesis Testing Have learned How to calculate r as an estimate of relationship between two variables How to calculate b as a measure of the rate of change of Y as a function of X Next determine if these values are significantly different than 0

Testing b The significance tests for r and b are equivalent: if X and Y are related (r), then it must be true that Y varies with X (b). It is important to learn the significance test for b because it is what carries over to multiple regression.

Calculate t-observed b = Slope Sb = Standard error of slope
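The formula image is missing here; the standard test the slide is describing is

t_{\mathrm{obs}} = \frac{b}{S_b}, \qquad df = N - 2.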

Multiple Regression Good news! No Math Bad news! Too complicated to do by hand Bad news! Almost all conceptual

Causal Models X (IV) is the cause of Y (DV)

Causal Models X (IV) is the cause of Y (DV) This is an assumption – causation is not demonstrated with statistics! X → Y

Remember
Person     Candy   Depression
Charlie      5        55
Augustus     7        43
Veruca       4        59
Mike                 108
Violet                65

Remember Y = 127 + (-13.26)X; COV = -30.5, N = 5, r = -.81, Sx = 1.52, Sy = 24.82
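A quick check of these numbers with the formulas above: b = COV / Sx² = -30.5 / 1.52² ≈ -13.2 (the slide's -13.26 presumably comes from unrounded values), and r = COV / (Sx × Sy) = -30.5 / (1.52 × 24.82) ≈ -.81.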

Causal Models Candy → Depression (-13.26)

Example Data collected from 15 people: Salary, Years since Ph.D., Publications

Example Predict the salary of a person from the time since their Ph.D. (in years)

Example Predict the salary of a person from the time since their Ph.D. (in years) Y = 51,670 + 1,218(X) What do these values mean? $51,670 is what a person tends to earn right after graduating (Years = 0). For each year after that, a person's salary increases by $1,218.
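For example (a made-up case, not from the slides), someone five years past the Ph.D. would be predicted to earn Y = 51,670 + 1,218(5) = 57,760 dollars.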

Causal Models Years since Ph.D. → Salary (1,218)

Example Predict the salary of a person from the number of publications they have

Causal Models Publications → Salary (334)

What if we have two IVs? It is possible to use two IVs at the same time to predict a DV Use both publications and years since Ph.D. to predict salary

Causal Models Publications → Salary (334); Years since Ph.D. → Salary (1,218)

Causal Models Publications → Salary (334); Years since Ph.D. → Salary (1,218) How to interpret the values if the IVs are independent

Causal Models Publications → Salary (334); Years since Ph.D. → Salary (1,218) Problem: Information provided by Publications and Years is probably somewhat redundant

Causal Models Publications → Salary; Years since Ph.D. → Salary; r = .66 between Publications and Years

Causal Models Publications → Salary; Years since Ph.D. → Salary; r = .66 between Publications and Years Must estimate the regression coefficients so this relationship is taken into account (called "partial regression coefficients")

Regression Coefficients The basic logic is exactly the same as simple regression: least squares. There is one intercept, and each IV has its own slope.

Regression Coefficients b0 = the intercept; b1 = the slope of the first IV; b2 = the slope of the second IV; bp = the slope of the pth IV. Y = b0 + b1(X1) + b2(X2) + . . . + bp(Xp)
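The deck's worked numbers for this step are not reproduced in the transcript; as a rough sketch with made-up data (hypothetical values, not the 15-person dataset from the slides), the least-squares coefficients for two IVs can be obtained with NumPy:

```python
import numpy as np

# Made-up example data (not the slides' dataset)
years = np.array([1, 3, 5, 7, 9, 11])                  # X1: years since Ph.D.
pubs = np.array([2, 4, 3, 8, 10, 9])                   # X2: publications
salary = np.array([53, 56, 58, 61, 65, 68]) * 1000.0   # Y: salary in dollars

# Design matrix with a column of 1s for the intercept: Y = b0 + b1*X1 + b2*X2
X = np.column_stack([np.ones_like(years, dtype=float), years, pubs])

# Ordinary least-squares solution
b, residuals, rank, sv = np.linalg.lstsq(X, salary, rcond=None)
print("intercept b0 =", b[0])
print("slope for years b1 =", b[1])
print("slope for publications b2 =", b[2])
```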

Example Predict the salary of a person from the number of publications they have and the years since they got their Ph.D.

Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications

Causal Models Publications → Salary (122); Years since Ph.D. → Salary (977); r = .66 between Publications and Years

Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who just graduated (years = 0) with 2 publications likely earn?

Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who just graduated (years = 0) with 2 publications likely earn?

Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who graduated 10 years ago with no publications make?

Regression Coefficients Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications What does a person who graduated 10 years ago with no publications make?
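Neither answer slide survived the transcript. Using the partial slopes from the path diagram (977 per year, 122 per publication) and the intercept b0, which is not shown here, the two predictions are Y = b0 + 977(0) + 122(2) = b0 + 244 and Y = b0 + 977(10) + 122(0) = b0 + 9,770.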

Question Current Problem Y = Salary X1 = Years since Ph.D.; X2 = Publications Which IV has a greater "effect" on salary?

Standardized Regression Coefficients Conceptually the same as standardizing all variables and then doing regression analysis Why does this work? Example with Years predicting Salary

Standardized Regression Coefficients With a single predictor, unstandardized: Years since Ph.D. → Salary (1,218). With a single predictor, standardized: Years since Ph.D. → Salary (.71).

Standardized Regression Coefficients With a single IV, the correlation between years and salary (r = .71) is the SAME as the standardized regression weight!
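The conversion behind this (a standard formula, not shown in the transcript) is

\beta = b\,\frac{S_X}{S_Y},

and with a single predictor this reduces to r, which is why the standardized weight equals .71 here.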

Standardized Regression Coefficients β1 = standardized regression coefficient of the first IV; β2 = standardized regression coefficient of the second IV; βp = standardized regression coefficient of the pth IV; β0 = intercept (always = 0 when all variables are standardized)

Remember Publications → Salary (122); Years since Ph.D. → Salary (977); r = .66 between Publications and Years

Standardized Publications → Salary (.21); Years since Ph.D. → Salary (.57); r = .66 between Publications and Years

Regression Coefficients Current Problem Yz = Standardized Salary Z1 = Years since Ph.D. (Standardized) Z2 = Publications (Standardized) Which IV has a greater "effect" on salary? Can interpret the weights in SD units

Regression Coefficients Current Problem Yz = Standardized Salary Z1 = Years since Ph.D. (Standardized) Z2 = Publications (Standardized) What would you predict the salary to be if a person's Years = 1.2 and a person's Publications = -.50? Interpret what these values mean!
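Using the standardized weights from the earlier path diagram (.57 for years, .21 for publications), the prediction would be ZY' = .57(1.2) + .21(-.50) = .684 - .105 ≈ .58: a salary about 0.58 standard deviations above the mean, for someone 1.2 SDs above the mean in years since the Ph.D. and half an SD below the mean in publications.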

Testing the full model How well does the model predict? The fit test for the full model and its significance are equal for both standardized and unstandardized models

Person    Z1      Z2      ZY
1        -1.26     .35    -.83
2                 -.53    -.24
3         -.63   -1.40    -.84
4          .63    1.23    1.56
5         1.26             .36

Person    Z1      Z2      ZY      Pred
1        -1.26     .35    -.83    -.477
2                 -.53    -.24    -.287
3         -.63   -1.40    -.84   -1.097
4          .63    1.23    1.56    1.01
5         1.26             .36     .862

Person    Z1      Z2      ZY      Pred
1        -1.26     .35    -.83    -.477
2                 -.53    -.24    -.287
3         -.63   -1.40    -.84   -1.097
4          .63    1.23    1.56    1.01
5         1.26             .36     .862
r = .902

Multiple R

Multiple R
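The formula images for these two slides are missing. Multiple R is the correlation between the observed scores and the scores predicted by the full model (the r = .902 computed between ZY and Pred above), and R² is its square, the proportion of variance in Y accounted for by the set of predictors (here .902² ≈ .81).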

Testing for Significance Once an equation is created (standardized or unstandardized) typically test for significance. Two levels 1) Level of each regression coefficient 2) Level of the entire model

Testing for Significance Once an equation is created (standardized or unstandardized) typically test for significance. Two levels 1) Level of each regression coefficient 2) Level of the entire model

Multiple R Commonly used as R² Pros and cons Can be tested for significance: Does the set of variables (taken together) predict Y at better than chance levels? H1: R* > 0; Ho: R* = 0

Person    Z1      Z2      ZY      Pred
1        -1.26     .35    -.83    -.477
2                 -.53    -.24    -.287
3         -.63   -1.40    -.84   -1.097
4          .63    1.23    1.56    1.01
5         1.26             .36     .862
r = .902

Significance testing for Multiple R p = number of predictors N = total number of observations

Significance testing for Multiple R p = number of predictors N = total number of observations

Significance testing for Multiple R p = number of predictors N = total number of observations
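The formula built up across these slides did not survive the transcript; the standard F statistic for testing a multiple R is

F = \frac{R^2 / p}{(1 - R^2)/(N - p - 1)}, \qquad df = p,\; N - p - 1.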

Significance testing for Multiple R Fcrit (table on page 737) Need two df: Numerator df = p, Denominator df = N - p - 1

Significance testing for Multiple R Fcrit Need two df: Numerator df = p, Denominator df = N - p - 1. Fcrit(2, 2) = 19.00
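For the five-person example, with R = .902 (so R² ≈ .814), p = 2, and N = 5, the arithmetic works out to F = (.814/2) / (.186/2) ≈ 4.4, which is well below Fcrit(2, 2) = 19.00.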

Multiple R If F > Fcrit, reject Ho and accept H1. If F ≤ Fcrit, fail to reject Ho. Current problem – fail to reject Ho: these two variables do not predict the outcome at better than chance levels.

Practice The teaching salary example Based on 15 people Two IVs

Significance testing for Multiple R p = number of predictors N = total number of observations

Significance testing for Multiple R Fcrit(2, 12) = 3.89
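Using R² = .53 for this model (the value shown on a later slide), p = 2, and N = 15: F = (.53/2) / (.47/12) ≈ 6.8, which exceeds Fcrit(2, 12) = 3.89.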

Multiple R If F > Fcrit reject Ho and accept H1 If F < or = Fcrit fail to reject Ho Current problem – accept H1 These two variables do predict the outcome

Detour Moving back to issues of correlation This will help with . . .

Testing for Significance Once an equation is created (standardized or unstandardized) typically test for significance. Two levels 1) Level of each regression coefficient 2) Level of the entire model

How strong is the relationship between publications and salary if we partial out the effect of years? What this is saying

[Venn diagram: circle representing the variance of Salary]

[Venn diagram: Salary and Publications circles overlapping]

r²SP = .35 [Venn diagram: the overlap of the Salary and Publications circles is r²SP]

r²SP = .35 [Venn diagram: Salary and Publications] r² is a ratio: variance explained / total variance. The total variance of Salary = 1 (standardized).

r²SP = .35; the remaining .65 of Salary's variance is not explained by Publications [Venn diagram: Salary and Publications]

[Venn diagram: the Salary circle overlaps both Publications and Years; the overlap regions are labeled a, b, and c, and e is the unexplained (error) portion of Salary]

[The same Venn diagram, with a question mark marking the region in question]

[The same Venn diagram] How strong is the relationship between publications and salary if we partial out the effect of years?

Semipartial correlation of publications and salary, removing the effect of Years [Venn diagram]

Semipartial correlation of publications and salary, removing the effect of Years [Venn diagram] Multiple R² = a + b + c

Multiple R

R² = .53, or a + b + c [Venn diagram: Salary, Publications, Years; regions a, b, c; e = error]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c [Venn diagram]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c. So what is just "a"? [Venn diagram]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c. So what is just "a"? a = (a + b + c) - (b + c) [Venn diagram]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c. So what is just "a"? a = (a + b + c) - (b + c), or R² - r²SY [Venn diagram]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c. So what is just "a"? a = R² - r²SY = .53 - .50 = .03. Thus the semipartial correlation = √.03 ≈ .17 [Venn diagram]

R² = .53, or a + b + c; r²SY = .50, or b + c; r²SP = .35, or a + c [Venn diagram] What is the correlation between years and salary controlling for publications?
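By the same region logic, the intended answer is presumably b = (a + b + c) - (a + c) = R² - r²SP = .53 - .35 = .18, so the semipartial correlation of years with salary, controlling for publications, is √.18 ≈ .42.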