CORRELATION AND MULTIPLE REGRESSION ANALYSIS
By Prof. Kambale F.J., Department of Economics, Rayat Shikshan Sanstha's S.M. Joshi College, Hadapsar, Pune
Areas where STATISTICS are used
Business: Economics, Engineering, Marketing, Computer Science
Physical Sciences: Astronomy, Chemistry, Physics
Health & Medicine: Genetics, Clinical Trials, Epidemiology, Pharmacology
Environment: Agriculture, Ecology, Forestry, Animal Populations
Government: Census, Law, National Defense
CORRELATION ANALYSIS
Simple Correlation: Measures the association between two variables and is used to establish the relationship between them. Karl Pearson's method is the one most commonly used. The correlation can be positive, negative, perfect, or zero, and the coefficient takes values between -1 and +1.
Correlation
Measures the relative strength of the linear relationship between two variables. It is unit-less and ranges between -1 and +1. The closer to -1, the stronger the negative linear relationship; the closer to +1, the stronger the positive linear relationship; the closer to 0, the weaker the linear relationship.
Correlation coefficient
Pearson's correlation coefficient is the standardized covariance (unit-less):
$$r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}$$
Estimation (dice example: X = roll of the first die, Y = sum of the two dice)
$$\operatorname{Var}(X) = 2.9167, \qquad \operatorname{Var}(Y) = 5.8333, \qquad \operatorname{Cov}(X, Y) = 2.9167$$
$$R^2 = \text{"coefficient of determination"} = \frac{SS_{explained}}{TSS} = \frac{2.9167^2}{2.9167 \times 5.8333} = 0.50$$
Interpretation of R²: 50% of the total variation in the sum of the two dice is explained by the roll on the first die. Makes perfect intuitive sense!
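A minimal Python sketch (not part of the original slides) that verifies these numbers by enumerating all 36 equally likely outcomes of two dice:

```python
# X = value of the first die, Y = sum of both dice, over all 36 outcomes.
import itertools

outcomes = list(itertools.product(range(1, 7), repeat=2))
xs = [a for a, b in outcomes]
ys = [a + b for a, b in outcomes]

n = len(outcomes)
mx = sum(xs) / n
my = sum(ys) / n

var_x = sum((x - mx) ** 2 for x in xs) / n                     # 2.9167
var_y = sum((y - my) ** 2 for y in ys) / n                     # 5.8333
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # 2.9167

r = cov_xy / (var_x ** 0.5 * var_y ** 0.5)
print(var_x, var_y, cov_xy, r ** 2)  # r^2 = 0.5, as on the slide
```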
Scatter Plots of Data with Various Correlation Coefficients
[Six scatter-plot panels: r = -1, r = -0.6, r = 0 (top row); r = +1, r = +0.3, r = 0 (bottom row)]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation
[Scatter-plot panels contrasting linear relationships with curvilinear relationships]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation
[Scatter-plot panels contrasting strong relationships with weak relationships]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation
[Scatter-plot panels showing no relationship]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Some calculation formulas…
Note: an easier computation formula for Pearson's r:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[\,n\sum x^2 - \left(\sum x\right)^2\right]\left[\,n\sum y^2 - \left(\sum y\right)^2\right]}}$$
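A minimal sketch of that computational formula in Python (the function name `pearson_r` and the sample data are illustrative, not from the slides):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r via the raw-sums shortcut formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))  # ~0.96, a strong positive correlation
```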
PARTIAL CORRELATION
Partial Correlation: The correlation between two variables may arise because of other variables in the study. In partial correlation, the effect of these other variables is eliminated (held constant).
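For reference, the standard first-order formula for the correlation between $X_1$ and $X_2$ holding $X_3$ constant (a textbook result, not printed on the original slide) is:
$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right)}}$$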
MULTIPLE CORRELATION
Multiple Correlation, definition: The joint effect of k independent variables on the dependent variable is studied. The square of the multiple correlation coefficient is called the coefficient of multiple determination (R square).
Multiple Correlation
Consider the following multiple linear regression equation:
Y = a + b1 X1 + b2 X2 + b3 X3 + e
Y: yield of crop (dependent variable); X1: nitrogen (N) levels; X2: phosphorus levels; X3: potash levels.
Let R square = 0.45.
CONCLUSION: This indicates that 45% of the variation in the dependent variable (Y) has been explained by the three independent variables in the study. It has application in all fields.
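A minimal Python sketch of such a fit (the data are simulated for illustration; nothing here comes from the slides' actual study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.uniform(0, 100, size=(n, 3))   # hypothetical N, phosphorus, potash levels
y = 10 + 0.3 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 15, n)

A = np.column_stack([np.ones(n), X])           # add intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # [a, b1, b2, b3]

y_hat = A @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(coef, r2)  # an R^2 of 0.45 would mean 45% of the variation in Y is explained
```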
Regression Analysis
Regression: The functional relationship between two or more variables is studied in regression.
Regression Coefficient: Measures the change in the value of the dependent variable for a unit change in the value of the independent variable.
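For simple linear regression $Y = a + bX$, the least-squares estimates of these coefficients (standard results, not shown on the slide) are:
$$\hat{b} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad \hat{a} = \bar{y} - \hat{b}\, \bar{x}$$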
Multiple Regression
Y = a + b1 X1 + b2 X2 + b3 X3 + b4 X4 + b5 X5 + … + e
Y: dependent variable; X1, X2, X3, … are independent variables.
R Square: Measures the proportion of variation in the dependent variable accounted for by the independent variables. It has application in all fields.
Multivariate regression pitfalls
Multi-collinearity
Residual confounding
Overfitting
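One common diagnostic for multi-collinearity is the variance inflation factor, VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A minimal Python sketch (an assumed standard diagnostic; the helper function is illustrative, not from the slides):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an (n, k) predictor matrix."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress column j on all the other columns (plus an intercept).
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return out  # values well above ~10 are commonly read as problematic
```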
Regression coefficient
The regression coefficient is the slope of the regression line and tells you the nature of the relationship between the variables: how much change in the independent variable is associated with how much change in the dependent variable. The larger the regression coefficient, the greater the change.
Regression Line
The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables. If all the points fall exactly on the line, the error is zero and you have a perfect relationship.
Regression Coefficients
B - These are the values for the regression equation for predicting the dependent variable from the independent variables. They are called unstandardized coefficients because they are measured in their natural units. As such, the coefficients cannot be compared with one another to determine which one is more influential in the model, because they can be measured on different scales.
Coefficients for Two Independent Variables
This chart looks at two variables and shows how their different measurement scales affect the B values. That is why you need to look at the standardized Beta to see the differences.
USING SPSS
When you run a regression analysis in SPSS you get three tables, each of which tells you something about the relationship. The first is the model summary. The R is the Pearson product-moment correlation coefficient; in this case R is .736. R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.
R-Square R-Square is the proportion of variance in the dependent variable (income per capita) which can be predicted from the independent variable (level of education). This value indicates that 54.2% of the variance in income can be predicted from the variable education. R-Square is also called the coefficient of determination.
Adjusted R-square
The adjusted R-square attempts to yield a more honest estimate of R-square for the population. Here the value of R-square was .542, while the value of adjusted R-square was .532. There isn't much difference because we are dealing with only one predictor. When the number of observations is small and the number of predictors is large, there will be a much greater difference between R-square and adjusted R-square.
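The standard adjustment (not printed on the slide) for $n$ observations and $k$ predictors is:
$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}$$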
ANOVA
The ANOVA table reports an F test of whether the model as a whole is statistically significant. If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable.
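For reference, the F statistic in that table takes the standard form (assuming $k$ predictors and $n$ observations; not spelled out on the slide):
$$F = \frac{SS_{regression}/k}{SS_{residual}/(n - k - 1)}$$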
Regression Coefficients
Beta - These are the standardized coefficients: the coefficients you would obtain if you standardized all of the variables in the regression, the dependent and all of the independent variables alike, and then ran the regression. By standardizing the variables before running the regression, you put all of the variables on the same scale, so you can compare the magnitudes of the coefficients to see which one has more of an effect. You will also notice that the larger betas are associated with the larger t-values.
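A minimal Python sketch (simulated data; the relationship itself is standard) showing that a standardized beta is just the unstandardized B rescaled by the ratio of standard deviations, and that with a single predictor it equals Pearson's r:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(50, 10, 200)                   # predictor on its natural scale
y = 3.0 * x + rng.normal(0, 25, 200)          # response on a different scale

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # unstandardized slope B
beta = b * x.std(ddof=1) / y.std(ddof=1)      # standardized coefficient
print(beta, np.corrcoef(x, y)[0, 1])          # identical for one predictor
```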
How to translate a typical table
Regression Analysis: Level of Education by Income per capita
[Example output tables contrasting single regression and multiple regression]
Research Designs
Sampling Techniques: Random Sampling; Purposive Sampling
Sample Survey Designs
Simple Random Sampling: Samples are selected randomly without disturbing the population.
Stratified Random Sampling: The population is first divided into homogeneous sub-groups called strata, and samples are randomly selected from each stratum.
Probability Proportionate to Size (PPS): Samples are selected in proportion to the size of the population units.
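A minimal Python sketch of the first two designs (the frame of 1,000 units and the two strata are purely illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1000)                  # hypothetical sampling frame

# Simple random sampling: every unit has the same chance of selection.
srs = rng.choice(population, size=50, replace=False)

# Stratified random sampling: draw separately within each homogeneous stratum.
strata = {"stratum_A": population[:600], "stratum_B": population[600:]}
stratified = {name: rng.choice(units, size=25, replace=False)
              for name, units in strata.items()}
print(len(srs), {name: len(s) for name, s in stratified.items()})
```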
Economics and Social Science
Simple Growth Rate: Gives growth in absolute form.
Compound Growth Rate: Indicates the percent-per-annum growth in output.
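A common way to estimate the compound growth rate (a standard practice in this literature, though the formula is not printed on the slide) is to fit the log-linear trend $\ln Y_t = a + b\,t$ and report:
$$CGR(\%) = \left(e^{\,b} - 1\right) \times 100$$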
Economics and Social Science
Index Numbers: Change over a base year is studied.
Time Series Analysis: Trend over a period of time, generally taken in years.
Seasonal Variation: Analysis related to change over the seasons.
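For reference, the simplest index number relative to the base year (a standard definition, not shown on the slide) is:
$$I_t = \frac{Y_t}{Y_0} \times 100$$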
Statistical Software
SPSS, SAS, SIS STAT, M STAT, INDO STAT
And Always Remember
THERE IS STRENGTH IN NUMBERS
THANK YOU