Regression Analysis

Scatter plots Regression analysis requires interval- or ratio-level data. To see whether your data fit the assumptions of regression, it is wise to start with a scatter plot. The reason? Regression analysis assumes a linear relationship; if you have a curvilinear relationship or no relationship, regression analysis is of little use.
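As a minimal sketch of this step (not from the original slides), the Python snippet below plots hypothetical education and income figures to eyeball linearity before fitting anything:

```python
# A minimal sketch: check for linearity with a scatter plot before regressing.
# The data arrays here are hypothetical, not the slides' actual data.
import matplotlib.pyplot as plt

pct_ba = [18, 22, 25, 27, 30, 33, 35]   # % of population with a BA (hypothetical)
income = [24, 27, 29, 31, 34, 36, 39]   # per capita income, $1000s (hypothetical)

plt.scatter(pct_ba, income)
plt.xlabel("Percent of population with a BA")
plt.ylabel("Per capita income ($1000s)")
plt.title("Checking for a linear relationship")
plt.show()
```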

Types of Lines (figure: examples of linear, curvilinear, and no-relationship patterns)

Scatter plot This is a linear relationship, and a positive one: as the percentage of the population with BAs increases, so does personal income per capita.

Regression Line The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables. If all the points fall exactly on the line, the error is zero and you have a perfect relationship.
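A short sketch of fitting that best-fit line, reusing the hypothetical data from the scatter-plot example (SciPy's linregress computes the least-squares line):

```python
# Fit the least-squares (best-fit) line with SciPy; data are hypothetical.
from scipy import stats

pct_ba = [18, 22, 25, 27, 30, 33, 35]
income = [24, 27, 29, 31, 34, 36, 39]

result = stats.linregress(pct_ba, income)
print(f"slope (b):     {result.slope:.3f}")
print(f"intercept (a): {result.intercept:.3f}")
# Predicted value for any X: y_hat = a + b * X
```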

Things to remember Regression still focuses on association, not causation. Association is a necessary prerequisite for inferring causation, but in addition: the independent variable must precede the dependent variable in time; the two variables must be plausibly linked by a theory; and competing independent variables must be eliminated.

Regression Table The regression coefficient is not a good indicator of the strength of the relationship: two scatter plots with very different dispersions could produce the same regression line.

Regression coefficient The regression coefficient is the slope of the regression line and tells you the nature of the relationship between the variables: how much change in the dependent variable is associated with a given change in the independent variable. The larger the regression coefficient, the more change.

Pearson’s r To determine strength you look at how closely the dots are clustered around the line. The more tightly the cases are clustered, the stronger the relationship; the more distant, the weaker. Pearson’s r ranges from -1 to +1, with 0 indicating no linear relationship at all.
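Pearson's r can be computed directly; the sketch below uses the same hypothetical data, so the exact value is illustrative only:

```python
# Compute Pearson's r with SciPy (hypothetical data reused from above).
from scipy import stats

pct_ba = [18, 22, 25, 27, 30, 33, 35]
income = [24, 27, 29, 31, 34, 36, 39]

r, p_value = stats.pearsonr(pct_ba, income)
print(f"Pearson's r = {r:.3f}")   # close to +1 here: a strong positive relationship
```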

Reading the tables When you run a regression analysis in SPSS, you get three tables, and each tells you something about the relationship. The first is the model summary. R is the Pearson product-moment correlation coefficient; in this case R is .736. R is the square root of R-squared and is the correlation between the observed and predicted values of the dependent variable.

R-Square R-Square is the proportion of variance in the dependent variable (income per capita) which can be predicted from the independent variable (level of education).  This value indicates that 54.2% of the variance in income can be predicted from the variable education.  Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable.  R-Square is also called the coefficient of determination.
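In simple one-predictor regression, R-squared is just Pearson's r squared, which is how the slide's two numbers fit together:

```python
# With one predictor, R-squared = r ** 2; the slide's r = .736 gives ~.542.
r = 0.736
print(round(r ** 2, 3))   # 0.542
```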

Adjusted R-square As predictors are added to the model, each predictor will explain some of the variance in the dependent variable simply due to chance.  One could continue to add predictors to the model which would continue to improve the ability of the predictors to explain the dependent variable, although some of this increase in R-square would be simply due to chance variation in that particular sample.  The adjusted R-square attempts to yield a more honest value to estimate the R-squared for the population.   The value of R-square was .542, while the value of Adjusted R-square was .532. There isn’t much difference because we are dealing with only one variable.  When the number of observations is small and the number of predictors is large, there will be a much greater difference between R-square and adjusted R-square. By contrast, when the number of observations is very large compared to the number of predictors, the value of R-square and adjusted R-square will be much closer.
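The adjustment uses the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The slides do not state the sample size, so the n below is an assumption, chosen because it reproduces the slide's .542 and .532 with one predictor:

```python
# Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1).
# n = 48 is an assumption; the slides do not report the sample size.
r_squared = 0.542
n = 48   # number of observations (assumed)
k = 1    # number of predictors

adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(f"Adjusted R-squared = {adj_r_squared:.3f}")   # ~0.532, matching the slide
```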

ANOVA The p-value associated with this F value is very small (0.0000). These values are used to answer the question "Do the independent variables reliably predict the dependent variable?".  The p-value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable".  If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable. 
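A sketch of obtaining the same overall F-test outside SPSS, using Python's statsmodels on the hypothetical data from earlier (the numbers, and therefore the decision, are illustrative only):

```python
# Read the overall F-test from an OLS fit; data are hypothetical.
import statsmodels.api as sm

pct_ba = [18, 22, 25, 27, 30, 33, 35]
income = [24, 27, 29, 31, 34, 36, 39]

X = sm.add_constant(pct_ba)          # add the intercept term
model = sm.OLS(income, X).fit()
print(f"F = {model.fvalue:.2f}, p = {model.f_pvalue:.4f}")
alpha = 0.05
print("Reliable prediction" if model.f_pvalue < alpha else "Not statistically significant")
```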

Coefficients B - These are the values for the regression equation for predicting the dependent variable from the independent variable.  These are called unstandardized coefficients because they are measured in their natural units.  As such, the coefficients cannot be compared with one another to determine which one is more influential in the model, because they can be measured on different scales. 

Coefficients This chart looks at two variables and shows how their different measurement scales affect the B value. That is why you need to look at the standardized beta to see the differences.

Coefficients Beta - These are the standardized coefficients. These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent and all of the independent variables, and then ran the regression. By standardizing the variables before running the regression, you put all of the variables on the same scale, so you can compare the magnitude of the coefficients to see which one has more of an effect. You will also notice that the larger betas are associated with the larger t-values.
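The standardize-then-regress recipe described above can be sketched as follows; the variable names (education, urbanism) and values are hypothetical stand-ins, not the slides' actual data:

```python
# Obtain standardized (beta) coefficients by z-scoring every variable first.
import numpy as np
import statsmodels.api as sm

def zscore(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean()) / a.std(ddof=1)

education = [18, 22, 25, 27, 30, 33, 35]   # hypothetical
urbanism  = [60, 55, 70, 72, 80, 78, 85]   # hypothetical
income    = [24, 27, 29, 31, 34, 36, 39]   # hypothetical

X = np.column_stack([zscore(education), zscore(urbanism)])
y = zscore(income)
betas = sm.OLS(y, X).fit().params          # no constant needed: all means are 0
print(dict(zip(["education", "urbanism"], np.round(betas, 3))))
```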

How to translate a typical table Regression analysis: Level of education by income per capita

Part of the Regression Equation b represents the slope of the line. It is calculated by dividing the change in the dependent variable by the change in the independent variable. The difference between the actual value of Y and the predicted value is called the residual. The residual represents how much error there is in the regression equation's prediction of the y value for any individual case as a function of X.
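A small sketch of those pieces, again on the hypothetical data: the fitted intercept a and slope b give a predicted value for each case, and the residual is the gap between the actual and predicted Y:

```python
# Compute y_hat = a + b*X and the residual e = y - y_hat for each case.
from scipy import stats

pct_ba = [18, 22, 25, 27, 30, 33, 35]   # hypothetical
income = [24, 27, 29, 31, 34, 36, 39]   # hypothetical

fit = stats.linregress(pct_ba, income)
for x, y in zip(pct_ba, income):
    y_hat = fit.intercept + fit.slope * x
    print(f"X={x:>2}  Y={y}  predicted={y_hat:.2f}  residual={y - y_hat:+.2f}")
```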

Comparing two variables Regression analysis is useful for comparing two variables to see whether controlling for another independent variable affects your model. For the first independent variable, education, the argument is that a more educated populace will have higher-paying jobs, producing a higher level of per capita income in the state. The second independent variable is included because we expect to find better-paying jobs, and therefore more opportunity for state residents to obtain them, in urban rather than rural areas.

Single Regression vs. Multiple Regression

Perceptions of victory

Regression To see how the poll influenced the perception of the victor, we first examine the breakdown of opinion on the question, "Who do you think has the best chance of winning in your local riding?" We then conduct a logistic regression using this question as the dependent variable. The dependent variable was recoded into a dummy variable, scored zero or one. The independent variables were likewise recoded into dichotomous variables, coded as zero or one; where a variable had three values, the middle value was recoded as .5.

"Media use" was measured by whether the individual had read the Windsor Star on the previous day: reading the Windsor Star was coded as one, and all other responses were coded zero. Some might question why we did not compare television versus newspaper effects, given the research on the influence of television news on political matters. The answer is twofold. First, because this study focused on the local race in the federal election, the newspaper coverage simply paid the most attention to the local candidates, in addition to publishing the poll. Second, several studies have shown that when a person's self-reported use of newspapers is correlated with knowledge, newspaper readers have more knowledge than television viewers (for a review of this literature see Chaffee and Frank 1996, 52). Since this study ultimately seeks to determine what information is learned, readership of the Windsor Star is the most valid measure.

"Time" was another independent variable. As Lang and Lang argue, poll effects depend on when in the campaign the poll is released: "Early polls have potentially greater impact because people are less familiar with the issue or candidate; opinions have not yet firmed" (Lang and Lang 1984, 135; Scheufel and Moy 2000, 11). In general, people acquire more information about campaign issues and candidates the closer it gets to election day (Dutwin 2000, 23). For our purposes, we measure time in terms of the date or week relative to the poll's release (that is, whether the interview took place before the poll was released).

The "partisan" variable used the question asking which party the person felt closer to. For "voter choice", we combined the question on who they intended to vote for with the follow-up question on who they were leaning towards. For "media use", we examined the question on which media the person was exposed to on the previous day. These variables capture pre-existing beliefs as well as media consumption. The three socio-demographic variables – "union membership", "certain to vote" and "education" – tap into other factors that may play a role in the vote. Since Windsor is strongly associated with the auto sector and unionism, union membership was considered an important independent variable. Question wording for these variables can be found in Appendix B.
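The coding scheme described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the authors' actual code or data: the 0/1 dummies and their values are invented stand-ins for the perceived-winner outcome, the Windsor Star readership variable, and the before/after-poll time variable.

```python
# Sketch of the 0/1 dummy coding and a logistic regression fit (statsmodels).
# All values below are invented for illustration.
import numpy as np
import statsmodels.api as sm

# Dependent variable: perceived winner, recoded zero or one.
y = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

# Independent variables recoded 0/1 (a middle category would be coded 0.5):
read_star = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])   # read the Windsor Star yesterday
post_poll = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])   # interviewed after poll release

X = sm.add_constant(np.column_stack([read_star, post_poll]))
result = sm.Logit(y, X).fit(disp=0)
print(result.params)   # log-odds coefficients: constant, media use, time
```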