Association for Interval Level Variables

Chapter 15: Association for Interval Level Variables

Introduction
When referring to interval-ratio variables, a commonly used synonym for association is correlation. We will be looking for the existence, strength, and direction of the relationship. We will look only at bivariate relationships in this chapter.

Scattergrams
The first step is to construct and examine a scattergram. The example in the book is an analysis of how dual wage-earner families cope with housework. The researchers want to know whether the number of children in the family is related to the amount of time the husband contributes to housekeeping chores.

[Figure: Scattergram of the relationship between the two variables. Regression of husband's hours of housework by the number of children in the family.]

Construction of a Scattergram
1. Draw two axes of about equal length and at right angles to each other.
2. Put the independent (X) variable along the horizontal axis (the abscissa) and the dependent (Y) variable along the vertical axis (the ordinate).
3. For each case, locate the point along the abscissa that corresponds to the score of that case on the X variable.
4. Draw a straight line up from that point, at right angles to the axis.
5. Locate the point along the ordinate that corresponds to the score of that same case on the Y variable.
6. Place a dot where the line from step 4 reaches the height of that Y score. The dot represents the case. Repeat for all cases.

Regression Line and its Purpose
It checks for linearity of the data points on the scattergram. It gives information about the existence, strength, and direction of the association. It is used to predict the score of a case on one variable from the score of that case on the other variable. It is a floating mean through all the data points.

Existence of a Relationship
Two variables are associated if the distributions of Y change for the various conditions of X. The scores along the abscissa (number of children) are the conditions, or values, of X. The dots above each X value can be thought of as the conditional distributions of Y (the scores on Y for each value of X). In this example, Y tends to increase as X increases.

Existence of a Relationship, cont.
The existence of a relationship is reinforced by the fact that the regression line lies at an angle to the X axis (the abscissa). There is no linear relationship between two interval-level variables when the regression line on a scattergram is parallel to the horizontal axis.

Strength of the Association
The strength of the association is judged by observing the spread of the dots around the regression line. A perfect association between variables can be seen on a scattergram when all dots lie on the regression line. The closer the dots are to the regression line, the stronger the association. So, for a given X, there should not be much variation in the Y scores.

Direction of the Relationship
The direction of the relationship can be judged by observing the angle of the regression line with respect to the abscissa. The relationship is positive when the line slopes upward from left to right and negative when it slopes downward. Your book shows a positive relationship, because cases with high scores on X also tend to have high scores on Y. For a negative relationship, cases with high scores on X would tend to have low scores on Y, and vice versa. Your book also shows a zero relationship: there is no association between the variables, and their scores are paired essentially at random.

Linearity
The key assumption (first step in the five-step model) with correlation and regression is that the two variables have an essentially linear relationship. The points or dots must form a pattern that approximates a straight line. It is important to begin with a scattergram before doing correlations and regressions. If the relationship is nonlinear, you may need to treat the variables as if they were ordinal rather than interval-ratio.

Regression and Prediction
The final use of the scattergram is to predict scores of cases on one variable from their score on the other. For example, you may want to predict the number of hours of housework a husband with a family of four children would do each week. Use regression to predict outside the range of the data only with caution: you have no data points to show what happens beyond the scope of the data, and the trend may change direction there.

The Predicted Score on Y
The symbol for the predicted score is Y', or Y prime. Other books most often use Y hat, but that symbol is difficult to produce on a computer or to print. Y' is found by first locating the score on X (X = 4, for four children) and then drawing a straight line from that point on the abscissa up to the regression line. From the regression line, another straight line parallel to the abscissa is drawn across to the Y axis (the ordinate). Y' is found at the point where that second line crosses the Y axis. Or, you can compute Y' = a + bX. Y' is the expected Y value for a given X.

Formula for the Regression Line
The formula for the straight line that fits closest to the conditional means of Y is
Y = a + bX
where
Y = score on the dependent variable
a = the Y intercept, or the point where the regression line crosses the Y axis
b = the slope of the regression line, or the amount of change produced in Y by a unit change in X
X = score on the independent variable

Regression Line
The position of the least-squares regression line is defined by two elements: the Y intercept and the slope of the line. The line also passes through the point where the mean of X meets the mean of Y. The weaker the effect of X on Y (the weaker the association between the variables), the lower the value of the slope (b). If the two variables are unrelated, the least-squares regression line would be parallel to the abscissa, and b would be 0 (the line would have no slope).

Equations for the Slope of the Regression Line
You need to compute "b" first, since it is needed in the formula for "a".
Slope: $b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
This is equivalent to the covariance of X and Y divided by the variance of X.
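As a minimal sketch of the slope computation described above, the Python function below sums the cross-products of the deviations and divides by the sum of squared deviations of X. The function name `slope` and the five pairs of scores are made up for illustration; they are not the housework data from the book.

```python
def slope(xs, ys):
    """Least-squares slope b: covariation of X and Y over variation of X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the two means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of X from its mean.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    return covariation / variation_x


# Hypothetical scores for five cases (not from the book).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b = slope(xs, ys)
print(round(b, 2))  # 0.6
```

Dividing both the numerator and the denominator by the number of cases would turn them into a covariance and a variance, but the divisor cancels, so the sketch skips that step.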

Interpretation of the Value of the Slope
If you put your scattergram on graph paper, you can see that as X increases by one box, "b" is the number of units that Y increases along the regression line. So, a slope of .69 indicates that, for each unit increase in X, there is an increase of .69 units in Y. If the slope is 1.5, for every unit of change in X, there is an increase of 1.5 units in Y. The interpretation is stated in units because correlation and regression let you relate two completely different variables (apples and oranges), each measured in its own units.

Interpretation of "b", cont.
To find what one unit of X or one unit of Y is, you have to go back to the labels for each variable. The example in your book has a "b" of .69: the addition of each child (an increase of one unit in X, where one unit is one child) results in an increase of .69 hours of housework being done by the husband (an increase of .69 units, or hours, in Y).

Formula for the Intercept of the Regression Line
Intercept: $a = \bar{Y} - b\bar{X}$

Interpretation of the Intercept
The intercept for the example in the book is 1.49. The least-squares regression line will cross the Y axis at the point where Y equals 1.49. You need a second point to draw the regression line. You can begin at Y = 1.49 and, for the next value of X, which is 1 child, go up .69 units of Y. Or, you can use the intersection of the mean of X and the mean of Y, since the regression line always goes through this point.
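The following sketch computes both coefficients and checks that the fitted line passes through the point (mean of X, mean of Y). It assumes the usual least-squares intercept, a = mean(Y) - b * mean(X). The function name `least_squares_line` and the scores are hypothetical, not the book's data.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope first, because the intercept formula needs it.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: forces the line through (mean of X, mean of Y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical scores for five cases (not from the book).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
# The predicted Y at the mean of X equals the mean of Y.
print(round(a + b * mean_x, 2) == round(mean_y, 2))  # True
```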

Interpretation of "a", cont.
Most of the time, you can't interpret the value of the intercept. Technically, it is the value that Y would take if X were zero, but most often a zero X is not meaningful. In the case in your book, zero is outside the range of the data: you don't have any information about the hours of housework that husbands do when they have no children. Technically, the intercept of 1.49 is the amount of predicted housework a husband with zero children would do, but you can't say that with certainty.

Least Squares Regression Line
Now that you know "a" and "b", you can fill in the full least-squares regression line:
Y = a + bX
Y = (1.49) + (.69) X
This formula can be used to predict scores on Y, as was mentioned earlier. For any value of X, it will give you the predicted value of Y (Y'). The predictions of husband's housework are "educated guesses". The accuracy of our predictions will increase as relationships become stronger (as dots are closer to the regression line).
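As a small worked example, the sketch below plugs the book's intercept (1.49) and slope (.69) into Y' = a + bX for a family with four children. The function name `predict_hours` is an assumption made for illustration.

```python
A = 1.49  # Y intercept reported for the book's example
B = 0.69  # slope reported for the book's example


def predict_hours(children):
    """Predicted weekly hours of husband's housework, Y' = a + bX."""
    return A + B * children


# Four children: 1.49 + (.69)(4) = 1.49 + 2.76
print(round(predict_hours(4), 2))  # 4.25
```

The result is a prediction from the line, not an observed score, and it should be trusted only for values of X inside the range of the data.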

The Correlation Coefficient (Pearson's r)
Pearson's r varies from 0 to plus or minus 1, with 0 indicating no association and +1 and -1 indicating perfect positive and perfect negative relationships. The definitional formula for Pearson's r is in your book. As in the formula for b, the numerator is the covariation of X and Y (the sum of the products of the X and Y deviations, the quantity on which the covariance is based).
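The slides do not reproduce the definitional formula, so the sketch below assumes its standard form: the covariation of X and Y divided by the square root of the product of the variation of X and the variation of Y. The function name `pearson_r` and the scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around the means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Same numerator as the slope b: the covariation of X and Y.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical scores for five cases (not from the book).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))       # 0.775
print(round(r ** 2, 2))  # 0.6
```

Unlike b, the denominator here uses the variation of both variables, which is what keeps r between -1 and +1 regardless of the units of X and Y.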

Interpreting r and r-squared
Interpretation of "r" will be the same as for all the other measures of association. An "r" of .5 would indicate a moderate positive linear relationship between the variables.
Interpretation of the Coefficient of Determination (r-squared): The square of Pearson's r is also called the coefficient of determination. While "r" measures the strength of the linear relationship between two variables, values between 0 and 1 or -1 have no direct interpretation.

Interpretation, cont.
The coefficient of determination can be interpreted with the logic of PRE (proportional reduction in error). First, Y is predicted while ignoring the information supplied by X. Second, the independent variable is taken into account when predicting the dependent variable. When working with variables measured at the interval-ratio level, the prediction of Y under the first condition (while ignoring X) will be the mean of the Y scores (Y bar) for every case. We know that the mean of any distribution is closer than any other point to all the scores in the distribution, in the sense that the sum of squared deviations around the mean is a minimum.

Interpretation, cont.
Predicting the mean of Y for every case, we will make many errors. The amount of error is shown in Figure 16.6. The formula for the error is the sum of (Y minus Y bar) squared, $\sum (Y - \bar{Y})^2$. This is called the total variation in Y, meaning the total amount by which all the points are off the mean of Y. The next step is to find the extent to which knowledge of X improves our ability to predict Y. (Will we make predictions that come closer to the actual points than predictions made using the mean of Y?)

Interpretation, cont.
If the two variables have a linear relationship, then predicting scores on Y from the least-squares regression equation will use knowledge of X and reduce our errors of prediction. The formula for the predicted Y score for each value of X will be Y' = a + bX. This is also the formula for the regression line.

Unexplained Variation
Unless r-squared equals 1.00, some of the variation in Y is unexplained by X. The proportion of the total variation in Y unexplained by X can be found by subtracting the value of r-squared from 1.00. Other variables will be needed to explain one hundred percent of the variation in Y (the dependent variable).
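The PRE logic can be sketched numerically: compute the total variation around the mean of Y, the unexplained variation around the regression line, and take the difference as the explained variation. The function name `variation_breakdown` and the scores are hypothetical; only the arithmetic follows the steps described in the slides.

```python
def variation_breakdown(xs, ys):
    """Return (total, explained, unexplained) variation in Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    # Errors made when ignoring X: every case is predicted at the mean of Y.
    total = sum((y - mean_y) ** 2 for y in ys)
    # Errors left over after predicting Y' = a + bX for every case.
    unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, predicted))
    return total, total - unexplained, unexplained


# Hypothetical scores for five cases (not from the book).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
total, explained, unexplained = variation_breakdown(xs, ys)
r_squared = explained / total
print(round(total, 2), round(explained, 2), round(unexplained, 2))  # 6.0 3.6 2.4
print(round(r_squared, 2), round(1 - r_squared, 2))  # 0.6 0.4
```

For these made-up scores, X explains 60 percent of the variation in Y and 40 percent remains unexplained.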

Unexplained Variation, cont.
Unexplained variation is usually attributed to the influence of three things:
1. Some combination of other variables, as in the example of the husband's housework.
2. Measurement error: people overestimate or underestimate how much time they spend doing housework.
3. Random chance: your sample may not represent the population well, particularly if it is small.

Testing Pearson's r for Significance
When "r" is based on data from a random sample, you need to test "r" for its statistical significance. When testing Pearson's r for significance, the null hypothesis is that there is no linear association between the variables in the population from which the sample was drawn. We will use the t distribution for this test.
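The slides do not give the test statistic, so the sketch below assumes the commonly used form t = r * sqrt((N - 2) / (1 - r^2)) with N - 2 degrees of freedom. The function name `t_for_pearson_r` and the sample size of 27 are hypothetical.

```python
import math


def t_for_pearson_r(r, n):
    """Return (t, df) for testing H0: no linear association in the population."""
    if n < 3:
        raise ValueError("need at least 3 cases so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly +1 or -1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical values: r = .5 from a random sample of 27 cases.
t, df = t_for_pearson_r(0.5, 27)
print(round(t, 2), df)  # 2.89 25
```

The obtained t would then be compared with the critical value from a t table at the chosen alpha level and these degrees of freedom.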

Assumptions for the Significance Test
We make some additional assumptions in Step 1. We need to assume that both variables are normal in distribution. We need to assume that the relationship between the two variables is roughly linear in form. The third new assumption involves the concept of homoscedasticity.

Homoscedasticity
A homoscedastic relationship is one where the variance of the Y scores is uniform for all values of X. If the Y scores are evenly spread above and below the regression line for the entire length of the line, the relationship is homoscedastic. If the variance around the regression line is greater at one end or the other, the relationship is heteroscedastic. A visual inspection of the scattergram is usually sufficient to find the extent to which the relationship conforms to the assumptions of linearity and homoscedasticity. If the data points fall in a roughly symmetrical, cigar-shaped pattern whose shape can be approximated with a straight line, then it is appropriate to proceed with this test of significance.
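As a rough numerical companion to the visual check, the sketch below compares the average squared distance from the regression line for the cases with low X scores and the cases with high X scores. This is not a procedure from the book and not a formal test. The function name `residual_spread_by_half` and the scores, which were invented to fan out as X increases, are hypothetical.

```python
def residual_spread_by_half(xs, ys):
    """Mean squared residual for the low-X half and the high-X half of the cases."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    # Order the cases by X, then measure each one's distance from the line.
    pairs = sorted(zip(xs, ys))
    residuals = [y - (a + b * x) for x, y in pairs]
    half = n // 2
    low, high = residuals[:half], residuals[-half:]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(low), mean_square(high)


# Hypothetical scores whose spread in Y widens as X increases.
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [2.0, 2.2, 2.8, 3.4, 3.0, 5.0, 3.0, 7.0]
low_spread, high_spread = residual_spread_by_half(xs, ys)
print(low_spread < high_spread)  # True
```

A much larger spread at one end, as in these made-up scores, is the numerical counterpart of a scattergram that fans out instead of forming a cigar shape.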
