ASSOCIATION Practical significance vs. statistical significance

Presentation transcript:

ASSOCIATION: Practical significance vs. statistical significance
Association is the strength of the relationship between two variables. Knowing how much variables are related may enable you to predict the value of one variable when you know the value of another.
Practical significance vs. statistical significance: "statistically significant" does not mean strong or meaningful.

Review for Nominal or Nominal-Ordinal Combinations
Chi square (χ²) based measures: χ² captures the strength of a relationship, but also N. To get a "pure" measure of strength, you have to remove the influence of N: Phi, Cramer's V.
PRE measures: the improvement in prediction you get by knowing the IV (or the reduction in error): Lambda.

TWO LIMITATIONS OF LAMBDA
1. Asymmetric: the value of the statistic will vary depending on which variable is taken as independent.
2. Misleading when one of the row totals is much larger than the other(s). For this reason, when row totals are extremely uneven, use a chi-square-based measure instead.

MEASURE OF ASSOCIATION WHEN BOTH VARIABLES ARE ORDINAL: GAMMA
For examining the STRENGTH & DIRECTION of "collapsed" ordinal variables (< 6 categories)
Like Lambda, a PRE-based measure
Range is -1.0 to +1.0

GAMMA
Logic: applying PRE to PAIRS of individuals

Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb

GAMMA
CONSIDER THE KENNY-DEB PAIR. In the language of Gamma, this is a "same" pair: the direction of the difference on one variable is the same as the direction on the other (Deb is higher than Kenny on both social class and prejudice). If you focused on the Kenny-Eric pair, you would come to the same conclusion.

Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb

GAMMA
NOW LOOK AT THE TIM-JOEY PAIR. In the language of Gamma, this is a "different" pair: the direction of the difference on one variable is opposite to the direction of the difference on the other (Tim is higher on social class but lower on prejudice).

Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb

GAMMA
Logic: applying PRE to PAIRS of individuals
Formula:

Gamma = (same - different) / (same + different)
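The pair-counting logic can be sketched in code. This is a minimal Python illustration, not from the slides themselves; the `gamma` function name and the one-person-per-cell tables are my own way of encoding the example:

```python
def gamma(table):
    """Goodman-Kruskal gamma from a table of cell counts.

    Rows and columns must both be ordered low -> high.
    """
    same = diff = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_rows):
                for m in range(n_cols):
                    if k > i and m > j:    # pair differs in the same direction on both variables
                        same += table[i][j] * table[k][m]
                    elif k > i and m < j:  # pair differs in opposite directions
                        diff += table[i][j] * table[k][m]
    return (same - diff) / (same + diff)

# Nine-person table from the slides (rows = prejudice low->high,
# columns = class lower->upper): gamma = (9 - 9) / 18 = 0.0
nine = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
print(gamma(nine))   # 0.0

# Three-person diagonal table (Kenny, Deb, Barb): gamma = (3 - 0) / 3 = 1.0
three = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
print(gamma(three))  # 1.0
```

Counting pairs cell-by-cell this way reproduces the 9 "same" / 9 "different" tally worked out on the following slides.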

GAMMA
If you were to account for all the pairs in this table, you would find that there are 9 "same" and 9 "different" pairs. Applying the Gamma formula:

Gamma = (9 - 9) / 18 = 0 / 18 = 0.0

Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny         Tim            Kim
Middle      Joey          Deb            Ross
High        Randy         Eric           Barb

Meaning: knowing how two people differ on social class would not improve your guesses as to how they differ on prejudice. In other words, knowing social class doesn't provide you with any information about a person's level of prejudice.

GAMMA
3-case example. Applying the Gamma formula:

Gamma = (3 - 0) / 3 = 3 / 3 = 1.00

Prejudice   Lower Class   Middle Class   Upper Class
Low         Kenny
Middle                    Deb
High                                     Barb

Gamma: Example 1
Examining the relationship between FEHELP ("Wife should help husband's career first") & FEFAM ("Better for man to work, woman to tend home").
Both variables are ordinal, coded 1 (strongly agree) to 4 (strongly disagree).

Gamma: Example 1 Based on the info in this table, does there seem to be a relationship between these factors? Does there seem to be a positive or negative relationship between them? Does this appear to be a strong or weak relationship?

GAMMA
Do we reject the null hypothesis of independence between these two variables? Yes: the Pearson chi square p value (.000) is less than alpha (.05), so it's worthwhile to look at gamma.
Interpretation: there is a strong positive relationship between these factors. Knowing someone's view on a wife's "first priority" improves our ability to predict whether they agree that women should tend home by 75.5%.

USING GSS DATA Construct a contingency table using two ordinal level variables Are the two variables significantly related? How strong is the relationship? What direction is the relationship?

ASSOCIATION BETWEEN INTERVAL-RATIO VARIABLES

Scattergrams
Allow quick identification of important features of the relationship between interval-ratio variables.
Two dimensions:
Scores of the independent (X) variable (horizontal axis)
Scores of the dependent (Y) variable (vertical axis)

3 Purposes of Scattergrams
1. To give a rough idea about the existence, strength & direction of a relationship; the direction of the relationship can be detected by the angle of the regression line
2. To give a rough idea about whether a relationship between 2 variables is linear (defined with a straight line)
3. To predict scores of cases on one variable (Y) from the score on the other (X)

IV and DV? What is the direction of this relationship?

IV and DV? What is the direction of this relationship?

The Regression Line
Properties:
The sum of positive and negative vertical distances from it is zero
The standard deviation of the points from the line is at a minimum
The line passes through the point (mean X, mean Y)
Bivariate Regression Applet
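These properties can be verified numerically. Below is a small Python sketch using the standard least-squares formulas; the data and the function name are made up for illustration:

```python
def least_squares(xs, ys):
    """Ordinary least-squares intercept (a) and slope (b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # this forces the line through (mean X, mean Y)
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)

# Property 1: the positive and negative vertical distances (residuals) sum to zero
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
assert abs(sum(residuals)) < 1e-9

# Property 3: the line passes through (mean X, mean Y) = (3, 4)
assert abs((a + b * 3) - 4) < 1e-9
```

The second property (minimum spread of points around the line) is exactly what "least squares" means: no other line gives a smaller sum of squared residuals.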

Regression Line Formula
Y = a + bX
Y = score on the dependent variable
X = score on the independent variable
a = the Y intercept, the point where the regression line crosses the Y axis
b = the slope of the regression line
SLOPE: the amount of change produced in Y by a unit change in X; or, a measure of the effect of the X variable on the Y

Regression Line Formula
Example with height and weight in a sample of males:
Y-intercept (a) = 102; slope (b) = .9
Y = 102 + (.9)X
This information can be used to predict weight from height.
Example: what is the predicted weight of a male who is 70" tall (5'10")?
Y = 102 + (.9)(70) = 102 + 63 = 165 pounds
Interpretation of this slope: for each 1-inch increase in height, weight increases by 0.9 of a pound.
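The prediction above is just the line formula evaluated at one X value. A minimal Python sketch (the `predict_weight` name is my own):

```python
def predict_weight(height_inches, a=102, b=0.9):
    """Predicted weight (pounds) from the slide's equation Y = a + bX."""
    return a + b * height_inches

# Predicted weight of a 70-inch (5'10") male: 102 + 0.9 * 70 = 165
print(predict_weight(70))  # 165.0
```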

Example 2: Examining the link between hours of daily TV watching (X) & cans of soda consumed per day (Y) for 10 cases (mean X = 2.9, mean Y = 3.6).
[Data table of the 10 cases not shown.]

Example 2
The regression line for this problem: Y = 0.7 + .99X (Y intercept a = 0.7, slope b = .99)
If a person watches 3 hours of TV per day, how many cans of soda would he be expected to consume according to the regression equation?

The Slope (b): A Strength & A Weakness
We know that b indicates the change in Y for a unit change in X, but b is not really a good measure of strength.
Weaknesses:
It is unbounded (can be > 1 or < -1), making it hard to interpret
The size of b is influenced by the scale that each variable is measured on

Pearson's r Correlation Coefficient
By contrast, Pearson's r is bounded: a value of 0.0 indicates no linear relationship, and a value of +/-1.00 indicates a perfect linear relationship.

Pearson's r
Y = 0.7 + .99X; sx = 1.51, sy = 2.24
A simple formula transforms b into r: r = b(sx/sy)
r = .99 (1.51/2.24) = .67
Interpretation: a strong positive relationship between X & Y.
Pearson's r is superior to b (the slope) for discussing the association between two interval-ratio variables because r is bounded (-1 to +1). The major advantage of this is that you can look at the association between two variables with very different scales.
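The slope-to-r conversion, and the r² discussed on the next slides, amount to two lines of arithmetic. A quick Python check using the values from the soda example:

```python
# Values from the slide: slope and the two standard deviations
b, sx, sy = 0.99, 1.51, 2.24

r = b * (sx / sy)   # convert the slope to Pearson's r
r_squared = r ** 2  # coefficient of determination

print(round(r, 2))          # 0.67
print(round(r_squared, 2))  # 0.45
```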

The Coefficient of Determination
The interpretation of Pearson's r (like Cramer's V) is not straightforward: what counts as a "strong" or "weak" correlation is subjective.
The coefficient of determination (r²) is a more direct way to interpret the association between two variables: r² represents the proportion of the variation in Y explained by X.
You can interpret r² with PRE logic: first predict Y while ignoring the information supplied by X, then account for X when predicting Y.

Coefficient of Determination: Example
Without information about X (hours of daily TV watching), the best predictor we have is the mean number of cans of soda consumed (the mean of Y, 3.6). The green line (the slope) is what we would predict WITH information about X.
If we know nothing about X, our best guess for every case is always the mean of Y, because the scores of any variable vary less around the mean than around any other point. If we predict the mean of Y for every case, we will make fewer errors of prediction than if we predict any other value. In other words, squared deviations from the mean are at a minimum:

∑(Y - Ȳ)² = minimum

So, if we knew nothing about X, the squared deviations from our prediction would sum to the total variation in Y. The vertical lines from the actual scores to the predicted score (Ȳ) represent the amount of error we would make when predicting Y while ignoring X: this is the TOTAL variation in Y.
When we know X, we can calculate a regression equation and make predictions about Y using the regression coefficient. If the two variables have a linear relationship, predicting scores on Y from the least-squares regression equation will incorporate X and improve our ability to predict Y. The next step is to determine the extent to which knowledge of X improves our ability to predict Y. This sum is called the EXPLAINED variation: it tells us how much better our predictions of Y get when we take X into account, compared with just using the mean of Y. In other words, it represents the proportion of the variation in Y that is explained by X. The vertical lines (representing error between predicted & observed scores) are now shorter.

Coefficient of Determination
Conceptually, the formula for r² is:

r² = explained variation / total variation

"The proportion of the total variation in Y that is attributable to or explained by X."
Just like other PRE measures, r² indicates precisely the extent to which X helps us predict, understand, or explain Y (much less ambiguous than r).
The variation not explained by X (1 - r²) is called the unexplained variation: the difference between our best prediction of Y with X and the actual scores. It is usually attributed to measurement error, random chance, or the influence of some combination of other variables. Rarely in the "real" social world can all variation in a factor be explained by just one other factor; the economy, crime, etc. are complex phenomena.

Coefficient of Determination
Interpreting the meaning of the coefficient of determination in the example: squaring Pearson's r (.67) gives us an r² of .45.
Interpretation: the number of hours of daily TV watching (X) explains 45% of the total variation in cans of soda consumed (Y). This is another PRE-based measure.
On the other hand, maybe there is another variable out there that does a better job of predicting Y; the r² for that factor might be higher. In multiple regression, we can consider several predictors simultaneously, with two or more X's predicting or explaining variation in the same Y (a prelude to multivariate regression).

Another Example: Relationship between Mobility Rate (x) & Divorce rate (y) The formula for this regression line is: Y = -2.5 + (.17)X 1) What is this slope telling you? 2) Using this formula, if the mobility rate for a given state was 45, what would you predict the divorce rate to be? 3) The standard deviation (s) for x=6.57 & the s for y=1.29. Use this info to calculate Pearson’s r. How would you interpret this correlation? 4) Calculate & interpret the coefficient of determination (r2) MOBILITY RATES OF STATES

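For checking your answers to questions 2-4, the arithmetic can be scripted. A sketch using only the numbers given on the slide (variable names are my own):

```python
# Regression equation and standard deviations from the slide
a, b = -2.5, 0.17
sx, sy = 6.57, 1.29

predicted_divorce = a + b * 45  # question 2: predicted divorce rate at mobility = 45
r = b * (sx / sy)               # question 3: Pearson's r from the slope
r_squared = r ** 2              # question 4: coefficient of determination

print(round(predicted_divorce, 2))  # 5.15
print(round(r, 2))                  # 0.87
print(round(r_squared, 2))          # 0.75
```

Reading the results with the slide's logic: the slope says each 1-point increase in mobility rate is associated with a .17 increase in divorce rate, r indicates a strong positive correlation, and r² says mobility explains about 75% of the variation in divorce rates.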

Regression Output
Scatterplot: Graphs → Legacy → Simple Scatter
Regression: Analyze → Regression → Linear
Example: how much you work predicts how much time you have to relax
X = hours worked in past week
Y = hours relaxed in past week

Hours worked x Hours relaxed

Regression Output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .209a   .044       .043                2.578
a. Predictors: (Constant), NUMBER OF HOURS WORKED LAST WEEK

Coefficients a
                                       Unstandardized        Standardized
Model                                  B        Std. Error   Beta     t        Sig.
1  (Constant)                          5.274    .236                  22.38    .000
   NUMBER OF HOURS WORKED LAST WEEK    -.038    .005         -.209    -7.160
a. Dependent Variable: HOURS PER DAY R HAVE TO RELAX

Correlation Matrix
Analyze → Correlate → Bivariate

Correlations
                                   HOURS WORKED   HOURS RELAX   ACTIVITY LIMITATION
NUMBER OF HOURS WORKED LAST WEEK
  Pearson Correlation              1              -.209**       -.061*
  Sig. (2-tailed)                                 .000          .040
  N                                1139           1123          1122
HOURS PER DAY R HAVE TO RELAX
  Pearson Correlation              -.209**        1             -.021
  Sig. (2-tailed)                  .000                         .483
  N                                1123           1154          1146
DAYS OF ACTIVITY LIMITATION PAST 30 DAYS
  Pearson Correlation              -.061*         -.021         1
  Sig. (2-tailed)                  .040           .483
  N                                1122           1146          1155

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Measures of Association

Level of Measurement    Measure        "Bounded"?   PRE interpretation?
NOMINAL                 Phi            NO*          NO
                        Cramer's V     YES          NO
                        Lambda         YES          YES
ORDINAL                 Gamma          YES          YES
INTERVAL-RATIO          b (slope)      NO           NO
                        Pearson's r    YES          NO
                        r²             YES          YES

* But phi has an upper limit of 1 when dealing with a 2x2 table.