Inferential Statistics


Correlations

Overview Correlation coefficients Scatterplots Calculating Pearson’s r Interpreting correlation coefficients Calculating & interpreting coefficient of determination Determining statistical significance Calculating Spearman’s correlation coefficient

Correlation

Reflects the degree of relation between variables. Calculation of a correlation coefficient gives its direction (+ positive or - negative), its strength (i.e., magnitude; the further from zero, the stronger the relation), and the form of the relationship.

Correlation reflects the degree of relation between variables. We calculate a correlation when we want to examine whether there is an association or link between two variables. As previously discussed, a correlation coefficient ranges from -1.00 to +1.00. If you ever calculate a correlation coefficient and end up with something like 1.23 or -1.88, you know you have done something wrong. The +/- sign of the coefficient lets us know the direction of the correlation. Positive means that the variables move in the same direction (they tend to increase and decrease together); for example, as study time increases, grades tend to increase. Negative means the variables move in opposite directions; for example, as depression increases, the amount of work completed decreases. The strength or magnitude of the correlation comes from the absolute value of the coefficient: the further from zero, the stronger the relation. Both -1.00 and +1.00 represent perfect relationships and are the strongest possible correlations; they reflect a highly consistent relation between the two variables. The closer to zero, the more inconsistent the relationship, to the point where the variables don't seem to be associated at all. We also get information about the form of the relationship: is it linear or curvilinear? Typically, this pattern can be seen in scatterplots. For this class, we will focus on linear relationships.

Check yourself

Indicate whether the following statements suggest a positive or negative relationship:
1) High school students with lower IQs have lower GPAs.
2) More densely populated areas have higher crime rates.
3) Heavier automobiles yield poorer gas mileage.
4) More anxious people willingly spend more time performing a simple repetitive task.

Go through the items above to determine whether they represent positive or negative correlations. Answers: (1) Positive: among high school students, lower IQ is associated with lower GPA. (2) Positive: as population density increases, crime rates tend to increase. (3) Negative: as automobile weight increases, gas mileage tends to decrease. (4) Positive: as anxiety increases, willingness to spend time on a simple repetitive task increases.

Scatterplots

Scatterplots can be used to represent the relation between variables. One variable is labeled X and one is labeled Y. Typically, the X variable is the predictor variable (in this case, wine consumed) and the Y variable is the criterion or outcome variable (in this case, heart disease deaths). Each data point on the graph essentially represents a participant. Researchers plot participants' scores on each variable to see visually whether there is an association between the two variables. In this case, we can see that there is a negative relationship between the two.

Correlation & Scatterplots

Exam1 (X), Exam2 (Y): Participant1 100, 95; Participant2 60, 65; Participant3 75, 80; Participant4 85, ...; Participant5 ...; Participant6 70, ...; Participant7 ...; r = .91

It is a good idea to graph the data. A scatterplot lets you determine the form of the relationship (linear or nonlinear), determine whether there are any outliers (extreme scores), and make a rough guess as to what the correlation coefficient will be.

Correlation & Scatterplots

Number of Arrests (X) and GPA (Y):
Participant1: ?, 4.0
Participant2: 5, 3.7
Participant3: 10, 2.8
Participant4: 20, 2.5
Participant5: 30, 1.0

Graph these data.

Correlation & Scatterplots

[Scatterplot of the same data: number of times arrested (X) plotted against GPA (Y).]

Pearson’s r Formula SP = Sum of products (of deviations) SSx = Sum of Squares of X SSy = Sum of Squares of Y By far the most common correlation is the Pearson correlation (or the Pearson product moment correlation) which measures the degree of straight-line relationships between two variables on ratio scales of measurement. To calculate Person correlation, it is necessary to introduce one new concept: the sum of products of deviation, or SP. In the past, we used a similar concept, sum of squared deviations to measure variability of a single variable. Now we will use SP to measure the amount of covariability between two variables. The denominator is the square root of the sum of squares (SS) for the X scores times the sum of squares (SS) for the Y scores. Notice the subscript is used to differentiate the two sums of squares. We will get back to the denominator in a second.

Pearson’s r Calculating SP Find X & Y deviations for each individual Definitional formula Computational formula Find X & Y deviations for each individual Find product of deviations for each individual Sum the products Similar to calculating SS, there are two formulas we can use to calculate the sums of products (SP). One is the definitional formula. First, find the deviations from the mean for X and Y separately, Second, find the product of deviations for each individual, Third, sum the products. The other is the computational formula.

Example #1 Calculating SP – Definitional Formula

X: 2, 2, 3, 5 Y: 2, 4, 3, 7
SX = 12, SY = 16, MX = SX/n = 12/4 = 3, MY = SY/n = 16/4 = 4

Step 1: Find the deviations for X and Y separately.
X-MX: 2-3 = -1, 2-3 = -1, 3-3 = 0, 5-3 = 2
Y-MY: 2-4 = -2, 4-4 = 0, 3-4 = -1, 7-4 = 3

Step 2: Multiply the deviations from the mean.
(X-MX)(Y-MY): (-1)(-2) = 2, (-1)(0) = 0, (0)(-1) = 0, (2)(3) = 6

Step 3: Sum the products. SP = 2 + 0 + 0 + 6 = 8

Here you can see the steps for the definitional formula. First, we find the deviations for X and Y separately: determine the mean of the X scores (3) and subtract it from each X score, then determine the mean of the Y scores (4) and subtract it from each Y score. Second, we find the product of the deviations, multiplying each X deviation by its paired Y deviation. Lastly, we sum the products. This is our SP.

Example #1 Calculating SP – Computational Formula

X: 2, 2, 3, 5 Y: 2, 4, 3, 7 XY: 4, 8, 9, 35
SX = 12, SY = 16, SXY = 56, MX = SX/n = 12/4 = 3, MY = SY/n = 16/4 = 4

For the computational formula, the sum of products equals the sum of XY minus the sum of X times the sum of Y divided by n:
SP = SXY - (SX)(SY)/n = 56 - (12)(16)/4 = 56 - 192/4 = 56 - 48 = 8
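The two SP formulas can be checked against each other in a few lines of Python. This is a sketch using the Example #1 data as reconstructed from the slide's sums (SX = 12, SY = 16, SXY = 56): X = 2, 2, 3, 5 and Y = 2, 4, 3, 7.

```python
# Sum of products (SP) for the Example #1 data, computed two ways.
X = [2, 2, 3, 5]
Y = [2, 4, 3, 7]
n = len(X)

mx = sum(X) / n  # 3.0
my = sum(Y) / n  # 4.0

# Definitional formula: sum the products of the paired deviations.
sp_def = sum((x - mx) * (y - my) for x, y in zip(X, Y))

# Computational formula: SP = SXY - (SX)(SY)/n
sp_comp = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n

print(sp_def, sp_comp)  # both 8.0
```

Both routes give SP = 8, matching the hand calculation above.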

Calculating Pearson’s r Calculate SP Calculate SS for X Calculate SS for Y Plug numbers into formula Here are all the steps needed to calculate Pearon’s r. First, we calculate SP using either the definitional or computational formula. Note, SP can be a positive or negative number. Second, we calculate SS for X scores using either the definitional or computational formula. Third, we calculate SS for Y scores using either the definitional or computational formula. Lastly, we plug each of the numbers into the formula.

Calculating Pearson’s r Calculate SP Calculate SS for X Calculate SS for Y Plug numbers into formula Here are the numbers from the previous problem in which we’ve already determined that SP is 8. Try completing steps 2-4 on your own. X Y 2 4 3 5 7

Example #1 - Answers Calculating Pearson's r

X: 2, 2, 3, 5 Y: 2, 4, 3, 7
SX = 12, SY = 16, MX = SX/n = 12/4 = 3, MY = SY/n = 16/4 = 4

Definitional:
X-MX: -1, -1, 0, 2 and Y-MY: -2, 0, -1, 3
(X-MX)(Y-MY): 2, 0, 0, 6 so SP = 8
(X-MX)2: 1, 1, 0, 4 so SSX = 6
(Y-MY)2: 4, 0, 1, 9 so SSY = 14

Computational: XY: 4, 8, 9, 35 (SXY = 56); X2: 4, 4, 9, 25 (SX2 = 42); Y2: 4, 16, 9, 49 (SY2 = 78)

r = SP / sqrt(SSX x SSY) = 8 / sqrt(6 x 14) = 8 / sqrt(84) = .87

Here are the answers; both the definitional and computational formulas are shown.
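Putting the pieces together, a short script can reproduce the full Example #1 calculation (SP = 8, SSX = 6, SSY = 14):

```python
# Pearson's r for the Example #1 data: r = SP / sqrt(SSx * SSy).
import math

X = [2, 2, 3, 5]
Y = [2, 4, 3, 7]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

sp  = sum((x - mx) * (y - my) for x, y in zip(X, Y))  # 8.0
ssx = sum((x - mx) ** 2 for x in X)                   # 6.0
ssy = sum((y - my) ** 2 for y in Y)                   # 14.0

r = sp / math.sqrt(ssx * ssy)
print(round(r, 3))  # 0.873
```

The result rounds to .87, as in the worked answer.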

Pearson’s r r = covariability of X and Y variability of X and Y separately It is important to think about what that correlation coefficient really is.

Using Pearson’s r Prediction Validity Reliability Although Pearon’s r can be used in many different ways, here are a few examples. It is important for prediction. If two variables are known to be related in some systematic way, it is then possible to use one of the variables to make accurate predications about the other. It is also important for determining validity. Remember, back we discussed scale validation – that is, are we measuring what we are suppose to be measuring. We talked about several types of variability, such as construct validity and predictive validity. It can also be used to say something about reliability. That is, the consistency between two measures. For example, remember test-retest and interrater reliability. Ideally, we would want a strong positive correlation between scores on tests or observers’ ratings.

Verbal Descriptions

1) r = -.84 between total mileage and auto resale value
2) r = -.35 between the number of days absent from school and performance on a math test
3) r = -.05 between height and IQ
4) r = .03 between anxiety level and college GPA
5) r = .56 between age of schoolchildren and reading comprehension level

Take a moment to interpret these correlation coefficients. (1) Cars with more total miles tend to have lower resale values. (2) Students with more absences from school tend to score lower on math tests. (3) Little or no relationship exists between height and IQ. (4) Little or no relationship exists between anxiety level and college GPA. (5) Older schoolchildren tend to have better reading comprehension. It is important in your verbal descriptions to steer clear of any causal claims.

Interpreting correlations

A correlation describes a relationship between two variables, but correlation does not equal causation. When interpreting correlation coefficients, remember the directionality problem: if the two variables are measured at similar points in time, we cannot be sure which variable is the cause and which is the effect. Does depression cause low self-esteem, or does low self-esteem cause depression? There is also the third-variable problem: it is possible that a third, unmeasured variable is responsible for both. Perhaps a chemical imbalance in serotonin levels influences both depression and self-esteem. We also have to be careful about restriction of range, which arises when the range between the lowest and highest scores on one or both variables is limited. This obscures the relationship and reduces the accuracy of the correlation coefficient, producing a coefficient that is smaller than it would be if the range were not restricted.

Interpreting correlations - Outliers

Outliers (i.e., extreme scores) can have a BIG impact on the correlation coefficient. Here you can see that for subjects A through E, the correlation is -.08: basically no correlation between the two variables. However, when subject F is added into the mix, the correlation jumps to .86, suggesting a strong positive correlation. It is clear, though, that this relationship is being driven by one subject who is very different from the other subjects. Source: Figure from page 462 of the Gravetter and Wallnau textbook.
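The outlier effect is easy to demonstrate numerically. The data below are hypothetical (the textbook figure's actual points are not reproduced in the transcript), but they show the same pattern: five subjects with essentially no linear relationship, plus one extreme subject that drags the correlation up to a strong positive value.

```python
import math

def pearson_r(X, Y):
    """Pearson's r via SP / sqrt(SSx * SSy)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    ssx = sum((x - mx) ** 2 for x in X)
    ssy = sum((y - my) ** 2 for y in Y)
    return sp / math.sqrt(ssx * ssy)

# Hypothetical subjects A-E: essentially no linear relationship.
X = [1, 2, 3, 4, 5]
Y = [2, 1, 3, 1, 2]
r_without = pearson_r(X, Y)            # approximately 0

# Add one extreme subject F: the correlation becomes strongly positive.
r_with = pearson_r(X + [15], Y + [12])

print(round(r_without, 2), round(r_with, 2))
```

One extreme score is enough to flip the apparent conclusion, which is why the notes recommend inspecting a scatterplot first.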

Interpreting correlations - Strength & Prediction

The coefficient of determination, r2, is the proportion of variability in one variable that can be determined from its relationship with the other variable. If r = .60, then r2 = .36, or 36%: 36% of the total variability in Y is consistently associated with variability in X. This is the "predicted" or "accounted for" variability. When multiplied by 100, the coefficient of determination indicates the percentage of variance accounted for; it is used to determine effect size for correlations. Low values of r2 indicate that factors other than the two variables of interest are influencing the relationship. Example (hours watching TV and amount of aggression): r = .875, so r2 = 76.56%. The relationship between hours spent watching TV and amount of aggression accounts for a large percentage of the variance: 76.56% of the variability in the Y scores can be predicted from the relationship with X, so there are few other factors influencing this relationship.

Mini-Review

Correlations: calculation of Pearson's r (sum of products of deviations), using Pearson's r, verbal descriptions, and interpretation of Pearson's r.

Example #2 Practice – Calculate Pearson's r

Calculate SP. Calculate SS for X. Calculate SS for Y. Plug the numbers into the formula.

X: 2, 1, 3, 0, 4 Y: 9, 10, 6, 8, 2

Complete this problem.

Example #2 - Answers

X: 2, 1, 3, 0, 4 Y: 9, 10, 6, 8, 2
SX = 10, SY = 35, MX = SX/n = 10/5 = 2, MY = SY/n = 35/5 = 7

X-MX: 2-2 = 0, 1-2 = -1, 3-2 = 1, 0-2 = -2, 4-2 = 2
Y-MY: 9-7 = 2, 10-7 = 3, 6-7 = -1, 8-7 = 1, 2-7 = -5

SP = S(X-MX)(Y-MY) = (0)(2) + (-1)(3) + (1)(-1) + (-2)(1) + (2)(-5) = -16
(X-MX)2: 0, 1, 1, 4, 4 so SSX = 10
(Y-MY)2: 4, 9, 1, 1, 25 so SSY = 40

r = SP / sqrt(SSX x SSY) = -16 / sqrt(10 x 40) = -16 / 20 = -.80
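As a check on the Example #2 arithmetic, here are the same steps in Python (data taken from the deviation table above):

```python
# Example #2: SP = -16, SSx = 10, SSy = 40, so r = -16/20 = -.80.
import math

X = [2, 1, 3, 0, 4]
Y = [9, 10, 6, 8, 2]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n       # 2.0 and 7.0

sp  = sum((x - mx) * (y - my) for x, y in zip(X, Y))  # -16.0
ssx = sum((x - mx) ** 2 for x in X)                   # 10.0
ssy = sum((y - my) ** 2 for y in Y)                   # 40.0

r = sp / math.sqrt(ssx * ssy)
print(r)  # -0.8
```

A strong negative correlation: more arrests go with lower values of Y.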

Hypothesis Testing

We make inferences based on sample information: is it a statistically significant relationship, or simply chance? We have already discussed using a sample mean to answer questions about a population mean, so we know some of the basic principles. H0: ρ = 0 (in the population there is no effect, no difference, no relationship). H1: ρ ≠ 0 (in the population there is a change, a relationship, a difference).

Conceptually - Degrees of freedom

Knowing M (the mean) restricts variability in the sample: one score will be dependent on the others. Example: n = 5, SX = 20. If we know the first three scores (X1 = 6, X2 = 4, X3 = 2), then X4 and X5 could still be almost any combination. If we also know X4 = 5, the fifth score is determined: X5 = 3. With n = 5, only 4 scores are free to vary, so df = 4.

With a population, we find the deviation of each score by measuring its distance from the population mean (X - µ). With a sample, the population mean is unknown, so we must measure distances from the sample mean, which requires knowing the sample mean before computing deviations. Knowing the value of the mean places a restriction on the variability of the scores in the sample: one score is always restricted (i.e., dependent on the others). Because we have to know M to compute s2 (the variance), a sample with n scores generally has n - 1 scores that can vary; the last score is restricted. We say the sample has n - 1 degrees of freedom.

Correlations – Degrees of freedom

For correlations, df = n – 2. With n = 2 (essentially a sample of two people), there are no degrees of freedom: when there are only two points on a scatterplot, they will ALWAYS fit perfectly on a straight line. Because the first two points always produce a perfect correlation, the sample is free to vary only when the data set contains more than two points.

Using a table to determine significance

Find the degrees of freedom (for correlations, df = n – 2). Use the level of significance (e.g., a = .05, two-tailed) to find the column in the table. Determine the critical value: the value the calculated r must equal or exceed to be significant. Compare the calculated r with the critical value; if the calculated r is less than the critical value, the result is not significant.

To determine statistical significance, consult a table of critical values for the Pearson correlation, like the one found here: http://www.statisticssolutions.com/table-of-critical-values-pearson-correlation/

Step 1: Examine the df column (df = N – 2, where N is the number of pairs of scores; e.g., 5 pairs of scores gives 3 degrees of freedom).
Step 2: Find the column for a .05, two-tailed test.
Step 3: Determine the critical value (the value that our calculated r must equal or exceed to be significant). For example, for df = 3 the critical value is .8783.
Step 4: Compare your calculated r with the table value to determine significance. For example, r = .875 is LESS than .8783: not significant.
Step 5: Write an APA-format conclusion.
If not significant: The correlation between hours watching television and amount of aggression is not significant, r(3) = .88, p > .05.
If significant: The more hours people spend watching television, the greater the amount of aggression, r(3) = .90, p < .05. Or: There was a strong positive correlation between hours spent watching television and aggression, r(3) = .90, p < .05.

Notice that our correlation coefficient of .875 is not statistically significant even though the correlation is strong. This is because our sample size is so small. Keep in mind the effect that sample size has on statistical significance.
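The table-lookup decision rule reduces to a single comparison. The sketch below hard-codes the critical value quoted in the notes (.8783 for df = 3, alpha = .05, two-tailed); for other df you would look up the appropriate value in the table.

```python
# Decision rule: significant if |r| equals or exceeds the table's
# critical value for the given df and alpha level.

def is_significant(r, critical_value):
    """Return True if the calculated r reaches the critical value."""
    return abs(r) >= critical_value

r_crit_df3 = 0.8783   # critical value for df = 3, alpha = .05, two-tailed

print(is_significant(0.875, r_crit_df3))   # False: .875 < .8783
print(is_significant(-0.90, r_crit_df3))   # True: |-.90| >= .8783
```

Note that the absolute value is compared, so a strong negative correlation can be significant too.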

Spearman correlation

Used when we have ordinal data (ranks). If one variable is on an interval or ratio scale, change the scores for that variable into ranks. The calculation is based on the difference between each pair of ranks.

The Spearman correlation is another type of correlation coefficient. When both variables are on an ordinal scale of measurement, Spearman is the appropriate correlation. It can also be used when we have one ordinal variable and one interval/ratio variable; we first convert the interval/ratio variable into ranks. It can also be calculated for nonlinear relationships, where it measures the consistency of the relationship between X and Y independent of the specific form of the relationship.

Example #3: Spearman correlation

1st race, 2nd race: 4 3 1 2 9 8 6 5 7

Try calculating this correlation. Notice that we have two variables: place in race 1 and place in race 2. Nine individuals ran in two races, and we would like to know if there is a relationship between where they placed in the first race and where they placed in the second race. Place is on an ordinal scale of measurement.

Example #3: Spearman - Answers

Formula: rs = 1 - (6 x SD2) / (n(n2 - 1))

To calculate, we determine the difference (D) between places for the two races by subtracting the two columns, square each difference (D2), and find the sum SD2 = 18. Then we plug the numbers into the formula, following the order of operations: multiply SD2 (18) by 6 to get 108; for the bottom half, square n (92 = 81), subtract 1 to get 80, and multiply 80 by n (9) to get 720; divide 108 by 720 to get .15; subtract .15 from 1 to get rs = .85.
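The Spearman formula is short enough to code directly. The ranks below are hypothetical (the slide's full rank columns did not survive the transcript), but they are chosen so that SD2 = 18 with n = 9, matching the worked example.

```python
# Spearman's rank correlation: rs = 1 - (6 * sum(D^2)) / (n * (n^2 - 1)).

def spearman_rs(ranks1, ranks2):
    """Spearman correlation from two lists of ranks (no ties)."""
    n = len(ranks1)
    d_sq = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - (6 * d_sq) / (n * (n ** 2 - 1))

# Hypothetical places for 9 runners, built so that sum(D^2) = 18.
race1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
race2 = [4, 2, 3, 1, 5, 6, 7, 8, 9]   # D^2 = 9 + 9 = 18

rs = spearman_rs(race1, race2)
print(rs)  # 1 - 108/720 = .85
```

The same function also handles Example #4 once its rank columns are known.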

Example #4: Spearman Correlation

Two movie raters each watched the same six movies. Is there a relationship between the raters' rankings?

Rater 1, Rater 2: 1 6 2 4 3 5

Try this problem.

Example #4: Spearman - answers

Rater 1, Rater 2: 1 6 2 4 3 5
D: -5, -2, 1, 3, ... D2: 25, 4, 1, 9, ...

Here are the answers.

Pearson r and Spearman rs (from SPSS)

Here you can see the SPSS output for both the Pearson r and the Spearman rs. In the first box, the correlation between anxiety and noise is .87 (a strong, positive correlation), and it is statistically significant at the .001 level.

Example #5: Pearson’s r Participant Motivation (X) Depression (Y) 1 3 8 2 6 4 3 9 2 4 2 2 Sketch a scatterplot. Calculate the correlation coefficient. Determine if it is statistically significant at the .05 level for a 2-tailed test. Write an APA format conclusion.