1
Chi Square & Correlation
2
Nonparametric Test: Chi-Square (χ²)
Used when too many of the assumptions behind t-tests are violated: the sample size is too small to reflect the population, or the data are not continuous and thus not appropriate for parametric tests based on the normal distribution. χ² is another way of showing that a pattern in the data was not created randomly by chance. χ² can be one- or two-dimensional. χ² addresses whether what we observe is different from what we would expect.
3
Calculating χ²
What would a contingency table look like if no relationship existed between gender and voting for Bush (i.e., statistical independence)?
Vote | Male | Female | Total
Voted for Bush | 25 | 25 | 50
Voted for Kerry | 25 | 25 | 50
Total | 50 | 50 | 100
NOTE: INDEPENDENT VARIABLE ON COLUMNS, DEPENDENT VARIABLE ON ROWS
4
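Calculating χ²
What would a contingency table look like if a perfect relationship existed between gender and voting for Bush? Every voter of one gender votes for Bush and every voter of the other gender votes for Kerry, for example:
Vote | Male | Female
Voted for Bush | 50 | 0
Voted for Kerry | 0 | 50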
5
Calculating the expected value
$$E_{ij} = \frac{F_i \times F_j}{N}$$
where $E_{ij}$ = the expected frequency of the cell in the ith row and jth column, $F_i$ = the total in the ith row marginal, $F_j$ = the total in the jth column marginal, and $N$ = the grand total, or sample size for the entire table.
Expected count for each "Voted for Bush" cell = 50 × 50 / 100 = 25
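The expected-frequency rule can be applied to every cell at once. This is a minimal sketch; the function name `expected_frequencies` is hypothetical, and the counts in the usage line are illustrative values chosen to have the same 50/50 marginals as the slide.

```python
def expected_frequencies(table):
    """Return E_ij = F_i * F_j / N for every cell of a contingency table.

    `table` is a list of rows (dependent variable on rows, independent
    variable on columns); each row is a list of observed counts.
    """
    row_totals = [sum(row) for row in table]        # F_i
    col_totals = [sum(col) for col in zip(*table)]  # F_j
    n = sum(row_totals)                             # N
    return [[fi * fj / n for fj in col_totals] for fi in row_totals]


# Illustrative counts with 50 Bush / 50 Kerry and 50 men / 50 women.
observed = [[20, 30],   # Voted for Bush: Male, Female
            [30, 20]]   # Voted for Kerry: Male, Female
print(expected_frequencies(observed))  # [[25.0, 25.0], [25.0, 25.0]]
```

Any table with these marginals yields 25 in every cell, because expected counts depend only on the row and column totals.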
6
Nonparametric Test: Chi-Square (χ²)
Again, the basic question is whether what you observe in a given set of data was created by chance or by some systematic process.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected frequency.
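The χ² formula translates directly into a sum over cells. A minimal sketch, assuming the observed and expected counts are passed as two flat lists in the same cell order; `chi_square` is a hypothetical helper name and the counts are illustrative.

```python
def chi_square(observed, expected):
    """Pearson chi-square: the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells of a 2 x 2 table flattened row by row; 25 expected in each cell.
print(chi_square([20, 30, 30, 20], [25, 25, 25, 25]))  # 4.0
```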
7
Nonparametric Test: Chi-Square (χ²)
The null hypothesis we are testing here is that the proportions of occurrences in the categories are equal to each other (H₀: B = K). Our research hypothesis is that they are not equal (Hₐ: B ≠ K). Given the sample size, how many cases could we expect in each category? Under H₀, n / (number of categories). Comparing the obtained value with the critical value gives us a test statistic and the probability (p-value) of getting results at least this extreme by chance alone.
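The one-dimensional version of the test, with an expected count of n / (number of categories) in each category, can be sketched as follows. The function name and the 60/40 split are hypothetical; the degrees of freedom for this goodness-of-fit case are the number of categories minus one.

```python
def equal_proportions_chi_square(counts):
    """One-dimensional chi-square test of H0: all categories are equally likely.

    The expected count per category is n / (number of categories).
    Returns (chi_square, degrees_of_freedom).
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, k - 1


# Hypothetical sample: 60 Bush voters and 40 Kerry voters (n = 100).
stat, df = equal_proportions_chi_square([60, 40])
print(stat, df)  # 4.0 1
```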
8
Let's do a χ² for the perfect relationship (expected value of 25 in every cell):
Cells with 50 observed: (50 − 25)²/25 = 25
Cells with 0 observed: (0 − 25)²/25 = 25
χ² = 25 + 25 + 25 + 25 = 100
What would χ² be when there is statistical independence?
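Both cases can be checked by computing expected counts from the margins and summing (O − E)²/E. A minimal sketch with a hypothetical helper `chi_square_2d`; which gender sits in the Bush row of the perfect table is an assumption, and swapping it does not change χ².

```python
def chi_square_2d(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            stat += (o - e) ** 2 / e
    return stat


perfect = [[50, 0], [0, 50]]        # one gender all Bush, the other all Kerry
independent = [[25, 25], [25, 25]]  # statistical independence
print(chi_square_2d(perfect))       # 100.0
print(chi_square_2d(independent))   # 0.0
```

Under independence every observed count equals its expected count, so χ² is 0.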
9
Let’s corroborate with SPSS
10
Testing for significance
How do we know whether the relationship is statistically significant? Consider this table:
Vote | Male | Female
Voted for Bush | 20 | 30
Voted for Kerry | 30 | 20
χ² = 4
We need the degrees of freedom: df = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1. We then go to the χ² distribution to find the critical value (CV = 3.84 for df = 1 at α = 0.05). Because 4 > 3.84, we conclude that the relationship between gender and voting is statistically significant.
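The df calculation and the comparison with the critical value can be sketched in a few lines. For df = 1 only, a χ² statistic is the square of a standard normal variable, so its upper-tail probability is erfc(√(χ²/2)); this lets the sketch report a p-value without a statistics library. The helper names are hypothetical, and the constant 3.84 is the slide's critical value.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df.

    With 1 df, chi2 = Z^2 for a standard normal Z, so
    P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


CRITICAL_05_DF1 = 3.84  # critical value at alpha = 0.05, df = 1

df = degrees_of_freedom(2, 2)
chi2 = 4.0
print(df)                           # 1
print(chi2 > CRITICAL_05_DF1)       # True
print(round(p_value_df1(chi2), 4))  # 0.0455
```

A p-value just under 0.05 agrees with the statistic just exceeding 3.84.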
11
When is χ² appropriate to use?
χ² is perhaps the most widely used statistical technique for analyzing nominal and ordinal data:
nominal × nominal (e.g., gender and voting preference)
nominal × ordinal (e.g., gender and opinion of W)
12
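χ² can also be used with larger tables
Opinion of Bush | Male | Female | Total
Favorable | 40 (19.4) | 5 (15.8) | 45
Indifferent | 10 (.88) | 20 (.72) | 30
Unfavorable | 15 (8.6) | 55 (6.9) | 70
Total | 65 | 80 | 145
Values in parentheses are each cell's contribution, (O − E)²/E, to the total χ².
χ² = ? Do we reject the null hypothesis?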
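The same procedure extends to the 3 × 2 opinion table: expected counts from the margins, per-cell contributions, df = (3 − 1)(2 − 1) = 2. A minimal sketch with a hypothetical helper `chi_square_test`; the critical value uses the fact that, for df = 2 only, the χ² upper tail is exp(−x/2), so the α = 0.05 cutoff is −2 ln 0.05.

```python
import math


def chi_square_test(table):
    """Return (chi_square, df, per-cell contributions) for a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            c = (o - e) ** 2 / e                   # this cell's share of chi-square
            contrib_row.append(c)
            stat += c
        contributions.append(contrib_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, contributions


# Rows: Favorable, Indifferent, Unfavorable; columns: Male, Female.
opinion = [[40, 5], [10, 20], [15, 55]]
stat, df, contributions = chi_square_test(opinion)
critical = -2 * math.log(0.05)   # alpha = 0.05 critical value, valid for df = 2 only
print(round(stat, 2), df)        # 52.42 2
print(round(critical, 2))        # 5.99
print(stat > critical)           # True
```

The per-cell values in `contributions` come out close to the parenthesized numbers in the table (about 19.49, 15.83, 0.88, 0.72, 8.55, 6.95), and since 52.42 is far above 5.99 the null hypothesis would be rejected.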
13
Correlation (Does not mean causation)
We want to know how two variables are related to each other. Does eating doughnuts affect weight? Does spending more hours studying increase test scores? Correlation measures how much two variables overlap, that is, how strongly they vary together.
14
Types of Correlations
X (cause) | Y (effect) | Correlation | Values
Increases | Increases | Positive | 0 to 1
Decreases | Decreases | Positive | 0 to 1
Increases | Decreases | Negative | −1 to 0
Increases | Does not change | Independent |
15
Conceptualizing Correlation
Figure: Measuring development (panels: Strong, Weak; variables shown: GDP, POP, WEIGHT, EDUCATION)
Which type of validity will correlation be associated with?
16
Correlation Coefficient
17
Home Value & Square Footage
Log value | Log sqft | value² | sqft² | value × sqft
5.13 4.02 5.2 4.54 27.04 23.608 4.53 3.53 4.79 3.8 14.44 18.202 4.78 3.86 4.72 4.17 29.15 23.92 141.95 95.96 116.56
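The squared and cross-product columns in this table are the ingredients of the raw-sum (computational) form of Pearson's r:
$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$.
A minimal sketch of that formula, assuming it matches the lost slide formula; `pearson_r` is a hypothetical name, and the short lists in the usage lines are illustrative, not the home-value data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from raw sums of X, Y, X^2, Y^2 and XY."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)               # sum of X^2
    sum_y2 = sum(v * v for v in y)               # sum of Y^2
    sum_xy = sum(a * b for a, b in zip(x, y))    # sum of X * Y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative)
```

If Y never changes at all, its variance is zero and r cannot be computed, so the sketch raises an error rather than returning a number.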
18
Correlation Coefficient
19
Rules of Thumb: Size of Correlation Coefficient vs. General Interpretation
Interpretations, from the largest to the smallest coefficient: very strong; strong; moderate; weak; very weak or no relationship
20
Multiple Correlation Coefficients
21
Limitations of correlation coefficients
Correlation coefficients tell us how strongly two variables are related. However, r coefficients are limited because they cannot tell us anything about: causation between X and Y; the marginal impact of X on Y; what percentage of the variation in Y is explained by X; forecasting. Because of these limitations, Ordinary Least Squares (OLS) regression is more useful.
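A simple one-predictor OLS fit gives the quantities listed above: a slope (the marginal impact of X on Y), R² (the share of Y's variation explained by X), and a fitted line that can be used for forecasting. A minimal sketch; `ols_fit` and the data are hypothetical, and whether the slope can be read causally still depends on the research design, not on the arithmetic.

```python
def ols_fit(x, y):
    """Simple OLS of Y on X: returns (intercept a, slope b, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # marginal impact: change in Y per 1-unit change in X
    a = mean_y - b * mean_x    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot  # share of Y's variation explained by X
    return a, b, r_squared


x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b, r2 = ols_fit(x, y)
print(a, b, r2)    # 1.0 2.0 1.0
print(a + b * 5)   # 11.0 (forecast for X = 5)
```

In the one-predictor case, R² equals the square of Pearson's r for the same data.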
22
Do you have the BLUES?
B for Best (minimum error: the smallest variance among linear unbiased estimators)
L for Linear (the form of the relationship)
U for Unbiased (does the parameter truly reflect the effect?)
E for Estimator
23
Home Value and Square Footage
Does the above line meet the BLUE criteria?