Copyright © 2005 by Evan Schofer


Sociology 5811: Lecture 16: Crosstabs 2 Measures of Association Plus Differences in Proportions Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

Announcements Final project proposals due Nov 15 Get started now!!! Find a dataset; figure out what hypotheses you might test Today: Wrap up Crosstabs If time remains, we’ll discuss project ideas…

Review: Chi-square Test Chi-Square test is a test of independence Null hypothesis: the two categorical variables are statistically independent There is no relationship between them H0: Gender and political party are independent Alternate hypothesis: the variables are related, not independent of each other H1: Gender and political party are not independent Test is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables.

Review: Expected Cell Values If two variables are independent, cell values will depend only on row & column marginals Marginals reflect frequencies… And, if frequency is high, all cells in that row (or column) should be high The formula for the expected value in a cell is: Eij = (fi × fj) / N, where fi and fj are the row and column marginals and N is the total sample size
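As a minimal sketch (not from the lecture), here is the expected-frequency formula Eij = (fi × fj)/N applied to the gender × party table used in the next slides; the variable names are mine.

```python
# Expected cell frequencies under independence for the gender x party table.
observed = [
    [27, 10],   # Democrat:   women, men
    [16, 15],   # Republican: women, men
]

n = sum(sum(row) for row in observed)              # total sample size N
row_totals = [sum(row) for row in observed]        # f_i (row marginals)
col_totals = [sum(col) for col in zip(*observed)]  # f_j (column marginals)

# E_ij = (f_i * f_j) / N
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

print([[round(e, 1) for e in row] for row in expected])
# → [[23.4, 13.6], [19.6, 11.4]]
```

These match the expected values shown in the worked example below.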

Review: Chi-square Test The Chi-square formula: χ2 = Σ Σ (Oij – Eij)2 / Eij, summed over all cells Where: R = total number of rows in the table C = total number of columns in the table Eij = the expected frequency in row i, column j Oij = the observed frequency in row i, column j Assumption for test: Large N (>100) Degrees of freedom for the critical value: (R–1)(C–1).

Chi-square Test of Independence Example: Gender and Political Views Let’s pretend that N of 68 is sufficient

             Women                 Men
Democrat     O11: 27, E11: 23.4    O12: 10, E12: 13.6
Republican   O21: 16, E21: 19.6    O22: 15, E22: 11.4

Chi-square Test of Independence Compute (E – O)2/E for each cell

             Women                      Men
Democrat     (23.4 – 27)2/23.4 = .55    (13.6 – 10)2/13.6 = .95
Republican   (19.6 – 16)2/19.6 = .66    (11.4 – 15)2/11.4 = 1.14

Chi-Square Test of Independence Finally, sum up to compute the Chi-square χ2 = .55 + .95 + .66 + 1.14 = 3.30 What is the critical value for α = .05? Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1 According to Knoke, p. 509: Critical value is 3.84 Question: Can we reject H0? No. χ2 of 3.30 is less than the critical value We cannot conclude that there is a relationship between gender and political party affiliation.
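A minimal sketch (not from the lecture) that recomputes the whole test from the raw table; using unrounded expected values it gives χ2 ≈ 3.31, which is still below the 3.84 critical value, so H0 stands.

```python
# Chi-square test of independence computed from scratch for a 2x2 table.
observed = [[27, 10], [16, 15]]   # Dem/Rep x women/men

n = sum(map(sum, observed))
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n   # expected count under independence
        chi2 += (o - e) ** 2 / e          # sum of (O - E)^2 / E over all cells

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R-1)(C-1) = 1
print(round(chi2, 2), df)   # chi2 ≈ 3.31 with 1 df; critical value is 3.84
```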

Chi-square Test of Independence Weaknesses of chi-square tests: 1. If the sample is very large, we almost always reject H0. Even tiny covariations are statistically significant But, they may not be socially meaningful differences 2. It doesn’t tell us how strong the relationship is It doesn’t tell us if it is a large, meaningful difference or a very small one It is only a test of “independence” vs. “dependence” Measures of Association address this shortcoming.

Measures of Association Separate from the issue of independence, statisticians have created measures of association They are measures that tell us how strong the relationship is between two variables Example: a table where roughly 51% of women and 49% of men are Democrats shows a weak association; a table where all women are Democrats and all men are Republicans shows a strong association

Crosstab Association: Yule’s Q Appropriate only for 2x2 tables (2 rows, 2 columns) Label cell frequencies a through d:

a  b
c  d

Recall that extreme values along the “diagonal” (cells a & d) or the “off-diagonal” (b & c) indicate a strong relationship. Yule’s Q captures that in a measure: Q = (ad – bc) / (ad + bc) 0 = no association; -1, +1 = strong association

Crosstab Association: Yule’s Q Rule of Thumb for interpreting Yule’s Q (Bohrnstedt & Knoke, p. 150), by absolute value of Q:
0 to .24: “virtually no relationship”
.25 to .49: “weak relationship”
.50 to .74: “moderate relationship”
.75 to 1.0: “strong relationship”

Crosstab Association: Yule’s Q Example: Gender and Political Party Affiliation

       Women   Men
Dem      27     10
Rep      16     15

Calculate “ad”: ad = (27)(15) = 405 Calculate “bc”: bc = (10)(16) = 160 Q = (ad – bc)/(ad + bc) = 245/565 = .43 in absolute value (the sign flips if either variable’s categories are reversed; with the lecture’s coding, Q = –.43) “Weak association”, almost “moderate”
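A minimal sketch (not from the lecture) of the Q calculation; the cell labels follow the a b / c d layout above.

```python
# Yule's Q for a 2x2 table: Q = (ad - bc) / (ad + bc).
a, b = 27, 10   # Democrat:   women, men
c, d = 16, 15   # Republican: women, men

q = (a * d - b * c) / (a * d + b * c)
print(round(q, 2))   # magnitude .43; the sign depends on category coding
```

Because interpretation uses the absolute value, the sign convention does not change the "weak, almost moderate" verdict.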

Association: Other Measures Phi (φ) Very similar to Yule’s Q Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc. Gamma (G) Based on a very different method of calculation Not limited to 2x2 tables Requires ordered variables Tau-c (τc) and Somers’ d (dyx) Same basic principle as Gamma Several others discussed in Knoke, Norusis.

Crosstab Association: Gamma Gamma, like Q, is based on comparing “diagonal” to “off-diagonal” cases. But, it does so differently. Jargon: Concordant pairs: Pairs of cases where one case is higher on both variables than another case Discordant pairs: Pairs of cases for which the first case (when compared to a second) is higher on one variable but lower on the other

Crosstab Association: Gamma Example: Approval of Guns and Trees Cases in the “Love Trees/Love Guns” cell make concordant pairs with cases lower on both:

             Hate Trees   Trees OK   Love Trees
Love Guns        1205        603          71
Guns OK           659       1498         452
Hate Guns         431        467        1120

All 71 individuals can be a pair with everyone in the lower cells. Just multiply! (71)(659 + 1498 + 431 + 467) = 216,905 concordant pairs

Crosstab Association: Gamma More possible concordant pairs The “Love Guns/Trees OK” cell and the “Guns OK/Love Trees” cell can also have concordant pairs:

             Hate Trees   Trees OK   Love Trees
Love Guns        1205        603          71
Guns OK           659       1498         452
Hate Guns         431        467        1120

These 603 can pair with all those that score lower on approval of both Guns and Trees: (603)(659 + 431) = 657,270 concordant pairs These 452 can pair lower too: (452)(431 + 467) = 405,896 concordant pairs

Crosstab Association: Gamma Discordant pairs: Pairs where a first person ranks higher on one dimension (e.g., approval of Trees) but lower on the other (e.g., approval of Guns):

             Hate Trees   Trees OK   Love Trees
Love Guns        1205        603          71
Guns OK           659       1498         452
Hate Guns         431        467        1120

The top-left cell (1205) is higher on Guns but lower on Trees than those in the lower right. They make pairs: (1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs

Crosstab Association: Gamma If all pairs are concordant or all pairs are discordant, the variables are strongly related If there are an equal number of discordant and concordant pairs, the variables are weakly associated. Formula for Gamma: G = (ns – nd) / (ns + nd) where: ns = number of concordant pairs nd = number of discordant pairs

Crosstab Association: Gamma Calculation of Gamma is typically done by computer Zero indicates no association +1 = strong positive association -1 = strong negative association It is possible to do hypothesis tests on Gamma To determine if population gamma differs from zero Requirements: random sample, N > 50 See Knoke, p. 155-6.
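Since Gamma is typically computed by software, here is a minimal brute-force sketch (not from the lecture) for the Guns/Trees table; rows and columns are ordered low → high (Hate → OK → Love), so the slide subtotals (216,905; 657,270; 405,896; 4,262,085) are components of ns and nd.

```python
# Gamma for an ordered r x c table by exhaustive pair counting.
table = [
    [431,  467, 1120],   # Hate Guns: Hate Trees, Trees OK, Love Trees
    [659, 1498,  452],   # Guns OK
    [1205, 603,   71],   # Love Guns
]

r, c = len(table), len(table[0])
ns = nd = 0   # concordant / discordant pair counts
for i in range(r):
    for j in range(c):
        for k in range(r):
            for l in range(c):
                if k > i and l > j:       # second cell higher on both variables
                    ns += table[i][j] * table[k][l]
                elif k > i and l < j:     # higher on one, lower on the other
                    nd += table[i][j] * table[k][l]

gamma = (ns - nd) / (ns + nd)
print(ns, nd, round(gamma, 2))
# discordant pairs dominate, so gamma is negative for this table
```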

Crosstab Association Final remarks: You have a variety of possible measures to assess association among variables. Which one should you use? Yule’s Q and Phi require a 2x2 table Larger ordered tables: use Gamma, Tau-c, Somers’ d Ideally, report more than one to show that your findings are robust.

Odds Ratios Odds ratios are a powerful way of analyzing relationships in crosstabs Many advanced categorical data analysis techniques are based on odds ratios Review: What is a probability? p(A) = # of outcomes that are “A” divided by total number of outcomes To convert a frequency distribution to a probability distribution, simply divide frequency by N The same can be done with crosstabs: Cell frequency over N is probability.

Odds Ratios If total N = 68, the probability of drawing each type of case is:

       Women     Men
Dem    27/68     10/68
Rep    16/68     15/68

That is:

       Women   Men
Dem     .397   .147
Rep     .235   .220

Odds Ratios Odds are similar to probability… but not quite Odds of A = Number of outcomes that are A, divided by number of outcomes that are not A Note: The denominator is different than for probability Ex: Probability of rolling a 1 on a 6-sided die = 1/6 Odds of rolling a 1 on a six-sided die = 1/5 Odds can also be calculated from probabilities: Odds of A = p(A) / (1 – p(A))
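The probability-to-odds conversion above can be sketched in one line (my own illustration, not from the lecture):

```python
# Convert a probability to odds: odds = p / (1 - p).
def odds(p):
    return p / (1 - p)

# Die roll: probability 1/6 corresponds to odds of 1/5
print(odds(1 / 6))
```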

Odds Ratios Conditional odds = odds of being in one category of a variable within a specific category of another variable Example: For women, what are the odds of being democrat? Instead of overall odds of being democrat, conditional odds are about a particular subgroup in a table

       Women   Men
Dem      27     10
Rep      16     15

Conditional odds of women being democrat: 27 / 16 = 1.69 Note: The odds for women are different from the odds for men

Odds Ratios If variables in a crosstab are independent, their conditional odds are equal Odds of falling into one category or another are same for all values of other variable If variables in a crosstab are associated, conditional odds differ Odds can be compared by making a ratio Ratio is equal to 1 if odds are the same for two groups Ratios much greater or less than 1 indicate very different odds.

Odds Ratios Formula for Odds Ratio in a 2x2 table: OR = (b/d) / (a/c) = bc/ad

       Women   Men
Dem    27 (a)  10 (b)
Rep    16 (c)  15 (d)

Ex: OR = (10)(16)/(27)(15) = 160 / 405 = .395 Interpretation: men have .395 times the odds of being a democrat compared to women The inverted value (1/.395 = 2.53) indicates that women’s odds of being democrat are 2.53 times men’s odds
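A minimal sketch (not from the lecture) of the conditional odds and their ratio for this table:

```python
# Conditional odds and the odds ratio for the gender x party table.
a, b = 27, 10   # Democrat:   women, men
c, d = 16, 15   # Republican: women, men

odds_dem_women = a / c   # 27/16 ≈ 1.69
odds_dem_men = b / d     # 10/15 ≈ 0.67

# Ratio of men's odds to women's odds = bc/ad
or_men_vs_women = odds_dem_men / odds_dem_women
print(round(or_men_vs_women, 3), round(1 / or_men_vs_women, 2))
# → 0.395 2.53
```

Which group goes in the numerator is a convention; inverting the ratio flips the comparison, as the slide notes.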

Odds Ratios: Final Remarks 1. Cells with zeros cause problems for odds ratios Ratios with zero in the denominator are undefined. Thus, you need all cells to be non-empty 2. Odds ratios can be used to measure association Indeed, Yule’s Q is based on them 3. Odds ratios form the basis for most advanced categorical data analysis techniques For now it may be easier to use Yule’s Q, etc. But, if you need to do advanced techniques, you will use odds ratios.

Tests for Difference in Proportions Another approach to small (2x2) tables: Instead of making a crosstab, you can just think about the proportion of people in a given category More similar to a T-test than a Chi-square test Ex: Do you approve of Pres. Bush? (Yes/No) Sample: N = 86 women, 80 men Proportion of women that approve: PW = .70 Proportion of men that approve: PM = .78 Issue: Do the populations of men/women differ? Or are the differences just due to sampling variability?

Tests for Difference in Proportions Hypotheses: Again, the typical null hypothesis is that there are no differences between groups Which is equivalent to statistical independence H0: Proportion women = proportion men H1: Proportion women not = proportion men Note: One-tailed directional hypotheses can also be used.

Tests for Difference in Proportions Strategy: Figure out the sampling distribution for differences in proportions Statisticians have determined relevant info: 1. If samples are “large”, the sampling distribution of the difference in proportions is normal The Z-distribution can be used for hypothesis tests 2. A Z-value can be calculated using the formula: Z = (P1 – P2) / s.e.(P1 – P2)

Tests for Difference in Proportions The standard error can be estimated as: s.e. = sqrt[ Pboth(1 – Pboth)(1/N1 + 1/N2) ] Where: Pboth is the pooled proportion for the two samples combined: Pboth = (N1P1 + N2P2) / (N1 + N2)

Difference in Proportions: Example Q: Do you approve of Pres. Bush? (Yes/No) Sample: N = 86 women, 80 men Women: N = 86, PW = .70 Men: N = 80, PM = .78 Total N is “large”: 166 people So, we can use a Z-test Use α = .05, two-tailed: critical Z = 1.96

Difference in Proportions: Example Use the formula to calculate the Z-value: Z = (PW – PM) / s.e. And estimate the standard error as: s.e. = sqrt[ Pboth(1 – Pboth)(1/NW + 1/NM) ]

Difference in Proportions: Example First: Calculate Pboth: Pboth = (86 × .70 + 80 × .78) / (86 + 80) = 122.6 / 166 = .739

Difference in Proportions: Example Plug in Pboth = .739: s.e. = sqrt[ (.739)(.261)(1/86 + 1/80) ] = sqrt(.00466) = .0682

Difference in Proportions: Example Finally, plug in the S.E. and calculate Z: Z = (.70 – .78) / .0682 = –1.17

Difference in Proportions: Example Results: Critical Z = 1.96 Observed Z = –1.17 Conclusion: |Z| = 1.17 is less than 1.96, so we can’t reject the null hypothesis Women and men do not clearly differ in approval of Bush
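The whole worked example can be sketched in a few lines (my own illustration, not from the lecture), using the pooled standard error defined above:

```python
# Two-sample test for a difference in proportions with a pooled standard error.
from math import sqrt

n_w, p_w = 86, 0.70   # women: sample size, proportion approving
n_m, p_m = 80, 0.78   # men:   sample size, proportion approving

# Pooled proportion ("P_both") across both samples
p_pool = (n_w * p_w + n_m * p_m) / (n_w + n_m)

# Standard error of the difference under H0 (equal proportions)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_m))

z = (p_w - p_m) / se
print(round(p_pool, 3), round(se, 3), round(z, 2))
# |z| ≈ 1.17 < 1.96, so H0 (equal population proportions) is not rejected
```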