Describing Association for Discrete Variables. Discrete variables can have one of two different qualities: 1. ordered categories 2. non-ordered categories.


1. Ordered categories, e.g., “High,” “Medium,” and “Low” [both variables must be ordered]
2. Non-ordered categories, e.g., “Yes” and “No”

Relationships between two variables may be either 1. symmetrical or 2. asymmetrical

Symmetrical means that we are only interested in describing the extent to which two variables “hang around together” [non-directional]. Symbolically, X ↔ Y

Asymmetrical means that we want a measure of association that yields a different description of X’s influence on Y from Y’s influence on X [directional]. Symbolically, X → Y and Y → X

                               Ordered Categories
Asymmetrical Relationship      No                        Yes
No                             Yule’s Q, Cramer’s V      Gamma (G)
Yes                            Lambda (λ)                Somers’ d_yx

For symmetrical relationships between two non-ordered variables, there are two choices: 1. Yule’s Q (for 2x2 tables) 2. Cramer’s V (for larger tables)

Respondents in the 1997 General Social Survey (GSS 1997) were asked two questions: Were they strong supporters of any political party (yes or no)? Did they vote in the 1996 presidential election (yes or no)?

                      Party Identification
Voting Turnout     Not Strong     Strong     Total
Voted                  a             b       a + b
Not Voted              c             d       c + d
Total                a + c         b + d     a + b + c + d

                      Party Identification
Voting Turnout     Not Strong     Strong     Total
Voted                 615           339        954
Not Voted             318            59        377
Total                 933           398      1,331

Q = [(339)(318) - (615)(59)] / [(339)(318) + (615)(59)] = [(107,802) - (36,285)] / [(107,802) + (36,285)] = (71,517) / (144,087) = 0.496
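The calculation can be reproduced in a few lines. This is only a minimal sketch: the function name `yules_q` and the assignment of the four counts to cells a, b, c, and d are assumptions, inferred from the order of the cross-products in the worked calculation rather than stated on the slide.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with rows [a, b] and [c, d].

    Follows the cross-product order of the worked example:
    Q = (bc - ad) / (bc + ad). Undefined when both products are zero.
    """
    bc = b * c
    ad = a * d
    return (bc - ad) / (bc + ad)


# Assumed layout: a = voted / not strong, b = voted / strong,
# c = not voted / not strong, d = not voted / strong.
q = yules_q(a=615, b=339, c=318, d=59)
print(f"Q = {q:.3f}")  # prints Q = 0.496
```

Swapping the two cross-products only flips the sign of Q, so the magnitude does not depend on the assumed layout.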

What does this mean? Yule’s Q varies from 0.00 (statistical independence; no association) to +1.00 (perfect direct association) and –1.00 (perfect inverse association).

Use the following rule of thumb (for now):
0.00 to 0.24  "No relationship"
0.25 to 0.49  "Weak relationship"
0.50 to 0.74  "Moderate relationship"
0.75 to 1.00  "Strong relationship"
Yule’s Q = 0.496, which rounds to 0.50, "... represents a moderate positive association between party identification strength and voting turnout."
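The rule of thumb can be written as a small lookup. This sketch makes two assumptions that the slide does not state: the bands are applied to the absolute value of Q (so that inverse associations are labeled by strength), and Q is rounded to two decimals before the comparison, since the bands are given to two decimals. The name `describe_strength` is illustrative.

```python
def describe_strength(q):
    """Label the size of Q using the rule-of-thumb bands."""
    size = round(abs(q), 2)  # bands are stated to two decimals
    if size < 0.25:
        return "No relationship"
    if size < 0.50:
        return "Weak relationship"
    if size < 0.75:
        return "Moderate relationship"
    return "Strong relationship"


print(describe_strength(0.496))  # 0.496 rounds to 0.50
```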

                      Party Identification
Voting Turnout     Not Strong     Strong     Total
Voted                   0           954        954
Not Voted             377             0        377
Total                 377           954      1,331

What would be the value of Yule's Q? Q = [(954)(377) - (0)(0)] / [(954)(377) + (0)(0)] = [(359,658) - (0)] / [(359,658) + (0)] = (359,658) / (359,658) = 1.000

                      Party Identification
Voting Turnout     Not Strong     Strong     Total
Voted                 477           477        954
Not Voted             189           188        377
Total                 666           665      1,331

In this case, Yule's Q would be: Q = [(477)(189) - (477)(188)] / [(477)(189) + (477)(188)] = [(90,153) - (89,676)] / [(90,153) + (89,676)] = (477) / (179,829) = 0.003

Obviously Yule's Q can only be calculated for 2 x 2 tables. For larger tables (e.g., 3 x 4 tables having three rows and four columns), most statistical programs such as SAS report the Cramer's V statistic. Cramer's V has properties similar to Yule's Q, but since it is computed from χ² it cannot take negative values:

$$V = \sqrt{\frac{\chi^2}{N \cdot \min(R - 1,\; C - 1)}}$$

where min(R – 1, C – 1) means either the number of rows less one or the number of columns less one, whichever is smaller, and N is the sample size.

In the example above, χ² ≈ 50.97 and Cramer's V = 0.196
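As a cross-check, the Pearson chi-square and Cramer's V can be computed directly from the cell counts. This is a sketch under stated assumptions: the 2 x 2 counts are those appearing in the Yule's Q calculation, their placement in the table is inferred (it does not affect χ² or V), no continuity correction is applied, and the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """V = sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))


# Rows: voted, not voted. Columns: not strong, strong (assumed layout).
gss = [[615, 339], [318, 59]]
chi2, n = chi_square(gss)
v = cramers_v(gss)
print(f"chi-square = {chi2:.2f}, V = {v:.3f}")
```

For a 2 x 2 table min(R - 1, C - 1) is 1, so V reduces to the square root of χ²/N.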

For asymmetrical relationships between two non-ordered variables, the statistic of choice is: Lambda (λ)

Lambda is calculated as follows: λ = [(Non-modal responses on Y) - (Sum of non-modal responses for each category of X)] / (Non-modal responses on Y)

                      Party Identification
Voting Turnout     Not Strong     Strong     Total
Voted                 615           339        954
Not Voted             318            59        377
Total                 933           398      1,331

In this example, λ = [(377) - (318 + 59)] / (377) = [(377) - (377)] / (377) = (0) / (377) = 0.00
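The lambda formula translates directly into code. In this sketch the rows are the dependent variable Y (voting turnout) and the columns are the independent variable X (party identification); the function name and the cell layout of the example table are assumptions.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row variable (Y) from the column variable (X)."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Errors made when always guessing the modal category of Y
    non_modal_y = n - max(row_totals)
    # Errors made when guessing the modal category of Y within each category of X
    non_modal_within_x = sum(sum(col) - max(col) for col in zip(*table))
    return (non_modal_y - non_modal_within_x) / non_modal_y


# Rows: voted, not voted. Columns: not strong, strong (assumed layout).
gss = [[615, 339], [318, 59]]
lam = goodman_kruskal_lambda(gss)
print(f"lambda = {lam:.2f}")  # prints lambda = 0.00
```

Lambda is zero here because "voted" is the modal response in both columns, so knowing party identification does not change the best guess.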

For symmetrical relationships between two variables having ordered categories, the statistic of choice is: Gamma (G)

$$G = \frac{n_s - n_d}{n_s + n_d}$$

where n_s is the number of concordant pairs and n_d is the number of discordant pairs

The concepts of concordant and discordant pairs are simple and are based on a generalization of the diagonal and off-diagonal in the Yule’s Q statistic.

To construct concordant pairs: "Starting with the upper right cell (i.e., the first row, last column in the table), add together all frequencies in cells below AND to the left of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the left) and do the same thing. Repeat until there are NO cells to the left AND below the target cell. Then sum up all these products to form the value for the concordant pairs."

To illustrate, take the crosstabulation below, which shows the relationship between a measure of social class and respondents' satisfaction with their current financial situation:

                            Social Class
Financially Satisfied   Lower   Working   Middle   Upper   Total
Very well                 10      131       251      36      428
More or less              19      309       343      19      690
Not at all                43      190        84       7      324
Total                     72      630       678      62    1,442

For this table, the calculations are:
36 x (19 + 309 + 343 + 43 + 190 + 84) = 35,568
251 x (19 + 309 + 43 + 190) = 140,811
131 x (19 + 43) = 8,122
19 x (43 + 190 + 84) = 6,023
343 x (43 + 190) = 79,919
309 x (43) = 13,287
These are NOT the value of the concordant pairs; they are the values that must be added together to determine the value of concordant pairs.
n_s = (35,568 + 140,811 + 8,122 + 6,023 + 79,919 + 13,287)
n_s = 283,730
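The "below AND to the left" rule can be sketched as a double loop over the cells. The cell counts used here are those implied by the worked products and the totals in the text, and the function name is illustrative rather than taken from the slides.

```python
def concordant_pairs(table):
    """Sum over cells of (cell count) x (sum of counts below and to the left)."""
    n_rows, n_cols = len(table), len(table[0])
    total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            below_left = sum(
                table[r][c]
                for r in range(i + 1, n_rows)
                for c in range(j)
            )
            total += table[i][j] * below_left
    return total


# Rows: very well, more or less, not at all.
# Columns: lower, working, middle, upper.
finances = [
    [10, 131, 251, 36],
    [19, 309, 343, 19],
    [43, 190, 84, 7],
]
n_s = concordant_pairs(finances)
print(f"n_s = {n_s:,}")  # prints n_s = 283,730
```

Cells with nothing below and to the left (the first column and the last row) contribute zero, which is why only six products appear in the hand calculation.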

To construct discordant pairs: "Starting with the upper left cell (i.e., the first row, first column in the table), add together all frequencies in cells below AND to the right of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the right) and do the same thing. Repeat until there are NO cells to the right AND below the target cell. Then sum up all these products to form the value for the discordant pairs."

For the discordant pairs in this table, the calculations are:
10 x (309 + 343 + 19 + 190 + 84 + 7) = 9,520
131 x (343 + 19 + 84 + 7) = 59,343
251 x (19 + 7) = 6,526
19 x (190 + 84 + 7) = 5,339
309 x (84 + 7) = 28,119
343 x (7) = 2,401
Again, these are NOT the value of the discordant pairs; they are the values that must be added together to determine the value of discordant pairs.
n_d = (9,520 + 59,343 + 6,526 + 5,339 + 28,119 + 2,401)
n_d = 111,248
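The discordant count mirrors the concordant one, with "below AND to the right" in place of "below AND to the left". As before, this is a sketch: the cell counts are those implied by the worked products in the text, and the function name is an assumption.

```python
def discordant_pairs(table):
    """Sum over cells of (cell count) x (sum of counts below and to the right)."""
    n_rows, n_cols = len(table), len(table[0])
    total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            below_right = sum(
                table[r][c]
                for r in range(i + 1, n_rows)
                for c in range(j + 1, n_cols)
            )
            total += table[i][j] * below_right
    return total


# Rows: very well, more or less, not at all.
# Columns: lower, working, middle, upper.
finances = [
    [10, 131, 251, 36],
    [19, 309, 343, 19],
    [43, 190, 84, 7],
]
n_d = discordant_pairs(finances)
print(f"n_d = {n_d:,}")  # prints n_d = 111,248
```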

G = [(283,730) - (111,248)] / [(283,730) + (111,248)] = (172,482) / (394,978) = 0.437
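One way to double-check both counts is to compare every pair of cells directly instead of following the corner-by-corner procedure. This is an alternative technique, not the method in the slides; it keeps the slides' orientation (a pair is concordant when one cell lies below and to the left of the other), and the cell counts are those implied by the worked products in the text.

```python
from itertools import combinations


def gamma_from_table(table):
    """Return (n_s, n_d, G) by comparing all pairs of cells."""
    cells = [(i, j, n) for i, row in enumerate(table) for j, n in enumerate(row)]
    n_s = n_d = 0
    for (i1, j1, n1), (i2, j2, n2) in combinations(cells, 2):
        if i1 == i2 or j1 == j2:
            continue  # tied on at least one variable, so not counted in gamma
        if (i2 - i1) * (j2 - j1) < 0:
            n_s += n1 * n2  # second cell is below and to the left of the first
        else:
            n_d += n1 * n2  # second cell is below and to the right of the first
    return n_s, n_d, (n_s - n_d) / (n_s + n_d)


finances = [
    [10, 131, 251, 36],
    [19, 309, 343, 19],
    [43, 190, 84, 7],
]
n_s, n_d, g = gamma_from_table(finances)
print(f"n_s = {n_s:,}, n_d = {n_d:,}, G = {g:.3f}")
```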

For asymmetrical relationships between two variables having ordered categories, the statistic of choice is: Somers’ d_yx

For this crosstabulation, we specify Social Class (the column variable) as the independent variable (X) and Financial Satisfaction (the row variable) as the dependent variable (Y).

                                Social Class (X)
Financially Satisfied (Y)   Lower   Working   Middle   Upper   Total
Very well                     10      131       251      36      428
More or less                  19      309       343      19      690
Not at all                    43      190        84       7      324
Total                         72      630       678      62    1,442

Somers' d_yx statistic is created by adjusting concordant and discordant pairs for tied pairs on the dependent variable (Y). In the example we have been using, the only asymmetrical relationship that makes sense is one with the dependent variable (Y) as the row variable. Therefore Somers' d_yx will be shown only for this situation, that is, for tied pairs on the row variable. (Tied pairs for the column variable follow the identical logic.)

Tied pairs are pairs of respondents who are identical with respect to the category of the dependent variable but who differ on the category of the independent variable to which they belong. In the case of financial satisfaction, these are all respondents who express the same satisfaction level but who identify themselves with different social classes. In other words, for a dependent row variable, each observation is tied with all the observations in the other cells of the same row.

The computational rule is: Target the upper left-hand cell (in the first row, first column); multiply its value by the sum of the cell frequencies to the right in the same row; move to the cell to the right and multiply its value by the sum of the cell frequencies to the right in the same row; repeat until there are no more cells to the right in the same row; then move to the first cell in the next row (first column) and repeat until there are no more cells in the table having cells to the right. Add up these products.

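Here, the products are:
10 x (131 + 251 + 36) = 4,180
131 x (251 + 36) = 37,597
251 x (36) = 9,036
19 x (309 + 343 + 19) = 12,749
309 x (343 + 19) = 111,858
343 x (19) = 6,517
43 x (190 + 84 + 7) = 12,083
190 x (84 + 7) = 17,290
84 x (7) = 588
Thus, tied pairs (T_r) for rows equals
T_r = (4,180 + 37,597 + 9,036 + 12,749 + 111,858 + 6,517 + 12,083 + 17,290 + 588) = 211,898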

In this example, Somers' d_yx = [(283,730) - (111,248)] / [(283,730) + (111,248) + (211,898)] = (172,482) / (606,876) = 0.284
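The tied-pairs rule and the final ratio can be sketched as follows. The concordant and discordant totals are taken from the text; the cell counts are those implied by the worked products, and the function names are illustrative.

```python
def tied_pairs_on_rows(table):
    """Within each row, multiply every cell by the sum of the cells to its right."""
    total = 0
    for row in table:
        for j, count in enumerate(row):
            total += count * sum(row[j + 1:])
    return total


def somers_d_yx(n_s, n_d, t_r):
    """Somers' d with the row variable (Y) as dependent."""
    return (n_s - n_d) / (n_s + n_d + t_r)


finances = [
    [10, 131, 251, 36],
    [19, 309, 343, 19],
    [43, 190, 84, 7],
]
t_r = tied_pairs_on_rows(finances)
d_yx = somers_d_yx(n_s=283_730, n_d=111_248, t_r=t_r)
print(f"T_r = {t_r:,}, d_yx = {d_yx:.3f}")  # prints T_r = 211,898, d_yx = 0.284
```

Because the ties are added only to the denominator, d_yx can never exceed gamma in absolute value for the same table.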

                               Ordered Categories
Asymmetrical Relationship      No                        Yes
No                             Yule’s Q, Cramer’s V      Gamma (G)
Yes                            Lambda (λ)                Somers’ d_yx

Using SAS to Produce Two-Way Frequency Distributions and Statistics

libname mystuff 'a:\';
libname library 'a:\';
options formchar='|----|+|---+=|-/\ *' ps=66 nodate nonumber;
proc freq data=mystuff.marriage;
   /* EXPECTED adds expected cell counts; ALL requests the chi-square tests and measures of association */
   tables church*married / expected all;
   title1 'Crosstabulation for Discrete Variables';
run;

Crosstabulation for Discrete Variables
TABLE OF CHURCH BY MARRIED

Each cell of the SAS output lists Frequency, Expected, Percent, Row Pct, and Col Pct. The cell frequencies are:

                          MARRIED
CHURCH      Divorced   Married   Never   Separate   Widowed   Total
Annually       74        269      129       18         43       533
Monthly        30        149       50       10         26       265
Never          32         85       34        6         16       173
Weekly         34        289       63       17         80       483
Total         170        792      276       51        165     1,454
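The five numbers SAS prints in each cell all follow from the frequencies. The sketch below recomputes them for one cell in Python; it is not SAS output, and the dictionary layout and the function name `cell_summary` are assumptions made for illustration.

```python
columns = ["Divorced", "Married", "Never", "Separate", "Widowed"]
freq = {
    "Annually": [74, 269, 129, 18, 43],
    "Monthly": [30, 149, 50, 10, 26],
    "Never": [32, 85, 34, 6, 16],
    "Weekly": [34, 289, 63, 17, 80],
}


def cell_summary(freq, row_label, col_label):
    """Frequency, expected count, and the three percentages for one cell."""
    j = columns.index(col_label)
    n = sum(sum(row) for row in freq.values())
    row_total = sum(freq[row_label])
    col_total = sum(row[j] for row in freq.values())
    observed = freq[row_label][j]
    return {
        "frequency": observed,
        "expected": row_total * col_total / n,  # count expected under independence
        "percent": 100 * observed / n,
        "row_pct": 100 * observed / row_total,
        "col_pct": 100 * observed / col_total,
    }


summary = cell_summary(freq, "Annually", "Divorced")
for name, value in summary.items():
    print(f"{name:>9}: {value:.2f}")
```

The expected count is the row total times the column total divided by the sample size, which is the quantity the chi-square statistic compares against each observed frequency.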

Crosstabulation for Discrete Variables
STATISTICS FOR TABLE OF CHURCH BY MARRIED

Statistics reported with DF, Value, and Prob:
Chi-Square
Likelihood Ratio Chi-Square
Mantel-Haenszel Chi-Square

Statistics reported with Value:
Phi Coefficient
Contingency Coefficient
Cramer's V

Statistics reported with Value and ASE:
Gamma
Kendall's Tau-b
Stuart's Tau-c
Somers' D C|R
Somers' D R|C
Pearson Correlation
Spearman Correlation
Lambda Asymmetric C|R
Lambda Asymmetric R|C
Lambda Symmetric
Uncertainty Coefficient C|R
Uncertainty Coefficient R|C
Uncertainty Coefficient Symmetric

Sample Size = 1454

Exercise

Compute values for Lambda (λ), Gamma (G) and Somers' d_yx for the following two-way frequency distribution. Assume that the row variable, self-described health, is the dependent (Y) variable.

Columns (Education Degree Level): Less than H.S., H.S., Jr. Co., Col., Grad. Sch., Total
Rows (Self-Described Health): Excellent, Good, Fair, Poor, Total
Grand total: 1,458

Answers

1. The modal responses on Y (self-described health) are 696. Therefore, the non-modal responses are 1,458 - 696 = 762. Summed across the categories of education (X), the non-modal responses total 754. Therefore, Lambda = (762 - 754) / 762 = 8 / 762 = 0.010
2. Concordant pairs (n_s) = 320,060 and discordant pairs (n_d) = 130,272. Gamma = (320,060 - 130,272) / (320,060 + 130,272) = 189,788 / 450,332 = 0.421
3. Tied pairs (T_r) = 227,737. Therefore, Somers' d_yx = (320,060 - 130,272) / (320,060 + 130,272 + 227,737) = 189,788 / 678,069 = 0.280
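The three answers can be checked from the totals quoted above without the full table. This is a quick arithmetic sketch; the sample size of 1,458 is taken to be the sum of the modal (696) and non-modal (762) responses, and the variable names are illustrative.

```python
n = 1458                    # assumed: 696 modal + 762 non-modal responses
non_modal_y = n - 696       # errors when always guessing the modal health category
non_modal_within_x = 754    # errors when guessing the mode within each education level

lam = (non_modal_y - non_modal_within_x) / non_modal_y

n_s, n_d, t_r = 320_060, 130_272, 227_737
gamma = (n_s - n_d) / (n_s + n_d)
d_yx = (n_s - n_d) / (n_s + n_d + t_r)

print(f"lambda = {lam:.3f}, gamma = {gamma:.3f}, d_yx = {d_yx:.3f}")
```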