Different Scales, Different Measures of Association


1 Different Scales, Different Measures of Association
Scale of Both Variables    Measures of Association
Nominal Scale              Pearson Chi-Square: χ2
Ordinal Scale              Spearman’s rho
Interval or Ratio Scale    Pearson r

2 Chi-Square (χ2) and Frequency Data
Up to this point, the inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer. We used these scores to make inferences about population means. To be sure, not all research questions involve score data. Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale. The test statistic for frequency data is the Pearson Chi-Square. The magnitude of the Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

3 1. Determine Appropriate Test
Chi-square is used when both variables are measured on a nominal scale. It can also be applied to interval or ratio data that have been categorized into a small number of groups. It assumes that the observations are randomly sampled from the population and that all observations are independent (an individual can appear only once in a table and there are no overlapping categories). It does not make any assumptions about the shape of the distribution or about the homogeneity of variances.

4 Non Parametric Test
Population is not normally distributed (highly skewed)
Option 1: Increase Sample Size
Option 2: Use Non Parametric test
Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis.

5 Assumptions / Limitations
Data is from a random sample.
A sufficiently large sample size is required (at least 20).
Actual count data are used (not percentages).
Adequate cell sizes should be present (>5 in all cells; if fewer are present, apply Yates correction).
Observations must be independent.
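The cell-size rule and the Yates correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 by 2 table is hypothetical, the ">5" rule is read as applying to expected counts, and the helper names `expected_counts` and `chi_square_2x2` are made up for this sketch.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates correction."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for fo, fe in zip(obs_row, exp_row):
            diff = abs(fo - fe)
            if yates:
                # Yates correction: shrink each absolute deviation by 0.5
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / fe
    return total


# Hypothetical counts (not from the slides); n = 30 meets the "at least 20" rule
hypothetical = [[2, 8], [7, 13]]
small_cells = [fe for row in expected_counts(hypothetical) for fe in row if fe <= 5]
plain = chi_square_2x2(hypothetical)
corrected = chi_square_2x2(hypothetical, yates=True)
print(f"cells with expected count <= 5: {len(small_cells)}")
print(f"uncorrected = {plain:.3f}, Yates-corrected = {corrected:.3f}")
```

The corrected statistic is always the smaller of the two, which makes the test more conservative when cells are sparse.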

6 2. Establish Level of Significance
α is a predetermined value.
The convention: α = .05, α = .01, α = .001

7 3. Determine The Hypothesis: Whether There is an Association or Not
Ho : The two variables are independent
Ha : The two variables are associated

8 4. Calculating Test Statistics
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

9 4. Calculating Test Statistics
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
where $f_o$ = observed frequencies and $f_e$ = expected frequency, summed over all cells of the table.
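As a minimal sketch, the statistic is a sum over cells of the squared gap between observed and expected frequency, divided by the expected frequency. The function name and the example counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts: 40 and 60 observed where 50 and 50 were expected
stat = chi_square_statistic([40, 60], [50, 50])
print(stat)  # (100 / 50) + (100 / 50) = 4.0
```

When observed and expected frequencies agree exactly the statistic is 0, and it grows as the discrepancy grows.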

10 5. Determine Degrees of Freedom
df = (R-1)(C-1)
R = Number of levels in row variable
C = Number of levels in column variable

11 6. Compare computed test statistic against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable. The critical tabled values are based on sampling distributions of the Pearson chi-square statistic. If the calculated χ2 is greater than the tabled χ2 value, reject Ho.
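The decision rule can be sketched as a table lookup. The critical values below are the usual tabled values at α = .05 for df = 1 to 4; the names `CRITICAL_05`, `degrees_of_freedom` and `reject_null`, and the two test statistics, are hypothetical.

```python
# Tabled critical values of chi-square at alpha = .05 for df = 1..4
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for an R by C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(statistic, df, critical=CRITICAL_05):
    """Reject Ho when the computed statistic exceeds the tabled value."""
    return statistic > critical[df]


df = degrees_of_freedom(2, 3)
print(df, reject_null(6.5, df), reject_null(4.0, df))
```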

12 Example
Suppose a researcher is interested in voting preferences on the Food Bill. A questionnaire was developed and sent to a random sample of 90 voters. The researcher also collects information about the political party membership of the sample of 90 respondents.

13 Bivariate Frequency Table or Contingency Table
            Favor   Neutral   Oppose   f row
Congress    10      10        30       50
BJP         15      15        10       40
f column    25      25        40       n = 90

14 Bivariate Frequency Table or Contingency Table
Observed frequencies: the counts in the body of the table, one per combination of party and response.

15 Bivariate Frequency Table or Contingency Table
Row frequency: the f row totals, one per party (Congress 50, BJP 40).

16 Bivariate Frequency Table or Contingency Table
Column frequency: the f column totals, one per response category; together they sum to n = 90.

17 1. Determine Appropriate Test
Party Membership (2 levels): Nominal
Voting Preference (3 levels): Nominal

18 2. Establish Level of Significance
Alpha of .05

19 3. Determine The Hypothesis
Ho : There is no association between responses to the Food Bill survey and the party membership in the population.
Ha : There is an association between responses to the Food Bill survey and the party membership in the population.

20 4. Calculating Test Statistics
            Favor                 Neutral               Oppose                f row
Congress    fo = 10, fe = 13.9    fo = 10, fe = 13.9    fo = 30, fe = 22.2    50
BJP         fo = 15, fe = 11.1    fo = 15, fe = 11.1    fo = 10, fe = 17.8    40
f column    25                    25                    40                    n = 90

21 4. Calculating Test Statistics
Expected frequency for the Congress, Favor cell: fe = (f row × f column) / n = 50*25/90 = 13.9

22 4. Calculating Test Statistics
Expected frequency for the BJP, Favor cell: fe = (f row × f column) / n = 40*25/90 = 11.1
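The same arithmetic can be repeated for every cell. The sketch below assumes column totals of 25, 25 and 40; the Neutral and Oppose totals are not legible in the transcript and are inferred from the expected frequencies shown (for example 22.2 = 50 × 40 / 90), so treat them as an assumption.

```python
# Marginal totals; the Neutral and Oppose column totals are inferred (assumption)
row_totals = {"Congress": 50, "BJP": 40}
col_totals = {"Favor": 25, "Neutral": 25, "Oppose": 40}
n = 90

# Expected frequency of a cell = row total * column total / n
expected = {
    (party, response): r * c / n
    for party, r in row_totals.items()
    for response, c in col_totals.items()
}

for (party, response), fe in expected.items():
    print(f"{party:9s} {response:8s} fe = {fe:.1f}")
```

The expected frequencies always add back up to n, which is a quick check on the arithmetic.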

23 4. Calculating Test Statistics
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 11.03$

24 5. Determine Degrees of Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2

25 6. Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
The test statistic, 11.03, exceeds the critical value, so the null hypothesis is rejected.
Congress and BJP members differ significantly in their opinions on the Food Bill.
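The whole test for this example can be reproduced as a rough sketch. The Neutral counts and the BJP Oppose count are not legible in the transcript; the values used here (10, 15 and 10) are inferred from the row totals and expected frequencies, and they do reproduce the reported statistic of about 11.03. The p-value line uses the closed form exp(-χ2/2), which holds only for df = 2.

```python
import math

# Observed counts (Favor, Neutral, Oppose); some cells are inferred (assumption)
observed = [
    [10, 10, 30],  # Congress
    [15, 15, 10],  # BJP
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (fo - fe)^2 / fe
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_square += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi_square / 2)  # upper-tail probability, valid for df = 2 only

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.4f}")
print("reject Ho" if chi_square > 5.991 else "fail to reject Ho")
```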

26 The chi-square test will be used to test for the "goodness of fit" between observed and expected data
The manager of ABC ice-cream parlour has to decide how much of each flavor of ice-cream he should stock so that the demands of the customers are satisfied. The ice-cream supplier claims that among the four most popular flavors, 62% of customers prefer vanilla, 18% chocolate, 12% strawberry and 8% mango. A random sample of 200 customers produced the results below. At the alpha = 0.05 significance level, test the claim that the percentages given by the supplier are correct.
Flavor              Vanilla   Chocolate   Strawberry   Mango
Number Preferring   120       40          18           22

27
Flavor       Observed Frequency   Expected Frequency   O-E   (O-E)2   (O-E)2/E
Vanilla      120                  124                  -4    16       0.129
Chocolate    40                   36                   4     16       0.444
Strawberry   18                   24                   -6    36       1.500
Mango        22                   16                   6     36       2.250
Total                                                                 4.323
The computed value of chi-square is 4.323.
Degrees of freedom = number of categories - 1 = 4 - 1 = 3.
H0: There is no difference between observed frequency and expected frequency.
H1: There is a difference between observed frequency and expected frequency.
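A short sketch of the goodness-of-fit calculation follows. The critical value 7.815 is the usual tabled value for df = 3 at α = .05, and the helper `chi2_sf_df3` is a hypothetical name for the closed-form upper-tail probability that applies to 3 degrees of freedom only. Under these assumptions the statistic falls below the critical value, so H0 would not be rejected and the sample gives no evidence against the supplier's percentages.

```python
import math

observed = {"Vanilla": 120, "Chocolate": 40, "Strawberry": 18, "Mango": 22}
claimed = {"Vanilla": 0.62, "Chocolate": 0.18, "Strawberry": 0.12, "Mango": 0.08}

n = sum(observed.values())
# Expected frequency = claimed share * sample size
expected = {flavor: share * n for flavor, share in claimed.items()}
chi_square = sum(
    (observed[f] - expected[f]) ** 2 / expected[f] for f in observed
)
df = len(observed) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


CRITICAL_05_DF3 = 7.815  # tabled critical value, alpha = .05, df = 3
p_value = chi2_sf_df3(chi_square)
reject = chi_square > CRITICAL_05_DF3

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}, reject H0: {reject}")
```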

28 SPSS Output for Gun Control Example

29 Additional Information in SPSS Output
Exceptions that might distort χ2 assumptions:
  Associations in some but not all categories
  Low expected frequency per cell
Extent of association is not the same as statistical significance (demonstrated through an example)

30 Another Example Heparin Lock Placement
Time: 1 = 72 hrs, 2 = 96 hrs (from Polit text, Table 8-1)

31 Hypotheses in Heparin Lock Placement
Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent.)
Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related.)

32 Continued More of SPSS Output

33 Pearson Chi-Square
Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time. Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
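The reported p-value can be checked by hand. For a 2 by 2 table df = 1, and in that case the upper-tail probability of chi-square has the closed form erfc(sqrt(χ2 / 2)). The sketch below assumes df = 1 for this table; the function name is hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


p_value = chi2_sf_df1(0.250)
print(f"p = {p_value:.3f}")
print("fail to reject Ho" if p_value > 0.05 else "reject Ho")
```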

34 Continued More SPSS Output

35 Phi Coefficient
Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship. The phi coefficient is the measure of the strength of the association.
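For a 2 by 2 table the phi coefficient is commonly computed as the square root of χ2 divided by the number of cases. The sketch below uses that definition; the sample size of 100 is hypothetical, because the transcript does not give n for the heparin lock data.

```python
import math


def phi_coefficient(chi_square, n):
    """Phi for a 2x2 table: sqrt(chi-square / n)."""
    return math.sqrt(chi_square / n)


# chi-square = .250 is from the SPSS output; n = 100 is a hypothetical sample size
phi = phi_coefficient(0.250, 100)
print(f"phi = {phi:.3f}")
```

A value this close to 0 would indicate a very weak association, in line with the non-significant test.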

36 Cramer’s V
When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V. If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.

37 Cramer’s V
$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$
where $N$ = number of cases and $k$ = smallest of number of rows or columns.
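As an illustration, Cramer's V can be computed for the Food Bill example from the values reported earlier (χ2 = 11.03, n = 90, a table of 2 rows by 3 columns). The function name is hypothetical, and the formula used is the standard one: the square root of χ2 over N times (k - 1), with k the smaller of the number of rows and columns.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


v = cramers_v(11.03, 90, n_rows=2, n_cols=3)
print(f"Cramer's V = {v:.2f}")
```

For a 2 by 2 table k - 1 = 1, so V reduces to the phi coefficient.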

38 Take Home Lesson
How to Test Association between Frequency of Two Nominal Variables

