Statistical Analysis of Categorical Variables


1 Statistical Analysis of Categorical Variables
Cross Tabulation

2 To date…
We have examined statistical tests for differences of means and proportions. Those statistics are measured at the interval level.

3 New Test…
Now we wish to examine statistical tests for questions involving nominal and ordinal variables. To do so we introduce the Chi Square Test.

4 Cross Tabulation
We are interested in counting the number of cases for the categories of one variable in terms of the categories of a second variable. Implicitly, we are asking if there are differences in the patterns of the counts.

5 Cross Tabulation and Chi Square Test
A cross tabulation cross classifies one variable by another variable. Below is a cross classification of occupational groups and wards for the Simon data for 1905.

6 Cross Tabulation and Chi Square Test
We count the number of cases in each occupational category for each ward. At the edges of the table we total the rows and columns.

7 Graphic Illustration of the Counts of Occupational Groups by Ward

8 The Simplest Example….a 2 by 2 Table
Do the opinions of men and women differ on the War in Iraq? Do the opinions of men and women differ on the importance of capturing Osama Bin Laden? Data: September 2006 ABC News Poll on the War on Terror. A Sample of about 1000 respondents.

9 Basic Frequencies

10 Basic Frequencies Broken Down by Gender

11 Another Graphical Illustration: 4 Bins of Counts

12 Tabular Data

13 Some Terms and Assumptions
Cell frequency: number in the body of the table
Marginal total: total of the row or the column
Row percent: the proportion of cases in the cell for the particular row
Column percent: the proportion of cases in the cell for the particular column
Expected frequency: the number of cases expected based upon the marginal proportions
Deviation: the difference between the expected frequency and the actual frequency

14 Tabular Data: Cell Frequencies and Marginals
Q12 War worth fighting NET * Q921 GENDER Crosstabulation (Count)

                          Male   Female   Total
Worth fighting NET         213      217     430
Not worth fighting NET     245      307     552
Total                      458      524     982

The four counts in the body of the table are the cell frequencies; the Total row and the Total column are the marginals.
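The marginals and the row and column percents for the table above can be reproduced with a few lines of Python. This is a minimal sketch: only the four counts come from the slide, and the function names (`marginals`, `row_percents`, `column_percents`) are illustrative choices, not part of the original presentation.

```python
# Observed counts from the slide: rows = opinion, columns = gender (Male, Female).
observed = [
    [213, 217],  # Worth fighting
    [245, 307],  # Not worth fighting
]


def marginals(table):
    """Return (row_totals, column_totals, grand_total) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def row_percents(table):
    """Each cell as a percentage of its row total."""
    row_totals, _, _ = marginals(table)
    return [[100.0 * cell / rt for cell in row]
            for row, rt in zip(table, row_totals)]


def column_percents(table):
    """Each cell as a percentage of its column total."""
    _, col_totals, _ = marginals(table)
    return [[100.0 * cell / ct for cell, ct in zip(row, col_totals)]
            for row in table]


row_totals, col_totals, grand_total = marginals(observed)
print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Grand total:", grand_total)
```

The totals printed here should match the marginals on the slide (430 and 552 for the rows, 458 and 524 for the columns, 982 overall).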

15 Table Counts and Graph
Cell counts shown in the graph: 213, 245, 217, 307.

16 Row Percents

17 Column Percents

18 Frequencies, Row and Column Percents

19 New Concept: Expected Frequencies
What would the counts in the cells be if gender had no impact on attitudes towards the Iraq War? The marginal proportions alone would define the cell counts.

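20 Expected Frequencies
Expected Frequency = Row Total * Column Total / Grand Total
Or, equivalently:
Row Proportion * Column Total (where Row Proportion = Row Total / Grand Total)
Column Proportion * Row Total (where Column Proportion = Column Total / Grand Total)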
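As a worked illustration of the expected-frequency rule, the sketch below applies Row Total * Column Total / Grand Total to the Iraq War by gender counts from the earlier slide. The function name `expected_frequencies` is a hypothetical helper introduced here for illustration.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


# Observed counts from the slide: rows = opinion, columns = (Male, Female).
observed = [
    [213, 217],  # Worth fighting
    [245, 307],  # Not worth fighting
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(cell, 2) for cell in row])
```

Note that the expected counts keep the same row and column totals as the observed table; only the distribution inside the table changes.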

21 Another Example: The Importance of Capturing Osama Bin Laden

22

23 Frequencies by Gender

24 Frequencies By Gender

25 Row Percents

26 Column Percents

27

28 Expected Frequencies

29 Actual Frequencies, Expected Frequencies, and Deviations (Residual)

30 Chi Square
Chi Square = Sum over all cells of [ (Observed - Expected)^2 / Expected ]
Chi Square Table:
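The Chi Square sum can be sketched directly from its definition: compute the expected count for each cell from the marginals, square the deviation, divide by the expected count, and add the terms up. The helper name `chi_square` is an assumption for illustration; the counts are the Iraq War by gender table from the earlier slide.

```python
def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (observed - expected) ** 2 / expected
    return total


# Observed counts from the slide: rows = opinion, columns = (Male, Female).
iraq_by_gender = [
    [213, 217],  # Worth fighting
    [245, 307],  # Not worth fighting
]
print(round(chi_square(iraq_by_gender), 3))
```

Because the deviation is squared, the sign convention (Observed minus Expected, or the reverse) does not affect the result.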

31 Examples of Chi Square Distribution
see also

32 Degrees of Freedom for Chi Square
Degrees of Freedom = (r-1) * (c-1), where r is the number of rows and c is the number of columns.
So, a 2 by 2 table has 1 degree of freedom.
A 3 by 2 table has (3-1)(2-1) = 2 degrees of freedom.
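The degrees-of-freedom rule is simple enough to express as a one-line function. This is an illustrative sketch; `degrees_of_freedom` is a name chosen here, not one used in the presentation.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a cross tabulation with the given shape."""
    if rows < 2 or cols < 2:
        raise ValueError("a cross tabulation needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


print(degrees_of_freedom(2, 2))  # 2 by 2 table
print(degrees_of_freedom(3, 2))  # 3 by 2 table
print(degrees_of_freedom(5, 4))  # five occupational groups by four wards
```

The last call corresponds to the occupation by ward table in this presentation, assuming it has five occupational rows and four ward columns.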

33 Calculations: Catching Osama bin Laden by Gender
640.09/198.3 =
640.09/220.7 =
640.09/251.7 =
640.09/280.3 =
Chi Square (SUM) =
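The four terms listed above can be evaluated and summed with a short script. This sketch uses only the numbers on the slide: 640.09 is taken to be the squared deviation (in a 2 by 2 table every cell has the same squared deviation), and the four divisors are taken to be the expected frequencies.

```python
# Squared deviation, (Observed - Expected)^2, as it appears in each term on the slide.
squared_deviation = 640.09

# The four divisors on the slide, read here as the expected frequencies.
expected = [198.3, 220.7, 251.7, 280.3]

terms = [squared_deviation / e for e in expected]
chi_square = sum(terms)

for e, term in zip(expected, terms):
    print(f"{squared_deviation}/{e} = {term:.2f}")
print(f"Chi Square (SUM) = {chi_square:.2f}")
```

The sum comes to roughly 10.95. With 1 degree of freedom, the conventional 0.05 critical value from a Chi Square table is 3.84, so a sum of this size would exceed it.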

34 Attitudes toward Iraq War by Gender

35 Chi Square Test
For a larger table, the calculation is the same, but the number of terms increases. The number of terms is equal to the number of cells.

36 Concentration of Occupational Groups by Ward

37 Cross Tabulation
Are the occupational patterns different in the four wards? Or are the patterns a result of chance (the null hypothesis)? How would we decide?

38 Illustration: Frequencies and Marginals

39 Row and Column Percents

40 Expected and Actual Frequencies
Actual frequencies: OCC$ (rows) by WARD (columns), with row and column totals. Rows: profcler, prop, skilled, skillpart, unskilled, Total.
Expected values: OCC$ (rows) by WARD (columns). Rows: profcler, prop, skilled, skillpart, unskilled.

41 Deviates
Deviates: (Observed - Expected)
OCC$ (rows) by WARD (columns). Rows: profcler, prop, skilled, skillpart, unskilled.

42 Calculations
Columns: Case number, OCC$, WARD, FREQUENCY, EXPECTED, RESIDUAL, CHITERM
One case per cell: four cases each for profcler, prop, skilled, skillpart, and unskilled.

43 Review: Terms and Assumptions
Cell frequency: number in the body of the table
Marginal total: total of the row or the column
Row percent: the proportion of cases in the cell for the particular row
Column percent: the proportion of cases in the cell for the particular column
Expected frequency: the number of cases expected based upon the marginal proportions
Deviation: the difference between the expected frequency and the actual frequency

44 Strength of Relationships
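Phi: Square root of (Chi Square / n)
Cramer’s V: Square root of (Chi Square / (n * min(r-1, c-1)))
Contingency Coefficient: Square root of (Chi Square / (Chi Square + n))
Here n is the total number of cases, r the number of rows, and c the number of columns.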
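The three strength-of-relationship measures can be sketched as small functions of the Chi Square value, the number of cases n, and the table shape. The function names are illustrative, and the values in the usage lines are made-up round numbers chosen only to make the arithmetic easy to check.

```python
import math


def phi(chi_square, n):
    """Phi: square root of Chi Square divided by the number of cases."""
    return math.sqrt(chi_square / n)


def cramers_v(chi_square, n, rows, cols):
    """Cramer's V: Chi Square scaled by n times the smaller of (rows-1, cols-1)."""
    return math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))


def contingency_coefficient(chi_square, n):
    """Contingency coefficient: square root of Chi Square / (Chi Square + n)."""
    return math.sqrt(chi_square / (chi_square + n))


# Illustrative values only (not taken from the presentation's data).
print(phi(9.0, 100))
print(cramers_v(9.0, 100, rows=2, cols=2))
print(contingency_coefficient(9.0, 100))
```

For a 2 by 2 table, min(r-1, c-1) is 1, so Cramer's V reduces to Phi.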


