Contingency Tables and Association
Lesson 12-2
Objectives
Compute the marginal distribution of a variable
Use the conditional distribution to identify association among categorical data
Vocabulary
Contingency Table – a table that relates two categorical variables, giving a count for each combination of a row category and a column category
Marginal Distribution – the frequency or relative frequency distribution of either the row variable or the column variable alone, read from the totals in the margins of the contingency table
Conditional Distribution – lists the relative frequency of each category of one variable, given a specific value of the other variable in the contingency table
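As a minimal sketch of the two definitions, assuming the contingency table is stored as a list of rows of counts, the Python below computes a marginal distribution (each row or column total divided by the grand total) and the conditional distribution of the row variable given one column (each cell divided by that column total). The function names are hypothetical, chosen for illustration; the counts are those of Problem #11 (in thousands).

```python
# Counts (in thousands) from Problem #11: rows are age groups, columns are years.
ages = ["<= 19", "20-24", ">= 25"]
years = [1990, 1995, 2000]
counts = [
    [369, 274, 244],
    [532, 441, 430],
    [713, 645, 640],
]


def marginal_distribution(table, by="row"):
    """Relative frequency of each row total (by="row") or column total (by="column")."""
    if by == "row":
        totals = [sum(row) for row in table]
    else:
        totals = [sum(col) for col in zip(*table)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def conditional_distribution(table, column):
    """Relative frequency of each row category, given one column category."""
    col = [row[column] for row in table]
    col_total = sum(col)
    return [c / col_total for c in col]


# Marginal distribution of age (row totals / grand total)
for age, p in zip(ages, marginal_distribution(counts, by="row")):
    print(f"{age:>6}: {p:.2%}")

# Conditional distribution of age, given the year 1990 (cell / 1990 total)
for age, p in zip(ages, conditional_distribution(counts, 0)):
    print(f"{age:>6} | 1990: {p:.2%}")
```

The first loop prints 20.69%, 32.72% and 46.60%, matching the Total column of Problem #11; the second prints 22.86%, 32.96% and 44.18%, matching the 1990 column.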
Requirements
To describe the association between two categorical variables, use relative frequencies (percentages) rather than raw counts, because the categories will usually contain different numbers of observations.
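To see why raw counts can mislead, the sketch below (using the Problem #11 counts, in thousands) compares the 20 - 24 age group across years. Its count falls from 532 to 430, but because each year's total also falls, its share of the year's total barely changes. The variable names are illustrative only.

```python
# Problem #11 counts (in thousands): the 20-24 age group and each year's total.
years = [1990, 1995, 2000]
count_20_24 = [532, 441, 430]
year_totals = [1614, 1360, 1314]

# Relative frequency (conditional share) = cell count / column (year) total
shares = [count / total for count, total in zip(count_20_24, year_totals)]

for year, count, share in zip(years, count_20_24, shares):
    print(f"{year}: count = {count}, share of year total = {share:.2%}")
```

The count drops by 102 thousand between 1990 and 2000, yet the share moves by less than one percentage point, so comparing counts alone would overstate the change for this age group.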
Contingency Table
[Figure: M&M contingency table. Cat 1 (rows) is color: Red, Blue, Yellow, Brown, Green. Cat 2 (columns) is type: Plain, Peanut, Almond, plus a Total column. The interior cells are labeled as the conditional distributions; the totals by color of M&M and the totals by type of M&M are labeled as the marginal distributions.]
Conditional Distribution graph
Problem #11, page 650: Abortions (in thousands) completed in a year, by age and year

Age (yrs)   1990            1995            2000            Total
≤ 19        369 (22.86%)    274 (20.15%)    244 (18.57%)    887 (20.69%)
20 - 24     532 (32.96%)    441 (32.43%)    430 (32.72%)    1403 (32.72%)
≥ 25        713 (44.18%)    645 (47.43%)    640 (48.71%)    1998 (46.60%)
Total       1614 (37.64%)   1360 (31.72%)   1314 (30.64%)   4288

Percentages in the Total row and Total column represent the marginal distributions (total / grand total of 4288).
Percentages in the other cells represent the conditional distributions of age given year (cell / column total).
Observations: the yearly totals decrease (1614, 1360, 1314); the conditional percentage for ages ≤ 19 decreases; the conditional percentage for ages ≥ 25 increases; the conditional percentage for ages 20 - 24 stays roughly constant (about 32% to 33%).
[Figure: Conditional distribution graph]
[Figure: An alternative graph]
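The observations above can be checked with a short sketch: compute each age group's conditional percentage (cell / column total) for every year, then label the sequence. The `trend` helper is hypothetical, and its 1-percentage-point tolerance for "roughly constant" is an assumed threshold, not something stated in the problem.

```python
# Problem #11 counts (in thousands): age group -> counts for 1990, 1995, 2000.
counts = {
    "<= 19": [369, 274, 244],
    "20-24": [532, 441, 430],
    ">= 25": [713, 645, 640],
}
years = [1990, 1995, 2000]

# Column (year) totals, summed over the age groups
year_totals = [sum(col) for col in zip(*counts.values())]


def trend(values, tolerance=1.0):
    """Label a sequence as increasing, decreasing, roughly constant, or mixed.

    tolerance is in percentage points (an assumed threshold).
    """
    if max(values) - min(values) <= tolerance:
        return "roughly constant"
    if all(a < b for a, b in zip(values, values[1:])):
        return "increasing"
    if all(a > b for a, b in zip(values, values[1:])):
        return "decreasing"
    return "mixed"


conditional = {}
for age, row in counts.items():
    # Conditional percentage of this age group within each year
    conditional[age] = [100 * c / t for c, t in zip(row, year_totals)]
    pcts = ", ".join(f"{p:.2f}%" for p in conditional[age])
    print(f"{age:>6}: {pcts} -> {trend(conditional[age])}")
```

With these counts the ≤ 19 group is labeled decreasing, the ≥ 25 group increasing, and the 20 - 24 group roughly constant, in line with the observations in the table slide.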
Summary and Homework
Summary
Contingency tables summarize categorical data that have a specific structure:
  A row variable
  A column variable
  A count for each combination of row-variable value and column-variable value
Row and column totals, and row and column relative frequencies, can be used to summarize these data.
Homework
pg 647 – 651: 1, 3, 4, 7, 11, 13
Comments
Since the data are population data, no inferential statistical comparisons are done.
Since much of the data is observational, avoid making any claims about causation.
Even Homework Answers
4: Since each category can contain a different total number of observations, the only safe way to compare categories is through percentages (relative frequencies).