Presentation is loading. Please wait.

Presentation is loading. Please wait.

Unit 3 Relations in Categorical Data. Looking at Categorical Data Grouping values of quantitative data into specific classes We use counts or percents.

Similar presentations


Presentation on theme: "Unit 3 Relations in Categorical Data. Looking at Categorical Data Grouping values of quantitative data into specific classes We use counts or percents."— Presentation transcript:

1 Unit 3 Relations in Categorical Data

2 Looking at Categorical Data Grouping values of quantitative data into specific classes We use counts or percents to describe information falling into various categories

3 Contingency (Two-Way) Tables A contingency table allows us to look at two categorical variables together. It shows how individuals are distributed along each variable, contingent on the value of the other variable. –Example: we can examine the class of ticket and whether a person survived the Titanic:

4 Contingency Tables (cont.) The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables. Each frequency distribution is called a marginal distribution of its respective variable.

5 Contingency Tables (cont.) Each cell of the table gives the count for a combination of values of the two values. –For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sunk.

6 Conditional Distributions A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable. –The following is the conditional distribution of ticket Class, conditional on having survived:

7 Conditional Distributions (cont.) –The following is the conditional distribution of ticket Class, conditional on having perished:

8 Conditional Distributions (cont.) The conditional distributions tell us that there is a difference in class for those who survived and those who perished. This is better shown with pie charts of the two distributions:

9 Conditional Distributions (cont.) We see that the distribution of Class for the survivors is different from that of the non-survivors. This leads us to believe that Class and Survival are associated and are not independent. The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

10 Segmented Bar Charts A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles. Question: Why a bar chart and not a histogram?

11 Example A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level. Find the marginal distribution of blood pressure level. Age BP Under 3030-49Over 50 Low273831 Normal489092 High235972

12 Example A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level. Find the conditional distribution of blood pressure level for employees under 30. Age BP Under 3030-49Over 50 Low273831 Normal489092 High235972

13 Simpson’s Paradox Example: Surgical survival rates –Hospital A and Hospital B both serve your community. Which is better? Compare. Hospital AHospital B Died6316 Survived2037784

14 Simpson’s Paradox Example: Surgical survival rates –Hospital A and Hospital B both serve your community. Which is better? Compare. Hospital AHospital B Died6316 Survived2037784 Hospital AHospital B Died6316 Survived2037784 Total2100800

15 Simpson’s Paradox Example: Surgical survival rates –Hospital A and Hospital B both serve your community. Which is better? Compare. Hospital A loses 3% (63/2100) and Hospital B loses 2% (16/800). Obviously, Hospital B is the better choice. Hospital AHospital B Died6316 Survived2037784 Hospital AHospital B Died6316 Survived2037784 Total2100800

16 Simpson’s Paradox Not all surgery cases are equally serious. –Patients classified as “poor” or “good” condition before surgery Good condition Poor condition Only 1% of “good” (6/600) died in Hospital A, while 1.3% (8/600) in B. And, 3.8% of “poor” (57/1500) died in A, while 4% (8/200) died in B. Which is really better? Hospital AHospital B Died68 Survived594592 Total600 Hospital AHospital B Died578 Survived1443192 Total1500200

17 Simpson’s Paradox How can A do better in each group, yet do worse overall? –Original table was misleading because it did not show original patient condition –Hospital A took on more patients in poor condition and thus had the overall higher death rate

18 Simpson’s Paradox Simpson’s Paradox: –The reversal of the direction of a comparison or an association when data from several groups are combined to form a single group. –We are dealing with lurking categorical variables.


Download ppt "Unit 3 Relations in Categorical Data. Looking at Categorical Data Grouping values of quantitative data into specific classes We use counts or percents."

Similar presentations


Ads by Google