1
AP Statistics Chapter 3 Part 2
Displaying and Describing Categorical Data
2
Learning Objectives
Rubric:
Level 1 – Know the objectives.
Level 2 – Fully understand the objectives.
Level 3 – Use the objectives to solve simple problems.
Level 4 – Use the objectives to solve more advanced problems.
Level 5 – Adapt and apply the objectives to different and more complex problems.
Objectives:
Summarize the distribution of a categorical variable with a frequency table.
Display the distribution of a categorical variable with a bar chart or pie chart.
Recognize misleading statistics.
Know how to make and examine a contingency table.
Be able to make and examine a segmented bar chart of the conditional distribution of a variable for two or more categories.
3
Learning Objectives
Describe the distribution of a categorical variable in terms of its possible values and relative frequency.
Understand how to examine the association (independence or dependence) between categorical variables by comparing conditional and marginal percentages.
Know what Simpson’s paradox is and be able to recognize when it occurs.
4
Learning Objective 4: Contingency Table
We have already looked at how to summarize one categorical variable using a frequency or relative frequency table. When we are interested in a possible relationship between two variables, we organize the data into a two-way table called a contingency table.

After High School Plans, by Gender
Gender    4 Year College  2 Year College  Enlist  Total
Female                 4               2       5     11
Male                   7               8       3     18
5
Learning Objective 4: Association
The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association. An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
6
Learning Objective 4: Contingency Table
A contingency table, or two-way table:
Displays two categorical variables.
The rows list the categories of one variable.
The columns list the categories of the other variable.
Entries in the table are frequencies.
7
Learning Objective 4: Contingency Table
The table below presents Census Bureau data describing the age and sex of college students. This is a two-way table because it describes two categorical variables. (Age is categorical here because the students are grouped into age categories.) Age group is the row variable because each row in the table describes students in one age group. Sex is the column variable because each column describes one sex. The entries in the table are the counts of students in each age-by-sex class.
8
Learning Objective 4: Contingency Table
Discrepancies may appear in tabular data. For example, the sum of the entries in the “25 to 34” row is 1,904 + 1,589 = 3,493, but the entry in the total column is 3,494. The explanation is rounding error.
9
Learning Objective 4: Marginal Distribution
To best grasp the information contained in the table, first look at the distribution of each variable separately. The distributions of sex alone and age alone are called marginal distributions because they appear at the right and bottom margins of the two-way table. The distribution of a categorical variable says how often each outcome occurred. Usually it is advantageous to look at percents as opposed to counts.
10
Learning Objective 4: Calculating Marginal Distributions
When we do a marginal distribution, we only look at totals (the values found on the right margin or bottom margin). To obtain the marginal distributions, divide the column or row totals by the grand or table totals. This is usually expressed as a percentage.

Education, by Age Group
Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230
11
Learning Objective 4: Calculating Marginal Distributions - Example
Calculate the marginal distribution for Education (the row categorical variable). Divide each row total by the table total.

Education, by Age Group (thousands of persons)
Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230

Education                  Distribution
Did not complete HS               15.9%
Completed HS                      33.1%
1 to 3 years of college           25.4%
4+ years of college               25.6%
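The division on this slide can be checked with a few lines of Python. This is only a sketch: the function name `marginal_distribution` and the dictionary layout are illustrative choices, not part of the original slides, and the totals are copied from the table above.

```python
# Row totals (thousands of persons) copied from the Education by Age Group table.
row_totals = {
    "Did not complete HS": 27_853,
    "Completed HS": 58_087,
    "1 to 3 years of college": 44_445,
    "4+ years of college": 44_845,
}
table_total = 175_230  # grand total from the bottom-right cell of the table


def marginal_distribution(totals, grand_total):
    """Express each margin total as a percentage of the grand total (one decimal)."""
    return {label: round(100 * count / grand_total, 1) for label, count in totals.items()}


education_marginal = marginal_distribution(row_totals, table_total)
for label, pct in education_marginal.items():
    print(f"{label:<25} {pct:5.1f}%")
```

The same function would give the marginal distribution of age group if it were passed the column totals instead of the row totals.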
12
Learning Objective 4: Displaying Marginal Distributions - Example
Each marginal distribution from a two-way table is a distribution for a single categorical variable. We could use a pie graph or bar graph to display such a distribution.

Education                  Distribution
Did not complete HS               15.9%
Completed HS                      33.1%
1 to 3 years of college           25.4%
4+ years of college               25.6%
13
Learning Objective 4: Conditional Distribution
Marginal distributions tell us nothing about the relationship between two categorical variables. To examine the relationship between two categorical variables, we look at the conditional distributions. A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.

Education, by Age Group (thousands of persons)
Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230
14
Learning Objective 4: Calculating Conditional Distributions
The “conditional” part is worded like:
“on the condition the respondents are 35 to 54”
“among those who have completed high school but did not go to college”
“for those respondents over 55 years of age”

Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230
15
Learning Objective 4: Calculating Conditional Distributions
When we look at conditional distributions, we are restricted to a particular column or row (but not the “margins”). In conditional distributions, we divide by the “Total” of that column or row.

Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230
16
Learning Objective 4: Calculating Conditional Distributions - Example
Calculate the conditional distribution for those persons who have completed HS. Divide each cell value in the row “Completed HS” by the total for the row.

Education, by Age Group (thousands of persons)
Education                    25 to 34  35 to 54       55+     Total
Did not complete HS             4,474     9,155    14,224    27,853
Completed HS                   11,546    26,481    20,060    58,087
1 to 3 years of college        10,700    22,618    11,127    44,445
4+ years of college            11,066    23,183    10,596    44,845
Total                          37,786    81,435    56,008   175,230

                25 to 34  35 to 54       55+
Completed HS       19.9%     45.6%     34.5%
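As a quick check of the arithmetic on this slide, the sketch below divides each cell of the “Completed HS” row by that row's total. The variable names are illustrative; the counts come from the table above.

```python
# "Completed HS" row of the Education by Age Group table (thousands of persons).
age_groups = ["25 to 34", "35 to 54", "55+"]
completed_hs = [11_546, 26_481, 20_060]
row_total = 58_087  # the "Total" entry for the Completed HS row

# Conditional distribution of age group, given that the person completed HS.
conditional = {
    age: round(100 * count / row_total, 1)
    for age, count in zip(age_groups, completed_hs)
}

for age, pct in conditional.items():
    print(f"{age:<10} {pct:5.1f}%")
```

Note that the divisor is the row total, not the table total; dividing by 175,230 instead would give joint percentages rather than a conditional distribution.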
17
Learning Objective 4: Displaying Conditional Distributions - Example
Each row category and each column category gives a different conditional distribution. We can use a pie graph or bar graph to display these conditional distributions.

                25 to 34  35 to 54       55+
Completed HS       19.9%     45.6%     34.5%
18
Learning Objective 4: Displaying Conditional Distributions
Side-by-side bar charts can be used to show conditional proportions. They allow for easy comparison of the row variable with respect to the column variable.
19
Learning Objective 4: Displaying Conditional Distributions
Does background music in supermarkets influence customer purchasing decisions?
For every two-way table, there are two sets of possible conditional distributions:
Wine purchased for each kind of music played (column conditionals)
Music played for each kind of wine purchased (row conditionals)
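The wine and music counts are not reproduced in this transcript, so the sketch below reuses the Education by Age Group table from the earlier slides to show both sets of conditionals. The function names are illustrative. Totals are recomputed from the cells, so they can differ slightly from the printed margins, which carry rounding error.

```python
# Cell counts (thousands of persons) from the Education by Age Group table.
# Rows are education levels; columns are age groups.
education = ["Did not complete HS", "Completed HS",
             "1 to 3 years of college", "4+ years of college"]
age_groups = ["25 to 34", "35 to 54", "55+"]
counts = [
    [4_474, 9_155, 14_224],
    [11_546, 26_481, 20_060],
    [10_700, 22_618, 11_127],
    [11_066, 23_183, 10_596],
]


def row_conditionals(cells):
    """Divide each cell by its row total: age distribution within each education level."""
    return [[100 * n / sum(row) for n in row] for row in cells]


def column_conditionals(cells):
    """Divide each cell by its column total: education distribution within each age group."""
    col_totals = [sum(col) for col in zip(*cells)]
    return [[100 * n / col_totals[j] for j, n in enumerate(row)] for row in cells]


by_row = row_conditionals(counts)
by_col = column_conditionals(counts)

print("Row conditionals (each row sums to 100%):")
for label, row in zip(education, by_row):
    print(f"  {label:<25}" + "".join(f"{p:7.1f}%" for p in row))

print("Column conditionals (each column sums to 100%):")
for label, row in zip(education, by_col):
    print(f"  {label:<25}" + "".join(f"{p:7.1f}%" for p in row))
```

Comparing the columns of the second printout shows how the education distribution differs from one age group to the next, which is the kind of comparison used to judge association.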
20
Learning Objective 4: Contingency Table - Review
[Table: Income (rows: < 30K, 30K-50K, 50K-80K, > 80K, column total) by Job Satisfaction (columns, with a row total). Callouts mark a conditional distribution, the marginal distributions, and the table total.]

This is a contingency table with Income Level as the row variable and Job Satisfaction as the column variable. The distributions of income to job satisfaction or job satisfaction to income are called conditional distributions. The distributions of income alone and job satisfaction alone are called marginal distributions. Relationships between categorical variables are described by calculating appropriate percents from the counts given in each cell.
21
Learning Objective 4: Contingency Table – Your Turn
Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol. The records of 67,000 dialysis patients were examined, and half received one drug; the other half the other drug. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived. Construct an approximate two-way table of the data (due to rounding of the percentages we can’t recover the exact counts – round to whole numbers).
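One way to build the approximate table is sketched below. It rests on two assumptions that go beyond the problem statement: “half” is read as exactly 33,500 patients per drug, and counts ending in .5 are rounded up. Integer arithmetic is used so the rounding behaves predictably.

```python
total_patients = 67_000
per_drug = total_patients // 2  # assumption: exactly half of the patients per drug

# Three-year survival rates in tenths of a percent (58.7% and 51.5%), kept as integers.
survival_per_mille = {"paricalcitol": 587, "calcitriol": 515}


def round_half_up(numerator, denominator):
    """Round numerator/denominator to the nearest whole number, halves going up."""
    return (2 * numerator + denominator) // (2 * denominator)


table = {}
for drug, rate in survival_per_mille.items():
    survived = round_half_up(rate * per_drug, 1000)
    table[drug] = {"survived": survived, "died": per_drug - survived, "total": per_drug}

# Column totals form the bottom margin of the two-way table.
table["total"] = {
    key: sum(table[drug][key] for drug in survival_per_mille)
    for key in ("survived", "died", "total")
}

print(f"{'':<14}{'Survived':>10}{'Died':>10}{'Total':>10}")
for label, row in table.items():
    print(f"{label:<14}{row['survived']:>10,}{row['died']:>10,}{row['total']:>10,}")
```

A different rounding rule would shift the survived and died counts by one patient in each row, which is why the table can only be approximate.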
22
Learning Objective 4: Contingency Table - Your Turn:
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients survived?
390 / 1000 = 39%
320 / 1000 = 32%
710 / 1000 = 71%
290 / 1000 = 29%
23
Learning Objective 4: Contingency Table - Your Turn:
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients at Clinic A survived?
390 / 1000 = 39%
390 / 710 = 55%
710 / 1000 = 71%
390 / 600 = 65%
24
Learning Objective 4: Contingency Table - Your Turn:
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients who survived were treated at Clinic B?
320 / 1000 = 32%
320 / 400 = 80%
320 / 710 = 45%
710 / 1000 = 71%
25
Learning Objective 4: Contingency Table - Your Turn:
The following two-way table summarizes the number of single and married students in a basic statistics course who like watching professional football. The percentage of students in this class who are married is considered:
A marginal percentage
A conditional percentage
Something else
26
Learning Objective 4: Contingency Table - Your Turn:
The following two-way table summarizes the number of single and married students in a basic statistics course who like watching professional football. The percentage of married students in this class who like football is considered:
A marginal percentage
A conditional percentage
Something else
27
Learning Objective 4: Calculating Marginal and Conditional Distributions - Problem
Find each percentage and state whether it is a marginal or conditional distribution.
a) What percent of the seniors are white?
b) What percent of the seniors are planning to attend a 2-year college?
c) What percent of the seniors are white and planning to attend a 2-year college?
d) What percent of the white seniors are planning to attend a 2-year college?
e) What percent of the seniors planning to attend a 2-year college are white?

Seniors
Plans             White  Minority  Total
4-year college      198        44    242
2-year college       36         6     42
Enlist                4         1      5
Employment           14         3     17
Other                16         3     19
Total               268        57    325

Answers:
a) 268/325 x 100% ≈ 82.5% (marginal)
b) 42/325 x 100% ≈ 12.9% (marginal)
c) 36/325 x 100% ≈ 11.1% (neither)
d) 36/268 x 100% ≈ 13.4% (conditional)
e) 36/42 x 100% ≈ 85.7% (conditional)
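The five answers differ only in which count goes in the numerator and which total goes in the denominator, which the following sketch makes explicit. The helper `pct` and the variable names are illustrative. The minority count for “Other” is not legible in the source and is taken here as 3, the value implied by the row total (19) and the column total (57).

```python
plans = ["4-year college", "2-year college", "Enlist", "Employment", "Other"]
white = [198, 36, 4, 14, 16]
minority = [44, 6, 1, 3, 3]  # last entry inferred from the row and column totals

white_total = sum(white)                 # 268, a column total
grand_total = white_total + sum(minority)  # 325, the table total
i = plans.index("2-year college")
two_year_total = white[i] + minority[i]  # 42, a row total


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


answers = {
    "a": (pct(white_total, grand_total), "marginal"),        # margin / table total
    "b": (pct(two_year_total, grand_total), "marginal"),     # margin / table total
    "c": (pct(white[i], grand_total), "neither"),            # cell / table total
    "d": (pct(white[i], white_total), "conditional"),        # cell / column total
    "e": (pct(white[i], two_year_total), "conditional"),     # cell / row total
}

for part, (value, kind) in answers.items():
    print(f"{part}) {value}% ({kind})")
```

Parts d) and e) use the same cell, 36, but condition on different groups, so they answer different questions.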
28
Learning Objective 4: Calculating Marginal and Conditional Distributions – Your Turn
An article in the Winter 2003 issue of Chance magazine reported on the Houston Independent School District’s magnet schools programs. The ...
Find each percentage and state whether it is a marginal or conditional distribution.
a) What percent of all applicants were Asian?
b) What percent of the students accepted were Asian?
c) What percent of Asians were accepted?
d) What percent of all students were accepted?
29
Learning Objective 5: Segmented Bar Charts
A segmented bar chart displays the same conditional distributions as a set of pie charts, but in the form of bars instead of circles. Each bar is treated as the “whole” and is divided proportionally into segments corresponding to the percentage in each group of the conditional distribution.
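To make “divided proportionally” concrete, the sketch below computes where each segment starts and ends within a bar and prints a rough text version of the chart. It reuses the Education by Age Group counts from the earlier slides, with one bar per age group. The function names and the letters A to D standing for the four education levels are illustrative choices.

```python
# Education counts (thousands of persons) within each age group, in the order:
# did not complete HS, completed HS, 1 to 3 years of college, 4+ years of college.
counts_by_age = {
    "25 to 34": [4_474, 11_546, 10_700, 11_066],
    "35 to 54": [9_155, 26_481, 22_618, 23_183],
    "55+": [14_224, 20_060, 11_127, 10_596],
}


def segment_bounds(counts):
    """Return (start, end) positions in percent for each segment of one bar."""
    total = sum(counts)
    bounds = []
    running = 0
    for n in counts:
        start = 100 * running / total
        running += n
        bounds.append((start, 100 * running / total))
    return bounds


def text_bar(counts, width=50, symbols="ABCD"):
    """Draw one bar as text: each education level gets its own letter."""
    bar = ""
    for symbol, (start, end) in zip(symbols, segment_bounds(counts)):
        bar += symbol * (round(end * width / 100) - round(start * width / 100))
    return bar


for age, counts in counts_by_age.items():
    print(f"{age:<10} {text_bar(counts)}")
```

Every bar has the same length because each one represents 100% of its own age group; only the positions of the segment boundaries change from bar to bar.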
30
Learning Objective 5: Segmented Bar Charts
[Table: Contingency table of ticket class vs. survival on the Titanic]
[Chart: Conditional distributions of surviving the Titanic]
[Chart: Conditional distributions of dying on the Titanic]