Published by Austin Jordan. Modified over 8 years ago.
1
AP Statistics Chapter 3 Part 2 Displaying and Describing Categorical Data
2
Learning Objectives
1. Summarize the distribution of a categorical variable with a frequency table.
2. Display the distribution of a categorical variable with a bar chart or pie chart.
3. Recognize misleading statistics.
4. Know how to make and examine a contingency table.
5. Be able to make and examine a segmented bar chart of the conditional distribution of a variable for two or more categories.
Rubric:
Level 1 – Know the objectives.
Level 2 – Fully understand the objectives.
Level 3 – Use the objectives to solve simple problems.
Level 4 – Use the objectives to solve more advanced problems.
Level 5 – Adapt and apply the objectives to different and more complex problems.
3
Learning Objectives
6. Describe the distribution of a categorical variable in terms of its possible values and relative frequency.
7. Understand how to examine the association (independence or dependence) between categorical variables by comparing conditional and marginal percentages.
8. Know what Simpson’s paradox is and be able to recognize when it occurs.
4
Learning Objective 4: Contingency Table - Review
This is a contingency table with Income Level as the row variable and Job Satisfaction as the column variable.

Income      Job Satisfaction               Row Total
            1       2       3       4
< 30K       20      24      80      82     206
30K-50K     22      38      104     125    289
50K-80K     13      28      81      113    235
> 80K       7       18      54      92     171
C. Total    62      108     319     412    901

The distribution of job satisfaction within one income level, or of income within one job satisfaction level, is called a conditional distribution.
The distributions of income alone and job satisfaction alone are called marginal distributions.
In the table, each interior row or column gives a conditional distribution, the Row Total column and C. Total row give the marginal distributions, and 901 is the table total.
Relationships between categorical variables are described by calculating appropriate percents from the counts given in each cell.
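The totals and percents for the income and job satisfaction table can be checked with a short Python sketch. The variable names and the dictionary layout are illustrative assumptions and are not part of the original slides.

```python
# Counts from the contingency table: income level (rows) by job satisfaction 1-4 (columns).
counts = {
    "< 30K":   [20, 24, 80, 82],
    "30K-50K": [22, 38, 104, 125],
    "50K-80K": [13, 28, 81, 113],
    "> 80K":   [7, 18, 54, 92],
}

# Marginal totals: sum across each row, then down each column.
row_totals = {income: sum(row) for income, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
table_total = sum(row_totals.values())

# Marginal distributions as percents of the table total.
income_marginal = {k: round(100 * v / table_total, 1) for k, v in row_totals.items()}
satisfaction_marginal = [round(100 * c / table_total, 1) for c in col_totals]

# One conditional distribution: job satisfaction among people earning < 30K.
low_income_conditional = [
    round(100 * c / row_totals["< 30K"], 1) for c in counts["< 30K"]
]

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Marginal distribution of income (%):", income_marginal)
print("Marginal distribution of satisfaction (%):", satisfaction_marginal)
print("Satisfaction given income < 30K (%):", low_income_conditional)
```

A marginal percent divides by the table total, while a conditional percent divides by the total of the selected row or column.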
5
Learning Objective 4: Contingency Table – Your Turn Many kidney dialysis patients get vitamin D injections to correct for a lack of calcium. Two forms of vitamin D injections are used: calcitriol and paricalcitol. The records of 67,000 dialysis patients were examined, and half received one drug; the other half the other drug. After three years, 58.7% of those getting paricalcitol had survived, while only 51.5% of those getting calcitriol had survived. Construct an approximate two-way table of the data (due to rounding of the percentages we can’t recover the exact counts – round to whole numbers).
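One way to build the approximate two-way table is sketched below in Python. It assumes the 67,000 patients split exactly in half and simply rounds each product to a whole number; the names `group_size` and `table` are hypothetical, and the resulting counts are approximations, since the exact counts cannot be recovered from rounded percentages.

```python
total_patients = 67_000
group_size = total_patients // 2  # assumption: exactly half received each drug

# Three-year survival rates reported in the problem.
survival_rate = {"paricalcitol": 0.587, "calcitriol": 0.515}

table = {}
for drug, rate in survival_rate.items():
    survived = round(group_size * rate)  # approximate count, rounded to a whole number
    table[drug] = {
        "survived": survived,
        "died": group_size - survived,
        "total": group_size,
    }

for drug, row in table.items():
    print(drug, row)
```

Each row is forced to add up to the group size, so the table stays internally consistent even though the individual cells are rounded.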
6
Learning Objective 4: Contingency Table - Your Turn
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients survived?
a) 390 / 1000 = 39%
b) 320 / 1000 = 32%
c) 710 / 1000 = 71%
d) 290 / 1000 = 29%
7
Learning Objective 4: Contingency Table - Your Turn
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients at Clinic A survived?
a) 390 / 1000 = 39%
b) 390 / 710 = 55%
c) 710 / 1000 = 71%
d) 390 / 600 = 65%
8
Learning Objective 4: Contingency Table - Your Turn
The following two-way table summarizes the number of cancer patients treated at two cancer clinics who died or survived. What percentage of the cancer patients who survived were treated at Clinic B?
a) 320 / 1000 = 32%
b) 320 / 400 = 80%
c) 320 / 710 = 45%
d) 710 / 1000 = 71%
9
Learning Objective 4: Contingency Table - Your Turn
The following two-way table summarizes the number of single and married students in a basic statistics course who like watching professional football. The percentage of students in this class who are married is considered
a) A marginal percentage
b) A conditional percentage
c) Something else
10
Learning Objective 4: Contingency Table - Your Turn
The following two-way table summarizes the number of single and married students in a basic statistics course who like watching professional football. The percentage of married students in this class who like football is considered
a) A marginal percentage
b) A conditional percentage
c) Something else
11
Learning Objective 4: Calculating Marginal and Conditional Distributions - Problem
Find each percentage and state whether it is a marginal or conditional distribution.

Seniors
Plans             White   Minority   Total
4-year college    198     44         242
2-year college    36      6          42
Enlist            4       1          5
Employment        14      3          17
Other             16      3          19
Total             268     57         325

a) What percent of the seniors are white? 268/325 x 100% ≈ 82.5% (Marginal)
b) What percent of the seniors are planning to attend a 2-year college? 42/325 x 100% ≈ 12.9% (Marginal)
c) What percent of the seniors are white and planning to attend a 2-year college? 36/325 x 100% ≈ 11.1% (Neither)
d) What percent of the white seniors are planning to attend a 2-year college? 36/268 x 100% ≈ 13.4% (Conditional)
e) What percent of the seniors planning to attend a 2-year college are white? 36/42 x 100% ≈ 85.7% (Conditional)
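The five answers for the seniors table differ only in which count is the numerator and which total is the denominator. A minimal Python sketch, with hypothetical names such as `pct` and `answers`, makes that choice explicit:

```python
# Counts from the seniors table: post-graduation plans by ethnicity.
plans = {
    "4-year college": {"White": 198, "Minority": 44},
    "2-year college": {"White": 36, "Minority": 6},
    "Enlist": {"White": 4, "Minority": 1},
    "Employment": {"White": 14, "Minority": 3},
    "Other": {"White": 16, "Minority": 3},
}

def pct(part, whole):
    """Return part/whole as a percent rounded to one decimal place."""
    return round(100 * part / whole, 1)

total = sum(sum(row.values()) for row in plans.values())
white_total = sum(row["White"] for row in plans.values())
two_year_total = sum(plans["2-year college"].values())
white_two_year = plans["2-year college"]["White"]

answers = {
    "a": (pct(white_total, total), "marginal"),          # column total / table total
    "b": (pct(two_year_total, total), "marginal"),       # row total / table total
    "c": (pct(white_two_year, total), "neither"),        # one cell / table total
    "d": (pct(white_two_year, white_total), "conditional"),     # cell / column total
    "e": (pct(white_two_year, two_year_total), "conditional"),  # cell / row total
}

for part, (value, kind) in answers.items():
    print(f"{part}) {value}% ({kind})")
```

Dividing by the table total gives a marginal (or joint) percent; dividing by a row or column total gives a conditional percent.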
12
Learning Objective 4: Calculating Marginal and Conditional Distributions – Your Turn
An article in the Winter 2003 issue of Chance magazine reported on the Houston Independent School District’s magnet schools programs. Find each percentage and state whether it is a marginal or conditional distribution.
a) What percent of all applicants were Asian?
b) What percent of the students accepted were Asian?
c) What percent of Asians were accepted?
d) What percent of all students were accepted?
13
Learning Objective 5: Segmented Bar Charts A segmented bar chart displays conditional distributions the same as a pie chart, but in the form of bars instead of circles. Each bar is treated as the “whole” and is divided proportionally into segments corresponding to the percentage in each group of the conditional distribution.
14
Learning Objective 5: Segmented Bar Charts
[Figure panels: Contingency table of ticket class vs. survival on the Titanic; Conditional distributions of surviving the Titanic; Conditional distributions of dying on the Titanic]
15
Learning Objective 6: Describing Categorical Distributions
To describe a marginal distribution,
1) Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals.
2) Make a graph to display the marginal distribution.
3) Comment on and compare the heights/percentages of the different categories.
16
Learning Objective 6: Describing Categorical Distributions
Describe the marginal distribution of chance of getting rich.

Young adults by gender and chance of getting rich
                                 Female   Male   Total
Almost no chance                 96       98     194
Some chance, but probably not    426      286    712
A 50-50 chance                   696      720    1416
A good chance                    663      758    1421
Almost certain                   486      597    1083
Total                            2367     2459   4826

Response            Percent
Almost no chance    194/4826 = 4.0%
Some chance         712/4826 = 14.8%
A 50-50 chance      1416/4826 = 29.3%
A good chance       1421/4826 = 29.4%
Almost certain      1083/4826 = 22.4%

A good chance and a 50-50 chance have the highest percentages at about 29% each, and almost no chance has the lowest at 4%. Some chance is at about 15% and almost certain at about 22%.
17
Learning Objective 6: Describing Categorical Distributions
To describe or compare conditional distributions,
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution. Use a side-by-side bar graph or segmented bar graph to compare distributions.
4) Comment on and compare the heights/percentages of the different categories.
18
Learning Objective 6: Describing Categorical Distributions
Calculate the conditional distribution of opinion among males. Describe the relationship between gender and opinion.

Young adults by gender and chance of getting rich
                                 Female   Male   Total
Almost no chance                 96       98     194
Some chance, but probably not    426      286    712
A 50-50 chance                   696      720    1416
A good chance                    663      758    1421
Almost certain                   486      597    1083
Total                            2367     2459   4826

Response            Female               Male
Almost no chance    96/2367 = 4.1%       98/2459 = 4.0%
Some chance         426/2367 = 18.0%     286/2459 = 11.6%
A 50-50 chance      696/2367 = 29.4%     720/2459 = 29.3%
A good chance       663/2367 = 28.0%     758/2459 = 30.8%
Almost certain      486/2367 = 20.5%     597/2459 = 24.3%
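The conditional distributions of opinion for females and males can be reproduced with the Python sketch below. The helper name `conditional_distribution` and the list layout are assumptions made for illustration only.

```python
responses = [
    "Almost no chance",
    "Some chance, but probably not",
    "A 50-50 chance",
    "A good chance",
    "Almost certain",
]
# Column counts from the table, in the same order as `responses`.
female = [96, 426, 696, 663, 486]
male = [98, 286, 720, 758, 597]

def conditional_distribution(column):
    """Percent of the column total in each category, rounded to one decimal."""
    total = sum(column)
    return [round(100 * count / total, 1) for count in column]

female_pct = conditional_distribution(female)
male_pct = conditional_distribution(male)

# Side-by-side comparison, one line per response category.
for response, f, m in zip(responses, female_pct, male_pct):
    print(f"{response:32s} female {f:5.1f}%   male {m:5.1f}%")
```

Because each column is divided by its own total, the two distributions can be compared directly even though the numbers of females and males differ.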
19
Learning Objective 7: Association Between Categorical Variables One type of relationship between categorical variables is an association. Definition: We say that there is an association between two variables if specific values of one variable tend to occur in common with specific values of another. Two categorical variables are associated if knowing the value of one variable helps you predict the value of the other variable. If two categorical variables are associated, then they are dependent. If two categorical variables are not associated, then they are independent.
20
Learning Objective 7: Association Between Categorical Variables
To examine data for an association from a frequency table,
1) Select the row(s) or column(s) of interest (the condition rows or columns).
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Determine whether a specific value of one variable tends to occur in common with a specific value of another. If all the percentages are approximately the same, there is no association and the variables are independent. If the percentages are different, there is an association and the variables are dependent.
21
Learning Objective 7: Checking for Independence Between Variables
For a period of five years, physicians at McGill University Health Center followed more than 5000 adults over the age of 50. The researchers were investigating whether people taking a certain class of antidepressants (SSRIs) might be at greater risk of bone fractures. Their observations are summarized in the table:

                         Taking SSRI   No SSRI   Total
Experienced Fractures    14            244       258
No Fractures             123           4627      4750
Total                    137           4871      5008

Do these results suggest there’s an association between experiencing bone fractures and the taking of SSRI antidepressants (is experiencing bone fractures independent of taking antidepressants)? Explain.
22
Learning Objective 7: Checking for Independence Between Variables
1) Select the row(s) or column(s) of interest (the condition rows or columns).
To determine whether there is an association between experiencing bone fractures and taking SSRI antidepressants, look at the conditional distribution of fractures versus no fractures within each SSRI group (the columns).

                         Taking SSRI   No SSRI   Total
Experienced Fractures    14            244       258
No Fractures             123           4627      4750
Total                    137           4871      5008
23
Learning Objective 7: Checking for Independence Between Variables
2) Use the data in the table to calculate the conditional distribution (in percents) of the columns.

Counts:
                         Taking SSRI   No SSRI   Total
Experienced Fractures    14            244       258
No Fractures             123           4627      4750
Total                    137           4871      5008

Column percents:
                         Taking SSRI   No SSRI   Total
Experienced Fractures    10.2%         5.0%      5.2%
No Fractures             89.8%         95.0%     94.8%
Total                    100%          100%      100%
24
Learning Objective 7: Checking for Independence Between Variables
3) Determine whether a specific value of one variable tends to occur in common with a specific value of another. If all the percentages are approximately the same, there is no association and the variables are independent. If the percentages are different, there is an association and the variables are dependent.

                         Taking SSRI   No SSRI   Total
Experienced Fractures    10.2%         5.0%      5.2%
No Fractures             89.8%         95.0%     94.8%
Total                    100%          100%      100%

There appears to be an association between experiencing bone fractures and taking SSRI antidepressants (they are dependent, not independent). Overall, approximately 5% of the adults in the study experienced fractures, while about 10% of those taking SSRIs did, roughly twice the overall rate. Likewise, about 95% overall had no fractures, compared with only about 90% of those taking SSRIs.
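The three-step check applied to the SSRI table can be sketched in Python as follows. This is only a descriptive comparison of conditional percentages under assumed names (`fracture_percent`, `by_group`); it is not a formal test of independence.

```python
# Counts from the SSRI table: (experienced fractures, no fractures) per group.
counts = {
    "Taking SSRI": (14, 123),
    "No SSRI": (244, 4627),
}

def fracture_percent(fractures, no_fractures):
    """Percent of a group that experienced fractures."""
    return 100 * fractures / (fractures + no_fractures)

# Conditional percents: fracture rate within each SSRI group (each column).
by_group = {group: fracture_percent(*cells) for group, cells in counts.items()}

# Marginal percent: fracture rate for everyone in the study.
all_fractures = sum(f for f, _ in counts.values())
all_no_fractures = sum(n for _, n in counts.values())
overall = fracture_percent(all_fractures, all_no_fractures)

# How many times higher the fracture rate is among SSRI users.
ratio = by_group["Taking SSRI"] / by_group["No SSRI"]

for group, value in by_group.items():
    print(f"{group}: {value:.1f}% experienced fractures")
print(f"Overall: {overall:.1f}% experienced fractures")
print(f"SSRI rate is about {ratio:.1f} times the non-SSRI rate")
```

If the two conditional percents were approximately equal to each other and to the overall percent, the table would give no sign of an association.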
25
Learning Objective 7: Association Between Categorical Variables
To examine data for an association from graphs,
1) Select the row(s) or column(s) of interest (the condition rows or columns).
2) Use the data in the table to make pie graphs of the conditional distributions or a segmented bar graph of the conditional distributions.
3) If the sectors (of the pie graphs) or the segments (of the bars) are approximately the same size, then the variables are not associated (independent). If the sectors (of the pie graphs) or the segments (of the bars) are not the same size, then the variables are associated (dependent).
26
Learning Objective 7: Association Graphs
Association – Dependent (corresponding sectors of the pie graphs are different sizes for females and males).
27
Learning Objective 7: Association Graphs
No association – Independent (corresponding segments of the bars are approximately the same size for both males and females).
28
Learning Objective 7: Checking for Independence Between Variables The contingency table shows the relationship between class of ticket and surviving the sinking of the Titanic. Is there an association between ticket class and surviving the Titanic (are ticket class and survival dependent or independent)?
29
Learning Objective 7: Checking for Independence Between Variables
1) Select the row(s) or column(s) of interest (the condition rows or columns).
Is there an association between ticket class and surviving the Titanic? The row variable, survival, is the condition.
30
Learning Objective 7: Checking for Independence Between Variables
2) Use the data in the table to make pie graphs of the conditional distributions or a segmented bar graph of the conditional distributions.
Survival is the condition, so construct segmented bar graphs or pie graphs of the categories alive and dead.
31
Learning Objective 7: Checking for Independence Between Variables
3) If the sectors (of the pie graphs) or the segments (of the bars) are approximately the same size, then the variables are not associated (independent). If the sectors (of the pie graphs) or the segments (of the bars) are not the same size, then the variables are associated (dependent).
In this case the sectors or segments for corresponding categories are not approximately the same size, so ticket class and survival are dependent. There is an association between the variables ticket class and survival.
32
Learning Objective 7: Checking for Independence Between Variables – Class Problem Examine the table below about ethnicity and acceptance for the Houston Independent School District’s magnet schools program. Does it appear that the admissions decisions are made independent of the applicant’s ethnicity?
33
Learning Objective 8: Simpson’s Paradox
A paradox is “a statement that is seemingly contradictory or opposed to common sense and yet is perhaps true”.
Described by E. H. Simpson in 1951.
It occurs when:
one variable is averaged across different levels of a second variable;
two groups from one sample are compared to two similar groups from another sample.
[Image caption: Not E. H. Simpson]
34
Learning Objective 8: Simpson’s Paradox
One sample’s success rate is higher than the other sample’s in each of the two groups. However, when the groups are combined, the sample with the lower success rate in each group ends up with the better overall proportion of successes. Thus, the paradox.
One group usually has considerably fewer members than the others. Simpson’s Paradox cannot occur when the groups make up the same proportions of each sample.
35
Learning Objective 8: Simpson’s Paradox
What is Simpson’s Paradox?
Simpson’s Paradox occurs when an association between two variables is reversed upon observing a third variable.
In Simpson’s Paradox, the third, or lurking, variable creates a reversal in the direction of an association (“confounding”).
To uncover Simpson’s Paradox, divide the data into subgroups based on the lurking variable.
36
Learning Objective 8: Simpson’s Paradox
Recent Cleveland Indians season records
2003: 68-94, 42.0% winning percentage
2004: 80-82, 49.4% winning percentage
Two-season record: 148-176, 45.7% winning percentage
Recent Minnesota Twins season records
2003: 90-72, 55.6% winning percentage
2004: 92-70, 56.8% winning percentage
Two-season record: 182-142, 56.2% winning percentage
Notice that the Twins had a higher winning percentage in both 2003 and 2004, as well as over the two-year period. This is not Simpson’s Paradox.
37
Learning Objective 8: Simpson’s Paradox – At Work
Ronnie Belliard
2002: 61/289, .211 of his at-bats were hits
2003: 124/447, .277 of his at-bats were hits
Two-season average: 185/736, hits .2514 of the time
Casey Blake
2002: 4/20, .200 of his at-bats were hits
2003: 143/557, .257 of his at-bats were hits
Two-season average: 147/577, hits .2548 of the time
Belliard’s two-season batting average was lower than Blake’s, but in each separate season Belliard had the higher batting average. This is Simpson’s Paradox.
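The reversal in the Belliard and Blake batting averages can be verified with a short Python sketch. The data layout and the function names `average` and `combined` are illustrative assumptions.

```python
# (hits, at-bats) per season, taken from the slide.
hits_at_bats = {
    "Belliard": {2002: (61, 289), 2003: (124, 447)},
    "Blake": {2002: (4, 20), 2003: (143, 557)},
}

def average(hits, at_bats):
    """Batting average for one season."""
    return hits / at_bats

def combined(player):
    """Batting average over both seasons: total hits / total at-bats."""
    seasons = hits_at_bats[player].values()
    return sum(h for h, _ in seasons) / sum(a for _, a in seasons)

# Compare the players within each season (the subgroups).
belliard_better_each_season = all(
    average(*hits_at_bats["Belliard"][year]) > average(*hits_at_bats["Blake"][year])
    for year in (2002, 2003)
)
# Compare the players with the seasons pooled together.
blake_better_combined = combined("Blake") > combined("Belliard")

simpsons_paradox = belliard_better_each_season and blake_better_combined
print("Belliard better in each season:", belliard_better_each_season)
print("Blake better over both seasons:", blake_better_combined)
print("Simpson's paradox present:", simpsons_paradox)
```

The pooled average weights each season by its at-bats, and Blake's 2002 season, with only 20 at-bats, barely affects his combined figure.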
38
Learning Objective 8: Simpson’s Paradox – At Work
Discrimination? Consider college acceptance rates by sex.

          Accepted   Not accepted   Total
Men       198        162            360
Women     88         112            200
Total     286        274            560

198 of 360 men (55%) were accepted.
88 of 200 women (44%) were accepted.
Is there a sex bias?
39
Or is there a lurking variable that explains the association? To evaluate this, split the applications according to the lurking variable “major applied to”:
Business School (240 applicants)
Art School (320 applicants)
40
Learning Objective 8: Simpson’s Paradox – At Work
BUSINESS SCHOOL

          Accepted   Not accepted   Total
Men       18         102            120
Women     24         96             120
Total     42         198            240

18 of 120 men (15%) were accepted to B-school.
24 of 120 women (20%) were accepted to B-school.
A higher percentage of women were accepted.
41
Learning Objective 8: Simpson’s Paradox – At Work
ART SCHOOL

          Accepted   Not accepted   Total
Men       180        60             240
Women     64         16             80
Total     244        76             320

180 of 240 men (75%) were accepted.
64 of 80 women (80%) were accepted.
A higher percentage of women were accepted.
42
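Within each school, a higher percentage of women than men were accepted. No discrimination against women. Possible discrimination against men. This is an example of Simpson’s Paradox. When the lurking variable (school applied to) was ignored, the data suggested discrimination against women. When the school applied to was considered, the association was reversed. The reversal arises because most men (240 of 360) applied to the art school, which accepted most of its applicants, while most women (120 of 200) applied to the business school, which accepted few.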
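The admissions example can be checked by computing acceptance rates both within each school and with the schools pooled. The Python sketch below uses hypothetical names (`applications`, `by_school`, `overall`) and the counts from the business school and art school tables.

```python
# (accepted, applied) by school and sex, from the two school tables.
applications = {
    "Business": {"Men": (18, 120), "Women": (24, 120)},
    "Art": {"Men": (180, 240), "Women": (64, 80)},
}

def rate(accepted, applied):
    """Acceptance rate as a percent."""
    return 100 * accepted / applied

# Rates within each school (the lurking variable is held fixed).
by_school = {
    school: {sex: rate(*cells) for sex, cells in groups.items()}
    for school, groups in applications.items()
}

# Rates with the schools pooled (the lurking variable is ignored).
overall = {}
for sex in ("Men", "Women"):
    accepted = sum(applications[school][sex][0] for school in applications)
    applied = sum(applications[school][sex][1] for school in applications)
    overall[sex] = rate(accepted, applied)

women_higher_in_each_school = all(
    rates["Women"] > rates["Men"] for rates in by_school.values()
)
men_higher_overall = overall["Men"] > overall["Women"]

print("By school:", by_school)
print("Overall:", overall)
print("Reversal:", women_higher_in_each_school and men_higher_overall)
```

Splitting by school and then pooling again shows the same counts supporting opposite conclusions, which is why the lurking variable has to be examined.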
43
Learning Objective 8: Simpson’s Paradox – The Paradox What’s true for the parts isn’t true for the whole.
44
Learning Objective 8: Simpson’s Paradox – CONCLUSION!!!! Simpson’s paradox is a rare phenomenon! It does not occur often! Thus statisticians must be trained academically & ethically well enough to make sure that if it has occurred they will detect and correct it. This is where practice, critical thinking skills, and repetition come into play!