1
AP Statistics Chapter 3 Part 3
Displaying and Describing Categorical Data
2
Learning Objectives
Rubric:
Level 1 – Know the objectives.
Level 2 – Fully understand the objectives.
Level 3 – Use the objectives to solve simple problems.
Level 4 – Use the objectives to solve more advanced problems.
Level 5 – Adapt and apply the objectives to different and more complex problems.
Objectives:
1) Summarize the distribution of a categorical variable with a frequency table.
2) Display the distribution of a categorical variable with a bar chart or pie chart.
3) Recognize misleading statistics.
4) Know how to make and examine a contingency table.
5) Be able to make and examine a segmented bar chart of the conditional distribution of a variable for two or more categories.
3
Learning Objectives
6) Describe the distribution of a categorical variable in terms of its possible values and relative frequencies.
7) Understand how to examine the association (independence or dependence) between categorical variables by comparing conditional and marginal percentages.
8) Know what Simpson’s paradox is and be able to recognize when it occurs.
4
Learning Objective 6: Describing Categorical Distributions
To describe a marginal distribution:
1) Use the data in the table to calculate the marginal distribution (in percents) from the row or column totals.
2) Make a graph to display the marginal distribution.
3) Comment on and compare the heights/percentages of the different categories.
5
Learning Objective 6: Describing Categorical Distributions
Young adults by gender and chance of getting rich

Response                        Female   Male   Total
Almost no chance                    96     98     194
Some chance, but probably not      426    286     712
A chance                           696    720    1416
A good chance                      663    758    1421
Almost certain                     486    597    1083
Total                             2367   2459    4826

Describe the marginal distribution of chance of getting rich.

Response            Percent
Almost no chance    194/4826 = 4.0%
Some chance         712/4826 = 14.8%
A chance            1416/4826 = 29.3%
A good chance       1421/4826 = 29.4%
Almost certain      1083/4826 = 22.4%

"A good chance" and "a chance" have the highest percentages, at about 29% each, and "almost no chance" has the lowest, at 4%. "Some chance" accounts for about 15% and "almost certain" for about 22%.
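The marginal percentages above can be reproduced with a few lines of Python. This is a minimal sketch: the counts are the row totals from the slide, while the function name `marginal_distribution` is only an illustrative choice.

```python
# Row totals from the "Young adults by gender and chance of getting rich" table.
row_totals = {
    "Almost no chance": 194,
    "Some chance, but probably not": 712,
    "A chance": 1416,
    "A good chance": 1421,
    "Almost certain": 1083,
}


def marginal_distribution(counts):
    """Return each category's share of the grand total, in percent."""
    grand_total = sum(counts.values())
    return {category: 100 * n / grand_total for category, n in counts.items()}


marginal = marginal_distribution(row_totals)
for category, pct in marginal.items():
    # One line per response category, rounded to one decimal place.
    print(f"{category:<32}{pct:5.1f}%")
```

Because every percentage is divided by the same grand total (4826), the percentages add up to 100%.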
6
Learning Objective 6: Describing Categorical Distributions
To describe or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution. Use a side-by-side bar graph or segmented bar graph to compare distributions.
4) Comment on and compare the heights/percentages of the different categories.
7
Learning Objective 6: Describing Categorical Distributions
Young adults by gender and chance of getting rich

Response                        Female   Male   Total
Almost no chance                    96     98     194
Some chance, but probably not      426    286     712
A chance                           696    720    1416
A good chance                      663    758    1421
Almost certain                     486    597    1083
Total                             2367   2459    4826

Calculate the conditional distribution of opinion among males. Describe the relationship between gender and opinion.

Response            Male                  Female
Almost no chance    98/2459 = 4.0%        96/2367 = 4.1%
Some chance         286/2459 = 11.6%      426/2367 = 18.0%
A chance            720/2459 = 29.3%      696/2367 = 29.4%
A good chance       758/2459 = 30.8%      663/2367 = 28.0%
Almost certain      597/2459 = 24.3%      486/2367 = 20.5%
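The conditional distributions can be computed by dividing each cell by its own column total rather than by the grand total. The sketch below uses the counts from the table; the helper name `conditional_on_column` is an assumption made for illustration.

```python
# Cell counts from the gender-by-opinion table on the slide.
table = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance, but probably not": {"Female": 426, "Male": 286},
    "A chance": {"Female": 696, "Male": 720},
    "A good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}


def conditional_on_column(table, column):
    """Distribution of the row variable within one column, in percent."""
    column_total = sum(row[column] for row in table.values())
    return {resp: 100 * row[column] / column_total for resp, row in table.items()}


male = conditional_on_column(table, "Male")
female = conditional_on_column(table, "Female")
for response in table:
    print(f"{response:<32}{male[response]:6.1f}%{female[response]:8.1f}%")
```

Placing the two columns side by side makes the largest gap easy to spot: "some chance, but probably not" is about 11.6% among males and about 18.0% among females.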
8
Learning Objective 7: Association Between Categorical Variables
One type of relationship between categorical variables is an association. Definition: We say that there is an association between two variables if specific values of one variable tend to occur in common with specific values of another. Two categorical variables are associated if knowing the value of one variable helps you predict the value of the other variable. If two categorical variables are associated, then they are dependent. If two categorical variables are not associated, then they are independent.
9
Learning Objective 7: Association Between Categorical Variables
To examine data for an association from a frequency table:
1) Select the row(s) or column(s) of interest (the condition rows or columns).
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Determine whether a specific value of one variable tends to occur in common with a specific value of another.
If all the percentages are approximately the same, there is no association and the variables are independent.
If the percentages are different, there is an association and the variables are dependent.
10
Learning Objective 7: Checking for Independence Between Variables
For a period of five years, physicians at McGill University Health Center followed more than 5000 adults over the age of 50. The researchers were investigating whether people taking a certain class of antidepressants (SSRIs) might be at greater risk of bone fractures. Their observations are summarized in the table:

                        Taking SSRI   No SSRI   Total
Experienced Fractures            14       244     258
No Fractures                    123      4627    4750
Total                           137      4871    5008

Do these results suggest there’s an association between experiencing bone fractures and taking SSRI antidepressants (is experiencing bone fractures independent of taking antidepressants)? Explain.
11
Learning Objective 7: Checking for Independence Between Variables
1) Select the row(s) or column(s) of interest (the condition rows or columns).
To determine whether there is an association between experiencing bone fractures and taking SSRI antidepressants, look at the conditional distribution of fractures versus no fractures within each SSRI group (the columns).

                        Taking SSRI   No SSRI   Total
Experienced Fractures            14       244     258
No Fractures                    123      4627    4750
Total                           137      4871    5008
12
Learning Objective 7: Checking for Independence Between Variables
2) Use the data in the table to calculate the conditional distribution (in percents) of the columns.

                        Taking SSRI   No SSRI   Total
Experienced Fractures            14       244     258
No Fractures                    123      4627    4750
Total                           137      4871    5008

                        Taking SSRI   No SSRI   Total
Experienced Fractures         10.2%      5.0%    5.2%
No Fractures                  89.8%     95.0%   94.8%
Total                          100%      100%    100%
13
Learning Objective 7: Checking for Independence Between Variables
3) Determine whether a specific value of one variable tends to occur in common with a specific value of another.
If all the percentages are approximately the same, there is no association and the variables are independent.
If the percentages are different, there is an association and the variables are dependent.

                        Taking SSRI   No SSRI   Total
Experienced Fractures         10.2%      5.0%    5.2%
No Fractures                  89.8%     95.0%   94.8%
Total                          100%      100%    100%

There appears to be an association between experiencing bone fractures and taking SSRI antidepressants (they are dependent, not independent). Overall, approximately 5% of the respondents experienced fractures, while about 10% of those taking SSRIs did, roughly twice the overall rate. Likewise, about 95% overall had no fractures, while only about 90% of those taking SSRIs had no fractures.
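The three steps for the SSRI example can be sketched in Python as follows. The counts come from the table; the variable and function names are illustrative assumptions. The slides give no numeric cutoff for "approximately the same", so the sketch only reports the percentages and leaves that judgment to the reader.

```python
# Counts from the SSRI table; the condition is the column (SSRI group).
counts = {
    "Taking SSRI": {"Experienced fractures": 14, "No fractures": 123},
    "No SSRI": {"Experienced fractures": 244, "No fractures": 4627},
}


def fracture_rate(group_counts):
    """Percent of one group that experienced fractures."""
    total = sum(group_counts.values())
    return 100 * group_counts["Experienced fractures"] / total


# Conditional fracture rate within each SSRI group.
rates = {group: fracture_rate(cells) for group, cells in counts.items()}

# Marginal (overall) fracture rate, ignoring SSRI status.
all_fractures = sum(cells["Experienced fractures"] for cells in counts.values())
all_people = sum(sum(cells.values()) for cells in counts.values())
overall = 100 * all_fractures / all_people

for group, rate in rates.items():
    print(f"{group:<12}{rate:5.1f}% experienced fractures")
print(f"{'Overall':<12}{overall:5.1f}% experienced fractures")
```

If the variables were independent, both conditional rates would be close to the overall rate. Here the rate for the SSRI group is about double the overall rate, which is what the slide's conclusion rests on.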
14
Learning Objective 7: Association Between Categorical Variables
To examine data for an association from graphs:
1) Select the row(s) or column(s) of interest (the condition rows or columns).
2) Use the data in the table to make pie graphs of the conditional distributions or a segmented bar graph of the conditional distributions.
3) Compare the corresponding sectors or segments.
If the sectors (of the pie graphs) or the segments (of the bars) are approximately the same size, then the variables are not associated (independent).
If the sectors (of the pie graphs) or the segments (of the bars) are not the same size, then the variables are associated (dependent).
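As a rough stand-in for a real segmented bar graph, the sketch below draws text bars whose segments are proportional to the counts in each column. The counts are the female and male columns of the earlier gender-by-opinion table; the function name `segmented_bar`, the bar width, and the segment symbols are arbitrary choices made for illustration.

```python
# Counts per response, in table order, from the gender-by-opinion table:
# almost no chance, some chance, a chance, a good chance, almost certain.
female = [96, 426, 696, 663, 486]
male = [98, 286, 720, 758, 597]


def segmented_bar(counts, width=50, symbols="#=+*o"):
    """Draw one fixed-width bar whose segments are proportional to counts."""
    total = sum(counts)
    bar = ""
    cumulative = 0
    for n, symbol in zip(counts, symbols):
        # Round the segment boundaries, not the segment lengths, so the
        # finished bar is always exactly `width` characters long.
        start = round(width * cumulative / total)
        cumulative += n
        end = round(width * cumulative / total)
        bar += symbol * (end - start)
    return bar


print("Female", segmented_bar(female))
print("Male  ", segmented_bar(male))
```

Because both bars have the same total length, corresponding segments can be compared directly, which is the comparison the steps above describe.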
15
Learning Objective 7: Association Graphs
Association – Dependent (corresponding sectors in the pie graphs are different sizes for females and males).
16
Learning Objective 7: Association Graphs
No association – Independent (corresponding segments in the bars are approximately the same size for both males and females).
17
Learning Objective 7: Checking for Independence Between Variables
The contingency table shows the relationship between class of ticket and surviving the sinking of the Titanic. Is there an association between ticket class and surviving the Titanic (are ticket class and survival dependent or independent)?
18
Learning Objective 7: Checking for Independence Between Variables
Select the row(s) or column(s) of interest (the condition rows or columns). Is there an association between ticket class and surviving the Titanic? The row variable, survival, is the condition.
19
Learning Objective 7: Checking for Independence Between Variables
Use the data in the table to make pie graphs of the conditional distributions or a segmented bar graph of the conditional distributions. Survival is the condition, so construct segmented bar graphs or pie graphs of the categories alive and dead.
20
Learning Objective 7: Checking for Independence Between Variables
If the sectors (of the pie graphs) or the segments (of the bars) are approximately the same size, then the variables are not associated (independent).
If the sectors (of the pie graphs) or the segments (of the bars) are not the same size, then the variables are associated (dependent).
In this case the sectors or segments for corresponding categories are not approximately the same size, so class and survival are dependent. There is an association between the variables ticket class and survival.
21
Learning Objective 7: Checking for Independence Between Variables – Class Problem
Examine the table below about ethnicity and acceptance for the Houston Independent School District’s magnet schools program. Does it appear that the admissions decisions are made independent of the applicant’s ethnicity?
22
Learning Objective 8: Simpson’s Paradox
A paradox is “a statement that is seemingly contradictory or opposed to common sense and yet is perhaps true”.
Discovered by E. H. Simpson in 1951.
Occurs when:
Averaging one variable across different levels of a second variable.
Two groups from one sample are compared to two similar groups from another sample.
[Image caption: Not E. H. Simpson]
23
Learning Objective 8: Simpson’s Paradox
One sample’s success rate is higher than the other sample’s in each of the two groups. However, when the two groups are combined, the sample with the lower success rate in each group ends up with the better overall proportion of successes. Thus, the paradox.
One group usually has considerably fewer members than the others.
Simpson’s Paradox does not occur when each group is about the same size in both samples.
24
Learning Objective 8: Simpson’s Paradox
What is Simpson’s Paradox?
Simpson’s Paradox occurs when an association between two variables is reversed upon observing a third variable.
In Simpson’s Paradox, the third (lurking) variable creates a reversal in the direction of an association (“confounding”).
To uncover Simpson’s Paradox, divide the data into subgroups based on the lurking variable.
25
Learning Objective 8: Simpson’s Paradox
Recent Cleveland Indians season records
2003: 68-94, 42.0% winning percentage
2004: 80-82, 49.4% winning percentage
Two-season record: 148-176, 45.7% winning percentage
Recent Minnesota Twins season records
2003: 90-72, 55.6% winning percentage
2004: 92-70, 56.8% winning percentage
Two-season record: 182-142, 56.2% winning percentage
Notice that the Twins had a higher winning percentage in both 2003 and 2004, as well as over the two-year period. This is not Simpson’s Paradox.
26
Learning Objective 8: Simpson’s Paradox – At Work
Ronnie Belliard
2002: 61/289, .211 of his at-bats were hits
2003: 124/447, .277 of his at-bats were hits
Two-season average: 185/736, .251 of his at-bats were hits
Casey Blake
2002: 4/20, .200 of his at-bats were hits
2003: 143/557, .257 of his at-bats were hits
Two-season average: 147/577, .255 of his at-bats were hits
Belliard’s two-season batting average was lower than Blake’s, but taken season by season, Belliard had the higher batting average in both seasons. This is Simpson’s Paradox.
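The reversal can be checked directly from the hits and at-bats on the slide. In this sketch the helper names (`average`, `pooled`, `shows_reversal`) are illustrative assumptions; only the numbers come from the source.

```python
# (hits, at-bats) per season, copied from the slide.
belliard = {2002: (61, 289), 2003: (124, 447)}
blake = {2002: (4, 20), 2003: (143, 557)}


def average(hits, at_bats):
    """Batting average: the fraction of at-bats that were hits."""
    return hits / at_bats


def pooled(player):
    """Total hits and total at-bats across all of a player's seasons."""
    hits = sum(h for h, _ in player.values())
    at_bats = sum(ab for _, ab in player.values())
    return hits, at_bats


def shows_reversal(a, b):
    """True if `a` beats `b` in every season but trails on the pooled totals."""
    wins_every_season = all(average(*a[year]) > average(*b[year]) for year in a)
    return wins_every_season and average(*pooled(a)) < average(*pooled(b))


print(f"Belliard pooled: {average(*pooled(belliard)):.3f}")
print(f"Blake pooled:    {average(*pooled(blake)):.3f}")
print("Reversal:", shows_reversal(belliard, blake))
```

The pooled average weights each season by its at-bats. Blake's weak 2002 season covers only 20 at-bats and barely affects his total, while Belliard's weak 2002 season covers 289 at-bats and pulls his total down.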
27
Learning Objective 8: Simpson’s Paradox – At Work
Discrimination? Consider college acceptance rates by sex.

         Accepted   Not accepted   Total
Men           198            162     360
Women          88            112     200
Total         286            274     560

198 of 360 men (55%) were accepted.
88 of 200 women (44%) were accepted.
Is there a sex bias?
28
Learning Objective 8: Simpson’s Paradox – At Work
Or is there a lurking variable that explains the association?
To evaluate this, split the applications according to the lurking variable “major applied to”:
Business School (240 applicants)
Art School (320 applicants)
29
Learning Objective 8: Simpson’s Paradox – At Work
BUSINESS SCHOOL

         Accepted   Not accepted   Total
Men            18            102     120
Women          24             96     120
Total          42            198     240

18 of 120 men (15%) were accepted to B-school.
24 of 120 women (20%) were accepted to B-school.
A higher percentage of women were accepted.
30
Learning Objective 8: Simpson’s Paradox – At Work
ART SCHOOL

         Accepted   Not accepted   Total
Men           180             60     240
Women          64             16      80
Total         244             76     320

180 of 240 men (75%) were accepted.
64 of 80 women (80%) were accepted.
A higher percentage of women were accepted.
31
Learning Objective 8: Simpson’s Paradox – At Work
Within each school, a higher percentage of women were accepted than men. No discrimination against women. Possible discrimination against men. This is an example of Simpson’s Paradox. When the lurking variable (School applied to) was ignored, the data suggest discrimination against women. When the School applied to was considered, the association is reversed.
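The admissions example can be verified by computing acceptance rates both within each school and after combining the schools. This is a minimal sketch using the counts from the slides; the data layout and the names `rate` and `combined` are assumptions made for illustration.

```python
# (accepted, applicants) by school and sex, from the two school tables.
admissions = {
    "Business": {"Men": (18, 120), "Women": (24, 120)},
    "Art": {"Men": (180, 240), "Women": (64, 80)},
}


def rate(accepted, applicants):
    """Acceptance rate in percent."""
    return 100 * accepted / applicants


def combined(sex):
    """Accepted and applicant totals for one sex, ignoring the school."""
    accepted = sum(school[sex][0] for school in admissions.values())
    applicants = sum(school[sex][1] for school in admissions.values())
    return accepted, applicants


# Rates within each school (lurking variable taken into account).
for school, by_sex in admissions.items():
    for sex, cell in by_sex.items():
        print(f"{school:<9}{sex:<6}{rate(*cell):5.1f}%")

# Rates with the schools combined (lurking variable ignored).
for sex in ("Men", "Women"):
    print(f"{'Combined':<9}{sex:<6}{rate(*combined(sex)):5.1f}%")
```

Most men applied to the art school, where acceptance rates are high, while most women applied to the business school, where they are low. That difference in where the two groups applied is what drives the combined rates in the opposite direction.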
32
Learning Objective 8: Simpson’s Paradox – The Paradox
What’s true for the parts isn’t true for the whole.
33
Learning Objective 8: Simpson’s Paradox – CONCLUSION!!!!
Simpson’s paradox is a rare phenomenon! It does not occur often! Thus statisticians must be trained academically & ethically well enough to make sure that if it has occurred they will detect and correct it. This is where practice, critical thinking skills, and repetition come into play!