Download presentation
Presentation is loading. Please wait.
1
1.1: Analyzing Categorical Data
2
Section 1.1 Analyzing Categorical Data
After this section, you should be able to… CONSTRUCT and INTERPRET bar graphs and pie charts RECOGNIZE “good” and “bad” graphs CONSTRUCT and INTERPRET two-way tables DESCRIBE relationships between two categorical variables ORGANIZE statistical problems
3
Analyzing Categorical Data
Categorical Variables place individuals into one of several groups or categories The values of a categorical variable are labels for the different categories The distribution of a categorical variable lists the count or percent of individuals who fall into each category. Analyzing Categorical Data Frequency Table Format Count of Stations Adult Contemporary 1556 Adult Standards 1196 Contemporary Hit 569 Country 2066 News/Talk 2179 Oldies 1060 Religious 2014 Rock 869 Spanish Language 750 Other Formats 1579 Total 13838 Relative Frequency Table Format Percent of Stations Adult Contemporary 11.2 Adult Standards 8.6 Contemporary Hit 4.1 Country 14.9 News/Talk 15.7 Oldies 7.7 Religious 14.6 Rock 6.3 Spanish Language 5.4 Other Formats 11.4 Total 99.9 Variable Count Percent Values
4
Distribution & Categorical Variables
The distribution of a categorical variable lists the count or percent of individuals who fall into each category. Favorite Course Count English 8 Foreign Language 4 Histroy 11 Math 15 Science 12 Favorite Course Percentage English 16% Foreign Language 8% Histroy 22% Math 30% Science 24%
5
Displaying Categorical Data
Frequency tables can be difficult to read. Sometimes it is easier to analyze a distribution by displaying it with a bar graph or pie chart.
6
Do you listen to an MP3 Player?
Could you make a pie graph from this data? Do you listen to an MP3 Player?
7
Displaying Categorical Variables
Pie Charts NOT all categorical data can be displayed as a pie chart!
8
Graphs: Good and Bad Bar graphs compare several quantities by comparing the heights of bars that represent those quantities. Our eyes react to the area of the bars as well as height. Be sure to make your bars equally wide. Avoid the temptation to replace the bars with pictures for greater appeal…this can be misleading! This ad for DIRECTV has multiple problems. How many can you point out? Alternate Example: The following ad for DIRECTV has multiple problems. See how many your students can point out. First, the heights of the bars are not accurate. According to the graph, the difference between 81 and 95 is much greater than the difference between 56 and 81. Also, the extra width for the DIRECTV bar is deceptive since our eyes respond to the area, not just the height.
9
Bad Graphs with counts
11
Two-Way Tables Two-Way Tables: describe two categorical variables, organizing counts according to a row variable and a column variable. When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables. Member of No Clubs Member of One Club Member of 2 or More Clubs Total Rides the School Bus 55 33 20 108 Does not Ride Bus 16 44 82 142 71 77 102 250 What are the variables described by this two-way table? How many students were surveyed? Alternate Example: Super Powers A sample of 200 children from the United Kingdom ages 9-17 was selected from the CensusAtSchool website ( The gender of each student was recorded along with which super power they would most like to have: invisibility, super strength, telepathy (ability to read minds), ability to fly, or ability to freeze time. Here are the results:
12
Analyzing Categorical Data
Two-Way Tables and Marginal Distributions Analyzing Categorical Data Definition: The Marginal Distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table. Note: Percents are often more informative than counts, especially when comparing groups of different sizes. To examine a marginal distribution, Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals. Make a graph to display the marginal distribution.
13
Analyzing Categorical Data
Two-Way Tables and Marginal Distributions Analyzing Categorical Data Young adults by gender and chance of getting rich Female Male Total Almost no chance 96 98 194 Some chance, but probably not 426 286 712 A chance 696 720 1416 A good chance 663 758 1421 Almost certain 486 597 1083 2367 2459 4826 Examine the marginal distribution of chance of getting rich. Response Percent Almost no chance 194/4826 = 4.0% Some chance 712/4826 = 14.8% A chance 1416/4826 = 29.3% A good chance 1421/4826 = 29.4% Almost certain 1083/4826 = 22.4%
14
Analyzing Categorical Data
Relationships Between Categorical Variables Marginal distributions tell us nothing about the relationship between two variables. Analyzing Categorical Data Definition: A Conditional Distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. To examine or compare conditional distributions, Select the row(s) or column(s) of interest. Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s). Make a graph to display the conditional distribution. Use a side-by-side bar graph or segmented bar graph to compare distributions.
15
Analyzing Categorical Data
Two-Way Tables and Conditional Distributions Analyzing Categorical Data Young adults by gender and chance of getting rich Female Male Total Almost no chance 96 98 194 Some chance, but probably not 426 286 712 A chance 696 720 1416 A good chance 663 758 1421 Almost certain 486 597 1083 2367 2459 4826 Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion. Response Male Almost no chance 98/2459 = 4.0% Some chance 286/2459 = 11.6% A chance 720/2459 = 29.3% A good chance 758/2459 = 30.8% Almost certain 597/2459 = 24.3% Female 96/2367 = 4.1% 426/2367 = 18.0% 696/2367 = 29.4% 663/2367 = 28.0% 486/2367 = 20.5%
16
Similar Sounding Questions
Percentage of those who were both in first class and survived. The percentage of those who survived among those who were in first class. The percentage of those who were in first class among those who survived. Pay attention to the “among who” implicitly defined by the phrase.
17
What proportion of students that ride the school bus are members of two or more clubs?
What proportion of students that are members of no clubs do not ride the school bus? What proportion of students that do not ride the school bus are members of at least one club? Member of No Clubs Member of One Club Member of 2 or More Clubs Total Rides the School Bus 55 33 20 108 Does not Ride Bus 16 44 82 142 71 77 102 250
18
What proportion of males have “a good chance” at being rich?
What proportion of females have a “50-50 chance” at being rich? What proportion of young adults that have an “almost certain” chance of being rich are male?
19
Comparing Categorical Distributions
Sophomore Junior Senior Total One 4 Two 1 3 12 16 Three 7 6 17 Four 8 19 Five 2 5 14 33 61
20
Comparing Categorical Distributions
21
Comparing Categorical Distributions
22
Analyzing Categorical Data
Organizing a Statistical Problem As you learn more about statistics, you will be asked to solve more complex problems. Here is a four-step process you can follow. Analyzing Categorical Data How to Organize a Statistical Problem: A Four-Step Process State: What’s the question that you’re trying to answer? Plan: How will you go about answering the question? What statistical techniques does this problem call for? Do: Make graphs and carry out needed calculations. Conclude: Give your practical conclusion in the setting of the real-world problem.
23
Writing to Compare Categorical Distributions
Cite specific numerical values/proportions. Use comparison words. Greater, smaller, less, while only, more, wider, narrower, fewer, etc. Use transition words However, whereas, similarly, additionally, etc. Discuss at least two points of comparison.
24
Comparing Categorical Distributions
Is there an association between after-school club participation and whether or not the student rides the school bus? Support your answer with a discussion of the provided graphs.
25
Comparing Categorical Distributions
Sample Answer: Yes, there is a clear association between after-school club participation and transportation. Only 11% of students who don’t ride the bus do not participate in after school clubs, whereas 51% of students who do ride the bus do not participate. Similarly, 58% of students who do not ride the bus are involved in 2 or more clubs, while only 19% of students riding the bus are involved in 2 or more clubs. However, the proportion of students who participate in one club is the same for students who ride and students who don’t ride the bus.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.