Displaying Categorical Data: THINK, SHOW, TELL
What is categorical data?
Bar, Segmented Bar, and Pie Charts
Frequency vs. Relative Frequency Tables/Charts
Area Principle
Contingency Tables
Marginal Distributions
Conditional Distributions

What is categorical data?
Data that can be sorted into “piles” (categories) that are not numerically based.
Examples: eye color, T-shirt size, type of vehicle, …

Titanic
Who: People on the Titanic
What: Survival status, age, gender, ticket class
When: April 14, 1912
Where: North Atlantic
How: Variety of resources
Why: Historical interest

Distribution of Categorical Data

Frequency Table (COUNTS!)
Class     Count
First       325
Second      285
Third       706
Crew        885

Relative Frequency Table (PERCENTAGES!)
Class         %
First      14.8
Second     12.9
Third      32.1
Crew       40.2

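The step from the frequency table to the relative frequency table is a single division by the grand total. Here is a minimal Python sketch using the class counts from the slide. The function name `relative_frequencies` is only illustrative.

```python
# Class counts as given in the frequency table on the slide.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}


def relative_frequencies(freq):
    """Convert a frequency table (counts) into percentages of the grand total."""
    total = sum(freq.values())
    return {category: 100 * n / total for category, n in freq.items()}


rel_freq = relative_frequencies(counts)

# Print counts and percentages side by side.
print(f"{'Class':<8}{'Count':>7}{'%':>8}")
for category, n in counts.items():
    print(f"{category:<8}{n:>7}{rel_freq[category]:>8.1f}")
print(f"{'Total':<8}{sum(counts.values()):>7}{sum(rel_freq.values()):>8.1f}")
```

The percentages should add up to 100 (apart from rounding), which is a quick check that no category was left out or counted twice.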
Bar, Segmented Bar, & Pie Charts
Area Principle: How to Lie
The purpose of a chart is to help us see patterns, but a bad chart (one that violates the area principle, under which the area a value occupies in the display should match its magnitude) can distort a pattern or imply a relationship that is not really there. See page 3-3 in the textbook.

Contingency Table
                     Survival
Ticket Class    Lived    Died    TOTAL
Crew
First
Second
Third
TOTAL

Marginal Distribution, Row %: the same Ticket Class by Survival table, with each cell shown as a percentage of its row (ticket class) total. The TOTAL row and column are the marginal distributions.
Marginal Distribution, Column %: the same table, with each cell shown as a percentage of its column (survival) total.
Marginal Distribution, Table %: the same table, with each cell shown as a percentage of the grand total.
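
The three kinds of percentage differ only in the denominator used for each cell. A rough Python sketch follows. The cell counts are an assumption: they did not survive in the slide text, so the commonly published Titanic counts are used here, whose row totals agree with the class counts in the frequency table (325, 285, 706, 885). The helper `pct` is illustrative.

```python
# ASSUMPTION: these cell counts are not in the slide text. They are the
# commonly published Titanic counts; their row totals match the class
# counts from the frequency table (325, 285, 706, 885).
table = {
    "First":  {"Lived": 203, "Died": 122},
    "Second": {"Lived": 118, "Died": 167},
    "Third":  {"Lived": 178, "Died": 528},
    "Crew":   {"Lived": 212, "Died": 673},
}
outcomes = ["Lived", "Died"]

# Margins: sum across each row (ticket class) and down each column (survival).
row_totals = {cls: sum(cells.values()) for cls, cells in table.items()}
col_totals = {o: sum(table[cls][o] for cls in table) for o in outcomes}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Marginal distributions: the margins as percentages of the grand total.
marginal_class = {cls: pct(n, grand_total) for cls, n in row_totals.items()}
marginal_survival = {o: pct(n, grand_total) for o, n in col_totals.items()}

# Same cells, three different denominators.
row_pct = {c: {o: pct(table[c][o], row_totals[c]) for o in outcomes} for c in table}
col_pct = {c: {o: pct(table[c][o], col_totals[o]) for o in outcomes} for c in table}
table_pct = {c: {o: pct(table[c][o], grand_total) for o in outcomes} for c in table}

cell = ("First", "Lived")
print("Row %:   ", row_pct[cell[0]][cell[1]])    # share of First class who lived
print("Column %:", col_pct[cell[0]][cell[1]])    # share of survivors who were First class
print("Table %: ", table_pct[cell[0]][cell[1]])  # share of everyone aboard
```

The same cell gives three different percentages, so always say which total a percentage is "of".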
Did the chance of surviving depend on ticket class?
Let’s look at our table with row percentages.
Does the distribution of ticket class look the same for survivors as for non-survivors? To find out, we first restrict our attention to the survivors.

Conditional distribution (ALIVE)
Class    Count    %
1st
2nd
3rd
Crew
Total

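Restricting attention to one group and then computing percentages within it is all a conditional distribution is. Here is a hedged Python sketch. The counts are an assumption (the slide's values were lost in extraction; these are the commonly published Titanic counts), and `conditional_distribution` is an illustrative name.

```python
# ASSUMPTION: counts are not in the slide text; these are the commonly
# published Titanic counts, split by survival status.
lived = {"1st": 203, "2nd": 118, "3rd": 178, "Crew": 212}
died = {"1st": 122, "2nd": 167, "3rd": 528, "Crew": 673}


def conditional_distribution(group_counts):
    """Percentages of ticket class within one group (e.g. survivors only)."""
    total = sum(group_counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in group_counts.items()}


alive_dist = conditional_distribution(lived)
dead_dist = conditional_distribution(died)

# Compare the two conditional distributions side by side.
print(f"{'Class':<6}{'Alive %':>9}{'Dead %':>9}")
for cls in lived:
    print(f"{cls:<6}{alive_dist[cls]:>9.1f}{dead_dist[cls]:>9.1f}")
```

If ticket class and survival were unrelated, the two columns would look about the same. Large differences between them suggest an association.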
Survivors vs. Non-survivors (pie charts)
Are ticket class and survival associated? This was an important part of the movie.
Same data, Segmented Bar Chart: conditional distribution based on ticket class.
Same data, Segmented Bar Chart: conditional distribution based on survival status.
Displaying Categorical Data
THINK (Variable): Identify the variables and report the W’s. Be sure the data are counts and the categories do not overlap.
SHOW (Mechanics): Make an appropriate display. Be sure the “bars” are of equal width.
TELL (Interpretation): Discuss the patterns in the table and displays. Are there any possible real-world consequences?

Simpson’s Paradox

Alfred      Successes    Attempts    % Success
1st half
2nd half           20          40          50%

Bubba       Successes    Attempts    % Success
1st half
2nd half            2           5          40%

Simpson’s Paradox

Total for year    Successes    Attempts    % Success
Alfred
Bubba

Does it follow that Alfred’s percentage of successes was greater than Bubba’s for the whole year?
To aggregate data means to combine the data into one group.
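
The answer can be "no", and a small Python sketch shows how. The second-half rows below are the ones on the slide. The first-half rows are hypothetical: the slide's values were lost, so numbers were chosen only to make the reversal visible. The function names are illustrative.

```python
# (successes, attempts) per half. Second-half rows come from the slide.
# HYPOTHETICAL: first-half rows are invented for illustration only,
# because the slide's first-half values did not survive.
records = {
    "Alfred": {"1st half": (4, 5), "2nd half": (20, 40)},
    "Bubba": {"1st half": (30, 40), "2nd half": (2, 5)},
}


def success_rate(successes, attempts):
    """Percentage of attempts that were successes."""
    return 100 * successes / attempts


def aggregate(halves):
    """Combine both halves into one (successes, attempts) pair for the year."""
    successes = sum(s for s, _ in halves.values())
    attempts = sum(a for _, a in halves.values())
    return successes, attempts


for name, halves in records.items():
    for half, (s, a) in halves.items():
        print(f"{name:<7}{half:<10}{s:>3}/{a:<3}{success_rate(s, a):6.1f}%")
    s, a = aggregate(halves)
    print(f"{name:<7}{'year':<10}{s:>3}/{a:<3}{success_rate(s, a):6.1f}%")
```

With these numbers Alfred has the higher percentage in each half, yet Bubba has the higher percentage for the year, because the two players' attempts are spread very differently across the halves. Averaging or comparing percentages across unequal group sizes can reverse the pattern seen within each group.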