4.2 Relationships between Categorical Variables and Simpson’s Paradox
Categorical Variables
Two-way table: describes two categorical variables; each individual is counted in exactly one cell.
         Instagram   Twitter   Snapchat   Other   Total
Boys
Girls
Sex and Social Media are the two categorical variables.
Which is a two-way table?
             Died   Total
Hospital A   30     100
Hospital B   38
Total        68     200
Hospital vs. Result of Life
Gender vs. Type of Car
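Marginal Distribution
Describes only one of the categorical variables (look at the MARGINS of the table, where the totals are).
Percents are usually more informative than counts for marginal distributions.
         Instagram   Twitter   Snapchat   Other   Total
Boys
Girls
Ex: percent of all students on each social media platform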
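A minimal sketch of how a marginal distribution could be computed from a two-way table. The table on the slide is blank, so the counts below are made up purely for illustration, and the function names `marginal_by_column` and `marginal_by_row` are hypothetical.

```python
# HYPOTHETICAL counts for illustration only; the slide's table has no data.
counts = {
    "Boys":  {"Instagram": 20, "Twitter": 15, "Snapchat": 10, "Other": 5},
    "Girls": {"Instagram": 25, "Twitter": 5,  "Snapchat": 15, "Other": 5},
}


def grand_total(table):
    """Total number of individuals in the whole table."""
    return sum(sum(row.values()) for row in table.values())


def marginal_by_column(table):
    """Percent of ALL individuals in each column category (column total / grand total)."""
    total = grand_total(table)
    column_totals = {}
    for row in table.values():
        for category, n in row.items():
            column_totals[category] = column_totals.get(category, 0) + n
    return {category: 100 * n / total for category, n in column_totals.items()}


def marginal_by_row(table):
    """Percent of ALL individuals in each row category (row total / grand total)."""
    total = grand_total(table)
    return {label: 100 * sum(row.values()) / total for label, row in table.items()}


social_media_marginal = marginal_by_column(counts)
sex_marginal = marginal_by_row(counts)
print(social_media_marginal)
print(sex_marginal)
```

With these made-up counts, the marginal distribution of Social Media is 45% Instagram, 20% Twitter, 25% Snapchat, 10% Other, and the marginal distribution of Sex is 50% Boys, 50% Girls. Each uses only the totals in the margins, never the individual cells.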
Conditional Distributions
Found by computing percents within just one row or just one column.
         Instagram   Twitter   Snapchat   Other   Total
Boys
Girls
Ex: percent of boys on Instagram / Twitter / Snapchat / Other
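A minimal sketch of a conditional distribution, computed within one row or one column. The slide's table is blank, so the counts are made up for illustration, and the function names `conditional_given_row` and `conditional_given_column` are hypothetical.

```python
# HYPOTHETICAL counts for illustration only; the slide's table has no data.
counts = {
    "Boys":  {"Instagram": 20, "Twitter": 15, "Snapchat": 10, "Other": 5},
    "Girls": {"Instagram": 25, "Twitter": 5,  "Snapchat": 15, "Other": 5},
}


def conditional_given_row(table, row_label):
    """Percents within a single row: each cell divided by that row's total."""
    row = table[row_label]
    row_total = sum(row.values())
    return {category: 100 * n / row_total for category, n in row.items()}


def conditional_given_column(table, column_label):
    """Percents within a single column: each cell divided by that column's total."""
    column_total = sum(row[column_label] for row in table.values())
    return {label: 100 * row[column_label] / column_total
            for label, row in table.items()}


boys_distribution = conditional_given_row(counts, "Boys")
twitter_distribution = conditional_given_column(counts, "Twitter")
print(boys_distribution)     # social media percents among boys only
print(twitter_distribution)  # sex percents among Twitter fans only
```

The denominator is what matters. With these made-up counts, 30% of boys are Twitter fans (15 out of 50 boys), but 75% of Twitter fans are boys (15 out of 20 Twitter fans).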
Categorical Variables
1. What % of girls like Instagram?
2. What % of Twitter fans are boys?
3. What % of boys are Twitter fans?
Simpson’s Paradox
            1st Half         2nd Half         Total
Player A    4/10    .400     25/100  .250     29/110  .264
Player B    35/100  .350     2/10    .200     37/110  .336
Simpson’s Paradox: an association or comparison that holds for all or several groups can reverse direction when the data are combined to form a single group.
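The reversal in the Player A / Player B table can be checked directly from the hits and at-bats on the slide. This is a small sketch; the helper names `average` and `combined_average` are hypothetical.

```python
# Hits and at-bats as (hits, at_bats), taken from the slide's table.
records = {
    "Player A": {"1st Half": (4, 10),   "2nd Half": (25, 100)},
    "Player B": {"1st Half": (35, 100), "2nd Half": (2, 10)},
}


def average(hits, at_bats):
    """Batting average for one group."""
    return hits / at_bats


def combined_average(splits):
    """Batting average after pooling all groups: total hits / total at-bats."""
    total_hits = sum(hits for hits, _ in splits.values())
    total_at_bats = sum(at_bats for _, at_bats in splits.values())
    return total_hits / total_at_bats


# Player A is better in each half taken separately...
a_better_in_each_half = all(
    average(*records["Player A"][half]) > average(*records["Player B"][half])
    for half in ("1st Half", "2nd Half")
)
# ...but Player B is better once the halves are combined.
b_better_combined = (
    combined_average(records["Player B"]) > combined_average(records["Player A"])
)

print(a_better_in_each_half, b_better_combined)
```

Both checks come out true. The pooled average is not the average of the two half-season averages: Player A has most at-bats in the weaker half (100 of 110), while Player B has most at-bats in the stronger half, so the unequal group sizes drive the reversal.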
Simpson’s Paradox
               LP     RP     Avg
Andre Dawson   .346   .283   .299
Lee Lacy       .336   .266   .302
LP = against left-handed pitchers; RP = against right-handed pitchers.
Dawson had a higher average against both left-handed and right-handed pitchers, but a lower overall average for the year. This is another example of Simpson’s Paradox.
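The slide gives only averages, not at-bat counts, so the reversal cannot be recomputed from raw data. As a rough sketch, assuming each season average is the at-bat-weighted average of the LP and RP averages, the share of at-bats against left-handed pitchers can be backed out. The function name `implied_share_vs_left` is hypothetical, and because the inputs are rounded to three decimals the results are approximate.

```python
def implied_share_vs_left(avg_left, avg_right, avg_overall):
    """Fraction w of at-bats against left-handed pitchers, ASSUMING
    avg_overall = w * avg_left + (1 - w) * avg_right."""
    return (avg_overall - avg_right) / (avg_left - avg_right)


# Averages from the slide (rounded to three decimals, so shares are approximate).
dawson_share = implied_share_vs_left(0.346, 0.283, 0.299)
lacy_share = implied_share_vs_left(0.336, 0.266, 0.302)

print(round(dawson_share, 2), round(lacy_share, 2))
```

Under that assumption, Dawson took roughly a quarter of his at-bats against left-handed pitchers and Lacy roughly half. Both players hit better against lefties, so Lacy's larger share of those at-bats is enough to lift his overall average above Dawson's.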
Exit Slip
Create a side-by-side bar graph of the two-way table “Sex vs. Social Media.”
HW: pg 301 #29; pg 303 #36-38