4.3 Relations in Categorical Data
Use categorical data to calculate marginal and conditional proportions.
Understand Simpson’s Paradox in the context of a problem.
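two-way table- describes two categorical variables at once
row variable- the categorical variable whose levels label the rows; each row describes the individuals at one level of that variable
column variable- the categorical variable whose levels label the columns; each column describes the individuals at one level of that variable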
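Marginal Distributions- the row totals and column totals of a two-way table; each gives the distribution of one variable alone, in counts or as percents of the grand total
Conditional Distribution (“GIVEN”)- the distribution of one variable among only the individuals who satisfy a certain condition on the other variable; divide by that group’s total, not the grand total
roundoff error- when rounded percents do not add to exactly 100%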
#1- How many students do these data describe? 5375
#2- What percent of these students smoke? 1004/5375 = 0.187 = 18.7%
#3- Give the marginal distribution of parents’ smoking behavior, both in counts and percents.
Parents who smoke:   Both    One    Neither
Percent:             33%     42%    25%
#4- What percent of the students smoke, given both their parents smoke? 400/1780 ≈ 0.225 = 22.5%
#5- What percent of the students have neither parent smoking, given the student does not smoke? 1168/4371 ≈ 0.267 = 26.7%
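The arithmetic in #2, #4, and #5 can be sketched in a few lines of Python. This is only an illustration: the variable names and the `proportion` helper are made up for this sketch, and the counts are the ones quoted in the questions above. The point it shows is that a marginal proportion divides by the grand total, while a conditional proportion divides by the total of the "given" group.

```python
# Counts quoted in questions #1-#5 (variable names are hypothetical).
total_students = 5375
students_who_smoke = 1004
students_who_do_not_smoke = 4371
both_parents_smoke = 1780             # students whose parents both smoke
smoke_and_both_parents = 400          # of those, students who smoke
no_smoke_and_neither_parent = 1168    # non-smoking students with no smoking parent


def proportion(count, group_total):
    """Return count / group_total; group_total is the 'given' group."""
    return count / group_total


# Marginal proportion: divide by the grand total.
p_smoke = proportion(students_who_smoke, total_students)

# Conditional proportions: divide by the total of the 'given' group only.
p_smoke_given_both = proportion(smoke_and_both_parents, both_parents_smoke)
p_neither_given_no_smoke = proportion(
    no_smoke_and_neither_parent, students_who_do_not_smoke
)

print(f"P(smoke) = {p_smoke:.1%}")
print(f"P(smoke | both parents smoke) = {p_smoke_given_both:.1%}")
print(f"P(neither parent smokes | student does not smoke) = {p_neither_given_no_smoke:.1%}")
```

Swapping the denominator is the whole difference between #2 and #4: the same kind of count is compared against 5375 students in one case and against only the 1780 students with two smoking parents in the other.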
Simpson’s Paradox- refers to the reversal of the direction of a comparison or an association when the data from several groups are combined to form a single group.
What percent of patients died in each hospital?

            Hospital A   Hospital B   Total
Died             63           16         79
Survived       2037          784       2821
Total          2100          800       2900

Hospital A: 63/2100 = 3%
Hospital B: 16/800 = 2%
Hospital A has a higher death rate.
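Now split the patients by the condition they were in when they entered the hospital.

Good Condition
              A      B
Died          6      8
Survived    594    592
A: 6/600 = 1%
B: 8/600 = 1.33%

Bad Condition
              A      B
Died         57      8
Survived   1443    192
A: 57/1500 = 3.8%
B: 8/200 = 4%

In both cases, Hospital A had a lower death rate. Why?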
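In both hospitals, patients who entered in bad condition had a higher death rate than patients who entered in good condition. Most of Hospital A’s patients (1500 of 2100) entered in bad condition, while most of Hospital B’s patients (600 of 800) entered in good condition. That is why Hospital A has the higher overall death rate even though it has the lower death rate within each condition.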
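The reversal can be checked with a short Python sketch. It is only an illustration: the `data` layout and the function names are hypothetical, and the numbers are the deaths and patient totals from the fractions in the hospital example (for instance, 6/600 for Hospital A, good condition).

```python
# (deaths, patients) by hospital and entering condition, from the notes.
data = {
    "A": {"good": (6, 600), "bad": (57, 1500)},
    "B": {"good": (8, 600), "bad": (8, 200)},
}


def death_rate(died, total):
    """Proportion of patients in a group who died."""
    return died / total


def overall_rate(hospital):
    """Death rate after combining both conditions into a single group."""
    died = sum(d for d, _ in data[hospital].values())
    total = sum(t for _, t in data[hospital].values())
    return death_rate(died, total)


def share_bad(hospital):
    """Proportion of a hospital's patients who entered in bad condition."""
    total = sum(t for _, t in data[hospital].values())
    return data[hospital]["bad"][1] / total


# Within each condition, compare the two hospitals.
for condition in ("good", "bad"):
    rate_a = death_rate(*data["A"][condition])
    rate_b = death_rate(*data["B"][condition])
    print(f"{condition} condition: A = {rate_a:.2%}, B = {rate_b:.2%}")

# Combined groups: the direction of the comparison flips.
print(f"overall: A = {overall_rate('A'):.2%}, B = {overall_rate('B'):.2%}")

# The lurking variable: how many patients arrived in bad condition.
for hospital in ("A", "B"):
    print(f"Hospital {hospital}: {share_bad(hospital):.0%} entered in bad condition")
```

Hospital A comes out lower within each condition but higher overall, because about 71% of its patients entered in bad condition compared with 25% at Hospital B.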