Not in FPP: Exploratory data analysis with two qualitative variables

Presentation transcript:

Exploratory data analysis with two qualitative/categorical variables. Main tools: contingency tables; conditional, marginal, and joint frequencies.

Motivating example: Surviving the Titanic. Was there class discrimination in survival of the wreck of the Titanic? “It has been suggested before the Enquiry that the third-class passengers had been unfairly treated, that their access to the boat deck had been impeded; and that when they reached the deck the first and second-class passengers were given precedence in getting places in the boats.” (Lord Mersey)

Titanic: Class by survival

         1st Class   2nd Class   3rd Class   Crew   Total
Dead           122         167         528    696    1513
Alive          203         118         178    212     711
Total          325         285         706    908    2224

Titanic: Marginal frequencies. % Dead = 1513/2224 = 0.68; % Alive = 711/2224 = 0.32; % in first class = 325/2224 = 0.15; % in second class = 285/2224 = 0.13; % in third class = 706/2224 = 0.32; % crew = 908/2224 = 0.41.
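
The marginal frequencies can be recomputed from the totals quoted on this slide. The following is a minimal sketch; the dictionary and function names are illustrative choices, not part of the original slides.

```python
# Marginal totals quoted on the slide (counts of people).
survival_totals = {"Dead": 1513, "Alive": 711}
class_totals = {"1st": 325, "2nd": 285, "3rd": 706, "Crew": 908}

# Grand total: everyone on board in this data set.
n = sum(survival_totals.values())


def marginal(totals, grand_total):
    """Divide each marginal count by the grand total."""
    return {label: count / grand_total for label, count in totals.items()}


survival_marginal = marginal(survival_totals, n)
class_marginal = marginal(class_totals, n)

for label, p in {**survival_marginal, **class_marginal}.items():
    print(f"{label}: {p:.2f}")
```

Each set of marginal frequencies sums to 1, which is a quick check that no category was left out.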

Titanic: Conditional frequencies. % (Alive | 1st) = 203/325 = 0.62; % (Alive | 2nd) = 118/285 = 0.41; % (Alive | 3rd) = 178/706 = 0.25; % (Alive | Crew) = 212/908 = 0.23. Based on these frequencies, does there appear to be class discrimination?
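
A conditional frequency divides a cell count by the total of the group being conditioned on, rather than by the grand total. The sketch below applies this to the counts in the fractions on this slide; the variable names are illustrative.

```python
# Survivors and group sizes, taken from the fractions on the slide.
alive = {"1st": 203, "2nd": 118, "3rd": 178, "Crew": 212}
total = {"1st": 325, "2nd": 285, "3rd": 706, "Crew": 908}

# Conditional frequency of survival given each group: Alive / group total.
survival_rate = {group: alive[group] / total[group] for group in total}

for group, rate in survival_rate.items():
    print(f"%(Alive | {group}) = {rate:.2f}")

# Marginal survival frequency, for comparison with the conditional ones.
overall_rate = sum(alive.values()) / sum(total.values())
print(f"%(Alive) = {overall_rate:.2f}")
```

Comparing each conditional frequency with the marginal one shows which groups survived at rates above or below the ship-wide rate.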

Titanic: Class by person type. (Table of counts with columns 1st Class, 2nd Class, 3rd Class, Crew and rows Child, Women, Men; the cell counts are not preserved in the transcript.)

Titanic: Percentage of men in each class. % (Man | 1st) = 175/325 = 0.54; % (Man | 2nd) = 168/285 = 0.59; % (Man | 3rd) = 462/706 = 0.65; % (Man | Crew) = 885/908 = 0.97. The percentages of men are larger in third class and among the crew.

Surviving the Titanic. A possible reason for class differences in survival: larger percentages of men died, and 3rd class consisted mostly of men. Hence, a larger percentage of 3rd class passengers died. Once again, keep in mind possible lurking variables that could be driving the relationship seen between two measured variables.

Relative risk and odds ratios. Motivating example: Physicians’ Health Study (1989), a randomized experiment with male physicians at least 40 years old. Half the subjects were assigned to take aspirin every other day. The other half were assigned to take a placebo, a dummy pill that looked and tasted like aspirin.

Physicians’ health study. Here is the number of people in each cell. (The table of counts is not preserved in the transcript.)

Relative risk

         y1     y2     Total
x1       a      b      a+b
x2       c      d      c+d
Total    a+c    b+d

Risk of y1 for level x1 = a/(a+b)
Risk of y1 for level x2 = c/(c+d)
Relative risk = [a/(a+b)] / [c/(c+d)]
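
The definitions above translate directly into a small function. This is a minimal sketch: the function name is an illustrative choice, and the counts in the usage lines are hypothetical, not the study's data.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of y1 at level x1 divided by risk of y1 at level x2.

    a, b are the counts of y1 and y2 at level x1;
    c, d are the counts of y1 and y2 at level x2.
    """
    risk_x1 = a / (a + b)
    risk_x2 = c / (c + d)
    return risk_x1 / risk_x2


# Hypothetical counts, for illustration only.
rr = relative_risk(a=10, b=90, c=20, d=80)
print(f"relative risk = {rr:.2f}")
```

Swapping the two rows inverts the ratio, which is why the comparison direction (x1 versus x2) always has to be stated alongside the number.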

Relative risk for physicians’ health study. The relative risk of a heart attack when taking aspirin versus when taking a placebo equals 0.55. People who took aspirin were 0.55 times as likely to have a heart attack as people who took the placebo. Equivalently, people who took the placebo were 1/0.55 = 1.82 times as likely to have a heart attack as people who took aspirin.

Odds ratios

         y1     y2
x1       a      b
x2       c      d

Odds of y1 for level x1 = a/b
Odds of y1 for level x2 = c/d
Odds ratio = (a/b) / (c/d)
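
The odds ratio can be sketched the same way from the four cell counts. The function name is illustrative and the counts in the usage lines are hypothetical, not the study's data.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of y1 at level x1 divided by odds of y1 at level x2.

    a, b are the counts of y1 and y2 at level x1;
    c, d are the counts of y1 and y2 at level x2.
    """
    odds_x1 = a / b
    odds_x2 = c / d
    return odds_x1 / odds_x2


# Hypothetical counts, for illustration only.
oratio = odds_ratio(a=10, b=90, c=20, d=80)
print(f"odds ratio = {oratio:.3f}")

# The same quantity written as a cross-product ratio: (a*d) / (b*c).
cross_product = (10 * 80) / (90 * 20)
print(f"cross-product ratio = {cross_product:.3f}")
```

The cross-product form is algebraically identical to (a/b) / (c/d) and is often the easier one to compute by hand.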

Odds ratios for physicians’ health study. The relative risk of a heart attack when taking aspirin versus taking a placebo is 0.55. The odds ratio is the odds of having a heart attack when taking aspirin over the odds of a heart attack when taking a placebo. (Its value is not preserved in the transcript.)

Interpreting odds ratios and relative risks
When the variables X and Y are independent: odds ratio = 1 and relative risk = 1.
When subjects with level x1 are more likely to have y1 than subjects with level x2: odds ratio > 1 and relative risk > 1.
When subjects with level x1 are less likely to have y1 than subjects with level x2: odds ratio < 1 and relative risk < 1.
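
These three cases can be checked numerically. The sketch below uses hypothetical 2x2 tables chosen for illustration; the helper names are not from the slides.

```python
def relative_risk(a, b, c, d):
    """Risk of y1 at x1 over risk of y1 at x2."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of y1 at x1 over odds of y1 at x2."""
    return (a / b) / (c / d)


# Hypothetical tables as (a, b, c, d) = (x1&y1, x1&y2, x2&y1, x2&y2).
tables = {
    "independent": (30, 70, 60, 140),     # same risk (0.30) in both rows
    "x1 more likely": (40, 60, 20, 80),   # risk 0.40 versus 0.20
    "x1 less likely": (20, 80, 40, 60),   # risk 0.20 versus 0.40
}

results = {
    name: (relative_risk(*cells), odds_ratio(*cells))
    for name, cells in tables.items()
}

for name, (rr, oratio) in results.items():
    print(f"{name}: RR = {rr:.2f}, OR = {oratio:.2f}")
```

In every case the two measures fall on the same side of 1, although their magnitudes differ whenever the association is not null.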

Which one should be used? If the relative risk is available, it should be used. In a cohort study, the relative risk can be calculated directly. In a case-control study, the relative risk cannot be calculated directly, so an odds ratio is used instead. Case-control studies compare subjects who have a “condition” (cases) to similar subjects who do not (controls). In this type of study we know %(exposure|disease), but to compute the RR we need %(disease|exposure). Recall that RR = %(disease|exposure)/%(disease|no exposure). The relative risk is also not available in more complex modeling (logistic regression).

Odds ratio vs relative risk. When is the odds ratio a good approximation of the relative risk? When cases are representative of the diseased population; when controls are representative of the population without disease; and when the disease being studied occurs at low frequency. In itself, an odds ratio is a useful measure of association.
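
The low-frequency condition can be illustrated numerically. The sketch below compares the two measures for a rare and a common outcome with the same relative risk; the risks used are hypothetical values chosen for illustration.

```python
def odds(p: float) -> float:
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


def rr_and_or(risk_exposed: float, risk_unexposed: float):
    """Return (relative risk, odds ratio) for two group risks."""
    rr = risk_exposed / risk_unexposed
    oratio = odds(risk_exposed) / odds(risk_unexposed)
    return rr, oratio


# Hypothetical risks: both pairs have a relative risk of 2.
rr_rare, or_rare = rr_and_or(0.010, 0.005)     # rare outcome
rr_common, or_common = rr_and_or(0.40, 0.20)   # common outcome

print(f"rare:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

When the risk is small, 1 - p is close to 1, so the odds p/(1 - p) are close to p and the odds ratio stays close to the relative risk. For a common outcome the odds ratio lies further from 1 than the relative risk.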

Relative risk vs absolute risk. % of smokers who get lung cancer: 8% (conservative guess here). Relative risk of lung cancer for smokers: 800%. Getting lung cancer is not commonplace, even for smokers. But smokers’ chances of getting lung cancer are much, much higher than non-smokers’ chances.
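
The arithmetic behind this contrast can be sketched as follows. It assumes that the quoted relative risk of 800% means a ratio of 8 (smokers versus non-smokers); the implied non-smoker risk is derived from that assumption and does not appear on the slide.

```python
# Figures from the slide (the 8% is described there as a guess).
risk_smokers = 0.08     # absolute risk of lung cancer for smokers
relative_risk = 8.0     # assumption: "800%" read as a ratio of 8

# Implied absolute risk for non-smokers: RR = risk_smokers / risk_nonsmokers.
risk_nonsmokers = risk_smokers / relative_risk

# Absolute difference between the two groups.
risk_difference = risk_smokers - risk_nonsmokers

print(f"non-smoker risk: {risk_nonsmokers:.1%}")
print(f"risk difference: {risk_difference:.1%}")
```

Under these assumptions a large relative risk coexists with a modest absolute risk, which is the point of the slide.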

Simpson’s paradox: when a third variable seemingly reverses the association between two other variables. Example: the hot hand.
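
The reversal can be illustrated with a small numeric sketch. All counts below are made up for illustration (they are not the hot hand data); the group and stratum names are hypothetical.

```python
# Made-up counts: (successes, attempts) for two groups within two strata.
counts = {
    "X": {"easy": (9, 10), "hard": (30, 100)},
    "Y": {"easy": (80, 100), "hard": (2, 10)},
}


def stratum_rate(group: str, stratum: str) -> float:
    """Success rate of one group within one stratum."""
    successes, attempts = counts[group][stratum]
    return successes / attempts


def overall_rate(group: str) -> float:
    """Success rate of one group with the strata pooled together."""
    successes = sum(s for s, _ in counts[group].values())
    attempts = sum(n for _, n in counts[group].values())
    return successes / attempts


for stratum in ("easy", "hard"):
    print(f"{stratum}: X = {stratum_rate('X', stratum):.2f}, "
          f"Y = {stratum_rate('Y', stratum):.2f}")
print(f"pooled: X = {overall_rate('X'):.2f}, Y = {overall_rate('Y'):.2f}")
```

Group X has the higher rate within each stratum, yet group Y has the higher pooled rate, because X's attempts are concentrated in the hard stratum. The stratum is the third variable that reverses the association.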