Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. Relationships Between Categorical Variables Chapter 6

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 2 Principal Question: Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 3 6.1 Displaying Relationships Between Categorical Variables Data are displayed in a contingency (two-way) table. If one variable is explanatory, use it to define the rows of the table. Two types of conditional percents: row percents and column percents. Use row percents if the explanatory variable is the row variable.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 4 Example 6.1 Smoking and Divorce Risk Data on smoking habits and divorce history for the 1669 respondents who had ever been married. Among smokers, 49% have been divorced, 51% have not. Among nonsmokers, only 32% have been divorced, 68% have not. The difference between row percents indicates a relationship.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 5 Example 6.2 Tattoos and Ear Pierces Responses from n = 565 men to two questions: 1. Do you have a tattoo? 2. How many total ear pierces do you have? Among men with no ear pierces, 43/424 = 10% have a tattoo. Among men with one ear pierce, 16/70 = 23% have a tattoo. Among men with two or more ear pierces, 26/71 = 37% have a tattoo. The percent with a tattoo increases as the number of ear pierces increases => a relationship. Could also examine column percents or overall percents.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 6 Example 6.3 Gender and Reasons for Taking Care of Your Body 1997 poll (random-digit dialing) of 1218 southern CA residents. Question: What is the most important reason why you try to take care of your body? Is it mostly to be attractive to others, mostly to keep healthy, or mostly to help your self-confidence, or what? Percent distribution of responses shown for men and women. Pattern of responses is very similar. Response does not seem to be related to gender.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 7 6.2 Risk, Relative Risk, Odds Ratio, and Increased Risk Risk = (Number in category) / (Total number in group). Example: Within a group of 200 individuals, asthma affects 24 people. In this group the risk of asthma is 24/200 = 0.12, or 12%.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 8 Relative Risk = (Risk in category 1) / (Risk in category 2). Example: For those who drive under the influence of alcohol, the relative risk of an accident is 15. => The risk of an accident for those who drive under the influence is 15 times the risk for those who don’t drive under the influence. Relative risk = 1 => the two risks are the same. The risk in the denominator is often the baseline risk.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 9 Example 6.1 Smoking and Divorce Risk (cont) For smokers: risk of divorce = 238/485 = 0.491, or 49.1%. For nonsmokers: risk of divorce = 374/1184 = 0.316, or 31.6%. Relative Risk of divorce = 49%/32% = 1.53. In this sample, the risk of divorce for smokers is 1.53 times the risk of divorce for nonsmokers.
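As a quick check on these numbers, here is a minimal Python sketch (my own, not from the text) that computes each group's risk and the relative risk from the counts on this slide; the unrounded ratio is about 1.55, and the 1.53 shown above comes from the rounded 49% and 32%.

```python
# Risk and relative risk for the smoking/divorce counts in Example 6.1.
divorced_smokers, smokers = 238, 485
divorced_nonsmokers, nonsmokers = 374, 1184

risk_smokers = divorced_smokers / smokers            # about 0.491
risk_nonsmokers = divorced_nonsmokers / nonsmokers   # about 0.316
relative_risk = risk_smokers / risk_nonsmokers       # about 1.55 unrounded

print(f"Risk (smokers):    {risk_smokers:.3f}")
print(f"Risk (nonsmokers): {risk_nonsmokers:.3f}")
print(f"Relative risk:     {relative_risk:.2f}")
```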

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 10 Percent increase in risk = (Difference in risks / Baseline risk) x 100% = (relative risk – 1) x 100%. Note: When the risk is smaller than the baseline risk, the relative risk is < 1 and the percent “increase” will actually be negative, so we say percent decrease in risk.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 11 Example 6.1 Smoking and Divorce Risk (cont) Relative Risk of divorce for smokers = 1.53. Percent increase in risk of divorce for smokers = (1.53 – 1) x 100% = 53%. Equivalently, (Difference in risks / Baseline risk) x 100% = ((49 – 32) / 32) x 100% = 53%. The risk of divorce is 53% higher for smokers than it is for nonsmokers.
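A short sketch (using the rounded 49% and 32% risks from the slide) confirming that the two formulas for percent increase give the same answer.

```python
# Percent increase in risk, computed two equivalent ways.
risk_smokers, risk_nonsmokers = 0.49, 0.32        # nonsmokers are the baseline group

relative_risk = risk_smokers / risk_nonsmokers
increase_from_rr = (relative_risk - 1) * 100
increase_from_diff = (risk_smokers - risk_nonsmokers) / risk_nonsmokers * 100

print(round(increase_from_rr), round(increase_from_diff))   # both about 53
```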

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 12 Odds = Number in category 1 to Number in category 2 = (Number in category 1 / Number in category 2) to 1. Odds Ratio = (Odds for group 1) / (Odds for group 2). Example: Odds of getting a divorce to not getting a divorce for smokers are 238 to 247, or 0.96 to 1. Odds of getting a divorce to not getting a divorce for nonsmokers are 374 to 810, or 0.46 to 1. Odds Ratio = 0.96 / 0.46 = 2.1 => the odds of divorce for smokers are about double the odds for nonsmokers.
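The same divorce counts can be used to verify the odds and the odds ratio; a minimal sketch (my own) is below.

```python
# Odds and odds ratio of divorce for smokers vs. nonsmokers (Example 6.1 counts).
divorced_s, not_divorced_s = 238, 247
divorced_n, not_divorced_n = 374, 810

odds_smokers = divorced_s / not_divorced_s        # about 0.96 to 1
odds_nonsmokers = divorced_n / not_divorced_n     # about 0.46 to 1
odds_ratio = odds_smokers / odds_nonsmokers       # about 2.1

print(f"Odds (smokers):    {odds_smokers:.2f} to 1")
print(f"Odds (nonsmokers): {odds_nonsmokers:.2f} to 1")
print(f"Odds ratio:        {odds_ratio:.1f}")
```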

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 13 6.3 Misleading Statistics About Risk Questions to Ask: What are the actual risks? What is the baseline risk? What is the population for which the reported risk or relative risk applies? What is the time period for this risk?

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 14 Example 6.4 Disaster in the Skies? Case Study 1.2 Revisited “Errors by air traffic controllers climbed from 746 in fiscal 1997 to 878 in fiscal 1998, an 18% increase.” USA Today Look at the risk of controller error per flight: In 1997: 4.8 errors per million flights. In 1998: 5.5 errors per million flights. The risk of error increased, but the actual risk is very small.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 15 Example 6.5 Dietary Fat and Breast Cancer “Italian scientists report that a diet rich in animal protein and fat – cheeseburgers, french fries, and ice cream, for example – increases a woman’s risk of breast cancer threefold.” Prevention Magazine’s Giant Book of Health Facts (1991, p. 122). Two reasons this information is useless: 1. We don’t know how the data were collected nor what population the women represent. 2. We don’t know the ages of the women studied, so we don’t know the baseline rate.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 16 Example 6.5 Dietary Fat and Breast Cancer (cont) Age is a critical factor. Accumulated lifetime risk of a woman developing breast cancer by certain ages: By age 50: 1 in 50. By age 60: 1 in 23. By age 85: 1 in 9. The annual risk is 1 in 3700 for women in their early 30s. If the Italian study was done on very young women, the threefold increase represents a small increase in actual risk.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 17 6.4 The Effect of a Third Variable and Simpson’s Paradox Example 6.7 Educational Status and Driving after Substance Use A 1996 nationwide survey of 11,847 individuals age 16 or over. The response was Driving Status, with 3 categories: Unimpaired = never drove while impaired; Alcohol = drove within 2 hours of alcohol use, but never after drug use; Drug = drove within 2 hours of drug use and possibly after alcohol use.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 18 Example 6.7 Educational Status and Driving after Substance Use As the amount of education increases, the proportion who drove within two hours of alcohol use also increases. One difference between the education groups is age. There is not enough information on age, so age is a lurking variable.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 19 Example 6.8 Blood Pressure and Oral Contraceptive Use Hypothetical data on 2400 women. Recorded oral contraceptive use and whether each woman had high blood pressure. The percent with high blood pressure is about the same among oral contraceptive users and nonusers.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 20 Example 6.8 Blood Pressure and Oral Contraceptive Use (cont) Many factors affect blood pressure. If users and nonusers differ with respect to such a factor, the factor confounds the results. Blood pressure increases with age and users tend to be younger. In each age group, the percentage with high blood pressure is higher for users than for nonusers => Simpson’s Paradox.
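The book's age-specific table is not reproduced on this slide, so the sketch below uses made-up counts of my own (labeled as such in the comments) purely to show how the pattern described here can arise: users have a higher rate within each age group, yet the combined rates look about the same because users are concentrated in the younger, lower-risk group.

```python
# Simpson's paradox with MADE-UP counts (not the book's table).
# age group -> (high-BP users, total users, high-BP nonusers, total nonusers)
groups = {
    "under 40":    (100, 1000,   8,  200),
    "40 and over": ( 80,  200, 170, 1000),
}

users_hbp = users_total = non_hbp = non_total = 0
for age, (u_hbp, u_n, n_hbp, n_n) in groups.items():
    print(f"{age:>11}: users {u_hbp / u_n:.0%} high BP, nonusers {n_hbp / n_n:.0%} high BP")
    users_hbp += u_hbp; users_total += u_n
    non_hbp += n_hbp; non_total += n_n

# Combined over both age groups, the difference nearly disappears.
print(f"{'combined':>11}: users {users_hbp / users_total:.1%} high BP, "
      f"nonusers {non_hbp / non_total:.1%} high BP")
```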

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 21 6.5 Assessing the Statistical Significance of a 2 × 2 Table Question: Can a relationship observed in the sample data be inferred to hold in the population represented by the data? A statistically significant relationship or difference is one that is large enough to be unlikely to have occurred in the observed sample if there is no relationship or difference in the population.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 22 Five Steps to Determining Statistical Significance: 1. Determine the null and alternative hypotheses. 2. Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic. 3. Assuming the null hypothesis is true, find the p-value. 4. Decide whether or not the result is statistically significant based on the p-value. 5. Report the conclusion in the context of the situation.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 23 Step 1: Null and Alternative Hypotheses Null hypothesis: The two variables are not related. Alternative hypothesis: The two variables are related.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 24 Step 2: The Chi-square Statistic The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship. A large difference => evidence of a relationship. Compute the expected count for each cell: Expected count = (Row total) × (Column total) / (Total n for table). Compute for each cell: (Observed count – Expected count)² / Expected count. Compute the test statistic by totaling over all cells: chi-square = sum over all cells of (Observed count – Expected count)² / Expected count.
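Here is a minimal Python sketch (my own, not from the text) that applies these steps to the 2 × 2 smoking/divorce table from Example 6.1; for that table the statistic comes out to roughly 45, far above the 3.84 cutoff mentioned on a later slide.

```python
# Chi-square statistic for a 2x2 table, computed cell by cell.
# Rows: smoker, nonsmoker; columns: divorced, not divorced (Example 6.1 counts).
observed = [[238, 247],
            [374, 810]]

row_totals = [sum(row) for row in observed]          # 485, 1184
col_totals = [sum(col) for col in zip(*observed)]    # 612, 1057
n = sum(row_totals)                                  # 1669

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n     # (row total)(column total)/n
        chi_square += (obs - expected) ** 2 / expected   # per-cell contribution

print(f"Chi-square statistic: {chi_square:.2f}")
```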

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 25 Step 3: The p-value of the Chi-square Test Q: If there is actually no relationship in the population, what is the likelihood that the chi-square statistic could be as large as it is or larger? A: The p-value. A large test statistic => evidence of a relationship. So how large is large enough to declare significance? Note: The p-value is generally reported in computer output.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 26 Steps 4 and 5: Making and Reporting a Decision Common rule: p-value ≤ 0.05 => the relationship is statistically significant and we reject the null hypothesis; p-value > 0.05 => we cannot say the relationship is statistically significant and we cannot reject the null hypothesis. A large test statistic => small p-value => evidence that a real relationship exists in the population. Note: For 2 × 2 tables, a test statistic of 3.84 or larger is significant.
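If SciPy is available (an assumption on my part; the statistic could equally be compared directly to 3.84), the p-value and the 0.05 cutoff for a 2 × 2 table can be obtained as in this sketch, continuing from the statistic computed above.

```python
from scipy.stats import chi2

df = 1                              # (rows - 1) x (columns - 1) = 1 for a 2x2 table
chi_square = 45.3                   # rounded statistic from the previous sketch

p_value = chi2.sf(chi_square, df)   # P(statistic >= observed value) if H0 is true
cutoff = chi2.ppf(0.95, df)         # about 3.84: the 0.05 critical value

print(f"p-value: {p_value:.2g}, 0.05 cutoff: {cutoff:.2f}")
print("statistically significant" if p_value <= 0.05 else "not statistically significant")
```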

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 27 Example 6.10 Randomly Pick S or Q Of 92 college students asked: “Randomly choose one of the letters S or Q”, 66% (61/92) picked S. Of another 98 students asked: “Randomly choose one of the letters Q or S”, 46% (45/98) picked S. Can we conclude that the order of the letters on the form and the response are related? The p-value is about 0.005, which is less than 0.05, so the relationship is statistically significant.
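A sketch of this test using scipy.stats.chi2_contingency (assuming SciPy; correction=False gives the plain chi-square statistic described on the Step 2 slide).

```python
from scipy.stats import chi2_contingency

# Rows: form said "S or Q", form said "Q or S"; columns: picked S, picked Q.
observed = [[61, 31],
            [45, 53]]

stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
# The p-value is well below 0.05: letter order and response appear related.
```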

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 28 Factors that Affect Statistical Significance The strength of the observed relationship Example 6.10 Of those with “S or Q”, 66% picked S. Of those with “Q or S”, 46% picked S. Difference in percentages (66% - 46%) reflects the strength of the observed relationship.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 29 Factors that Affect Statistical Significance (cont) How many people were studied. Example: I. Treatment A had 8 of 10 patients improve; Treatment B had 5 of 10 patients improve. Strength = 80% – 50% = 30% seems large, but the study is too small; the p-value is about 0.16. II. Treatment A had 80 of 100 patients improve; Treatment B had 50 of 100 patients improve. Strength = 80% – 50% = 30% is again large; the p-value is now less than 0.0001, which is very significant.
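A sketch (assuming SciPy) that runs the chi-square test on both versions of this hypothetical comparison; the 30-percentage-point difference is identical, but only the larger study yields a small p-value.

```python
from scipy.stats import chi2_contingency

# Same 80% vs. 50% improvement rates, two different study sizes.
# Each row is [improved, did not improve] for a treatment group.
small_study = [[8, 2], [5, 5]]        # Treatment A: 8/10, Treatment B: 5/10
large_study = [[80, 20], [50, 50]]    # Treatment A: 80/100, Treatment B: 50/100

for name, table in [("n = 20", small_study), ("n = 200", large_study)]:
    stat, p_value, _, _ = chi2_contingency(table, correction=False)
    print(f"{name}: chi-square = {stat:.2f}, p-value = {p_value:.4g}")
# Expect a p-value near 0.16 for the small study and below 0.0001 for the large one.
```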

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 30 Practical versus Statistical Significance Statistical significance does not mean the relationship is of practical importance. Example 6.12 Aspirin and Heart Attacks Placebo: 189/11034 = 1.71% had a heart attack. Aspirin: 104/11037 = 0.94% had a heart attack. The p-value is less than 0.001 => the relationship is statistically significant. The difference is only 1.71 – 0.94 = 0.77%, or less than 1%. With the large sample, this important difference was detected.
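A sketch (assuming SciPy) contrasting the small absolute difference in heart-attack risk with the very small p-value produced by the large sample sizes.

```python
from scipy.stats import chi2_contingency

# Counts from the slide: [heart attack, no heart attack] in each group.
aspirin = [104, 11037 - 104]
placebo = [189, 11034 - 189]

risk_placebo = 189 / 11034                  # about 1.71%
risk_aspirin = 104 / 11037                  # about 0.94%
print(f"risk difference: {risk_placebo - risk_aspirin:.2%}")   # under 1 percentage point

stat, p_value, _, _ = chi2_contingency([aspirin, placebo], correction=False)
print(f"chi-square = {stat:.1f}, p-value = {p_value:.1g}")     # tiny p-value: significant
```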

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 31 Interpreting a Nonsignificant Result The sample results are not strong enough to safely conclude that there is a relationship in the population. The observed relationship could have resulted by chance, even if there is no relationship in the population. This is not the same as saying there is no relationship.

Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. 32 Case Study 6.1 Drinking, Driving, and the Supreme Court A “Random Roadside Survey” of drivers under 20 years of age. The p-value is not small => the observed association could easily have occurred even if there is no relationship in the population. This result was used by the Supreme Court to overturn a law that allowed the sale of beer to females but not to males.