Assignment: Read Chapter 12. Exercises p218-224: 9, 11, 17, 19, 20
Statistic for the day: Number of countries that have Spanish-speaking populations greater than 100,000: 21
These slides were created by Tom Hettmansperger and in some cases modified by David Hunter.
A General Strategy
- Research question or hypothesis
- Quantification: measure something relevant, or count something relevant
- Collect data: two-way table
- Statistical analysis: chi-squared test
- Conclusions and/or action
Exercise: Follow the 4 steps and answer the research question: Is there a relationship between gender and ownership of cell phones in Stat 100.2?
Data (rows: gender; columns: cell phone)
          No    Yes    All
Female    12    124    136
Male      14     87    101
All       26    211    237
Counts and percents (rows: sex; columns: cell phone; row percentages below the counts)
          No      Yes     All
Female    12      124     136
          8.82    91.18   100.00
Male      14      87      101
          13.86   86.14   100.00
So 91.18% of women in the sample say yes but only 86.14% of men in the sample say yes. Are they statistically significantly different?
Counts and percents: Fall 2001 (rows: gender; columns: cell phone; row percentages below the counts)
          No      Yes     All
Female    26      51      77
          33.77   66.23   100.00
Male      19      16      35
          54.29   45.71   100.00
So 66.23% of women in the sample say yes but only 45.71% of men in the sample say yes. Are they statistically significantly different?
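The row percentages in both tables come from dividing each count by its row total. Below is a minimal Python sketch of that arithmetic. The dictionary layout and the function name `row_percents` are illustrative assumptions, not part of the course material.

```python
def row_percents(table):
    """Convert each row of counts into percentages of that row's total."""
    out = {}
    for row, cells in table.items():
        total = sum(cells.values())
        out[row] = {col: round(100 * n / total, 2) for col, n in cells.items()}
    return out

# Counts copied from the two slides above.
stat100 = {"Female": {"No": 12, "Yes": 124}, "Male": {"No": 14, "Yes": 87}}
fall2001 = {"Female": {"No": 26, "Yes": 51}, "Male": {"No": 19, "Yes": 16}}

for name, table in [("Stat 100.2", stat100), ("Fall 2001", fall2001)]:
    print(name, row_percents(table))
```

Percentages are taken within rows because gender is the explanatory variable. The question is whether the yes rate differs between women and men.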
The strategy for determining statistical significance:
Step 1. Figure out what the skeptic expects.
Step 2. Figure out how far the research advocate's data are from what the skeptic expects.
Step 3. Decide whether the distance in Step 2 is large.
Step 4. If it is large, claim there is a statistically significant difference.
Accepted definition of large for scientific purposes: a value is large when it falls in the upper 5% tail of the appropriate distribution (for the chi-squared test, the chi-squared distribution with the appropriate degrees of freedom).
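For 1 degree of freedom, these slides use a cutoff of 3.84. One way to see where that number comes from: a chi-squared variable with 1 df is the square of a standard normal variable Z. Its upper 5% cutoff is therefore the two-sided 5% normal cutoff, squared. The following is a minimal sketch using only Python's standard library. It covers the 1 df case only; for other df, a table or a library routine such as SciPy's `scipy.stats.chi2.ppf(0.95, df)` would be needed.

```python
from statistics import NormalDist

# With 1 degree of freedom, chi-squared = Z**2 for a standard normal Z,
# so P(chi-squared > c) = P(|Z| > sqrt(c)). Setting this to 5% gives
# sqrt(c) = z_{0.975}, the two-sided 5% normal cutoff.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
cutoff_df1 = z ** 2                      # about 3.84
print(f"5% cutoff for chi-squared with 1 df: {cutoff_df1:.2f}")
```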
Step 1: We must compute what the skeptic expects.
          No    Yes    All
Female    12    124    136
Male      14     87    101
All       26    211    237
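The skeptic holds that cell phone ownership does not depend on gender. Under that view, women and men share the overall yes rate of 211/237. The expected Female/Yes count is therefore 136 × 211/237 ≈ 121.08. In general, each expected count is (row total × column total) / grand total. A minimal Python sketch follows; the function name `expected_counts` is illustrative.

```python
# Observed counts: rows are Female, Male; columns are No, Yes.
observed = [
    [12, 124],  # Female
    [14, 87],   # Male
]

def expected_counts(table):
    """Expected count per cell under 'no relationship': row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

for row in expected_counts(observed):
    print([round(e, 2) for e in row])
```

Running it reproduces the expected counts Minitab prints on the next slide: 14.92 and 121.08 for women, 11.08 and 89.92 for men.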
Counts and expected counts, chi-squared statistic (rows: sex; columns: cell phone; expected count below each observed count)
          No       Yes      All
Female    12       124      136
          14.92    121.08   136.00
Male      14       87       101
          11.08    89.92    101.00
All       26       211      237
          26.00    211.00   237.00
Chi-Square = 1.506
Step 2: Prepare to compute the distance between the data collected by the research advocate and what the skeptic expects.
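The distance used here is the chi-squared statistic. For each cell, it takes the squared gap between observed and expected counts and divides by the expected count, then sums over all cells. Using the Female/No cell from the table above as a worked term:

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}},
\qquad
\text{Female/No: } \frac{(12 - 14.92)^2}{14.92} \approx 0.571
```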
Expected counts are printed below observed counts (Minitab output; C51 = No, C52 = Yes, row 1 = Female, row 2 = Male)
         C51      C52      Total
1        12       124      136
         14.92    121.08
2        14       87       101
         11.08    89.92
Total    26       211      237
Chi-Sq = 0.571 + 0.070 + 0.769 + 0.095 = 1.506
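Minitab's four terms are the per-cell contributions (observed − expected)² / expected, listed row by row. Below is a minimal, self-contained Python sketch that reproduces them. The function name `chi_squared` is illustrative; it is not Minitab's own routine.

```python
def chi_squared(table):
    """Return the per-cell contributions (row by row) and their sum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand  # skeptic's expected count
            contributions.append((obs - exp) ** 2 / exp)
    return contributions, sum(contributions)

observed = [[12, 124], [14, 87]]  # Female, Male by No, Yes
parts, stat = chi_squared(observed)
print(" + ".join(f"{p:.3f}" for p in parts), "=", f"{stat:.3f}")
# 0.571 + 0.070 + 0.769 + 0.095 = 1.506
```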
Step 3: Compare the chi-squared statistic with the chi-squared distribution with 1 degree of freedom, whose upper 5% cutoff is 3.84. Our chi-squared is 1.506, below 3.84, so the skeptic wins! There is not a statistically significant difference between men and women with regard to cell phone ownership.
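The slides make the Step 3 decision by comparing the statistic with 3.84. An equivalent and common way to state the same decision is a p-value: the upper-tail area beyond the statistic. The following is a minimal sketch for the 1 df case. It relies on the identity P(chi-squared with 1 df > x) = erfc(sqrt(x/2)). The names `p_value_df1` and `verdict` are illustrative.

```python
import math

CUTOFF_DF1 = 3.84  # 5% cutoff for chi-squared with 1 df, as used in these slides

def p_value_df1(stat):
    """Upper-tail area beyond stat for a chi-squared distribution with 1 df.

    Uses P(chi-squared_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))

def verdict(stat):
    """Apply the slides' rule: large means beyond the 5% cutoff."""
    return "research advocate wins" if stat > CUTOFF_DF1 else "skeptic wins"

stat = 1.506
print(f"p = {p_value_df1(stat):.2f}: {verdict(stat)}")
```

For 1.506, the p-value comes out at about 0.22, well above 0.05. This agrees with the cutoff comparison: statistics below 3.84 have p-values above 0.05.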
Step 4: No statistically significant difference (rows: sex; columns: cell phone; row percentages below the counts)
          No      Yes     All
Female    12      124     136
          8.82    91.18   100.00
Male      14      87      101
          13.86   86.14   100.00
Hence, the difference 91.18% minus 86.14% = 5.04 percentage points is not statistically significant.
FALL 2001 results
Expected counts are printed below observed counts (w = women, m = men)
         no       yes      Total
w        26       51       77
         30.94    46.06
m        19       16       35
         14.06    20.94
Total    45       67       112
Chi-Sq = 0.788 + 0.529 + 1.734 + 1.164 = 4.215
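For a 2×2 table [[a, b], [c, d]], the chi-squared statistic (without any continuity correction) can also be computed with the algebraically equivalent shortcut N(ad − bc)² / (row1 × row2 × col1 × col2). The sketch below uses this shortcut as an independent check on the Fall 2001 total. The function name `chi_squared_2x2` is illustrative, and the shortcut applies only to 2×2 tables.

```python
def chi_squared_2x2(a, b, c, d):
    """Shortcut chi-squared for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Fall 2001: women (26 no, 51 yes), men (19 no, 16 yes)
print(f"{chi_squared_2x2(26, 51, 19, 16):.3f}")
```

It agrees with the sum of the four Minitab contributions, 4.215.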
FALL 2001 Step 3: It is large this time! Our chi-squared is 4.215, which exceeds the 5% cutoff of 3.84 for 1 degree of freedom, so the research advocate wins! There is a statistically significant difference between men and women.
FALL 2001 Step 4: Statistically significant difference (rows: gender; columns: cell phone; row percentages below the counts)
          No      Yes     All
Female    26      51      77
          33.77   66.23   100.00
Male      19      16      35
          54.29   45.71   100.00
Hence, the difference 66.23% minus 45.71% = 20.52 percentage points is statistically significant.
Why 1 degree of freedom?
          No     Yes    All
Women     ■             136
Men                     101
All       26     211    237
Note that the marked cell (■) is the ONLY one we can fill arbitrarily. Once that cell is filled, all the others are determined by the margins!
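To make the "only one free cell" idea concrete, here is a minimal sketch of a hypothetical helper, `complete_2x2`. Given the Women/No cell and the margins, it fills in the rest of the table. It also rejects values that would force a negative count, so "arbitrarily" means any value from 0 up to the smaller relevant margin.

```python
def complete_2x2(top_left, row_totals, col_totals):
    """Fill a 2x2 table from its top-left cell plus the margins (hypothetical helper)."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    a = top_left
    b = r1 - a  # rest of the Women row
    c = c1 - a  # rest of the No column
    d = r2 - c  # rest of the Men row
    if b + d != c2 or min(a, b, c, d) < 0:
        raise ValueError("cell value is inconsistent with the margins")
    return [[a, b], [c, d]]

print(complete_2x2(12, (136, 101), (26, 211)))  # [[12, 124], [14, 87]]
```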
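How many degrees of freedom?
          Always    Sometimes    Never    All
Women     One df    Two df                136
Men                                       101
All       106       105          26       237
Filling the first cell uses one degree of freedom and filling the second uses another; the margins then determine all remaining cells.
Degrees of freedom (df) always equal (number of rows - 1) times (number of columns - 1). Here that is (2 - 1) × (3 - 1) = 2.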
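The df rule is a one-line formula. This sketch also computes the 5% cutoff for 2 df, which the slides do not state. It relies on a standard fact: with 2 df, the chi-squared upper tail is exactly exp(−x/2), so the cutoff solves exp(−x/2) = 0.05. The function name `degrees_of_freedom` is illustrative.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """(Number of rows - 1) times (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

print(degrees_of_freedom(2, 2))  # Yes/No cell phone table: 1 df
print(degrees_of_freedom(2, 3))  # Always/Sometimes/Never table: 2 df

# With 2 df, P(chi-squared > x) = exp(-x / 2); solving exp(-x / 2) = 0.05
# gives x = 2 * ln(20).
cutoff_df2 = 2 * math.log(20)
print(f"5% cutoff with 2 df: {cutoff_df2:.2f}")
```

The printed cutoff, 5.99, is larger than the 1 df cutoff of 3.84. A table with more free cells needs a larger chi-squared value before the difference counts as large.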
Exercise: Follow the 4 steps and answer the research question: Is there a statistically significant difference in calories between small and large sandwiches? Data on slide #12.
Data (response: calories; explanatory: size)
         Low    High    All
Small    5      2       7
Large    2      5       7
All      7      7       14
Solution
Expected counts are printed below observed counts
         low     high    Total
small    5       2       7
         3.50    3.50
large    2       5       7
         3.50    3.50
Total    7       7       14
Chi-Sq = 0.643 + 0.643 + 0.643 + 0.643 = 2.571
Because 2.571 is below the 5% cutoff of 3.84 (1 degree of freedom), the skeptic wins and the research advocate loses. So we cannot claim that there is a relationship between size and calories.
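The four contributions are identical because every expected count is 7 × 7 / 14 = 3.5. Each observed count is off by exactly 1.5, so the whole statistic collapses to one term times four:

```latex
\chi^2 = 4 \times \frac{(5 - 3.5)^2}{3.5} = 4 \times \frac{2.25}{3.5} \approx 4 \times 0.643 = 2.571 < 3.84
```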
Health studies and risk
Research question: Do strong electromagnetic fields cause cancer?
50 dogs randomly split into two groups: no field, yes field. The response is whether they get lymphoma.
Rows: mag field; columns: cancer
        No    Yes    All
No      20    5      25
Yes     10    15     25
All     30    20     50
Rows: mag field; columns: cancer (observed counts above the expected counts)
        No       Yes      All
No      20       5        25
        15.00    10.00    25.00
Yes     10       15       25
        15.00    10.00    25.00
All     30       20       50
        30.00    20.00    50.00
Chi-Square = 8.333 (compare to 3.84). Research advocate wins!
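As a check on the 8.333, both rows share the expected counts 15 (no cancer) and 10 (cancer), and each observed count is off by 5:

```latex
\chi^2 = \frac{(20-15)^2}{15} + \frac{(5-10)^2}{10} + \frac{(10-15)^2}{15} + \frac{(15-10)^2}{10}
       = 2\cdot\frac{25}{15} + 2\cdot\frac{25}{10} = 3.333 + 5 = 8.333 > 3.84
```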
Terminology and jargon:
- Identify the 'bad' response category: yes cancer.
- Compute the risk for each category of the explanatory variable.
- Identify the treatment category (mag field) and the baseline (control) category (no field).
Treatment risk: 15/25, or .60, or 60%
Baseline risk: 5/25, or .20, or 20%
Relative risk: treatment risk over baseline risk = .60/.20 = 3
So the risk with the mag field is 3 times the baseline risk.
One more risk measure:
Increased risk (percentage change in risk):
Percentage change = (treatment risk - baseline risk) / baseline risk × 100% = (.60 - .20)/.20 × 100% = 200%
So the percentage change is 200%: a 200% increase in treatment risk over baseline risk for getting cancer.
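The risk, relative risk, and percentage change computations fit in a few lines. This minimal sketch uses Python's `fractions.Fraction` so the results come out exactly 3 and 200, with no floating-point rounding. The variable names are illustrative.

```python
from fractions import Fraction

# Counts from the dog study: 'bad' outcome = lymphoma.
treatment_risk = Fraction(15, 25)  # mag field: 15 of 25 dogs
baseline_risk = Fraction(5, 25)    # no field: 5 of 25 dogs

relative_risk = treatment_risk / baseline_risk
percent_change = (treatment_risk - baseline_risk) / baseline_risk * 100

print(f"relative risk = {relative_risk}")       # relative risk = 3
print(f"percentage change = {percent_change}%")  # percentage change = 200%
```

The two measures carry the same information: percentage change = (relative risk - 1) × 100%. A relative risk of 3 is therefore a 200% increase, not a 300% increase.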
Final note: When the chi-squared test is statistically significant, it makes sense to compute the various risk statements. If there is no statistical significance, the skeptic wins: the data do not provide statistically significant evidence of differences in risk among the categories of the explanatory variable.