Stat 301 – Day 33: t-procedures (cont.)
Last Time: Investigation (p. 438). A difference in sample means can’t be judged without knowing the sample sizes and the sample variability. More within-group variability makes the difference less significant; larger sample sizes make it more significant. In this example, any reasonable choice of SD and sample size gives a highly significant result. This is still not a cause-and-effect conclusion!
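A minimal sketch of the point above: the same difference in sample means gives very different t statistics depending on the SDs and sample sizes. All numbers here are hypothetical and are not taken from the investigation; the function name `two_sample_t` is just an illustrative helper.

```python
import math

def two_sample_t(mean_diff, sd1, n1, sd2, n2):
    """Unpooled two-sample t statistic computed from summary statistics."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return mean_diff / se

# Hypothetical summary statistics: the difference in means is held fixed.
diff = 5.0
baseline = two_sample_t(diff, sd1=10, n1=20, sd2=10, n2=20)
more_spread = two_sample_t(diff, sd1=20, n1=20, sd2=20, n2=20)     # more within-group variability
bigger_samples = two_sample_t(diff, sd1=10, n1=80, sd2=10, n2=80)  # larger samples

print(f"baseline:        t = {baseline:.2f}")
print(f"larger SDs:      t = {more_spread:.2f}")
print(f"larger samples:  t = {bigger_samples:.2f}")
```

Doubling both SDs halves the t statistic, while quadrupling both sample sizes doubles it, so the difference in means alone says little about significance.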
Last Time: We are 95% confident that the population of right-handers lives XX to XX longer, on average, than the population of left-handers, though one reason for the lower percentages of left-handers in their 80s is a historical discouragement of left-handedness. It would be difficult to randomly assign a person’s handedness.
Last Time: Investigation. Randomized experiment; treatment effect = unrestricted “population” mean – deprived mean. H0: treatment effect = 0; Ha: treatment effect > 0. If the sample sizes are large or the populations are normal, we can use the two-sample t test rather than the methods of Ch. 2. This adds a confidence interval. May want to consider transforming the data first. A randomization simulation allows other statistics.
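To illustrate the last remark, here is a rough sketch of a randomization test in which the statistic is a parameter, so the mean can be swapped for the median. The data values, the function name `randomization_pvalue`, and the number of repetitions are all made up for illustration; this is not the investigation's data or the course's applet.

```python
import random
import statistics

def randomization_pvalue(group_a, group_b, stat=statistics.mean, reps=5000, seed=1):
    """Estimate a one-sided p-value for H_a: stat(group_a) - stat(group_b) > 0
    by repeatedly re-randomizing which responses belong to which group."""
    rng = random.Random(seed)
    observed = stat(group_a) - stat(group_b)
    combined = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(combined)
        simulated = stat(combined[:n_a]) - stat(combined[n_a:])
        if simulated >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

# Hypothetical improvement scores (NOT the investigation's data).
unrestricted = [25.2, 14.5, 21.3, 12.4, 36.0, 19.8, 30.1, 16.7, 22.9, 28.4]
deprived = [10.2, -3.4, 7.9, 12.1, 2.5, 15.0, -1.8, 9.3, 4.6, 18.2, 6.0]

p_mean = randomization_pvalue(unrestricted, deprived)
p_median = randomization_pvalue(unrestricted, deprived, stat=statistics.median)
print("difference in means:  ", p_mean)
print("difference in medians:", p_median)
```

The two-sample t test is tied to the difference in means; the simulation works the same way for any statistic passed in as `stat`.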
To pool or not to pool: Could calculate the “pooled t test,” which uses a different standard error.
Pooled: T-Test of difference = 0 (vs <): T-Value =   P-Value =   DF = 19; 95% CI for difference: (-28.21, -3.63); Both use Pooled StDev =
Unpooled: T-Test of difference = 0 (vs <): T-Value =   P-Value =   DF = 17; 95% CI for difference: (-28.43, -3.41)
Pooling often doesn’t change the results much, but it would be slightly more powerful IF its assumption (equal population standard deviations) is true.
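For reference, a small sketch of how the two standard errors and their degrees of freedom differ. The formulas are the standard pooled and unpooled (Welch-Satterthwaite) ones; the SDs below are hypothetical, and only the group sizes (11 and 10) are chosen to match the df = 19 shown in the output.

```python
import math

def unpooled_se(s1, n1, s2, n2):
    """Standard error of the difference in means without pooling."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1, n1, s2, n2):
    """Standard error using one pooled variance estimate (assumes equal population SDs)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unpooled test (Welch-Satterthwaite)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical sample SDs; n1 = 11 and n2 = 10 give pooled df = 19.
s1, n1, s2, n2 = 12.0, 11, 15.0, 10

print("unpooled SE:", round(unpooled_se(s1, n1, s2, n2), 3))
print("pooled SE:  ", round(pooled_se(s1, n1, s2, n2), 3))
print("pooled df:  ", n1 + n2 - 2)
print("approx. unpooled df:", round(welch_df(s1, n1, s2, n2), 2))
```

The approximate unpooled df always falls between the conservative value (smaller sample size minus 1) and the pooled value (n1 + n2 - 2), which is why the unpooled rejection region is never less demanding than the pooled one.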
Power
Pooled: with df = 19, the rejection region is t > 1.73.
Unpooled: need to approximate df; with the conservative df = 9, the rejection region is t > 1.83.
GMACRO
ToPoolorNotToPool
# Column roles reconstructed; assumes c2 already holds group labels (eleven 1s, ten 2s)
DO k1=1:1000
  random 21 c1
  unstack c1 c4 c5;
    subs c2.
  # shift group 1 so that the alternative is true
  let c4=c4+2
  #unpooled
  let c3(k1)=(mean(c4)-mean(c5))/sqrt(std(c4)**2/11 + std(c5)**2/10)
  let c7(k1)=(c3(k1)>1.83)
  #pooled
  let k2=((10*std(c4)**2+9*std(c5)**2)/(19))
  let c6(k1)=(mean(c4)-mean(c5))/sqrt(k2*(1/10+1/11))
  let c8(k1)=(c6(k1)>1.73)
ENDDO
ENDMACRO
If we use df = 17 for the unpooled test, the rejection region is t > 1.74.
[Simulation results: unpooled, pooled]
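The same simulation idea can be sketched in Python, for anyone without Minitab at hand. This is an illustrative re-implementation under stated assumptions, not a translation guaranteed to match the macro: both groups are drawn from normal populations with SD 1, the group sizes are 11 and 10, and the critical values 1.83, 1.74, and 1.73 are the ones quoted on the slide. The function name `simulate_power` and the choice `shift=1.0` in the usage line are hypothetical.

```python
import math
import random
import statistics

N1, N2 = 11, 10  # group sizes; the critical values below only apply to these sizes

def simulate_power(shift=2.0, reps=2000, seed=301):
    """Estimate how often each one-sided test rejects H0 at the 5% level.

    Group 1 ~ Normal(shift, 1) and group 2 ~ Normal(0, 1), so the equal-SD
    assumption behind pooling holds by construction.
    """
    rng = random.Random(seed)
    rejections = {"unpooled, df=9": 0, "unpooled, df=17": 0, "pooled, df=19": 0}
    for _ in range(reps):
        g1 = [rng.gauss(shift, 1) for _ in range(N1)]
        g2 = [rng.gauss(0, 1) for _ in range(N2)]
        diff = statistics.fmean(g1) - statistics.fmean(g2)
        v1, v2 = statistics.variance(g1), statistics.variance(g2)

        t_unpooled = diff / math.sqrt(v1 / N1 + v2 / N2)
        pooled_var = ((N1 - 1) * v1 + (N2 - 1) * v2) / (N1 + N2 - 2)
        t_pooled = diff / math.sqrt(pooled_var * (1 / N1 + 1 / N2))

        # Critical values quoted on the slide for df = 9, 17, and 19.
        rejections["unpooled, df=9"] += t_unpooled > 1.83
        rejections["unpooled, df=17"] += t_unpooled > 1.74
        rejections["pooled, df=19"] += t_pooled > 1.73
    return {name: count / reps for name, count in rejections.items()}

# shift=1.0 is an arbitrary illustrative effect size (the macro adds 2).
power = simulate_power(shift=1.0)
for name, value in power.items():
    print(f"{name}: {value:.3f}")
```

Calling `simulate_power(shift=0.0)` instead estimates the type I error rate of each rule, which is a useful check that the cutoffs behave as 5% tests.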
Investigation (p. 360): Recall Investigation 4.4.4, Scolari’s vs. Lucky’s. Avg = 2.41, Avg = 2.59. Labeled items: Excedrin, coffee, Gold Medal Flour, milk (1/2G, 1G).
Comparison Shopping: One-sample t-procedure on the differences (shopping99CLEANED.mtw).
μ = mean price difference between the two stores (Lucky’s – Scolari’s)
H0: μ = 0 (no tendency for one store to be more expensive)
Ha: μ < 0 (on average, higher prices at Scolari’s)
Test of mu = 0 vs < 0
Variable  N  Mean  StDev  SE Mean  95% Upper Bound  T  P
diffs
Variable  N  Mean  StDev  SE Mean  90% CI
diffs                              ( , )
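A small sketch of the calculation behind this output: the paired procedure is just a one-sample t on the item-by-item differences. The prices below are invented for illustration and are not from shopping99CLEANED.mtw; `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """One-sample t summary for the differences first - second (same items, same order)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    se = sd_d / math.sqrt(n)
    return {"n": n, "mean": mean_d, "stdev": sd_d, "se": se, "t": mean_d / se, "df": n - 1}

# Hypothetical prices for the same six items at each store.
luckys = [2.19, 5.49, 1.89, 3.29, 0.99, 4.59]
scolaris = [2.39, 5.99, 1.85, 3.59, 1.09, 4.99]

result = paired_t(luckys, scolaris)  # differences are Lucky's - Scolari's
for key, value in result.items():
    print(key, round(value, 4))
```

A negative t statistic here points toward H_a (higher prices at Scolari's, on average); it would be compared to a t distribution with n - 1 degrees of freedom.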
What if? Two-sample t (shopping99CLEANED.mtw) (unstacked data)
The Moral: By pairing items, we take out a source of random variation (e.g., different items) to help us better focus on the comparison (between stores) that we are interested in…
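To make the moral concrete, here is a rough sketch that runs both analyses on the same hypothetical prices. The items range from cheap to expensive, so item-to-item variation is large compared with the store-to-store differences. The data and helper names are made up for illustration.

```python
import math
import statistics

def two_sample_t_stat(x, y):
    """Unpooled two-sample t statistic (ignores the pairing)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t_stat(x, y):
    """Paired t statistic: one-sample t on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical prices: same six items at both stores, very different price levels.
luckys = [0.89, 2.19, 3.49, 5.49, 7.99, 11.99]
scolaris = [0.99, 2.39, 3.79, 5.69, 8.49, 12.29]

t_two_sample = two_sample_t_stat(luckys, scolaris)
t_paired = paired_t_stat(luckys, scolaris)
print(f"two-sample t = {t_two_sample:.2f}")
print(f"paired t     = {t_paired:.2f}")
```

The mean difference is identical in both analyses; only the standard error changes. The two-sample standard error is dominated by how much the items differ from each other, while the paired standard error reflects only how consistent the store-to-store differences are.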
Power: Two-sample comparison vs. paired comparison
Example: Is there a difference in the melting times of semi-sweet chips and butterscotch chips? Coin toss: heads = semi-sweet, tails = butterscotch. Hold the chip on your tongue, touch it to the roof of your mouth, and hold it there with no “encouragement”; time how long it takes before the chip is completely melted. Use a stopwatch.
Quiz 26: Each individual needs to submit their two times (semi-sweet time and butterscotch time), in seconds! You can view a summary of responses, but an Excel file (ChipMelting.xls) will be made available to you in Studio-folder > bchance > stat301, or as ChocMelting.mtw.
Example: Carry out a two-sample t-test and then a paired t-test and compare the results. Is either of these procedures valid? Did pairing appear to be helpful in this study? What source of variation is being accounted for in the pairing? What if the data are skewed?
For Tuesday: PP; preview Investigation (p. 420)