Lab 4: What is a t-test? Something British mothers use to see if the new girlfriend is significantly better than the old one?


1 Lab 4: What is a t-test? Something British mothers use to see if the new girlfriend is significantly better than the old one?

2 The t Distribution
- We want to compute a confidence interval and test a hypothesis for an unknown population mean µ.
- To use the Z distribution we must know the population standard deviation σ.
- In most real-life situations, we don't know the true population standard deviation.
- In this case we can use the t distribution instead of the Z distribution to calculate confidence intervals and test hypotheses.
It seems the more we learn the less we know!

3 T vs. Z Distribution: Who's who?
T Distribution:
- Unimodal & symmetric around zero
- Use when population µ & σ are both UNKNOWN
- Assumes variable of interest is normally distributed
- Using sample S.D. introduces more sampling variability
- Heavier tails than the Z distribution
- (n-1) degrees of freedom
Z Distribution (CLT):
- Unimodal & symmetric around zero
- Use when population µ is UNKNOWN but σ is KNOWN
- CLT allows us to say the sampling distribution of the mean is approx. normal as n gets large, even when the underlying variable of interest is not normally distributed
- Lighter tails than the t distribution

4 T vs. Z Distribution: Who’s who?

5 The t Distribution
Assumptions:
- The variable (X) is normally distributed
- Random sample of size n from the underlying population
Very similar to Z distribution as sample size gets "large" (30+)
(n-1) degrees of freedom
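To put rough numbers on "heavier tails" and on the convergence to Z, the sketch below compares the two-sided tail probability beyond 1.96 under Z and under t for a few degrees of freedom. It is only an illustration: `t_upper_tail` is a hypothetical helper written for this example (not a Stata or library function), and it approximates the tail area by numerical integration.

```python
import math
from statistics import NormalDist


def t_upper_tail(df, t, steps=2000):
    """Approximate P(T >= t) for Student's t (hypothetical helper)."""
    # With x = sqrt(df) * tan(theta), the tail integral becomes
    # const * integral of cos(theta) ** (df - 1) over [atan(t / sqrt(df)), pi / 2].
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - lo) / steps
    total = 0.0
    for i in range(steps + 1):  # composite Simpson's rule (steps must be even)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, math.cos(lo + i * h)) ** (df - 1)
    return const * total * h / 3


# Two-sided tail area beyond +/-1.96 under Z (about 0.05) and under t.
z_two_sided = 2 * (1 - NormalDist().cdf(1.96))
t_two_sided = {df: 2 * t_upper_tail(df, 1.96) for df in (5, 30, 1000)}

print(f"Z: {z_two_sided:.4f}")
for df, p in t_two_sided.items():
    print(f"t, df={df}: {p:.4f}")
```

The t tail areas are all larger than the Z value and shrink toward it as the degrees of freedom grow, which is why the two distributions give similar answers for large samples.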

6 Degrees of Freedom: (n-1)
- The “Currency” of statistics: you earn a degree of freedom for every data point you collect, and you spend a degree of freedom for each parameter you estimate. Since you usually need to spend 1 just to calculate the mean, you are then left with n-1 (total data points "n" minus 1 spent on calculating the mean). (Reference: http://www.isixsigma.com/dictionary)
- A general rule is that the degrees of freedom decrease when we have to estimate more parameters.
- Before you can compute the standard deviation, you first have to estimate a mean.
- This causes you to lose a degree of freedom. (Reference: http://www.childrens-mercy.org/stats/ask/df.asp)
Two statistics are in a bar, talking and drinking. One statistic turns to the other and says "So how are you finding married life?" The other statistic responds, "It's okay, but you lose a degree of freedom."
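A small sketch of where the n-1 shows up in practice: the sample standard deviation divides the squared deviations by n-1 because the mean was estimated from the same data. The five data values are made up for this illustration.

```python
import statistics

# Hypothetical measurements, made up for this illustration.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = statistics.mean(data)  # estimating the mean "spends" one degree of freedom

squared_deviations = sum((x - mean) ** 2 for x in data)
sample_sd = (squared_deviations / (n - 1)) ** 0.5   # divide by n - 1 degrees of freedom
divide_by_n_sd = (squared_deviations / n) ** 0.5    # dividing by n gives a smaller value

print(sample_sd, statistics.stdev(data))        # same quantity computed two ways
print(divide_by_n_sd, statistics.pstdev(data))  # same quantity computed two ways
```

The sample standard deviation s that goes into the t statistic is the n-1 version.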

7 STATA & the one sample t-test

8 STATA Options
- Get t critical value (used for CIs & hypothesis tests): display invttail(df,p)
- Get p-value: display ttail(df,t) (one-sided) or display tprob(df,t) (two-sided)
- Run t-test from data summary (useful for summary homework problems): ttesti n x_bar s µ
- Run t-test on actual data (useful in real-life research): ttest varname = µ

9 STATA: obtaining the critical value
- Example: Concentration of benzene in cigars
- Hypothesis test (two-sided, α = 0.05): Null Hypothesis: μ = 81 μg/g vs. Alternative Hypothesis: μ ≠ 81 μg/g. The standard deviation is unknown.
- Data: Sample mean = 151 μg/g; sample standard deviation s = 9 μg/g; d.f. = n-1 = 7-1 = 6
- The STATA command: invttail(df,p), where df is the degrees of freedom and p is a number between 0 and 1.
display invttail(6,0.025)
2.4469118
This means that if the T statistic is above 2.447 or below -2.447, we reject the null hypothesis at the 5% alpha level. The observed value of the statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{151 - 81}{9/\sqrt{7}} \approx 20.6$, so we reject the null hypothesis.
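The arithmetic behind the observed statistic can be checked by hand. The sketch below recomputes it from the summary values on this slide and applies the rejection rule; the critical value is copied from the invttail output above rather than recomputed.

```python
import math

# Summary values from the benzene example.
n, x_bar, s, mu0 = 7, 151.0, 9.0, 81.0
t_crit = 2.4469118  # invttail(6, 0.025), copied from the slide

standard_error = s / math.sqrt(n)        # s / sqrt(n)
t_stat = (x_bar - mu0) / standard_error  # (x_bar - mu0) / SE
reject_null = abs(t_stat) > t_crit       # two-sided rejection rule

print(round(t_stat, 4), reject_null)
```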

10 Note of Confusion!
- invnorm(p) returns the inverse cumulative standard normal distribution [i.e. returns z which satisfies P(Z ≤ z) = p]
- invttail(df,p) returns the inverse REVERSE cumulative Student's t distribution [i.e. returns t which satisfies P(T ≥ t) = p]
So to get the upper 2.5% critical value we should use invttail(6,0.025), not invttail(6,0.975).
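The same lower-tail versus upper-tail distinction can be seen with the standard normal alone. This is a sketch using Python's `statistics.NormalDist`, whose `inv_cdf` follows the lower-tail convention (like invnorm); the upper-tail quantile is obtained by symmetry.

```python
from statistics import NormalDist

p = 0.025
z = NormalDist()  # standard normal, mean 0 and sd 1

# Lower-tail convention (like invnorm): find q with P(Z <= q) = 0.975.
lower_tail_quantile = z.inv_cdf(1 - p)

# Upper-tail convention (like invttail): find q with P(Z >= q) = 0.025.
# By symmetry this is minus the lower-tail quantile at p.
upper_tail_quantile = -z.inv_cdf(p)

print(lower_tail_quantile, upper_tail_quantile)  # both are the familiar 1.96
```

Passing 0.975 to an upper-tail function would return the negative of this value, which is the mix-up the slide warns about.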

11 STATA: obtaining the p-value
- Use: ttail(df,t) (one-sided) or tprob(df,t) (two-sided)
display ttail(6,20.6)
4.257e-07
- This gives you P(T ≥ 20.6). To obtain the p-value for this two-sided test, we have p = P(|T| ≥ 20.6) = P(T ≥ 20.6 or T ≤ -20.6) = 2*P(T ≥ 20.6) = 8.513e-07.
- tprob gives you P(|T| ≥ t) directly:
display tprob(6,20.6)
8.513e-07
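As a cross-check outside STATA, the upper-tail area can be approximated by numerical integration and then doubled for the two-sided p-value. This is a sketch only: `t_upper_tail` is a hypothetical helper written for this example, playing the role of ttail(df,t).

```python
import math


def t_upper_tail(df, t, steps=2000):
    """Approximate P(T >= t) for Student's t (hypothetical helper)."""
    # With x = sqrt(df) * tan(theta), the tail integral becomes
    # const * integral of cos(theta) ** (df - 1) over [atan(t / sqrt(df)), pi / 2].
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - lo) / steps
    total = 0.0
    for i in range(steps + 1):  # composite Simpson's rule (steps must be even)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, math.cos(lo + i * h)) ** (df - 1)
    return const * total * h / 3


one_sided = t_upper_tail(6, 20.6)  # plays the role of ttail(6, 20.6)
two_sided = 2 * one_sided          # plays the role of tprob(6, 20.6)

print(f"{one_sided:.3e}  {two_sided:.3e}")
```

Doubling works here because the t distribution is symmetric around zero, so the two tails beyond ±20.6 have equal area.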

12 STATA: running one-sample t-test from summary statistics
- Syntax: ttesti n x_bar s µ
- Null Hypothesis: μ = 81 μg/g vs. Alternative Hypothesis: μ ≠ 81 μg/g
- Data: Sample mean = 151 μg/g (x_bar); sample standard deviation s = 9 μg/g; d.f. = n-1 = 7-1 = 6
ttesti 7 151 9 81

13 STATA Output
One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |       7         151     3.40168           9    142.6764    159.3236
------------------------------------------------------------------------------
Degrees of freedom: 6
                          Ho: mean(x) = 81
   Ha: mean < 81            Ha: mean != 81            Ha: mean > 81
     t =  20.5781             t =  20.5781             t =  20.5781
 P < t =   1.0000         P > |t| =   0.0000         P > t =   0.0000
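The standard error and the 95% confidence interval in this output follow directly from the summary statistics. A minimal sketch, reusing the critical value invttail(6, 0.025) = 2.4469118 reported earlier in the deck:

```python
import math

# Summary values from the benzene example.
n, x_bar, s = 7, 151.0, 9.0
t_crit = 2.4469118  # invttail(6, 0.025), copied from the earlier slide

standard_error = s / math.sqrt(n)  # "Std. Err." column
margin = t_crit * standard_error   # half-width of the interval
ci_low, ci_high = x_bar - margin, x_bar + margin

print(round(standard_error, 5), round(ci_low, 4), round(ci_high, 4))
```

The null value 81 lies far outside this interval, which agrees with rejecting the null hypothesis in the two-sided test.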

14 STATA: running one-sample t-test on data
- Open lowbwt.dta, contained on the disk in your book.
- Suppose you wish to test a hypothesis about the population mean gestational age of low birth weight infants (for example, you might hypothesize that low birth weight infants have gestational ages greater than 28 weeks).
- To test this one-sided hypothesis: H0: mean <= 28; H1: mean > 28; alpha-level = 0.05
- You would use the following STATA command: ttest gestage = 28

15 STATA Output
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 gestage |     100       28.89     .253419     2.53419    28.38716    29.39284
------------------------------------------------------------------------------
Degrees of freedom: 99
                       Ho: mean(gestage) = 28
   Ha: mean < 28            Ha: mean != 28            Ha: mean > 28
     t =   3.5120             t =   3.5120             t =   3.5120
 P < t =   0.9997         P > |t| =   0.0007         P > t =   0.0003
STATA writes out the two-sided alternative and both one-sided alternatives. In this case we use the one on the right (Ha: mean > 28). Since the p-value is 0.0003, which is less than our alpha-level of 0.05, we reject the null hypothesis and conclude that the mean gestational age is greater than 28 weeks.
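The one-sided result can be reproduced from the summary numbers in the output (n = 100, mean 28.89, standard deviation 2.53419). The sketch below is a cross-check under stated assumptions: `t_upper_tail` is a hypothetical helper that approximates P(T ≥ t) by numerical integration, not a STATA function.

```python
import math


def t_upper_tail(df, t, steps=2000):
    """Approximate P(T >= t) for Student's t (hypothetical helper)."""
    # With x = sqrt(df) * tan(theta), the tail integral becomes
    # const * integral of cos(theta) ** (df - 1) over [atan(t / sqrt(df)), pi / 2].
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - lo) / steps
    total = 0.0
    for i in range(steps + 1):  # composite Simpson's rule (steps must be even)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(0.0, math.cos(lo + i * h)) ** (df - 1)
    return const * total * h / 3


# Summary values read off the STATA output for gestage.
n, x_bar, s, mu0 = 100, 28.89, 2.53419, 28.0
alpha = 0.05

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu0) / standard_error
p_upper = t_upper_tail(n - 1, t_stat)  # matches Ha: mean > 28
p_two_sided = 2 * p_upper              # matches Ha: mean != 28
reject_null = p_upper < alpha

print(round(t_stat, 4), round(p_upper, 4), round(p_two_sided, 4), reject_null)
```

Because the alternative is "greater than", only the upper tail counts, so the relevant p-value is the right-hand one rather than the two-sided value in the middle column.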

16 STATA: two sample t-test & paired t-test Next Week…

