Lunch & Learn Statistics By Jay
Goals Introduce / reinforce statistical thinking Understand statistical models Appreciate model assumptions Perform simple statistical tests
What topics will we cover? Statistical concepts. Probability. Definitions. Descriptive statistics. Hypothesis formulation. Hypothesis testing. Normal distribution. α and β errors. Student’s t distribution. Paired and unpaired t tests. Analysis of variance. Regression. Categorical data. Sensitivity / specificity. Chi-square tests.
Session #1 Review Observations vary: sample v. population Observational vs. experimental data Graphing data
Session #2 Review Statistics are functions of the data Useful statistics have known distributions Statistical inference Estimation Testing hypotheses Tests seek to disprove a “null” hypothesis
Session #3 Review Tests involve a NULL hypothesis (H0) and an ALTERNATIVE hypothesis (HA). Try to disprove H0. 4 steps in hypothesis testing: (1) Identify the test statistic (2) State the null and alternative hypotheses (3) Identify the rejection region (4) State your conclusion
Session #4 Concepts α (type I) and β (type II) errors. Normal Distribution and the z-statistic: a mathematical construct. The Central Limit Theorem: a divine gift for statistical inference? We will use the Normal distribution to: … perform hypothesis tests … calculate power and sample size
Type I error …rejecting the null hypothesis when in fact, it is true. α = P(reject H0 | H0 true). Generally, α = 0.05. For one-sided tests, it is conservative to choose α = 0.025
Type II error …accepting H0 when in fact HA is true. β = P(accepting H0 | HA true). Often we pick β = 0.20 or 0.10 and calculate the sample size to achieve this goal. In drug development, it is wasteful (but less expensive) to choose β > 0.10
What is Power? Power is 1 - β, that is… 1 - β = 1 - P(accepting H0 | HA true) = P(rejecting H0 | HA true) …rejecting H0 when in fact HA is true. This is something we want to happen: our goal!
Normal Distribution Parametric: Mean μ and Variance σ². Mean μ = point of symmetry. σ² = “spread” of bell curve. Complicated mathematical formula: f(x) = exp[-(x-μ)²/(2σ²)] / [σ√(2π)]. Looks like a bell centered at μ. Distance from midline to inflection point = σ. Importance: Central Limit Theorem
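The density formula on this slide can be checked numerically. The sketch below is only an illustration: the values of `mu` and `sigma` are hypothetical, and `normal_pdf` is a helper name introduced here, not something from the session. It compares the hand-written formula with Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp[-(x - mu)^2 / (2 sigma^2)] / [sigma * sqrt(2 pi)]."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 85.0, 8.0

peak = normal_pdf(mu, mu, sigma)            # height at the point of symmetry
left = normal_pdf(mu - sigma, mu, sigma)    # one sigma below the mean
right = normal_pdf(mu + sigma, mu, sigma)   # one sigma above the mean
library_peak = NormalDist(mu, sigma).pdf(mu)

print(peak, library_peak)   # the two values should agree
print(left, right)          # equal heights: the curve is symmetric about mu
```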
Central Limit Theorem Averages of random variables ~ N(μ, σ²/n). Proof: uses Taylor and Maclaurin expansions of infinite series and other nasty mathematical tricks. When n is in the teens, averages ~ Normal for reasonable distributions. When n > 30, averages ~ Normal no matter how weird the original distribution (as long as its variance is finite)
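A small simulation can make the Central Limit Theorem concrete. This is a sketch under stated assumptions: the exponential distribution (mean 1, variance 1, strongly skewed), n = 30, and 5000 repetitions are choices made here for illustration and are not part of the session materials.

```python
import random
from math import sqrt
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

def sample_averages(n, reps):
    """Return `reps` averages, each over n draws from an exponential(1) distribution."""
    return [mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

n, reps = 30, 5000
averages = sample_averages(n, reps)

# The theorem predicts averages ~ N(mu, sigma^2 / n) with mu = 1 and sigma^2 = 1 here.
avg_of_averages = mean(averages)
var_of_averages = pvariance(averages)

# For a Normal distribution about 95% of values lie within 1.96 standard errors of mu.
se = 1.0 / sqrt(n)
coverage = sum(abs(a - 1.0) < 1.96 * se for a in averages) / reps

print(avg_of_averages, var_of_averages, 1.0 / n, coverage)
```

Even though single exponential draws are far from bell-shaped, the averages should have a mean close to 1, a variance close to 1/30, and roughly 95% coverage.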
Z-statistic If xbar ~ N(μ, σ²/n), then… (xbar - μ)/(σ/√n) ~ N(0,1) [Note change in the denominator: σ/√n, not σ²/n] We say that “xbar has been normalized”
Why the z-statistic is useful Hypothesis tests require a test statistic with a known distribution The z-statistic distribution is known Averages of anything (if n>30) can use the z-statistic
Why the z-statistic is useful ASSUMPTION: WE KNOW THE VARIANCE σ² OF THE POPULATION STUDIED
Example: C-section data Test if initial SBP is too low, i.e., < 85 mm Hg. Four steps in testing: 1. Identify test statistic 2. State hypotheses 3. Identify rejection region 4. State conclusions
Example: C-section data Identify a test statistic: … minimum value? How is it distributed? … median value? How is it distributed? … average value? How is it distributed? N(μ, σ²/n)
Example: C-section data Identify a test statistic: xbar ~ N(μ, σ²/n). Therefore, z = (xbar - μ)/(σ/√n) is the test statistic. We know its distribution: N(0,1)
Example: C-section data Identify a test statistic: z. State null and alternative hypotheses: H0: μ >= 85 (remember, put = with H0); HA: μ < 85
Example: C-section data Identify a test statistic: z for average SBP. State the null and alternative hypotheses: H0: μ >= 85; HA: μ < 85. Identify the rejection region: under H0, we use μ0 for the population mean: z = (xbar - μ0)/(σ/√n). Here μ0 is 85 mmHg. When z < -z_α, we reject H0.
Example: C-section data Final step: state conclusion. Calculate z = (xbar - μ0)/(σ/√n), with xbar = the observed average initial SBP and μ0 = 85 (according to H0). For now, we will use σ = 8.006 for n = 20. z = -2.65, which is < -1.645 (-z_0.05). Therefore, we reject H0 and conclude: data not consistent with SBP of at least 85 mmHg
Example: C-section data “Nominal p-value”: look up the value associated with z = -2.65 in the N(0,1) table: get p ≈ 0.004
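The four testing steps for this example can be sketched in a few lines of code. The values μ0 = 85, n = 20 and σ = 8.006 are taken from the slides; the sample mean `xbar = 80.26` is an assumption chosen here only because it reproduces z ≈ -2.65, and the function name `lower_tail_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """One-sided z test of H0: mu >= mu0 against HA: mu < mu0, with sigma assumed known."""
    z = (xbar - mu0) / (sigma / sqrt(n))      # test statistic
    z_crit = NormalDist().inv_cdf(alpha)      # lower-tail cutoff, i.e. -z_alpha
    p_value = NormalDist().cdf(z)             # nominal p-value, P(Z < z)
    return z, z_crit, p_value, z < z_crit

# xbar is an assumed value (not given in the slides); the other inputs come from the example.
z, z_crit, p_value, reject = lower_tail_z_test(xbar=80.26, mu0=85, sigma=8.006, n=20)
print(round(z, 2), round(z_crit, 3), round(p_value, 4), reject)
```

With these inputs the statistic falls below the cutoff of about -1.645, so H0 is rejected, matching the conclusion on the slide.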
Calculating Power of a Test Recall, power = P(rejecting H0 | HA true). [reject H0 at the α level, when HA is true] Start with “reject H0 at the α level”: P(reject H0 | H0 true) = α = P(z < -z_α | H0 true); for our example = P[(xbar - μ0)/(σ/√n) < -z_α | μ = μ0] = P[xbar < μ0 - z_α(σ/√n)].
Calculating Power Note in this case: P[(xbar - 85)/1.79 < -1.645] = 0.05; P[xbar < 85 - (1.645)(1.79)] = 0.05; P(xbar < 82.06) = 0.05 (Another way to write the rejection region of the hypothesis test)
Calculating Power Step 2: re-write the probability for type II: Power = P(reject H0 | HA true) = P[xbar < μ0 - z_α(σ/√n) | μ = μA] = P[(xbar - μA)/(σ/√n) < (μ0 - z_α(σ/√n) - μA)/(σ/√n)] = P[z_A < (μ0 - μA)/(σ/√n) - z_α], where z_A = (xbar - μA)/(σ/√n) ~ N(0,1) under HA
Calculating Power Power = P[z_A < (μ0 - μA)/(σ/√n) - z_α] = P[z_A < (85-80)/1.79 - 1.645] = P(z_A < 1.148) = 0.875
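The power calculation above can be reproduced numerically. This is a minimal sketch, assuming σ = 8.006 and n = 20 (so that σ/√n ≈ 1.79) and a one-sided α of 0.05; the function names are introduced here for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def rejection_cutoff(mu0, sigma, n, alpha=0.05):
    """Value c such that H0 is rejected when xbar < c: c = mu0 - z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 - z_alpha * sigma / sqrt(n)

def power_lower_tail(mu0, mu_a, sigma, n, alpha=0.05):
    """Power = P[z_A < (mu0 - mu_a) / (sigma / sqrt(n)) - z_alpha] for HA: mu = mu_a < mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((mu0 - mu_a) / (sigma / sqrt(n)) - z_alpha)

cutoff = rejection_cutoff(mu0=85, sigma=8.006, n=20)
power = power_lower_tail(mu0=85, mu_a=80, sigma=8.006, n=20)
print(round(cutoff, 2), round(power, 3))
```

The cutoff should come out near 82.06 mmHg and the power near 0.875, in line with the slide; small differences in the last digit come from using unrounded table values.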
Calculating Sample Size Power = P[z_A < (μ0 - μA)/(σ/√n) - z_α]. Choose power of 0.90: z_A = 1.282 (table) < (μ0 - μA)/(σ/√n) - z_α. Solve for n. Usually μ0 = 0; μA specified; σ² estimated. For a two-sided test, use z_{α/2}
Calculating Sample Size Power = P[z_A < (μ0 - μA)/(σ/√n) - z_α]. Choose power of 0.90: z_A = 1.282 (table) < (μ0 - μA)/(σ/√n) - z_α, that is, 1.282 < (85-80)/(8.006/√n) - 1.645. Solve for n: 21.97, round up to 22
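Solving the inequality for n gives n >= [(z_α + z_β) σ / (μ0 - μA)]², where z_β is the N(0,1) quantile for the chosen power. The sketch below applies that rearrangement to the example, assuming a one-sided α of 0.05 and σ = 8.006; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_lower_tail(mu0, mu_a, sigma, alpha=0.05, power=0.90):
    """Smallest n with z_beta <= (mu0 - mu_a) / (sigma / sqrt(n)) - z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)        # about 1.282 for power = 0.90
    n_exact = ((z_alpha + z_beta) * sigma / (mu0 - mu_a)) ** 2
    return n_exact, ceil(n_exact)

n_exact, n_required = sample_size_lower_tail(mu0=85, mu_a=80, sigma=8.006)
print(round(n_exact, 2), n_required)
```

The unrounded quantiles give a value slightly below the slide's 21.97 (which uses three-decimal table values), but both round up to the same sample size of 22.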
Review of Session #4 α (type I) and β (type II) errors. Normal Distribution and the z-statistic. The Central Limit Theorem. Hypothesis testing using z and N(0,1). Calculating power. Calculating sample size
Session #4 Homework Using the C-section data… (1) Determine whether or not the increase in SBP exceeds 20 mmHg. [Hint: form paired differences. Calculate σ² on the differences.] (2) What is the power of this test to detect an increase of 10 mmHg in SBP? (3) Extra Credit: Find the sample size that provides a 90% chance of detecting an increase in SBP of 5 mmHg or more.