Published by Baldwin Oliver. Modified over 8 years ago.
1
Lunch & Learn Statistics By Jay
2
Goals
– Introduce / reinforce statistical thinking
– Understand statistical models
– Appreciate model assumptions
– Perform simple statistical tests
3
What topics will we cover?
– Statistical concepts
– Probability
– Definitions
– Descriptive statistics
– Hypothesis formulation
– Hypothesis testing
– Normal distribution
– $\alpha$ and $\beta$ errors
– Student’s t distribution
– Paired and unpaired t tests
– Analysis of variance
– Regression
– Categorical data
– Sensitivity / specificity
– Chi-square tests
4
Session #1 Review
– Observations vary: sample vs. population
– Observational vs. experimental data
– Graphing data
5
Session #2 Review
– Statistics are functions of the data
– Useful statistics have known distributions
– Statistical inference: estimation and testing hypotheses
– Tests seek to disprove a “null” hypothesis
6
Session #3 Review
– Tests involve a NULL hypothesis ($H_0$) and an ALTERNATIVE hypothesis ($H_A$)
– Try to disprove $H_0$
– 4 steps in hypothesis testing:
  – Identify the test statistic
  – State the null and alternative hypotheses
  – Identify the rejection region
  – State your conclusion
7
Session #4 Concepts
– $\alpha$ (type I) and $\beta$ (type II) errors
– Normal distribution and the z-statistic: a mathematical construct
– The Central Limit Theorem: a divine gift for statistical inference?
– We will use the Normal distribution to:
  – perform hypothesis tests
  – calculate power and sample size
8
Type I error
– $\alpha$: rejecting the null hypothesis when, in fact, it is true
– $\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$
– Generally, $\alpha = 0.05$
– For one-sided tests, it is conservative to choose $\alpha = 0.025$
9
Type II error
– $\beta$: accepting $H_0$ when, in fact, $H_A$ is true
– $\beta = P(\text{accept } H_0 \mid H_A \text{ true})$
– Often we pick $\beta = 0.20$ or $0.10$ and calculate the sample size to achieve this goal
– In drug development, it is wasteful (but less expensive) to choose $\beta > 0.10$
10
What is Power?
– Power is $1 - \beta$, that is: $1 - \beta = 1 - P(\text{accept } H_0 \mid H_A \text{ true}) = P(\text{reject } H_0 \mid H_A \text{ true})$
– In words: rejecting $H_0$ when, in fact, $H_A$ is true
– This is something we want to happen: our goal!
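The definitions of $\alpha$, $\beta$ and power can be checked by simulation. The sketch below is a Python illustration, not part of the original slides; it borrows the numbers from the C-section example later in the deck ($\mu_0 = 85$, $\sigma = 8.006$, $n = 20$, alternative mean 80) and assumes normally distributed observations with known $\sigma$. The function name `rejection_rate` is hypothetical. When the true mean equals $\mu_0$, the rejection rate estimates $\alpha$; when it equals the alternative mean, it estimates power, and $\beta$ is one minus that.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, mu0, sigma, n, alpha=0.05, reps=20_000, seed=2):
    """Monte Carlo estimate of P(reject H0) for the lower-tailed z-test
    H0: mu >= mu0 vs HA: mu < mu0, with sigma assumed known."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha critical value (about 1.645)
    se = sigma / math.sqrt(n)                  # standard error of the mean
    rejections = 0
    for _ in range(reps):
        # Simulate one study: n observations drawn with the chosen true mean
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if (xbar - mu0) / se < -z_alpha:
            rejections += 1
    return rejections / reps


alpha_hat = rejection_rate(true_mu=85, mu0=85, sigma=8.006, n=20)  # H0 true
power_hat = rejection_rate(true_mu=80, mu0=85, sigma=8.006, n=20)  # HA true
beta_hat = 1 - power_hat
print(f"alpha ~ {alpha_hat:.3f}, power ~ {power_hat:.3f}, beta ~ {beta_hat:.3f}")
```

With these settings the estimates should land near 0.05 for $\alpha$ and near 0.87 for power, in line with the analytic power calculation later in the session.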
11
Normal Distribution
– Parametric: mean $\mu$ and variance $\sigma^2$
– Mean $\mu$ = point of symmetry; $\sigma^2$ = “spread” of the bell curve
– Complicated mathematical formula: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$
– Looks like a bell centered at $\mu$
– Distance from the midline to each inflection point = $\sigma$
– Importance: Central Limit Theorem
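As a sanity check on the density formula, the short Python sketch below (an illustration, not from the slides) codes $f(x)$ directly, compares it with the standard library's `statistics.NormalDist.pdf`, and uses a finite-difference second derivative to confirm that the curvature changes sign one $\sigma$ from the mean. The helper names `normal_pdf` and `second_derivative`, and the values $\mu = 0$, $\sigma = 2$, are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = exp[-(x - mu)^2 / (2 sigma^2)] / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


mu, sigma = 0.0, 2.0
f = lambda x: normal_pdf(x, mu, sigma)

print(f(1.3), NormalDist(mu, sigma).pdf(1.3))  # the two values should agree
print(f(mu - 1.5), f(mu + 1.5))                # symmetric about mu
print(second_derivative(f, mu + sigma))        # close to 0 at the inflection point
```

The curve is concave (negative second derivative) between $\mu - \sigma$ and $\mu + \sigma$ and convex outside, which is why $\sigma$ is the midline-to-inflection distance.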
12
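Central Limit Theorem
– Averages of $n$ independent random variables with mean $\mu$ and variance $\sigma^2$ are approximately $N(\mu, \sigma^2/n)$
– Proof: uses Taylor and Maclaurin expansions of infinite series and other nasty mathematical tricks
– When $n$ is in the teens, averages are approximately Normal for reasonable distributions
– When $n > 30$, averages are approximately Normal no matter how weird the original distribution (provided its variance is finite)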
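The Central Limit Theorem can be seen empirically. In this Python sketch (an illustration, not from the slides), observations come from a strongly skewed exponential distribution with mean 1 and variance 1; averages of $n = 40$ draws should then have mean near 1, variance near $1/40$, and about 95% of them should fall within $\pm 1.96\,\sigma/\sqrt{n}$ of the mean, as a Normal distribution predicts. The exponential choice, $n$, and the helper name `sample_means` are assumptions made for the example.

```python
import math
import random
import statistics


def sample_means(draw, n, reps, seed=0):
    """Return `reps` averages, each of `n` independent draws from draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


n = 40
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=5_000)

mean_of_means = statistics.fmean(means)    # should be near mu = 1
var_of_means = statistics.variance(means)  # should be near sigma^2 / n = 0.025
se = math.sqrt(1.0 / n)
coverage = sum(abs(m - 1.0) < 1.96 * se for m in means) / len(means)  # near 0.95
print(mean_of_means, var_of_means, coverage)
```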
13
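Z-statistic
– If $\bar{x} \sim N(\mu, \sigma^2/n)$, then $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
– [Note the change in the denominator: the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$, not $\sigma$]
– We say that “$\bar{x}$ has been normalized”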
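The normalization step can be checked numerically with Python's standard-library `statistics.NormalDist`: if $\bar{x} \sim N(\mu, \sigma^2/n)$, the probability that $\bar{x}$ falls below any value equals the standard Normal probability at the corresponding $z$. The parameter values below are arbitrary choices for illustration, and `z_score` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_score(xbar, mu, sigma, n):
    """Normalize a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


mu, sigma, n = 85.0, 8.0, 20
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))  # distribution of xbar: sd is sigma/sqrt(n)
standard = NormalDist()                           # N(0, 1)

for xbar in (80.0, 83.5, 85.0, 88.0):
    # P(xbar_random < xbar) should equal P(Z < z) at the normalized value
    print(xbar, xbar_dist.cdf(xbar), standard.cdf(z_score(xbar, mu, sigma, n)))
```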
15
Why the z-statistic is useful
– Hypothesis tests require a test statistic with a known distribution
– The z-statistic distribution is known
– Averages of anything (if $n > 30$) can use the z-statistic
– ASSUMPTION: WE KNOW THE VARIANCE $\sigma^2$ OF THE POPULATION STUDIED
16
Example: C-section data
– Test if initial SBP is too low, i.e., < 85 mmHg
– Four steps in testing:
  1. Identify test statistic
  2. State hypotheses
  3. Identify rejection region
  4. State conclusions
17
Example: C-section data
– Identify a test statistic:
  – … minimum value? How is it distributed?
  – … median value? How is it distributed?
  – … average value? How is it distributed? $N(\mu, \sigma^2/n)$
18
Example: C-section data
– Identify a test statistic: $\bar{x} \sim N(\mu, \sigma^2/n)$
– Therefore $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is the test statistic
– We know its distribution: $N(0,1)$
19
Example: C-section data
– Identify a test statistic: $z$
– State null and alternative hypotheses:
  – $H_0: \mu \ge 85$ (remember, put the “=” with $H_0$)
  – $H_A: \mu < 85$
20
Example: C-section data
– Identify a test statistic: $z$ for average SBP
– State the null and alternative hypotheses: $H_0: \mu \ge 85$; $H_A: \mu < 85$
– Identify the rejection region:
  – Under $H_0$, we use $\mu_0$ for the population mean: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
  – Here $\mu_0$ is 85 mmHg
  – When $z < -z_\alpha$, we reject $H_0$
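The rejection region can be written as a small Python function (a sketch, not from the slides; the names `lower_critical_value` and `reject_h0` are hypothetical). `NormalDist().inv_cdf(alpha)` from the standard library returns the lower $\alpha$ quantile of $N(0,1)$, which is exactly $-z_\alpha$.

```python
from statistics import NormalDist


def lower_critical_value(alpha=0.05):
    """Return -z_alpha, the cutoff for rejecting H0: mu >= mu0 in favour of HA: mu < mu0."""
    return NormalDist().inv_cdf(alpha)


def reject_h0(z, alpha=0.05):
    """Lower-tailed rule: reject H0 when z < -z_alpha."""
    return z < lower_critical_value(alpha)


print(round(lower_critical_value(0.05), 3))   # -1.645
print(round(lower_critical_value(0.025), 3))  # -1.96
```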
22
Example: C-section data
– Final step: state conclusion
– Calculate $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
  – $\bar{x} = 80.25$
  – $\mu_0 = 85$ (according to $H_0$)
  – For now, we will use $\sigma = 8.006$ for $n = 20$
– $z = \frac{80.25 - 85}{8.006/\sqrt{20}} = \frac{-4.75}{1.79} = -2.65$, which is $< -1.645$ ($-z_{0.05}$)
– Therefore, we reject $H_0$ and conclude: the data are not consistent with an SBP of at least 85 mmHg
– “Nominal p-value”: look up the value associated with $z = -2.65$: get 0.004025
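The four steps can be collected into one Python function (a sketch assuming known $\sigma$; the name `lower_tailed_z_test` is hypothetical). With the slide's numbers it reproduces $z \approx -2.65$; the p-value is $P(Z \le z)$ from the standard library's `statistics.NormalDist`. Because the code does not round $z$ first, its p-value comes out near 0.0040 rather than exactly the table value 0.004025 for $z = -2.65$.

```python
import math
from statistics import NormalDist


def lower_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """z-test of H0: mu >= mu0 vs HA: mu < mu0 with sigma treated as known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # step 1: test statistic
    critical = NormalDist().inv_cdf(alpha)     # step 3: rejection region z < -z_alpha
    p_value = NormalDist().cdf(z)              # nominal one-sided p-value, P(Z <= z)
    return z, p_value, z < critical            # step 4: reject H0?


z, p, reject = lower_tailed_z_test(xbar=80.25, mu0=85, sigma=8.006, n=20)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```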
23
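Calculating Power of a Test
– Recall, power = $P(\text{reject } H_0 \mid H_A \text{ true})$ [reject $H_0$ at level $\alpha$ when $H_A$ is true]
– Start with “reject $H_0$ at level $\alpha$”:
  $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$
  $= P(z < -z_\alpha \mid H_0 \text{ true})$ for our example
  $= P\left[\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} < -z_\alpha \,\middle|\, \mu = \mu_0\right]$
  $= P\left[\bar{x} < \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}\right]$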
24
Calculating Power
– Note in this case ($\sigma/\sqrt{n} = 8.006/\sqrt{20} = 1.79$):
  $P\left[\frac{\bar{x} - 85}{1.79} < -1.645\right] = 0.05$
  $P[\bar{x} < 85 - (1.645)(1.79)] = 0.05$
  $P(\bar{x} < 82.05545) = 0.05$
– (Another way to write the rejection region of the hypothesis test)
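The same rejection region expressed on the scale of $\bar{x}$ can be computed directly. This Python sketch (not from the slides; `xbar_cutoff` is a hypothetical name) uses unrounded values of $z_{0.05}$ and $\sigma/\sqrt{n}$, so its cutoff agrees with 82.055 to about three decimals, and it confirms that the area below the cutoff is $\alpha$ when $\mu = \mu_0$.

```python
import math
from statistics import NormalDist


def xbar_cutoff(mu0, sigma, n, alpha=0.05):
    """Reject H0: mu >= mu0 when xbar < mu0 - z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return mu0 - z_alpha * sigma / math.sqrt(n)


mu0, sigma, n = 85, 8.006, 20
cutoff = xbar_cutoff(mu0, sigma, n)
# Under H0 (mu = mu0), xbar ~ N(mu0, sigma^2 / n); the area below the cutoff is alpha
tail = NormalDist(mu0, sigma / math.sqrt(n)).cdf(cutoff)
print(f"cutoff = {cutoff:.3f}, P(xbar < cutoff | mu = 85) = {tail:.3f}")
```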
25
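Calculating Power
– Step 2: rewrite the probability under $H_A$ (power $= 1 - \beta$), where $z_A = \frac{\bar{x} - \mu_A}{\sigma/\sqrt{n}} \sim N(0,1)$ when $\mu = \mu_A$:
  Power $= P(\text{reject } H_0 \mid H_A \text{ true})$
  $= P\left[\bar{x} < \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}} \,\middle|\, \mu = \mu_A\right]$
  $= P\left[\frac{\bar{x} - \mu_A}{\sigma/\sqrt{n}} < \frac{\mu_0 - z_\alpha(\sigma/\sqrt{n}) - \mu_A}{\sigma/\sqrt{n}}\right]$
  $= P\left[z_A < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha\right]$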
26
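Calculating Power
– With $\mu_0 = 85$ and alternative $\mu_A = 80$ mmHg:
  Power $= P\left[z_A < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha\right]$
  $= P\left[z_A < \frac{85 - 80}{1.79} - 1.645\right]$
  $= P(z_A < 1.148) = 0.875$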
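The closing power formula translates to a few lines of Python (a sketch assuming a lower-tailed test with known $\sigma$; `power_lower_tailed` is a hypothetical name). Using unrounded inputs, it gives about 0.87 for $\mu_0 = 85$, $\mu_A = 80$, $\sigma = 8.006$, $n = 20$, in line with the slide's 0.875.

```python
import math
from statistics import NormalDist


def power_lower_tailed(mu0, mu_a, sigma, n, alpha=0.05):
    """Power = P[z_A < (mu0 - mu_a) / (sigma / sqrt(n)) - z_alpha] for H0: mu >= mu0."""
    se = sigma / math.sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((mu0 - mu_a) / se - z_alpha)


print(power_lower_tailed(mu0=85, mu_a=80, sigma=8.006, n=20))
# Power grows with n (and with the size of the difference mu0 - mu_a)
for size in (10, 20, 30):
    print(size, round(power_lower_tailed(85, 80, 8.006, size), 3))
```

When $\mu_A = \mu_0$ the formula collapses to $\alpha$, which is a quick check that the derivation is consistent with the type I error rate.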
27
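Calculating Sample Size
– Power $= P\left[z_A < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha\right]$
– Choose power of 0.90: from the table, $P(z_A < 1.282) = 0.90$
– Require $1.282 < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha$
– Solve for $n$
– Usually $\mu_0 = 0$; $\mu_A$ specified; $\sigma^2$ estimated
– For a two-sided test, use $z_{\alpha/2}$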
28
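Calculating Sample Size
– Power $= P\left[z_A < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha\right]$
– Choose power of 0.90: from the table, $P(z_A < 1.282) = 0.90$
– $1.282 < \frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_\alpha$
– $1.282 < \frac{85 - 80}{8.006/\sqrt{n}} - 1.645$
– Solve for $n$: $\sqrt{n} > \frac{(1.282 + 1.645)(8.006)}{5} = 4.687$, so $n > 21.97$; round up to 22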
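Solving the inequality for $n$ gives $n \ge \left[(z_\alpha + z_\beta)\,\sigma / (\mu_0 - \mu_A)\right]^2$, where $z_\beta$ is the table value 1.282 for 90% power. A Python sketch of that formula follows (not from the slides; `sample_size_z_test` and its `two_sided` flag are hypothetical names). With unrounded quantiles it gives about 21.96 before rounding up to 22, matching the slide; `two_sided=True` swaps in $z_{\alpha/2}$ as suggested for two-sided tests.

```python
import math
from statistics import NormalDist


def sample_size_z_test(mu0, mu_a, sigma, alpha=0.05, power=0.90, two_sided=False):
    """Smallest n with the requested power: n >= [(z_alpha + z_beta) * sigma / (mu0 - mu_a)]^2."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # about 1.645 one-sided, 1.960 two-sided at alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)      # about 1.282 for 90% power
    n_exact = ((z_alpha + z_beta) * sigma / (mu0 - mu_a)) ** 2
    return n_exact, math.ceil(n_exact)        # always round up to a whole subject


n_exact, n = sample_size_z_test(mu0=85, mu_a=80, sigma=8.006)
print(f"n = {n_exact:.2f}, round up to {n}")
```

Rounding up (rather than to the nearest integer) is what guarantees the power is at least the target.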
29
Review of Session #4
– $\alpha$ (type I) and $\beta$ (type II) errors
– Normal distribution and the z-statistic
– The Central Limit Theorem
– Hypothesis testing using $z$ and $N(0,1)$
– Calculating power
– Calculating sample size
30
Session #4 Homework
Using the C-section data…
(1) Determine whether or not the increase in SBP exceeds 20 mmHg. [Hint: form paired differences. Calculate $\sigma^2$ on the differences.]
(2) What is the power of this test to detect an increase of 10 mmHg in SBP?
(3) Extra credit: Find the sample size that provides a 90% chance of detecting an increase in SBP of 5 mmHg or more.