Biostatistics for the biomedical profession Lecture III

1 Biostatistics for the biomedical profession Lecture III
Biostatistics for the biomedical profession, Lecture III. BIMM18. Karin Källen & Linda Hartman, September 2015

2 Today Repetition Lecture 1: summary measures and graphical methods
Lecture 2: normal distribution, generalisation, confidence interval, reference interval, t-test, ANOVA
Paired samples t-test
Non-parametric tests: Mann-Whitney's test, Kruskal-Wallis' test, Wilcoxon signed rank test

3 Repetition – the normal distribution
The (perfect) normal distribution: The mean, median, and mode all have the same value. The curve is symmetric around the mean; the skew and kurtosis are 0. The curve approaches the X-axis asymptotically. Mean ± 1 SD covers 2∙34.1% = 68.2% of data. Mean ± 2 SD covers 2∙47.5% = 95% of data. Mean ± 3 SD covers 99.7% of data.
Exercise: What is the proportion of babies who will have a head circumference between −1 SD and +1 SD (= z-score −1 to +1)? Answer: 68%
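The ±k SD coverages quoted above can be verified with Python's standard library (a quick sketch; `NormalDist` is in the stdlib from Python 3.8):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

def coverage(k):
    """Proportion of a normal distribution within ±k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(coverage(1), 3))  # → 0.683
print(round(coverage(2), 3))  # → 0.954
print(round(coverage(3), 3))  # → 0.997
```

Note that ±2 SD covers 95.4%, slightly more than the exact 95% given by ±1.96 SD.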

4 Output from SPSS: Histogram for birth weight (in the ’births’ data set).
Exercise: Are the data normally distributed? Decide the limits between which 95% of all birth weights will be found. Compute a 95% confidence interval for the mean.

5 Birth weights cont.

6 Birth weights cont.
Tests of Normality (SPSS output, with Lilliefors Significance Correction): Kolmogorov-Smirnov Sig. = .200* ; Shapiro-Wilk Sig. = .147. (*. This is a lower bound of the true significance.)
Two different methods to test for normal distribution. A low p-value is evidence against normality; a high p-value gives no reason to reject normality — it does not prove that the data are normally distributed.

7 Are the data normally distributed?
Mean: 3539 g, SD: 542 g, N: 262
Exercise: Are the data normally distributed? Decide the limits between which 95% of all birth weights will be found. Clue: 95% of all data will lie between mean − 1.96 SD and mean + 1.96 SD. Compute a 95% confidence interval for the mean. Clue: 95% CI = mean ± 1.96 · SEM, where SEM = s/√n.
Answers: Yes, no reason to doubt normality. 95% reference interval: lower limit 3539 − 1.96·542 ≈ 2477 g; upper limit 3539 + 1.96·542 ≈ 4601 g. SEM = 542/√262 ≈ 33.5. 95% confidence interval: lower limit 3539 − 1.96·33.5 ≈ 3473; upper limit 3539 + 1.96·33.5 ≈ 3605. Mean with 95% CI: 3539 (3473 to 3605)
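The reference interval and confidence interval can be checked numerically; a sketch using only the slide's summary values (mean 3539 g, SD 542 g, n = 262):

```python
import math

mean_bw, sd, n = 3539, 542, 262   # summary values from the slide (grams)

# 95% reference interval: mean ± 1.96·SD covers ~95% of individual birth weights
ref_lo = mean_bw - 1.96 * sd
ref_hi = mean_bw + 1.96 * sd

# 95% confidence interval for the mean: mean ± 1.96·SEM, where SEM = SD/√n
sem = sd / math.sqrt(n)
ci_lo = mean_bw - 1.96 * sem
ci_hi = mean_bw + 1.96 * sem

print(round(ref_lo), round(ref_hi))  # → 2477 4601
print(round(ci_lo), round(ci_hi))    # → 3473 3605
```

The CI is far narrower than the reference interval because it describes the precision of the mean, not the spread of individuals.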

8 But… what to do if data do not follow a normal distribution?
1. How to perform descriptive statistics?
2. How to compare the results between two samples?

9 Output from SPSS: Maternal BMI (kg/m2)
Exercise: Could a normal distribution be assumed?

10 Maternal BMI, cont…

11 Maternal BMI, cont….
1. Exercise: Could a normal distribution be assumed? Answer: No!!!
2. Which measurements should be used to produce descriptive statistics? Answer: Median, inter-quartile range, histogram, box plot etc.
3. Under which circumstances could it be possible to nevertheless compare the means with a t-test? We will repeat and learn more about the SEM (standard error of the mean).

12 Repetition – Central Limit Theorem
The mean has an approximately normal distribution if the number of observations is large (convergence is faster if the distribution is symmetric). Requires independent observations from the same distribution. We can therefore often use the normal distribution to test a difference in means – even if the observations themselves are not normal.
[Figure: histograms of 10,000 sample means from dice rolls, based on 10 rolls and on 100 rolls per sample.]
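The dice-roll illustration can be reproduced in a few lines. This sketch simulates sample means of fair-die rolls (the 10,000 repetitions and sample sizes follow the slide; the seed is arbitrary):

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed, for reproducibility only

def sample_means(n_rolls, n_samples=10_000):
    """Mean of n_rolls fair-die rolls, repeated n_samples times."""
    return [mean(random.randint(1, 6) for _ in range(n_rolls))
            for _ in range(n_samples)]

means10 = sample_means(10)
means100 = sample_means(100)

# The centre stays at 3.5, while the spread shrinks roughly as 1/sqrt(n_rolls)
print(round(mean(means10), 2), round(stdev(means10), 2))
print(round(mean(means100), 2), round(stdev(means100), 2))
```

A histogram of `means100` is visibly bell-shaped even though a single die roll has a flat (uniform) distribution.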

13 Repetition: Normal distribution
Mean 420, Std 400, Median 280, QL–QU percentiles.
Exercise: Symmetric or asymmetric? Mean or median to describe data? Answer: Use the median!
Example from Björk, Praktisk statistik för medicin och hälsa

14 Normal distribution: each mean based on samples with n = 10, 20, 50, 100
Original dataset: Mean 420, Std 400, Median 280.
10 observations of CB-153 were sampled 1000 times and the mean (of the 10 observations) was calculated; the histograms show the 1000 means, repeated for N = 10, 20, 50, and 100.
The mean (X̄) will approach a normal distribution as the number of observations increases.

15 Exercise – CLT
Histograms of X̄, where X̄ = (1/10)·Σᵢ₌₁¹⁰ Xᵢ for N = 10, and X̄ = (1/100)·Σᵢ₌₁¹⁰⁰ Xᵢ for N = 100 (panels for N = 10, 20, 50, 100).
Original dataset: Mean 420, Std 400, Median 280.
Let m = mean(X), s = Std(X) in the population.
Exercise: If X̄ is based on N observations, calculate the expected mean(X̄) (= mean of means) and std(X̄) (= standard deviation of means = standard error of the mean = SEM) in the 4 histograms. SEM = s/√n.

16 Normal distribution: Let m = mean(X), s = Std(X) in the population
Original dataset: Mean 420, Std 400, Median 280.
N = 10: Mean(X̄) = 420, SEM = 400/√10 ≈ 126
N = 20: Mean(X̄) = 420, SEM = 400/√20 ≈ 89
N = 50: Mean(X̄) = 420, SEM = 400/√50 ≈ 57
N = 100: Mean(X̄) = 420, SEM = 400/√100 = 40
SEM = s/√n

17 Maternal BMI, cont….
1. Exercise: Could a normal distribution be assumed? Answer: No!!!
2. Under which circumstances could it be possible to nevertheless compare the means with a t-test? Answer: If the samples are large enough, we could nevertheless compare the means (central limit theorem).

18 Repetition: Normal distribution Estimate of mean: Confidence interval
Hypothesis testing Comparison of means: Conf int for difference in means (2 groups) T-test (2 groups) ANOVA (> 2 groups)

19 Confidence interval
A confidence interval tells us within which interval the 'true' value of a parameter probably lies. E.g., a 95% confidence interval tells us between which limits the 'true' value (with 95% certainty) lies.
Repetition: 95% of the data will lie between ±2 SD (1.96 exactly). A 95% CI could be constructed (large samples) as: mean − 1.96·SEM to mean + 1.96·SEM.

20 Confidence interval
Confidence grade 95% = 100% − 5%, i.e. 5% = 1/20 intervals (produced in the same way) will not cover the true value!
[Figure: repeated confidence intervals plotted around the true value.]
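The "1 interval in 20 misses" claim can be checked by simulation. A sketch under assumed values (true mean 100, SD 15, n = 30 — all hypothetical); coverage comes out slightly below 95% because the constant 1.96 is used instead of the exact small-sample t-constant:

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed for reproducibility
true_mu, sigma, n = 100, 15, 30   # hypothetical population and sample size
trials = 2000

covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    sem = stdev(sample) / math.sqrt(n)
    lo, hi = mean(sample) - 1.96 * sem, mean(sample) + 1.96 * sem
    covered += lo <= true_mu <= hi   # did this interval catch the true mean?

print(covered / trials)  # close to 0.95
```

Roughly 1 interval in 20 misses the true mean, exactly as the slide states.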

21 Confidence intervals and reference intervals
A 95% confidence interval tells us between which limits the 'true' mean (with 95% certainty) lies: mean − 1.96·SEM to mean + 1.96·SEM.
A reference interval reflects the interval within which 95% of the population (or values) lies. Example: lower limit: mean − 1.96·s; upper limit: mean + 1.96·s.

22 Confidence interval vs Reference interval
Confidence interval: interval for the mean of the population, with a specified confidence grade (here 95%). 95% CI: X̄ − 1.96·s/√n to X̄ + 1.96·s/√n.
Reference interval: interval for the individual values of the population, with a specified coverage (here 95%). 95% ref. int.: X̄ − 1.96·s to X̄ + 1.96·s.
The graphs are based on a large number of births.

23 The distribution of birth weight in two samples (n=100 and n=1000, respectively).
m=3477g, s=555g m=3507g, s=580g Excercise: The variance seems to be larger in the larger sample. Is that remarkable?

24 Exercise
Which of the following statements are true?
1. The larger the investigation, the narrower is the reference interval
2. The sample mean is always within the limits of the confidence interval
3. The population (true) mean is always within the limits of the confidence interval
4. The larger the investigation, the wider is the confidence interval
5. A confidence interval with 99% confidence grade is always wider than the corresponding confidence interval with 95% confidence grade
Answer: 2 and 5 are correct

25 T-distribution
For a large sample: 95% CI = X̄ − 1.96·s/√N to X̄ + 1.96·s/√N.
For small samples we must account for the uncertainty in the estimate of the standard deviation s: 95% CI = X̄ − t(N−1)·s/√N to X̄ + t(N−1)·s/√N, where N − 1 = "degrees of freedom".
Degrees of freedom → t-constant for 95% CI: 5 → 2.57; 9 → 2.26; 19 → 2.09; 29 → 2.02; 49 → 2.01; 99 → 1.98; ∞ → 1.96.
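The table of t-constants can be turned into a small helper. This sketch hard-codes the slide's values and falls back to 1.96 for other sample sizes; the example input (mean 100, s 15, n 10) is hypothetical:

```python
import math

# t-constants for a 95% CI, copied from the table above (keyed by degrees of freedom)
T95 = {5: 2.57, 9: 2.26, 19: 2.09, 29: 2.02, 49: 2.01, 99: 1.98}

def t_ci(xbar, s, n):
    """95% CI for the mean; falls back to the normal constant 1.96 for other df."""
    t = T95.get(n - 1, 1.96)
    half = t * s / math.sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = t_ci(100, 15, 10)   # hypothetical sample: mean 100, s 15, n 10 → df 9, t 2.26
print(round(lo, 1), round(hi, 1))  # → 89.3 110.7
```

With the large-sample constant 1.96 instead of 2.26 the interval would be noticeably (and wrongly) narrower for n = 10.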

26 The quantiles for t could be looked up in tables… but are rather produced by computer programs such as SPSS, where they are built into the t-test and CI procedures.

27 T-test for two independent samples (groups) - Example
Birth weight. Two groups – A: smokers, B: non-smokers.

28 Repetition: Normal distribution Estimate of mean: Confidence interval
Hypothesis testing Comparison of means: T-test/Conf int for difference in means (2 groups) ANOVA (> 2 groups)

29 T-test Assumptions
1. The mean is a relevant summary measure
2. Independent observations (e.g. no patient contributes more than one observation)
3. Observations follow a normal distribution OR both groups are large

30 T- test Example: Birth weight, Descriptives

31 Test procedure for t-test
Test variable: D = Mean in group B − Mean in group A
H0: D = 0 (Mean in group A = Mean in group B)
H1: D ≠ 0 (Mean in group A ≠ Mean in group B)
Construct a confidence interval for D and/or calculate the p-value.

32 Confidence interval - General formula
The interval has the form D ± c·SE, where:
D is the observed average difference between the groups, D = X̄B − X̄A
c is a constant that depends on the confidence level and the sample size; for a 95% confidence level and a large sample, c ≈ 2
SE is the standard error of D, a measure of how precise the estimated difference D is

33 T-test for two independent groups (cont.)
95% CI: D = 154 g, with nA = 183 and nB = 66.
For now, we assume that the standard deviation is the same in groups A and B → base the analysis on a weighted ("pooled") standard deviation, sPooled.

34 T-test for two independent groups (cont.)
sA = 554, sB = 478 → sPooled = 535.
SE = sPooled·√(1/nA + 1/nB) = 535·√(1/183 + 1/66) ≈ 77.
Use the constant c for 95% confidence level (5% risk level) with nA + nB − 2 = 247 degrees of freedom → c ≈ 1.97 (obtained from a statistical table for the t-distribution).
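The pooled standard deviation and the standard error can be recomputed from the slide's numbers; a sketch (the pooled-SD formula weights each group's variance by its degrees of freedom):

```python
import math

# Summary values from the slide: group A (smokers) vs group B (non-smokers)
nA, sA = 183, 554
nB, sB = 66, 478

# Pooled ("weighted") standard deviation: each variance weighted by n − 1
sp = math.sqrt(((nA - 1) * sA**2 + (nB - 1) * sB**2) / (nA + nB - 2))

# Standard error of the difference in means
se = sp * math.sqrt(1 / nA + 1 / nB)

print(round(sp), round(se))  # → 535 77
```

Multiplying SE by the t-constant 1.97 (df = 247) gives the half-width of the 95% CI, about 151 g.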

35 T-test for two independent groups (cont.) The computer makes the calculations for us…. But we have to interpret the results!

36 Discuss:
D ± c·SE = 154 ± 1.97·77 ≈ 154 ± 151, so the 95% CI for the mean difference in birth weight is 3 to 305 g.
1. How do you interpret the confidence interval?
2. Is there a significant difference in birth weight?
3. What can you say about the corresponding p-value?

37 T-test for two independent groups Example of SPSS-output
Two different test versions, depending on whether equal standard deviations (variances) can be assumed or not. P-values for the t-tests. Levene's test: the p-value ("Sig.") tests H0: Variance in A = Variance in B; if it is not low (e.g. if p > .1), read from the upper row. Difference with 95% CI.

38 Presenting t-test results
Average in each group: mean (possibly median as well for comparison); ± 95% CI sometimes relevant; ± SE (i.e. 68% CI) not relevant.
Variability in each group: standard deviation; percentiles if report space permits.
Mean difference between the groups: ± 95% CI usually relevant; p-value. Value of the t-variable usually not relevant.

39 Elements of statistical inference
The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis (H0) is true. Type I error (often referred to as alpha) is the probability of rejecting H0 when in fact H0 is true. Type II error (often referred to as beta) is the probability of accepting H0 when in fact H0 is false.

40 Important notes…. The p-value does not tell us anything about the size of the effect – only how probable it is to obtain an effect of the size in our sample if the null hypothesis is true. The p-value is a function of both sample size and true effect size. With large samples, statistically significant results can be found even if the absolute effects are so small that they are of no clinical interest. The fact that no significant result was found does not mean that no difference exists – perhaps the study had too low power to detect a true difference/effect/association.

41 Statistical significance vs. clinical relevance
Low p-value How large is the difference? Statistical significance: ”There is a difference” Clinical relevance: ”Is the difference of importance?” Effect estimation (CI) is needed!

42 Exercise – statistical inference
[Figure: 95% confidence intervals around the effect measure, and p-values for the null hypothesis of "no effect", in 5 investigations (A–E); reference lines mark "No effect" and "Clinically relevant effect".]
Make pairs of the statements and the study results (A–E) in the figure:
1. Treatment effect cannot be detected, but cannot be ruled out
2. A clinically relevant effect is indicated, but is statistically uncertain
3. Treatment effect is statistically significant, uncertain if the effect is of clinical relevance
4. Clinically relevant effect that is statistically significant
5. Treatment effect is statistically significant, but a clinically relevant difference can be ruled out
Answers: E (or B), B, C, A, D.
From Jonas Björk, Praktisk statistik för medicin och hälsa

43 Exercise: Combine the statistical terms with the correct common phenomena (in common language)
Statistical terms: Type I error, Type II error, Confounding
Common phenomena: Non-causal association, Mass-significance, Lack of power

44 More than two groups, one way ANOVA (analysis of variance)
Multiple t-tests could result in mass-significance! Do an ANOVA instead of repeated t-tests.
ANOVA: H0: mean1 = mean2 = mean3; H1: at least two of the means are different.
In short: in an ANOVA, the total variance is divided into the within-groups and between-groups variance.

45 ANOVA Compare variances between groups (VB) and within groups (VW)
VB large, VW small → ratio VB/VW large; VB small, VW large → ratio VB/VW small.
The quotient (F = VB/VW) is close to 1 if the group means are equal and > 1 if they are not. The corresponding test is called an F-test – and is based on the F-distribution.
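The between/within decomposition can be sketched from scratch; the three groups below are hypothetical, chosen only to show how F is formed:

```python
from statistics import mean

# Hypothetical data: k = 3 groups of 3 observations each
groups = [[3.1, 3.4, 3.6], [3.5, 3.8, 4.0], [2.8, 3.0, 3.2]]

grand = mean(x for g in groups for x in g)
n_total = sum(len(g) for g in groups)
k = len(groups)

# Between-groups sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-groups sum of squares: spread of observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = (between variance) / (within variance), with df k−1 and n−k
f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(round(f, 2))  # → 7.94
```

An F well above 1, as here, suggests that at least two group means differ; the p-value then comes from the F-distribution with (k−1, n−k) degrees of freedom.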

46 Example: ANOVA – to compare the birth weight between 4 parity groups

47 Example: ANOVA – to compare the birth weight between 4 parity groups

48 Example: ANOVA birth weight and parity, continued
Significance of the test: are all the means the same (m1 = m2 = m3 = m4)? To check for pair-wise differences, post-hoc tests can be performed.

49 Post hoc tests
Different methods to adjust for multiple comparisons

50 Presenting ANOVA results
Average in each group: mean (possibly median as well for comparison); ± 95% CI sometimes relevant; (± SE (i.e. 68% CI) not relevant).
Variability in each group: standard deviation; percentiles if report space permits.
P-value together with the ANOVA table if report space permits, otherwise "F(df1, df2) = …"

51 Paired samples & Non-parametric methods

52 T-test for paired data
Preparation  ControlsDay2  SalDay2
11   915600  357800
2    953300  502200
3    650000  470000
4    700000  560000
5    736000
6    984000  556000
7    772000  418000
8    920000  600000
9    680000
10   520000  840000
12   533000  620000
13   510000  704000
14   722000  696000
Two ways of comparing means:
1. Calculate the means of the groups, and estimate the difference
2. Estimate the difference for each row, then calculate the mean of the differences

53 T-test for paired data (cont.)
Preparation  ControlsDay2  SalDay2  Difference
11   915600  357800  557800
2    953300  502200  451100
3    650000  470000  180000
4    700000  560000  140000
5    736000          314000
6    984000  556000  428000
7    772000  418000  354000
8    920000  600000  320000
9    680000          400000
10   520000  840000  280000
12   533000  620000  -87000
13   510000  704000
14   722000  696000  26000
Means  824992.8571  570000

54 T-test for paired data (cont.)
[Same table as above.]
Means  824992.8571  570000  254992.9
Difference between means = mean of the differences

55 T-test for paired data (cont.)
[Same table as above.]
Means  824992.8571  570000  254992.9
s  181454.0097  111808  216636.9
s (combined)  150709.3394
SEM  56962.77603  57898.64

56 T-test for paired data (cont.)
[Same table as above, with means, s, s (combined), and SEM rows.]
Thus, the mean difference is not influenced by whether the data are treated as paired or not, but the estimate of the standard deviation is likely to differ with the method. Use analyses for paired data when adequate!

57 Paired samples t-test Discuss:
The previous t-test was made to find differences between independent groups of observations. Sometimes it is more powerful to test for differences within the same patient (or another paired measurement). In a study of weight loss from spicy food, 12 subjects were weighed before and after a month on a spicy food diet, see the table.
Discuss: How would you test if the diet gave weight loss?

58 Paired samples t-test
Do a paired samples t-test! Calculate the differences di for each subject's weights and test if mean(d) = 0.
95% CI for mean(d): mean(d) ± t0.025(N−1)·s/√N.
Here: −2.1 ± 2.2·s/√N = −2.1 ± 1.9 = (−4.0, −0.17).
Discuss: How do you interpret the CI? Was the treatment effective?
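A paired CI of this form can be computed directly. The before/after weights below are hypothetical stand-ins (the slide's full table is not reproduced here), so the resulting interval only roughly matches the slide's (−4.0, −0.17):

```python
import math
from statistics import mean, stdev

# Hypothetical before/after weights (kg) for 12 subjects — stand-ins, not the slide's data
pre  = [65, 88, 125, 103, 90, 76, 85, 126, 97, 142, 132, 113]
post = [62, 86, 118, 105, 91, 72, 81, 122, 95, 145, 130, 110]

d = [b - a for a, b in zip(pre, post)]   # post − pre; negative = weight loss
n = len(d)
sem = stdev(d) / math.sqrt(n)
t975 = 2.201                              # 97.5% t quantile, df = 11 (from tables)
lo, hi = mean(d) - t975 * sem, mean(d) + t975 * sem
print(round(lo, 2), round(hi, 2))         # CI excludes 0 → significant weight loss
```

Because the whole interval lies below zero, the paired test indicates a real average weight loss.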

59 Paired samples t-test
Paired test: 95% CI for weight loss (−4.0, −0.17).
If the researchers didn't recognize the paired design but did an independent-groups t-test: CI = (−18.6, 22.76), p = 0.84. Why so wide? Large variability BETWEEN subjects inflates the variability of the difference in an independent-groups design!

60 Exercise: Creatinine was measured in 11 men and 12 women:
Men (nA = 11), women (nB = 12). What test would you use to test if there is a difference? A t-test? Are the assumptions of the test met?

61 Parametric methods for group comparisons
Normally distributed outcomes / "large" studies – focus on mean comparisons:
Two independent groups: t-test
Paired groups (paired measurements): paired t-test
> 2 groups: analysis of variance (ANOVA), regression analysis
What if the assumptions are not met? Non-parametric tests!

62 Comparison of medians – non-parametric tests

63 Non-parametric methods
Original measurements are converted to ranks in the analysis. H0: the distributions are equal in all groups. The median is a useful marker for differences in distribution. Insensitive to skewed distributions and extreme values. Can be used for ordinal data, e.g. 0 = no response, 1 = mild response, 2 = strong response.

64 Difference between two independent groups: Mann-Whitney’s test
Rank all the observations from lowest to highest. Calculate the rank sum in group A (WA) and in group B (WB). The larger the difference in mean ranks WA/nA and WB/nB, the lower the Mann-Whitney p-value will be. Another name for the same test is the "Wilcoxon rank sum test", which utilizes WA. There is a straightforward generalization to more than two groups (the Kruskal-Wallis test).
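The ranking step can be sketched with midranks for ties; the two small groups below are hypothetical:

```python
def midranks(values):
    """Rank each value from lowest to highest; ties share the average of their ranks."""
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

a = [5, 9, 7]        # hypothetical group A (nA = 3)
b = [1, 3, 4, 6]     # hypothetical group B (nB = 4)

r = midranks(a + b)          # rank within the combined sample
wa = sum(r[:len(a)])         # rank sum WA, group A
wb = sum(r[len(a):])         # rank sum WB, group B

# Sanity check: the rank sums always add to n(n+1)/2 for the combined n
print(wa, wb, wa + wb)  # → 17.0 11.0 28.0
```

The mean ranks WA/nA ≈ 5.7 and WB/nB ≈ 2.8 differ clearly here; the Mann-Whitney p-value would quantify how surprising that is under H0.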

65 Mann-Whitney… small group discussion
Calculate the rank sum and the mean rank for males (and females if you have time). For the group sizes nA = 11 (males) and nB = 12 (females), p < 0.05 if the rank sum for the smallest group (males) is below 100 or above 175. Conclusion? How would you summarize the test?
Answer: rank sums (women) and (men); mean ranks 9.8 and 15.7, respectively (p ≈ 0.04).

66 Presenting Mann-Whitney results
Average in each group: median (possibly mean as well for comparison).
Variability in each group: percentiles or quartiles (in smaller groups) or min–max (in even smaller groups); standard deviation not relevant.
Difference between the groups: p-value for the M-W test; U-statistic sometimes relevant. Ideally: median difference ± 95% CI sometimes relevant (can be calculated in e.g. SPSS).

67 Mann-Whitney Creatinine, cont

68 Extension to more than two groups Kruskal-Wallis test
Mann-Whitney U-test (k = 2): H0: Distribution A = Distribution B; H1: Distribution A ≠ Distribution B.
Kruskal-Wallis test (k > 2 groups), e.g. k = 3: H0: Distribution A = Distribution B = Distribution C; H1: Distribution A ≠ Distribution B, or Distribution A ≠ Distribution C, or Distribution B ≠ Distribution C.
Assumes independent groups and independent observations within each group. The median is a useful marker for differences in distribution. The more the mean ranks differ, the lower the p-value will be.

69 Non-parametric methods Paired samples

70 Non-parametric test for paired samples: Wilcoxon signed rank test
Spicy diet continued:
Subject  Pre  Post  Diff  Sign  Rank  Signed rank
1    65   62   -3    -    5.5   -5.5
2    88   86   -2
3    125  118  -7    -    11    -11
4    103  105  +2    +
5    90   91
6    76   72   -4    -    8     -8
7    85   81
8    126  122
9    97   95
10   142  145
11   132
12        110  -5          -10
Sum the positive ranks: = 9.5. (Sum the negative ranks = 56.5 (= 11·12/2 − 9.5).)
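The signed-rank bookkeeping can be sketched as follows; the differences below are hypothetical (the slide's own table is partly illegible), so the rank sums differ from the slide's 9.5 and 56.5:

```python
def midrank(x, xs):
    """Average rank of x within xs (ties share the mean of their ranks)."""
    return sum(v < x for v in xs) + (sum(v == x for v in xs) + 1) / 2

# Hypothetical post − pre differences (negative = weight loss); not the slide's exact data
diffs = [-3, -2, -7, 2, 1, -4, -4, -4, -2, 3, -2, -5]

nz = [d for d in diffs if d != 0]        # zero differences are dropped
absvals = [abs(d) for d in nz]           # rank by absolute size of the difference

w_plus = sum(midrank(abs(d), absvals) for d in nz if d > 0)    # positive rank sum
w_minus = sum(midrank(abs(d), absvals) for d in nz if d < 0)   # negative rank sum

print(w_plus, w_minus)  # → 11.0 67.0; together they sum to n(n+1)/2 = 78
```

A heavily lopsided split of the rank sums, as here, is what drives a low Wilcoxon p-value.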

71 Wilcoxon signed rank test -Some remarks
Works for all types of distributions and for all study sizes Effect can be summarized by the median of the paired differences together with 95% CI Almost as powerful as the (paired) t-test if the differences are normally distributed Power is decreased if many differences are zero

72 Comparison of different tests
Test situation → Parametric test | Non-parametric test
Independent samples, 2 groups → t-test | Mann-Whitney
Independent samples, ≥ 2 groups → ANOVA | Kruskal-Wallis
Paired samples, 2 groups → paired t-test | Wilcoxon signed rank test

73 Two broad categories of statistical methods
Parametric methods (example: t-test).
Positive: results in both an effect measure (with CI) and a p-value; more effective at detecting differences if data are (close to) normal.
Negative: based on assumptions about the distribution of the data, typically N(μ, σ); test results can be sensitive to deviations from the normal distribution, especially in small studies.
Non-parametric methods (example: Mann-Whitney).
Positive: no assumptions about the distribution of the data; useful also for data measured on an ordinal scale; suitable for small studies.
Negative: less powerful than parametric methods (if the normal distribution applies); typically results only in a p-value (but sometimes an effect measure with CI can be computed).

74 Summary
Repetition: normal distribution, reference interval, SEM, t-test, ANOVA, paired t-test; non-parametric methods: Mann-Whitney, Kruskal-Wallis, Wilcoxon signed rank test.
Next lecture: 2×2 tables, Chi2-test, Fisher's exact test; probability, proportions; linear regression, correlation, R2.

