Applied statistics Katrin Jaedicke
What you will learn in this course
- Basic statistical terminology
- Using SPSS
- Summary statistics
- Cross-sectional and longitudinal comparisons of 2 and more samples
- Corrections for multiple comparisons
- Correlations
- Transformations
- Creating graphs in SPSS and SigmaPlot
- To be confident in using statistics!
The statistics presented in the lecture are correct (to the best of my knowledge), but this does not imply that all other statistical methods are wrong! (But be sure you know what you are doing if you are using other methods!)
Introduction to SPSS
Comparison of 2 groups (k = 2)
Independent samples
- Metric data -> normal distribution? (Shapiro-Wilk Test)
  - yes -> homogeneity of variances? (Levene Test)
    - yes -> t-Test for independent samples (Student’s t-test)
    - no -> Mann-Whitney U-Test
  - no -> Mann-Whitney U-Test
- Categorical data -> Mann-Whitney U-Test
Dependent samples
- Metric data -> normal distribution? (Shapiro-Wilk Test)
  - yes -> paired t-Test
  - no -> Wilcoxon Test
- Categorical data -> Wilcoxon Test
Independent samples, dependent samples and replicates
a) Independent samples
b) Dependent (related) samples
c) Replicates
[Figure: example weights in kg for the three cases, including a measurement before and after starvation]
Exercise: Independent samples, dependent samples and replicates
[Figure: cell culture scenarios A to E (Treatment 1, Treatment 2, Treatment 3; measurement 24 h later; time points 0 h, 6 h, 24 h; ELISA)]
Metric and categorical data
Categorical example: age groups (child, teenager, adult)
Examples from the lab
Metric
- ELISA
- Bradford protein assay
- Cell proliferation
- Flow cytometry
- Realtime PCR
Categorical
- States of disease severity
- Cancer classifications
- Staining categories
Normal distribution
[Figure: bell curve; x-axis: height of each person, y-axis: number of people]
- Very few very small people
- Many average height people
- Very few very tall people
The Null Hypothesis
The question that you ask when doing a statistical test. It is important to know which question the test is asking in order to understand the result!
What we test in statistics: How big is the mistake that I make if I reject the Null Hypothesis (i.e. if I say the Null Hypothesis is wrong)?
The accepted mistake is (generally) set at 5 %
- < 5 %: *p < 0.05 (small mistake)
- < 1 %: **p < 0.01 (even smaller mistake)
- < 0.1 %: ***p < 0.001 (very small mistake!)
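The star notation above can be sketched as a small helper. This is only an illustration of the thresholds on the slide (5 %, 1 %, 0.1 %); the function name and the "n.s." label for non-significant results are assumptions, not part of the lecture.

```python
def significance_stars(p):
    """Map a p-value to the star label used on the slide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:   # mistake smaller than 0.1 %
        return "***"
    if p < 0.01:    # mistake smaller than 1 %
        return "**"
    if p < 0.05:    # mistake smaller than 5 %
        return "*"
    return "n.s."   # assumed label for "not significant"


for p in (0.2, 0.04, 0.003, 0.0004):
    print(p, significance_stars(p))
```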
The normal distribution test (Shapiro-Wilk test) asks the following question: Do our data follow a normal distribution?
Answer to that question:
- Yes -> p > 0.05, i.e. the hypothesis is kept and we treat our data as following a normal distribution!
- No -> p < 0.05
Homogeneity of variance
How spread out are two different samples?
Null Hypothesis Question: Are the variances in both populations equal?
p > 0.05 = homogeneity of variance!
Null Hypothesis for any test looking at differences between groups: There are no differences between the groups.
p < 0.05 = there is a significant difference between the groups
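Comparison of more groups (k > 2)
Independent samples
- Metric data -> normal distribution? (Shapiro-Wilk Test)
  - yes -> homogeneity of variances? (Levene Test)
    - yes -> oneway ANOVA, then t-Test with Bonferroni correction
    - no -> Kruskal-Wallis, then U-Test with Bonferroni correction
  - no -> Kruskal-Wallis, then U-Test with Bonferroni correction
- Categorical data -> Kruskal-Wallis, then U-Test with Bonferroni correction
Dependent samples
- Metric data -> normal distribution? (Shapiro-Wilk Test)
  - yes -> sphericity? (Mauchly’s Test)
    - yes -> repeated measurement ANOVA, then paired t-Test with Bonferroni correction
    - no -> Friedman Test, then Wilcoxon Test with Bonferroni correction
  - no -> Friedman Test, then Wilcoxon Test with Bonferroni correction
- Categorical data -> Friedman Test, then Wilcoxon Test with Bonferroni correction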
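The two overview diagrams (k = 2 and k > 2) can be read as one decision procedure. The sketch below encodes one plausible reading of the branches; the function name, its parameters and the exact wiring of the "no" branches are assumptions made for illustration, and the yes/no answers are expected to come from the Shapiro-Wilk, Levene and Mauchly tests run in a statistics package.

```python
def choose_test(k, dependent, metric, normal=False,
                equal_variances=False, sphericity=False):
    """Return (main test, post-hoc test or None) for k groups.

    All arguments after k are yes/no answers taken from the diagrams:
    dependent samples?, metric data?, normal distribution?,
    homogeneity of variances?, sphericity?
    """
    if k < 2:
        raise ValueError("at least 2 groups are needed for a comparison")
    parametric = metric and normal  # categorical data go the non-parametric way

    if k == 2:
        if dependent:
            return ("paired t-test" if parametric else "Wilcoxon test", None)
        if parametric and equal_variances:
            return ("Student's t-test", None)
        return ("Mann-Whitney U-test", None)

    # k > 2: an overall test first, then pairwise post-hoc tests
    if dependent:
        if parametric and sphericity:
            return ("repeated measures ANOVA",
                    "paired t-test with Bonferroni correction")
        return ("Friedman test", "Wilcoxon test with Bonferroni correction")
    if parametric and equal_variances:
        return ("one-way ANOVA", "t-test with Bonferroni correction")
    return ("Kruskal-Wallis test", "U-test with Bonferroni correction")


print(choose_test(2, dependent=False, metric=True, normal=True, equal_variances=True))
print(choose_test(3, dependent=True, metric=True, normal=False))
```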
Mauchly’s Test of Sphericity
Null hypothesis question: Is the variance between all group differences the same?
p > 0.05 = homogeneity of variance (Sphericity)!
[Table: patients P1 to P5 measured at 0 h, 24 h and 48 h, with the pairwise differences 0 h-24 h, 0 h-48 h and 24 h-48 h]
Note: if you want to know how to calculate Variance, check here:
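To make "the variance between all group differences" concrete, the sketch below computes the pairwise differences for five patients at three time points and the sample variance of each difference column. The measurement values are hypothetical, invented only for illustration. The sketch does not perform Mauchly's test itself; it only shows the quantities the test compares.

```python
from itertools import combinations
from statistics import variance

# Hypothetical measurements for patients P1..P5 (not from the lecture)
measurements = {
    "0 h":  [10.0, 12.0, 9.0, 11.0, 13.0],
    "24 h": [14.0, 15.0, 13.0, 16.0, 15.0],
    "48 h": [18.0, 21.0, 15.0, 20.0, 22.0],
}


def variances_of_differences(data):
    """Sample variance of the per-patient differences for every pair of time points."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(data.items(), 2):
        differences = [x - y for x, y in zip(a, b)]
        result[f"{name_a}-{name_b}"] = variance(differences)
    return result


for pair, var in variances_of_differences(measurements).items():
    print(pair, round(var, 2))
```

Sphericity holds when these variances are roughly equal; whether the observed spread is compatible with that is what Mauchly's test in SPSS reports as a p-value.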
Post-hoc testing and the Bonferroni correction
5 Student’s t-Tests:
1. Control-A
2. Control-B
3. Control-C
4. A-C
5. B-C
Error of multiple testing -> Control and C are replicates!
Bonferroni correction: very small new p-value thresholds, risk of losing all significance, especially if the sample size is small.
Bonferroni-Holm or Benjamini-Hochberg (Benjamini only parametric data) correction: stepwise correction (less conservative, more powerful)
Corrections for multiple comparisons (Bonferroni corrections)
ELISA
1. Control-A (p = )
2. Control-B (p = 0.003)
3. Control-C (p = 0.01)
4. A-C (p = 0.04)
5. B-C (p = 0.06)
- As post-hoc testing, we do 5 comparisons which give us 5 different p values.
- The exact same Control data are used 3 times -> Replicates!
- The exact same stimulation data C are used 3 times -> Replicates!
- We need to correct for the error of multiple testing, i.e. for the mistake of using replicates!
- It does not matter whether we have used Student’s t-test, the paired samples t-test, Mann-Whitney or the Wilcoxon test to get these p values (but use the same test for all 5 comparisons, do not mix different tests!) -> corrections should be done no matter which branch/side of the overview diagram you are on.
Exercise Bonferroni-Holm
1. Put all the p values from the smallest to the highest into the K column: ; 0.003; 0.01; 0.04;
Use the new p values to define the level of significance (**)
Note: If fewer tests are done (e.g. 3 or 4) or if more tests are done (e.g. 6, 7…), delete or add cells in the Excel spreadsheet and change K accordingly.
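For readers who prefer code to a spreadsheet, the plain Bonferroni and the stepwise Bonferroni-Holm correction can be sketched as follows. This is a minimal sketch of the standard textbook procedures, not a reproduction of the lecture's Excel sheet; the function names are assumptions, and the p values are hypothetical and deliberately not the ones from the ELISA example.

```python
def bonferroni(p_values):
    """Multiply every p value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def bonferroni_holm(p_values):
    """Stepwise Holm correction; returns adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[i])
        # adjusted p values must never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p values from 5 post-hoc comparisons (illustration only)
p_values = [0.01, 0.04, 0.03, 0.005, 0.2]
print("Bonferroni:     ", [round(p, 3) for p in bonferroni(p_values)])
print("Bonferroni-Holm:", [round(p, 3) for p in bonferroni_holm(p_values)])
```

The adjusted p values are then compared with the usual 0.05, 0.01 and 0.001 levels. Holm never gives a larger adjusted p value than plain Bonferroni, which is why it is described as less conservative.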
Transformations -> achieve parametric testing
[Figure: distribution curve; x-axis: height of each person, y-axis: number of people]
- To get data that are not normally distributed into a normal distribution
- To get data which do not have equal variances into data which have equal variances
- After transformations, data have to be checked again for normal distribution and equality of variance
- Use the new data for statistics, but not for graphs! Graphs should be done with the original, untransformed data
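As an illustration of the workflow, the sketch below applies a base-10 logarithm to right-skewed data. The choice of the log transformation and the data values are assumptions (the slide does not name a specific transformation); other transformations follow the same pattern of keeping the original values for graphs and the transformed values for testing.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed measurements (illustration only)
raw = [1.2, 1.5, 2.0, 2.4, 3.1, 4.8, 9.5, 20.3]


def log_transform(values):
    """Return log10 of each value; only defined for strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation needs values greater than 0")
    return [math.log10(v) for v in values]


transformed = log_transform(raw)  # use for the statistical tests
# 'raw' stays untouched and is what goes into the graphs

print("raw:         mean", round(mean(raw), 3), "median", round(median(raw), 3))
print("transformed: mean", round(mean(transformed), 3),
      "median", round(median(transformed), 3))
```

The transformed values still have to be re-checked for normal distribution and equality of variance in the statistics package; this sketch does not do that.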
Correlations
- Metric data -> normal distribution? (Shapiro-Wilk Test)
  - yes -> small sample size?
    - no -> Pearson correlation
    - yes -> Spearman’s rank correlation
  - no -> Spearman’s rank correlation
- Categorical data -> Spearman’s rank correlation
Correlations + Chi square
Correlations
- p draw line
- Correlation coefficient between -1 and 1; its absolute value (between 0 and 1) gives the strength
- < 0.3 weak correlation
- > 0.75 strong correlation
Chi square
- Only Yes-No answers exist
- For example: comparison of gender, races, blood groups…
- Important to test if patient groups are matched
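The two correlation coefficients can be sketched in plain Python as below. The function names are assumptions, Spearman's coefficient is computed as Pearson's coefficient on the ranks (ties get the average rank), and the "moderate" label for values between 0.3 and 0.75 is an assumption because the slide only names the weak and strong ends. No p-value is computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strength(r):
    """Verbal label based on the absolute value of the coefficient."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size > 0.75:
        return "strong"
    return "moderate"  # assumed label for the range in between


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical data: rises with x, but not linearly
print(round(pearson(x, y), 3), round(spearman(x, y), 3), strength(spearman(x, y)))
```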
The “grey” areas of statistics
Q: How important is the normal distribution?
A: The “big” tests such as ANOVA and repeated measures ANOVA, but also the t-tests for larger sample sizes, can “cope” with having only an approximately normal distribution.
Q: How important is the equality of variance?
A: Very! A violation of equality of variances potentially changes test results and may also reduce statistical power.
Q: What is a small and what is a large sample size?
A: There is no “definition” of small and large sample size; what is commonly used depends on the field of research. Rule of thumb: a sample size of n=4 is the minimum at which I can do parametric testing; anything less should be tested non-parametrically.
Q: Do I always have to correct for multiple comparisons?
A: No, but your results are stronger if your p-values are still significant after correction, and they are less open to the criticism of being a “chance” finding.
Mean and Median
Mean -> Normally distributed data
Add all numbers of the analysed samples together and divide by n (sample size)
For example: 1, 2, 4, 6, 12
1+2+4+6+12 = 25
Mean: 25/5 = 5
Median -> Data are not normally distributed
Find the middle number of the analysed samples (sorted by size)
For example:
- Odd amount of numbers: 3, 9, 15, 17, 44
  Middle number: 15
  Median: 15
- Even amount of numbers: 3, 6, 8, 12, 17, 44
  Add the 2 middle numbers and divide by 2
  Median: (8+12)/2 = 10
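The median examples above can be checked with Python's standard library. The mean of the odd-sized list is added only to show how the single large value 44 pulls the mean away from the median; that comparison is an illustration, not part of the slide.

```python
from statistics import mean, median

odd_sample = [3, 9, 15, 17, 44]
even_sample = [3, 6, 8, 12, 17, 44]

print(median(odd_sample))   # 15: the middle number
print(median(even_sample))  # 10.0: (8 + 12) / 2
print(mean(odd_sample))     # 17.6: pulled upwards by the value 44
```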
Standard deviation, Standard error and Interquartile range
Standard deviation and Standard error -> Normally distributed data
Standard deviation: how much variation is there around the mean
- Small Standard deviation: data points are spread closely around the mean
- Large Standard deviation: data points are spread widely around the mean
- In Excel: =STDEV
Standard error: how accurately the sample mean estimates the true mean (standard deviation divided by the square root of n) -> does not add valuable information to the data, do not use!
Interquartile range -> Data are not normally distributed
- first quartile (Q1) or lower quartile: 25th percentile
- second quartile (Q2) or median: 50th percentile
- third quartile (Q3) or upper quartile: 75th percentile
- Interquartile range: Q3-Q1
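A short sketch of these summary statistics with Python's standard library follows, reusing the six numbers from the median example. `statistics.stdev` is the sample standard deviation (dividing by n-1), which is what Excel's =STDEV computes. Quartiles can be calculated with several conventions; the "inclusive" method used here is an assumption, and SPSS or other software may give slightly different Q1 and Q3 for small samples.

```python
from math import sqrt
from statistics import quantiles, stdev

data = [3, 6, 8, 12, 17, 44]

sd = stdev(data)            # sample standard deviation (n - 1 in the denominator)
se = sd / sqrt(len(data))   # standard error of the mean

# Quartiles with the "inclusive" interpolation method (an assumed convention)
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("SD:", round(sd, 2))
print("SE:", round(se, 2))
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
```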
Box plot