1
Univariate analysis
Önder Ergönül, MD, MPH
17-28 June 2019
2
Statistics is used to:
Describe data – descriptive statistics
Compare study groups
Search for a correlation between variables
Fit a regression between variables in order to predict the dependent variable when the independent variable(s) are known
Analyze health outcomes
3
Overview of Presentation
Which test to use?
Parametric assumptions
Non-parametric tests
ANOVA (analysis of variance)
Correlation
4
Dependent vs independent variables
The dependent variable represents the output or the effect, or is tested to see if it is the effect.
5
Dependent vs independent variables
The independent variable is the variable that you have control over: what you can choose and manipulate. It is usually what you think will affect the dependent variable. In some cases, you may not be able to manipulate the independent variable; it may be something that is already there and fixed, like color, kind, or time; something you would like to evaluate with respect to how it affects the dependent variable.
6
Which statistical test(s)?
1. How many study groups are there? One, two, or more than two?
7
Which statistical test(s)?
2. Type of data
8
Which statistical test(s)?
Are there consecutive (repeated) measurements/assessments of the dependent variable? If the dependent variable is metric, is it normally distributed? What are you looking for: a difference between groups, or a relationship between variables?
9
Univariate analysis: Comparison of 2 groups
[Decision chart: is the variable metric or categorical? If metric, is its distribution normal? If not, use non-parametric tests.]
The reader should have an idea about the statistical tests, although one does not need to be able to build a car in order to drive one. For comparison of two groups, the first step is to see whether the data are categorical or continuous, and the second is to check whether the data are normally distributed.
10
Univariate analysis: Comparison of 2 groups
[Decision chart: metric variable with a symmetric (normal) distribution → Student t test; asymmetric distribution → Mann-Whitney U or Wilcoxon test.]
For a continuous variable, if the data are normally distributed, the Student t test is the choice; otherwise the Mann-Whitney U test is suggested for unpaired samples and the Wilcoxon test for paired samples.
11
Unpaired (parallel, independent) groups
Dependent variable | Independent variable | Test (parametric) | Test (non-parametric)
Categorical | Categorical | Chi square* | Chi square*
Metric | Categorical (2 groups) | Student t test | Mann-Whitney U
Metric | Categorical (>2 groups) | One-way ANOVA | Kruskal-Wallis
Metric | Metric | Pearson correlation | Spearman correlation
*Chi-square tests are neither parametric nor non-parametric.
12
Paired (dependent) groups
Dependent variable | Independent variable | Test
Categorical | Categorical | McNemar
Metric | Categorical (2 groups) | Paired t test or Wilcoxon
Metric | Categorical (>2 groups) | Friedman
Metric | Metric | Spearman correlation
13
Chi Squared Test
A chi-squared test, also referred to as a chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true, or any test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.
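As an illustration (not from the original slides), a chi-squared test of independence could be run on a contingency table with SciPy; the 2x2 counts below are made up:

```python
# A minimal sketch, assuming SciPy is available; the counts are hypothetical.
from scipy.stats import chi2_contingency

# Rows: exposed / unexposed; columns: disease / no disease (made-up counts)
table = [[20, 30],
         [10, 40]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```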
14
Assumptions Normality Homogeneity of variance
15
Centre of a frequency distribution
“Central Tendency” Mean Median Mode
16
Median vs Mean: Mean = 95. Recalculating the mean without the highest score (234) gives 81.1. Median = 95.5.
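To illustrate the same point in code (the slide's underlying scores are not reproduced here, so the values below are made up):

```python
# Hypothetical scores: one extreme value (234) pulls the mean but barely moves the median.
from statistics import mean, median

scores = [78, 82, 88, 91, 95, 96, 99, 103, 234]
print(mean(scores), median(scores))            # mean inflated by the outlier
print(mean(scores[:-1]), median(scores[:-1]))  # without 234: mean drops a lot, median changes little
```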
17
Assumptions Normality Homogeneity of variance
18
Assumptions: Homogeneity of variance
Variance: the average squared error between the mean and the observations, computed from the sum of squared errors; a measure in squared units.
Standard deviation (SD): the square root of the variance; a measure of average error in the same units as the original measurements.
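A short sketch of these definitions in Python (any small sample will do):

```python
# Sample variance = sum of squared errors around the mean, divided by n - 1;
# the standard deviation is its square root, back in the original units.
def sample_variance(xs):
    m = sum(xs) / len(xs)                   # mean
    ss = sum((x - m) ** 2 for x in xs)      # sum of squared errors
    return ss / (len(xs) - 1)

def sample_sd(xs):
    return sample_variance(xs) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data))
```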
19
Assumptions Homogeneity of variance
20
Why does the sample size matter?
“Central limit theorem”
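A quick simulation (not part of the slides) shows the idea: means of samples drawn from a clearly non-normal population vary less, and look increasingly normal, as the sample size grows:

```python
# Central limit theorem sketch: sampling distribution of the mean for a skewed population.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)   # skewed, clearly non-normal

for n in (2, 10, 50):
    means = rng.choice(population, size=(2000, n)).mean(axis=1)   # 2000 sample means of size n
    print(n, means.mean().round(3), means.std().round(3))         # spread shrinks roughly as 1/sqrt(n)
```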
21
Parametric - Nonparametric Tests
A parametric test is a test based on some parametric assumptions, most often an assumption that the distribution of the variable in the population is normal. A nonparametric test does not rely on parametric assumptions, like having a normal distribution and equal variances.
22
Parametric - Nonparametric Tests
Groups compared | Parametric | Nonparametric
2 independent groups | Unpaired t-test | Mann-Whitney U test
2 dependent groups | Paired t-test | Wilcoxon signed rank test
More than 2 independent groups | ANOVA | Kruskal-Wallis test
More than 2 dependent groups | Repeated measures ANOVA | Friedman's ANOVA
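For reference, the pairs in this table correspond to routines in scipy.stats; a sketch with made-up numbers (repeated-measures ANOVA is not in scipy.stats itself; statsmodels provides it, e.g. AnovaRM):

```python
from scipy import stats

# Three small made-up samples, just so the calls run.
g1 = [5.1, 4.9, 6.0, 5.5, 5.8]
g2 = [6.2, 6.8, 5.9, 7.1, 6.5]
g3 = [7.0, 7.4, 6.9, 8.1, 7.6]

print(stats.ttest_ind(g1, g2))              # 2 independent groups, parametric
print(stats.mannwhitneyu(g1, g2))           # 2 independent groups, nonparametric
print(stats.ttest_rel(g1, g2))              # 2 dependent (paired) groups, parametric
print(stats.wilcoxon(g1, g2))               # 2 dependent groups, nonparametric
print(stats.f_oneway(g1, g2, g3))           # >2 independent groups, parametric (one-way ANOVA)
print(stats.kruskal(g1, g2, g3))            # >2 independent groups, nonparametric
print(stats.friedmanchisquare(g1, g2, g3))  # >2 dependent groups, nonparametric
```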
23
Nonparametric Tests
Do not rely on population parameters such as the mean or SD
Make no assumptions about the distribution of the population
Are based on the ranks of the values rather than the values themselves
Rank the values, take the sum of the ranks, and compare this sum to what would be expected if all rankings were equally likely
Make inferences about the median rather than the mean (the median is the more robust measure here)
24
Three-Step Process
1. Begin with the null hypothesis.
2. Test the hypothesis: look at and make sense of the data, then obtain the P value.
3. Conclude: reject the null hypothesis if P < 0.05 (5%); otherwise accept (fail to reject) it.
25
Compare two independent groups for continuous variable
Sample size? Aim: to examine the relationship between total cholesterol levels and heart attack. A total of 28 male heart attack patients had their cholesterol levels measured 2 days after the attack. Cholesterol levels were also recorded for a control group of 30 male patients of similar age and weight who had not had a heart attack.
26
Non-normal distribution or very unequal dispersions
Mann-Whitney U Test
27
Three-step process
Null hypothesis: “The median total serum cholesterol level of cases at two days post attack is the same as the median total serum cholesterol level of controls.”
Test the hypothesis: what is the p value? Mann-Whitney p value = …
Conclude: since the p value is <0.05, we reject H0 and conclude that the result is statistically significant.
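A sketch of how this comparison could be run with SciPy. Only summary statistics appear in the lecture, so the cholesterol values below are simulated (the control parameters in particular are invented):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
cases = rng.normal(loc=254, scale=48, size=28)      # 2 days post attack (simulated)
controls = rng.normal(loc=195, scale=35, size=30)   # no heart attack (simulated, invented parameters)

u, p = mannwhitneyu(cases, controls, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")   # if p < 0.05, reject H0 of equal medians
```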
28
Compare two dependent groups for continuous variable
Sample size? Aim: to examine the relationship between total cholesterol levels and heart attack. A total of 28 male heart attack patients had their cholesterol levels measured at 2 days and at 4 days post attack.
29
Wilcoxon signed-ranks test
Non-normal distribution or very unequal dispersions
Cases (N = 28) | 2 days | 4 days
Mean | 253.93 | 235.32
Median | 268 | 239
SD | 47.71 | 60.3
30
Three-step process
Null hypothesis: “The median difference in total serum cholesterol level of cases at 2 days post attack and at 4 days post attack is zero.”
Test the hypothesis: what is the p value? Wilcoxon signed ranks p value = 0.02
Conclude: since the p value is <0.05, we reject H0 and conclude that the result is statistically significant.
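The paired version in SciPy, again with simulated values (the slide reports only means, medians, and SDs for the 28 patients):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
day2 = rng.normal(loc=254, scale=48, size=28)          # simulated 2-day values
day4 = day2 - rng.normal(loc=19, scale=30, size=28)    # simulated drop of ~19 mg/dL by day 4

stat, p = wilcoxon(day2, day4)        # tests whether the median paired difference is zero
print(f"W = {stat:.1f}, p = {p:.4f}")
```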
31
Compare three independent groups for continuous variable
Sample size? Aim: to examine the relationship between total cholesterol levels and household income. A total of 150 participants had their cholesterol levels measured. Income was categorized as high, middle, and low.
32
ANOVA (Analysis of Variance)
Normal distribution and similar dispersions
 | High income | Middle income | Low income
N | 35 | 55 | 60
Mean | 144.8 | 148.3 | 157.5
Median | 150 | 147 | 160
SD | 28.3 | 25.8 | 27.9
33
ANOVA (Analysis of Variance)
ANOVA is based on comparing the variance (or variation) between the data samples to the variation within each particular sample. If the between-group variation is much larger than the within-group variation, this suggests that the means of the different samples are not all equal. If the between and within variations are of approximately the same size, there is no evidence of a significant difference between the sample means.
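A one-way ANOVA for the income example could look like this in SciPy; the group sizes and approximate means/SDs follow the table above, but the individual values are simulated:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
high = rng.normal(145, 28, 35)
middle = rng.normal(148, 26, 55)
low = rng.normal(158, 28, 60)

f, p = f_oneway(high, middle, low)    # compares between-group to within-group variation
print(f"F = {f:.2f}, p = {p:.4f}")
```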
34
ANOVA (Analysis of Variance)
All populations involved follow a normal distribution.
All populations have the same variance (or standard deviation).
The samples are randomly selected and independent of one another.
35
Three-step process
Null hypothesis: “The mean total serum cholesterol levels of the three groups of participants are the same.”
Test the hypothesis: what is the p value? F test p value = 0.01
Conclude: since the p value is <0.05, we reject H0 and conclude that the result is statistically significant.
36
Which means are statistically significantly different from each other?
There are many methods for comparing means after rejecting the null hypothesis: planned comparisons vs post hoc tests.
Pairwise comparisons: Bonferroni correction (divide the significance level by the number of pairwise comparisons, or equivalently multiply each p value by that number); the same approach is used after a Kruskal-Wallis test.
Tukey post hoc test | p value
High vs Middle | 0.053
High vs Low | 0.01
Middle vs Low | 0.02
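A sketch of the Bonferroni approach with pairwise t-tests (the Tukey procedure itself is usually run with a dedicated routine; the data here are the same simulated values as in the ANOVA sketch above):

```python
from itertools import combinations
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
groups = {"high": rng.normal(145, 28, 35),
          "middle": rng.normal(148, 26, 55),
          "low": rng.normal(158, 28, 60)}   # simulated, as in the ANOVA sketch

pairs = list(combinations(groups, 2))
for a, b in pairs:
    _, p = ttest_ind(groups[a], groups[b])
    p_adj = min(1.0, p * len(pairs))   # Bonferroni: multiply each p by the number of comparisons
    print(f"{a} vs {b}: p = {p:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```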
37
Parametric - Nonparametric Tests
Groups compared | Parametric | Nonparametric
2 independent groups | Unpaired t-test | Mann-Whitney U test
2 dependent groups | Paired t-test | Wilcoxon signed rank test
More than 2 independent groups | ANOVA | Kruskal-Wallis test
More than 2 dependent groups | Repeated measures ANOVA | Friedman's ANOVA
38
Summary
The type of study and the characteristics of the data determine the correct test.
Plan an analysis strategy before collecting the data.
Do not forget to check the assumptions before analysing the data.
When the normality and equality-of-variance assumptions are violated, use nonparametric tests; otherwise use parametric tests, which are more powerful.
39
Comparison of study groups at baseline in RCT
According to the CONSORT statement, significance testing of baseline differences in randomized controlled trials should not be performed.
40
Comparison of study groups at baseline in RCT
41
Baseline imbalance in RCTs
Any baseline differences between the groups under study are by definition due to chance (as long as the randomization was performed correctly).
42
Baseline imbalance in RCTs
Whether baseline differences are significant does not have any implications for the validity of the results of the study. Even a covariate* that is balanced between treatment groups (according to a p-value) can affect the association between treatment and outcome. *A covariate is a variable that is possibly predictive of the outcome under study.
43
Baseline imbalance in RCTs
Choice of baseline characteristics by which an analysis is adjusted should be determined by prior knowledge of an influence on outcome rather than evidence of imbalance between treatment groups in the trial. Such information should ideally be included in trial protocols and reported with details of the analysis.
44
What should we do? At the planning stage of a study, baseline variables of prognostic value should be identified on the basis of available evidence. These should be fitted in an analysis of covariance or equivalent technique for other data types. Other variables should not be added to the analysis unless information from other sources during the course of the trial suggests their inclusion.
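One way to carry this out (not shown in the lecture) is an ANCOVA-style linear model with treatment and the pre-specified baseline covariate as predictors; here with the statsmodels formula API, using hypothetical column names and values:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial data: outcome, randomized arm, and one prognostic baseline covariate.
df = pd.DataFrame({
    "outcome":   [5.1, 6.3, 4.8, 7.0, 5.9, 6.8, 4.5, 7.4, 5.4, 7.1],
    "treatment": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "baseline":  [4.9, 5.2, 4.6, 5.8, 5.1, 5.5, 4.4, 6.0, 5.0, 5.7],
})

model = smf.ols("outcome ~ treatment + baseline", data=df).fit()
print(model.params)    # treatment effect adjusted for the baseline covariate
print(model.pvalues)
```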
45
The Nazi German invasion of Norway and the correlation with cardiovascular disease
46
Correlation analysis The logic of correlation is straightforward.
Where there is a linear relationship between two variables, there is said to be a correlation between them. The strength of that relationship is given by the “correlation coefficient”, indicated by the letter “r”.
47
Correlation analysis
[Scatter plot: body weight vs systolic blood pressure, r = 0.70]
A positive correlation coefficient means that as one variable increases, the value of the other variable also increases.
48
Correlation analysis
[Scatter plot: total cholesterol (mg/dL) vs coronary artery diameter (mm), r = -0.85]
A negative correlation coefficient means that as the value of one variable goes up, the value of the other variable goes down.
49
Correlation analysis If there is a perfect relationship between two variables, then r = 1 (positive correlation) or r = -1 (negative correlation).
50
Correlation analysis [Three scatter plots, each with r = 0: no linear relationship]
51
Correlation analysis The correlation coefficient is between -1 and +1:
Exactly -1: a perfect downhill (negative) linear relationship
-0.70: a strong downhill (negative) linear relationship
-0.50: a moderate downhill (negative) relationship
-0.30: a weak downhill (negative) linear relationship
0: no linear relationship
+0.30: a weak uphill (positive) linear relationship
+0.50: a moderate uphill (positive) relationship
+0.70: a strong uphill (positive) linear relationship
Exactly +1: a perfect uphill (positive) linear relationship
52
Correlation analysis
Spearman (nonparametric) vs Pearson (parametric) correlation
P value: the statistical significance of the correlation coefficient (r)
Outliers (which can strongly distort r; see the sketch below)
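A sketch of the Pearson vs Spearman comparison on made-up data, showing how a single outlier distorts Pearson's r far more than the rank-based Spearman coefficient:

```python
from scipy.stats import pearsonr, spearmanr

weight = [60, 65, 70, 75, 80, 85, 90, 150]     # last value is an outlier
sbp    = [110, 115, 118, 124, 128, 133, 138, 120]

r_p, p_p = pearsonr(weight, sbp)
r_s, p_s = spearmanr(weight, sbp)
print(f"Pearson r = {r_p:.2f} (p = {p_p:.3f})")     # dragged down by the outlier
print(f"Spearman rho = {r_s:.2f} (p = {p_s:.3f})")  # based on ranks, much less affected
```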
53
How to present correlation?
54
Summary
Your data determine which statistical test you need.
Think about your specific hypothesis at the start of the study; clearly define your hypotheses.
Determine how to collect data and which data to collect.
Parametric tests are typically more powerful than non-parametric tests.
This lecture, I will focus on case-control studies.