Published by Kerry Allison. Modified over 6 years ago.
1
SMME I 2017/2018 Final exam
Georgi Iskrov, MBA, MPH, PhD
Department of Social Medicine
2
Layout
SMME I Final Exam:
Entry test in Bioethics
Entry test in Biostatistics
______________________
1 case for statistical analysis and interpretation
1 bioethical case for comment and discussion
1 theory question from the bioethics questionnaire
Oral discussion
Time slots shown on the slide: 30 min and 40 min
3
Resources
4
Do not forget
List of formulas; Calculator; Student book.
5
Population vs Sample
Population: parameters μ, σ, σ²
Sample: statistics x̄, s, s²
6
Descriptive vs Inferential statistics
Population (parameters) → Sample (statistics): sampling, from population to sample.
Sample (statistics) → Population (parameters): inferential statistics, from sample to population.
7
Sampling
Stages of sampling: Defining the target population; Determining the sample size; Selecting a sampling method.
Properties of a good sample: Random selection; Representativeness by structure; Representativeness by number of cases.
8
Sample size calculation
Generally, the sample size for any study depends on: Acceptable level of confidence; Expected effect size and absolute error of precision; Underlying scatter in the population; Power of the study.
High power: large sample size, large effect, little scatter.
Low power: small sample size, small effect, lots of scatter.
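How these four factors interact can be sketched with a standard normal-approximation formula for comparing two group means. The formula, the function name `n_per_group` and all the numbers below are assumptions for illustration; they are not taken from the slides.

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * ((z_alpha/2 + z_beta) * sd / effect) ** 2.
    effect is the smallest difference in means worth detecting (hypothetical value).
    sd is the assumed common standard deviation (the "scatter").
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # depends on the confidence level
    z_beta = NormalDist().inv_cdf(power)           # depends on the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Same scatter, two different effect sizes: the smaller effect needs more subjects.
n_small_effect = n_per_group(effect=5, sd=10)
n_large_effect = n_per_group(effect=10, sd=10)
print(n_small_effect, n_large_effect)
```

Halving the effect size roughly quadruples the required sample, which is the "small effect, lots of scatter" end of the slide.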
9
Levels of measurement
10
Graphical summaries
One qualitative variable. Graph: bar chart, pie chart. Statistics: frequency table, relative frequency table, proportion.
Two qualitative variables. Graph: side-by-side bar chart, segmented bar chart. Statistics: two-way table, difference in proportions.
One quantitative variable. Graph: dotplot, histogram, boxplot. Statistics: measures of central tendency, measures of spread; other: five number summary, percentiles, distribution shape.
One quantitative by one qualitative variable. Graph: side-by-side boxplots, stacked dotplots. Statistics: statistics broken down by group, difference in means.
Two quantitative variables. Graph: scatterplot. Statistics: correlation.
11
Central tendency and spread
Central tendency: Mean, mode and median.
Spread: Range, interquartile range, standard deviation.
Mistakes: Focusing only on the mean and ignoring the variability; Confusing the standard deviation with the standard error of the mean; Confusing variation with variance.
What is best to use in different scenarios? Symmetrical data: mean and standard deviation. Skewed data: median and interquartile range.
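A minimal sketch of why the median and interquartile range suit skewed data, using Python's standard `statistics` module. The data values and the helper name `summarize` are hypothetical.

```python
import statistics as st

def summarize(values):
    """Return the measures of central tendency and spread for a list of numbers."""
    q1, _, q3 = st.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "sd": st.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }

# Hypothetical right-skewed data: one extreme value pulls the mean upward.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
summary = summarize(skewed)
print(summary)
```

Here the mean is more than twice the median because of a single large value, while the median stays at the centre of the bulk of the data.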
12
Rule of 3-sigma
When data are approximately normally distributed:
approximately 68% of the data lie within one SD of the mean;
approximately 95% of the data lie within two SDs of the mean;
approximately 99.7% of the data lie within three SDs of the mean.
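The coverage within one, two and three SDs can be checked directly from the standard normal distribution. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```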
13
Normal (Gaussian) distribution
Central limit theorem:
Create a population with a known distribution that is not normal;
Randomly select many samples of equal size from that population;
Tabulate the means of these samples and graph the frequency distribution.
The central limit theorem states that if your samples are large enough, the distribution of the means will approximate a normal distribution even if the population is not Gaussian.
Mistakes: Treating "normal" as meaning common (or disease free); Assuming biological distributions are exactly normal, when few are.
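The central limit theorem experiment described above can be simulated in a few lines. This is only a sketch: the exponential population, the sample size of 50 and the 2000 repetitions are arbitrary assumptions chosen for illustration.

```python
import random
import statistics as st

random.seed(42)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50   # observations per sample (assumed)
N_SAMPLES = 2000   # number of repeated samples (assumed)

# Step 1: a non-normal population, here exponential with mean 1 and SD 1.
# Step 2: draw many samples of equal size and record each sample mean.
sample_means = [
    st.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

# Step 3: summarize the distribution of the sample means.
mean_of_means = st.mean(sample_means)
sd_of_means = st.stdev(sample_means)
print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
```

The means cluster around the population mean of 1, and their SD is close to 1/√50, the standard error of the mean. A histogram of `sample_means` would look roughly bell-shaped even though the population is strongly skewed.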
14
Outliers
Values that lie very far away from the other values in the data set.
15
Confidence interval for the population mean
Population mean: point estimate vs interval estimate.
Standard error of the mean – how close the sample mean is likely to be to the population mean.
Assumptions: a random representative sample, independent observations, the population is normally distributed (at least approximately).
Confidence interval depends on: sample mean, standard deviation, sample size, degree of confidence.
Mistakes (both statements are wrong): "95% of the values lie within the 95% CI"; "A 95% CI covers the mean ± 2 SD". The interval is built from the standard error of the mean, not from the SD of the individual values.
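A minimal sketch of an interval estimate for the mean, assuming a sample large enough for the normal approximation (for small samples a t quantile would replace the z quantile used here). The data are simulated and the function name `mean_ci` is hypothetical.

```python
import math
import random
import statistics as st
from statistics import NormalDist

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    sem = st.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    centre = st.mean(values)
    return centre - z * sem, centre + z * sem

random.seed(1)
# Simulated measurements (assumed values, not real data).
values = [random.gauss(120, 15) for _ in range(100)]

m = st.mean(values)
sd = st.stdev(values)
low, high = mean_ci(values)
print(f"mean = {m:.1f}, SD = {sd:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

The interval is much narrower than mean ± 2 SD because it uses the standard error (SD divided by √n), which is why it describes the mean rather than the spread of individual values.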
16
Hypothesis testing
The general idea of hypothesis testing involves:
Making an initial assumption;
Collecting evidence (data);
Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.
Every hypothesis test, regardless of the population parameter involved, requires the above three steps.
17
Hypothesis testing
Null hypothesis is true. Decision to reject: Type I error. Decision not to reject: no error.
Null hypothesis is false. Decision to reject: no error. Decision not to reject: Type II error.
18
Level of significance
Level of significance (α) – the threshold for declaring whether a result is significant. If the null hypothesis is true, α is the probability of rejecting the null hypothesis. α is decided as part of the research design, while the P-value is computed from data. α = 0.05 is most commonly used. A smaller α reduces the chance of a Type I error but increases the chance of a Type II error. The trade-off depends on the consequences of Type I (false-positive) and Type II (false-negative) errors.
19
Power
Power – the probability of rejecting a false null hypothesis. Statistical power is inversely related to β, the probability of making a Type II error (power is equal to 1 – β). Power depends on the sample size, variability, significance level and hypothetical effect size. You need a larger sample when you are looking for a small effect and when the standard deviation is large.
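The dependence of power on sample size, variability, significance level and effect size can be sketched for the simplest case, a two-sided one-sample z-test with known standard deviation. That choice of test, the function name `z_test_power` and the numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sd.

    effect is the true difference between the population mean and the null value.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd  # true effect expressed in standard errors
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

power_n32 = z_test_power(effect=5, sd=10, n=32)
power_n10 = z_test_power(effect=5, sd=10, n=10)
power_null = z_test_power(effect=0, sd=10, n=32)  # no true effect
print(f"{power_n32:.3f} {power_n10:.3f} {power_null:.3f}")
```

With no true effect the rejection probability equals α, which ties power back to the significance level.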
20
Choosing a statistical test
Choice of a statistical test depends on:
Level of measurement for the dependent and independent variables;
Number of groups or dependent measures;
Number of units of observation;
Type of distribution;
The population parameter of interest (mean, variance, differences between means and/or variances).
22
Parametric and non-parametric tests
Parametric test – assumes that the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings.
Non-parametric test – distribution free; makes no assumption about the distribution of the variable in the population.
23
Parametric and non-parametric tests
Type of test and scale: Non-parametric (Nominal); Non-parametric (Ordinal); Parametric (Ordinal, Interval, Ratio).
1 group. Nominal: χ² goodness of fit test. Ordinal: Wilcoxon signed rank test. Parametric: 1-sample t-test.
2 unrelated groups. Nominal: χ² test. Ordinal: Mann–Whitney U test. Parametric: 2-sample t-test.
2 related groups. Nominal: McNemar test. Parametric: Paired t-test.
K unrelated groups. Ordinal: Kruskal–Wallis H test. Parametric: ANOVA.
K related groups. Ordinal: Friedman matched samples test. Parametric: ANOVA with repeated measurements.
24
Normality test
Normality tests are used to determine whether a data set is well modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. In descriptive statistics terms, a normality test measures the goodness of fit of a normal model to the data: if the fit is poor, the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable. In frequentist hypothesis testing, the data are tested against the null hypothesis that they are normally distributed.
25
Chi-square test limitations
No expected frequency should be less than 1.
No more than 1/5 of the expected frequencies should be less than 5.
To correct for this, collect larger samples or combine the categories with small expected frequencies until their combined value is 5 or more.
Yates correction: when there is only 1 degree of freedom, the regular chi-square test should not be used. Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the corrected values.
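The Yates procedure described above can be sketched for a 2 x 2 table in plain Python. The counts are hypothetical and the function name `chi_square_2x2` is an assumption; the P-value line uses the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square statistic and P-value (1 degree of freedom) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # subtract 0.5 from each |O - E| term
            statistic += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))  # valid for 1 degree of freedom only
    return statistic, p_value

observed = [[10, 20], [30, 40]]  # hypothetical counts
stat_plain, p_plain = chi_square_2x2(observed, yates=False)
stat_yates, p_yates = chi_square_2x2(observed, yates=True)
print(f"uncorrected: chi2 = {stat_plain:.4f}, P = {p_plain:.4f}")
print(f"Yates:       chi2 = {stat_yates:.4f}, P = {p_yates:.4f}")
```

The corrected statistic is always the smaller of the two, so the Yates version gives the more conservative P-value.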
26
Association is not causation.
Beware! The observed association between two variables might be due to the action of a third, unobserved variable.
27
Fisher exact test
This test is only available for 2 x 2 tables.
For small n, the probability can be computed exactly by counting all possible tables that can be constructed based on the marginal frequencies. Thus, the Fisher exact test computes the exact probability under the null hypothesis of obtaining the current distribution of frequencies across cells, or one that is more uneven.
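The counting argument above can be sketched with the hypergeometric distribution. This is a minimal illustration, assuming the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the counts and the function name `fisher_exact_2x2` are hypothetical.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact P-value for a 2 x 2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Probability of a table whose top-left cell equals k, given the margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over every possible table that is as uneven as the observed one or more so.
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )

observed = [[3, 1], [1, 3]]  # hypothetical counts
p_value = fisher_exact_2x2(observed)
print(f"P = {p_value:.4f}")
```

With these margins only five tables are possible, so the exact probability is a sum of a handful of terms rather than an approximation.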