Published by Abigail Hortense Freeman. Modified over 9 years ago.
1
Statistical Inference: Which Statistical Test To Use?
Pınar Ay, MD, MPH
Marmara University School of Medicine, Department of Public Health
npay@marmara.edu.tr
2
Learning Objectives
At the end of the session the participants will be able to:
define bias and random error
explain how statistical inference is determined
list the steps of hypothesis testing
differentiate parametric and non-parametric tests
choose the appropriate statistical test
3
Presidential Elections in USA, 1936
There was an economic crisis with nearly 9 million unemployed people. The candidates for the election were Franklin Delano Roosevelt and Alfred Mossman Landon.
4
The Literary Digest conducted one of the largest polls ever. Approximately 2 300 000 prospective voters filled in the questionnaires.
Findings of the poll: Roosevelt 43%, Landon 57%
5
The Literary Digest did not predict the winner, BUT George Gallup was able to predict a victory for Roosevelt using a much smaller sample of about 50 000 people.
Actual results: Roosevelt 62%, Landon 38%
6
Selection bias
The Literary Digest chose the prospective voters from the magazine's subscription list, automobile registration lists, phone lists, and club membership lists. In 1936 these lists over-represented wealthier households, so the sample did not represent all voters.
BIAS: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate
7
What is the proportion of red colored candies in the jar?
Take a sample of 25 candies:
1st sample: 40%
2nd sample: 60%
3rd sample: 20%
8
What if we took a 10 times larger sample with 250 candies from the same jar?
Take a sample of 250 candies:
1st sample: 38%
2nd sample: 40%
3rd sample: 43%
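The candy-jar comparison can be tried out in a small simulation. The sketch below is only an illustration: it assumes the true share of red candies is 40% and that the jar is large enough for each draw to be treated as independent; neither assumption comes from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_PROPORTION = 0.40  # assumed share of red candies in the jar


def sample_proportion(n):
    """Draw n candies and return the observed share of red ones."""
    reds = sum(random.random() < TRUE_PROPORTION for _ in range(n))
    return reds / n


def spread_of_estimates(n, repetitions=2000):
    """Standard deviation of the sample proportion over many repeated samples."""
    estimates = [sample_proportion(n) for _ in range(repetitions)]
    return statistics.stdev(estimates)


spread_25 = spread_of_estimates(25)
spread_250 = spread_of_estimates(250)

print(f"n = 25:  estimates vary with an SD of about {spread_25:.3f}")
print(f"n = 250: estimates vary with an SD of about {spread_250:.3f}")
```

The estimates from samples of 250 cluster much more tightly around the true value than those from samples of 25, which is the pattern the two slides show.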
9
Errors in Epidemiology:
1. Bias: any trend in the collection, analysis, interpretation, publication or review of data that can lead to conclusions that are systematically different from the truth
2. Random error: the variation in a sample that can be expected to occur by chance
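The difference between the two kinds of error can be sketched with a simulated poll. All numbers below are hypothetical and chosen only to mimic the 1936 story: voters on the sampling lists are assumed to support Roosevelt at 43%, everyone else at 70%, and 30% of voters are assumed to be on the lists.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical values, not taken from the slides or from historical data.
SUPPORT_LISTED = 0.43    # Roosevelt support among voters on the lists
SUPPORT_UNLISTED = 0.70  # Roosevelt support among all other voters
SHARE_LISTED = 0.30      # share of voters who appear on the lists

true_support = (SHARE_LISTED * SUPPORT_LISTED
                + (1 - SHARE_LISTED) * SUPPORT_UNLISTED)


def vote(listed):
    """Return True if a simulated voter supports Roosevelt."""
    p = SUPPORT_LISTED if listed else SUPPORT_UNLISTED
    return random.random() < p


def biased_poll(n):
    """Sample only listed voters (selection bias)."""
    return sum(vote(True) for _ in range(n)) / n


def random_poll(n):
    """Sample from all voters at random."""
    return sum(vote(random.random() < SHARE_LISTED) for _ in range(n)) / n


biased_estimate = biased_poll(100_000)  # huge sample, wrong sampling frame
random_estimate = random_poll(2_000)    # small sample, random selection

print(f"true support:         {true_support:.3f}")
print(f"biased poll (100000): {biased_estimate:.3f}")
print(f"random poll (2000):   {random_estimate:.3f}")
```

Under these assumptions the large biased poll stays far from the truth however many people it includes, while the small random poll lands close to it: a larger sample reduces random error but does not remove bias.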
10
Estimating Random Error
The sample of 25 candies:
1st sample: 40% (95% CI: 21%-61%)
2nd sample: 60% (95% CI: 29%-79%)
3rd sample: 20% (95% CI: 7%-41%)
The sample of 250 candies provides better precision:
40% (95% CI: 34%-46%)
40% (99% CI: 32%-48%)
11
Estimating random error
We need to indicate the variability the estimate would have in other samples.
Confidence Intervals (CI): CIs define an upper and a lower limit with an associated probability (for example 95%). The ends of the CI are called confidence limits.
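As a rough illustration of how such limits can be computed, the sketch below uses the normal-approximation (Wald) interval for a proportion. The choice of method is an assumption; the slides do not say which one was used, and the approximation is less reliable for small samples such as n = 25.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# The sample of 250 candies with 40% red.
ci95 = wald_interval(0.40, 250, confidence=0.95)
ci99 = wald_interval(0.40, 250, confidence=0.99)

print(f"95% CI: {ci95[0]:.0%} to {ci95[1]:.0%}")
print(f"99% CI: {ci99[0]:.0%} to {ci99[1]:.0%}")
```

For the 250-candy sample this gives about 34% to 46% at the 95% level and about 32% to 48% at the 99% level. A higher confidence level widens the interval, and a larger sample narrows it.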
12
Statistical Inference
There are two approaches for statistical inference:
1. Estimating parameters
2. Testing hypotheses
Both make certain assumptions about the population and then use probabilities to estimate the likelihood of the results obtained in the sample. Both assume that a random sample has been properly selected.
13
Hypothesis Testing
Steps:
1. State the hypothesis
2. Decide on the appropriate statistical test and select the level of significance
3. Perform the test and draw a conclusion
14
Hypothesis testing
1. State the hypothesis
H0: null hypothesis
H1: alternative hypothesis
If H0 is rejected, then H1 is concluded. If the evidence is insufficient to reject H0, it is retained but not accepted per se.
15
Hypothesis testing
2. Decide on the appropriate statistical test and select the level of significance
The level of significance, when chosen before the statistical test is performed, is called the alpha level.
Alpha level: the probability of incorrectly rejecting the null hypothesis when it is actually true (commonly 0.05, 0.01 or 0.001).
16
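Hypothesis testing
Conclusion from the hypothesis test compared with the true situation:
Test concludes a difference exists, and a difference truly exists (H1 true): correct conclusion; its probability is the power, 1 - β
Test concludes a difference exists, but there is truly no difference (H0 true): α error, type 1 error
Test concludes no difference, but a difference truly exists (H1 true): β error, type 2 error
Test concludes no difference, and there is truly no difference (H0 true): correct conclusion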
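The α error and the power can be made concrete with a simulation. The sketch below is an illustration under stated assumptions, none of which come from the slides: a two-sided z test on the mean of 30 normally distributed values with a known standard deviation of 10, a null mean of 50, and a true mean of 55 when a difference exists.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(3)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
SIGMA = 10.0       # assumed known standard deviation
N = 30             # assumed sample size per study
NULL_MEAN = 50.0   # value stated by H0
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejects_null(true_mean):
    """Run one simulated study and report whether H0 is rejected."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (mean(sample) - NULL_MEAN) / (SIGMA / sqrt(N))
    return abs(z) > Z_CRIT


def rejection_rate(true_mean, repetitions=2000):
    """Share of simulated studies in which H0 is rejected."""
    return sum(rejects_null(true_mean) for _ in range(repetitions)) / repetitions


type1_rate = rejection_rate(50.0)  # H0 is true: rejections are alpha errors
power = rejection_rate(55.0)       # H1 is true: rejections are correct
type2_rate = 1 - power             # H1 is true: failures to reject are beta errors

print(f"alpha error rate: {type1_rate:.3f}")
print(f"power (1 - beta): {power:.3f}")
print(f"beta error rate:  {type2_rate:.3f}")
```

When H0 is true the test rejects it in roughly 5% of the simulated studies, which is the chosen alpha level. When the assumed difference exists, the share of rejections estimates the power.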
17
Hypothesis testing
3. Perform the test and draw a conclusion
p value: the probability of obtaining a result as extreme as (or more extreme than) the one observed, if H0 is true.
The p value is calculated after the statistical test is performed. If the p value is less than alpha, H0 is rejected.
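The three steps can be walked through on a small example. The sketch below assumes a hypothetical null hypothesis that half of the candies in the jar are red (H0: p = 0.5) and an observed sample of 100 red candies out of 250; it uses a two-sided z test for a proportion, which is one possible choice of test rather than one named in the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(successes, n, p_null):
    """Two-sided p value of a z test for one proportion."""
    p_hat = successes / n
    se = sqrt(p_null * (1 - p_null) / n)  # standard error if H0 is true
    z = (p_hat - p_null) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Step 1: H0: p = 0.5 (hypothetical), H1: p != 0.5
# Step 2: z test for a proportion; alpha is fixed before looking at the data
# Step 3: compute the p value and compare it with alpha
p_value = two_sided_p_value(successes=100, n=250, p_null=0.5)
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01, 0.001)}

print(f"p value: {p_value:.4f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject H0' if reject else 'retain H0'}")
```

The same p value leads to rejecting H0 at the 0.05 and 0.01 levels but retaining it at the 0.001 level, which is why the alpha level has to be selected before the test is performed.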
18
Which statistical test to use?
Evaluate the following:
Are the variables qualitative or quantitative?
Are the groups dependent or independent?
How many groups are there?
Are the data normally distributed?
Are the variances homogeneous?
19
Measuring variables
Categorical (nominal): has two or more categories, but there is no intrinsic ordering of the categories (gender, blood type).
Ordinal: similar to categorical, but there is a clear ordering of the categories (SES, satisfaction scales).
Interval: similar to ordinal, except that the intervals between the values are equally spaced.
Continuous: numeric values that can be ordered sequentially and that do not naturally fall into discrete ranges (weight).
20
Dependent and Independent Groups
Independent groups: The researcher chooses two groups: participants who engage in regular physical activity and participants who are sedentary. The two groups are compared for their HDL levels.
Dependent (paired) groups: The researcher chooses sedentary participants and determines their HDL level. Then the participants are asked to engage in regular physical activity and their HDL levels are determined again.
21
Normal and skewed distributions
22
Parametric vs Non-parametric tests
Parametric tests:
Can be used when data are approximately normally distributed and variances are homogeneous
More powerful
Easy to do, easy to interpret
Non-parametric tests:
Do not assume a particular distribution (such as normality) for the data
Less powerful
Harder to do, harder to interpret
If sample sizes are as small as n=6, non-parametric tests need to be used.
23
Independent groups t test (Student's t test)
It is used to compare the means of two independent samples.
The researcher chooses two groups: participants who engage in regular physical activity and participants who are sedentary. The two groups are compared for their HDL levels.
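A minimal sketch of the calculation behind this test follows. The HDL values are invented for illustration, and the pooled-variance form is used, which assumes homogeneous variances.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical HDL levels (mg/dL); not data from the presentation.
active = [55, 60, 58, 62, 65]
sedentary = [48, 52, 50, 47, 53]

t_stat, df = independent_t(active, sedentary)
print(f"t = {t_stat:.2f}, df = {df}")

# For df = 8 a standard t table gives a two-sided 5% critical value of about 2.306.
print("reject H0 at alpha = 0.05" if abs(t_stat) > 2.306 else "retain H0")
```

In practice a statistics package would also report the exact p value; the comparison with a tabulated critical value is used here only to keep the sketch dependency-free.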
24
Paired groups t test
It is used to compare the means of two dependent groups.
The researcher chooses sedentary participants and determines their HDL level. Then the participants are asked to engage in regular physical activity and their HDL levels are determined again.
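For paired data the test works on the within-person differences. The sketch below shows the calculation with invented before and after HDL values for five participants.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # change per participant
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical HDL levels (mg/dL) of the same five participants.
hdl_sedentary = [45, 50, 48, 52, 47]  # first measurement
hdl_active = [50, 54, 51, 58, 52]     # after regular physical activity

t_stat, df = paired_t(hdl_sedentary, hdl_active)
print(f"t = {t_stat:.2f}, df = {df}")

# For df = 4 a standard t table gives a two-sided 5% critical value of about 2.776.
print("reject H0 at alpha = 0.05" if abs(t_stat) > 2.776 else "retain H0")
```

Because each participant serves as their own control, variation between people drops out of the comparison and only the variation of the differences matters.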
25
ANOVA – Analysis of Variance
It is used to compare the means of more than two independent samples.
The researcher chooses three groups: participants who engage in vigorous physical activity, participants who engage in moderate physical activity, and participants who are sedentary. The three groups are compared for their HDL levels.
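The F statistic of a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes it by hand for three small groups of invented HDL values.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical HDL levels (mg/dL); not data from the presentation.
vigorous = [62, 65, 60, 63]
moderate = [55, 58, 54, 57]
sedentary = [48, 50, 47, 51]

f_stat, df_between, df_within = one_way_anova_f([vigorous, moderate, sedentary])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value indicates that at least one group mean differs from the others; it does not say which groups differ.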
26
ANOVA – post hoc tests
Post hoc tests are designed for situations in which the researcher has already obtained a significant F-test.
The differences among the means then need to be explored to find out which two groups are different.
27
ANOVA – post hoc tests
Tukey's HSD test
LSD test
Scheffe's test
28
ANOVA – post hoc tests
Bonferroni correction
We need to adjust the alpha to account for the inflated error when several post hoc tests are conducted. Divide the alpha by the number of tests to get the new alpha level.
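As a small illustration of the correction, the sketch below counts the pairwise comparisons among the three activity groups and divides alpha by that number. The p values are invented for the example.

```python
from itertools import combinations

groups = ["vigorous", "moderate", "sedentary"]
pairs = list(combinations(groups, 2))  # every pairwise comparison

alpha = 0.05
adjusted_alpha = alpha / len(pairs)  # Bonferroni: alpha divided by number of tests

# Hypothetical p values for the three comparisons, in the order of `pairs`.
p_values = dict(zip(pairs, [0.030, 0.001, 0.012]))

significant = [pair for pair, p in p_values.items() if p < adjusted_alpha]

print(f"{len(pairs)} comparisons, adjusted alpha = {adjusted_alpha:.4f}")
for pair, p in p_values.items():
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{pair[0]} vs {pair[1]}: p = {p} ({verdict})")
```

With three comparisons the new alpha level is about 0.0167, so the invented p value of 0.030 would pass the unadjusted 0.05 threshold but not the corrected one.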
29
Repeated Measures ANOVA
It is used to compare the means of more than two dependent groups.
The researcher chooses sedentary participants and determines their HDL level. Then the participants are asked to engage in mild physical activity and their HDL levels are determined again. Lastly the participants are asked to engage in vigorous physical activity and their HDL levels are determined again.
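The choices described above for comparing means can be summarised in a small helper. This is a sketch, not a complete decision rule: the function name and arguments are hypothetical, and the non-parametric tests it returns are the standard counterparts, which are not named in the slides above.

```python
def choose_test(n_groups, paired, parametric):
    """Suggest a test for comparing a quantitative outcome between groups.

    n_groups:   number of groups (or repeated measurements) being compared
    paired:     True if the groups are dependent (same participants)
    parametric: True if the data are approximately normal with homogeneous variances
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")

    if n_groups == 2:
        if parametric:
            return "paired groups t test" if paired else "independent groups t test"
        # Standard non-parametric counterparts (not listed in the slides above).
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"

    if parametric:
        return "repeated measures ANOVA" if paired else "ANOVA"
    # Standard non-parametric counterparts (not listed in the slides above).
    return "Friedman test" if paired else "Kruskal-Wallis test"


# Three independent activity groups, normally distributed HDL levels.
print(choose_test(n_groups=3, paired=False, parametric=True))
# The same participants measured twice, normally distributed HDL levels.
print(choose_test(n_groups=2, paired=True, parametric=True))
```

The helper covers only the questions about the number of groups, dependence, and distribution; the type of variable (qualitative or quantitative) still has to be checked first.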