Faculty of Medicine, Epidemiology and Biostatistics
Lecture 7-9: Inferential Statistics and Hypothesis Testing
By Hatim Jaber MD MPH JBCM PhD, 18-21-6-2017
Presentation outline (18-21-6-2017)
- Introduction of concepts: 10:00 to 10:30
- Errors and confidence intervals: 10:30 to 10:40
- Hypothesis testing: 10:40 to 10:50
- Selection of tests of significance: 10:50 to 11:00
- Main steps of inferential statistics: 11:00 to 11:25
Key Lecture Concepts
- Assess the role of random error (chance) as an influence on the validity of a statistical association
- Identify the role of the p-value in statistical assessments
- Identify the role of the confidence interval in statistical assessments
- Briefly introduce the tests to be undertaken
Research Process
1. Research question
2. Hypothesis
3. Identify research design
4. Data collection
5. Presentation of data
6. Data analysis
7. Interpretation of data
Interpreting Results
When evaluating an association between disease and exposure, we need guidelines to help determine whether there is a true difference in the frequency of disease between the two exposure groups, or merely random variation in the study sample.
Random Error (Chance)
1. Rarely can we study an entire population, so inference is attempted from a sample of the population.
2. There will always be random variation from sample to sample.
3. In general, smaller samples have less precision, reliability, and statistical power (more sampling variability).
Hypothesis Testing
The process of deciding statistically whether the findings of an investigation reflect chance or real effects at a given level of probability.
Hypothesis Testing
- An objective method of making decisions or inferences from sample data (evidence).
- Sample data are used to choose between two hypotheses, i.e. statements about a population.
- We typically do this by comparing what we observed to what we would expect if one of the statements (the null hypothesis) were true.
Elements of Hypothesis Testing
1. Null hypothesis
2. Alternative (research) hypothesis
3. Identify the level of significance
4. Test statistic
5. Identify the p-value / confidence interval
6. Conclusion
Hypothesis Testing
H0: There is no association between the exposure and disease of interest.
H1: There is an association between the exposure and disease of interest.
Note: With prudent skepticism, the null hypothesis is given the benefit of the doubt until the data convince us otherwise.
Hypothesis Testing
There are always two hypotheses:
- HA: Research (alternative) hypothesis: what we aim to gather evidence of, typically that there is a difference/effect/relationship.
- H0: Null hypothesis: what we assume is true to begin with, typically that there is no difference/effect/relationship.
Hypothesis Testing: Basic Concepts
- Alternative hypothesis: the statement that we believe is true, designated HA or H1. Example: there is a difference between the two sections' mean scores on the 1st Biostat exam.
- Statistical hypothesis: the mathematical form of the alternative hypothesis. Example: H1: μ1 ≠ μ2.
- Null hypothesis: the opposite of the alternative hypothesis (H0), and the one to be tested statistically. Example: H0: μ1 = μ2.
Hypothesis Testing
Because of statistical uncertainty regarding inferences about population parameters based upon sample data, we cannot prove or disprove either the null or the alternative hypothesis as directly representing the population effect. Thus, we make a decision based on probability and accept a probability of making an incorrect decision.
Associations
Two types of error can occur that affect the assessment of an association between exposure and disease:
- Type I error: observing a difference when in truth there is none.
- Type II error: failing to observe a difference when there is one.
Interpreting Epidemiologic Results
Four possible outcomes of any epidemiologic study:

                                      REALITY
YOUR DECISION                         H0 true (no assoc.)     H1 true (assoc.)
Do not reject H0 (not stat. sig.)     Correct decision        Type II (beta) error
Reject H0 (stat. sig.)                Type I (alpha) error    Correct decision

Type I error: finding a difference when there is none.
Type II error: failing to find a difference when one exists.
Type I and Type II Errors
α is the probability of committing a Type I error.
β is the probability of committing a Type II error.
Common Values
Alpha has three commonly used values: 0.05, 0.01, and 0.10.
Beta has two commonly used values: 0.10 and 0.20.
The corresponding confidence levels are 0.95, 0.99, and 0.90.
"Conventional" Guidelines
• Set the fixed alpha level (Type I error) at 0.05. This means that if the null hypothesis is true, the probability of incorrectly rejecting it is 5% or less.
Types of Errors
- If H0 is true (the difference does NOT exist in the population) and the study reports a difference (rejects H0): Type I error. The risk is typically restricted to 5%, the level of significance.
- If HA is true (the difference DOES exist in the population) and the study reports no difference (does not reject H0): Type II error (β = 1 - power), controlled via sample size.
- If HA is true and the study reports a difference: correct decision; the probability of this is the power of the test.
Empirical Rule
For a normal distribution, approximately:
a) 68% of the measurements fall within one standard deviation around the mean
b) 95% of the measurements fall within two standard deviations around the mean
c) 99.7% of the measurements fall within three standard deviations around the mean
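These percentages can be verified from the standard normal distribution. A minimal sketch, not part of the original lecture, assuming Python with scipy available:

```python
# Verify the empirical rule (68-95-99.7) from the standard normal distribution.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)   # area within k SDs of the mean
    print(f"within {k} SD: {coverage:.1%}")
# within 1 SD: 68.3%, within 2 SD: 95.4%, within 3 SD: 99.7%
```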
Normal Distribution
[Figure: standard normal curve with areas 34.13% within each 1 SD of the mean, 13.59% between 1 and 2 SD, and 2.28% in each tail beyond 2 SD; α usually set at 5%]
Random Error (Chance)
4. A test statistic is computed to assess "statistical significance": the degree to which the data are compatible with the null hypothesis of no association. Given a test statistic and an observed value, you can compute the probability of observing a value as extreme as, or more extreme than, the observed value under the null hypothesis of no association. This probability is called the "p-value".
Random Error (Chance)
By convention, if p < 0.05, the association between the exposure and disease is considered "statistically significant" (i.e., we reject the null hypothesis H0 and accept the alternative hypothesis H1).
Random Error (Chance)
The p-value is:
- the probability that an effect at least as extreme as that observed could have occurred by chance alone, given there is truly no relationship between exposure and disease (H0);
- the probability that the observed results occurred by chance, i.e., that the sample estimates of association differ only because of sampling variability.
Random Error (Chance)
What does p < 0.05 mean?
- Indirectly, it means we suspect that the magnitude of the observed effect (e.g., an odds ratio) is not due to chance alone (in the absence of biased data collection or analysis).
- Directly, p = 0.05 means that one test result out of twenty would be expected to occur by chance (random error) alone.
Example:

        D+    D-
E+      15    85
E-      10    90

Incidence in the exposed: IE+ = 15 / (15 + 85) = 0.15
Incidence in the non-exposed: IE- = 10 / (10 + 90) = 0.10
RR = IE+ / IE- = 1.5, p = 0.30

Although it appears that the incidence of disease may be higher in the exposed than in the non-exposed (RR = 1.5), the p-value of 0.30 exceeds the fixed alpha level of 0.05. This means the observed data are relatively compatible with the null hypothesis. Thus, we do not reject H0 in favor of H1 (the alternative hypothesis).
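A minimal sketch reproducing this 2x2 example, not part of the original slides, assuming Python with scipy available:

```python
# Risk ratio and chi-square p-value for the slide's 2x2 table.
from scipy.stats import chi2_contingency

table = [[15, 85],   # exposed:     15 diseased, 85 not
         [10, 90]]   # non-exposed: 10 diseased, 90 not

ie_exposed = 15 / (15 + 85)      # 0.15
ie_unexposed = 10 / (10 + 90)    # 0.10
rr = ie_exposed / ie_unexposed   # 1.5

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"RR = {rr:.2f}, chi-square p = {p:.2f}")  # p comes out near the slide's 0.30
# Since p > 0.05, we do not reject H0.
```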
Random Error (Chance): Take Note
- The p-value reflects both the magnitude of the difference between the study groups AND the sample size.
- The size of the p-value does not indicate the importance of the results.
- Results may be statistically significant but clinically unimportant.
- Results that are not statistically significant may still be important.
Sometimes we are more concerned with estimating the size of the true difference than with the probability that the difference between samples is significant.
TRADEOFF!
There is a tradeoff between the alpha and beta errors; we cannot simply reduce both types of error. As one goes down, the other rises. As we lower the α error, the β error goes up: reducing the error of rejecting H0 when it is true (the error of rejection) increases the error of "accepting" H0 when it is false (the error of acceptance).
Random Error (Chance)
A related, but more informative, measure known as the confidence interval (CI) can also be calculated. A CI is a range of values within which the true population value falls, with a certain degree of assurance (probability).
Protection against Random Error
- Test statistics provide protection from Type I error due to random chance.
- Test statistics do not guarantee protection against Type I errors due to bias or confounding.
- Statistics demonstrate association, not causation.
Confidence Interval: Definition
- A range of values for a variable, constructed so that this range has a specified probability of including the true value of the variable.
- A measure of the study's precision.
[Diagram: lower limit, point estimate, upper limit]
Statistical Measures of Chance
Confidence interval: a 95% C.I. means that the sample estimate of effect (mean, risk, rate) lies within 2 standard errors of the true population value 95 times out of 100.
Interpreting Results
- Confidence interval: a range of values around a point estimate that has a specified probability of including the true value of the parameter.
- Confidence level: (1.0 − α), usually expressed as a percentage (e.g., 95%).
- Confidence limits: the upper and lower end points of the confidence interval.
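A minimal sketch of computing a 95% confidence interval for a mean, not part of the original slides, assuming Python with numpy/scipy available and using made-up sample data:

```python
# 95% CI for a mean, using the t distribution (appropriate for small samples).
import numpy as np
from scipy import stats

sample = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.4, 5.0])  # hypothetical measurements
mean = sample.mean()
se = stats.sem(sample)                      # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=se)
print(f"point estimate {mean:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```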
Hypothetical Example of a 95% Confidence Interval
- Exposure: caffeine intake (high versus low)
- Outcome: incidence of breast cancer
- Risk ratio (point estimate): 1.32
- p-value: not statistically significant
- 95% C.I.: 0.87 to 1.98
[Diagram: 95% confidence interval spanning the null value of 1.0]
Random Error (Chance): Interpretation
Our best estimate is that women with high caffeine intake are 1.32 times (or 32%) more likely to develop breast cancer than women with low caffeine intake. However, we are 95% confident that the true population value (risk) lies between 0.87 and 1.98 (assuming an unbiased study). Because this interval includes the null value of 1.0, the result is not statistically significant.
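A minimal sketch of the standard log-scale confidence interval for a risk ratio, not part of the original lecture; the helper function and counts below are hypothetical (they are not the caffeine study's data), assuming Python with scipy available:

```python
# Log-scale 95% CI for a risk ratio from a 2x2 table, with the null-value check.
import math
from scipy.stats import norm

def rr_ci(a, b, c, d, level=0.95):
    """2x2 table: a,b = exposed diseased/healthy; c,d = unexposed diseased/healthy."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # SE of ln(RR)
    z = norm.ppf(1 - (1 - level) / 2)                      # 1.96 for 95%
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

rr, lo, hi = rr_ci(33, 217, 25, 225)   # hypothetical counts giving RR = 1.32
sig = "significant" if (lo > 1 or hi < 1) else "not significant"
print(f"RR {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}) -> {sig}")  # interval spans 1.0
```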
Confidence Intervals
- A range of values within which we are confident (in terms of probability) that the true value of a population parameter lies.
- A 95% CI is interpreted as: 95% of the time, the CI would contain the true value of the population parameter; i.e., 5% of the time it would fail to contain the true value.
Random Error (Chance): Interpretation
If the 95% confidence interval does NOT include the null value of 1.0 (p < 0.05), then we declare a "statistically significant" association.
If the 95% confidence interval includes the null value of 1.0, then the test result is "not statistically significant."
Interpreting Results: C.I. for OR and RR
- The C.I. provides an idea of the likely magnitude of the effect and the random variability of the point estimate.
- The p-value, in contrast, reveals nothing about the magnitude of the effect or the random variability of the point estimate.
- In general, smaller sample sizes give larger C.I.s, due to uncertainty (lack of precision) in the point estimate.
Selection of Tests of Significance
Data Types
- Scale variables:
  - Continuous: measurements, can take any value
  - Discrete: counts / integers
- Categorical variables:
  - Ordinal: obvious order
  - Nominal: no meaningful order
Choosing Summary Statistics
Which average and measure of spread?
- Scale, normally distributed: mean (standard deviation)
- Scale, skewed: median (interquartile range)
- Ordinal: median (interquartile range)
- Nominal: mode (no meaningful measure of spread)
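A minimal sketch of why the choice matters, not part of the original slides, assuming Python with numpy available and using made-up right-skewed data:

```python
# Mean/SD vs median/IQR on skewed data: the outlier distorts the former.
import numpy as np

data = np.array([2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 3.1, 9.5])  # hypothetical, right-skewed

print(f"mean {data.mean():.2f} (SD {data.std(ddof=1):.2f})")   # pulled up by the outlier
q1, q3 = np.percentile(data, [25, 75])
print(f"median {np.median(data):.2f} (IQR {q3 - q1:.2f})")     # robust to the outlier
# Skewed data: report the median and IQR rather than the mean and SD.
```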
Scales of Data
1. Nominal: data do not represent an amount or quantity (e.g., marital status, sex).
2. Ordinal: data represent an ordered series of relationships (e.g., level of education).
3. Interval: data are measured on a scale with equal units but an arbitrary zero point (e.g., temperature in Fahrenheit).
4. Ratio: data, such as weight, for which one value can be compared meaningfully to another (e.g., 100 kg is twice 50 kg).
Steps to Undertaking a Hypothesis Test
1. Define the study question
2. Set the null and alternative hypotheses
3. Choose a suitable test
4. Calculate a test statistic
5. Calculate a p-value
6. Make a decision and interpret your conclusions
A worked example follows this list.
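A minimal sketch walking through these steps, not part of the original slides, assuming Python with scipy available; the exam scores are made up:

```python
# Steps 1-6 with an independent-samples t-test on two hypothetical class sections.
from scipy import stats

# Steps 1-2. Question: do the two sections differ in mean exam score?
#            H0: mu1 == mu2, H1: mu1 != mu2, alpha = 0.05
section1 = [72, 68, 75, 80, 66, 71, 74, 69]
section2 = [78, 82, 75, 85, 80, 77, 83, 79]

# Steps 3-5. A suitable test for two independent means is the t-test.
t_stat, p_value = stats.ttest_ind(section1, section2)

# Step 6. Decision: reject H0 if p < alpha.
if p_value < 0.05:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: reject H0")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: do not reject H0")
```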
Inferential Statistics: Testing for Differences
Inferential Statistics
- Assess the strength of evidence for/against a hypothesis; evaluate the data
- Inferential statistical methods provide confirmatory data analysis
- Generalize conclusions from data on part of a group (sample) to the whole group (population)
- Make comparisons
- Make predictions
- Ask more questions; suggest future research
Inferential Statistics
Inferential statistics report the degree of confidence or accuracy with which the sample statistic predicts the value of the population parameter. We do this by using procedures (parametric or non-parametric) to reach a conclusion about, or generalize to, the population based on information derived from a sample drawn from that population, or to test a hypothesis about the relationship between variables in the population. A relationship is a bond or association between variables. For us to be able to generalize, we must draw a random sample, in which every subject in the population has a known chance of being selected.
Which Test to Use?
- Nominal data: chi-square test
- Ordinal data: Mann-Whitney U test
- Interval (continuous) data, 2 groups: t-test
- Interval (continuous) data, 3 or more groups: ANOVA
A sketch encoding these rules follows.
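A minimal sketch, not part of the original slides; `choose_test` is a hypothetical helper, not a library function, that simply encodes the table above:

```python
# Encode the slide's test-selection rules as a small dispatch function.
def choose_test(scale: str, groups: int = 2) -> str:
    if scale == "nominal":
        return "chi-square test"
    if scale == "ordinal":
        return "Mann-Whitney U test"
    if scale == "interval":                    # continuous data
        return "t-test" if groups == 2 else "ANOVA"
    raise ValueError(f"unknown scale: {scale}")

print(choose_test("interval", groups=3))       # -> ANOVA
```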
Types of Inferential Statistics
1. Bivariate parametric tests
2. Nonparametric tests: nominal data (1 group or more)
3. Nonparametric tests: ordinal data (2 groups or more)
Plus: tests for continuous data and tests for correlations between variables.
Types of Inferential Statistics
1. Bivariate parametric tests:
(a) One-sample t-test (t)
(b) Two-sample t-test (t)
(c) Analysis of variance / ANOVA (F)
(d) Pearson's product-moment correlation (r): used when both variables are continuous (interval or ratio).
Types of Inferential Statistics
2. Nonparametric tests, nominal data (1 group or more):
(a) Chi-square goodness-of-fit test
(b) Chi-square test of independence
We choose between (a) and (b) according to whether the groups are dependent or independent: with 1 group we use a pre/post test; with 2 or more groups, use (a) for dependent groups and (b) for independent groups.
Types of Inferential Statistics
3. Nonparametric tests, ordinal data (2 groups or more):
(a) Mann-Whitney U test (U): for 2 groups
(b) Kruskal-Wallis test (H): for more than 2 groups
Types of Inferential Statistics
For continuous data:
a) 1 group: z-score or t-test; 2 means (pre and post): dependent (paired) t-test
b) 2 groups, 2 means: independent t-test
c) 2 or more groups: one-way ANOVA
Tests for correlations between variables:
- Both variables continuous: Pearson's correlation (as explained earlier)
- Both variables ordinal: Spearman test
- Dependent variable continuous, independent variable categorical/nominal: t-test or z-test
- Dependent variable continuous, independent variable ordinal: point-biserial correlation
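A minimal sketch contrasting the Pearson and Spearman choices above, not part of the original slides, assuming Python with scipy available and using made-up data:

```python
# Pearson (continuous-continuous) vs Spearman (ordinal) correlation.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]       # continuous predictor (hypothetical)
exam_rank     = [8, 6, 7, 5, 4, 2, 3, 1]       # ordinal outcome (1 = best)

r, p_r = stats.pearsonr(hours_studied, exam_rank)      # treats both as continuous
rho, p_s = stats.spearmanr(hours_studied, exam_rank)   # rank-based, suits ordinal data
print(f"Pearson r = {r:.2f} (p = {p_r:.3f}); Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```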
Inferential Statistics
Inferential statistics are used to draw conclusions about a population by examining a sample.
[Diagram: sample drawn from population]
Inferential Statistics
Researchers set the significance level for each statistical test they conduct. By using probability theory as the basis for their tests, researchers can assess how likely it is that the difference they find is real and not due to chance.
Degrees of Freedom
- Degrees of freedom (df) are the way in which the scientific tradition accounts for variation due to error; df specifies how many values are free to vary within a statistical test.
- Scientists recognize that collecting data can never be error-free: each piece of data collected can vary, or carry error, that we cannot account for.
- By including df in statistical computations, scientists help account for this error.
- There are clear rules for how to calculate df for each statistical test.
Steps in the Hypothesis Testing Procedure
1. Review assumptions
2. State hypotheses
3. Select the test statistic
4. State the decision rule
5. Calculate the test statistic
6. Make the statistical decision: do not reject H0 (conclude H0 may be true) or reject H0 (conclude H1 is true)
Inferential Statistics: 5 Steps
To determine whether sample means come from the same population, use five steps.
1. State the hypotheses
- H0: no difference between the two means; any difference found is due to sampling error.
- Under H0, any significant difference found is not a TRUE difference but CHANCE, due to sampling error.
- Results are stated in terms of the probability that H0 is false; findings are stronger if we can reject H0. Therefore, we need to specify both H0 and H1.
Steps in Inferential Statistics (continued)
2. Set the level of significance
- The probability threshold at which sample means are considered different enough to reject H0 (.05 or .01)
- Also called the level of probability or level of confidence
Steps in Inferential Statistics (continued)
3. Compute the calculated value: use a statistical test to derive a calculated value (e.g., a t value or F value).
4. Obtain the critical value: a criterion based on the df and alpha level (.05 or .01), compared against the calculated value to determine whether the findings are significant and H0 can therefore be rejected.
Steps in Inferential Statistics (continued)
5. Reject or fail to reject H0: the CALCULATED value is compared to the CRITICAL value to determine whether the difference is significant enough to reject H0 at the predetermined level of significance.
- If critical value > calculated value: fail to reject H0
- If critical value < calculated value: reject H0
- Rejecting H0 only supports H1; it does not prove H1
A sketch of this comparison follows.
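A minimal sketch of the critical-value comparison, not part of the original slides, assuming Python with scipy available and a made-up test statistic:

```python
# Critical-value decision rule for a two-tailed t-test.
from scipy import stats

alpha, df = 0.05, 20
t_calculated = 2.40                           # hypothetical test statistic
t_critical = stats.t.ppf(1 - alpha / 2, df)   # ~2.086 for df = 20

if abs(t_calculated) > t_critical:
    print(f"|t| = {t_calculated:.2f} > {t_critical:.3f}: reject H0")
else:
    print(f"|t| = {t_calculated:.2f} <= {t_critical:.3f}: fail to reject H0")
```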
6. Make a decision: reject H0 or do not reject H0. If the calculated p-value turns out to be less than the cut-off (α) value we determined, we reject the null hypothesis; if it is more than the cut-off value, we cannot reject the null hypothesis.
Testing Hypotheses
- If we reject H0 and conclude the groups are really different, it does not mean they are different for the reason hypothesized; there may be another reason.
- Since H0 testing is based on sample means, not population means, there is a possibility of making an error or wrong decision in rejecting or failing to reject H0: a Type I error or a Type II error.
Identifying the Appropriate Statistical Test of Difference
- One variable: one-way chi-square
- Two variables (1 IV with 2 levels; 1 DV): t-test
- Two variables (1 IV with 2+ levels; 1 DV): ANOVA
- Three or more variables: ANOVA
Dependent Variables
Examples: Does attendance have an association with exam score? Do women do more housework than men?
The INDEPENDENT (explanatory/predictor) variable affects the DEPENDENT (outcome) variable.
Comparing Means (ANOVA = analysis of variance)
- Comparing BETWEEN groups: 2 groups: independent t-test; 3 or more: one-way ANOVA
- Comparing measurements WITHIN the same subject: 2 measurements: paired t-test; 3 or more: repeated-measures ANOVA
Summarising Means
- Calculate summary statistics by group
- Look for outliers/errors
- Use a box plot or confidence-interval plot
Using SPSS
Analyse > Descriptive Statistics > Crosstabs. Click on the 'Statistics' button and select Chi-squared. The test statistic has an associated p-value; here p < 0.001. Note: double-clicking on the output will display the p-value to more decimal places.
T-tests: Paired or Independent (Unpaired) Data?
- T-tests are used to compare two population means.
- Paired data: the same individuals studied at two different times or under two conditions (PAIRED T-TEST).
- Independent data: data collected from two separate groups (INDEPENDENT SAMPLES T-TEST).
- Paired data also arise when subjects have been individually matched through experimental design, such as in a matched case-control study, twin studies, or an eye treatment where one eye serves as the control. Two separate (independent) groups require a different analysis.
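A minimal sketch of both situations, not part of the original slides, assuming Python with scipy available and using made-up measurements:

```python
# Paired vs independent t-tests on hypothetical data.
from scipy import stats

# Paired: the same subjects measured before and after treatment.
before = [150, 162, 148, 170, 155, 160]
after  = [143, 158, 145, 161, 150, 152]
t_p, p_p = stats.ttest_rel(before, after)

# Independent: two separate groups of subjects.
group_a = [150, 162, 148, 170, 155, 160]
group_b = [140, 151, 138, 162, 149, 147]
t_i, p_i = stats.ttest_ind(group_a, group_b)

print(f"paired: t = {t_p:.2f}, p = {p_p:.3f}; independent: t = {t_i:.2f}, p = {p_i:.3f}")
```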
Example 1: t-Test Results (SPSS paired-samples output)
Pair: Triglyceride level at week 8 (mg/dl) minus triglyceride level at baseline (mg/dl)
Std. Deviation = 80.360, Std. Error Mean = 13.583, 95% CI of the difference: upper limit 16.233; t = -.837, df = 34, Sig. (2-tailed) = .408
Null hypothesis is:
P-value =
Decision (circle correct answer): Reject Null / Do not reject Null
Conclusion:
Example 1: Solution
t = -.837, df = 34, Sig. (2-tailed) = .408.
As p > 0.05, do NOT reject the null: there is NO evidence of a difference in mean triglyceride before and after treatment.
(Two-tailed: P(t > 0.837) = 0.204 and P(t < -0.837) = 0.204, giving p = 0.408.)
Example 2: Weight Loss
Weight loss was measured after taking either a new weight-loss treatment or placebo for 8 weeks.

Treatment group   N    Mean    Std. Deviation
Placebo           19   -1.36   2.148
New drug          18   -5.01   2.722
Example 2: t-Test Results (ignore the shaded Levene's test part of the output for now!)

                              Levene's test    t-test for equality of means
                              F      Sig.      t      df      Sig. (2-tailed)  Mean Diff.  Std. Error  95% CI Lower  Upper
Equal variances assumed       2.328  .136      4.539  35      .000             3.648       .804        2.016         5.280
Equal variances not assumed                    4.510  32.342                   3.648       .809        2.001         5.295

Null hypothesis is:
P-value =
Decision (circle correct answer): Reject Null / Do not reject Null
Conclusion:
Two Categorical Variables
Example question: are boys more likely than girls to prefer maths and science?
- Variables: favourite subject (nominal), gender (binary/nominal)
- Summarise using percentages and stacked or multiple bar charts
- Test: chi-squared, which tests for a relationship between two categorical variables
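A minimal sketch of the chi-squared test for this kind of question, not part of the original slides, assuming Python with scipy available; the counts are hypothetical:

```python
# Chi-squared test of a relationship between two categorical variables.
from scipy.stats import chi2_contingency

#               maths/science  humanities  other
observed = [[30,            15,         10],   # boys
            [18,            28,         12]]   # girls

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# p < 0.05 would suggest subject preference is associated with gender.
```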
Scatterplot: Relationship between Two Scale Variables
Explores the way the two co-vary (correlate):
- Positive / negative
- Linear / non-linear
- Strong / weak
- Presence of outliers
Statistic used: r, the correlation coefficient.
[Figure: scatterplot showing a linear trend with one outlier]
Correlation Coefficient r
Measures the strength of a relationship between two continuous variables: -1 ≤ r ≤ 1.
[Figure: example scatterplots with r = 0.9, r = 0.01, and r = -0.9]
Correlation Interpretation
An interpretation of the size of the coefficient has been described by Cohen (1992):

Correlation coefficient value      Relationship
-0.3 to +0.3                       Weak
-0.5 to -0.3 or 0.3 to 0.5         Moderate
-0.9 to -0.5 or 0.5 to 0.9         Strong
-1.0 to -0.9 or 0.9 to 1.0         Very strong

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Exercise: Gestational Age and Birth Weight
Describe the relationship between the gestational age of a baby and its weight at birth (r = 0.706). Draw a line of best fit through the data, with roughly half the points above and half below.
Output from SPSS: Key Regression Table
As p < 0.05, gestational age is a significant predictor of birth weight: weight increases by 0.36 lbs for each week of gestation (p-value < 0.001).
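A minimal sketch of fitting this regression line, not part of the original slides, assuming Python with scipy available; the data are hypothetical (constructed so the slope comes out near the slide's 0.36 lbs/week), not the lecture's dataset:

```python
# Simple linear regression of birth weight on gestational age.
from scipy import stats

gestation_weeks = [35, 36, 37, 38, 39, 40, 41, 42]
birth_weight_lb = [5.0, 5.4, 5.7, 6.0, 6.4, 6.8, 7.2, 7.5]

fit = stats.linregress(gestation_weeks, birth_weight_lb)
print(f"weight = {fit.intercept:.2f} + {fit.slope:.2f} * weeks, "
      f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.4f}")
# A significant positive slope matches the slide's conclusion.
```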
Video: "Choosing which statistical test to use - statistics help" (YouTube)