ARA0103 Aðferðafræði Rannsókna

ARA0103 Aðferðafræði Rannsókna, Lectures 16 and 17: Power, Effect Sizes, Meta-Analysis, and Replication. 18/09/2018 Dr Andy Brooks
Key terms: mean/meðaltal, standard deviation/staðalfrávik, sample size/úrtaksstærð, standard error/staðalvilla, effect size/stærð áhrifa, power/styrkur, meta-analysis/eftirgreining, replication/endurtekning.
The p-value is < 0,01. Is the effect clinically significant?

Diastolic Blood Pressure Data
Key terms: blood pressure/blóðþrýstingur, measurement error/mælingarvilla.
What is the measurement error? ±1, ±0,01?

Null hypothesis/Núlltilgáta
Group 1: old hospital. Group 2: new hospital.
It is suspected that staff working in an old hospital have different blood pressures from staff working in a new hospital. There could be more stress in an old hospital. The null hypothesis is that, on average, there is no difference between blood pressures. We start by taking a random sample (slembiúrtak) of 10 from the old hospital (Group 1) and 10 from the new hospital (Group 2).

t-test in Excel (n=10)
The slide lists the raw readings for Group 1 and Group 2 next to the Excel output:

t-Test: Two-Sample Assuming Unequal Variances
                               Variable 1   Variable 2
Mean                           76,6         73
Variance                       62,2667      86,4444
Observations                   10           10
Hypothesized Mean Difference   0
df                             18
t Stat                         0,9335
P(T<=t) one-tail               0,1814
t Critical one-tail            1,7341
P(T<=t) two-tail               0,3629
t Critical two-tail            2,1009
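
The t statistic and degrees of freedom in the Excel output can be checked from the summary values alone. The following is a minimal sketch, assuming Excel's "unequal variances" test is the usual Welch test with Welch-Satterthwaite degrees of freedom; the function name `welch_t` is only illustrative.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite df) from summary statistics."""
    se1_sq = var1 / n1  # squared standard error of mean 1
    se2_sq = var2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df

# Summary values copied from the Excel output for n = 10 per group.
t_stat, df = welch_t(76.6, 62.2667, 10, 73.0, 86.4444, 10)
print(round(t_stat, 4), round(df))

# Two-tailed decision at alpha = 0.05 using the critical value Excel reports.
reject_null = abs(t_stat) > 2.1009
print(reject_null)
```

Because |t| is well below the two-tail critical value, the sketch reaches the same decision as the slide: the null hypothesis is not rejected.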

Conclusion (n=10), where n is the number in each cell of the design
The mean difference is 3,6, but the p-value is 0,36, which is much larger than 0,05. The null hypothesis cannot be rejected. We say "the null hypothesis is correct".
The standard deviations are large compared to the average difference of 3,6. Group 1 standard deviation = 7,9. Group 2 standard deviation = 9,3.
The standard errors of the means are only slightly less than the average difference of 3,6. Group 1 standard error = 7,9/√10 = 2,5. Group 2 standard error = 9,3/√10 = 2,9. "Say 2,7."
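
The standard errors quoted above follow directly from the variances in the Excel output. A small sketch (the helper name `standard_error` is an assumption made for illustration):

```python
import math

def standard_error(variance, n):
    """Standard error of the mean: sqrt(variance / n) = sd / sqrt(n)."""
    return math.sqrt(variance / n)

# Variances taken from the Excel output for n = 10 per group.
se_group1 = standard_error(62.2667, 10)
se_group2 = standard_error(86.4444, 10)
print(round(se_group1, 1), round(se_group2, 1))
```

Both values sit close to the rounded figure of 2,7 that the slide adopts for the error bars.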

Graph showing standard error bars (standard error bar/staðalvillusúla)
The overlap is large. Standard error bars are approximate (± 2,7). (Standard error bars, not standard deviation bars, are shown.)

Assumption: σ known.
95% confidence interval (alpha level = 0,05) for μ, the population mean.
99% confidence interval (alpha level = 0,01) for μ, the population mean.
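
The interval formulas for this slide are not present in the transcript. As a hedged reminder, the standard textbook form for a known σ is given below; the symbol z_{α/2} for the normal critical value is notation assumed here, not taken from the slide.

```latex
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad z_{0.025} \approx 1.96 \ (95\%),
\qquad z_{0.005} \approx 2.58 \ (99\%)
```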

Confidence Interval (CI)/Öryggisbil
Notation: x-bar is the sample mean, s is the sample standard deviation, σ is unknown, n is the sample size.
The confidence interval for the population mean μ is x-bar ± t × s/√n, where t is the critical value of the t-distribution for n−1 degrees of freedom (frígráður).
The critical values of t can be read from tables in statistical books or calculated using statistical software (e.g. TINV in Excel).

95% confidence interval
n = 10, df (degrees of freedom) = n−1 = 9
5% in the tails: 2,5% in the left tail and 2,5% in the right tail.
From a table of the t-distribution, the multiplier is 2,26.
2,26 × 2,7 (standard error) ≈ 6,1
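
The arithmetic above can be turned into interval endpoints for the two hospital samples. This is a sketch only: it reuses the slide's rounded standard error of 2,7 and table multiplier of 2,26, and the function name is hypothetical.

```python
def t_confidence_interval(mean, standard_error, t_multiplier):
    """Return (lower, upper) for mean +/- t_multiplier * standard_error."""
    half_width = t_multiplier * standard_error
    return mean - half_width, mean + half_width

# Means from the Excel output; 2.7 and 2.26 are the slide's rounded values.
ci_old = t_confidence_interval(76.6, 2.7, 2.26)  # Group 1, old hospital
ci_new = t_confidence_interval(73.0, 2.7, 2.26)  # Group 2, new hospital

# Two intervals overlap when each one starts before the other ends.
overlap = ci_old[0] <= ci_new[1] and ci_new[0] <= ci_old[1]
print(ci_old, ci_new, overlap)
```

The half-width of about 6,1 is larger than the mean difference of 3,6, so the two intervals overlap heavily.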

Critical Values of Student's t-Distribution (BOOKTABLE6)

one tail   0,05   0,025   0,01   0,005
two tail   0,1    0,05    0,02   0,01
df
 3         2,35   3,18    4,54   5,84
 4         2,13   2,78    3,75   4,60
 5         2,02   2,57    3,36   4,03
 6         1,94   2,45    3,14   3,71
 7         1,89   2,36    3,00   3,50
 8         1,86   2,31    2,90   3,36
 9         1,83   2,26    2,82   3,25
10         1,81   2,23    2,76   3,17
11         1,80   2,20    2,72   3,11
12         1,78   2,18    2,68   3,05
13         1,77   2,16    2,65   3,01
14         1,76   2,14    2,62   2,98
15         1,75   2,13    2,60   2,95
16         1,75   2,12    2,58   2,92
17         1,74   2,11    2,57   2,90
18         1,73   2,10    2,55   2,88

Graph showing 95% confidence intervals
The overlap is large. In research papers, it is sometimes not clear whether standard error bars, standard deviation bars, or 95% confidence intervals are being shown!

Possible error in conclusion
Key terms: possible error/hugsanleg villa, real difference/raunverulegur munur.
If there is a real difference, on average, of 3,6 in diastolic blood pressures, then it is an error to accept the null hypothesis that there is no difference.
If there is a real difference, on average, of 3,6 in diastolic blood pressures, then:
our samples (n=10) were not big enough;
the standard errors of the means are too big;
our statistical test did not have enough power to detect a difference in means as small as 3,6.

Type I and II errors/Mistök af tegund I og II (brief introduction)
A Type I error is rejecting a correct null hypothesis. The probability of a Type I error occurring is the alpha level (often 0,05 or 0,01).
A Type II error is not rejecting a wrong null hypothesis. The probability of a Type II error is β.
Is the sample big enough? The power of a statistical test is 1−β. We want power to be at least 0,8.
These quantities and n are all related.

Warning/Viðvörun
If α = 0,05, there is a 1:20 chance of committing a Type I error when the null hypothesis is true.
If α = 0,01, there is a 1:100 chance of committing a Type I error when the null hypothesis is true.
If α = 0,001, there is a 1:1000 chance of committing a Type I error when the null hypothesis is true.
If your sample size is small, statistical power may be very low, and you may easily commit a Type II error.
β can be calculated for a test knowing the size of the effect (stærð áhrifa) you are looking for, the sample size (úrtaksstærð), and the α level (alfastig).

Example of power (n is the number in each cell of the design)
If n = 10, the probability of finding a difference (that exists) is ≈0,3.
If n = 20, the probability of finding a difference (that exists) is ≈0,6.

t-test (n=50, the number in each cell of the design)
We measure another 40 workers at each hospital...

t-Test: Two-Sample Assuming Unequal Variances (Excel)
                               Variable 1   Variable 2
Mean                           76,6         73
Variance                       57,1837      79,3878
Observations                   50           50
Hypothesized Mean Difference   0
df                             95
t Stat                         2,1782
P(T<=t) one-tail               0,0159
t Critical one-tail            1,6611
P(T<=t) two-tail               0,0319
t Critical two-tail            1,9853

Graph showing standard error bars (n=50) (standard error bar/staðalvillusúla)
Standard error bars are approximate (± 1,1).

Graph showing 95% confidence intervals (n=50)
2,02 × 1,1 (standard error) = 2,22

Conclusion (n=50), where n is the number in each cell of the design
Key terms: point estimate/punktspá, effect size/stærð áhrifa.
We reject the null hypothesis. An increased sample size has given us the power to detect a difference. The null hypothesis is wrong; the alternative hypothesis is correct.
Statistical significance: p = 0,03 (< 0,05). The point estimate for the effect size is 3,6. But is the effect clinically significant? No?
Standard deviations are large at both hospitals. Maybe we should be seeking explanations (útskýringar) for these large standard deviations. Which people smoke? Which people work overtime? Which people work night shifts?
Maybe we should test to see if the standard deviations are statistically different?

Effect Sizes: two-sample case (effect size/stærð áhrifa)
The size of the effect is usually normalised with respect to the standard deviation. The effect size, assuming a common variance, is given by the difference between the two means divided by the common standard deviation: d = (μ1 − μ2)/σ.
Cohen proposed:
0,2 is a small effect size
0,5 is a medium effect size
0,8 is a large effect size

Estimate of effect size: diastolic blood pressure experiment (estimate/spágildi)
Point estimate (punktspá) of the difference between means = 3,6
Estimate of variance = 70. For simplification, we assume a single common variance in the diastolic blood pressure experiment.
Estimate of standard deviation = sqrt(70) = 8,3666
Estimate of effect size = 3,6/8,3666 = 0,4303, a small to medium effect.
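
The effect-size estimate can be reproduced in a few lines. This is a minimal sketch; the function names and the wording of the labels between Cohen's benchmarks are assumptions made for illustration.

```python
import math

def cohens_d(mean_difference, common_variance):
    """Standardised effect size: mean difference / common standard deviation."""
    return mean_difference / math.sqrt(common_variance)

def describe_effect(d):
    """Rough label based on Cohen's 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small to medium"
    if size < 0.8:
        return "medium to large"
    return "large"

# Values from the slide: difference 3.6, common variance estimate 70.
d = cohens_d(3.6, 70)
print(round(d, 4), describe_effect(d))
```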

Java applets for power and sample size by Russ Lenth
http://www.stat.uiowa.edu/~rlenth/Power/
power = 0,04


α and β
As the α level gets more strict (0,05 -> 0,01), you have less power (1−β). There is less chance of a Type I error, but more chance of a Type II error (β).
As the α level gets less strict (0,01 -> 0,05), you have more power (1−β). There is more chance of a Type I error, but less chance of a Type II error (β).
Some researchers use an α level of 0,10, but this means a 1:10 chance of making a Type I error. Many researchers find an alpha level of 0,10 to be unacceptable (óaðgengilegur).
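
The trade-off between α and power can be illustrated numerically. The sketch below is an assumption-laden approximation, not the method used by the applets: it replaces the t-distribution with the normal distribution for a two-sided, two-sample test with equal group sizes, so it is slightly optimistic for small n. The effect size 0.43 is the estimate from the blood pressure example.

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # two-sided critical value
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing in either rejection region under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for alpha in (0.05, 0.01):
    for n in (10, 50):
        print(alpha, n, round(approx_power(0.43, n, alpha), 2))
```

Under these assumptions, tightening α from 0.05 to 0.01 lowers the power at every sample size, while increasing n raises it.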

More power
A power of 0,5 means there is a 50% chance your experiment will fail to detect a difference that is real. If an experiment costs $10 million to run, you want a power of 0,99 and not 0,5.
There may be no way of estimating power until you have performed the experiment. Previous results by other researchers can sometimes be used to estimate the effect size.

Power calculations/Styrksútreikningar
Power calculations get more complicated with more complicated experimental designs.
Power calculations get more complicated when group sample sizes and/or group variances are unequal.
Professional software exists to support calculations of power for many types of statistical tests: www.power-analysis.com
Power calculations are impossible unless you have an estimate of the effect size.
In research papers, a power analysis is often not reported because a power analysis was never done. It is becoming more common to insist that a power analysis is done before a research paper is accepted for publication.
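
Once an effect size estimate is available, a rough sample size can be worked out by hand. The sketch below uses the common normal-approximation formula n = 2((z_{α/2} + z_β)/d)² for a two-sided, two-sample comparison with equal groups and a common variance; this is an assumption of the sketch, and dedicated power software based on the t-distribution will give slightly larger answers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed in each group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Cohen's benchmarks plus the 0.43 estimate from the blood pressure example.
for d in (0.2, 0.43, 0.5, 0.8):
    print(d, n_per_group(d))
```

The smaller the effect being sought, the faster the required group size grows, since n scales with 1/d².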

Missing effect size?
In the absence of previous results, group sample sizes should be at least 10. Have at least 20 participants if you plan to randomize patients into two groups of 10 and use an independent two-sample t-test.
If possible, try to have large numbers in each group (20, 30, 40, or 50...). The more the better.


Failure to reject the null hypothesis
Key terms: descriptive statistics/lýsandi tölfræði, outlier/einfari.
If you cannot reject the null hypothesis, use descriptive statistics (average, standard deviation, standard error, minimum, maximum), histograms, boxplots and line graphs to present, compare, and interpret the data.
What happens if you use an α of 0,10? This may allow you to interpret the experimental results statistically, but you need to emphasise the need to repeat the experiment with bigger samples.
Try to estimate the power of the experiment retrospectively. This can help future researchers.
Find explanations for any outliers. Sometimes this is where the real results of an experiment are.

Meta-analysis/eftirgreining
(Diagram: Experiments 1 to 5 feed into a meta-analysis.)
Meta-analysis involves examining the results of experiments with the same null hypothesis.
A meta-analysis can simply involve counting the number of research papers that conclude the effect was present against the number of papers that conclude there was no effect. Counts are based on the best quality experiments (e.g. Randomized Control Trial/Hrein Tilraun).
Simple counting of research papers is viewed by many researchers as insufficient. The data has to be combined statistically.

Meta-analysis: pooling raw data
Another form of meta-analysis involves pooling together raw data from several experiments. Pooling the data effectively increases group sample sizes and so increases the power of any statistical tests applied.
If we have data for 5 experiments where group sample sizes (the number in each cell of the design) were 10, then in the meta-analysis group sample sizes become 50.
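
The gain from pooling shows up in the standard error of the mean. A small sketch, assuming the common standard deviation sqrt(70) from the blood pressure example applies to every pooled experiment:

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a group of size n."""
    return sd / math.sqrt(n)

sd = math.sqrt(70)                    # common sd assumed from the earlier example
se_single = standard_error(sd, 10)    # one experiment, 10 per group
se_pooled = standard_error(sd, 50)    # five experiments pooled, 50 per group
ratio = se_single / se_pooled
print(round(se_single, 2), round(se_pooled, 2), round(ratio, 2))
```

Pooling five equal-sized experiments shrinks the standard error by a factor of √5, which is what gives the pooled test its extra power.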

Meta-analysis: pooling effect sizes
Another form of meta-analysis involves pooling together effect size estimates from several experiments.
Special software (sérstakur hugbúnaður) exists to support meta-analytic procedures, e.g. RevMan from the Cochrane Collaboration. See the case study in Lecture 10.
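
One common way of pooling effect size estimates is a fixed-effect, inverse-variance weighted average. The lecture does not say which method it has in mind, so the sketch below is only an illustration of that one approach, and the three effect sizes and standard errors are hypothetical numbers invented for the example.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]  # precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from three experiments.
effects = [0.30, 0.55, 0.40]
errors = [0.20, 0.25, 0.15]
pooled_effect, pooled_se = pool_fixed_effect(effects, errors)
print(round(pooled_effect, 3), round(pooled_se, 3))
```

The pooled standard error is smaller than that of any single experiment, which mirrors the gain in power obtained by pooling raw data.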

Fig 3: Relative risk for mortality (forest plot), from the case study in Lecture 10. (c) BMJ

Replication/Endurtekning
Key terms: other explanations/aðrar útskýringar, dispute/rökræða.
The results from an RCT may be wrong: the cause-and-effect relationship may not exist, and there may be other explanations.
People only start believing the result when the RCT is successfully replicated by other research teams.
The results from several RCTs can be combined in a meta-analysis. Even the results of a meta-analysis can be disputed...

Replication/Endurtekning: improving the experiment
Key terms: validity/réttmæti, improve/bæta við.
Often, when you decide to replicate an experiment, you also improve the experiment:
Measure O1 to check the groups are equal.
Change the questionnaire: add questions, remove questions, change the wording,...
Use a different questionnaire, one which has been validated.
But if you make too many improvements, it is a different experiment. Then it is no longer possible to pool the raw data, and so on.