A guided tour of research study design and statistics

Mustafa Soomro Consultant psychiatrist St James Hospital, Portsmouth

Definition of variable
A variable is a ‘thing’ that we measure and whose value can vary.

Types of variables in a study design
Independent variable
  Can be manipulated in an experimental design
  Causal variable (if confounders are controlled)
  Predictor
  Antecedent
Dependent variable
  Cannot be manipulated in experiments (its value depends on the value of the independent variable)
  Effect variable (if confounders are controlled)
  Predicted
  Subsequent

Measurements used on variables (data)
Quantitative measures
  Discrete measures (equal-interval integer measures): integers with equal intervals between successive values, e.g. number of days in hospital, number of children
  Continuous measures: any interval between two values can be divided infinitely; these may have a true zero (e.g. temperature in kelvin, weight or height) or may not (e.g. temperature in °C or °F)
  Ratio variables: continuous variables with a true zero, or discrete (interval) variables with a true zero; this property allows a ratio between measures, or the coefficient of variation, to be calculated

Measurements (data) Qualitative measures
Ordinal measures: ordered (or ranked) categories with no equal distance between successive ranks, e.g. Likert scale; disease severity (mild, moderate, severe)
Nominal or categorical measures: mutually exclusive categories with no order and no equal intervals; two categories (dichotomous [male, female] or binary [yes, no]) or more (polytomous), e.g. multiple political parties in the UK

Properties of various measures
What calculations and methods would apply | Nominal | Ordinal | Non-ratio discrete / continuous | Ratio
Frequency distribution | Yes | Yes | Yes | Yes
Median and percentiles | No | Yes | Yes | Yes
Add or subtract | No | No | Yes | Yes
Mean, standard deviation, standard error of the mean | No | No | Yes | Yes
Ratio, or coefficient of variation | No | No | No | Yes
Whether parametric [PM] or non-parametric [NPM] methods | NPM | NPM | PM | PM

Frequency distribution of data
Continuous and discrete data: histogram
  Normal
  Right (positively) skewed
  Left (negatively) skewed
  Other distributions
Categorical and ordinal data: bar chart

Normal distribution
Normal distribution showing 1, 2 and 3 SDs either side of the mean
Within 1 SD: about 68% (34% on each side)
Within 2 SD: about 95% (about 48% on each side)
Within 3 SD: about 99.7% (about 49.9% on each side)
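The coverage figures for 1, 2 and 3 SDs can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `coverage_within` is made up for illustration.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    total = coverage_within(k)
    # Half of the total lies on each side of the mean (the curve is symmetric).
    print(f"within {k} SD: {total:.1%} ({total / 2:.1%} on each side)")
```

The exact values are slightly different from the rounded figures usually quoted on slides (for example, 2 SDs cover a little more than 95%).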

Descriptive statistics – measures of central tendency and spread
Mean: with variance, SD, SE of the mean and CI
Median: with range and IQR
Mode
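As an illustration, these measures can be computed with Python's standard `statistics` module. The sample below is hypothetical, and the 95% CI uses the normal critical value 1.96 as a simplifying assumption (for a sample this small a t value would give a wider interval).

```python
import math
import statistics as st

# Hypothetical sample: number of days in hospital for nine patients.
days_in_hospital = [2, 3, 3, 4, 5, 7, 8, 9, 12]

mean = st.mean(days_in_hospital)
median = st.median(days_in_hospital)
mode = st.mode(days_in_hospital)

variance = st.variance(days_in_hospital)      # sample variance (divides by n - 1)
sd = st.stdev(days_in_hospital)               # sample standard deviation
se = sd / math.sqrt(len(days_in_hospital))    # standard error of the mean

# Approximate 95% confidence interval for the mean (normal approximation).
ci = (mean - 1.96 * se, mean + 1.96 * se)

value_range = max(days_in_hospital) - min(days_in_hospital)
q1, q2, q3 = st.quantiles(days_in_hospital, n=4)  # quartiles
iqr = q3 - q1

print(f"mean={mean:.2f} SD={sd:.2f} SE={se:.2f} CI=({ci[0]:.2f}, {ci[1]:.2f})")
print(f"median={median} range={value_range} IQR={iqr:.2f} mode={mode}")
```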

Inferential statistics
Statistical tests (parametric and non-parametric) are applied to a sample to test a hypothesis
Conclusions about the population are then drawn from the sample
Two types of hypotheses:
  Null (there is no difference between the groups)
  Alternative (there is a difference)

Inferential statistics – error in hypothesis testing
Type one error (rejecting a true null hypothesis)
  The likelihood of this (called alpha) is conventionally set at 0.05 or less
Type two error (failing to reject a false null hypothesis)
  The likelihood of this is called beta; 1 - beta is called the power of the study, and is often set at 0.8

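Statistical tests for hypothesis testing give a P value
  The P value is the probability of finding the observed difference, or a greater one, by chance alone if the null hypothesis is true; the conventional threshold is a P value equal to or less than 0.05
  A P value of 0.05 means that the probability of finding the difference, or a greater difference, by chance is 1 in 20
Confidence interval (CI)
  A range calculated so that, if the study were repeated 100 times, about 95 of the resulting intervals would contain the true difference (95% CI)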

In a one-tailed test:
  Null hypothesis: A = B
  The alternative hypothesis is chosen in advance as one of these two: A > B or B > A
In a two-tailed test:
  Null hypothesis: A = B
  The alternative hypothesis covers both directions: A > B or B > A

Goal | Continuous data (normal) | Ordinal or continuous data (non-normal) | Categorical data (binomial)
Describe one group | Mean, SD | Median, interquartile range | Proportion
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test (chi-square test)
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict value from another variable | Simple regression | Nonparametric regression | Simple logistic regression
Predict value from several other variables | Multiple regression | - | Multiple logistic regression

Goal | Survival time data
Describe one group | Kaplan-Meier survival curve
Compare one group to a hypothetical value | -
Compare two unpaired groups | Log-rank test or Mantel-Haenszel
Compare two paired groups | Conditional proportional hazards regression
Compare three or more unmatched groups | Cox proportional hazards regression
Compare three or more matched groups | -
Quantify association between two variables | -
Predict value from another variable | -
Predict value from several other variables | -

Reliability of a test
Reliability is reproducibility (constructing a test using standardised criteria will improve reliability)

Reliability of a test – types of reliability
Internal consistency reliability
  Cronbach’s alpha: based on the average of the item-item correlations
  Split-half reliability (not needed if Cronbach’s alpha is calculated)
  Item-total correlation
Test-retest reliability
Inter-rater reliability
  Percentage agreement (affected by chance agreement, and therefore should not be used)
  Kappa (for categorical data)
  ICC (for continuous data)

Cohen’s Kappa
Inter-rater agreement for categorical measures
K = (observed agreement - expected agreement) / (1 - expected agreement)
The K value can be interpreted as follows (Altman, 1991):
Value of K | Strength of agreement
< 0.20 | Poor
0.21 to 0.40 | Fair
0.41 to 0.60 | Moderate
0.61 to 0.80 | Good
0.81 to 1.00 | Very good
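The kappa formula can be sketched in a few lines of Python. The agreement table below is hypothetical (two raters classifying 100 patients as depressed or not depressed), and the function name `cohens_kappa` is chosen for illustration only.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of subjects that rater A placed in
    category i and rater B placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Expected (chance) agreement from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical data: rows are rater A, columns are rater B.
ratings = [[40, 10],
           [5, 45]]
kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 85 of 100 patients, but half of that agreement would be expected by chance, which is why kappa is lower than the raw percentage agreement.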

ICC (intraclass correlation coefficient)
Measure of agreement for continuous data, which takes into account absolute differences in ratings between the raters

Validity of a test
Validity is authenticity, truthfulness or accuracy

Validity of a test
Face validity
Construct validity: relates to consistency of the features of a test
Descriptive validity
Content validity
Divergent / discriminant validity: investigates correlation with a test that consists of different constructs
Convergent validity: investigates correlation with a test that consists of the same constructs
Criterion validity
Concurrent validity: denotes confirmation by other means, e.g. a gold standard test
Predictive validity [utility]: relates to prediction of the course of the condition by the test

Study design validity and reliability
Internal and external validity of a study
  Internal validity refers to how free the study is from bias
  External validity refers to how applicable the study is to the population of interest
Reliability of a study, i.e. the precision of its results (narrowness of the CI)

Study design types
Experimental
  Randomised
    Individual unit randomised
    Cross-over trials
    Cluster randomised
  Quasi-randomised
  Quasi-experimental
Observational
  Case control (retrospective)
  Cohort (prospective and retrospective)
  Cross-sectional surveys
  Longitudinal surveys (prospective panel studies)

Errors in studies
Non-systematic error (random error)
  Due to small sample size (remedy: use large enough samples)
  Due to less reliable measures (remedy: use reliable measures)

Errors in studies
Systematic error (bias)
  Confounding bias
  Selection bias, e.g. in selecting cases or controls in case-control studies (remedy: use random selection or well-defined selection criteria)
  Allocation bias, e.g. in RCTs (remedy: use random and concealed allocation)
  Information bias (remedy: use blinding and reliable, valid measures)
  Performance bias (remedy: use blinding)
  Attrition bias (remedy: use intention-to-treat analysis and complete follow-up)

Understanding magnitude of effect – basic concepts
Continuous data
  Mean: arithmetic average
Categorical data
  Risk or absolute risk: probability of an event (ratio of events to the total of events and non-events)
    Example: 10 depressed patients receive an antidepressant (AD); 6 respond and 4 do not. Response rate (risk of response, or absolute response) = 6/(6 + 4) = 6/10 (60% or 0.6)
  Odds: ratio of events to non-events
    Example: 6/4 = 1.5
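The difference between risk and odds, and how to convert between them, can be sketched as follows using the figures from the example (6 responders, 4 non-responders). The variable names are illustrative.

```python
responders, non_responders = 6, 4

# Risk: events divided by the total of events and non-events.
risk = responders / (responders + non_responders)

# Odds: events divided by non-events.
odds = responders / non_responders

# The two are interchangeable: each can be recovered from the other.
odds_from_risk = risk / (1 - risk)
risk_from_odds = odds / (1 + odds)

print(f"risk = {risk:.2f}, odds = {odds:.2f}")
```

Risk and odds are close when events are rare and diverge as events become common, as here (0.6 versus 1.5).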

Magnitude of effect in RCTs, continuous data
Mean difference (MD) = mean change in control group - mean change in experimental group
Standardised mean difference (SMD), i.e. effect size = mean difference / pooled SD
MD and SMD should be reported with confidence intervals (CI)
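A minimal sketch of the MD and SMD calculation is given below. The change scores are hypothetical, and the pooled SD is computed with the usual formula weighted by degrees of freedom, which is an assumption since the slides do not state which pooling formula is meant.

```python
import math
import statistics as st

# Hypothetical change from baseline on a depression scale (negative = improvement).
control_change = [-2, -4, -3, -5, -1]
experimental_change = [-6, -8, -7, -9, -5]


def pooled_sd(a, b):
    """Pooled standard deviation of two samples, weighted by degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_variance = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_variance)


# MD follows the definition above: control minus experimental.
md = st.mean(control_change) - st.mean(experimental_change)
smd = md / pooled_sd(control_change, experimental_change)

print(f"MD = {md:.2f}, SMD = {smd:.2f}")
```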

Magnitude of effect in RCTs, continuous data
An SMD of 0.2 means that the mean change from baseline in one group differs by 0.2 standard deviations from that of the other group
An SMD of 1 means that one group differs by 1 standard deviation from the other group

Normal distribution
Normal distribution showing 1, 2 and 3 SDs either side of the mean
Within 1 SD: about 68% (34% on each side)
Within 2 SD: about 95% (about 48% on each side)
Within 3 SD: about 99.7% (about 49.9% on each side)

Magnitude of effect in RCTs, Continuous data
Effect size: standardised mean difference (SMD) | % of control group who would be below the average person (mean difference from baseline) in the experimental group
0.0 | 50%
0.2 | 58%
0.5 | 69%
0.8 | 79%
1.0 | 84%
2.0 | 98%
3.0 | 99.9%
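These percentages follow from the normal distribution: the share of the control group below the experimental group's mean is the normal cumulative probability at the SMD. The sketch below assumes both groups are normally distributed with the same SD; `normal_cdf` is a helper name chosen for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for smd in (0.0, 0.2, 0.5, 0.8, 1.0, 2.0, 3.0):
    print(f"SMD {smd:.1f}: {normal_cdf(smd):.1%} of the control group below the experimental mean")
```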

Magnitude of effect in RCTs, categorical data
Outcome | AD | Placebo | Total
Not depressed | 40 | 20 | 60
Depressed | 10 | 30 | 40
Total | 50 | 50 | 100
Risk (absolute risk) of depression in the control group (control event rate [CER]) = 30/50 = 0.6
Risk (absolute risk) of depression in the experimental group (experimental event rate [EER]) = 10/50 = 0.2
ARR (absolute risk reduction) = CER - EER = 0.6 - 0.2 = 0.4
NNT (number needed to treat) = 1/ARR = 1/0.4 = 2.5 (3 after rounding up)
Interpretation: on average, one needs to treat 3 patients with AD to get one extra patient better, over and above the response rate with placebo.

Outcome | AD | Placebo | Total
No sedation | 20 | 40 | 60
Sedation | 30 | 10 | 40
Total | 50 | 50 | 100
ARI (absolute risk increase) = EER - CER = 30/50 - 10/50 = 0.4
NNH (number needed to harm) = 1/ARI; for sedation this is 2.5 (2 after rounding down, because with harm we need to err on the side of caution)
Interpretation: on average, one needs to treat 2 patients with AD for one extra patient to experience sedation, compared with the sedation rate with placebo
NNT and NNH should be reported with CIs
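The NNT and NNH calculations, including the two rounding directions, can be sketched as below. Exact fractions are used here (an implementation choice, not something the slides prescribe) so that rounding is not affected by floating-point error.

```python
import math
from fractions import Fraction

# Depression outcome: 10 of 50 on AD and 30 of 50 on placebo remain depressed.
eer_depression = Fraction(10, 50)   # experimental event rate
cer_depression = Fraction(30, 50)   # control event rate
arr = cer_depression - eer_depression
nnt = math.ceil(1 / arr)            # benefit: round up

# Sedation outcome: 30 of 50 on AD and 10 of 50 on placebo are sedated.
eer_sedation = Fraction(30, 50)
cer_sedation = Fraction(10, 50)
ari = eer_sedation - cer_sedation
nnh = math.floor(1 / ari)           # harm: round down, to err on the side of caution

print(f"ARR = {float(arr):.1f}, NNT = {nnt}")
print(f"ARI = {float(ari):.1f}, NNH = {nnh}")
```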

Outcome | AD | Placebo | Total
Not depressed | 40 | 20 | 60
Depressed | 10 | 30 | 40
Total | 50 | 50 | 100
RR (relative risk or risk ratio) = EER/CER = 0.2/0.6 = 0.33 (RR of depression with AD)
Interpretation: the risk of depression with AD is 33% of the risk with placebo
RRR (relative risk reduction) = (CER - EER)/CER = 0.4/0.6 = 0.67
Odds and OR (odds ratio)
EEO (experimental event odds) = 10/40 = 0.25
CEO (control event odds) = 30/20 = 1.5
OR = EEO/CEO = 0.25/1.5 = 0.17
Interpretation: the odds of depression with AD are 0.17 times the odds with placebo
RR and OR should be reported with CIs
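The relative measures can be computed directly from the four cells of the table. Because RR and OR should be reported with CIs, the sketch also adds an approximate 95% CI for the OR using the log (Woolf) method; that method is an addition for illustration and is not described in the slides.

```python
import math

# Cells of the table: depressed / not depressed, by group.
ad_depressed, ad_not_depressed = 10, 40
placebo_depressed, placebo_not_depressed = 30, 20

eer = ad_depressed / (ad_depressed + ad_not_depressed)                  # 0.2
cer = placebo_depressed / (placebo_depressed + placebo_not_depressed)   # 0.6

rr = eer / cer                  # relative risk
rrr = (cer - eer) / cer         # relative risk reduction

eeo = ad_depressed / ad_not_depressed             # experimental event odds
ceo = placebo_depressed / placebo_not_depressed   # control event odds
odds_ratio = eeo / ceo

# Approximate 95% CI for the OR on the log scale (Woolf method, an assumption here).
se_log_or = math.sqrt(1 / ad_depressed + 1 / ad_not_depressed
                      + 1 / placebo_depressed + 1 / placebo_not_depressed)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"RR = {rr:.2f}, RRR = {rrr:.2f}")
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```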

Magnitude of accuracy in diagnostic studies
Two by two table of gold standard test results and comparison test results
Comparison test | Gold standard: disease present | Gold standard: disease absent | Total
Test positive | a (true positive) 25 | b (false positive) 10 | a+b 35
Test negative | c (false negative) 5 | d (true negative) 60 | c+d 65
Total | a+c 30 | b+d 70 | a+b+c+d 100

Overall test accuracy
Using the two by two table above (a = 25, b = 10, c = 5, d = 60):
Diagnostic odds ratio = (a × d) / (b × c) = (25 × 60) / (10 × 5) = 30
Interpretation: the odds of a positive test in those with the disease are 30 times the odds of a positive test in those without it
Overall test accuracy = (TP + TN) / (TP + TN + FN + FP), i.e. correct results over the whole sample = (a + d) / (a + b + c + d) = 85/100 = 0.85
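Both summary measures can be recomputed from the four cells of the two by two table; a minimal sketch:

```python
# Cells of the two by two table.
a, b, c, d = 25, 10, 5, 60   # TP, FP, FN, TN

# Overall accuracy: correct results (TP + TN) over the whole sample.
accuracy = (a + d) / (a + b + c + d)

# Diagnostic odds ratio: (TP x TN) / (FP x FN).
diagnostic_odds_ratio = (a * d) / (b * c)

print(f"accuracy = {accuracy:.2f}, diagnostic odds ratio = {diagnostic_odds_ratio:.0f}")
```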

Estimates of diagnostic test accuracy
Using the two by two table above (a = 25, b = 10, c = 5, d = 60):
Sensitivity: proportion with a positive test among the diseased
Sensitivity = a/(a + c) = 25/30 = 0.83
Specificity: proportion with a negative test among the non-diseased
Specificity = d/(b + d) = 60/70 = 0.86

Likelihood ratio (LR) for a positive and a negative test
Using the two by two table above (a = 25, b = 10, c = 5, d = 60):
LR+: ratio of the likelihood of a positive test in the diseased vs the non-diseased
LR+ = sens/(1 - spec) = 0.83/0.14 = 6
LR-: ratio of the likelihood of a negative test in the diseased vs the non-diseased
LR- = (1 - sens)/spec = 0.17/0.86 = 0.2
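Sensitivity, specificity and both likelihood ratios can be derived from the same four cells. The sketch below keeps full precision, so the results differ slightly from the rounded slide values (LR+ is about 5.8 rather than 6).

```python
# Cells of the two by two table.
a, b, c, d = 25, 10, 5, 60   # TP, FP, FN, TN

sensitivity = a / (a + c)    # positive test among the diseased
specificity = d / (b + d)    # negative test among the non-diseased

lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.2f}")
```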

Use of LR
LR combines sensitivity and specificity.
It measures the power of a test to change the pre-test probability into the post-test probability of a disease being present.
LR for test positive | LR for test negative | Magnitude of change from pre-test to post-test probability
More than 10 | Less than 0.1 | Large change
5 to 10 | 0.1 to 0.2 | Moderate change
2 to 5 | 0.2 to 0.5 | Small change
Less than 2 | More than 0.5 | Tiny change
1.0 | 1.0 | No change

Using the two by two table above (a = 25, b = 10, c = 5, d = 60; a+b = 35, c+d = 65):
Positive predictive value (PPV) = a/(a + b) = 25/35 = 0.71
Negative predictive value (NPV) = d/(c + d) = 60/65 = 0.92
Prevalence (pre-test probability) = (a + c)/(a + b + c + d) = 30/100 = 0.3

Using the two by two table above (a = 25, b = 10, c = 5, d = 60):
Pre-test odds of disease = prevalence/(1 - prevalence) = 0.3/0.7 = 0.43
Post-test odds = pre-test odds × likelihood ratio = 0.43 × 6 = 2.6
Post-test probability = post-test odds/(post-test odds + 1) = 2.6/3.6 = 0.72
[compare this with the pre-test probability of 0.3, i.e. a positive test improves the chance of diagnosis]
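The three-step conversion from pre-test to post-test probability can be wrapped in a small function. This is a sketch; the function name `post_test_probability` is made up for illustration.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability using an LR."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)


prevalence = 30 / 100            # pre-test probability from the two by two table

# With the rounded LR+ of 6 used on the slide.
rounded = post_test_probability(prevalence, 6)

# With the unrounded LRs from the table (LR+ = 35/6, LR- = 7/36).
after_positive = post_test_probability(prevalence, 35 / 6)
after_negative = post_test_probability(prevalence, 7 / 36)

print(f"after a positive test: {after_positive:.2f} (slide rounding gives {rounded:.2f})")
print(f"after a negative test: {after_negative:.2f}")
```

With unrounded likelihood ratios, the post-test probability after a positive test equals the PPV (25/35), and the probability after a negative test equals 1 - NPV (5/65), which is a useful consistency check.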

Nomogram for converting pre-test probability to post-test probability using LR

Receiver Operating Characteristic (ROC) Curve
Plot the true positive rate (sensitivity) on the vertical axis against the false positive rate (1 - specificity) on the horizontal axis for different cut-offs, e.g. of a depression scale. This helps in deciding which point on the curve gives the most acceptable sensitivity and specificity. There is a trade-off between sensitivity and specificity. The area under the curve (AUC) is a measure of overall test accuracy (as the curve nears the diagonal its accuracy reduces, and as it moves towards the upper left corner its accuracy increases).
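To make the construction concrete, the sketch below builds the ROC points for every cut-off of a hypothetical depression scale and estimates the AUC with the trapezium rule. The scores, the disease labels and the function names are all invented for illustration; a score at or above the cut-off counts as test positive.

```python
def roc_points(scores, has_disease):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step; more subjects test positive each time.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, has_disease) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, has_disease) if s >= cutoff and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezium rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical scale scores and gold standard diagnoses (1 = disease present).
scores = [3, 5, 8, 9, 12, 14, 15, 18, 20, 22]
has_disease = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

points = roc_points(scores, has_disease)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to the diagonal (no better than chance) and 1.0 to a curve passing through the upper left corner.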

Magnitude of effect: cohort and case-control studies
Cohort: RR
Case control: OR
(All these should be reported with CIs)

Bradford Hill’s criteria of causality for observational studies:
Temporal association
Dose-response association
Specificity (this may not always hold when multiple causation or multiple outcomes are involved)
Consistency
Plausible biological associations
High strength of association
Absence of reverse causality

End Thank you Questions
