Going from data to analysis Dr. Nancy Mayo
Getting it right Research is about getting the right answer, not just an answer An answer is easy The right answer is hard to find
© Nancy E. Mayo Types of Questions About hypotheses Is treatment A better than treatment B? Answer: Yes or No About parameters What is the extent to which treatment A improves outcome in comparison to treatment B? Answer: A number / value (parameter)
Research is about relationships Links one variable or factor to another One is thought or supposed (hypothesized) to be the “cause” of the second variable
What’s in a name?
Discipline | Cause | Effect
Epidemiology | Exposure | Outcome
Medical/clinical | Risk factor | Disease
Psychology | Independent | Dependent
Statistical | Stimulus | Response
Mathematical | X | Y
Why do I need statistics? Reduce data Define relationships Make inferences from your sample to the population
[Figure: scatter plots of Y (outcome, dependent variable) against X (exposure, independent variable); panels: Linear, None] Only linear relationships can be examined by correlation
© Nancy E. Mayo 2004 Inference from Sample to Population [Diagram: Population (Target, Available) and Sample; stats are needed to infer from the sample to the population]
What kind of statistics do I need?
Depends on your DATA: Measured or Counted
Only 2 kinds of data
Measured = Continuous
– Can take on any value, the precision of which depends upon the calibration of your measurement device
– Distribution is expected to be normal
Counted = Categorical (values are fixed)
– Binary (dichotomous)
– Polychotomous
  – Ordinal: ranked (need for assistance); interval (categories are equally spaced: falls); ratio (there is a natural 0)
  – Nominal: named values, no order (diagnosis)
Your Job When reading an article (later doing your own research):
– IDENTIFY THESE VARIABLES
– IDENTIFY WHAT SCALE THEY ARE MEASURED ON
– MATCH DATA TO ANALYSIS
Quantitative Research The answer to the question is found in the tables
What tables should I find in an article?
Table 1 – basic characteristics of the sample
Table 2 – outcomes / exposures
Table 3 – answer to the main question: relationship between exposure and outcome
Table 4 – interesting subgroups
What tables should I find in an article?
Table 1 – characteristics of the sample on features relating to target and available population
Table 2 – distribution of the sample on exposure and outcome variables
Table 3 – relationship between the exposure and outcome
Table 4 – interesting subgroups
What kind of statistics should I find in these Tables?
What kind of statistics are there? Depends on your DATA Depends on your QUESTION
Data
Uses | Continuous | Categorical
Reduce data (descriptive) | Means (SD); medians (percentiles, range) | Proportions
Define relationships | Scatter plot; linear correlation (Pearson) | Histogram; ranked correlation (Spearman); relative risk
Make inferences: simple univariate (bivariate) | Independent t-test; paired t-test | Chi-square test; McNemar’s test
Make inferences: multivariate | ANOVA; multiple linear regression | Logistic regression
Standard Normal Distribution Showing the proportion of the population that lies within 1, 2 and 3 SD (Wikipedia)
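The proportions shown on the standard normal curve can be checked numerically. The sketch below is an illustration only, not part of the original slide; the function name proportion_within is an assumption. It uses the fact that, for a normal distribution, the proportion within k SD of the mean equals erf(k / sqrt(2)).

```python
import math


def proportion_within(k_sd: float) -> float:
    """Proportion of a normal population within k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973
    print(f"within {k} SD: {proportion_within(k):.4f}")
```

About 68%, 95% and 99.7% of the population lie within 1, 2 and 3 SD of the mean.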
Questions
 | HYPOTHESIS | PARAMETER
Question | Question is answered by YES or NO | Question demands a numeric response
Test or parameter | Value of the test has no meaning (t-test, F test) | Difference between two means, a rate or a risk
Significance | P-value (probability that a result as extreme as the one you observed would occur by chance alone) | 95% confidence intervals (with studies of this nature, 95% of the time the interval will contain the true mean)
Uses | Continuous | Categorical
Reduce data (descriptive) | Means (SD); medians (percentiles, range) | Proportions
Let’s look at Table 1
Data
Uses | Continuous | Categorical
Define relationships | Scatter plot; linear correlation (Pearson) | Histogram; ranked correlation (Spearman); relative risk
Go to internet: scatter plot
Go to internet: histogram
Probability Degree of likelihood that something will happen. Statistical probabilities are expressed as decimals between 0 and 1, for example 0.5, 0.25, 0.75. A probability of 0 means that something can never happen; a probability of 1 means that something will always happen. The probability of an event is calculated as follows: –n favourable outcomes / n of all possible outcomes The probability of getting heads in one toss is: p(heads) = 1/(1 + 1) = 1/2.
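The ratio above can be written as a small function. This is a minimal sketch, assuming equally likely outcomes; the function name probability is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def probability(n_favourable: int, n_possible: int) -> Fraction:
    """n favourable outcomes / n of all possible (equally likely) outcomes."""
    if n_possible <= 0 or not 0 <= n_favourable <= n_possible:
        raise ValueError("need 0 <= n_favourable <= n_possible and n_possible > 0")
    return Fraction(n_favourable, n_possible)


# One coin toss: 1 favourable outcome (heads) out of 2 possible (heads, tails)
p_heads = probability(1, 2)
print(p_heads, float(p_heads))  # prints: 1/2 0.5
```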
Statistical probability Probability that what you observed could have occurred by chance. We want that to be a very small number. By convention: p < 0.05 is considered very unlikely to have occurred by chance. This means that in studies like this, an observation this extreme or more extreme would occur by chance alone in fewer than 5 of 100 studies
Remember: one study is only a sample. Likely to have occurred by chance: unlikely to be because of anything that was done in the study. Unlikely to have occurred by chance: the assumption is that it occurred because of something done in the study
When you start a study, there are risks. Probability that you are one of the yellow studies: you conclude that there was an effect when there was not. This is a Type I or alpha error. By convention, we set this risk at 5 chances out of 100, or p = 0.05. Any finding that has a p value associated with it of < 0.05 is considered statistically significant (unlikely to have occurred by chance alone)
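The 5-in-100 risk can be illustrated by simulation. The sketch below is not from the original slides. It assumes two groups drawn from the same normal population (so there is no true effect) and, to stay within the Python standard library, uses a z-test with a known SD rather than a t-test. All names and settings (two_sided_p, type_i_error_rate, 2000 studies, 30 subjects per group) are hypothetical.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b, sd):
    """Two-sided z-test p-value for a difference in means, assuming a known common SD."""
    se = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of simulated no-effect studies that are declared significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Both groups come from the same population: any "effect" is chance alone
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(a, b, sd=1.0) < alpha:
            false_positives += 1
    return false_positives / n_studies


rate = type_i_error_rate()
print(f"studies wrongly declared significant: {rate:.3f}")
```

The printed proportion should be close to 0.05: about 5 studies in 100 conclude there was an effect when there was not.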
Correlation
> 0.8: strong
0.5 to 0.8: moderate
< 0.5: weak
Correlation What proportion of outcome is explained by the exposure? ANSWER: r²
r = 0.5 (moderate), r² = 0.25 (not much)
r = 0.9 (strong), r² = 0.81 (still a lot)
r = 0.3 (weak), r² = 0.09 (almost nothing)
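To make the link between r and r² concrete, here is a minimal sketch that computes a Pearson correlation by hand and squares it. The exposure and outcome values are invented for illustration, and the function name pearson_r is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only
exposure = [1, 2, 3, 4, 5, 6]
outcome = [2, 1, 4, 3, 6, 5]

r = pearson_r(exposure, outcome)
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")
```

Squaring r always shrinks it (unless r is exactly 0 or ±1), which is why even a moderate correlation explains only a small share of the outcome.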
Measuring Effects
Effect | Notes
Post-only | Groups similar at baseline, so the effect of the intervention (I) will be observed at t = post. Assumes pre value unimportant; event data (e.g. falls)
Change pre to post | Assumes pre value unimportant; reduces variability, as a change value can occur in different ways; analyses based on explaining variability
Change pre to follow up | Often addresses maintenance of effects
Growth | Longitudinal change; good for interventions over the long term or with multiple measurements (4 or more ideal); pre-value is considered
© Nancy E. Mayo (Nov 2005)
RCTs are Longitudinal Designs Analyses of post only or change are cross-sectional. Time may be important. Effect of intervention may depend on time
Estimating Effects
Time: pre / post
– Time effect = impact of time averaged over group
Group: Intervention / Control
– At baseline, groups are equal
– Group effect = effect of group averaged over time; as baseline is equal, group effect can only be due to post-score
Group * Time: does the effect of group depend on time?
Main Effect of Group [Figure: Effect plotted against Time for two groups; bracket marks the group effect (averaged over time)]
Main Effect of Time [Figure: Effect plotted against Time; the time effect is averaged over group]
Group*Time Effect [Figure: Effect plotted against Time for two groups] The effect of group depended on the time: same at baseline but increasingly different over time
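The three effects (group, time, group * time) can be sketched from the four cell means of a pre / post design with two groups. The scores below are invented for illustration, and the function name effects is an assumption. This is simple arithmetic on means, not a full repeated-measures analysis.

```python
from statistics import mean

# Hypothetical scores: groups are equal at baseline (pre) and differ at post
scores = {
    ("intervention", "pre"): [10, 12, 11, 13],
    ("intervention", "post"): [16, 18, 17, 19],
    ("control", "pre"): [10, 12, 11, 13],
    ("control", "post"): [12, 14, 13, 15],
}
cell = {key: mean(values) for key, values in scores.items()}


def effects(cell):
    """Return (group effect, time effect, group*time interaction) from cell means."""
    i_pre, i_post = cell[("intervention", "pre")], cell[("intervention", "post")]
    c_pre, c_post = cell[("control", "pre")], cell[("control", "post")]
    group_effect = mean([i_pre, i_post]) - mean([c_pre, c_post])  # averaged over time
    time_effect = mean([i_post, c_post]) - mean([i_pre, c_pre])   # averaged over group
    interaction = (i_post - i_pre) - (c_post - c_pre)             # does group depend on time?
    return group_effect, time_effect, interaction


group_effect, time_effect, interaction = effects(cell)
print(group_effect, time_effect, interaction)  # prints: 2.0 4.0 4.0
```

Because the groups are equal at baseline, the whole group effect here comes from the post scores, and the non-zero interaction says that the effect of group depends on time.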
95% CI: Mean ± 1.96 × SE, where SE = SD / sqrt(N) and N is the number of subjects. 1.96 is the value on a standard normal distribution (mean of 0 and SD of 1) such that 95% of the area under the curve lies within ±1.96; the remaining 5% lies outside this range
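The formula can be applied directly to a set of measurements. This is a minimal sketch with invented change scores; the function name ci95 is an assumption. It uses the 1.96 multiplier from the slide; with small samples a t-based multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(values):
    """95% CI for the mean: mean ± 1.96 × SE, with SE = SD / sqrt(N)."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # sample SD divided by sqrt of number of subjects
    return m - 1.96 * se, m + 1.96 * se


# Hypothetical change scores for 10 subjects, for illustration only
changes = [4, 7, 6, 5, 8, 6, 9, 3, 6, 6]
lower, upper = ci95(changes)
print(f"mean = {mean(changes):.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```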
Interpretation of 95% CI With 100 studies like this one, the 95% confidence bounds will contain the true mean change in PPT 95 times out of 100. Likely that a gain will be between 4 and 8 units of change
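The "95 times out of 100" reading can be checked by simulating many studies and counting how often the interval contains the true mean. The sketch below is an illustration under invented settings (true mean change of 6, SD of 5, 50 subjects per study); the function name coverage is an assumption.

```python
import random
from math import sqrt
from statistics import mean, stdev


def coverage(true_mean=6.0, sd=5.0, n=50, n_studies=1000, seed=7):
    """Share of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = mean(sample)
        se = stdev(sample) / sqrt(n)
        if m - 1.96 * se <= true_mean <= m + 1.96 * se:
            hits += 1
    return hits / n_studies


print(f"intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95. Any single study gives one interval, which either contains the true mean or does not.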
Linking Data to Statistics [Diagram: Exposure1, Exposure2 and Exposure3 linked to Outcome]