Evidence Based Medicine Week 3: Basic Research Concepts in Western and Eastern Medicine. Part II: Statistics
Terminology Review
Variable: anything we measure.
Correlational vs. experimental research: an experiment has manipulation and control, and answers “why” or “how” questions. Correlational research just shows how things are or were. No manipulation, just observation.
Dependent vs. independent variable: you manipulate the independent variable, and the dependent variable reacts.
Measurement scales for variables
Nominal: categories, like race or gender.
Ordinal: rank. We can know something is the best, but that doesn't tell us anything about how much better it is than other things.
Interval: rank plus a reliable quantity, because the distance between any two neighboring points on the scale is the same. Like temperature in degrees Celsius or Fahrenheit: 30 degrees is 10 more than 20, but it is not “1.5 times as hot,” because zero on these scales is arbitrary.
Ratio: rank, quantity, and a true zero point, so mathematically more precise. Like temperature in kelvins, or time: minutes are the same distance apart and zero means none, so 20 minutes really is twice as long as 10.
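The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below assumes the 20 and 30 degree readings are in Celsius (the notes do not say), and `celsius_to_kelvin` is a helper name made up for this illustration.

```python
# Interval vs. ratio scales: the same two temperatures on two scales.
# Assumption: the 20 and 30 degree readings are in degrees Celsius.

def celsius_to_kelvin(deg_c):
    """Convert degrees Celsius to kelvins (0 K is a true zero point)."""
    return deg_c + 273.15

cool_c, warm_c = 20.0, 30.0

# Differences are meaningful on both scales and come out the same size.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the scale with a true zero (Kelvin).
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"ratio on the Celsius scale: {ratio_c:.3f}")
print(f"ratio on the Kelvin scale:  {ratio_k:.3f}")
```

The 10 degree difference survives the change of scale, but the "1.5 times" ratio does not, which is why ratio statements are reserved for scales with a true zero.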
Relationships between variables
Ultimately, every study examines the relationships between variables. The whole point is finding out how one variable changes the other (e.g. treatment and pain or health, diet and obesity, acupuncture and depression). Statistics gives us a tool to evaluate the strength of the relationship that we find, and also the probability that a relationship like the one we found could have shown up by chance alone. Life is random. Statistics is a way of figuring out how random and how predictable things are.
Statistical significance
The “p” value is the probability of getting a result at least as strong as the one in the study if only random chance were at work (that is, if there were no real relationship in the general population). It is not the probability that the result is true. p=0.05 means that chance alone would produce a result like this about 5% of the time. The lower the “p” value, the stronger the evidence that the result is not a fluke, so p=.001 is much stronger evidence than p=.05. By convention, p=.05 is the minimum standard for a result to be considered “significant.” In this sense “p” tells us how far the result can be considered “reliable” or “representative” of the general population. The calculation is based on the bell curve, but we don't need to know the details here.
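One way to get a feel for what “p” measures is to let a computer play out many chance-only studies and count how often chance does as well as the real result. The sketch below is only an illustration: the study (8 of 10 patients improving when 5 of 10 would be expected by chance) is hypothetical, and `simulated_p_value` is a made-up helper name.

```python
import random

def simulated_p_value(n_patients, n_improved, chance_rate=0.5,
                      n_trials=20_000, seed=0):
    """Fraction of chance-only studies that do at least as well as the real one."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    at_least_as_extreme = 0
    for _ in range(n_trials):
        # One imaginary study in which each patient improves purely by chance.
        improved_by_chance = sum(rng.random() < chance_rate
                                 for _ in range(n_patients))
        if improved_by_chance >= n_improved:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_trials

# Hypothetical study: 8 of 10 patients improved.
p = simulated_p_value(n_patients=10, n_improved=8)
print(f"estimated p = {p:.3f}")
```

The exact value for this made-up study is 56/1024, or about 0.055, so the simulated estimate should land close to that: just short of the conventional .05 cutoff.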
Magnitude/Size/Strength of Relationships
The stronger the relationship between variables found in a study, the more likely it is that the relationship also exists in the general population. “Strength” of relationship and “reliability” are therefore related. This is only true if the sample size is kept constant (see the next slide). Sample size is represented by “n”: the number of individuals or items studied in the research. So n=150 means 150 people participated in the study, and n=1000 means 1000 people. In general, a bigger “n” means more reliable results, because random variation has less influence as you increase the number of people you study. Other things being equal, a bigger sample means the results apply better to the general population.
Size of the Sample
If you only measure one person, then it's easy to find relationships between things that aren't actually related or aren't representative of the general population. It is like seeing one woman (gender) who is blond (hair color) and concluding that all women are blond. The more people you measure, the more likely it is that whatever trends or relationships you see between variables are real rather than accidents of the sample.
Example
Consider this example from research on statistical reasoning (Nisbett et al., 1987). There are two hospitals: in the first one, 120 babies are born every day; in the other, only 12. On average, the ratio of baby boys to baby girls born every day in each hospital is 50/50. However, one day, in one of those hospitals, twice as many baby girls were born as baby boys. In which hospital was it more likely to happen? The answer is obvious for a statistician, but as research shows, not so obvious for a lay person: it is much more likely to happen in the small hospital. The reason is that, technically speaking, the probability of a random deviation of a particular size (from the population mean) decreases as the sample size increases.
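The hospital example can be checked directly with the binomial formula. This is a minimal sketch that assumes each birth is an independent 50/50 event and reads “twice as many girls as boys” as “at least two thirds of the day's births are girls”; `prob_at_least` is a made-up helper name.

```python
from math import comb

def prob_at_least(n, k, p=0.5):
    """Exact binomial probability of k or more 'successes' in n trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: "twice as many girls as boys" means at least 2/3 girls.
small_hospital = prob_at_least(12, 8)     # 8 or more girls out of 12 births
large_hospital = prob_at_least(120, 80)   # 80 or more girls out of 120 births

print(f"small hospital (12 births):  {small_hospital:.4f}")
print(f"large hospital (120 births): {large_hospital:.6f}")
```

Under these assumptions the small hospital sees such a day roughly one time in five, while in the large hospital the chance is well under one in a thousand.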
Smaller relationships need larger samples
If a coin is slightly asymmetrical and, when tossed, is more likely to produce heads than tails (e.g., 60% vs. 40%), then ten tosses would not be sufficient to convince anyone that the coin is asymmetrical. However, if the effect in question were large enough, then ten tosses could be enough. For instance, imagine that the coin is so asymmetrical that no matter how you toss it, the outcome will be heads. If ten tosses produced ten heads, most people would consider it significant. In other words, it would be considered convincing evidence that in the theoretical population of an infinite number of tosses of this coin, there would be more heads than tails. Thus, if a relation is large, then it can be found to be significant even in a small sample. In really big, long-running studies (like the Framingham Study, which has followed thousands of people for decades), even small relationships can reach significance. Not so much in small studies.
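The coin example can be turned into numbers by asking how often a given coin would produce a “significant” excess of heads. The sketch below is one way to do that, assuming a one-sided test at the conventional .05 level; the function names are made up for this illustration.

```python
from math import comb

def pmf(n, p):
    """Probability of each possible head count (0..n) in n tosses."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def critical_heads(n, alpha=0.05):
    """Smallest head count that a fair coin reaches or beats with probability <= alpha."""
    fair = pmf(n, 0.5)
    tail = 0.0
    k = n + 1  # stays n + 1 if even n heads out of n is not rare enough
    for heads in range(n, -1, -1):
        if tail + fair[heads] > alpha:
            break
        tail += fair[heads]
        k = heads
    return k

def chance_of_detection(n, true_p_heads, alpha=0.05):
    """Probability that n tosses of the given coin give a significant result."""
    return sum(pmf(n, true_p_heads)[critical_heads(n, alpha):])

for n, p_heads in [(10, 0.6), (10, 1.0), (1000, 0.6)]:
    print(f"n={n:4d}, P(heads)={p_heads}: "
          f"chance of a significant result = {chance_of_detection(n, p_heads):.3f}")
```

With ten tosses, the 60/40 coin produces a significant result only about 5% of the time, while the always-heads coin does so every time. With a thousand tosses, the 60/40 coin is detected almost without fail.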
“No Relation”
Finding “no relation” between variables can also be meaningful (and clinically relevant). If “n” is large, the study had a good chance of detecting a real relationship, so finding none probably means that there really is no relation, or only a very small one, in the general population. Note that a “no relation” result shows up as a large “p,” not a small one; what makes it convincing is the large “n.” This can be just as clinically useful as data in which there is a relationship.
Measuring the Magnitude (strength) of relationships between variables
In general, the strength is a comparison of the observed relationship to the maximum possible relationship that could have been observed between the variables. If all WCC (white cell count) scores of males were exactly 100 and those of females exactly 102, then all deviations from the grand mean (the overall average) in our sample would be entirely accounted for by gender. We would say that in our sample gender is perfectly correlated with WCC, that is, 100% of the observed differences between subjects in WCC are accounted for by their gender. If instead WCC scores ranged from 0 to 1000, the same difference (of 2) between the average WCC of males and females would account for such a small part of the overall variation in scores that it would most likely be considered negligible.
Explained Variation
Statistics describes the relationship between variables by asking how much of the overall variation in the studied (dependent) variable is accounted for by the variable you are relating it to. For example, low HDL might account for 10% of the total variation in heart attack incidence, or in the example above gender accounts for x% of the variation in white cell count. That percent is the explained variation.
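The white cell count example can be worked through as a ratio of sums of squares: variation between the group averages divided by total variation. This is a sketch of one common way to compute explained variation, not necessarily the method any particular study uses. All scores below are hypothetical, chosen to match the 100 vs. 102 averages in the example.

```python
def explained_variation(groups):
    """Share of total variation (sum of squares) accounted for by group membership."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    # Total variation: squared distance of every score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    # Between-group variation: squared distance of each group mean from the
    # grand mean, counted once per member of the group.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total

# Hypothetical data: every male scores 100 and every female scores 102.
tight = {"male": [100] * 5, "female": [102] * 5}

# Hypothetical data: same 2-point gap in averages, but scores spread widely.
wide = {"male": [100, 300, 500, 700, 900],
        "female": [102, 302, 502, 702, 902]}

print(f"tight scores: {explained_variation(tight):.1%} explained by gender")
print(f"wide scores:  {explained_variation(wide):.4%} explained by gender")
```

In the first data set gender explains all of the variation. In the second, the same 2-point gap explains only about a thousandth of a percent.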
Calculating statistical significance
This is the “p” value. How it is calculated depends on the study and the type of analysis, but it always takes “n” into account as well as the size of the relationship between the variables. A larger “n” means that a smaller relationship can be significant.
Normal Distribution
This is the bell curve. It is assumed for most studies: it is treated as a general truth about how measurements are distributed, and we use it as the basis for most statistical analysis. It is described by 2 features: the mean (or average) and the standard deviation. IQ is a perfect example: mean = 100, standard deviation = ± 15 on most modern IQ tests. 68% of the population falls within ± 1 standard deviation of the mean and 95% falls within ± 2. Anything more than 2 standard deviations from the mean is “significant” (p less than about .05). In practice the mean and standard deviation are calculated from your results. The bigger the “n,” the more closely statistics such as the sample average follow a normal distribution, even when the thing being measured isn't itself well represented by one. That's why it's used so often. There are ways to do statistical analysis without assuming a normal distribution, but they are generally less powerful. Keep in mind that the normal distribution is an idealization, and assuming “normality” in a population is a big assumption.
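The 68% and 95% figures can be checked with Python's standard library. The sketch below assumes an IQ-style scale with mean 100 and standard deviation 15 (the value used on most modern IQ tests); the percentages come out the same for any mean and standard deviation. `share_within` is a made-up helper name.

```python
from statistics import NormalDist

# Assumption: IQ-style scale with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def share_within(dist, n_sd):
    """Share of a normal population within n_sd standard deviations of the mean."""
    low = dist.mean - n_sd * dist.stdev
    high = dist.mean + n_sd * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for n_sd in (1, 2):
    inside = share_within(iq, n_sd)
    print(f"within {n_sd} SD of the mean: {inside:.1%} "
          f"(outside: {1 - inside:.1%})")
```

A little under 5% of a normal population lies more than 2 standard deviations from the mean, which is where the conventional .05 cutoff comes from.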
Null Hypothesis
All hypotheses are tested against the “null” hypothesis. It is essentially a statement that no relationship exists between your tested variables. Ex: Hypothesis = mice fed superfood will have greater longevity than those fed normal mouse food. Null hypothesis = there is no relationship between superfood diet and longevity.
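One way to see a null hypothesis at work is a permutation test. If diet really made no difference, the “superfood” and “normal” labels would be interchangeable, so we can reshuffle them in every possible way and see how often the reshuffled gap is as large as the real one. The lifespans below are invented for illustration, and `permutation_p_value` is a made-up helper name. This is a sketch of one possible test, not the analysis a real study would necessarily use.

```python
from itertools import combinations

def permutation_p_value(treated, control):
    """One-sided exact permutation test on the difference in group means."""
    pooled = treated + control
    n_treated = len(treated)
    observed = sum(treated) / n_treated - sum(control) / len(control)
    n_splits = 0
    n_as_extreme = 0
    # Under the null hypothesis every way of relabelling the mice is equally likely.
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            n_as_extreme += 1
    return n_as_extreme / n_splits

# Hypothetical lifespans in months (invented for this example).
superfood = [30, 32, 35, 36, 38]
normal_food = [24, 25, 27, 28, 29]

p = permutation_p_value(superfood, normal_food)
print(f"p = {p:.4f}")
```

With these invented numbers every superfood mouse outlives every normal-food mouse, so only 1 of the 252 possible relabellings matches the observed gap. That gives p = 1/252, or about .004, and the null hypothesis would be rejected.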
The normality assumption
Keep in mind that we assume that all variables ultimately follow the normal distribution if the population studied is big enough. This is an assumption and can only be shown to hold in some cases, not all. It may introduce some error, and the size of that error is usually hard to calculate directly.
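When the error cannot be calculated directly, simulation gives a rough feel for how reasonable the assumption is. The sketch below draws samples from a deliberately lopsided (exponential) population, chosen here only as an example, and measures how lopsided the sample averages are. A skewness near 0 means a symmetric, bell-like shape. All function names are made up for this illustration.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Simple skewness measure: 0 for a symmetric shape, positive for a long right tail."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def skew_of_sample_means(sample_size, n_samples=5000, seed=1):
    """Skewness of the averages of many samples from a lopsided population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return skewness(means)

for n in (1, 5, 50):
    print(f"sample size {n:2d}: skewness of sample averages = "
          f"{skew_of_sample_means(n):.2f}")
```

The averages of larger samples come out much closer to symmetric than the raw measurements do. In this example the normality assumption becomes more reasonable as “n” grows, even though the underlying population is far from normal.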