Statistics made simple Modified from Dr. Tammy Frank’s presentation, NOVA
Why do we need statistics? Example: – A chemical may increase the growth of an animal – It will be tested on houseflies – A colony of 20,000 houseflies is divided into 2 groups – Group 1 gets the chemical in its food – Group 2 gets a placebo in the same food What comes next?
2 weeks later – take a random sample of 25 houseflies from each group and measure wingspan What are the results?
Housefly results 25 houseflies from each group Group 1 (with chemical) – 7.5 mm mean wingspan Group 2 (without) – 7.2 mm mean wingspan What does this mean? Are group 1 flies really bigger? Some might say yes, some might say no Did you, by chance, happen to pick some larger flies from group 1? Was there sampling error or bias?
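To see how much two samples of 25 can differ by chance alone, here is a minimal simulation sketch. Everything numeric in it is an assumption made for illustration: both groups of 10,000 flies are drawn from the same made-up distribution (mean 7.3 mm, standard deviation 0.5 mm), so the true difference between groups is zero.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical populations: both groups come from the SAME distribution,
# so any gap between sample means is pure sampling error.
group1 = [random.gauss(7.3, 0.5) for _ in range(10_000)]
group2 = [random.gauss(7.3, 0.5) for _ in range(10_000)]


def sample_mean_difference(pop_a, pop_b, n=25):
    """Draw n flies at random from each population; return mean(a) - mean(b)."""
    a = random.sample(pop_a, n)
    b = random.sample(pop_b, n)
    return statistics.mean(a) - statistics.mean(b)


# Repeat the "take 25 flies from each group" step 1,000 times.
differences = [sample_mean_difference(group1, group2) for _ in range(1000)]

largest_gap = max(abs(d) for d in differences)
# Share of repeats where the gap is at least 0.3 mm (the gap seen in the example).
share_large = sum(abs(d) >= 0.3 for d in differences) / len(differences)

print(f"largest gap between sample means: {largest_gap:.2f} mm")
print(f"share of repeats with a gap of at least 0.3 mm: {share_large:.3f}")
```

Even with no real difference, the sample means rarely match exactly, which is why a gap between two samples cannot be judged by eye.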
One way to be sure is to measure all 20,000 flies……not feasible So what do we do?
Statistics You say the flies are bigger, I say not Statistics provide rules to help us find out Statistics will help tell us if these are significant (real) differences Is there bias? Were bigger ones in group 1 picked by chance? Statistics will tell us what the chances are that the results are due to sampling bias or random chance
Significant Difference Real difference Due to chemical, not chance If the test shows that the probability of getting the results by chance or random error is ≤5%, we accept the claim that the chemical produced larger flies If the test shows that the probability of getting the results by chance or random error is >5%, we reject the claim that the chemical produced larger flies
5% is an arbitrary cut-off point that is generally accepted However, if the cost of making an incorrect decision is very high, a stricter (lower) cut-off such as 1% is used » for example, research with cancer drugs Probability value is the p-value Measure of the probability that the pattern we see in our data is due to sampling error or random chance
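One way to make the idea of a p-value concrete is a permutation test: shuffle the group labels many times and count how often chance alone gives a difference at least as large as the observed one. This is a swapped-in illustration, not a test named in the slides, and the wingspan values below are made up.

```python
import random
import statistics


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Two-sided permutation test on the difference of two sample means."""
    rng = random.Random(seed)  # local generator, fixed seed for repeatability
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign measurements to groups at random
        new_a = pooled[:len(a)]
        new_b = pooled[len(a):]
        if abs(statistics.mean(new_a) - statistics.mean(new_b)) >= observed:
            hits += 1
    # Fraction of shuffles that produced a gap at least as large as observed.
    return hits / n_shuffles


# Hypothetical wingspans in mm (not data from the presentation).
treated = [7.6, 7.4, 7.8, 7.5, 7.3, 7.7, 7.5, 7.6]
control = [7.2, 7.1, 7.4, 7.0, 7.3, 7.2, 7.1, 7.3]

p = permutation_p_value(treated, control)
print(f"estimated p-value: {p:.4f}")
```

A small value means that random relabeling almost never reproduces a gap this large, so chance is an unlikely explanation.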
Scientific Method Remember that we cannot “prove” anything. We can only accept or reject a hypothesis A theory is the closest that a biologist can come to “proving” a hypothesis Supported and validated by data and scientific community
Null and Alternative Hypotheses For any experiment/survey/study, there must be a null hypothesis and an alternative hypothesis Set up so that one of them must be true, and one must be false Null hypothesis (H0): = or ≤ or ≥ Example: – The average weight of hermit crab group A is the same as that of hermit crab group B (=) – OR – The average weight of hermit crab group A is the same as or greater than that of hermit crab group B (≥) – OR – The average weight of hermit crab group A is the same as or less than that of hermit crab group B (≤)
If null is true, then alternative must be false H0: average weight of hermit crab group A = average weight of group B HA: average weight of hermit crab group A ≠ average weight of group B
Two-tailed hypotheses Use if you have no expectations – You are trying to find out if weights are different but have no reason to expect them to be H0: average weight of hermit crab group A = average weight of group B HA: average weight of hermit crab group A ≠ average weight of group B
One-tailed hypothesis Use if you have an expectation of the outcome, based on previous studies or information For example, previous studies have demonstrated that the Group A area has more hermit crab food than the Group B area H0: average weight of hermit crab group A ≤ average weight of group B HA: average weight of hermit crab group A > average weight of group B Alternative hypothesis corresponds to what you expect
Always reject or accept the null hypothesis, never reject the alternative If you accept or support the null, then don’t mention the alternative If you reject the null, then accept or support the alternative We never prove a hypothesis We just gain a measure of how confident we are with our hypothesis
p-value The measure of the probability that the pattern we see in our data is due to random chance or sampling error 0.05 is the value most commonly used If p-value is >0.05 (high p-value), accept null » Weight is not significantly different If p-value is ≤0.05 (low p-value), reject null and accept alternative » Weight is significantly different
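The decision rule on this slide can be sketched as a small function. The name `decide` and its return strings are hypothetical, chosen only for illustration; the 0.05 default follows the slide.

```python
def decide(p_value, alpha=0.05):
    """Apply the slide's rule: reject the null when p <= alpha, else accept it."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject null, accept alternative: significantly different"
    return "accept null: not significantly different"


print(decide(0.03))              # low p-value at the usual 5% cut-off
print(decide(0.20))              # high p-value
print(decide(0.03, alpha=0.01))  # same p-value under a stricter 1% cut-off
```

The third call shows why the cut-off matters: the same p-value leads to a different decision when the cost of a wrong decision calls for 1% instead of 5%.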
Important terms: x = measurement value; ∑ = sum of; n = sample size; df = degrees of freedom = n – 1; $\bar{X}$ = mean or average = $\sum x / n$; $s^2$ = variance = mean of sum of squares = $\sum (x - \bar{X})^2 / \mathrm{df}$; $s = \sqrt{s^2}$ = standard deviation = roughly the average distance from the mean. Variance tells you how much your values varied from the mean – Large variance means there is a large spread in the data, small variance means data points are closer to the mean
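The terms above can be computed step by step. This is a minimal sketch with made-up measurements; the function name `describe` is hypothetical. Python's `statistics.variance` and `statistics.stdev` also divide by n – 1, so they serve as a cross-check.

```python
import math
import statistics


def describe(values):
    """Return n, df, mean, variance and standard deviation of a sample."""
    n = len(values)
    df = n - 1                                    # degrees of freedom
    mean = sum(values) / n                        # mean = sum of x / n
    sum_sq = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    variance = sum_sq / df                        # s^2
    sd = math.sqrt(variance)                      # s
    return {"n": n, "df": df, "mean": mean, "variance": variance, "sd": sd}


# Hypothetical wingspans in mm.
wingspans = [7.0, 7.2, 7.4, 7.6, 7.8]
summary = describe(wingspans)
print(summary)

# Cross-check against the standard library (both use n - 1).
print(statistics.variance(wingspans), statistics.stdev(wingspans))
```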
What test do you use to get p? Depends on what type of data you are collecting – Measurement variables or nominal variables?
Measurement variables Something that can be counted or measured Involves numbers Examples: length, weight, quantity What are examples of tests that can be used?
t-test Used to determine if two sets of data have the same mean Paired t-test – when measurements are linked, such as the same patient before and after using a drug The null would state there is no difference Unpaired t-test – when the measurements come from 2 separate, independent groups, such as patients with drug (group 1) and patients without drug (group 2)
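The two t statistics can be sketched by hand as follows. The data are made up, the function names are hypothetical, and the unpaired version assumes equal variances in both groups (the classic pooled form), which the slides do not specify. Turning t and df into a p-value needs a t-distribution table or a statistics package such as SciPy (`scipy.stats.ttest_rel` for paired data, `scipy.stats.ttest_ind` for unpaired data).

```python
import math
import statistics


def paired_t(before, after):
    """t statistic and df for linked measurements (same subjects twice)."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def unpaired_t(group1, group2):
    """Pooled-variance t statistic and df for two independent groups."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the gap
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2


# Hypothetical readings for the same 5 patients before and after a drug.
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 132, 131]
t_paired, df_paired = paired_t(before, after)

# Hypothetical readings for two separate groups of patients.
t_unpaired, df_unpaired = unpaired_t([1, 2, 3], [4, 5, 6])

print(f"paired:   t = {t_paired:.3f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
```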
What do you do when there are more than 2 sets of data? ANOVA – analysis of variance Null would state that the means are equal Example would be if you had 5 groups of patients taking drugs at different dosages per group Single factor ANOVA Only vary one parameter – drug dosage Two factor ANOVA with or without replication Vary dosage and time of day
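For the single factor case, the F statistic compares the variation between group means with the variation inside the groups. The sketch below uses made-up values for three dosage groups, and the function name is hypothetical; it stops at F and the degrees of freedom, since the p-value comes from an F-distribution table or a statistics package.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of each group mean around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    df_between = k - 1
    df_within = n_total - k
    # Note: this divides by zero if every group has no internal spread.
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical responses for three dosage groups.
dosage_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(dosage_groups)
print(f"F = {f_stat:.2f} with df = ({df_b}, {df_w})")
```

A large F means the group means differ more than the spread inside the groups would suggest, which counts against the null that all means are equal.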
Nominal variables Usually involves categories A nominal variable is often a word or percentage Examples: color, sex, genotypes What are examples of tests that can be used?
Goodness-of-fit test Chi-square
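A chi-square goodness-of-fit test compares observed counts in each category with the counts expected under the null. The sketch below uses made-up counts for a 3:1 expected ratio, and the function names are hypothetical. The p-value helper is only valid for 1 degree of freedom (two categories), where the chi-square tail can be written with the standard library's `math.erfc`; more categories need a chi-square table or a statistics package.

```python
import math


def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 400 offspring, 3:1 ratio expected under the null.
observed = [290, 110]
expected = [300, 100]

chi2, df = chi_square_gof(observed, expected)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
```

With these counts the p-value is well above 0.05, so the null (the data fit the 3:1 ratio) would be accepted.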