Statistics (0.0) IB Diploma Biology
The mean is a measure of the central tendency of a set of data. Calculate the mean using your calculator (sum of values / n) or Excel: =AVERAGE(highlight raw data).
n = sample size. The bigger the better. In this case n = 10 for each group. All values should be centred in the cell, with decimal places consistent with the uncertainty of the measuring tool.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris. Bill length (±0.1 mm)
n       A. colubris   C. latirostris
1       13.0          17.0
2       14.0          18.0
3       15.0          18.0
4       15.0          18.0
5       15.0          19.0
6       16.0          19.0
7       16.0          19.0
8       18.0          20.0
9       18.0          20.0
10      19.0          20.0
Mean
s
The mean is a measure of the central tendency of a set of data. Table 1: Raw measurements of bill length in A. colubris and C. latirostris. Bill length (±0.1 mm), n = 10 per species. Mean: A. colubris 15.9, C. latirostris 18.8.
Table checklist: descriptive table title and number; uncertainties must be included; raw data and the mean need consistent decimal places (in line with the uncertainty of the measuring tool).
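As a cross-check outside Excel, the same means can be computed in a few lines of Python. This is only a sketch: the two lists are an assumed reconstruction of the raw values in Table 1, chosen because they reproduce the reported means of 15.9 and 18.8, and the variable names are illustrative.

```python
from math import isclose
from statistics import mean

# Assumed reconstruction of Table 1 (bill length, ±0.1 mm); illustrative only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# The mean is the sum of the values divided by the sample size n.
mean_a = sum(a_colubris) / len(a_colubris)
mean_c = sum(c_latirostris) / len(c_latirostris)

# statistics.mean plays the same role as Excel's =AVERAGE.
same_as_library = isclose(mean_a, mean(a_colubris)) and isclose(mean_c, mean(c_latirostris))

# Report to one decimal place, matching the ±0.1 mm measuring uncertainty.
print(f"A. colubris mean bill length:    {mean_a:.1f} mm")
print(f"C. latirostris mean bill length: {mean_c:.1f} mm")
```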
Graph checklist: descriptive title, with graph number; labelled data points; y-axis clearly labelled, with uncertainty, and beginning at zero; x-axis labelled.
From the means alone you might conclude that C. latirostris has a longer bill than A. colubris. But the mean only tells part of the story.
Standard deviation is a measure of the spread of most of the data. In Excel: =STDEV(highlight raw data). Standard deviation can have one more decimal place than the raw data.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris (summary rows). Bill length (±0.1 mm), n = 10 per species.
        A. colubris   C. latirostris
Mean    15.9          18.8
s       1.91          1.03
Which of the two sets of data has the longer mean bill length? C. latirostris.
Which has the greater variability in the data? A. colubris.
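The sample standard deviation that =STDEV reports can be sketched in Python as follows. The data lists are an assumed reconstruction of Table 1 (they reproduce the reported s values of 1.91 and 1.03), and `sample_sd` is an illustrative helper name, not something from the original slides.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed reconstruction of Table 1 (bill length, ±0.1 mm); illustrative only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    m = mean(values)
    squared_deviations = sum((v - m) ** 2 for v in values)
    return sqrt(squared_deviations / (len(values) - 1))

sd_a = sample_sd(a_colubris)
sd_c = sample_sd(c_latirostris)

# statistics.stdev uses the same n - 1 formula as the helper above.
matches_library = abs(sd_a - stdev(a_colubris)) < 1e-9 and abs(sd_c - stdev(c_latirostris)) < 1e-9

# Two decimal places: one more than the raw data.
print(f"A. colubris s:    {sd_a:.2f}")
print(f"C. latirostris s: {sd_c:.2f}")
```

The larger s for A. colubris reflects its wider spread of values around the mean.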
Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data. Error bars could represent standard deviation, range or confidence intervals. Which of the two sets of data (A or B) has the higher mean? Which has the greater variability in the data?
The title is adjusted to show the source of the error bars. This is very important. You can see the clear difference in the size of the error bars: variability has been visualised. The error bars overlap somewhat. What does this mean?
The overlap of a set of error bars gives a clue as to the significance of the difference between two sets of data.
Large overlap: lots of data points shared between the two data sets. Results are not likely to be significantly different from each other. Any difference is most likely due to chance.
No overlap: no (or very few) data points shared between the two data sets. Results are more likely to be significantly different from each other. The difference is more likely to be ‘real’.
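To make the idea of overlap concrete, here is a small Python sketch that builds mean ± standard deviation error bars from the reported bill-length summary values (means 15.9 and 18.8, s 1.91 and 1.03) and measures how much they overlap. The helper names `error_bar` and `overlap_width` are illustrative assumptions.

```python
def error_bar(mean, spread):
    """Return the (lower, upper) ends of an error bar of mean ± spread."""
    return (mean - spread, mean + spread)

def overlap_width(bar_1, bar_2):
    """Width of the region shared by two error bars (0.0 if they do not overlap)."""
    shared_low = max(bar_1[0], bar_2[0])
    shared_high = min(bar_1[1], bar_2[1])
    return max(0.0, shared_high - shared_low)

# Reported summary values for bill length (mm): mean and standard deviation.
bar_a = error_bar(15.9, 1.91)   # A. colubris
bar_c = error_bar(18.8, 1.03)   # C. latirostris

width = overlap_width(bar_a, bar_c)
print(f"A. colubris bar:    {bar_a[0]:.2f} to {bar_a[1]:.2f} mm")
print(f"C. latirostris bar: {bar_c[0]:.2f} to {bar_c[1]:.2f} mm")
print(f"Overlap width:      {width:.2f} mm")
```

A small but non-zero overlap like this one is exactly the case where eyeballing the graph is not enough.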
Our results show a very small overlap between the two sets of data. So how do we know if the difference is significant or not? We need to use a statistical test. The t-test is a statistical test that helps us determine the significance of the difference between the means of two sets of data.
Excel can jump straight to a value of P for our results. One function (=TTEST) compares both sets of data. Because it calculates P directly (the probability of seeing a difference at least this large by chance alone if there were no real difference), we can determine significance directly. In this case, P = 0.00051. This is much smaller than 0.05, the conventional significance threshold, so we are confident that we can reject H0, the null hypothesis that there is no difference between the means. The difference is unlikely to be due to chance. Conclusion: There is a significant difference in bill length between A. colubris and C. latirostris.
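A rough Python equivalent of the spreadsheet t-test is sketched below. It rests on several assumptions: the test is two-tailed with equal (pooled) variances, which may not be the options used in the original spreadsheet; the data lists are an assumed reconstruction of Table 1; and `t_pdf`, `two_tailed_p` and `pooled_t_test` are illustrative helper names. P is found by integrating the t-distribution numerically (Simpson's rule) using only the standard library.

```python
import math
from statistics import mean, variance

# Assumed reconstruction of Table 1 (bill length, ±0.1 mm); illustrative only.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """Two-tailed P: area outside [-|t|, |t|], via Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # the density is symmetric about zero
    return 1 - central_area

def pooled_t_test(a, b):
    """Two-sample t-test assuming equal variances; returns (t, df, P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, df, two_tailed_p(t, df)

t, df, p = pooled_t_test(a_colubris, c_latirostris)
print(f"t = {t:.2f}, df = {df}, P = {p:.5f}")
print("Reject H0" if p < 0.05 else "Do not reject H0")
```

With these assumed values the sketch gives |t| of about 4.2 with 18 degrees of freedom, and a P value far below 0.05.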
95% confidence intervals can also be plotted as error bars. In Excel: =CONFIDENCE.NORM(0.05,stdev,samplesize), e.g. =CONFIDENCE.NORM(0.05,C15,10). These give a clearer indication of the significance of a result: where there is overlap, there is not a significant difference; where there is no overlap, there is a significant difference. The graph for these results shows no overlap. If the overlap (or difference) is small, a t-test should still be carried out.
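For readers without Excel, the margin that =CONFIDENCE.NORM returns can be sketched in Python as z × s / √n, where z is the normal critical value for the chosen alpha. The function name `confidence_norm` is an illustrative assumption; the inputs are the reported summary values (means 15.9 and 18.8, s 1.91 and 1.03, n = 10).

```python
from math import sqrt
from statistics import NormalDist

def confidence_norm(alpha, stdev, sample_size):
    """Half-width of a normal-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * stdev / sqrt(sample_size)

# Reported summary values for bill length (mm).
margin_a = confidence_norm(0.05, 1.91, 10)   # A. colubris
margin_c = confidence_norm(0.05, 1.03, 10)   # C. latirostris

ci_a = (15.9 - margin_a, 15.9 + margin_a)
ci_c = (18.8 - margin_c, 18.8 + margin_c)

# The intervals overlap only if the lower one reaches the start of the upper one.
intervals_overlap = ci_a[1] >= ci_c[0]

print(f"A. colubris 95% CI:    {ci_a[0]:.2f} to {ci_a[1]:.2f} mm")
print(f"C. latirostris 95% CI: {ci_c[0]:.2f} to {ci_c[1]:.2f} mm")
print("Overlap" if intervals_overlap else "No overlap")
```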
Interesting Study: Do “Better” Lecturers Cause More Learning? Students watched a one-minute video of a lecture. In one video, the lecturer was fluent and engaging. In the other video, the lecturer was less fluent. They predicted how much they would learn on the topic (genetics) and this was compared to their actual score. (Error bars = standard deviation). Is there a significant difference in the actual learning? Find out more here: http://priceonomics.com/is-this-why-ted-talks-seem-so-convincing/
From MrT’s Excel Statbook.
Diabetes and obesity are ‘risk factors’ of each other. There is a strong correlation between them, but does this mean one causes the other? http://diabetes-obesity.findthedata.org/b/240/Correlations-between-diabetes-obesity-and-physical-activity
Correlation does not imply causality. Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of confounding variables: conditions that both correlated variables have in common, even though the two variables do not directly affect each other.
To be able to determine causality through experimentation we need: (1) one clearly identified independent variable; (2) carefully measured dependent variable(s) that can be attributed to change in the independent variable; (3) strict control of all other variables that might have a measurable impact on the dependent variable. We also need sufficient relevant, repeatable and statistically significant data.
Some known causal relationships: atmospheric CO2 concentrations and global warming; atmospheric CO2 concentrations and the rate of photosynthesis; temperature and enzyme activity.
Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming
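A confounding variable is easy to demonstrate with simulated numbers. In the sketch below, two hypothetical variables x and y are each driven by a shared confounder plus their own random noise; neither affects the other, yet they correlate strongly. All names and noise levels are assumptions made up for illustration, not data from any study.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

random.seed(1)  # fixed seed so the simulation is repeatable

# The hidden confounder influences both variables.
confounder = [random.gauss(0, 1) for _ in range(1000)]

# x and y each depend on the confounder, never on each other.
x = [z + random.gauss(0, 0.5) for z in confounder]
y = [z + random.gauss(0, 0.5) for z in confounder]

r = pearson_r(x, y)
print(f"Correlation between x and y: r = {r:.2f}")
```

The strong correlation here comes entirely from the shared confounder, which is why a controlled experiment is needed before claiming that one variable causes the other.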
Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there." Cartoon from: http://www.xkcd.com/552/
Bibliography / Acknowledgments