Answering questions about life with statistics!
The results of many investigations in biology are collected as numbers known as _____________________ data. These numbers can often be better interpreted using statistics such as ______________________ and _______________________. You must use statistical analyses in the Data Collection and Processing sections of your lab reports: simply graphing data is not considered “processing” by IB standards.
Example: In an investigation of the heights of the blades of grass in a field, you measure blades of grass with a ruler. The results are a _____________________________________________. These values must be handled appropriately to give us useful information. The values have ______________ (e.g. the height of a particular blade of grass is 25 __?_______). The values have _______________________ (e.g. measuring with a ruler might only be precise to ±1 mm, so the height of a particular blade might be 25 ±1 mm. In this case, it could have been ____________ mm, or _________ mm, or any value in between these, but not more or less than these). We also express the precision of our values in the number of decimal places that we choose. For example, stating that a blade of grass is 25 mm long is not the same as stating that it is 25.0 mm long. In the first case, it is implicit that the blade could be between ___________ and ____________ mm long. In the second case, the blade could be between _____________ and _______________ mm long.
In collecting data, we should:
· ______________________________________________________________
· _______________________________________________________________
· ______________________________________________________________
Biologists usually take a sample
Sometimes, it is possible to measure all of the things that are being considered, for example, the heights of all the oak trees in a very small forest. In statistics, all the values that could be considered are called the __________________. In most investigations, it is not possible, not practical, or not advisable to measure all the values in a population. In these situations, we measure just some of the values, known as a _____________________________.
Example. The heights of the blades of grass differ from one another, even within the same field. Suppose we compare two fields, one with high soil potassium and one with low soil potassium. We could measure the height of one blade of grass from each of the two fields (a very small sample) and find that the blade of grass from the field with high potassium is longer than the blade of grass from the field with low potassium. However, we would still be unsure whether the difference between the heights of these two blades is due to the field they came from, or is just a difference that occurs anyway within each field. We could instead measure the heights of 500 blades of grass in each of the two fields (a larger sample). It is difficult for the human mind to extract useful information from such a large amount of unprocessed data. However, by using statistics, we can describe the values in ways that make the information more meaningful. Most often, we process the data to estimate an average and to describe the variation in some way.
Populations and samples show variation
In a population, we usually find that not all the values are identical. Instead, there are differences between the values even within a population. We call this ___________________.
The mean: _____________________________________________________. We estimate the mean as follows:
Example. We have measured the heights of 500 blades of grass from each field. The mean height of the grass in the sample from the high-potassium field is 56.2 mm, and the mean height of the grass in the sample from the low-potassium field is 48.5 mm. Do the data show that the difference between the means of our samples really represents a difference between the two populations, or could it be due to random variation within the populations? ______________________________________ ___________________________________________________________________
To help decide this, we need to examine:
The range: __________________________________________________
Range = ____________________________________________________________
The standard deviation: ________________________________________ _____________________________________________________________
Sx = ________________________________________________
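As a rough sketch of how the mean, range and standard deviation described above could be calculated, here is a short Python example. The helper names (mean, value_range, sample_sd) and the five heights are hypothetical, chosen only for illustration; they are not data from the investigation. The standard deviation divides by n − 1, matching the sample standard deviation Sx.

```python
import math

def mean(values):
    """Mean: the sum of all the values divided by how many there are (n)."""
    return sum(values) / len(values)

def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)

def sample_sd(values):
    """Sample standard deviation Sx, dividing by n - 1."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

# Hypothetical blade heights (mm), invented for illustration only
heights = [48, 51, 53, 54, 59]

print("mean =", mean(heights))          # 53.0
print("range =", value_range(heights))  # 11
print("Sx =", round(sample_sd(heights), 2))  # 4.06
```

Here the squared deviations from the mean add up to 66, so Sx is the square root of 66 / 4.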
Error bars
In many charts and graphs, we show the mean values of our samples. Such a chart or graph can clearly show the differences between our conditions, and trends may become apparent. Consider the examples shown on page 3 in the Heinemann textbook. What trends and relationships do these show?
An error bar is _________________________________________________________ _______________________________________________________________________
Comparison of graphs: Note that the standard deviation graph has removed the extremes of variation, called ____________________________. The range graph, with its extreme values, may mislead us into thinking that the data are similar.
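The difference between the two kinds of error bar can be sketched in Python. This assumes the standard deviation bar spans one Sx either side of the mean and the range bar spans the smallest to the largest value; the function name and the sample values, including the extreme value 22, are hypothetical.

```python
import statistics

def error_bars(values):
    """Return the mean, a +/- 1 SD bar and a range bar for one sample."""
    m = statistics.mean(values)
    sx = statistics.stdev(values)           # sample SD (divides by n - 1)
    sd_bar = (m - sx, m + sx)               # mean - Sx to mean + Sx
    range_bar = (min(values), max(values))  # smallest to largest value
    return m, sd_bar, range_bar

# Hypothetical sample containing one extreme value (22)
sample = [10, 12, 12, 14, 22]
m, sd_bar, range_bar = error_bars(sample)
print("mean:", m)
print("SD bar:", tuple(round(v, 2) for v in sd_bar))
print("range bar:", range_bar)
```

The single extreme value stretches the range bar all the way to 22, while the SD bar (about 9.3 to 18.7) stays closer to where most of the values lie.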
Example. We have measured the lengths of 969 blades of grass in a field with a medium level of soil phosphorus. We have then grouped the data into a table, which shows how many blades of grass had a particular length. We call this kind of table a frequency distribution table.
[Table: frequency distribution, Length of blade of grass (mm) against Number of blades of grass; values not recoverable]
The normal distribution
Very often in biology, the variation that we find in our samples follows a so-called normal distribution (sometimes also called a bell curve). Most of the values, ______________%, are quite close to the mean; rather fewer are somewhat greater or somewhat less than the mean; and just a few, ________________%, are much greater or much less than the mean.
Calculate the mean, mode (the value that appears most frequently), median (the value with half the data above it and half below it), n (the sample size) and the range of the data below. Graph the data on a separate piece of graph paper.
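The worksheet's frequency table did not survive, so the sketch below uses a small hypothetical table to show one way of getting n, the mean, mode, median and range from frequency data. It assumes each row is a single length value (for grouped classes you would use each class midpoint instead); the counts are invented and are not the 969 measurements.

```python
import statistics

# Hypothetical frequency table: blade length (mm) -> number of blades.
# Invented for illustration; these are not the worksheet's data.
frequency_table = {10: 2, 20: 5, 30: 9, 40: 5, 50: 2}

# Expand the table back into one value per blade of grass
lengths = [length
           for length, count in frequency_table.items()
           for _ in range(count)]

n = len(lengths)
mean_length = statistics.mean(lengths)
mode_length = statistics.mode(lengths)      # most frequent value
median_length = statistics.median(lengths)  # middle value once sorted
range_length = max(lengths) - min(lengths)

print(n, mean_length, mode_length, median_length, range_length)
```

Because this made-up table is symmetric, the mean, mode and median all come out the same, which is also a feature of a normal distribution.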
Some special characteristics of the normal distribution
· _______________________________________________________________
· ____________________________________________________________________
· _____________________________________________________________________
· __________________________________________________________________
· ____________________________________________________________________
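For a normal distribution, the share of values lying within a given number of standard deviations of the mean is fixed. The sketch below uses Python's statistics.NormalDist (Python 3.8 or later) to compute those shares for a standard normal curve; it illustrates the shape of the bell curve and does not test any particular data set. The function name percent_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def percent_within(k):
    """Percentage of a normal distribution lying within k SDs of the mean."""
    return (standard_normal.cdf(k) - standard_normal.cdf(-k)) * 100

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {percent_within(k):.1f}%")
```

This prints about 68.3% within 1 SD, 95.4% within 2 SD and 99.7% within 3 SD of the mean.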
Calculating standard deviation with the formula and with the TI-84 graphing calculator!
Calculate Sx for the following data points: x = 2, 3, 4 (no calculator needed).
Let’s calculate Sx for the heights in our class in centimeters (hint: 1 inch = 2.54 cm) using the TI-83/84: STAT, EDIT, ENTER; enter the data in L1, pressing ENTER after each value; then STAT, move to CALC, choose 1-Var Stats, ENTER. Ta da!
click4biology.info/c4b/1/stat1.htm
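The hand calculation for x = 2, 3, 4 can be checked with a few lines of Python. This is a sketch, not a replacement for the TI-84 steps: it applies the sample standard deviation formula directly and compares the result with statistics.stdev from Python's standard library, which also divides by n − 1 (the same Sx that 1-Var Stats reports).

```python
import math
import statistics

data = [2, 3, 4]
n = len(data)
mean = sum(data) / n                             # (2 + 3 + 4) / 3 = 3.0
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # 1 + 0 + 1 = 2.0
sx = math.sqrt(sum_sq_dev / (n - 1))             # square root of 2 / 2

print(sx)                      # 1.0
print(statistics.stdev(data))  # library check, also divides by n - 1
```

Class heights recorded in inches would first be multiplied by 2.54 to convert them to centimeters before running the same calculation.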
Example. In an investigation of the two fields, we found that the mean height of the blades of grass in the sample from the high-potassium field was 56.2 mm and the mean height of the blades of grass in the sample from the low-potassium field was 48.5 mm. We have thus found a difference between the means of our two samples. However, we need to determine whether this difference is significant; that is, does it indicate a true difference between the means of the populations of the two fields?
Using the standard deviation to indicate possible significance
A ___________ difference between the means of samples, together with __________ standard deviations for these samples, indicates that the difference between the means is likely to be statistically significant. A ___________ difference between the means of samples, together with _________ standard deviations for these samples, indicates that the difference between these means is likely not to be statistically significant.
Confidence levels
It is seldom possible to say with complete certainty (100% confidence) that the difference between sample means is significant. Most often in biology, we decide that we want to be ________% confident that the difference between the samples is significant. This means that there is only a _______% chance that the samples could be as different as they are because of chance, and not because of a real difference between the populations. We could also say that the probability (p) that chance alone produced the difference between our sample means is 5% or less (p ≤ 0.05).
Standard deviation is: ______________________________________________ __________________________________________________________________
To calculate Sx with your TI-83/84, go to the following website for step-by-step directions. Eventually, you must be able to perform this function yourself.
The t-test
The t-test determines whether the difference observed between the means of two samples is significant at a chosen confidence level (here p = 0.05). It takes into account both the means and the standard deviations (Sx) of the samples. Briefly, the test works as follows:
· A value of t is calculated from the data with your TI-83/84.
· If the calculated value for t is _________________ the required value for t, the difference between the means is _______________________ at this confidence level (0.05).
· If the calculated value for t is smaller than the required value for t, the difference between the means is _________________________ at this confidence level.
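The decision rule can be written as a tiny Python function. It assumes a two-tailed test, so the sign of t is ignored (the calculator may report a negative t when the second mean is larger), and it treats a calculated t equal to the critical value as significant, which is a common convention. The function name and the example t values are hypothetical; 2.048 is the two-tailed critical value for p = 0.05 and 28 degrees of freedom from a standard t-table.

```python
def is_significant(t_calculated, t_critical):
    """Return True if |t| reaches the critical value from the t-table.

    Assumes a two-tailed test, so only the size of t matters, not its sign.
    """
    return abs(t_calculated) >= t_critical

# Hypothetical calculated t values, compared with the p = 0.05 critical
# value for 28 degrees of freedom
print(is_significant(2.6, 2.048))   # True: the difference is significant
print(is_significant(-1.3, 2.048))  # False: the difference is not significant
```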
It is VERY unlikely that the mean heights of our two samples will be exactly the same.
C1 sample: average height = 162 cm
D8 sample: average height = 168 cm
Is the difference in average height of the samples large enough to be significant?
We can analyse the spread of the heights of the students in the samples by drawing histograms.
Here, the ranges of the two samples have a small overlap, so the difference between the means of the two samples IS probably significant.
[Figure: histograms of frequency against height (cm) for the C1 sample and the D8 sample]
Here, the ranges of the two samples have a large overlap, so the difference between the two samples may NOT be significant. The difference in means is possibly due to random sampling error.
[Figure: histograms of frequency against height (cm) for the C1 sample and the D8 sample]
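Whether two sample ranges overlap, and by how much, is easy to check in code. This is only a sketch of the rough visual check described above, not a significance test; the function name and the heights are hypothetical.

```python
def range_overlap(sample_a, sample_b):
    """Width of the overlap between the two samples' ranges (0 if none)."""
    low = max(min(sample_a), min(sample_b))
    high = min(max(sample_a), max(sample_b))
    return max(0, high - low)

# Hypothetical student heights (cm)
c1_heights = [150, 155, 160, 165, 170]
d8_heights = [165, 170, 175, 180]

print(range_overlap(c1_heights, d8_heights))  # 5 (both ranges cover 165 to 170 cm)
```

A small overlap hints that the difference may be significant, but only the t-test gives a decision at a stated confidence level.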
To decide whether there is a significant difference between two samples, we must compare the mean height for each sample and the spread of heights in each sample. Statisticians calculate the standard deviation of a sample as a measure of its spread. You can calculate standard deviation using the formula:
$$S_x = \sqrt{\frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1}}$$
Where:
$S_x$ is the standard deviation of the sample
Σ stands for ‘sum of’
$x$ stands for the individual measurements in the sample
$n$ is the number of individuals in the sample
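The formula above is the "shortcut" form of the sample standard deviation; it gives the same result as summing the squared deviations from the mean and dividing by n − 1. The sketch below computes Sx both ways so the two can be compared; the function names and the four heights are hypothetical.

```python
import math

def sx_shortcut(values):
    """Sx = sqrt((sum of x^2 - (sum of x)^2 / n) / (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    return math.sqrt((sum_x_squared - sum_x ** 2 / n) / (n - 1))

def sx_from_mean(values):
    """Sx = sqrt(sum of (x - mean)^2 / (n - 1))."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

# Hypothetical student heights (cm)
heights = [158, 162, 165, 170]
print(round(sx_shortcut(heights), 3), round(sx_from_mean(heights), 3))
```

Both versions agree here. With very large values the shortcut form can lose precision in floating-point arithmetic, which is one reason software often uses the mean-deviation form.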
Student’s t-test
The Student’s t-test compares the averages and standard deviations of two samples to see whether there is a significant difference between them. We start by calculating a number, t, using the equation:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}$$
Where:
$\bar{x}_1$ is the mean of sample 1
$s_1$ is the standard deviation of sample 1
$n_1$ is the number of individuals in sample 1
$\bar{x}_2$ is the mean of sample 2
$s_2$ is the standard deviation of sample 2
$n_2$ is the number of individuals in sample 2
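Here is a minimal Python sketch of the t equation above. It uses statistics.mean and statistics.variance (the sample variance, s squared, with n − 1). The function name and the two small samples are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math
import statistics

def t_statistic(sample_1, sample_2):
    """t = (mean1 - mean2) / sqrt(s1^2 / n1 + s2^2 / n2)."""
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1 = statistics.variance(sample_1)  # s1 squared
    var_2 = statistics.variance(sample_2)  # s2 squared
    n_1, n_2 = len(sample_1), len(sample_2)
    return (mean_1 - mean_2) / math.sqrt(var_1 / n_1 + var_2 / n_2)

# Hypothetical heights (cm): means 162 and 168, each with s^2 = 4
sample_1 = [160, 162, 164]
sample_2 = [166, 168, 170]
print(round(t_statistic(sample_1, sample_2), 3))
```

A negative t only means that the first mean is smaller than the second. The size of t is what gets compared with the table.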
Worked example: Random samples were taken of pupils in C1 and D8. Their recorded heights are shown below…
[Table: Student and Height (cm) for students in C1 and students in D8; values not recoverable]
Step 1: Add the data to List 1 and List 2 on your TI-83+: STAT, EDIT, ENTER; put the data into L1 and L2.
Step 2: Use the two-sample t-test to calculate t from the means and standard deviations of the two sets of data: STAT, move to TESTS, choose 2-SampTTest (#4), scroll down to Calculate, ENTER.
Step 3: Calculate the degrees of freedom:
$$\text{degrees of freedom} = n_1 + n_2 - 2 = 28$$
Step 4: Find the critical value of t for the relevant number of degrees of freedom using the Biology t-table (page 7). Use the 95% (p = 0.05) confidence level.
Critical value = ____________
Our calculated value of t is below the critical value for 28 d.f.; therefore, there is no significant difference between the mean heights of the students in the samples from C1 and D8.
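Because the worked example's height data did not survive, the sketch below runs Steps 2 to 4 on hypothetical samples of 15 students each, so that n₁ + n₂ − 2 = 28 as in the example. It uses the t equation from the Student’s t-test section and the two-tailed critical value 2.048 for p = 0.05 and 28 degrees of freedom from a standard t-table. The numbers are invented, so the result is not the worksheet's.

```python
import math
import statistics

T_CRITICAL_28_DF = 2.048  # two-tailed, p = 0.05, 28 d.f. (standard t-table)

c1 = [155, 160, 165] * 5  # hypothetical heights (cm), n = 15
d8 = [158, 163, 168] * 5  # hypothetical heights (cm), n = 15

# Step 3: degrees of freedom
n1, n2 = len(c1), len(d8)
degrees_of_freedom = n1 + n2 - 2  # 15 + 15 - 2 = 28

# Step 2: calculate t from the means and sample variances
mean1, mean2 = statistics.mean(c1), statistics.mean(d8)
var1, var2 = statistics.variance(c1), statistics.variance(d8)
t = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Step 4: compare |t| with the critical value
significant = abs(t) >= T_CRITICAL_28_DF
print(degrees_of_freedom, round(t, 3), significant)
```

With equal sample sizes, as here, this t is the same as the pooled two-sample t. With unequal sample sizes the two versions can differ slightly.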
Example. The number of hours a week that a group of students spent training was compared with the fastest speed at which they could run. A trend was found: the more time a student spent training, the faster they could run.
Example. The number of cigarettes that a group of students smoked each week was compared with the speed at which they could run. A trend was found: the more cigarettes a student smoked each week, the slower they could run.
Example. The speed at which a group of students could solve mathematical tasks was compared with the speed at which they could run. No trend was found: as the speed of solving mathematical tasks increased, there was no consistent change in running speed.
If an increase in one factor is associated with an increase in another factor, we say that there is a _____________________ correlation. If an increase in one factor is associated with a decrease in another factor, we say that there is a ________________________ correlation. If an increase in a factor is not associated with a consistent change in another factor, we say that there is _____________ correlation between them.
Correlation & Causation
* Correlation does NOT necessarily indicate causation.
* Why not? __________________________________________________
* ______________________________________________________________________
For example: _____________________________________________ __________________________________________________________