STATISTICS!!! The science of data
What is data? “Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions.” (Encarta dictionary)
Statistics in Science: Data can be collected about a population (surveys). Data can be collected about a process (experimentation).
2 types of Data: *Qualitative *Quantitative
Qualitative Data: Information that relates to characteristics or description (observable qualities). Information is often grouped by descriptive category. Examples: –Species of plant –Type of insect –Shades of color –Rank of flavor in taste testing. Remember: qualitative data can be “scored” and evaluated numerically.
Qualitative data, manipulated numerically: [Chart: survey results, teens and need for environmental action]
Quantitative data: measured using a naturally occurring numerical scale. Examples: –Chemical concentration –Temperature –Length –Weight…etc.
Quantitation: Measurements are often displayed graphically.
Quantitation = Measurement: In data collection for Biology, data must be measured carefully, using laboratory equipment (ex. timers, metre sticks, pH meters, balances, pipettes, etc.). The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm? For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!
How to determine uncertainty? Usually the instrument manufacturer will indicate this, so read what is provided by the manufacturer. Be sure that the number of significant digits in the data table/graph reflects the precision of the instrument used (for ex. if the manufacturer states that a balance reads to 0.1 g, and your average mass is 2.06 g, round the average to 2.1 g). Your data must be consistent with your measurement tool regarding significant figures.
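A minimal sketch of that rounding step, assuming the instrument's resolution is a power of ten written as a string (e.g. "0.1"). The helper name `round_to_resolution` is hypothetical. It uses Python's `decimal` module with `ROUND_HALF_UP`, the "round halves up" rule usually taught in school; the built-in `round()` works on binary floats and rounds halves to the nearest even digit, which can give surprising results.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_resolution(value: str, resolution: str) -> Decimal:
    """Round a measured value to the resolution of the instrument.

    Hypothetical helper: `resolution` is assumed to be a power of ten such as
    "1", "0.1" or "0.01"; quantize() keeps that many decimal places.
    """
    return Decimal(value).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)


# Balance reads to 0.1 g, calculated average mass is 2.06 g
print(round_to_resolution("2.06", "0.1"))  # 2.1
```

Passing the values as strings keeps them exact, so the rounding matches what you would do by hand.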
Finding the limits: As a “rule-of-thumb”, if not specified, use +/- 1/2 of the smallest measurement unit (ex. a metric ruler is lined to 1 mm, so the limit of uncertainty of the ruler is +/- 0.5 mm). If the room temperature is read as 25 degrees C, with a thermometer that is scored at 1 degree intervals, what is the range of possible temperatures for the room? (+/- 0.5 degrees Celsius: if you read 25 °C, it may in fact be anywhere from 24.5 to 25.5 degrees.)
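The half-division rule of thumb can be written as a tiny function. This is a sketch only; `reading_limits` is a hypothetical name, and it applies the rule above rather than any manufacturer's stated uncertainty.

```python
def reading_limits(reading: float, smallest_division: float) -> tuple[float, float]:
    """Lowest and highest true values consistent with a single reading.

    Applies the rule of thumb: uncertainty = +/- half the smallest division.
    """
    uncertainty = smallest_division / 2
    return reading - uncertainty, reading + uncertainty


print(reading_limits(25, 1))  # (24.5, 25.5): thermometer scored every 1 degree C
```

The same call with a 1 mm ruler, e.g. `reading_limits(12, 1)`, gives the +/- 0.5 mm limits described above.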
-Stephen Taylor
Looking at Data: How accurate are the data? (How close are the data to the “real” results?) How precise are the data? (How close are repeated measurements to one another? All test systems have some uncertainty, due to limits of measurement.) Estimation of the limits of the experimental uncertainty is essential.
Quick Review – 3 measures of “Central Tendency”: mode: the value that appears most frequently. median: when all data are listed from least to greatest, the value at which half of the observations are greater, and half are lesser. mean: the most commonly used measure of central tendency is the mean, or arithmetic average (sum of data points divided by the number of points).
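All three measures are available in Python's standard `statistics` module. The data below are hypothetical, chosen so that the mean differs from the median and mode. With an even number of points, `statistics.median` returns the average of the two middle values.

```python
import statistics

data = [4, 7, 7, 9, 13]  # hypothetical measurements

print("mode:", statistics.mode(data))      # 7 (appears twice)
print("median:", statistics.median(data))  # 7 (middle value of the sorted list)
print("mean:", statistics.mean(data))      # 8 (40 / 5)
```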
Do not calculate a mean from values that are already averages. Do not calculate a mean when the measurement scale is not linear (e.g. pH, which is a logarithmic scale).
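To see why the pH warning matters: pH is −log10 of the hydrogen-ion concentration, so each step of 1 pH unit is a ten-fold change in [H+]. The sketch below uses two hypothetical readings and compares the plain average of the pH numbers with the pH of the average [H+]. It only illustrates the distortion; it is not offered as the recommended way to report pH data.

```python
import math

ph_readings = [3.0, 5.0]  # hypothetical pH readings

# Averaging the pH numbers directly treats the log scale as if it were linear
naive_mean_ph = sum(ph_readings) / len(ph_readings)

# Convert back to hydrogen-ion concentration (mol/L), average, then convert to pH
h_ion = [10 ** -ph for ph in ph_readings]
mean_h_ion = sum(h_ion) / len(h_ion)
ph_of_mean_h_ion = -math.log10(mean_h_ion)

print(naive_mean_ph)               # 4.0
print(round(ph_of_mean_h_ion, 2))  # 3.3
```

The two answers differ by about 0.7 pH units, which is a five-fold difference in [H+].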
Comparing Averages: Once an average is calculated for each of the 2 sets of data, the average values can be plotted together on a graph to visualize the relationship between the 2.
Drawing error bars: The simplest way to draw an error bar is to use the mean as the central point, and to use the distance from the mean to the measurement that is farthest from it as the length of each arm of the bar (use with fewer than 5 data points). The RANGE, the difference between the smallest and largest measurements of a sample, provides a sense of the variation of the sample.
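The maximum-deviation error bar described above can be sketched as follows, using Sample #1 from the next slide as input. The function name `max_deviation_error_bar` is hypothetical.

```python
def max_deviation_error_bar(data):
    """Return (lower end, mean, upper end) of a max-deviation error bar."""
    mean = sum(data) / len(data)
    # Arm length = distance from the mean to the point farthest from it
    arm = max(abs(x - mean) for x in data)
    return mean - arm, mean, mean + arm


print(max_deviation_error_bar([25, 35, 32, 28]))  # (25.0, 30.0, 35.0)
```

Both arms have the same length, so the bar is symmetric about the mean even when the data are not.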
[Figure: error bar labelled with the average value, the value farthest from the average, and the calculated distance between them]
What do error bars suggest? If the bars show extensive overlap, it is likely that there is not a significant difference between those values.
Sample #1: 25, 35, 32, 28. Sample #2: 15, 75, 10, 20. Find the mean of each sample.
These samples have the same mean, but are still very different. How different? Use range. Sample #1 25, 35, 32, 28 Sample #2 15, 75, 10, 20
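A quick check of the two samples: both means come out the same, while the ranges are very different. `mean_and_range` is a hypothetical helper name.

```python
def mean_and_range(data):
    """Return the arithmetic mean and the range (largest minus smallest)."""
    return sum(data) / len(data), max(data) - min(data)


samples = {
    "Sample #1": [25, 35, 32, 28],
    "Sample #2": [15, 75, 10, 20],
}
for name, values in samples.items():
    mean, spread = mean_and_range(values)
    print(f"{name}: mean = {mean}, range = {spread}")
```

This prints a mean of 30.0 for both samples, with a range of 10 for Sample #1 and 65 for Sample #2.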
How can leaf lengths be displayed graphically?
Simply measure the length of each leaf and plot how many leaves there are of each length (a histogram).
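The counting step behind such a histogram can be sketched with `collections.Counter`. The leaf lengths are hypothetical, and grouping to the nearest whole centimetre is an assumed bin width. Python's `round()` rounds exact halves to the nearest even number; none of these values end in .5.

```python
from collections import Counter

# Hypothetical leaf lengths in cm (not data from this document)
leaf_lengths_cm = [4.2, 5.1, 4.8, 5.4, 6.0, 5.0, 4.9, 5.6, 3.7, 5.2]


def length_counts(lengths):
    """Count how many leaves fall at each length, rounded to the nearest cm."""
    counts = Counter(round(x) for x in lengths)
    return dict(sorted(counts.items()))


for length, n in length_counts(leaf_lengths_cm).items():
    print(f"{length} cm | {'#' * n}")
```

The text bars (2 leaves at 4 cm, 6 at 5 cm, 2 at 6 cm) already hint at the peaked shape shown next.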
If smoothed, the histogram data assumes this shape
This Shape? It is a classic bell-shaped curve, AKA a Gaussian Distribution Curve, AKA a Normal Distribution curve. Essentially it means that, in data that follow this distribution, most results tend to be near the mean, and fewer results are found farther from the mean.
Standard Deviation: The standard deviation is a statistic that tells you how tightly all the various data points are clustered around the mean in a set of data.
Standard deviation: The STANDARD DEVIATION is a more sophisticated indicator of the precision of a set of measurements. –The standard deviation is like an average deviation of measurement values from the mean. With larger data sets (5 or more data points), the standard deviation is used to draw error bars, instead of the maximum deviation.
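A sketch of a standard-deviation error bar (mean ± 1 SD) for a data set with 5 or more points. It assumes the sample standard deviation (n − 1 in the denominator) from Python's `statistics.stdev`; the function name and the six data values are hypothetical.

```python
import statistics


def sd_error_bar(data):
    """Return (mean - SD, mean, mean + SD) for an error bar.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return mean - sd, mean, mean + sd


# Hypothetical data set with 6 points
lower, centre, upper = sd_error_bar([12.0, 15.0, 14.0, 13.0, 16.0, 14.0])
print(f"{lower:.2f} to {upper:.2f} around a mean of {centre:.1f}")
```

For these values the mean is 14.0 and the SD is √2 ≈ 1.41, so the bar runs from about 12.59 to 15.41.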
A typical normal distribution curve
According to this curve: One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group. Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.
Three Standard Deviations? Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data.
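These percentages come from the normal curve itself: the fraction of a normal distribution lying within k standard deviations of the mean equals erf(k/√2). A quick check with the standard-library `math.erf` (the helper name `fraction_within` is our own):

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The three lines come out at about 68.3%, 95.4% and 99.7%. These hold only for data that are actually normally distributed.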
How is Standard Deviation calculated? With this formula! You DO NOT need to memorize the formula. It can be calculated on a scientific calculator OR… in Microsoft Excel.
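The formula image from the original slide is not recoverable, so the exact version it showed is unknown. For reference, the usual sample standard deviation is given below, where x̄ is the mean, xᵢ is each data point and n is the number of points. This n − 1 form is what Excel's `STDEV.S` (and the older `STDEV`) computes; Excel's `STDEV.P` and a calculator's population key (σ) divide by n instead.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```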
You DO need to know the concept! Standard deviation is a statistic that tells how tightly all the various data points are clustered around the mean in a set of data. When the data points are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller SD). When the data points are spread apart and the bell curve is relatively flat, a large standard deviation value suggests less precise results.
Given the set of numbers {20.0 mL, 23.0 mL, 25.0 mL, 26.0 mL, 25.0 mL}, calculate the mean and the standard deviation using your calculator.
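To check a calculator answer, here is the same calculation with Python's standard `statistics` module. `statistics.stdev` uses the n − 1 (sample) formula, which usually corresponds to the key labelled s, sx or σn−1 on a scientific calculator; `statistics.pstdev` divides by n instead.

```python
import statistics

volumes_ml = [20.0, 23.0, 25.0, 26.0, 25.0]

mean = statistics.mean(volumes_ml)
sample_sd = statistics.stdev(volumes_ml)       # divides by n - 1
population_sd = statistics.pstdev(volumes_ml)  # divides by n

print(f"mean = {mean:.1f} mL")                     # mean = 23.8 mL
print(f"sample SD = {sample_sd:.2f} mL")           # sample SD = 2.39 mL
print(f"population SD = {population_sd:.2f} mL")   # population SD = 2.14 mL
```

If your calculator gives 2.14 mL rather than 2.39 mL, it is using the population (divide by n) version.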
Now let's look at how standard deviation can be used to help us decide whether the difference between two means is likely to be significant. Thirty teenage boys measured the length of their left and right hands to find out whether they are different.
Hand | Mean length | SD
left | … mm | 11.0 mm
right | … mm | 10.9 mm
Because the SDs are much greater than the difference in mean length, it is very unlikely that the difference in mean length between left and right hands is significant.
The same thirty boys also measured the length of their right foot to find out whether it was different from their hand lengths.
Appendage | Mean length | SD
right hand | … mm | 10.9 mm
right foot | … mm | 14.3 mm
Because the SDs are much less than the difference in mean length, it is very likely that the difference in mean length between right hands and right feet is significant.
C. Results significantly different?
f. Results significantly different? NO