The Data Collection and Statistical Analysis in IB Biology John Gasparini The Munich International School Part II – Basic Stats, Standard Deviation and Variability

Remember our two species of butterflies?
Smooth-Banded Sister (Adelpha cytherea)
Spot Celled Sister (Adelpha basiloides)
These are closely related species from the Nymphalidae family. Both are found in the tropics of Central America and both feed on the nectar of flowers.
Research Question: "Is there a significant difference in proboscis length and body mass between A. basiloides and A. cytherea?"
Image sources: http://www.zipcodezoo.com/hp350/Adelpha_basiloides_0.jpg http://4.bp.blogspot.com/-M8r6K-ZeMas/TWWe7aOgiWI/AAAAAAAAAP8/ck9cmUdqfas/s1600/Adelpha_cytherea_ButterflyPhotography-BB_Blogspot_JGJ.jpg

Imagine that you have collected data on the proboscis length and body mass of our two butterfly species. Record it properly. You must be neat to reduce problems later!
Give the raw data tables proper titles.
Include uncertainties!
Be consistent in your number of decimal places.
Don’t record more decimal places than the sensitivity of your instrument allows.

Imagine that you have collected data on the proboscis length and body mass of our two butterfly species. Record it properly. You must be neat to reduce problems later! What is the number of butterflies sampled for each species? What is the total number of butterflies sampled?

What is the number of butterflies sampled for each species? n = 15
What is the total number of butterflies sampled? Total sampled in both species = 30

Now that we have recorded our raw data in an organized fashion, it is time to calculate some basic statistics for our datasets… We’ll start with three "Measurements of Central Tendency":
1) Mode
2) Median
3) Mean (average)
Each is a summary score that tries in some way to represent a set of scores. It is a single score generated from a dataset that in some way is typical of the distribution of scores. Fancy name. Don’t get caught up in it. These are easy stats and you know most of them already.

Mode: This is the score or value that occurs most frequently in a dataset. What is the Mode of this dataset?

What is the Mode of this dataset? Answer: 23.5
Why? The value 23.5 occurs the most often in the dataset: twice, to be exact. Not very complicated…

Mode: This is the score or value that occurs most frequently in a dataset. Datasets can be amodal, monomodal, bimodal and multimodal. (You should be able to figure out what these terms mean.) Which of these terms would best describe the dataset to the left? Note: this dataset is different from the one before, which was monomodal.

Which of these terms would best describe the dataset to the left? Answer: Amodal, as there is no repeating value.

Median: This is the middle point of the scores in a dataset. 50% of the scores are above the median, and 50% are below it. What is the Median of this dataset? The median is a point and it does not have to be an actual score in that distribution. Think about what the median would be for a dataset with an even number of samples, e.g. what is the median value of the dataset 10, 7, 8 and 6?

What is the Median of this dataset? Answer: 23.2
What is the median value of the dataset 10, 7, 8 and 6? Answer: 7.5, the midpoint between the two middle values (7 and 8) once the scores are put in order.

Mean: This is the average value of the dataset and all of you should be able to calculate this easily…
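For anyone who prefers a script to a spreadsheet, the three measures can also be sketched with Python's standard statistics module. This is only an illustration: the proboscis values below are made-up numbers, not the class dataset, and the helper name central_tendency is hypothetical. The four-value list at the end is the even-sample median example from the slide.

```python
import statistics

# Hypothetical proboscis lengths in mm (illustrative only, not the class data).
lengths = [22.9, 23.5, 23.2, 23.5, 22.7]

def central_tendency(values):
    """Return (modes, median, mean) for a list of numbers."""
    counts = {v: values.count(v) for v in values}
    top = max(counts.values())
    # A value only counts as a mode if it repeats; otherwise the set is amodal.
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []
    return modes, statistics.median(values), statistics.mean(values)

modes, median, mean = central_tendency(lengths)
print("mode(s):", modes)
print("median:", median)
print("mean:", round(mean, 2))

# Even number of samples: the median is the midpoint of the two middle values.
print(statistics.median([10, 7, 8, 6]))  # 7.5
```

An empty list of modes corresponds to an amodal dataset, one entry to monomodal, two to bimodal, and so on.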

OK. So all of this is made terribly easy if you learn to use Excel properly. Click on the image below and watch the podcast on how to use Excel to calculate Modes, Medians, and Means within a spreadsheet. You need to master these skills. http://www.youtube.com/watch?v=ziQcGGBvH00&feature=youtu.be

Now what we need to do is graph the data in Excel. This, too, is fairly easy. View the podcast below to see how this is done: http://youtu.be/-WsEgIbfbug
Do not forget all of the rules that you have learned over the years on what is expected in terms of graphical presentation of data! (Remember these from 6th grade?)
For Graphs…
Be neat, and make the graph large enough to be easily read.
Use a pencil and a ruler, if constructing the graph by hand.
Each axis should have a LABEL and the UNITS of measurement.
The independent variable should be on the X-axis, and the dependent variable should be on the Y-axis.
Scale the axes properly so that the data is effectively displayed.
Use the appropriate type of graph: line graph, scatter plot, bar graph, etc.
Data points should be properly positioned relative to the axes scales.

Using Excel, we’ve generated the graph shown below... Now what does it tell us? How would you analyze these results? What conclusions would you draw in viewing this graph?

What it tells us is that A. cytherea has a higher mean proboscis length than A. basiloides. But this is only part of the picture and is a 9th and 10th grade analysis of the datasets. We need to go further in our statistical analysis because mean values are not always accurate representative scores! Why?

Well… because the mean is a measure of the central tendency of the dataset, but it tells us NOTHING, NOTHING! about the spread of the data. The data points that we are analyzing could be tightly clustered around the mean or they could have high variability.

Range is a simple and easy to compute measure of variability in a dataset:
(Max sample value – Min sample value) = RANGE
What is the RANGE of this small dataset? 54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63

Range is a simple and easy to compute measure of variability in a dataset:
(Max sample value – Min sample value) = RANGE
For the dataset 54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63:
(72 – 19) = 53 = RANGE
This large range value suggests that there is a great deal of variability in our dataset, but here we can see that RANGE is also limited in that it tells us nothing about the variability within the distribution.
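The range calculation can be sketched in a few lines of Python. The dataset is the one from the slide; the function name value_range is just an illustrative choice. The second calculation is an added illustration of the limitation noted above: a single low value is responsible for most of the range.

```python
def value_range(values):
    """Range = maximum sample value minus minimum sample value."""
    return max(values) - min(values)

data = [54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63]
print(value_range(data))  # 72 - 19 = 53

# Dropping the single low value shows how strongly one point drives the range.
without_low = [v for v in data if v != 19]
print(value_range(without_low))  # 72 - 52 = 20
```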

When we plot the dataset on a simple number line, we can see the flaw in relying on just the MEAN and RANGE values as measurements of central tendency and variability:
56, 67, 72, 11, 56, 56, 66, 19, 68, 57, 58, 63
The Mean (X) of this dataset = 54.1
The vast majority of values are clustered around the upper end of the distribution. The mean is not in the middle of this cluster, as it has been affected by the outliers, 11 and 19. This dataset has a skewed distribution!
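A short Python sketch makes the pull of the outliers visible. The dataset is the one from the slide; comparing the mean with the median, and recomputing the mean without 11 and 19, are added illustrations rather than part of the original slide.

```python
import statistics

data = [56, 67, 72, 11, 56, 56, 66, 19, 68, 57, 58, 63]

mean = statistics.mean(data)
median = statistics.median(data)
print(round(mean, 1))  # 54.1
print(median)          # 57.5

# Recompute the mean without the two outliers to see how far they drag it down.
clustered = [v for v in data if v not in (11, 19)]
print(round(statistics.mean(clustered), 1))  # 61.9
```

The mean (54.1) sits below every value in the main cluster, while the median (57.5) stays inside it.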

In a normally distributed dataset, about 68% of the data fall within +/- 1 standard deviation (s.d.) of the mean. The greater the SD value, the greater the variability!

How do you calculate the standard deviation of a dataset? We are going to leave the mathematics behind this measure of variability to your math teachers, but you have to be able to calculate S.D. values in Excel. Follow the link to a podcast tutorial on using Excel to calculate standard deviation: http://youtu.be/90YWFllx1EA
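If you would like to check a spreadsheet result outside Excel, the sketch below computes a standard deviation in Python. It assumes the sample standard deviation (dividing by n - 1), which is what Python's statistics.stdev and Excel's STDEV.S compute; which function the tutorial uses is not stated here. The function name sample_sd is hypothetical, and the data is the skewed dataset from the number-line slide.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(values) / len(values)
    # Sum of squared distances of each value from the mean.
    squared_devs = sum((v - m) ** 2 for v in values)
    return math.sqrt(squared_devs / (len(values) - 1))

data = [56, 67, 72, 11, 56, 56, 66, 19, 68, 57, 58, 63]
print(round(sample_sd(data), 1))         # hand-rolled version
print(round(statistics.stdev(data), 1))  # standard-library equivalent
```

Both lines should print the same value; the hand-rolled version is there only to show what the library call is doing.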

Error bars are a graphical representation of the variability of data. Error bars can be used to represent range, standard deviation or other measures of variability. In IB Biology STANDARD DEVIATION ERROR BARS will be most useful.

SET A: the bar (mean) for A is higher than for B.
SET B: the S.D. error bar is longer for B than for A.

How do you put standard deviation error bars on the graphs that you generate? Follow the link to a podcast tutorial on putting error bars on graphs in Excel: http://youtu.be/oV0vbQlp9AI

What do error bars tell us? The overlap of error bars gives us a clue as to the significance of the results!
LOTS OF OVERLAP = LOTS OF SHARED DATA. Results are NOT LIKELY TO BE SIGNIFICANTLY DIFFERENT! The difference between means is most likely due to chance.
NO OVERLAP = VERY LITTLE SHARED DATA. Results ARE LIKELY TO BE SIGNIFICANTLY DIFFERENT! The difference between means is most likely to be REAL.
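The overlap check can be sketched in Python as a comparison of two "mean +/- 1 SD" intervals. Everything here is an assumption made for illustration: the three datasets are invented, the function names are hypothetical, and the sample standard deviation is used. As the slide says, overlap is only a clue, not a test of significance.

```python
import statistics

def sd_interval(values):
    """Return (mean - SD, mean + SD), the span covered by an SD error bar."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return (m - s, m + s)

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical measurements (not the butterfly data).
set_a = [10, 12, 11, 13, 12]
set_b = [11, 13, 12, 14, 12]
set_c = [20, 21, 19, 22, 20]

print(intervals_overlap(sd_interval(set_a), sd_interval(set_b)))  # True
print(intervals_overlap(sd_interval(set_a), sd_interval(set_c)))  # False
```

Sets A and B share much of their data, so their error bars overlap; sets A and C do not overlap at all.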

a. SET B
b. SET B
c. SET A
d. SET B
e. SET A

Let’s look back at our original data and try to answer the first half of our research question. Now, given your knowledge about what the standard deviation of a dataset represents, what should your conclusion be regarding the proboscis lengths of A. cytherea and A. basiloides? "Is there a significant difference in proboscis length between A. basiloides and A. cytherea?"

"Is there a significant difference in proboscis length between A. basiloides and A. cytherea?" NO! There is lots of overlap in the SD error bars. The two datasets contain too much shared data to conclusively state that a significant difference exists between the proboscis lengths of these butterflies.

But what about when we look at the mean body mass values for the two species? There is some overlap. This one is hard to call. We need another statistical test to tell us if there is a difference in these data sets. Something more refined… ?
