The Data Collection and Statistical Analysis in IB Biology John Gasparini The Munich International School Part II – Basic Stats, Standard Deviation and Variability

Remember our two species of butterflies? Smooth-Banded Sister (Adelpha cytherea) and Spot Celled Sister (Adelpha basiloides). These are closely related species from the Nymphalidae family. Both are found in the tropics of Central America, and both feed on the nectar of flowers. Research Question: "Is there a significant difference in proboscis length and body mass between A. basiloides and A. cytherea?"

Imagine that you have collected data on the proboscis length and body mass of our two butterfly species. Record it properly. You must be neat to reduce problems later! Give the raw data tables proper titles. Include uncertainties! Be consistent in your number of decimal places. Don't record more decimal places than the sensitivity of your instrument allows.

What is the number of butterflies sampled for each species? n = 15. What is the total number of butterflies sampled? Total sampled across both species = 30.

Now that we have recorded our raw data in an organized fashion, it is time to calculate some basic statistics for our datasets. We'll start with three "Measurements of Central Tendency": 1) Mode 2) Median 3) Mean (average). Each is a summary score that tries in some way to represent a set of scores: a single score generated from a dataset that is in some way typical of the distribution of scores. Fancy name. Don't get caught up in it. These are easy stats and you know most of them already.

Mode: This is the score or value that occurs most frequently in a dataset. What is the Mode of this dataset? Answer: 23.5. Why? The value 23.5 occurs most often in the dataset (twice, to be exact). Not very complicated…

Mode: Datasets can be amodal, monomodal, bimodal or multimodal. (You should be able to figure out what these terms mean.) Which of these terms best describes the dataset to the left? Note: this dataset is different from the one before, which was monomodal. Answer: amodal, as no value repeats.
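The mode categories above can also be checked mechanically. The following is a minimal sketch in Python rather than Excel, which the slides use. The helper name `classify_modality` and the sample values are made up for illustration, and the rule that a dataset with no repeated value counts as amodal follows the answer on the slide.

```python
from collections import Counter

def classify_modality(values):
    """Return (label, modes) for a non-empty list of values.

    Hypothetical helper for illustration: a dataset in which no value
    repeats is labelled amodal, as on the slide above.
    """
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return "amodal", []
    # Every value that reaches the highest count is a mode.
    modes = sorted(v for v, c in counts.items() if c == highest)
    labels = {1: "monomodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

# Made-up example values, not the data from the slides.
print(classify_modality([23.5, 22.1, 23.5, 24.0]))
print(classify_modality([22.1, 23.5, 24.0]))
print(classify_modality([1, 1, 2, 2, 3]))
```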

Median: This is the middle point of the scores in a dataset: 50% of the scores are above the median, and 50% are below it. What is the Median of this dataset? Answer: 23.2. The median is a point, and it does not have to be an actual score in that distribution. Think about what the median would be for a dataset with an even number of samples, e.g. what is the median of the dataset 10, 7, 8 and 6? Answer: 7.5 (the midpoint of the two middle values, 7 and 8).
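The even-sample case can be worked through in a few lines. This is a sketch in Python, not the Excel method the slides rely on, and the function name `median_of` is an illustrative choice. It sorts the values and, when the count is even, averages the two middle ones.

```python
import statistics

def median_of(values):
    """Median of a non-empty list: the middle value after sorting, or the
    midpoint of the two middle values when the count is even."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The four-value example from the slide: sorted it is 6, 7, 8, 10.
example = [10, 7, 8, 6]
print(median_of(example))          # 7.5
print(statistics.median(example))  # the standard library gives the same result
```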

Mean: This is the average value of the dataset (the sum of all the values divided by the number of values), and all of you should be able to calculate this easily…
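Written out as a formula, with a worked example that reuses the four values from the median slide. The symbols are an assumed notation, not taken from the slides: the mean is written as x-bar, n is the number of values and x_i is the i-th value.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{10 + 7 + 8 + 6}{4} = \frac{31}{4} = 7.75
```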

OK. So all of this is made terribly easy if you learn to use Excel properly. Click on the image below and watch the podcast on how to use Excel to calculate Modes, Medians, and Means within a spreadsheet. You need to master these skills.

Now what we need to do is graph the data in Excel. This, too, is fairly easy. View the podcast below to see how this is done. Do not forget all of the rules that you have learned over the years on what is expected in terms of graphical presentation of data! (Remember these from 6th grade?) For graphs: Be neat, and make the graph large enough to be easily read. Use a pencil and a ruler if constructing the graph by hand. Each axis should have a LABEL and the UNITS of measurement. The independent variable should be on the X-axis, and the dependent variable should be on the Y-axis. Scale the axes properly so that the data is effectively displayed. Use the appropriate type of graph: line graph, scatter plot, bar graph, etc. Data points should be properly positioned relative to the axis scales.

Using Excel, we’ve generated the graph shown below... Now what does it tell us? How would you analyze these results? What conclusions would you draw in viewing this graph?

What it tells us is that A. cytherea has a higher mean proboscis length than A. basiloides. But this is only part of the picture and is a 9th and 10th grade analysis of the datasets. We need to go further in our statistical analysis because mean values are not always accurately representative scores! Why?

Well… because the mean is a measure of the central tendency of the dataset, but it tells us NOTHING, NOTHING! about the spread of the data. The data points that we are analyzing could be tightly clustered around the mean or they could have high variability.

Range is a simple and easy-to-compute measure of variability in a dataset: (Max sample value - Min sample value) = RANGE. What is the RANGE of this small dataset? 54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63. Answer: (72 - 19) = 53 = RANGE. This large range value suggests that there is a great deal of variability in our dataset, but here we can see that RANGE is also limited, in that it tells us nothing about the variability within the distribution.
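To see how much a single value drives the range, here is a small Python sketch using the dataset from the slide. The function name `value_range` is made up for illustration. Dropping the one low value, 19, changes the answer dramatically even though the other twelve values are untouched.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Dataset from the slide.
data = [54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63]
print(value_range(data))         # 72 - 19 = 53

# Same data without the single low value 19.
without_low = [v for v in data if v != 19]
print(value_range(without_low))  # 72 - 52 = 20
```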

When we plot out the dataset on a simple number line, one can see the flaw in relying just on the MEAN and RANGE values as measurements of central tendency and variability: 56, 67, 72, 11, 56, 56, 66, 19, 68, 57, 58, The Mean (X) of this dataset = 54.1. The vast majority of values are clustered around the upper end of the distribution. The mean is not in the middle of this cluster, as it has been affected by the outliers, 11 and 19. This dataset has a skewed distribution!
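Comparing the mean with the median is one quick way to see this pull from the outliers. The sketch below uses only the eleven values visible in the transcript. The list appears to be cut off, so the mean computed here (about 53.3) differs slightly from the 54.1 quoted on the slide. The variable names are illustrative.

```python
import statistics

# The eleven values visible in the transcript (the list appears truncated).
visible = [56, 67, 72, 11, 56, 56, 66, 19, 68, 57, 58]

mean_value = statistics.mean(visible)
median_value = statistics.median(visible)

# The two low outliers (11 and 19) drag the mean below the main cluster,
# while the median stays inside it.
print(round(mean_value, 1))  # 53.3
print(median_value)          # 57
```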

In a normal distribution, about 68% of the data fall within ±1 S.D. of the mean. The greater the S.D. value, the greater the variability!

How do you calculate the standard deviation of a dataset? We are going to leave the mathematics behind this measure of variability to your math teachers, but you have to be able to calculate S.D. values in Excel. Follow the link to a podcast tutorial on using Excel to calculate standard deviation:
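For anyone curious what the spreadsheet is doing, here is a minimal Python sketch of the sample standard deviation, which divides by n - 1. This assumes the sample form is the one wanted, as Excel's STDEV function also uses n - 1. The function name `sample_sd` is made up for illustration, and the data is the small dataset from the range slide.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: the square root of the summed squared
    deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_value = sum(values) / n
    squared_deviations = sum((v - mean_value) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

# Dataset from the range slide.
data = [54, 56, 67, 72, 19, 52, 56, 56, 66, 68, 57, 58, 63]
sd = sample_sd(data)
print(round(sd, 2))

# The standard library's stdev() uses the same n - 1 definition.
print(round(statistics.stdev(data), 2))
```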

Error bars are a graphical representation of the variability of data. Error bars can be used to represent range, standard deviation or other measures of variability. In IB Biology STANDARD DEVIATION ERROR BARS will be most useful.

SET A: the bar (mean) for A is higher than for B. SET B: the S.D. error bar is longer for B than for A.

How do you put standard deviation error bars on the graphs that you generate? Follow the link to a podcast tutorial on putting error bars on graphs in Excel:

What do error bars tell us? The overlap of error bars gives us a clue as to the significance of the results! LOTS OF OVERLAP = LOTS OF SHARED DATA: results are NOT LIKELY TO BE SIGNIFICANTLY DIFFERENT, and the difference between means is most likely due to chance. NO OVERLAP = VERY LITTLE SHARED DATA: results ARE LIKELY TO BE SIGNIFICANTLY DIFFERENT, and the difference between means is most likely to be REAL.
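The overlap check itself is simple arithmetic: each error bar spans mean - S.D. to mean + S.D., and two bars overlap when those intervals intersect. Here is a rough Python sketch, assuming sample standard deviation error bars. The function names and all three datasets are made up for illustration. As the slide says, overlap is only a clue, not a formal significance test.

```python
import statistics

def sd_interval(values):
    """Return (mean - SD, mean + SD), the span of an S.D. error bar."""
    mean_value = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean_value - sd, mean_value + sd

def error_bars_overlap(set_a, set_b):
    """True when the two mean +/- SD intervals share any values."""
    low_a, high_a = sd_interval(set_a)
    low_b, high_b = sd_interval(set_b)
    return low_a <= high_b and low_b <= high_a

# Made-up datasets, not the butterfly data from the slides.
set_a = [10.0, 11.0, 12.0]  # error bar spans 10.0 to 12.0
set_b = [11.5, 12.5, 13.5]  # error bar spans 11.5 to 13.5
set_c = [20.0, 21.0, 22.0]  # error bar spans 20.0 to 22.0

print(error_bars_overlap(set_a, set_b))  # True: difference may be due to chance
print(error_bars_overlap(set_a, set_c))  # False: difference is more likely real
```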

a. SET B b. SET B c. SET A d. SET B e. SET A

Let's look back at our original data and try to answer the first half of our research question: "Is there a significant difference in proboscis length between A. basiloides and A. cytherea?" Now, given your knowledge of what the standard deviation of a dataset represents, what should your conclusion be regarding the proboscis lengths of A. cytherea and A. basiloides? Answer: NO! There is lots of overlap in the S.D. error bars, so the two datasets contain too much shared data to conclusively state that a significant difference exists between the proboscis lengths of these butterflies.

But what about when we look at the mean body mass values for the two species? There is some overlap in the error bars. This one is hard to call. We need another statistical test to tell us whether there is a significant difference between these datasets. Something more refined…?