Statistical Analysis adapted from the work of Stephen Taylor.



Why is this biology? Variation in populations leads to variability in results, which affects confidence in conclusions. The key methodology in Biology is hypothesis testing through experimentation. Carefully designed and controlled experiments and surveys give us quantitative (numeric) data that can be compared. We can use the data collected to test our hypothesis and form explanations of the processes involved, but only if we can be confident in our results. We therefore need to be able to evaluate the reliability of a set of data and the significance of any differences we have found in the data.

Which medicine should be prescribed? Generic drugs are out-of-patent, and are much cheaper than the proprietary (brand-name) equivalents. Doctors need to balance needs with available resources. Which would you choose?

Which medicine should be prescribed? Means (averages) in Biology are almost never good enough. Biological systems (and our results) show variability. Which would you choose now?

Studying Comparative Anatomy. Hummingbirds are nectarivores (herbivores that feed on the nectar of some species of flower). In return for food, they pollinate the flower. This is an example of mutualism: benefit for all. As a result of natural selection, hummingbird bills have evolved. Birds with a bill best suited to their preferred food source have the greater chance of survival. Researchers studying comparative anatomy collect data on bill length in two species of hummingbirds: Archilochus colubris (ruby-throated hummingbird) and Cynanthus latirostris (broad-billed hummingbird). Photo: Archilochus colubris, from Wikimedia Commons, by Dick Daniels.

Comparative Anatomy. To do this, they need to collect sufficient relevant, reliable data so they can test the null hypothesis (H0) that "there is no significant difference in bill length between the two species." Photos: Archilochus colubris (male), Wikimedia Commons, by Joe Schneid; broad-billed hummingbird, Wikimedia Commons.

Experimental Design The sample size must be large enough to provide sufficient reliable data and for us to carry out relevant statistical tests for significance. We must also be mindful of uncertainty in our measuring tools and error in our results.

Data Analysis. Mean: a measure of the central tendency of a set of data (sum of values / n, where n = sample size). Data table components: descriptive title and number; uncertainty of instruments included; consistent use of decimal places.

Graphing Data. Graph components: descriptive title and graph number; labeled points; Y-axis labeled, with uncertainty; Y-axis should begin at zero; X-axis labeled. Which species possesses the greater bill length?

Data Spread Data could be clustered near the mean or have high variability

Calculating Range What is the range of these data? 68, 56, 65, 75, 68, 74, 21, 67, 72, 69, 71, 67
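The range question above can be checked with a few lines of Python. This is a minimal sketch using only the twelve values listed on the slide; the variable names are illustrative.

```python
# The twelve values listed on the slide above.
values = [68, 56, 65, 75, 68, 74, 21, 67, 72, 69, 71, 67]

# Range = largest value minus smallest value.
data_range = max(values) - min(values)

# Mean = sum of values / n, where n is the sample size.
mean = sum(values) / len(values)

print(f"range = {data_range}")
print(f"mean  = {mean:.1f}")
```

Note how the single low value (21) accounts for most of the range, which is one reason range alone is a weak measure of spread.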

Standard Deviation. A measure of the spread of most of the data. For normally distributed data, about 68% of all values fall within 1 standard deviation of the mean.
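The 68% figure can be checked against a normal distribution. The sketch below assumes the data are normally distributed (an assumption, since real biological data may not be) and uses Python's standard-library `statistics.NormalDist`; the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1sd = fraction_within(1)
within_2sd = fraction_within(2)

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
```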

Practice

Data set: 4, 5, 5, 5, 6, 6, 6, 7, 7, 9. Mean: 6. Which of the following is the best estimate of the standard deviation? A. 0 B. 1 C. 6 D. 5

Solving for standard deviation. Methods for solving: formula; TI-83 or TI-84 (1-Var Stats); Excel (=STDEV).
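One way to check an estimate for the practice data set is to compute the standard deviation directly. This sketch uses Python's standard-library `statistics` module and shows both the sample form (divides by n - 1) and the population form (divides by n), since the slide does not say which is intended.

```python
import statistics

# Practice data set from the slide above.
data = [4, 5, 5, 5, 6, 6, 6, 7, 7, 9]

mean = statistics.mean(data)
sample_sd = statistics.stdev(data)        # sample standard deviation (n - 1)
population_sd = statistics.pstdev(data)   # population standard deviation (n)

print(f"mean = {mean}")
print(f"sample SD = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

Both forms come out a little above 1, so of the four options offered, 1 is the closest estimate.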

Standard deviation Which of the data sets has: a. the longest bill length? b. the greatest variability in the data? (Standard deviation can have more than one decimal place)

Error Bars Represent variability in the data (represent standard deviation, range, or confidence intervals) Which of these data sets has: a. the highest mean? b. the greatest variability in the data? *Error bars can be added to Excel graphs

Graph with error bars Graph Components: Title is adjusted to show the source of the error bars. You can see the clear difference in the size of the error bars. Variability has been visualised. What does the overlap in error bars mean?

Significance

Frequency Curves

Practice Which set of data has: a. a larger range (high variability)? b. a greater standard deviation? c. a higher mean? d. a higher frequency at the mean?

The t-test Our results show a very small overlap between the two sets of data. So how do we know if the difference is significant or not? A t-test determines the significance of the difference between the means of two data sets
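As a rough illustration of what a t-test computes, the sketch below calculates t for two independent samples. It assumes the equal-variance (pooled, Student's) form, which matches the "total sample size minus two" degrees of freedom used in these slides; the slides themselves do not state which form is intended. The bill lengths are hypothetical values invented for the example.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return abs(mean_a - mean_b) / standard_error

# Hypothetical bill lengths in mm, invented for illustration only.
species_a = [15.0, 16.0, 17.0, 18.0, 19.0]
species_b = [19.0, 20.0, 21.0, 22.0, 23.0]

t_value = pooled_t(species_a, species_b)
print(f"t = {t_value:.2f}")
```

The larger the difference between the means relative to the spread within each sample, the larger t becomes.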

Probability (P) and the t-test. P is the probability that a difference as large as the one observed would arise by chance alone if the null hypothesis were true. If P = 1, the data sets are exactly the same. If P = 0, the data sets are not at all the same. The higher the value of P, the more the data overlap. The smaller the overlap, the more significant the result.

The null hypothesis and the t-test. Reminder: the null hypothesis, H0 = there is no significant difference. This is the 'default' hypothesis that we always test. In our conclusion, we either accept the null hypothesis or reject it. A t-test can be used to test whether the difference between two means is significant. If we accept H0, then the means are not significantly different. If we reject H0, then the means are significantly different. Remember: we are never 'trying' to get a difference. We design carefully controlled experiments and then analyze the results using statistical analysis.

Using a t-value. We can calculate the value of 't' for a given set of data and compare it to critical values that depend on the size of our sample and the level of confidence we need. Degrees of freedom (df) = the total sample size minus two*. We usually use P<0.05 (95% confidence) in Biology, as our data can be highly variable. *Simple explanation: we are working in two directions, within each population and across populations.

Worked Example. A researcher measured the wing spans of 12 ruby-throated and 13 broad-billed hummingbirds. H0 = there is no significant difference. df = (12+13)-2 = 23. P = 0.05. The critical value (cv) is read from a 2-tailed t-table at df = 23 and P = 0.05. t was calculated for you, and the t value is greater than the critical value. If t < cv, accept H0 (there is no significant difference). If t > cv, reject H0 (there is a significant difference). Conclusion: "There is a significant difference in the wing spans of the two populations of birds." 2-tailed t-table source:

Practice. A student measures 16 snail shells on the south side of an island and 15 on the north. She calculates t as 2.02 and chooses a confidence level of 95% (P = 0.05). Are her results significantly different? H0 = there is no significant difference.
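The comparison of t against a critical value can be sketched as follows. The table holds a few two-tailed critical values at P = 0.05 taken from a standard t-table; only a handful of rows are included, and the function names and the example t value of 2.50 are illustrative assumptions, not part of the original slides.

```python
# Two-tailed critical values of t at P = 0.05 from a standard t-table.
# Keys are degrees of freedom; only a few rows are included here.
CRITICAL_T_05 = {10: 2.228, 20: 2.086, 23: 2.069, 30: 2.042}

def degrees_of_freedom(n_1, n_2):
    """df = total sample size minus two."""
    return n_1 + n_2 - 2

def is_significant(t_value, n_1, n_2):
    """Return True if t exceeds the critical value, i.e. H0 is rejected."""
    critical_value = CRITICAL_T_05[degrees_of_freedom(n_1, n_2)]
    return t_value > critical_value

# Hypothetical t value for samples of 12 and 13 individuals (df = 23).
df = degrees_of_freedom(12, 13)
print(f"df = {df}, critical value = {CRITICAL_T_05[df]}")
print("reject H0" if is_significant(2.50, 12, 13) else "accept H0")
```

A lookup table keeps the example dependency-free; a statistics package would normally supply critical values for any df.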

Practice. A student measures the resting heart rates of 10 swimmers and 12 non-swimmers. He calculates t as 3.65 and chooses a confidence level of 95% (P = 0.05). Are his results significant?

Using Excel for the t-test. Excel can calculate P directly. If P<0.05, our results are significant: reject H0.

Confidence intervals as error bars. 95% confidence intervals plotted as error bars give a clearer indication of a significant result: where there is overlap, there is not a significant difference; where there is no overlap, there is a significant difference. If the overlap is small, a t-test should still be carried out.
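Once you have tried the two practice questions, the sketch below is one way to check your answers. The critical values are the two-tailed P = 0.05 entries from a standard t-table for the degrees of freedom these questions produce; the data structure and names are illustrative.

```python
# Two-tailed critical t values at P = 0.05 for the df needed below.
critical_t = {20: 2.086, 29: 2.045}

# (label, first sample size, second sample size, calculated t)
practice = [
    ("snail shells", 16, 15, 2.02),
    ("resting heart rates", 10, 12, 3.65),
]

results = {}
for label, n_1, n_2, t_value in practice:
    df = n_1 + n_2 - 2                      # total sample size minus two
    significant = t_value > critical_t[df]  # reject H0 only if t exceeds cv
    results[label] = (df, significant)
    verdict = "significant" if significant else "not significant"
    print(f"{label}: df = {df}, cv = {critical_t[df]}, t = {t_value} -> {verdict}")
```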
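A 95% confidence interval for a mean can be sketched as mean ± t × (standard deviation / √n). The example below assumes samples of exactly 10 values, so that the critical t for df = 9 (2.262, from a standard two-tailed t-table at P = 0.05) applies; the bill lengths are hypothetical and the function names are illustrative.

```python
import math
import statistics

T_CRIT_DF9 = 2.262  # two-tailed critical t at P = 0.05 for df = 9 (n = 10)

def confidence_interval_95(sample):
    """95% confidence interval for the mean of a sample of exactly 10 values."""
    n = len(sample)
    if n != 10:
        raise ValueError("this sketch only holds the critical value for n = 10")
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = T_CRIT_DF9 * standard_error
    return mean - margin, mean + margin

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical bill lengths in mm, invented for illustration only.
species_a = [15.1, 15.8, 16.2, 16.5, 16.9, 17.0, 17.3, 17.8, 18.1, 18.3]
species_b = [18.9, 19.4, 19.8, 20.1, 20.5, 20.6, 21.0, 21.3, 21.7, 22.2]

ci_a = confidence_interval_95(species_a)
ci_b = confidence_interval_95(species_b)
print(f"A: {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"B: {ci_b[0]:.2f} to {ci_b[1]:.2f}")
print("overlap" if intervals_overlap(ci_a, ci_b) else "no overlap")
```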

Error bar purpose

The Chi-Squared Test. Test whether an observed value or frequency is significantly different from an expected frequency. Null hypothesis (H0) = there is no significant difference between observed and expected results. Alternate hypothesis (H1) = there is a significant difference between observed and expected values. Steps: 1. Calculate the value of chi-squared. 2. Compare it with the critical value at the desired level of certainty and the correct degrees of freedom.

Worked Example. In a cross between two heterozygous peas, we expect a 3:1 yellow:green ratio in the offspring, as shown in the Punnett square. We can test this prediction experimentally and determine whether the results are significantly different from our expected ratio. We will test H0, there is no significant difference, by modeling with flipped coins a sample size of 50 offspring. Remember: a ratio (3 yellow:1 green) can also be expressed as probability. There is a 0.75 chance an offspring will be yellow. There is a 0.25 chance an offspring will be green.
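The two steps above can be sketched in Python using the standard chi-squared formula, the sum over categories of (observed - expected)² / expected. The counts are hypothetical (100 offspring with an expected 1:2:1 ratio), the critical values are the P = 0.05 entries from a standard chi-squared table, and degrees of freedom are taken as the number of categories minus one.

```python
# Critical values of chi-squared at P = 0.05 from a standard table.
CRITICAL_CHI2_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 100 offspring, expected in a 1:2:1 ratio.
observed = [22, 55, 23]
expected = [25, 50, 25]

# Step 1: calculate chi-squared.
chi2 = chi_squared(observed, expected)

# Step 2: compare with the critical value for df = categories - 1.
df = len(observed) - 1
reject_h0 = chi2 > CRITICAL_CHI2_05[df]

print(f"chi-squared = {chi2:.2f}, df = {df}, cv = {CRITICAL_CHI2_05[df]}")
print("reject H0" if reject_h0 else "accept H0")
```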
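The coin-flip model can be simulated in Python. This sketch assumes each offspring is modeled by two coin flips, with heads standing for the dominant (yellow) allele and tails for the recessive (green) allele, so only tails-tails gives a green offspring. That mapping is an assumption; the slides do not say how the coins were assigned. The seed is fixed only to make the run repeatable.

```python
import random

def simulate_offspring(n, rng):
    """Model n offspring with two coin flips each; only tails-tails is green."""
    yellow = green = 0
    for _ in range(n):
        allele_1 = rng.choice("HT")  # H = dominant (yellow), T = recessive (green)
        allele_2 = rng.choice("HT")
        if allele_1 == "T" and allele_2 == "T":
            green += 1
        else:
            yellow += 1
    return yellow, green

rng = random.Random(1)  # fixed seed, chosen arbitrarily
sample_size = 50
yellow, green = simulate_offspring(sample_size, rng)

# Expected counts from the 3:1 ratio (probabilities 0.75 and 0.25).
expected_yellow = 0.75 * sample_size
expected_green = 0.25 * sample_size

chi2 = ((yellow - expected_yellow) ** 2 / expected_yellow
        + (green - expected_green) ** 2 / expected_green)

print(f"observed: {yellow} yellow, {green} green")
print(f"expected: {expected_yellow} yellow, {expected_green} green")
print(f"chi-squared = {chi2:.3f}")
```

With two categories there is 1 degree of freedom, so the resulting chi-squared would be compared with the P = 0.05 critical value of 3.841 from a standard table.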

Results

Calculating Chi-Squared

Using the Chi-Squared Value

Worked Example (try it before moving on)

Worked Example

Correlation Cartoon from:

Correlations

Examples of Correlation

Interpreting Graphs. See: What is factual about the graph? What are the axes? What is being plotted? What values are present? Think: How is the graph interpreted? What relationship is present? Is cause implied? What explanations are possible and what explanations are not possible? Wonder: Questions about the graph. What do you need to know more about?

Correlations and Causality Diabetes and obesity are ‘risk factors’ of each other. There is a strong correlation between them, but does this mean one causes the other?
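The strength of a correlation is often summarized with a correlation coefficient. The sketch below computes Pearson's r, one common measure that the slides do not name explicitly, for hypothetical paired values invented for illustration. A value near +1 or -1 indicates a strong linear relationship, but says nothing about cause.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(ss_x * ss_y)

# Hypothetical paired measurements, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```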

Correlation does not imply causality. Pirates vs global warming, from Flying_Spaghetti_Monster#Pirates_and_global_warming

Correlation does not imply causality. Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of confounding variables: conditions that the correlated variables have in common but that do not directly affect each other. To be able to determine causality through experimentation we need: one clearly identified independent variable; carefully measured dependent variable(s) that can be attributed to change in the independent variable; strict control of all other variables that might have a measurable impact on the dependent variable. We need sufficient relevant, repeatable and statistically significant data. Some known causal relationships: atmospheric CO2 concentrations and global warming; atmospheric CO2 concentrations and the rate of photosynthesis; temperature and enzyme activity.

Additional Resources: Analysis and Evaluation of Evidence; Chi-Squared Test