Statistical Analysis adapted from the work of Stephen Taylor.

1 Statistical Analysis adapted from the work of Stephen Taylor

2 Why is this biology? Variation in populations causes variability in results, which affects our confidence in conclusions. The key methodology in Biology is hypothesis testing through experimentation. Carefully designed and controlled experiments and surveys give us quantitative (numeric) data that can be compared. We can use the data collected to test our hypothesis and form explanations of the processes involved, but only if we can be confident in our results. We therefore need to be able to evaluate the reliability of a set of data and the significance of any differences we have found in the data.

3 Which medicine should be prescribed? Generic drugs are out-of-patent, and are much cheaper than the proprietary (brand-name) equivalents. Doctors need to balance needs with available resources. Which would you choose?

4 Which medicine should be prescribed? Means (averages) in Biology are almost never good enough. Biological systems (and our results) show variability. Which would you choose now?

5 Studying Comparative Anatomy Hummingbirds are nectarivores (herbivores that feed on the nectar of some species of flower). In return for food, they pollinate the flower. This is an example of mutualism: benefit for all. As a result of natural selection, hummingbird bills have evolved. Birds with a bill best suited to their preferred food source have the greater chance of survival. Researchers studying comparative anatomy collect data on bill length in two species of hummingbirds: Archilochus colubris (ruby-throated hummingbird) and Cynanthus latirostris (broad-billed hummingbird). Photo: Archilochus colubris, from Wikimedia Commons, by Dick Daniels.

6 Comparative Anatomy To do this, they need to collect sufficient relevant, reliable data so they can test the null hypothesis (H0) that: “there is no significant difference in bill length between the two species.” Photo: Archilochus colubris (male), Wikimedia Commons, by Joe Schneid. Photo: Broad-billed hummingbird (Wikimedia Commons).

7 Experimental Design The sample size must be large enough to provide sufficient reliable data and for us to carry out relevant statistical tests for significance. We must also be mindful of uncertainty in our measuring tools and error in our results.

8

9 Data Analysis Mean: a measure of the central tendency of a set of data (sum of values / n, where n = sample size). Data Table Components: descriptive title and number; uncertainty of instruments included; consistent use of decimal places.
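
The mean calculation on this slide can be sketched in a few lines of Python. The bill-length values below are invented for illustration and are not data from the presentation.

```python
# Hypothetical bill lengths in mm (illustrative values only, not from the slides).
bill_lengths_mm = [15.2, 16.1, 14.8, 15.9, 16.4]

def mean(values):
    """Return the arithmetic mean: sum of values divided by the sample size n."""
    n = len(values)
    return sum(values) / n

sample_mean = mean(bill_lengths_mm)

# Report the mean to the same number of decimal places as the raw data.
print(f"n = {len(bill_lengths_mm)}, mean = {sample_mean:.1f} mm")
```

Formatting the result with one decimal place follows the slide's advice to use decimal places consistently in a data table.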

10 Graphing Data Graph Components: descriptive titles and graph number; labeled points; Y-axis labeled, with uncertainty; Y-axis should begin at zero; X-axis labeled. Which species possesses the greater bill length?

11 Data Spread Data could be clustered near the mean or have high variability

12 Calculating Range What is the range of these data? 68, 56, 65, 75, 68, 74, 21, 67, 72, 69, 71, 67
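
One way to check a hand calculation of the range (largest value minus smallest value) is the short Python sketch below, which uses the data from this slide. The variable names are arbitrary.

```python
# Data from the slide.
data = [68, 56, 65, 75, 68, 74, 21, 67, 72, 69, 71, 67]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)

# A single extreme value can dominate the range, so it is worth
# looking at which values set the minimum and maximum.
print(f"min = {min(data)}, max = {max(data)}, range = {data_range}")
```

Because the range depends only on the two most extreme values, it is sensitive to outliers, which is one reason standard deviation is often preferred as a measure of spread.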

13 Standard Deviation A measure of the spread of most of the data around the mean. For normally distributed data, about 68% of all values fall within 1 standard deviation of the mean.
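
The slides do not state the formula, so the following is offered as the standard textbook definition of the sample standard deviation, where x̄ is the sample mean and n is the sample size. Whether the presentation intends the sample (n − 1) or population (n) version is an assumption here.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

Dividing by n instead of n − 1 gives the population standard deviation, which is slightly smaller for the same data.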

14 Practice

15 Data Set: 4, 5, 5, 5, 6, 6, 6, 7, 7, 9 Mean: 6 Which of the following is the best estimate of standard deviation? A. 0 B. 1 C. 6 D. 5

16 Solving for standard deviation Methods for solving: formula; TI-83 or TI-84 (1-Var Stats); Excel (=STDEV)
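
Python is not one of the methods listed on the slide, but its standard library offers a comparable route. This sketch uses the practice data set from slide 15 and assumes the sample standard deviation (dividing by n − 1) is the one wanted.

```python
import statistics

# Practice data set from slide 15.
data = [4, 5, 5, 5, 6, 6, 6, 7, 7, 9]

data_mean = statistics.mean(data)

# Sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(data)

# Population standard deviation (divides by n), shown for comparison.
population_sd = statistics.pstdev(data)

print(f"mean = {data_mean}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

The two versions differ only slightly here, and either one leads to the same choice among the options on the practice slide.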

17 Standard deviation Which of the data sets has: a. the longest bill length? b. the greatest variability in the data? (Standard deviation can have more than one decimal place)

18 Error Bars Represent variability in the data (represent standard deviation, range, or confidence intervals) Which of these data sets has: a. the highest mean? b. the greatest variability in the data? *Error bars can be added to Excel graphs

19 Graph with error bars Graph Components: Title is adjusted to show the source of the error bars. You can see the clear difference in the size of the error bars. Variability has been visualised. What does the overlap in error bars mean?

20 Significance

21 Frequency Curves

22 Practice Which set of data has: a. a larger range (high variability)? b. a greater standard deviation? c. a higher mean? d. a higher frequency at the mean?

23 The t-test Our results show a very small overlap between the two sets of data. So how do we know if the difference is significant or not? A t-test determines the significance of the difference between the means of two data sets

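24 Probability (P) and t-test P is the probability of seeing a difference at least this large by chance alone if the null hypothesis is true. As a rule of thumb: if P = 1, the data sets are indistinguishable; as P approaches 0, the data sets are less and less alike. The higher the value of P, the more the data overlap. The smaller the overlap, the more significant the result.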

25 The null hypothesis and the t-test Reminder: the null hypothesis, H0 = there is no significant difference. This is the ‘default’ hypothesis that we always test. In our conclusion, we either accept the null hypothesis or reject it. A t-test can be used to test whether the difference between two means is significant. If we accept H0, then the means are not significantly different. If we reject H0, then the means are significantly different. Remember: we are never ‘trying’ to get a difference. We design carefully controlled experiments and then analyze the results using statistical analysis.

26 Using a t-value We can calculate the value of ‘t’ for a given set of data and compare it to critical values that depend on the size of our sample and the level of confidence we need. Degrees of freedom (df) = the total sample size minus two*. We usually use P < 0.05 (95% confidence) in Biology, as our data can be highly variable. *Simple explanation: we are working in two directions, within each population and across populations.
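
The slides do not say which form of the t-test is meant. A df of "total sample size minus two" matches the pooled (equal-variance) two-sample t-test, so the sketch below assumes that form. The function name `pooled_t` and both data sets are hypothetical, chosen only to illustrate the calculation.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Return (|t|, df) for a two-sample t-test assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2  # total sample size minus two

    # Pool the two sample variances, weighting each by its degrees of freedom.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return abs(t), df

# Hypothetical bill lengths in mm (illustrative values only).
species_a = [15.0, 16.0, 17.0, 16.0]
species_b = [18.0, 19.0, 20.0, 19.0]

t_value, df = pooled_t(species_a, species_b)
print(f"t = {t_value:.2f}, df = {df}")
```

The resulting t and df are then looked up against a two-tailed table of critical values at the chosen P level.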

27 Worked Example A researcher measured the wing spans of 12 ruby-throated and 13 broad-billed hummingbirds. H0 = there is no significant difference. df = (12 + 13) - 2 = 23. At P = 0.05, the critical value (cv) = 2.069. t was calculated for you: t = 2.15. Decision rule: if t < cv, accept H0 (there is no significant difference); if t > cv, reject H0 (there is a significant difference). Since 2.15 > 2.069, we reject H0. Conclusion: “There is a significant difference in the wing spans of the two populations of birds.” 2-tailed t-table source: http://www.medcalc.org/manual/t-distribution.php
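
The decision rule in this worked example can be written as a small helper. The function name `t_test_decision` is an illustrative choice, and the critical value is typed in from the t-table rather than computed.

```python
def t_test_decision(t_value, critical_value):
    """Apply the slide's rule: reject H0 when t exceeds the critical value."""
    if abs(t_value) > critical_value:
        return "reject H0"
    return "accept H0"

# Values from the worked example.
n_first, n_second = 12, 13
df = (n_first + n_second) - 2

critical_value = 2.069  # two-tailed, P = 0.05, df = 23 (read from the t-table)
t_value = 2.15          # given on the slide

decision = t_test_decision(t_value, critical_value)
print(f"df = {df}, t = {t_value}, cv = {critical_value}: {decision}")
```

The same helper can be reused for the practice questions by substituting the sample sizes, the given t, and the matching critical value from the table.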

28 Practice A student measures 16 snail shells on the south side of an island and 15 on the north. She calculates t as 2.02 and chooses a confidence limit of 95% (P = 0.05). Are her results significantly different? H0 = there is no significant difference

29 Practice A student measures the resting heart rates of 10 swimmers and 12 non-swimmers. He calculates t as 3.65 and chooses a confidence limit of 95% (P = 0.05). Are his results significant?

30 Using Excel for t-test Excel can calculate P directly. If P < 0.05, our results are significant: reject H0.

31 Confidence intervals as error bars 95% confidence intervals plotted as error bars give a clearer indication of a significant result: where there is no overlap, there is a significant difference; where there is overlap, there is usually not a significant difference; if the overlap is small, a t-test should still be carried out.
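
The slides do not show how the interval is calculated. As a hedged reminder, the usual textbook form of a 95% confidence interval for a single sample mean is given below, where s is the sample standard deviation, n the sample size, and t* the two-tailed critical t-value at P = 0.05 for n − 1 degrees of freedom.

```latex
\bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}}
```

Larger samples shrink the interval, which is one reason adequate sample size matters for detecting a real difference.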

32 Error bar purpose

33 The Chi-Squared Test Tests whether an observed value or frequency is significantly different from an expected frequency. Null hypothesis (H0) = there is no significant difference between observed and expected results. Alternate hypothesis (H1) = there is a significant difference between observed and expected values. Steps: 1. Calculate the value of chi-squared. 2. Compare it with the critical value at the desired level of certainty and the correct degrees of freedom.
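
The formula for step 1 is not preserved in this transcript. The standard chi-squared statistic is shown here, with O the observed count and E the expected count in each category, summed over all categories.

```latex
\chi^2 = \sum \frac{(O - E)^2}{E}
```

For a test against an expected ratio like this, the degrees of freedom are the number of categories minus one.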

34 Worked Example In a cross between two heterozygous peas, we expect a 3:1 yellow:green ratio in the offspring, as shown in the Punnett square. We can test this prediction experimentally and determine whether the results are significantly different from our expected ratio. We will test H0 (there is no significant difference) by using flipped coins to model a sample of 50 offspring. Remember: a ratio (3 yellow : 1 green) can also be expressed as a probability. There is a 0.75 chance an offspring will be yellow and a 0.25 chance it will be green.
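
The results slide is not preserved in this transcript, so the observed counts below (40 yellow, 10 green) are invented purely to show the arithmetic. The expected counts follow from the 3:1 ratio and the sample of 50, and 3.841 is the tabulated critical value for 1 degree of freedom at P = 0.05.

```python
# Expected counts from the 3:1 ratio with 50 offspring.
sample_size = 50
expected = {"yellow": 0.75 * sample_size, "green": 0.25 * sample_size}

# Hypothetical observed counts (NOT the presentation's results).
observed = {"yellow": 40, "green": 10}

# Chi-squared: sum over categories of (observed - expected)^2 / expected.
chi_squared = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in expected
)

df = len(expected) - 1      # two categories, so 1 degree of freedom
critical_value = 3.841      # chi-squared table, df = 1, P = 0.05

if chi_squared > critical_value:
    decision = "reject H0"
else:
    decision = "accept H0"

print(f"chi-squared = {chi_squared:.3f}, df = {df}: {decision}")
```

With these made-up counts the statistic falls below the critical value, so the observed ratio would not differ significantly from 3:1.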

35 Results

36 Calculating Chi-Squared

37

38 Using the Chi-Squared Value

39 Worked Example (try it before moving on)

40 Worked Example

41

42

43 Correlation Cartoon from: http://www.xkcd.com/552/

44 Correlations

45 Examples of Correlation

46 Interpreting Graphs See: What is factual about the graph? What are the axes? What is being plotted? What values are present? Think: How is the graph interpreted? What relationship is present? Is cause implied? What explanations are possible and what explanations are not possible? Wonder: Questions about the graph. What do you need to know more about? http://diabetes-obesity.findthedata.org/b/240/Correlations-between-diabetes-obesity-and-physical-activity

47 Correlations and Causality Diabetes and obesity are ‘risk factors’ of each other. There is a strong correlation between them, but does this mean one causes the other?

48 Correlation does not imply causality. Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming

49 Correlation does not imply causality. Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of confounding variables: conditions that the correlated variables have in common but that do not directly affect each other. To be able to determine causality through experimentation we need: one clearly identified independent variable; carefully measured dependent variable(s) that can be attributed to change in the independent variable; strict control of all other variables that might have a measurable impact on the dependent variable. We need sufficient relevant, repeatable and statistically significant data. Some known causal relationships: atmospheric CO2 concentrations and global warming; atmospheric CO2 concentrations and the rate of photosynthesis; temperature and enzyme activity.

50 Additional Resources Analysis and Evaluation of Evidence Chi-Squared Test

