The usual course of events for conducting scientific work “The Scientific Method” Reformulate or extend hypothesis Develop a Working Hypothesis Observation.

1 The usual course of events for conducting scientific work: “The Scientific Method”
- Observation
- Develop a working hypothesis
- Conduct an experiment or a series of controlled systematic observations
- Appropriate statistical tests
- Confirm or reject hypothesis
- Reformulate or extend hypothesis

2 The usual course of events for conducting scientific work: “The Scientific Method”
- Observation: In the intertidal zone, algae seem to be confined to specific areas
- Develop a working hypothesis: There will be a positive correlation of algal abundance and tide height
- Conduct an experiment or a series of controlled systematic observations: Measure tide heights and count the number of algae at each
- Appropriate statistical tests: Product-moment correlation
- Confirm or reject hypothesis: There is a positive correlation of tide height and algal abundance
- Reformulate or extend hypothesis: Algae will grow higher on the shore in areas of high wave action

3 Imagine that you are collecting samples (i.e. individuals) from a population of little ball creatures - Critterus sphericales. Little ball creatures come in 3 sizes: small, medium, and large.

4 You take a total of five samples:
- sample 1
- sample 2
- sample 3
- sample 4
- sample 5

5 The real population (all the little ball creatures that exist) Your samples

6 Each sample is a representation of the population, BUT no single sample can be expected to accurately represent the whole population. So...

7 To be statistically valid, each sample must be: 1) Random: Thrown quadrat?? Guppies netted from an aquarium?

8 To be truly random: on a 20 x 15 grid, choose numbers randomly from 1 to 300

10 Assign numbers from a random number table (grid axis labelled 1 to 15)
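The random selection of grid cells described above can be sketched in Python. This is only an illustration: it swaps the printed random number table for a pseudo-random generator, and the function names, the choice of 10 samples, and the assumption that the 300 cells are laid out 20 per row are all hypothetical, not from the slides.

```python
import random


def pick_random_cells(n_cells=300, n_samples=10, seed=None):
    """Draw n_samples distinct cell numbers uniformly from 1..n_cells."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_cells + 1), n_samples))


def cell_to_row_col(cell, n_cols=20):
    """Convert a 1-based cell number to a 1-based (row, column) position.

    Assumes cells are numbered left to right, 20 per row (an assumption).
    """
    row, col = divmod(cell - 1, n_cols)
    return row + 1, col + 1


cells = pick_random_cells(seed=1)
positions = [cell_to_row_col(c) for c in cells]
print(cells)
print(positions)
```

Sampling without replacement means no cell is chosen twice; if the grid is laid out the other way round, pass `n_cols=15` instead.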

11 To be statistically valid, each sample must be: 2) Replicated:

12 Bark Samples for levels of cadmium

13 Pseudoreplicated: 10 samples from the same tree, sample size (n) = 1
Not pseudoreplicated: 10 samples from 10 different trees, sample size (n) = 10

14 IF YOUR DATA ARE:
1. Continuous data
2. Ratio or interval
3. Approximately normal distribution
4. Equal variance (F-test)
5. Conclusions about population based on sample (inductive: sample → population)
6. Sample size > 10

15 CHARACTERIZING DATA

16 Variables
- dependent: in any experiment, the dependent variable is the one being measured by the experimenter; also known as a response or test variable
- independent: in any experiment, the independent variable is the one being changed by the experimenter; also known as a factor

17 Nominal data (nominal scales, nominal variables)
- data are in categories
- examples: Drosophila genetic traits, species, sex

18 Look at the distribution of lizards in the forests: four species (Species A, B, C, D) across three habitats (tree branches, tree trunks, ground)

19 Both the dependent and independent variables are nominal/categorical
Lizard species   Ground   Tree trunk   Tree branch   Species totals
Species A        9        0            15            24
Species B        9        0            12            21
Species C        9        5            0             14
Species D        9        10           3             22
Totals           36       15           30            81
(columns Ground, Tree trunk, Tree branch are the habitats)

20 Ordinal data (ordinal scales, ordinal variables)
- data are in categories
- categories are ranked
- examples: grades, surveys, behavioural responses

21 Interval data (interval scales, interval variables)
- constant size interval
- no true zero point: the zero point depends on the scale used, e.g. temperature
- values can be treated arithmetically (only +, -) to give a meaningful result

22 Ratio data (or ratio scales or ratio variables)
- constant size interval
- a zero point with some reality, e.g. height, weight, time
- values can be treated arithmetically (+, -, x, ÷) to give a meaningful result

23 Ratio data (or ratio scales or ratio variables)
- constant size interval
- a zero point with some reality
- values can be treated arithmetically (+, -, x, ÷) to give a meaningful result
- can be continuous, or discrete (counts, “number of …..”)

24 Kinds of Variables: Assignment as a discrete (= categorical) or continuous variable can depend on the method of measurement. (Figure labels: Dappled, Full, Open; Continuous; Discrete (= categorical))

25 The kind of data you are dealing with is one determining factor in the kind of statistical test you will use.

26 IF YOUR DATA ARE:
1. Continuous data
2. Ratio or interval
3. Approximately normal distribution
4. Equal variance (F-test)
5. Conclusions about population based on sample (inductive: sample → population)
6. Sample size > 10

27 Two ways of arriving at a conclusion
1. Deductive inference: population → sample
2. Inductive inference: sample → population

28 IF YOUR DATA ARE:
1. Continuous data
2. Ratio or interval
3. Approximately normal distribution
4. Equal variance (F-test)
5. Conclusions about population based on sample (inductive: sample → population)
6. Sample size > 10

29 Imagine the following experiment: 2 groups of crickets
Group 1 – fed a diet with extra supplements
Group 2 – fed a diet with no supplements
Weights, Group 1 (Mean = 12.8):
12.1   13.9   13.0   12.1
14.9   12.2   12.9   14.9
13.6   12.0   13.5   13.6
12.0   15.9   12.4   12.0
10.9   12.1   11.0   10.9
Weights, Group 2 (Mean = 9.49):
9.1    8.9    11.0   10.1
9.9    9.2    8.0    11.9
8.6    9.0    8.5    9.6
10.0   10.9   9.4    8.0
11.9   7.1    10.0   8.9

30 What you’re doing here is comparing two samples that, because you’ve not violated any of the assumptions we saw before, should represent populations that look like this: (figure: two frequency distributions of weight, centred on 9.49 and 12.8; axes: Frequency against Weight). Are the means of these populations different??

31 Are the means of these populations different?? To answer this question, use a statistical test. A statistical test is just a method of determining mathematically whether you can say ‘yes’ or ‘no’ to this question. What test should I use??

32 IF YOU HAVEN’T VIOLATED ANY OF THE ASSUMPTIONS WE MENTIONED BEFORE……
Number of groups compared:
- 2 (Are the means of two populations the same?): t-test
  - Direction of difference specified? Yes: one-tailed. No: two-tailed.
  - Does each data point in one data set (population) have a corresponding one in the other data set? Yes: paired t-test. No: unpaired t-test.
- other than 2 (Are the means of more than two populations the same?): ANOVA
  - Number of factors being tested:
    - 1: Does each data point in one data set (population) have a corresponding one in the other data sets? Yes: repeated measures ANOVA. No: one-way ANOVA.
    - 2: two-way ANOVA
    - >2: other tests
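One way to read the flow chart above is as a small decision function. The sketch below is an illustration in Python rather than part of the original slides: the function name and its parameters are hypothetical, and matching the repeated measures branch to paired data is one reading of the flattened chart.

```python
def choose_parametric_test(n_groups, paired, n_factors=1, directional=False):
    """Pick a test by walking the flow chart.

    Assumes none of the parametric assumptions have been violated.
    """
    if n_groups == 2:
        # Two groups: some form of t-test.
        tails = "one-tailed" if directional else "two-tailed"
        kind = "paired" if paired else "unpaired"
        return f"{kind} t-test ({tails})"
    # Any other number of groups: the ANOVA family, split by factor count.
    if n_factors == 1:
        return "repeated measures ANOVA" if paired else "one-way ANOVA"
    if n_factors == 2:
        return "two-way ANOVA"
    return "other tests"


# The cricket experiment: two independent groups, no direction specified.
print(choose_parametric_test(n_groups=2, paired=False))
```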

33 A simple t-test
1. State hypotheses
H0 – there is no difference between the means of the two populations of crickets (i.e. the extra nutrients had no effect on weight)
H1 – there is a difference between the means of the two populations of crickets (i.e. the extra nutrients had an effect on weight)

34 A simple t-test
2. Calculate a t-value (any stats program does this for you)
3. Use a probability table for the test you used to determine the probability that corresponds to the t-value that was calculated. (for the truly masochistic)

35 A simple t-test
2. Calculate a t-value (any stats program does this for you)
3. Use a probability table for the test you used to determine the probability that corresponds to the t-value that was calculated.
Data → Test statistic → Probability

36 Unpaired t test
Do the means of Nutrient fed and No nutrient differ significantly?
P value: The two-tailed P value is < 0.0001, considered extremely significant. t = 7.941 with 38 degrees of freedom.
95% confidence interval: Mean difference = -3.307 (Mean of No nutrient minus mean of Nutrient fed). The 95% confidence interval of the difference: -4.150 to -2.464
Assumption test: Are the standard deviations equal? The t test assumes that the columns come from populations with equal SDs. The following calculations test that assumption. F = 1.192. The P value is 0.7062. This test suggests that the difference between the two SDs is not significant.
Assumption test: Are the data sampled from Gaussian distributions? The t test assumes that the data are sampled from populations that follow Gaussian distributions. This assumption is tested using the method Kolmogorov and Smirnov:
Group          KS       P Value   Passed normality test?
Nutrient fed   0.1676   >0.10     Yes
No nutrient    0.1279   >0.10     Yes
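The t statistic and the F ratio in the output above can be recomputed by hand. The following is a minimal sketch using only the Python standard library, assuming the first 20 weights on slide 29 belong to the supplemented group and the last 20 to the unsupplemented group (an inference from the two means). The function names are hypothetical, and the P value lookup is left to a table or a stats program.

```python
import math
import statistics

# Weights as transcribed from slide 29 (group assignment is an assumption).
nutrient_fed = [12.1, 13.9, 13.0, 12.1, 14.9, 12.2, 12.9, 14.9, 13.6, 12.0,
                13.5, 13.6, 12.0, 15.9, 12.4, 12.0, 10.9, 12.1, 11.0, 10.9]
no_nutrient = [9.1, 8.9, 11.0, 10.1, 9.9, 9.2, 8.0, 11.9, 8.6, 9.0,
               8.5, 9.6, 10.0, 10.9, 9.4, 8.0, 11.9, 7.1, 10.0, 8.9]


def unpaired_t(a, b):
    """Pooled-variance (equal-variance) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


def variance_ratio(a, b):
    """F ratio for the equal-variance check: larger variance over smaller."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return max(var_a, var_b) / min(var_a, var_b)


t_value, df = unpaired_t(nutrient_fed, no_nutrient)
f_ratio = variance_ratio(nutrient_fed, no_nutrient)
print(f"t = {t_value:.3f} with {df} degrees of freedom, F = {f_ratio:.3f}")
```

With the weights as transcribed, the results should land close to the reported t = 7.941 and F = 1.192; any small difference in t would point to a transcription difference in the slide data rather than a different method.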

37 Interpretation of p < .0001? This means that, if the two samples came from the same population, there would be less than 1 chance in 10,000 of seeing a difference between the means at least this large. In the world of statistics, that is too small a chance to have happened randomly, and so H0 is rejected and H1 accepted.

38 For all statistical tests that you’ll use, the conventional cut-off is 5%, or p = .05: if the probability of seeing a difference this large between two samples from the same population is below .05, the samples are treated as coming from different populations.

39 What happens if you violate any of the assumptions? Step 1 - Panic

40 What happens if you violate any of the assumptions?
Step 1 - Panic
Step 2 - It depends on what assumptions have been violated.
Assumption               Other tests     Another solution?
1. Continuous data       Yes
2. Ratio/interval        Yes
3. Normal distribution   Yes             Transform the data
4. Equal variance        Yes - Welch’s
5. Sample → population   Yes
6. N < 10                Yes             Take more samples

41 Nonparametric Tests
These tests are used when the assumptions of t-tests and ANOVA have been violated.
They are called “nonparametric” because there is no estimation of parameters (means, standard deviations or variances) involved.
Several kinds:
1) Goodness-of-Fit tests - when you calculate an expected value
2) Non-parametric equivalents of parametric tests

42 SUMMARY
Problem - trying to determine the expected frequencies of any result in a particular experiment
Type of data: Discrete
- 2 categories & Bernoulli process: use a Binomial model to calculate expected frequencies
- > 2 categories: use a Poisson distribution to calculate expected frequencies

43 Consider the following problem: Sampling earthworms, 25 plots
Quadrat:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# of worms:   3  4  1  1  3  0  0  1  2  3  4  5  0  1  3  5  5  2  6  3  1  1  1  0  1

44 Quadrat:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# of worms:   3  4  1  1  3  0  0  1  2  3  4  5  0  1  3  5  5  2  6  3  1  1  1  0  1
N = 25, mean $\bar{X}$ = 2.24 worms/quadrat

45 What is the expected number of worms/quadrat? OR What is the probability of x worms being in a particular quadrat?

46 Use a Poisson distribution
- > 2 mutually exclusive categories
- N is relatively large and p is relatively small
The distribution of worms in space is expected to be random

47 Formula for a Poisson distribution
$P_X = \dfrac{e^{-\mu}\,\mu^{X}}{X!}$
- $P_X$: probability of observing X individuals in a category
- $e$: base of natural logarithms (= 2.71828….)
- $\mu$: true mean of the population (approximated by sample mean)
- $X$: an integer (number of individuals)

48 Formula for a Poisson distribution
$P_X = \dfrac{e^{-\mu}\,\mu^{X}}{X!}$
- $P_X$: probability of observing X worms in a quadrat
- $e$: base of natural logarithms (= 2.71828….)
- $\mu = \bar{X} = 2.24$
- $X$: number of worms

49 Probability of finding X worms in a quadrat
# of worms   Calculation
0            $P_0 = e^{-\mu}(\mu^0/0!) = e^{-2.24} = .1065$
1            $P_1 = e^{-\mu}(\mu^1/1!) = e^{-2.24}(2.24/1) = .2385$
2            $P_2 = e^{-\mu}(\mu^2/2!) = e^{-2.24}(2.24^2/2) = .2671$
3            $P_3 = e^{-\mu}(\mu^3/3!) = e^{-2.24}(2.24^3/6) = .1994$
4            $P_4 = e^{-\mu}(\mu^4/4!) = e^{-2.24}(2.24^4/24) = .1117$
5            $P_5 = e^{-\mu}(\mu^5/5!) = .05$
6            $P_6 = e^{-\mu}(\mu^6/6!) = .0187$
7            $P_7 = e^{-\mu}(\mu^7/7!) = .006$
Could go on forever or to ∞ - whichever comes first!

50 Practically….
$P_0 + P_1 + P_2 + P_3 + P_4 + P_5 + P_6 + P_7 = .998$
And $P_8 + P_9 + \dots = .002$
For convenience, $P_8 = .002$ (standing in for all counts of 8 or more)
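The Poisson calculation for the earthworm counts can be checked with a few lines of Python. This is a sketch under the assumption that the quadrat counts are read from slide 43 as listed below; the function name `poisson_pmf` is illustrative, not from the slides.

```python
import math

# Worms per quadrat for the 25 plots (as read from slide 43).
worms = [3, 4, 1, 1, 3, 0, 0, 1, 2, 3, 4, 5, 0,
         1, 3, 5, 5, 2, 6, 3, 1, 1, 1, 0, 1]


def poisson_pmf(x, mu):
    """P_X = e^(-mu) * mu^X / X! for a non-negative integer X."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


mu = sum(worms) / len(worms)            # sample mean approximates the true mean
probs = [poisson_pmf(x, mu) for x in range(8)]
tail = 1 - sum(probs)                   # everything from 8 worms upward

for x, p in enumerate(probs):
    # Expected number of quadrats with x worms = N * P_x
    print(f"{x} worms: P = {p:.4f}, expected quadrats = {len(worms) * p:.2f}")
print(f"8 or more worms: P = {tail:.4f}")
```

Multiplying each probability by N = 25 gives the expected number of quadrats in each class, which is what a goodness-of-fit test would compare against the observed counts.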

51 Other kinds of Poisson problems
1. Cell counts in a hemocytometer
2. Number of parasitic mites per fly in a population
3. Number of fish per seine
4. Number of animals in a particular subdivision of the habitat
Poisson distributions are very common in biological work!

52 Goodness-of-Fit Tests Use with nominal scale data e.g. results of genetic crosses Also, you’re using the population to deduce what the sample should look like

53 Classic example - genetic crosses
Do they conform to an “expected” Mendelian ratio?
Back to our little ball creatures - Critterus sphericales
Phenotypes: A_B_, A_bb, aaB_, aabb
Mendelian inheritance - predict a 9:3:3:1 ratio

54 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6

55 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6
Expected (e)    180    60     60     20

56 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6
Expected (e)    180    60     60     20
o - e           14     -7     7      -14

57 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6
Expected (e)    180    60     60     20
o - e           14     -7     7      -14
$(o-e)^2$       196    49     49     196

58 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6
Expected (e)    180    60     60     20
o - e           14     -7     7      -14
$(o-e)^2$       196    49     49     196
$(o-e)^2/e$     1.08   .82    .82    9.8

59 - sampled 320 animals
                A_B_   A_bb   aaB_   aabb
Observed (o)    194    53     67     6
Expected (e)    180    60     60     20
o - e           14     -7     7      -14
$(o-e)^2$       196    49     49     196
$(o-e)^2/e$     1.08   .82    .82    9.8
$\chi^2 = \sum \dfrac{(o-e)^2}{e} = 1.08 + .82 + .82 + 9.8 = 12.52$
df = number of classes - 1 = 3

60 $\chi^2 = 12.52$
Critical value for 3 degrees of freedom at .05 level is 7.82 ($\chi^2$ table)
Conclusion: Probability of these data fitting the expected distribution is < .05, therefore they are not from a Mendelian population
The actual probability of $\chi^2 = 12.52$ and df = 3 is .01 > p > .001
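The goodness-of-fit arithmetic for the 9:3:3:1 cross can be reproduced in a short Python sketch. The function names are illustrative, and the critical value 7.82 is simply copied from the slide's table rather than computed.

```python
observed = [194, 53, 67, 6]   # A_B_, A_bb, aaB_, aabb (320 animals)
ratio = [9, 3, 3, 1]          # predicted Mendelian ratio


def expected_counts(total, ratio):
    """Split the total into classes in proportion to the ratio."""
    return [total * r / sum(ratio) for r in ratio]


def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = expected_counts(sum(observed), ratio)
chi2 = chi_square(observed, expected)
df = len(observed) - 1

CRITICAL_05_DF3 = 7.82        # .05 level, 3 degrees of freedom (from the slide)
reject_h0 = chi2 > CRITICAL_05_DF3

print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```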

61 A little $\chi^2$ wrinkle - the Yates correction
Formula is
$\chi^2 = \sum \dfrac{(o-e)^2}{e}$
Except if df = 1 (i.e. you’re using two categories of data)
Then the formula becomes
$\chi^2 = \sum \dfrac{(|o-e| - 0.5)^2}{e}$
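A small Python sketch shows how the correction changes the statistic when there are only two classes. The counts here are hypothetical (a made-up 3:1 cross of 100 animals, not data from the slides), and the function name is illustrative; the correction is applied exactly as in the formula above, with 0.5 subtracted from each absolute deviation before squaring.

```python
def chi_square_gof(observed, expected, yates=None):
    """Goodness-of-fit chi-square, with the Yates correction when df = 1.

    If yates is None, the correction is switched on automatically
    for two classes (df = 1) and off otherwise.
    """
    df = len(observed) - 1
    if yates is None:
        yates = (df == 1)
    total = 0.0
    for o, e in zip(observed, expected):
        deviation = abs(o - e)
        if yates:
            deviation -= 0.5   # Yates continuity correction
        total += deviation ** 2 / e
    return total


# Hypothetical two-class example: 84 and 16 observed, 75 and 25 expected.
hypothetical_observed = [84, 16]
hypothetical_expected = [75, 25]

corrected = chi_square_gof(hypothetical_observed, hypothetical_expected)
uncorrected = chi_square_gof(hypothetical_observed, hypothetical_expected,
                             yates=False)
print(f"with Yates: {corrected:.3f}, without: {uncorrected:.3f}")
```

The corrected value is always a little smaller than the uncorrected one, which makes the test slightly more conservative when df = 1.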

62 Summary!
Type of data   Number of samples   Are data related?   Test to use
Nominal        2                   Yes                 McNemar
Nominal        2                   No                  Fisher’s Exact
Nominal        >2                  Yes                 Cochran’s Q

63 Type of data   Number of samples   Are data related?   Test to use
Nominal        2                   Yes                 McNemar
Nominal        2                   No                  Fisher’s Exact
Nominal        >2                  Yes                 Cochran’s Q
Ordinal        1                   No                  Kolmogorov-Smirnov
Ordinal+       2                   Yes                 Wilcoxon (paired t-test analogue)
Ordinal+       2                   No                  Mann Whitney U (unpaired t-test analogue)
Ordinal+       >2                  No                  Kruskal Wallis (analogue of one-way ANOVA)
Ordinal        >2                  Yes                 Friedman two-way ANOVA
All of the parametric tests (remember the big flow chart!) have non-parametric equivalents (or analogues)

