The usual course of events for conducting scientific work: “The Scientific Method”
1. Observation - In the intertidal zone, algae seem to be confined to specific areas
2. Develop a working hypothesis - There will be a positive correlation of algal abundance and tide height
3. Conduct an experiment or a series of controlled systematic observations - Measure tide heights and count the number of algae at each
4. Appropriate statistical tests - Product-moment correlation
5. Confirm or reject hypothesis - There is a positive correlation of tide height and algal abundance
6. Reformulate or extend hypothesis - Algae will grow higher on the shore in areas of high wave action
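A minimal sketch of the product-moment (Pearson) correlation named in the example above. The tide heights and algal counts are hypothetical values made up for illustration, and `pearson_r` is an illustrative helper name, not something from the lecture.

```python
import math


def pearson_r(xs, ys):
    """Product-moment (Pearson) correlation of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: tide height (m) and number of algae counted at each height.
tide_height = [0.5, 1.0, 1.5, 2.0, 2.5]
algal_count = [3, 7, 8, 12, 15]

r = pearson_r(tide_height, algal_count)
print(f"r = {r:.3f}")
```

A value of r close to +1 would support the working hypothesis of a positive correlation; whether it is significant still has to be checked against a table or a stats program.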
Imagine that you are collecting samples (i.e. individuals) from a population of little ball creatures - Critterus sphericales
Little ball creatures come in 3 sizes: small, medium and large
You take a total of five samples:
-sample 1
-sample 2
-sample 3
-sample 4
-sample 5
The real population (all the little ball creatures that exist) Your samples
Each sample is a representation of the population BUT No single sample can be expected to accurately represent the whole population So……………
To be statistically valid, each sample must be: 1) Random: Thrown quadrat?? Guppies netted from an aquarium?
To be truly random: Choose numbers randomly from 1 to 300
Assign numbers from a random number table
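One way to sketch the "choose numbers randomly from 1 to 300" step is with Python's standard `random` module in place of a printed random number table. The function name `pick_random_units`, the default of 5 samples, and the seed are assumptions for illustration only.

```python
import random


def pick_random_units(n_units=300, n_samples=5, seed=None):
    """Pick n_samples distinct unit numbers from 1..n_units without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, n_units + 1), n_samples))


# Hypothetical draw: 5 numbered positions out of 300.
chosen = pick_random_units(seed=42)
print(chosen)
```

Each number drawn would correspond to one numbered position (or individual) to sample, so that the choice does not depend on the person doing the sampling.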
To be statistically valid, each sample must be: 2) Replicated:
Bark Samples for levels of cadmium
Pseudoreplicated: 10 samples from the same tree - sample size (n) = 1
Not pseudoreplicated: 10 samples from 10 different trees - sample size (n) = 10
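The bark example can be sketched in a few lines: the true sample size is the number of different trees, not the number of bark pieces. The tree labels and the helper name `effective_sample_size` are hypothetical.

```python
def effective_sample_size(tree_ids):
    """Number of independent replicates = number of distinct trees sampled."""
    return len(set(tree_ids))


# Hypothetical labels: which tree each of 10 bark samples came from.
same_tree = ["tree_1"] * 10
different_trees = [f"tree_{i}" for i in range(1, 11)]

n_pseudo = effective_sample_size(same_tree)
n_real = effective_sample_size(different_trees)
print(n_pseudo, n_real)
```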
IF YOUR DATA ARE:
1. Continuous data
2. Ratio or interval
3. Approximately normal distribution
4. Equal variance (F-test)
5. Conclusions about population based on sample (inductive: sample → population)
6. Sample size > 10
CHARACTERIZING DATA
Variables
-dependent – in any experiment, the dependent variable is the one being measured by the experimenter
  -also known as a response or test variable
-independent – in any experiment, the independent variable is the one being changed by the experimenter
  -also known as a factor
Nominal data (nominal scales, nominal variables)
- data are in categories
Examples: Drosophila genetic traits, species, sex
Look at the distribution of lizards in the forests
- Habitats: tree branches, tree trunks, ground
- Species: Species A, Species B, Species C, Species D
- Both the dependent and independent variables are nominal/categorical

Lizard species   Ground   Tree trunk   Tree branch   Species totals
Species A
Species B
Species C        9        5            0             14
Species D
Totals
Ordinal data (ordinal scales, ordinal variables)
- data are in categories
- categories are ranked
Examples: grades, surveys, behavioural responses
Interval data (interval scales, interval variables)
- constant size interval
- no true zero point: the zero point depends on the scale used, e.g. temperature
- values can be treated arithmetically (only +, -) to give a meaningful result
Ratio data (or ratio scales or ratio variables)
- constant size interval
- a zero point with some reality
- values can be treated arithmetically (+, -, x, ÷) to give a meaningful result
Examples: height, weight, time
Can be continuous, or discrete - counts, “number of …..”
Kinds of Variables
Assignment as a discrete (= categorical) or continuous variable can depend on the method of measurement
[Figure: the same variable recorded either as a continuous measurement or as discrete (= categorical) classes: Dappled, Full, Open]
The kind of data you are dealing with is one determining factor in the kind of statistical test you will use.
Two ways of arriving at a conclusion
1. Deductive inference: population → sample
2. Inductive inference: sample → population
Imagine the following experiment: 2 groups of crickets
Group 1 – fed a diet with extra supplements
Group 2 – fed a diet with no supplements
Weights: Mean = 12.8, Mean = 9.49
What you’re doing here is comparing two samples that, because you’ve not violated any of the assumptions we saw before, should represent populations that look like this:
[Figure: two frequency distributions; x-axis Weight, y-axis Frequency]
Are the means of these populations different??
Are the means of these populations different??
To answer this question – use a statistical test
A statistical test is just a method of determining mathematically whether you can say ‘yes’ or ‘no’ to this question, with a known probability of being wrong
What test should I use??
IF YOU HAVEN’T VIOLATED ANY OF THE ASSUMPTIONS WE MENTIONED BEFORE……
Number of groups compared:
- 2: Are the means of two populations the same? Use a t-test
  - Direction of difference specified?
    - Yes: one-tailed
    - No: two-tailed
  - Does each data point in one data set (population) have a corresponding one in the other data set?
    - Yes: paired t-test
    - No: unpaired t-test
- other than 2: Are the means of more than two populations the same? Use ANOVA
  - Number of factors being tested:
    - 1: Does each data point in one data set (population) have a corresponding one in the other data sets?
      - Yes: repeated measures ANOVA
      - No: one way ANOVA
    - 2: two way ANOVA
    - >2: other tests
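The flow chart above can be sketched as a small decision function. This is only an illustration of the branching logic under the assumption that none of the parametric assumptions has been violated; the function and argument names are hypothetical.

```python
def choose_parametric_test(n_groups, paired=False, n_factors=1):
    """Follow the flow chart: number of groups, then pairing or number of factors."""
    if n_groups == 2:
        # Two groups: a t-test, paired or unpaired.
        return "paired t-test" if paired else "unpaired t-test"
    if n_groups > 2:
        # More than two groups: some form of ANOVA, depending on the factors.
        if n_factors == 1:
            return "repeated measures ANOVA" if paired else "one way ANOVA"
        if n_factors == 2:
            return "two way ANOVA"
        return "other tests"
    raise ValueError("need at least 2 groups to compare means")


def choose_tails(direction_specified):
    """One-tailed if the direction of the difference was specified in advance."""
    return "one-tailed" if direction_specified else "two-tailed"


# The cricket experiment: 2 groups, different animals in each, no direction specified.
print(choose_parametric_test(2, paired=False), choose_tails(False))
```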
A simple t-test
1. State hypotheses
$H_0$ – there is no difference between the means of the two populations of crickets (i.e. the extra nutrients had no effect on weight)
$H_1$ – there is a difference between the means of the two populations of crickets (i.e. the extra nutrients had an effect on weight)
A simple t-test
2. Calculate a t-value (any stats program does this for you)
3. Use a probability table for the test you used to determine the probability that corresponds to the t-value that was calculated (for the truly masochistic)
Data → Test statistic → Probability
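A rough sketch of step 2 for an unpaired t-test with equal variances, using only the standard library. The cricket weights below are hypothetical (they are not the lecture's data, and the groups are smaller than the recommended n > 10), and `unpaired_t` is an illustrative name.

```python
import math
from statistics import mean, variance


def unpaired_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical weights for illustration only.
nutrient_fed = [12.1, 13.0, 12.5, 13.4, 12.9]
no_nutrient = [9.2, 9.8, 9.5, 10.1, 9.0]

t_value, df = unpaired_t(nutrient_fed, no_nutrient)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

The t-value and degrees of freedom are what you would then take to a t table (or let a stats program convert) to get the probability.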
Unpaired t test
Do the means of Nutrient fed and No nutrient differ significantly?

P value
The two-tailed P value is < , considered extremely significant.
t = with 38 degrees of freedom.

95% confidence interval
Mean difference = (Mean of No nutrient minus mean of Nutrient fed)
The 95% confidence interval of the difference: to

Assumption test: Are the standard deviations equal?
The t test assumes that the columns come from populations with equal SDs.
The following calculations test that assumption.
F =
The P value is
This test suggests that the difference between the two SDs is not significant.

Assumption test: Are the data sampled from Gaussian distributions?
The t test assumes that the data are sampled from populations that follow Gaussian distributions.
This assumption is tested using the method of Kolmogorov and Smirnov:

Group            KS      P Value   Passed normality test?
===============  ======  ========  =======================
Nutrient fed             >0.10     Yes
No nutrient              >0.10     Yes
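The equal-SD check in the output above reports an F value. A common way to form it, assumed here rather than taken from the program's documentation, is the ratio of the larger sample variance to the smaller one. The data and the name `variance_ratio` are hypothetical.

```python
from statistics import variance


def variance_ratio(sample_a, sample_b):
    """F = larger sample variance / smaller sample variance, with its two df values."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical weights for illustration only.
nutrient_fed = [12.1, 13.0, 12.5, 13.4, 12.9]
no_nutrient = [9.2, 9.8, 9.5, 10.1, 9.0]

f_value, df_num, df_den = variance_ratio(nutrient_fed, no_nutrient)
print(f"F = {f_value:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

An F close to 1 suggests similar variances; the probability for a given F still comes from an F table or a stats program.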
Interpretation of p < .0001?
This means that, if the two samples really were from the same population, there would be less than 1 chance in 10,000 of getting a difference between the means at least this large.
In the world of statistics, that is too small a chance to have happened randomly, and so $H_0$ is rejected and $H_1$ accepted
For all statistical tests that you’ll use, the convention is a cut-off of 5%, or p = .05: if the probability of two samples from the same population differing by this much is less than .05, the difference is considered significant
What happens if you violate any of the assumptions?
Step 1 - Panic
Step 2 - It depends on what assumptions have been violated.

Assumption                Other tests      Another solution?
1. Continuous data        Yes
2. Ratio/interval         Yes
3. Normal distribution    Yes              Transform the data
4. Equal variance         Yes - Welch’s
5. Sample → Population    Yes
6. N < 10                 Yes              Take more samples
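For the unequal-variance row, a minimal sketch of Welch's t statistic with the Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the data are hypothetical; this is the standard textbook formula, not necessarily how any particular stats program presents it.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate df; no equal-variance assumption."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation (usually not a whole number).
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical samples with visibly different spreads.
group_1 = [12.1, 13.0, 12.5, 13.4, 12.9]
group_2 = [7.0, 11.5, 9.5, 12.0, 8.0]

t_value, df = welch_t(group_1, group_2)
print(f"t = {t_value:.2f}, df = {df:.1f}")
```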
Nonparametric Tests
These tests are used when the assumptions of t-tests and ANOVA have been violated.
They are called “nonparametric” because there is no estimation of parameters (means, standard deviations or variances) involved.
Several kinds:
1) Goodness-of-Fit tests - when you calculate an expected value
2) Non-parametric equivalents of parametric tests
SUMMARY
Problem - trying to determine the expected frequencies of any result in a particular experiment
Type of data: discrete
- 2 categories & Bernoulli process: use a Binomial model to calculate expected frequencies
- > 2 categories: use a Poisson distribution to calculate expected frequencies
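A small sketch of the binomial branch of the summary above: expected frequencies for a two-category (Bernoulli) process. The scenario (100 families of 4 offspring, probability 0.5 of a female) is hypothetical, as are the function names.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def expected_frequencies(n_trials, p, n_experiments):
    """Expected number of experiments showing 0, 1, ..., n_trials successes."""
    return [n_experiments * binomial_pmf(k, n_trials, p) for k in range(n_trials + 1)]


# Hypothetical: 100 families of 4 offspring each, p(female) = 0.5.
expected = expected_frequencies(n_trials=4, p=0.5, n_experiments=100)
for k, e in enumerate(expected):
    print(f"{k} females: {e:.2f} families expected")
```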
Consider the following problem: sampling earthworms in 25 plots
[Table: quadrat, # of worms]
N = 25
$\bar{X}$ = 2.24 worms/quadrat
What is the expected number of worms/quadrat? OR What is the probability of x worms being in a particular quadrat?
Use a Poisson distribution
- >2 mutually exclusive categories
- N is relatively large and p is relatively small
- The distribution of worms in space is expected to be random
Formula for a Poisson distribution

$$P_X = \frac{e^{-\mu}\,\mu^{X}}{X!}$$

- $P_X$: probability of observing X individuals in a category
- $e$: base of natural logarithms (= ….)
- $\mu$: true mean of the population (approximated by sample mean)
- $X$: an integer (number of individuals)

For the worms:
- $P_X$: probability of observing X worms in a quadrat
- $\mu = \bar{X} = 2.24$
- $X$: number of worms
# of worms   Probability of finding X worms in a quadrat (calculation)
0            $P_0 = e^{-\mu}(\mu^0/0!) = e^{-2.24}$
1            $P_1 = e^{-\mu}(\mu^1/1!) = e^{-2.24}(2.24/1)$
2            $P_2 = e^{-\mu}(\mu^2/2!) = e^{-2.24}(2.24^2/2)$
3            $P_3 = e^{-\mu}(\mu^3/3!) = e^{-2.24}(2.24^3/6)$
4            $P_4 = e^{-\mu}(\mu^4/4!) = e^{-2.24}(2.24^4/24)$
5            $P_5 = e^{-\mu}(\mu^5/5!) = .05$
6            $P_6 = e^{-\mu}(\mu^6/6!)$
7            $P_7 = e^{-\mu}(\mu^7/7!) = .006$
Could go on forever or to ∞ - whichever comes first!
Practically….
$P_0 + P_1 + P_2 + P_3 + P_4 + P_5 + P_6 + P_7 = .998$
And $P_8 + P_9 + \dots = .002$
For convenience - lump everything from 8 worms upward into one class: $P_8 = .002$
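The worm calculations can be reproduced with a short sketch of the Poisson formula, using µ = 2.24 and N = 25 from the example. The names `poisson_pmf`, `MU` and `N_QUADRATS` are illustrative.

```python
import math


def poisson_pmf(x, mu):
    """P_X = e^(-mu) * mu^X / X!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)


MU = 2.24          # sample mean, worms per quadrat
N_QUADRATS = 25    # number of quadrats sampled

probs = [poisson_pmf(x, MU) for x in range(8)]   # P_0 .. P_7
tail = 1 - sum(probs)                            # everything from 8 worms upward

for x, p in enumerate(probs):
    print(f"P_{x} = {p:.4f}   expected quadrats = {N_QUADRATS * p:.2f}")
print(f"P_8+ = {tail:.4f}   expected quadrats = {N_QUADRATS * tail:.2f}")
```

Multiplying each probability by the number of quadrats gives the expected frequency for that class, which is what a goodness-of-fit test would compare against the observed counts.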
Other kinds of Poisson problems 1. Cell counts in a hemocytometer 2. Number of parasitic mites per fly in a population 3. Number of fish per seine 4. Number of animals in a particular subdivision of the habitat Poisson Distributions are very common in biological work!
Goodness-of-Fit Tests Use with nominal scale data e.g. results of genetic crosses Also, you’re using the population to deduce what the sample should look like
Classic example - genetic crosses
Do they conform to an “expected” Mendelian ratio?
Back to our little ball creatures - Critterus sphericales
Phenotypes: A_B_, A_bb, aaB_, aabb
Mendelian inheritance - predict a 9:3:3:1 ratio
-sampled 320 animals

                 A_B_   A_bb   aaB_   aabb
Observed (o)
Expected (e)
o - e
(o - e)^2
(o - e)^2 / e

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

df = number of classes - 1 = 3
$\chi^2$ = 12.52
Critical value (from the $\chi^2$ table) for 3 degrees of freedom at the .05 level is 7.82
Conclusion: the probability of getting a deviation this large from the expected distribution by chance is < .05, therefore they are not from a Mendelian population
The actual probability of $\chi^2$ = 12.52 and df = 3 is .01 > p > .001
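A sketch of the goodness-of-fit calculation for a 9:3:3:1 ratio. The expected counts follow from the 320 animals sampled; the observed counts below are hypothetical stand-ins (they are not the lecture's data, so the result will not be 12.52). The critical value 7.82 is the one quoted above for df = 3.

```python
def chi_square_gof(observed, ratio):
    """Chi-square goodness of fit of observed counts to an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1   # number of classes - 1
    return chi2, expected, df


CRITICAL_05_DF3 = 7.82   # from the chi-square table, .05 level, 3 df

# Hypothetical observed counts for A_B_, A_bb, aaB_, aabb (sum to 320).
observed = [170, 70, 55, 25]
chi2, expected, df = chi_square_gof(observed, ratio=[9, 3, 3, 1])

print("expected:", expected)
print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject Mendelian ratio" if chi2 > CRITICAL_05_DF3 else "consistent with Mendelian ratio")
```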
A little $\chi^2$ wrinkle - the Yates correction
Formula is

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

Except if df = 1 (i.e. you’re using two categories of data)
Then the formula becomes

$$\chi^2 = \sum \frac{(|o - e| - 0.5)^2}{e}$$
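The two formulas can be combined in one small function that applies the Yates correction only when df = 1. The counts are hypothetical and `chi_square_yates` is an illustrative name.

```python
def chi_square_yates(observed, expected):
    """Chi-square statistic, with the Yates correction applied when df = 1."""
    df = len(observed) - 1
    if df == 1:
        # Two categories: shrink each |o - e| by 0.5 before squaring.
        return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-category example: 60 vs 40 observed, 50:50 expected.
corrected = chi_square_yates([60, 40], [50, 50])
print(f"chi-square with Yates correction = {corrected:.2f}")
```

For these hypothetical counts the uncorrected value would be 4.00, so the correction makes the test slightly more conservative.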
Summary!

Type of data   Number of samples   Are data related?   Test to use
Nominal        2                   Yes                 McNemar
Nominal        2                   No                  Fisher’s Exact
Nominal        >2                  Yes                 Cochran’s Q
Ordinal        1                   No                  Kolmogorov-Smirnov
Ordinal+       2                   Yes                 Wilcoxon (paired t-test analogue)
Ordinal+       2                   No                  Mann Whitney U (unpaired t-test analogue)
Ordinal+       >2                  No                  Kruskal Wallis (analogue of one-way ANOVA)
Ordinal        >2                  Yes                 Friedman two-way ANOVA

All of the parametric tests (remember the big flow chart!) have non-parametric equivalents (or analogues)
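As one illustration of how these rank-based analogues work, here is a rough sketch of the Mann Whitney U statistic: pool the two samples, rank them (ties get the average rank), and compare rank sums instead of means. The helper names and data are hypothetical, and the resulting U still has to be compared with a U table or converted to a probability by a stats program.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Smaller of the two U statistics for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical ordinal scores for two unrelated groups.
u = mann_whitney_u([1, 3, 5], [2, 4, 6])
print(f"U = {u}")
```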