1 Advances in Statistics Or, what you might find if you picked up a current issue of a Biological Journal
2 Advances in Statistics Extensions to the ANOVA Computer-intensive methods Maximum likelihood
3 Extensions to ANOVA One-way ANOVA –This works for a single explanatory variable –Simplest possible design Two-way ANOVA –Two categorical explanatory variables –Factorial design
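4 ANOVA Tables
Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | SS_treatment | k - 1 | SS_treatment / (k - 1) | MS_treatment / MSE | *
Error | SS_error | N - k | SS_error / (N - k) | |
Total | SS_total | N - 1 | | |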
5 Two-factor ANOVA Table (two categorical explanatory variables)
Source of variation | Sum of squares | df | Mean square | F ratio | P
Treatment 1 | SS_1 | k_1 - 1 | SS_1 / (k_1 - 1) | MS_1 / MSE |
Treatment 2 | SS_2 | k_2 - 1 | SS_2 / (k_2 - 1) | MS_2 / MSE |
Treatment 1 * Treatment 2 | SS_1*2 | (k_1 - 1)*(k_2 - 1) | SS_1*2 / [(k_1 - 1)*(k_2 - 1)] | MS_1*2 / MSE |
Error | SS_error | XXX | SS_error / XXX | |
Total | SS_total | N - 1 | | |
7 General Linear Models Used to analyze variation in Y when there is more than one explanatory variable Explanatory variables can be categorical or numerical
8 General Linear Models First step: formulate a model statement. Example: Y = overall mean + treatment effect
10 General Linear Models Second step: Make an ANOVA table. Example:
Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | SS_treatment | k - 1 | SS_treatment / (k - 1) | MS_treatment / MSE | *
Error | SS_error | N - k | SS_error / (N - k) | |
Total | SS_total | N - 1 | | |
This is the same as a one-way ANOVA!
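The table entries for a single categorical explanatory variable can be computed directly. The following is a minimal sketch, not the lecture's own code: the function name `one_way_anova` and the three groups of measurements are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def one_way_anova(groups):
    """Return sums of squares, df, mean squares and F for a one-way layout."""
    k = len(groups)                                   # number of treatments
    n_total = sum(len(g) for g in groups)             # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Treatment SS: spread of the group means around the overall mean.
    ss_treatment = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Error SS: spread of observations around their own group mean.
    ss_error = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    df_treatment, df_error = k - 1, n_total - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "treatment": (ss_treatment, df_treatment, ms_treatment),
        "error": (ss_error, df_error, ms_error),
        "total": (ss_treatment + ss_error, n_total - 1),
        "F": ms_treatment / ms_error,
    }


# Hypothetical measurements for k = 3 treatments (not data from the lecture).
table = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
for row, values in table.items():
    print(row, values)
```

The F ratio would then be compared with the F distribution with (k - 1, N - k) degrees of freedom to obtain P.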
12 General Linear Models If there is only one explanatory variable, these are exactly equivalent to things we’ve already done –One categorical variable: ANOVA –One numerical variable: regression Great for more complicated situations
13 Example 1: Experiment with blocking Fish experiment: sensitivity of goldfish to light Fish are randomly selected from the population Four different light treatments are applied to each fish
14 Randomized Block Design Blocks (fish) Treatments (light wavelengths)
16 Step 1: Make a model statement
17 Step 2: Make an ANOVA table
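The model statement and table for the blocked fish experiment are not reproduced in the text, so the sketch below assumes the usual additive form, response = overall mean + block effect + treatment effect, with one measurement per fish per light treatment. The function name `randomized_block_anova` and the data are hypothetical.

```python
def randomized_block_anova(data):
    """data[i][j] is the response of block i (one fish) under treatment j."""
    b, k = len(data), len(data[0])                    # blocks, treatments
    grand = sum(sum(row) for row in data) / (b * k)
    block_means = [sum(row) / k for row in data]
    treat_means = [sum(row[j] for row in data) / b for j in range(k)]

    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    # With one observation per cell, what is left over is the error term.
    ss_error = ss_total - ss_block - ss_treat

    df_block, df_treat = b - 1, k - 1
    df_error = df_block * df_treat
    ms_error = ss_error / df_error
    return {
        "block": (ss_block, df_block),
        "treatment": (ss_treat, df_treat),
        "error": (ss_error, df_error),
        "total": (ss_total, b * k - 1),
        "F_treatment": (ss_treat / df_treat) / ms_error,
    }


# Hypothetical sensitivities: 3 fish (rows) by 4 light treatments (columns).
result = randomized_block_anova([
    [1.0, 2.0, 3.0, 5.0],
    [2.0, 4.0, 4.0, 6.0],
    [3.0, 3.0, 6.0, 7.0],
])
print(result)
```

Removing the fish-to-fish variation from the error term is what makes the blocked design more sensitive than treating all twelve measurements as independent.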
18 Another Example: Mole Rats Are there lazy mole rats? Two variables: –Worker type: categorical “frequent workers” and “infrequent workers” –Body mass (ln-transformed): numerical
20 Step 1: Make a model statement
21 Step 2: Make an ANOVA table
23 Step 1: Make a model statement
24 Step 2: Make an ANOVA table. This analysis is also called ANCOVA (analysis of covariance).
26 General Linear Models Can handle any number of predictor variables Each can be categorical or numerical Tables have the same basic structure Same assumptions as ANOVA
27 General Linear Models Don’t run out of degrees of freedom! Sometimes, the F-statistics will have DIFFERENT denominators - see book for an example
28 Computer-intensive methods Hypothesis testing: –Simulation –Randomization Confidence intervals –Bootstrap
29 Simulation Simulates the sampling process on a computer many times, assuming the null hypothesis is true. The estimate is calculated on each simulated data set, and the distribution of these estimates is the null distribution.
30 Example: Social spider sex ratios Social spiders live in groups
31 Example: Social spider sex ratios Groups are mostly females Hypothesis: Groups have just enough males to allow reproduction Test: Whether distribution of number of males is as predicted by chance Problem: Groups are of many different sizes Binomial distribution therefore doesn’t apply
32 Simulation: For each group, the number of spiders is known. The overall proportion of males, p_m, is known. For each group, the computer draws as many simulated spiders as the group really contains, and each has probability p_m of being male. This is done for all groups, and the variance in proportion of males is calculated. This is repeated a large number of times.
33 The observed value (0.44), or something more extreme, occurs in only 4.9% of the simulations. Therefore P = 0.049.
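The simulation procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis behind the slide: the group sizes, the proportion of males and the observed variance are all hypothetical, and "more extreme" is taken to mean a smaller variance than expected by chance, which is an assumption about the direction of the test.

```python
import random


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate_null(group_sizes, p_male, n_sims=2000, seed=1):
    """Null distribution of the among-group variance in proportion of males."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_sims):
        proportions = []
        for n in group_sizes:
            # Each of the n spiders is male with probability p_male.
            males = sum(rng.random() < p_male for _ in range(n))
            proportions.append(males / n)
        null.append(sample_variance(proportions))
    return null


# Hypothetical inputs (not the spider data from the lecture).
group_sizes = [8, 12, 20, 35, 50]
p_male = 0.10
observed_variance = 0.002

null_variances = simulate_null(group_sizes, p_male)
# Assumed direction: variance as small as, or smaller than, the observed one.
p = sum(v <= observed_variance for v in null_variances) / len(null_variances)
print(p)
```

Because each group keeps its real size, the sketch does not need the groups to be equal, which is the reason a single binomial distribution could not be used.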
34 Randomization Used for hypothesis testing Mixes the real data randomly Variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. This is done for all individuals. The estimate is made on the randomized data. The whole process is repeated numerous times. The distribution of the randomized estimates is the null distribution.
35 Without replacement Randomization is done without replacement. In other words, all data points are used exactly once in each randomized data set.
36 Randomization can be done for any test of association between two variables
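A minimal sketch of a randomization test for the difference between two group means follows. The function name `randomization_test`, the two-sided comparison and the data are assumptions made for illustration. Shuffling the pooled values and cutting them back into two groups is one way to pair each measurement with a randomly chosen group label without replacement.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test(group_a, group_b, n_rand=10_000, seed=1):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_rand):
        # Shuffling uses every data point exactly once (no replacement).
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_rand


# Hypothetical measurements for two groups (not the lecture's data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [11.0, 12.0, 13.0, 14.0, 15.0]
observed_diff, p_value = randomization_test(group_a, group_b)
print(observed_diff, p_value)
```

The same loop works for any estimate of association: only the line that computes `diff` changes.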
37 Example: Sage crickets Sage cricket males sometimes offer their hind-wings to females to eat during mating. Do females who eat hind-wings wait longer to re-mate?
39 Problems: Unequal variance, non-normal distributions
40 [Figure: the real data and one randomized data set, each shown for the groups "Male wingless" and "Male winged"]
41 Note that each data point was only used once
42 Result of the randomizations: P < 0.001
43 Randomization: Other questions Q: Is this periodic? (yes)
44 Bootstrap Method for estimation (and confidence intervals) Often used for hypothesis testing too "Picking yourself up by your own bootstraps"
45 Bootstrap For each group, randomly pick data points with replacement from that group's data until the bootstrap sample has as many points as the original group. With this bootstrap dataset, calculate the estimate: this is one bootstrap replicate estimate.
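The resampling step can be sketched as follows. The estimate used here (a difference in group means), the percentile method for turning replicates into an interval, and the data are all assumptions for illustration; the slides do not say which interval method is intended.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_replicates(group_a, group_b, n_boot=5000, seed=1):
    """Bootstrap replicate estimates of mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        replicates.append(mean(a) - mean(b))
    return replicates


def percentile_interval(replicates, level=0.95):
    """Approximate interval from the sorted replicates (percentile method)."""
    ordered = sorted(replicates)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper


# Hypothetical measurements for two groups (not the lecture's data).
group_a = [2.0, 3.5, 4.0, 5.5, 7.0, 8.0]
group_b = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5]
replicates = bootstrap_replicates(group_a, group_b)
low, high = percentile_interval(replicates)
print(low, high)
```

Unlike randomization, a bootstrap sample can contain the same data point several times and omit others entirely.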
46 [Figure: the real data and one bootstrap data set, each shown for the groups "Male wingless" and "Male winged"]
48 Bootstraps are often used in evolutionary trees
49 Likelihood Likelihood considers many possible hypotheses, not just one
50 Law of likelihood A particular data set supports one hypothesis better than another if the likelihood of that hypothesis is higher than the likelihood of the other hypothesis. Therefore we try to find the hypothesis with the maximum likelihood.
51 All estimates we have learned so far are also maximum likelihood estimates.
52 "Simple" example Using likelihood to estimate a proportion Data: 3 out of 8 individuals are male. Question: What is the maximum likelihood estimate of the proportion of males?
53 Likelihood $L(p = x) = \Pr[\text{data} \mid p = x]$, where x is a hypothesized value of the proportion of males. e.g., L(p=0.5) is the likelihood of the hypothesis that the proportion of males is 0.5.
54 For this example only... The probability of getting 3 males out of 8 independent trials is given by the binomial distribution.
55 How to find maximum likelihood hypothesis 1.Calculus or 2.Computer calculations
56 By calculus... Maximum value of L(p=x) is found when x = 3/8. Note that this is the same value we would have gotten by methods we already learned.
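For reference, the calculus step can be written out as a short derivation. It assumes the binomial likelihood for 3 males among 8 independent individuals and maximizes its logarithm, which has its maximum at the same x.

```latex
\begin{align*}
L(p = x) &= \binom{8}{3}\, x^{3} (1 - x)^{5} \\
\ln L(p = x) &= \ln\binom{8}{3} + 3 \ln x + 5 \ln(1 - x) \\
\frac{d}{dx} \ln L(p = x) &= \frac{3}{x} - \frac{5}{1 - x} = 0 \\
3(1 - x) &= 5x \quad\Longrightarrow\quad x = \frac{3}{8}
\end{align*}
```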
57 By computer calculation... Input likelihood formula to computer, plot the value of L for each value of x, and find the largest L.
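A minimal sketch of the computer approach evaluates the binomial likelihood on a grid of candidate values and keeps the largest. The grid spacing of 0.001 and the function name `likelihood` are arbitrary choices for illustration.

```python
from math import comb


def likelihood(x, males=3, n=8):
    """Binomial likelihood of proportion x given `males` out of `n`."""
    return comb(n, males) * x ** males * (1 - x) ** (n - males)


# Candidate values of x from 0 to 1 in steps of 0.001.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=likelihood)
print(best, likelihood(best))
```

Plotting `likelihood(x)` against `x` over the same grid would give the likelihood curve, with its peak at the value found here.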
58 Finding genes for corn yield: Corn Chromosome 5
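59 Hypothesis testing by likelihood Compares the likelihood of the maximum likelihood estimate to the likelihood of a null hypothesis. Log-likelihood ratio = $\ln\left( L[\text{maximum likelihood estimate}] \,/\, L[\text{null hypothesis}] \right)$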
60 Test statistic $\chi^2 = 2 \times$ (log-likelihood ratio), with df equal to the number of variables fixed to make the null hypothesis
61 Example: 3 males out of 8 individuals. H0: 50% are male. Maximum likelihood estimate: $\hat{p} = 3/8$, with $L(p = 3/8) = \binom{8}{3} (3/8)^3 (5/8)^5 \approx 0.2816$
62 Likelihood of null hypothesis: $L(p = 0.5) = \binom{8}{3} (0.5)^3 (0.5)^5 \approx 0.2188$
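63 Log-likelihood ratio: $\ln[L(p = 3/8) / L(p = 0.5)] = \ln(0.2816 / 0.2188) \approx 0.253$, so the test statistic is $\chi^2 = 2 \times 0.253 \approx 0.51$. We fixed one variable in the null hypothesis (p), so the test has df = 1. Because 0.51 is smaller than the critical value of 3.84 (df = 1, α = 0.05), we do not reject H0.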
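The worked example can be checked numerically. The sketch below assumes the test statistic is twice the log-likelihood ratio, compared with a chi-square distribution with 1 df. For 1 df the upper-tail probability can be written with the complementary error function, which avoids needing a statistics library.

```python
from math import comb, erfc, log, sqrt


def binomial_likelihood(p, males=3, n=8):
    return comb(n, males) * p ** males * (1 - p) ** (n - males)


l_mle = binomial_likelihood(3 / 8)    # likelihood at the ML estimate
l_null = binomial_likelihood(0.5)     # likelihood under H0: p = 0.5

log_ratio = log(l_mle / l_null)
statistic = 2 * log_ratio

# Chi-square upper tail for df = 1: P(X > g) = erfc(sqrt(g / 2)).
p_value = erfc(sqrt(statistic / 2))

print(l_mle, l_null, log_ratio, statistic, p_value)
```

The statistic is far below 3.84, in line with the conclusion not to reject the null hypothesis. With only 8 individuals the chi-square approximation is rough, so the P-value should be read as approximate.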