Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Advances in Statistics Or, what you might find if you picked up a current issue of a Biological Journal.

Similar presentations


Presentation on theme: "1 Advances in Statistics Or, what you might find if you picked up a current issue of a Biological Journal."— Presentation transcript:

1 1 Advances in Statistics Or, what you might find if you picked up a current issue of a Biological Journal

2 2 Advances in Statistics Extensions to the ANOVA Computer-intensive methods Maximum likelihood

3 3 Extensions to ANOVA One-way ANOVA –This works for a single explanatory variable –Simplest possible design Two-way ANOVA –Two categorical explanatory variables –Factorial design

4 4 ANOVA Tables Source of variation Sum of squaresdfMean Squares F ratioP Treatment k-1 Error N-k Total N-1 *

5 5 Two-factor ANOVA Table Source of variation Sum of Squares dfMean SquareF ratioP Treatment 1SS 1 k 1 - 1SS 1 k 1 - 1 MS 1 MSE Treatment 2SS 2 k 2 - 1SS 2 k 2 - 1 MS 2 MSE Treatment 1 * Treatment 2 SS 1*2 (k 1 - 1)*(k 2 - 1)SS 1*2 (k 1 - 1)*(k 2 - 1) MS 1*2 MSE ErrorSS error XXXSS error XXX TotalSS total N-1

6 6 Two-factor ANOVA Table Source of variation Sum of Squares dfMean SquareF ratioP Treatment 1SS 1 k 1 - 1SS 1 k 1 - 1 MS 1 MSE Treatment 2SS 2 k 2 - 1SS 2 k 2 - 1 MS 2 MSE Treatment 1 * Treatment 2 SS 1*2 (k 1 - 1)*(k 2 - 1)SS 1*2 (k 1 - 1)*(k 2 - 1) MS 1*2 MSE ErrorSS error XXXSS error XXX TotalSS total N-1 Two categorical explanatory variables

7 7 General Linear Models Used to analyze variation in Y when there is more than one explanatory variable Explanatory variables can be categorical or numerical

8 8 General Linear Models First step: formulate a model statement Example:

9 9 General Linear Models First step: formulate a model statement Example: Overall mean Treatment effect

10 10 General Linear Models Second step: Make an ANOVA table Example: Source of variation Sum of squares dfMean Squares F ratioP Treatment k-1 Error N-k Total N-1 *

11 11 General Linear Models Second step: Make an ANOVA table Example: Source of variation Sum of squares dfMean Squares F ratioP Treatment k-1 Error N-k Total N-1 * This is the same as a one-way ANOVA!

12 12 General Linear Models If there is only one explanatory variable, these are exactly equivalent to things we’ve already done –One categorical variable: ANOVA –One numerical variable: regression Great for more complicated situations

13 13 Example 1: Experiment with blocking Fish experiment: sensitivity of goldfish to light Fish are randomly selected from the population Four different light treatments are applied to each fish

14 14 Randomized Block Design Blocks (fish) Treatments (light wavelengths)

15 15 Randomized Block Design

16 16 Step 1: Make a model statement

17 17 Step 2: Make an ANOVA table

18 18 Another Example: Mole Rats Are there lazy mole rats? Two variables: –Worker type: categorical “frequent workers” and “infrequent workers” –Body mass (ln-transformed): numerical

19 19

20 20 Step 1: Make a model statement

21 21 Step 2: Make an ANOVA table

22 22 Step 2: Make an ANOVA table

23 23 Step 1: Make a model statement

24 24 Step 2: Make an ANOVA table

25 25 Step 2: Make an ANOVA table Also called ANCOVA- Analysis of Covariance

26 26 General Linear Models Can handle any number of predictor variables Each can be categorical or numerical Tables have the same basic structure Same assumptions as ANOVA

27 27 General Linear Models Don’t run out of degrees of freedom! Sometimes, the F-statistics will have DIFFERENT denominators - see book for an example

28 28 Computer-intensive methods Hypothesis testing: –Simulation –Randomization Confidence intervals –Bootstrap

29 29 Simulation Simulates the sampling process on a computer many times: generates the null distribution from estimates done on the simulated data Computer assumes the null hypothesis is true

30 30 Example: Social spider sex ratios Social spiders live in groups

31 31 Example: Social spider sex ratios Groups are mostly females Hypothesis: Groups have just enough males to allow reproduction Test: Whether distribution of number of males is as predicted by chance Problem: Groups are of many different sizes Binomial distribution therefore doesn’t apply

32 32 Simulation: For each group, the number of spiders is known. The overall proportion of males, p m, is known. For each group, the computer draws the real number of spiders, and each has p m probability of being male. This is done for all groups, and the variance in proportion of males is calculated. This is repeated a large number of times.

33 33 The observed value (0.44), or something more extreme, is observed in only 4.9% of the simulations. Therefore P = 0.049.

34 34 Randomization Used for hypothesis testing Mixes the real data randomly Variable 1 from an individual is paired with variable 2 data from a randomly chosen individual. This is done for all individuals. The estimate is made on the randomized data. The whole process is repeated numerous times. The distribution of the randomized estimates is the null distribution.

35 35 Without replacement Randomization is done without replacement. In other words, all data points are used exactly once in each randomized data set.

36 36 Randomization can be done for any test of association between two variables

37 37 Example: Sage crickets Sage cricket males sometimes offer their hind-wings to females to eat during mating. Do females who eat hind-wings wait longer to re-mate?

38 38

39 39 Problems: Unequal variance, non-normal distributions

40 40 Male wingless Male winged 01.4 0.71.6 0.71.9 1.42.3 1.62.6 1.82.8 1.92.8 1.92.8 1.93.1 2.23.8 2.13.9 2.14.5 4.7 Real data: Randomized data: Male wingless Male winged 0.72.8 2.31.9 2.1 1.81.6 3.80 1.4 1.92.2 3.92.1 4.71.6 2.64.5 1.92.8 0.7 3.1

41 41 Note that each data point was only used once

42 42 1000 randomizations P < 0.001

43 43 Randomization: Other questions Q: Is this periodic? (yes)

44 44 Bootstrap Method for estimation (and confidence intervals) Often used for hypothesis testing too "Picking yourself up by your own bootstraps"

45 45 Bootstrap For each group, randomly pick with replacement an equal number of data points, from the data of that group With this bootstrap dataset, calculate the estimate -- bootstrap replicate estimate

46 46 Male wingless Male winged 01.4 0.71.6 0.71.9 1.42.3 1.62.6 1.82.8 1.92.8 1.92.8 1.93.1 2.23.8 2.13.9 2.14.5 4.7 Real data: Bootstrap data: Male wingless Male winged 0.71.4 0.71.4 2.8 1.42.8 1.82.8 1.83.1 1.83.1 1.93.9 1.94.5 2.14.7 2.14.7 2.14.7

47 47

48 48 Bootstraps are often used in evolutionary trees

49 49 Likelihood Likelihood considers many possible hypotheses, not just one

50 50 Law of likelihood A particular data set supports one hypothesis better than another if the likelihood of that hypothesis is higher than the likelihood of the other hypothesis. Therefore we try to find the hypothesis with the maximum likelihood.

51 51 All estimates we have learned so far are also maximum likelihood estimates.

52 52 "Simple" example Using likelihood to estimate a proportion Data: 3 out of 8 individuals are male. Question: What is the maximum likelihood estimate of the proportion of males?

53 53 Likelihood where x is a hypothesized value of the proportion of males. e.g., L(p=0.5) is the likelihood of the hypothesis that the proportion of males is 0.5.

54 54 For this example only... The probability of getting 3 males out of 8 independent trials is given by the binomial distribution.

55 55 How to find maximum likelihood hypothesis 1.Calculus or 2.Computer calculations

56 56 By calculus... Maximum value of L(p=x) is found when x = 3/8. Note that this is the same value we would have gotten by methods we already learned.

57 57 By computer calculation... Input likelihood formula to computer, plot the value of L for each value of x, and find the largest L.

58 58 Finding genes for corn yield: Corn Chromosome 5

59 59 Hypothesis testing by likelihood Compares the likelihood of maximum likelihood estimate to a null hypothesis Log-likelihood ratio =

60 60 Test statistic With df equal to the number of variables fixed to make null hypothesis

61 61 Example:3 males out of 8 individuals H 0 : 50% are male Maximum likelihood estimate

62 62 Likelihood of null hypothesis

63 63 Log likelihood ratio We fixed one variable in the null hypothesis (p), So the test has df = 1., so we do not reject H 0.


Download ppt "1 Advances in Statistics Or, what you might find if you picked up a current issue of a Biological Journal."

Similar presentations


Ads by Google