Midterm mean = 38.9 ± 12.9; >50 = A, >40 = B, >29 = C.


Meta-Analysis
A growing trend is for ecological and evolutionary studies to synthesize the large body of existing data and look for overall trends across species and ecosystems, in order to identify general principles
- termed “meta-analysis”, such studies do not generate data but rather analyze patterns across hundreds of pre-existing datasets
While such studies can be very powerful, they may also suffer from a crucial underlying problem: treating species as independent data points

Meta-Analysis
Standard frequentist statistics rely on some basic underlying assumptions
- one critical assumption is that all observations (data points) are independent and contribute equally to sample size
In turn, sample size influences statistical significance: the chance of seeing an apparent trend when there isn’t really one present in the data
P < 0.05 means there is less than a 5% chance of seeing this trend by chance alone when no trend is actually present
- i.e., if you had a huge number of samples, the “trend” would disappear

Statistical modelling or analysis of data
Statistical analyses generally produce three things:
a) the model: an equation describing the relationship among variables
- the model has some parameters, which are constants estimated by the analysis; in this example, the parameters are the slope and intercept
- the model also contains an error term (ε) for the variation left unexplained by the modeled relationship, but let’s ignore that for now
[Figure: model equation with the parameters and the error term (+ ε) labeled]

Statistical modelling or analysis of data
Statistical analyses generally produce three things:
a) the model: an equation describing the relationship among variables
b) a measure of goodness-of-fit: how well the model fits the data, or describes the relationship
c) a significance level for the model fit
[Figure labels: model; goodness-of-fit, a measure of how well the model explains a pattern or trend in the data]

Statistical modelling or analysis of data
Statistical analyses are sensitive to sample size: the larger the sample size, the smaller (more significant) the P-value will be, given the same basic goodness-of-fit
→ the critical underlying assumption here is that each point represents a truly independent measure of the relationship; “duplicate” observations artificially inflate the sample size

Statistical modelling or analysis of data
- suppose you wanted to analyze the relationship between brain size and population size in vertebrates, to test whether smaller populations select for larger brains due to increased competition for resources (or for whatever reason)
[Figure: plot of the brain size vs. population size relationship; n = 17, P < 0.0001]

Statistical modelling or analysis of data
[Figure: plot with points grouped as fish, birds, and primates; n = 17, P < 0.0001]
What if...
1) most fish are just plentiful and dumb;
2) most birds have medium-sized populations and sing to communicate, so have evolved larger brains;
3) primates live in small tribes due to ecological constraints, and have coincidentally evolved advanced communication
What is your sample size here, REALLY?

Statistical modelling or analysis of data
[Figure: plot with points grouped as fish, birds, and primates; n = 17, P < 0.0001]
If all primates correspond to one evolutionary “sample”, and within that group population size and brain size covary because that’s the ancestral condition in primates...
... then your actual sample size may be n = 3, not n = 17
n = 3, P > 0.1
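To make the n = 17 versus n = 3 contrast concrete, here is a minimal sketch of how the P-value of a correlation depends on sample size when the strength of the relationship is held fixed. The correlation coefficient r = 0.9 is a hypothetical value chosen for illustration (the slides do not report one), and the function names are made up for this example. It uses the standard t-test for a Pearson correlation, with the t-distribution integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=20000):
    """Two-sided P-value for a Pearson correlation r observed with n points.

    Uses t = |r| * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom,
    and Simpson's rule (steps must be even) for the area between 0 and t.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


r = 0.9  # hypothetical correlation strength, NOT taken from the slides
p_17_points = correlation_p_value(r, 17)  # every species counted separately
p_3_points = correlation_p_value(r, 3)    # one data point per clade

print("n = 17: P =", p_17_points)
print("n = 3:  P =", p_3_points)
```

Under these assumptions the same correlation is highly significant with 17 points and not significant with 3, which is the pattern the slide describes.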

Exploring relationships among traits
This statistical issue is critical to a large field of biology that seeks to identify relationships among traits or characteristics shared by species
For instance...
- do organisms lose eyes as a result of living in caves?
- do flying clades contain more species than non-flying clades?
- do promiscuous species have smaller brains?
- do self-pollinating plant species go extinct at higher rates?
To test these hypotheses, we need to test for a significant correlation between traits among species

Species are not independent observations
→ species are not independent data points, because they share common ancestry with close relatives
- related species may be similar because they all evolved from a common ancestor with a certain combination of traits or features
- two totally different possibilities:
1) population size affects the evolution of brain size
2) these traits have no effect on each other’s evolution, but have simply been co-inherited by chance
[Figure: brain size and pop. size shown for 8 related species]
Are these 8 independent data points, or one evolutionary event that produced an apparent association between brain and population size?

Exploring relationships among traits
This statistical issue is critical to a large field of biology that seeks to identify relationships among traits or characteristics shared by species
This issue, like almost everything, was first raised by Darwin:
“We may often falsely attribute to correlated variation structures which are common to whole groups of species, and which in truth are simply due to inheritance; for an ancient progenitor may have acquired through natural selection some one modification in structure, and, after thousands of generations, some other and independent modification; and these two modifications, having been transmitted to a whole group of descendants with diverse habits, would naturally be thought to be in some necessary manner correlated.” (Darwin, 1872; 6th ed. of Origin of Species)

Comparative methods
Attention was brought to this issue in modern times by Felsenstein (1985) and Harvey & Pagel (1991)
Felsenstein J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1–15.
Harvey P.H., Pagel M.D. 1991. The comparative method in evolutionary biology. Oxford: Oxford University Press.
Nevertheless, researchers continue to ignore this issue to an alarming degree
Ecologists in particular often ignore this issue in meta-analyses, because correcting for it requires an estimate of phylogeny that may not be available for any given set of species

→ the similarity among species due to shared ancestry is called phylogenetic effects
- all else being equal, closely related species are expected to resemble each other in suites of traits, which may therefore appear to be evolving in a correlated manner
[Figure: traits 1 and 2 mapped onto two phylogenies; one panel shows correlated trait evolution, the other an apparent correlation due to phylogenetic effects]

Characters and character states
Any character (= trait) can have two or more possible states, or ways that trait can be expressed
- the simplest states are present / absent
- some traits are binary, meaning they have only two possible character states (big/small, smart/dumb, promiscuous/not promiscuous)
- others are multi-state, meaning they have >2 possible states:
For instance, the character “trophic mode” could have states: 1) herbivore; 2) carnivore; 3) autotroph (photosynthetic)
The character “habitat” could have states: 1) in trees; 2) under rocks; 3) inside human host

[Figure: how 2 traits (1 and 2) are distributed today in 8 living species, on a phylogeny descending from the last common ancestor of the 8 species. Legend: brain size (small, big); population size (small, big)]

[Figure: how 2 traits (1 and 2) are distributed today in 8 living species, on a phylogeny descending from the last common ancestor of the 8 species]
If a change in trait 1 is always followed by a change in trait 2, then the two traits are evolving in a correlated manner

Comparative methods test for trends among species
When change in one trait is repeatedly associated with change in another trait, they are evolving in a correlated manner
- comparative methods can test for, and identify, when this has happened by correcting for phylogenetic effects
→ model the evolution of traits across a phylogeny
- change in traits is more likely to occur on longer branches (= more time for evolution to happen)
- thus, change is less likely to happen by chance between closely related species (they are separated by short branch lengths)
[Figure: phylogeny annotated “change likely, somewhere along here” on a long branch and “change unlikely on short branches”]
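The link between branch length and the chance of change can be sketched with a very simple trait model. This is an illustration only: it assumes a binary trait that flips between its two states at a single symmetric rate (a two-state Markov model), which the slides do not specify, and the rate and branch lengths are hypothetical numbers.

```python
import math


def prob_change(rate, branch_length):
    """Probability that a binary trait ends a branch in a different state
    than it started in, under a symmetric two-state Markov model
    (assumed here for illustration)."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


rate = 1.0           # hypothetical transition rate
short_branch = 0.05  # hypothetical branch length between close relatives
long_branch = 1.0    # hypothetical branch length between distant relatives

p_short = prob_change(rate, short_branch)
p_long = prob_change(rate, long_branch)

print("P(change) on short branch:", round(p_short, 4))
print("P(change) on long branch: ", round(p_long, 4))
```

In this sketch the probability of a change rises with branch length and levels off at 0.5, so a difference between two closely related species is much harder to explain by chance than the same difference between distant relatives.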

Comparative methods 1: Build a phylogeny
- this is a model of DNA sequence evolution
- the mutation rates are the parameters of the model (variables you estimate)
- based on DNA sequences of living species, we infer the most likely sequence of mutations that produced the observed data
[Figure: phylogeny labeled with short DNA sequences (GATC, GGTC, GGTT, AATC), alongside a diagram of mutation rates among the nucleotides A, C, G, T]

Comparative methods 2: Model trait evolution
- model of character trait evolution
- parameters of the model are the transition rates between character states (ways the trait can appear)
[Figure: phylogeny with nodes labeled state 1 (small) or state 2 (big), with the values 0.5 and 1.0 marked; states shown for nodes are the most likely ancestral state for that clade, based on our model]

Comparative methods 2: Model trait evolution
- if the model is forced to allow a character change on a short branch, it is penalized (lower L score)
- in the same way, a likelihood phylogenetic analysis will penalize you for assuming a rare transversion happened
[Figure: phylogeny with the values 0.5 and 1.0 marked; states shown for nodes are the most likely ancestral state for that clade, based on our model]

Comparative methods 3: Test hypotheses
Ancestral character state reconstruction is the process of inferring what the state of a trait was at the nodes of a phylogeny
- the model can infer the relative probability of two or more possible states, plotted as a pie diagram
- at some nodes, the probabilities of different ancestral states may be roughly equal, in which case we cannot necessarily infer what the ancestor looked like without a more formal test
[Figure: phylogeny with pie diagrams at the nodes; at one node we can be pretty confident of the state that the ancestor of these two species had]
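The pie diagrams can be illustrated with the smallest possible case: one ancestor with two descendant species. The sketch below is not the method of any particular software package. It assumes a symmetric two-state model with a single hypothetical rate, equal prior weight on both ancestral states, and made-up branch lengths; the function names are invented for this example.

```python
import math


def p_transition(start, end, rate, t):
    """Probability of going from state `start` to state `end` along a branch
    of length t, under a symmetric two-state Markov model (an assumption)."""
    change = 0.5 * (1.0 - math.exp(-2.0 * rate * t))
    return change if start != end else 1.0 - change


def ancestral_state_probs(tip_states, branch_lengths, rate,
                          states=("small", "big")):
    """Relative probability of each state at the ancestor of the given tips,
    giving every ancestral state equal prior weight."""
    likelihoods = {}
    for state in states:
        like = 1.0
        for tip, t in zip(tip_states, branch_lengths):
            like *= p_transition(state, tip, rate, t)
        likelihoods[state] = like
    total = sum(likelihoods.values())
    return {state: like / total for state, like in likelihoods.items()}


rate = 1.0  # hypothetical transition rate

# Two close relatives that share the same state: a lopsided pie.
agree = ancestral_state_probs(("big", "big"), (0.1, 0.1), rate)

# Two relatives that differ, on equal branches: a 50/50 pie.
differ = ancestral_state_probs(("big", "small"), (0.1, 0.1), rate)

print("tips agree: ", agree)
print("tips differ:", differ)
```

When both descendants share a state and the branches are short, nearly all of the probability falls on that state; when the descendants differ, the two ancestral states come out equally probable and a more formal test is needed.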

To test alternative hypotheses, force the model to make a given node have one state; then estimate the overall likelihood score ($L$, a measure of goodness-of-fit, like $r^2$)
[Figure: scenarios 1 and 2, with $\log(L_1) = -7$ and $\log(L_2) = -4$]
→ the L score indicates the likelihood of observing your trait data given (a) the phylogeny, and (b) the model of trait evolution with the estimated parameter values

[Figure: scenarios 1 and 2, with $\log(L_1) = -7$ (worse fit) and $\log(L_2) = -4$ (better fit)]
- L scores are reported as log(likelihood), i.e. as negative exponents, because L scores are usually tiny fractions
$L = 0.0000001 = 10^{-7}$, so $\log(L) = -7$
$L = 0.0001 = 10^{-4}$, so $\log(L) = -4$
→ note: a more negative log(L) is a smaller likelihood, and thus a worse fit to the data
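A short sketch of why log scores are the practical choice. The first part reproduces the two conversions from the slide using base-10 logs. The second part uses a hypothetical data set (400 observations, each with probability 0.1, not from the slides) to show that multiplying many small probabilities runs out of floating-point range, while adding their logs does not.

```python
import math

# The two likelihoods from the slide, converted to base-10 log scores.
for likelihood in (0.0000001, 0.0001):
    print(likelihood, "-> log10(L) =", round(math.log10(likelihood), 6))

# Hypothetical data set: 400 independent observations, each with
# probability 0.1 under the model.
per_observation = [0.1] * 400

# Multiplying the raw probabilities underflows to exactly 0.0 ...
raw_product = 1.0
for p in per_observation:
    raw_product *= p

# ... whereas summing the logs keeps the full information.
log_likelihood = sum(math.log10(p) for p in per_observation)

print("raw product:   ", raw_product)
print("log-likelihood:", round(log_likelihood, 6))
```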

The relative likelihood of two alternative models can be compared using the likelihood ratio:
$\frac{\text{likelihood of data given Hypothesis 2}}{\text{likelihood of data given Hypothesis 1}} = \frac{L_2}{L_1}$
[Figure: scenarios 1 and 2, with $\log(L_1) = -7$ and $\log(L_2) = -4$]

Whether one model is a significantly better fit to the data can be determined using a Chi-Squared test, taking 2 × (difference in log(L) scores) as the test statistic
$\chi^2 = 2(\log L_2 - \log L_1) = 2[-4 - (-7)] = 6$; $P < 0.025$
[Figure: scenarios 1 and 2, with $\log(L_1) = -7$ and $\log(L_2) = -4$; the test favors scenario 2]
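The arithmetic of this test can be checked with a few lines of code. Two assumptions are made here that the slide does not state: the test has 1 degree of freedom (which is consistent with the reported P < 0.025), and the log(L) scores are natural logs, which is what the chi-squared approximation expects. For 1 degree of freedom the chi-squared tail probability can be written with the complementary error function, so no statistics library is needed. The function name is made up for this example.

```python
import math


def likelihood_ratio_test(log_l_worse, log_l_better):
    """Return (test statistic, P-value) for comparing two log-likelihoods.

    Assumes 1 degree of freedom, for which the chi-squared upper tail
    probability equals erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (log_l_better - log_l_worse))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Log-likelihood scores for scenarios 1 and 2, as given on the slide.
statistic, p_value = likelihood_ratio_test(-7.0, -4.0)

print("chi-squared statistic:", statistic)
print("P-value (1 df):", round(p_value, 4))
```

Under these assumptions the statistic is 6 and the P-value falls below 0.025, in agreement with the slide.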
