Midterm mean = 38.9 ± 12.9; grade cutoffs: >50 = A, >40 = B, >29 = C.



Meta-Analysis
An increasing trend in ecology and evolution is for studies to synthesize the large body of existing data and look for overall trends across species and ecosystems, in order to identify general principles.
- termed “meta-analysis”, such studies do not generate data but rather analyze patterns across hundreds of pre-existing datasets
While such studies can be very powerful, they may also suffer from a crucial underlying problem: treating species as independent data points.

Meta-Analysis
Standard frequentist statistics rely on some basic underlying assumptions.
- one critical assumption is that all observations (data points) are independent and contribute equally to sample size
In turn, sample size influences statistical significance: the odds of seeing an apparent trend when there isn’t really one present in the data.
P < 0.05 means there is less than a 5% chance of seeing a trend this strong by chance alone when no trend is actually present
- i.e., if you had a huge number of samples, the “trend” would disappear
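The meaning of P < 0.05 can be checked by simulation. The following is a minimal Python sketch, not part of the original slides: it repeatedly draws two unrelated variables and counts how often their correlation looks "significant" anyway. The sample size n = 17, the number of trials, the seed, and the function names are illustrative assumptions; 0.482 is the standard two-tailed critical value of Pearson's r at the 0.05 level with 15 degrees of freedom.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n=17, trials=4000, r_crit=0.482, seed=1):
    """Fraction of no-trend datasets whose |r| exceeds the critical value."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    hits = 0
    for _ in range(trials):
        # x and y are drawn independently, so no real trend exists
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson_r(xs, ys)) > r_crit:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"apparent trends in data with no real trend: {rate:.3f}")
```

The printed fraction should land near 0.05, which is what the significance threshold promises only if the 17 points are truly independent.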

Statistical modelling or analysis of data
Statistical analyses generally produce three things:
a) the model: an equation describing the relationship among variables
- the model has some parameters, which are constants estimated by the analysis; in this example, the parameters are the slope and intercept

Statistical modelling or analysis of data
Statistical analyses generally produce three things:
a) the model: an equation describing the relationship among variables
- the model has some parameters, which are constants estimated by the analysis; in this example, the parameters are the slope and intercept
- the model also contains an error term (+ ε) for the variation left unexplained by the modeled relationship, but let’s ignore that for now

Statistical modelling or analysis of data
Statistical analyses generally produce three things:
a) the model: an equation describing the relationship among variables
b) a measure of goodness-of-fit: how well the model fits the data, or describes the relationship
c) a significance level for the model fit
Goodness-of-fit is a measure of how well the model explains a pattern or trend in the data.
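As a rough illustration of the first two outputs, the sketch below fits a straight line by ordinary least squares and reports r² as the goodness-of-fit. The data values and the function name `fit_line` are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                     # model parameter 1
    intercept = mean_y - slope * mean_x   # model parameter 2
    r_squared = (sxy * sxy) / (sxx * syy) # share of variance explained
    return slope, intercept, r_squared


# Hypothetical data, chosen only to show the calculation
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r^2 = {r_squared:.4f}")
```

The slope and intercept are the estimated parameters; the part of each y value that the line misses is the error term ε.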

Statistical modelling or analysis of data
Statistical analyses are sensitive to sample size: the larger the sample size, the smaller (more significant) the P-value will be, given the same basic goodness-of-fit.
- the critical underlying assumption here is that each point represents a truly independent measure of the relationship; “duplicate” observations artificially inflate the sample size
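A small sketch of this sensitivity, assuming a Pearson correlation tested with the usual t statistic t = r·sqrt((n − 2)/(1 − r²)). The correlation r = 0.45 and the sample sizes are illustrative choices, and the t-distribution tail is integrated numerically with Simpson's rule so that only the standard library is needed.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 minus twice the area of the pdf on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def p_from_r(r, n):
    """P-value for a Pearson correlation r observed with n points."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return two_sided_p(t, df)


r = 0.45                    # same strength of relationship in both cases
p_17 = p_from_r(r, 17)      # 17 independent points
p_51 = p_from_r(r, 51)      # every point counted three times
print(f"n = 17: P = {p_17:.4f}")
print(f"n = 51: P = {p_51:.4f}")
```

The fit is no better in the second case; only the claimed sample size changed, yet the result moves from non-significant to highly significant.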

Statistical modelling or analysis of data
- suppose you wanted to analyze the relationship between brain size and population size in vertebrates, to test whether smaller populations select for larger brains due to increased competition for resources (or for whatever reason)
Figure annotation: n = 17; P < (value missing)

Statistical modelling or analysis of data
Figure annotation: n = 17; P < (value missing); points grouped as fish, birds, and primates
What if...
1) most fish are just plentiful and dumb;
2) most birds have medium-sized populations and sing to communicate, so have evolved larger brains;
3) primates live in small tribes due to ecological constraints, and have coincidentally evolved advanced communication
What is your sample size here, REALLY?

Statistical modelling or analysis of data
Figure annotation: n = 17; P < (value missing); points grouped as fish, birds, and primates
If all primates correspond to one evolutionary “sample”, and within that group population size and brain size covary because that’s the ancestral condition in primates, then your actual sample size may be n = 3, not n = 17.
With n = 3, P > 0.1.
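The fish, birds, and primates scenario can be mimicked with invented numbers. In the sketch below, all 17 values are hypothetical and were constructed so that brain size and population size are unrelated within each group, while the three groups differ from one another.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (population size, brain size) pairs in arbitrary units
groups = {
    "fish": [(8, 1), (8, 2), (9, 1), (9, 2), (10, 1), (10, 2)],
    "birds": [(5, 4), (5, 5), (6, 4), (6, 5), (7, 4), (7, 5)],
    "primates": [(2, 8), (2, 9), (3, 8), (3, 9), (2.5, 8.5)],
}

# Correlation when all 17 species are pooled as if independent
pooled = [pair for pairs in groups.values() for pair in pairs]
pooled_r = pearson_r([p for p, _ in pooled], [b for _, b in pooled])

# Correlation inside each group taken on its own
within_r = {
    name: pearson_r([p for p, _ in pairs], [b for _, b in pairs])
    for name, pairs in groups.items()
}

print(f"pooled (n = {len(pooled)}): r = {pooled_r:.3f}")
for name, value in within_r.items():
    print(f"within {name}: r = {value:.3f}")
```

The strong pooled correlation comes entirely from differences among the three groups, which is why the effective sample size here is closer to 3 than to 17.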

Exploring relationships among traits
This statistical issue is critical to a large field of biology that seeks to identify relationships among traits or characteristics shared by species.
For instance...
- do organisms lose eyes as a result of living in caves?
- do flying clades contain more species than non-flying clades?
- do promiscuous species have smaller brains?
- do self-pollinating plant species go extinct at higher rates?
To test these hypotheses, we need to test for a significant correlation between traits among species.

Species are not independent observations  species are not independent data points, because they share common ancestry with close relatives - related species may be similar because they all evolved from a common ancestor with a certain combination of traits or features brain pop. size - two totally different possibilities: 1) population size affects the evolution of brain size 2) these traits have no effect on each other’s evolution, but have simply been co-inherited by chance

Species are not independent observations  species are not independent data points, because they share common ancestry with close relatives - related species may be similar because they all evolved from a common ancestor with a certain combination of traits or features brain pop. size are these 8 independent datapoints, or one evolutionary event that produced an apparent association between brain and population size?

Exploring relationships among traits
This statistical issue is critical to a large field of biology that seeks to identify relationships among traits or characteristics shared by species.
This issue, like almost everything, was first raised by Darwin:
“We may often falsely attribute to correlated variation structures which are common to whole groups of species, and which in truth are simply due to inheritance; for an ancient progenitor may have acquired through natural selection some one modification in structure, and, after thousands of generations, some other and independent modification; and these two modifications, having been transmitted to a whole group of descendants with diverse habits, would naturally be thought to be in some necessary manner correlated.” (Darwin, 1872; 6th ed. of Origin of Species)

Comparative methods
Attention was brought to this issue in modern times by Felsenstein (1985) and Harvey & Pagel (1991).
- Felsenstein J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1–15.
- Harvey P.H., Pagel M.D. 1991. The comparative method in evolutionary biology. Oxford: Oxford University Press.
Nevertheless, researchers continue to ignore this issue to an alarming degree.
Ecologists in particular often ignore it in meta-analyses, because correcting for it requires an estimate of phylogeny that may not be available for a given set of species.

Similarities among species due to shared ancestry are called phylogenetic effects.
- all else being equal, closely related species are expected to resemble each other in suites of traits, which may therefore appear to be evolving in a correlated manner
Figure labels: traits; correlated trait evolution; apparent correlation due to phylogenetic effects

Characters and character states
Any character (= trait) can have two or more possible states, or ways that trait can be expressed.
- the simplest states are present / absent
- some traits are binary, meaning they have only two possible character states (big/small, smart/dumb, promiscuous/not promiscuous)
- others are multi-state, meaning they have more than two possible states
For instance, the character “trophic mode” could have states: 1) herbivore; 2) carnivore; 3) autotrophic (photosynthetic).
The character “habitat” could have states: 1) in trees; 2) under rocks; 3) inside human host.

[Figure: phylogeny showing how 2 traits are distributed today in 8 living species, traced back to the last common ancestor of the 8 species. Symbols mark brain size (small or big) and population size (small or big).]

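[Figure: phylogeny showing how 2 traits are distributed today in 8 living species, traced back to the last common ancestor of the 8 species, with changes in trait 1 and trait 2 marked on the branches.]
If a change in trait 1 is always followed by a change in trait 2, then the two traits are evolving in a correlated manner.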

Comparative methods test for trends among species
When change in one trait is repeatedly associated with change in another trait, the two are evolving in a correlated manner.
- comparative methods can test for, and identify, when this has happened by correcting for phylogenetic effects
To do so, they model the evolution of traits across a phylogeny:
- change in traits is more likely to occur on longer branches (= more time for evolution to happen)
- thus, change is less likely to happen by chance between closely related species (they are separated by short branch lengths)
Figure labels: “change likely, somewhere along here” (long branch); “change unlikely on short branches”
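The link between branch length and the chance of change can be made concrete with the simplest possible trait model. The sketch below assumes a binary trait that flips between its two states at a single symmetric rate, a simplification not stated in the slides; under that assumption the probability that the two ends of a branch differ is 0.5·(1 − exp(−2·rate·length)). The rate and branch lengths are arbitrary illustrative values.

```python
import math


def prob_change(rate, branch_length):
    """Probability that a binary trait differs between the two ends of a branch.

    Assumes a symmetric two-state continuous-time Markov model in which
    the trait flips between states at the given rate.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


rate = 1.0  # hypothetical transition rate
short_branch = prob_change(rate, 0.05)
long_branch = prob_change(rate, 2.0)

print(f"short branch (0.05): P(change) = {short_branch:.3f}")
print(f"long branch  (2.00): P(change) = {long_branch:.3f}")
```

The probability rises with branch length and levels off at 0.5, the point at which the end of the branch carries no information about its start.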

Comparative methods 1: Build a phylogeny
[Figure: phylogeny labeled with short DNA sequences (GATC, GGTC, GGTT, AATC) and single-base changes along the branches]
This is a model of DNA sequence evolution.
- the mutation rates are the parameters of the model (variables you estimate)
- based on DNA sequences of living species, we infer the most likely sequence of mutations that produced the observed data

Comparative methods 2: Model trait evolution
This is a model of character trait evolution.
- parameters of the model are the transition rates between character states (ways the trait can appear), here state 1 (small) and state 2 (big)
- states shown for nodes are the most likely ancestral state for that clade, based on our model

Comparative methods 2: Model trait evolution
If the model is forced to allow a character change on a short branch, it is penalized (lower L score).
- this is the same way that a likelihood phylogenetic analysis will penalize you for assuming a rare transversion happened
- states shown for nodes are the most likely ancestral state for that clade, based on our model

Comparative methods 3: Test hypotheses
Ancestral character state reconstruction is the process of inferring what the state was for a trait across the nodes on a phylogeny.
- the model can infer the relative probability of two or more possible states, plotted as a pie diagram
- at some nodes, the probabilities of different ancestral states may be roughly equal, in which case we cannot necessarily infer what the ancestor looked like without a more formal test
- at other nodes, we can be pretty confident which state the ancestor of two species had
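A toy version of the pie diagram for a single node can be worked out by hand. The sketch below is an illustration under stated assumptions rather than the method used in the slides: it considers only two sister species, uses a symmetric two-state model with one rate, and weights both ancestral states equally before looking at the data. All function names and numeric values are hypothetical.

```python
import math


def prob_change(rate, branch_length):
    """P(trait differs across a branch) under a symmetric two-state model."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


def ancestral_probs(tip_a, tip_b, rate, branch_length):
    """Relative probability of each ancestral state for two sister species.

    tip_a and tip_b are observed states (0 or 1). Both branches have the
    same length, and both ancestral states get equal prior weight
    (assumptions made for this sketch).
    """
    p = prob_change(rate, branch_length)
    likelihoods = []
    for ancestor in (0, 1):
        # Force the ancestor to this state, then score both tips
        like_a = p if tip_a != ancestor else 1.0 - p
        like_b = p if tip_b != ancestor else 1.0 - p
        likelihoods.append(like_a * like_b)
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


# Both living species show state 1
short = ancestral_probs(1, 1, rate=1.0, branch_length=0.2)
long_ = ancestral_probs(1, 1, rate=1.0, branch_length=5.0)

print(f"short branches: P(state 0) = {short[0]:.3f}, P(state 1) = {short[1]:.3f}")
print(f"long branches:  P(state 0) = {long_[0]:.3f}, P(state 1) = {long_[1]:.3f}")
```

With short branches the pie is dominated by one state; with long branches the two slices are nearly equal and the ancestor cannot be inferred with confidence.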

To test alternative hypotheses, force the model to make a given node have one state, then estimate the overall likelihood score (L, a measure of goodness-of-fit, like r²).
Hypothesis 1: log(L1) = -7
Hypothesis 2: log(L2) = -4
The L score indicates the likelihood of observing your trait data given (a) the phylogeny, and (b) the model of trait evolution with the estimated parameter values.

Hypothesis 1: log(L1) = -7 (worse fit)
Hypothesis 2: log(L2) = -4 (better fit)
- L scores are reported as log(likelihood), i.e., as negative exponents, because L scores are usually tiny fractions
- note: a more negative log(L) is a smaller likelihood, and thus a worse fit to the data

The relative likelihood of two alternative models can be compared using the likelihood ratio:
likelihood ratio = (likelihood of data given Hypothesis 2) / (likelihood of data given Hypothesis 1) = L2 / L1
Hypothesis 1: log(L1) = -7
Hypothesis 2: log(L2) = -4

Whether one model is a significantly better fit to the data can be determined using a Chi-Squared test, taking 2 x (difference in log(L) scores) as the test statistic.
Hypothesis 1: log(L1) = -7
Hypothesis 2: log(L2) = -4
χ² = 2[log(L2) - log(L1)] = 2[-4 - (-7)] = 6; P < (value missing)
The test favors Hypothesis 2, the scenario with the higher likelihood.
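The arithmetic of this test can be reproduced in a few lines. The sketch below rests on two assumptions that the slides do not state: the log(L) scores are natural logarithms, and the test has 1 degree of freedom. For 1 degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(x / 2)), so no statistics library is needed.

```python
import math

log_l1 = -7.0  # log-likelihood under Hypothesis 1 (from the slide)
log_l2 = -4.0  # log-likelihood under Hypothesis 2 (from the slide)

# Likelihood ratio L2 / L1, assuming natural-log scores
likelihood_ratio = math.exp(log_l2 - log_l1)

# Test statistic: twice the difference in log-likelihoods
chi_squared = 2.0 * (log_l2 - log_l1)

# Upper-tail probability of chi-squared with 1 degree of freedom (assumed)
p_value = math.erfc(math.sqrt(chi_squared / 2.0))

print(f"likelihood ratio L2/L1 = {likelihood_ratio:.1f}")
print(f"chi-squared = {chi_squared:.1f}")
print(f"P = {p_value:.4f} (1 degree of freedom assumed)")
```

Under these assumptions the P-value falls below 0.05, so the better-fitting hypothesis is significantly preferred; a test with more degrees of freedom would give a larger P-value for the same statistic.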