Genomic diversity and differentiation heading toward exam 3.



genome region of arbitrary size, what can you measure and describe? what else might you want to know? if given these data and nothing else, what could you say about them?

learning goals for coalescent theory
how do patterns in sequence data tell us about effective population size?
what if there are multiple populations contributing information?
how is our answer changed if the population changes in size, or if there is selection for a particular allele?
why is this important for understanding phylogenetics (species trees)?

patterns
mutations happen at a more-or-less constant rate at random locations along the genome (assumptions can be tested)
drift, selection, gene flow, recombination, etc. influence how these mutations turn into patterns
we interpret these patterns with statistical models - mostly beyond this class

assume genealogy
descent with modification
focus on non-reticulate gene trees
assume every mutation happens at a new genome location
(Avise 1987, 1994)

neutral model
assume all these mutations have NO effect on fitness (null model)
thus, only drift influences whether an allele goes to fixation
remember: the probability that a neutral allele goes to fixation is its frequency in the population
so every new mutation has a low but equal probability of becoming FIXED (frequency 100%)

so you are collecting data, generally not knowing the history of inheritance or how discrete these units may be (actually discrete vs. resolvably discrete)
we are working on how to infer (at least as probabilities) how this diversity partitions in space (population), in time (frequencies), across the genome (paralogs), and across species (orthologs)
also: copy number variation among loci, among populations, among species
SPECIES, GENE COPY, POPULATION (DEME)

how many whales?
Roman and Palumbi 2003
currently ~10,000 humpback whales; pre-whaling (genetic estimate) maybe ~250,000

how could there be so many?
1. count whales - currently done using censusing and monitoring of whaling vessels, about 10,000 humpback whales in the Atlantic
2. collect DNA samples from some of them, and sequence at least one gene (more is better!)
3. remember π is proportional to effective population size (times mutation rate µ)
4. we know µ (in substitutions per DNA replication/reproduction) from fossil and biogeographic data, and we can calculate π (average # of differences between every pair of sequences)
5. Ne = π/µ, adjusted for inheritance of the marker (haploid, maternally inherited mtDNA, versus diploid, biparental nuclear gene)
6. Ne of humpback whales ~250,000 even though only 10,000 whales now!
7. the genetic diversity is older than human whaling efforts and tells us about the past
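Step 5 can be sketched in a few lines. This is only an illustration: the function name, the input values, and the scaling constants are assumptions, not taken from the slides or from Roman and Palumbi. It assumes the standard neutral expectations π ≈ 4·Ne·µ for a diploid, biparentally inherited nuclear locus and π ≈ 2·Nf·µ for maternally inherited mtDNA (Nf = effective number of females), with µ per site per generation.

```python
# Hypothetical helper, not from the slides. The constants encode the
# "adjusted for inheritance of the marker" step under standard neutral
# theory: pi ~ 4*Ne*mu (diploid nuclear) or pi ~ 2*Nf*mu (mtDNA).
SCALING = {"nuclear": 4.0, "mtdna": 2.0}


def ne_from_pi(pi_per_site, mu_per_site, marker="nuclear"):
    """Return the effective size implied by pi and mu for the given marker."""
    return pi_per_site / (SCALING[marker] * mu_per_site)


# Made-up inputs for illustration only; these are NOT the whale data.
example_pi = 0.004   # average pairwise differences per site
example_mu = 1e-8    # mutations per site per generation

print(ne_from_pi(example_pi, example_mu, "nuclear"))
print(ne_from_pi(example_pi, example_mu, "mtdna"))
```

The same π and µ imply a different effective size depending on the marker, which is why the inheritance adjustment matters before comparing a genetic estimate with a census count.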

AUTOSOMES: ALL 4 COPIES CAN CONTRIBUTE MUTATIONS
MTDNA: ONE COMPONENT CONTRIBUTES MUTATIONS
WHEN PEOPLE REFER TO THE SMALLER EFFECTIVE SIZE OF THE MITOCHONDRIAL GENOME, THEY ARE REFERRING TO COPY NUMBER, NOT THE NUMBER OF INDIVIDUALS IN THE POPULATION!

another look at Ne: drift
neutrality: mean Time to Most Recent Common Ancestor (tmrca) = time to homozygosity = $-4N_e\,[\,p\ln p + (1-p)\ln(1-p)\,]$ generations
proportional to Ne; for p=0.5, ~2.77Ne generations
heterozygosity declines by 1/(2Ne) per generation
compare nuclear gene vs. mitochondrial gene...?
DO NOT MEMORIZE THIS
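As a quick check of the ~2.77Ne figure, the formula above can be evaluated directly. This is a minimal sketch; the function name is an assumption, and time is returned in generations for whatever Ne is passed in.

```python
import math


def mean_time_to_homozygosity(p, ne=1.0):
    """Evaluate -4*Ne*[p*ln(p) + (1-p)*ln(1-p)] for allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # already fixed or lost: no waiting time
    return -4.0 * ne * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


# With Ne = 1 the result is in units of Ne generations.
for p in (0.1, 0.25, 0.5):
    print(p, round(mean_time_to_homozygosity(p), 2))
```

The time is longest at p = 0.5, where it equals 4·ln(2)·Ne, and shrinks as the starting frequency moves toward 0 or 1.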

basic summary stats
S, number of segregating sites (how many below?)
π, average number of differences among sequences (what is it below?)
η_i, folded site pattern: how many segregating sites appear i times?
caccgtattagcattatgctggtata
cgccgtactggcattatgctggtata
caccgtactagcattgtgctggtatg
caccgtactagcattatgccggtatg
cactgtactggcattatgctggtgta
cactgtactggcattatgctggtata
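The three statistics can be computed from the six aligned sequences with a short script. This is a sketch with assumed function names. It treats every variable column as biallelic, which holds for this alignment, and counts the minor allele for the folded spectrum.

```python
from collections import Counter
from itertools import combinations

seqs = [
    "caccgtattagcattatgctggtata",
    "cgccgtactggcattatgctggtata",
    "caccgtactagcattgtgctggtatg",
    "caccgtactagcattatgccggtatg",
    "cactgtactggcattatgctggtgta",
    "cactgtactggcattatgctggtata",
]


def segregating_columns(alignment):
    """Return the alignment columns that contain more than one state."""
    return [col for col in zip(*alignment) if len(set(col)) > 1]


def num_segregating_sites(alignment):
    """S: number of variable columns."""
    return len(segregating_columns(alignment))


def nucleotide_diversity(alignment):
    """pi: mean number of differences over all pairs of sequences."""
    pairs = list(combinations(alignment, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def folded_sfs(alignment):
    """eta_i: number of segregating sites whose minor allele appears i times."""
    spectrum = Counter()
    for col in segregating_columns(alignment):
        minor_count = min(Counter(col).values())  # assumes two states per site
        spectrum[minor_count] += 1
    return dict(spectrum)


print("S =", num_segregating_sites(seqs))
print("pi =", nucleotide_diversity(seqs))
print("folded SFS =", folded_sfs(seqs))
```

Working the answers out by hand first and then comparing against the script is a reasonable way to use this.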

standard coalescent
sample size n has n-1 coalescent events
intervals T_i during which i lineages are extant, with $E[T_i] = 2/(i(i-1))$ measured in units of N
genetic (label) differences have no fitness consequence
single population
constant population size (for now)
THE TREE IS UNKNOWN; ANALYSIS IS ASKING WHICH TREES FIT THE DATA AND WHAT THAT TELLS US ABOUT THE INTERVAL BETWEEN BRANCH NODES
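Summing the expected interval lengths gives the expected depth and total length of the genealogy. A minimal sketch with assumed function names, using the same time units as E[T_i] above:

```python
def expected_interval(i):
    """E[T_i]: expected time during which i lineages are extant."""
    return 2.0 / (i * (i - 1))


def expected_tmrca(n):
    """Expected time for n sampled lineages to reach one common ancestor."""
    return sum(expected_interval(i) for i in range(2, n + 1))


def expected_total_length(n):
    """Expected sum of all branch lengths: each interval counts i branches."""
    return sum(i * expected_interval(i) for i in range(2, n + 1))


for n in (2, 5, 10, 50):
    print(n, round(expected_tmrca(n), 3), round(expected_total_length(n), 3))
```

The expected time to the common ancestor equals 2(1 - 1/n), so it approaches 2 as the sample grows, while the total branch length keeps increasing slowly with n.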

mutation
# mutations (K) is Poisson distributed on the genealogy, based on total time t = T_total
Poisson process: stochastic, each time interval is independent, waiting time is exponentially distributed across time intervals (but with many branches, the opportunity in an interval is multiplied)
Applications: the classic example of a phenomenon well modeled by a Poisson process is deaths due to horse kick in the Prussian army, as shown by Ladislaus Bortkiewicz. The following examples are also well modeled by the Poisson process:
requests for telephone calls at a switchboard
goals scored in a soccer match
requests for individual documents on a web server
particle emissions due to radioactive decay by an unstable substance (in this case the Poisson process is non-homogeneous in a predictable manner: the emission rate declines as particles are emitted)
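The link between exponential waiting times and Poisson counts can be seen in a small simulation. This is a sketch under stated assumptions: the rate and total time are made-up values, and the function name is hypothetical. Mutations are dropped onto a stretch of branch length by drawing exponential waiting times until the total time is used up.

```python
import random


def poisson_count(rate, total_time, rng):
    """Count events in [0, total_time] by accumulating exponential waits."""
    elapsed = rng.expovariate(rate)
    count = 0
    while elapsed <= total_time:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


rng = random.Random(1)   # fixed seed so the run is repeatable
rate = 0.5               # hypothetical mutation rate per unit of branch length
t_total = 10.0           # hypothetical total branch length of the genealogy

draws = [poisson_count(rate, t_total, rng) for _ in range(20000)]
mean_k = sum(draws) / len(draws)
print("mean number of mutations:", mean_k, "expected:", rate * t_total)
```

The simulated mean should sit close to rate × total time, which is why more total branch length means more expected mutations.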

Ewens distribution
under the neutral model, mutations arise at rate µ and are lost or drift to higher frequency (frequency proportional to AGE)
thus we’ve come to expect a certain distribution of allele frequencies, e.g. p=q is unlikely
generally a small number of very common alleles, and an increasing number of very rare alleles
DO NOT MEMORIZE THIS
DO RECOGNIZE THIS

um, huh? here is the context:
DRIFT causes some alleles to increase in frequency and some to be lost (moving forward in time)
moving back in time from NOW, the same process can explain the frequency of alleles in the context of how individuals are related
this means we have expectations for how long it takes for a sample of sequences from NOW to coalesce to a common ancestor in the past (about 2 times effective population size)
this is one reason two separate evolutionary populations may not APPEAR completely different: it takes time for ancestral diversity to sort out
(figure labels: now, most recent common ancestor)

>1 population?
let's imagine two populations that rarely exchange migrants but have a common ancestry in the recent evolutionary past
drift (moving forwards in time from the ancestral population) leads to many individuals in each population descending from one particular allele, and that allele differs between the populations -> how do we know there are two populations?
this pop descended from ‘red allele’ ancestor
this pop descended from ‘green allele’ ancestor

evolutionary biology: the populations tell us who they are!
shown at right are two LOCATIONS, not necessarily two distinct populations; they may be one evolutionary population
however: if one is 90% A1 and 10% A2, and the other is 10% A1 and 90% A2, that means overall 50% A1, 50% A2
if Hardy-Weinberg fits, we should see 25% A1A1 homozygotes and 25% A2A2
instead we see overall ~41% A1A1 and ~41% A2A2, because we are ‘pooling’ 2 diverged populations
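The 41% versus 25% comparison is simple arithmetic and can be reproduced as follows. This is a minimal sketch with an assumed function name. It assumes the locations contribute equally sized samples and that each location is in Hardy-Weinberg proportions on its own.

```python
def pooled_homozygosity(freqs_a1):
    """Compare observed and expected homozygote frequencies after pooling.

    freqs_a1: frequency of allele A1 in each equally sized subpopulation.
    Returns (observed A1A1, observed A2A2, expected A1A1, expected A2A2).
    """
    n = len(freqs_a1)
    obs_a1a1 = sum(p * p for p in freqs_a1) / n
    obs_a2a2 = sum((1.0 - p) ** 2 for p in freqs_a1) / n
    p_bar = sum(freqs_a1) / n           # pooled A1 frequency
    exp_a1a1 = p_bar ** 2               # Hardy-Weinberg on pooled frequency
    exp_a2a2 = (1.0 - p_bar) ** 2
    return obs_a1a1, obs_a2a2, exp_a1a1, exp_a2a2


obs11, obs22, exp11, exp22 = pooled_homozygosity([0.9, 0.1])
print("observed A1A1, A2A2:", round(obs11, 2), round(obs22, 2))
print("expected A1A1, A2A2:", round(exp11, 2), round(exp22, 2))
```

The gap between observed and expected homozygosity closes as the subpopulation frequencies converge, so the size of the excess is itself a signal of how diverged the pooled populations are.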

excess of common alleles
excess homozygosity could mean that two evolutionary populations are being analyzed as though they are one
so we don’t trust “even” allele frequencies: now think frequency-dependent selection, balancing selection, or pooling of multiple evolutionary populations

η_1=2 η_2=2 η_3=1 η_4=1
η_1=3 η_2=2 η_3=1 η_4=0
η_1=0 η_2=1 η_3=2 η_4=3 (2, +1 for “η_5”)
neutral theory: sort of like the Goldilocks story
just right = “neutral”
excess rare alleles = purifying selection or population expansion
excess common alleles = positive selection or long-term decline

learning goals for coalescent theory
how do patterns in sequence data tell us about effective population size?
what if there are multiple populations contributing information?
how is our answer changed if the population changes in size, or if there is selection for a particular allele?
why is this important for understanding phylogenetics (species trees)?

why is this important for understanding phylogenetics (species trees)?
coalescent theory lets us test our assumptions of how DNA sequences evolve before we use them to reconstruct phylogeny
coalescent theory explains why recently-diverged populations may not yet have synapomorphies despite already being on different evolutionary paths
this model gives us a basis for estimating the time to the ancestor of ANY two sequences

DNA characters are just like phenotypic characters
4 character states (A, C, T, G), plus information in insertion-deletion, gene copy number, etc.
same concerns of homology and shared descent apply

human population isolated ~200kya
“mitochondrial Eve” sets up misunderstanding
every locus sampled now has a point in the past where all current alleles coalesce to a common ancestor
in recently diverged species, diversity is often older than the species

understanding coalescence
1. larger effective size (Ne), more diversity
2. when the time between branching events is short relative to Ne, it is more likely that allelic diversity is older than the branching event
(figure labels: Ne, isolation)

"This coalescence does not mean that the population originally consisted of a single individual with that ancestral allele. It just means that particular individual’s allele was the one that, out of all the alleles present at that time, later became fixed in the population."

phylogeny inference
2 basic approaches: algorithm vs. criterion
“neighbor joining” shown in book is an algorithm that generates a single tree by finding shortest “distances” (proportion of differences at nucleotide sites)
algorithm approaches do not help identify our uncertainty: one answer comes out, whether well supported or not

criterion-based phylogeny
30 tips results in 8.7 x 10^36 possible unrooted trees
computer search necessary
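The size of tree space follows from a standard counting result: the number of unrooted, strictly bifurcating trees for n tips is (2n-5)!!, the product of the odd numbers up to 2n-5. A minimal sketch with an assumed function name:

```python
def num_unrooted_trees(n_tips):
    """Number of unrooted bifurcating topologies: 1 * 3 * 5 * ... * (2n - 5)."""
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    count = 1
    for k in range(3, 2 * n_tips - 5 + 1, 2):
        count *= k
    return count


for n in (4, 5, 10, 20, 30):
    print(n, "tips:", f"{num_unrooted_trees(n):.3g}", "unrooted trees")
```

The count grows faster than exponentially, which is why criterion-based methods rely on heuristic searches rather than scoring every tree.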


3 of >10,000 possible trees
which fits data best? depends on the criterion
11 changes
7 changes = most parsimonious of these 3

criteria used in phylogeny
parsimony - the fewest # of changes indicates the most acceptable tree topology
maximum likelihood - both topology (arrangement of branches) and branch lengths are iteratively searched for tree(s) that fit a statistical model of molecular evolution (e.g. transitions > transversions)
Bayesian - still built on the likelihood of the data under the model (combined with prior probabilities), but the search strategy is different: results are summed over many trees of similar likelihood
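To make the parsimony criterion concrete, here is a sketch that counts the minimum number of changes one site requires on a given rooted tree, using Fitch's algorithm. This is a named substitute for illustration, not necessarily the method used for the slide figure. The tree encoding (nested tuples of tip names), the tip names, and the example site are all assumptions.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree: a tip name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping tip name -> observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical site: tips A and B carry 'a', tips C and D carry 'g'.
site = {"A": "a", "B": "a", "C": "g", "D": "g"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print("tree 1 changes:", fitch(tree_1, site)[1])
print("tree 2 changes:", fitch(tree_2, site)[1])
```

Summing this count over all sites gives a tree's parsimony score, and the tree with the lowest total is preferred under this criterion.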

why different criteria?
1. we are making our assumptions explicit for inference of the unknown
2. different scientists have different backgrounds that drive their assumptions
3. using multiple methods/criteria lets us test how safe our assumptions are
4. next time: how do we decide if a tree hypothesis is strongly supported?