1 The Dynamics of Positive Selection on the Mammalian Tree Carolin Kosiol Cornell University Joint with: Tomas Vinar, Rute Da Fonseca, Melissa Hubisz, Carlos Bustamante, Rasmus Nielsen and Adam Siepel

2 Positive selection in six mammalian genomes. Six high-quality genomes of eutherian mammals (human / chimp / macaque / mouse / rat / dog); orthologous genes. 544 genes identified to be under positive selection using codon models. (Figure: phylogeny of human, chimp, macaque, mouse, rat, and dog; scale bar 0.05 subst/site.)

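3 Codon models (Goldman & Yang 1994; Yang et al., 2000)

\[
Q_{ij} =
\begin{cases}
0 & \text{if } i, j \text{ differ by more than one nucleotide} \\
\pi_j & \text{if } i, j \text{ differ by a synonymous transversion} \\
\kappa \pi_j & \text{if } i, j \text{ differ by a synonymous transition} \\
\omega \pi_j & \text{if } i, j \text{ differ by a nonsynonymous transversion} \\
\omega \kappa \pi_j & \text{if } i, j \text{ differ by a nonsynonymous transition}
\end{cases}
\]

where \(\kappa\) is the transition/transversion rate ratio, \(\pi_j\) is the equilibrium frequency of codon \(j\), and \(\omega\) is the nonsynonymous/synonymous rate ratio. \(\omega < 1\): purifying selection; \(\omega = 1\): neutral evolution; \(\omega > 1\): positive selection.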
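The rate rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes the standard genetic code, uniform codon frequencies unless others are supplied, and the usual convention that each diagonal entry makes its row sum to zero (the slide does not state the diagonal). The names `rate_matrix`, `kappa`, `omega`, and `pi` are chosen here for illustration.

```python
import itertools

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def is_transition(a, b):
    """True if a -> b stays within purines or within pyrimidines."""
    return (a in PURINES) == (b in PURINES)


def rate_matrix(kappa, omega, pi=None):
    """Unscaled codon rate matrix following the five cases on the slide."""
    n = len(SENSE)
    if pi is None:
        pi = [1.0 / n] * n  # assumption: uniform equilibrium frequencies
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # identical codons, or more than one nucleotide apart
            a, b = diffs[0]
            rate = pi[j]
            if is_transition(a, b):
                rate *= kappa
            if CODE[ci] != CODE[cj]:
                rate *= omega  # nonsynonymous change
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])  # assumed convention: rows sum to zero
    return Q
```

With `kappa = 2` and `omega = 0.5`, for example, the entry for TTT to TTC (a synonymous transition) is twice the codon frequency, while TTT to TTA (a nonsynonymous transversion) is half of it.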

4 Branch-Site Likelihood Ratio Tests (LRTs). Based on continuous-time Markov models of codon evolution. Compare a null model allowing for negative selection (ω<1) or neutral evolution (ω=1) with an alternative model additionally allowing for positive selection (ω>1). Both models allow ω to vary across sites. Can have foreground branches with positive selection (PS) and background branches without. Applied separately to each gene (Nielsen & Yang, 1998; Yang & Nielsen, 2002).
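A likelihood ratio test of this kind compares the maximized log-likelihoods of the two models. The sketch below shows only the arithmetic. It assumes a chi-square reference distribution with one degree of freedom, which the slides do not specify, and the log-likelihood values in the usage note are made up.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the alternative over the null model."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalue(lnl_null, lnl_alt):
    """p-value under the assumed chi-square(1) reference distribution."""
    return chi2_sf_1df(lrt_statistic(lnl_null, lnl_alt))
```

For hypothetical log-likelihoods of -1000 (null) and -998 (alternative), the statistic is 4 and the p-value is a little under 0.05.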

5 Branch and clade LRTs. Total: 544 positively selected genes (PSGs) identified. (Figure: tests on the chimp, macaque, human, and hominid branches, the primate clade and primate branch, and the rodent clade and rodent branch.)

6 Co-evolution in complement immunity (figure; legend: P<0.05, FDR<0.05)

7 2^9 - 1 = 511 possible selection histories on the 9-branch mammalian phylogeny
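The count of 511 is consistent with labeling each of the 9 branches as selected or not and excluding the history with no selection anywhere, since 2^9 - 1 = 511. The short sketch below enumerates the histories under that reading. The binary-tuple representation is an assumption made for illustration.

```python
from itertools import product

N_BRANCHES = 9  # branches in the six-species phylogeny, as stated on the slide


def selection_histories(n_branches=N_BRANCHES):
    """All 0/1 labelings of the branches with at least one selected branch."""
    return [h for h in product((0, 1), repeat=n_branches) if any(h)]


histories = selection_histories()
print(len(histories))
```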

8 Why Bayesian Model Selection? Many of the likelihoods of the 511 models might be very similar or identical. Models are not nested. Bayesian analysis looks at the distribution of selection histories. Bayesian analysis allows “soft” (probabilistic) choices of selection histories. We can compute the prevalence of selection on individual branches and clades in a way that accounts for uncertainty in the selection histories.

9 Bayesian Switching Model. Two evolutionary modes: selected and non-selected. Parameters describing the switching process: the gain parameter (subscript b,G) is the probability that a gene gains positive selection on branch b; the loss parameter (subscript b,L) is the probability that a gene loses positive selection on branch b.

10 Bayesian Switching Model. Let X = (X_1, …, X_N) be the alignment data, with X_i the alignment of the ith gene. Let Z = (Z_1, …, Z_N) be the set of selection histories, with Z_i denoting the history of the ith gene. The third quantity is the set of switching parameters. Assume independence across genes for the alignments X and histories Z, and conditional independence of X and the switching parameters given Z. Thus the joint distribution factorizes over genes.
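The equation that followed on this slide did not survive in the transcript. Under the two stated assumptions, the joint distribution would take the form sketched below. Here \(\theta\) is a symbol chosen for illustration to denote the set of switching parameters; it is not taken from the source.

```latex
P(X, Z \mid \theta)
  = \prod_{i=1}^{N} P(X_i, Z_i \mid \theta)
  = \prod_{i=1}^{N} P(X_i \mid Z_i)\, P(Z_i \mid \theta)
```

The first equality uses independence across genes. The second uses the conditional independence of \(X_i\) and \(\theta\) given \(Z_i\).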

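11 Mapping selection histories to switches (cont.). Each branch is described by a pair (state at parent, state at child), with 1 = selected and 0 = non-selected. Gain of positive selection, (0,1): the gain probability for branch b. Absence of gain, (0,0): one minus the gain probability. Loss of positive selection, (1,0): the loss probability for branch b. Absence of loss, (1,1): one minus the loss probability.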
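The mapping from state pairs to switching probabilities can be sketched as a product over branches. This is an illustration under stated assumptions. Pairs are read as (parent state, child state), so a loss is the pair (1, 0). The function names and the list-based inputs are hypothetical, and the treatment of the root state is left out.

```python
def branch_probability(parent, child, gain, loss):
    """Probability of one branch's (parent, child) state pair."""
    if parent == 0:
        # Not under selection at the parent: either gain it or stay without it.
        return gain if child == 1 else 1.0 - gain
    # Under selection at the parent: either lose it or keep it.
    return loss if child == 0 else 1.0 - loss


def history_probability(transitions, gains, losses):
    """Product of per-branch probabilities for one selection history.

    transitions: list of (parent_state, child_state), one pair per branch.
    gains, losses: per-branch gain and loss probabilities, in the same order.
    """
    p = 1.0
    for (parent, child), g, l in zip(transitions, gains, losses):
        p *= branch_probability(parent, child, g, l)
    return p
```

For a three-branch path with a gain, then no loss, then a loss, and made-up probabilities of 0.1 for every gain and 0.2 for every loss, the result is 0.1 × 0.8 × 0.2.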

12 Bayesian Switching Model

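13 Putting everything together: the model has three components. A prior on the switching parameters (Beta distribution with parameters 1 and 9). The probability of each selection history (product of the relevant switching probabilities). The likelihoods from the codon models assuming selection histories Z_j.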

14 Gibbs sampling. The histories Z and the switching parameters are unobserved. We sample from the joint posterior distribution with a Gibbs sampler that alternates between sampling each Z_i conditional on X_i and the previously sampled switching parameters, and sampling the switching parameters conditional on the previously sampled Z.
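The alternating scheme can be illustrated with a deliberately reduced toy model. It has a single branch, one gain probability `theta`, and two possible histories per gene (selected or not). This is not the sampler used in the study. The function name, the two-history simplification, and the likelihood inputs are assumptions. The Beta(1, 9) default follows the prior parameters mentioned on the previous slide.

```python
import random


def gibbs_toy(lik0, lik1, n_iter=1000, a=1.0, b=9.0, seed=0):
    """Toy Gibbs sampler for one branch and one gain probability theta.

    lik0[i], lik1[i]: likelihood of gene i's alignment without / with selection.
    Returns posterior selection frequencies per gene and the sampled thetas.
    """
    rng = random.Random(seed)
    n = len(lik0)
    theta = a / (a + b)  # start at the prior mean
    z_counts = [0] * n
    thetas = []
    for _ in range(n_iter):
        # Step 1: sample each history Z_i given its data and the current theta.
        z = []
        for l0, l1 in zip(lik0, lik1):
            p1 = theta * l1 / (theta * l1 + (1.0 - theta) * l0)
            z.append(1 if rng.random() < p1 else 0)
        # Step 2: sample theta given all histories (conjugate Beta update).
        k = sum(z)
        theta = rng.betavariate(a + k, b + n - k)
        thetas.append(theta)
        for i, zi in enumerate(z):
            z_counts[i] += zi
    return [c / n_iter for c in z_counts], thetas
```

Genes whose data strongly favor selection end up with posterior frequencies near 1, and `theta` is drawn around the fraction of genes currently assigned to the selected mode, pulled toward the prior.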

15 Inferred Rates of Gain and Loss (figure panels: gain, loss)

16 Episodic selection on the mammalian tree. Most genes appear to have switched between evolutionary modes multiple times. Posterior expected number of mode switches: 1.6 (0.6 gains, 1.0 losses). An expected 95% of PSGs have experienced a mode switch at least once, and 53% at least twice. These observations are qualitatively in agreement with Gillespie’s episodic molecular clock.

17 Inferred Number of Genes Under Positive Selection (figure)

18 Complement components C7 and C8B. Components C7 and C8B encode proteases in the membrane attack complex. Differences in complement proteases are thought to explain certain differences in the immune responses of humans and rodents. C7: PP = 0.98; C8B: PP = 0.93. (Puente et al., 2003)

19 Glycoprotein hormones: CGA. CGA is the alpha subunit of chorionic gonadotropin, luteinizing hormone, follicle-stimulating hormone, and thyroid-stimulating hormone. The alpha subunits of the four hormones are identical; their beta chains, however, are unique and confer biological specificity. Beta subunits CGB1 and CGB2 are thought to have originated from gene duplication in the common ancestor of humans and great apes. PP = 0.82

20 Summary and Future Work. Bayesian analysis allows the study of patterns and the episodic nature of positive selection on the mammalian tree. Most probable selection histories can be identified for individual genes. Ideally, we would like to model mode switches in continuous time. Compare functions of genes with high and low expected numbers of switches. Is the selection history predictive of function?

21 Resource

22 Thanks. Siepel Lab (Cornell): Adam Siepel, Tomas Vinar, Brona Brejova, Adam Diehl, Andre Luis Martins. Bustamante Lab (Cornell): Carlos Bustamante, Adam Boyko, Adam Auton, Keyan Zhao, Abra Brisbin, Kasia Bryc, Jeremiah Degenhardt, Lin Li, Kirk Lohmueller, Weisha Michelle Zhu, Amit Indap. Nielsen Lab (Berkeley): Rasmus Nielsen, Rute Da Fonseca. NIH and NSF for funding.