Maximum Likelihood Molecular Evolution

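Maximum Likelihood The likelihood function is the joint density of the observations, viewed as a function of the model parameters: $L(\theta) = \Pr(\text{Data} \mid \theta)$. If the observations are independent, the likelihood factorizes into a product over the individual observations: $L(\theta) = \prod_i \Pr(D_i \mid \theta)$.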

An example Consider estimating the heads probability p of a coin tossed n times. Data = HHTTHTHHTTT (n = 11: 5 heads, 6 tails). $L(p) = \Pr(D \mid p) = pp(1-p)(1-p)p(1-p)pp(1-p)(1-p)(1-p) = p^5 (1-p)^6$

$L(p) = p^5 (1-p)^6$ reaches its maximum at $\hat{p} = 5/11$.

Maximum Likelihood Take the derivative of L with respect to p: $\frac{dL}{dp} = 5p^4(1-p)^6 - 6p^5(1-p)^5 = p^4 (1-p)^5 (5 - 11p)$. Equate it to zero and solve: $\hat{p} = 5/11$.

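Log Likelihood For computational reasons, we maximise the logarithm: $\ln L = 5 \ln p + 6 \ln(1-p)$, with derivative $\frac{d \ln L}{dp} = \frac{5}{p} - \frac{6}{1-p}$. Setting the derivative to zero gives $\hat{p} = 5/11$.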
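The coin example can be checked numerically. The sketch below is only an illustration: the function name `log_likelihood` and the grid search are assumptions made for this example and are not part of the slides. It counts heads and tails in the data, evaluates the closed-form estimate, and confirms it against a brute-force search over p.

```python
import math

def log_likelihood(p, heads, tails):
    """ln L(p) = heads * ln(p) + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)

data = "HHTTHTHHTTT"
heads = data.count("H")
tails = data.count("T")

# Closed-form maximiser obtained by setting the derivative to zero.
p_hat = heads / (heads + tails)

# Brute-force check: evaluate ln L on a grid of p values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, heads, tails))

print(heads, tails, p_hat, p_grid)
```

The grid maximum agrees with 5/11 up to the grid spacing. Working with ln L rather than L avoids numerical underflow once the number of observations becomes large.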

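A tree (for one column of the alignment) [Figure: a tree whose leaves carry the nucleotides observed in one alignment column: ... A ..., ... C ..., ... G ...]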

Tree likelihood: Assumptions 1. Evolution at different sites is independent. 2. Evolution in different lineages is independent.

$\Pr(A,C,C,C,G,x,y,z,w \mid T) = \Pr(x)\,\Pr(y \mid x,t_6)\,\Pr(A \mid y,t_1)\,\Pr(C \mid y,t_2)\,\Pr(z \mid x,t_8)\,\Pr(C \mid z,t_3)\,\Pr(w \mid z,t_7)\,\Pr(C \mid w,t_4)\,\Pr(G \mid w,t_5)$
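Because the internal states x, y, z and w are not observed, the likelihood of the column is obtained by summing this joint probability over all of their values. The sketch below makes several assumptions that are not in the slides: the Jukes-Cantor model with equal base frequencies, branch lengths measured in expected substitutions per site, and made-up values for t1 to t8. It computes the sum by brute force and again with Felsenstein's pruning algorithm, which is named here as a standard technique and is not taken from the slides.

```python
import math
from itertools import product

BASES = "ACGT"

def jc(i, j, t):
    """Jukes-Cantor P_ij(t); t in expected substitutions per site (assumed)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

# Hypothetical branch lengths t1..t8, chosen only for illustration.
t = {1: 0.1, 2: 0.2, 3: 0.15, 4: 0.1, 5: 0.3, 6: 0.05, 7: 0.1, 8: 0.2}

def joint(x, y, z, w):
    """The factorised joint probability, term by term as on the slide."""
    return (0.25 * jc(x, y, t[6]) * jc(y, "A", t[1]) * jc(y, "C", t[2])
            * jc(x, z, t[8]) * jc(z, "C", t[3]) * jc(z, w, t[7])
            * jc(w, "C", t[4]) * jc(w, "G", t[5]))

# Brute force: sum over all 4^4 assignments of the internal nodes.
brute = sum(joint(x, y, z, w) for x, y, z, w in product(BASES, repeat=4))

def leaf(observed):
    """Partial likelihoods at a leaf: 1 for the observed base, else 0."""
    return {b: 1.0 if b == observed else 0.0 for b in BASES}

def combine(children):
    """Partial likelihoods at a node from (child_partials, branch_length) pairs."""
    return {b: math.prod(sum(jc(b, c, bl) * part[c] for c in BASES)
                         for part, bl in children)
            for b in BASES}

# Pruning: work from the leaves towards the root x.
w_part = combine([(leaf("C"), t[4]), (leaf("G"), t[5])])
z_part = combine([(leaf("C"), t[3]), (w_part, t[7])])
y_part = combine([(leaf("A"), t[1]), (leaf("C"), t[2])])
x_part = combine([(y_part, t[6]), (z_part, t[8])])
pruned = sum(0.25 * x_part[b] for b in BASES)

print(brute, pruned)
```

Both routes give the same value. The brute-force sum grows exponentially with the number of internal nodes, while pruning visits each node once.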

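Using models Observed differences vs. actual changes (nucleotides A, G, C, T). Example: Jukes-Cantor, with $\alpha$ the rate of each specific substitution: $P_{ij}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$ if $i = j$, and $P_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$ if $i \neq j$.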
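The Jukes-Cantor transition probabilities are easy to evaluate directly. The sketch below assumes the standard parametrisation, in which alpha is the rate of each specific substitution (so the total rate away from a base is 3 * alpha). The function name `jc_prob` and the example values are illustrative only.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t, alpha=1.0):
    """Jukes-Cantor probability that base i is base j after time t.

    alpha is the rate of each specific substitution (an assumed
    parametrisation; other texts scale time differently).
    """
    decay = math.exp(-4.0 * alpha * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

# Each row of the transition matrix sums to one.
row = [jc_prob("A", j, 0.1) for j in BASES]
print(row, sum(row))

# At t = 0 nothing has changed; for large t every base is equally likely.
print(jc_prob("A", "A", 0.0), jc_prob("A", "C", 0.0))
print(jc_prob("A", "A", 100.0), jc_prob("A", "C", 100.0))
```

The limits illustrate why observed differences underestimate actual changes: as t grows, the probability of seeing a difference saturates at 3/4 even though substitutions keep accumulating.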

DNA substitution models

Comparison of substitution models

30 nucleotides from  -globin genes of two primates on a one-edge tree
            * *
Gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
Orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG
There are two differences and 28 similarities.
[Plot: log-likelihood lnL as a function of the branch length t]
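The log-likelihood curve for this pair can be recomputed from the two sequences. The sketch below assumes the Jukes-Cantor model with equal base frequencies and measures the branch length d in expected substitutions per site. These choices, and the function name `log_likelihood`, are assumptions for illustration, since the slide's own numbers did not survive in the transcript.

```python
import math

gorilla = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
orangutan = "GGACTCCTTGAGAAATAAACTGCACACTGG"

n = len(gorilla)
diffs = sum(a != b for a, b in zip(gorilla, orangutan))

def log_likelihood(d):
    """Jukes-Cantor lnL of the pair for branch length d (subst. per site)."""
    e = math.exp(-4.0 * d / 3.0)
    same = 0.25 * (0.25 + 0.75 * e)   # site pattern with identical bases
    diff = 0.25 * (0.25 - 0.25 * e)   # site pattern with different bases
    return (n - diffs) * math.log(same) + diffs * math.log(diff)

# Closed-form maximiser: the Jukes-Cantor distance.
p = diffs / n
d_hat = -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Brute-force check on a grid of branch lengths.
grid = [i / 10000 for i in range(1, 5001)]
d_grid = max(grid, key=log_likelihood)

print(n, diffs, d_hat, d_grid, log_likelihood(d_hat))
```

The estimated branch length is slightly larger than the observed proportion of differences (2/30), because the model allows for substitutions that are hidden by later changes at the same site.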

Codon models Goldman-Yang/Muse-Gaut model: 60+1 parameters

Detecting selection

Codon table

Ka/Ks ratio Ka: # non-synonymous changes / # non-synonymous sites. Ks: # synonymous changes / # synonymous sites. Ka/Ks is indicative of selective action: < 1: purifying selection; = 1: selectively neutral; > 1: positive (Darwinian) selection.
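The definition translates directly into a small calculation. In the sketch below the function names and the counts are hypothetical and chosen only to illustrate the arithmetic. No correction for multiple substitutions is applied, which real estimators normally include.

```python
def ka_ks(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from raw counts of changes and sites."""
    ka = nonsyn_changes / nonsyn_sites
    ks = syn_changes / syn_sites
    return ka, ks, ka / ks

def interpret(ratio, tol=1e-9):
    """Map a Ka/Ks ratio to the three regimes listed on the slide."""
    if abs(ratio - 1.0) <= tol:
        return "selectively neutral"
    if ratio < 1.0:
        return "purifying selection"
    return "positive (Darwinian) selection"

# Hypothetical counts for a pair of aligned coding sequences.
ka, ks, ratio = ka_ks(nonsyn_changes=10, nonsyn_sites=300,
                      syn_changes=12, syn_sites=100)
print(ka, ks, ratio, interpret(ratio))
```

With estimated values the ratio is never exactly 1, so in practice neutrality is assessed with a statistical test and not with an equality check like the one above.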

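Counting syn/non-syn changes Alignment: Seq 1: ... CCC ... Seq 2: ... AAA ... Possible pathways from CCC (Pro) to AAA (Lys), changing one position at a time. After the first change: ACC (Thr), CAC (His) or CCA (Pro). After the second change: AAC (Asn), ACA (Thr) or CAA (Gln). After the third change: AAA (Lys).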
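The pathways between the two codons can be enumerated mechanically. The sketch below assumes the standard genetic code and gives every ordering of the single-nucleotide changes equal weight, in the style of the Nei-Gojobori counting method. That weighting is an assumption, since the slides do not say how the pathways are combined. Pathways through stop codons are not excluded in this sketch (none occur in this example).

```python
from itertools import permutations

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def average_changes(start, end):
    """Average (synonymous, non-synonymous) changes over all pathways."""
    positions = [i for i in range(3) if start[i] != end[i]]
    orders = list(permutations(positions))
    syn = non = 0
    for order in orders:
        codon = start
        for pos in order:
            # Change one position at a time and compare the amino acids.
            nxt = codon[:pos] + end[pos] + codon[pos + 1:]
            if CODE[codon] == CODE[nxt]:
                syn += 1
            else:
                non += 1
            codon = nxt
    return syn / len(orders), non / len(orders)

syn, non = average_changes("CCC", "AAA")
print(syn, non)
```

For CCC to AAA there are six pathways. Three of them contain one synonymous step (CCC to CCA, or ACC to ACA), so the averages are 0.5 synonymous and 2.5 non-synonymous changes.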

Synonymous and non synonymous substitutions

Molecular clock

Zuckerkandl & Pauling (1965)

Molecular clock: use in taxonomy 18S rRNA of the small ribosomal subunit. Very slowly evolving: good for microbes.

Conservation For most sequences, the molecular clock does not apply: intron-exon structure; domains; sudden evolutionary bursts; varying effective population sizes; varying selective pressures.