1
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods: Models of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark gorm@cbs.dtu.dk
2
Distance Matrix Methods
1. Construct multiple alignment of sequences
2. Construct table listing all pairwise differences (distance matrix)
3. Construct tree from pairwise distances

Gorilla   : ACGTCGTA
Human     : ACGTTCCT
Chimpanzee: ACGTTTCG

      Go   Hu   Ch
Go    -    4    4
Hu         -    2
Ch              -

[Tree figure: Go, Hu, Ch; branch lengths 2, 1, 1, 1]
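Step 2 above can be sketched in a few lines of Python. This is a minimal illustration, assuming the distance is simply the number of differing alignment columns; the names `count_differences` and `distance_matrix` are made up for this example and do not come from the slides.

```python
from itertools import combinations

# The three aligned sequences from the slide.
alignment = {
    "Go": "ACGTCGTA",
    "Hu": "ACGTTCCT",
    "Ch": "ACGTTTCG",
}


def count_differences(seq1, seq2):
    """Count the alignment columns in which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2))


def distance_matrix(aln):
    """Return {(name1, name2): number of differences} for every pair."""
    return {
        (x, y): count_differences(aln[x], aln[y])
        for x, y in combinations(aln, 2)
    }


matrix = distance_matrix(alignment)
for (x, y), d in matrix.items():
    print(x, y, d)
```

The resulting values match the distance matrix on the slide (Go-Hu 4, Go-Ch 4, Hu-Ch 2).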
3
Optimal Branch Lengths: Least Squares
Fit between given tree and observed distances can be expressed as "sum of squared differences":

Q = \sum_{j>i} (D_{ij} - d_{ij})^2

Find branch lengths that minimize Q - this is the optimal set of branch lengths for this tree.

[Tree figure: taxa S1, S2, S3, S4 with branches a, b, c, d, e]

Observed distance    Distance along tree
D_{12}               d_{12} = a + b + c
D_{13}               d_{13} = a + d
D_{14}               d_{14} = a + b + e
D_{23}               d_{23} = d + b + c
D_{24}               d_{24} = c + e
D_{34}               d_{34} = d + b + e

Goal: D_{ij} ≈ d_{ij}
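The minimization of Q can be sketched for the four-taxon tree above. Because every tree distance is a sum of branch lengths, Q is minimized by ordinary linear least squares. The sketch below solves the normal equations with a small hand-written Gaussian elimination to stay within the standard library. All function names are hypothetical, the solver is unconstrained (it may return negative branch lengths for noisy data), and the slides do not say which algorithm the author had in mind.

```python
BRANCHES = ["a", "b", "c", "d", "e"]

# Branches on the path between each pair of taxa, as listed on the slide.
PATHS = {
    (1, 2): "abc",
    (1, 3): "ad",
    (1, 4): "abe",
    (2, 3): "dbc",
    (2, 4): "ce",
    (3, 4): "dbe",
}


def tree_distances(lengths):
    """Distance along the tree (d_ij) for every pair of taxa."""
    return {p: sum(lengths[b] for b in path) for p, path in PATHS.items()}


def q_score(observed, lengths):
    """Sum of squared differences between observed and tree distances."""
    d = tree_distances(lengths)
    return sum((observed[p] - d[p]) ** 2 for p in PATHS)


def solve_linear(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares_branch_lengths(observed):
    """Branch lengths minimizing Q, via the normal equations A^T A x = A^T D."""
    pairs = sorted(PATHS)
    rows = [[1.0 if b in PATHS[p] else 0.0 for b in BRANCHES] for p in pairs]
    y = [observed[p] for p in pairs]
    n = len(BRANCHES)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(n)]
    return dict(zip(BRANCHES, solve_linear(ata, aty)))
```

With observed distances that fit the tree exactly, the recovered branch lengths reproduce them and Q is zero up to rounding; with real data Q stays positive and measures the remaining misfit.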
4
Superimposed Substitutions
Actual number of evolutionary events: 5
Observed number of differences: 2
Distance is (almost) always underestimated

[Figure: sequences ACGGTGC and GCGGTGA, with intermediate states C and T]
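The underestimation can be illustrated with a small simulation. This is only a sketch under assumptions that are not on the slide: substitutions hit uniformly random sites and replace the base with a uniformly chosen different base, and the function names and the seed are arbitrary. Whatever the random draws, the observed count can never exceed the number of events, since a site that differs must have been hit at least once.

```python
import random


def evolve(seq, n_events, rng):
    """Apply n_events substitutions, each at a random site to a different base."""
    seq = list(seq)
    for _ in range(n_events):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([b for b in "ACGT" if b != seq[site]])
    return "".join(seq)


def observed_differences(seq1, seq2):
    """Number of sites at which the two sequences differ."""
    return sum(x != y for x, y in zip(seq1, seq2))


rng = random.Random(1)  # fixed seed so the run is repeatable
ancestor = "ACGGTGC"
descendant = evolve(ancestor, 5, rng)
print("events: 5, observed differences:",
      observed_differences(ancestor, descendant))
```

Repeated hits at one site, or a change back to the original base, are what make the observed count fall below the true number of events.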
5
Model-based correction for superimposed substitutions
Goal: try to infer the real number of evolutionary events (the real distance) based on
1. Observed data (sequence alignment)
2. A model of how evolution occurs
6
Jukes and Cantor Model
Four nucleotides assumed to be equally frequent (f=0.25)
All 12 substitution rates assumed to be equal
Under this model the corrected distance is:

D_{JC} = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D_{OBS}\right)

For instance: D_{OBS} = 0.43 => D_{JC} = 0.64

Rate matrix (α is the single substitution rate shared by all 12 changes):

      A     C     G     T
A    -3α    α     α     α
C     α    -3α    α     α
G     α     α    -3α    α
T     α     α     α    -3α
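The correction formula is easy to check numerically. The sketch below assumes D_OBS is the proportion of differing sites (between 0 and 0.75); the function name is invented for this example.

```python
import math


def jukes_cantor_distance(d_obs):
    """Jukes-Cantor corrected distance from the observed proportion of differences.

    The formula is only defined for 0 <= d_obs < 0.75; at 0.75 the sequences
    are as different as random sequences and the logarithm diverges.
    """
    if not 0 <= d_obs < 0.75:
        raise ValueError("d_obs must satisfy 0 <= d_obs < 0.75")
    return -0.75 * math.log(1 - (4 / 3) * d_obs)


print(round(jukes_cantor_distance(0.43), 2))  # the slide's example
```

For small observed distances the correction is negligible, and it grows quickly as D_OBS approaches 0.75.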
7
Other models of evolution
8
General Time Reversible Model
Time-reversibility: The amount of change from state x to y is equal to the amount of change from y to x

\pi_A P_{AG} = \pi_G P_{GA}  =>  \pi_A \pi_G r_{AG} = \pi_G \pi_A r_{AG}

Here P_{AG} = \pi_G r_{AG} and P_{GA} = \pi_A r_{AG}, where r_{AG} denotes the relative rate shared by A -> G and G -> A changes.
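The reversibility condition can be verified numerically. The sketch below builds a rate matrix in which the rate from x to y is the frequency of y times a symmetric relative rate, then checks that the flow x -> y equals the flow y -> x for every pair. It works with instantaneous rates rather than the probabilities P written on the slide, and the frequencies, relative rates, and function names are illustrative assumptions rather than values from the presentation.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(freqs, exch):
    """Rate matrix with q[x][y] = exch[{x, y}] * freqs[y] for x != y."""
    q = {x: {y: 0.0 for y in BASES} for x in BASES}
    for x in BASES:
        for y in BASES:
            if x != y:
                q[x][y] = exch[frozenset((x, y))] * freqs[y]
        # Diagonal entries make each row sum to zero.
        q[x][x] = -sum(q[x][y] for y in BASES if y != x)
    return q


def is_time_reversible(freqs, q, tol=1e-12):
    """Check freqs[x] * q[x][y] == freqs[y] * q[y][x] for all pairs."""
    return all(
        abs(freqs[x] * q[x][y] - freqs[y] * q[y][x]) <= tol
        for x, y in combinations(BASES, 2)
    )


# Hypothetical parameter values, chosen only for illustration.
freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
exch = {
    frozenset("AC"): 1.0, frozenset("AG"): 4.0, frozenset("AT"): 0.5,
    frozenset("CG"): 0.8, frozenset("CT"): 3.0, frozenset("GT"): 1.2,
}
q = gtr_rate_matrix(freqs, exch)
print(is_time_reversible(freqs, q))
```

Reversibility holds by construction here, because each pair shares one relative rate in both directions; changing a single off-diagonal rate on its own breaks the balance.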