Building Phylogenies
Phylogenetic (evolutionary) trees
Describe evolutionary relationships between species
Cannot be known with certainty!
Nevertheless, phylogenies can be useful
[Figure: two alternative trees on Human, Gorilla, Chimp, Gibbon, Orangutan]
Applications of Phylogenetic Analysis
Inferring function
 –Closely related sequences occupy neighboring branches of the tree
Tracking changes in rapidly evolving populations (e.g., viruses)
 –Which genes are under selection?
Phyloinformatics
Comparative analysis through phylogenies helps to understand biological function
Exploit phylogenies for data mining
Disease Transmission and Medical Forensics
Discovering Snake Antivenin
Methods
Distance-based
Parsimony
Maximum likelihood
Distance Matrices
     a   b   c   d
a    0
b    6   0
c    7   3   0
d
[Figure: tree with leaves a, b, c, d]
Ultrametric Matrices
     a   b   c   d
a    0
b    2   0
c    6   6   0
d
[Figure: tree with leaves a, b, c, d]
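A minimal sketch of what separates the two matrices, assuming the usual three-point condition for ultrametrics: among the three pairwise distances of any triple, the two largest are equal. The function name `is_ultrametric` and the dictionary layout are illustrative choices, and only the a, b, c entries that survive on the two slides are used.

```python
from itertools import combinations

def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        d = sorted([
            dist[frozenset((x, y))],
            dist[frozenset((x, z))],
            dist[frozenset((y, z))],
        ])
        if abs(d[2] - d[1]) > tol:
            return False
    return True

taxa = ["a", "b", "c"]

# a, b, c entries of the "Distance Matrices" slide.
general = {
    frozenset(("a", "b")): 6,
    frozenset(("a", "c")): 7,
    frozenset(("b", "c")): 3,
}

# a, b, c entries of the "Ultrametric Matrices" slide.
ultra = {
    frozenset(("a", "b")): 2,
    frozenset(("a", "c")): 6,
    frozenset(("b", "c")): 6,
}

general_ok = is_ultrametric(general, taxa)
ultra_ok = is_ultrametric(ultra, taxa)
```

Here `ultra_ok` is true because the two largest distances (6 and 6) agree, while `general_ok` is false because 7 and 6 differ.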
Least Squares
$D_{ij}$ = distance between i and j in matrix
$d_{ij}$ = distance between i and j in tree
Objective: Find tree that minimizes $\sum_{i<j} (D_{ij} - d_{ij})^2$
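As a rough illustration of the least-squares criterion, the sketch below scores a candidate tree by the unweighted sum of squared differences between matrix distances and tree path lengths. The unweighted form, the star-shaped three-leaf tree, and the names `path_length`, `least_squares_score`, and `star_tree` are assumptions made for this example; the matrix values are the a, b, c entries from the distance matrix slide.

```python
from itertools import combinations

def path_length(tree, start, goal):
    """Length of the unique path between two nodes of a weighted tree.

    tree maps each node to a dict {neighbor: edge length}.
    """
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in tree[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("nodes are not connected")

def least_squares_score(D, tree, leaves):
    """Sum over leaf pairs of (matrix distance - tree distance)^2."""
    return sum(
        (D[frozenset((i, j))] - path_length(tree, i, j)) ** 2
        for i, j in combinations(leaves, 2)
    )

def star_tree(la, lb, lc):
    """Hypothetical three-leaf tree: one internal node joined to a, b, c."""
    edges = {"a": la, "b": lb, "c": lc}
    tree = {"center": dict(edges)}
    for leaf, w in edges.items():
        tree[leaf] = {"center": w}
    return tree

leaves = ["a", "b", "c"]
D = {
    frozenset(("a", "b")): 6,
    frozenset(("a", "c")): 7,
    frozenset(("b", "c")): 3,
}

exact_score = least_squares_score(D, star_tree(5, 1, 2), leaves)  # fits D exactly
poor_score = least_squares_score(D, star_tree(3, 3, 3), leaves)   # every tree distance is 6
```

With branch lengths 5, 1, 2 the tree distances reproduce the matrix and the score is 0; with all branches equal to 3 the score is 0 + 1 + 9 = 10.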
Characters
     A   B   C   D
a    0   1   1   1
b    0   1   1   1
c    0   0   1   1
d    0   1   1   0
e    0   0   0   1
f    1   0   0   0
Characters are represented using matrices
A character can be a morphological trait or a letter in a column of an alignment.
Parsimony
     A   B   C   D
a    0   1   1   1
b    0   1   1   1
c    0   0   1   1
d    0   1   1   0
e    0   0   0   1
f    1   0   0   0
[Figure: tree on the taxa a, b, c, d, e, f for the character matrix above]
Goal: Find the tree with the fewest evolutionary changes
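Counting the evolutionary changes a given tree requires can be sketched with Fitch's small-parsimony algorithm, which is a standard technique but is not named on the slide. The topology `TREE` below is a hypothetical caterpillar tree chosen only for illustration (the tree drawn on the slide is not recoverable here); the character matrix is the one from the slide.

```python
# Character matrix from the slide: one string of states (A, B, C, D) per taxon.
CHARACTERS = {
    "a": "0111",
    "b": "0111",
    "c": "0011",
    "d": "0110",
    "e": "0001",
    "f": "1000",
}

# Hypothetical rooted binary topology, written as nested pairs.
TREE = ((((("a", "b"), "d"), "c"), "e"), "f")

def fitch(node, site):
    """Return (candidate state set, minimum number of changes) for one character."""
    if isinstance(node, str):  # leaf: its observed state, no changes yet
        return {CHARACTERS[node][site]}, 0
    left, right = node
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    common = left_states & right_states
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, n_sites=4):
    """Total number of changes over all characters (sites treated independently)."""
    return sum(fitch(tree, site)[1] for site in range(n_sites))

per_site = [fitch(TREE, site)[1] for site in range(4)]
total = parsimony_score(TREE)
```

On this topology characters A, B, and C each need one change and D needs two, for a total of 5. Finding the best tree would mean comparing this score across candidate topologies, which the sketch does not attempt.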
Markov models on trees
Observed: The species labeling the leaves
Hidden: The ancestral states
Transition probabilities: The mutation probabilities
Assumptions:
 –Only mutations are allowed
 –Sites are independent
 –Evolution at each site occurs according to a Markov process
Models of evolution at a site
Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
Different branches of the tree may have different lengths
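One way to handle branches of different lengths, assuming a branch spans a whole number t of time units, is to use the t-step matrix $M^t$ of the Markov chain. The sketch below builds a hypothetical uniform model (every base mutates to each other base with the same probability) and raises it to a power; the value 0.1 and the helper names are assumptions for illustration, not part of the slides.

```python
BASES = "ACTG"

def uniform_model(p_change):
    """Hypothetical one-time-unit matrix: stay with 1 - p_change,
    move to each of the three other bases with p_change / 3."""
    return [
        [1 - p_change if i == j else p_change / 3 for j in range(4)]
        for i in range(4)
    ]

def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def branch_matrix(m, t):
    """Transition probabilities along a branch of integer length t: M^t."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result

M = uniform_model(0.1)
M2 = branch_matrix(M, 2)  # a branch two time units long
row_sums = [sum(row) for row in M2]
```

Each row of `M2` still sums to 1, and the probability of ending in the starting base after two units is 0.9² + 3·(0.1/3)².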
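The probability of an assignment
[Figure: tree with leaves A, G, C, T; root labeled T, with children labeled G (parent of leaves A and G) and T (parent of leaves C and T)]
$\text{Probability} = m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$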
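The product on this slide can be evaluated directly once a transition matrix is chosen. The sketch below assumes a hypothetical uniform model with mutation probability 0.1 per time unit and branches of one time unit each; like the slide, it multiplies only the edge probabilities and includes no prior on the root state.

```python
BASES = "ACTG"

def uniform_model(p_change=0.1):
    """Hypothetical matrix m[i][j]: stay with 1 - p_change,
    move to each other base with p_change / 3."""
    return {
        i: {j: (1 - p_change if i == j else p_change / 3) for j in BASES}
        for i in BASES
    }

def assignment_probability(m, x, y, z, leaves=("A", "G", "C", "T")):
    """Product of edge probabilities for root x, with internal children y
    (above the first two leaves) and z (above the last two leaves)."""
    l1, l2, l3, l4 = leaves
    return m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]

m = uniform_model(0.1)

# Assignment from the slide: root T, left internal node G, right internal node T.
p = assignment_probability(m, "T", "G", "T")
```

Under this model three of the six edges keep their state (probability 0.9 each) and three change (probability 0.1/3 each), so `p` equals 0.9³ · (0.1/3)³.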
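Ancestral reconstruction: most likely assignment
[Figure: tree with leaves A, G, C, T; root X, with children Y (parent of leaves A and G) and Z (parent of leaves C and T)]
$L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$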
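For a tree this small, the maximization over X, Y, Z can be sketched as a brute-force search over all 4³ = 64 ancestral assignments. The uniform transition matrix with mutation probability 0.1 is a hypothetical choice for illustration, and exhaustive search is used here only because the tree is tiny; the slides do not say how the maximum is computed.

```python
from itertools import product

BASES = "ACTG"

def uniform_model(p_change=0.1):
    """Hypothetical matrix m[i][j]: stay with 1 - p_change,
    move to each other base with p_change / 3."""
    return {
        i: {j: (1 - p_change if i == j else p_change / 3) for j in BASES}
        for i in BASES
    }

def assignment_probability(m, x, y, z, leaves):
    """m_XY * m_Y,l1 * m_Y,l2 * m_XZ * m_Z,l3 * m_Z,l4 for the four-leaf tree."""
    l1, l2, l3, l4 = leaves
    return m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]

def most_likely_assignment(m, leaves=("A", "G", "C", "T")):
    """Try every (X, Y, Z) and keep one assignment with the largest probability."""
    best = max(
        product(BASES, repeat=3),
        key=lambda xyz: assignment_probability(m, *xyz, leaves),
    )
    return best, assignment_probability(m, *best, leaves)

m = uniform_model(0.1)
best_xyz, l_star = most_likely_assignment(m)
```

With four distinct leaf states at least three edges must carry a change, so under this model `l_star` equals 0.9³ · (0.1/3)³. Several assignments tie for the maximum, and `best_xyz` is just one of them.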