Published by Dwayne Black. Modified over 9 years ago.
1
Inferring phylogenetic trees: Maximum likelihood methods
Prof. William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington
thabangh@gmail.com
2
One-minute responses
– First part of class was fine.
– I am struggling with Python.
– At first it was difficult to complete the program when I get the first half, but it is getting easier now.
– The class lecture is always fine, but the Python problems are getting tougher. However, they are really interesting and quite informative. We are learning a lot about programming.
– The class is more interesting every day.
– I enjoy the Python, especially because I am able to fill in by myself.
– Thank you for helping us with sys.stdout.write. It will be very useful for future work in Python.
3
Outline
Parsimony
Distance methods
 – Computing distances
 – Finding the tree
Maximum likelihood
4
Revision
Multiple sequence alignment → Pairwise distance matrix → Phylogenetic tree
5
Revision
Ideally, distances in a phylogenetic tree would represent time. In practice, however, what do the distance estimates represent?
 – The expected number of changes per position.
What is a “back mutation”?
 – A pair of mutations that reverse one another (e.g., A → C → A).
6
Revision
Compute the Jukes-Cantor distance between the first yeast and mouse sequences shown below.
dha2_yeast 93 LRYTRHEPVGVCGEIIPWNI
dhac_mouse 93 FTYTRREPIGVCGQIIPWNI
dha5_yeast 92 FAYTLKVPFGVVAQIVPWNI
dhal_ecoli 92 LAMIVREPVGVIAAIVPWNI
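A minimal Python sketch of this calculation, assuming the aligned region has no gaps. The first yeast and mouse sequences differ at 5 of 20 positions, so the observed fraction of differences is p = 0.25. The slide does not say which form of the correction it intends for these protein sequences: the familiar DNA form is d = -(3/4) ln(1 - (4/3) p), and the analogous equal-rates form for 20 amino acids replaces 3/4 with 19/20. The names `p_distance`, `jukes_cantor`, and `num_states` are illustrative, not taken from the course materials.

```python
import math

def p_distance(seq1, seq2):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)

def jukes_cantor(p, num_states=4):
    """Jukes-Cantor corrected distance from an observed difference fraction p.

    num_states=4 gives the DNA formula d = -(3/4) ln(1 - (4/3) p);
    num_states=20 gives the analogous equal-rates correction for amino acids.
    """
    b = (num_states - 1) / num_states
    if p >= b:
        raise ValueError("p is too large; the correction is undefined")
    return -b * math.log(1 - p / b)

yeast = "LRYTRHEPVGVCGEIIPWNI"  # dha2_yeast, positions 93-112
mouse = "FTYTRREPIGVCGQIIPWNI"  # dhac_mouse, positions 93-112
p = p_distance(yeast, mouse)    # 5 differences / 20 positions
print(f"p = {p:.3f}")
print(f"4-state (DNA) form:     d = {jukes_cantor(p):.3f}")
print(f"20-state (protein) form: d = {jukes_cantor(p, num_states=20):.3f}")
```

With these inputs the DNA-style formula gives about 0.304 and the 20-state form about 0.290. Both exceed p = 0.25 because the correction accounts for multiple substitutions at the same position, including back mutations, that leave no trace in the observed sequences.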
7
           Spar       Smik-Sbay  Skud-Scer  Scas       Sklu
Spar       0          31.5       30.5       300        229
Smik-Sbay  31.5       0          34.25      294        223
Skud-Scer  30.5       34.25      0          319.5      248
Scas       300        294        319.5      0          95
Sklu       229        223        248        95         0
(Figure: partial tree with leaves Smik, Sbay, Skud, Scer.)
Perform the next merger.
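The merger step can be sketched in Python as follows. The dictionary-of-dictionaries matrix and the helper names `closest_pair` and `merge` are hypothetical. The numbers on these slides are reproduced by the simple mean of the two merged rows, for example (31.5 + 34.25) / 2 = 32.875; textbook UPGMA instead weights each row by the number of species in its cluster, which here would give (1 × 31.5 + 2 × 34.25) / 3 ≈ 33.33. The sketch follows the slides.

```python
def closest_pair(matrix):
    """Return the two cluster names with the smallest off-diagonal distance."""
    names = list(matrix)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if best is None or matrix[first][second] < matrix[best[0]][best[1]]:
                best = (first, second)
    return best

def merge(matrix, first, second):
    """Replace clusters `first` and `second` with a single merged cluster.

    The merged row is the simple (equal-weight) mean of the two old rows,
    which matches the numbers on the slides.
    """
    merged = f"{first}-{second}"
    others = [name for name in matrix if name not in (first, second)]
    # Copy the distances among the clusters that are not being merged.
    result = {name: {other: matrix[name][other] for other in others} for name in others}
    result[merged] = {merged: 0}
    for name in others:
        distance = (matrix[first][name] + matrix[second][name]) / 2
        result[merged][name] = distance
        result[name][merged] = distance
    return result

names = ["Spar", "Smik-Sbay", "Skud-Scer", "Scas", "Sklu"]
rows = [
    [0, 31.5, 30.5, 300, 229],
    [31.5, 0, 34.25, 294, 223],
    [30.5, 34.25, 0, 319.5, 248],
    [300, 294, 319.5, 0, 95],
    [229, 223, 248, 95, 0],
]
matrix = {name: dict(zip(names, row)) for name, row in zip(names, rows)}

first, second = closest_pair(matrix)  # ("Spar", "Skud-Scer") at distance 30.5
merged = merge(matrix, first, second)
print(merged[f"{first}-{second}"])
```

The merged cluster is labeled Spar-Skud-Scer here rather than Skud-Scer-Spar; only the label differs. Calling `closest_pair` again on the result selects Smik-Sbay and the new cluster (distance 32.875), which is the next merger.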
9
                Skud-Scer-Spar  Smik-Sbay       Scas            Sklu
Skud-Scer-Spar  0               32.875          309.75          238.5
Smik-Sbay       32.875          0               294             223
Scas            309.75          294             0               95
Sklu            238.5           223             95              0
(Figure: partial tree with leaves Smik, Sbay, Skud, Scer.)
Perform the next merger.
10
                Smik-Sbay       Skud-Scer-Spar  Scas            Sklu
Smik-Sbay       0               32.875          294             223
Skud-Scer-Spar  32.875          0               309.75          238.5
Scas            294             309.75          0               95
Sklu            223             238.5           95              0
(Figure: partial tree with leaves Smik, Sbay, Skud, Scer, Spar, Sklu, Scas.)
Extend the corresponding tree.
11
Maximum parsimony
for each possible tree
  for each column of the alignment
    compute the parsimony score of the column, given the tree
  tree score ← sum of the column scores
return the tree with the best (lowest) parsimony score
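The column score in this loop can be computed with Fitch's small-parsimony algorithm, which finds the minimum number of substitutions on a rooted binary tree when every substitution costs 1. This scoring scheme is an assumption; the slide does not name one. Trees use the nested-list representation from the "Representing trees" slide, and `fitch` and `parsimony_score` are illustrative names.

```python
def fitch(tree, column):
    """Fitch's algorithm on one alignment column.

    tree: a leaf name (str) or a two-element list [left, right].
    column: dict mapping each leaf name to its base in this column.
    Returns (candidate ancestral states, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one substitution is needed on a branch below here.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, columns):
    """Tree score = sum of the column scores."""
    return sum(fitch(tree, column)[1] for column in columns)

tree = [["w", "x"], ["y", "z"]]
column = {"w": "T", "x": "T", "y": "A", "z": "G"}
print(parsimony_score(tree, [column]))  # 2
```

Running the outer loop means calling `parsimony_score` once per candidate tree and keeping the lowest score.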
12
Maximum likelihood
for each possible tree
  for each column of the alignment
    compute the likelihood of the column, given the tree
  tree likelihood ← product of the column likelihoods
return the tree with the highest likelihood
Similar to parsimony, but capable of using a model of evolution.
Computationally expensive.
DNAML is the Phylip program for maximum likelihood. FastDNAML is a fast clone.
http://evolution.genetics.washington.edu/phylip.html
http://iubio.bio.indiana.edu/soft/molbio/evolve/fastdnaml/fastDNAml.html
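Part of the expense is the outer loop over every possible tree: the number of distinct unrooted binary trees with n labeled leaves is (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5). A short Python check (the function name is illustrative):

```python
def num_unrooted_trees(n):
    """Number of unrooted binary trees with n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count

for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

For 10 species this is already 2,027,025 trees, and every one of them would need its likelihood computed.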
13
Problem #1
What is the probability of observing this column, given this tree and an assumed model of evolution?
ACGCGTTGGG
ACGCAATGAA
ACACAGGGAA
(Figure: a tree whose leaves are labeled T, T, A, G.)
Pr(column | tree, model)
14
Solution #1
Enumerate all possible assignments to the internal nodes. Compute the probability of each assigned tree, and sum.
(Figure: copies of the tree with leaves T, T, A, G, each with a different assignment of bases to the internal nodes.)
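A rooted binary tree with n leaves has n - 1 internal nodes (counting the root), so with four possible bases there are 4^(n-1) assignments to enumerate: 64 for the four-leaf tree on this slide. A minimal sketch using `itertools.product` (the function name is illustrative):

```python
import itertools

def internal_assignments(num_leaves, alphabet="ACGT"):
    """Every assignment of bases to the internal nodes of a rooted binary tree.

    A rooted binary tree with num_leaves leaves has num_leaves - 1 internal
    nodes, hence len(alphabet) ** (num_leaves - 1) assignments.
    """
    return list(itertools.product(alphabet, repeat=num_leaves - 1))

assignments = internal_assignments(4)
print(len(assignments))  # 64
print(assignments[:3])
```

The count grows exponentially with the number of species: with 10 leaves there are 4^9 = 262,144 assignments for every column.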
15
Problem #2
What is the probability of observing this column, given this assigned tree and an assumed model of evolution?
ACGCGTTGGG
ACGCAATGAA
ACACAGGGAA
(Figure: the tree with leaves T, T, A, G, now with bases T, A, A assigned to its internal nodes.)
Pr(column | tree, model)
16
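Solution #2
(Figure: the assigned tree with leaves T, T, A, G and internal nodes T, A, A; background base frequencies π_A, π_C, π_G, π_T; one branch labeled with its length m.)
The probability that the ancestral (root) base is A is just π_A, the background frequency of A.
The probability of observing a substitution from A to T on a branch of length m is given by the evolutionary model.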
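The slides leave the evolutionary model unspecified. Assuming the Jukes-Cantor model, with all bases equally frequent (π_A = π_C = π_G = π_T = 1/4) and the branch length m measured in expected substitutions per site, the branch probabilities have a closed form: P(same base) = 1/4 + (3/4) e^(-4m/3) and P(a particular different base) = 1/4 - (1/4) e^(-4m/3). A sketch (the function name is illustrative):

```python
import math

def jc_substitution_prob(parent, child, m):
    """P(child base | parent base) after a branch of length m, Jukes-Cantor model.

    m is measured in expected substitutions per site (an assumed unit).
    """
    decay = math.exp(-4.0 * m / 3.0)
    if parent == child:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

m = 0.1  # illustrative branch length
for child in "ACGT":
    print(f"P(A -> {child}) = {jc_substitution_prob('A', child, m):.4f}")
```

Summing the three "different base" cases gives P(any change) = (3/4)(1 - e^(-4m/3)); solving that for m recovers the Jukes-Cantor distance formula.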
17
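Solution #2
(Figure: the same assigned tree, with L0 at the root and L1 through L6 on the six branches; background frequencies π_A, π_C, π_G, π_T.)
The desired probability is the product of the root probability L0 and the six branch probabilities:
L(tree) = L0 × L1 × L2 × L3 × L4 × L5 × L6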
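The product for one assigned tree, as a sketch. It assumes the Jukes-Cantor model with uniform base frequencies, one illustrative length m = 0.1 on every branch, and one reading of the assigned tree in the figure: root A, whose children are an internal T (parent of the two T leaves) and an internal A (parent of the A and G leaves). The helper names are hypothetical.

```python
import math

def jc_prob(parent, child, m):
    """Jukes-Cantor probability that `parent` becomes `child` along a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay

def assigned_tree_likelihood(root_state, branches, pi):
    """L0 * L1 * ... * Lk for one fully assigned tree.

    root_state: base at the root; pi: dict of background base frequencies.
    branches: list of (parent_base, child_base, branch_length) tuples.
    """
    likelihood = pi[root_state]  # L0: probability of the root base
    for parent, child, m in branches:
        likelihood *= jc_prob(parent, child, m)  # L1 ... Lk, one per branch
    return likelihood

pi = {base: 0.25 for base in "ACGT"}  # uniform frequencies (assumption)
m = 0.1  # illustrative length used for every branch
branches = [
    ("A", "T", m), ("A", "A", m),  # root -> internal T, root -> internal A
    ("T", "T", m), ("T", "T", m),  # internal T -> leaves T, T
    ("A", "A", m), ("A", "G", m),  # internal A -> leaves A, G
]
likelihood = assigned_tree_likelihood("A", branches, pi)
print(likelihood)
```

Each branch probability is one factor, so an unlikely substitution on any single branch pulls the whole product down.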
18
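Computing the likelihood
The likelihood of the tree is the sum of the likelihoods of the individual assigned trees, one for each assignment of bases to the internal nodes:
L(tree) = L(tree1) + L(tree2) + L(tree3) + …
(Figure: tree1, tree2, tree3, copies of the tree with leaves T, T, A, G, each with a different assignment of bases to the internal nodes.)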
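Summing over every assignment, as a brute-force sketch. The topology ((leaf1, leaf2), (leaf3, leaf4)), the single branch length m shared by all branches, uniform base frequencies, and the Jukes-Cantor model are all assumptions for illustration; the function names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(parent, child, m):
    """Jukes-Cantor probability that `parent` becomes `child` along a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay

def column_likelihood(column, m, pi):
    """Likelihood of one column on the rooted tree ((leaf1, leaf2), (leaf3, leaf4))."""
    leaf1, leaf2, leaf3, leaf4 = column
    total = 0.0
    # root, x (parent of leaf1 and leaf2), y (parent of leaf3 and leaf4)
    for root, x, y in itertools.product(BASES, repeat=3):
        assigned = (pi[root]
                    * jc_prob(root, x, m) * jc_prob(root, y, m)
                    * jc_prob(x, leaf1, m) * jc_prob(x, leaf2, m)
                    * jc_prob(y, leaf3, m) * jc_prob(y, leaf4, m))
        total += assigned  # assignments are mutually exclusive, so add
    return total

pi = {base: 0.25 for base in BASES}
print(column_likelihood("TTAG", 0.1, pi))
```

A useful sanity check: the likelihoods of all 4^4 = 256 possible columns on a given tree sum to 1.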
20
Maximum likelihood revisited
for each possible tree
  for each column of the alignment
    for each assignment of internal nodes
      for each branch
        compute the probability of that branch
      assigned tree probability ← multiply root probability and branch probabilities
    column probability ← sum assigned tree probabilities
  tree probability ← multiply column probabilities
return the tree with the highest probability
Multiply probabilities of independent events.
Add probabilities of mutually exclusive events.
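The loops above can be put together in a brute-force Python sketch. Columns are independent, so their probabilities multiply; the sketch adds log-likelihoods instead, which is equivalent and avoids numerical underflow when there are many columns. The Jukes-Cantor model, a single branch length m shared by all branches, uniform base frequencies, and a caller-supplied list of candidate rooted trees (in place of "each possible tree") are simplifying assumptions, and the helper names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(parent, child, m):
    """Jukes-Cantor probability that `parent` becomes `child` along a branch of length m."""
    decay = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * decay if parent == child else 0.25 - 0.25 * decay

def collect(tree, path=()):
    """Return (internal node ids, edges) for a nested-list rooted tree.

    Internal nodes are identified by their path from the root (a tuple of
    child indices); leaves are identified by their species name.
    """
    internals, edges = [path], []
    for i, child in enumerate(tree):
        if isinstance(child, str):
            edges.append((path, child))
        else:
            child_path = path + (i,)
            edges.append((path, child_path))
            sub_internals, sub_edges = collect(child, child_path)
            internals.extend(sub_internals)
            edges.extend(sub_edges)
    return internals, edges

def column_likelihood(tree, column, m, pi):
    """Sum over every assignment of bases to the internal nodes (brute force)."""
    internals, edges = collect(tree)
    total = 0.0
    for states in itertools.product(BASES, repeat=len(internals)):
        state = dict(zip(internals, states))
        state.update(column)  # leaf name -> observed base
        prob = pi[state[()]]  # root probability
        for parent, child in edges:
            prob *= jc_prob(state[parent], state[child], m)  # multiply branches
        total += prob  # add mutually exclusive assignments
    return total

def log_likelihood(tree, columns, m, pi):
    """Independent columns: multiply probabilities, i.e. add their logs."""
    return sum(math.log(column_likelihood(tree, col, m, pi)) for col in columns)

def best_tree(trees, columns, m, pi):
    return max(trees, key=lambda tree: log_likelihood(tree, columns, m, pi))

pi = {base: 0.25 for base in BASES}  # uniform base frequencies (assumption)
candidates = [[["a", "b"], ["c", "d"]], [["a", "c"], ["b", "d"]]]
columns = [{"a": "T", "b": "T", "c": "A", "d": "A"},
           {"a": "G", "b": "G", "c": "C", "d": "C"}]
print(best_tree(candidates, columns, 0.1, pi))  # [['a', 'b'], ['c', 'd']]
```

Enumerating assignments costs 4^(number of internal nodes) terms per column, so this direct translation of the pseudocode is only practical for very small trees.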
21
Overview
Parsimony
Distance methods
 – Computing distances
 – Finding the tree
   – Fitch-Margoliash
   – Neighbor-joining
   – UPGMA
Maximum likelihood
22
Representing trees
((mouse, rat), (human, chimp))
myTree = [["mouse", "rat"], ["human", "chimp"]]
(Figure: tree with leaves mouse, rat, human, chimp.)
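The nested-list form and the parenthesized form carry the same information. A small recursive function (the name is illustrative) that writes a nested list back out in parenthesized form:

```python
def to_parenthesized(tree):
    """Convert a nested-list tree (leaves are strings) to parenthesized text."""
    if isinstance(tree, str):
        return tree  # a leaf is just its species name
    return "(" + ", ".join(to_parenthesized(child) for child in tree) + ")"

myTree = [["mouse", "rat"], ["human", "chimp"]]
print(to_parenthesized(myTree))  # ((mouse, rat), (human, chimp))
```

Because the leaves are strings and internal nodes are lists, `isinstance(tree, str)` is enough to tell them apart.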
23
Problem #1
Write a program to read a parenthesized tree from a file and count the number of leaf nodes (species).
> cat mytree.txt
(yeast, ((fly, spider), (dog, cat)))
> python read-tree.py mytree.txt
Read 5 species from mytree.txt.
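One possible solution sketch, not the official one: split the text into parentheses, commas, and names, parse it recursively into the nested-list representation from the "Representing trees" slide, and count the leaves. The function names are illustrative.

```python
import re
import sys

def tokenize(text):
    """Split text into '(', ')', ',' and species names, ignoring whitespace."""
    return re.findall(r"[(),]|[^\s(),]+", text)

def parse_tree(tokens, pos=0):
    """Parse one subtree starting at tokens[pos]; return (tree, next position).

    Leaves become strings and internal nodes become lists of children.
    """
    token = tokens[pos]
    if token == "(":
        children = []
        pos += 1
        while True:
            child, pos = parse_tree(tokens, pos)
            children.append(child)
            if tokens[pos] == ",":
                pos += 1  # another child follows
            elif tokens[pos] == ")":
                return children, pos + 1
            else:
                raise ValueError(f"expected ',' or ')' but found {tokens[pos]!r}")
    if token in (",", ")"):
        raise ValueError(f"unexpected {token!r}")
    return token, pos + 1  # a leaf (species name)

def count_leaves(tree):
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)

def main(filename):
    with open(filename) as handle:
        tree, _ = parse_tree(tokenize(handle.read()))
    sys.stdout.write(f"Read {count_leaves(tree)} species from {filename}.\n")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("Usage: python read-tree.py <tree file>\n")
    else:
        main(sys.argv[1])
```

Run as `python read-tree.py mytree.txt`. The tokenizer treats anything other than parentheses, commas, and whitespace as part of a species name.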
24
Problem #2
Modify the previous program to print the leaves of the tree, indenting each according to its depth.
> print-tree.py mytree.txt
  yeast
      fly
      spider
      dog
      cat
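A sketch of the printing step, assuming the tree has already been parsed into nested lists as in the previous program, and using two spaces per level of depth with the root at depth 0 (an arbitrary choice). The function names are illustrative.

```python
import sys

def leaves_with_depth(tree, depth=0):
    """Yield (species name, depth) for each leaf, left to right; the root is depth 0."""
    if isinstance(tree, str):
        yield tree, depth
    else:
        for child in tree:
            yield from leaves_with_depth(child, depth + 1)

def format_leaves(tree, indent="  "):
    """One line per leaf, indented by `indent` once per level of depth."""
    return "".join(f"{indent * depth}{name}\n" for name, depth in leaves_with_depth(tree))

sys.stdout.write(format_leaves(["yeast", [["fly", "spider"], ["dog", "cat"]]]))
```

A generator keeps the traversal separate from the formatting, so a different indentation style only changes `format_leaves`.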
25
Problem #3
Given: a three-column file in which the first two columns contain names of species and the third column contains the distance between them.
Print to standard output a formatted matrix in which the species names are listed in the rows and columns, and values are from the input file.
 – Species should be listed in alphabetical order.
 – The program should halt and complain if a value is missing.
 – The matrix is assumed to be symmetric, and each pair appears only once.
 – Distances of zero along the diagonal are not included in the input.
 – Columns should be printed in the same width as the corresponding species name.
26
./print-distance-matrix.py distances.txt
Read 30 values and 6 species from distances.txt.
Maximum species name width = 9.
           ape  cat  dog gerbil mouse zebrafish
ape          0 0.19 0.15   0.44  0.17      0.69
cat       0.19    0  0.1   0.48  0.24      0.77
dog       0.15  0.1    0   0.43  0.25      0.78
gerbil    0.44 0.48 0.43      0  0.42      0.78
mouse     0.17 0.24 0.25   0.42     0      0.85
zebrafish 0.69 0.77 0.78   0.78  0.85         0
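A sketch of one possible solution. Values are kept as the strings read from the file so they print exactly as given. Each column is as wide as its species name, widened where a value is longer so the columns stay aligned; that widening is an interpretation of the last requirement. The "values" count includes both orderings of each pair, which is one reading of the 30 values reported for 6 species in the sample run. The function names are hypothetical.

```python
import sys

def read_distances(lines):
    """Parse 'species1 species2 distance' lines into a symmetric dictionary."""
    distances = {}
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, found {len(fields)}")
        first, second, value = fields
        distances[(first, second)] = value
        distances[(second, first)] = value  # each pair appears only once in the file
    return distances

def format_matrix(distances):
    """Return the matrix as a list of lines; raise ValueError if a value is missing."""
    species = sorted({name for pair in distances for name in pair})
    label_width = max(len(name) for name in species)

    def cell(row, col):
        if row == col:
            return "0"  # the diagonal is not stored in the input
        if (row, col) not in distances:
            raise ValueError(f"missing distance between {row} and {col}")
        return distances[(row, col)]

    table = {(r, c): cell(r, c) for r in species for c in species}
    # Column width: the species name, widened if any value in the column is longer.
    widths = {c: max([len(c)] + [len(table[(r, c)]) for r in species]) for c in species}
    lines = [" " * label_width + "".join(f" {c:>{widths[c]}}" for c in species)]
    for r in species:
        cells = "".join(f" {table[(r, c)]:>{widths[c]}}" for c in species)
        lines.append(f"{r:<{label_width}}" + cells)
    return lines

def main(filename):
    try:
        with open(filename) as handle:
            distances = read_distances(handle)
        lines = format_matrix(distances)
    except ValueError as error:
        sys.stderr.write(f"Error: {error}\n")  # halt and complain
        sys.exit(1)
    species = sorted({name for pair in distances for name in pair})
    # len(distances) counts both orderings of each pair (an interpretation).
    sys.stdout.write(f"Read {len(distances)} values and {len(species)} species from {filename}.\n")
    sys.stdout.write(f"Maximum species name width = {max(len(s) for s in species)}.\n")
    for line in lines:
        sys.stdout.write(line + "\n")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("Usage: print-distance-matrix.py <distance file>\n")
    else:
        main(sys.argv[1])
```

Building the whole table before printing means a missing pair is reported before any partial matrix reaches standard output.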