Chapter 19 Molecular Phylogenetics
If genomes evolve by the gradual accumulation of mutations, then the amount of difference in nucleotide sequence between a pair of genomes should indicate how recently those two genomes shared a common ancestor. Two genomes that diverged in the recent past would be expected to have fewer differences than a pair of genomes whose common ancestor is more ancient. This means that by comparing three or more genomes it should be possible to work out the evolutionary relationships between them. These are the objectives of molecular phylogenetics, known in its genome-scale form as phylogenomics, a field covered by several important recent reviews.
19.1 From classification to molecular phylogenetics
Linnaeus was a systematist, not an evolutionist: his objective was to place all known organisms into a logical classification. A phylogeny indicates not just the similarities between species but also their evolutionary relationships. The data used have moved from morphological to molecular to genomic, and the field from phylogenetics to phylogenomics.
19.1.1 The Origins of Molecular Phylogenetics
Phenetics (numerical taxonomy) and cladistics (branching systematics)
Pheneticists argued that classifications should encompass as many variable characters as possible, with these characters scored numerically and analyzed by rigorous mathematical methods. Cladistics also emphasizes the need for large datasets but differs from phenetics in that it does not give equal weight to all characters.
Rather than making assumptions about which characters are 'important', cladistics demands that the evolutionary relevance of individual characters be defined.
Molecular data have three advantages compared with other types of phylogenetic information
First, when molecular data are used, a single experiment can provide information on many different characters: in a DNA sequence, for example, every nucleotide position is a character with four character states, A, C, G and T. Large molecular datasets can therefore be generated relatively quickly.
Second, molecular character states are unambiguous: A, C, G and T are easily recognizable and one cannot be confused with another. Some morphological characters, such as those based on the shape of a structure, can be less easy to distinguish because of overlaps between different character states.
Third, molecular data are easily converted to numerical form and hence are amenable to mathematical and statistical analysis.
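The idea that every nucleotide position is a character with four states, and that such data convert readily to numerical form, can be illustrated with a minimal Python sketch. The integer codes, the function names and the two example sequences are assumptions made for illustration only; they are not taken from any particular program.

```python
# Map each nucleotide character state to an integer code.
# The particular numbers are an arbitrary, illustrative choice.
STATE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(sequence):
    """Return a DNA sequence as a list of integer character states."""
    return [STATE_CODES[base] for base in sequence.upper()]


def count_differences(states_a, states_b):
    """Count the positions at which two encoded sequences differ."""
    if len(states_a) != len(states_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(states_a, states_b) if a != b)


# Two hypothetical sequences of 20 nucleotides each.
seq1 = encode("AGGCCAAGCCATAGCTGTCC")
seq2 = encode("AGGCAAAGACATACCTGACC")
n_differences = count_differences(seq1, seq2)
print(n_differences)
```

Once the sequences are held as numbers, counts such as this one can be passed directly to statistical or tree-building routines.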
Other molecular data and other sequences
Immunological data; protein electrophoresis; DNA-DNA hybridization data; DNA markers such as RFLPs, SSLPs and SNPs; gene order; gene content.
Much more data are coming
"454 Life Sciences: Aiming for the $10,000 Human Genome Sequence" (454 Sequencing)
19.2 The Reconstruction of DNA-based Phylogenetic Trees
The objective of most phylogenetic studies is to reconstruct the tree-like pattern that describes the evolutionary relationships between the organisms being studied.
19.2.1 The key features of DNA-based phylogenetic trees
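Gene trees are not the same as species trees: an internal node in a gene tree represents a mutation event, whereas an internal node in a species tree represents a speciation event.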
19.2.2 Tree reconstruction
Tree reconstruction involves four steps: aligning the DNA sequences and obtaining the comparative data that will be used to reconstruct the tree; converting the comparative data into a reconstructed tree; assessing the accuracy of the reconstructed tree; and using a molecular clock to assign dates to branch points within the tree.
Sequence alignment is the essential preliminary to tree reconstruction
The sequences must be aligned so that homologous nucleotide positions are compared with one another.
More rigorous mathematical approaches to sequence alignment
Two approaches are in use: the similarity approach (Needleman and Wunsch, 1970), which aims to maximize the number of matched nucleotides, and the distance method (Waterman et al., 1976), in which the objective is to minimize the number of mismatches.
Multiple alignment can rarely be done effectively with pen and paper so, as in all steps in a phylogenetic analysis, a computer program is used. Clustal is a popular choice. Programs for tree reconstruction and for more sophisticated types of phylogenetic analysis include PAUP, PHYLIP, PAML, MacClade and HENNIG86.
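The similarity approach can be sketched as a dynamic-programming global alignment in the style of Needleman and Wunsch. The scoring values used here (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real alignment programs use more elaborate scoring schemes and handle many sequences at once.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align the two bases
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("GATTACA", "GCATGCT")
print(aligned_a)
print(aligned_b)
print(best)
```

When several alignments share the best score, this sketch returns only one of them; which one depends on the order in which the three moves are tested during the traceback.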
Converting alignment data into a phylogenetic tree
The main distinction between the different tree-building methods is the way in which the multiple sequence alignment is converted into numerical data that can be analyzed mathematically in order to reconstruct the tree. The simplest approach is to convert the sequence information into a distance matrix, which is simply a table showing the evolutionary distances between all pairs of sequences in the dataset. The neighbor-joining method (Saitou and Nei, 1987) is a popular tree-building procedure that uses the distance matrix approach. The maximum parsimony method is more rigorous than the neighbor-joining method, but it requires more computing time. The same is true of many of the other more sophisticated methods for tree reconstruction, such as maximum likelihood methods.
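A distance matrix of the kind described above can be built from an alignment as in the following sketch. It assumes the simplest distance measure, the proportion of differing positions (the p-distance), with gap positions skipped, and optionally applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) for multiple substitutions at the same site. The three sequences and all function names are hypothetical, and the sketch stops at the matrix: it does not perform neighbor-joining.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of compared positions that differ; gap positions are skipped."""
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no positions could be compared")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    if p == 0:
        return 0.0
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment, correct=False):
    """Return a nested dict of pairwise distances for a dict of aligned sequences."""
    matrix = {}
    for name_a, seq_a in alignment.items():
        matrix[name_a] = {}
        for name_b, seq_b in alignment.items():
            p = p_distance(seq_a, seq_b)
            matrix[name_a][name_b] = jukes_cantor(p) if correct else p
    return matrix


# A small hypothetical alignment of three sequences.
alignment = {
    "seq1": "AGGCCAAGCC",
    "seq2": "AGGCAAAGAC",
    "seq3": "AGCCAAGGAC",
}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in alignment])
```

A distance-based method such as neighbor-joining would take this table as its only input, which is one reason such methods are fast compared with methods that re-examine every alignment column for every candidate tree.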
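Assessing the accuracy of a reconstructed tree
In bootstrap analysis, a new alignment is made by taking columns at random, with replacement, from the real alignment. Because the new alignment differs from the original, using it in tree reconstruction does not simply reproduce the original analysis, but if the tree is accurate we should obtain the same tree. In practice, 1000 new alignments are created so 1000 replicate trees are reconstructed.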
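The resampling step of a bootstrap analysis can be sketched as follows. The sketch assumes the standard procedure of drawing alignment columns at random with replacement; the function names, the fixed random seed and the example alignment are illustrative assumptions, and the tree-building step that would be run on each replicate is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Choose as many column indices as the alignment has columns;
    # the same column may be chosen more than once.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, replicates=1000, seed=0):
    """Return a list of resampled alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(replicates)]


# A small hypothetical alignment.
alignment = {
    "seq1": "AGGCCAAGCC",
    "seq2": "AGGCAAAGAC",
    "seq3": "AGCCAAGGAC",
}
pseudo_alignments = bootstrap_replicates(alignment, replicates=1000)
print(len(pseudo_alignments))
print(pseudo_alignments[0])
```

Each resampled alignment would then be passed to the same tree-building method as the real alignment, and the number of replicate trees that contain a given branch gives the bootstrap support for that branch.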
Molecular clocks enable the time of divergence of ancestral sequences to be estimated
The molecular clock hypothesis states that nucleotide substitutions (or amino acid substitutions if protein sequences are being compared) occur at a constant rate. Calibration is usually achieved by reference to the fossil record. It is now clear that the rate of the molecular clock differs between organisms and is variable even within a single organism.
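Under a strict molecular clock the arithmetic is simple, as the following sketch shows. It assumes a constant rate r (substitutions per site per year along one lineage), so that two sequences that diverged t years ago differ by d = 2rt; the factor of 2 arises because substitutions accumulate along both lineages. All numbers are hypothetical and serve only to show the calculation.

```python
def calibrate_rate(distance, divergence_time_years):
    """Rate per site per year along one lineage, from a dated divergence."""
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2 * divergence_time_years)


def divergence_time(distance, rate):
    """Estimated years since two sequences diverged, given a clock rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


# Hypothetical calibration: a pair of sequences whose divergence is dated
# to 50 million years ago by fossils, differing by 0.10 substitutions per site.
rate = calibrate_rate(distance=0.10, divergence_time_years=50e6)

# Hypothetical undated pair differing by 0.03 substitutions per site.
estimated_years = divergence_time(distance=0.03, rate=rate)
print(f"rate = {rate:.2e} substitutions per site per year")
print(f"estimated divergence = {estimated_years / 1e6:.1f} million years ago")
```

Because real clocks vary between and within organisms, a single calibrated rate of this kind is only a first approximation.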
19.3 The applications of molecular phylogenetics
Examples include the origin of humans, the origin of AIDS and the origin of SARS, among others.
Further readings
Philippe H, Delsuc F, Brinkmann H, et al. Phylogenomics. Annual Review of Ecology, Evolution, and Systematics 36: 541-562, 2005.
Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics 6 (5): 361-375, 2005.
Murphy WJ, Pevzner PA, O'Brien SJ. Mammalian phylogenomics comes of age. Trends in Genetics 20 (12): 631-639, 2004.