Published by Fay Melton. Modified over 9 years ago.
1
Chap. 6. Molecular Phylogeny
2
Evolution
Charles Darwin, 1859: natural selection
Evolution is
  a change in the frequency of genes in a population
  heritable changes in a population over many generations
  a process of mutation with selection
Two essential factors define evolution
  Error-prone self-replication
  Variation in success at self-replication
3
Error-prone Self-replication
Self-replication
  Whatever is evolving must be able to make copies of itself
  Typical development, aging, etc. are not evolution
  Genes can self-replicate in the context of the cells in which they reside
  A “replicator” is an entity that can self-replicate
  Asexual organisms such as bacteria can self-replicate
  Sexual organisms also replicate, but offspring inherit from two parents
  Dawkins focused on genes rather than organisms as the fundamental replicators
4
Error-prone Self-replication
Error-prone
  Copies are not always identical to the originals
  Perfect copies would not foster evolution
  In fact, current genes arose through gradual changes from previous versions, each with slight errors
  Errors are essential for evolution, provided they do not occur too frequently
5
Cell Replication
Replication
  One double-stranded DNA is copied into two identical double-stranded DNAs
  Each of the two daughter DNAs contains one mother strand (semi-conservative replication)
6
Replication step 1
  Separate the two DNA strands at the origin of replication
7
Replication step 2
  Both template strands are copied at the same time, but DNA polymerase extends a new chain only in the 5’ to 3’ direction
  On the original 3’-5’ strand, the new (leading) strand is synthesized continuously
  On the original 5’-3’ strand, the new (lagging) strand is synthesized semi-discontinuously, in pieces of 1000-2000 bp (Okazaki fragments)
8
Replication step 3
Proofread and repair
  Detect mutations (about once in 10^4 to 10^5 bases)
Mismatch repair in E. coli
  (a) Newly synthesized DNA (red) has a mismatch (G-T)
  (b) MutH, MutS, and MutL link the mismatch with the nearest methylation site (blue)
  (c) An exonuclease removes the mismatched stretch from the red strand
  (d) DNA polymerase replaces it
9
How to find the origination/termination site?
Chargaff parity rules (CPR), 1951
  # of A = # of T; # of C = # of G
  CPR I: double strands of DNA
    Obvious from the complementary relationship
  CPR II: single strand of DNA
    Cause is not yet known
    A violation is called ‘skew’
    GC skew: (G-C)/(G+C)
10
GC skew
  The max or min of the GC skew appears at the ori or ter site
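A minimal sketch of the GC skew calculation, (G-C)/(G+C), evaluated in consecutive windows along a sequence. The window size, the toy sequence, and the use of a cumulative sum to locate the turning point are assumptions made for illustration; the slides only state that the extreme of the skew marks the ori or ter site.

```python
def gc_skew(segment: str) -> float:
    """Return (G - C) / (G + C) for one segment; 0.0 if it has no G or C."""
    s = segment.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(sequence: str, window: int) -> list[float]:
    """GC skew of consecutive, non-overlapping windows of the given size."""
    return [gc_skew(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, window)]


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of window skews; its extremes mark where the skew changes sign."""
    total, out = 0.0, []
    for value in skews:
        total += value
        out.append(total)
    return out


# Hypothetical toy sequence: the first half is C-rich, the second half G-rich.
toy = "ACCCT" * 20 + "AGGGT" * 20
skews = windowed_gc_skew(toy, window=10)
cum = cumulative_skew(skews)
turning_window = cum.index(min(cum))  # index of the window where the skew flips
print(turning_window)
```

On real genomes the windows would be thousands of bases long, and whether the minimum or the maximum corresponds to ori depends on the strand being read.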
11
Oligomer skew
  $f_i$: # of occurrences of oligomer $i$ in a segment
  $OA_i = \ln(f_i / f_{i'})$
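A rough sketch of the oligomer skew ln(f_i / f_i'). The slide does not define i'; this example assumes, as a labeled guess, that i' is the reverse complement of oligomer i. The function names are hypothetical.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(oligo: str) -> str:
    """Reverse complement of a DNA oligomer (ASSUMED meaning of i')."""
    return oligo.upper().translate(_COMPLEMENT)[::-1]


def oligomer_counts(segment: str, k: int) -> Counter:
    """Count every overlapping oligomer of length k in the segment."""
    s = segment.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def oligomer_skew(segment: str, oligo: str):
    """Return ln(f_i / f_i'), or None when either count is zero."""
    counts = oligomer_counts(segment, len(oligo))
    f_i = counts[oligo.upper()]
    f_i_prime = counts[reverse_complement(oligo)]
    if f_i == 0 or f_i_prime == 0:
        return None  # the log ratio is undefined
    return math.log(f_i / f_i_prime)


# Hypothetical segment: AAG occurs three times, its reverse complement CTT once.
print(oligomer_skew("AAGAAGAAGCTT", "AAG"))
```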
12
Variation
  Most organisms can increase exponentially
  If all organisms survived and multiplied at the same rate, there would be no change in the frequency of the variants, and thus no evolution
  Population growth is limited by food, space, predators, etc.
  When population size is limited, not all variants survive
    This opens the possibility of natural selection
  Chance effects also exist
    A population made up of two equally fit variants in equal numbers will not stay that way
    This is called random drift: by chance alone, one variant eventually takes over the whole population
  This implies that evolution can occur even without natural selection, which is referred to as neutral evolution
13
Mutation
  Any change in a gene sequence that is passed on to offspring
  Caused by
    Damage to the DNA molecule (from radiation, etc.)
    Errors in replication
  Point mutation: the simplest form of mutation; occurs all over DNA sequences
    Transition: mutation within the purines (A, G) or within the pyrimidines (C, T/U)
    Transversion: mutation between the two nucleotide groups
  Effects depend on where mutations occur
    Non-coding region: no effect on proteins, and neutral
      But may have significant effects if occurring in a control region
    Coding region
      Synonymous substitution: the mutation does not change the amino acid
      Non-synonymous substitution: the amino acid is replaced by another, or a stop codon is introduced
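The transition/transversion distinction can be sketched as a small classifier over two aligned sequences. The function names and the example sequences are hypothetical; positions holding anything other than A, C, G, T or U (gaps, for instance) are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_point_mutation(a: str, b: str) -> str:
    """Classify one aligned pair of nucleotides."""
    a, b = a.upper(), b.upper()
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        return "skipped"  # gap or ambiguous symbol
    if a == b:
        return "identical"
    same_group = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_group else "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_point_mutation(a, b)
        if kind in counts:
            counts[kind] += 1
    return counts


# Hypothetical aligned pair: A>G and C>T are transitions, T>A is a transversion.
print(count_substitutions("AACGT", "GATGA"))
```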
14
Models of nucleotide substitution
[Figure: the four nucleotides A, G, T, C, with arrows labeled as transitions and transversions]
15
Jukes and Cantor one-parameter model of nucleotide substitution (α = β)
[Figure: substitution diagram among A, G, T, C]
16
Kimura model of nucleotide substitution (assumes α ≠ β)
[Figure: substitution diagram among A, G, T, C]
17
Jukes-Cantor (JC) Kimura 2P Tamura
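The formulas on this slide were not recovered from the deck. As a sketch, the standard textbook distance corrections for the Jukes-Cantor and Kimura two-parameter models are shown below; they may differ in notation from the original slide, and the Tamura model is not covered. Here p is the observed proportion of differing sites, P the proportion of transitions and Q the proportion of transversions.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC distance: d = -(3/4) ln(1 - 4p/3), defined for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p_distance(P: float, Q: float) -> float:
    """K2P distance: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("distance is undefined for these proportions")
    return -0.5 * math.log(a * math.sqrt(b))


# Hypothetical observation: 10% of sites differ between two sequences.
print(jukes_cantor_distance(0.10))
# When transitions make up one third of the differences, K2P reduces to JC.
print(kimura_2p_distance(0.10 / 3, 0.20 / 3))
```

The corrected distance is always at least as large as the observed proportion, because multiple substitutions at the same site hide some of the changes.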
18
Mutation
Indel mutation
  Small indels of a single base or a few bases are frequent
  Caused by slippage during DNA replication
  Particularly frequent in repeated sequences
    GCGC…: a slight slippage causes the insertion of an extra GC or a deletion
    The CAG repeat region in the huntingtin gene can expand, causing Huntington’s disease
  Indels cause a frame shift if their length is not a multiple of three
Gene inversion
  Whole genes are copied to offspring in the reverse direction
Translocation
  Whole genes can be deleted from one genome and inserted into another
19
Mutation Example
Orthologs: members of a gene (protein) family in various organisms. This tree shows globin orthologs.
20
Paralogs: members of a gene (protein) family within a species. This tree shows human globin paralogs.
21
Globin phylogeny by Dayhoff (1972)
22
Globin phylogeny by Dayhoff in evolutionary time (1972)
24
Mature insulin is a heterodimer of an A chain and a B chain connected by disulfide bridges. The signal peptide and C peptide are cleaved, and their sequences display fewer functional constraints.
26
Note the sequence divergence in the disulfide loop region of the A chain
27
Historical background: insulin
By the 1950s, it became clear that amino acid substitutions occur nonrandomly. For example, Sanger and colleagues noted that most amino acid changes in the insulin A chain are restricted to a disulfide loop region. Such differences are called “neutral” changes (Kimura, 1968; Jukes and Cantor, 1969).
Subsequent studies at the DNA level showed that the rate of nucleotide (and of amino acid) substitution is about six- to ten-fold higher in the C peptide, relative to the A and B chains.
28
[Figure: number of nucleotide substitutions/site/year; values shown: 0.1 × 10^-9 and 1 × 10^-9]
29
Historical background: insulin
Surprisingly, insulin from the guinea pig (and from the related coypu) evolves seven times faster than insulin from other species. Why? The answer is that guinea pig and coypu insulin do not bind two zinc ions, while insulin molecules from most other species do. There was a relaxation of the structural constraints on these molecules, and so the genes diverged rapidly.
30
Guinea pig and coypu insulin have undergone an extremely rapid rate of evolutionary change. Arrows indicate positions at which guinea pig insulin (A chain and B chain) differs from both human and mouse.
31
Molecular clock hypothesis
In the 1960s, sequence data were accumulated for small, abundant proteins such as globins, cytochromes c, and fibrinopeptides. Some proteins appeared to evolve slowly, while others evolved rapidly.
Linus Pauling, Emanuel Margoliash and others proposed the hypothesis of a molecular clock: for every given protein, the rate of molecular evolution is approximately constant in all evolutionary lineages.
32
[Figure: corrected amino acid changes per 100 residues (m) plotted against millions of years since divergence; Dickerson (1971)]
33
Molecular clock hypothesis: implications
If protein sequences evolve at constant rates, they can be used to estimate the times that sequences diverged. This is analogous to dating geological specimens by radioactive decay.
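A minimal sketch of clock-based dating. It assumes the usual relation K = 2rT: two sequences that diverged T years ago each accumulate substitutions at rate r, so their distance K grows at 2r. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Estimate T = K / (2r), with K in substitutions/site and r in substitutions/site/year."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


def calibrate_rate(distance: float, known_time: float) -> float:
    """Estimate r = K / (2T) from a pair whose divergence time is known (e.g. from fossils)."""
    if known_time <= 0:
        raise ValueError("known_time must be positive")
    return distance / (2 * known_time)


# Hypothetical calibration pair: distance 0.16 for a split dated at 80 million years.
r = calibrate_rate(0.16, 80e6)
# Hypothetical query pair with distance 0.02 under the same clock.
print(divergence_time(0.02, r))
```

The estimate is only as good as the clock assumption; the guinea pig insulin case shows that rates can differ sharply between lineages.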
35
Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA and protein sequence data.
[Figure: example trees with taxa A to I, branch lengths drawn against a time axis, and a scale bar of one unit]
36
Population Genetics
Genealogical tree
  Evolutionary tree of a gene without recombination (mtDNA, chromosome)
  Given the current generation, lineages can be traced back to a single copy of the gene: the coalescence process
Example
  Human mtDNA is traced back to an African woman 200,000 years ago (1996)
37
Coalescence Model
Assumptions
  Constant population of size N throughout time
  Each individual is equally fit (same expected number of offspring), so each is equally likely to have any of the individuals in the previous generation as mother
Pick two individuals in the present generation
  Prob. of having the same mother = 1/N
  Prob. that their most recent common ancestor lived T generations ago:
  $P(T) = (1 - 1/N)^{T-1} (1/N) \approx e^{-T/N} / N$
The time until coalescence of the lines of descent of any two individuals is approximately exponentially distributed, with a mean of N generations
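The coalescence probability can be checked numerically. The sketch below evaluates P(T) and its exponential approximation, then traces two lineages back generation by generation, each picking a mother uniformly at random, until they share one. The population size, trial count and seed are arbitrary illustration values.

```python
import math
import random


def coalescence_pmf(t: int, pop_size: int) -> float:
    """Exact P(T = t) = (1 - 1/N)^(t-1) * (1/N)."""
    return (1 - 1 / pop_size) ** (t - 1) / pop_size


def coalescence_pmf_approx(t: int, pop_size: int) -> float:
    """Exponential approximation e^(-t/N) / N, good when N is large."""
    return math.exp(-t / pop_size) / pop_size


def simulate_coalescence_time(pop_size: int, rng: random.Random) -> int:
    """Generations back until two lineages pick the same mother."""
    t = 1
    while rng.randrange(pop_size) != rng.randrange(pop_size):
        t += 1
    return t


def mean_coalescence_time(pop_size: int, trials: int, seed: int = 0) -> float:
    """Average simulated coalescence time; should be close to N."""
    rng = random.Random(seed)
    return sum(simulate_coalescence_time(pop_size, rng) for _ in range(trials)) / trials


print(coalescence_pmf(10, 50), coalescence_pmf_approx(10, 50))
print(mean_coalescence_time(pop_size=50, trials=5000, seed=1))
```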
38
Coalescence: Mitochondrial Eve
  Used a highly variable non-coding part, called the D-loop
  The average # of sites with a difference: 61.1 out of 16,553 bases
  The average pairwise difference is 76.7 between Africans, and 38.5 between non-Africans
    Divergent populations have existed in Africa for much longer
    A relatively small population left Africa and spread through the rest of the world
  The earliest branch point: 170,000 ± 50,000 years ago
  Non-African migration: 52,000 ± 27,000 years ago
39
[Figure legend: purple/green: all Africans; yellow/blue: non-Africans]
41
Fixation in Neutral Model
  Mutation 1 does not survive to the present generation
  Mutation 2 has a chance to spread to the entire population (become fixed)
  Most mutations die out
If a mutation is neutral, what is the prob. of becoming fixed, $p_{fix}$?
  Assume N copies of a gene, each equally likely to mutate
  Prob. that the mutation occurred in the gene copy that is the ancestor of the present generation: $p_{fix} = 1/N$
  A new mutation takes place in each copy with prob. $u$
  The rate of fixation of new mutations is the rate at which mutations occur, multiplied by the prob. that each mutation is fixed: $u_{fix} = (Nu) \, p_{fix} = u$
The rate of fixation of neutral mutations therefore equals the underlying mutation rate and is independent of the population size
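The claim that a single neutral mutation fixes with probability 1/N can be checked by simulation. This sketch assumes that every gene copy in the next generation is drawn at random from the previous generation (Wright-Fisher-style resampling); the population size, trial count and seed are arbitrary illustration values.

```python
import random


def neutral_mutant_fixes(pop_size: int, rng: random.Random) -> bool:
    """Follow one new neutral mutant until it is lost (0 copies) or fixed (N copies)."""
    m = 1  # the mutation starts as a single copy
    while 0 < m < pop_size:
        a = m / pop_size  # chance that a sampled parent copy carries the mutant
        m = sum(1 for _ in range(pop_size) if rng.random() < a)
    return m == pop_size


def estimate_fixation_probability(pop_size: int, trials: int, seed: int = 0) -> float:
    """Fraction of independent new mutants that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(neutral_mutant_fixes(pop_size, rng) for _ in range(trials))
    return fixed / trials


def neutral_fixation_rate(pop_size: int, u: float) -> float:
    """u_fix = (N u) * (1/N); the population size cancels out."""
    return (pop_size * u) * (1 / pop_size)


print(estimate_fixation_probability(pop_size=20, trials=4000, seed=1))  # expect roughly 1/20
print(neutral_fixation_rate(20, 1e-8), neutral_fixation_rate(2000, 1e-8))
```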
42
Fixation in Neutral Model
  The number of copies of a mutation in the population changes on a random basis
  If there are m copies of a neutral mutant sequence in one generation, the number of copies in the next generation is n ≈ m
Wright-Fisher model
  Each copy of the gene in the next generation is randomly selected from the genes in the previous generation
  Prob. that a selected copy is the mutant: a = m/N; prob. that it is not: 1 - a
  Prob. of n mutant copies in the next generation: $p(n) = \binom{N}{n} a^n (1-a)^{N-n}$
  The mean value: Na = m
[Figure: simulation with N = 200 over 2,000 generations]
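A minimal sketch of the Wright-Fisher step: the binomial distribution of the mutant count in the next generation, plus a drift trajectory using the N = 200 and 2,000 generations mentioned on the slide. The starting count of 100 and the seed are assumptions, and the run is not a reproduction of the original figure.

```python
import math
import random


def next_generation_pmf(m: int, pop_size: int) -> list[float]:
    """p(n) = C(N, n) a^n (1 - a)^(N - n) for n = 0..N, with a = m / N."""
    a = m / pop_size
    return [math.comb(pop_size, n) * a ** n * (1 - a) ** (pop_size - n)
            for n in range(pop_size + 1)]


def simulate_trajectory(pop_size: int, m0: int, generations: int, seed: int = 0) -> list[int]:
    """Mutant copy number over time under pure random drift."""
    rng = random.Random(seed)
    m, trajectory = m0, [m0]
    for _ in range(generations):
        a = m / pop_size
        # Each of the N new copies independently picks a mutant parent with prob. a.
        m = sum(1 for _ in range(pop_size) if rng.random() < a)
        trajectory.append(m)
    return trajectory


pmf = next_generation_pmf(m=20, pop_size=200)
mean_n = sum(n * p for n, p in enumerate(pmf))  # equals N * a = m
trajectory = simulate_trajectory(pop_size=200, m0=100, generations=2000, seed=1)
print(mean_n, trajectory[-1])
```

Once the count reaches 0 or N it stays there, which is why a long enough run ends in loss or fixation of the mutant.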