Evolution of gene function
Divergent evolution: the importance of gene duplication
a. Ohno’s model
b. Subfunctionalization
c. Neofunctionalization
Introducing novelty: generation of entirely new proteins/functions
a. Lateral gene transfer
b. Domain fusion
c. Intron junction evolution?
d. New genes through TEs?
Evolution of gene function
First, some key basic concepts:
Selection acts on phenotypes, based on their fitness cost/advantage, to affect the population frequencies of the underlying genotypes.
In the case of DNA sequence:
Neutral substitutions = no effect on fitness, so not acted on by selection
Deleterious substitutions = fitness cost
* These are removed by purifying (negative) selection
Advantageous substitutions = fitness advantage
* These alleles are enriched through adaptive (positive) selection
The Neutral Theory
M. Kimura, 1968
Most DNA substitutions are likely to be neutral = no effect on fitness. They arise through new mutations.
Given a ~constant mutation rate, the # of substitutions can be converted into time of divergence since speciation = molecular clock theory.
Neutral changes evolve by genetic drift, not natural selection.
* Most are probably lost, some can become fixed in the population
Purifying selection to remove deleterious changes is pervasive, while positive selection may be relatively rare.
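The molecular clock conversion above can be sketched as a one-line calculation. This is a minimal illustration, assuming a strictly constant substitution rate and that substitutions accumulate independently along both diverging lineages (hence the factor of 2); the function name and the example numbers are hypothetical, not taken from the lecture.

```python
def divergence_time(subs_per_site, rate_per_site_per_year):
    """Estimate time since two lineages split under a strict molecular clock.

    subs_per_site: observed substitutions per site between the two sequences.
    rate_per_site_per_year: assumed constant substitution rate per lineage.
    The factor of 2 reflects that both lineages accumulate changes.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return subs_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical example: 0.02 substitutions/site at 1e-9 per site per year
years = divergence_time(0.02, 1e-9)
print(f"Estimated divergence: {years:.3g} years")
```

In practice the rate is itself an estimate, so the result is only as good as the clock calibration.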
The Nearly-Neutral Theory
T. Ohta, 1973
Many slightly deleterious (or slightly advantageous) substitutions are not acted on efficiently by selection if population sizes are small. Therefore, many substitutions that are nearly neutral can evolve mostly by drift.
* Small populations are more subject to drift (e.g. random events).
* Selection against weakly deleterious substitutions is slow … therefore many have yet to be removed by selection.
** Practically what this means is that SOME substitutions found in extant sequences can be slightly deleterious & have yet to be removed
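How population size changes the balance between drift and weak selection can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. This is a sketch under stated assumptions (diploid population, effective size equal to census size N, additive selection coefficient s, initial frequency 1/(2N)); the function name and the example values are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at frequency 1/(2n).

    s: selection coefficient (negative = deleterious).
    n: diploid population size (assumed equal to effective size).
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral expectation
    if -4 * n * s > 700:
        return 0.0  # strongly selected against; avoids float overflow
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


s = -1e-4  # hypothetical slightly deleterious mutation
for n in (100, 1_000_000):
    relative = fixation_probability(s, n) / fixation_probability(0, n)
    print(f"N={n}: fixation probability relative to neutral = {relative:.3g}")
```

With these illustrative numbers the same mutation fixes almost as often as a neutral one when N = 100, but essentially never when N = 1,000,000.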
Most genes are under constraint = many substitutions are deleterious and therefore removed through purifying selection
Constraint can be due to maintaining:
* Protein function (e.g. catalytic site)
* Protein folding & stability
* Interactions with other proteins, molecules
* Other features like translation efficiency, RNA folding, etc.
Then how do new functions emerge? How can proteins evolve?
Most functions evolve through divergent evolution due to relaxed constraint
Susumu Ohno (1970): gene duplication is the main route to neofunctionalization, where one copy is allowed to evolve an entirely new function.
1. Gene duplication
2. Brief period of complete redundancy & relaxed constraint for both genes
3. Often one copy is lost as a pseudogene, or one copy can evolve a new function
Force & Lynch (1999) formalized the concept of subfunctionalization, where both copies evolve and the ancestral function becomes split between the paralogs
1. Segmental (dispersed) duplication & recombination (homologous or illegitimate)
   * Segments often flanked by repetitive sequence
2. Tandem duplication through replication slippage
3. Duplication through retrotransposition (= loss of introns & flanked by repeats)
4. Whole-genome duplication (WGD, covered in Lecture 5)
Once a gene has been duplicated, gene conversion through recombination can obscure rates
Science 2000
* Identified gene duplicates (BLAST) in 9 taxa
* Dated duplicates based on # of silent substitutions (molecular clock)
Ks (sometimes called dS): # of silent substitutions, i.e. changes that leave the encoded amino acid the SAME (synonymous)
* often these changes are ASSUMED to be neutral**
* given a constant rate of point mutations, Ks can be used to date a sequence
** now people realize that Ks can also be constrained by other things besides codon
Ka (sometimes called dN): # of substitutions that change the encoded amino acid (nonsynonymous)
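The distinction between silent and amino-acid-changing substitutions can be made concrete with a small counting sketch. It assumes two aligned, gap-free, in-frame coding sequences and the standard genetic code, and it returns raw counts only; real Ks and Ka estimates also normalize by the number of synonymous and nonsynonymous sites and correct for multiple hits, which this sketch does not do. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1      # same amino acid: silent change
        else:
            nonsynonymous += 1   # different amino acid
    return synonymous, nonsynonymous


# AAA->AAG keeps lysine (silent); GGT->GCT changes glycine to alanine
print(count_codon_differences("ATGAAAGGT", "ATGAAGGCT"))
```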
The Ka/Ks ratio: a measure of constraint on coding sequences
If we assume that Ks reflects the underlying neutral rate of change:
Ka/Ks = 1 …. Rate of nonsynonymous changes is the same as the rate of silent changes
* taken to mean NO constraint on gene sequence
Ka/Ks < 1 …. Rate of nonsynonymous changes is LESS than the rate of neutral change
* implies deleterious nonsynonymous changes were removed by purifying selection
* therefore implies constraint on gene sequence
Ka/Ks > 1 …. Rate of nonsynonymous changes is GREATER than the rate of silent changes
* implies nonsynonymous changes have been selected for by positive selection
Ks can also be used to date the age of sequences according to the ‘molecular clock’ theory
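The three cases above can be written as a small helper. This is only a sketch: the tolerance band around 1 is a hypothetical choice for illustration (real analyses use statistical tests rather than a fixed cutoff), and the interpretation holds only if Ks really reflects the neutral rate.

```python
def interpret_ka_ks(ka, ks, tolerance=0.1):
    """Return (ratio, interpretation) for a pair of Ka and Ks estimates.

    tolerance: hypothetical half-width of the band treated as 'about 1'.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        label = "purifying selection (constraint)"
    elif ratio > 1 + tolerance:
        label = "positive selection"
    else:
        label = "no constraint (neutral)"
    return ratio, label


for ka, ks in [(0.05, 0.5), (0.5, 0.5), (1.0, 0.5)]:
    ratio, label = interpret_ka_ks(ka, ks)
    print(f"Ka={ka}, Ks={ks}: Ka/Ks={ratio:.2f} -> {label}")
```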
Lynch & Conery 2000
* Identified gene duplicates (BLAST) in 9 taxa
* Dated duplicates based on # of silent substitutions (molecular clock)
* Measured several features over ‘time’ (# silent substitutions) to show:
Duplicates experience a brief window of relaxed constraint before reintroduction of purifying selection
Average half-life of gene duplicates is ~4 million years
In yeast and Drosophila: rate of gene duplication is 0.002 - 0.02 per gene per million years, depending on species (e.g. 31 new duplicates per genome per million years in a genome of 13,000 genes corresponds to ~0.0024 per gene per million years, near the low end of the range)
… may be inflated if gene conversion makes ancient duplicates appear ‘young’
** The estimated rate of gene duplication is on the same order as the rate of new mutations per nucleotide site!
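The arithmetic behind these figures can be checked with a short sketch. It assumes the per-gene rate applies uniformly across the genome and, for the half-life, that duplicate loss follows simple exponential decay; both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
def expected_new_duplicates(n_genes, rate_per_gene_per_myr):
    """Expected new duplicates per genome per million years."""
    return n_genes * rate_per_gene_per_myr


def surviving_fraction(age_myr, half_life_myr=4.0):
    """Fraction of duplicates still present after age_myr million years,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (age_myr / half_life_myr)


# Range of rates quoted above, applied to a 13,000-gene genome
for rate in (0.002, 0.02):
    n = expected_new_duplicates(13_000, rate)
    print(f"rate {rate}: {n:.0f} new duplicates per genome per Myr")

# Per-gene rate implied by 31 new duplicates in 13,000 genes
print(f"implied rate: {31 / 13_000:.4f} per gene per Myr")

# Expected survival with a ~4 Myr half-life
for age in (4, 8, 20):
    print(f"after {age} Myr: {surviving_fraction(age):.3f} of duplicates remain")
```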
Fate of gene duplicates
1. Lost as a pseudogene
2. Neofunctionalization
3. Subfunctionalization
4. Retained & conserved
   * Can be maintained due to advantage of increased dosage
   * Can promote regulatory innovation
De novo creation of new genes
Retrotransposition (+/- cooption of other sequences)
Often see short flanking repeats due to mechanism of TE integration
[Figure: pre-mRNA -> splicing to remove intron -> reverse transcription by TE polymerases (in CYTOSOL) -> integration into the genome (in NUCLEUS)]
De novo creation of new genes
* Retrotransposition (+/- cooption of other sequences)
* Gene duplication into other sequences = chimeric structure/regulation
* Cooption of non-coding DNA (from introns, intergenic sequence)
* Horizontal gene transfer (very prevalent in bacteria: Lecture 5)
  - also observed from bacterial parasites to insect hosts
Challenge in distinguishing Novel Gene vs. missed orthology due to rapid evolution