Modular proteins I
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Patthy Sections – 8.1.3
Protein domains
- The folded structures of larger proteins generally consist of multiple structural domains:
  - Compact, stable units, each with a unique three-dimensional structure
  - Interactions within a domain are more significant than those between domains
  - Domains fold independently, i.e. structural domains are also folding domains
  - If a domain performs a distinct function that remains intact in the isolated domain, it is also a functional domain
- Many multidomain proteins are homomultimeric, i.e. they contain multiple copies of a single type of structural domain:
  - These have arisen through internal duplication of complete domains
  - The fate of the duplicated domains is determined by rules similar to those governing paralogous genes
Protein domains
- Many multidomain proteins are heteromeric:
  - An example is tissue plasminogen activator, in which a trypsin-like serine protease domain is joined to kringle, finger and EGF domains
  - Such proteins may arise by fusion of two or more genes (chimeric proteins)
  - They are also known as modular proteins, and their domains as modules
- Certain modules occur in a wide variety of hetero- and homomultimeric proteins:
  - This suggests that mechanisms exist to facilitate their duplication and dispersal
  - These “building blocks” of different types of multidomain proteins are known as mobile protein modules
  - The frequency with which a module is transferred and incorporated into new proteins reflects the fixation probability of such events
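If a protein is written as an ordered list of module labels, the difference between homomultimeric and heteromeric architectures becomes a simple check. The sketch below assumes that representation. The function names and the repeat-protein example are hypothetical. The tissue plasminogen activator order used (finger, EGF, two kringles, protease, N- to C-terminus) is the standard arrangement of that protein.

```python
from collections import Counter


def classify_architecture(modules: list[str]) -> str:
    """Classify a protein from its ordered list of module labels (hypothetical scheme)."""
    if len(modules) < 2:
        return "single-domain"
    if len(set(modules)) == 1:
        # every module is of the same type: consistent with internal duplication
        return "homomultimeric"
    # at least two different module types: consistent with gene fusion
    return "heteromeric"


def duplicated_modules(modules: list[str]) -> list[str]:
    """Return module types present in more than one copy, sorted alphabetically."""
    counts = Counter(modules)
    return sorted(m for m, n in counts.items() if n > 1)


# Tissue plasminogen activator, N- to C-terminus
tpa = ["finger", "EGF", "kringle", "kringle", "protease"]
print(classify_architecture(tpa))   # heteromeric
print(duplicated_modules(tpa))      # ['kringle']

# A hypothetical protein built from repeats of a single module type
print(classify_architecture(["Ig", "Ig", "Ig"]))  # homomultimeric
```

A heteromeric protein can still carry internal duplications, as the two kringles of tPA show. The two labels describe how diverse the modules are, not whether any duplication has occurred.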
Modular assembly by intronic recombination
- The discovery of introns provided potential new mechanisms for protein evolution:
  - Gilbert suggested that recombination within introns could assort exons independently
  - The idea that novel genes could be rapidly constructed from parts of old ones led to the formulation of the exon-shuffling hypothesis
- According to “introns early” theories, all extant genes were constructed from a limited number of exon types
- Under the “introns late” theory, intronic recombination and exon shuffling could not have played a major role in the assembly of the earliest genes
- The original theory was that exons corresponded directly to modules and/or structural motifs
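Gilbert's point can be shown with a toy model. Two aligned alleles, written as lists of exon and intron segments, undergo a single crossover. A crossover inside an intron gives a recombinant in which each exon is an intact copy from one parent. A crossover inside an exon gives a chimeric exon. The sequences and function names are hypothetical, and real alleles differ at far fewer positions than these toy strings.

```python
Segment = tuple[str, str]  # (kind, sequence); kind is "exon" or "intron"


def gene_sequence(allele: list[Segment]) -> str:
    return "".join(seq for _, seq in allele)


def segment_kind_at(allele: list[Segment], pos: int) -> str:
    """Return 'exon' or 'intron' for the segment containing nucleotide `pos`."""
    start = 0
    for kind, seq in allele:
        if start <= pos < start + len(seq):
            return kind
        start += len(seq)
    raise ValueError("position outside gene")


def crossover(allele_a: list[Segment], allele_b: list[Segment], pos: int) -> str:
    """Single crossover at `pos`: allele A up to pos, then allele B (alleles assumed aligned)."""
    return gene_sequence(allele_a)[:pos] + gene_sequence(allele_b)[pos:]


def exons_of(seq: str, template: list[Segment]) -> list[str]:
    """Cut `seq` using the segment lengths of `template` and return the exon pieces."""
    exons, start = [], 0
    for kind, s in template:
        if kind == "exon":
            exons.append(seq[start:start + len(s)])
        start += len(s)
    return exons


# Two hypothetical alleles with the same exon-intron layout
a = [("exon", "AAAAAA"), ("intron", "gtaaag"), ("exon", "CCCCCC")]
b = [("exon", "TTTTTT"), ("intron", "gtccag"), ("exon", "GGGGGG")]

in_intron = crossover(a, b, 8)
print(segment_kind_at(a, 8), exons_of(in_intron, a))  # intron ['AAAAAA', 'GGGGGG']
in_exon = crossover(a, b, 3)
print(segment_kind_at(a, 3), exons_of(in_exon, a))    # exon ['AAATTT', 'GGGGGG']
```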
Problems with the “introns early” hypothesis
- In many genes, no obvious correspondence was observed between protein structure and intron location
- Introns are now known to be inserted into genes as well, i.e. the present structure of a gene may not be its original structure
- Introns suitable for exon shuffling did not originate until a relatively late stage of eukaryotic evolution
- Exon shuffling has only been conclusively demonstrated in “young” proteins unique to higher eukaryotes
- Only a special group of exons, the “symmetrical” modules, are really valuable for exon shuffling
- The distribution of intron phases is therefore also a crucial factor. An intron's phase (0, 1 or 2) is the number of nucleotides of the interrupted codon that lie upstream of it. A symmetrical exon is flanked by introns of the same phase, so it can be inserted, duplicated or deleted without shifting the downstream reading frame.
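Intron phases and exon symmetry follow from simple arithmetic on coding-exon lengths. The sketch assumes lengths are counted in coding nucleotides only, starting at the first base of the start codon. The function names and the example gene are hypothetical. The sketch also shows why symmetry is equivalent to an exon length divisible by three: the downstream phase equals the upstream phase plus the exon length, modulo 3.

```python
def intron_phases(exon_lengths: list[int]) -> list[int]:
    """Phase of each intron between consecutive coding exons.

    Phase 0: the intron lies between codons; 1: after the first codon position;
    2: after the second codon position.
    """
    phases, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        phases.append(total % 3)
    return phases


def exon_classes(exon_lengths: list[int]) -> list[tuple[int, int]]:
    """(upstream phase, downstream phase) for every internal exon."""
    p = intron_phases(exon_lengths)
    return [(p[i - 1], p[i]) for i in range(1, len(exon_lengths) - 1)]


def is_symmetrical(exon_class: tuple[int, int]) -> bool:
    """Symmetrical exons are flanked by introns of the same phase."""
    return exon_class[0] == exon_class[1]


# Hypothetical gene with four coding exons
lengths = [100, 90, 91, 50]
print(intron_phases(lengths))   # [1, 1, 2]
for cls, length in zip(exon_classes(lengths), lengths[1:-1]):
    print(cls, is_symmetrical(cls), length % 3 == 0)
# (1, 1) True True
# (1, 2) False False
```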
Self-splicing introns
- Group I introns:
  - The reaction requires only a guanine nucleotide cofactor:
    - Its free 3’-OH group attacks the 5’ splice site
    - This generates a 3’-OH at the end of the upstream exon
    - A second transesterification, in which this 3’-OH attacks the 3’ splice site, joins the two exons
  - Splicing depends crucially on the folded structure of the intron itself
- Group II introns:
  - Do not require an external cofactor:
    - The 2’-OH of an adenosine within the intron (the branch point) attacks the 5’ splice site
    - The resulting 2’-5’ phosphodiester bond at the branch site forms the lariat structure
  - Although folding is still crucial, the chemistry, the sequence of events and lariat formation are similar to those of nuclear spliceosomal introns
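Tracing the two transesterification steps on toy sequences makes the difference in products clear. Both pathways give the same ligated exons. The group I intron is released as a linear molecule carrying the added guanosine at its 5’ end. The group II intron is released as a lariat. This sketch tracks sequences, not chemistry. The sequences, the branch position and the `SplicingProducts` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SplicingProducts:
    mature: str       # ligated exons
    intron: str       # released intron, 5'->3' (including any added G)
    lariat_loop: str  # nucleotides closed into the loop; "" for a linear intron
    tail: str         # 3' tail downstream of the branch point; "" if linear


def splice_group_i(exon1: str, intron: str, exon2: str) -> SplicingProducts:
    # Step 1: the 3'-OH of an external guanosine attacks the 5' splice site;
    # the G is joined to the intron's 5' end and exon 1 gains a free 3'-OH.
    released = "G" + intron
    # Step 2: the exon 1 3'-OH attacks the 3' splice site, joining the exons
    # and releasing the intron as a linear molecule.
    return SplicingProducts(exon1 + exon2, released, "", "")


def splice_group_ii(exon1: str, intron: str, exon2: str, branch: int) -> SplicingProducts:
    if intron[branch] != "A":
        raise ValueError("branch point must be an adenosine")
    # Step 1: the 2'-OH of the branch-point A attacks the 5' splice site; the
    # 2'-5' bond closes a loop from the intron's first nucleotide to the branch A.
    loop, tail = intron[: branch + 1], intron[branch + 1 :]
    # Step 2: as in group I, the exon 1 3'-OH attacks the 3' splice site.
    return SplicingProducts(exon1 + exon2, intron, loop, tail)


g1 = splice_group_i("CUCU", "AAAUGGG", "UAAG")
g2 = splice_group_ii("CUCU", "GUGCGUACUAACAG", "UAAG", branch=10)
print(g1.mature, g1.intron)                # CUCUUAAG GAAAUGGG
print(g2.mature, g2.lariat_loop, g2.tail)  # CUCUUAAG GUGCGUACUAA CAG
```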
Spliceosomal intron splicing mechanism
Spliceosomal introns
- Spliceosomal introns are spliced only in the presence of a complex of specific proteins and RNAs known as the spliceosome
- Most of the intron sequence is unimportant: as long as the 5’ and 3’ splice sites and the branch site are conserved, splicing can take place:
  - Large insertions into, or deletions from, spliceosomal introns do not affect splicing efficiency
  - Chimeric introns, containing the 5’ end of one intron and the 3’ end of another, are also spliced properly
  - Mutations (directed or otherwise) in the splice sites or the branch site lead to aberrant splicing
- The spliceosomal intron therefore plays only a minor role in its own splicing: the spliceosome complex is more important
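This tolerance can be mimicked with a toy recognition rule that looks only at the conserved elements. The rule requires a GT at the 5’ end, an AG at the 3’ end, and a branch-site motif near the 3’ end. The motif is the Saccharomyces cerevisiae branch-site consensus, written in the DNA alphabet (TACTAAC). The 60-nucleotide window, the function name and the intron sequences are hypothetical simplifications. Real splice-site recognition involves far more degenerate signals.

```python
BRANCH_MOTIF = "TACTAAC"   # yeast branch-site consensus, used as a simplification
MAX_BRANCH_DISTANCE = 60   # hypothetical window (nt) upstream of the 3' AG


def is_spliceable(intron: str) -> bool:
    """Toy check that looks only at the 5' splice site, 3' splice site and branch site."""
    if not (intron.startswith("GT") and intron.endswith("AG")):
        return False
    # only the region just upstream of the 3' AG is searched for the branch motif
    window = intron[-(MAX_BRANCH_DISTANCE + 2):-2]
    return BRANCH_MOTIF in window


# Two hypothetical introns
intron_a = "GTATGT" + "C" * 80 + "TACTAAC" + "TTTTCAG"
intron_b = "GTAAGT" + "G" * 50 + "TACTAAC" + "TTCCAG"

chimeric = intron_a[:50] + intron_b[-30:]                 # 5' half of a, 3' end of b
insertion = intron_a[:20] + "ACGT" * 100 + intron_a[20:]  # 400 nt inserted mid-intron
deletion = intron_a[:6] + intron_a[60:]                   # 54 nt removed
bad_5ss = "GA" + intron_a[2:]                             # 5' splice-site mutation
bad_branch = intron_a.replace("TACTAAC", "TACTCAC")       # branch-site mutation

for name, seq in [("chimeric", chimeric), ("insertion", insertion), ("deletion", deletion),
                  ("bad_5ss", bad_5ss), ("bad_branch", bad_branch)]:
    print(name, is_spliceable(seq))
```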
Evolution of spliceosomal introns
- Both group I and group II self-splicing mechanisms resemble spliceosome-catalysed splicing:
  - The initial step is an attack by a ribose hydroxyl group on the 5’ splice site
  - In each case, the reactions are transesterifications in which the phosphate moieties are retained in the products
  - In group II and spliceosomal introns, the intron is released as a lariat
- It is widely accepted that spliceosome-catalysed splicing evolved from group II self-splicing introns:
  - The key step was the transfer of the catalytic role from the intron to other molecules
  - Formation of the spliceosome gave spliceosomal introns structural freedom, as they no longer had to fulfil the catalytic function
- Spliceosomal introns are generally found only in the nuclear genomes of higher eukaryotes (plants, animals and fungi)
Insertion and spread of spliceosomal introns
Intron loss
- Plays a significant role in changing the exon-intron structure of genes
- Introns may be eliminated through the mechanism that gives rise to processed genes (retroposition)
- Reverse transcription can also lead to the loss of only some introns:
  - A cDNA reverse-transcribed from perfectly spliced mRNA may recombine with part of the functional gene, so that the original gene itself loses the introns in that region
  - Reverse transcription of partially processed pre-mRNA could give rise to a semi-processed gene, i.e. a new paralogue that retains some introns
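These outcomes differ in which introns are lost and in whether the change is carried by the original locus or by a new copy. The sketch represents a gene as an ordered list of named exons and introns. The function names and the four-exon gene are hypothetical. Recombination with a cDNA is simplified to removing every intron between the first and last exon that the cDNA covers.

```python
Gene = list[tuple[str, str]]  # ordered (kind, name) pairs; kind is "exon" or "intron"


def remove_introns(gene: Gene, lost: set[int]) -> Gene:
    """Copy of `gene` without the introns whose indices (0-based, introns only) are in `lost`."""
    result, k = [], 0
    for kind, name in gene:
        if kind == "intron":
            if k not in lost:
                result.append((kind, name))
            k += 1
        else:
            result.append((kind, name))
    return result


def processed_gene(gene: Gene) -> Gene:
    # Retroposition of fully spliced mRNA: a new, intronless copy
    n_introns = sum(1 for kind, _ in gene if kind == "intron")
    return remove_introns(gene, set(range(n_introns)))


def semi_processed_gene(gene: Gene, spliced: set[int]) -> Gene:
    # Retroposition of partially spliced pre-mRNA: a new paralogue keeping some introns
    return remove_introns(gene, spliced)


def convert_with_cdna(gene: Gene, first_exon: int, last_exon: int) -> Gene:
    # Recombination of a cDNA spanning exons first_exon..last_exon (0-based) with the
    # original locus: the introns between those exons are lost from the gene itself
    return remove_introns(gene, set(range(first_exon, last_exon)))


def names(gene: Gene) -> list[str]:
    return [name for _, name in gene]


gene = [("exon", "E1"), ("intron", "I1"), ("exon", "E2"), ("intron", "I2"),
        ("exon", "E3"), ("intron", "I3"), ("exon", "E4")]
print(names(processed_gene(gene)))               # ['E1', 'E2', 'E3', 'E4']
print(names(semi_processed_gene(gene, {0, 2})))  # ['E1', 'E2', 'I2', 'E3', 'E4']
print(names(convert_with_cdna(gene, 2, 3)))      # ['E1', 'I1', 'E2', 'I2', 'E3', 'E4']
```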
Gene duplication / deletion due to intronic recombination
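One way intronic recombination can produce duplications and deletions is unequal crossing-over between two misaligned copies of the same gene. In this case the crossover falls in a different intron on each copy. The model, the function name and the gene below are hypothetical simplifications. Because both breakpoints lie in introns, every exon in the two products stays intact. The reading frame is preserved only if the duplicated or deleted exons together span a multiple of three coding nucleotides.

```python
Gene = list[tuple[str, str]]  # ordered (kind, name) pairs; kind is "exon" or "intron"


def unequal_crossover(gene: Gene, a: int, b: int) -> tuple[Gene, Gene]:
    """Crossover between segment `a` of one copy and segment `b` of a misaligned copy.

    Both indices must point at introns. With a > b, the first product carries a
    duplication of the segments between b and a, and the second product lacks them.
    The junction intron would really be a hybrid of the two introns; here it keeps one name.
    """
    if gene[a][0] != "intron" or gene[b][0] != "intron":
        raise ValueError("crossover points must lie in introns")
    longer = gene[: a + 1] + gene[b + 1 :]
    shorter = gene[: b + 1] + gene[a + 1 :]
    return longer, shorter


gene = [("exon", "E1"), ("intron", "I1"), ("exon", "E2"), ("intron", "I2"),
        ("exon", "E3"), ("intron", "I3"), ("exon", "E4")]
dup_product, del_product = unequal_crossover(gene, a=3, b=1)
print([n for _, n in dup_product])  # ['E1', 'I1', 'E2', 'I2', 'E2', 'I2', 'E3', 'I3', 'E4']
print([n for _, n in del_product])  # ['E1', 'I1', 'E3', 'I3', 'E4']
```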
Exon shuffling via recombination in introns
- It is believed that exons may be inserted by the same mechanism as introns:
  - Exon shuffling may be a consequence of the occasional inclusion of exon sequences in the insertion cycle of introns
  - Alternative splicing (exon skipping) may excise an exon together with its flanking introns as a single intron-exon-intron composite
  - If such a composite is inserted into the genome by the same mechanism that inserts single introns (reverse splicing), the result is exon shuffling
- Key difference between the intronic recombination model and the retrotransposition model:
  - In the first, insertion occurs into a pre-existing intron of the same phase as the introns flanking the exon
  - The retrotransposition model does not have this requirement
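The phase requirement of the intronic recombination model can be written as a small check. The sketch uses the usual phase convention, in which 0, 1 or 2 nucleotides of the interrupted codon lie upstream of the intron. It describes an exon by the phases of its two flanking introns. The function names are hypothetical. Inserting an exon of L coding nucleotides into a phase-p intron splits it into two introns, of phase p and phase (p + L) mod 3. The downstream reading frame is therefore kept only when L is a multiple of three.

```python
def phases_after_insertion(host_phase: int, exon_length: int) -> tuple[int, int]:
    """Phases of the two introns created by inserting an exon of `exon_length`
    coding nucleotides into an intron of phase `host_phase`."""
    return host_phase, (host_phase + exon_length) % 3


def frame_preserved(host_phase: int, exon_length: int) -> bool:
    # the downstream host exon is read in its original frame only if the
    # intron after the inserted exon has the same phase as the host intron
    upstream, downstream = phases_after_insertion(host_phase, exon_length)
    return upstream == downstream


def fits_intronic_recombination(exon_class: tuple[int, int], host_phase: int) -> bool:
    # intronic recombination model: both introns flanking the exon must share
    # the phase of the recipient intron (so the exon must be symmetrical)
    upstream, downstream = exon_class
    return upstream == downstream == host_phase


print(phases_after_insertion(1, 90), frame_preserved(1, 90))  # (1, 1) True
print(phases_after_insertion(1, 91), frame_preserved(1, 91))  # (1, 2) False
print(fits_intronic_recombination((1, 1), 1))  # True
print(fits_intronic_recombination((1, 1), 0))  # False: phase mismatch
print(fits_intronic_recombination((0, 2), 0))  # False: asymmetrical exon
```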
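Evolution of urokinase
[Figure: stepwise module acquisition, PS → PSG → PSGK, with insertion of G and K modules]
Evolution of tissue plasminogen activator
[Figure: stepwise module acquisition, PSGK → PSGKF → PSGKFK, with insertion of an F module and K module duplication]
Evolution of Factor XII
[Figure: stepwise module acquisition from PSGK via PSGKG and PSGKGF, with insertion of F and FN2 modules and duplication of the G module]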