Richard Deem, Paradoxes Class, March 16, 2014
[Figure: the flow of genetic information in the cell. In the nucleus, DNA is transcribed into mRNA, and the mRNA is translated into protein. Other labels: ER, chloroplast, mitochondrion]
Purines: Guanine (G), Adenine (A)
Pyrimidines: Thymine (T), Cytosine (C)
[Figure: chemical structures of the four DNA bases]
[Figure: structure of a nucleotide. Labels: adenine (base), glycosidic bond, sugar (deoxyribose) with its 5’ and 3’ positions, phosphate group. The base and sugar together form the nucleoside]
[Figure: two DNA strands running in opposite directions (5’ to 3’ and 3’ to 5’), each carrying adenine, cytosine, guanine, and thymine, with hydrogen bonds joining the paired bases]
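The pairing between the two strands can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard pairing rule (A with T, G with C); the function names and the example sequence are made up for the sketch and do not come from the slides.

```python
# Standard base-pairing rule: A pairs with T, G pairs with C.
PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the paired bases position by position (same left-to-right order)."""
    return strand.upper().translate(PAIRS)

def reverse_complement(strand):
    """Return the partner strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

example = "ATGC"  # hypothetical sequence, written 5' to 3'
print(complement(example))          # partner strand as it sits opposite, 3' to 5'
print(reverse_complement(example))  # the same partner strand read 5' to 3'
```

Because the strands run in opposite directions, the partner strand has to be reversed before it can be read in the usual 5’ to 3’ direction.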
[Figure: DNA base sequence]
[Figure: DNA packaging. Labels: chromosome, nucleosome, DNA, histone H1, 4 histone protein pairs]
[Figure: electron micrograph and karyotype of chromosomes. Labels: telomere, centromere]
[Figure: nucleus showing heterochromatin (condensed DNA) and euchromatin (actively transcribed DNA)]
DNA vs. RNA
Sugar: deoxyribose in DNA, ribose in RNA
Bases: adenine, cytosine, and guanine in both; thymine in DNA, uracil in RNA
[Figure: chemical structures of thymine and uracil]
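The base substitution between DNA and RNA can be written as a one-line conversion. The sketch below assumes the input is the DNA coding strand (the strand with the same sequence as the mRNA); the function names and the example sequence are illustrative only.

```python
def transcribe(coding_strand):
    """Write the mRNA matching a DNA coding strand: same letters, T becomes U."""
    return coding_strand.upper().replace("T", "U")

def reverse_transcribe(mrna):
    """Convert an mRNA sequence back to DNA letters: U becomes T."""
    return mrna.upper().replace("U", "T")

dna = "ATGGCCATT"  # hypothetical coding-strand sequence
mrna = transcribe(dna)
print(mrna)
print(reverse_transcribe(mrna) == dna)
```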
[Figure: translation. A transfer RNA (tRNA) carrying methionine pairs its anti-codon with a codon on the messenger RNA (mRNA)]
[Figure: ribosomes on mRNA producing protein chains]
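Codon AA    Codon AA    Codon AA    Codon AA
UUU   Phe   UCU   Ser   UAU   Tyr   UGU   Cys
UUC   Phe   UCC   Ser   UAC   Tyr   UGC   Cys
UUA   Leu   UCA   Ser   UAA   Stop  UGA   Stop
UUG   Leu   UCG   Ser   UAG   Stop  UGG   Trp
CUU   Leu   CCU   Pro   CAU   His   CGU   Arg
CUC   Leu   CCC   Pro   CAC   His   CGC   Arg
CUA   Leu   CCA   Pro   CAA   Gln   CGA   Arg
CUG   Leu   CCG   Pro   CAG   Gln   CGG   Arg
AUU   Ile   ACU   Thr   AAU   Asn   AGU   Ser
AUC   Ile   ACC   Thr   AAC   Asn   AGC   Ser
AUA   Ile   ACA   Thr   AAA   Lys   AGA   Arg
AUG   Met   ACG   Thr   AAG   Lys   AGG   Arg
GUU   Val   GCU   Ala   GAU   Asp   GGU   Gly
GUC   Val   GCC   Ala   GAC   Asp   GGC   Gly
GUA   Val   GCA   Ala   GAA   Glu   GGA   Gly
GUG   Val   GCG   Ala   GAG   Glu   GGG   Gly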
– Four “letters” (bases A, U, G, C)
– 64 three-letter “words” (codons)
– “Redundant”: many “words” have the identical “meaning”
– 20 unique “words” (amino acids)
– Unlimited “sentences” (proteins)
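The counts above can be checked directly. The sketch below builds the standard genetic code as a Python dictionary (amino acids in one-letter codes, `*` for Stop) and counts codons, amino acids, and how many codons share each meaning. The variable and function names are illustrative, and the example mRNA strings are made up.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first Stop codon."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein += aa
    return protein

codons_per_meaning = Counter(CODON_TABLE.values())
print(len(CODON_TABLE))                    # number of three-letter "words"
print(len(set(AMINO) - {"*"}))             # number of distinct amino acids
print(codons_per_meaning.most_common(3))   # the most redundant "meanings"

# Two different hypothetical mRNAs that spell the same protein (Leu-Leu).
print(translate("CUUCUC") == translate("CUACUG"))
```

Redundancy shows up as several dictionary keys mapping to the same value: leucine, serine, and arginine each have six codons, while methionine and tryptophan have one.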
Multiple proteins from one gene
[Figure: gene structure. Labels: DNA (5’ to 3’), exons, introns (between exons), transcribed region, pre-mRNA, mRNA, UTR, translated region, protein]
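[Figure: alternative splicing. Pre-mRNA: Exon1, Int1, Exon2, Int2, Exon3, Int3, Exon4, Int4, Exon5. Splicing removes the introns and can join different sets of exons (for example Exon1, Exon2, Exon4, Exon5), so one pre-mRNA yields mRNAs for protein isoform A and protein isoform B]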
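A toy model shows how exon choice changes the protein. In the sketch below the exon sequences are hypothetical, each chosen as a whole number of codons so that skipping one does not shift the reading frame; the names `EXONS`, `splice`, and `translate` are illustrative. Amino acids are printed in one-letter codes.

```python
from itertools import product

# Standard genetic code (one-letter amino acids, "*" for Stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical exon sequences, not taken from any real gene.
EXONS = {
    "Exon1": "AUGGCU",
    "Exon2": "GAAUUC",
    "Exon3": "CCGGGU",
    "Exon4": "AAACAU",
    "Exon5": "UGGUAA",
}

def splice(exon_names):
    """Join the chosen exons in order; introns are simply never included."""
    return "".join(EXONS[name] for name in exon_names)

def translate(mrna):
    """Translate codon by codon, stopping at the first Stop codon."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein += aa
    return protein

all_exons = splice(["Exon1", "Exon2", "Exon3", "Exon4", "Exon5"])
skip_exon3 = splice(["Exon1", "Exon2", "Exon4", "Exon5"])
print(translate(all_exons))
print(translate(skip_exon3))
```

With these made-up exons, the full transcript gives MAEFPGKHW and the transcript that skips Exon3 gives MAEFKHW: the same gene, two proteins.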
Overlapping regulatory and protein codes
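[Figure: promoter region of a gene on the DNA, ahead of the exons, with transcription factor binding sites marked: NFAT, AP-1, NF-κB, Y1, Y2. Position marker: -800]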
– Used the enzyme DNase I
– Digested DNA from 81 different cell lines
– Sequenced and mapped the location of all transcription factor (TF) binding sites
[Figure: DNase I cleavage per nucleotide (PLBD2 gene), with binding sites for NRSF, USF, and SP1 marked]
– 86% of genes contain at least one duon sequence
– Duons comprise 14% of all exonic coding sequence
– Over 12 million base pairs
Andrew B. Stergachis et al. Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution. Science 342, 1367.
CELSR2 Gene: Chr1:
Protein sequence:      Leu Gln Ala Ile Thr Arg Gly Arg Ser Thr
DNA sequence:          CTGCAGGCCATCACCAGGGGGCGCAGCACC
CTCF binding sequence:           CCACCAGGGGGCGCA
Andrew B. Stergachis et al. Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution. Science 342, 1367.
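The overlap on this slide can be reproduced from the two sequences alone. The sketch below translates the DNA sequence with the standard genetic code and slides the CTCF binding sequence along it to find the best ungapped alignment. The helper names are illustrative, and the simple mismatch count is an assumption made for this sketch, not the method used in the cited paper.

```python
from itertools import product

# Standard genetic code on DNA letters, then one-letter to three-letter names.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Stop".split(),
))

DNA = "CTGCAGGCCATCACCAGGGGGCGCAGCACC"   # DNA sequence from the slide
CTCF = "CCACCAGGGGGCGCA"                 # CTCF binding sequence from the slide

def translate(dna):
    """Return the amino acids (three-letter names) encoded by each full codon."""
    return [THREE[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna) - 2, 3)]

def best_match(seq, motif):
    """Return (offset, mismatches) for the best ungapped placement of motif on seq."""
    best = None
    for offset in range(len(seq) - len(motif) + 1):
        window = seq[offset:offset + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best

protein = translate(DNA)
offset, mismatches = best_match(DNA, CTCF)
first_codon = offset // 3
last_codon = (offset + len(CTCF) - 1) // 3
print(protein)
print(offset, mismatches)
print(protein[first_codon:last_codon + 1])  # codons that also lie in the binding site
```

The codons printed last are the ones doing double duty: each specifies an amino acid and also forms part of the transcription factor binding site.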
Multiple proteins from alternative reading frames
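Top strand (5’ to 3’):    CTGCAGGCCATCACCAGGGGGCGCAGCACC
Bottom strand (3’ to 5’): GACGTCCGGTAGTGGTCCCCCGCGTCGTGG

Top strand reading frames:
Frame 1: Leu Gln Ala Ile Thr Arg Gly Arg Ser Thr
Frame 2: Cys Arg Pro Ser Pro Gly Gly Ala Ala
Frame 3: Ala Gly His His Gln Gly Ala Gln His

Bottom strand reading frames (read 5’ to 3’, that is, right to left as printed):
Frame 1: Gly Ala Ala Pro Pro Gly Asp Gly Leu Gln
Frame 2: Val Leu Arg Pro Leu Val Met Ala Cys
Frame 3: Cys Cys Ala Pro Trp Stop Trp Pro Ala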
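A six-frame translation like the one on this slide can be generated mechanically. The sketch below takes the top-strand sequence, derives the bottom strand by reverse complement, and translates each strand from its three possible starting positions using the standard genetic code. Function names and the frame labels (`+1` to `-3`) are illustrative conventions chosen for this sketch.

```python
from itertools import product

# Standard genetic code on DNA letters, then one-letter to three-letter names.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE = dict(zip(
    "ACDEFGHIKLMNPQRSTVWY*",
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Stop".split(),
))

def translate(dna):
    """Translate every full codon; Stop codons are reported, not used to halt."""
    return [THREE[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna) - 2, 3)]

def reverse_complement(dna):
    """Return the opposite strand written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(dna):
    """Return a dict mapping frame label to its list of amino acids."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for shift in range(3):
            frames[f"{sign}{shift + 1}"] = translate(strand[shift:])
    return frames

TOP = "CTGCAGGCCATCACCAGGGGGCGCAGCACC"  # top strand from the slide
frames = six_frames(TOP)
for label, amino_acids in frames.items():
    print(label, " ".join(amino_acids))
```

Shifting the start by one or two bases changes every codon downstream, which is why the same stretch of DNA spells six unrelated amino acid sequences.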
Coding of multiple proteins by overlapping reading frames is not a feature one would associate with eukaryotic genes. Indeed, codependency between codons of overlapping protein-coding regions imposes a unique set of evolutionary constraints, making it a costly arrangement. Yet in cases of tightly coexpressed interacting proteins, dual coding may be advantageous. Here we show that although dual coding is nearly impossible by chance, a number of human transcripts contain overlapping coding regions.
Wen-Yu Chung, et al. A First Look at ARFome: Dual-Coding Genes in Mammalian Genomes. PLoS Computational Biology 3 (5) e91.
– Evolutionary assumptions underestimate the true number of dual coding genes
– 9% of human and 7% of mouse genes
– Less than 30% shared between mouse and human
– 90% of genes on opposite strands
– 1259 human alternative proteins detected by mass spectrometry
Chaitanya R Sanna, et al. Overlapping genes in the human and mouse genomes. BMC Genomics 2008, 9:169.
Benoît Vanderperre, et al. Direct Detection of Alternative Open Reading Frames Translation… PLoS ONE 8(8): e70698.
A man, a plan, a canal: Panama
Live not on evil
Was it a car or a cat I saw?
[Figure: a dual coding region in a human alternatively spliced gene, translated in two reading frames]
DNA sequence: GACAGAAGAAATTCTGGCAGATGTGCTCAAGGTGGAAGTCTTCAGACAGACAGTGGCGAC CCAGGTGCTAGTAGGAAGCTACTGTGTCTTCAGCAATCA
Frame aa: LeuGln ThrAspGlySerArgSer GlyAlaLeuGlnGlySerGlnGlyAspAsnProGlyArg AlaCysArgLeuSerLysLeuCysArg
Frame aa: PheAsnGln ThrVal AspLeuValAlaLeuTyrSerVal ArgAlaThrIleGlnGluGlyAspLeuValGluPheGlnSerCysValGlu
Han Liang and Laura F. Landweber. A genome-wide study of dual coding regions in human alternatively spliced genes. Genome Research 16:190–196.
[Figure: alternative transcripts of the condensin II gene Ncaph2. Labels: Alternative Transcripts, Alternative Reading Frame, Exon 1, Exon 2 Long, Exon 2 Intermediate, Exon 2 Short]
Amino acid sequences shown:
ProGluHisArgAspTrpGlnArgGluLeuThrGluAlaGlyLeuValIleValMetValAsnLeuAspLysAlaGlnGluLeuGluValAlaPheAsp
GlyCysLeuArgGlySerLeuAlaThrGlyArg SerTrpIleLeuMetCysThrProThrArgSer Trp HisThrArg
GluMetValGluAsp
LeuIleAspGln (shown three times)
Angelo Theodoratos, et al. Splice variants of the condensin II gene Ncaph2 include alternative reading frame... FEBS Journal 279 (2012) 1422–1432.
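[Figure: gel showing the splice variants across tissues. Lanes: Ladder, Thymus, Muscle, Brain, Bone Marrow, Kidney, Testis, Heart, Spleen, Liver, Lung. Ladder marks: 50 bp, 200 bp. Bands: Long 232 bp, Int 215 bp, Short 140 bp]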
At least three independent examples of design in DNA
– Alternative splicing producing multiple proteins from one gene
– Duons: overlapping sequences of coding and transcription factor binding
– Dual coding genes