Chapter 8 The Molecular Genetics of Gene Expression
Gene Expression Steps
Gene expression is the process by which the information contained in genes is decoded to produce other molecules that determine the phenotypic traits of organisms. The principal steps in gene expression are:
Transcription: RNA molecules are synthesized by an enzyme, RNA polymerase, which uses a segment of one strand of DNA as a template to produce a strand of RNA whose base sequence is complementary to that of the template DNA.
Gene Expression Steps
Processing: in the nucleus of eukaryotic cells, the RNA usually undergoes chemical modification.
Translation: the processed RNA molecule specifies the order in which amino acids are joined together to form a polypeptide chain. In this manner, the amino acid sequence of a polypeptide is a direct consequence of the base sequence in the DNA. The protein made is called the gene product.
Polypeptides
Polypeptide chains are linear polymers of amino acids. Twenty standard amino acids are the fundamental building blocks of proteins. Peptide bonds link the carboxyl group of one amino acid to the amino group of the next amino acid. The sequence of amino acids in a protein is specified by the coding information in a specific gene.
Figure 08.01: The general structure of an amino acid.
Figure 08.03: Properties of a polypeptide chain.
Protein Domains
Most polypeptides include regions that fold in upon themselves to acquire well-defined structures, called domains. Domains interact with each other and often have specialized functions. Individual domains in a protein usually have independent evolutionary origins; through duplication of their coding regions and genomic rearrangements, they come together in various combinations to create genes with novel functions.
Protein Domains
Domains can be identified through computer analysis of the amino acid sequence. Vertebrate genomes encode relatively few proteins or protein domains that are not found in other organisms. Their complexity arises in part from innovations in bringing together preexisting domains to create novel proteins with more complex domain architectures than those found in other organisms.
Colinearity
The linear order of nucleotides in a gene determines the linear order of amino acids in a polypeptide. This attribute of genes and polypeptides is called colinearity: the sequence of base pairs in DNA determines the sequence of amino acids in the polypeptide in a colinear, or point-to-point, manner. Colinearity holds for nearly all prokaryotic genes. In eukaryotes, noninformational DNA sequences (introns) interrupt the continuity of most genes.
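As a worked illustration of colinearity, assume the coding sequence is numbered from the first base of the start codon, with any introns already removed. Because each codon is three nucleotides long, the position of every amino acid maps directly onto a block of three nucleotides:

$$\text{amino acid } k \;\longleftrightarrow\; \text{nucleotides } 3k-2,\; 3k-1,\; 3k$$

For example, the 100th amino acid is encoded by nucleotides $3(100)-2 = 298$ through $3(100) = 300$.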
Figure 08.05: Differences between the structures.
Transcription
Transcription is the synthesis of an RNA molecule copied from the segment of DNA that constitutes the gene. RNA differs from DNA in that it is usually single stranded, contains the sugar ribose instead of deoxyribose, and contains the pyrimidine uracil in place of thymine.
RNA Synthesis
The nucleotide sequence of the transcribed RNA is complementary to the base sequence of the DNA template strand. During RNA synthesis, a sugar–phosphate (phosphodiester) bond is formed between the 3'-hydroxyl group of one nucleotide and the 5'-triphosphate of the next nucleotide in line. RNA synthesis does not require a primer. The enzyme used in transcription is RNA polymerase.
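The complementarity rule can be sketched in a few lines. This is a simplified model that assumes the template strand is written 3'→5' as input, so that the RNA comes out written 5'→3' (new nucleotides are added to the 3'-OH end, so the RNA grows 5'→3' while the template is read 3'→5'). The function name `transcribe` is hypothetical.

```python
# Base placed in the RNA opposite each DNA template base.
# Uracil (U) appears in RNA where thymine (T) would appear in DNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') copied from a DNA template written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())

# Template 3'-TACGGT-5' gives RNA 5'-AUGCCA-3'
print(transcribe("TACGGT"))  # AUGCCA
```

Note that the RNA has the same sequence as the nontemplate DNA strand (5'-ATGCCA-3'), except that U replaces T.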
Figure 08.06ABC: RNA synthesis.
Figure 08.07: Subunit structure of RNA polymerase from T. aquaticus. Reprinted from Cell, vol. 98, R. Mooney and R. Landick, "RNA Polymerase Unveiled", pg. 4, copyright (1999), with permission from Elsevier [http://www.sciencedirect.com/science/journal/00928674].
RNA Polymerases
RNA polymerases are large, multisubunit complexes whose active form is called the RNA polymerase holoenzyme. Bacterial cells have a single kind of RNA polymerase; its holoenzyme contains six polypeptide chains.
RNA Polymerases
Eukaryotes have several types of RNA polymerase:
RNA polymerase I transcribes ribosomal RNA (other than 5S rRNA).
RNA polymerase II transcribes all protein-coding genes as well as the genes for most small nuclear RNAs.
RNA polymerase III transcribes tRNA genes and the gene for the 5S component of rRNA.
RNA Synthesis
Particular nucleotide sequences define the beginning and end of a gene. The promoter is a nucleotide sequence, 20–200 bp long, that serves as the initial binding site of RNA polymerase and transcription initiation factors. Promoter recognition by RNA polymerase is a prerequisite for transcription initiation.
Figure 08.09: Base sequences in promoter regions of several genes in E. coli.
RNA Synthesis
The consensus promoter regions in E. coli are:
TTGACA (the -35 region), centered approximately 35 base pairs upstream from the transcription start site, which is numbered +1.
TATAAT (the -10 region, also called the TATA box or Pribnow box), centered approximately 10 base pairs upstream.
Many transcription termination sites contain inverted repeat sequences; when transcribed, they allow the RNA to fold into a stem-loop (hairpin) that acts as a stop signal.
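Because real promoters rarely match both consensus sequences exactly, one way to locate a candidate promoter is to score every -35/-10 pair by its total number of mismatches. The sketch below does that under stated assumptions: the function names are hypothetical, positions are 0-based, and the allowed spacer between the two boxes (15–19 bp, with 17 bp typical in E. coli) is an assumption, not a rule from the text.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def best_promoter(seq: str, spacers=range(15, 20)):
    """Return (total_mismatches, pos_minus35, pos_minus10) for the best-scoring
    -35/-10 pair in seq, or None if the sequence is too short.

    spacers: assumed range of spacer lengths (bp) between the two boxes.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - len(MINUS35) + 1):
        m35 = mismatches(seq[i:i + 6], MINUS35)
        for gap in spacers:
            j = i + 6 + gap  # start of the candidate -10 box
            if j + 6 > len(seq):
                break
            score = m35 + mismatches(seq[j:j + 6], MINUS10)
            if best is None or score < best[0]:
                best = (score, i, j)
    return best

example = "GG" + "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT" + "CCGGA"
print(best_promoter(example))  # (0, 2, 25)
```

A lower score means a closer match to the consensus; natural promoters with several mismatches can still be recognized by RNA polymerase.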
Figure 08.13: Transcription termination.
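The inverted-repeat idea behind termination can be sketched as a search for a segment whose reverse complement appears a short distance downstream, so that the two can pair into a stem with a loop between them. This is a toy model: it requires perfect pairing, and the stem length (6) and loop range (3–8 nt) are hypothetical parameters, not values from the text. Real terminator prediction also considers folding energy.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (the strand that would pair with it)."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_hairpin(rna: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (stem_start, loop_length) of the first perfect stem-loop, or None."""
    rna = rna.upper()
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left_arm = rna[i:i + stem]
        target = reverse_complement(left_arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the right arm
            if j + stem > len(rna):
                break
            if rna[j:j + stem] == target:
                return i, loop
    return None

# Stem GCCCGC ... GCGGGC around a UUCG loop, followed by a run of U's
print(find_hairpin("CCGCCCGCUUCGGCGGGCUUUUUUUU"))  # (2, 4)
```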
Eukaryotic Transcription
Eukaryotic transcription is the synthesis of RNA, specified by the DNA template strand, to form a primary transcript. The primary transcript is processed to form mRNA, which is transported to the cytoplasm. The first processing step adds 7-methylguanosine to the 5' end of the primary transcript, forming the cap.
Eukaryotic Transcription
Translation of an mRNA molecule rarely starts exactly at one end and proceeds to the other end: initiation of protein synthesis may begin many nucleotides downstream from the 5' end. The 5' untranslated region (5' UTR) is followed by an open reading frame (ORF), which specifies the polypeptide chain. In many eukaryotic genes, the coding sequence is interrupted by noncoding segments called introns.
Eukaryotic Transcription
The primary transcript contains exons and introns; the introns are subsequently removed by splicing. The 3' end of an mRNA molecule following the ORF also is not translated; it is called the 3' untranslated region (3' UTR). The 3' end is usually modified by the addition of a string of adenosines called the poly-A tail.
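The three processing steps can be summarized as a toy string model. In this sketch, exon positions are supplied by hand as half-open (start, end) coordinates, the text label `m7G-` stands in for the 7-methylguanosine cap, and the tail length of 200 is a hypothetical default (real tail lengths vary). The model ignores the fact that in the cell these steps are coupled to transcription rather than applied one after another.

```python
from typing import List, Tuple

def process_transcript(pre_mrna: str, exons: List[Tuple[int, int]],
                       tail_length: int = 200) -> str:
    """Toy model of processing: splice out introns, then add a cap and a poly-A tail."""
    previous_end = 0
    for start, end in exons:
        if start < previous_end or end <= start:
            raise ValueError("exons must be in 5'->3' order and non-overlapping")
        previous_end = end
    # Splicing: keep the exons and discard the introns between them.
    mature = "".join(pre_mrna[start:end] for start, end in exons)
    # Cap on the 5' end (text label only) and poly-A tail on the 3' end.
    return "m7G-" + mature + "A" * tail_length

# Hypothetical primary transcript: exon 1 (0-5), intron (5-17), exon 2 (17-23)
pre = "AUGGC" + "GUAAGUCCUUAG" + "UGCUAA"
print(process_transcript(pre, [(0, 5), (17, 23)], tail_length=5))
# m7G-AUGGCUGCUAAAAAAA
```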
Figure 08.16: In eukaryotes, transcription and RNA processing are coupled.
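Splicing
RNA splicing occurs in nuclear particles known as spliceosomes. The specificity of splicing comes from five small nuclear RNAs (snRNAs), denoted U1, U2, U4, U5, and U6, which are packaged with proteins into small nuclear ribonucleoprotein particles (snRNPs); several of them contain sequences complementary to sequences at the splice junctions.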
Figure 08.17: Dynamic interactions between some small nuclear RNAs that are involved in splicing. Adapted from H. D. Madhani and C. Guthrie, Annu. Rev. Genet. 1 (1994): 1-26.
Splicing
Human genes tend to be very long even though they encode proteins of modest size. The average human gene occupies 27 kb of genomic DNA, yet only 1.3 kb (about 5%) is used to encode amino acids.
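The percentage quoted for the average human gene follows directly from the two lengths:

$$\frac{1.3\ \text{kb}}{27\ \text{kb}} \approx 0.048 \approx 5\%$$

The remaining $27 - 1.3 = 25.7$ kb of the average gene is noncoding sequence, such as introns and untranslated regions.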
Splicing
The correlation between exons and domains found in some genes suggests that these genes were originally assembled from smaller pieces. The model of protein evolution through the combination of different exons is called the exon shuffle model.
Translation
The translation system consists of five major components:
Messenger RNA provides the coding sequence of bases that determines the amino acid sequence of the resulting polypeptide chain.
Ribosomes are the particles on which protein synthesis takes place.
Transfer RNA is a small adaptor molecule that translates codons into amino acids.
Aminoacyl-tRNA synthetases are a set of enzymes, each of which catalyzes the attachment of an amino acid to its corresponding tRNA molecule.
Initiation, elongation, and termination factors act at the successive stages of translation.
Translation: Initiation
In eukaryotes, initiation takes place by scanning the mRNA for an initiation codon. The initiation factor eIF4F binds to the 5' cap on the mRNA and recruits eIF4A and eIF4B. This creates a binding site for a charged initiator tRNAMet, bound with the initiation factor eIF2, and for a small 40S ribosomal subunit together with eIF3 and eIF5. These components all come together at the 5' cap and form the 48S initiation complex.
Figure 08.19 Initiation of protein synthesis.
Translation: Initiation
The initiation complex moves along the mRNA in the 3' direction, scanning for the first AUG initiation codon. eIF5 then causes the release of the initiation factors and the recruitment of a large 60S ribosomal subunit. This subunit includes three binding sites for tRNA molecules: the E (exit) site, the P (peptidyl) site, and the A (aminoacyl) site. At the beginning, the tRNAMet is located in the P site, and the A site is the next to be occupied. The tRNA is held in place by hydrogen bonding between the AUG codon in the mRNA and the three-base anticodon in the tRNA.
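The scanning step can be approximated by moving from the 5' end to the first AUG and then reading codons in that frame until a stop codon is reached. This is a simplified sketch: it assumes initiation always occurs at the first AUG and ignores the sequence context around an AUG, which in real cells can cause ribosomes to skip it. The function name and return convention (0-based, half-open coordinates including the stop codon) are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def scan_for_orf(mrna: str):
    """Return (start, end) of the ORF that begins at the first AUG from the 5' end,
    with end just past the stop codon, or None if no complete ORF is found."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # scanning: first AUG encountered from the 5' end
    if start == -1:
        return None
    for pos in range(start, len(mrna) - 2, 3):  # read in the AUG's reading frame
        if mrna[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
    return None

# 5' UTR "GGCACC", then AUG GCU UUC UAA, then 3' UTR "GCAAA"
print(scan_for_orf("GGCACCAUGGCUUUCUAAGCAAA"))  # (6, 18)
```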
Translation: Elongation
In the first step, the 40S subunit moves one codon farther along the mRNA, and the charged tRNA corresponding to the new codon is brought into the A site on the 60S subunit. A peptidyl transferase then catalyzes a reaction in which the methionine is detached from the tRNAMet and joined to the amino group of the next amino acid, forming the first peptide bond. In the next step, the 60S subunit swings forward to catch up with the 40S subunit, and simultaneously the tRNAs in the P and A sites of the large subunit are shifted to the E and P sites, respectively.
Figure 08.20AB: Elongation cycle in protein synthesis.
Figure 08.20CD: Elongation cycle in protein synthesis.
Translation: Elongation
One cycle of elongation is now complete, and the entire procedure is repeated for the next codon. Eukaryotes synthesize a polypeptide chain at a rate of about 15 amino acids per second. Elongation in prokaryotes is a little faster (about 20 amino acids per second), but the essential processes are very similar.
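To put these rates in perspective, the elongation time is $t = n / r$ for a chain of $n$ amino acids made at rate $r$. For a hypothetical polypeptide of 300 amino acids, and ignoring the time taken by initiation and termination:

$$t_{\text{eukaryote}} = \frac{300}{15\ \text{s}^{-1}} = 20\ \text{s}, \qquad t_{\text{prokaryote}} = \frac{300}{20\ \text{s}^{-1}} = 15\ \text{s}$$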
Translation: Termination
When a stop codon is encountered, the tRNA holding the polypeptide remains in the P site, and a release factor (RF) binds to the ribosome. GTP hydrolysis provides the energy to cleave the polypeptide from the tRNA to which it is attached. The 40S and 60S subunits are then recycled to initiate translation of another mRNA. Eukaryotes have a single release factor that recognizes all three stop codons (eRF1), whereas E. coli has three release factors (RF1, RF2, and RF3).
Figure 08.21: Termination of protein synthesis.
Figure 08.26: Direction of synthesis of RNA.
Translation
The mRNA is translated in the 5'-to-3' direction. The polypeptide is synthesized from the amino end toward the carboxyl end.
Translation
Most polypeptide chains fold correctly as they exit the ribosome. They first pass through a tunnel in the large ribosomal subunit that is long enough to hold about 35 amino acids. On emerging from the tunnel, the polypeptide enters a sort of cradle formed by a protein associated with the ribosome, which provides a space where the polypeptide can undergo its folding process. The proper folding of more complex polypeptides is aided by proteins called chaperones and chaperonins.
Figure 08.24: Alternative pathways in protein folding.
Translation: Prokaryotes
In prokaryotes, mRNA molecules have no cap, and there is no scanning mechanism. In E. coli, the initiation factors IF-1 and IF-3 interact with the 30S subunit, and IF-2 binds with a special tRNA charged with formylmethionine, tRNAfMet. These components bind to the mRNA at the ribosome-binding site (RBS), also called the Shine–Dalgarno sequence. Together, they recruit a 50S subunit.
Translation: Prokaryotes
Many prokaryotic mRNA molecules contain information for the amino acid sequences of several different proteins; such a molecule is called a polycistronic mRNA. A cistron is a DNA sequence that encodes a single polypeptide chain. In a polycistronic mRNA, each protein-coding region is preceded by its own ribosome-binding site and AUG initiation codon.
Translation: Prokaryotes
After the synthesis of one polypeptide is finished, the next coding region along the mRNA is translated. The genes contained in a polycistronic mRNA often encode the different proteins of a metabolic pathway.
Figure 08.25: Different products are translated from a three-cistron mRNA molecule by the ribosomes of prokaryotes and eukaryotes.
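Because each coding region in a polycistronic mRNA has its own ribosome-binding site and AUG, the coding regions can be located one by one. The sketch below assumes an idealized Shine–Dalgarno core of AGGAGG and a maximum spacing of 10 nucleotides between that motif and the AUG; both are assumptions for illustration, since natural ribosome-binding sites vary in sequence and spacing. The function name is hypothetical.

```python
import re

STOP_CODONS = {"UAA", "UAG", "UGA"}
SD_MOTIF = "AGGAGG"  # assumed idealized Shine-Dalgarno core; real sites vary

def find_cistrons(mrna: str, max_spacing: int = 10) -> list:
    """Return (start, end) of each coding region that begins at an AUG lying at most
    max_spacing nucleotides downstream of a Shine-Dalgarno motif; end includes the stop."""
    mrna = mrna.upper()
    cistrons = []
    for site in re.finditer(SD_MOTIF, mrna):
        # AUG must start no more than max_spacing nt after the end of the motif.
        aug = mrna.find("AUG", site.end(), site.end() + max_spacing + 3)
        if aug == -1:
            continue
        for pos in range(aug, len(mrna) - 2, 3):  # read in this AUG's frame
            if mrna[pos:pos + 3] in STOP_CODONS:
                cistrons.append((aug, pos + 3))
                break
    return cistrons

# Two cistrons, each with its own RBS and AUG (hypothetical sequence)
mrna = ("CC" + "AGGAGG" + "CUCCU" + "AUGAAAUGA" + "CCG"
        + "AGGAGG" + "CUCCU" + "AUGGGCUUUUAA" + "GG")
print(find_cistrons(mrna))  # [(13, 22), (36, 48)]
```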
Genetic Code
The genetic code is the list of all codons and the amino acids that they encode. The main features of the genetic code were established in genetic experiments carried out by F. Crick and collaborators:
Translation starts from a fixed point.
A single reading frame is maintained throughout the process of translation.
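Genetic Code
Each codon consists of three nucleotides. Evidence for a triplet code came from frameshift experiments: combining three single-base insertions (or three single-base deletions) in the same gene restored the reading frame. The code is nonoverlapping. The code is degenerate: most amino acids are specified by more than one codon (methionine and tryptophan are the exceptions, with one codon each). Most of the codons were determined from in vitro polypeptide synthesis.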
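The features listed above (a fixed starting point, nonoverlapping triplets, and degeneracy) can be checked against the standard genetic code. In this sketch the 64 codons are enumerated in the conventional UCAG order (first base varying slowest), amino acids use one-letter abbreviations, and `*` marks the three stop codons; the names `CODON_TABLE` and `translate` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order: UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def translate(mrna: str, start: int = 0) -> str:
    """Translate nonoverlapping triplets beginning at a fixed start position."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)

# Degeneracy: count how many codons specify each amino acid.
codons_per_aa = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, n in codons_per_aa.items() if n == 1 and aa != "*")

print(translate("AUGGCUUUCUAA"))               # MAF
print(single_codon)                            # ['M', 'W']
print(codons_per_aa["L"], codons_per_aa["*"])  # 6 3
```

Counting codons per amino acid shows that only methionine (M) and tryptophan (W) have a single codon, while leucine, for example, has six.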
Figure 08.28: Shift in the reading frame.
Mutations that delete or add a base pair shift the reading frame and are called frameshift mutations.
Figure 08.29: Interpretation of the rII frameshift mutations.
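The logic of the frameshift experiments can be reproduced with a short simulation. It uses a hypothetical wild-type sequence made of repeated CAU codons and inserts G bases at chosen positions. A single insertion changes every codon downstream, while three insertions restore the original reading frame after the third one, leaving altered amino acids only between the insertion sites. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons in UCAG order; '*' marks stop codons.
CODE = {"".join(c): aa for c, aa in zip(
    product("UCAG", repeat=3),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")}

def translate(mrna: str) -> str:
    """Read nonoverlapping triplets from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def insert_bases(seq: str, positions: list, base: str = "G") -> str:
    """Insert `base` before each listed position of the original sequence."""
    for pos in sorted(positions, reverse=True):  # right to left keeps indices valid
        seq = seq[:pos] + base + seq[pos:]
    return seq

wild_type = "AUGCAUCAUCAUCAU"
print(translate(wild_type))                           # MHHHH
print(translate(insert_bases(wild_type, [3])))        # MASSS  (one insertion: frame shifted)
print(translate(insert_bases(wild_type, [3, 6, 9])))  # MACMHH (three insertions: frame restored)
```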
Table 08.03: The Standard Genetic Code.
The genetic code is nearly universal: with a few exceptions, notably in mitochondria, the same triplet codons specify the same amino acids in all species. Mutations that change codons can alter the amino acids in the encoded proteins.