Bioinformatics Presented by Frank H. Osborne, Ph. D. © 2005 Bio 2900 Computer Applications in Biology
Bioinformatics Bioinformatics is the computational branch of molecular biology. It involves using computers in the analysis of DNA, RNA and protein sequences. It is part of a larger field of biology called Computational Biology.
Protein Synthesis Generally, we begin with DNA. DNA is transcribed to produce RNA. RNA is then translated to produce protein. The protein is the result of the expression of a gene.
Amino Acids Proteins are made of amino acids. Twenty amino acids are generally used in protein molecules. A set of three-letter abbreviations is used for the amino acids in biochemistry. The International Union of Pure and Applied Chemistry (IUPAC) has created one-letter abbreviations to ease work in bioinformatics.
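As a minimal sketch of how the two abbreviation systems relate, the standard three-letter and one-letter codes for the 20 common amino acids can be held in a Python dictionary. The helper name to_one_letter and the hyphen-separated input format are assumptions made for this illustration only.

```python
# Standard three-letter -> one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide (hypothetical
    input format, e.g. 'Met-Ala-Trp') to a one-letter string."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in peptide.split("-"))


print(to_one_letter("Met-Ala-Trp"))  # MAW
```

The one-letter form is what sequence files and alignment programs normally use, because each residue occupies exactly one character.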
Amino Acid Table
Additional Amino Acid Codes IUPAC also recognizes four additional code letters that may be used in special situations.
Additional Amino Acid Code Table
DNA Deoxyribonucleic acid (DNA) is made up of purine bases (adenine and guanine) and pyrimidine bases (cytosine and thymine). Each base is part of a nucleotide, which also contains the sugar deoxyribose and a phosphate group. Nucleotides are connected by a condensation reaction that forms a phosphodiester bond between the 5’ phosphate of one nucleotide and the 3’OH of the next.
DNA For DNA sequences, the IUPAC has established the one-letter codes shown below.
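A rough sketch of how the one-letter DNA codes can be used in a program: the dictionary below lists the four bases together with the standard IUPAC ambiguity codes, and a small helper checks a sequence against a pattern written with those codes. The function name matches and the example pattern are assumptions for illustration.

```python
# IUPAC one-letter nucleotide codes for DNA, each mapped to the bases it allows.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(pattern, sequence):
    """Return True if every base of `sequence` is allowed by the IUPAC
    code at the same position of `pattern` (hypothetical helper)."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_DNA[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("TATAWAW", "TATAAAT"))  # True: W allows A or T
print(matches("TATAWAW", "TATAGAT"))  # False: G is not allowed by W
```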
RNA The IUPAC one-letter codes for RNA are shown below.
Gene structure A gene is a sequence of bases of DNA. It begins at a location known as a promoter and ends at another location called the terminator.
Gene expression Genes are expressed by transcription followed by translation. DNA is first transcribed to make messenger RNA. The genetic code of the messenger RNA is then translated into protein.
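The transcription step can be sketched in a few lines of Python. This is only a string-level illustration, assuming the DNA is given as plain text written 5’ to 3’; the function names are hypothetical. The mRNA has the same sequence as the coding strand with U in place of T, which is also the reverse complement of the template strand.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe_coding(coding_strand):
    """mRNA from the coding (sense) strand: same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand):
    """mRNA from the template strand (written 5'->3'): take the reverse
    complement, then replace T with U."""
    reverse_complement = template_strand.upper().translate(COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


print(transcribe_coding("ATGGCCTGGTAA"))    # AUGGCCUGGUAA
print(transcribe_template("TTACCAGGCCAT"))  # AUGGCCUGGUAA
```

Both calls give the same mRNA because the two inputs are the two strands of the same short double-stranded example.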
RNA polymerase Transcription uses DNA-dependent RNA polymerase. RNA polymerase holoenzyme consists of a core enzyme of four polypeptides and another factor called sigma (σ) factor.
Core enzyme = α2ββ′
– α: 2 identical subunits
– β, β′: similar but different proteins
Holoenzyme = core enzyme + σ factor
There are different types of promoters that are recognized by different σ factors.
Transcription Transcription consists of three stages called initiation, elongation and termination. Note that these are not the same as initiation, elongation and termination of protein synthesis, which make up the process of translation.
Stages of transcription
Initiation
– RNA polymerase attaches to the promoter. An open complex forms.
Elongation
– RNA polymerase moves along the DNA molecule making a molecule of RNA as it travels.
Termination
– RNA polymerase reaches the terminator. The RNA is released.
Translation The mRNA molecule is translated into protein using the standard genetic code. There are some exceptions, especially during protein synthesis in mitochondria.
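A minimal sketch of translation with the standard genetic code is given here. The codon table is built from the conventional U, C, A, G ordering; the function name translate and its overrides parameter are assumptions for illustration. The overrides argument shows one well-known mitochondrial exception: in vertebrate mitochondria, UGA codes for tryptophan instead of acting as a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, overrides=None):
    """Translate an mRNA string codon by codon from position 0, stopping
    at the first stop codon ('*'). `overrides` is a hypothetical hook for
    non-standard codes, e.g. {"UGA": "W"}."""
    code = dict(STANDARD_CODE, **(overrides or {}))
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUGGUAA"))            # MAW
print(translate("AUGUGAUGG"))               # M   (UGA read as stop)
print(translate("AUGUGAUGG", {"UGA": "W"})) # MWW (UGA read as Trp)
```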
Stages of translation
Initiation
– Ribosomes bind to the ribosome-binding site on the mRNA molecule, known as the Shine-Dalgarno sequence, a short distance upstream of the AUG start codon.
Elongation
– Transfer RNA brings each amino acid to the aminoacyl (A) site according to the specified codons.
Termination
– When a stop codon is reached, the completed protein is released from the peptidyl (P) site.
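The initiation step suggests a simple way to look for likely start codons in a bacterial mRNA: search for a Shine-Dalgarno-like motif followed shortly by AUG. The sketch below uses the consensus AGGAGG and allows a spacer of 4 to 10 nucleotides. Both the exact-match motif and the spacer range are simplifying assumptions; real ribosome-binding sites vary, and the function name is hypothetical.

```python
import re

# Consensus Shine-Dalgarno motif, a short spacer, then the AUG start codon.
# The 4-10 nucleotide spacer is an assumed range chosen for this sketch.
SD_PATTERN = re.compile(r"AGGAGG[ACGU]{4,10}?(AUG)")


def find_start_codons(mrna):
    """Return 0-based positions of AUG codons preceded by the motif."""
    return [match.start(1) for match in SD_PATTERN.finditer(mrna.upper())]


print(find_start_codons("GGCUAGGAGGUUUAACAUGGCU"))  # [16]
print(find_start_codons("GGCUACCACCUUUAACAUGGCU"))  # []
```

An AUG without the upstream motif is ignored, which mirrors the point that the ribosome recognizes the binding site and not the start codon alone.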
Gene organization in Bacteria A cistron is a distinct region of DNA that codes for a particular polypeptide. The term is used in the context of a protein which is made up of several subunits, each of which is coded by a different gene. An operon, a group of adjacent genes transcribed together from a single promoter, is a common form of gene organization in bacteria.
Genotypes and phenotypes The genotype refers to the actual gene in the chromosome. The phenotype is the observed effect of that gene. Genotypes are given using italic letters. Phenotypes are written in ordinary, regular letters, beginning with a capital. Thus, two of the tryptophan genes in E. coli would be trpA and trpB (in italics). When expressed, they produce polypeptides. The trpA gene produces the TrpA polypeptide and the trpB gene produces the TrpB polypeptide.
Regulation of gene expression The lac operon The lac operon contains the genes necessary to utilize lactose. Lactose is a β-galactoside sugar: a disaccharide in which galactose is joined to glucose by a β(1,4) linkage.
Regulation of gene expression Products of the lac operon The lac operon codes for three proteins, LacZ, LacY and LacA, which are directly involved in galactoside (lactose) utilization.
– LacZ - β-D-galactosidase (EC )
– LacY - galactoside permease (M protein)
– LacA - galactoside acetyltransferase (EC )
The genes for these proteins lie adjacent to each other on the E. coli chromosome. They are preceded by a region of the chromosome responsible for the regulation of these genes.
Regulation of gene expression Function of the lac operon
– lacI - gene for the lac repressor protein
– lacPi - promoter for lacI
– lacP - promoter for lac operon
– lacO - operator: binding site for the repressor
LacI is a repressor that binds to the operator (lacO) and prevents the genes of the operon from being transcribed. This type of control is known as transcriptional regulation.
Induction and repression When lactose is present it induces the operon: the inducer (allolactose, an isomer formed from lactose) binds to the repressor and changes its shape, causing it to fall off the operator. When lactose is removed, the repressor goes back to its original shape and can bind to the operator again. Because the repressor binds to the operator rather than the promoter, the RNA polymerase is said to be primed, meaning that it is ready to transcribe as soon as the block comes off the operator.
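The induction logic described above can be written as a tiny truth-table model. This is a deliberately simplified sketch that covers only the repressor and lactose; the function name and the repressor_functional parameter (standing in for a mutant with no working repressor) are assumptions for illustration.

```python
def lac_operon_transcribed(lactose_present, repressor_functional=True):
    """Simplified on/off model of the lac operon.

    The repressor sits on the operator only when it is functional and no
    lactose (inducer) is present; transcription proceeds otherwise.
    """
    repressor_on_operator = repressor_functional and not lactose_present
    return not repressor_on_operator


print(lac_operon_transcribed(lactose_present=False))  # False: repressed
print(lac_operon_transcribed(lactose_present=True))   # True: induced
# Assumed case: a mutant without functional repressor is always on.
print(lac_operon_transcribed(False, repressor_functional=False))  # True
```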
Structure of the lac operon
Gene Expression in Eukaryotes DNA in eukaryotic organisms is organized into chromosomes. The eukaryotic chromosome consists of DNA wound around proteins known as histones. Much eukaryotic DNA has either no function or unknown function. Unlike in bacteria, only a small fraction of eukaryotic DNA codes for proteins (under 2% in humans).
Gene Expression in Eukaryotes Eukaryotic DNA has numerous repeated nucleotide sequences. Within a gene, the protein-coding regions are separated by non-coding regions. The non-coding regions are called introns; they are removed from the RNA transcript by splicing. The coding sequences that are expressed as protein are called exons.
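At the level of sequence strings, removing introns amounts to joining the exon segments of the primary transcript in order. The sketch below assumes the exon coordinates are already known and given as 0-based, half-open (start, end) pairs; the function name and the toy sequence are hypothetical.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a primary transcript.

    `exons` is a list of (start, end) pairs, 0-based and half-open,
    an assumed coordinate convention for this sketch.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy transcript: exons in upper case, a 12-nucleotide intron in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UGGUAA"
print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCUGGUAA
```

The spliced result is the continuous coding sequence that is then translated.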
Transcription in Eukaryotic Cells
The End