Genomics and Gene Recognition CIS 667 April 27, 2004.



Genomics and Gene Recognition
How do we recognize the genes given the raw sequence data? Two different cases:
- Prokaryotes: relatively easy
- Eukaryotes: relatively difficult
  - Much “junk DNA” to search through
Signals determine the beginnings and ends of genes
- Need to find the signals

Prokaryotic Genomes
Genomic information of prokaryotes is dedicated mainly to basic tasks
- Make and replicate DNA
- Make new proteins
- Obtain and store energy
Over 60 prokaryotic genomes have been completely sequenced since the mid-1990s

Prokaryotic Genomes
Recall: prokaryotes have a single circular chromosome
Also: no cell nucleus, therefore no splicing out of introns
Therefore, prokaryotic gene structure is quite simple. Its elements:
- Transcriptional start site
- Promoter region
- Operator sequence
- Open Reading Frame
- Transcriptional stop site
- Translational start site (AUG)
- Translational stop site

Promoter Elements
Gene expression begins with transcription
- RNA copy of a gene made by an RNA polymerase
- Prokaryotic RNA polymerases are assemblies of several different proteins
  - β' protein binds to DNA template
  - β protein links nucleotides
  - α protein holds subunits together
  - σ protein recognizes specific nucleotide sequences of promoters

Promoter Elements
β', β and α are often very similar from one bacterial species to another
σ can vary (less well conserved)
- Several variants often found in a cell
- The ability to use several different σ factors allows a cell to turn on or off expression of whole sets of genes
- For example, σ32 turns on expression of genes associated with heat shock, σ54 does the same for nitrogen stress, and genes that always need to be expressed are transcribed by polymerases with σ70

Promoter Elements
Each σ factor recognizes a particular sequence of nucleotides upstream from the gene
- σ70 looks for -35 sequence TTGACA and -10 sequence TATAAT
- Other σ factors look for other -35 and -10 sequences
- The match need not always be exact
- The better the match, the more likely transcription will be initiated
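The matching rule on this slide can be sketched in Python: count how many bases of a candidate site agree with the -35 and -10 consensus hexamers. This is only an illustrative sketch. The function names and the allowed spacer range between the two hexamers are assumptions, not taken from the slides.

```python
CONSENSUS_35 = "TTGACA"  # -35 consensus from the slide
CONSENSUS_10 = "TATAAT"  # -10 consensus from the slide


def matches(site, consensus):
    """Count positions where the site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter(seq, spacers=range(15, 20)):
    """Return (score, start of -35 box, spacer length) for the best hit.

    score is the number of matching bases out of 12. The spacer range
    (bases between the two hexamers) is an assumed parameter.
    Returns None if the sequence is too short to hold both boxes.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 12 - min(spacers) + 1):
        score_35 = matches(seq[i:i + 6], CONSENSUS_35)
        for gap in spacers:
            j = i + 6 + gap  # start of the candidate -10 box
            if j + 6 > len(seq):
                continue
            score = score_35 + matches(seq[j:j + 6], CONSENSUS_10)
            if best is None or score > best[0]:
                best = (score, i, gap)
    return best
```

A higher score means a better match to the consensus, which by the slide's rule means transcription is more likely to be initiated.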

Promoter Elements
Protein products from some genes are always used in tandem with those from some other genes
- These related genes may share a single promoter in prokaryotic genomes and be arranged in an operon
- When one gene is transcribed, so are all of the others: one polycistronic RNA molecule is produced
- The lactose operon contains three genes involved in metabolism of the sugar lactose in bacterial cells

Operon

The protein encoded by the regulatory gene (pLacI) can bind to lactose or to the operator sequence of the operon
- When lactose is abundant, pLacI is less likely to bind to the operator sequence
- When it does bind, it blocks transcription, thus acting as a negative regulator
- Even without negative regulation, operon expression is low because the promoter is a poor match to the consensus sequence for the σ factor
A positive regulator (CRP) promotes expression

Operon

Lac Operon

Open Reading Frames
Recall: 3 of the 64 codons are stop codons (UAA, UAG, UGA); they cause translation to stop
Most prokaryotic proteins are longer than 60 amino acids
- Since stop codons make up 3/64 of all codons, we expect to find one about once in every 21 codons (64/3) of random sequence, so a run of 30 or more codons with no stop codons (an Open Reading Frame, ORF) is good evidence that we are looking at the coding sequence of a prokaryotic gene
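The arithmetic behind this argument can be checked with a short Python sketch. It assumes, as a simplification not stated on the slide, that all 64 codons are equally likely and independent in non-coding sequence. The function names are illustrative.

```python
STOP_CODONS = 3
TOTAL_CODONS = 64


def expected_codons_between_stops():
    """Average spacing of stop codons in random sequence (64/3)."""
    return TOTAL_CODONS / STOP_CODONS


def prob_no_stop(n_codons):
    """Probability that n random codons contain no stop codon."""
    return (1 - STOP_CODONS / TOTAL_CODONS) ** n_codons
```

Under these assumptions, a stop-free run of 30 codons still arises by chance roughly a quarter of the time, while a run of 60 is much rarer (about 6%). This is one reason longer ORFs are stronger evidence.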

Open Reading Frames
AUG is a start codon
- Defines where translation begins
- If no likely promoter sequence is found between the end of the preceding ORF and the start codon of an ORF, assume the two genes are part of an operon whose promoter sequence is further upstream
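A minimal ORF scan along these lines might look as follows in Python. It is a sketch under stated assumptions: it works on DNA (ATG for AUG), scans only the given strand in its three reading frames, and starts each ORF at the first ATG after the previous stop. The function name and the `min_codons` default are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def find_orfs(seq, min_codons=60):
    """Return sorted (start, end) pairs of ORFs on the given strand.

    start is the index of the ATG; end is the index just past the stop
    codon. Only ORFs with at least min_codons codons before the stop
    are reported. The reverse strand is not scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

A fuller gene finder would also scan the reverse complement and then look upstream of each reported start for promoter matches.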

Termination Sequence
Most prokaryotic operons contain specific signals for the termination of transcription called intrinsic terminators
- Must have a sequence of nucleotides that includes an inverted repeat followed by a run of roughly six uracils
- The inverted repeat allows the RNA to form a loop structure that greatly slows down RNA synthesis
- Together with the chemical properties of uracil, this is enough to end transcription
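The terminator signal described here (inverted repeat, then a run of uracils) can be searched for with a simple Python sketch. All parameters (stem length, loop size range, minimum U-run) are assumed values for illustration. The search only finds perfect stems, and it misses hairpins whose 3' arm itself ends in U.

```python
import re

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def find_terminators(rna, stem=6, min_loop=3, max_loop=8, min_u=5):
    """Return (hairpin start, end of U-run) for each candidate terminator.

    A candidate is a run of at least min_u uracils immediately preceded
    by a perfect inverted repeat: two stem-length arms separated by a
    loop of min_loop..max_loop bases.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for m in re.finditer("U{%d,}" % min_u, rna):
        right_end = m.start()  # 3' arm ends where the U-run begins
        if right_end < stem:
            continue
        right = rna[right_end - stem:right_end]
        for loop in range(min_loop, max_loop + 1):
            left_start = right_end - 2 * stem - loop
            if left_start < 0:
                break
            if rna[left_start:left_start + stem] == revcomp(right):
                hits.append((left_start, m.end()))
                break
    return hits
```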

Termination Sequence

GC Content in Prokaryotic Genomes
For every G within a double-stranded DNA genome there must be a C; likewise an A for every T
- The only constraint on the fraction of nucleotides that are G/C as opposed to A/T is that the two must add to 100%
- Can use genomic GC content to identify bacterial species (ranges from 25% to 75%)
- Can also use GC content to identify genes that have been obtained from other bacteria by horizontal gene transfer
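Both uses of GC content can be sketched in a few lines of Python: compute the genome-wide GC fraction, then flag windows whose GC fraction departs from it, as regions acquired by horizontal transfer might. The window size, step and threshold are assumed values, and the function names are illustrative.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=1000, step=500):
    """GC fraction of each full window, as (start, gc) pairs."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def atypical_windows(seq, window=1000, step=500, threshold=0.10):
    """Windows whose GC fraction differs from the whole sequence's
    GC fraction by more than threshold."""
    genome_gc = gc_content(seq)
    return [(i, gc) for i, gc in windowed_gc(seq, window, step)
            if abs(gc - genome_gc) > threshold]
```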

Prokaryotic Gene Densities
Gene density within prokaryotic genomes is very high
- Between 85% and 88% of the nucleotides are typically associated with coding regions of genes
- Just as large portions of chromosomes can be acquired, they can also be deleted
- Portions left are those which code for essential genes

Gene Recognition in Prokaryotes
- Long ORFs (60 or more codons)
- Matches to simple promoter sequences
- Recognizable transcriptional termination signal (inverted repeats followed by a run of uracils)
- Comparison with nucleotide (or amino acid) sequences of known protein-coding regions from other organisms

Eukaryotic Genomes
Much more complex
- Internal membrane-bound compartments allow a wide variety of chemical environments in each cell
- Multicellular organisms
  - Each cell type has distinct gene expression
- Size of genome may be larger
  - Allows for “junk DNA”
Gene expression is more complex and flexible than in prokaryotes

Eukaryotic Gene Structure

Promoter Elements
Each different cell type requires different gene expression
- Therefore eukaryotes have elaborate mechanisms for starting transcription
- Prokaryotes have a single RNA polymerase; eukaryotes have three
  - RNA polymerase I: ribosomal RNAs
  - RNA polymerase II: protein-coding genes
  - RNA polymerase III: tRNAs, other small RNAs

Promoter Elements
Most RNA polymerase II promoters contain a set of sequences known as a basal promoter, where an initiation complex is assembled and transcription begins
They also have several upstream promoter elements (typically at least 5) to which other proteins bind
- Without the proteins binding upstream, initiation complex assembly is difficult

Promoter Elements
RNA polymerase II does not directly recognize the basal sequences of promoters
- Basal transcription factors, including a TATA-binding protein (TBP) and at least 12 TBP-associated factors, bind to the promoter in a specific order, facilitating binding of RNA polymerase
- TATA-box 5’-TATAWAW-3’ (W is A or T) at -25 relative to the transcriptional start site
- Initiator sequence 5’-YYCARR-3’ (Y is C or T and R is G or A) at the transcriptional start site
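The two consensus patterns on this slide use degenerate base codes (W, Y, R), which translate directly into regular expressions. The Python sketch below shows one way to search for them. The helper names are illustrative, and only the codes used on the slide are covered.

```python
import re

# Degenerate codes used on the slide: W = A or T, Y = C or T, R = G or A
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}


def iupac_to_regex(pattern):
    """Translate a consensus string into a regular expression."""
    return "".join(IUPAC[ch] for ch in pattern)


TATA_BOX = re.compile(iupac_to_regex("TATAWAW"))
INITIATOR = re.compile(iupac_to_regex("YYCARR"))


def find_sites(seq, motif):
    """Start indices of all (possibly overlapping) motif matches."""
    seq = seq.upper()
    return [m.start() for m in re.finditer("(?=%s)" % motif.pattern, seq)]
```

Because both patterns are short, they match often by chance. A hit is more convincing when a TATA-box match lies about 25 bases upstream of an initiator match.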

Transcription

Regulatory Protein Binding Sites
Transcription initiation in eukaryotes relies heavily on positive regulation
- Constitutive factors work on many genes and don’t respond to external signals
- Regulatory factors act on a limited number of genes and respond to external signals
  - Response factors (e.g. heat shock)
  - Cell-specific factors (e.g. pituitary cells only)
  - Developmental factors (e.g. early embryo organization)

Open Reading Frames
Before translation, a heterogeneous nuclear RNA (hnRNA) is transformed into mRNA by being
- Capped
  - 5’ end chemically altered
- Spliced
  - Various splicings can occur
- Polyadenylated
  - Long stretch of A’s added at 3’ end

Introns and Exons
The introns are spliced out of the hnRNA
- Introns of protein-coding genes conform to the GU-AG rule
  - These are the nucleotides at the 5’ and 3’ ends of the intron
- Other nucleotides are examined as well
  - Most of these are inside the intron
These signals constrain introns to be at least 60 bp long, but there is no upper limit
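The GU-AG rule and the minimum intron length can be checked mechanically. The Python sketch below tests a proposed intron and removes accepted introns from an hnRNA string. The function names are illustrative, and the other internal splicing signals the slide mentions are not modeled.

```python
def is_plausible_intron(hnrna, start, end, min_len=60):
    """Check the GU-AG rule and minimum length for hnrna[start:end]."""
    intron = hnrna[start:end].upper().replace("T", "U")
    return (len(intron) >= min_len
            and intron.startswith("GU")
            and intron.endswith("AG"))


def splice(hnrna, introns):
    """Remove the given (start, end) introns and join the exons.

    The introns are assumed not to overlap.
    """
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(hnrna[pos:start])
        pos = end
    exons.append(hnrna[pos:])
    return "".join(exons)
```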

Alternative Splicing
About 20% of human genes give rise to more than one type of mRNA sequence due to alternative splicing
Splice junctions can be masked, causing an exon to be spliced out
The following slide shows how alternative splicing based on different splicing factors (proteins) can stop a useful protein from being produced

Alternative Splicing

GC Content
Overall GC content between different genomes does not vary as much in eukaryotes as in prokaryotes
- However, variations in GC content within a genome can help us to recognize genes
- Of all of the pairs of nucleotides, statistically, CG is found at only 20% of its expected value
- No other pair is under- or over-represented

GC Content
The expected levels of CG are found, however, in stretches of 1-2 kbp at the 5’ ends of many human genes
- These are called CpG islands; their occurrence is linked to methylation
- Methylation of the C makes it easy for CG to mutate to TG or CA
- High levels of methylation imply low levels of acetylation of histones (proteins which, when acetylated, make transcription of DNA possible)
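The comparison of observed CG dinucleotides with their expected value can be made concrete with a short Python sketch. It uses the common observed/expected ratio, count(CG) x N / (count(C) x count(G)), which is an assumed choice of formula rather than one given on the slides. The function name is illustrative.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    The expected count assumes C and G occur independently:
    expected = count(C) * count(G) / N. Returns 0.0 if the sequence
    has no C or no G.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)
```

A ratio near 0.2 would correspond to the genome-wide depletion described above, and a ratio near 1 in a 1-2 kbp window would be consistent with a CpG island.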

Isochores
Vertebrates and plants display a level of organization called isochores that is intermediate between that of genes and chromosomes
- The GC content of an isochore is relatively uniform throughout
- There are five classes of isochores depending on the level of GC content
- Those with high GC content also have high gene density
- The types of genes found in different classes differ as well

Codon Usage Bias
Another hint for gene hunting can be derived from the fact that every organism prefers some of the equivalent (synonymous) triplet codons over others when coding for proteins
Real exons generally reflect the bias, while randomly chosen strings of triplets do not
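One simple way to use this hint, sketched in Python: estimate codon frequencies from known genes of the organism, then score a candidate region by how common its codons are in that reference. The scoring scheme (mean reference frequency per codon) is an assumption chosen for brevity, and the function names are illustrative.

```python
from collections import Counter


def codons(seq):
    """Split a sequence into complete in-frame triplets."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def codon_frequencies(coding_seq):
    """Relative frequency of each codon in a known coding sequence."""
    counts = Counter(codons(coding_seq))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_score(candidate, reference_freqs):
    """Mean reference frequency of the candidate's codons.

    Higher scores mean the candidate uses codons the organism favors.
    Codons absent from the reference count as 0.
    """
    triplets = codons(candidate)
    if not triplets:
        return 0.0
    return sum(reference_freqs.get(c, 0.0) for c in triplets) / len(triplets)
```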

Gene Recognition
In summary, useful DNA sequence features for gene hunting include
- Known promoter elements (e.g. TATA boxes)
- CpG islands
- Splicing signals associated with introns
- ORFs with characteristic codon utilization
- Similarity to the sequences of ESTs or genes from other organisms

Gene Expression
Expression varies greatly, however
Tools for determining gene expression levels include cDNAs and ESTs
- Complementary DNAs (cDNAs) are synthesized from mRNAs and can be used to provide expressed sequence tags (ESTs) useful for contig assembly or gene recognition

cDNA

Microarrays
Gene expression patterns can be studied using microarrays
- Small silica (glass) chips covered with thousands of short sequences of nucleotides of known sequence
- The microarray can then be used to compare the expression of all of the genes in the genome simultaneously
- A gene is represented by a set of 16 probes

Microarrays
The probes representing genes are arranged in a grid on the chip
Fluorescently labeled cDNA from the tissue/organism we want to test is washed over the chip
If a gene is expressed, its cDNA will bind to that gene's probes
We can detect this through pattern recognition

Microarrays
- Make cDNA from cells after treatment with a drug
- Make cDNA from cells before treatment with a drug
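Once the two samples have been measured, a common way to compare them is a per-gene log ratio of signal intensities. The Python sketch below assumes the intensities are already available as dictionaries keyed by gene. The function name, gene names and pseudocount are hypothetical, and the slides do not specify an analysis method.

```python
import math


def log2_ratios(before, after, pseudocount=1.0):
    """log2(after / before) for every gene measured in both samples.

    Positive values suggest higher expression after treatment and
    negative values lower expression. The pseudocount avoids division
    by zero for genes with no signal.
    """
    return {gene: math.log2((after[gene] + pseudocount)
                            / (before[gene] + pseudocount))
            for gene in before if gene in after}
```

For example, with hypothetical intensities `before = {"geneA": 99, "geneB": 49}` and `after = {"geneA": 199, "geneB": 49}`, geneA gets a log ratio of 1.0 (a doubling) and geneB gets 0.0 (no change).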

Microarrays

Transposition
Transposons result from insertion of a duplicate sequence from another part of the genome, aided by a transposase enzyme
- If inserted in “junk DNA”, not harmful
- More common are retrotransposons, which are associated with retroviruses (encapsulated RNA and reverse transcriptase that use a host to replicate) such as HIV

Retrovirus Replication

Virus Replication