Finding genes in the genome

Presentation transcript:

Finding genes in the genome Lecture 7 Global Sequence

Introduction
The open reading frame (ORF)
Finding genes in prokaryotes
Finding genes in eukaryotes
ESTs (cDNA) and their role
Finding promoters

Introduction
Homologous approach: sequence similarity [discussed in the next lecture]. This has proven useful, but only if similar sequences already exist in the database.
If there are no similar sequences, we must instead apply the general properties of genes (start codon … stop codon) to analyse our sequence.

Open Reading Frames (ORF)
If the homologous approach is not successful, look for ORFs.
An ORF is a region of DNA that could be the coding sequence (CDS) of a gene [not the promoter, untranslated region (UTR)…].
It has a start codon (ATG) and a stop codon [one of three: TAA, TAG, TGA].
For a novel sequence, look for all ORFs in all 6 reading frames: 3 reading frames per strand.
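As a minimal sketch of this definition (not taken from the slides), the Python below scans one forward reading frame for an ATG followed by the first in-frame stop codon. The function name `find_orfs_in_frame` and the example sequence are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs_in_frame(seq, frame=0):
    """Return (start, end) index pairs of ORFs in one forward reading frame.

    start is the index of the A of ATG; end is one past the last base of
    the stop codon, so seq[start:end] is the whole ORF including the stop.
    """
    seq = seq.upper()
    orfs = []
    start = None
    # Step through the sequence one codon at a time in the chosen frame.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i                      # first start codon opens an ORF
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))    # first in-frame stop closes it
            start = None
    return orfs


example = "CCATGAAATTTGGGTAACC"            # made-up sequence
print(find_orfs_in_frame(example, frame=2))  # [(2, 17)]
print(example[2:17])                         # ATGAAATTTGGGTAA
```

The same sequence read in frame 0 or frame 1 contains no ATG codon, which is why every frame has to be checked.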

Finding potential ORFs
Translate each reading frame beginning at:
Base 1: 5’→3’ frame 1
Base 2: 5’→3’ frame 2
Base 3: 5’→3’ frame 3
Why is there no need for a frame 4? Starting at base 4 reads the same codons as frame 1, minus the first codon.
Get the reverse complement of the given strand and repeat the process: 3’→5’ frame 1…
Look for start and stop codons in each translation (the start codon ATG translates to methionine).
Note: in a FASTA file for a gene, the gene will be on the given sequence (strand), so there is no need to get the reverse complement.
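The six reading frames can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's own code: the helper names `reverse_complement` and `six_frames`, the frame labels (+1 to +3, -1 to -3) and the toy sequence are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Map frame labels (+1..+3, -1..-3) to lists of complete codons."""
    frames = {}
    for sign, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Only complete codons are kept; trailing bases are dropped.
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


seq = "ATGAAATAG"                 # made-up sequence
frames = six_frames(seq)
print(frames["+1"])               # ['ATG', 'AAA', 'TAG']
print(frames["+2"])               # ['TGA', 'AAT']
print(frames["-1"])               # ['CTA', 'TTT', 'CAT']

# "Frame 4" (starting at base 4) is frame 1 without its first codon.
frame4 = [seq[i:i + 3] for i in range(3, len(seq) - 2, 3)]
print(frame4 == frames["+1"][1:])  # True
```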

Is the ORF a gene?
First check the length of the ORF [consider that the smallest proteins are about 20 aa in length].
Check for the presence of a promoter sequence (TATAAT) upstream of the ORF…
Search for genes with similar aa sequences to the candidate gene.
Gene finding in prokaryotes and eukaryotes takes different approaches, which take into account the differences in their gene structure.
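The first two checks can be sketched as a simple filter. The 20 aa minimum comes from the slide; the 50 bp upstream search window, the function name `check_candidate` and the test sequences are assumptions made for illustration. The third check (similarity search) needs a database and is not shown.

```python
def check_candidate(seq, orf_start, orf_end, min_aa=20,
                    upstream_window=50, box="TATAAT"):
    """Return (long_enough, has_upstream_box) for one candidate ORF.

    orf_start/orf_end follow Python slicing: seq[orf_start:orf_end] is the
    ORF including its stop codon. upstream_window is an assumed value.
    """
    seq = seq.upper()
    # Number of amino acids = number of codons minus the stop codon.
    n_aa = (orf_end - orf_start) // 3 - 1
    long_enough = n_aa >= min_aa
    # Look for an exact promoter motif in the bases just before the ORF.
    upstream = seq[max(0, orf_start - upstream_window):orf_start]
    has_box = box in upstream
    return long_enough, has_box


# Made-up sequence: TATAAT box, 10-base gap, then ATG + 20 codons + stop.
candidate = "GG" + "TATAAT" + "C" * 10 + "ATG" + "GCT" * 20 + "TAA"
print(check_candidate(candidate, 18, len(candidate)))  # (True, True)

# A 2-residue ORF with nothing upstream fails both checks.
print(check_candidate("ATGGCTTAA", 0, 9))              # (False, False)
```

An exact string match is the simplest possible promoter test; real promoters often differ from the consensus at one or more positions.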

ORFs in prokaryotic genes
In prokaryotic genes the ORF, or protein coding sequence, begins with a start codon and ends with a stop codon.
Gene density is about 1 gene per kilobase: roughly one ORF every 1000 bases.
In some cases the gene density can cause the stop codon of one gene to overlap with the promoter of another [Zvelebil chapter 9].
E.g. within the lac operon there are 3 genes (CDS) all in close proximity, so the ATG of lacY is close to the stop codon of lacZ…

Review of eukaryotic gene expression
[Figure: eukaryotic gene expression showing exons/introns…, adapted from Zhang 2002]

ORFs in eukaryotes
Gene density is much lower: genes are further apart, and the density can vary significantly between chromosomes (~1.5% of human DNA is CDS).
The ORF contains introns between the coding sequences (CDS) of the exons. Further detail can be found in Klug 2010.
An added problem in interpreting the data: a stop codon sequence inside an intron, e.g. “taa”, is only a sequence and not a functional stop codon, because introns are removed before translation.
Further details on finding and predicting exons can be found in Baxevanis 2005.
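The intron problem can be illustrated with a toy gene. The sequences below are invented for the example, and the helper name `first_stop` is hypothetical: a naive frame-0 scan of the genomic DNA stops at a TAA inside the intron, while the spliced sequence reads through to the real stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq):
    """Index of the first in-frame (frame 0) stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


exon1 = "ATGGC"               # 5 bases: ends in the middle of a codon
intron = "GTAAGTTTTCAG"       # contains TAA, which is not a real stop
exon2 = "CAAATGGTGA"          # completes the split codon, ends with TGA

genomic = exon1 + intron + exon2
spliced = exon1 + exon2

print(first_stop(genomic))    # 6  (the TAA inside the intron)
print(first_stop(spliced))    # 12 (the true stop codon, TGA)
```

Scanning the unspliced sequence would report a 2-residue ORF; the spliced sequence encodes 4 residues before the stop.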

Finding coding regions in eukaryotes
Identify the TSS and the untranslated regions (UTRs): UTRs lie on both sides of the CDS (at the 5’ and 3’ ends of the mRNA) and, like the coding region, can contain exons and introns. They play a part in regulating translation via mRNA degradation, attachment to the ribosome, and promoting or inhibiting translation.
Identify start and stop signals (Zhang 2002; Chasin 2007):
Initial exon (start codon and 5’ splice site)
Internal exon (3’ and 5’ splice sites)
Terminal exon (3’ splice site and stop codon)
There is compositional bias in the coding regions and also at splice sites.
Database pattern searches can also be used, on the assumption that coding regions have a higher degree of conservation than non-coding regions.
It is important to be aware that the lengths of exons and introns may not be multiples of 3 [Zvelebil chapter 9; Baxevanis chapter 5].

Promoter Analysis
The existence of a “potential” ORF suggests the presence of a nearby promoter.
Promoters are elements upstream of the protein coding sequence that are essential to the transcription process; they exist in both eukaryotic and prokaryotic organisms.
[Figure, Klug 7th ed.: a number of eukaryotic promoters, illustrating their variability but also their common features: TATA box…]

Promoter Analysis
In prokaryotes:
The TATAAT region (Pribnow box) lies just upstream of the TSS (transcription start site), at about -10 bp.
A further marker, TTGACA, may also be found about 25 bp further upstream, at about -35 bp.
In eukaryotes there are 3 subsections of the promoter:
The core/basal promoter (~80 bp from the TSS) (Klug p. 321). In most cases it contains a TATA box (25 bp upstream of the TSS). Many contain a CAAT box and are rich in GC elements.
The proximal/upstream promoter (~250 bp from the TSS). There is wide variation in this region from one gene to another.
The distal promoter (much further upstream).
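A rough sketch of a prokaryotic promoter scan is given below: it looks for a TTGACA-like hexamer followed, after a spacer, by a TATAAT-like hexamer. The spacer range of 16 to 19 bp, the one-mismatch tolerance, the function names and the test sequence are all assumptions for illustration, not values from the slides.

```python
def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacer_range=(16, 19)):
    """Return (minus35_start, minus10_start) pairs of candidate promoters."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        # Candidate -35 element: close to the TTGACA consensus.
        if mismatches(seq[i:i + 6], "TTGACA") > max_mismatch:
            continue
        # Candidate -10 element: TATAAT-like, one spacer downstream.
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, "TATAAT") <= max_mismatch:
                hits.append((i, j))
    return hits


# Made-up sequence: -35 box, 17-base spacer, -10 box.
test_seq = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGGA"
print(find_promoters(test_seq))   # [(2, 25)]
```

Allowing mismatches matters because few real promoters match both consensus hexamers exactly; the cost is more false positives in long sequences.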

Promoter Analysis
The identification of a core promoter indicates the presence of a gene and vice versa, so the predictions of each complement one another to an extent.
Promoter characterisation (discovering transcription factor binding patterns) takes two basic approaches (Baxevanis 2005, chapter 5):
Pattern-driven algorithms: depend on existing annotated binding-site data in bioinformatics databases.
Sequence-driven algorithms: assume that common promoter functionality can be inferred from underlying conserved sequences. Genes that are co-regulated or co-expressed are good candidates for obtaining data for this approach.
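One very simple way to summarise conserved sequence across a set of sites is a per-position count table and its consensus. The sketch below assumes the sites are already aligned and of equal length; the site list is invented, and the function names are hypothetical. Real tools use more elaborate scoring than a plain majority vote.

```python
from collections import Counter


def position_counts(sites):
    """Count each base at each position of a set of aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Most frequent base at each position, joined into one string."""
    return "".join(counts.most_common(1)[0][0]
                   for counts in position_counts(sites))


# Made-up aligned upstream motifs from hypothetical co-regulated genes.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]
print(consensus(sites))                 # TATAAT
print(position_counts(sites)[3]["A"])   # 4
```

The count table is the raw material for the annotated binding-site patterns that pattern-driven searches rely on.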

Potential exam questions
Open reading frames (ORFs) are an essential part of finding genes in genomes. Discuss how you would attempt to find ORFs and why such ORFs are a more accurate prediction of protein structure in bacterial cells as opposed to animal cells.
A critical part of finding the protein coding regions of DNA sequences is the discovery of open reading frames (ORFs). Discuss the difficulties associated with finding such sequences in eukaryotic cells.

References
Baxevanis, A.D. 2005. Bioinformatics: a practical guide to the analysis of genes and proteins. Wiley; Chapter 5. [book is in the library]
Chasin, L.A. 2007. Searching for splicing motifs. Adv Exp Med Biol. 623: 85-106.
Kel, A.E. et al. 2003. MATCH™: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003 July 1; 31(13): 3576-3579.
Klug, W.A. et al. 2010. Concepts of Genetics. Pearson Education; p. 596-597.
Zhang, M.Q. 2002. Computational prediction of eukaryotic coding genes. Nat Rev Genet. 3: 698-709.
Zvelebil, M. Understanding Bioinformatics. Chapter 9. [book is in the library]