1
Finding genes in the genome
Lecture 7 Global Sequence
2
Introduction The open reading frame (ORF)
Finding genes in prokaryotes. Finding genes in eukaryotes. ESTs (cDNA) and their role. Finding promoters.
3
Introduction Homologous approach:
Sequence similarity [discussed in the next lecture] has proven to be useful, but only if similar sequences already exist in the database. However, if there are no similar sequences, we must apply the general property of genes (start codon … stop codon) to analyse our sequence.
4
Open Reading Frames (ORF)
If the homologous approach is not successful, then you look for ORFs. An ORF is a region of the DNA which could be the coding sequence (CDS) of a gene [not the promoter, untranslated region (UTR)…]. It has a start codon (ATG) and an end codon, one of three (TAA, TAG, TGA). If you have a novel sequence you would look for all ORFs in all 6 reading frames (3 reading frames per strand), as a…
5
Finding potential ORFs Translate each reading frame beginning at:
Base 1: 5’→3’ frame 1. Base 2: 5’→3’ frame 2. Base 3: 5’→3’ frame 3. Why is there no need for frame 4? Get the “reverse complement” of the given strand and repeat the process: 3’→5’ frame 1…. Look for start and stop codons in the translated (amino acid) sequence. Note: in a FASTA file the gene will be in the given sequence (strand), so there is no need to get the reverse complement.
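The six-frame scan described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production gene finder: the function names (`reverse_complement`, `orfs_in_frame`, `six_frame_orfs`) are hypothetical, and it assumes the input contains only A, C, G and T and that an ORF runs from the first ATG in a frame to the next in-frame stop codon.

```python
# Minimal six-frame ORF scan (illustrative sketch; names are hypothetical).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield (start, end) for each ATG...stop ORF in one reading frame.

    `end` is exclusive and includes the stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def six_frame_orfs(seq):
    """Return (strand, frame, orf_dna) for all ORFs in the 6 reading frames."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):  # frames 1, 2 and 3 on this strand
            for start, end in orfs_in_frame(s, offset):
                results.append((strand, offset + 1, s[start:end]))
    return results


print(six_frame_orfs("CCATGAAATAGGG"))
```

Only three offsets are needed per strand because starting at base 4 reads the same codons as frame 1, minus the first one.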
6
Is the ORF a gene? First check the length of the ORF [consider that the smallest protein is about 20 aa in length]. Check for the presence of promoters upstream of the ORF (TATAAT sequence…). Search for genes with similar aa sequences to the candidate gene. Gene finding in prokaryotes and eukaryotes takes different approaches, which take into account the difference in their gene structure.
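The first two checks can be sketched as simple filters. This is a rough illustration under stated assumptions: the 20 aa threshold comes from the slide, while the 50 bp upstream `window`, the exact-match motif search and all function names are hypothetical choices made for the example.

```python
# Illustrative filters for a candidate ORF (thresholds and names are assumptions).
MIN_AA = 20  # minimum protein length suggested on the slide


def orf_length_aa(orf_dna):
    """Number of amino acids encoded, excluding the stop codon."""
    return len(orf_dna) // 3 - 1


def has_upstream_motif(seq, orf_start, motif="TATAAT", window=50):
    """True if `motif` occurs within `window` bases upstream of the ORF start."""
    upstream = seq[max(0, orf_start - window):orf_start]
    return motif in upstream


def is_candidate_gene(seq, orf_start, orf_end):
    """Apply the length check and the upstream promoter-motif check."""
    long_enough = orf_length_aa(seq[orf_start:orf_end]) >= MIN_AA
    return long_enough and has_upstream_motif(seq, orf_start)


# A toy sequence: promoter-like motif, then an ORF of 21 codons plus a stop.
toy = "GG" + "TATAAT" + "CCCCC" + "ATG" + "AAA" * 20 + "TAA"
print(is_candidate_gene(toy, 13, len(toy)))
```

The third check, searching for similar amino acid sequences, needs a database search and is not shown here.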
7
ORFs in prokaryotic genes
In prokaryotic genes the ORF or protein coding sequence begins with a start codon and ends with a stop codon. Gene density is about 1 per kilobase, i.e. an ORF every 1000 bases. In some cases the gene density can cause the stop codon of one gene to overlap with the promoter of another [Zvelebil chapter 9]. E.g. within the lac operon there are 3 genes (CDS), all in close proximity, so the ATG of lacY is close to the stop codon of lacZ….
8
Review of Eukaryotic gene expression
Eukaryotic expression showing exons/ introns…, adapted from Zhang 2002
9
ORFs in Eukaryotes Gene density is much lower; genes are further apart and density can vary significantly between chromosomes (~1.5% of human DNA is CDS). ORFs contain introns between the coding sequences (CDS) of exons. Further detail can be found in Klug 2010. An added problem in interpreting the data: if an intron contains a stop codon sequence, e.g. “TAA”, it is only a sequence and not a functional stop codon. Further details on finding and predicting exons can be found in Baxevanis 2005.
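A toy example makes the intron problem concrete. In this sketch the sequence, the exon coordinates and the helper names (`first_stop`, `splice`) are all invented for illustration; real exon boundaries have to be predicted or taken from cDNA/EST evidence.

```python
# Toy illustration: an in-frame "stop" inside an intron truncates a naive ORF scan.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq):
    """Index of the first in-frame stop codon reading from position 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def splice(genomic, exons):
    """Join exon intervals (start, end-exclusive) to give the spliced sequence."""
    return "".join(genomic[s:e] for s, e in exons)


exon1 = "ATGGCCAAA"
intron = "GTATAAGTTCAG"      # starts GT, ends AG, and contains an in-frame TAA
exon2 = "GGACCCTGA"
genomic = exon1 + intron + exon2
exons = [(0, 9), (21, 30)]   # hypothetical annotation of the two exons

print(first_stop(genomic))                 # stop found inside the intron
print(first_stop(splice(genomic, exons)))  # true stop at the end of the CDS
```

Scanning the genomic DNA directly stops at the TAA inside the intron, whereas scanning the spliced sequence reaches the real stop codon at the end of the second exon.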
10
Finding Coding regions in Eukaryotes
Identify the TSS and the untranslated regions: like the coding region, the untranslated regions (UTRs) also contain exons and introns. There are UTRs on both sides of the CDS (at both the 5’ and 3’ ends of the mRNA) and they play a part in regulating translation via degradation, attachment to the ribosome, and promoting or inhibiting translation. Identify start and stop signals (Zhang 2002; Chasin 2007): initial exon (start and 5’ splice site); internal exon (3’ and 5’ splice sites); terminal exon (3’ splice site and stop codon). There is compositional bias in the coding regions and also at splice sites. Database pattern searches can also be used, where it is assumed that coding regions have a higher degree of conservation than non-coding regions. It is important to be aware that the lengths of exons and introns may not be multiples of 3 [Zvelebil chapter 9 and Baxevanis chapter 5].
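Because exon lengths need not be multiples of 3, a codon can be split across an exon boundary. One common way to track this is the "phase" of each coding exon, here following the convention used in GFF3 annotation files (the number of bases to skip at the start of the exon to reach the first complete codon). The sketch below is illustrative only; the function name and the example lengths are made up.

```python
# Illustrative calculation of exon phase (GFF3-style convention; names are hypothetical).
def exon_phases(coding_exon_lengths):
    """Return the phase (0, 1 or 2) of each coding exon, in transcript order.

    Phase is the number of bases at the start of the exon that belong to a
    codon begun in the previous exon.
    """
    phases = []
    consumed = 0  # coding bases seen so far
    for length in coding_exon_lengths:
        phases.append((3 - consumed % 3) % 3)
        consumed += length
    return phases


lengths = [7, 5, 6]          # none of the first two is a multiple of 3
print(exon_phases(lengths))  # phase of each exon
print(sum(lengths) % 3)      # the full CDS is still a whole number of codons
```

In this example the first exon ends one base into a codon, so the second exon must supply the remaining two bases before a new codon starts.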
11
Promoter Analysis The existence of a “potential” ORF indicates the presence of a nearby promoter. Promoters are elements upstream of the protein coding sequence that are essential in the transcription process and exist in both eukaryotic and prokaryotic organisms. The figure below illustrates a number of eukaryotic promoters and their variability [Klug 7th ed]. However, it also illustrates the common features: TATA box…
12
Promoter Analysis In Prokaryotes:
The TATAAT region (Pribnow box) lies just upstream of the TSS (transcription start site), at about -10 bp. A further marker, TTGACA, may also be found about 25 bp further upstream of this position (-35 bp). In Eukaryotes there are 3 subsections of the promoter. The core/basal promoter (~80 bp from the TSS) (Klug p. 321): in most cases it contains a TATA box (25 bp upstream of the TSS); many contain a CAAT box and are rich in GC elements. The proximal/upstream promoter (~250 bp from the TSS): there is wide variation in this region from one gene to another. The distal promoter (much further upstream).
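A simple pattern search for the two prokaryotic markers could look like the sketch below. It is only an illustration: it matches the exact consensus hexamers, whereas real promoters usually deviate from them, and the allowed gap between the two boxes (`min_gap`, `max_gap`) and the function name are assumptions made for the example.

```python
# Illustrative search for a -35 box (TTGACA) followed by a -10 box (TATAAT).
import re


def find_promoter_like(seq, min_gap=15, max_gap=19):
    """Return (start, end) of each TTGACA ... TATAAT match in `seq`.

    `min_gap` and `max_gap` bound the number of bases between the two
    hexamers; the defaults are an assumption, not a value from the slides.
    """
    pattern = re.compile(r"TTGACA[ACGT]{%d,%d}TATAAT" % (min_gap, max_gap))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGGG"
print(find_promoter_like(toy))
```

In practice, tools score matches against a position weight matrix rather than requiring an exact hexamer, so a search like this will miss many genuine promoters.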
13
Promoter Analysis The identification of a core promoter indicates the presence of a gene and vice versa, so the predictions of both complement each other to an extent. Promoter characterisation (discovering transcription factor binding patterns) takes two basic approaches (Chapter 5, Baxevanis 2005). Pattern-driven algorithms: these depend on existing annotated data, in bioinformatics databases, that relate to binding sites. Sequence-driven algorithms: these assume that common promoter functionality can be inferred from underlying conserved sequences. Genes that are co-regulated or co-expressed provide good candidates for obtaining data for this approach.
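The sequence-driven idea can be illustrated with a very small sketch: given the upstream regions of a set of co-regulated genes, report the short words (k-mers) that occur in every one of them. The function name, the choice of k = 6 and the toy sequences are assumptions for illustration; published methods use statistical models rather than exact shared words.

```python
# Illustrative sequence-driven search: k-mers present in every upstream region.
from collections import Counter


def shared_kmers(sequences, k=6):
    """Return the sorted k-mers that occur in all of the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so each k-mer is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return sorted(kmer for kmer, n in counts.items() if n == len(sequences))


upstream_regions = ["GGTATAATCC", "ACTATAATGA", "TTTATAATTT"]  # toy data
print(shared_kmers(upstream_regions))
```

A pattern-driven search would instead start from a known binding site in a database and look for it in the new sequence.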
14
Potential exam questions
Open reading frames (ORFs) are an essential part of finding genes in genomes: discuss how you would attempt to find ORFs and why such ORFs are a more accurate prediction of protein structure in bacterial cells as opposed to animal cells.
A critical part of finding the protein coding regions of DNA sequences is the discovery of open reading frames (ORFs). Discuss the difficulties associated with finding such sequences in eukaryotic cells.
15
References
Baxevanis, A.D. Bioinformatics: a practical guide to the analysis of genes and proteins. Wiley; Chapter 5. [book is in the library]
Kel, A.E. et al. 2003. MATCH™: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. July 1; 31(13): 3576–3579.
Klug, W.A. et al. 2010. Concepts of Genetics. Pearson Education, p. 596–597.
Zhang, M.Q. Computational prediction of eukaryotic coding genes. Nat Rev Genet.
Chasin, L.A. Searching for splicing motifs. Adv Exp Med Biol. 623:85-106.
Zvelebil, M. Understanding Bioinformatics, chapter 9. [book is in the library]