1
Gene architecture and sequence annotation
Week 2
2
Last week: how to search genomic databases such as NCBI and Ensembl
How to obtain sequence files
3
This week we will learn to identify genetic architecture within sequence files
Sequence of the Cystic Fibrosis Gene: CFTR
4
This week we will learn the differences between the two types of nucleic acid sequences
Genomic sequences: the sequence of nucleotides on a chromosome
Expressed sequences: the sequence of nucleotides in mRNA/cDNA
5
The expression of genomic information
DNA → RNA → protein
Bioinformatics and Functional Genomics, 2nd Edition. (2014).
6
DNA → RNA → protein
genome → transcriptome → proteome
Bioinformatics and Functional Genomics, 2nd Edition. (2014).
7
DNA → RNA → protein → phenotype
Databases: genomic DNA databases (DNA); cDNA, ESTs, UniGene (RNA); protein sequence databases (protein)
Bioinformatics and Functional Genomics, 2nd Edition. (2014).
8
Learning Objectives:
Understand sequence differences between genomic and expressed sequences
Use programs to determine the correct open reading frame (ORF) of an expressed sequence
Annotate sequence files
9
Genomic DNA is one source of nucleic acid sequence
Strachan, T. & Read, A.P. Human Molecular Genetics. (New York; Wiley-Liss, 1999).
10
The chemical properties of DNA are important for sequence analysis
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
11
DNA is composed of two anti-parallel strands
5’ is the beginning of the sequence and 3’ is the end of the sequence.
By convention, a DNA sequence is written with 5’ on the left and 3’ on the right.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
13
DNA is composed of two anti-parallel strands
5’ is the beginning of the sequence and 3’ is the end of the sequence.
By convention, a DNA sequence is written with 5’ on the left and 3’ on the right.
Strand 1: 5’ GAT…
Strand 2: 5’ AGT…
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
14
DNA has strict base pairing rules that determine the sequence of the complementary strand
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
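The base pairing rules (A with T, G with C) can be sketched in a few lines of Python. This is an illustration only, not part of the original slides; the name `complement` and the example sequence are assumptions made for the sketch.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-paired partner of each nucleotide, position by position.

    The result stays aligned to the input, so if the input reads 5'->3',
    the output reads 3'->5'.
    """
    return "".join(PAIRING[base] for base in strand.upper())

print(complement("GATTACA"))  # CTAATGT (read 3'->5')
```

Because the two strands are anti-parallel, the output must be reversed before it is written in the usual 5’ to 3’ direction.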
15
Transcription is the process of making RNA from a DNA template
Bioinformatics and Functional Genomics, 2nd Edition. (2014).
16
During transcription an RNA molecule is synthesized from genomic DNA
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
17
RNA polymerase adds bases to the 3’ end of the growing RNA molecule
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
18
The rules of complementary base pairing are followed during RNA transcription
During RNA transcription, uracil (U) is incorporated instead of thymine (T). Uracil base pairs with adenine.
In bioinformatics we ignore this fact: every U is written as T in sequence files.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
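The T/U convention amounts to a simple character substitution. A minimal sketch, assuming the input is a sense-strand sequence in upper or lower case (the function names are hypothetical):

```python
def dna_to_rna(sense_dna: str) -> str:
    """Write a sense-strand DNA sequence as RNA by replacing T with U."""
    return sense_dna.upper().replace("T", "U")

def rna_to_dna(rna: str) -> str:
    """Write an RNA sequence in the DNA alphabet used by sequence files."""
    return rna.upper().replace("U", "T")

print(dna_to_rna("ATGGTC"))  # AUGGUC
print(rna_to_dna("AUGGUC"))  # ATGGTC
```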
19
The template strand is anti-parallel to the growing mRNA molecule
Template strand = antisense strand
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
20
The template strand is anti-parallel to the growing mRNA molecule
Non-template strand = sense strand. This strand has the same sequence as the mRNA molecule (with T in place of U).
Template strand = antisense strand.
Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
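The relationship between the template strand, the sense strand, and the mRNA can be checked with a short sketch. The two example strands below are made up for illustration, and `transcribe_template` is a hypothetical helper, not a function from any library.

```python
# RNA partner of each DNA base on the template strand.
PAIRING_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 5'->3'.

    RNA polymerase reads the template 3'->5', so we walk the template
    backwards and pair each base with its RNA partner.
    """
    return "".join(PAIRING_RNA[b] for b in reversed(template_5to3.upper()))

sense = "ATGGTCTTA"     # non-template (sense) strand, 5'->3'
template = "TAAGACCAT"  # template (antisense) strand, 5'->3'
mrna = transcribe_template(template)
print(mrna)                             # AUGGUCUUA
print(mrna.replace("U", "T") == sense)  # True
```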
21
Genes can be found on both strands of a chromosome
Forward strand: 5’ end on the left
Reverse strand: 5’ end on the right
22
The original RNA molecule undergoes processing that changes the sequence
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
23
The original RNA molecule is processed
Exons are segments of a gene whose sequence is found in the mature mRNA.
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
24
The original RNA molecule is processed
Introns are segments of a gene that are transcribed and then removed through splicing. They are not found in the mature mRNA.
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
25
The original RNA molecule is processed
The sequence in red is the coding sequence (often abbreviated CDS) Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
27
In the mRNA the exons are joined together as one continuous sequence
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
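Joining exons into one continuous sequence can be sketched as follows. The toy transcript and its exon coordinates are invented for illustration (exons in upper case, introns in lower case), and `splice` is a hypothetical helper.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments into one continuous sequence.

    `exons` holds 1-based, inclusive (start, end) positions, in order.
    """
    return "".join(pre_mrna[start - 1:end] for start, end in exons)

# Toy transcript: exons in upper case, introns in lower case.
pre_mrna = "ATGGCC" + "gtaagtttcag" + "GATTAC" + "gtgagcag" + "TAA"
exons = [(1, 6), (18, 23), (32, 34)]
print(splice(pre_mrna, exons))  # ATGGCCGATTACTAA
```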
28
Translation is the process by which an mRNA molecule is used to make a protein
+1 is the first translated nucleotide, usually the A of the ATG start codon (ATG = methionine)
29
Translation is the process by which an mRNA molecule is used to make a protein
The red indicates all the sequence within the mRNA that will be used during translation to code for protein
30
The sequences within an mRNA that do not directly code for protein are called Untranslated Regions
5’ UTR: untranslated region before the start codon; does not code for protein
3’ UTR: untranslated region after the stop codon; does not code for protein
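The three parts of an mRNA can be separated with a short sketch. It rests on a simplifying assumption that is labeled in the code: the coding sequence starts at the first ATG and ends at the first in-frame stop codon. The example sequence and the name `split_mrna` are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Split an mRNA/cDNA sequence into (5' UTR, CDS, 3' UTR).

    Simplifying assumption: the CDS starts at the first ATG and runs to
    the first in-frame stop codon.
    """
    start = mrna.find("ATG")
    if start == -1:
        raise ValueError("no start codon found")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # include the stop codon in the CDS
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")

utr5, cds, utr3 = split_mrna("GGCACC" + "ATGGCTTGGTGA" + "CCTTAAA")
print(utr5, cds, utr3)  # GGCACC ATGGCTTGGTGA CCTTAAA
```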
31
mRNA is converted to cDNA using reverse transcription
Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).
32
Because it is cDNA, not mRNA, that is sequenced, we use T rather than U in sequence files
Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).
33
How do we identify introns/exons in our sequence files?
34
We will use KRAS as an example
35
The KRAS gene produces 4 transcripts (splice variants)
36
This is the transcript diagram for this gene region
37
The Transcript Diagram shows the organization of the transcripts generated from the gene locus
38
Use the link under the “Transcript ID” column to identify the exons and introns in a specific transcript
39
The exon/intron map for a specific transcript
The lines are intronic sequence
40
The exon/intron map for a specific transcript
The lines are intronic sequence.
Bars are exonic sequence: filled bars are coding sequence and unfilled bars are UTR sequence.
41
The exon/intron map for a specific transcript
The number of introns is always the number of exons minus 1: 5 exons means 4 introns.
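The exons-minus-one rule follows from introns being the gaps between consecutive exons. A minimal sketch, using hypothetical coordinates rather than the real KRAS exon positions:

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the (start, end) of each intron lying between consecutive exons.

    Coordinates are 1-based and inclusive; exons must be sorted and
    non-overlapping.
    """
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]

# Hypothetical coordinates for a 5-exon transcript.
exons = [(1, 120), (500, 620), (900, 1010), (1500, 1660), (2000, 2300)]
introns = introns_from_exons(exons)
print(len(exons), len(introns))  # 5 4
print(introns[0])                # (121, 499)
```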
42
The RefSeq link will direct you to the NCBI nucleotide record for that gene
43
NCBI nucleotide record
44
NCBI nucleotide record continued
45
NCBI nucleotide record also contains the sequence
46
Every nucleotide within the sequence has an exact position
Each nucleotide has a number associated with its position
47
NCBI nucleotide contains the annotation of the sequence
48
The numbers refer to nucleotide positions
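Feature positions in a nucleotide record are written as intervals such as `4..39`, counted from 1 and including both ends. A sketch of pulling a feature out of a sequence by its positions is shown here; the coordinates come from the short example sequence used later in these slides, not from a real record, and `extract_feature` is a hypothetical helper.

```python
def extract_feature(sequence: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive), as in a feature table."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
feature = extract_feature(seq, 4, 39)
print(feature)       # begins at the ATG at position 4
print(len(feature))  # 36
```

Python counts from 0 and excludes the end of a slice, which is why the start position is shifted by one and the end position is not.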
49
Viewing features within the sequence file
50
Once you select a sequence feature, the nucleotide sequence of the feature becomes highlighted
51
CDS stands for coding sequence; selecting this feature also shows the translation of the nucleotide sequence into an amino acid sequence
52
The genetic code
DNA → RNA → protein
Bioinformatics and Functional Genomics, 2nd Edition. (2014).
53
The genetic code is based on three nucleotides “coding” for one amino acid
These three-nucleotide units are called codons.
Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).
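Translation by codon lookup can be sketched as below. The table is the standard genetic code written in the compact T, C, A, G ordering; the function name `translate` and the use of `*` for a stop codon are choices made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

print(translate("ATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"))  # MVLRWSSHGSV*
```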
54
An Open Reading Frame (ORF) begins with ATG and ends with TAA, TAG or TGA
Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).
55
To find the coding sequence you must identify the start and stop codons within the sequence
56
Which start codon is right?
57
Which start codon is right?
The correct ORF is usually the longest translated sequence
58
Any sequence has 6 possible reading frames
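Two strands of DNA
Triplet code (three nucleotides in a codon), so each strand can be read from three starting positions
2 strands × 3 starting positions = 6 reading frames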
59
Any sequence has 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
5’ C GCA TGG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTT AA 3’ FRAME +2
5’ CG CAT GGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TTA A 3’ FRAME +3
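The three forward frames differ only in where the first codon starts. A small sketch that splits the slide's example sequence into complete codons for each frame (the helper name `codons_in_frame` is an assumption):

```python
def codons_in_frame(sequence: str, frame: int) -> list[str]:
    """Split a sequence into complete codons for forward frame 1, 2 or 3."""
    offset = frame - 1
    return [sequence[i:i + 3] for i in range(offset, len(sequence) - 2, 3)]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
for frame in (1, 2, 3):
    print(f"+{frame}", " ".join(codons_in_frame(seq, frame)))
```

Leftover nucleotides at either end that do not fill a codon are dropped rather than printed.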
60
The next three reading frames are based on the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
61
Generating the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
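The two steps shown here (complement, then reverse) can be sketched in Python. The name `reverse_complement` is an assumption for this example; it uses only built-in string methods.

```python
# Map each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse so the result reads 5'->3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(reverse_complement(seq))  # TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG
```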
62
The 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
5’ TTA AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC CAT GCG 3’ FRAME -1
5’ T TAA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACC ATG CG 3’ FRAME -2
5’ TT AAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CCA TGC G 3’ FRAME -3
63
The correct reading frame will have the largest ORF
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
M V L R W S S H G S V Ter (amino acids)
An ORF always begins with ATG: ATG (M) is the start codon.
An ORF always ends with a stop codon: TAA, TAG or TGA are the three stop codons, and they do not code for an amino acid.
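The search for the largest ORF across all six frames can be sketched as follows. This is a simplified illustration, not how the ORF-finder program itself is implemented: it only reports ORFs that run from an ATG to the first in-frame stop codon, and the frame labels follow the +1 to +3 and -1 to -3 convention used in these slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_orfs(sequence: str) -> list[tuple[str, str]]:
    """Return (frame, orf) pairs for every ATG...stop ORF in all six frames."""
    sequence = sequence.upper()
    strands = {"+": sequence, "-": sequence.translate(COMPLEMENT)[::-1]}
    orfs = []
    for sign, strand in strands.items():
        for offset in range(3):
            start = None  # index of the ATG that opened the current ORF
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((f"{sign}{offset + 1}", strand[start:i + 3]))
                    start = None
    return orfs

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
orfs = find_orfs(seq)
frame, longest = max(orfs, key=lambda item: len(item[1]))
print(frame, longest)  # +1 ATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA
```

For this sequence the sketch also finds a shorter ORF in frame -1 (ATG AGA GCT CCA GCG TAA), which is why comparing lengths across frames matters.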
64
Using the ORF-finder program to identify ORFs
Or Google “ORF-finder”
65
Using ORF-finder
66
Using ORF-finder
67
Using ORF-finder
68
Results from ORF-finder
69
There are 6 possible reading frames
70
For our purposes, the largest ORF is the correct one
71
Selecting an ORF gives you the translation
72
ORFs begin with a start codon and end with a stop codon
73
ORF-finder results match with NCBI nucleotide
74
Some sequences found in the genomic DNA are removed from the mRNA
75
Some sequences found in the genomic DNA are removed from the mRNA
Introns are the sequences that are removed.
The mature mRNA sequence contains only exonic sequence.
76
An mRNA sequence includes a 5’ UTR, an ORF, and a 3’ UTR
5’ UTR: untranslated region before the start codon; does not code for protein
Coding sequence (red)
3’ UTR: untranslated region after the stop codon; does not code for protein
77
There are 6 possible reading frames in a nucleic acid sequence
78
The correct ORF is usually the largest
79
ORFs start with ATG and end with a stop codon
80
Worksheet