Gene architecture and sequence annotation


Gene architecture and sequence annotation Week 2

Last week: how to search genomic databases such as NCBI and Ensembl; how to obtain sequence files.

This week we will learn to identify genetic architecture within sequence files. Sequence of the cystic fibrosis gene: CFTR

This week we will learn the differences between the two types of nucleic acid sequences. Genomic: the sequence of nucleotides on a chromosome. Expressed sequences: the sequence of nucleotides in mRNA/cDNA.

The expression of genomic information: DNA → RNA → protein. Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

DNA → RNA → protein; genome → transcriptome → proteome. Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

DNA → RNA → protein → phenotype. Databases: genomic DNA databases (DNA); cDNA, ESTs, UniGene (RNA); protein sequence databases (protein). Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

Learning Objectives: understand sequence differences between genomic and expressed sequences; use programs to determine the correct open reading frame (ORF) of an expressed sequence; annotate sequence files.

Genomic DNA is one source of nucleic acid sequence. Strachan, T. & Read, A.P. Human Molecular Genetics. (New York; Wiley-Liss, 1999).

The chemical properties of DNA are important for sequence analysis. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

DNA is composed of two anti-parallel strands. 5’ is the beginning of the sequence and 3’ is the end of the sequence. DNA sequence is always written with 5’ at the left side and 3’ at the right side. Strand 1: 5’ GAT… Strand 2: 5’ AGT… Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

DNA has strict base pairing rules that determine the sequence of the complementary strand. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
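
The pairing rules and the anti-parallel orientation can be checked together in a few lines. This is a minimal Python sketch that is not part of the original slides; the function name `can_pair` and the example strands are made up for illustration, and only upper-case A, C, G, T are handled.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def can_pair(strand1: str, strand2: str) -> bool:
    """True if two strands, both written 5'->3', form a fully paired duplex."""
    if len(strand1) != len(strand2):
        return False
    # Anti-parallel: the first base of strand1 faces the last base of strand2.
    return all(PAIR.get(a) == b for a, b in zip(strand1, reversed(strand2)))

print(can_pair("GATTACA", "TGTAATC"))  # True: strand2 is read in the opposite direction
print(can_pair("GATTACA", "CTAATGT"))  # False: complementary bases, but wrong orientation
```

The second call fails because both strands are written 5’ to 3’, so the partner strand has to be reversed before its bases line up.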

Transcription is the process of making RNA from a DNA template. Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

During transcription an RNA molecule is synthesized from genomic DNA. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

RNA polymerase adds bases to the 3’ end of the growing RNA molecule. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

The rules of complementary base pairing are followed for RNA transcription. During transcription, uracil (U) is incorporated instead of thymine (T). Uracil base pairs with adenine. In bioinformatics we ignore this fact: every U is written as T. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

The template strand is anti-parallel to the growing mRNA molecule. Template strand = antisense strand. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).

The template strand is anti-parallel to the growing mRNA molecule. Template strand = antisense strand. Non-template strand = sense strand; this strand has the same sequence as the mRNA molecule. Cooper, G.M. The Cell: A Molecular Approach (Sunderland; Sinauer Associates, 2000).
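
The relationship between template strand, mRNA and sense strand can be illustrated with a short Python sketch. It is not from the original slides; the function names `transcribe` and `as_sequence_file` and the six-base example are assumptions chosen for illustration.

```python
# RNA base that pairs with each DNA base on the template strand.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3: str) -> str:
    """Return the mRNA (5'->3') copied from a template strand written 5'->3'."""
    # The polymerase reads the template 3'->5', so walk it in reverse.
    return "".join(RNA_PARTNER[base] for base in reversed(template_5to3))

def as_sequence_file(mrna: str) -> str:
    """Write an RNA sequence the way sequence files do: U shown as T."""
    return mrna.replace("U", "T")

sense = "ATGGTC"      # non-template (sense) strand, 5'->3'
template = "GACCAT"   # template (antisense) strand, 5'->3'

mrna = transcribe(template)
print(mrna)                    # AUGGUC
print(as_sequence_file(mrna))  # ATGGTC, identical to the sense strand
```

Once U is written as T, the mRNA reads exactly like the sense strand, which is why sequence files report the sense strand.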

Genes can be found on both strands of a chromosome: the forward strand and the reverse strand, each read from its own 5’ end.

The original RNA molecule undergoes processing that changes the sequence. Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

The original RNA molecule is processed. Exons are segments of DNA that are found in mature mRNA. Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

The original RNA molecule is processed. Introns are segments of DNA that are removed through splicing. They are not found in mRNA. Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

The original RNA molecule is processed. The sequence in red is the coding sequence (often abbreviated CDS). Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

In the mRNA the exons are joined together as one continuous sequence. Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
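
Joining exons into one continuous sequence can be sketched in Python as follows. This is an illustration only, not part of the original slides: the function name `splice`, the toy transcript and its exon coordinates are all invented, and positions are assumed to be 1-based and inclusive, as in sequence records.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as 1-based, inclusive (start, end) positions."""
    return "".join(pre_mrna[start - 1:end] for start, end in sorted(exons))

# Toy transcript: exons in upper case, the single intron in lower case.
pre_mrna = "GGATGGC" + "gtaagtttcag" + "CTTATAAGG"
exons = [(1, 7), (19, 27)]

print(splice(pre_mrna, exons))  # GGATGGCCTTATAAGG
```

Everything between position 7 and position 19 (the intron) is absent from the result.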

Translation is the process by which an mRNA molecule is used to make a protein. +1 is the first translated nucleotide, usually the A of the start codon ATG (ATG = methionine).

Translation is the process by which an mRNA molecule is used to make a protein. The red indicates all the sequence within the mRNA that will be used during translation to code for protein.

The sequences within an mRNA that do not directly code for protein are called Untranslated Regions (UTRs). 5’ UTR: untranslated region before the start codon; does not code for protein. 3’ UTR: untranslated region after the stop codon; does not code for protein.
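
Given the position of the coding sequence, the two UTRs are simply what lies before and after it. The Python sketch below is illustrative and not from the original slides; `split_mrna` and the toy cDNA are hypothetical, and the CDS coordinates are assumed to be 1-based and inclusive.

```python
def split_mrna(cdna: str, cds_start: int, cds_end: int) -> dict[str, str]:
    """Split a cDNA into 5' UTR, CDS and 3' UTR using 1-based, inclusive CDS positions."""
    return {
        "5'UTR": cdna[:cds_start - 1],       # everything before the start codon
        "CDS": cdna[cds_start - 1:cds_end],  # start codon through stop codon
        "3'UTR": cdna[cds_end:],             # everything after the stop codon
    }

# Toy cDNA: 6-base 5' UTR, a 9-base CDS (ATG GCC TAA), 6-base 3' UTR.
cdna = "GGCACC" + "ATGGCCTAA" + "TTTGCA"
parts = split_mrna(cdna, 7, 15)
print(parts)
```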

mRNA is converted to cDNA using reverse transcription. Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

Because it is cDNA, not mRNA, that is sequenced, we use T, not U, in sequence files. Alberts, B. et al. Molecular Biology of the Cell (New York; Garland, 1994).

How do we identify introns/exons in our sequence files?

We will use KRAS as an example

The KRAS gene produces 4 transcripts (splice variants).

This is the transcript diagram for this gene region

The Transcript Diagram shows the organization of the transcripts generated from the gene locus

Use the link under the “Transcript ID” column to identify the exons and introns in a specific transcript

The exon/intron map for a specific transcript. The lines are intronic sequence.

The exon/intron map for a specific transcript. The lines are intronic sequence. Bars are exonic sequence: filled bars are coding sequence and unfilled bars are UTR sequence.

The exon/intron map for a specific transcript. The number of introns is always the number of exons minus 1: 5 exons means 4 introns.
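
The "exons minus 1" rule follows from introns being the gaps between consecutive exons. The Python sketch below makes that explicit; it is not from the original slides, the function name `introns_from_exons` is hypothetical, and the coordinates are invented for illustration (they are not the real KRAS positions).

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return intron (start, end) positions as the gaps between consecutive exons."""
    exons = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]

# Five made-up exons, 1-based inclusive coordinates.
exons = [(1, 120), (501, 620), (901, 1000), (1501, 1650), (2001, 2300)]
introns = introns_from_exons(exons)

print(introns)                   # [(121, 500), (621, 900), (1001, 1500), (1651, 2000)]
print(len(exons), len(introns))  # 5 4
```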

The RefSeq link will direct you to the NCBI nucleotide record for that gene

NCBI nucleotide record

NCBI nucleotide record continued

NCBI nucleotide record also contains the sequence

Every nucleotide within the sequence has an exact position. Each nucleotide has a number associated with its position.

NCBI nucleotide contains the annotation of the sequence

The numbers refer to nucleotide positions
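
Because annotation positions are counted from 1 and include both end points, a feature can be cut out of the sequence by a simple slice. The following Python sketch is an illustration, not part of the original slides: `extract_feature` is a hypothetical helper that only understands a plain `start..end` location, whereas real records may also use forms such as `join(...)` or `complement(...)`, which are not handled here.

```python
def extract_feature(sequence: str, location: str) -> str:
    """Return the subsequence for a simple 'start..end' location (1-based, inclusive)."""
    start, end = (int(part) for part in location.split(".."))
    # Python slices are 0-based and exclude the end, hence start - 1 and end.
    return sequence[start - 1:end]

sequence = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(extract_feature(sequence, "4..39"))  # ATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA
```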

Viewing features within the sequence file

Once you select a sequence feature, the nucleotide sequence of the feature becomes highlighted

CDS stands for coding sequence. Selecting the CDS feature also shows the translation of the nucleotide sequence into an amino acid sequence

The genetic code: DNA → RNA → protein. Bioinformatics and Functional Genomics, 2nd Edition. http://www.bioinfbook.org (2014).

The genetic code is based on three nucleotides “coding” for one amino acid. These triplets are called codons. Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).

An Open Reading Frame (ORF) begins with ATG and ends with TAA, TAG or TGA. Korf, I., Yandell, M. & Bedell, J. BLAST: An Essential Guide to the Basic Local Alignment Search Tool (Sebastopol; O’Reilly, 2003).
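
Reading an ORF codon by codon can be sketched in Python as below. This is an illustration rather than anything from the original slides: the names `CODON_TABLE` and `translate` are assumptions, the table is the standard genetic code with `*` standing for a stop codon, and the input is assumed to be upper-case DNA (T instead of U) that already starts at the ATG.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)

print(translate("ATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"))  # MVLRWSSHGSV*
```

The example ORF starts with ATG (M) and ends with TAA, so the translation begins with methionine and ends with the stop symbol.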

To find the coding sequence you must identify the start and stop codons within the sequence

Which start codon is right?

Which start codon is right? The correct ORF is the longest translated sequence.

Any sequence has 6 possible reading frames: two strands of DNA, each read in three frames because of the triplet code (three nucleotides in a codon).

Any sequence has 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
5’ C GCA TGG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTT AA 3’ FRAME +2
5’ CG CAT GGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TTA A 3’ FRAME +3

The next three reading frames are based on the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement

Generating the reverse complement sequence
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
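
The two steps (complement every base, then reverse the result so it reads 5’ to 3’) map directly onto a couple of lines of Python. This is a minimal sketch, not part of the original slides; the function names are chosen for illustration and only upper-case A, C, G, T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Complement each base; the result reads 3'->5' if the input reads 5'->3'."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Complement, then reverse, so the result is again written 5'->3'."""
    return complement(seq)[::-1]

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
print(complement(seq))          # GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT
print(reverse_complement(seq))  # TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG
```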

The 6 possible reading frames
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
3’ GCGTACCAGAATGCGACCTCGAGAGTACCTAGCCAAATT 5’ Complement Sequence
5’ TTAAACCGATCCATGAGAGCTCCAGCGTAAGACCATGCG 3’ Reverse Complement
5’ TTA AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC CAT GCG 3’ FRAME -1
5’ T TAA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACC ATG CG 3’ FRAME -2
5’ TT AAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CCA TGC G 3’ FRAME -3
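
All six frames can be listed programmatically. The Python sketch below is an illustration and not part of the original slides; `six_frames` is a hypothetical helper, and unlike the slide it keeps only complete codons, dropping the one or two leftover bases at either end of a frame.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a sequence into complete codons for frames +1..+3 and -1..-3."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames

frames = six_frames("CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA")
for name, codons in frames.items():
    print(name, " ".join(codons))
```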

The correct reading frame will have the largest ORF
5’ CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA 3’
5’ CGC ATG GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT TAA 3’ FRAME +1
       M   V   L   R   W   S   S   H   G   S   V   Ter (amino acids)
Always begins with ATG: ATG (M) is the start codon.
Always ends with a stop codon: TAA, TAG or TGA are the three stop codons; they do not code for an amino acid.
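
The "largest ORF" rule can be tried out on the example sequence with a small Python sketch. It is not part of the original slides and is not how any particular ORF-finding program is implemented; `find_orfs` is a hypothetical helper that reports every stretch running from the first ATG in a frame to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq: str) -> list[tuple[str, str]]:
    """Return (frame, orf_sequence) for every ATG...stop stretch in all six frames."""
    orfs = []
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i      # the first ATG opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((f"{sign}{offset + 1}", strand[start:i + 3]))
                    start = None   # closed; look for the next ORF in this frame
    return orfs

seq = "CGCATGGTCTTACGCTGGAGCTCTCATGGATCGGTTTAA"
orfs = find_orfs(seq)
longest = max(orfs, key=lambda orf: len(orf[1]))
print(orfs)
print(longest)
```

For this sequence the sketch finds two complete ORFs, one in frame +1 and a shorter one in frame -1, and the longest is the frame +1 ORF shown on the slide.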

Using the ORF-finder program to identify ORFs: http://www.ncbi.nlm.nih.gov/gorf/gorf.html (or Google “ORF-finder”)

Using ORF-finder

Results from ORF-finder

There are 6 possible reading frames

For our purposes, the largest ORF is the correct one

Selecting an ORF gives you the translation

ORFs begin with a start codon and end with a stop codon

ORF-finder results match with NCBI nucleotide

Some sequences found in the genomic DNA are removed from the mRNA

Some sequences found in the genomic DNA are removed from the mRNA. Introns are the sequences that are removed. The mature mRNA sequence contains only exonic sequence.

An mRNA sequence includes 5’ UTR, ORF and 3’ UTR. Coding sequence (red). 5’ UTR: untranslated region before the start codon; does not code for protein. 3’ UTR: untranslated region after the stop codon; does not code for protein.

There are 6 possible reading frames in a nucleic acid sequence

The correct ORF is usually the largest

ORFs start with ATG and end with a stop codon

Worksheet