BIOS816/VBMS818 Lecture 7 – Gene Prediction
Guoqing Lu
Office: E115 Beadle Center
Tel: (402) 472-4982
Website:

Genes
Protein-coding genes
– ORF
– Regulatory signals (depend on organism)
RNA genes
– rRNA
– tRNA
– snRNA, others…

Prokaryotic Gene Expression
[Figure: Promoter, Cistron 1, Cistron 2, …, Cistron N, Terminator. Transcription by RNA polymerase produces a single mRNA (5’ to 3’) carrying all N cistrons; translation (ribosome, tRNAs, protein factors) yields N polypeptides (N to C), one per cistron.]

Eukaryotic Gene Expression
[Figure: Enhancer, Promoter, transcribed region (Exon 1, Intron 1, Exon 2), Terminator. Transcription by RNA polymerase II produces a primary transcript (5’ to 3’) that is capped (m7G), spliced, cleaved and polyadenylated (AnAn), and transported to the cytoplasm; translation yields a polypeptide (N to C).]

Gene Finding
Comparative
– Compare your sequence to what is already known: BLASTN, BLASTX
Predictive: stitch together a consensus
– HMM, GRAIL…
– Frames, Testcode
– Findpatterns…
Empirical approach
– cDNA OR protein OR genetic evidence

ORF Characteristics
Primary characters
– Start codon (ATG)
– Stop codon (TAA, TAG, TGA)
Secondary characters
– Codon bias
– Biased nucleotide distribution
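The primary characters above are already enough for a naive ORF scan. The sketch below is a minimal illustration, not how GCG Frames, Vector NTI, or NCBI ORF Finder work internally: it reads all six frames, opens an ORF at the first ATG and closes it at the next in-frame TAA, TAG, or TGA. Nested ATGs and ORFs that run off the end of the sequence are not reported, and `min_codons` is an illustrative parameter of this sketch.

```python
# Naive six-frame ORF scan based on the primary characters:
# an ORF starts at ATG and ends at the next in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Return (strand, start, end, length_nt) tuples.

    Coordinates are 1-based and inclusive on the forward strand; for "-"
    ORFs start > end. length_nt includes the stop codon. min_codons counts
    sense codons (ATG included, stop excluded).
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # one past the last base of the stop codon
                    if (end - start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start + 1, end, end - start))
                        else:
                            # map reverse-complement positions back to the forward strand
                            orfs.append((strand, n - start, n - end + 1, end - start))
                    start = None
    return orfs


print(find_orfs("ATGAAATAG"))  # [('+', 1, 9, 9)]
```

Reporting minus-strand ORFs with start > end keeps every coordinate on the forward strand, which makes the start, stop, and length values easy to compare with the lac operon exercise at the end of the lecture.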

ORF finding tools
GCG
– Frames, Map
Vector NTI
– ORF
WWW tools
– ORF Finder (NCBI)
– …

Vector NTI – ORF
[Figure: ORFs of the lac operon, GI:]

Statistical analysis as a means to find genes
– ORF example
– Codon bias
– Fickett’s statistic

Codon Bias
The genetic code is degenerate
Codon usage varies
– from organism to organism
– from gene to gene
High bias correlates with a high level of expression
Bias correlates with tRNA isoacceptors
Change the bias or the tRNAs, and expression changes

Codon Bias: Gene Differences
[Figure: usage of the glycine codons GGG, GGA, GGT, and GGC in GAL4 vs. ADH1, from GCG CodonFrequency]

Codon Bias: Organism Differences
[Figure: codon usage compared across organisms (PcMl)]

Codon Bias Calculation
$$\text{Pref} = \frac{f(\text{codon}) \,/\, f(\text{synonymous family})}{f_{\text{random}}(\text{codon}) \,/\, f_{\text{random}}(\text{family})}$$
– Bias > 1 in the CORRECT frame
– Bias < 1 in an incorrect frame
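The Pref ratio can be sketched in a few lines. This is a minimal illustration under several assumptions of its own: the reference codon counts come from known coding sequence of the same organism, a pseudocount of 0.5 avoids log(0), "random" sequence has equal base composition (so each synonymous codon is expected at 1/k of its k-codon family), and per-codon Pref values are combined per frame as a geometric mean. The toy reference is hypothetical.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Synonymous families: amino acid (or stop '*') -> list of codons.
FAMILIES = {}
for codon, aa in CODE.items():
    FAMILIES.setdefault(aa, []).append(codon)


def codon_counts(reference: str) -> Counter:
    """Count in-frame codons of a reference coding sequence (frame 0)."""
    return Counter(reference[i:i + 3] for i in range(0, len(reference) - 2, 3))


def pref(codon: str, counts: Counter, pseudocount: float = 0.5) -> float:
    """(codon freq / family freq) divided by the same ratio in random sequence."""
    family = FAMILIES[CODE[codon]]
    family_total = sum(counts[c] + pseudocount for c in family)
    observed = (counts[codon] + pseudocount) / family_total
    expected = 1 / len(family)  # assumes equal base composition in random DNA
    return observed / expected


def frame_score(seq: str, frame: int, counts: Counter) -> float:
    """Geometric mean of Pref over all complete codons in one reading frame."""
    logs = [math.log(pref(seq[i:i + 3], counts))
            for i in range(frame, len(seq) - 2, 3)]
    return math.exp(sum(logs) / len(logs))


# Hypothetical toy reference gene that always uses the same codons.
reference = "ATGCTGAAAGGCCTG" * 5
counts = codon_counts(reference)
for frame in range(3):
    print(frame, round(frame_score(reference, frame, counts), 2))
```

On this toy input frame 0 scores above 1 and frame 1 below 1, matching the slide's rule. The sketch assumes uppercase ACGT only; any other character raises a `KeyError` in `CODE`.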

Codon-Biased Genes: Ribosomal Protein S2 (rpsB) and EF-Ts (tsf)
[Figure: codon bias by reading frame (Frame 2 and Frame 3 labelled) across rpsB and tsf]

Fickett’s Statistic
[Figure: Fickett’s statistic across rpsB and tsf]
– Analyzes the local nonrandomness at every third base of the sequence, in a frame-independent fashion
– Does not use codon frequency statistics
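Fickett’s statistic (TESTCODE) starts from two kinds of per-base measurements over a window: a position parameter, which measures how unevenly each base is spread over the three codon positions, and a content parameter (base composition). The sketch below computes only those raw inputs. The full statistic then converts each of the eight values to a coding probability and combines them with weights from lookup tables in Fickett’s original paper; those tables are not reproduced here.

```python
def position_parameters(seq: str) -> dict:
    """Position parameter for each base: max(n1, n2, n3) / (min(n1, n2, n3) + 1),
    where n_p counts the base at positions p, p+3, p+6, ... of the window."""
    seq = seq.upper()
    params = {}
    for base in "ACGT":
        n = [seq[phase::3].count(base) for phase in range(3)]
        params[base] = max(n) / (min(n) + 1)
    return params


def composition(seq: str) -> dict:
    """Content parameter: fraction of each base in the window."""
    seq = seq.upper()
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


print(position_parameters("ATGATGATG"))
print(position_parameters("TGATGATGA"))  # same values: shifting the frame does not matter
```

Because only the maximum and minimum over the three phases are used, shifting the window by one base permutes the phase counts without changing the result, which is why the statistic is frame-independent and needs no codon frequency table.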

Error-rich DNA
[Figure: Fickett’s statistic on the normal sequence and on a corrupted copy (1% substitutions, 2 indels)]

ORF Found, Now What?
ORFs are the biggest target, but also the easiest to find
– Find promoter elements
  – Should be upstream of the 5’-most ORF
  – Remember, one promoter can regulate expression of multiple cistrons
  – May have ambiguous sequence
– Find ribosome binding site(s) and start codon(s)
  – One for each ORF (cistron), near its 5’ end
  – The RBS is close to (~5-10 nt) and upstream of the start codon
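The RBS rule of thumb above (~5-10 nt upstream of the start codon) can be turned into a simple scan. This sketch assumes the E. coli Shine-Dalgarno consensus AGGAGG as the motif and allows one mismatch; the function name, the motif, and the mismatch threshold are illustrative choices, not values given in the lecture.

```python
def rbs_candidates(seq, motif="AGGAGG", min_gap=5, max_gap=10, max_mismatches=1):
    """Return (motif_start, atg_start, gap, mismatches) for each ATG that has a
    motif-like window ending min_gap..max_gap nt upstream (0-based positions)."""
    seq = seq.upper()
    k = len(motif)
    hits = []
    for atg in range(len(seq) - 2):
        if seq[atg:atg + 3] != "ATG":
            continue
        for gap in range(min_gap, max_gap + 1):
            start = atg - gap - k  # motif occupies [start, start + k)
            if start < 0:
                break  # larger gaps would start even further left
            window = seq[start:start + k]
            mismatches = sum(a != b for a, b in zip(window, motif))
            if mismatches <= max_mismatches:
                hits.append((start, atg, gap, mismatches))
    return hits


# Toy sequence: 2 nt, AGGAGG, 6-nt spacer, then a short ORF.
print(rbs_candidates("TT" + "AGGAGG" + "TTTAAC" + "ATGAAATAA"))  # [(2, 14, 6, 0)]
```

Requiring an RBS-like match at the right distance is one way to choose among several candidate ATGs near the 5’ end of an ORF.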

ORF Found, Now What?
– More complex signals/regulatory elements
– More genes
– Combinatorial regulation is common
– Introns/exons

Eukaryotic Gene Complexity
Yeast
– Introns rare
– Promoters adjacent
– Genome dense

Eukaryotes, cont’d
“Higher” eukaryotes
– Introns common, LONGER than exons
– Promoter/enhancer
– Genome sparse
Fungi
– Introns common, short relative to exons
– Promoter/enhancer
– Genome dense

Fungi and “higher” eukaryotes
Sew together exons using:
– ORF regions
– Consensus sequences
– Domain/polypeptide matches

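Exon/Intron Structure
(uppercase = exon, lowercase = intron; n(x-y) = a stretch of x to y nucleotides)
CCACATTgt n(30-10,000) a n(5-20) agCAGAA...
  spliced: CCACATTCAGAA...
  protein: Pro His Ser Glu...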

Alternative Splice
CCACATTgt n(30-10,000) a n(5-20) agcagAA...
  spliced at the downstream ag: CCACATTAA...
  protein: Pro His STOP
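Both splicing slides can be reproduced by removing the intron and translating what is left. In this sketch the intron is shortened to 18 nt for readability (the slides give 30 to 10,000 nt), the whole pre-mRNA is written in uppercase, and the intron coordinates are 0-based, half-open intervals chosen by hand. Only the GT...AG ends are checked; the branch-point a is not modelled.

```python
# Splicing by hand: remove intron intervals, then translate with the standard code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def is_canonical(pre_mrna, start, end):
    """True if the intron [start, end) begins with GT and ends with AG."""
    return pre_mrna[start:start + 2] == "GT" and pre_mrna[end - 2:end] == "AG"


def splice(pre_mrna, introns):
    """Join the exons left after removing each (start, end) intron."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])
        pos = end
    exons.append(pre_mrna[pos:])
    return "".join(exons)


def translate(cds):
    """Translate complete codons; '*' marks a stop codon."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# Toy pre-mRNA modelled on the slides: exon 1, shortened intron, exon 2.
PRE_MRNA = "CCACATT" + "GTAAGTATCTTAACCT" + "AG" + "CAGAA"
NORMAL = [(7, 25)]       # 3' splice site at the first AG
ALTERNATIVE = [(7, 28)]  # 3' splice site at the AG inside CAG

for introns in (NORMAL, ALTERNATIVE):
    assert all(is_canonical(PRE_MRNA, s, e) for s, e in introns)
    mrna = splice(PRE_MRNA, introns)
    print(mrna, translate(mrna))
# CCACATTCAGAA PHSE
# CCACATTAA PH*
```

Both choices of 3' splice site satisfy the GT...AG rule, yet shifting the acceptor by three bases turns Pro His Ser Glu into Pro His STOP, which is why splice-site prediction has to weigh more than the consensus dinucleotides.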

How do we know what sequences to look for?
– Promoter sites
– Intron/exon boundaries
– Transcription termination/polyA
– Translation initiation

Finding Functional Sequences
– Known consensus sequences
– Consensus sequence generation
  – Position weight matrices
  – Sequence logos
  – Hidden Markov models
– Functional tests
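A position weight matrix turns a set of aligned sites into per-position log-odds scores, and the same column counts give the information content that sets the letter-stack height in a sequence logo. In this sketch the sites are made-up toy data (TATAAT-like), and the 0.5 pseudocount and uniform 0.25 background are assumptions. No small-sample correction is applied to the information content.

```python
import math

# Made-up aligned sites for illustration only (TATAAT-like), not real data.
SITES = ["TATAAT", "TATGAT", "TAAAAT", "TATACT", "GATAAT"]


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """log2-odds PWM: one {base: score} dict per aligned position."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


def information_content(sites):
    """Bits per column (2 - entropy), the stack height in a sequence logo."""
    ic = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        probs = [column.count(b) / len(column) for b in "ACGT"]
        ic.append(2 + sum(p * math.log2(p) for p in probs if p > 0))
    return ic


def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[b] for col, b in zip(pwm, window))


def best_hit(pwm, seq):
    """(score, 0-based start) of the highest-scoring window in seq."""
    k = len(pwm)
    return max((score(pwm, seq[i:i + k]), i) for i in range(len(seq) - k + 1))


PWM = build_pwm(SITES)
print(best_hit(PWM, "GGGGTATAATGGGG"))
print([round(x, 2) for x in information_content(SITES)])
```

Unlike a single consensus string, the matrix still gives partial credit to sites with one or two mismatches, which is how the consensus-generation methods above tolerate ambiguous sequence.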

Gene Finding Tools (WWW)
– GRAIL II: integrated gene parsing
– GenLang
– GENIE
– HMMGene
– GENSCAN
– GeneMark

GLIMMER for gene-finding in bacteria (

YOU are the best universal gene finder…
– You understand the “rules”: ORF, promoter, RBS; organism-specific
– You understand relationships/sequences: 5’ to 3’
– You are a good sequence finder: search patterns
– You can resolve ambiguities
EXPERIENCE

Exercise: ORF analysis using Vector NTI
– Open Vector NTI
– Retrieve the E. coli lac operon sequence:
  – In the molecular display window, choose Tools -> Open Link -> GID
  – Type the lac operon GenBank ID into the “GenBank ID required” window
– Do the ORF analysis:
  – In the molecular display window, choose Analysis -> ORF
  – Use the default start and stop settings
– Present a figure showing your ORF analysis result, and report the start and stop positions and lengths of the ORFs

Exercise (cont’d): ORF analysis using GeneMark
– Go to the GeneMark web site: ark24.cgi
– Paste in the lac operon sequence
– Choose E. coli as the organism
– Report the start and stop positions and lengths of the predicted ORFs, and compare them to those found with the Vector NTI ORF analysis

Assignment #2
Download from Blackboard:
– Go to the “Assignment” page
– Open “Assignment #2”
– Download the file “Assignment1”
Submit to Blackboard:
– Go to the “Assignment” page
– Open “Assignment #2”
– Submit your answer through Tools -> Digital Drop Box
Assignment #2 is due March 12.