BIOS816/VBMS818 Lecture 7 – Gene Prediction Guoqing Lu Office: E115 Beadle Center Tel: (402) 472-4982 Website:


1 BIOS816/VBMS818 Lecture 7 – Gene Prediction
Guoqing Lu
Office: E115 Beadle Center
Tel: (402) 472-4982
Email: glu3@unl.edu
Website: http://biocore.unl.edu

2 Genes
Protein-coding genes
  – ORF
  – Regulatory signals, which depend on the organism
RNA genes
  – rRNA
  – tRNA
  – snRNA, others…

3 Prokaryotic Gene Expression
[Figure: Promoter – Cistron 1 – Cistron 2 – … – Cistron N – Terminator. Transcription by RNA polymerase produces one mRNA (5’→3’) carrying all N cistrons; translation (ribosome, tRNAs, protein factors) yields polypeptides 1, 2, …, N, each running N→C.]

4 Eukaryotic Gene Expression
[Figure: Enhancer and Promoter upstream of the transcribed region (Exon 1 – Intron 1 – Exon 2) and Terminator. Transcription by RNA polymerase II yields a primary transcript (5’→3’), which is capped (m7G), spliced, cleaved and polyadenylated (An), and transported; translation yields a polypeptide (N→C).]

5 Gene Finding
Comparative
  – Compare your sequence to what is already known: BLASTN, BLASTX
Predictive: stitch together a consensus
  – HMM, GRAIL…
  – Frames, Testcode
  – Findpatterns…
Empirical
  – cDNA, protein, or genetic evidence

6 ORF Characteristics
Primary characters
  – Start codon (ATG)
  – Stop codon (TAA, TAG, TGA)
Secondary characters
  – Codon bias
  – Biased nucleotide distribution
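The primary characters above can be checked mechanically. A minimal sketch in Python; the helper name `is_orf` is hypothetical, and it only looks at one strand, read from the first base:

```python
# Primary ORF characters: ATG start, TAA/TAG/TGA stop.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def is_orf(seq: str) -> bool:
    """Return True if seq (5'->3', one strand) reads as a complete ORF:
    ATG start, stop codon at the very end, no earlier in-frame stop,
    and a length that is a multiple of 3."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != START or codons[-1] not in STOPS:
        return False
    # A premature in-frame stop would end the ORF early.
    return not any(c in STOPS for c in codons[1:-1])


print(is_orf("ATGGCTTAA"))     # True
print(is_orf("ATGTAAGCTTAA"))  # False: premature in-frame stop
```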

7 ORF finding tools
GCG
  – Frames, Map
Vector NTI
  – ORF
WWW tools
  – ORF Finder (NCBI)
  – …
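Tools such as ORF Finder scan all six reading frames, three on each strand. The following is a minimal six-frame scanner sketch, not the algorithm of any listed tool; the name `find_orfs`, the 30-codon default minimum, and reporting only the first ATG after each stop are assumptions made for illustration:

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Scan all six reading frames for ATG...stop ORFs.

    Returns (strand, frame, start, end) tuples in 0-based, half-open
    coordinates on the input (+) strand; `frame` is counted on the strand
    that was scanned. min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop in this frame
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, frame, start, end))
                        else:  # map reverse-strand coordinates back to +
                            orfs.append((strand, frame, n - end, n - start))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGGG", min_codons=3))  # [('+', 2, 2, 11)]
```

ORFs that run off the end of the sequence without a stop are not reported, and a lower `min_codons` will report many short, probably spurious ORFs.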

8 Vector NTI – ORF
[Figure: ORFs of the lac operon, GI: 146575]

9 Statistical analysis as a means to find genes
  – ORF example
  – Codon bias
  – Fickett’s statistic

10 Codon Bias
The genetic code is degenerate
Codon usage varies
  – from organism to organism
  – from gene to gene
High bias correlates with a high level of expression
Bias correlates with the abundance of tRNA isoacceptors
Change the bias or the tRNAs, and expression changes

11 Codon Bias – Gene Differences (GCG: CodonFrequency)
Fraction of Gly codons in each gene:
AA   Codon   GAL4   ADH1
Gly  GGG     0.21   0
Gly  GGA     0.17   0
Gly  GGT     0.38   0.93
Gly  GGC     0.24   0.07
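Gene-level tables like this one report, for each codon, its share of all codons for the same amino acid. A rough Python sketch of that calculation, similar in spirit to a codon-usage fraction column but not GCG's CodonFrequency program itself (`synonymous_fractions` is a hypothetical name):

```python
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO))


def synonymous_fractions(cds: str) -> dict:
    """For each codon whose amino acid occurs in cds, the fraction of that
    amino acid's codons it accounts for (read in frame 0)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    family_totals = Counter()
    for codon, n in counts.items():
        if codon in GENETIC_CODE:
            family_totals[GENETIC_CODE[codon]] += n
    return {
        codon: counts[codon] / family_totals[aa]
        for codon, aa in GENETIC_CODE.items()
        if family_totals[aa] > 0
    }


fractions = synonymous_fractions("GGTGGTGGCGGT")
print(fractions["GGT"], fractions["GGC"], fractions["GGA"])  # 0.75 0.25 0.0
```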

12 Codon Bias – Organism Differences
[Figure: codon usage compared between organisms]

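13 Codon Bias Calculation
For a codon c, with S(c) its synonymous family (all codons for the same amino acid), f the codon frequency, and r the frequency expected in random sequence:
$$\mathrm{Pref}(c) = \frac{f(c) \big/ \sum_{c' \in S(c)} f(c')}{r(c) \big/ \sum_{c' \in S(c)} r(c')}$$
  – Bias > 1 in the CORRECT frame
  – Bias < 1 in an incorrect frame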
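A minimal sketch of the frame test built on the Pref calculation on this slide. It assumes the "random" expectation is uniform use of synonymous codons (so the denominator is 1/family size), a 0.5 pseudocount, and a geometric mean to combine Pref values over the codons read in a frame; these choices and the function names are illustrative assumptions:

```python
import math
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO))
FAMILY_SIZE = Counter(AMINO)  # number of synonymous codons per amino acid


def preference_table(reference_cds, pseudocount=0.5):
    """Pref(c) = (f(c) / family total) / (1 / family size).
    Assumes random sequence uses synonymous codons uniformly."""
    counts = Counter()
    for cds in reference_cds:
        cds = cds.upper()
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    family_totals = Counter()
    for codon, aa in GENETIC_CODE.items():
        family_totals[aa] += counts[codon] + pseudocount
    return {
        codon: (counts[codon] + pseudocount) / family_totals[aa] * FAMILY_SIZE[aa]
        for codon, aa in GENETIC_CODE.items()
    }


def frame_bias(seq, pref, frame):
    """Geometric mean of Pref over the sense codons read in one frame."""
    seq = seq.upper()
    logs = [
        math.log(pref[seq[i:i + 3]])
        for i in range(frame, len(seq) - 2, 3)
        if GENETIC_CODE.get(seq[i:i + 3], "*") != "*"  # skip stops and non-ACGT
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


pref = preference_table(["GGTGCT" * 10])
print([round(frame_bias("GGTGCT" * 5, pref, f), 2) for f in range(3)])
```

Trained on a gene that strongly prefers GGT and GCT, the correct frame of a similar sequence scores about 3.5, while the shifted frames read only codons that were unseen in training and score about 1.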

14 Codon-Biased Gene
[Figure: codon-bias plots for rpsB (ribosomal protein S2) and tsf (EF-Ts); panels for frame 2 and frame 3]

15 Fickett’s Statistic
[Figure: Fickett’s statistic plotted for rpsB and tsf]
  – Analyzes the local nonrandomness at every third base of the sequence, in a frame-independent fashion
  – Does not use codon frequency statistics
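Fickett’s TESTCODE combines a position-asymmetry parameter for each base with base-composition parameters, converted to probabilities through published lookup tables. The sketch below computes only the position-asymmetry part, max/(min + 1) of a base’s counts at the three codon positions; the lookup tables and weights are not reproduced here, and `position_asymmetry` is a hypothetical name:

```python
def position_asymmetry(seq: str) -> dict:
    """Position parameter for each base: max/(min + 1) of the base's counts
    at positions 1, 2, 3 (index mod 3). Shifting the frame only permutes
    the three counts, so the result does not depend on the frame."""
    seq = seq.upper()
    result = {}
    for base in "ACGT":
        counts = [0, 0, 0]
        for i, b in enumerate(seq):
            if b == base:
                counts[i % 3] += 1
        result[base] = max(counts) / (min(counts) + 1)
    return result


print(position_asymmetry("ATGATGATG"))  # {'A': 3.0, 'C': 0.0, 'G': 3.0, 'T': 3.0}
```

Reading the same repeat one base later ("TGATGATGA") gives identical values, which is what makes the statistic usable without knowing the frame.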

16 Error-rich DNA
[Figure: Fickett’s statistic on the normal sequence vs. a corrupted copy (1% substitutions, 2 indels)]

17 ORF Found, Now What?
ORFs are the biggest target, but also the easiest to find
Find promoter elements
  – Should be upstream of the 5’-most ORF; remember, one promoter can regulate expression of multiple cistrons
  – May have ambiguous sequence
Find ribosome binding site(s) and start codon(s)
  – One per ORF (cistron), at its 5’ end
  – The RBS is close to (~5–10 nt) and upstream of the start codon
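An RBS search can be sketched as scoring windows 5–10 nt upstream of a candidate start codon against a Shine-Dalgarno-like motif. The AGGAGG consensus, using the number of matching bases as the score, and the `best_rbs` name are assumptions of this sketch:

```python
SD_CONSENSUS = "AGGAGG"  # assumed Shine-Dalgarno consensus


def best_rbs(seq: str, start: int, motif: str = SD_CONSENSUS,
             min_spacer: int = 5, max_spacer: int = 10):
    """Best motif match whose 3' end lies min_spacer..max_spacer nt upstream
    of the start codon at index `start`. Returns (position, matches, spacer),
    or None if no window fits inside the sequence."""
    seq = seq.upper()
    best = None
    for spacer in range(min_spacer, max_spacer + 1):
        pos = start - spacer - len(motif)
        if pos < 0:
            continue
        window = seq[pos:pos + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        if best is None or matches > best[1]:
            best = (pos, matches, spacer)
    return best


seq = "TTTTAGGAGGTTTAACTATGGCT"
print(best_rbs(seq, seq.find("ATG")))  # (4, 6, 7)
```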

18 ORF Found, Now What?
  – More complex signals/regulatory elements
  – More genes
  – Combinatorial regulation common
  – Introns/exons

19 Eukaryotic Gene Complexity
Yeast
  – Introns rare
  – Promoters adjacent
  – Genome dense

20 Eukaryotes, cont’d
“Higher” eukaryotes
  – Introns common, LONGER than exons
  – Promoter/enhancer
  – Genome sparse
Fungi
  – Introns common, short relative to exons
  – Promoter/enhancer
  – Genome dense

21 Fungi and “higher” eukaryotes
Sew together exons using
  – ORF regions
  – Consensus sequences
  – Domain/polypeptide matches

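22 Exon/Intron Structure
Genomic:  CCACATTgt n(30–10,000) a n(5–20) agCAGAA...
mRNA:     CCACATTCAGAA...
Protein:  Pro His Ser Glu...
(lowercase = intron: gt at its 5’ end, a branch-point a located 5–20 nt upstream of the ag at its 3’ end)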
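The GT...AG rule and exon joining on this slide can be sketched as follows. The toy intron is much shorter than the 30–10,000 nt shown, and `splice`/`translate` are hypothetical helper names:

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO))


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def splice(genomic: str, introns):
    """Remove introns given as 0-based, half-open (start, end) pairs,
    checking the GT...AG rule at each intron boundary."""
    genomic = genomic.upper()
    exons, prev = [], 0
    for start, end in sorted(introns):
        intron = genomic[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} violates the GT...AG rule")
        exons.append(genomic[prev:start])
        prev = end
    exons.append(genomic[prev:])
    return "".join(exons)


# Toy gene modeled on the slide: exon CCACATT, intron GT...AG, exon CAGAA.
genomic = "CCACATT" + "GTCCTTTCTAACTTTCC" + "AG" + "CAGAA"
mrna = splice(genomic, [(7, 26)])
print(mrna, translate(mrna))  # CCACATTCAGAA PHSE
```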

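23 Alternative Splice
Genomic:  CCACATTgt n(30–10,000) a n(5–20) agcagAA...
mRNA:     CCACATTAA...
Protein:  Pro His STOP
(the downstream ag is used as the acceptor, so cag is removed with the intron and an in-frame TAA stop results)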
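Alternative 3’ splice-site choice can be sketched by keeping the GT donor fixed and trying every downstream AG as the acceptor. Real acceptor choice depends on far more (branch point, polypyrimidine tract, splicing factors); treating every AG as a candidate, and the helper names, are simplifications for illustration:

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO))


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def acceptor_isoforms(genomic: str, donor: int, max_intron: int = 10_000):
    """Splice at a fixed GT donor and at every downstream AG acceptor,
    returning (intron_end, mRNA, protein) for each candidate isoform."""
    genomic = genomic.upper()
    if genomic[donor:donor + 2] != "GT":
        raise ValueError("donor site must start with GT")
    isoforms = []
    for i in range(donor + 2, min(len(genomic) - 1, donor + max_intron)):
        if genomic[i:i + 2] == "AG":
            mrna = genomic[:donor] + genomic[i + 2:]
            isoforms.append((i + 2, mrna, translate(mrna)))
    return isoforms


genomic = "CCACATT" + "GTCCTTTCTAACTTTCC" + "AGCAG" + "AA"
for end, mrna, protein in acceptor_isoforms(genomic, donor=7):
    print(end, mrna, protein)
# 26 CCACATTCAGAA PHSE
# 29 CCACATTAA PH*
```

The first AG reproduces the Pro-His-Ser-Glu reading of the previous slide; the second removes cag along with the intron and creates the in-frame TAA stop.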

24 How do we know what sequences to look for?
  – Promoter sites
  – Intron/exon
  – Transcription termination/polyA
  – Translation initiation

25 Finding Functional Sequences
Known consensus sequences
Consensus sequence generation
  – Position weight matrices
  – Sequence logos
  – Hidden Markov models
Functional tests
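A position weight matrix, and the per-column information content that sets the stack heights in a sequence logo, can be sketched in a few lines. The four aligned sites are made up (TATAAT-like) rather than real data; the pseudocount, uniform background, and function names are assumptions:

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.25, background=None):
    """Log-odds (base 2) position weight matrix from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def information_content(sites, pseudocount=0.25):
    """Per-column information content in bits (total logo stack height),
    assuming a uniform background."""
    ic = []
    for j in range(len(sites[0])):
        column = [s[j].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        probs = [(column.count(b) + pseudocount) / total for b in BASES]
        ic.append(2 + sum(p * math.log2(p) for p in probs))
    return ic


def scan(seq, pwm):
    """Score every window of an ACGT sequence; return (best_position, best_score)."""
    seq = seq.upper()
    w = len(pwm)
    scores = [sum(pwm[j][seq[i + j]] for j in range(w))
              for i in range(len(seq) - w + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


sites = ["TATAAT", "TATAAT", "TATGAT", "TAAAAT"]  # hypothetical aligned sites
pwm = build_pwm(sites)
print(scan("GGGTATAATGGG", pwm)[0])  # 3
```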

26 Gene finding Tools – WWW
  – GRAIL II: integrated gene parsing
  – GenLang
  – GENIE
  – HMMGene
  – GENSCAN
  – GeneMark
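HMMGene and GENSCAN are built on hidden Markov models. The toy below shows only the Viterbi decoding step on a two-state HMM with hand-picked, made-up probabilities (AT-rich vs. GC-rich states); a real gene finder uses many states for exons, introns, and splice signals, with parameters trained on known genes:

```python
import math

STATES = ("N", "C")  # toy labels: N = AT-rich state, C = GC-rich state
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "C": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def viterbi(seq):
    """Most probable state path for the toy HMM, computed in log space."""
    seq = seq.upper()
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            p = max(STATES, key=lambda q: score[q] + math.log(TRANS[q][s]))
            prev_best[s] = p
            new_score[s] = score[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][base])
        backpointers.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("A" * 10 + "GC" * 5 + "A" * 10))  # NNNNNNNNNNCCCCCCCCCCNNNNNNNNNN
```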

27 GLIMMER for gene-finding in bacteria (www.tigr.org)
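GLIMMER scores ORFs with interpolated Markov models. As a simpler stand-in, the sketch below trains a fixed-order (k = 2) Markov chain on coding and background examples and scores a sequence by log-likelihood ratio. It is not GLIMMER’s method (no interpolation across orders, no codon-position awareness), and the training strings are toy examples:

```python
import math
from collections import Counter, defaultdict


def train_markov(seqs, k=2, pseudocount=1.0):
    """Fixed-order (k) Markov chain; returns a function P(base | previous k bases)."""
    counts = defaultdict(Counter)
    for seq in seqs:
        seq = seq.upper()
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1

    def prob(context, base):
        c = counts.get(context, Counter())  # unseen context -> uniform
        return (c[base] + pseudocount) / (sum(c.values()) + 4 * pseudocount)

    return prob


def log_odds(seq, coding, background, k=2):
    """Natural-log likelihood ratio; > 0 favours the coding model."""
    seq = seq.upper()
    return sum(
        math.log(coding(seq[i - k:i], seq[i]) / background(seq[i - k:i], seq[i]))
        for i in range(k, len(seq))
    )


coding = train_markov(["GCTGCTGCTGCTGCTGCTGCT"])     # toy "coding" training data
background = train_markov(["ATATATATATATATATATAT"])  # toy "background" training data
print(log_odds("GCTGCTGCT", coding, background) > 0)  # True
print(log_odds("ATATATAT", coding, background) < 0)   # True
```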

28 YOU are the best universal gene finder…
You understand the “rules”
  – ORF, promoter, RBS
  – Organism specific
You understand relationships/sequences
  – 5’ to 3’
You are a good sequence finder
  – Search patterns
You can resolve ambiguities
EXPERIENCE

29 Exercise
ORF analysis using Vector NTI:
  – Open Vector NTI
  – Retrieve the E. coli lac operon sequence: in the molecular display window, choose Tools -> Open Link -> GID, then type 146575 in the GenBank ID required window
  – Do ORF analysis: in the molecular display window, choose Analysis -> ORF, using the default Start & Stop settings
  – Present a figure showing your ORF analysis result, and report the start and stop positions and lengths of the ORFs

30 Exercise (cont’d)
ORF analysis using GeneMark:
  – Go to the GeneMark web site: http://opal.biology.gatech.edu/GeneMark/genemark24.cgi
  – Paste in the lac operon sequence
  – Choose E. coli as the organism
  – Report the start and stop positions and lengths of the predicted ORFs, and compare them with those found by the Vector NTI ORF analysis
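Gene predictors often agree on a stop codon but choose different start codons, so one way to compare two result lists is to match calls by stop coordinate and strand. The sketch below assumes each call is recorded as (start, stop, strand) in 1-based inclusive coordinates; the function name and the example coordinates are hypothetical, not lac operon results:

```python
def compare_predictions(pred_a, pred_b):
    """Match ORF calls from two tools by (stop, strand).

    Each call is (start, stop, strand), 1-based inclusive; on the minus
    strand start > stop. Returns matched pairs plus calls unique to each tool.
    """
    index_b = {(stop, strand): start for start, stop, strand in pred_b}
    matched, only_a = [], []
    for start, stop, strand in pred_a:
        key = (stop, strand)
        if key in index_b:
            start_b = index_b.pop(key)
            matched.append({
                "stop": stop, "strand": strand,
                "start_a": start, "start_b": start_b,
                "length_a": abs(stop - start) + 1,
                "length_b": abs(stop - start_b) + 1,
            })
        else:
            only_a.append((start, stop, strand))
    only_b = [(start, stop, strand) for (stop, strand), start in index_b.items()]
    return matched, only_a, only_b


matched, only_a, only_b = compare_predictions(
    [(100, 399, "+"), (500, 898, "+")],   # hypothetical calls from tool A
    [(130, 399, "+"), (1500, 1200, "-")],  # hypothetical calls from tool B
)
print(matched[0]["start_a"], matched[0]["start_b"])  # 100 130
```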

31 Assignment #2
Download from Blackboard
  – Go to the “Assignment” page
  – Open “Assignment #2”
  – Download the file “Assignment1”
Submit to Blackboard
  – Go to the “Assignment” page
  – Open “Assignment #2”
  – Submit your answer through Tools -> Digital Drop Box
Assignment #2 is due March 12

