Published by Millicent Haynes, modified over 9 years ago
1
10/21/05 D Dobbs ISU - BCB 444/544X: Gene Prediction
Gene Prediction (formerly Gene Prediction - 3)
2
Announcements
Exam 2 - next Friday
Posted online: Exam 2 Study Guide; 544 Reading Assignment (2 papers)
3
Announcements
544 Semester Projects - information needed. Please send email to me (or David) at ddobbs@iastate.edu and briefly describe:
Your background & current grad research
Is there a problem related to your research that you would like to learn more about & develop as a project for this course? Or, what would your ‘dream’ project be?
4
Announcements
2 Bioinformatics Seminars today (Fri Oct 21)
12:10 PM BCB Faculty Seminar in E164 Lagomarcino: “Protein Networks”, Bob Jernigan, BBMB & Director, Baker Center for Bioinformatics & Biological Statistics http://www.bcb.iastate.edu/courses/BCB691-F2005.html#Oct%2021
4:10 PM GDCB Special Seminar in 1414 MBB: “Integrating the Unknown-eome with Abiotic Stress Response Networks in Arabidopsis”, Ron Mittler, Dept. of Biochem & Mol Biology, University of Nevada, Reno
5
Gene Prediction & Regulation
Mon - Gene structure review: Eukaryotes vs prokaryotes
Wed - Regulatory regions: Promoters & enhancers
Fri - Predicting genes; Predicting regulatory regions (?)
Next week: Predicting RNA structure (miRNAs, too)
6
Reading Assignment
Mount Bioinformatics Chp 9 Gene Prediction & Regulation, pp 361-385: Predicting Promoters. Check errata: http://www.bioinformaticsonline.org/help/errata2.html
* Brown Genomes 2 (NCBI textbooks online)
Sect 9 Overview: Assembly of Transcription Initiation Complex http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002
Sect 9.1-9.3 DNA binding proteins, Transcription initiation http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016
* NOTE: Don’t worry about the details!!
7
Optional Reading - Reviews:
1) Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709 http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
2) Wasserman WW & Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html
8
Review last lecture: Gene Regulation (formerly Gene Prediction-2)
cDNAs & ESTs; UniGene; Regulatory regions; Eukaryotes vs prokaryotes
9
Figure (Pevsner p160): DNA, RNA, cDNA, protein, phenotype, with steps [1] Transcription, [2] RNA processing (splicing), [3] RNA export, [4] RNA surveillance
10
UniGene: unique genes via ESTs
Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
UniGene clusters contain many ESTs. UniGene data come from many cDNA libraries; thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution. Pevsner p164
11
Today: Gene Prediction (formerly Gene Prediction - 3)
Predicting genes
Mon - Predicting regulatory regions (focus on promoters); Introduction to RNA
Later: Genome browsers
12
Gene Prediction
Overview of steps & strategies: What sequence signals can be used? What other types of information can be used?
Algorithms: HMMs, discriminant functions, neural nets
Gene prediction software: 3 major types; many, many programs!
13
Predicting Genes - Basic steps:
1. Obtain genomic sequence
2. Translate in all 6 reading frames
3. Compare with protein sequence database
4. Perform database similarity search with EST & cDNA databases, if available
5. Use gene prediction program to locate genes
6. Analyze gene regulatory sequences
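The step of translating in all 6 reading frames can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular gene prediction package: it assumes the standard genetic code (NCBI translation table 1) and input made only of A, C, G, T or N, and all function names are hypothetical.

```python
# Standard genetic code: amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translations keyed '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3' (reverse)."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

Each of the six protein strings is what would then be compared against a protein sequence database.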
14
Overview of gene prediction strategies
What sequence signals can be used?
Transcription: TF binding sites, promoter, initiation site, terminator
Processing signals: splice donor/acceptors, polyA signal
Translation: start (AUG = Met) & stop (UGA, UAA, UAG); ORFs, codon usage
What other types of information can be used?
cDNAs & ESTs (experimental data, pairwise alignment)
Homology (sequence comparison, BLAST)
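The start/stop codon signals and ORFs listed above can be illustrated with a naive forward-strand ORF scan. It is a sketch under stated assumptions: ATG (AUG in the mRNA) is the only start codon, TAA/TAG/TGA are the stops, the default `min_codons` cutoff of 30 is arbitrary, and codon usage is ignored. The reverse strand can be scanned the same way on its reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spelling of UAA, UAG, UGA
START_CODON = "ATG"                  # DNA spelling of AUG (Met)


def find_orfs(dna: str, min_codons: int = 30):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is the
    0-based index just past the stop, and the codon count includes the stop.
    Only the first ATG before each stop is reported (nested ATGs are skipped).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14, 2)]
```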
15
Automated gene prediction strategies
1) Similarity-based or comparative: BLAST - do other organisms have similar sequence? (Is the sequence similar to a known gene or protein?)
2) Ab initio = “from the beginning”: predict without explicit comparison with cDNA or proteins, via “rule-based” gene models; but the rules are derived from statistical analysis of datasets
3) Combined, “evidence-based”: combine gene models with alignment to known ESTs & protein sequences
BEST RESULTS? Combined
16
Examples of gene prediction software
1) Similarity-based or comparative: BLAST; SGP2 (extension of GeneID)
2) Ab initio = “from the beginning”: GeneID (used in lab this week); GENSCAN (used in lab this week); GeneMark.hmm (should try this!)
3) Combined, “evidence-based”: GeneSeqer (Brendel et al., ISU)
BEST? GENSCAN, GeneMark.hmm, GeneSeqer, but it depends on the organism & specific task
17
Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes. Why? Smaller genomes; simpler gene structures; more sequenced genomes (for comparative approaches)!
Methods? Previously, mostly HMM-based. Now: similarity-based methods, because so many genomes are available
18
GeneSeqer - Brendel et al. http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
19
Thanks to Volker Brendel, ISU, for the following Figs & Slides, slightly modified from: BBSI Genome Informatics Module http://www.bioinformatics.iastate.edu/BBSI/course_desc_2005.html#moduleB
V Brendel vbrendel@iastate.edu
20
Signals (figure): Genomic DNA (start codon ... stop codon) → Transcription → pre-mRNA (Cap- ... -Poly(A)) → Splicing → mRNA (Cap- ... -Poly(A)) → Translation → Protein
Splice sites at each exon/intron boundary: donor site (GT) at the 5' end of the intron, acceptor site (AG) at its 3' end
Brendel 2005
21
Brendel - Spliced Alignment I: Compare with cDNA or EST probes
(figure: genomic DNA, start codon to stop codon, aligned with an mRNA: Cap-, 5'-UTR, start codon, stop codon, 3'-UTR, -Poly(A))
Brendel 2005
22
Brendel - Spliced Alignment II: Compare with protein probes
(figure: genomic DNA, start codon to stop codon, aligned with a protein)
Brendel 2005
23
Brendel Spliced Alignment Algorithm
Perform pairwise alignment with large gaps in one sequence (introns): align genomic DNA with cDNA, EST or protein
Score semi-conserved sequences at splice junctions
Score coding constraints in translated exons
Brendel 2005
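A much-simplified sketch of the spliced-alignment idea follows; it is not GeneSeqer's scoring scheme. It aligns all of a cDNA/EST to genomic DNA with ordinary match/mismatch/gap moves plus one extra "intron" move that skips a genomic stretch of at least `min_intron` bases beginning with GT and ending with AG, at a fixed penalty. The scores, the fixed GT...AG rule, and the parameter names are illustrative assumptions; the real method scores splice sites probabilistically and also handles protein probes.

```python
NEG_INF = float("-inf")


def spliced_align(cdna, genomic, match=2, mismatch=-1, gap=-2,
                  intron_penalty=-10, min_intron=10):
    """Align all of `cdna` to part of `genomic`, allowing GT...AG introns.

    Returns (score, introns) with introns as 0-based, end-exclusive (start, end)
    intervals of `genomic`. Unaligned genomic sequence at either end is free.
    Assumes min_intron >= 4 so the GT and AG cannot overlap.
    """
    n, m = len(cdna), len(genomic)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    S[0] = [0] * (m + 1)  # free leading genomic sequence
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap
        back[i][0] = ("up",)
        best_open, best_k = NEG_INF, None  # best S[i][k] over usable intron starts k
        for j in range(1, m + 1):
            k = j - min_intron  # newest start leaving room for an intron ending at j
            if k >= 0 and genomic[k:k + 2] == "GT" and S[i][k] > best_open:
                best_open, best_k = S[i][k], k
            s = match if cdna[i - 1] == genomic[j - 1] else mismatch
            options = [
                (S[i - 1][j - 1] + s, ("diag",)),
                (S[i - 1][j] + gap, ("up",)),     # cDNA base opposite a gap
                (S[i][j - 1] + gap, ("left",)),   # genomic base opposite a gap
            ]
            if best_k is not None and genomic[j - 2:j] == "AG":
                options.append((best_open + intron_penalty, ("intron", best_k)))
            S[i][j], back[i][j] = max(options, key=lambda opt: opt[0])
    j_end = max(range(m + 1), key=lambda j: S[n][j])  # free trailing genomic sequence
    i, j, introns = n, j_end, []
    while i > 0:  # trace back to recover intron coordinates
        move = back[i][j]
        if move[0] == "diag":
            i, j = i - 1, j - 1
        elif move[0] == "up":
            i -= 1
        elif move[0] == "left":
            j -= 1
        else:  # intron spanning genomic[k:j]
            introns.append((move[1], j))
            j = move[1]
    return S[n][j_end], introns[::-1]


exon1, intron, exon2 = "ATGGCCAAG", "GTAAATTTTTTTAG", "CTGGAATAA"
print(spliced_align(exon1 + exon2, "CCC" + exon1 + intron + exon2 + "CCC"))
# (26, [(12, 26)])
```

The running maximum `best_open` keeps the intron move linear per cell instead of scanning every possible intron start.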
24
Donor (GT) & Acceptor (AG) Sites Used for Model Training
Number of true splice sites, by phase

Species                Type   Phase 1   Phase 2   Phase 3
Zea mays               GT     88        107       316
                       AG     83        104       311
Arabidopsis thaliana   GT     8653      9297      23019
                       AG     8611      9247      22929
Aspergillus            GT     157       176       221
                       AG     163       172       217
S. pombe               GT     119       118       170
                       AG     118       122       179
C. elegans             GT     20789     20500     37029
                       AG     20626     20325     36864
Drosophila             GT     524       670       989
                       AG     536       671       1001
Gallus gallus          GT     107       238       288
                       AG     103       228       284
Rattus norvegicus      GT     147       408       450
                       AG     140       386       442
Mus musculus           GT     521       1185      1212
                       AG     504       1139      1194
Homo sapiens           GT     3037      5277      6586
                       AG     2979      5194      6555
Brendel 2005
25
Splice Site Detection
Information content at position i: $I_i = 2 + \sum_{B \in \{A,C,G,T\}} f_{iB} \log_2 f_{iB}$, where $f_{iB}$ is the frequency of base B at position i
Extent of splice signal window, defined using:
i: ith position in sequence
$\bar{I}$: average information content over all positions i > 20 nt from the splice site
$\sigma_{\bar{I}}$: standard deviation of $\bar{I}$
Brendel 2005
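A sketch of the per-position information content, assuming a set of splice-site windows that are already aligned on the GT (or AG) dinucleotide. The background statistics use positions more than 20 nt from the site, as on the slide; using the population standard deviation and the helper names are assumptions, and the rule that turns these quantities into a window boundary is not sketched.

```python
import math
import statistics
from collections import Counter


def information_content(aligned_sites):
    """I_i = 2 + sum over B in {A,C,G,T} of f_iB * log2(f_iB), in bits.

    aligned_sites: equal-length DNA strings aligned on the splice site.
    Non-ACGT characters are ignored; each column needs at least one ACGT base.
    """
    ic = []
    for column in zip(*aligned_sites):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)  # 2 bits = maximum for a 4-letter alphabet
    return ic


def background_stats(ic, site_pos, min_distance=20):
    """Mean and population standard deviation of I_i over positions more than
    `min_distance` nt from the splice site at index `site_pos`."""
    background = [v for i, v in enumerate(ic) if abs(i - site_pos) > min_distance]
    return statistics.mean(background), statistics.pstdev(background)


print(information_content(["AGGT", "CGGT", "GGGT", "TGGT"]))  # [0.0, 2.0, 2.0, 2.0]
```

A fully conserved column scores 2 bits and a uniform column 0 bits, so splice-signal positions stand out against the background mean.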
26
Results? (figure panels: Human T2_GT, Human T2_AG, Human F1_AG, Human Fi_AG; A. thaliana T2_GT, A. thaliana T2_AG, A. thaliana F1_AG, A. thaliana Fi_AG)
Brendel 2005
27
Bayesian Splice Site Prediction
$P(H \mid S) = \dfrac{P(S \mid H)\, P(H)}{\sum_{H'} P(S \mid H')\, P(H')}$
where H indexes the hypotheses of GT or AG at:
- a true site in reading phase 1, 2, or 0
- a false within-exon site in reading phase 1, 2, or 0
- a false within-intron site
Let $S = s_{-l} s_{-l+1} s_{-l+2} \ldots s_{-1}\, \mathrm{GT}\, s_1 s_2 s_3 \ldots s_r$
Brendel 2005
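One way to evaluate the posterior over hypotheses is sketched below. It assumes each hypothesis H (true site in phase 1, 2 or 0; false within-exon site in phase 1, 2 or 0; false within-intron site) is modeled by a position weight matrix over the window S with independent positions. Brendel's actual site models may differ, and the one-position toy models in the usage lines are made up for illustration.

```python
import math


def log_likelihood(seq, pwm):
    """log P(S | H) for a position weight matrix (independent positions)."""
    return sum(math.log(pwm[i][base]) for i, base in enumerate(seq))


def posteriors(seq, models, priors):
    """P(H | S) = P(S | H) P(H) / sum over H' of P(S | H') P(H').

    models: hypothesis -> PWM (list of {base: probability}, one dict per position)
    priors: hypothesis -> prior probability P(H)
    Computed in log space to avoid underflow on long windows.
    """
    log_joint = {h: log_likelihood(seq, models[h]) + math.log(priors[h]) for h in models}
    top = max(log_joint.values())
    weights = {h: math.exp(v - top) for h, v in log_joint.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


true_site = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}]    # toy 1-position model
false_site = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
post = posteriors("A", {"T": true_site, "F": false_site}, {"T": 0.5, "F": 0.5})
print(post)  # T about 0.737, F about 0.263
```

With seven hypotheses, `models` and `priors` would simply hold seven entries.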
28
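Bayes Factor as Decision Criterion
$H_0$: $H = T$ (true site)
- 2-class model: $BF = \dfrac{P(S \mid T)}{P(S \mid F)}$
- 7-class model: $BF = \dfrac{P(T \mid S) / P(F \mid S)}{P(T) / P(F)}$, with $T$ pooling the three true-site hypotheses and $F$ the four false-site hypotheses
Brendel 2005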
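The Bayes factor can be computed as posterior odds divided by prior odds, pooling the hypotheses into true and false sites. This is a minimal sketch; the function name and the dictionary layout (hypothesis to probability) are assumptions. For two classes the result equals the likelihood ratio P(S|T)/P(S|F), which the usage lines check with made-up numbers.

```python
def bayes_factor(post, prior, true_hypotheses):
    """BF = [P(T|S) / P(F|S)] / [P(T) / P(F)].

    post, prior: hypothesis -> probability. Hypotheses in `true_hypotheses`
    are pooled into T; all others are pooled into F.
    """
    def pooled(probs):
        p_true = sum(p for h, p in probs.items() if h in true_hypotheses)
        return p_true, sum(probs.values()) - p_true

    post_t, post_f = pooled(post)
    prior_t, prior_f = pooled(prior)
    return (post_t / post_f) / (prior_t / prior_f)


# Two-class check with made-up P(S|T) = 0.7, P(S|F) = 0.25 and prior P(T) = 0.2:
prior = {"T": 0.2, "F": 0.8}
evidence = 0.7 * 0.2 + 0.25 * 0.8
post = {"T": 0.7 * 0.2 / evidence, "F": 0.25 * 0.8 / evidence}
print(bayes_factor(post, prior, {"T"}))  # about 2.8 = 0.7 / 0.25
```

Dividing out the prior odds is what makes the Bayes factor independent of how common true sites are assumed to be.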
29
Interpretation of Bayes Factor, in terms of critical value $c = 2 \ln BF$:
Positive evidence for $H_0$ if $2 \le c \le 6$
Strong support for $H_0$ if $6 \le c \le 10$
Very strong support for $H_0$ if $c > 10$
Brendel 2005
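The interpretation scale can be written as a small lookup. The slide's ranges share their endpoints, so this sketch assigns c = 6 and c = 10 to "strong support"; the label for c < 2 is an assumption, since the slide gives no category there.

```python
import math


def interpret_bayes_factor(bf):
    """Return (c, verdict) with c = 2 ln BF, using the thresholds above."""
    c = 2.0 * math.log(bf)
    if c > 10:
        verdict = "very strong support for H0"
    elif c >= 6:
        verdict = "strong support for H0"
    elif c >= 2:
        verdict = "positive evidence for H0"
    else:
        verdict = "below the positive-evidence threshold"
    return c, verdict


print(interpret_bayes_factor(2.8))  # c is about 2.06: positive evidence for H0
```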
30
Evaluation of Splice Site Prediction
                  Actual True     Actual False
Predicted True    TP              FP              PP = TP + FP
Predicted False   FN              TN              PN = FN + TN
                  AP = TP + FN    AN = FP + TN
Sensitivity (= coverage): $Sn = TP / AP = TP / (TP + FN)$
Specificity: $Sp = TP / PP = TP / (TP + FP)$
Normalized specificity: $Sp_n = TP / (TP + FP \cdot AP / AN)$
Misclassification rates
Brendel 2005
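These measures can be computed directly from the four confusion-matrix counts. In this sketch normalized specificity is taken as TP / (TP + FP·AP/AN), i.e. specificity after rescaling the false sites to the number of true sites; this reproduces, up to rounding, the middle percentage column of the Bayes factor results on the next slide, but treat it as an assumption. The two misclassification rates, FN/AP and FP/AN, are also assumptions, and the function name is hypothetical.

```python
def site_metrics(tp, fp, fn, tn):
    """Splice-site prediction accuracy from confusion-matrix counts."""
    ap, an = tp + fn, fp + tn   # actual true / actual false sites
    pp = tp + fp                # predicted true sites
    return {
        "Sn": tp / ap,                        # sensitivity (coverage)
        "Sp": tp / pp,                        # specificity: fraction of predictions that are true
        "Sp_norm": tp / (tp + fp * ap / an),  # false sites rescaled to AP (assumed definition)
        "FN_rate": fn / ap,                   # assumed: true sites missed
        "FP_rate": fp / an,                   # assumed: false sites called true
    }


print(site_metrics(tp=90, fp=90, fn=10, tn=810))
```

With 900 false sites and only 100 true ones, Sp is 0.5 while Sp_norm is 0.9, which shows why a normalized measure helps when false candidate sites vastly outnumber true ones.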
31
Bayes factor test results
Model: 2C = 2-class, 7C = 7-class. Each percentage is given at Bayes factor thresholds 0 / 3 / 6.

Species        Model  Site  True sites  False sites  Sn (%)              Normalized Sp (%)   Sp (%)
A. thaliana    7C     GT    613         9027         99.5 / 95.6 / 87.1  93.2 / 97.6 / 99.3  48.1 / 73.2 / 91.0
A. thaliana    7C     AG    614         10196        99.2 / 96.4 / 87.1  92.3 / 96.4 / 98.6  41.9 / 62.0 / 81.2
C. elegans     7C     GT    400         7460         97.8 / 94.2 / 84.8  92.7 / 97.1 / 99.1  40.4 / 64.3 / 85.4
C. elegans     7C     AG    400         10132        98.8 / 96.2 / 90.2  97.2 / 98.8 / 99.5  58.2 / 76.9 / 88.5
Drosophila     2C     GT    329         11501        95.4 / 90.0 / 83.9  94.8 / 97.6 / 99.1  34.1 / 53.6 / 75.0
Drosophila     2C     AG    329         14920        95.7 / 92.1 / 85.1  94.8 / 97.0 / 98.5  28.7 / 41.4 / 59.4
Homo sapiens   2C     GT    921         44411        98.5 / 91.7 / 66.3  90.5 / 96.3 / 98.5  16.4 / 34.8 / 57.6
Homo sapiens   2C     AG    920         65103        96.3 / 90.3 / 76.1  88.4 / 92.9 / 96.1  9.7 / 15.7 / 25.6
Brendel 2005
32
Performance? (figure: sensitivity (Sn) plots for human GT and AG sites, C. elegans GT and AG sites, and A. thaliana GT and AG sites)
Brendel 2005
33
Markov Model for Spliced Alignment (figure: exon states $e_n$, $e_{n+1}$ and intron states $i_n$, $i_{n+1}$, with transition probabilities $P_G$, $P_{A(n)}$, $1 - P_{A(n)}$, $(1 - P_G) P_{D(n+1)}$, and $(1 - P_G)(1 - P_{D(n+1)})$)
Brendel 2005
34
Performance vs other methods
Comparison with ab initio gene prediction programs? Depends on: availability of ESTs; availability of protein homologs
Brendel 2005
35
GeneSeqer vs NAP vs GENSCAN (Exon prediction)
(figure: exon (Sn + Sp) / 2, from 0.00 to 1.00, plotted against target protein alignment score, from 0 to 100, for GeneSeqer, NAP, and GENSCAN)
GENSCAN - Burge, MIT
Brendel 2005
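The exon-level accuracy on the y-axis, (Sn + Sp) / 2, can be sketched as below. It assumes an exon counts as correctly predicted only when both boundaries match an annotated exon exactly; that is a common convention, but the matching criterion behind this particular comparison is not stated on the slide, and the function name is hypothetical.

```python
def exon_accuracy(predicted, annotated):
    """Exon-level (Sn + Sp) / 2 with exact-boundary matching.

    predicted, annotated: iterables of (start, end) exon coordinates.
    """
    pred, true = set(predicted), set(annotated)
    correct = len(pred & true)
    sn = correct / len(true) if true else 0.0  # fraction of real exons found
    sp = correct / len(pred) if pred else 0.0  # fraction of predicted exons that are real
    return (sn + sp) / 2


print(exon_accuracy([(1, 100), (200, 300), (400, 500)], [(1, 100), (200, 310)]))
# roughly 0.417: Sn = 1/2, Sp = 1/3
```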
36
GeneSeqer vs NAP vs GENSCAN (Intron prediction)
GENSCAN - Burge, MIT
Brendel 2005
37
GeneSeqer (figure: Genomic sequence + EST or protein database (suffix array / suffix tree) → Fast search → Spliced alignment → Output, Assembly)
Brendel 2005
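The "fast search" step can be illustrated with a naive suffix array over a concatenated EST database, used to find exact k-mer seeds shared with the genomic sequence; candidate ESTs with seeds would then go on to spliced alignment. GeneSeqer's actual index structure and seeding parameters are not given on the slide, so the `$` separator, the k-mer length, and all function names here are assumptions.

```python
from bisect import bisect_right


def build_suffix_array(text):
    """Naive construction by sorting suffixes: fine for a sketch, far too slow for real databases."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    """First suffix-array index whose length-k prefix is >= pattern (> pattern if upper)."""
    lo, hi, k = 0, len(sa), len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + k]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(text, sa, pattern):
    """All start positions of `pattern` in `text`, via binary search on the suffix array."""
    return sorted(sa[_bound(text, sa, pattern, False):_bound(text, sa, pattern, True)])


def seed_hits(genomic, ests, k=8):
    """(genomic_pos, est_index, est_pos) for every exact k-mer shared with an EST."""
    text = "$".join(ests) + "$"  # '$' separator assumed absent from the sequences
    starts, pos = [], 0
    for est in ests:
        starts.append(pos)
        pos += len(est) + 1
    sa = build_suffix_array(text)
    hits = []
    for g in range(len(genomic) - k + 1):
        for t in find_occurrences(text, sa, genomic[g:g + k]):
            est_index = bisect_right(starts, t) - 1
            hits.append((g, est_index, t - starts[est_index]))
    return hits


print(seed_hits("CCCGG", ["AAACCCGGG", "TTTACGT"], k=4))  # [(0, 0, 3), (1, 0, 4)]
```

Because the suffixes are sorted, all occurrences of a k-mer sit in one contiguous block of the array, so each lookup costs two binary searches rather than a scan of the whole database.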
38
Brendel 2005
39
Brendel 2005
40
Brendel 2005
41
Brendel 2005
42
Brendel 2005
43
Brendel 2005
44
Gene Structure Annotation - Problems
False positive intergenic region: 2 annotated genes actually correspond to a single gene
False negative intergenic region: one annotated gene structure actually contains 2 genes
False negative gene prediction: missing gene (no annotation)
Other: partially incorrect gene annotation; missing annotation of alternative transcripts
Brendel 2005
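The first three problem types can be detected by comparing annotated gene spans with a trusted set of reference gene spans (for example, spans supported by spliced EST alignments). This overlap-based sketch is a simplification under stated assumptions: all genes are on one strand, spans are half-open (start, end) intervals, and the reference set is taken as correct. Partially incorrect annotations and missing alternative transcripts are not covered, and the function names are hypothetical.

```python
def overlaps(a, b):
    """Half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_annotation_errors(annotated, reference):
    """Flag split, merged, and missing genes by comparing annotated and reference spans."""
    errors = []
    for ref in reference:
        hits = [ann for ann in annotated if overlaps(ann, ref)]
        if not hits:
            errors.append(("false negative gene prediction (missing gene)", ref))
        elif len(hits) >= 2:
            errors.append(("false positive intergenic region (split gene)", ref))
    for ann in annotated:
        hits = [ref for ref in reference if overlaps(ann, ref)]
        if len(hits) >= 2:
            errors.append(("false negative intergenic region (merged genes)", ann))
    return errors


reference = [(0, 100), (200, 300), (500, 600), (800, 900)]
annotated = [(0, 40), (60, 100), (200, 600)]
for kind, span in classify_annotation_errors(annotated, reference):
    print(kind, span)
```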
45
Brendel 2005
46
Other Resources
Current Protocols in Bioinformatics http://www.4ulr.com/products/currentprotocols/bioinformatics.html
Finding Genes
4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
4.2 Using MZEF To Find Internal Coding Exons
4.3 Using GENEID to Identify Genes
4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences