10/21/05 D Dobbs ISU - BCB 444/544X: Gene Prediction (formerly Gene Prediction - 3)

Slide 2: Announcements
Exam 2: next Friday
Posted online:
  Exam 2 Study Guide
  544 Reading Assignment (2 papers)

Slide 3: Announcements
544 Semester Projects - information needed. Please send to me (or David).
Briefly describe:
  Your background & current grad research
  Is there a problem related to your research that you would like to learn more about & develop as a project for this course?
  or: What would your 'dream' project be?

Slide 4: Announcements
2 Bioinformatics Seminars today (Fri Oct 21)
12:10 PM BCB Faculty Seminar in E164 Lagomarcino
  "Protein Networks"
  Bob Jernigan, BBMB & Director, Baker Center for Bioinformatics & Biological Statistics
4:10 PM GDCB Special Seminar in 1414 MBB
  "Integrating the Unknown-eome with Abiotic Stress Response Networks in Arabidopsis"
  Ron Mittler, Dept. of Biochem & Mol Biology, University of Nevada, Reno

Slide 5: Gene Prediction & Regulation
Mon - Gene structure review: Eukaryotes vs prokaryotes
Wed - Regulatory regions: Promoters & enhancers
Fri - Predicting genes; predicting regulatory regions (?)
Next week: Predicting RNA structure (miRNAs, too)

Slide 6: Reading Assignment
Mount, Bioinformatics, Chp 9: Gene Prediction & Regulation
  pp Predicting Promoters
  Ck Errata:
* Brown, Genomes 2 (NCBI textbooks online)
  Sect 9 Overview: Assembly of Transcription Initiation Complex
  Sect DNA binding proteins, Transcription initiation
* NOTE: Don't worry about the details!!

Slide 7: Optional Reading
Reviews:
1) Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3.
2) Wasserman WW & Sandelin (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5.

Slide 8: Review last lecture: Gene Regulation (formerly Gene Prediction - 2)
cDNAs & ESTs
UniGene
Regulatory regions
Eukaryotes vs prokaryotes

Slide 9: Figure (Pevsner p160)
Diagram labels: DNA, RNA, cDNA, protein, phenotype
Steps: [1] Transcription, [2] RNA processing (splicing), [3] RNA export, [4] RNA surveillance

Slide 10: UniGene: unique genes via ESTs
Find UniGene at NCBI.
UniGene clusters contain many ESTs.
UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance and its regional distribution.
Pevsner p164

Slide 11: Today: Gene Prediction (formerly Gene Prediction - 3)
Predicting genes
Mon - Predicting regulatory regions (focus on promoters); Introduction to RNA
Later: Genome browsers

Slide 12: Gene Prediction
Overview of steps & strategies
  What sequence signals can be used?
  What other types of information can be used?
Algorithms
  HMMs, discriminant functions, neural nets
Gene prediction software
  3 major types
  many, many programs!

Slide 13: Predicting Genes - Basic steps:
1) Obtain genomic sequence
2) Translate in all 6 reading frames
3) Compare with protein sequence database
4) Perform database similarity search with EST & cDNA databases, if available
5) Use gene prediction program to locate genes
6) Analyze gene regulatory sequences
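
Step 2 of the list above (translate in all 6 reading frames) can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and an input restricted to A, C, G, T; the function names (`reverse_complement`, `translate`, `six_frame_translation`) are made up for this example and do not come from the lecture or from any gene prediction package.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translated strings could then be compared against a protein database (step 3), which is what a translated search such as BLASTX does internally.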

Slide 14: Overview of gene prediction strategies
What sequence signals can be used?
  Transcription: TF binding sites, promoter, initiation site, terminator
  Processing signals: splice donor/acceptors, polyA signal
  Translation: start (AUG = Met) & stop (UGA, UAA, UAG)
  ORFs, codon usage
What other types of information can be used?
  cDNAs & ESTs (experimental data, pairwise alignment)
  homology (sequence comparison, BLAST)
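
As a rough illustration of how the translation signals above (a start codon, then an in-frame stop codon) define an ORF, here is a minimal forward-strand ORF scan. It is a sketch under simplifying assumptions: DNA alphabet (so ATG and TAA/TAG/TGA rather than the RNA codons), no introns, forward strand only, and a hypothetical `min_codons` length filter. Real gene finders add codon usage statistics and the other signals listed on the slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ORFs on the forward strand.

    start/end are 0-based, end-exclusive, and the ORF includes its stop codon.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGATT"))
```

To cover both strands, the same scan can be run on the reverse complement of the sequence.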

Slide 15: Automated gene prediction strategies
1) Similarity-based or Comparative
   BLAST - Do other organisms have similar sequence? (Is the sequence similar to a known gene or protein?)
2) Ab initio = "from the beginning"
   Predict without explicit comparison with cDNA or proteins, via "rule-based" gene models - but the rules are derived from statistical analysis of datasets
3) Combined "evidence-based"
   Combine gene models with alignment to known ESTs & protein sequences
BEST RESULTS? Combined

Slide 16: Examples of gene prediction software
1) Similarity-based or Comparative
   BLAST
   SGP2 (extension of GeneID)
2) Ab initio = "from the beginning"
   GeneID - (used in lab this week)
   GENSCAN - (used in lab this week)
   GeneMark.hmm - (should try this!)
3) Combined "evidence-based"
   GeneSeqer (Brendel et al., ISU)
BEST? GENSCAN, GeneMark.hmm, GeneSeqer - but it depends on the organism & the specific task

Slide 17: Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes. Why?
  Smaller genomes
  Simpler gene structures
  More sequenced genomes! (for comparative approaches)
Methods?
  Previously: mostly HMM-based
  Now: similarity-based methods, because so many genomes are available

Slide 18: GeneSeqer - Brendel et al.

Slide 19: Thanks to Volker Brendel, ISU, for the following figures & slides
Slightly modified from: BSSI Genome Informatics Module sc_2005.html#moduleB
V Brendel

Slide 20: Signals: Pre-mRNA Splicing
Figure (Brendel 2005): Genomic DNA (exons, introns, start codon, stop codon) -> Transcription -> pre-mRNA (Cap- ... -Poly(A)) -> Splicing -> mRNA (Cap- ... -Poly(A)) -> Translation -> Protein
Splice sites: donor site (GT at the start of the intron), acceptor site (AG at the end of the intron)

Slide 21: Brendel - Spliced Alignment I: Compare with cDNA or EST probes
Figure (Brendel 2005): genomic DNA (start codon, stop codon) aligned with mRNA (Cap-, 5'-UTR, start codon, stop codon, 3'-UTR, -Poly(A))

Slide 22: Brendel - Spliced Alignment II: Compare with protein probes
Figure (Brendel 2005): genomic DNA (start codon, stop codon) aligned with protein

Slide 23: Brendel Spliced Alignment Algorithm
Perform pairwise alignment with large gaps in one sequence (introns)
  Align genomic DNA with cDNA, EST or protein
  Score semi-conserved sequences at splice junctions
  Score coding constraints in translated exons
Brendel 2005
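
To make the idea of "pairwise alignment with large gaps in one sequence" concrete, here is a deliberately simplified toy. It is not the GeneSeqer algorithm: it demands exact base matches between cDNA and genomic DNA and replaces splice-site scoring with a hard rule that every intron starts with GT and ends with AG. The function names and the `min_intron` parameter are assumptions made for this sketch.

```python
from functools import lru_cache


def spliced_align(genomic, cdna, min_intron=4):
    """Toy spliced alignment: place cdna on genomic as exact-match exons
    separated by GT...AG introns. Returns a list of (start, end) exon
    intervals (0-based, end-exclusive) or None if no such placement exists.
    min_intron should be at least 4 so that GT and AG do not overlap."""
    genomic, cdna = genomic.upper(), cdna.upper()
    n, m = len(genomic), len(cdna)

    @lru_cache(maxsize=None)
    def extend(g, c):
        # Genomic positions matched by cdna[c:], with cdna[c] placed at g.
        if c == m:
            return ()
        if g >= n or genomic[g] != cdna[c]:
            return None
        rest = extend(g + 1, c + 1)  # option 1: stay in the same exon
        if rest is not None:
            return (g,) + rest
        if genomic[g + 1:g + 3] == "GT":  # option 2: an intron starts after g
            for nxt in range(g + 1 + min_intron, n):
                if genomic[nxt - 2:nxt] == "AG":  # intron ends just before nxt
                    rest = extend(nxt, c + 1)
                    if rest is not None:
                        return (g,) + rest
        return None

    for start in range(n):
        path = extend(start, 0)
        if path is not None:
            return _to_exons(path)
    return None


def _to_exons(positions):
    """Merge consecutive genomic positions into exon intervals."""
    exons = []
    for p in positions:
        if exons and exons[-1][1] == p:
            exons[-1][1] = p + 1
        else:
            exons.append([p, p + 1])
    return [tuple(e) for e in exons]


if __name__ == "__main__":
    # Exon ATGCC, intron GTAAAAAG, exon TTGA, flanked by CC on both sides.
    print(spliced_align("CCATGCCGTAAAAAGTTGACC", "ATGCCTTGA"))
```

A real spliced aligner tolerates mismatches and indels, and scores candidate donor/acceptor sites and coding potential instead of applying a yes/no GT-AG rule; the recursion here is also only suitable for short sequences.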

Slide 24: Donor (GT) & Acceptor (AG) Sites Used for Model Training
Table (Brendel 2005; numeric values not preserved in this transcript)
Columns: Species | Type (GT, AG) | Number of True Splice Sites / Phase
Species rows: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus, Drosophila, C. elegans, S. pombe, Aspergillus, Arabidopsis thaliana, Zea mays

Slide 25: Splice Site Detection
Information Content I_i
Extent of Splice Signal Window:
  i: i-th position in sequence
  Ī: average information content over all positions i > 20 nt from splice site
  σ_Ī: average standard deviation of Ī
Brendel 2005
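
The slide's formula for I_i is not preserved in this transcript. As a hedged illustration, the sketch below uses the usual Shannon definition for DNA, I_i = 2 + sum over bases B of f_iB * log2(f_iB), where f_iB is the frequency of base B at position i in a set of aligned sites. Whether the lecture used exactly this form (for example, with a small-sample correction) is an assumption, and the function name is invented for the example.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content (bits) for equal-length aligned sites.

    Uses I_i = 2 - H_i, with H_i the Shannon entropy of the base frequencies
    at position i; no small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        total = sum(counts.values())
        entropy = -sum((k / total) * math.log2(k / total) for k in counts.values())
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    # Toy donor-like sites: a variable first position, then an invariant GGT.
    print(information_content(["AGGT", "CGGT", "GGGT", "TGGT"]))
```

Positions near a real splice site show values well above the background average Ī, which is how a signal window can be delimited.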

Slide 26: Results?
Figure (Brendel 2005), panel titles:
  Human: T2_GT, Fi_AG, T2_AG, F1_AG
  A. thaliana: T2_GT, F1_AG, Fi_AG, T2_AG

Slide 27: Bayesian Splice Site Prediction
Let S = s_{-l} s_{-l+1} s_{-l+2} ... s_{-1} GT s_1 s_2 s_3 ... s_r
where H indexes the hypotheses of GT or AG at:
  - True site in reading phase 1, 2, or 0
  - False within-exon site in reading phase 1, 2, or 0
  - False within-intron site
Brendel 2005

Slide 28: Bayes Factor as Decision Criterion
H_0: H = T
  - 2-class model:
  - 7-class model:
Brendel 2005

Slide 29: Interpretation of Bayes Factor
in terms of Critical Value c = 2 ln BF
  Positive evidence for H_0 if 2 ≤ c ≤ 6
  Strong support for H_0 if 6 ≤ c ≤ 10
  Very strong support for H_0 if c > 10
Brendel 2005
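
A small sketch of how the critical value c = 2 ln BF might be computed and mapped to the categories above. Two things here are assumptions rather than content from the slides: the Bayes factor is taken as a plain likelihood ratio P(S | H_0) / P(S | alternative), and the shared boundary c = 6 is assigned to the "strong" category. Function names are illustrative only.

```python
import math


def critical_value(likelihood_h0, likelihood_alt):
    """c = 2 ln BF, with BF assumed to be P(S | H0) / P(S | alternative)."""
    return 2.0 * math.log(likelihood_h0 / likelihood_alt)


def interpret(c):
    """Map c to the evidence categories listed on the slide."""
    if c > 10:
        return "very strong support for H0"
    if c >= 6:  # the slide lists c = 6 under both ranges; 'strong' is our choice
        return "strong support for H0"
    if c >= 2:
        return "positive evidence for H0"
    return "below the slide's thresholds"


if __name__ == "__main__":
    c = critical_value(0.02, 0.001)  # BF = 20
    print(round(c, 2), interpret(c))
```

With BF = 20 the critical value is just under 6, so the toy call lands in the "positive evidence" range.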

Slide 30: Evaluation of Splice Site Prediction

                   Actual True    Actual False
Predicted True     TP             FP              PP = TP + FP
Predicted False    FN             TN              PN = FN + TN
                   AP = TP + FN   AN = FP + TN

Measures defined on the slide: Sensitivity (= Coverage), Specificity, Normalized specificity, Misclassification rates
Brendel 2005
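
The slide's formulas are missing from this transcript, so the sketch below fills them in with definitions that are consistent with the table's marginals but should be read as assumptions: Sn = TP/AP, Sp = TP/PP (the convention commonly used in gene prediction), misclassification rates alpha = FN/AP and beta = FP/AN, and normalized specificity sigma = (1 - alpha) / (1 - alpha + beta). The function name and dictionary keys are invented for the example.

```python
def splice_site_metrics(tp, fp, fn, tn):
    """Compute evaluation measures from confusion-matrix counts.

    Assumes AP, AN and PP are all non-zero.
    """
    ap = tp + fn  # actual positives (true sites)
    an = fp + tn  # actual negatives (false sites)
    pp = tp + fp  # predicted positives
    alpha = fn / ap  # fraction of true sites that were missed
    beta = fp / an   # fraction of false sites that were predicted as true
    return {
        "Sn": tp / ap,
        "Sp": tp / pp,
        "alpha": alpha,
        "beta": beta,
        "sigma": (1 - alpha) / (1 - alpha + beta),
    }


if __name__ == "__main__":
    for name, value in splice_site_metrics(tp=90, fp=10, fn=10, tn=890).items():
        print(f"{name}: {value:.4f}")
```

Because sigma uses rates rather than raw counts, it does not change when the ratio of false to true sites in the test set changes, whereas Sp = TP/PP does.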

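Slide 31: Table (Brendel 2005; numeric values not preserved in this transcript)
Columns: Species | Model | Site | Test Site Set (True, False) | Bayes Factor | Sn (%) | σ (%) | Sp (%)
Rows:
  Homo sapiens, 2C model, GT and AG sites
  Drosophila, 2C model, GT and AG sites
  C. elegans, 7C model, GT and AG sites
  A. thaliana, 7C model, GT and AG sites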

Slide 32: Performance?
Figure (Brendel 2005): plots of Sn and σ for six panels: Human GT site, Human AG site, C. elegans GT site, C. elegans AG site, A. thaliana GT site, A. thaliana AG site

Slide 33: Markov Model for Spliced Alignment
Figure (Brendel 2005): state diagram with exon states e_n, e_{n+1} and intron states i_n, i_{n+1}
Transition labels: P_ΔG, (1 - P_ΔG) P_D(n+1), (1 - P_ΔG)(1 - P_D(n+1)), P_A(n), 1 - P_A(n)

Slide 34: Performance vs other methods
Comparison with ab initio gene prediction programs?
Depends on:
  Availability of ESTs
  Availability of protein homologs
Brendel 2005

Slide 35: GeneSeqer vs NAP vs GENSCAN (Exon prediction)
Figure (Brendel 2005): Exon (Sn + Sp) / 2 plotted against target protein alignment score, for GeneSeqer, NAP, and GENSCAN
GENSCAN - Burge, MIT

Slide 36: GeneSeqer vs NAP vs GENSCAN (Intron prediction)
Figure (Brendel 2005)
GENSCAN - Burge, MIT

Slide 37: GeneSeqer
Figure (Brendel 2005), pipeline components:
  Inputs: Genomic Sequence; EST or protein database (Suffix Array / Suffix Tree)
  Stages: Fast Search, Spliced Alignment, Output Assembly
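
The "Fast Search" stage relies on an index such as a suffix array. The sketch below shows the general idea only, not GeneSeqer's implementation: a naive suffix array (built by sorting suffixes, which is fine for short strings but far too slow for real databases) and a binary search for exact occurrences of a query. Function names are illustrative.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of exact matches of pattern in text."""
    k = len(pattern)
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix whose k-character prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        pos = suffix_array[mid]
        if text[pos:pos + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All matching suffixes are contiguous in the suffix array.
    while lo < len(suffix_array) and text.startswith(pattern, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


if __name__ == "__main__":
    est = "GATTACAGATTACA"
    sa = build_suffix_array(est)
    print(find_occurrences(est, sa, "ATTA"))
```

Exact hits found this way can serve as seeds that tell the slower spliced alignment step which database sequences are worth aligning against the genomic region.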

Slide 44: Gene Structure Annotation - Problems
False positive intergenic region: 2 annotated genes actually correspond to a single gene
False negative intergenic region: one annotated gene structure actually contains 2 genes
False negative gene prediction: missing gene (no annotation)
Other:
  partially incorrect gene annotation
  missing annotation of alternative transcripts
Brendel 2005

Slide 46: Other Resources
Current Protocols in Bioinformatics: Finding Genes
4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
4.2 Using MZEF To Find Internal Coding Exons
4.3 Using GENEID to Identify Genes
4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences