BCB 444/544 Lecture 26: Gene Prediction (#26_Oct22)
10/22/07 - BCB 444/544 F07, ISU, Dobbs

Presentation transcript:


Slide 2: Required Reading (before lecture)
Mon Oct 22 - Lecture 26: Gene Prediction (Chp 8 - pp )
Wed Oct 24 - Lecture 27: Regulatory Element Prediction (Chp 9 - pp ) (will not be covered on Exam 2)
Thurs Oct 25 - Review Session & Project Planning
Fri Oct 26 - EXAM 2

Slide 3: Assignments & Announcements
Sun Oct 21 - Study Guide for Exam 2 was posted
Mon Oct 22 - HW#4 Due (no "correct" answer to post)
Thu Oct 25 - Lab = Optional Review Session for Exam; 544 Project Planning/Consult with DD & MT
Fri Oct 26 - Exam 2 - Will cover:
  Lectures (thru Mon Sept 17)
  Labs 5-8
  HW# 3 & 4
  All assigned reading: Chps 6 (beginning with HMMs), 7-8; Eddy: What is an HMM; Ginalski: Practical Lessons…

Slide 4: BCB 544 "Team" Projects
544 Extra HW#2 is next step in Team Projects
  Write ~ 1 page outline
  Schedule meeting with Michael & Drena to discuss topic
  Read a few papers
  Write a more detailed plan
You may work alone if you prefer
Last week of classes will be devoted to Projects
  Written reports due: Mon Dec 3 (no class that day)
  Oral presentations (15-20') will be: Wed-Fri Dec 5,6,7
  1 or 2 teams will present during each class period
See Guidelines for Projects posted online

Slide 5: BCB 544 Only: New Homework Assignment
544 Extra#2 (posted online Thurs?) No - sorry! sent by on Sat…
Due: PART 1 - ASAP; PART 2 - Fri Nov 2 by 5 PM
Part 1 - Brief outline of Project, to Drena & Michael; after response/approval, then:
Part 2 - More detailed outline of project
  Read a few papers and summarize status of problem
  Schedule meeting with Drena & Michael to discuss ideas

Slide 6: Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
Oct 25 Thur - BBMB Seminar, 4:10 in 1414 MBB
  Dave Segal, UC Davis: Zinc Finger Protein Design
Oct 19 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
  Guang Song, ComS, ISU: Probing functional mechanisms by structure-based modeling and simulations

Slide 7: Chp 16 - RNA Structure Prediction
SECTION V: STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
  RNA Function
  Types of RNA Structures
  RNA Secondary Structure Prediction Methods
  Ab Initio Approach
  Comparative Approach
  Performance Evaluation

Slide 8: Covalent & non-covalent bonds in RNA
Primary: covalent bonds
Secondary/Tertiary: non-covalent bonds
  H-bonds (base-pairing)
  Base stacking
Fig 6.2, Baxevanis & Ouellette 2005
(This is a new slide)

Slide 9: RNA Pseudoknots & Tetraloops
Often have important regulatory or catalytic functions
[Figures: Pseudoknot; Tetraloop. Image sources: huang/QD/mckay_hr.gif, Review/Annual-Reports/1995/images/rna.gif]
(This is a new slide)

Slide 10: Base Pairing in RNA
G-C, A-U, G-U ("wobble") & many variants
See: IMB Image Library of Biological Molecules
(This slide has been changed)

Slide 11: RNA Secondary Structure Prediction Methods
Two (three, recently) main types of methods:
1. Ab initio - based on calculating most energetically favorable secondary structure(s)
   Energy minimization (thermodynamics)
2. Comparative approach - based on comparisons of multiple evolutionarily-related RNA sequences
   Sequence comparison (co-variation)
3. Combined computational & experimental
   Use experimental constraints when available
(This slide has been changed)

Slide 12: RNA Secondary structure prediction - 3
3) Combined experimental & computational
Experiments: map single-stranded vs double-stranded regions in folded RNA
How?
  Enzymes: S1 nuclease, T1 RNase
  Chemicals: kethoxal, DMS, OH• (hydroxyl radical)
Software: Mfold, Sfold, RNAstructure, RNAfold, RNAalifold
[Figure legend: kethoxal modification (mild, strong); DMS modification (mild, strong)]
(This is a new slide)

Slide 13: Ab Initio Prediction: Clarifications
Free energy is calculated based on parameters determined in the wet lab
Correction: use the known energy associated with each type of nearest-neighbor pair (base-stacking), not each base-pair
Base-pair formation is not independent: multiple base-pairs adjacent to each other are more favorable than individual base-pairs. This cooperativity arises from base-stacking interactions
Bulges and loops adjacent to base-pairs have a free energy penalty
(This slide has been changed)

Slide 14: Energy minimization: What are the rules?
[Figure: base-pair diagrams labeled "A=U Basepair, ΔG = -1.2 kcal/mole" and "A=U U=A Basepair, ΔG = -1.6 kcal/mole"]
What gives here?
Source: C Staben 2005
(This is a new slide)

Slide 15: Energy minimization calculations
Base-stacking is critical - Tinoco et al.
Source: C Staben 2005
(This is a new slide)

Slide 16: Ab Initio Energy Calculation
Search for all possible base-pairing patterns
Calculate total energy of each structure based on all stabilizing and destabilizing forces
Total free energy for a specific RNA conformation = sum of incremental energy terms for:
  helical stacking (sequence dependent)
  loop initiation
  unpaired stacking
(favorable "increments" are < 0)
Fig 6.3, Baxevanis & Ouellette 2005
(This slide has been changed)
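The additive energy model on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the stacking values, the default increment, and the loop penalty below are placeholder numbers invented for the example (they are not measured nearest-neighbor parameters), and the function names are hypothetical.

```python
# Placeholder increments in kcal/mol, for illustration only.
# Keys are (base pair, next base pair) read 5'->3' along one strand.
STACK_ENERGY = {
    ("AU", "AU"): -0.9,
    ("AU", "UA"): -1.1,
    ("GC", "CG"): -3.4,
    ("CG", "GC"): -2.4,
    ("GC", "AU"): -2.2,
}
DEFAULT_STACK = -1.0         # assumed value for stacks missing from the toy table
HAIRPIN_LOOP_PENALTY = 4.0   # assumed destabilizing (positive) loop initiation term


def helix_energy(pairs):
    """Sum stacking increments over adjacent base pairs in one helix.

    A lone base pair has no neighbor to stack on, so it contributes 0:
    stability comes from stacks, not from individual pairs.
    """
    total = 0.0
    for upper, lower in zip(pairs, pairs[1:]):
        total += STACK_ENERGY.get((upper, lower), DEFAULT_STACK)
    return total


def hairpin_energy(pairs):
    """Total energy of a simple hairpin: helix stacks plus one loop penalty."""
    return helix_energy(pairs) + HAIRPIN_LOOP_PENALTY


if __name__ == "__main__":
    stem = ["GC", "CG", "GC"]
    print(round(helix_energy(stem), 2), round(hairpin_energy(stem), 2))
```

Negative terms stabilize the structure and positive terms destabilize it, so a short helix can be outweighed by the penalty for closing its loop.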

Slide 17: Dynamic Programming
Finding optimal secondary structure is difficult - lots of possibilities
Compare RNA sequence with itself
Apply scoring scheme based on energy parameters for base stacking, cooperativity, and penalties for destabilizing forces (loops, bulges)
Find path that represents most energetically favorable secondary structure
(This slide has been changed)
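To make the "compare the sequence with itself" idea concrete, here is a minimal dynamic programming sketch. Note the substitution: instead of the energy-based scoring described on the slide, it uses the much simpler Nussinov-style objective of maximizing the number of base pairs, so it is a stand-in for illustration and not what Mfold-type programs compute. The function names and the `min_loop` parameter are assumptions of this example.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def can_pair(a, b):
    """True if bases a and b can form a canonical or wobble pair."""
    return frozenset((a, b)) in PAIRS


def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in seq.

    best[i][j] holds the best score for the subsequence seq[i..j].
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: base j is unpaired
            for k in range(i, j - min_loop):    # case 2: base j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCCC"))  # 3 pairs: a GGG/CCC stem around a loop
```

Replacing the "+ 1" with stacking and loop energy terms, and maximization with minimization, is the step that turns this toy recurrence into an energy-minimization scheme.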

Slide 18: 3 - Popular Programs that use Combined Computational & Experimental Approaches
  Mfold
  Sfold
  RNAstructure
  RNAfold
  RNAalifold

Slide 19: Comparison of Predictions for Single RNA using Different Methods
[Figure: predicted structures with stem-loops SL X, SL Y, SL Z from Mfold, RNAstructure, RNAfold, and Sfold, each labeled with a free energy in kcal/mol]
Source: JH Lee 2007

Slide 20: Comparison of Mfold Predictions: -/+ Constraints
[Figure: Mfold prediction and Mfold-plus-constraints prediction, each labeled with a free energy in kcal/mol]
Source: JH Lee 2007

Slide 21: Performance Evaluation
Ab initio methods? correlation coefficient = 20-60%
Comparative approaches? correlation coefficient = %
  Programs that require user to supply MSA are more accurate
Comparative programs are consistently more accurate than ab initio
Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures! - Gutell, Pace
BEST APPROACH? Methods that combine computational prediction (ab initio & comparative) with experimental constraints (from chemical/enzymatic modification studies)
(This slide has been changed)

Slide 22: Chp 8 - Gene Prediction
SECTION III: GENE AND PROMOTER PREDICTION
Xiong: Chp 8 Gene Prediction
  Categories of Gene Prediction Programs
  Gene Prediction in Prokaryotes
  Gene Prediction in Eukaryotes

Slide 23: What is a Gene?
What is a gene? A segment of DNA, some of which is "structural," i.e., transcribed to give a functional RNA product, & some of which is "regulatory"
Genes can encode:
  mRNA (for protein)
  other types of RNA (tRNA, rRNA, miRNA, etc.)
Genes differ in eukaryotes vs prokaryotes (& archaea), in both structure & regulation

Slide 24: Gene Finding
Problem: Given a new genomic DNA sequence, identify coding regions and their predicted RNA and protein sequences
  ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT
Steps:
1. Search against protein / EST database
2. Apply gene prediction programs (many programs available)
3. Analyze regulatory regions

Slide 25: Gene Prediction in Prokaryotes vs Eukaryotes
Prokaryotes
  Small genomes (on the order of 10^6 bp)
  About 90% of genome is coding
  Simple gene structure
  Prediction success ~99%
Eukaryotes
  Large genomes (10^7 bp and larger)
  Often less than 2% coding
  Complicated gene structure (splicing, long introns)
  Prediction success %
[Figure: prokaryotic gene with a promoter followed by an open reading frame (ORF) running from start codon (ATG) to stop codon (TAA); eukaryotic gene with promoter, 5' UTR, exons and introns separated by splice sites, and 3' UTR, with start codon ATG and stop codon TAA]

Slide 26: DNA "Signals" Used by Gene Finding Algorithms
1. Exploit the regular gene structure
   ATG - Exon1 - Intron1 - Exon2 - … - ExonN - STOP
2. Recognize "coding bias"
   CAG-CGA-GAC-TAT-TTA-GAT-AAC-ACA-CAT-GAA-…
3. Recognize splice sites
   Intron - cAGt - Exon - gGTgag - Intron
4. Model the duration of regions
   Introns tend to be much longer than exons, in mammals
   Exons are biased to have a given minimum length
5. Use cross-species comparison
   Gene structure is conserved in mammals
   Exons are more similar (~85%) than introns
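Signals 3 and 4 in the list above can be combined into a very crude candidate generator: look for a GT donor dinucleotide and a downstream AG acceptor, and keep only spans of plausible length. The sketch below is an illustration under simplifying assumptions (exact GT/AG dinucleotides only, no scoring of flanking bases, hypothetical function name and length limits); real gene finders score these sites statistically.

```python
def candidate_introns(dna, min_len=20, max_len=10000):
    """Return (start, end) spans that begin with GT and end with AG.

    Coordinates are 0-based and end-exclusive, so dna[start:end] is the
    candidate intron. min_len and max_len encode a crude length model.
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            length = a + 2 - d          # intron includes both dinucleotides
            if min_len <= length <= max_len:
                spans.append((d, a + 2))
    return spans


if __name__ == "__main__":
    # Toy sequence: exon, then GT + 20 C's + AG as a fake intron, then exon.
    toy = "ATGGCC" + "GT" + "C" * 20 + "AG" + "GCCTAA"
    for start, end in candidate_introns(toy):
        print(start, end, toy[start:end])
```

Because GT and AG occur by chance every few dozen bases, this produces many false candidates on real sequence, which is why the other signals on the slide (coding bias, length models, conservation) are needed.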

Slide 27: Computational Gene Finding Approaches
Ab initio methods
  Search by signal: find DNA sequences involved in gene expression
  Search by content: test statistical properties distinguishing coding from non-coding DNA
Similarity-based methods
  Database search: exploit similarity to proteins, ESTs, and cDNAs
  Comparative genomics: exploit aligned genomes. Do other organisms have similar sequence?
Hybrid methods - best

Slide 28: Examples of Gene Prediction Software
Ab initio: Genscan, GeneMark.hmm, Genie, GeneID…
Similarity-based: BLAST, Procrustes…
Hybrids: GeneSeqer, GenomeScan, GenieEST, Twinscan, SGP, ROSETTA, CEM, TBLASTX, SLAM
BEST?
  Ab initio - Genscan (according to some assessments)
  Hybrid - GeneSeqer
  But depends on organism & specific task
Lists of Gene Prediction Software

Slide 29: Synthesis & Processing of Eukaryotic mRNA
[Figure: gene in DNA (exon 1, intron, exon 2, intron, exon 3) is transcribed into the primary (1') transcript (RNA). Splicing (removal of introns), capping (7-methyl-G at the 5' end), and polyadenylation (AAAAA at the 3' end) yield the mature mRNA, which is exported to the cytoplasm]

Slide 30: What are cDNAs & ESTs?
cDNA libraries are important for determining gene structure & studying regulation of gene expression
  Isolate RNA (always from a specific organism, region, and time point)
  Convert RNA to complementary DNA (with reverse transcriptase)
  Clone into cDNA vector
  Sequence the cDNA inserts
Short cDNAs are called ESTs or Expressed Sequence Tags
ESTs are strong evidence for genes
Full-length cDNAs can be difficult to obtain
[Figure: vector with cDNA insert]

Slide 31: UniGene: Unique genes via ESTs
Find UniGene at NCBI:
UniGene clusters contain many ESTs
UniGene data come from many cDNA libraries. When you look up a gene in UniGene, you can obtain information re: level & tissue distribution of expression

Slide 32: Gene Prediction
Overview of steps & strategies
  What sequence signals can be used?
  What other types of information can be used?
Algorithms
  HMMs, Bayesian models, neural nets
Gene prediction software
  3 major types
  many, many programs!

Slide 33: Overview of Gene Prediction Strategies
What sequence signals can be used?
  Transcription: TF binding sites, promoter, initiation site, terminator, CpG islands, etc.
  Processing signals: splice donor/acceptors, polyA signal
  Translation: start (AUG = Met) & stop (UGA, UAA, UAG) codons, ORFs, codon usage
What other types of information can be used?
  Homology (sequence comparison, BLAST)
  cDNAs & ESTs (experimental data, pairwise alignment)
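The translation signals listed above are enough for a simple open reading frame (ORF) scan. The following is a minimal sketch, not a gene finder: it scans only the forward strand of a DNA sequence (so the codons are written with T instead of U), reports ATG-to-stop spans in each of the three frames, and uses a hypothetical `min_codons` threshold. The test sequence is the example string from the "Gene Finding" slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA equivalents of UAA, UAG, UGA


def find_orfs(dna, min_codons=2):
    """Return (start, end) spans of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # look for the next ORF
    return orfs


if __name__ == "__main__":
    seq = "ATTACCATGGGGCAGGGTCAGATATAATGCCCTCATTTT"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])       # 6 27 ATGGGGCAGGGTCAGATATAA
```

A scan like this works reasonably for prokaryotes, where genes are uninterrupted ORFs; in eukaryotes introns break the reading frame, so ORF scanning alone is not sufficient.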

Slide 34: Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes. Why?
  Smaller genomes
  Simpler gene structures
  Many more sequenced genomes! (for comparative approaches)
Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available, e.g.:
  GeneMark.hmm
  TIGR Comprehensive Microbial Resource (CMR)
  NCBI Microbial Genomes

Slide 35: Predicting Genes - Basic steps
Obtain genomic sequence
BLAST it!
  Perform database similarity search (with EST & cDNA databases, if available)
Translate in all 6 reading frames (i.e., "6-frame translation")
  Compare with protein sequence databases
Use Gene Prediction software to locate genes
Analyze regulatory sequences
Refine gene prediction
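The "6-frame translation" step can be sketched directly: translate the sequence from offsets 0, 1, and 2 on the forward strand and from the same three offsets on the reverse complement. The code below is a minimal illustration using the standard genetic code; the function names and the "+1" to "-3" frame labels are choices made for this example, and unknown codons (for instance those containing N) are rendered as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels '+1'..'+3', '-1'..'-3' to proteins."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(label, protein)
```

Each of the six protein strings can then be compared against protein databases, which is what translated searches such as BlastX do internally.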

Slide 36: Predicting Genes - Details
1. First, mask to "remove" repetitive elements (Alus, etc.)
2. Perform database search on translated DNA (BlastX, TFasta)
3. Use several programs to predict genes (GENSCAN, GeneMark.hmm, GeneSeqer)
4. Search for functional motifs in translated ORFs (Blocks, Motifs, etc.) & in neighboring DNA sequences
5. Repeat

Slide 37: Spliced Alignment Algorithm
GeneSeqer - Brendel et al. - ISU
Perform pairwise alignment with large gaps in one sequence (due to introns)
  Align genomic DNA with cDNA, ESTs, protein sequences
Score semi-conserved sequences at splice junctions
  Using a Bayesian model or Markov model (MM)
Score coding constraints in translated exons
  Using a Bayesian model or MM
[Figure: intron bounded by splice sites, GT at the donor and AG at the acceptor]
Source: Brendel 2005; Brendel et al (2004) Bioinformatics 20: 1157

Slide 38: Brendel - Spliced Alignment II: Compare with protein probes
[Figure: genomic DNA, with start codon and stop codon marked, aligned against a protein]
Source: Brendel 2005

Slide 39: Splice Site Detection
Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal? YES
Information content I_i:
Extent of splice signal window:
  i: ith position in sequence
  Ī: avg information content over all positions >20 nt from splice site
  σ_Ī: avg sample standard deviation of Ī
Source: Brendel 2005
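The formula for I_i is not preserved in this transcript. As a hedged illustration, the sketch below assumes the standard Shannon form for DNA, I_i = 2 + Σ_B f_iB · log2(f_iB), where f_iB is the frequency of base B at position i in a set of aligned splice sites. That assumption, the function name, and the toy sequences are not taken from the slide, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Per-position information content (bits) of equal-length DNA sites.

    Assumes I_i = 2 - H_i, with H_i the Shannon entropy of the base
    frequencies in column i. 2 bits means fully conserved, 0 bits means
    all four bases are equally frequent.
    """
    length = len(aligned_sites[0])
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    # Toy donor-site alignment: the GT dinucleotide is invariant.
    sites = ["GTAA", "GTGA", "GTAC", "GTGT"]
    print(information_content(sites))
```

Positions whose information content stands clearly above the background average Ī (relative to σ_Ī) are the ones that contribute to the splice signal, which is how the extent of the signal window on the slide would be judged.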

Slide 40: Information content vs position
[Figure: information content plotted against position for human donor (Human T2_GT) and acceptor (Human T2_AG) splice sites]
Which sequences are exons & which are introns? How can you tell?
Source: Brendel 2005; Brendel et al (2004) Bioinformatics 20: 1157

Slide 41: Markov Model for Spliced Alignment
[Figure: state diagram with exon states e_n and e_n+1 and intron states i_n and i_n+1. Transition labels: P_G, P_A(n), P_ΔG, (1-P_ΔG)·P_D(n+1), (1-P_ΔG)·(1-P_D(n+1)), 1-P_A(n)]
Source: Brendel 2005