Gene Finding Genome Annotation

Gene finding is a cornerstone of genomic analysis
  – Genome content and organization
  – Differential expression analysis
  – Epigenomics
  – Population biology & evolution
  – Medical genomics

Basic Approaches
Computational
  – Absolute rules: start and stop codons
  – Statistical probabilities: which codon is a true start?
      Intron splice junctions
      Codon usage
Experimental
  – Comparison with known genes/proteins (BLAST)
  – Expressed sequence tags
  – RNAseq data
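
The "absolute rules" approach can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard genetic code (ATG start; TAA, TAG, TGA stops) and scanning the forward strand only. The function name `find_orfs`, the `min_codons` threshold and the test sequence are invented for this example and are not part of SNAP, AUGUSTUS or MAKER.

```python
# Assumption: standard genetic code; alternative start codons are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts every codon in the ORF, including start and stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up toy sequence: one ORF (ATG AAA TTT TAG) in frame 2.
print(find_orfs("CCATGAAATTTTAGCC"))
```

A rule-based scan like this cannot decide which ATG is the true start or handle introns, which is where the statistical and experimental approaches come in.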

Computational Gene Prediction
Statistical properties of protein-coding genes differ from those of non-coding sequence
  – Long ORFs
      In random sequence, stop codons should occur on average 3 times in every 64 codons (~1/21), so long stop-free stretches are unlikely to arise by chance
  – Codon bias

      Codon   Amino acid   %
      ACA     Thr          24.6
      ACC     Thr          35.5
      ACG     Thr          28.4
      ACU     Thr          11.4
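
Both statistics can be worked through in Python. The sketch below assumes all four bases are equally frequent and independent (real genomes deviate from this), and it writes codons in DNA form, so ACU appears as ACT. The function names and the toy coding sequence are invented for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_fraction():
    """Fraction of the 64 possible codons that are stop codons."""
    all_codons = ["".join(c) for c in product("ACGT", repeat=3)]
    n_stops = sum(c in STOP_CODONS for c in all_codons)
    return Fraction(n_stops, len(all_codons))


def prob_no_stop(n_codons):
    """Chance that n random codons contain no stop (equal base frequencies)."""
    return float(1 - stop_codon_fraction()) ** n_codons


def synonymous_usage(cds, codons):
    """Percent usage of each codon in `codons` within an in-frame sequence."""
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {}
    return {c: 100.0 * counts[c] / total for c in codons}


print(stop_codon_fraction())      # 3/64
print(1 / stop_codon_fraction())  # 64/3, i.e. one stop per ~21 codons
print(prob_no_stop(100))          # small: long ORFs rarely occur by chance

# Made-up in-frame threonine run: ACA ACC ACC ACG
print(synonymous_usage("ACAACCACCACG", ["ACA", "ACC", "ACG", "ACT"]))
```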

Gene features tend to occur in specific sequence contexts
[Figure from Korf (2004). Panels: a. Splice acceptor sites; b. Splice donor sites; c. Translation starts; d. Splice acceptor sites for A. thaliana genes predicted using C. elegans parameters]

Many of the ab initio gene finders use Hidden Markov Models (HMMs)
HMMs
  – Contain parameters defining the probabilities that specific gene features occur in different sequence contexts
They can be used to predict
  – Transcription start sites
  – Intron splice junctions
  – Poly-A addition sites
  – Promoters
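
To make the HMM idea concrete, here is a toy two-state model ("noncoding" and "coding") decoded with the Viterbi algorithm. Every probability below is invented for illustration. Real gene finders use many more states (exons, introns, splice sites and so on) and parameters trained on known genes, so this is a sketch of the principle rather than how SNAP or AUGUSTUS is implemented.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters, chosen only so that GC-rich runs look "coding".
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for seq (A/C/G/T only)."""
    seq = seq.upper()
    if not seq:
        return []
    # Log-probability of the best path ending in each state at position 0.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][base]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up sequence: AT-rich flanks around a GC-rich block.
toy = "A" * 10 + "GC" * 5 + "T" * 10
print("".join("C" if s == "coding" else "n" for s in viterbi(toy)))
```

The transition probabilities act as a penalty for switching state, so the model labels a region as coding only when enough consecutive bases support it.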

Standard practice is to perform gene predictions with multiple programs
We will run two programs in today’s exercise:
SNAP
  – Korf (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59
AUGUSTUS
  – Stanke et al. (2004) AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32:W309

Gene validation
Independent evidence that our candidate gene is, in fact, a gene
  – Conserved protein motifs
  – BLAST matches
  – Expressed sequence tags
  – RNAseq reads

For today’s exercise
We will use the following evidence:
  – Genes/proteins already identified in M. oryzae (many of them well supported by BLAST, EST and other transcriptomic data)
  – Splice junction information from the RNAseq mapping that we performed yesterday

Information overload!!!
Results from:
  – SNAP
  – AUGUSTUS
Magnaporthe genes
Magnaporthe proteins
RNAseq mapping data
How are we going to make sense of these highly redundant datasets?

Enter…MAKER
Synthesizes multiple forms of gene prediction data
  – Predictions and evidence
Outputs a single, consistent set of genes and gene models, including quality values
Uses a standard gene annotation format
  – GFF3 (related to the GTF format used yesterday)
  – Results can be imported into a genome browser

GFF3 format
seqid   source   type   start   end   score   strand   phase   attributes
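
The nine columns are tab-separated, a "." marks an undefined value, and the attributes column holds semicolon-separated key=value pairs. A minimal Python parser for one line might look like the sketch below. The record class, the function name and the example line (including its IDs and coordinates) are invented for illustration; this is not MAKER's own code or real output.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class GffRecord(NamedTuple):
    seqid: str
    source: str
    feature_type: str      # the "type" column
    start: int             # 1-based, inclusive
    end: int               # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[unquote(key)] = unquote(value)
    return GffRecord(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Made-up example line; the seqid, coordinates and IDs are hypothetical.
example = "chr1\tmaker\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example"
print(parse_gff3_line(example))
```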

Gene finding is an iterative process
[Diagram labels: SNAP, AUGUSTUS, HMM, gene models, BLAST matches, ESTs, MAKER]