2 Unité de Biométrie et d’Intelligence Artificielle (UBIA) INRA

Slides:



Advertisements
Similar presentations
Model Organism Databases and Community Annotation
Advertisements

Gene Structure Annotation Philippe Lamesch International Arabidopsis conference July 23, 2008, Montreal.
2 Unité de Biométrie et d’Intelligence Artificielle (UBIA) INRA
Homology Based Analysis of the Human/Mouse lncRNome
Web Apollo Resources at the National Agricultural Library Christopher Childers NAL ARS USDA i5k.nal.usda.gov.
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
Peter Tsai, Bioinformatics Institute.  University of California, Santa Cruz (UCSC)  A rapid and reliable display of any requested portion of genomes.
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator Protein Sequence Database:
Gene Finding Genome Annotation. Gene finding is a cornerstone of genomic analysis Genome content and organization Differential expression analysis Epigenomics.
Doug Brutlag 2011 Genome Databases Doug Brutlag Professor Emeritus of Biochemistry & Medicine Stanford University School of Medicine Genomics, Bioinformatics.
Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy) Genome Databases Computational Molecular Biology Biochem 218 – BioMedical Informatics.
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
Viewing & Getting GO COST Functional Modeling Workshop April, Helsinki.
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Tomato genome annotation pipeline in Cyrille2
Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) iPlant: Josh Stein (CSHL) Matt Vaughn.
Genome Annotation and Databases Genomic DNA sequence Genomic annotation BIO520 BioinformaticsJim Lund Reading Ch 9, Ch10.
Arabidopsis Genome Annotation TAIR7 Release. Arabidopsis Genome Annotation  Overview of releases  Current release (TAIR7)  Where to find TAIR7 release.
New data and tools at TAIR (The Arabidopsis Information Resource)
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
Discover the UniProt Blast tool. Murcia, February, 2011Protein Sequence Databases Customize the BLAST results.
ANALYSIS AND VISUALIZATION OF SINGLE COPY ORTHOLOGS IN ARABIDOPSIS, LETTUCE, SUNFLOWER AND OTHER PLANT SPECIES. Alexander Kozik and Richard W. Michelmore.
Part I: Identifying sequences with … Speaker : S. Gaj Date
UMR ASP UMR ASP Structural & Comparative Genomics in Bread Wheat TriAnnotPipeline A LifeGrid Project based on AUVERGRID F. Giacomoni, M.
Welcome to DNA Subway Classroom-friendly Bioinformatics.
Browsing the Genome Using Genome Browsers to Visualize and Mine Data.
RNA Sequencing I: De novo RNAseq
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
1 Transcript modeling Brent lab. 2 Overview Of Entertainment  Gene prediction Jeltje van Baren  Improving gene prediction with tiling arrays Aaron Tenney.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Biological databases Exercises. Discovery of distinct sequence databases using ensembl.
Building WormBase database(s). SAB 2008 Wellcome Trust Sanger Insitute Cold Spring Harbor Laboratory California Institute of Technology ● RNAi ● Microarray.
Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Wilson Leung08/2015.
Plant Biology Division Post-process of IMGAG M.t. 2.0 Release Affymetrix Medicago Probe set – IMGAG 2.0 / MTGI 8.0 Mapping Zhao Bioinformatics Lab.
A collaborative tool for sequence annotation. Contact:
GNPAnnot Community Annotation System applied to sugarcane BAC clone sequences Valentin GUIGNON PAG Sugarcane Genome Sequencing Initiative Sunday, 16 January.
Web Apollo Resources at the National Agricultural Library Christopher Childers NAL ARS USDA i5k.nal.usda.gov.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
Copyright OpenHelix. No use or reproduction without express written consent1.
What is BLAST? Basic BLAST search What is BLAST?
Welcome to the combined BLAST and Genome Browser Tutorial.
Work Presentation Novel RNA genes in A. thaliana Gaurav Moghe Oct, 2008-Nov, 2008.
The Bovine Genome Database Abstract The Bovine Genome Database (BGD, facilitates the integration of bovine genomic data. BGD is.
Bioinformatics Computing 1 CMP 807 – Day 4 Kevin Galens.
Web Databases for Drosophila
What is BLAST? Basic BLAST search What is BLAST?
Web Apollo/JBrowse • JBrowse is a web based genome browser
Annotating The data.
VectorBase genome annotation
Daphnia Genome Preview at wFleaBase.org
Basics of BLAST Basic BLAST Search - What is BLAST?
Genome Sequence Annotation Server
Sequence based searches:
Functional Annotation of Transcripts
Genome Sequence Annotation Server
GEP Annotation Workflow
Gene Annotation with DNA Subway
Advanced PGDB Editing: Regulation GO Terms
Genome Annotation w/ MAKER
Ensembl Genome Repository.
A web-based platform for structural and functional annotation of model and non-model organisms Jodi Humann, Taein Lee, Stephen Ficklin,
Yating Liu July 2018 G-OnRamp workshop
Follow-up from last night: XSEDE credits
Basic Local Alignment Search Tool
Part II SeqViewer AraCyc Help
Welcome - webinar instructions
Presentation transcript:

2 Unité de Biométrie et d’Intelligence Artificielle (UBIA) INRA Bioinfo@INRA-Toulouse Helianthus annuus genome annotation HA412.v1.1.bronze.20141015 update Sébastien Carrere1 Ludovic Legrand1, Jérôme Gouzy1 Erika Sallet1, Thomas Schiex2 1 Laboratoire des Interactions Plantes Microorganismes (LIPM) INRA/CNRS 2 Unité de Biométrie et d’Intelligence Artificielle (UBIA) INRA

Summary Genome annotation Web tools The EuGene gene finder Sunflower bronze annotation pipeline Annotation summary Web tools Genome Browser Annotation Browser Sequence-based tools January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation of protein coding genes EuGene: an integrative gene finder Integration of different types of evidences protein similarities, evidences of transcription, etc. Alternative splicing prediction Combiner Integrating predictions from other gene finders (e.g FGENESH) Result in GFF3 format Ensuring interoperability with a large number of software and databases (chado, gbrowse, jbrowse, apollo, etc..) Availability Open source software (Artistic License) http://mulcyber.toulouse.inra.fr/projects/eugene/ Foissac S, Gouzy J, Rombauts S, Mathé C, Amselem J, Sterck L, Van de Peer Y, Rouzé P, Schiex T: Genome Annotation in Plants and Fungi: EuGène as a Model Platform. Current Bioinformatics 2008, 3:87-97. January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sunflower EuGene pipeline January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sunflower EuGene pipeline Deal with the large amount of transposable elements Reference database contains a lot of TE Need to clean these databases before creating similarities evidences TE and flanking genes were collapsed Need to consider TE Regions as non coding regions January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sunflower EuGene pipeline Deal with N stretches  configure EuGene to allow gene prediction through 3kb gaps January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation summary 94.33 % of HA412-HO EST are correctly mapped (~90% of XRQ ESTs)  Gene space is covered 90935 protein coding genes 59817 with Full Length Best Hits (spanning 60% of the length of the A. thaliana | SwissProt | Unitprot_plant protein) OR « EST/RNAseq assembly » support 39050  with EST support over 80% of the mRNA 13568  gene models correspond to full length A. thaliana proteins Lettuce Genome Assembly, Structure and Annotation (Maria Jose Truco , PAG 2014): “Genome annotation of the assembled genome using three prediction pipelines postulated a set of 94,556 non-redundant gene models. From those, 41,000 high confidence gene models were identified […] that combines transcriptomic and prediction evidence.” Nomenclature : Ha412v1r1_LGgXXXXXX January 2015 – Sebastien.Carrere@toulouse.inra.fr

Web Tools https://www.heliagene.org/HA412.v1.1.bronze.20141015/ Login / password: see consortium January 2015 – Sebastien.Carrere@toulouse.inra.fr

Genome Browser Available tracks Query with Assembly Gene Models (v0 and v1.1) Protein similarities TE predictions (MITE, LTR, BlastX) Ha412-HO RNA-seq libraries mapping Transcript alignments Query with Genomic Region Gene locus tags Transcripts accessions HaT13l or Ha412T4l January 2015 – Sebastien.Carrere@toulouse.inra.fr

Contextual menus Linked resources January 2015 – Sebastien.Carrere@toulouse.inra.fr

Contextual menus Linked resources January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation Browser Pre-computed analysis to speed up annotation mining Full-text search through Automatic functionnal annotation (InterPro based) Blast hits accessions / descriptions (Blastp vs. Uniprot-Plants, B. dystachyon, A. thaliana) InterPro hits accessions (GO terms, database accessions – PFAM, InterPro terms) Exact match Partial match Complex queries Region search January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation Sheet January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation Sheet Pre-computed protein families January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation Sheet Pre-computed blast results 5 best hits with e-value < 1e-5 against A. thaliana TAIR10 B. distachyon proteome Uniprot-Plants January 2015 – Sebastien.Carrere@toulouse.inra.fr

Annotation Sheet Pre-computed InterPro results January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sequence-based tools Blast server January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sequence-based tools Extract-seq January 2015 – Sebastien.Carrere@toulouse.inra.fr

Sequence-based tools Protein families January 2015 – Sebastien.Carrere@toulouse.inra.fr

Next Annotation components (gene structure  genome browser) will be updated/improved as soon as the gold genome assembly is available Sequence datasets (annotations, hits) will be integrated as a new source into the HeliaMine portal Feature request/Bug report Jerome.Gouzy@toulouse.inra.fr Sebastien.Carrere@toulouse.inra.fr January 2015 – Sebastien.Carrere@toulouse.inra.fr