BRC6, 28th October 2008. Collective annotation of the Ixodes scapularis genome: VectorBase, MSCs and the tick community. Daniel Lawson, VectorBase.

Presentation transcript:


Arthropod vectors of human pathogens
Lutzomyia, Phlebotomus, Culex, Rhodnius, Anopheles, Glossina, Aedes, Pediculus, Ixodes

Deer tick: Ixodes scapularis
Vector of Lyme disease (spirochete Borrelia burgdorferi)
Estimated genome size of 2.1 Gb
Sequenced strain: Wikel, 12th generation from ticks sourced from New York, Oklahoma and Connecticut
First chelicerate genome to be sequenced

Genome annotation cycle
Diagram labels: automatic gene build; assembly; community annotations; manual annotations; other genomes, gene sets; repeat library (TEs etc.); ESTs, cDNAs; protein domains

Generating sequence
Sequencing undertaken by established sequencing centres (e.g. Broad, JCVI)
Initial assembly annotated in collaboration with the sequencing centre(s)
19,300,000 trace reads generated
Approx. 6x WGS
570K BAC end sequencing
Assembly produced at JCVI
194K EST sequences
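As a rough sanity check on the figures above, the read count, the approximate 6x coverage and the 2.1 Gb genome estimate together imply a mean read length. The sketch below is a back-of-envelope calculation, not a number from the talk: it assumes that all 19.3 million trace reads count towards the WGS coverage and that the coverage is quoted against the 2.1 Gb estimate.

```python
# Back-of-envelope estimate; the assumptions are stated in the text above.
GENOME_SIZE_BP = 2.1e9      # estimated genome size of I. scapularis
TRACE_READS = 19_300_000    # trace reads generated
TARGET_COVERAGE = 6.0       # "approx. 6x WGS"


def implied_mean_read_length(genome_bp, reads, coverage):
    """Mean read length needed for `reads` reads to give `coverage`-fold depth."""
    return coverage * genome_bp / reads


mean_len = implied_mean_read_length(GENOME_SIZE_BP, TRACE_READS, TARGET_COVERAGE)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

The result is roughly 650 bp per read.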

Assembly statistics
This WGS project has the project accession ABJB
The current version of the project (01) has the accession number ABJB , and consists of 1,141,594 scaffolds (ABJB ABJB )
Released assembly IscaW1
570,637 contigs
369,495 supercontigs
Assembled coverage of 3.8x

Preparing for gene build
Repeat masking
Analyses to identify repeat elements: RepeatScout, RECON
Standard tandem-repeat and low-complexity filtering
Collate data sets
Transcripts (cDNA and EST data)
Peptides (taxonomic groupings, inc. Daphnia pulex)
Train gene predictors, mainly Augustus (JCVI)

Annotation plan
First-pass gene prediction
Focused on protein-coding genes (CDSs)
Semi-automated approach
This is not manual curation
Involvement of community where possible
Timely delivery of gene set

Gene prediction
Each group/centre has its own gene prediction pipeline/protocol
Each group produces a first-pass 'best guess' set of predictions (0.5 sets, public release)
These sets are merged into a single set (1.0 set, not released)
Quality control activities give the 1.1 set (public release)
The 1.1 set is annotated with protein features and released to the wider world

Merging gene predictions
Reduce to single predictions per locus
Compare exon/intron structures between gene set #1 and gene set #2
Categories: identical structures; compatible structures; different structures; merge/split structures; complex; no map
Add isoform predictions based on EST/peptide data
Canonical gene set
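The structure comparison can be illustrated with a small sketch. The slide names the categories but does not define them, so the rules below are assumptions made for illustration: two models are "identical" if their exon coordinates match exactly, "compatible" if one model's intron set is contained in the other's (for example, the same introns with different terminal exon ends), "different" otherwise, and "no_map" if their spans do not overlap. The function names and the representation of a model as a list of `(start, end)` exons on the same strand are hypothetical.

```python
def introns(exons):
    """Return intron boundaries as (upstream exon end, downstream exon start) pairs."""
    exons = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify_structures(model_a, model_b):
    """Compare two single-transcript gene models given as lists of (start, end) exons."""
    a, b = sorted(model_a), sorted(model_b)
    if a == b:
        return "identical"
    # No shared span at all: nothing to compare at this locus.
    if a[-1][1] < b[0][0] or b[-1][1] < a[0][0]:
        return "no_map"
    introns_a, introns_b = set(introns(a)), set(introns(b))
    # Assumed rule: shared intron chain (one a subset of the other) means compatible.
    if introns_a and introns_b and (introns_a <= introns_b or introns_b <= introns_a):
        return "compatible"
    return "different"


set1_model = [(100, 200), (300, 400), (500, 600)]
set2_model = [(150, 200), (300, 400), (500, 650)]   # same introns, different ends
print(classify_structures(set1_model, set2_model))
```

Under these assumed rules a pair of overlapping single-exon models is reported as "different" unless the coordinates match exactly; a real pipeline would likely treat that case separately.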

Merge of data sets to 1.0 release
Simple, hierarchical system
Reduce to single transcript per locus (simplicity)
Compare loci across the 2 sets
Categorize
Manually investigate some examples
Deal with each category individually
Collate each group back to give a 'minimal' complete set
Add alternate isoforms back into the set (transcripts, proteins)
Add UTR extensions where possible
QC the data set
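The "compare loci across the 2 sets, then categorize" step might look like the following sketch. It is not the pipeline used for the 1.0 release: the category names (`one_to_one`, `merge`, `split`, `complex`, `no_map`) and the rule of linking loci by simple coordinate overlap are assumptions, and loci are taken to be on the same scaffold and strand, each reduced to a single `(start, end)` span.

```python
def overlaps(span_a, span_b):
    """True if two inclusive (start, end) spans share at least one base."""
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]


def categorize_loci(set1, set2):
    """Assign each locus in set1 (name -> span) a category relative to set2."""
    hits1 = {n: [m for m, s in set2.items() if overlaps(span, s)]
             for n, span in set1.items()}
    hits2 = {m: [n for n, s in set1.items() if overlaps(span, s)]
             for m, span in set2.items()}
    categories = {}
    for name, partners in hits1.items():
        if not partners:
            categories[name] = "no_map"
        elif len(partners) == 1:
            back = hits2[partners[0]]
            if len(back) == 1:
                categories[name] = "one_to_one"
            elif all(len(hits1[n]) == 1 for n in back):
                categories[name] = "merge"    # several set1 loci under one set2 locus
            else:
                categories[name] = "complex"
        elif all(len(hits2[m]) == 1 for m in partners):
            categories[name] = "split"        # one set1 locus covers several set2 loci
        else:
            categories[name] = "complex"
    return categories


set1 = {"A1": (100, 500), "A2": (1000, 1400), "A3": (1500, 1900),
        "A4": (3000, 3500), "A5": (5000, 5200)}
set2 = {"B1": (120, 480), "B2": (1000, 1900), "B3": (2900, 3200), "B4": (3300, 3600)}
print(categorize_loci(set1, set2))
```

Each category can then be handled on its own, as the slide describes. Loci present only in the second set are not reported by this sketch; running the function with the arguments swapped would list them as `no_map`.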

Merge annotation comparisons

Examples: isoform-compat, isoform-diff

Examples: merge/splits, difficult

GBrowse viewer

VectorBase browser

Final gene set (IscaW1.1)
20,486 protein-coding genes
48% have a Pfam domain
40% have supporting EST evidence
8,138 tRNAs
Over-prediction of Ser (4,425) and Thr (1,527) predictions
301 ncRNAs
Submitted to GenBank last week; release to be coordinated in the next couple of weeks
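The percentages and tRNA counts can be turned into approximate absolute numbers with a short calculation. The gene counts derived below are estimates only, because the slide gives rounded percentages; the variable names are illustrative.

```python
PROTEIN_CODING_GENES = 20_486
PFAM_FRACTION = 0.48        # "48% have Pfam domain" (rounded on the slide)
EST_FRACTION = 0.40         # "40% have supporting EST evidence" (rounded)
TRNA_TOTAL = 8_138
TRNA_SER = 4_425
TRNA_THR = 1_527

# Approximate gene counts implied by the rounded percentages.
genes_with_pfam = round(PROTEIN_CODING_GENES * PFAM_FRACTION)
genes_with_est = round(PROTEIN_CODING_GENES * EST_FRACTION)

# Share of all tRNA predictions that are Ser or Thr.
ser_thr_share = (TRNA_SER + TRNA_THR) / TRNA_TOTAL

print(f"~{genes_with_pfam} genes with a Pfam domain")
print(f"~{genes_with_est} genes with EST support")
print(f"Ser + Thr account for {ser_thr_share:.1%} of tRNA predictions")
```

Ser and Thr together make up close to three quarters of the tRNA predictions, which is why the slide flags them as over-predicted.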

Genome annotation cycle
Diagram labels: automatic gene build; assembly; community annotations; manual annotations; other genomes, gene sets; repeat library (TEs etc.); ESTs, cDNAs; protein domains

Community annotation
Workflow diagram labels: web submission; CHADO; researcher; community representative; appraisal; approval; GFF3; gene build
Total: 13,339 entries
An. gambiae: 9,423
Cx. quinquefasciatus: 2,598
Ae. aegypti: 1,281
Ix. scapularis: 37
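Since community annotations pass through GFF3 on their way to the gene build, a basic format check before appraisal could look like the sketch below. It is not VectorBase's submission code: the function name and the example feature are hypothetical, and the checks cover only the nine-column layout, coordinates, strand and CDS phase from the GFF3 specification (URL-escaped characters in attributes are not decoded).

```python
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; raise ValueError on basic problems."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])   # int() raises ValueError on bad input
    record["end"] = int(record["end"])
    if record["start"] < 1 or record["start"] > record["end"]:
        raise ValueError("coordinates must be 1-based with start <= end")
    if record["strand"] not in {"+", "-", ".", "?"}:
        raise ValueError(f"invalid strand: {record['strand']!r}")
    if record["type"] == "CDS" and record["phase"] not in {"0", "1", "2"}:
        raise ValueError("CDS features need a phase of 0, 1 or 2")
    raw = record["attributes"]
    # Column 9 is a ';'-separated list of tag=value pairs, or '.' if empty.
    record["attributes"] = {} if raw == "." else dict(
        pair.split("=", 1) for pair in raw.split(";") if pair
    )
    return record


# Hypothetical community-submitted feature, for illustration only.
example = "scaffold_demo\tcommunity\tCDS\t100\t250\t.\t+\t0\tID=cds1;Parent=mrna1"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["Parent"])
```

Rejecting malformed lines at submission time keeps the appraisal and approval steps focused on the biology rather than on file format problems.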

Community annotation track in browser

Lessons
Annotation plan for sequencing and annotation of new genomes is well established (MSC & BRC)
Clearly defined data release strategies (0.5, 1.0 & 1.1)
Monthly conference calls
Face-to-face meeting when merging 0.5 gene predictions
Coordinated release between MSC, VectorBase and GenBank

But we can always improve
Agreement on project/public identifiers at the start of the project
Primarily contigs and supercontigs
Overall nomenclature applied as final step in annotation
More QC before the major milestones
Better communication

Acknowledgements
Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas, Ewan Birney, Martin Hammond, Daniel Lawson, Karyn Megy, Bill Gelbart, Kathy Campbell, Fotis Kafatos, George Christophides, Bob MacCallum, Seth Redmond, Peter Atkinson, Peter Arensburger, Catherine Hill, Jason Meyer, Frank Collins, Greg Madey, Scott Emrich, Ryan Butler, Katie Cybulski, Nate Konopinski, Rob Bruggner (alumni), E.O. Stinson (alumni), Dave Severson, Neil Lobo, Frank Collins, Neil Lobo
Species: Aedes, Anopheles, Culex, Ixodes
Institutions: EMBL-EBI, Harvard, IMBB, Imperial, Notre Dame
Colleagues
Ensembl {Genebuilders, Web, Compara, Core, Outreach}
BRCs {Pathema, ApiDB}
Sequencers {JCVI & Broad Institute}