VectorBase Kolymbari Meeting July 2011 new genomes new features and future plans Daniel Lawson (on behalf of VectorBase)



VectorBase
  EMBL-EBI
  IMBB
  Imperial College, London
  University of Notre Dame
  Harvard University
  University of New Mexico

VectorBase
Integrated genomic resource for arthropod vectors of human pathogens.
Funded by NIH-NIAID as one of four Bioinformatics Resource Centers (BRCs).
Collaboration of 3 European and 3 US institutes.
VectorBase is:
  Both a service provider and a content generator
  A collator of genomic information
  A genome annotation group (gene structure prediction)
  A provider of tools for browsing and data mining vector genomes
  A helpdesk for community queries
  Responsible for data submissions to the public archival databanks
  Committed to a regular release cycle (5-6 releases per year)

Summary of current contents

Species                  Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti            ✓       ✓         ✓                ✓                ✕
Anopheles gambiae        ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus   ✓       ✓         ✕                ✓                ✕
Glossina morsitans       ✓       ✕         ✓                ✕                ✕
Ixodes scapularis        ✓       ✓         ✕                ✕                ✕
Pediculus humanus        ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus        ✓       ✕         ✓                ✕                ✕

VectorBase website
The release cycle has allowed for more frequent updates to the Ensembl browser
  Includes support for presenting local data (GFF3, BAM, BED, (big)WIG and VCF files)
  Updates/development for specific data types (e.g. PopGen, ontologies and search)
The VectorBase site needs a style/technology makeover
  Aim to remove clutter from the site and improve the user experience
  Merging our Help wiki (FAQ, tutorials, newsletter, forum) into the main site
  Advantages for site maintenance and flexibility for the coming years
  Now is the time to get in contact with comments and wish list items.
Please contact VectorBase if you have comments about the current site or wish lists for the new site, or if you want to be involved in user testing of the new site.

Pre-sites for upcoming genomes

Pre-sites for upcoming genomes
  [Screenshots: Browse, Search]

Supporting species without genomic resources
Genome De-linked Annotation Viewer
  [Screenshots: Browse, Search]

Supporting species without genomic resources

Updating annotation sets
Community
  Submissions from the community (CAP)
  Previously accepted as .xls files
  Soon to also accept FASTA and GFF3
  DAS server for data presentation overhauled
  Integration into the reference gene set codified
VectorBase
  Manual curation at Harvard/New Mexico
  Priority is Anopheles gambiae
  Provides QC for new gene builds
  Final arbiter for issues arising from CAP
  Move to Aedes aegypti in late 2011
The quality of the gene sets will improve faster if you, the community, play an active role in correcting gene predictions. Please contact VectorBase if you find an incorrect prediction or have data sets which can improve the gene set.
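As an illustration of the kind of sanity check a GFF3 community submission might go through before integration, here is a minimal Python sketch. The function names `check_gff3_line` and `check_gff3` are hypothetical, and the rules shown (nine tab-separated columns, 1-based coordinates with start <= end, a valid strand, a phase on CDS features) are taken from the GFF3 format itself; this is not a description of the actual CAP pipeline.

```python
def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    problems = []
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # GFF3 coordinates are 1-based and inclusive, with start <= end.
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("coordinates must satisfy 1 <= start <= end")
    if strand not in {"+", "-", ".", "?"}:
        problems.append(f"invalid strand {strand!r}")
    # The phase column is required for CDS features.
    if ftype == "CDS" and phase not in {"0", "1", "2"}:
        problems.append("CDS features require a phase of 0, 1 or 2")
    return problems


def check_gff3(text):
    """Check every feature line; skip comments, directives and blank lines.

    Returns a dict mapping 1-based line numbers to their problem lists.
    """
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        problems = check_gff3_line(line)
        if problems:
            report[number] = problems
    return report
```

Calling `check_gff3` on the text of a submitted file gives a per-line report that could be returned to the submitter.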

Updating annotation sets: RNA-Seq
Aim: Gene prediction using high-throughput transcriptome data, a.k.a. 'RNA-seq'
Overview
  Alternative method for generating transcript-based gene predictions.
  Uses Illumina or 454 reads as well as traditional Sanger-sequenced ESTs
  Relatively short read lengths make intron-exon junction prediction hard; this is countered by the very high volume of data generated (millions of reads)
  Pipeline uses existing short-read algorithms for gene prediction: TopHat, Cufflinks, Scripture
Potential problems
  Data sets require significant filtering and pre-analysis QC
  Mis-calling of homopolymer runs in 454 data leads to noisy data and misprediction of splice sites
  Large data sets include many inappropriate splicing events (intron read-through, NMD targets etc.)
Summary
  Effective at finding UTR regions and validating/improving existing predictions
  Vital for making sense of sequence-based measures of gene expression
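The homopolymer problem noted above can be illustrated with a small pre-analysis filter. This is a minimal Python sketch, not the VectorBase pipeline: the function names and the `max_run` threshold of 6 are assumptions chosen for illustration only.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base in seq."""
    return max((sum(1 for _ in run) for _, run in groupby(seq.upper())), default=0)


def filter_reads(reads, max_run=6):
    """Keep reads whose longest homopolymer run does not exceed max_run.

    max_run is an illustrative threshold, not a recommended value.
    """
    return [read for read in reads if longest_homopolymer(read) <= max_run]
```

In practice a filter like this would be one of several QC steps, and reads with long runs might be flagged for downweighting instead of being discarded outright.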

Updating annotation sets: projection from reference (projection build)
Aim: Gene prediction using a 'high' quality reference set from a related species.
Overview
  When annotating a species for which we have a closely related reference species, we can align the genomes and project from the 'high' quality set onto the new assembly.
  This is more effective than a similarity build as it allows for building genes across contigs regardless of the assembly.
  Whole-genome alignment (WGA) between reference and target using BLASTZ.
  Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
  Project predictions through transformation of coordinates between the reference and target assemblies.
Summary
  Effective for low-coverage and poor-quality assemblies.
  Limited to orthologous loci between reference and target, i.e. no novel gene prediction.
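The two steps described above, the single-coverage filter on the target and the coordinate transformation, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: alignment blocks are ungapped and same-strand, the `Block` type and both function names are hypothetical, and the greedy best-score rule stands in for the custom filter, whose actual logic the slides do not describe.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Ungapped, same-strand alignment block (0-based, half-open coordinates)."""
    ref_start: int
    tgt_start: int
    length: int
    score: float


def single_coverage(blocks):
    """Greedy filter: keep the best-scoring blocks so that no target base
    is covered by more than one block."""
    kept = []
    for b in sorted(blocks, key=lambda blk: blk.score, reverse=True):
        b_end = b.tgt_start + b.length
        # Keep b only if it does not overlap any already-kept block on the target.
        if all(b_end <= k.tgt_start or b.tgt_start >= k.tgt_start + k.length
               for k in kept):
            kept.append(b)
    return sorted(kept, key=lambda blk: blk.ref_start)


def project(start, end, blocks):
    """Map the reference interval [start, end) onto the target, or return
    None when no single block contains it."""
    for b in blocks:
        if b.ref_start <= start and end <= b.ref_start + b.length:
            offset = b.tgt_start - b.ref_start
            return (start + offset, end + offset)
    return None
```

An exon that falls outside every retained block returns `None`, which mirrors the limitation stated in the summary: only loci with an aligned counterpart can be projected.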

Anopheles gambiae reference sequence
Many issues with the PEST assembly as a reference
S molecular form is proposed as the next reference
Hybrid assembly strategy
  Sanger*
  Illumina†
  454
Metrics of success
  Project existing gene predictions
  De novo prediction in novel regions
  Re-map important datasets

Anopheles gambiae reference sequence
Validation of the assembly by normal metrics
Emphasis on concordance with the large-scale restriction map (optical map)
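The slide does not say which metrics were used. As an assumption, one statistic commonly reported for assemblies is N50: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal Python sketch (the function name `n50` is illustrative):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

N50 measures contiguity only and says nothing about correctness, which is where concordance with an independent resource such as an optical map comes in.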

Anopheles gambiae reference sequence

Upcoming genomes: Kolymbari 2013?
NHGRI White papers
  Sandflies
    Lutzomyia longipalpis
    Phlebotomus papatasi
  Anopheles (AGCC)
    Anopheles arabiensis
    Anopheles quadriannulatus
    Anopheles merus
    Anopheles melas
    Anopheles christyi
    Anopheles epiroticus
    Anopheles stephensi
    Anopheles maculatus
    Anopheles funestus
    Anopheles minimus
    Anopheles culicifacies
    Anopheles farauti
    Anopheles dirus
    Anopheles atroparvus
    Anopheles albimanus
  Glossina
    Glossina palpalis
    Glossina fuscipes
    Glossina pallidipes
    Glossina brevipalpis
    Glossina austeni
    Stomoxys calcitrans
    Musca domestica
  Simulium
    Simulium vittatum
    Simulium sirbanum
    Simulium damnosum
    Simulium ochraceum
    Simulium squamosum
    Simulium thyolense
    Simulium santipauli
    Simulium woodi
    Simulium exiguum
    Simulium yahense
  Ticks & Mites
    Leptotrombidium deliense
    Ixodes scapularis*
    Dermacentor variabilis
    Ornithodoros turicata
Anopheles
  Anopheles darlingi*
  Anopheles stephensi
Others
  Aedes
    Aedes albopictus
  i5K initiative
  ?...

Notices
2nd round of Driving Biological Projects solicitation
  2 years funding at $300K per year maximum
  2-page letters of interest by August 1st
  Invited full proposals by November 1st
Hiring an outreach position at Notre Dame
  Details on the University of Notre Dame website

Contact VectorBase at

Acknowledgements
EMBL-EBI
  Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey
Imperial College
  Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond
New Mexico
  Maggie Werner-Washburne, Phil Baker
Harvard
  Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell
IMBB
  Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas
Notre Dame
  Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo
Sequencers
  TIGR/JCVI, WashU, Broad Institute
Ensembl