Published by Barrie Shields. Modified over 9 years ago.
1
VectorBase http://www.vectorbase.org Kolymbari Meeting July 2011
New genomes, new features and future plans
Daniel Lawson (on behalf of VectorBase)
2
VectorBase
- EMBL-EBI
- IMBB
- Imperial College, London
- University of Notre Dame
- Harvard University
- University of New Mexico
3
VectorBase
- Integrated genomic resource for arthropod vectors of human pathogens.
- Funded by NIH-NIAID as one of four Bioinformatics Resource Centers (BRCs).
- Collaboration of 3 European and 3 US institutes.
VectorBase is:
- Both service provider and content generator
- A collator of genomic information
- A genome annotation group (gene structure prediction)
- A provider of tools for browsing and data mining vector genomes
- A helpdesk for community queries
- Responsible for data submissions to the public archival databanks
- Committed to a regular release cycle (5-6 releases per year)
4
Summary of current contents

Species                  Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti            ✓       ✓         ✓                ✓                ✕
Anopheles gambiae        ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus   ✓       ✓         ✕                ✓                ✕
Glossina morsitans       ✓       ✕         ✓                ✕                ✕
Ixodes scapularis        ✓       ✓         ✕                ✕                ✕
Pediculus humanus        ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus        ✓       ✕         ✓                ✕                ✕
5
VectorBase website
- The release cycle has allowed for more frequent updates to the Ensembl browser
  - Includes support for presenting local data (GFF3, BAM, BED, (big)WIG and VCF files)
  - Updates/development for specific data types (e.g. PopGen, ontologies and search)
- The VectorBase site needs a style/technology makeover
  - Aim to remove clutter from the site and improve the user experience
  - Merging our Help wiki (FAQ, tutorials, newsletter, forum) into the main site
  - Advantages for site maintenance and flexibility for coming years
- Now is the time to get in contact with comments and wish list items.
Please contact VectorBase if you have comments about the current site, wish lists for the new site, or want to be involved in user testing of the new site.
6
Pre-sites for upcoming genomes
7
Pre-sites for upcoming genomes
- Browse
- Search
8
Supporting species without genomic resources
Genome De-linked Annotation Viewer
- Browse
- Search
9
Supporting species without genomic resources
10
Updating annotation sets
Community
- Submissions from community (CAP)
- Previously accepted as .xls files
- Soon to also accept FASTA and GFF3
- DAS server for data presentation overhauled
- Integration into reference gene set codified
VectorBase
- Manual curation at Harvard/New Mexico
- Priority is Anopheles gambiae
- Provides QC for new gene builds
- Final arbiter for issues arising from CAP
- Move to Aedes aegypti in late 2011
The quality of the gene sets will improve faster if you, the community, play an active role in correcting gene predictions. Please contact VectorBase if you find an incorrect prediction or have data sets that can improve the gene set.
11
Updating annotation sets: RNA-Seq
Aim: Gene prediction using high-throughput transcriptome data, a.k.a. 'RNA-seq'
Overview
- Alternative method for generating transcript-based gene predictions.
- Uses Illumina or 454 reads as well as traditional Sanger-sequenced ESTs
- Relatively short read lengths make intron-exon junction prediction hard; this is countered by the very high volume of data generated (millions of reads)
- Pipeline uses existing short-read algorithms for gene prediction: TopHat, Cufflinks, Scripture
Potential problems
- Data sets require significant filtering and pre-analysis QC
- Mis-calling of homopolymer runs in 454 data leads to data noise and misprediction of splice sites
- Large data sets include many inappropriate splicing events (intron read-through, NMD targets etc.)
Summary
- Effective at finding UTR regions and validating/improving existing predictions
- Vital for making sense of sequence-based measures of gene expression
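As an illustration of the homopolymer problem noted for 454 data, the following is a minimal sketch of one possible pre-analysis check: locate homopolymer runs in a sequence and flag candidate splice junctions that fall close to one. The function names, the run-length threshold and the window size are illustrative assumptions, not part of the VectorBase pipeline.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of identical bases.

    Coordinates are 0-based. min_len is an assumed threshold, not a
    value taken from the presentation.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def junction_is_suspect(junction_pos, runs, window=3):
    """True if a junction lies within `window` bases of any run.

    A run covers [start, start + length). The window is a hypothetical
    tolerance chosen for illustration.
    """
    for start, _base, length in runs:
        if start - window <= junction_pos < start + length + window:
            return True
    return False


if __name__ == "__main__":
    runs = homopolymer_runs("ACGTTTTTGAAAAC")
    print(runs)
    print(junction_is_suspect(8, runs))
```

A junction flagged this way is not necessarily wrong; the check only marks predictions that deserve a closer look before they are used to build gene models.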
12
Updating annotation sets: Projection from reference
Projection build
Aim: Gene prediction using a 'high' quality reference set from a related species.
Overview
- When annotating a species for which we have a closely related reference species, we can align the genomes and project from the 'high' quality set onto the new assembly.
- This is more effective than a similarity build as it allows for building genes across contigs regardless of the assembly.
- Whole-genome alignment (WGA) between reference and target using BLASTZ.
- Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
- Project predictions through transformation of coordinates between reference and target assemblies.
Summary
- Effective for low-coverage and poor-quality assemblies.
- Limited to orthologous loci between reference and target, i.e. no novel gene prediction.
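The two steps of the projection build (single-coverage filtering of the alignment, then coordinate transformation) can be sketched roughly as follows. This is a simplified illustration, not the VectorBase implementation: it assumes ungapped, same-strand alignment blocks on a single pair of sequences, and the `Block` type, the greedy highest-score-first filter and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """Ungapped, same-strand alignment block (0-based, half-open)."""
    ref_start: int
    ref_end: int
    tgt_start: int
    score: float

    @property
    def tgt_end(self) -> int:
        return self.tgt_start + (self.ref_end - self.ref_start)


def filter_single_coverage(blocks: List[Block]) -> List[Block]:
    """Keep blocks so that no target base is covered more than once.

    Greedy assumption: higher-scoring blocks win any overlap in
    target coordinates.
    """
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda blk: blk.score, reverse=True):
        if all(b.tgt_end <= k.tgt_start or b.tgt_start >= k.tgt_end
               for k in kept):
            kept.append(b)
    return sorted(kept, key=lambda blk: blk.ref_start)


def project(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a reference position to the target, or None if unaligned."""
    for b in blocks:
        if b.ref_start <= pos < b.ref_end:
            return b.tgt_start + (pos - b.ref_start)
    return None


if __name__ == "__main__":
    raw = [
        Block(100, 200, 1000, 90.0),
        Block(500, 600, 1050, 50.0),  # overlaps the first block in the target
        Block(300, 400, 2000, 70.0),
    ]
    kept = filter_single_coverage(raw)
    print(project(150, kept), project(550, kept))
```

Positions that fall outside every retained block return `None`, which mirrors the limitation stated on the slide: only loci that align between reference and target can be projected.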
13
Anopheles gambiae reference sequence
- Many issues with the PEST assembly as a reference
- S molecular form is proposed as the next reference
- Sanger*, Illumina†, 454
- Hybrid assembly strategy
- Metrics of success
- Project existing gene predictions
- De novo prediction in novel regions
- Re-map important datasets
14
Anopheles gambiae reference sequence
- Validation of the assembly by normal metrics
- Emphasis on the concordance with the large-scale restriction map (optical map)
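The slide does not say which "normal metrics" were used. N50 is one widely used assembly contiguity statistic, so the sketch below computes it purely as an example of such a metric; its use here is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A contiguity statistic like this says nothing about correctness, which is consistent with the emphasis on checking the assembly against the optical map.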
15
Anopheles gambiae reference sequence
16
Upcoming genomes: Kolymbari 2013?
NHGRI White papers
Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni
Stomoxys calcitrans
Musca domestica
Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
Anopheles: Anopheles darlingi*, Anopheles stephensi
Others
Aedes: Aedes albopictus
i5K initiative ?...
17
Notices
- 2nd round of Driving Biological Projects solicitation
  - 2 years funding at $300K per year maximum
  - 2 page letters of interest by August 1st
  - Invited full proposals by November 1st
  - http://www.vectorbase.org/Other/News/?id=140
- Hiring an outreach position at Notre Dame
  - Details on the University of Notre Dame website
  - http://www.vectorbase.org/Other/News/?id=145
18
Contact VectorBase at info@vectorbase.org
19
Acknowledgements
Institutes: EMBL-EBI, Imperial College, Notre Dame, Harvard, IMBB, New Mexico
Sequencers: TIGR/JCVI, WashU, Broad Institute
Ensembl
People: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey, Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond, Maggie Werner-Washburne, Phil Baker, Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell, Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas, Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo