Integrating source modifiers with sequence data through a new GenBank submission module in Symbiota
Presentation transcript:

Integrating source modifiers with sequence data through a new GenBank submission module in Symbiota. Andrew N. Miller (1), Phil Anders (1), Neil Cobb (2), Ben Brandt (2), and Ed Gilbert (3). (1) University of Illinois Urbana-Champaign; (2) Northern Arizona University; (3) Arizona State University. BCoN Meeting, Lawrence, KS, 13 February 2018.

Collection Management Systems: Arctos, EMu, FileMaker Pro, Microsoft Access, Microsoft Excel, Paradox, Specify, Symbiota.

What is Symbiota? A specimen search engine; floristic data; species checklists; surveys; identification keys; an image library; distribution maps, descriptions, and taxonomic information; genetic data; data aggregation.

37 Million Records, 40 Portals, 13 Thematic Collection Networks

Key Symbiota websites
Homepage: http://symbiota.org/
Code on GitHub: https://github.com/Symbiota
Citable publication: http://bdj.pensoft.net/articles.php?id=1114
Google Group (support): http://symbiota.org/docs/google-group/
Symbiota Working Group: https://www.idigbio.org/wiki/index.php/Symbiota_Working_Group

The Problem: Source modifiers are seldom populated in GenBank records. Source modifiers considered: specimen voucher, country, isolation source, host, collected by, collection date, identified by, latitude/longitude, altitude.
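Most of these modifiers correspond to fields that a specimen database typically already holds. As a rough illustration, the Python sketch below translates one specimen record into GenBank source-modifier values. The Darwin Core-style field names, the mapping itself (habitat to isolation source in particular is a judgment call), and the helper names are assumptions for illustration, not the Symbiota implementation; the lat_lon, collection date, and altitude formats follow INSDC conventions as we understand them and should be checked against NCBI's current documentation.

```python
from datetime import date

# Hypothetical mapping from Darwin Core-style field names (as a collection
# database might store them) to GenBank/INSDC source qualifiers.
DWC_TO_GENBANK = {
    "catalogNumber": "specimen_voucher",
    "country": "country",
    "habitat": "isolation_source",   # judgment call, not a fixed rule
    "host": "host",                  # not a core Darwin Core term
    "recordedBy": "collected_by",
    "eventDate": "collection_date",
    "identifiedBy": "identified_by",
    "minimumElevationInMeters": "altitude",
}

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_lat_lon(lat: float, lon: float) -> str:
    """Decimal degrees to the 'DD.DDDD N DDD.DDDD W' style used by /lat_lon."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.4f} {ns} {abs(lon):.4f} {ew}"


def format_collection_date(d: date) -> str:
    """Format a date as DD-Mmm-YYYY, e.g. 05-Jun-2017."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def to_source_modifiers(record: dict) -> dict:
    """Translate one specimen record into a dict of source modifiers."""
    mods = {}
    for field, qualifier in DWC_TO_GENBANK.items():
        value = record.get(field)
        if value is None or value == "":
            continue  # omit unpopulated fields instead of sending blanks
        if isinstance(value, date):
            value = format_collection_date(value)
        elif qualifier == "altitude":
            value = f"{value} m"  # /altitude expects a value in metres
        mods[qualifier] = str(value)
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    if lat is not None and lon is not None:
        mods["lat_lon"] = format_lat_lon(lat, lon)
    return mods
```

Fields that are missing in the database are simply left out, so the output only ever contains modifiers that have real values behind them.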

Fungi dataset (1,200,057 records). Query: ((fungi[orgn] NOT srcdb_refseq[prop] NOT wgs[keyword] NOT tsa[keyword] NOT uncultured[filter]) NOT gbdiv_pat[prop]) AND (specimen_voucher[text] OR isolate[text] OR culture_collection[text] OR strain[text]). Source modifiers populated: specimen voucher 82%, country 52%, isolation source 29%, host 29%, collected by 0.00008%, collection date 15%, identified by 0%, latitude/longitude 0%, altitude 0.6%.
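The same filters can be reused for other taxa. A small helper that rebuilds a query of this shape for any organism term is sketched below; the function name is hypothetical, NCBI property names are written with underscores (srcdb_refseq, gbdiv_pat), and whether the arthropod, plant, and vertebrate datasets were retrieved with exactly this query is not stated on the slides.

```python
def build_voucher_query(organism: str) -> str:
    """Compose an Entrez nucleotide query with the same filters as the fungi query."""
    exclusions = ["srcdb_refseq[prop]", "wgs[keyword]", "tsa[keyword]",
                  "uncultured[filter]"]
    voucher_fields = ["specimen_voucher[text]", "isolate[text]",
                      "culture_collection[text]", "strain[text]"]
    # Organism term minus RefSeq, WGS, TSA and uncultured records ...
    core = f"{organism}[orgn] " + " ".join(f"NOT {t}" for t in exclusions)
    # ... minus the patent division, restricted to records that name a
    # voucher, isolate, culture collection or strain.
    wanted = " OR ".join(voucher_fields)
    return f"(({core}) NOT gbdiv_pat[prop]) AND ({wanted})"
```

The resulting string could then be passed to NCBI E-utilities, for example with Biopython's Entrez.esearch(db="nucleotide", term=query) after setting Entrez.email.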

Arthropod dataset (3,415,661 records). Source modifiers populated: specimen voucher 29%, country 64%, isolation source 2%, host 4%, collected by 0.0008%, collection date 49%, identified by 0%, latitude/longitude 0.0002%, altitude 0.2%.

Plant dataset (3,715,413 records). Source modifiers populated: specimen voucher 33%, country 24%, isolation source 2%, host 0.7%, collected by 0%, collection date 6%, identified by 0%, latitude/longitude 4.6%, altitude 0.4%.

Vertebrate dataset (6,748,218 records). Source modifiers populated: specimen voucher 41%, country 16%, isolation source 2%, host 3.5%, collected by 1.2%, collection date 3.5%, identified by 0%, latitude/longitude 0%, altitude 0.2%.
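Each percentage in these datasets is the share of records in which a given modifier is filled in. One way to compute such figures is sketched below, assuming the records have already been downloaded and parsed into dictionaries keyed by INSDC qualifier name; the download and parsing steps are not shown, the function name is hypothetical, and the slides do not describe how the authors actually computed their numbers.

```python
from typing import Dict, Iterable, Mapping

MODIFIERS = ["specimen_voucher", "country", "isolation_source", "host",
             "collected_by", "collection_date", "identified_by", "lat_lon",
             "altitude"]


def modifier_coverage(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Percentage of records in which each source modifier is populated."""
    counts = dict.fromkeys(MODIFIERS, 0)
    total = 0
    for rec in records:
        total += 1
        for mod in MODIFIERS:
            if rec.get(mod):  # missing or empty values count as unpopulated
                counts[mod] += 1
    if total == 0:
        return {mod: 0.0 for mod in MODIFIERS}
    return {mod: 100.0 * n / total for mod, n in counts.items()}
```

Because the function consumes an iterable one record at a time, it could be fed from a streaming parser rather than a list held in memory, which matters at the scale of several million records.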

The Solution: Pull metadata directly from the collection management system and submit it to GenBank.
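A minimal sketch of the "pull" step, using Python's built-in sqlite3 so that it runs anywhere. Symbiota itself runs on MySQL; the table here is named after Symbiota's occurrence table (omoccurrences) but keeps only a handful of Darwin Core columns, so treat the schema and the function name as assumptions rather than the real module's code.

```python
import sqlite3

# Hypothetical, heavily simplified occurrence table; a real Symbiota
# installation has many more columns.
SCHEMA = """
CREATE TABLE omoccurrences (
    occid INTEGER PRIMARY KEY,
    catalogNumber TEXT,
    sciname TEXT,
    country TEXT,
    recordedBy TEXT,
    eventDate TEXT,
    decimalLatitude REAL,
    decimalLongitude REAL
)
"""


def fetch_specimen_metadata(conn: sqlite3.Connection, occid: int) -> dict:
    """Return one specimen's metadata as a dict keyed by column name."""
    conn.row_factory = sqlite3.Row  # rows can then be converted with dict()
    row = conn.execute(
        "SELECT catalogNumber, sciname, country, recordedBy, eventDate, "
        "decimalLatitude, decimalLongitude FROM omoccurrences WHERE occid = ?",
        (occid,),  # parameterized to avoid SQL injection
    ).fetchone()
    if row is None:
        raise KeyError(f"no occurrence with occid={occid}")
    return dict(row)
```

With MySQL the same pattern applies through a MySQL driver; the point is that the metadata comes straight from the specimen record rather than being retyped into a submission form.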

Symbiota rRNA Submission Tool: user profile information, specimen metadata, and the sequence feed into a single "Send to GenBank" step.
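The last step packages metadata and sequence together. One submission route NCBI documents is a FASTA definition line carrying [modifier=value] pairs; the sketch below builds such a record. The slides do not say which format the Symbiota tool actually sends, the function name is hypothetical, and the modifier spellings (for example specimen-voucher) should be checked against NCBI's list of FASTA definition-line modifiers.

```python
def fasta_with_modifiers(seq_id: str, organism: str, modifiers: dict,
                         sequence: str, width: int = 60) -> str:
    """Build a FASTA record whose definition line carries [key=value] modifiers."""
    tags = [f"[organism={organism}]"]
    for key, value in modifiers.items():
        # Square brackets inside a value would break the [key=value] syntax.
        if "[" in value or "]" in value:
            raise ValueError(f"modifier {key!r} contains a bracket: {value!r}")
        tags.append(f"[{key}={value}]")
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalise case
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{seq_id} " + " ".join(tags)] + lines) + "\n"
```

In practice the modifiers dict would come from the specimen record pulled out of the database, so nothing has to be retyped at submission time.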

Genetic Data

PHP/MySQL; open source; modular: Specimen, Floristic, Identification.