Published by Patrick Wilkins. Modified over 9 years ago.
1
The BARCODE Data Standard: CBOL’s Partnership with the International Nucleotide Sequence Database Collaboration (INSDC)
David E. Schindel, Executive Secretary
National Museum of Natural History, Smithsonian Institution
SchindelD@si.edu; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938
2
Infrastructure of Taxonomy: Fragmented, Disconnected
- Collections and databases of specimens
- Seedbanks, culture/cell line collections
- Compilations of taxonomic names
- Floristic and faunistic surveys/inventories
- Monographs, taxonomic revisions
- Data repositories (gene sequences, characters, images, trees)
- The (undigitized) taxonomic literature
3
Linking Logical Categories (1): Specimens, Names, Opinions ??
4
Linking Logical Categories (2): Naming and defining species Holotype specimens
5
Linking Logical Categories (3): Establishing species boundaries ??
Species concept beyond holotype
- Paratype series
- Typological versus population thinking
- Genetic lineages
- BSC (biological species concept; hard to apply)
6
Linking Logical Categories (4): Interpreting species boundaries ??
Other assigned specimens:
- Species philosophy of original author
- Interpretation of user
7
Databases of Names, Specimens, Species Distributions
- Authority files of taxonomic names
- Museum databases of associated data
- Databases of species occurrences and distribution (OBIS)
8
DNA Barcodes: A Key Variable for Biodiversity Informatics
- Authority files of taxonomic names
- Museum databases of associated data
- Databases of species occurrences and distribution (OBIS)
9
CBOL’s Working Groups
- Database: Designing/constructing the Barcode Section of GenBank
- DNA: Protocols for formalin-fixed and old museum specimens; producing LIMS for dissemination
- Data Analysis: Beyond phenetic methods; population genetics perspective
- (Plants: Initiated discussions of plant barcode gene region(s))
10
BARCODE Data Standards
- Consultations with GenBank, ITIS, museum database developers, GBIF, ISIS, from 2004
- Consensus results of Front Royal meeting
  – GBIF, ITIS, GRIN
  – NBII, Species2000, IPNI
  – ICZN, ZooRecord, OBIS, GenBank
- Proposed to International Nucleotide Sequence Database Collaboration (EMBL, DDBJ)
- Approved by CBOL and INSDC mid-2005
11
Reserved Keyword “BARCODE”
- GenBank reviews records against standard
- Adds keyword “BARCODE” in annotation field
- Can be removed by CBOL
12
Requirements
- Species name selected from authority
- Sequence from COI or other barcode region approved by CBOL
- Structured link to voucher specimen
- Online access to metadata
- Trace files and quality scores
- Primer sequences and names
- Minimum sequence length (500 bp for COI)
- Geographic locality
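The requirements above lend themselves to an automated pre-submission check. The following is only a minimal sketch: the `BarcodeRecord` class, its field names, and the `missing_requirements` helper are hypothetical and not part of any CBOL or GenBank tool. The only threshold taken from the slide is the 500 bp minimum for COI; "online access to metadata" is not checked because the slide does not say how it is verified.

```python
from dataclasses import dataclass, field

# Only the COI minimum (500 bp) comes from the slide; other approved
# regions would need their own entries (assumption: one minimum per region).
MIN_LENGTH_BP = {"COI": 500}


@dataclass
class BarcodeRecord:
    """Hypothetical container for one candidate BARCODE record."""
    species_name: str
    gene_region: str
    sequence: str
    voucher_id: str = ""
    locality: str = ""
    trace_files: list = field(default_factory=list)
    primers: list = field(default_factory=list)


def missing_requirements(record, authority_names):
    """Return a list of unmet requirements (an empty list means none found)."""
    problems = []
    if record.species_name not in authority_names:
        problems.append("species name not in authority list")
    min_len = MIN_LENGTH_BP.get(record.gene_region)
    if min_len is None:
        problems.append("gene region not approved")
    elif len(record.sequence) < min_len:
        problems.append(f"sequence shorter than {min_len} bp")
    if not record.voucher_id:
        problems.append("no structured voucher link")
    if not record.locality:
        problems.append("no geographic locality")
    if not record.trace_files:
        problems.append("no trace files")
    if not record.primers:
        problems.append("no primer information")
    return problems
```

Returning a list of problems rather than a single pass/fail flag lets a submitter see every gap at once before the record goes to review.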
13
Recommended fields, added to INSDC at CBOL’s request
- Latitude and longitude
- Name of the identifier
- Name of the collector
- Date of collection
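As an illustration of how these four fields might be prepared for a submission, here is a minimal sketch. It assumes the INSDC source qualifiers `lat_lon`, `collection_date`, `collected_by` and `identified_by`, with decimal degrees plus hemisphere letters for coordinates and a `DD-Mmm-YYYY` date; the helper names are hypothetical, and the exact formats should be checked against the current INSDC feature table before use.

```python
from datetime import date

# English month abbreviations, spelled out to avoid locale-dependent strftime.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_lat_lon(lat, lon):
    """Format signed decimal degrees as '<lat> N|S <lon> E|W' (assumed layout)."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range")
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.4f} {ns} {abs(lon):.4f} {ew}"


def format_collection_date(d):
    """Format a date as DD-Mmm-YYYY (assumed layout)."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def recommended_qualifiers(lat, lon, collected_on, collector, identifier):
    """Bundle the four recommended fields under the assumed qualifier names."""
    return {
        "lat_lon": format_lat_lon(lat, lon),
        "collection_date": format_collection_date(collected_on),
        "collected_by": collector,
        "identified_by": identifier,
    }
```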
14
New Data Fields
- Latitude/Longitude
- Collection date
- Collector’s name
- Identifier’s name
15
BARCODE Keyword in GenBank
16
BARCODE Records in INSDC
Diagram labels: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
17
Structured link to Vouchers
Diagram labels: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
18
What constitutes a voucher?
- Long-term reference tied to BARCODE
- Corroborates the species identification
- Provides additional tissue
- CBOL relies on community decisions:
  – Full specimen?
  – Parts for morphologic features (e.g., feather?)
  – Frozen tissue?
  – E-Vouchers for large specimens, destructive samples, catch-and-release?
19
Where’s the voucher?
20
Linking to Vouchers Structured Voucher IDs
21
Voucher Specimen ID
- Based on Darwin Core
- Eventually will be replaced by GUID
- Triplet: Institution Acronym : Collection : Specimen #
  – NMNH : FISH : 123456
- CBOL, GBIF and NCBI discussing global registry of:
  – Institutional acronyms
  – Collection codes
  – “Pre-accession” specimen IDs
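The triplet format can be handled with a few lines of code. This is a minimal sketch, assuming the three parts are separated by colons with optional surrounding spaces, as in the slide's `NMNH : FISH : 123456` example; the `VoucherID` type and both helper functions are hypothetical names, and no check against a registry of acronyms or collection codes is attempted.

```python
from typing import NamedTuple


class VoucherID(NamedTuple):
    """Hypothetical holder for the institution : collection : specimen triplet."""
    institution: str
    collection: str
    specimen: str


def parse_voucher_id(text):
    """Split a colon-separated triplet, tolerating spaces around the colons."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'institution:collection:specimen', got {text!r}")
    return VoucherID(*parts)


def format_voucher_id(voucher):
    """Write the triplet back out in a compact, space-free form."""
    return ":".join(voucher)
```

Parsing `"NMNH : FISH : 123456"` gives a `VoucherID` with institution `NMNH`, collection `FISH` and specimen `123456`; anything with a missing or empty part raises `ValueError` rather than being silently accepted.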
22
Link to Species Names
Diagram labels: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
23
Species names in INSDC
24
NCBI Taxonomy Browser
The good, the bad, and the ugly
- Species names provided by submitters
- Checked against compilations
- Linkout to Catalogue of Life, other sources
- Names not found added to Taxonomy Browser
- Submitters informed of errors but not forced to make corrections
25
NCBI Taxonomy Browser
26
NCBI Taxonomy Browser Some names have no other source
27
Other names linked to GBIF and Catalogue of Life…
28
…and primary data source
29
Authoritative Species Lists
- Catalogue of Life
- Species lists compiled by barcoding projects
  – FISH-BOL from FishBase, CoF
  – MBI mosquito catalog
- Nomenclators
- NameBank
- New names in publications
- Eventually, central registries (e.g., ZooBank)
30
Provisional Species ID
- Uncertain identifications
- Species complexes
- Newly discovered variants
- Ecogenomic samples
- Need general guidelines to ensure that provisional IDs are:
  – Globally unique
  – Stable, retrievable
  – Not confusable with a valid species name
31
BARCODE Records in INSDC
Diagram labels: Barcode Sequence; Voucher Specimen; Species Name; Specimen Metadata; Literature (link to content or citation)
- Indices: Catalogue of Life, GBIF/ECAT
- Nomenclators: Zoo Record, IPNI, NameBank
- Publication links: New species
- Databases: Provisional sp.
- Other Databases: Phylogenetic, Pop’n Genetics, Ecological
- Other labels: Georeference, Habitat, Character sets, Images, Behavior, Other genes, Trace files, Primers
32
Improving links to taxonomic journals Connecting taxonomic articles
33
Links to Taxonomic Literature
- Library-Laboratory meeting in London, 2005, on electronic access to taxonomic literature
- Led to formation of Biodiversity Heritage Library initiative
- Proactive steps with PubMed to add taxonomic journals to online abstracts
- Aggressive negotiation with publishers of barcoding papers
- Involvement in Encyclopedia of Life
34
Long-term data curation of BARCODE records
Flow diagram labels, in the order extracted:
- Data records assembled
- IDs consistent with other records?
- Compliant with BARCODE standards?
- Data records released on INSDC
- Data records published in BOLD
- Community feedback
- Update records (audit trail of species names retained)
- CBOL control of BARCODE flag
- GenBank adds BARCODE flag
35
Acknowledgements
- Robert Hanner, University of Guelph, Chair of CBOL’s Database Working Group
- Scott Federhen, NCBI Taxonomy Browser
- Donald Hobern, Head of Informatics, GBIF