doi> Digital Object Identifiers for Science Data
Norman Paskin, International DOI Foundation
doi> Digital Object Identifier = DOI
A name (not a location) for an entity on digital networks
A system for persistent and actionable identification and interoperable exchange of managed information on digital networks
–Standards-based components (detail in a moment)
–Now to become an International Standard (in ISO TC46)
Developed as a cross-industry, cross-sector, not-for-profit effort managed by an open-membership collaborative development body
–International DOI Foundation (IDF)
In widespread use now:
–Over 15 million assigned, over 1000 naming authorities (users)
–Key feature of scientific primary publishing as part of the CrossRef system
–Adopted for government documents (EC, OECD, UK, etc.)
In use, it is a mechanism behind the scenes
–e.g. looks like a URL in a web context
Offers an interoperable common system for identification of science data; two projects considered as examples:
–TIB project (citation of primary data sets)
–Names for Life (biological taxonomy)
doi> Identifiers
The word "identifier" can mean several different things, e.g.:
–Labels: output of numbering schemes, e.g. ISBN
–Specifications for using labels, e.g. on the internet: URL, URN, URI (URI = Uniform Resource Identifier)
–Implemented systems: labels, following a specification, in a system, e.g. the DOI system. A packaged system offering label + tools + implementation mechanisms
Requirements: reliability, automated global access, and interoperability
–Interoperability = the possibility of use in services outside the direct control of the issuing assigner
Persistence implies interoperability (with the future)
Interoperability implies extensibility (future uses are not known)
Hence DOI is a generic framework applicable to any digital object
–A digital object can be a representation of any entity
doi> DOI is the combination of these four components:
–Data model
–Internet resolution
–Numbering scheme
–Policies
DOI syntax can include any existing identifier label, formal or informal, of any entity
An identifier "container", e.g.
– /NP5678
– /ISBN
– / ISO-DOI
NISO Z39.84, DOI Syntax
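The "identifier container" idea can be sketched in a few lines of code. This is a minimal illustration only, assuming the usual prefix/suffix form in which the prefix begins with "10." and the suffix is any existing label; the prefix `10.9999` and the names `split_doi` and `wrap_identifier` are hypothetical and do not come from the slides.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # naming authority, e.g. "10.9999" (hypothetical value)
    suffix: str  # any existing identifier label, treated as opaque


def split_doi(doi: str) -> Doi:
    """Split a DOI name into prefix and suffix at the first slash."""
    text = doi.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return Doi(prefix, suffix)


def wrap_identifier(prefix: str, label: str) -> str:
    """Place an existing identifier label inside a DOI 'container'."""
    return f"{prefix}/{label}"
```

Because only the first slash is significant here, a suffix may itself contain slashes or an embedded legacy identifier without any special handling.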
Internet resolution allows a DOI to link to any and multiple pieces of current data
Resolve from DOI to data
–initially to location (URL): persistence
May be to multiple data:
–Multiple locations
–Metadata
–Services
–Extensible, user-defined
Uses the Handle System
–Implementing URI/URN concept
–Running on TCP/IP (common co-inventor)
–IETF RFCs 3650, 3651,
See Release 1.0, September 2003, "Online Registries: The DNS and Beyond..." [doi: /309registries ]
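Multiple resolution can be pictured as one name mapped to a list of typed values. The sketch below is an in-memory stand-in, not the Handle System itself: the record contents, the type names, the `example.org` URLs and the functions `resolve` and `move` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TypedValue:
    type: str  # e.g. "URL" or "METADATA"; type names here are illustrative
    data: str


# Hypothetical resolution records keyed by DOI name.
RECORDS: Dict[str, List[TypedValue]] = {
    "10.9999/example": [
        TypedValue("URL", "https://example.org/data/v1"),
        TypedValue("URL", "https://mirror.example.org/data/v1"),
        TypedValue("METADATA", "https://example.org/data/v1.xml"),
    ],
}


def resolve(doi: str, wanted_type: Optional[str] = None) -> List[str]:
    """Return all values for a DOI, optionally filtered by value type."""
    values = RECORDS.get(doi, [])
    return [v.data for v in values if wanted_type is None or v.type == wanted_type]


def move(doi: str, old: str, new: str) -> None:
    """Change where a DOI points; the DOI name itself stays the same."""
    RECORDS[doi] = [
        TypedValue(v.type, new) if v.data == old else v for v in RECORDS[doi]
    ]
```

The `move` function is the persistence point in miniature: citations keep the same name while the record behind it is maintained.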
DOI Data Model = Data Dictionary + DOI AP framework
Metadata tools:
–a data dictionary to define
–a grouping mechanism to relate
Necessary for interoperability
–Enabling information that originates in one context to be used in another in ways that are as highly automated as possible
Able to use existing metadata
–Mapped using a standard dictionary
–Can describe any entity at any level of granularity
–indecsDD, which incorporates ISO MPEG-21 RDD
IDF is the MPEG-21 RDD registration authority
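Mapping existing metadata through a shared dictionary can be sketched as a lookup table per scheme. Everything in this example is assumed for illustration: the shared terms (`Creator`, `Name`, `DateOfCreation`), the `in_house` scheme and the function `to_common` are hypothetical and are not the actual indecsDD vocabulary.

```python
from typing import Dict

# Hypothetical mapping tables: scheme-specific element -> shared dictionary term.
DICTIONARY: Dict[str, Dict[str, str]] = {
    "dublin_core": {"creator": "Creator", "title": "Name", "date": "DateOfCreation"},
    "in_house": {"author": "Creator", "heading": "Name", "year": "DateOfCreation"},
}


def to_common(scheme: str, record: Dict[str, str]) -> Dict[str, str]:
    """Re-express a record using the shared dictionary terms.

    Elements with no mapping are dropped rather than guessed.
    """
    mapping = DICTIONARY[scheme]
    return {mapping[key]: value for key, value in record.items() if key in mapping}
```

Two records that originate in different schemes become directly comparable once both are expressed in the shared terms, which is the interoperability the slide describes.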
DOI policies allow any model for practical implementations
Implementation through IDF
–Governance and agreed scope, policy, "rules of the road"
–Technical infrastructure: resolution mechanism, proxy servers, mirrors, back-up, central dictionary
–Social infrastructure: persistence commitments, fall-back procedures, cost recovery (self-sustaining), shared use of system
–Not a standard but a Registration Authority/maintenance agency
IDF delegates through Registration Agencies
–Each can develop its own applications
–Use in "own brand" ways appropriate for their community
DOI to become ISO TC46/SC9 standard
TC46/SC9: Information and Documentation - Identification and Description
Home of identification numbering: identifiers for semantically meaningful entities:
–ISO 2108 International Standard Book Numbering (ISBN)
–ISO 3297 International Standard Serial Number (ISSN)
–ISO 3901 International Standard Recording Code (ISRC)
–ISO International Standard Technical Report Number (ISRN)
–ISO International Standard Music Number (ISMN)
–ISO International Standard Audiovisual Number (ISAN)
–ISO International Standard Musical Work Code (ISWC)
–ISO Project Version identifier for Audiovisual Works (V-ISAN)
–ISO Project International Standard Text Code (ISTC)
doi> DOI combination of components
Resolve: the Handle resolution technology allows you to access any kind of service associated with your DOI
–e.g. services can include metadata services
Identify: DOI syntax can include any existing identifier, formal or informal, of any entity
–e.g. / / /ISBN /Norman_presentation / ISO-DOI
Describe: DOI metadata can be of any type, standard or proprietary
–e.g. OnixForBooks, OnixForSerials, IEEE/LOM, MARC, Dublin Core, proprietary scheme (to interoperate with anyone else in the DOI network, map to the Data Dictionary (iDD))
A package of services is an Application Profile
doi> DOI and scientific data
DOI is already the core technology for maintaining cross-references
–persistent links between a citation and internet access to the article
CrossRef system used by 350+ publishers representing the bulk of STM articles (as pre-publication link builder)
9,000 DOIs per day added to CrossRef
–Over 12 million DOIs now registered with CrossRef
–Over 850,000 assigned to books and conference proceedings
Several projects suggested to IDF using DOIs for data (not connected with CrossRef)
–physico-chemical property data; biological microscopy images
–See Paskin, ICSTI 2002 paper
Some projects have developed their own identifiers, very useful for their own area
–E.g. Life Science Identifier (I3C/IBM): simple URN mechanism, non-generic, non-global
–These can be incorporated into a DOI if needed to make them globally interoperable and extensible
Two projects in particular have developed DOI applications:
doi> (1) TIB: Citation of Primary Data
Problem: re-use of existing data sets
–Attribution of data source: make data publications citable in a standard way (cf. articles, Citation Index)
–Archiving of data in context so as to be discoverable and interoperable (usable by others)
Background
–CODATA National Committee WG, grant-aided by DFG (Sept 2001 to May 2002): report "Concept of Citing Scientific Primary Data"
–Continuation as project for pilot implementation funded by DFG, Oct 2003 to Oct 2005, at TIB (German National Library of Science & Technology)
–Development of DOI registration agency for data
Solution:
–DOIs for data sets, with associated metadata
–Core management metadata applicable to all datasets
–Structured metadata extensible to specific science disciplines
doi> (1) Citation of Primary Data: illustration of solution
During her research for the World Data Center Climate (WDCC), Dr. Weather gains primary data about the weather in Hannover in the year 2003
–Primary data is tested, evaluated, stored and administered at the WDCC
–Primary data is registered and allocated a DOI at the TIB
–With quality control of metadata, no change once allocated, etc.
Dr. Weather can now cite this with a resolvable DOI, e.g. DOI: /WDCC/W_Han_2003_MMB_2
–Prefix = TIB as the registration agency
–WDCC = research institute
–W_Han_2003_MMB_2 = internal name of the data
DOI is resolvable directly, or via http
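The three-part structure of such a data DOI, and the "via http" form, can be sketched as follows. The prefix is not legible in the slide text, so `10.9999` is a hypothetical placeholder; the default proxy `https://doi.org/` is an assumption about the public resolver in use today rather than the address shown on the slide; `describe_data_doi` and `proxy_url` are illustrative names.

```python
from typing import Dict
from urllib.parse import quote


def describe_data_doi(doi: str) -> Dict[str, str]:
    """Split a prefix/institute/internal-name DOI into its three parts.

    Raises ValueError if the DOI has fewer than three slash-separated parts.
    """
    prefix, institute, internal_name = doi.split("/", 2)
    return {"prefix": prefix, "institute": institute, "internal_name": internal_name}


def proxy_url(doi: str, proxy: str = "https://doi.org/") -> str:
    """Build an HTTP link for a DOI (the proxy address is an assumption)."""
    return proxy + quote(doi, safe="/")
```

For example, `describe_data_doi("10.9999/WDCC/W_Han_2003_MMB_2")` separates the registration agency prefix, the research institute and the internal data name described on the slide.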
doi> (1) Citation of Primary Data: illustration of solution
Usage scenario 1: Dr. Storm is reading publications by Dr. Weather in a journal and would like to analyse her data from different angles.
–He can resolve the DOI to obtain the data set for use
–In his publication "Comparison of the weather from Hannover and Miami", Dr. Storm cites Dr. Weather's data using its DOI, referring to the uniqueness and own identity of the original data
–Citation example: Weather, 2003: Weather in Hannover for 2003. doi: /WDCC/W_Han_2003_MMB_2
Usage scenario 2: Mr. Nice is writing a paper about the sales figures of ice cream in Hannover in 2003, but he has no information about the weather.
–He searches via the TIB central registration agency metadata search
–Result is doi: /WDCC/W_Han_2003_MMB_2
–He resolves the DOI to find the data. The metadata refers him to the WDCC as publisher and data archive
–In his paper he cites the data using the DOI
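Both usage scenarios rest on one metadata record per data set: it is searched to find the DOI and formatted to produce the citation. The sketch below assumes a very small record layout; the field names, the keyword list, the placeholder prefix `10.9999` and the functions `cite` and `search` are hypothetical and do not describe the TIB service.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataSet:
    doi: str
    creator: str
    year: int
    title: str
    publisher: str
    keywords: Tuple[str, ...]


# Hypothetical registry content; the prefix 10.9999 is a placeholder.
REGISTRY: List[DataSet] = [
    DataSet(
        doi="10.9999/WDCC/W_Han_2003_MMB_2",
        creator="Weather",
        year=2003,
        title="Weather in Hannover for 2003",
        publisher="WDCC",
        keywords=("weather", "Hannover", "2003"),
    ),
]


def cite(ds: DataSet) -> str:
    """Format a citation in the style of the example on the slide."""
    return f"{ds.creator}, {ds.year}: {ds.title}. doi:{ds.doi}"


def search(*terms: str) -> List[DataSet]:
    """Return data sets whose keywords include every search term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [ds for ds in REGISTRY if wanted <= {k.lower() for k in ds.keywords}]
```

In scenario 2, a call such as `search("weather", "hannover")` would return the record, whose `publisher` field points to the data archive and whose `doi` field is what gets cited.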
doi> (2) Names for Life: Biological taxonomy
Problem: future-proofing biological nomenclature
–See Garrity and Lyons, OMICS, 2003
For a given nomenclature in a biological taxonomy, change occurs
–e.g. new species recognised; species reassigned as the founding species of new genera; synonyms; species split into subspecies which later become separate species
–resulting in changes of names, genera, families, classes, relationships over time
–How does a researcher keep track?
Solution: DOI proposed as tool
–a data model of nomenclature and taxonomy
–enabling disambiguation of synonyms and competing taxonomies
–a metadata resolution service
–enabling dissemination of archived and updated information objects through persistent links
doi> (2) Names for Life: illustration of problem
[Figure: animated build of a nomenclature diagram. The first frames show the genus Alteromonas (labelled 1972) with the species names macleodii (T), communis and vaga. Successive frames add further species names and reassign them among the genera Alteromonas, Marinomonas, Shewanella, Oceanospirillum and Pseudoalteromonas, ending with several dozen names spread across the five genera.]
(2) Names for Life: illustration of solution
[Figure: a DOI connecting "links from the web" and "links to the web" (journal article, strain record, gene annotation, any online information) with the information objects name, taxon, combined name, exemplar and nomos.]
doi> (2) Names for Life: illustration of solution
[Diagram labels: dissemination; name, taxon, combined name, exemplar, nomos]
By reasoning over information objects, construct services that can be offered through multiple resolution:
–Look up this name and all its synonyms in PubMed
–Determine whether this exemplar is part of a taxon in another nomos
–Compare this name to the current state (contents) of the taxon
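The first of these services, collecting a name together with all its synonyms, can be sketched as a walk over recorded reassignments. This is only an illustration of the reasoning step, not the Names for Life data model: the two reassignment pairs are examples chosen for this sketch, the function `synonyms` is a hypothetical name, and the PubMed query itself is left out.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

# Illustrative pairs: (earlier name, name that replaced it).
REASSIGNMENTS = [
    ("Alteromonas haloplanktis", "Pseudoalteromonas haloplanktis"),
    ("Alteromonas communis", "Marinomonas communis"),
]


def synonyms(name: str, pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the name plus every name linked to it by a chain of reassignments."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for old, new in pairs:
        graph[old].add(new)
        graph[new].add(old)

    seen = {name}
    queue = deque([name])
    while queue:  # breadth-first walk over the reassignment links
        current = queue.popleft()
        for other in graph.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

The resulting set is what a literature search would be run against, so that articles published under an earlier name are not missed.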
Summary: DOI
A system for persistent and actionable identification and interoperable exchange of managed information on digital networks
–Standards-based components (detail in a moment)
–Now to become an International Standard (in ISO TC46)
Developed as a cross-industry, cross-sector, not-for-profit effort managed by an open-membership collaborative development body
–International DOI Foundation (IDF)
In widespread use now:
–Over 15 million assigned, over 1000 naming authorities (users)
–Key feature of scientific primary publishing as part of the CrossRef system
–Adopted for government documents (EC, OECD, UK, etc.)
In use, it is a mechanism behind the scenes
–e.g. looks like a URL in a web context
Offers an interoperable common system for identification of science data; two projects considered as examples:
–TIB project (citation of primary data sets)
–Names for Life (biological taxonomy)