International Congress of Entomology, Orlando. Terry Catapano, September 26, 2016. Notes: Add in Plazi and the idea of the treatment server
Extracting Linked Open Data from Taxonomic Publications. Terry Catapano, Plazi, New York (http://plazi.org)
Plazi. 500,000,000+ printed pages; 5,000 journals with taxonomic content; 1,900,000 species described; 20,000,000+ species treatments. But the facts are hidden: digitization is incomplete, publications are not semantically enhanced, data are not linked, and most data are not open. Plazi's solution: linking through taxonomic treatments.
Taxonomic Treatment. Example: Formica obsoleta Linnaeus, 1758: 580 (name, description, distribution). Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication). Linnaeus has to be credited for both the Latin binomen and the treatment.
Plazi TreatmentBank: Million Treatment Goal. For 1M treatments, at minimum: identify each with an HTTP URI of the form http://treatment.plazi.org/id/[UUID]; record metadata (taxon concept, publication information); provide representations (HTML, RDF, [XML]); export to the Biodiversity Literature Repository, EOL, GBIF, and Wikidata.
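The identifier scheme above can be illustrated with a minimal sketch that mints and parses URIs of the form http://treatment.plazi.org/id/[UUID]. It is an assumption-laden illustration, not TreatmentBank's actual code: the helper names `mint_treatment_uri` and `parse_treatment_uri` are hypothetical, and accepting the UUID both with and without hyphens is an assumption about how identifiers might be written.

```python
import re
import uuid
from typing import Optional

# Base URI taken from the slide; everything else here is a hypothetical sketch.
TREATMENT_BASE = "http://treatment.plazi.org/id/"

# Assumption: accept a 32-hex-digit UUID with or without hyphens.
_TREATMENT_URI = re.compile(
    r"^http://treatment\.plazi\.org/id/"
    r"([0-9A-Fa-f]{8}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{4}-?[0-9A-Fa-f]{12})$"
)


def mint_treatment_uri(identifier: Optional[uuid.UUID] = None) -> str:
    """Return a treatment URI, generating a random UUID if none is given."""
    identifier = identifier or uuid.uuid4()
    return TREATMENT_BASE + str(identifier)


def parse_treatment_uri(uri: str) -> Optional[uuid.UUID]:
    """Return the UUID embedded in a treatment URI, or None if it does not match."""
    match = _TREATMENT_URI.match(uri)
    if match is None:
        return None
    # uuid.UUID accepts the hex digits with or without hyphens.
    return uuid.UUID(match.group(1))
```

For example, `parse_treatment_uri(mint_treatment_uri(u))` returns `u` for any `uuid.UUID` value `u`. Deriving the URI from a UUID keeps identifiers unique without a central counter, which matters at the scale of a million treatments.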
Plazi: Output. Publications (year of publishing): 17,799 (2016: 3,500); taxonomic treatments: 165,692 (50,000); observation records: 53,099; observation records, georeferenced: 23,068; taxonomic names: 152,641 (45,000); bibliographic references: > 800,000; RDF triples: > 100 million (1M); scientific illustrations: > 140,000 (being uploaded to BLR/Zenodo).
TreatmentBank
Plazi: TreatmentBank: Treatment stubs: adding content
Sources. Legacy publications: print → digitization → text capture → XML; digitized → text capture → XML; born digital → text extraction → XML. Prospective publishing: Pensoft journals → TaxPub XML → RDF.
Plazi conversion workflow (TreatmentBank): find, scan, text extraction, markup, store.
Daily Automated Processing of New Taxa. Notes: I am afraid I am going to lose some of you at some point, but I will try to bring you all together at the end of the talk.
TreatmentBank: HTML Representation
Treatment Text: XML Representation
XML Representation: Text Markup and Enhancement. Treatments; treatment sections; features of interest: taxon names, treatment citations, material citations (e.g., specimens), bibliographic references (with citations), figures (with citations), and tables (with citations).
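Once a treatment is marked up, each feature of interest becomes an element that software can pull out directly. The sketch below shows the idea with Python's standard `xml.etree.ElementTree`. The sample document is invented, and the element and attribute names (`treatment`, `section`, `taxonName`, `materialCitation`, `treatmentCitation`, `specimenCode`, `ref`) are hypothetical placeholders, not Plazi's actual schema.

```python
import xml.etree.ElementTree as ET

# Invented sample; element and attribute names are hypothetical, not Plazi's schema.
SAMPLE = """
<treatment id="t1">
  <section type="nomenclature">
    <taxonName rank="species">Formica obsoleta</taxonName> Linnaeus, 1758
  </section>
  <section type="materials">
    <materialCitation specimenCode="EX-0001">example specimen</materialCitation>
  </section>
  <section type="description">
    Compare <treatmentCitation ref="t0">an earlier treatment</treatmentCitation>.
  </section>
</treatment>
"""


def extract_features(xml_text: str) -> dict:
    """Collect the marked-up features of interest from one treatment."""
    root = ET.fromstring(xml_text.strip())
    return {
        # Section types in document order
        "sections": [el.get("type") for el in root.iter("section")],
        # Text of each tagged taxon name
        "taxon_names": [(el.text or "").strip() for el in root.iter("taxonName")],
        # Attributes carried by each material citation
        "material_citations": [dict(el.attrib) for el in root.iter("materialCitation")],
        # Identifiers of the treatments cited
        "treatment_citations": [el.get("ref") for el in root.iter("treatmentCitation")],
    }
```

Calling `extract_features(SAMPLE)` yields the three section types, the name "Formica obsoleta", one material citation, and one cited treatment. Because the features are explicit elements, none of this requires parsing free text.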
Nomenclature Section and Taxon Name
Treatment Citation
Material Citation
TreatmentBank: online editing: material citation
Semantic XML Publishing: TaxPub
Treatment Data: RDF Representation
Treatment Data. A treatment is published in a publication; defines a taxon concept (one and only one); cites treatments/taxon concepts; cites material (specimens); and hasInformation: information items ([text] content, data).
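The relationships listed above can be modeled as a small data structure that emits RDF statements. This is a hypothetical sketch only: the namespace `http://example.org/treatment#` and the predicate names (`publishedIn`, `definesTaxonConcept`, `citesTreatment`, `citesMaterial`, `hasInformation`) are placeholders chosen for illustration, not the terms of the Plazi Treatment Ontology.

```python
from dataclasses import dataclass, field

# Placeholder namespace; NOT the real Treatment Ontology terms.
EX = "http://example.org/treatment#"


@dataclass
class Treatment:
    uri: str
    publication: str       # the publication the treatment is published in
    taxon_concept: str     # the one and only taxon concept it defines
    cited_treatments: list = field(default_factory=list)   # treatments/taxon concepts
    cited_materials: list = field(default_factory=list)    # material (specimens)
    information_items: list = field(default_factory=list)  # text content, data

    def __post_init__(self) -> None:
        # A single field holds the one taxon concept; an empty value is rejected.
        if not self.taxon_concept:
            raise ValueError("a treatment must define exactly one taxon concept")

    def triples(self) -> list:
        """Return (subject, predicate, object) IRI triples for this treatment."""
        out = [
            (self.uri, EX + "publishedIn", self.publication),
            (self.uri, EX + "definesTaxonConcept", self.taxon_concept),
        ]
        out += [(self.uri, EX + "citesTreatment", t) for t in self.cited_treatments]
        out += [(self.uri, EX + "citesMaterial", m) for m in self.cited_materials]
        out += [(self.uri, EX + "hasInformation", i) for i in self.information_items]
        return out

    def to_ntriples(self) -> str:
        """Serialize the triples as N-Triples lines (all terms are IRIs here)."""
        return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in self.triples())
```

Holding the taxon concept in one scalar field, rather than a list, is one way to make the "1 and only 1" constraint structural instead of something checked after the fact.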
Treatment Data: Vocabularies, Ontologies, and Identifiers. Treatment: Treatment Ontology (https://github.com/plazi/treatmentontologies), OBKMS Ontology (Viktor Senderov/Pensoft), Plazi TreatmentBank HTTP URIs. Publication: Dublin Core, SPAR (FaBiO, PRO, FRBR); DOI (CrossRef, Zenodo/DataCite); ORCID, ISSN. Taxon concept: Darwin Core, Darwin-SW; ZooBank HTTP URIs; ORCID, ResearcherID, collection codes, repository IDs. Citations: CiTO. Information item: SPM (+ EOL SPM extensions). Data: trait ontologies; SDD, etc.
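Mixing several vocabularies in one dataset works because each has its own namespace IRI, so terms such as `cito:cites`, `fabio:JournalArticle`, `dcterms:title`, and `dwc:scientificName` never collide. A minimal sketch of compact-name expansion for four of the vocabularies named above follows; the helper names `expand` and `turtle_prefixes` are hypothetical, and the prefix labels are conventional choices, not something the slide prescribes.

```python
# Namespace IRIs of four vocabularies named on the slide.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "fabio": "http://purl.org/spar/fabio/",
    "cito": "http://purl.org/spar/cito/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}


def expand(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a compact name such as 'cito:cites' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def turtle_prefixes(prefixes: dict = PREFIXES) -> str:
    """Render the prefix map as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items())
```

For instance, `expand("cito:cites")` gives `http://purl.org/spar/cito/cites`, so a treatment citation expressed with CiTO sits alongside Darwin Core taxon data in the same graph.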
Treatment Data: Taxon Concept
Treatment Data: Publication Information
Biodiversity Literature Repository: DOIs for Legacy Literature. Access, archive, DOI.
Treatment Data: Treatment Citations
TreatmentBank: Taxonomic data: linking treatments
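When each treatment records which earlier treatments it cites, linking treatments amounts to walking a citation graph. The following is a hedged sketch of one way to do that, a breadth-first traversal; the function name `linked_treatments`, the dictionary representation, and the short IDs like "t1" are hypothetical stand-ins for treatment URIs.

```python
from collections import deque


def linked_treatments(start: str, cites: dict) -> list:
    """Breadth-first walk over treatment citations.

    Returns every treatment reachable from `start` (excluding `start`),
    in discovery order. `cites` maps a treatment ID to the IDs it cites.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for cited in cites.get(current, []):
            if cited not in seen:  # guards against cycles and duplicates
                seen.add(cited)
                order.append(cited)
                queue.append(cited)
    return order
```

With `cites = {"t3": ["t2", "t1"], "t2": ["t1"]}`, `linked_treatments("t3", cites)` returns `["t2", "t1"]`: the chain of earlier usages behind the most recent treatment.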
Treatment Data: Material Citations
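Georeferencing material citations usually means converting the coordinates printed in the text into decimal degrees. The sketch below handles only the degrees/minutes/seconds notation with a trailing hemisphere letter (e.g. 0°16'12"N). That format is an assumption; real material citations vary widely, and this is not Plazi's extraction method. The function names are hypothetical.

```python
import re

# Matches e.g. 0°16'12"N or 12°30'S: degrees, optional minutes/seconds, hemisphere.
_DMS = re.compile(
    r"""(\d+(?:\.\d+)?)°\s*(?:(\d+(?:\.\d+)?)['′]\s*)?(?:(\d+(?:\.\d+)?)["″]\s*)?([NSEW])"""
)


def dms_to_decimal(degrees: float, minutes: float = 0.0,
                   seconds: float = 0.0, hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


def parse_coordinates(text: str) -> list:
    """Return decimal-degree values for every DMS coordinate found in `text`."""
    return [
        dms_to_decimal(float(d), float(m or 0), float(s or 0), h)
        for d, m, s, h in _DMS.findall(text)
    ]
```

The conversion itself is simply degrees + minutes/60 + seconds/3600, negated for S and W. Values parsed this way are what make a material citation usable as a mapped occurrence record.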
Treatment Data: Specimen Data Analysis and Outputs
Treatment Data: Other Information Content
Thank you! Terry Catapano, catapano@plazi.org