1
International Congress of Entomology, Orlando
Terry Catapano, September 26, 2016, International Congress of Entomology, Orlando
Notes: Add in Plazi and the idea of the treatment server.
2
Extracting Linked Open Data from Taxonomic Publications
Terry Catapano, Plazi, New York (
3
Who are we?
Plazi
500,000,000+ printed pages; 5,000 journals with taxonomic content; 1,900,000 species described; 20,000,000+ species treatments
BUT: the facts are hidden: incomplete digitization; publications are not semantically enhanced; data are not linked; most data are not open.
Plazi solution: linking through taxonomic treatments
4
Taxonomic Treatment
Example: Formica obsoleta Linnaeus, 1758: 580 (name, description, distribution)
Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication).
Linnaeus has to be credited for both the Latin binomen AND the treatment.
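A treatment citation such as the one above packs name, authority, year and page into one string. The sketch below parses it into a minimal record, assuming a simplified pattern of the form "Genus species Authority, YYYY: page". The `Treatment` class and `parse_treatment_citation` function are hypothetical illustrations, not Plazi code; real nomenclature (subgenera, parenthetical authorities, comma-separated author lists) needs a proper parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Treatment:
    """Hypothetical minimal record for one taxonomic treatment."""
    genus: str
    species: str
    authority: str
    year: int
    page: int
    # Treatment sections keyed by type, e.g. "description", "distribution".
    sections: dict = field(default_factory=dict)

    @property
    def binomen(self) -> str:
        return f"{self.genus} {self.species}"


# Simplified pattern: "Genus species Authority, YYYY: page".
CITATION = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+) "
    r"(?P<authority>[A-Z][^,]*), (?P<year>\d{4}): (?P<page>\d+)$"
)


def parse_treatment_citation(text: str) -> Treatment:
    """Split a treatment citation into name, authority, year and page."""
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized treatment citation: {text!r}")
    return Treatment(
        genus=match["genus"],
        species=match["species"],
        authority=match["authority"],
        year=int(match["year"]),
        page=int(match["page"]),
    )


treatment = parse_treatment_citation("Formica obsoleta Linnaeus, 1758: 580")
print(treatment.binomen, treatment.authority, treatment.year, treatment.page)
# prints: Formica obsoleta Linnaeus 1758 580
```

Keeping year and page as separate fields is what allows the treatment, and not just the bare name, to be credited and cited.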
5
Plazi TreatmentBank: Million Treatment Goal
For 1M treatments, at minimum:
Identify with HTTP URIs
In the form of:
Metadata: taxon concept, publication information
Representations: HTML, RDF, [XML]
Exports: Biodiversity Literature Repository, EOL, GBIF, Wikidata
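Serving one HTTP URI per treatment in several representations is commonly done with HTTP content negotiation on the `Accept` header. A minimal sketch, assuming a base URI of the form `http://treatment.plazi.org/id/` followed by an uppercase hexadecimal identifier (both assumptions for illustration). `mint_treatment_uri` and `choose_representation` are hypothetical helpers, and the q-value handling is deliberately simplified (no wildcards such as `*/*`).

```python
import uuid

# Assumed URI pattern for illustration; not necessarily TreatmentBank's scheme.
BASE_URI = "http://treatment.plazi.org/id/"

# Representations named on the slide, keyed by media type.
REPRESENTATIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/xml": "xml",
}


def mint_treatment_uri(treatment_id: uuid.UUID) -> str:
    """Build a stable HTTP URI from a treatment identifier."""
    return BASE_URI + treatment_id.hex.upper()


def choose_representation(accept_header: str) -> str:
    """Pick the known representation with the highest q-value; default to HTML."""
    best, best_q = "html", -1.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable", so it never wins.
        if media_type in REPRESENTATIONS and q > 0 and q > best_q:
            best, best_q = REPRESENTATIONS[media_type], q
    return best


# Deterministic identifier derived from a citation string (illustrative only).
tid = uuid.uuid5(uuid.NAMESPACE_URL, "Formica obsoleta Linnaeus, 1758: 580")
print(mint_treatment_uri(tid))
print(choose_representation("text/html;q=0.5, application/rdf+xml;q=0.9"))  # rdf
```

The same URI then identifies the treatment regardless of whether a browser (HTML) or a Linked Data client (RDF) asks for it.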
6
Plazi: Output
Publications (year of publishing): 17,799 (2016: 3,500)
Taxonomic treatments: 165,692 (50,000)
Observation records: 53,099
Observation records geo-referenced: 23,068
Taxonomic names: 152,641 (45,000)
Bibliographic references: > 800,000
RDF triples: > 100 million (1M)
Scientific illustrations: > 140,000 (being uploaded to BLR/Zenodo)
7
TreatmentBank
8
Plazi: TreatmentBank: Treatment stubs: adding content
9
Sources
Legacy publications:
Print: digitization → text capture → XML
Digitized: text capture → XML
Born digital: text extraction → XML
Prospective publishing:
Pensoft journals: TaxPub XML → RDF
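The sources differ mainly in how much work is needed before structured XML exists. A small sketch of that routing, assuming the step lists read off the slide; the `Source` enum and `route` function are hypothetical.

```python
from enum import Enum
from typing import List


class Source(Enum):
    PRINT = "print"
    DIGITIZED = "digitized"
    BORN_DIGITAL = "born digital"
    TAXPUB = "prospective (TaxPub)"


# Ordered processing steps per source type, as listed on the slide.
STEPS = {
    Source.PRINT: ["digitization", "text capture", "XML"],
    Source.DIGITIZED: ["text capture", "XML"],
    Source.BORN_DIGITAL: ["text extraction", "XML"],
    Source.TAXPUB: ["TaxPub XML", "RDF"],
}


def route(source: Source) -> List[str]:
    """Return a copy of the steps needed to bring a source to structured output."""
    return list(STEPS[source])


for source in Source:
    print(f"{source.value}: {' -> '.join(route(source))}")
```

All legacy paths converge on XML; prospective TaxPub publishing starts from XML, as the slide suggests, and goes straight on to RDF.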
10
Plazi conversion workflow
TreatmentBank: find → scan → text extraction → markup → store
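The five stages can be modelled as functions applied in order to a document record. A minimal sketch with every stage reduced to a placeholder; `Document`, the stage functions and the in-memory `STORE` are hypothetical stand-ins, not the TreatmentBank implementation, and real text extraction would involve OCR or PDF parsing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from xml.sax.saxutils import escape


@dataclass
class Document:
    """State carried through the workflow (hypothetical)."""
    reference: str
    text: str = ""
    xml: str = ""
    completed: List[str] = field(default_factory=list)


# In-memory stand-in for TreatmentBank storage.
STORE: Dict[str, str] = {}


def find(doc: Document) -> Document:
    # Locate the publication (placeholder: nothing to do).
    doc.completed.append("find")
    return doc


def scan(doc: Document) -> Document:
    # Produce page images (placeholder: images are not modelled).
    doc.completed.append("scan")
    return doc


def extract_text(doc: Document) -> Document:
    # OCR or PDF text extraction (placeholder text).
    doc.text = f"Full text of {doc.reference}"
    doc.completed.append("text extraction")
    return doc


def markup(doc: Document) -> Document:
    # Wrap escaped text in XML; real markup would tag treatments, names, citations.
    doc.xml = f"<document>{escape(doc.text)}</document>"
    doc.completed.append("markup")
    return doc


def store(doc: Document) -> Document:
    STORE[doc.reference] = doc.xml
    doc.completed.append("store")
    return doc


WORKFLOW: List[Callable[[Document], Document]] = [find, scan, extract_text, markup, store]


def run(doc: Document) -> Document:
    for stage in WORKFLOW:
        doc = stage(doc)
    return doc


result = run(Document("Linnaeus, 1758"))
print(result.completed)
```

Keeping each stage as its own function makes it easy to enter the workflow later, for example skipping `scan` for born-digital PDFs.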
11
Daily Automated Processing of New Taxa
Notes: I am afraid I am going to lose some of you at some point, but I will try to bring you all back together at the end of the talk.
12
TreatmentBank: HTML Representation
13
Treatment Text: XML Representation
14
XML Representation: Text Markup and Enhancement
Treatments
Treatment sections
Features of interest: taxon names; treatment citations; material citations (e.g., specimens); bibliographic references (with citations); figures (with citations); tables (with citations)
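Once treatments, sections and features of interest are marked up as XML elements, they can be queried with any XML toolkit. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names (`treatment`, `subSubSection`, `taxonomicName`, `materialsCitation`, `bibRefCitation`, `figureCitation`, `tableCitation`, `treatmentCitation`) and the sample document are assumptions modelled loosely on Plazi-style markup, not the exact TreatmentBank schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified sample; element and attribute names are assumptions.
SAMPLE = """
<document>
  <treatment>
    <subSubSection type="nomenclature">
      <taxonomicName genus="Formica" species="obsoleta">Formica obsoleta</taxonomicName>
      Linnaeus, 1758: 580
    </subSubSection>
    <subSubSection type="materials_examined">
      <materialsCitation>[specimen data]</materialsCitation>
    </subSubSection>
    <subSubSection type="description">
      See <figureCitation>Fig. 1</figureCitation>
      and <bibRefCitation>Linnaeus 1758</bibRefCitation>.
    </subSubSection>
  </treatment>
</document>
"""

FEATURES = {"taxonomicName", "treatmentCitation", "materialsCitation",
            "bibRefCitation", "figureCitation", "tableCitation"}


def summarize(xml_text: str) -> list:
    """Report sections, taxon names and feature counts for each treatment."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for treatment in root.iter("treatment"):
        results.append({
            "sections": [s.get("type") for s in treatment.iter("subSubSection")],
            "names": ["".join(n.itertext()).strip()
                      for n in treatment.iter("taxonomicName")],
            "features": dict(Counter(el.tag for el in treatment.iter()
                                     if el.tag in FEATURES)),
        })
    return results


for summary in summarize(SAMPLE):
    print(summary)
```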
15
Nomenclature Section and Taxon Name
16
Treatment Citation
17
Material Citation
18
TreatmentBank: online editing: material citation
19
Semantic XML Publishing: TaxPub
20
Treatment Data: RDF Representation
21
Treatment Data
A treatment:
is published in a Publication
defines a Taxon Concept [1 and only 1]
cites Treatments/Taxon Concepts
cites Material (specimens)
hasInformation Information Item [Text]: content, data
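The relations on this slide map naturally onto RDF triples with the treatment as subject. A minimal sketch that emits N-Triples using only the standard library. Apart from CiTO's `cites` property, the predicate IRIs use a hypothetical `http://example.org/treatment#` namespace standing in for the Treatment Ontology, whose actual term names may differ. The check for exactly one taxon concept mirrors the "[1 and only 1]" constraint.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical namespace standing in for the Treatment Ontology.
TRT = "http://example.org/treatment#"
CITO = "http://purl.org/spar/cito/"


@dataclass
class TreatmentData:
    uri: str
    publication: str
    taxon_concepts: List[str]                            # must hold exactly one
    cited: List[str] = field(default_factory=list)       # treatments / taxon concepts
    materials: List[str] = field(default_factory=list)   # material citations
    info_items: List[str] = field(default_factory=list)  # information items


def to_triples(t: TreatmentData) -> List[Tuple[str, str, str]]:
    """Turn one treatment into (subject, predicate, object) IRI triples."""
    if len(t.taxon_concepts) != 1:
        raise ValueError("a treatment defines exactly one taxon concept")
    triples = [
        (t.uri, TRT + "publishedIn", t.publication),
        (t.uri, TRT + "definesTaxonConcept", t.taxon_concepts[0]),
    ]
    triples += [(t.uri, CITO + "cites", c) for c in t.cited]
    triples += [(t.uri, TRT + "citesMaterial", m) for m in t.materials]
    triples += [(t.uri, TRT + "hasInformation", i) for i in t.info_items]
    return triples


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


example = TreatmentData(
    uri="http://example.org/treatment/1",
    publication="http://example.org/pub/1",
    taxon_concepts=["http://example.org/taxon/1"],
    cited=["http://example.org/treatment/0"],
)
print(to_ntriples(to_triples(example)))
```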
22
Treatment Data: Vocabularies, Ontologies, and Identifiers
Treatment: Treatment Ontology; OBKMS Ontology (Viktor Senderov/Pensoft); Plazi TreatmentBank HTTP URIs
Publication: Dublin Core; SPAR (FaBiO, PRO, FRBR); DOI (CrossRef, Zenodo/DataCite); ORCID; ISSN
Taxon Concept: Darwin Core; DarwinCoreSW; ZooBank HTTP URIs; ORCID; ResearcherID; collection codes; repository IDs
Citations: CiTO
Information Item: SPM (+ EOL SPM extensions)
Data: trait ontologies; SDD; etc.
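Most of these vocabularies are used through namespace prefixes. The sketch below records the namespace IRIs for a subset (Dublin Core terms, FaBiO, PRO, FRBR core, CiTO, Darwin Core terms) and expands compact names such as `dwc:scientificName`. The prefix labels follow common usage but are choices of this example; the other vocabularies on the slide are left out rather than guessed.

```python
# Namespace IRIs for a subset of the vocabularies listed above.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",      # Dublin Core terms
    "fabio": "http://purl.org/spar/fabio/",      # SPAR: FaBiO
    "pro": "http://purl.org/spar/pro/",          # SPAR: PRO
    "frbr": "http://purl.org/vocab/frbr/core#",  # FRBR core
    "cito": "http://purl.org/spar/cito/",        # SPAR: CiTO
    "dwc": "http://rs.tdwg.org/dwc/terms/",      # Darwin Core terms
}


def expand(curie: str) -> str:
    """Expand a compact name like "dwc:scientificName" to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_prefixes() -> str:
    """Emit the prefix table as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in sorted(PREFIXES.items()))


print(turtle_prefixes())
print(expand("fabio:JournalArticle"))
print(expand("cito:cites"))
```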
23
Treatment Data: Taxon Concept
24
Treatment Data: Publication Information
25
Biodiversity Literature Repository: DOIs for Legacy Literature
Access, archive, DOI
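Because BLR deposits are held on Zenodo, each receives a DataCite DOI, and any DOI resolves through `https://doi.org/`. A small sketch, assuming a loose syntax check (`10.` plus a numeric registrant code, a slash, and a suffix) rather than the full DOI rules; `doi_to_url` is a hypothetical helper and the record number in the example is made up.

```python
import re
from urllib.parse import quote

# Loose check: "10." + numeric registrant code (optionally dotted) + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def doi_to_url(doi: str) -> str:
    """Normalize a DOI (optionally prefixed with "doi:") to a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


# 10.5281 is Zenodo's DOI prefix; the record number is hypothetical.
print(doi_to_url("doi:10.5281/zenodo.123456"))
```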
26
Treatment Data: Treatment Citations
27
TreatmentBank: Taxonomic data: linking treatments
28
Treatment Data: Material Citations
29
Treatment Data: Specimen Data Analysis and Outputs
30
Treatment Data: Other Information Content
31
Thank you! Terry Catapano, catapano@plazi.org