International Congress of Entomology, Orlando


1 International Congress of Entomology, Orlando
Terry Catapano, September 26, 2016, International Congress of Entomology, Orlando
Notes: Add in Plazi and the idea of the treatment server

2 Extracting Linked Open Data from Taxonomic Publications
Terry Catapano, Plazi, New York

3 Plazi
- 500,000,000+ printed pages
- 5,000 journals with taxonomic content
- 1,900,000 species described
- 20,000,000+ species treatments
BUT: the facts are hidden
- Incomplete digitization
- Publications are not semantically enhanced
- Data are not linked
- Most data are not open
Plazi solution: linking through taxonomic treatments

4 Taxonomic Treatment
Formica obsoleta Linnaeus, 1758: 580 (annotated parts: name, description, distribution)
Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication). Linnaeus must be credited for both the Latin binomen AND the treatment.
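The definition above ties a treatment to three facts: a name, the authority using it, and a place and time (year and page). As a minimal sketch, the Python below splits the slide's example citation into those parts. The regular expression and the `TreatmentCitation` class are hypothetical and only handle the simplest "Genus species Authority, Year: page" form.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for the simplest zoological form "Genus species Authority, Year: page".
# Real treatment citations (subgenera, subspecies, parenthesized authorities) need far more.
CITATION_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z]+)\s+"
    r"(?P<authority>.+?),\s*(?P<year>\d{4}):\s*(?P<page>\d+(?:-\d+)?)$"
)


@dataclass(frozen=True)
class TreatmentCitation:
    genus: str
    species: str
    authority: str  # who used the name in this treatment
    year: int       # when
    page: str       # where in the publication (text, so ranges like "580-581" fit)

    @property
    def binomen(self) -> str:
        return f"{self.genus} {self.species}"


def parse_treatment_citation(text: str) -> TreatmentCitation:
    """Split a treatment citation into name, authority, year, and page."""
    m = CITATION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized treatment citation: {text!r}")
    return TreatmentCitation(
        genus=m["genus"],
        species=m["species"],
        authority=m["authority"],
        year=int(m["year"]),
        page=m["page"],
    )


citation = parse_treatment_citation("Formica obsoleta Linnaeus, 1758: 580")
print(citation.binomen, "|", citation.authority, "|", citation.year, "|", citation.page)
# prints: Formica obsoleta | Linnaeus | 1758 | 580
```

Keeping the authority and page as separate fields makes the point of the slide explicit: the same binomen can appear in many treatments, and each one is credited separately.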

5 Plazi TreatmentBank: Million Treatment Goal
For 1M treatments, at minimum:
- Identify with HTTP URIs
- Metadata: taxon concept, publication information
- Representations: HTML, RDF, [XML]
- Exports: Biodiversity Literature Repository, EOL, GBIF, Wikidata
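One HTTP URI per treatment with several representations is usually served through content negotiation. The sketch below assumes a hypothetical base URI (`http://treatment.example.org/id/`, not Plazi's actual scheme), derives a stable identifier with a name-based UUID, and picks HTML, RDF, or XML from an `Accept` header. The helper names are illustrative only.

```python
import uuid

# Hypothetical base URI; the slide only says treatments are identified with HTTP URIs.
BASE_URI = "http://treatment.example.org/id/"

# Media types for the representations listed on the slide (HTML, RDF, XML).
REPRESENTATIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/xml": "xml",
}


def mint_treatment_uri(stable_key: str) -> str:
    """Derive a deterministic URI from a stable key (e.g. article + page + name)."""
    return BASE_URI + uuid.uuid5(uuid.NAMESPACE_URL, stable_key).hex


def choose_representation(accept_header: str) -> str:
    """Pick a representation from an Accept header.

    Simplification: media types are tried in the order listed, q-values are
    ignored, and anything unsupported falls back to HTML.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
        if media_type in ("*/*", "text/*"):
            return "html"
    return "html"


uri = mint_treatment_uri("example-article|580|Formica obsoleta")
print(uri, choose_representation("application/rdf+xml, text/html;q=0.5"))
```

A name-based UUID means re-processing the same article yields the same URI, which keeps links stable. A production server would honor q-values and might answer 406 rather than falling back to HTML.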

6 Plazi: Output
- Publications (year of publishing): 17,799 (2016: 3,500)
- Taxonomic treatments: 165,692 (50,000)
- Observation records: 53,099
- Observation records geo-referenced: 23,068
- Taxonomic names: 152,641 (45,000)
- Bibliographic references: > 800,000
- RDF triples: > 100 million (1M)
- Scientific illustrations: > 140,000 (being uploaded to BLR/Zenodo)

7 TreatmentBank

8 Plazi: TreatmentBank: Treatment stubs: adding content

9 Sources
Legacy publications:
- Print → Digitization → Text capture → XML
- Digitized → Text capture → XML
- Born digital → Text extraction → XML
Prospective publishing (Pensoft journals): TaxPub XML → RDF

10 Plazi conversion workflow
Workflow into TreatmentBank: find → scan → text extraction → markup → store
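The source types and the conversion workflow can be read together as one pipeline whose early stages depend on where a publication starts. The sketch below encodes that as a table of stage names per source type and runs stub stages that only log their names. The exact split (for example, no scan step for born-digital PDFs, and TaxPub XML going straight to storage) is an assumption, and `make_stage`/`run_pipeline` are hypothetical helpers.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Stage sequences per source type, paraphrasing the "Sources" and workflow slides.
# The exact split per source type is an assumption.
PIPELINES: Dict[str, List[str]] = {
    "print": ["find", "scan", "text capture", "markup", "store"],
    "digitized": ["find", "text capture", "markup", "store"],
    "born_digital": ["find", "text extraction", "markup", "store"],
    "taxpub_xml": ["find", "store"],  # already semantically marked up
}


def make_stage(name: str) -> Callable[[Record], Record]:
    """Return a stub stage; a real stage would transform the document, this one only logs."""
    def stage(record: Record) -> Record:
        return {**record, "log": [*record["log"], name]}
    return stage


def run_pipeline(source_type: str, doc_id: str) -> Record:
    if source_type not in PIPELINES:
        raise ValueError(f"unknown source type: {source_type}")
    record: Record = {"id": doc_id, "source": source_type, "log": []}
    for name in PIPELINES[source_type]:
        record = make_stage(name)(record)
    return record


print(run_pipeline("born_digital", "doc-1")["log"])
# prints: ['find', 'text extraction', 'markup', 'store']
```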

11 Daily Automated Processing of New Taxa
Notes: I am afraid I am going to lose some of you at some point, but I will try to bring you all together at the end of the talk.

12 TreatmentBank: HTML Representation

13 Treatment Text: XML Representation

14 XML Representation: Text Markup and Enhancement
- Treatments
- Treatment sections
- Features of interest:
  - Taxon names
  - Treatment citations
  - Material citations (e.g., specimens)
  - Bibliographic references (with citations)
  - Figures (with citations)
  - Tables (with citations)
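To make the markup levels concrete, here is a minimal sketch that reads a small treatment in XML, lists its sections, and counts the features of interest. The element names are modeled loosely on Plazi-style tags but are assumptions for illustration, not the full TreatmentBank schema. The bracketed text stands in for real content.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hypothetical treatment markup (not the full TreatmentBank schema).
SAMPLE = """<treatment>
  <subSubSection type="nomenclature">
    <taxonomicName>Formica obsoleta</taxonomicName> Linnaeus, 1758: 580
  </subSubSection>
  <subSubSection type="materials_examined">
    <materialsCitation>[specimen details]</materialsCitation>
  </subSubSection>
  <subSubSection type="description">
    [description text], see <figureCitation>Fig. 1</figureCitation>
    and <bibRefCitation>Linnaeus 1758</bibRefCitation>.
  </subSubSection>
</treatment>"""

# Features of interest from the slide, mapped to hypothetical tag names.
FEATURE_TAGS = (
    "taxonomicName", "treatmentCitation", "materialsCitation",
    "bibRefCitation", "figureCitation", "tableCitation",
)


def summarize_treatment(xml_text: str):
    """Return (section types in order, feature counts, taxon names)."""
    root = ET.fromstring(xml_text)
    sections = [sec.get("type") for sec in root.iter("subSubSection")]
    feature_counts = Counter(el.tag for el in root.iter() if el.tag in FEATURE_TAGS)
    names = [el.text.strip() for el in root.iter("taxonomicName") if el.text]
    return sections, feature_counts, names


sections, counts, names = summarize_treatment(SAMPLE)
print(sections)  # ['nomenclature', 'materials_examined', 'description']
print(names)     # ['Formica obsoleta']
```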

15 Nomenclature Section and Taxon Name

16 Treatment Citation

17 Material Citation

18 TreatmentBank: online editing: material citation

19 Semantic XML Publishing: TaxPub
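TaxPub extends JATS with taxonomy-specific elements in its own namespace, so treatments in prospectively published articles are already marked up. The sketch below reads a trimmed, illustrative TaxPub-style fragment with the standard library. The element and attribute names (`tp:taxon-treatment`, `tp:taxon-name-part`, `taxon-name-part-type`, `tp:treatment-sec`, `sec-type`) reflect my understanding of TaxPub and should be checked against the schema. `extract_treatments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"
NS = {"tp": TP_NS}

# Trimmed, illustrative TaxPub-style fragment (real articles are full JATS documents).
SAMPLE = f"""<body xmlns:tp="{TP_NS}">
  <tp:taxon-treatment>
    <tp:nomenclature>
      <tp:taxon-name>
        <tp:taxon-name-part taxon-name-part-type="genus">Formica</tp:taxon-name-part>
        <tp:taxon-name-part taxon-name-part-type="species">obsoleta</tp:taxon-name-part>
      </tp:taxon-name>
    </tp:nomenclature>
    <tp:treatment-sec sec-type="Distribution"><p>[distribution text]</p></tp:treatment-sec>
  </tp:taxon-treatment>
</body>"""


def extract_treatments(xml_text: str) -> list:
    """Return the name and section types of every taxon treatment in the document."""
    root = ET.fromstring(xml_text)
    treatments = []
    for tr in root.iter(f"{{{TP_NS}}}taxon-treatment"):
        parts = {
            part.get("taxon-name-part-type"): (part.text or "").strip()
            for part in tr.findall("tp:nomenclature/tp:taxon-name/tp:taxon-name-part", NS)
        }
        name = " ".join(p for p in (parts.get("genus"), parts.get("species")) if p)
        sections = [sec.get("sec-type") for sec in tr.findall("tp:treatment-sec", NS)]
        treatments.append({"name": name, "sections": sections})
    return treatments


print(extract_treatments(SAMPLE))
# prints: [{'name': 'Formica obsoleta', 'sections': ['Distribution']}]
```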

20 Treatment Data: RDF Representation

21 Treatment Data
- Published in → Publication
- Defines → Taxon Concept [1 and only 1]
- Cites → Treatments/Taxon Concepts
- Cites → Material (Specimens)
- hasInformation → Information Item: [Text] Content, Data

22 Treatment Data: Vocabularies, Ontologies, and Identifiers
- Treatment: Treatment Ontology; OBKMS Ontology (Viktor Senderov/Pensoft); Plazi TreatmentBank HTTP URIs
- Publication: Dublin Core, SPAR (FaBiO, PRO, FRBR); DOI (CrossRef, Zenodo/DataCite); ORCID, ISSN
- Taxon Concept: DarwinCore, DarwinCoreSW; ZooBank HTTP URIs; ORCID, ResearcherID, collection codes, repository IDs
- Citations: CiTO
- Information Item: SPM (+ EOL SPM extensions)
- Data: trait ontologies; SDD, etc.
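Reusing these vocabularies in RDF mostly comes down to agreeing on namespace IRIs and prefixes. The sketch below covers a subset of the vocabularies listed (Dublin Core terms, FaBiO, PRO, CiTO, Darwin Core) and shows how compact names expand and how a Turtle prefix header is produced. The helper functions are hypothetical. Only the namespace IRIs are the vocabularies' own.

```python
# Namespace IRIs for a subset of the vocabularies named on the slide.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "fabio": "http://purl.org/spar/fabio/",
    "pro": "http://purl.org/spar/pro/",
    "cito": "http://purl.org/spar/cito/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
}


def expand(curie: str) -> str:
    """Expand a compact name such as 'cito:cites' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_header() -> str:
    """Emit Turtle @prefix lines, sorted by prefix."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in sorted(PREFIXES.items()))


print(expand("cito:cites"))  # http://purl.org/spar/cito/cites
print(expand("dwc:scientificName"))
print(turtle_header())
```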

23 Treatment Data: Taxon Concept

24 Treatment Data: Publication Information

25 Biodiversity Literature Repository: DOIs for Legacy Literature
Access, archive, DOI
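DOIs for deposited literature arrive in several spellings (bare, `doi:`-prefixed, or as resolver URLs), so linking them reliably needs normalization. The sketch below strips common prefixes, checks the string against the pattern Crossref recommends for modern DOIs (a heuristic, not the full DOI syntax), lowercases it because DOI names are case-insensitive, and builds a `https://doi.org/` link. `10.5281` is Zenodo's DOI prefix. The record number in the example is a placeholder.

```python
import re

# Crossref's commonly recommended pattern for modern DOIs (a heuristic, not full DOI syntax).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)
RESOLVER_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Strip resolver or 'doi:' prefixes and lowercase (DOI names are case-insensitive)."""
    doi = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return doi.lower()


def doi_url(value: str) -> str:
    return "https://doi.org/" + normalize_doi(value)


print(doi_url("doi:10.5281/ZENODO.12345"))  # https://doi.org/10.5281/zenodo.12345
```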

26 Treatment Data: Treatment Citations

27 TreatmentBank: Taxonomic data: linking treatments
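Once treatment citations are extracted, linking treatments is a graph problem: invert the citations to see who cites a treatment, and walk them to collect every treatment connected to a starting one. The sketch below does both with a breadth-first search over an undirected view of the citations. The input map and the `urn:ex:` identifiers are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Hypothetical input: treatment URI -> URIs of treatments it cites.
CITATIONS: Dict[str, List[str]] = {
    "urn:ex:t3": ["urn:ex:t2", "urn:ex:t1"],
    "urn:ex:t2": ["urn:ex:t1"],
    "urn:ex:t5": ["urn:ex:t4"],
}


def cited_by(citations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the citation map: for each treatment, which treatments cite it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            index[cited].add(citing)
    return {uri: sorted(citers) for uri, citers in index.items()}


def linked_treatments(start: str, citations: Dict[str, List[str]]) -> List[str]:
    """All treatments reachable from `start` following citations in either direction (BFS)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            neighbours[citing].add(cited)
            neighbours[cited].add(citing)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(cited_by(CITATIONS)["urn:ex:t1"])           # ['urn:ex:t2', 'urn:ex:t3']
print(linked_treatments("urn:ex:t1", CITATIONS))  # ['urn:ex:t1', 'urn:ex:t2', 'urn:ex:t3']
```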

28 Treatment Data: Material Citations
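Material citations become reusable once they are expressed as occurrence records, and the output slide distinguishes records that are geo-referenced. The sketch below maps a parsed material citation to Darwin Core terms and checks for a plausible coordinate pair. The input keys, the `PreservedSpecimen` default, and the example values (including the coordinates) are made up for illustration. Only the Darwin Core term names are real.

```python
from typing import Dict, Optional

# Hypothetical parsed material citation; the keys are illustrative, not Plazi's schema.
MaterialCitation = Dict[str, Optional[str]]

# Mapping from the hypothetical keys to Darwin Core term names.
DWC_TERMS = {
    "taxon": "scientificName",
    "country": "country",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "collector": "recordedBy",
    "institution": "institutionCode",
    "catalog_number": "catalogNumber",
}


def to_darwin_core(mc: MaterialCitation) -> Dict[str, str]:
    """Copy the non-empty fields into a Darwin Core-style occurrence record."""
    record = {"basisOfRecord": "PreservedSpecimen"}  # assumption: cited material is specimens
    for key, term in DWC_TERMS.items():
        value = mc.get(key)
        if value:
            record[term] = value
    return record


def is_georeferenced(record: Dict[str, str]) -> bool:
    """True if the record carries a plausible decimal latitude/longitude pair."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


example = to_darwin_core({
    "taxon": "Formica obsoleta",
    "latitude": "12.5",       # made-up coordinates for illustration
    "longitude": "-45.0",
    "institution": "EXMPL",   # hypothetical institution code
    "catalog_number": None,   # empty fields are dropped
})
print(example, is_georeferenced(example))
```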

29 Treatment Data: Specimen Data Analysis and Outputs

30 Treatment Data: Other Information Content

31 Thank you!
Terry Catapano, catapano@plazi.org

