Literature & interoperability: a working example using ants Donat Agosti, Terry Catapano, Guido Sautter, Christiana Klingenberg & Christie Stephenson TDWG.

1 Literature & interoperability: a working example using ants Donat Agosti, Terry Catapano, Guido Sautter, Christiana Klingenberg & Christie Stephenson TDWG 2007, Bratislava

2 Participating organizations Main support from US-NSF and German DFG

3 Biodiversity monitoring, or what's out there? Measuring and monitoring biodiversity means standard repetitive samples. Access to taxonomic data is the main impediment to running successful surveys and to integrating surveys into mainstream conservation, potentially one of the biggest users of taxonomic data. The question is: what is the fastest way to provide this content? What is doable, and what is not?

4 Literature & interoperability http://www.blsalptransit.ch/en/frameset_e.htm A report from a breakthrough in a long tunnel....

5 Literature & interoperability A report from a breakthrough in a long tunnel.... For the first time, the entire production chain of OCR-ing, marking up, and adding all the GUIDs to produce a valid Taxonx document is in place. We can provide a stable of encoded data/metadata which other applications can utilize (e.g. semant/iSpecies).

6

7 Literature & interoperability Plazi.org: sandbox and data provider. The principle: community involvement. Develop tools and solutions to access literature, both retrospective and prospective. Make content available by exporting data into dedicated databases. Provide an example of an input facility for ZooBank. Get around copyright by focusing on content through marking up documents. Explore a digital taxonomic literature "arXiv". Drupal-based, with an underlying DSpace repository and Handle server.

8 Literature & interoperability Plazi workflow

9 Literature & interoperability Plazi products: OCR-ed texts (dirty, clean); ABBYY training files for fonts; ABBYY training files for journals; ABBYY custom dictionary.

10 Literature & interoperability GoldenGATE interactions: get GUIDs for names from the Hymenoptera Name Server; add new names. Terminology follows ITIS; new names are currently uploaded into the Hymenoptera Name Server; query via HTML.

11 Literature & interoperability GoldenGATE interactions: get GUIDs for names from the Hymenoptera Name Server (ZooBank?); add new names; get bibliographic metadata from HNS (MODS); get bibliographic GUIDs from bioguid; get geographic long/lat from geonames.org.
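The geographic lookup listed on this slide could look roughly like the following. This is a minimal sketch, not GoldenGATE's actual code: it assumes the GeoNames `searchJSON` web service and a registered GeoNames username, and the helper names (`build_query`, `first_coordinates`, `lookup`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the GeoNames full-text search service returning JSON.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_query(locality, username, max_rows=1):
    """Build the search URL for a locality string found in a treatment."""
    params = {"q": locality, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def first_coordinates(payload):
    """Return (lat, lng) of the first hit as floats, or None if there is no hit."""
    hits = payload.get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])


def lookup(locality, username):
    """Query the service over the network and return the first coordinates."""
    with urlopen(build_query(locality, username)) as response:
        return first_coordinates(json.load(response))
```

Keeping URL construction and response parsing separate from the network call lets the parsing be checked against a saved response without contacting the service.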

12 Literature & interoperability Products (1): documents as PDF, XSLT-generated HTML, and XML. Get one with PDF, XML. PDF (original or scanned); HTML via XSLT; XML (Taxonx). All documents carry GUIDs: at minimum for names and MODS; at most also for bibliographic references, specimens, and localities.
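The minimal GUID requirement on this slide (names and MODS) could be checked mechanically. A minimal sketch, assuming a simplified, hypothetical XML layout in which `mods` and `name` elements carry a `guid` attribute; this is not the actual Taxonx schema, and the sample GUID values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document; real Taxonx markup differs.
SAMPLE = """<document>
  <mods guid="placeholder-mods-guid"/>
  <name guid="placeholder-name-guid">Formica rufa</name>
  <name>Formica fusca</name>
</document>"""


def missing_guids(xml_text, required_tags=("mods", "name")):
    """List (tag, text) pairs for required elements that lack a guid attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for tag in required_tags:
        for elem in root.iter(tag):
            if not elem.get("guid"):
                missing.append((tag, (elem.text or "").strip()))
    return missing


if __name__ == "__main__":
    for tag, text in missing_guids(SAMPLE):
        print(f"{tag} without GUID: {text!r}")
```

Extending `required_tags` would cover the fuller case of bibliographic references, specimens, and localities, again under whatever element names the real schema uses.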

13 Literature & interoperability Plazi workflow

14 Literature & interoperability Products (2): Search and Retrieval Server

15 Literature & interoperability Search and Retrieval Server: Output

16 Literature & interoperability Search and Retrieval Server: Output

17 Literature & interoperability Search and Retrieval Server: Output

18 Literature & interoperability Search and Retrieval Server: Output

19

20 Products: What content do we have in store? Gold standard: 120+ taxonomic publications from Madagascar, ranging from 1758 to 2007 (70% completed) (vertical). Recent publications continually added (horizontal standard). A series of publications describing elements of Taxonx, GoldenGATE, and name-finding algorithms (FindIT, FAT), and comparing approaches. An increasing library of training files for ABBYY and analyzers for GoldenGATE.

21 Additional products: Training course for literature mark-up to get the community involved. Creating a Neotropical catalogue of the ants using the mark-up approach. Development of metrics to measure mark-up production to optimize output for users (ecologists, taxonomists, etc.).

22 Time (in minutes) to produce clean OCR using ABBYY; publications in chronological order. Producing metrics to measure effort and compare various approaches and algorithms.

23 Literature & interoperability Time used to mark up documents in Taxonx in comparison to the number of pages per volume, in chronological order. Producing metrics to measure effort and compare various approaches and algorithms to mark up documents.
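One way to put the effort figures from these two slides on a comparable footing is to normalize time by page count. The sketch below assumes minutes per page as the metric, which the slides do not state explicitly; the `MarkupJob` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkupJob:
    """One processed publication (hypothetical record layout)."""
    title: str
    year: int       # publication year, used for chronological ordering
    pages: int      # number of pages in the volume
    minutes: float  # time spent on OCR clean-up or mark-up


def minutes_per_page(job):
    """Effort normalized by volume length."""
    if job.pages <= 0:
        raise ValueError("a job needs at least one page")
    return job.minutes / job.pages


def chronological_rates(jobs):
    """Return (year, minutes per page) pairs sorted by publication year."""
    ordered = sorted(jobs, key=lambda job: job.year)
    return [(job.year, minutes_per_page(job)) for job in ordered]
```

Running `chronological_rates` separately on OCR times and on mark-up times would give two series that can be plotted against publication year, or compared between tools.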

24 Additional products: Training course for literature mark-up to get the community involved. Creating a Neotropical catalogue of the ants using the mark-up approach. Development of metrics to measure mark-up production to optimize output for users (ecologists, taxonomists, etc.). Experience: mark-up is expensive....

25 How to best invest in the digitization of legacy publications? [Diagram: value for scientists against costs, for digitization levels running from print and print + catalogue through image, PDF, dirty OCR, clean OCR, structured XML, and semantic XML to a linked database; mark-up levels: names marked up, treatments marked up, finer-grained mark-up.]

26 ….. to the future of publication: publication as a version control instrument. [Workflow diagram. Manuscript steps: analysis & ms preparation; ms submission ("Taxon-x version"); new ms alert; posting for review; edited ms; revised ms; accepted ms; new taxon alert; publication as PDF and hard copy; publication database ("Taxon-x version"). Connected resources: ontology; bibliography; ZooBank / NS; Character DB; Specimen DB; Description DB; Distribution DB; Char. Matrix DB; Phyl. Tree DB; Char-state Im.; Specimen Im.; Habitat Image; Leg. Publicat.; Taxon DB. Feedback of new data.]

