Published by Samantha Hodges, modified over 9 years ago
1
Literature & interoperability: a working example using ants
Donat Agosti, Terry Catapano, Guido Sautter, Christiana Klingenberg & Christie Stephenson
TDWG 2007, Bratislava
2
Participating organizations
Main support by US-NSF and German DFG
3
Biodiversity monitoring, or what's out there?
Measuring and monitoring biodiversity means standardized, repeated sampling.
Access to taxonomic data is the main impediment to running successful surveys and to integrating surveys into mainstream conservation, potentially one of the biggest users of taxonomic data.
The question is: what is the fastest way to provide this content? What is doable, and what is not?
4
Literature & interoperability
http://www.blsalptransit.ch/en/frameset_e.htm
A report from a breakthrough in a long tunnel...
5
A report from a breakthrough in a long tunnel...
For the first time, the entire production chain of OCR-ing, marking up, and adding all the GUIDs to produce a valid TaxonX document is in place.
We can provide a stable set of encoded data and metadata that other applications can use (e.g. semant/iSpecies).
7
Plazi.org: sandbox and data provider
The principle: community involvement
- Develop tools and solutions to access literature, both retrospective and prospective
- Make content available by exporting data into dedicated databases
- Provide an example of an input facility for ZooBank
- Get around copyright by focusing on content, through marking up documents
- Explore a digital taxonomic literature "Arxiv"
Drupal-based, with an underlying DSpace repository and handle server
8
Plazi workflow
9
Plazi products:
- OCR-ed texts (dirty, clean)
- ABBYY training files for fonts
- ABBYY training files for journals
- ABBYY custom dictionary
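The slides do not show how the ABBYY custom dictionary is applied. As a hedged illustration of why a domain dictionary helps turn dirty OCR into clean OCR, the sketch below snaps misread capitalized words to the closest entry in a small, hypothetical list of ant genus names using Python's difflib. This is a swapped-in technique, not ABBYY's mechanism; the dictionary contents and the 0.8 similarity cutoff are assumptions.

```python
import difflib
import re

# Hypothetical custom dictionary of genus names (an assumption for illustration).
CUSTOM_DICTIONARY = ["Camponotus", "Pheidole", "Strumigenys", "Tetramorium"]


def correct_token(token: str, dictionary: list[str], cutoff: float = 0.8) -> str:
    """Return the closest dictionary entry for a capitalized token, else the token unchanged."""
    # Only capitalized words are candidates; exact dictionary hits need no change.
    if not token[:1].isupper() or token in dictionary:
        return token
    matches = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def clean_ocr_line(line: str, dictionary: list[str]) -> str:
    """Apply correct_token to every word-like run (letters, digits, underscore) in a line."""
    return re.sub(r"\w+", lambda m: correct_token(m.group(0), dictionary), line)
```

For example, `clean_ocr_line("Pheido1e and Carnponotus", CUSTOM_DICTIONARY)` returns `"Pheidole and Camponotus"`: a digit 1 read for l and "rn" read for m are typical OCR confusions that a dictionary lookup can repair.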
10
GoldenGATE interactions
- Get GUIDs from the Hymenoptera Name Server for names
- Add new names
Terminology follows ITIS; new names are currently uploaded into the Hymenoptera Name Server and queried via HTML.
11
GoldenGATE interactions
- Get GUIDs from the Hymenoptera Name Server for names; ZooBank?
- Add new names
- Get bibliographic metadata (MODS) from HNS
- Get bibliographic GUIDs from bioGUID
- Get geographic latitude/longitude from geonames.org
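A sketch of the last interaction, assuming the public GeoNames `searchJSON` web service (which requires a registered GeoNames username): a locality string from a marked-up document is turned into a query URL, and the first hit's coordinates are read from the JSON reply. The HTTP call itself is left out so the sketch runs offline, and the Hymenoptera Name Server lookup is not shown because its interface is not described here. Function names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# GeoNames full-text search endpoint returning JSON.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_url(locality: str, username: str) -> str:
    """Build a GeoNames search URL asking only for the single best match."""
    params = {"q": locality, "maxRows": 1, "username": username}
    return f"{GEONAMES_SEARCH}?{urlencode(params)}"


def parse_first_coordinate(response_text: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) of the first hit, or None if the reply has no hits."""
    hits = json.loads(response_text).get("geonames", [])
    if not hits:
        return None
    first = hits[0]
    # GeoNames returns lat/lng as strings; convert them to floats.
    return float(first["lat"]), float(first["lng"])
```

With a live request, the URL would be fetched (for instance with `urllib.request.urlopen`) and the decoded body passed to `parse_first_coordinate`; the resulting pair could then be attached to the locality in the marked-up document.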
12
Products (1): documents as PDF, XSLT-generated HTML, and XML
Get a document as PDF or XML:
- PDF (original or scanned)
- HTML via XSLT
- XML (TaxonX)
All documents carry GUIDs: at minimum for names and MODS metadata; at most also for bibliographic references, specimens, and localities.
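To illustrate the minimum GUID requirement (names plus MODS), here is a sketch using Python's xml.etree.ElementTree. The element and attribute names (`document`, `treatment`, `name`, `guid`, `modsGuid`) are hypothetical simplifications, not the TaxonX schema, and any GUID strings used with it are placeholders.

```python
import xml.etree.ElementTree as ET


def build_document(title: str, mods_guid: str, names: dict[str, str]) -> str:
    """Serialize a minimal document in which every name carries a GUID.

    Element and attribute names are hypothetical stand-ins for TaxonX markup.
    """
    root = ET.Element("document", attrib={"modsGuid": mods_guid})
    ET.SubElement(root, "title").text = title
    treatment = ET.SubElement(root, "treatment")
    for name, guid in names.items():
        ET.SubElement(treatment, "name", attrib={"guid": guid}).text = name
    return ET.tostring(root, encoding="unicode")


def meets_minimum(xml_text: str) -> bool:
    """True if the document has a MODS GUID and every <name> element has a GUID."""
    root = ET.fromstring(xml_text)
    names = list(root.iter("name"))
    return bool(root.get("modsGuid")) and all(n.get("guid") for n in names)
```

A check like `meets_minimum` could run before a document is exported, so that only documents satisfying the minimal GUID level leave the markup step.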
13
Plazi workflow
14
Products (2): Search and Retrieval Server
15
Search and Retrieval Server: Output
20
Products: What content do we have in store?
- Gold standard (vertical): 120+ taxonomic publications from Madagascar, ranging from 1758 to 2007 (70% completed)
- Horizontal standard: recent publications added continually
- A series of publications describing elements of TaxonX, GoldenGATE and the name-finding algorithms (FindIT, FAT), and comparing approaches
- A growing library of training files for ABBYY and of analyzers for GoldenGATE
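FindIT and FAT are not reproduced here. As a much simpler stand-in that shows the basic idea of name finding, the sketch below matches a capitalized genus followed by a lowercase epithet with a regular expression, keeps only genera from a known list, and expands abbreviated genera (e.g. "P.") using the most recent full genus with the same initial. The genus list is hypothetical, and the sketch ignores subgenera, author citations, and infraspecific names.

```python
import re

# Capitalized genus (or abbreviated "X.") followed by a lowercase epithet of 3+ letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{3,})\b")


def find_binomials(text: str, known_genera: set[str]) -> list[str]:
    """Return genus-epithet pairs whose genus is known or expandable from an abbreviation."""
    found: list[str] = []
    last_genus = None  # most recent full genus seen, used to expand "P." etc.
    for m in BINOMIAL.finditer(text):
        genus, epithet = m.group(1), m.group(2)
        if genus.endswith("."):
            # Abbreviated genus: expand only if the initial matches the last full genus.
            if last_genus and last_genus[0] == genus[0]:
                found.append(f"{last_genus} {epithet}")
        elif genus in known_genera:
            last_genus = genus
            found.append(f"{genus} {epithet}")
    return found
```

For example, `find_binomials("Pheidole megacephala was collected with P. indica and Camponotus maculatus in 1890.", {"Pheidole", "Camponotus"})` returns `["Pheidole megacephala", "Pheidole indica", "Camponotus maculatus"]`. The known-genus filter is what rejects ordinary sentence openings such as "The ants".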
21
Additional products
- Training course in literature markup to get the community involved
- Creating a Neotropical catalogue of the ants using a markup approach
- Development of metrics to measure markup production, to optimize output for users (ecologists, taxonomists, etc.)
22
Time in minutes to produce clean OCR using ABBYY; publications in chronological order
Producing metrics to measure effort and compare various approaches and algorithms
23
Time used to mark up documents in TaxonX compared with the number of pages per volume; chronological order
Producing metrics to measure effort and compare various approaches and algorithms for marking up documents
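The slides do not give the formula behind these metrics. One plausible reading, offered as an assumption, is effort per page: divide the minutes spent on each volume by its page count, and fit a least-squares line of minutes against pages to separate a fixed per-volume overhead (intercept) from a per-page cost (slope). The `Volume` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    label: str      # hypothetical identifier, e.g. author and year
    pages: int      # number of pages in the volume
    minutes: float  # time spent on OCR cleanup or markup


def minutes_per_page(volumes: list[Volume]) -> dict[str, float]:
    """Effort per page for each volume."""
    return {v.label: v.minutes / v.pages for v in volumes}


def fit_line(volumes: list[Volume]) -> tuple[float, float]:
    """Ordinary least-squares fit of minutes = intercept + slope * pages."""
    n = len(volumes)
    mean_x = sum(v.pages for v in volumes) / n
    mean_y = sum(v.minutes for v in volumes) / n
    sxx = sum((v.pages - mean_x) ** 2 for v in volumes)
    if sxx == 0:
        raise ValueError("need volumes with at least two different page counts")
    sxy = sum((v.pages - mean_x) * (v.minutes - mean_y) for v in volumes)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Applied to per-volume timings kept in chronological order, the per-page values could show whether older publications cost more to process, while the fitted slope gives a single per-page figure for comparing approaches.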
24
Additional products
- Training course in literature markup to get the community involved
- Creating a Neotropical catalogue of the ants using a markup approach
- Development of metrics to measure markup production, to optimize output for users (ecologists, taxonomists, etc.)
Experience: markup is expensive...
25
Figure: value for the scientist versus cost for different levels of digitizing legacy publications. Levels shown: print + catalogue; PDF image; dirty OCR; clean PDF/OCR; structural XML; semantic XML; semantic XML (high); s-XML linked database. Markup annotations: names, marked-up treatments, finer-grained markup.
How best to invest in the digitization of legacy publications?
26
... to the Future of Publication: publication as a version control instrument
Diagram stages: manuscript submission ("Taxon-X version"), new manuscript alert, posting for review, edited manuscript, revised manuscript, accepted manuscript, publication as PDF and hard copy, publication database ("Taxon-X version"), new taxon alert.
Supporting elements: ontology, bibliography, analysis & manuscript preparation.
Linked databases: ZooBank / NS, character DB, specimen DB, description DB, distribution DB, character matrix DB, phylogenetic tree DB, character-state images, specimen images, habitat images, legacy publications, taxon DB; new data feed back into the cycle.
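To make "publication as a version control instrument" concrete, here is a minimal sketch, assuming each manuscript stage (submitted, edited, revised, accepted) is stored as a full text snapshot: every stage is committed to an append-only history, and any two stages can be compared as a unified diff with Python's difflib. The `ManuscriptHistory` class and the stage labels are hypothetical; this is not the system proposed in the slides.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class ManuscriptHistory:
    # Append-only list of (stage label, full manuscript text) snapshots.
    versions: list[tuple[str, str]] = field(default_factory=list)

    def commit(self, stage: str, text: str) -> int:
        """Store a new snapshot and return its index."""
        self.versions.append((stage, text))
        return len(self.versions) - 1

    def diff(self, old: int, new: int) -> list[str]:
        """Unified diff between two stored snapshots, compared line by line."""
        old_stage, old_text = self.versions[old]
        new_stage, new_text = self.versions[new]
        return list(
            difflib.unified_diff(
                old_text.splitlines(),
                new_text.splitlines(),
                fromfile=old_stage,
                tofile=new_stage,
                lineterm="",
            )
        )
```

Because snapshots are never overwritten, the submitted, reviewed and published versions all stay retrievable, and the diff between any two of them records what changed at each stage.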