Literature & interoperability: a working example using ants Donat Agosti, Terry Catapano, Guido Sautter, Christiana Klingenberg & Christie Stephenson TDWG.

Presentation transcript:

Literature & interoperability: a working example using ants Donat Agosti, Terry Catapano, Guido Sautter, Christiana Klingenberg & Christie Stephenson TDWG 2007, Bratislava

Participating organizations
Main support from the US NSF and the German DFG.

Biodiversity monitoring, or what's out there?
- Measuring and monitoring biodiversity means standard, repeated sampling.
- Access to taxonomic data is the main impediment to running successful surveys and to integrating surveys into mainstream conservation, potentially one of the biggest users of taxonomic data.
- The question is: what is the fastest way to provide this content? What is doable, and what is not?

A report from a breakthrough in a long tunnel
- For the first time, the entire production chain is in place: OCR, mark-up, and the addition of all the GUIDs needed to produce a valid TaxonX document.
- We can provide a stable of encoded data and metadata that other applications can use (e.g. semant/iSpecies).

Plazi.org: sandbox and data provider
- The principle: community involvement
- Develop tools and solutions to access literature, both retrospective and prospective
- Make content available by exporting data into dedicated databases
- Provide an example of an input facility for ZooBank
- Get around copyright by marking up documents and focusing on their content
- Explore a digital "arXiv" for taxonomic literature
- Drupal-based, with an underlying DSpace repository and Handle server

Plazi workflow

Plazi products
- OCR-ed texts (dirty, clean)
- ABBYY training files for fonts
- ABBYY training files for journals
- ABBYY custom dictionary

GoldenGATE interactions
- Get GUIDs for names from the Hymenoptera Name Server (HNS); ZooBank?
- Add new names. Terminology follows ITIS; names are currently uploaded into the Hymenoptera Name Server and queried via HTML.
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid
- Get geographic longitude/latitude from geonames.org
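
The look-ups listed on this slide can be sketched as follows. This is a minimal illustration and not GoldenGATE's implementation: the two dictionaries stand in for a name server and a gazetteer, the identifiers and coordinates are made up, and the element and attribute names (treatment, name, place, guid) are illustrative rather than taken from the TaxonX schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a name server: scientific name -> identifier.
# The identifiers are invented for this example.
NAME_SERVER = {
    "Tetramorium caespitum": "urn:example:name:0001",
    "Formica rufa": "urn:example:name:0002",
}

# Hypothetical stand-in for a gazetteer: place name -> (latitude, longitude).
# The coordinates are rounded, illustrative values.
GAZETTEER = {
    "Antananarivo": (-18.88, 47.51),
}


def mark_up(text):
    """Wrap known names and places in elements that carry their identifiers."""
    root = ET.Element("treatment")
    known = {name: "name" for name in NAME_SERVER}
    known.update({place: "place" for place in GAZETTEER})
    # Try longer terms first so they win over shorter overlapping ones.
    terms = sorted(known, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))

    def add_plain(last, plain):
        # Plain text goes into root.text before the first child,
        # and into the tail of the latest child afterwards.
        if last is None:
            root.text = (root.text or "") + plain
        else:
            last.tail = (last.tail or "") + plain

    pos, last = 0, None
    for match in pattern.finditer(text):
        add_plain(last, text[pos:match.start()])
        term = match.group(0)
        if known[term] == "name":
            last = ET.SubElement(root, "name", guid=NAME_SERVER[term])
        else:
            lat, lon = GAZETTEER[term]
            last = ET.SubElement(root, "place", lat=str(lat), long=str(lon))
        last.text = term
        pos = match.end()
    add_plain(last, text[pos:])
    return root


if __name__ == "__main__":
    doc = mark_up("Tetramorium caespitum was collected near Antananarivo.")
    print(ET.tostring(doc, encoding="unicode"))
```

In a real workflow the dictionaries would be replaced by queries to the respective services; keeping them as plain mappings here makes the enrichment step itself easy to follow.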

Products (1): documents (PDF, HTML via XSLT, XML)
- Get one with PDF, XML
- PDF (original or scanned)
- HTML via XSLT
- XML (TaxonX)
- All documents carry GUIDs: at a minimum for names and MODS records; at most also for bibliographic references, specimens and localities
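
To illustrate how one marked-up XML source can yield an HTML rendering, here is a small sketch. It is a stand-in under stated assumptions: it uses Python's ElementTree instead of an XSLT stylesheet, and the element names and the GUID are hypothetical, not TaxonX.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical marked-up fragment; tag names and the GUID are invented.
SAMPLE = (
    '<treatment>'
    '<name guid="urn:example:name:0001">Tetramorium caespitum</name>'
    ' was collected near Antananarivo.'
    '</treatment>'
)


def to_html(xml_text):
    """Render a flat marked-up fragment as one HTML paragraph."""
    root = ET.fromstring(xml_text)
    parts = [escape(root.text or "")]
    for child in root:
        label = escape(child.text or "")
        guid = child.get("guid")
        if child.tag == "name" and guid:
            # Names are italicised and keep their identifier as a tooltip.
            parts.append('<i title="%s">%s</i>' % (escape(guid), label))
        else:
            parts.append(label)
        # Text that follows the element up to the next element.
        parts.append(escape(child.tail or ""))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    print(to_html(SAMPLE))
```

The point of the sketch is that the HTML is derived from the XML rather than maintained separately, so the identifiers attached during mark-up survive into the human-readable view.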

Products (2): Search and Retrieval Server

Search and Retrieval Server: output

Products: what content do we have in store?
- Gold standard: 120+ taxonomic publications from Madagascar, ranging from (70% completed) (vertical)
- Recent publications continually added (horizontal standard)
- A series of publications describing elements of TaxonX, GoldenGATE and name-finding algorithms (FindIT, FAT), and comparing approaches
- A growing library of training files for ABBYY and analyzers for GoldenGATE

Additional products
- Training course on literature mark-up to get the community involved
- Creating a Neotropical catalogue of the ants using the mark-up approach
- Development of metrics to measure mark-up production, to optimize output for users (ecologists, taxonomists, etc.)

Time (in minutes) to produce clean OCR using ABBYY; publications in chronological order.
Producing metrics to measure effort and compare various approaches and algorithms.

Time used to mark up documents in TaxonX compared with the number of pages per volume, in chronological order.
Producing metrics to measure effort and compare various approaches and algorithms to mark up documents.
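
One way such an effort metric could be computed, assuming effort is logged as minutes and pages per publication. The Effort record and all figures below are hypothetical and are not data from the project.

```python
from dataclasses import dataclass


@dataclass
class Effort:
    """Hypothetical log entry for one processed publication."""
    year: int        # year of publication
    pages: int       # number of pages processed
    minutes: float   # time spent on OCR clean-up or mark-up


def minutes_per_page(records):
    """Return (year, minutes per page) pairs in chronological order."""
    ordered = sorted(records, key=lambda r: r.year)
    # Skip entries without pages to avoid dividing by zero.
    return [(r.year, r.minutes / r.pages) for r in ordered if r.pages > 0]


# Made-up figures for illustration only.
records = [
    Effort(year=1950, pages=20, minutes=90.0),
    Effort(year=1900, pages=10, minutes=80.0),
    Effort(year=2000, pages=40, minutes=60.0),
]

if __name__ == "__main__":
    for year, rate in minutes_per_page(records):
        print(year, round(rate, 2))
```

Normalizing by page count makes publications of different lengths comparable, which is what allows approaches or algorithms to be compared across a chronological series.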

Additional products
- Experience: mark-up is expensive.

How best to invest in the digitization of legacy publications?
[Figure: value for scientists versus costs. Labels shown: pdf, print, print + catalogue, image, OCR (dirty, clean), pdf/OCR, struct. XML, semant. XML, s-XML, linked database. Mark-up levels: names marked up, treatments marked up, finer-grained mark-up.]

... to the future of publication: publication as a version control instrument
[Figure: publication workflow. Manuscript stages: analysis and manuscript preparation; manuscript submission ("Taxon-x-version"); new manuscript alert; posting for review; edited manuscript; revised manuscript; accepted manuscript; new taxon alert. Outputs: publication as PDF, as hard copy, and as publication database ("Taxon-x-version"). Linked resources: ontology, bibliography, ZooBank / NS, legacy publications, and databases for taxa, characters, specimens, descriptions, distributions, character matrices and phylogenetic trees, plus character-state, specimen and habitat images. New data feed back into the cycle.]