

TaxonX : A mark-up schema and approach for systematics literature American Museum of Natural History and University of Karlsruhe in collaboration with Ohio State University, and U-Mass.

TaxonX 1. Motivation & Rationale
TaxonX is a W3C XML Schema for encoding legacy taxonomic literature in order to:
- create open, stable, persistent, full-text digital surrogates of taxonomic treatments
- identify taxonomic treatments and their major structural components to enable networked reference and citation
- identify lower-level textual data such as scientific names, localities, morphological characters, and bibliographic citations to facilitate their extraction by, and integration with, external applications and resources
- study and describe the structure of systematics publications by creating a few typical corpora of literature, such as an entire journal (e.g. AMNH Novitates), across taxa (e.g. all ant systematics papers post 1995), or faunistic (e.g. all ant systematics papers covering Madagascar from 1758 to 2006) (see e.g. with links to other relevant sites of the project)
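The goal of marking up names, localities, and citations so that external applications can extract them might be sketched as follows. This is only an illustration under stated assumptions: the namespace URI, the element names (treatment, name, locality, citation), and the taxon names are invented for the example and are not taken from the published TaxonX schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, chosen for illustration only.
NS = {"tax": "http://example.org/taxonx-sketch"}

# Made-up sample document with two marked-up treatments.
SAMPLE = """\
<tax:document xmlns:tax="http://example.org/taxonx-sketch">
  <tax:treatment id="t1">
    <tax:name>Exemplum primum</tax:name>
    <tax:locality>Madagascar</tax:locality>
    <tax:citation>Author, 1900</tax:citation>
  </tax:treatment>
  <tax:treatment id="t2">
    <tax:name>Exemplum secundum</tax:name>
    <tax:locality>Madagascar</tax:locality>
  </tax:treatment>
</tax:document>
"""


def texts(parent, tag):
    """Return the text of every direct child of `parent` with the given tag."""
    return [child.text for child in parent.findall(tag, NS)]


def extract_treatments(xml_text):
    """Collect one record per treatment with its names, localities, citations."""
    root = ET.fromstring(xml_text)
    records = []
    for treatment in root.findall("tax:treatment", NS):
        records.append({
            "id": treatment.get("id"),
            "names": texts(treatment, "tax:name"),
            "localities": texts(treatment, "tax:locality"),
            "citations": texts(treatment, "tax:citation"),
        })
    return records


records = extract_treatments(SAMPLE)
for record in records:
    print(record["id"], record["names"], record["localities"])
```

Once treatment boundaries are explicit, an external application needs no knowledge of the page layout of the original publication; it only walks the marked-up elements.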

TaxonX 2. Publishing: Software Implementations
- Currently: a stand-alone schema; from a development point of view, this allows flexibility in modelling the breadth of taxonomic publications and their structures (see sourceforge.net/projects/taxonx)
- Next steps: create TaxonX-based modules/extensions in other schemata: XHTML, NCBI/NLM Journal Archiving DTD, TEI, publishers' schemata

TaxonX 3. Publishing: Deployments
- Taxon-specific sites (e.g. ants)
- Serials (e.g. American Museum Novitates)
- Biodiversity Heritage Library?

TaxonX 4. Consuming: Software Implementations
Sites which would take advantage of being able to access publications more specifically (e.g. pages, sections of treatments, etc.) and within the original context:
- Mashups: e.g. ispecies
- Taxon-specific sites: e.g. antweb, Hymenoptera Name Server (HNS) / antbase.org

TaxonX 5. Consuming: Deployments
In prospective publication: negotiations with publishers to insert at least a taxonomy-specific schema, if possible as modular elements.

TaxonX 6. Market Size: Potential Publishers
Retrospectively, anybody interested in opening up a heritage library; prospectively, publishers who define elements in their publications. Such elements allow not only better searches, but also a definition of which parts of a publication will be open access and which will not. For example, Zoobank is negotiating with publishers to get restricted open access to treatments (= the descriptive parts of a systematics paper).

TaxonX 7. Market Size: Potential Customers
Extrapolating from the experience with ants (ca. 12,000 species and 4,000 publications of approx. 20 pages each, i.e. about 80,000 pages), legacy publications with taxonomic descriptions amount to an estimated 100,000,000 pages, a tremendous amount of information.
If the Biodiversity Heritage Library starts, at least part of this published record will become available, and it will be tremendously more useful if at least treatment boundaries and the respective names are marked up. Ultimately, one could envision this as an intermediary step towards extracting the treatments and storing them in more powerful structures, such as databases.
All the treatments are primarily linked to genetic, distributional, nomenclatorial, and other data via the taxonomic name to which the treatment refers. At antbase/HNS this link is already implemented in a simple form, as a link from each citation to the respective PDF copy of the referring page.
Future aggregators of treatments might be institutions like Zoobank, but essentially they would be dedicated databases allowing specific applications, like ispecies, to collect the treatments and use them for specific purposes.
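The page estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide; the interpretation that the 100,000,000 figure is an extrapolation beyond the ant literature is an assumption, since the slide does not show the intermediate step.

```python
# Figures quoted on the slide.
ANT_PUBLICATIONS = 4_000
PAGES_PER_PUBLICATION = 20
QUOTED_TOTAL_PAGES = 100_000_000

# Pages of ant systematics literature implied by the quoted figures.
ant_pages = ANT_PUBLICATIONS * PAGES_PER_PUBLICATION

# Factor by which the quoted total exceeds the ant literature alone
# (assumed here to reflect extrapolation to other taxa).
scale_factor = QUOTED_TOTAL_PAGES / ant_pages

print(f"ant literature: {ant_pages:,} pages")        # ant literature: 80,000 pages
print(f"quoted total is {scale_factor:,.0f}x that")  # quoted total is 1,250x that
```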

TaxonX 8. Success Factors
- TaxonX is a lightweight and flexible schema which should be quickly learned and may be applied to the wide variety of formatting present in legacy documents
- Allows, and sometimes relies on, the use of external schemata (see the use of MODS for file-level bibliographical metadata)
- Loose content requirements allow instances to be encoded over time and at many levels of granularity, while maintaining validity through iterations
- Contains mechanisms for semantic normalization of the data contained in treatments. See TaxonX's use of Darwin Core (soon perhaps LinneanCore, SDD, etc.) to normalize phrase-level data, and the xid element for inclusion of LSIDs, ITIS, HNS, or other external identifiers
Contrast to TaXMLit:
- Heavyweight schema: c. 485 elements (TaxonX: 30)
- Stricter content model requirements might encounter difficulties when applied to literature beyond Biologia Centrali Americana
- Heavy burden placed on input/content creation; does not lend itself to an iterative/"layered" markup approach
- Defines its own elements for semantic normalization rather than providing mechanisms for the use of other schemas or references to external resources
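The iterative, layered approach and the xid mechanism could look roughly like this in practice. It is a minimal sketch, not the schema's actual content model: only the xid element name comes from the slide, while the treatment and name elements, the source and identifier attributes, and the identifier value are assumptions made for the example. Well-formedness is used here as a stand-in for schema validity.

```python
import xml.etree.ElementTree as ET

# Pass 1: only the treatment boundary and the taxon name are marked up.
# Element names and the taxon name are illustrative assumptions.
PASS_1 = (
    "<treatment><name>Exemplum primum</name>"
    " was collected in leaf litter.</treatment>"
)


def add_external_id(treatment, source, identifier):
    """Pass 2: attach an external identifier to the first marked-up name.

    The attribute names `source` and `identifier` are hypothetical.
    """
    name = treatment.find("name")
    xid = ET.SubElement(name, "xid")
    xid.set("source", source)
    xid.set("identifier", identifier)
    return treatment


treatment = ET.fromstring(PASS_1)             # the first layer parses on its own
add_external_id(treatment, "HNS", "example-0001")
enriched = ET.tostring(treatment, encoding="unicode")
reparsed = ET.fromstring(enriched)            # the enriched layer still parses
print(enriched)
```

The point of the design is that the second pass only adds markup; the text and the first layer of elements are left untouched, so a document can be published after pass 1 and enriched later.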

TaxonX 8. Success Factors ctd.
- A big enough corpus of accessible marked-up publications
- Specific applications that extract and query the content of treatments, e.g. "what is red and occurs in Madagascar above 1,000 meters on plant Y?" Such much more refined questions return a list of taxa (with links to their sources) and not only a document or part of one, as is possible in amazon.com. A simpler question could be "give me a list of taxa in Y", whereby dedicated name servers would enhance
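A query of the kind described, returning taxa with links to their sources rather than whole documents, might be sketched as follows. The record fields, the taxon names, and the example.org links are all hypothetical placeholders, and the sketch assumes the treatment content has already been extracted into structured records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentRecord:
    """One extracted treatment; all fields are illustrative assumptions."""
    taxon: str
    colour: str
    country: str
    elevation_m: int
    source: str  # link back to the treatment in its original publication


# Made-up records standing in for a marked-up corpus.
RECORDS = [
    TreatmentRecord("Exemplum primum", "red", "Madagascar", 1200,
                    "https://example.org/treatment/1"),
    TreatmentRecord("Exemplum secundum", "black", "Madagascar", 1500,
                    "https://example.org/treatment/2"),
    TreatmentRecord("Exemplum tertium", "red", "Madagascar", 300,
                    "https://example.org/treatment/3"),
]


def find_taxa(records, *, colour, country, min_elevation_m):
    """Return (taxon, source link) pairs for treatments matching all criteria."""
    return [
        (r.taxon, r.source)
        for r in records
        if r.colour == colour
        and r.country == country
        and r.elevation_m >= min_elevation_m
    ]


matches = find_taxa(RECORDS, colour="red", country="Madagascar",
                    min_elevation_m=1000)
print(matches)
```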

TaxonX 9. Hurdles to Adoption
- The heterogeneity and structural looseness of the data contained in legacy taxonomic treatments nevertheless defy encoding and semantic normalization by even a lightweight and flexible schema
- The flexibility of the schema may present challenges both in authoring and in profiling the encoded data for use by external applications
- Dependence on external schemata requires vigilance and active maintenance of the schema; it may complicate validation of instances over the long term; namespace wrangling makes authoring difficult

TaxonX 10. Big Picture

- Motivation & Rationale: Very brief introduction to motivations, i.e. what it was intended to do and why it takes the form it does.
- Publishing: Software Implementations. The software available (or planned) to publish data in this format.
- Publishing: Deployments. Who is using (or about to use) these implementations to publish data. What is the demographic?
- Consuming: Software Implementations. The software available (or planned) to consume data in this format.
- Consuming: Deployments. Who is using (or about to use) these implementations to consume data. What is the demographic?
- Market Size: Potential Publishers. Who could be producing data like this.
- Market Size: Potential Customers. Who could be consuming data like this.
- Success Factors: Significant factors for successful adoption. Why has it been successful? What do you think will make it successful? From an adopter's point of view.
- Hurdles to Adoption: Significant hurdles to adoption. What have been the major hurdles to adoption? Or what are perceived as the major hurdles?
- Big Picture: Where does the technology fit in the model discussed in the morning session (this obviously can't be prepared ahead of time, so a blank slide is fine). Points raised in discussion on this will form the detailed agenda for day 2.