TaxonX : A mark-up schema and approach for systematics literature American Museum of Natural History and University of Karlsruhe in collaboration with.


1 TaxonX : A mark-up schema and approach for systematics literature American Museum of Natural History and University of Karlsruhe in collaboration with Ohio State University, and U-Mass.

2 TaxonX 1. Motivation & Rationale TaxonX is a W3C XML schema for encoding legacy taxonomic literature in order to: (a) create open, stable, persistent, full-text digital surrogates of taxonomic treatments; (b) identify taxonomic treatments and their major structural components to enable networked reference and citation; (c) identify lower-level textual data such as scientific names, localities, morphological characters, and bibliographic citations to facilitate their extraction by, and integration with, external applications and resources; (d) study and describe the structure of systematics publications by creating a few typical corpora of literature, such as an entire journal (e.g. AMNH Novitates), across taxa (e.g. all ant systematics papers post 1995), or faunistic (e.g. all ant systematics papers covering Madagascar from 1758 to 2006). See e.g. http://antbase.org/databases/xml_publications.htm, with links to other relevant sites of the project.
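As a rough illustration of goals (b) and (c), the sketch below shows how an external application might pull names, localities, characters and citations out of a marked-up document, one record per treatment. The element names (`treatment`, `name`, `locality`, `character`, `bibref`), the sample content and the helper functions are assumptions made for this example; they are not taken from the TaxonX schema files.

```python
import xml.etree.ElementTree as ET

# Illustrative instance only: element names and content are assumptions
# made for this sketch, not taken from the TaxonX schema files.
SAMPLE = """
<taxonx>
  <treatment id="t1">
    <nomenclature><name>Exampleus primus</name></nomenclature>
    <div type="materials">Workers from <locality>Site A</locality>,
      cited in <bibref>Author 1900</bibref>.</div>
  </treatment>
  <treatment id="t2">
    <nomenclature><name>Exampleus secundus</name></nomenclature>
    <div type="description">Head <character>reddish brown</character>.</div>
  </treatment>
</taxonx>
"""


def texts(parent, tag):
    """Return the stripped text of every descendant element named `tag`."""
    return [(el.text or "").strip() for el in parent.iter(tag)]


def extract_treatments(xml_text):
    """Build one record per treatment with its low-level marked-up data."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": t.get("id"),
            "names": texts(t, "name"),
            "localities": texts(t, "locality"),
            "characters": texts(t, "character"),
            "citations": texts(t, "bibref"),
        }
        for t in root.iter("treatment")
    ]


if __name__ == "__main__":
    for record in extract_treatments(SAMPLE):
        print(record)
```

Because each record keeps the treatment `id`, extracted data can still point back to the treatment it came from, which is the kind of reference and citation the slide has in mind.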

3 TaxonX 2. Publishing: Software Implementations Currently: a stand-alone schema which, from a development point of view, allows flexibility in modelling the breadth of taxonomic publications and their structures (see sourceforge.net/projects/taxonx). Next steps: create TaxonX-based modules/extensions in other schemata: XHTML, NCBI/NLM Journal Archiving DTD, TEI, publishers' schemata.

4 TaxonX 3. Publishing: Deployments Taxon-specific sites (e.g. ants); serials (e.g. American Museum Novitates); Biodiversity Heritage Library?

5 TaxonX 4. Consuming: Software Implementations Sites which would take advantage of being able to access publications more specifically (e.g. to pages, sections of treatments, etc.) and within the original context. Mashups: e.g. ispecies. Taxon-specific sites: e.g. antweb, Hymenoptera Name Server (HNS) / antbase.org.

6 TaxonX 5. Consuming: Deployments In prospective publication: negotiations with publishers to insert at least a taxonomy-specific schema or, if possible, modular elements.

7 TaxonX 6. Market Size: Potential Publishers Anybody interested in opening up heritage libraries (retrospective) and, prospectively, publishers who define elements in their publications, allowing not only better searches but also a definition of which parts of a publication will be open access and which will not. E.g. Zoobank is negotiating with publishers to get restricted open access to treatments (= the descriptive parts within a systematics paper).

8 TaxonX 7. Market Size: Potential Customers Based on the experience with ants (ca. 12,000 species and 4,000 publications of approx. 20 pages each, i.e. about 80,000 pages for ants alone), the estimate is 100,000,000 pages of legacy publications with taxonomic descriptions, a tremendous amount of information. If the Biodiversity Heritage Library starts, at least part of this published record will become available, and it will be far more useful if at least treatment boundaries and the respective names are marked up. Ultimately, one could envision this as an intermediary step towards extracting the treatments and storing them in more powerful structures, such as databases. All treatments are primarily linked to genetic, distributional, nomenclatorial and other data via the taxonomic name to which the treatment refers. At antbase/HNS this link is already implemented in a simple form, as a link from each citation to the PDF copy of the referring page. Future aggregators of treatments might be institutions like Zoobank, but essentially dedicated databases allowing specific applications, like ispecies, to collect the treatments and use them for specific purposes.

9 TaxonX 8. Success Factors TaxonX is a lightweight and flexible schema which should be quickly learned and may be applied to the wide variety of formatting present in legacy documents. It allows, and sometimes relies on, the use of external schemata (see the use of MODS for file-level bibliographical metadata). Loose content requirements allow instances to be encoded over time and at many levels of granularity, while maintaining validity through iterations. It contains mechanisms for semantic normalization of the data contained in treatments: see TaxonX's use of Darwin Core (soon perhaps LinneanCore, SDD, etc.) to normalize phrase-level data, and the xid element for inclusion of LSIDs, ITIS, HNS, or other external identifiers.

Contrast to TaXMLit: a heavyweight schema with c. 485 elements (TaxonX: 30); stricter content model requirements might encounter difficulties when applied to literature beyond Biologia Centrali-Americana; a heavy burden is placed on input/content creation, and it does not lend itself to an iterative/"layered" markup approach; it defines its own elements for semantic normalization rather than providing mechanisms for the use of other schemas or references to external resources.
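The iterative, layered approach and the role of the xid element can be sketched as follows: a first pass marks only the treatment and its name, and a later pass attaches an external identifier to the name without touching the rest of the text. The `source` and `identifier` attribute names, the identifier value and both helper functions are hypothetical choices made for this sketch; the slides do not specify how xid is structured.

```python
import xml.etree.ElementTree as ET

# First markup layer (illustrative): only the treatment and its name are tagged.
FIRST_PASS = (
    "<treatment><nomenclature><name>Exampleus primus</name></nomenclature>"
    "<div>Body text left unmarked in the first pass.</div></treatment>"
)


def add_identifier(treatment, name_text, source, identifier):
    """Second layer: attach an xid child to every name matching `name_text`.

    The attribute names `source` and `identifier` are assumptions.
    Returns the number of names that were annotated.
    """
    annotated = 0
    # Materialise the list first so the tree is not modified while iterating.
    for name in list(treatment.iter("name")):
        if (name.text or "").strip() == name_text:
            ET.SubElement(name, "xid", {"source": source, "identifier": identifier})
            annotated += 1
    return annotated


def identifiers_by_source(treatment):
    """Group all xid identifiers in a treatment by their source authority."""
    found = {}
    for xid in treatment.iter("xid"):
        found.setdefault(xid.get("source"), []).append(xid.get("identifier"))
    return found


if __name__ == "__main__":
    treatment = ET.fromstring(FIRST_PASS)
    add_identifier(treatment, "Exampleus primus", "HNS", "example-0001")
    print(identifiers_by_source(treatment))
    print(ET.tostring(treatment, encoding="unicode"))
```

The document parses after both passes, which mirrors the point that loose content requirements let markup be deepened over time.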

10 TaxonX 8. Success Factors ctd. A big enough corpus of accessible marked-up publications. Specific applications that make use of extracting and querying the content of treatments, e.g. "what is red and occurs in Madagascar above 1000 meters on plant Y?", that is, much more refined questions which return a list of taxa (with links to their sources) and not only a document or part of one, as is possible in amazon.com. A simpler question could be "give me a list of taxa in Y", whereby dedicated name servers would enhance
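A refined question of this kind could be answered once treatment content has been extracted into records. The sketch below is a minimal illustration in plain Python; the record fields, the placeholder taxa and sources, and the function names are all assumptions made for the example, not data or an API from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentRecord:
    """One extracted observation; all fields are hypothetical."""
    taxon: str
    source: str          # link or citation pointing back to the treatment
    country: str
    elevation_m: int
    colour: str
    host_plant: str


# Placeholder records, invented purely to exercise the query functions.
RECORDS = [
    TreatmentRecord("Exampleus primus", "Author 1900: 12", "Madagascar", 1200, "red", "plant Y"),
    TreatmentRecord("Exampleus secundus", "Author 1950: 3", "Madagascar", 800, "red", "plant Y"),
    TreatmentRecord("Exampleus tertius", "Author 2000: 40", "Madagascar", 1500, "black", "plant Y"),
    TreatmentRecord("Exampleus quartus", "Author 1980: 7", "Madagascar", 1100, "red", "plant Y"),
]


def find_taxa(records, colour, country, min_elevation_m, host_plant):
    """Return sorted (taxon, source) pairs matching every criterion."""
    hits = {
        (r.taxon, r.source)
        for r in records
        if r.colour == colour
        and r.country == country
        and r.elevation_m > min_elevation_m
        and r.host_plant == host_plant
    }
    return sorted(hits)


def taxa_in_country(records, country):
    """The simpler question: a sorted list of taxa recorded in a country."""
    return sorted({r.taxon for r in records if r.country == country})


if __name__ == "__main__":
    print(find_taxa(RECORDS, "red", "Madagascar", 1000, "plant Y"))
    print(taxa_in_country(RECORDS, "Madagascar"))
```

The result is a list of taxa paired with their sources, which is the distinction the slide draws against searches that return only documents.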

11 TaxonX 9. Hurdles to Adoption The heterogeneity and structural looseness of the data contained in legacy taxonomic treatments nevertheless defies encoding and semantic normalization by even a lightweight and flexible schema. The flexibility of the schema may present a challenge both in authoring and in profiling the encoded data for use by external applications. Dependence on external schemata requires vigilance and active maintenance of the schema, may complicate validation of instances over the long term, and involves namespace wrangling that makes authoring difficult.

12 TaxonX 10. Big Picture

13 Motivation & Rationale: Very brief introduction to motivations, i.e. what it was intended to do and why it takes the form it does. Publishing: Software Implementations: The software available (or planned) to publish data in this format. Publishing: Deployments: Who is using (or about to use) these implementations to publish data. What is the demographic? Consuming: Software Implementations: The software available (or planned) to consume data in this format. Consuming: Deployments: Who is using (or about to use) these implementations to consume data. What is the demographic? Market Size: Potential Publishers: Who could be producing data like this. Market Size: Potential Customers: Who could be consuming data like this. Success Factors: Significant factors for successful adoption. Why has it been successful? What do you think will make it successful, from an adopter's point of view? Hurdles to Adoption: Significant hurdles to adoption. What have been the major hurdles to adoption? Or what are perceived as the major hurdles? Big Picture: Where does the technology fit in the model discussed in the morning session (this obviously can't be prepared ahead of time, so a blank slide is fine). Points raised in discussion on this will form the detailed agenda for day 2.

