1
Plazi: Prospects for Markup of Legacy and New Taxonomic Literature
Terry Catapano
TDWG, Fremantle, WA
October 21, 2008
2
NSF/DFG Grant (AMNH/University of Karlsruhe)
XML markup of taxonomic publications for extraction of:
  Treatments
  Scientific names
  Morphological characters
  Distribution data
  Collection locales/events
For:
  Open access
  Submission to databases
  Retrieval
  Ontology development
3
Markup Languages
  Provide a grammar to define document types
  Delineate & identify document elements (atoms) in text
  Syntax: structural relationships between elements (parent/child, cardinality, ordinality, id/idref, key/keyref)
  Beyond the PDF
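As a rough illustration of the structural relationships listed on this slide, the sketch below parses a tiny XML fragment and follows an id/idref-style link from a citation inside a treatment to the reference it points at. The element and attribute names (`document`, `treatment`, `citation`, `ref`, `reference`) are invented for the example and are not taken from TaxonX or any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: names are illustrative only, not from a real schema.
SAMPLE = """
<document>
  <treatment id="t1">
    <name>Aus bus</name>
    <section type="description">Body length 3 mm.</section>
    <citation ref="b1"/>
  </treatment>
  <reference id="b1">Smith 1900</reference>
</document>
"""

def resolve_citations(xml_text):
    """Return (treatment name, cited reference text) pairs."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id attribute (the "id" side).
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = []
    # Parent/child: each treatment is a child of the document root.
    for treatment in root.findall("treatment"):
        name = treatment.findtext("name")
        for citation in treatment.findall("citation"):
            # Follow the "idref" side to its target element, if present.
            target = by_id.get(citation.get("ref"))
            resolved.append((name, target.text if target is not None else None))
    return resolved

print(resolve_citations(SAMPLE))
```

A validating parser working from a DTD or schema would also enforce cardinality and ordering; this sketch only shows how the links can be followed once the text is marked up.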
5
TaxonX schema
GoldenGATE Editor
250 documents / 7,500 treatments
DSpace-based digital object repository (handles)
SRS
TAPIR (specimen data)
Species Profile Model/RDF (descriptive data)
6
Wildly heterogeneous
  Requires lax structuring of documents
  Need for regularization
  Requires editorial policy (reproduction: text of work or text of document)
  Defers much work of interoperability
Benefits
  Treatments + names, subsections, localities, bibliographic references
  Extraction & representation in other services
Costs
  GoldenGATE configured for testbed: 3 minutes per page
  $5 per page(?)
7
New Literature
Different markup activity
  Prospective, not retrospective
Better cost/benefit ratio?
  Strict modeling for consistent documents/data
  Increased regularization
  Increased sharing, re-use
  Decreased costs (potentially): application, QC, adoption
8
TDWG Vocabularies supply many concepts
NLM Journal Archiving and Interchange Tag Suite
  DTDs for markup of journal articles
  Archiving, Publishing, Authoring; other modules possible
  Wide adoption by publishers and aggregators; LOC
  Actively maintained
Module for taxonomic treatments in Publishing
9
Inherit generic features from existing Tag Set
  Bibliographic references
  Tables
  Linking supporting material/data (xlink)
  Linking to graphic and media objects (xlink)
Treatments
  Treatment sections
  Scientific names, geographic names, characters/states
  Specimens and other materials citations
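To make the kinds of elements listed on this slide concrete, here is a minimal sketch that pulls names, section types, places, and characters out of a marked-up treatment. The tag names (`treatment`, `taxon-name`, `treatment-sec`, `material-citation`, `geo-name`, `character`) are assumptions chosen for illustration and should not be read as the actual element names of the NLM taxonomic module.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment: tag names are illustrative assumptions.
SAMPLE = """
<article>
  <treatment>
    <nomenclature><taxon-name rank="species">Aus bus</taxon-name></nomenclature>
    <treatment-sec type="description">Head with <character>long antennae</character>.</treatment-sec>
    <treatment-sec type="materials">
      <material-citation><geo-name>Madagascar</geo-name>, 2 workers</material-citation>
    </treatment-sec>
  </treatment>
</article>
"""

def extract_records(xml_text):
    """Build one flat record per treatment from the tagged elements."""
    root = ET.fromstring(xml_text)
    records = []
    for treatment in root.iter("treatment"):
        records.append({
            "names": [n.text for n in treatment.iter("taxon-name")],
            "sections": [s.get("type") for s in treatment.iter("treatment-sec")],
            "places": [g.text for g in treatment.iter("geo-name")],
            "characters": [c.text for c in treatment.iter("character")],
        })
    return records

for record in extract_records(SAMPLE):
    print(record)
```

Because each item is delimited explicitly, extraction reduces to a few tree traversals rather than text mining, which is the cost advantage the prospective approach is aiming for.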
10
Plazi: NLM conversion of Zootaxa and PLOS One articles
Apply markup at earliest stage possible
Develop tools to assist (probably easier than for “pure” legacy literature)
Extend codes and structures to handle electronic publication
Shift from “illustrated narrative” to complex digital objects
  METS, OAI-ORE, MPEG-21/DIDL
11
[Diagram labels: Text, Materials, Description, Treatment, Image, Data, Nomenclature]
12
Linked Data
  Machines > Documents > Data
  Open documents, free data
  Reduced costs of use/re-use (e.g., SPM for EOL)
  Broaden scope of application
  Accelerate velocity of information exchange
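One way to picture the move from documents to data is to turn an extracted treatment record into subject/predicate/object statements. The sketch below does this with plain Python and serializes the result as N-Triples. The `http://example.org/` URIs, the `hasName` and `hasLocality` predicates, and the record layout are all placeholders invented for the example; they are not terms from the Species Profile Model or any TDWG vocabulary.

```python
def treatment_to_triples(treatment_uri, record, base="http://example.org/vocab/"):
    """Map a flat treatment record to (subject, predicate, literal) triples.

    The predicate names under `base` are hypothetical placeholders.
    """
    triples = []
    for name in record.get("names", []):
        triples.append((treatment_uri, base + "hasName", name))
    for place in record.get("places", []):
        triples.append((treatment_uri, base + "hasLocality", place))
    return triples

def to_ntriples(triples):
    """Serialize triples with plain-literal objects as N-Triples lines."""
    lines = []
    for subject, predicate, literal in triples:
        # Escape backslashes and double quotes inside the literal.
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)

record = {"names": ["Aus bus"], "places": ["Madagascar"]}
print(to_ntriples(treatment_to_triples("http://example.org/treatment/1", record)))
```

In practice an RDF library and an agreed vocabulary would replace the hand-built strings; the point is only that marked-up treatments can be re-expressed as data that other services can consume.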