1
NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM Journal Archiving Vocabulary
2
Why XML?
Preserves the structure of an article
Lends itself to intelligent processing
Human readable – not dependent on technology
Portable
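As a rough illustration of what "preserves structure" and "lends itself to intelligent processing" can mean in practice, the sketch below reads a tiny article with Python's standard library and reports its outline. The element names (article, front, article-title, body, sec, title, p) are assumptions chosen for this example and are not quoted from any PMC DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; element names are assumptions for illustration.
SAMPLE = """<article>
  <front><article-title>A Sample Article</article-title></front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p><p>Third paragraph.</p></sec>
  </body>
</article>"""


def outline(xml_text):
    """Return the article title and a list of (section title, paragraph count)."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-title")
    sections = [
        (sec.findtext("title"), len(sec.findall("p")))
        for sec in root.findall("body/sec")
    ]
    return title, sections


title, sections = outline(SAMPLE)
print(title)
for name, count in sections:
    print(f"{name}: {count} paragraph(s)")
```

Because the sections and paragraphs are tagged explicitly, the outline falls out of the markup itself rather than from guesses about fonts or layout.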
3
PMC Workflow
4
PubMed Central DTD History
pmc-1.dtd: the DTD currently in production.
Derived from keton.dtd and the BMC article.dtd.
Designed to be a simple DTD for online display and archiving.
Written with samples from PNAS, MBC, and BMC.
Why a new DTD?
Elements and attributes had to be added to accommodate new journals.
The DTD would quickly become cumbersome if we had to keep making changes for each new title.
The original “simplicity” of design would lead to confusing data structures as the DTD expanded.
We had moved away from standard XML practices to accommodate source SGML.
We needed an independent review.
5
The Results: pmc-2.dtd
Mulberry’s suggestions:
Create two DTDs: one for archiving, to allow us to convert data from multiple sources to our DTD; and a subset for authoring, to allow us to retain some control when publishers create articles to the DTD.
Use proven solutions such as XLink and the XHTML table standard.
Use data models to simplify the DTD.
6
Harvard E-Journal Archiving Project
The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles.
Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study.
Conclusion: yes, it is feasible, but the right DTD does not exist.
A meeting was held in April 2002 to discuss the changes needed to the PMC2 DTD to expand its range to include almost any journal.
Attendees included PMC, Mulberry Technologies, Inc. (consultant to PMC), the Mellon Foundation, the Harvard Library, and Inera (consultant to Harvard-Mellon).
7
Conclusions
1. PMC and Harvard-Mellon had different ideas about what the DTD should do. Harvard was interested in an Interchange DTD, which would allow publishers to submit in multiple formats, all of which would be valid. PMC was interested in an Archive DTD, which would be open enough to allow conversion of multiple sources into a single format.
2. If the PMC2 DTD were modularized and some pieces were added (such as the OASIS table model), many DTDs could be built using the same elements, giving both flexibility and consistency.
8
Status
The “NLM Archiving and Interchange DTD Suite” has been created and released.
Mulberry and Inera analyzed hundreds of journals across subjects to ensure that the DTD Suite was powerful enough to tag them.
The “NLM Journal Archiving DTD” and the “Journal Publishing DTD” have been created from the DTD Suite.
The Archiving DTD and the Suite were circulated through Mulberry’s and Inera’s contacts in the electronic publishing world for comments and suggestions.
Suggestions that made the DTD more usable were incorporated.
9
Journal Archiving and Interchange DTD
Purpose: to preserve the journal’s intellectual content.
Written for: ease of conversion (from other DTDs); completeness (union of current journal DTDs).
Characteristics: descriptive (tag what is there); inclusive (preserve as much tagging as possible); non-enforcing (there is no right way); almost nothing required; very little required sequence (metadata in order, little else); many large OR groups (do anything here).
10
Journal Publishing DTD
Purpose: to provide guidance in creating new journal material.
Written for: authoring article content; initial tagging of non-XML content; creating consistent structures.
Differences from the Archiving DTD: smaller (not as many elements); prescriptive (not as many choices); enforcing (there is one way to do many things); more required elements.
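The contrast between a descriptive, non-enforcing model and a prescriptive, enforcing one can be sketched with two toy checks over a list of child element names. This is only an illustration: the element names and both rules are assumptions made up for the example, and neither function reproduces a content model from the actual Archiving or Publishing DTD.

```python
# Toy content models; names and rules are illustrative assumptions, not the NLM DTDs.
ALLOWED = {"title", "p", "list", "table", "fig"}


def valid_archiving(children):
    """Descriptive model: any allowed element, in any order, nothing required."""
    return all(child in ALLOWED for child in children)


def valid_publishing(children):
    """Prescriptive model: a title first, then at least one other allowed element."""
    if not children or children[0] != "title":
        return False
    rest = children[1:]
    return len(rest) >= 1 and all(c in ALLOWED and c != "title" for c in rest)


samples = [["title", "p", "fig"], ["p", "title", "p"], []]
for sample in samples:
    print(sample, valid_archiving(sample), valid_publishing(sample))
```

The permissive check accepts all three samples, including the empty one, while the strict check accepts only the first. That is the trade-off described above: the archiving style can hold whatever the source contains, and the publishing style guarantees consistent structures.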
11
2.0 – What is Archiving?
Archiving the submitted file, or archiving the article content? This is the bone of contention in lists, author lists, etc.
In 2.0, the Archiving and Publishing DTDs will allow for both types of archiving.
Creating a third “Authoring DTD”.
12
Who Owns the Tagset? The DTDs?
Not “Open Source”.
The DTDs and Tagset are in the public domain.
NLM retains control over changes and additions to the Tagset and DTDs.
But: anyone may use them, or create a new DTD from them, without permission from NLM.
13
What’s Next? Other DTDs
Because the DTD is built as a set of DTD modules, other document types can be created (relatively) easily using the same content models.
We are building a Books DTD and planning an Online Documentation DTD.
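The idea of building several document types from one set of modules can be sketched as set composition. The module names and element names below are hypothetical, invented for this example; they are not the actual modules of the NLM DTD Suite.

```python
# Hypothetical module contents; the names are assumptions for illustration only.
MODULES = {
    "paragraph": {"p", "list"},
    "table": {"table", "thead", "tbody", "tr", "td"},
    "references": {"ref-list", "ref", "citation"},
    "journal-meta": {"journal-title", "issn"},
    "book-meta": {"book-title", "isbn"},
}


def build_doctype(*module_names):
    """Combine the element sets of the named modules into one vocabulary."""
    elements = set()
    for name in module_names:
        elements |= MODULES[name]
    return elements


journal = build_doctype("paragraph", "table", "references", "journal-meta")
book = build_doctype("paragraph", "table", "references", "book-meta")

# Elements common to both document types come from the shared modules.
shared = journal & book
print(sorted(shared))
```

In this sketch only the metadata module differs between the two document types, so paragraphs, tables, and references are tagged the same way in both.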
14
Links
PubMed Central: http://www.pubmedcentral.gov
NLM DTDs and documentation: http://dtd.nlm.nih.gov
jeffbeck@nih.gov