A Practical Introduction to XML in Libraries
Marty Kurth
NYLA, October 22, 2004
What we’ll cover
- A functional overview of XML
- One use of XML at Cornell
- How our MARC to XML converter works
- What a Dublin Core XML record looks like to our users
- Concluding thoughts about XML opportunities
A functional overview of XML (many thanks to David Ruddy for the source content of this section)
XML = Extensible Markup Language
- A markup language gives meaning to special characters or character sequences, a.k.a. markup delimiters
- In XML, markup delimiters form rules for content designation (hold that thought!)
- In XML, markup delimiters have no inherent meaning (allowing them to serve as a flexible, extensible metalanguage)
- XML uses plain text, is non-proprietary, and is platform and software independent
HTML versus XML
- HTML
  - Procedural markup
  - Rules govern display (fonts, layout)
  - Doesn’t understand content
- XML
  - Structural markup
  - Rules establish relationships among content components
  - Doesn’t control display
A brief detour into metadata: Two ways to designate content
- In MARC: $a The Big heat
- In XML: Big heat, the value, enclosed in tags that carry the name
In XML the name-value pair makes up an element
- An element has these parts:
  - Start tag
  - Element content
  - End tag
- Example element content: Goldfinches
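
As a rough illustration of the three parts, the Python sketch below builds a single element with the standard library's xml.etree.ElementTree and then splits the serialized text back into start tag, content, and end tag. The element name subject is hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# "subject" is a hypothetical element name chosen for illustration.
element = ET.Element("subject")
element.text = "Goldfinches"  # the element content (the value)

# Serialize to text: start tag, element content, end tag.
serialized = ET.tostring(element, encoding="unicode")
print(serialized)

# Split the serialized string back into its three parts.
start_tag = "<" + element.tag + ">"
end_tag = "</" + element.tag + ">"
content = serialized[len(start_tag):-len(end_tag)]
print(start_tag, content, end_tag)
```

The tag name plays the role of the name in the name-value pair, and the text between the tags is the value.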
Element rules and features
- Elements can hold data (example: Boston)
- Elements can hold other elements ad infinitum (example: one element holding both A letter to Orestes A. Brownson and Hildreth, Richard,)
- Elements must be “properly” nested
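
The nesting rule can be checked mechanically. The sketch below is a minimal example, assuming hypothetical element names (record, title, creator): it parses one properly nested record and one where the tags overlap, using Python's standard XML parser, which rejects text that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Element names (record, title, creator) are hypothetical.
WELL_NESTED = (
    "<record>"
    "<title>A letter to Orestes A. Brownson</title>"
    "<creator>Hildreth, Richard,</creator>"
    "</record>"
)
# Here </record> arrives before </title>, so the tags overlap.
BADLY_NESTED = "<record><title>A letter to Orestes A. Brownson</record></title>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An element holding other elements: list the children of <record>.
record = ET.fromstring(WELL_NESTED)
child_names = [child.tag for child in record]
print(child_names)
print(is_well_formed(WELL_NESTED), is_well_formed(BADLY_NESTED))
```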
A quick look at other XML components
- Attributes qualify elements (example element content: Caption title.)
- Document Type Definitions (DTDs) control the structure of XML documents
- XML Schemas give more control than DTDs, e.g., data types for element content
- Extensible Stylesheet Language Transformations (XSLT) stylesheets transform one XML document into another (or into HTML)
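
To show how an attribute qualifies an element, here is a small Python sketch using xml.etree.ElementTree. The element name note, the attribute name type, and its value title-source are all hypothetical and stand in for whatever vocabulary a real schema would define.

```python
import xml.etree.ElementTree as ET

# Element name "note" and attribute "type" are hypothetical.
note = ET.Element("note", attrib={"type": "title-source"})
note.text = "Caption title."

serialized = ET.tostring(note, encoding="unicode")
print(serialized)

# Read the attribute back: it qualifies the element without
# becoming part of the element content.
parsed = ET.fromstring(serialized)
print(parsed.get("type"), "->", parsed.text)
```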
What does XML allow us to do?
- Structure data with a flexible and extensible set of rules
- Share data in a non-proprietary format, especially among “incompatible” systems
- Reuse data, e.g., in different presentation formats for different purposes
One use of XML at Cornell
A local reason for moving MARC data to XML
- CUL decided to use ENCompass for access to networked resources
- ENCompass requires XML records
- Our records for e-resources are in MARC, so we needed to get them into XML
Using MARCXML
- MARCXML is lossless: it preserves the richness of the MARC record in XML
- LC offers a toolkit for converting MARC to MARCXML
- MARCXML can serve as a “bus” between MARC and other XML formats
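
To give a sense of what "lossless" means here, the sketch below reads a small hand-written MARCXML fragment (not a complete record, and not output from LC's toolkit) and recovers the tag, indicators, subfield code, and value. The indicator values are assumed for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace

# Hand-written fragment for illustration; indicator values are assumed.
MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="0" ind2="4">
    <subfield code="a">The Big heat</subfield>
  </datafield>
</record>
"""


def read_fields(xml_text):
    """Return (tag, ind1, ind2, subfield code, value) for each subfield."""
    record = ET.fromstring(xml_text.strip())
    rows = []
    for field in record.findall(f"{{{MARC_NS}}}datafield"):
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            rows.append((field.get("tag"), field.get("ind1"),
                         field.get("ind2"), sub.get("code"), sub.text))
    return rows


fields = read_fields(MARCXML)
print(fields)
```

Every piece of MARC content designation survives as an attribute or element, which is what lets MARCXML act as a starting point for other XML formats.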
The MARCXML “bus”
Adapting MARCXML tools
- We implemented LC’s converter to convert MARC to Dublin Core in XML
- We created a Web interface for system-wide access
- We extended LC’s Dublin Core XSLT stylesheet
How our MARC to XML converter works
Start with a MARC record
Import it into the converter
The converter applies our DC stylesheet
And outputs a Dublin Core XML record
Element values in the record:
- Harmonized tariff schedule of the United States
- HTS
- United States. United States International Trade Commission. Office of Tariff Affairs and Trade Agreements.
- Full text
- The Commission : [1987-
- HTSA provides the applicable tariff rates and statistical categories for all merchandise imported into the United States; it is based on the international Harmonized System, the global classification system that is used to describe most world trade in goods.
- Tariff--Law and legislation--United States--Periodicals.
- Education
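
The converter described in these slides applies an XSLT stylesheet. The sketch below is not that stylesheet; it is a Python stand-in that shows the same idea of mapping MARCXML fields to Dublin Core elements. The two-rule crosswalk (245 to title, 650 to subject), the subfield joiners, the record wrapper element, and the sample input are all simplifying assumptions.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed, heavily simplified crosswalk:
# MARC tag -> (DC element name, string used to join subfields).
CROSSWALK = {"245": ("title", " "), "650": ("subject", "--")}

# Hand-written sample input, not a complete MARC record.
MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Harmonized tariff schedule of the United States</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">The Commission</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Tariff</subfield>
    <subfield code="x">Law and legislation</subfield>
    <subfield code="z">United States</subfield>
    <subfield code="v">Periodicals.</subfield>
  </datafield>
</record>
"""


def marc_to_dc(xml_text):
    """Build a Dublin Core record from the fields named in CROSSWALK."""
    source = ET.fromstring(xml_text.strip())
    target = ET.Element("record")  # hypothetical wrapper element
    for field in source.findall(f"{{{MARC_NS}}}datafield"):
        rule = CROSSWALK.get(field.get("tag"))
        if rule is None:
            continue  # fields without a mapping rule are skipped
        dc_name, joiner = rule
        parts = [sub.text or ""
                 for sub in field.findall(f"{{{MARC_NS}}}subfield")]
        ET.SubElement(target, f"{{{DC_NS}}}{dc_name}").text = joiner.join(parts)
    return target


dc_record = marc_to_dc(MARCXML)
dc_values = [(el.tag.split("}", 1)[1], el.text) for el in dc_record]
print(ET.tostring(dc_record, encoding="unicode"))
```

A production crosswalk covers many more fields; the point here is only that each rule pairs a MARC field with a target element.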
What a DC XML record looks like to our users
The DC XML record is in our Find Databases system
Users can view the DC record in a labeled display
The DC XML is behind the labeled display
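
One way to picture the relationship between the XML and the labeled display is the Python sketch below, which turns Dublin Core elements into "Label: value" lines. It is not the Find Databases system's code; the label table, the record wrapper element, and the sample record are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical display labels keyed by Dublin Core element name.
LABELS = {"title": "Title", "description": "Description", "subject": "Subject"}

# Hand-written sample record; the <record> wrapper is an assumption.
DC_XML = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Harmonized tariff schedule of the United States</dc:title>
  <dc:subject>Tariff--Law and legislation--United States--Periodicals.</dc:subject>
</record>
"""


def labeled_display(xml_text):
    """Return one 'Label: value' line per Dublin Core element."""
    record = ET.fromstring(xml_text.strip())
    prefix = "{" + DC_NS + "}"
    lines = []
    for element in record:
        if not element.tag.startswith(prefix):
            continue  # ignore anything outside the DC namespace
        name = element.tag[len(prefix):]
        label = LABELS.get(name, name)  # fall back to the raw element name
        lines.append(f"{label}: {element.text}")
    return lines


display = labeled_display(DC_XML)
print("\n".join(display))
```

Because the labels live outside the record, the display can change without touching the underlying XML.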
Concluding thoughts about XML opportunities
When XML knocks on your door:
- You can pick up XML encoding quickly
- With a little up-front IT time and XSLT skills, you can convert MARC to XML
- With XSLT skills, you can modify user displays in XML-based delivery systems