Lucas Mak and Dao Rong Gong, Michigan State University
Millennium and XML: Repurposing and Customizing Metadata
May 2009
Today’s Outline
–Overview of Metadata
–Millennium System and XML
–Overview of XSLT
–Case Studies: 1. Sunday School Books Collection; 2. New Book List
–Conclusions and Observations
Metadata
Structured data or information about an information resource.
Types of metadata:
–Descriptive
–Administrative/Rights
–Preservation
–Technical
–Structural
Descriptive Metadata
Popular descriptive metadata standards:
–Dublin Core (Simple & Qualified)
–MODS
–MARCXML
–VRA Core
–IEEE LOM
–TEI Header
–EAD
Innovative XML
XML records from Millennium
–Retrieved through an HTTP query
–Data arranged by MARC field, but a MARC field and its subfields are siblings
–Optimized for WebPAC display:
»Brief record (for the search-results index page): data from MARC 245, publication year, and record ID
»Full record (for both the public and the staff MARC display of an individual record)
[Screenshots: public display; staff MARC display]
Millennium System and XML
[Diagram: Millennium exports delimited text, MARC, and XML; components shown: /xrecord, XML Server, OAI Harvester, Metadata Builder, Content Pro]
/xrecord
XML Server
Query string for a title search on “xslt”: txslt
OAI Harvester
MetaData Builder
Content Pro in Encore
XSLT
Extensible Stylesheet Language Transformations
Current version (as of 2009): 2.0
“Transformation” means:
–Manipulating an XML document by creating a new document based on the original
Usages in a library context:
–Crosswalking: data selection and manipulation
–Web display, e.g. converting EAD into HTML
XSLT
Uses XPath expressions to select/filter data nodes:
–By element name
–By the value of an element and/or attribute
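To make the three selection modes concrete, here is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` in place of an XSLT processor. ElementTree implements only a subset of XPath 1.0 (the `[.='text']` predicate needs Python 3.7 or later), so this illustrates the idea rather than XSLT itself. The record is made-up sample data in the MARCXML namespace, and `select_nodes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace; the record below is made-up sample data.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title :</subfield>
    <subfield code="b">a subtitle.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Example subject</subfield>
  </datafield>
</record>"""


def select_nodes(xml_text):
    root = ET.fromstring(xml_text)
    # By element name: every datafield child of the record
    fields = root.findall("marc:datafield", NS)
    # By attribute value: $a of the field whose tag attribute is 245
    title = root.findtext(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", namespaces=NS
    )
    # By element value: subfields whose string value equals the given text
    matches = root.findall(".//marc:subfield[.='Example subject']", NS)
    return len(fields), title, [m.get("code") for m in matches]


print(select_nodes(SAMPLE))  # (2, 'An example title :', ['a'])
```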
Case Study One
Sunday School Books Collection
–19th-century publications by religious societies
–170 titles digitized and cataloged
Data conversion needs
–Source: Millennium
–Target: Content Pro
–Conversions in:
»Format: .marc to XML
»Schema and data structure: MARC to Qualified Dublin Core
Options for Data Migration
[Diagram: paths from Millennium to Content Pro (QDC): Create Lists → MARC file → MARCEdit → MARC XML → XSLT; or HTTP query → Innovative XML → XSLT]
Segment of Innovative XML
–Siblings: a MARC field and its subfields are sibling elements
–MARC field tag and subfield codes stored as element values
–Field indicators stored as element values
Segment of MARC21XML
–Parent-child: subfields are nested inside their field
–MARC field tag and subfield codes stored as attribute values
–Field indicators stored as attribute values
Issues with Innovative XML
Data conversion needs
–Data structured differently from MARC21XML
»Is there an existing “Innovative XML to DC/QDC” XSLT?
–Not optimized for data manipulation
»Complications in data selection: a node is selected by matching criteria against the values of individual elements, and a series of matches may be needed to select just one node
»Processing efficiency: data selection involves repeated upward, downward, and lateral movement
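The selection problem can be illustrated with a small sketch, again using Python's standard library rather than XSLT. The "sibling" layout below is a hypothetical simplification (its element names are invented and are not Innovative's actual schema); the MARCXML layout follows the MARC 21 slim namespace. Pulling 245$b from the sibling layout takes two value matches plus a lateral step, while MARCXML needs one path with attribute predicates.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical flat layout: tag, indicators and subfield code/data pairs
# are all siblings inside <field>. Element names are invented.
SIBLING_XML = """<record>
  <field><tag>100</tag><ind>1 </ind><code>a</code><data>Doe, Jane.</data></field>
  <field><tag>245</tag><ind>10</ind>
    <code>a</code><data>An example title :</data>
    <code>b</code><data>a subtitle.</data></field>
</record>"""

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane.</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title :</subfield>
    <subfield code="b">a subtitle.</subfield></datafield>
</record>"""


def subfield_from_siblings(xml_text, tag, code):
    root = ET.fromstring(xml_text)
    for field in root.findall("field"):
        if field.findtext("tag") != tag:  # match 1: field tag by element value
            continue
        children = list(field)
        for i, child in enumerate(children[:-1]):
            # match 2: subfield code by element value, then a lateral
            # step to the following sibling that holds the data
            if child.tag == "code" and child.text == code:
                return children[i + 1].text
    return None


def subfield_from_marcxml(xml_text, tag, code):
    root = ET.fromstring(xml_text)
    # One path: parent-child nesting plus attribute predicates
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return root.findtext(path, namespaces=NS)


print(subfield_from_siblings(SIBLING_XML, "245", "b"))  # a subtitle.
print(subfield_from_marcxml(MARCXML, "245", "b"))       # a subtitle.
```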
Final Path of Data Migration
[Diagram: Millennium → Create Lists → MARC file (.marc) → MARCEdit → MARC XML → XSLT → Content Pro (QDC)]
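MARCEdit performed the .marc-to-MARCXML step in this workflow. As a rough picture of what that conversion involves, the sketch below (hypothetical helper names; not MARCEdit's code) reads the ISO 2709 leader, walks the 12-byte directory entries, and emits MARCXML elements. It assumes UTF-8 encoded records; MARC-8 records would need transcoding, which is omitted.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def q(name):
    """Qualify a local name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def iso2709_to_marcxml(raw: bytes) -> ET.Element:
    """Convert one binary MARC 21 record (ISO 2709) to a MARCXML <record>.
    Assumes UTF-8 data; MARC-8 records would need transcoding first."""
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])        # leader/12-16: start of data
    directory = raw[24:base_address - 1]     # minus the directory terminator
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for i in range(0, len(directory), 12):   # entry: tag(3) length(4) start(5)
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        offset = base_address + start
        data = raw[offset:offset + length].rstrip(FIELD_TERMINATOR)
        if tag.startswith("00"):             # control fields 001-009
            ET.SubElement(record, q("controlfield"), {"tag": tag}).text = data.decode("utf-8")
            continue
        field = ET.SubElement(
            record, q("datafield"),
            {"tag": tag, "ind1": chr(data[0]), "ind2": chr(data[1])},
        )
        # Data after the indicators starts with a delimiter: skip the empty first chunk
        for chunk in data[2:].split(SUBFIELD_DELIMITER)[1:]:
            ET.SubElement(field, q("subfield"), {"code": chr(chunk[0])}).text = chunk[1:].decode("utf-8")
    return record
```

Serializing the result with `ET.tostring(record, encoding="unicode")` gives MARCXML text that a MARC-to-DC stylesheet can consume.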
Design of XSLT
Based on LC’s “MARC To Simple DC” XSLT
–Mappings customized according to LC’s suggestions
–Crosswalking strategies:
»Conditional processing (i.e. matching): boolean(), contains(), starts-with()
»String manipulation, used in both conditional processing and selecting data for output: substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
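A minimal Python sketch of these crosswalking strategies, assuming MARCXML input. The four mappings are a small illustrative subset in the general spirit of a MARC-to-Simple-DC crosswalk, not a copy of LC's stylesheet; the wrapper element name `dc` and the helper names are hypothetical. Comments note which XPath function each Python step stands in for.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DC = "http://purl.org/dc/elements/1.1/"

# A handful of MARC-to-DC mappings, for illustration only.
MAPPING = [
    ("245", ("a", "b"), "title"),
    ("100", ("a",), "creator"),
    ("260", ("c",), "date"),
    ("650", ("a",), "subject"),
]


def normalize_space(s):
    """Same idea as XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((s or "").split())


def chop_punctuation(s):
    """Strip trailing ISBD punctuation such as ' :', ' /', ',' and '.'."""
    return normalize_space(s).rstrip(" :;,/.")


def marcxml_to_dc(record):
    dc = ET.Element("dc")  # hypothetical wrapper element
    for tag, codes, name in MAPPING:
        for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
            parts = [sf.text or "" for sf in field.findall("marc:subfield", NS)
                     if sf.get("code") in codes]
            if not parts:  # conditional processing: nothing to map
                continue
            # concat() of the selected subfields, then trailing punctuation removed
            value = chop_punctuation(" ".join(parts))
            if name == "date" and value.startswith("c"):  # starts-with(., 'c')
                value = value[1:]  # like substring-after(., 'c') for "c1850"
            ET.SubElement(dc, f"{{{DC}}}{name}").text = value
    return dc
```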
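Design of XSLT
Conditional processing and string manipulation in de-duplication: compare MARC 246 against MARC 245
<xsl:if test="not(contains($dataField245Lower, translate(substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1), $upperCase, $lowerCase)))">
  <xsl:value-of select="substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1)"/>
</xsl:if>
–Both 245 (in $dataField245Lower) and 246 (via translate()) are converted to lower case before comparing
–substring(…, 1, string-length(…) - 1) chops the trailing period; the length is taken from the normalized string so the two stay in step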
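The same 246-versus-245 de-duplication can be sketched in Python (illustrative only; the function names are hypothetical). Two deliberate differences from the stylesheet: `str.lower()` lowercases all letters, whereas `translate()` only maps the characters listed in `$upperCase`/`$lowerCase`; and the sketch removes the last character only when it is a period, whereas the XSLT drops the last character unconditionally.

```python
def normalize_space(s):
    """XPath normalize-space(): trim and collapse runs of whitespace."""
    return " ".join(s.split())


def distinct_246(title_245, titles_246):
    """Return the 246 titles that are not already contained in the 245."""
    t245 = normalize_space(title_245).lower()
    kept = []
    for raw in titles_246:
        t = normalize_space(raw)
        # Unlike the stylesheet, only chop the last character when it is a period
        if t.endswith("."):
            t = t[:-1]
        # contains(): skip the 246 if the 245 already includes it
        if t.lower() not in t245:
            kept.append(t)
    return kept


print(distinct_246("An example title : a subtitle.", ["Example title.", "Another title."]))
# ['Another title']
```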
Design of XSLT No for MARC 246
Design of XSLT
Predicates, used for data selection and de-duplication, e.g. <xsl:for-each select="… preceding-sibling::marc:…">:
–Selects LCSH only
–Selects unique 650$y only
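A hedged Python sketch of what the two predicate selections do (not the authors' XSLT; function names are hypothetical). It relies on MARC 21 facts: a 650 with second indicator 0 carries a Library of Congress Subject Heading, and $y is the chronological subdivision. Keeping only the first occurrence of each $y value is the effect an XPath test along the lines of `not(. = preceding-sibling::…)` achieves.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def lcsh_650s(record):
    """650 fields whose second indicator is 0 (LCSH), like [@ind2='0']."""
    return [f for f in record.findall("marc:datafield[@tag='650']", NS)
            if f.get("ind2") == "0"]


def unique_650y(record):
    """Each distinct $y from LCSH 650s, once, in document order."""
    seen, out = set(), []
    for field in lcsh_650s(record):
        for sf in field.findall("marc:subfield[@code='y']", NS):
            value = (sf.text or "").strip()
            if value and value not in seen:  # skip values seen in earlier fields
                seen.add(value)
                out.append(value)
    return out
```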
Design of XSLT
Hard-coding
–Inserts elements that are global to all records, e.g. the format value application/pdf
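In XSLT, hard-coding usually means writing a literal result element into the template. A Python sketch of the same idea follows; the target element `dc:format` is an assumption (the slide gives only the value `application/pdf`), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialize the namespace with the "dc" prefix

# Values shared by every record in the collection, written as constants
# instead of being selected from MARC. dc:format is an assumed target.
GLOBAL_ELEMENTS = [("format", "application/pdf")]


def add_global_elements(record):
    for name, value in GLOBAL_ELEMENTS:
        ET.SubElement(record, f"{{{DC}}}{name}").text = value
    return record


print(ET.tostring(add_global_elements(ET.Element("record")), encoding="unicode"))
```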
Segment of Source MARCXML
Segment of Output QDC XML
Case Study Two
Library’s book lists
–Issues with the featured list
Existing New Book List
–Newly cataloged books for the browse shelf
–New approach using XML and XSLT
New features design
–Sorting
–RSS feed
–Customization
New Book List Based on XML Files
Millennium XML server outputs two files:
–Entire new book list over a rolling period of time
–List of books added daily
New Book List program output:
–Book list in HTML format
–RSS feed for books added daily
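A minimal sketch of the RSS output for daily added books, with the sorting feature applied. The input layout (`books`/`book`/`title`/`call`/`link`), the channel text, and the example.org URLs are hypothetical, and the sample data is made up; the actual program used XSLT with PHP on a web server, whereas this uses Python's standard library. The output follows the basic RSS 2.0 shape: an `rss` element with `version="2.0"`, one `channel` carrying `title`, `link`, and `description`, and one `item` per book.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout for the daily list; element names are invented.
DAILY_XML = """<books>
  <book><title>Zoology basics</title><call>Call no. 2</call><link>https://example.org/b1</link></book>
  <book><title>Algebra for beginners</title><call>Call no. 1</call><link>https://example.org/b2</link></book>
</books>"""


def build_rss(books_xml, channel_link):
    books = ET.fromstring(books_xml).findall("book")
    # Sorting feature: order items by title, case-insensitively (cf. xsl:sort)
    books.sort(key=lambda b: (b.findtext("title") or "").lower())
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New Books"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Books added today"
    for book in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book.findtext("title")
        ET.SubElement(item, "link").text = book.findtext("link")
        ET.SubElement(item, "description").text = book.findtext("call")
    return ET.tostring(rss, encoding="unicode")


print(build_rss(DAILY_XML, "https://example.org/newbooks"))
```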
Path of Data Processing
[Diagram components: Millennium, EXPECT, XML output, XSLT, web server & PHP, Internet]
Design of XSLT
Putting It Together
Observations and Challenges
Millennium system and XML
–The XSLT processor within Millennium, and customizing the Innovative XML output
Using XML as a data source
–Large XML file size
XSLT and data processing
–XSLT data manipulation
–Lack of built-in functions for conditional data looping, etc.
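One common way around the large-file problem, offered as an alternative and not as what the presenters did: XSLT 1.0 processors typically load the whole document into memory, whereas a streaming parser handles one record at a time. A sketch with Python's `xml.etree.ElementTree.iterparse`, assuming a hypothetical `book`/`title` layout:

```python
import io
import xml.etree.ElementTree as ET


def stream_titles(source, record_tag="book"):
    """Yield one title per record without keeping finished records' content."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield elem.findtext("title")
            elem.clear()  # drop the record's children once it has been handled


sample = io.BytesIO(b"<books><book><title>A</title></book><book><title>B</title></book></books>")
print(list(stream_titles(sample)))
```

Clearing each finished element keeps memory roughly flat, though the root still holds (empty) references to the cleared records; for very large files the root can be cleared periodically as well.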
Thank you!