1
Millennium and XML: Repurposing and Customizing Metadata
Lucas Mak and Dao Rong Gong
Michigan State University
May 17–20, 2009
2
Today’s Outline
– Overview of metadata
– Millennium system and XML
– Overview of XSLT
– Case studies
  1. Sunday School Books Collection
  2. New Book List
– Conclusions and observations
3
Metadata
Structured data or information about an information resource.
Types of metadata:
– Descriptive
– Administrative/Rights
– Preservation
– Technical
– Structural
4
Descriptive Metadata
Popular descriptive metadata standards:
– Dublin Core (Simple & Qualified)
– MODS
– MARCXML
– VRA Core
– IEEE LOM
– TEI Header
– EAD
5
Innovative XML
XML records from Millennium
– Retrieved through an HTTP query
– Data arranged by MARC field
  But a MARC field and its subfields are siblings
– Optimized for WebPAC display
  Brief record (for display on the search results index page): contains data from MARC 245, the publication year, and the record ID
  Full record (for both the public and the staff MARC display of an individual record)
6
(Screenshots: public display; staff MARC display)
7
Millennium System and XML
(Diagram: Millennium outputs data as delimited text, MARC, or XML. XML components shown: /xrecord, XML Server, OAI Harvester, Metadata Builder, Content Pro.)
8
/xrecord
9
XML Server
XML server query string (search for title “xslt”):
http://magic.msu.edu/xmlopac/?xml= txslt
10
OAI Harvester
11
MetaData Builder
13
Content Pro in Encore
14
XSLT
Extensible Stylesheet Language Transformations
– Current version (as of 2009): 2.0
– “Transformation” means manipulating an XML document by creating a new document based on the original
Uses in a library context:
– Crosswalking: data selection and manipulation
– Web display, e.g. converting EAD into HTML for web display
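Python's standard library has no XSLT processor, so this sketch uses xml.etree.ElementTree to imitate what an EAD-to-HTML stylesheet does: it reads the source, selects nodes by path, and builds a separate HTML document while leaving the source untouched. The EAD fragment is a simplified, un-namespaced sample invented for illustration; real finding aids and real stylesheets are much richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD-like sample (real EAD is namespaced and richer).
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Sample Collection</unittitle>
      <unitdate>1850-1900</unitdate>
    </did>
    <scopecontent>
      <p>Letters and diaries.</p>
    </scopecontent>
  </archdesc>
</ead>
"""


def ead_to_html(ead_xml: str) -> str:
    """Build a new HTML document from an EAD source document."""
    source = ET.fromstring(ead_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    # Select data from the source by path, as an XSLT template would.
    title = source.findtext("archdesc/did/unittitle", default="")
    date = source.findtext("archdesc/did/unitdate", default="")
    ET.SubElement(body, "h1").text = f"{title} ({date})" if date else title
    for para in source.iterfind("archdesc/scopecontent/p"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


print(ead_to_html(EAD_SAMPLE))
```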
15
XSLT
Uses XPath expressions to select/filter data nodes:
– By element name
– By element and/or attribute value
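ElementTree implements a small subset of XPath that is enough to illustrate the three selection modes listed above. The record and its element names are hypothetical sample data, not a real MARC or Innovative XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record used only to illustrate the kinds of selection.
RECORD = """
<record>
  <field tag="245">Sunday school stories</field>
  <field tag="650">Sunday schools</field>
  <field tag="650">Religious literature</field>
  <note>Digitized copy</note>
</record>
"""

root = ET.fromstring(RECORD)

# 1. By element name: every <note> child.
notes = [e.text for e in root.findall("note")]

# 2. By element value: <field> elements whose text is exactly "Sunday schools".
by_value = root.findall("field[.='Sunday schools']")

# 3. By attribute value: <field> elements with tag="650".
subjects = [e.text for e in root.findall("field[@tag='650']")]

print(notes)          # ['Digitized copy']
print(len(by_value))  # 1
print(subjects)       # ['Sunday schools', 'Religious literature']
```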
16
Case Study One
Sunday School Books Collection
– 19th-century publications by religious societies
– 170 titles digitized and cataloged
Data conversion needs
– Source: Millennium
– Target: Content Pro
– Conversions in:
  Format: .marc to XML
  Schema and data structure: MARC to Qualified Dublin Core
17
Options for Data Migration
(Diagram: routes from Millennium to Content Pro (QDC). Components shown: Create Lists, MARC file, MARCEdit, MARC XML, HTTP query, Innovative XML, XSLT.)
18
Segment of Innovative XML
– MARC field and subfields are siblings
– MARC field tag/subfield code stored as the value of an element
– Field indicators stored as the value of an element
19
Segment of MARC21XML
– MARC field and subfields in a parent-child relationship
– MARC field tag/subfield code stored as the value of an element attribute
– Field indicators stored as the value of an element attribute
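Because MARCXML keeps the field tag, indicators, and subfield codes in attributes, a single path expression reaches a subfield. A minimal sketch with ElementTree and the MARC21 slim namespace; the record content is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC21 slim schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical MARCXML fragment in the parent-child layout.
MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title :</subfield>
    <subfield code="b">a subtitle.</subfield>
  </datafield>
</record>
"""

root = ET.fromstring(MARCXML)
# Tag and code are attributes, so each step is a simple downward move.
field = root.find("marc:datafield[@tag='245']", NS)
title_a = field.findtext("marc:subfield[@code='a']", namespaces=NS)
ind2 = field.get("ind2")
print(title_a, ind2)  # Sample title : 0
```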
20
Issues with Innovative XML
Data conversion needs:
– Data structured differently from MARC21XML
  Is there an existing “Innovative XML to DC/QDC” XSLT?
– Not optimized for data manipulation
  Complications in data selection
  » Data nodes are selected by matching criteria against the values of individual elements
  » A series of matches may be needed to select just one node
  Efficiency in processing
  » Data selection involves multiple upward, downward, and lateral movements
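When the field tag, subfield code, and subfield data are all element values arranged as siblings, selecting one subfield becomes a chain of value matches with sideways moves. The element names here (varfield, fieldinfo, tag, subfield, code, data) are hypothetical stand-ins that mimic such a sibling layout; they are not Innovative's actual element names.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sibling-style record: tag, codes and data are element values,
# and the field info sits beside the subfields rather than above them.
SIBLING_XML = """
<record>
  <varfield>
    <fieldinfo><tag>245</tag><ind1>1</ind1><ind2>0</ind2></fieldinfo>
    <subfield><code>a</code><data>Sample title :</data></subfield>
    <subfield><code>b</code><data>a subtitle.</data></subfield>
  </varfield>
  <varfield>
    <fieldinfo><tag>650</tag><ind1> </ind1><ind2>0</ind2></fieldinfo>
    <subfield><code>a</code><data>Sunday schools</data></subfield>
  </varfield>
</record>
"""


def select_subfield(root: ET.Element, tag: str, code: str) -> List[str]:
    """Return the data of subfield `code` in every field numbered `tag`."""
    results = []
    for varfield in root.iter("varfield"):
        # Match 1: step down into <fieldinfo> and compare the tag value.
        if varfield.findtext("fieldinfo/tag") != tag:
            continue
        # Match 2: move sideways to each sibling <subfield> and compare its code.
        for subfield in varfield.findall("subfield"):
            if subfield.findtext("code") == code:
                # Finally step to the <data> sibling of the matched <code>.
                results.append(subfield.findtext("data"))
    return results


root = ET.fromstring(SIBLING_XML)
print(select_subfield(root, "245", "a"))  # ['Sample title :']
```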
21
Final Path of Data Migration
(Diagram: Millennium (.marc) → Create Lists → MARC file → MARCEdit → MARC XML → XSLT → Content Pro (QDC))
22
Design of XSLT
Based on LC’s “MARC to Simple DC” XSLT
– Customized mappings according to LC’s suggestions
– Crosswalking strategies
  Conditional processing (i.e. matching): boolean(), contains(), starts-with()
  String manipulation, used in both conditional processing and data selection for output: substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
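The XPath 1.0 string functions named above can be emulated in a few lines of Python, which can help when testing mapping logic outside an XSLT processor. This is a sketch, not a full implementation: it accepts plain strings and integer positions only, and does not model XPath's rounding of fractional arguments or node-set conversion.

```python
import re
from typing import Optional


def normalize_space(s: str) -> str:
    """normalize-space(): collapse runs of space/tab/CR/LF and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def substring(s: str, start: int, length: Optional[int] = None) -> str:
    """substring() for integer arguments; positions are 1-based as in XPath."""
    end = None if length is None else start + length
    return "".join(ch for pos, ch in enumerate(s, start=1)
                   if pos >= start and (end is None or pos < end))


def substring_before(s: str, sep: str) -> str:
    idx = s.find(sep)
    return s[:idx] if idx >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    idx = s.find(sep)
    return s[idx + len(sep):] if idx >= 0 else ""


def translate(s: str, src: str, dst: str) -> str:
    """translate(): map src chars to dst; src chars beyond len(dst) are deleted."""
    table = {}
    for i, ch in enumerate(src):
        # Only the first occurrence of a character in src counts.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


def contains(s: str, t: str) -> bool:
    return t in s


def starts_with(s: str, t: str) -> bool:
    return s.startswith(t)


def concat(*parts: str) -> str:
    return "".join(parts)


def boolean(s: str) -> bool:
    """boolean() for a string argument: true when non-empty."""
    return len(s) > 0


title = normalize_space("  Sample   title. ")
print(substring(title, 1, len(title) - 1))   # Sample title
print(translate("bar", "abc", "ABC"))        # BAr
print(substring_before("1999/04/01", "/"))   # 1999
print(substring_after("1999/04/01", "/"))    # 04/01
```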
23
Design of XSLT
Conditional processing and string manipulation in de-duplication
– Compares MARC 246 against MARC 245
– Converts 245 and 246 to lower case before comparing
– Chops the trailing period (.)
    <xsl:if test="not(contains($dataField245Lower,
        translate(substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1),
                  $upperCase, $lowerCase)))">
      <xsl:value-of select="substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1)"/>
    </xsl:if>
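The same 246-versus-245 de-duplication can be sketched in Python. Unlike the XSLT above, which drops the last character unconditionally, this version removes it only when it is a period; the function names and sample titles are hypothetical. Note that str.lower() is Unicode-aware, whereas translate() lower-cases only the characters listed in $upperCase.

```python
from typing import List


def normalize_space(s: str) -> str:
    """Collapse internal whitespace and trim the ends."""
    return " ".join(s.split())


def strip_final_period(s: str) -> str:
    """Normalize whitespace, then drop a trailing period if there is one."""
    s = normalize_space(s)
    return s[:-1] if s.endswith(".") else s


def unique_246(title_245: str, titles_246: List[str]) -> List[str]:
    """Keep only the 246 variant titles not already contained in the 245 title."""
    title_245_lower = normalize_space(title_245).lower()
    kept = []
    for variant in titles_246:
        cleaned = strip_final_period(variant)
        # Case-insensitive containment test, mirroring contains() on lower-cased text.
        if cleaned.lower() not in title_245_lower:
            kept.append(cleaned)
    return kept


print(unique_246("The Sunday school reader : lessons for youth.",
                 ["Sunday school reader.", "Youth lessons."]))  # ['Youth lessons']
```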
24
Design of XSLT No for MARC 246
25
Design of XSLT
Predicates, used for data selection and de-duplication
    <xsl:for-each select="marc:datafield[@tag='650' and @ind2='0']
        [not(marc:subfield[@code='y'] = preceding-sibling::marc:datafield[@tag='650' and @ind2='0']/marc:subfield[@code='y'])]
        /marc:subfield[@code='y']">
      <!-- output one element per unique 650 $y here -->
    </xsl:for-each>
– [@tag='650' and @ind2='0'] selects LCSH headings only
– [not(... = preceding-sibling::...)] selects unique 650 $y values only
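An approximate Python equivalent of that predicate keeps the first occurrence of each 650 $y value from LCSH headings (second indicator 0). It is set-based, so results can differ slightly from the XPath version when one field repeats $y: the predicate drops all of a field's $y values if any one of them matches an earlier field. The sample record is invented.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record: two LCSH 650s share a $y; one non-LCSH 650 (ind2=7).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield>
    <subfield code="y">19th century.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Children's literature</subfield>
    <subfield code="y">19th century.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="7">
    <subfield code="a">Juvenile fiction</subfield>
    <subfield code="y">1800-1899</subfield>
  </datafield>
</record>"""


def unique_lcsh_650y(record: ET.Element) -> List[str]:
    """Unique 650 $y values from fields whose second indicator is 0."""
    seen = set()
    result = []
    # Chained predicates stand in for XPath's "@tag='650' and @ind2='0'".
    for field in record.findall("marc:datafield[@tag='650'][@ind2='0']", NS):
        for sub in field.findall("marc:subfield[@code='y']", NS):
            value = sub.text or ""
            if value not in seen:  # skip values already output
                seen.add(value)
                result.append(value)
    return result


print(unique_lcsh_650y(ET.fromstring(SAMPLE)))  # ['19th century.']
```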
26
Design of XSLT
Hard-coding
– Inserts elements that are common to all records (e.g. application/pdf)
27
Segment of Source MARCXML
28
Segment of Output QDC XML
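A compact sketch of the overall crosswalk, from a MARCXML record to Dublin Core elements, including a hard-coded value added to every record. The mappings (245 $a $b to title, 650 $a to subject, a fixed format of application/pdf) and the record wrapper element are assumptions for illustration, loosely following LC's MARC-to-DC mappings; they are not the authors' stylesheet.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"
NS = {"marc": MARC_NS}
ET.register_namespace("dc", DC_NS)  # serialize DC elements with the dc: prefix

# Hypothetical source record.
SOURCE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title :</subfield>
    <subfield code="b">a subtitle.</subfield>
    <subfield code="c">by an author.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield>
  </datafield>
</record>"""


def dc(name: str) -> str:
    """Clark-notation name for an element in the DC namespace."""
    return f"{{{DC_NS}}}{name}"


def marc_to_dc(record: ET.Element) -> ET.Element:
    out = ET.Element("record")  # hypothetical wrapper element
    # 245 $a and $b -> title ($c is left out)
    for field in record.findall("marc:datafield[@tag='245']", NS):
        parts = [sf.text or "" for sf in field.findall("marc:subfield", NS)
                 if sf.get("code") in ("a", "b")]
        ET.SubElement(out, dc("title")).text = " ".join(" ".join(parts).split())
    # 650 $a -> subject
    for sf in record.findall("marc:datafield[@tag='650']/marc:subfield[@code='a']", NS):
        ET.SubElement(out, dc("subject")).text = sf.text
    # Hard-coded: the same value is inserted into every output record
    ET.SubElement(out, dc("format")).text = "application/pdf"
    return out


result = marc_to_dc(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```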
29
Case Study Two
Library’s book lists
– Issues with featured list
30
Case Study Two
Existing New Book List
– Newly cataloged books for the browse shelf
– New approach using XML and XSLT
New features design
– Sorting
– RSS feed
– Customization
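In XSLT, sorting a list is normally done with xsl:sort. This Python sketch shows the same idea on hypothetical book entries: sort by any field, case-insensitively, optionally in descending order (for example newest first, since ISO YYYY-MM-DD dates sort correctly as text).

```python
from typing import Dict, List

# Hypothetical entries from a "new books" list.
BOOKS: List[Dict[str, str]] = [
    {"title": "zebra tales", "author": "Brown, Ann", "added": "2009-05-01"},
    {"title": "Apple orchards", "author": "Smith, Joe", "added": "2009-05-03"},
    {"title": "Mountain roads", "author": "Adams, Lee", "added": "2009-04-28"},
]


def sort_books(books: List[Dict[str, str]], key: str = "title",
               descending: bool = False) -> List[Dict[str, str]]:
    """Return a new list sorted on one field, ignoring case."""
    return sorted(books, key=lambda book: book[key].casefold(), reverse=descending)


print([b["title"] for b in sort_books(BOOKS)])
print([b["title"] for b in sort_books(BOOKS, key="added", descending=True)])
```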
31
New Book List Based on XML Files
Millennium XML server outputs two files:
– The entire new book list over a rolling period of time
– A list of books added daily
New Book List program output:
– Book list in HTML format
– RSS feed for daily added books
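A minimal RSS 2.0 feed for daily added books could be generated along these lines. The channel title, description, and links are placeholders, and the input dictionaries are hypothetical; the presentation does not specify how the feed itself was built.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(books: List[Dict[str, str]], channel_link: str) -> str:
    """Serialize a list of books as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = "New Books Added Today"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Books added to the catalog today"
    for book in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
        ET.SubElement(item, "description").text = book["author"]
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    [{"title": "Sample title", "author": "Doe, Jane",
      "link": "https://catalog.example.org/record/1"}],
    "https://catalog.example.org/newbooks",
)
print(feed)
```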
32
Path of Data Processing
(Diagram components: Millennium, EXPECT, XML output, XSLT, web server & PHP, Internet.)
33
Design of XSLT
36
Putting It Together
38
Observations and Challenges
Millennium system and XML
– XSLT processor within Millennium, and customizing Innovative XML output
Using XML as a data source
– Large XML file size
XSLT and data processing
– XSLT data manipulation
– Lack of built-in functions for conditional data looping, etc.
39
Thank you!
makw@mail.lib.msu.edu
gongd@msu.edu