Millennium and XML: Repurposing and Customizing Metadata
Lucas Mak and Dao Rong Gong, Michigan State University
May 17-20, 2009



2 Today’s Outline
Overview of Metadata
Millennium system and XML
Overview of XSLT
Case Studies
  1. Sunday School Books Collection
  2. New Book List
Conclusions and Observations

3 Metadata
Structured data or information about an information resource.
Types of metadata:
–Descriptive
–Administrative/Rights
–Preservation
–Technical
–Structural

4 Descriptive Metadata
Popular descriptive metadata standards
–Dublin Core (Simple & Qualified)
–MODS
–MARCXML
–VRA Core
–IEEE LOM
–TEI Header
–EAD

5 Innovative XML
XML records from Millennium
Retrieved through HTTP query
Data arrangement based on MARC fields
–But a MARC field and its subfields are siblings
Optimized for WebPAC display
–Brief record (for search result index page display): contains data from MARC 245, publication year, and record ID
–Full record (for both public and staff MARC display of an individual record)

6 Public display Staff MARC display

7 Millennium System and XML
[Diagram: Millennium; output formats: delimited text, MARC, XML; XML services: /xrecord, XML Server, OAI Harvester, Metadata Builder; Content Pro]

8 /xrecord

9 XML Server XML server query string (search for title “xslt”): http://magic.msu.edu/xmlopac/?xml= txslt
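A minimal Python sketch of building that query URL. It assumes the space after `xml=` is an extraction artifact, that the leading `t` selects the title index (as in the `txslt` example), and that multi-word terms are percent-encoded with `urllib.parse.quote`; none of these conventions is documented on the slide, and the 2009 server may no longer respond. The function name is hypothetical.

```python
from urllib.parse import quote

# Base URL as given on the slide (a 2009 server; it may no longer respond).
XML_SERVER = "http://magic.msu.edu/xmlopac/"


def xml_server_query(index_code: str, term: str) -> str:
    """Build an XML server search URL (hypothetical helper).

    Assumptions, not confirmed by the slide: the whole search goes in one
    `xml` parameter, written as a one-letter index code ("t" = title)
    immediately followed by the term, and spaces are percent-encoded.
    """
    return XML_SERVER + "?xml=" + quote(index_code + term)


print(xml_server_query("t", "xslt"))  # http://magic.msu.edu/xmlopac/?xml=txslt
```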

10 OAI Harvester

11 MetaData Builder

12

13 Content Pro in Encore

14 XSLT
Extensible Stylesheet Language Transformations
Current version (as of 2009): 2.0
“Transformation” means:
–Manipulating an XML document by creating a new document based on the original
Usages in library context:
–Crosswalking
  Data selection and manipulation
–Web display
  Example: converting EAD into HTML for web display

15 XSLT
Uses XPath expressions to select/filter data nodes:
–By element name
–By element and/or attribute value
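The three selection styles can be tried out with the limited XPath support in Python's standard `xml.etree.ElementTree`, which accepts `[@attr='value']` and `[child='text']` predicates. This is a sketch only: the sample record is hypothetical, and ElementTree implements a small subset of XPath, not the full language an XSLT processor uses.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace (MARC 21 XML "slim" schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, for illustration only.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Sample title</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="7">
    <subfield code="a">Religious literature</subfield>
  </datafield>
</record>"""

root = ET.fromstring(RECORD)

# 1. By element name: every datafield.
by_name = root.findall("marc:datafield", NS)

# 2. By attribute value: 650 fields whose second indicator is 0 (LCSH).
by_attr = root.findall("marc:datafield[@tag='650'][@ind2='0']", NS)

# 3. By element value: datafields with a subfield whose text is "Sample title".
by_value = root.findall("marc:datafield[marc:subfield='Sample title']", NS)

print(len(by_name), len(by_attr), by_value[0].get("tag"))  # 3 1 245
```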

16 Case Study One
Sunday School Books Collection
–19th-century publications by religious societies
–170 titles digitized and cataloged
Data conversion needs
–Source: Millennium
–Target: Content Pro
–Conversions in:
  Format: .marc to XML
  Schema and data structure: MARC to Qualified Dublin Core

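17 Options for Data Migration
[Diagram: two routes from Millennium to Content Pro (QDC): Innovative XML retrieved by HTTP query, or a MARC file exported through Create Lists and converted to MARC XML with MARCEdit; XSLT performs the conversion to QDC]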

18 Segment of Innovative XML
–Siblings
–MARC field/subfield as value of element
–Field indicator as value of element

19 Segment of MARC21XML
–Parent-child
–MARC field/subfield as value of element attribute
–Field indicator as value of element attribute

20 Segment of MARC21XML
Issues with Innovative XML for data conversion needs
–Data structured differently from MARC21XML
  Is an existing “Innovative XML to DC/QDC” XSLT available?
–Not optimized for data manipulation
  Complications in data selection
  »Selecting a data node means matching criteria against the values of individual elements
  »A series of matches may be needed to select just one node
  Efficiency in processing
  »Data selection involves multiple upward, downward, and lateral movements
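To make the contrast concrete, here is a Python sketch (standard-library `xml.etree.ElementTree`) that pulls 245$a from each layout. The sibling layout uses hypothetical element names `Tag`, `Code`, and `Data` purely to mimic the structure described for Innovative XML; it is not Innovative's actual schema. In MARCXML one path with two attribute predicates suffices, while the sibling layout needs a chain of value matches plus a lateral step to the next sibling.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Parent-child layout (MARC21XML): tag and code are attributes of nested elements.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Sample author</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sample title</subfield></datafield>
</record>"""

# Sibling layout in the spirit of Innovative XML. The element names Tag, Code
# and Data are hypothetical, not Innovative's real element names.
SIBLINGS = """<record>
  <Tag>100</Tag><Code>a</Code><Data>Sample author</Data>
  <Tag>245</Tag><Code>a</Code><Data>Sample title</Data>
</record>"""


def title_from_marcxml(text):
    root = ET.fromstring(text)
    # A single downward path: both matches are attribute predicates.
    node = root.find("marc:datafield[@tag='245']/marc:subfield[@code='a']", NS)
    return None if node is None else node.text


def title_from_siblings(text):
    children = list(ET.fromstring(text))
    in_245 = False
    for i, el in enumerate(children):
        if el.tag == "Tag":
            # Match 1: a new field begins; is it the 245?
            in_245 = el.text == "245"
        elif in_245 and el.tag == "Code" and el.text == "a":
            # Match 2: subfield code; the value is the next sibling (lateral move).
            if i + 1 < len(children) and children[i + 1].tag == "Data":
                return children[i + 1].text
    return None


print(title_from_marcxml(MARCXML), "|", title_from_siblings(SIBLINGS))
```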

21 Final Path of Data Migration
[Diagram: Millennium (.marc) → Create Lists → MARC file → MARCEdit → MARC XML → XSLT → Content Pro (QDC)]

22 Design of XSLT
Based on LC’s “MARC To Simple DC” XSLT
–Customized mappings according to LC’s suggestions
–Crosswalking strategies
  Conditional processing (i.e. matching): boolean(), contains(), starts-with()
  String manipulation, used in both conditional processing and data selection for output: substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
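For readers more at home in a general-purpose language, the XPath 1.0 string functions listed above can be mirrored in Python as follows. This is an illustrative sketch, restricted to integer arguments for `substring()`; `contains()`, `starts-with()`, and `concat()` map directly onto Python's `in`, `str.startswith`, and `+`, so they are omitted.

```python
import re

# XPath 1.0 treats only these four characters as whitespace.
_XML_WS = " \t\r\n"


def normalize_space(s):
    """normalize-space(): collapse whitespace runs to one space, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(_XML_WS)


def substring(s, start, length=None):
    """substring(): 1-based positions; integer arguments only in this sketch."""
    begin = max(start, 1)
    if length is None:
        return s[begin - 1:]
    end = start + length  # first position past the wanted range
    return s[begin - 1:end - 1] if end > begin else ""


def substring_before(s, t):
    """substring-before(): text before the first t, or "" if t is absent."""
    i = s.find(t)
    return s[:i] if i >= 0 else ""


def substring_after(s, t):
    """substring-after(): text after the first t, or "" if t is absent."""
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""


def translate(s, src, dst):
    """translate(): map src[i] to dst[i]; characters of src beyond len(dst) are deleted."""
    table = {}
    for i, ch in enumerate(src):
        # Only the first occurrence of a character in src counts.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


# Examples taken from the XPath 1.0 specification.
print(substring("12345", 2, 3))             # 234
print(substring_before("1999/04/01", "/"))  # 1999
print(substring_after("1999/04/01", "/"))   # 04/01
print(translate("--aaa--", "abc-", "ABC"))  # AAA
```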

23 Design of XSLT
Conditional Processing & String Manipulation in De-duplication
<xsl:if test="not(contains($dataField245Lower, translate(substring(normalize-space(.), 1, string-length(normalize-space(.)) - 1), $upperCase, $lowerCase)))">
  <xsl:value-of select="normalize-space(substring(., 1, string-length(.) - 1))"/>
</xsl:if>
–Compares MARC 246 against MARC 245
–Converts 245 & 246 into lower case before comparing
–Chops the trailing period (.)
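The same test can be sketched in Python to show the order of operations. This is an approximation, not the authors' stylesheet: `str.lower()` stands in for `translate()` with the `$upperCase`/`$lowerCase` variables (whose exact contents are not shown), the 245 is assumed to be already lower-cased as the name `$dataField245Lower` implies, and the last character of each 246 is assumed to be a period. The function name is hypothetical.

```python
import re


def normalize_space(s):
    # XPath normalize-space(): collapse runs of XML whitespace, then trim.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


def titles_246_to_keep(title_245_lower, titles_246):
    """Return the 246 titles that are not already contained in the 245.

    title_245_lower plays the role of $dataField245Lower (assumed to be the
    245 already lower-cased). Each 246 is assumed to end in a period.
    """
    kept = []
    for raw in titles_246:
        # Test value: normalize, chop the last character, lower-case.
        key = normalize_space(raw)[:-1].lower()
        if key not in title_245_lower:
            # Output value: chop the last character, then normalize.
            kept.append(normalize_space(raw[:-1]))
    return kept


title_245 = "Sample title : a tale for young readers.".lower()
print(titles_246_to_keep(title_245, ["Sample title.", "Alternate title."]))
# ['Alternate title']
```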

24 Design of XSLT No for MARC 246

25 Design of XSLT
Predicate
Used for data selection and de-duplication
<xsl:for-each select="marc:datafield[@tag='650' and @ind2='0'][not(marc:subfield[@code='y'] = preceding-sibling::marc:datafield[@tag='650' and @ind2='0']/marc:subfield[@code='y'])]/marc:subfield[@code='y']">
–[@tag='650' and @ind2='0']: selects LCSH only
–[not(... = preceding-sibling::...)]: selects unique 650$y only
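A Python sketch of what that predicate selects, using `xml.etree.ElementTree` on MARCXML. It preserves two XPath subtleties: `preceding-sibling::` compares against every earlier LCSH 650 field, including fields that were themselves skipped, and because `=` between node-sets is true when any pair matches, a field with several `$y` values is skipped entirely if any one of them is a repeat. The function name and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def unique_lcsh_650y(record):
    """650$y values from LCSH fields (ind2='0'), following the slide's predicate."""
    seen = set()
    out = []
    for field in record.findall("marc:datafield[@tag='650'][@ind2='0']", NS):
        ys = ["".join(sf.itertext())
              for sf in field.findall("marc:subfield[@code='y']", NS)]
        # not(A = B) on node-sets: keep the field only if none of its $y
        # values equals a $y in any earlier LCSH 650 field.
        if not any(y in seen for y in ys):
            out.extend(ys)
        # preceding-sibling:: covers every earlier LCSH 650, kept or skipped.
        seen.update(ys)
    return out


# Hypothetical record for illustration.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sunday schools</subfield><subfield code="y">19th century</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Children's literature</subfield><subfield code="y">19th century</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="7">
    <subfield code="a">Tracts</subfield><subfield code="y">1850-1899</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Tracts</subfield><subfield code="y">1850-1899</subfield>
  </datafield>
</record>"""

print(unique_lcsh_650y(ET.fromstring(RECORD)))  # ['19th century', '1850-1899']
```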

26 Design of XSLT
Hard-coding
Inserted elements that are global to all records, e.g. application/pdf
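As an illustration of hard-coding alongside mapped values, here is a toy MARCXML-to-Dublin Core crosswalk in Python. Only the Dublin Core element namespace is standard; the `record` wrapper, the choice of 245$a and 650$a, and the function name are assumptions for the sketch and do not reproduce the Content Pro QDC schema or LC's stylesheet.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DC = "http://purl.org/dc/elements/1.1/"


def to_dc(record):
    """Toy crosswalk: mapped values plus one hard-coded element."""
    out = ET.Element("record")  # hypothetical wrapper, not Content Pro's schema
    for sf in record.findall(
            "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS):
        ET.SubElement(out, f"{{{DC}}}title").text = sf.text
    for sf in record.findall(
            "marc:datafield[@tag='650']/marc:subfield[@code='a']", MARC_NS):
        ET.SubElement(out, f"{{{DC}}}subject").text = sf.text
    # Hard-coded: the same value is written for every record, regardless of
    # what the MARC source contains.
    ET.SubElement(out, f"{{{DC}}}format").text = "application/pdf"
    return out


RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Sample title</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Sunday schools</subfield></datafield>
</record>"""

ET.register_namespace("dc", DC)
print(ET.tostring(to_dc(ET.fromstring(RECORD)), encoding="unicode"))
```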

27 Segment of Source MARCXML

28 Segment of Output QDC XML

29 Case Study Two
Library’s book lists
Issues with featured list

30 Case Study Two
Existing New Book List
–Newly cataloged books for browse shelf
–New approach using XML and XSLT
New features design
–Sorting
–RSS feed
–Customization
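The sorting feature can be pictured with a small Python analogue of `xsl:sort`. The `<books>`/`<book>` layout and its child names are hypothetical placeholders, since the slides do not show the XML server's output format; sorting here is case-insensitive via `str.casefold`, a choice made for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical list layout; the real XML server output is not shown on the slides.
BOOKS = """<books>
  <book><title>beta</title><author>Smith</author></book>
  <book><title>Gamma</title><author>Adams</author></book>
  <book><title>Alpha</title></book>
</books>"""


def sort_books(xml_text, key="title", descending=False):
    """Return titles ordered by the chosen child element, ignoring case."""
    books = ET.fromstring(xml_text).findall("book")
    # A missing or empty key sorts as the empty string.
    books.sort(key=lambda b: (b.findtext(key) or "").casefold(),
               reverse=descending)
    return [b.findtext("title") for b in books]


print(sort_books(BOOKS))                # ['Alpha', 'beta', 'Gamma']
print(sort_books(BOOKS, key="author"))  # ['Alpha', 'Gamma', 'beta']
```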

31 New Book List Based on XML File
Millennium XML server outputs two files:
–Entire new book list over a rolling period of time
–List of daily added books
New Book List program output:
–Book list in HTML format
–RSS feed for daily added books
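A minimal sketch of the RSS side, assuming the daily list has already been reduced to (title, link) pairs. It emits the three channel elements RSS 2.0 requires (`title`, `link`, `description`) plus one `item` per book; the function name and the example.org links are placeholders, not the program described on the slide.

```python
import xml.etree.ElementTree as ET


def daily_rss(books, channel_title, channel_link, channel_description):
    """Build a minimal RSS 2.0 document for the books added today.

    `books` is a list of (title, link) pairs; both fields are hypothetical
    placeholders for whatever the daily XML file actually provides.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for title, link in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


print(daily_rss([("Sample title A", "http://example.org/a")],
                "New books today", "http://example.org/newbooks",
                "Books added to the catalog today"))
```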

32 Path of Data Processing
[Diagram: Millennium, EXPECT, XML output, XSLT, web server & PHP, Internet]

33 Design of XSLT

34

35

36 Putting It Together

37

38 Observations and Challenges
Millennium system and XML
–XSLT processor within Millennium and customizing Innovative XML output
Using XML as data source
–Large XML file size
XSLT and data processing
–XSLT data manipulation
–Lack of built-in functions for conditional data looping, etc.

39 Thank you! makw@mail.lib.msu.edu gongd@msu.edu

