Our topic is… METS and MODS to express data for digital objects

Presentation transcript:

Using METS and MODS to Create an XML Standards-based Digital Library Application
Morgan Cundiff, Network Development and MARC Standards Office, Library of Congress
Our topic is using METS and MODS to express data for digital objects; Library of Congress Presents: Music, Theater, and Dance is the digital library web application. I will be talking mainly about the first part (the data), and Nate will be talking more about the LC Presents digital library application.

Who are we?
NDMSO Technical Team: Morgan Cundiff, Nate Trail, Betsy Miller, Glenn Gardner, Corey Keith
LC Presents Team (Music Division): Karen Lund, Pat Padua, James Wolf, Paul Fraunfelter, Mike Ferrando
(Acknowledging the colleagues who worked on the project.)

XML is the lingua franca of the Web
- Increasingly used for data exchange and messaging in the business world
- Web pages are increasingly based on XHTML
- Family of technologies to leverage (XML Schema, XSLT, XPath, and XQuery)
- Software tools widely available (for storage, editing, parsing, validating, transforming, and publishing XML); much of this software is open source and free, and it is constantly and actively being improved
- Microsoft Office 2003 supports XML as a document format (WordML and ExcelML)
- “Web 2.0” heavily based on XML (AJAX, Semantic Web, Web Services, etc.)
Background perspective: using XML is the larger issue. More and more data is being stored in XML format. When XQuery becomes a W3C Recommendation, XPath/XQuery will become the way to query XML. When that occurs, a wave of new products for storing and searching XML data will quickly follow.

XML
“XML has become the de-facto standard for representing metadata descriptions of resources on the Internet.”
Jane Hunter, Working towards MetaUtopia - A Survey of Current Metadata Research
Writing about metadata for Web resources, Jane Hunter made this statement back in 2003.

The Importance of Standards
“In moving from dispersed digital collections to interoperable digital libraries, the most important activity we need to focus on is standards… most important is the wide variety of metadata standards [including] descriptive metadata… administrative metadata…, structural metadata, and terms and conditions metadata…”
Howard Besser, The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries
Speaking about standards (also in 2003), Howard Besser made this statement, and he was exactly right. We have heard various people speak against the creation of new “stovepipes”. By that they mean creating standalone applications that have no ability to communicate or “interoperate” with other applications. These are the “dispersed” collections that Besser refers to. We also hear Dr. Billington and others speak about creating a World Digital Library. The World Digital Library is a vision for digital library interoperability on a very grand scale, and it is a very noble and powerful vision. Again, I think Howard Besser was right when he said that the most important activity we need to focus on, to realize that vision, is metadata standards. And that is exactly what we are doing in our work to build digital library collections that are based on METS and the other members of the family of standards that have emerged from the digital library community.

XML in the Digital Library Community
- Family of XML data standards: METS, MODS, MIX, PREMIS, TEI, and EAD
- METS implementations: LC, OCLC, RLG, California Digital Library, Harvard, Princeton, National Library of Portugal, National Library of Wales, Indiana University, Stanford, New York University, University of Göttingen, Oxford University, etc.
- METS software tools: Harvard METS Toolkit; Harvard DRS METS Archive Tool (Dmart) for Audio Deposit; CDL 7train METS Generation Tool; MEX Authoring Tools (Das Bundesarchiv); ContentE (Biblioteca Nacional Digital, Portugal); METS Navigator (Indiana University Digital Library Program); ResCarta Metadata Creation Tool (ResCarta Foundation); etc.
- METS listserv: 530 subscribers
This is XML in our “domain”, the digital library community. Since about 2002 there has been steady and increasing “uptake” of a family of XML standards for digital library metadata. NDMSO has been very involved in this development: it is the maintenance agency for METS, MODS, MIX, MARCXML, PREMIS, and EAD.

XML Standards at LC: A little historical perspective
- 1995: first American Memory web collections released (not XML-based)
- 1998: XML 1.0 becomes a W3C Recommendation
- 2002: METS and MODS released
- 2002: Digital Audio-Visual Preservation Prototyping Project (led by Carl Fleischhauer; first use of METS, MODS, and MIX at LC)
- 2003: “Patriotic Melodies” (first use of METS and MODS in production at LC)
- 2003: Veterans History Project released
- 2004: I Hear America Singing released (since renamed LC Presents)
- 2004: Justice Blackmun Papers collection released
- 2006: National Digital Newspaper Project (LC and partners; first use of METS, MODS, MIX, and PREMIS as a repository submission package at LC; September launch)
- 2006: Ser2Dig (Digital Serials workgroup, METS for multi-volume monographs)
- 2006: Draft METS profile for “article-level” historical newspapers

What is METS? (a quick primer) METS is an XML Schema designed for the purpose of creating XML document instances that express the hierarchical structure of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. METS can, therefore, be used as a tool for modeling real world objects, such as particular document types. METS is used to express the structure of the document and the associated metadata.

What is MODS? (a quick primer) MODS is an XML Schema designed for expressing bibliographic data. MODS can be seen as an alternative to the MARC format. It is especially useful for XML-based digital library projects. MODS can be used as an “extension schema” to METS. Note to catalogers: MODS does not make you obsolete. The same knowledge and skills (mastery of cataloging rules and controlled vocabularies, subject knowledge, etc) are still necessary. It is just a different syntax (i.e. different from MARC) for making bibliographic data machine-readable. I do not know how fast the ILS world will move to XML though I think it is inevitable. In the meantime there continues to be an unfortunate divide between digital library projects and the cataloging operation here at LC. We are hoping to begin having music catalogers assist with LC Presents in the near future.

What are the 7 Sections of a METS Document? <mets> <metsHdr/> <dmdSec/> <amdSec/> <fileSec/> <structMap/> <structLink/> <behaviorSec/> </mets>
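
The section order on this slide can be sketched in a few lines of Python. This is only an illustration built on the standard library's `xml.etree.ElementTree`: the helper names `build_skeleton` and `section_names` are invented here, and the empty skeleton it produces shows the ordering only (it is not claimed to be schema-valid, since several sections need content or attributes).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level sections, in the order the slide lists them.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def build_skeleton():
    """Return a <mets:mets> root holding one empty element per section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


def section_names(root):
    """List the local names of the root's children, namespace stripped."""
    return [child.tag.split("}", 1)[1] for child in root]


skeleton = build_skeleton()
print(section_names(skeleton))
print(ET.tostring(skeleton, encoding="unicode"))
```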

The Descriptive Metadata Section with mdWrap <mets> <dmdSec> <mdWrap> <xmlData> <!-- insert data from different namespace here --> </xmlData> </mdWrap> </dmdSec> <fileSec></fileSec> <structMap></structMap> </mets>

The Descriptive Metadata Section with MODS as extension schema <mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods></mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap></mets:structMap> </mets:mets>
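
As a rough sketch of the wrapping shown on this slide, the Python below nests a MODS record inside `dmdSec`, `mdWrap`, and `xmlData`. The function name `wrap_mods` and the sample record are assumptions made for illustration; the `ID` and `MDTYPE` attributes are added because a real METS document needs them, although the slide leaves them out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods(mods, dmd_id):
    """Nest a MODS record as dmdSec > mdWrap > xmlData inside a METS root."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(root, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    # MDTYPE tells a reader which extension schema the wrapped data uses.
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(mods)
    # Empty placeholders, as on the slide.
    ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    ET.SubElement(root, f"{{{METS_NS}}}structMap")
    return root


# Hypothetical one-title record used only to exercise the function.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "Library of Congress march"

mets = wrap_mods(mods, "MODS1")
print(ET.tostring(mets, encoding="unicode"))
```

Keeping the MODS record as an ordinary element that is appended into `xmlData` means the same record can be produced by any upstream step and wrapped last.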

The Descriptive Metadata Section with MODS and relatedItem elements <mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods> <mods:relatedItem type="constituent"> <mods:relatedItem type="constituent"></mods:relatedItem> </mods:relatedItem> </mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap></mets:structMap> </mets:mets>

METS document with two hierarchies (logical and physical) <mets:mets> <mets:dmdSec> <mets:mdWrap> <mets:xmlData> <mods:mods> <mods:relatedItem> <mods:relatedItem></mods:relatedItem> </mods:relatedItem> </mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec> <mets:fileSec></mets:fileSec> <mets:structMap> <mets:div> <mets:div></mets:div> </mets:div> </mets:structMap> </mets:mets>

MODS relatedItem type="constituent" element
- The relatedItem element, a child of mods, has the same content model as mods itself (titleInfo, name, subject, physicalDescription, note, etc.)
- The relatedItem element makes it possible to create very rich analytic descriptions for contained works within a MODS record
- The relatedItem element is repeatable and can be nested recursively (thus making it possible to build a hierarchical tree structure)
- relatedItem elements make it possible to associate descriptive data with any structural element

<mods:mods>
  <mods:titleInfo>
    <mods:title>Bernstein conducts Beethoven and Mozart</mods:title>
  </mods:titleInfo>
  <mods:name>
    <mods:namePart>Bernstein, Leonard</mods:namePart>
  </mods:name>
  <mods:relatedItem type="constituent">
    <mods:titleInfo>
      <mods:title>Symphony No. 5</mods:title>
    </mods:titleInfo>
    <mods:name>
      <mods:namePart>Beethoven, Ludwig van</mods:namePart>
    </mods:name>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
Here is a simple illustration of a logical hierarchy created with relatedItem elements.
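
To make the recursion concrete, here is a small Python sketch that walks nested `relatedItem type="constituent"` elements and prints them as an indented outline. The `RECORD` string is a tidied, well-formed copy of the slide's illustration, and the function names `label` and `outline` are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Tidied copy of the slide's illustration (names omitted for brevity).
RECORD = """
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Bernstein conducts Beethoven and Mozart</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Symphony No. 5</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
"""


def label(item):
    """Return the item's own title or partName (direct titleInfo child only)."""
    for path in ("mods:titleInfo/mods:title", "mods:titleInfo/mods:partName"):
        node = item.find(path, NS)
        if node is not None:
            return node.text
    return "(untitled)"


def outline(item, depth=0):
    """Yield (depth, label) pairs for an item and its nested constituents."""
    yield depth, label(item)
    for child in item.findall("mods:relatedItem[@type='constituent']", NS):
        yield from outline(child, depth + 1)


tree = list(outline(ET.fromstring(RECORD)))
for depth, text in tree:
    print("  " * depth + text)
```

Because `relatedItem` shares the content model of `mods`, the same `label` function works at every level of the tree.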

Linking in METS Documents (XML ID/IDREF links)
[Diagram: the sections of a METS document connected by ID/IDREF links]
- DescMD: mods, relatedItem
- AdminMD: techMD (mix), sourceMD, digiprovMD, rightsMD
- fileGrp: file
- StructMap: div, fptr
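
One way to picture ID/IDREF linking is a check for references that point at nothing. The sketch below is an assumption-laden illustration, not a tool from the talk: the `SAMPLE` document and the `dangling_refs` helper are invented, and only the `DMDID`, `ADMID`, and `FILEID` attributes are examined.

```python
import xml.etree.ElementTree as ET

# Attributes that METS uses to point at an ID elsewhere in the document.
# DMDID and ADMID may hold several space-separated IDs; FILEID holds one.
REF_ATTRS = ("DMDID", "ADMID", "FILEID")

# Hypothetical document with one deliberately broken file pointer.
SAMPLE = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="MODS1"/>
  <mets:amdSec><mets:techMD ID="MIX1"/></mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FN10081" ADMID="MIX1"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div DMDID="MODS1">
      <mets:fptr FILEID="FN10081"/>
      <mets:fptr FILEID="FN_MISSING"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def dangling_refs(root):
    """Return (attribute, value) pairs that point at no ID in the document."""
    known = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in REF_ATTRS:
            for ref in el.get(attr, "").split():
                if ref not in known:
                    missing.append((attr, ref))
    return missing


print(dangling_refs(ET.fromstring(SAMPLE)))
```

A schema-aware validator performs this check as part of ID/IDREF validation; the sketch only shows what the links amount to.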

What is a METS Profile?
“METS Profiles are intended to describe a class of METS documents in sufficient detail to provide both document authors and programmers the guidance they require to create and process METS documents conforming with a particular profile.”
A profile is expressed as an XML document, and there is a schema for this purpose. The profile expresses the requirements that a METS document must satisfy. A sufficiently explicit METS Profile may be considered a “data standard”.
Note: A METS Profile is a human-readable prose document and is not intended to be “machine actionable”. You might think of the METS Profile document as something analogous to the LCRI (Library of Congress Rule Interpretations).

METS Profiles in use in LC Presents
- Sheet Music
- Musical Score (may be a score, score and parts, or a set of parts only)
- Print Material (books, pamphlets, etc.)
- Music Manuscript (score or sketches)
- Recorded Event (audio or video)
- PDF Document
- Bibliographic Record
- Photograph
- Compact Disc
- Collection

Multiple Inputs to Common Data Format
Inputs:
- New digital items
- Legacy database
- Harvest of American Memory Collection
Common format: profile-based METS/MODS object (a common data set for searching and display)
Output: Web publication (LC Presents)
One of the realities we face is that we have valuable data, especially bib data, in many different formats and systems. We also have bib data that is in no system at all: it is in an old ProCite database, a WordPerfect file, or an even older print document.

Example 1: New digital object METS musical score profile Library of Congress march / John Philip Sousa [musical score and parts] Example of “routine” METS-making

Example 2: New digital object METS Recorded Event Profile Juilliard String Quartet / Juilliard String Quartet [sound recording] Example of “routine” METS-making

Example 3: from “random” database
METS Bibliographic Record Object: DUKE ELLINGTON AND HIS ORCHESTRA (1962) [motion picture]
Example of a “database of bib data” source:
- Conversion from FileMaker Pro database to FileMaker XML dump (1 XML file)
- XSLT to 14,000 METS/MODS records
- XSL to PDF (1 file)

Example 4: American Memory Collection
METS Photograph Object: harvest of the William P. Gottlieb Collection
[Portrait of Louis Armstrong, Carnegie Hall, New York, N.Y., ca. Apr. 1947] / William P. Gottlieb [photograph]
- File of 1600 MARC records
- marc4j to XML modsCollection (1 file)
- XSLT to METS photograph profile (1600 files)
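
The last step of this conversion, one `modsCollection` file in and one METS document per record out, was done with XSLT in the project. The Python below is a stand-in sketch of the same idea, not the project's stylesheet: the two-record `COLLECTION` and the function `split_collection` are hypothetical, and the output is a bare `dmdSec` wrapper rather than a full photograph-profile object.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)

# Hypothetical two-record stand-in for the 1600-record modsCollection file.
COLLECTION = f"""
<modsCollection xmlns="{MODS_NS}">
  <mods><titleInfo><title>Portrait one</title></titleInfo></mods>
  <mods><titleInfo><title>Portrait two</title></titleInfo></mods>
</modsCollection>
"""


def split_collection(collection_xml):
    """Return one serialized METS document per MODS record in the collection."""
    collection = ET.fromstring(collection_xml)
    documents = []
    for mods in collection.findall(f"{{{MODS_NS}}}mods"):
        root = ET.Element(f"{{{METS_NS}}}mets")
        dmd_sec = ET.SubElement(root, f"{{{METS_NS}}}dmdSec", {"ID": "MODS1"})
        md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
        ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData").append(mods)
        documents.append(ET.tostring(root, encoding="unicode"))
    return documents


docs = split_collection(COLLECTION)
print(len(docs))
print(docs[0])
```

In a real batch each string would be written to its own file; returning them as a list keeps the sketch free of file-system assumptions.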

Logical (MODS) and Physical (structMap)
Logical: the mods element and its relatedItem type="otherVersion" elements (sequence of 3 nodes)
<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work (version 1)</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo><mods:title>Derivative Work 1</mods:title></mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo><mods:title>Derivative Work 2</mods:title></mods:titleInfo>
  </mods:relatedItem>
</mods:mods>
Physical: the div TYPE="photo:version" elements (corresponding sequence of 3 nodes, linked to the logical sequence by ID/IDREF)
<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:fptr FILEID="FN10090"/>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver03">
      <mets:fptr FILEID="FN1009F"/>
    </mets:div>
  </mets:div>
</mets:structMap>
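
The pairing of the two sequences can be sketched by following each version div's `DMDID` to the MODS node that carries the matching `ID`. In the Python below, the `wrapper` root element and the function name `versions` are assumptions introduced so that both fragments sit in one parseable document; the IDs and titles are taken from the slide.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

# Tidied copy of the slide's two fragments under a hypothetical wrapper.
SAMPLE = """
<wrapper xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods ID="ver01">
    <mods:titleInfo><mods:title>Original Work (version 1)</mods:title></mods:titleInfo>
    <mods:relatedItem type="otherVersion" ID="ver02">
      <mods:titleInfo><mods:title>Derivative Work 1</mods:title></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="otherVersion" ID="ver03">
      <mods:titleInfo><mods:title>Derivative Work 2</mods:title></mods:titleInfo>
    </mods:relatedItem>
  </mods:mods>
  <mets:structMap>
    <mets:div TYPE="photo:photoObject">
      <mets:div TYPE="photo:version" DMDID="ver01">
        <mets:div TYPE="photo:image"><mets:fptr FILEID="FN10081"/></mets:div>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver02"><mets:fptr FILEID="FN10090"/></mets:div>
      <mets:div TYPE="photo:version" DMDID="ver03"><mets:fptr FILEID="FN1009F"/></mets:div>
    </mets:div>
  </mets:structMap>
</wrapper>
"""


def versions(root):
    """Pair each version div's files with the title of the MODS node it names."""
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    rows = []
    for div in root.iterfind(".//mets:div[@TYPE='photo:version']", NS):
        described = by_id[div.get("DMDID")]
        title = described.findtext("mods:titleInfo/mods:title", namespaces=NS)
        # fptr may sit directly in the version div or in a nested image div.
        files = [fptr.get("FILEID") for fptr in div.iter(f"{{{NS['mets']}}}fptr")]
        rows.append((title, files))
    return rows


rows = versions(ET.fromstring(SAMPLE))
for title, files in rows:
    print(title, files)
```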

METS Profiles make possible 3 levels of validation for METS objects
- Valid XML (well-formed)
- Valid METS/MODS (XML Schema)
- Valid METS Profile
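
A minimal sketch of how the first and third levels might be checked in code follows. It rests on loud assumptions: the single profile rule (an allowed `TYPE` on the top-level div) is hypothetical and stands in for whatever a real profile requires, the function name `check` is invented, and the second level (XML Schema validation) is left to a schema-aware validator and is not attempted here.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Hypothetical profile rule: the only top-level div TYPE this profile accepts.
PROFILE_ROOT_TYPES = {"photo:photoObject"}


def check(xml_text):
    """Report well-formedness and one illustrative profile rule.

    XML Schema validation (the middle level) needs a schema-aware
    validator and is intentionally not covered by this sketch.
    """
    report = {"well_formed": False, "profile_rule": False}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return report
    report["well_formed"] = True
    top_div = root.find(f"{METS}structMap/{METS}div")
    report["profile_rule"] = (
        top_div is not None and top_div.get("TYPE") in PROFILE_ROOT_TYPES
    )
    return report


GOOD = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:structMap><mets:div TYPE="photo:photoObject"/></mets:structMap>'
    "</mets:mets>"
)
print(check(GOOD))
print(check(GOOD.replace("photo:photoObject", "book")))
print(check(GOOD.replace("</mets:mets>", "")))
```

Each level only makes sense once the one before it has passed, which is why the function returns early on a parse error.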

Aggregation Example 1 METS Collection Object http://lcweb2.loc.gov/cocoon/ihas/loc.natlib.ihas.200031146/default.html

Aggregation Example 2 MODS relatedItem type=“host” http://lcweb2.loc.gov/cocoon/ihas/search?query=%2BmemberOf:"Baseball%20sheet%20music%20collection"&start=0&view=thumbnail

Aggregation Example 3: “See also” MODS relatedItem (no type) http://memory.loc.gov/cocoon/ihas/loc.natlib.ihas.200003800/default.html

Administrative Metadata Example Use of PREMIS and MIX for digital images louis.xml

METS and MODS software tools (open source XML toolkit)
- Emacs (text editor): edit MODS
- nxml-mode (Emacs plug-in for schema-aware XML editing)
- XML Schemas for METS, MODS, MIX, PREMIS
- cygwin: bash shell command line and tools
- Saxon (XSLT transformations)
- Xerces (XML validation)
- mysql-jdbc-connector (connect to natlib MySQL database)
- SRU (retrieve records from ILS)
- Cocoon facilities to retrieve and load records, retrieve an XML version of a file system, etc.
- Ant: used to automate all of the above tasks, create pipelines of multiple tasks, and run them from Emacs

Parting thoughts: Advantages of METS/MODS-based approach
- Ability to model complex objects
- Easy to change and extend (both the data and the application)
- Use modern, non-proprietary software tools
- Leverage use of XSLT for legacy data conversion, for batch METS creation and editing, and for Web displays and behaviors
- Common syntax: XML for data creation/editing/storage/searching
- Output to HTML, PDF for display
- Easy to edit single records or selected batches of records
- Ability to validate data
- Ability to aggregate disparate data sources
- Improved ability to manage data and publish data now and …
- Well positioned for the future: new web application (Web 2.0), repository submission, cooperative project (test interoperability), provide METS for OAI harvesting …