Richard Gartner Oxford University

METS and TEI

Introduction (verbal)
METS provides a framework within which any data or metadata can be referenced or embedded. This presentation shows how easily METS and TEI can be used in tandem. The context is an image database with the full OCR’d text encoded in TEI.

Cobbett’s Parliamentary History

Incorporating TEI into METS
<fileGrp ID="modhis006-aab-TEI">
  <file GROUPID="TEI" MIMETYPE="text/xml" ADMID="modhis006-aab-001-TEI">
    <FLocat LOCTYPE="URL" xlink:href="modhis006-aab.xml"/>
  </file>
</fileGrp>

Incorporating TEI into METS
<div ID="modhis006-aab-div.1.1.1" LABEL="Half page">
  <fptr FILEID="modhis006-aab-fgrp-0001">
    <area FILEID="modhis006-aab-TEI" BEGIN="modhis006-aab-TEI.pb.1" END="modhis006-aab-TEI.pb.2"/>
  </fptr>
</div>

Incorporating TEI into METS

OCR -> TEI
TEI in Libraries level 1: the simplest level of encoding, designed for OCR texts.
One <div> element encloses the complete text.
One <p> element sits within this.
Page breaks are marked with <pb>.

OCR -> TEI (verbal)
The OCR’d text is put into a skeletal TEI file with a minimal header.
The page breaks in the file are replaced with <pb>.
A simple stylesheet assigns a sequential ID to each <pb>.
A second stylesheet adds <area> elements to the METS structural map, pointing to these <pb> elements.

Put your OCR text here!
<?xml version="1.0" encoding="utf-8"?>
<tei.2>
  <teiHeader status="new" type="text">
    <fileDesc>
      <titleStmt>
        <title>modhis006-aab OCR text</title>
      </titleStmt>
      <publicationStmt>
        <publisher>Oxford Digital Library</publisher>
      </publicationStmt>
      <sourceDesc default="NO">
        <p>OCR text from modhis006-aab</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div0 id="modhis006-aab-aaa.div.1" part="N" sample="complete" org="uniform">
        <p><!-- Put your OCR text here! --></p>
      </div0>
    </body>
  </text>
</tei.2>

Raw OCR text, with its page breaks shown as □:
□Parliamentary History. VOL. n. □
The same text with each page break replaced by <pb/>:
<pb/>Parliamentary History. VOL. n. <pb/>
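The first two steps in the notes above (putting the OCR text into the skeleton and turning page breaks into <pb>) could also be scripted. The sketch below is a minimal illustration in Python using only the standard library's xml.etree.ElementTree; it is not the Oxford Digital Library's actual workflow. It assumes that the OCR output marks page breaks with a form-feed character (the □ on the slide), and the function ocr_to_tei and its stem/idstem parameters are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: the OCR engine marks each page break with a form feed ("\f").
PAGE_BREAK = "\f"


def ocr_to_tei(ocr_text: str, stem: str, idstem: str) -> ET.Element:
    """Wrap OCR text in a skeletal TEI P4 file (TEI in Libraries level 1):
    one <div0> holding one <p>, with each page break turned into <pb/>.
    ocr_to_tei is a hypothetical helper written for this example."""
    tei = ET.Element("tei.2")
    header = ET.SubElement(tei, "teiHeader", {"status": "new", "type": "text"})
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = f"{stem} OCR text"
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "publisher").text = "Oxford Digital Library"
    source_desc = ET.SubElement(file_desc, "sourceDesc", {"default": "NO"})
    ET.SubElement(source_desc, "p").text = f"OCR text from {stem}"

    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    div0 = ET.SubElement(body, "div0", {
        "id": f"{idstem}.div.1",
        "part": "N",
        "sample": "complete",
        "org": "uniform",
    })
    p = ET.SubElement(div0, "p")

    # Text before the first break is the <p>'s own text; the text after
    # each break becomes the tail of the <pb/> that marks that break.
    first, *rest = ocr_text.split(PAGE_BREAK)
    p.text = first
    for page_text in rest:
        ET.SubElement(p, "pb").tail = page_text
    return tei


tei = ocr_to_tei("\fParliamentary History. VOL. n. \f", "modhis006-aab", "modhis006-aab-aaa")
print(ET.tostring(tei.find("text/body/div0/p"), encoding="unicode"))
```

Attaching each following chunk as the tail of a new <pb/> is how ElementTree represents mixed content, so the single <p> ends up as text interleaved with empty page-break milestones, ready for numbering.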

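<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- ID stem supplied by the caller, e.g. modhis006-aab-aaa -->
  <xsl:param name="idstem"/>
  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="pb">
    <xsl:element name="pb">
      <xsl:attribute name="id">
        <xsl:value-of select="$idstem"/>
        <xsl:text>.pb.</xsl:text>
        <xsl:number count="pb" format="1" level="any"/>
      </xsl:attribute>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Output:
<pb id="modhis006-aab-aaa.pb.1"/>Parliamentary History. VOL. n. <pb id="modhis006-aab-aaa.pb.2"/>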
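For comparison, the same sequential numbering as the stylesheet above can be sketched in Python with the standard library. root.iter("pb") walks the tree in document order, so a running counter reproduces xsl:number level="any". This is an illustration only; number_page_breaks is a hypothetical helper, not part of the presentation's workflow.

```python
import xml.etree.ElementTree as ET


def number_page_breaks(root: ET.Element, idstem: str) -> int:
    """Set id="<idstem>.pb.<n>" on every <pb>, numbering from 1 in document
    order across the whole tree (the equivalent of xsl:number level="any").
    Returns how many <pb> elements were numbered. Hypothetical helper."""
    count = 0
    for count, pb in enumerate(root.iter("pb"), start=1):
        # TEI P4 uses a plain "id" attribute; TEI P5 would use xml:id instead.
        pb.set("id", f"{idstem}.pb.{count}")
    return count


doc = ET.fromstring("<p><pb/>Parliamentary History. VOL. n. <pb/></p>")
number_page_breaks(doc, "modhis006-aab-aaa")
print(ET.tostring(doc, encoding="unicode"))
```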

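<!-- Assumed context: inside a template matching each mets:fptr; $idstem is the TEI file's ID stem -->
<xsl:variable name="currentcount">
  <xsl:number count="mets:fptr" format="1" level="any"/>
</xsl:variable>
<xsl:element name="fptr">
  <xsl:attribute name="FILEID">
    <xsl:value-of select="@FILEID"/>
  </xsl:attribute>
  <xsl:element name="area">
    <xsl:attribute name="FILEID">
      <xsl:value-of select="$idstem"/>
    </xsl:attribute>
    <xsl:attribute name="BEGIN">
      <xsl:value-of select="concat($idstem, '.pb.', $currentcount)"/>
    </xsl:attribute>
    <xsl:attribute name="END">
      <xsl:value-of select="concat($idstem, '.pb.', $currentcount + 1)"/>
    </xsl:attribute>
  </xsl:element>
</xsl:element>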

Output:
<div ID="modhis006-aab-div.1.1.1" LABEL="Half page">
  <fptr FILEID="modhis006-aab-fgrp-0001">
    <area FILEID="modhis006-aab-TEI" BEGIN="modhis006-aab-TEI.pb.1" END="modhis006-aab-TEI.pb.2"/>
  </fptr>
</div>
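The logic of the second stylesheet can likewise be sketched in Python: the n-th <fptr> gets an <area> running from page break n to page break n + 1. This minimal sketch assumes the METS namespace http://www.loc.gov/METS/ and, like the example output above, one <fptr> per page; add_tei_areas is a hypothetical name. The TEI file's ID and the <pb> ID stem are kept as separate parameters because the slides use modhis006-aab-TEI for both in the structural map but modhis006-aab-aaa for the <pb> IDs in the TEI file.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def add_tei_areas(mets_root: ET.Element, tei_file_id: str, idstem: str) -> int:
    """Give the n-th <fptr> (in document order) a child <area> that points
    into the TEI file from page break n to page break n+1, mirroring the
    stylesheet fragment above. Hypothetical helper; returns the fptr count."""
    # Materialise the list first so adding children does not disturb iteration.
    fptrs = list(mets_root.iter(f"{{{METS_NS}}}fptr"))
    for n, fptr in enumerate(fptrs, start=1):
        ET.SubElement(fptr, f"{{{METS_NS}}}area", {
            "FILEID": tei_file_id,
            "BEGIN": f"{idstem}.pb.{n}",
            "END": f"{idstem}.pb.{n + 1}",
        })
    return len(fptrs)


ET.register_namespace("mets", METS_NS)
mets = ET.fromstring(
    f'<mets xmlns="{METS_NS}"><structMap>'
    '<div ID="modhis006-aab-div.1.1.1" LABEL="Half page">'
    '<fptr FILEID="modhis006-aab-fgrp-0001"/></div>'
    '</structMap></mets>'
)
add_tei_areas(mets, "modhis006-aab-TEI", "modhis006-aab-TEI")
print(ET.tostring(mets, encoding="unicode"))
```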

Why use METS and TEI together?
Images
Overlapping hierarchies

Verbal
Images: as far as TEI P4 is concerned, TEI's image facilities are clumsy.
You have to use entity references only, not URLs, URIs, etc.
There is no way to distinguish between inline images (which these facilities were designed for) and whole-page images.
There is no scope for administrative metadata.
Overlapping hierarchies: CONCUR was the SGML mechanism for this. It was clumsy to use and is gone in XML. The various other approaches are all distinguished by their notational complexity.

Images
<figure entity="page1">
  <head>Page 1</head>
</figure>
<!-- Declared in the DTD subset; the jpeg notation must also be declared there -->
<!ENTITY page1 SYSTEM "location_of_image_file" NDATA jpeg>

Overlapping hierarchies
Some approaches used with TEI:
CONCUR (SGML)
MECS (Wittgenstein archive)
Stand-off markup: XLink mechanisms to impose markup (varying hierarchies)
TexMECS
Witt: PROLOG

Images in METS
List all variants of the image files in <fileSec>.
Each can have extensive administrative or descriptive metadata attached.
Reference them by URLs, URIs, etc., or embed them in the METS file.
The FILEID attribute in the <structMap> indicates the exact correspondence of an image to a part of the item.
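Because every link runs through FILEID values, the correspondence between a structural division and its files can be followed mechanically. A minimal Python sketch, assuming the standard METS and XLink namespaces and that each <file> locates its content with an <FLocat xlink:href>; files_for_div and the sample IDs (img1, teifile) are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def files_for_div(mets_root: ET.Element, label: str) -> list:
    """Follow FILEID references from each structMap <div> with the given LABEL
    to the matching <file ID=...> and return each file's FLocat xlink:href.
    Hypothetical helper for illustration."""
    # Index every <file> that has an ID attribute.
    files = {f.get("ID"): f for f in mets_root.iter(f"{{{METS}}}file") if f.get("ID")}
    hrefs = []
    for smap in mets_root.iter(f"{{{METS}}}structMap"):
        for div in smap.iter(f"{{{METS}}}div"):
            if div.get("LABEL") != label:
                continue
            for fptr in div.findall("mets:fptr", NS):
                # The <fptr> may carry a FILEID itself, and so may any <area> inside it.
                file_ids = [fptr.get("FILEID")]
                file_ids += [a.get("FILEID") for a in fptr.iter(f"{{{METS}}}area")]
                for file_id in file_ids:
                    file_elem = files.get(file_id)
                    if file_elem is None:
                        continue
                    flocat = file_elem.find("mets:FLocat", NS)
                    if flocat is not None:
                        hrefs.append(flocat.get(XLINK_HREF))
    return hrefs


sample = ET.fromstring(
    f'<mets xmlns="{METS}" xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<fileSec><fileGrp>'
    '<file ID="img1"><FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></file>'
    '<file ID="teifile"><FLocat LOCTYPE="URL" xlink:href="text.xml"/></file>'
    '</fileGrp></fileSec>'
    '<structMap TYPE="physical"><div LABEL="Page 1">'
    '<fptr FILEID="img1"><area FILEID="teifile" BEGIN="page1" END="page2"/></fptr>'
    '</div></structMap></mets>'
)
print(files_for_div(sample, "Page 1"))
```

Here one division resolves to both the page image and the TEI transcription, which is the pairing the Cobbett example relies on.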

Overlapping hierarchies
<structMap TYPE="physical">
  <div LABEL="Page 1">
    <fptr FILEID="image_file_for_page_1">
      <area FILEID="teifile" BEGIN="page1" END="page2"/>
    </fptr>
  </div>
</structMap>
<structMap TYPE="logical">
  <div LABEL="Chapter 1">
    <fptr FILEID="image_file_for_page_1">
      <area FILEID="teifile" BEGIN="page1" END="page23"/>
    </fptr>
  </div>
</structMap>
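Because BEGIN and END here are IDs of <pb> milestones in one flat TEI file, each structural map can carve out its own span of text without the spans having to nest inside one another. A minimal Python sketch of what such an area selects, assuming the TEI marks pages with <pb id="..."/> as in the earlier example; text_between is a hypothetical name, not a METS library API, and the three-page sample stands in for the slide's page23.

```python
import xml.etree.ElementTree as ET


def text_between(root: ET.Element, begin_id: str, end_id: str) -> str:
    """Return the character data that follows the element whose id is
    begin_id and precedes the element whose id is end_id, in document
    order. Intended for empty milestone elements such as TEI <pb/>."""
    parts = []
    inside = False

    def walk(elem: ET.Element) -> None:
        nonlocal inside
        if elem.get("id") == end_id:
            inside = False
        if elem.get("id") == begin_id:
            inside = True
        if inside and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            # Text that follows a child element is stored on the child as .tail.
            if inside and child.tail:
                parts.append(child.tail)

    walk(root)
    return "".join(parts)


tei = ET.fromstring(
    '<body><p><pb id="page1"/>Text of page 1. <pb id="page2"/>Text of page 2. '
    '<pb id="page3"/>Text of page 3.</p></body>'
)
# A page-level area and a chapter-level area over the same flat TEI text:
print(text_between(tei, "page1", "page2"))
print(text_between(tei, "page1", "page3"))
```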

Overlapping hierarchies
<structMap>
  <div LABEL="Chapter 1">
    <div LABEL="Page 1">
      <fptr FILEID="image_file_for_page_1">
        <area FILEID="teifile" BEGIN="page1" END="page2"/>
      </fptr>
    </div>
  </div>
</structMap>

More information
http://www.loc.gov/standards/mets
http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0205