Encoding Challenges Indiana Magazine of History Melanie Schlosser and Michelle Dalmau Digital Library Brown Bag Spring 2007 April 4, 2007.



Schlosser/Dalmau DL BB: IMH Encoding Challenges
Overview
Introduction to the IMH project
Introduction to the TEI and its use in the IMH
IMH encoding challenges:
– Text features
– Subject encoding
– TEI and serials
Ways we considered encoding
Solution: Independent Headers!
The Survey
Interoperability: Customization v. Standardization
Conclusion

Indiana Magazine of History (IMH)
Scholarly journal focusing on Midwestern and Indiana history
Continuously published since 1905 in cooperation with the Indiana Historical Society
Features peer-reviewed historical articles, research notes, annotated primary documents, reviews, and critical essays

Indiana Magazine of History (IMH)
Significant subscription base and wide readership
Supports nationwide interest in American history, covering the Old Northwest, the Midwest, and the Upland South
Broad audience: historians, genealogists, public librarians, students (secondary and post-secondary), and the general public

Indiana Magazine of History Online
Collaboration between IMH editorial staff and the IU Digital Library Program
LSTA funding to digitize and encode a 102-year run (from 1905 to 2006)
Online version is freely accessible except for the most recent two years

Indiana Magazine of History Online
Online version will support full-text and bibliographic searching and browsing, with access to page images and text
– Search by article title, author, article type, full text
– Browse by issue, place names
~41,000 pages to scan and encode (~400 per year) by a vendor
– Images: archival and derivative images, including covers and color/grayscale images when illustrations are present
– Text: OCR, encoded according to the Text Encoding Initiative (TEI), version P4
– PDF: articles converted to PDF for printing

Text Encoding Initiative (TEI)
Text Encoding Initiative (TEI) / Guidelines for Electronic Text Encoding and Interchange (TEI)
The TEI Guidelines “are addressed to anyone who works with any text in electronic form. They provide means of representing those features of a text which need to be identified explicitly in order to facilitate processing of the text by computer programs” (Sperberg-McQueen).
TEI provides elements, attributes, and other mechanisms for encoding prose, poetry, drama, dictionaries, critical apparatus, linguistic corpora, and other scholarly and non-scholarly texts.

Text Encoding Initiative (TEI)
Bibliographic Metadata:
Structure:
Content:
See guidelines for complete list of elements

TEI and DLP Projects
TEI is used in a wide range of DLP projects, with a similarly wide range of encoding levels
Indiana Authors and Their Books
– Basic markup of books using TEI Lite
Chymistry of Isaac Newton
– Scholarly encoding of manuscripts following locally developed and evolving guidelines
Swinburne Project
– Scholarly encoding of prose, poetry and critical essays by and about Algernon Charles Swinburne
IMH encoding at intermediate level

IMH Encoding Challenges: Text Features
Unusual and non-standard text features
– Historical journal, contains a variety of content types
  – Tabular data
  – Primary source materials like letters and diaries
– 102 years
  – Changed publishers
    – : Published by the editor
    – present: IU History Department and IHS
  – Changed layouts – –

IMH Encoding Challenges: Text Features
Outsourcing
– Spontaneous and iterative in-house encoding could deal with unusual features as they arise
– Since we’re outsourcing, we have to plan in advance for everything that could come up
– Have to communicate complicated decisions clearly with the vendor across time, distance and communication barriers
– May not be able to outsource work that requires textual analysis
Budget
– Limited budget necessitated selective encoding

IMH Encoding Challenges: Text Features
Solution: Detailed encoding guidelines!
– Performed text analysis to identify unusual features
  – Pulled one volume each decade and documented features
  – Performed sample encoding and validating
– Semantic
  – Worked with IMH staff to determine which features and content types were most important to users
  – Diverse readership requires range of encoding to support multiple uses
– Syntactic
  – Providing page images along with text, so not important to replicate the page layout exactly
  – Chose only the syntactic markup necessary for legibility

IMH Encoding Challenges: Text Features
Semantic features included:
– Article types
– Place names
– Letters and diaries
– Bibliographies ( )
Semantic features not included:
– Front- and back-matter, including table of contents, publishing info, and advertisements
– Personal names
  – Lack of authority control would have made searching difficult
  – Full-text searching can achieve similar results

IMH Encoding Challenges: Text Features
Syntactic features included:
– Basic structural markup (as in TEI Lite)
  – Page breaks
  – Paragraphs
  – Headers and bylines
– Lists and tables
– Blockquotes
– Footnotes ( )
Syntactic features not included:
– Poetry
– Columns
– Citations ( )
  – Same problem with authority control as names
  – Most citations can be searched in and

IMH Encoding Challenges: Subject Encoding
IMH Online Index
– Contains subject indexing
– Based on printed index
– Structurally complex and not machine-readable
– Writing Perl scripts to parse the data and extract subject terms
Subjects in the TEI
– TEI has no standard way to encode subjects. The solution we settled on is the most common.
– We will add the subject information when we receive the files from the vendor
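
The index-parsing step could look something like the sketch below. The project's scripts were written in Perl and the real index layout is not shown in these slides, so the line format ("term, volume:page; volume:page"), the function name parse_index, and the sample entries are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical index line format (NOT the real IMH index layout):
#   "Subject term, volume:page; volume:page"
ENTRY = re.compile(r"^(?P<term>[^,]+),\s*(?P<refs>.+)$")
REF = re.compile(r"(\d+):(\d+)")


def parse_index(lines):
    """Map each subject term to a list of (volume, page) references."""
    subjects = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip lines that do not look like index entries
        term = match.group("term").strip()
        for volume, page in REF.findall(match.group("refs")):
            subjects[term].append((int(volume), int(page)))
    return dict(subjects)


# Invented sample entries, for illustration only.
sample = ["Canals, 12:45; 33:210", "Quakers, 7:101", "not an entry"]
index = parse_index(sample)
for term, refs in index.items():
    print(term, refs)
```

Keeping the references as (volume, page) pairs would make it straightforward to match each subject term to the encoded article that contains that page.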

IMH Encoding Challenges: TEI and Serials
Bibliographic information in the TEI
– Bibliographic information about a TEI-encoded text is captured in the TEI header
– Each TEI.2 file can have only one header, and any given portion of text can only have one header that applies to it
Bibliographic information in the IMH
– Like most journals, articles in the IMH have two sets of bibliographic metadata: issue-level and article-level
– It also contains book review articles, consisting of multiple reviews, each with its own metadata
How to resolve this conflict?
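
A minimal sketch of the conflict, assuming a much-simplified issue-level TEI P4 file: one header at the top, with the articles as divisions in the body. The titles, the type="article" attribute value, and the check itself are illustrative assumptions, not the IMH encoding guidelines.

```python
import xml.etree.ElementTree as ET

# Simplified, invented issue-level document: a single header describes the
# whole file, so the two articles have nowhere to carry their own metadata.
ISSUE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Indiana Magazine of History, sample issue</title></titleStmt>
      <publicationStmt><p>Illustrative only</p></publicationStmt>
      <sourceDesc><p>Print issue</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="article"><head>First article</head><p>Text of article one.</p></div>
      <div type="article"><head>Second article</head><p>Text of article two.</p></div>
    </body>
  </text>
</TEI.2>"""

root = ET.fromstring(ISSUE)
headers = root.findall("teiHeader")
articles = root.findall("text/body/div")

print(f"{len(headers)} header for {len(articles)} articles")
```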

Ways We Considered Encoding…
Article-level TEI.2 documents, tied together with METS
– Pros:
  – Can capture article-level metadata “the TEI way”, in the header of the TEI.2 document
– Cons:
  – Lose the integrity of the issue
  – No way to include front- and back-matter
  – Still have the problem of book review metadata

Ways We Considered Encoding…
Article-level and issue-level with links
– Pros:
  – Allows for full description of the issue and the articles within it
  – Allows for inclusion of front- and back-matter
– Cons:
  – Still does not capture the issue as a text

Ways We Considered Encoding…
TEI Corpus
– A way to encode language corpora, which are texts (written or oral) collected for linguistic and other research. We could treat articles as ‘texts’ and issues as ‘corpora’
– Pros:
  – Allows for the grouping of multiple, discrete TEI documents into a cohesive whole
  – Considered a legitimate way to encode groups of texts
– Cons:
  – The IMH isn’t really a corpus
  – Still does not allow for front- and back-matter

Ways We Considered Encoding…
Issue-level TEI.2 documents with MODS records for article-level metadata
– Pros:
  – Allows for full description of articles and book reviews
  – MODS is more machine-readable than the TEI, so it would be easier to reuse the metadata and integrate it with other resources
– Cons:
  – Lose the TEI as the authoritative metadata source
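
For comparison, an article-level MODS record under this option might be built roughly as follows. This is a sketch only: the helper name article_mods, the choice of fields, and the sample values are assumptions, and a real record would carry much more (issue details, identifiers, subjects).

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def article_mods(title, author, start_page, end_page):
    """Build a small MODS record for one article (hypothetical helper)."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    name = ET.SubElement(mods, q("name"), type="personal")
    ET.SubElement(name, q("namePart")).text = author
    # The host item stands for the issue; the page extent locates the article.
    host = ET.SubElement(mods, q("relatedItem"), type="host")
    part = ET.SubElement(host, q("part"))
    extent = ET.SubElement(part, q("extent"), unit="pages")
    ET.SubElement(extent, q("start")).text = str(start_page)
    ET.SubElement(extent, q("end")).text = str(end_page)
    return mods


# Invented sample values, for illustration only.
record = article_mods("Sample article", "A. Author", 101, 120)
print(ET.tostring(record, encoding="unicode"))
```

One such record per article or review would sit alongside the issue-level TEI.2 file, which is why this option moves the authoritative article metadata out of the TEI.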

The Solution: Independent Headers!
What are they?
– Standalone TEI Headers, enclosed in a document-level element
– Created to “build catalogues, indexes and databases that can be used by people to locate relevant texts at remote locations.” (TEI P4 Guidelines)
Why are we using them?
– Allow us to capture all relevant bibliographic information in TEI
  – Article-level
  – Book review (sub-article) level
– Supported by the standard (no extension required)
– Our text delivery system (XTF) is currently configured to extract metadata from the TEI header
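
As a rough illustration of the idea, the sketch below reads a set of standalone headers, one per article or review, and pulls out a title and author from each. The wrapper name headerSet is a placeholder invented for this example (the actual document-level element is not named here), and the function header_records and the sample metadata are likewise hypothetical; this is not how XTF itself is configured.

```python
import xml.etree.ElementTree as ET

# "headerSet" is a placeholder wrapper, NOT a real TEI element name.
# The titles and authors are invented for illustration.
HEADER_SET = """<headerSet>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample article</title>
        <author>A. Author</author>
      </titleStmt>
      <publicationStmt><p>Illustrative only</p></publicationStmt>
      <sourceDesc><p>Print issue</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Review of a sample book</title>
        <author>B. Reviewer</author>
      </titleStmt>
      <publicationStmt><p>Illustrative only</p></publicationStmt>
      <sourceDesc><p>Print issue</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</headerSet>"""


def header_records(xml_text):
    """Return one {title, author} record per standalone header."""
    records = []
    for header in ET.fromstring(xml_text).iter("teiHeader"):
        stmt = header.find("fileDesc/titleStmt")
        records.append({
            "title": stmt.findtext("title"),
            "author": stmt.findtext("author"),
        })
    return records


records = header_records(HEADER_SET)
for rec in records:
    print(rec["author"], "-", rec["title"])
```

Because each article and each book review gets a complete header of its own, the same extraction logic applies at both levels of granularity.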

The Solution: Independent Headers!
Why is this a controversial solution?
– “Not the TEI way to do this.” - Syd Bauman
– It creates ‘overlapping’ headers. Unlike stylesheets, TEI has no way to ‘cascade.’
– There are theoretically other ways to capture this information in the TEI:
  – Corpora and the other approaches we considered
  – Repeating elements in the header
  – Extending the schema to allow for bibliographic metadata within the of the text
– Not supported in P5

Survey of TEI Community
Informal survey of the text encoding community, distributed across a number of listservs
Asked about
– Use of the TEI to encode serials
– Use of Independent Headers
16 responses from Digital Libraries, Digital Humanities Centers, and independent faculty members
– 6 are using Independent Headers in some way
– 10 are using the TEI to encode print serials (journals and newspapers)

Survey of TEI Community
Conclusion: We are the only people using the Independent Headers as a way to capture more granular metadata in serials
– Others are using them to:
  – Encapsulate bibliographic metadata for multivolume publications
  – Store and exchange records about their text collections
– Most serials encoding projects are either:
  – Encoding at the article level
  – Encoding at the issue level and not capturing article-level metadata
  – Encoding at the issue level and using MODS to capture article-level metadata

The Goal: Interoperability
TEI document as authoritative source from which we can derive functionality (METS, page turning application) and descriptive metadata (OAI harvesting)
Reliance on standards for management, preservation and re-use of digital content
– Self-documenting
– Seamless integration with our infrastructure
– Self-describing; can port and manipulate texts in other online contexts

Customization vs. Standardization
“The TEI's adoption as a model in digital library projects raised some interesting issues about the whole philosophy of the TEI, which had been designed mostly by scholars who wanted to be as flexible as possible. Any TEI tag can be redefined and tags can be added where appropriate. A rather different philosophy prevails in library and information science where standards are defined and then followed closely -- this to ensure that readers can find books easily. It was a pity that there was not more input from library and information science at the time that the TEI was being created, but the TEI project was started long before the term "digital library" came into use. A few people made good contributions, but in the library community there was not the widespread range of many years' experience of working with electronic texts as in the scholarly community” (Susan Hockey, A Companion to Digital Humanities, 2004).

Customization vs. Standardization
“Digital libraries will be critical to future humanities scholarship. Not only will they provide access to a host of source materials that humanists need in order to do their work, but these libraries will also enable new forms of research that were difficult or impossible to undertake before.” (Howard Besser, A Companion to Digital Humanities, 2004).

In Conclusion…
On Independent Headers…
– We feel good about our unconventional use of the Independent Header! We learned a lot as we investigated solutions.
– Alternative and viable options for representing article- or item-level metadata in TEI documents in the future:
  – MODS
  – P5 supports certain declarables (e.g., ) in the TEI Header
On serials encoding with TEI…
– Resurrected the need for TEI to be less monograph-centered and support serials encoding, especially print-born serials that are issue-centric (inherent hierarchy)
Spreading the word…
– Presenting at Digital Library Federation Spring 2007 Forum
– Reporting to TEI council results of survey (“No one uses the independent headers!”)

References
TEI P4 Guidelines:
TEI P5 Guidelines:
Besser, H. (2004). The past, present, and future of digital libraries. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A companion to digital humanities. Oxford: Blackwell.
Hockey, S. (2004). The history of humanities computing. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A companion to digital humanities. Oxford: Blackwell.
Unsworth, J. (2000). The scholar in the digital library:

Happy Birthday Melanie!
Happy Birthday to you
Happy Birthday deeeaarrr Me-la-nieeee
Happy Birthday to you
(sing in our heads)

Questions? Comments!
Melanie Schlosser:
Michelle Dalmau:
Thanks to Syd Bauman, John Walsh and Jenn Riley for the brainstorm sessions and encoding help.