More Better Metadata SAA 2014 Panel: Metadata and Digital Preservation: How Much Do We Really Need? Andrea Goethals, Harvard Library Even v.

How much metadata do we really need? That depends on the quality of the metadata...

Context of my remarks
Experience developing for, and now managing, Harvard Library’s Digital Repository Service (DRS)
– In production from 2000 to the present
– ~47 million files
Recent multi-year overhaul of the repository to the new DRS
– Provided a chance to analyze metadata and rethink our approach

Prior to the new DRS
Almost all metadata was user-contributed
– Expertise ranged from professional labs to curators, archivists and other staff
Very little validation of user-contributed metadata
Metadata elements had grown organically rather than systematically. For example...

Some elements weren’t specific enough
File format was one of: ICC, GIF, JPEG, TIFF, TDF, TEXT, PCD, AIFF, RealAudio, APP, WAV, WFR, JP2, JPF, ZIP, GZIP, PDF
– Format variations and versions were not recorded

Some elements were too specific
Text abstract character repertoire: one of ‘US-ASCII’, ‘Unicode’
Text character map: one of ‘ISO_646.irv:1983’, ‘UTF-8’
– These values weren’t validated, so the text could in reality be in any character set but would still be recorded as one of these
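
To illustrate why an unvalidated character-set element is unreliable, here is a minimal sketch of a consistency check: it tries to decode a file's bytes with the recorded character set and reports whether they fit. The function name and the mapping from recorded values to Python codec names are hypothetical, not part of the DRS. Note that ‘Unicode’ names a repertoire rather than an encoding, so it cannot be checked this way.

```python
# Hypothetical mapping from recorded character-set values to Python codec names.
# 'Unicode' is deliberately absent: it names a repertoire, not a byte encoding.
RECORDED_TO_CODEC = {
    "US-ASCII": "ascii",
    "UTF-8": "utf-8",
}


def charset_consistent(data: bytes, recorded: str) -> bool:
    """Return True if `data` decodes cleanly with the recorded character set.

    Unknown or non-checkable recorded values return False so that they are
    reviewed rather than trusted.
    """
    codec = RECORDED_TO_CODEC.get(recorded)
    if codec is None:
        return False
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True
```

A clean decode only shows the bytes are consistent with the recorded value (plain ASCII text is also valid UTF-8, for instance); it does not prove the recorded value is the true encoding.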

Some generic elements were tracked only for certain formats
For images only:
– enhancements
– history
– methodology
– producer
– production software
– system
These elements allowed free text, leading to varying interpretations over time

Errors in relationship metadata
Missing relationships (e.g. files referenced in the METS descriptor file but lacking explicit relationships)
Redundant relationships (files related more than once to the same files)
Illogical relationships (only discoverable because of redundant metadata), for example:
– Target images related to other target images
– Non-target images described as target images
– A METS descriptor file described as a scanned image
– Objects merged into themselves
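
Several of these relationship errors can be caught mechanically. The sketch below assumes a hypothetical data model, not the DRS's actual one: relationships as (subject, predicate, object) tuples with illustrative predicate names `has_target`, `merged_into` and `describes`, a role for each object, and the set of files each METS descriptor references. It checks for redundant, some illogical, and missing relationships.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical model: (subject_id, predicate, object_id); predicate names are illustrative.
Relationship = Tuple[str, str, str]


def check_relationships(
    rels: List[Relationship],
    roles: Dict[str, str],
    mets_references: Dict[str, Set[str]],
) -> List[str]:
    """Report redundant, illogical and missing relationships."""
    problems: List[str] = []

    # Redundant: the same relationship recorded more than once.
    for rel, count in Counter(rels).items():
        if count > 1:
            problems.append(f"redundant: {rel} recorded {count} times")

    # Illogical: checked once per distinct relationship, keeping original order.
    for subj, pred, obj in dict.fromkeys(rels):
        if pred == "merged_into" and subj == obj:
            problems.append(f"illogical: {subj} merged into itself")
        if pred == "has_target" and roles.get(subj) == "target_image":
            problems.append(f"illogical: target image {subj} has target {obj}")

    # Missing: files a METS descriptor references with no explicit relationship.
    described = {(s, o) for s, p, o in rels if p == "describes"}
    for mets_id, file_ids in mets_references.items():
        for file_id in sorted(file_ids):
            if (mets_id, file_id) not in described:
                problems.append(
                    f"missing: {mets_id} references {file_id} without a relationship"
                )

    return problems
```

Running such checks whenever relationship metadata changes, rather than once at ingest, keeps new errors from accumulating.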

Strategies in the new DRS for improving metadata
Automated format identification, validation and metadata extraction at ingest
Validation when files are ingested, added or removed, or when relationship metadata is changed
Sync with catalogs; check and improve metadata on migration
Pull descriptive metadata from catalogs at ingest or on request

File Information Tool Set (FITS)
Identifies many file formats
Validates a few file formats
Extracts metadata from files
Aggregates metadata from many tools
Calculates basic file info (file size, MD5 checksum, etc.)
Outputs technical metadata
– In community-standard metadata schemas
Identifies problem files
– Conflicting tool opinions on format or metadata values
– Unidentifiable file formats
– Encrypted files, and rights metadata embedded in files
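
The "basic file info" step is simple to illustrate. FITS itself is a Java application; the Python below is only a sketch of the same kind of calculation (file size plus an MD5 checksum, read in chunks so large files need not fit in memory), and `basic_file_info` is a hypothetical name, not a FITS function.

```python
import hashlib
import os


def basic_file_info(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the file size in bytes and the hex MD5 digest of a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return {"size": os.path.getsize(path), "md5": md5.hexdigest()}
```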

File Information Tool Set (FITS)
[Diagram: any file is run through a set of tools (JHOVE, DROID, NLNZ Metadata Extractor, ExifTool, the File utility, FFIdent, plus Tika, OIS Audio Information, ADL Tool, OIS File Information and OIS XML Metadata). Each tool’s native output passes through a FITS wrapper + XSL that converts it to FITS XML; the results are combined into FITS XML and standard XML output.]
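
Aggregating many tools means deciding what to do when they disagree. The sketch below shows one simple policy, assuming each tool's output has already been reduced to a single format name (or None when the tool could not identify the file): agreement yields an answer, disagreement is surfaced as a conflict, and no opinion at all is reported as unidentified. This is an illustration, not FITS's actual consolidation logic; the tool names in the test are only dictionary keys and the format strings are placeholders.

```python
from typing import Dict, Optional, Tuple


def aggregate_format(
    tool_outputs: Dict[str, Optional[str]],
) -> Tuple[Optional[str], str]:
    """Combine per-tool format identifications into a single answer.

    Tools that returned None are treated as abstaining. Returns
    (format, status) where status is "identified", "conflict" or "unidentified".
    """
    opinions = {fmt for fmt in tool_outputs.values() if fmt is not None}
    if not opinions:
        return None, "unidentified"
    if len(opinions) > 1:
        # Disagreement: surface it for review instead of silently picking one.
        return None, "conflict"
    return opinions.pop(), "identified"
```

Used naively, this reports a conflict whenever two tools spell the same format differently, which is why FITS normalizes values and weighs tools before comparing them.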

FITS configured to get high-quality metadata
Metadata normalization
– ‘JPEG2000’ = ‘JPEG 2000’ = ‘JPEG 2000 image’
– ‘inches’ = ‘2’ = ‘in.’
Plays to the strengths of tools and downplays their weaknesses
– Overall, trust tool x over tool y
– Don’t run tool x for format z
Format tree (hierarchy of related formats)
– ‘OpenDocument’ is more specific than ‘Zip’
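
The three configuration ideas on this slide (normalization, tool trust, and a format tree) can be combined into a small resolution routine. The following is a sketch under stated assumptions, not FITS's configuration format or code: the alias tables, the child-to-parent format tree, and the tool names `tool_a` and `tool_b` are all hypothetical. The unit table uses the fact that TIFF's ResolutionUnit value 2 means inches.

```python
from typing import Dict, List, Optional

# Hypothetical alias tables; keys are compared lower-cased.
FORMAT_ALIASES = {
    "jpeg2000": "JPEG 2000",
    "jpeg 2000": "JPEG 2000",
    "jpeg 2000 image": "JPEG 2000",
}
UNIT_ALIASES = {
    "inches": "inches",
    "in.": "inches",
    "2": "inches",  # TIFF ResolutionUnit value 2 means inches
}

# Hypothetical format tree: child -> parent (parent is less specific). Assumed acyclic.
FORMAT_PARENT = {"OpenDocument": "Zip"}


def normalize_format(name: str) -> str:
    return FORMAT_ALIASES.get(name.strip().lower(), name.strip())


def normalize_unit(value: str) -> str:
    return UNIT_ALIASES.get(value.strip().lower(), value.strip())


def is_ancestor(general: str, specific: str) -> bool:
    """True if `general` sits above `specific` in the format tree."""
    node = FORMAT_PARENT.get(specific)
    while node is not None:
        if node == general:
            return True
        node = FORMAT_PARENT.get(node)
    return False


def resolve_format(opinions: Dict[str, str], trust_order: List[str]) -> Optional[str]:
    """Pick one format from per-tool opinions.

    1. Normalize spelling variants.
    2. Drop any format that is a less specific ancestor of another reported one.
    3. If tools still disagree, prefer the most trusted tool's answer.
    """
    names = {tool: normalize_format(fmt) for tool, fmt in opinions.items()}
    reported = set(names.values())
    specific = {f for f in reported if not any(is_ancestor(f, g) for g in reported)}
    if len(specific) == 1:
        return next(iter(specific))
    for tool in trust_order:
        if names.get(tool) in specific:
            return names[tool]
    return None
```

Ordering the steps this way means trust only decides genuine disagreements; spelling variants and parent/child formats (a Zip-based OpenDocument file reported as both) are reconciled first.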

Example of what we know about a file pre- and post-FITS adoption at ingest
Pre-FITS (user-contributed metadata):
– Format = PDF
Post-FITS adoption at ingest:
– Format = Portable Document Format
– MIME media-type = application/pdf
– Format version = 1.4
– Format registry record: Registry = PRONOM, Registry key = fmt/18
– Page count = 24
– Date created by application = T17:43:27-04:00
– Title = JPCDHEP492
– Creation application = ComSquare ImPDF Library v0.89
– Admin flag = INHIBITOR

Additional strategies in the new DRS
Move away from overly restrictive metadata elements where needed, for example:
– Allow free text for format names
– Allow any text character set
Add elements at the format-agnostic file level when they can apply to files in any format, e.g. producer or methodology
Flag suspicious metadata (and content) for later analysis

Administrative flags
Help pinpoint incorrect metadata, problem content, or places where metadata tools need improvement
Some examples:
– FAILED_METADATA_EXTRACTION
– FORMAT_ID_CONFLICT
– INCORRECT_METADATA
– INHIBITOR
– RIGHTS_METADATA
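
One way to picture how such flags might be assigned at ingest is the sketch below. The five flag names come from the slide; everything else is an assumption for illustration: the `ExtractionResult` fields are hypothetical, and the triggering conditions (for example, INHIBITOR for encrypted content, as in the PDF example above, or INCORRECT_METADATA when a depositor-declared format differs from the identified one) are guesses, not the DRS's documented rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AdminFlag(Enum):
    FAILED_METADATA_EXTRACTION = "FAILED_METADATA_EXTRACTION"
    FORMAT_ID_CONFLICT = "FORMAT_ID_CONFLICT"
    INCORRECT_METADATA = "INCORRECT_METADATA"
    INHIBITOR = "INHIBITOR"
    RIGHTS_METADATA = "RIGHTS_METADATA"


@dataclass
class ExtractionResult:
    """Hypothetical summary of what ingest-time characterization found."""
    extraction_ok: bool
    tools_disagree_on_format: bool
    identified_format: Optional[str]
    declared_format: Optional[str]  # value supplied by the depositor, if any
    encrypted: bool
    embedded_rights: bool


def admin_flags(r: ExtractionResult) -> List[AdminFlag]:
    """Map characterization results to administrative flags (assumed rules)."""
    flags: List[AdminFlag] = []
    if not r.extraction_ok:
        flags.append(AdminFlag.FAILED_METADATA_EXTRACTION)
    if r.tools_disagree_on_format:
        flags.append(AdminFlag.FORMAT_ID_CONFLICT)
    # Assumes both values are already normalized to the same naming scheme.
    if r.declared_format and r.identified_format and r.declared_format != r.identified_format:
        flags.append(AdminFlag.INCORRECT_METADATA)
    if r.encrypted:
        flags.append(AdminFlag.INHIBITOR)
    if r.embedded_rights:
        flags.append(AdminFlag.RIGHTS_METADATA)
    return flags
```

Recording flags rather than rejecting files lets content be ingested while the questionable metadata stays easy to find for later analysis.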

They said it better
“It is quality rather than quantity that matters.” – Lucius Annaeus Seneca
“Quality is not an act, it is a habit.” – Aristotle
“Quality is never an accident. It is always the result of intelligent effort.” – John Ruskin

Thank you!