Published by Oswald Johns. Modified over 8 years ago.
1
Data Wrangling: Developing Local Best Practice for Born Digital Metadata
Tracy Popp, Digital Preservation Coordinator
Ayla Stein, Metadata Librarian
University Library, University of Illinois Urbana-Champaign
2
Intro
What will be addressed:
–Institutional context
–Project needs
–Challenges
–Current progress
–Future work
3
Institutional Context
University Library
–Campus-wide network of libraries
–Largest public university research library in the U.S.
  13 million volumes
  24 million items and materials
  Over 12 million digital files
Main Library building, East Entrance http://www.library.illinois.edu/bis/images/uiucmainlib.jpg
4
Institutional Context
Collaborative effort:
–Content Access Management (Cataloging and Metadata)
  Ayla – Metadata Librarian
–Preservation Unit
  Tracy – Born Digital Content Reformatting
–Special Collections
  University Archives
  RBML, Sousa, etc.
–Back to Preservation
  Kyle Rimkus – Preservation Librarian
–Digital Content Long-term Preservation (Medusa)
5
Project Needs
Ayla (Metadata) and Tracy (Born Digital Content Reformatting)
Identify
–Metadata currently captured
Make
–Schema Recommendations
  Technical
  Administrative
  Descriptive
–Controlled Vocabulary
6
Overview of Challenges
–Behemoth spreadsheet
–Various reports not in a schema
–No controlled vocabulary
–Redundant data entry
–Ideally aligns with Medusa data
7
Born Digital Reformatting
Behemoth spreadsheet
–Project tracking and data entry
Reports
–Structured but not to a schema
  From FTK Imager:
  »Directory list of media structure (created at time of disk imaging); item level information
  »Hash list of exported files
  From TreeSize Pro:
  »Media group level reports
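A hash list of exported files is structured enough to check mechanically even before it is mapped to a schema. The sketch below is a minimal illustration, not the workflow described in these slides: it assumes the hash list is a CSV with `MD5` and `FileNames` columns (the column names are an assumption and are passed as parameters so they can be changed), and the function names are hypothetical.

```python
import csv
import hashlib


def load_hash_list(handle, path_col="FileNames", md5_col="MD5"):
    """Map each listed file path to its lowercase MD5 from a CSV hash list.

    The column names are assumptions about the report layout; adjust as needed.
    """
    return {row[path_col]: row[md5_col].lower() for row in csv.DictReader(handle)}


def md5_matches(data, expected_md5):
    """Return True if the MD5 of the given bytes equals the recorded value."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()
```

Given an open report file, `load_hash_list(handle)` returns a dictionary keyed by file path, and `md5_matches(file_bytes, recorded_md5)` confirms that an exported file still matches the value recorded at imaging time.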
8
Challenges – Schema
No one schema appropriate
–Many layers of transformation
–Varying types of metadata
Born Digital Reformatting: Recover from obsolete media
Collecting Unit: Arrangement, Description, Access
Digital Preservation Repository: Medusa, long-term preservation
9
Challenges – Controlled Vocabulary
Reformatting request form is paper
–Project tracking system in works
No Controlled Vocabulary
Reviewed: MANY
Chose:
–PBCore instantiationMediaType
–PBCore instantiationPhysical
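Once a vocabulary is chosen, free-text spreadsheet entries can be checked against it. The following is a minimal sketch under stated assumptions: the term set is an illustrative placeholder rather than the full PBCore list, and the field name `media_type` and both function names are hypothetical.

```python
# Illustrative placeholder terms only; load the real PBCore list in practice.
ALLOWED_MEDIA_TYPES = {"Moving Image", "Software", "Static Image", "Text"}


def normalize_term(value, allowed):
    """Return the controlled term matching value, ignoring case and extra
    whitespace, or None when the value is not in the vocabulary."""
    lookup = {term.casefold(): term for term in allowed}
    cleaned = " ".join(value.split()).casefold()
    return lookup.get(cleaned)


def find_uncontrolled(rows, field, allowed):
    """Return (row_index, value) pairs whose field is not a controlled term."""
    return [
        (index, row.get(field, ""))
        for index, row in enumerate(rows)
        if normalize_term(row.get(field, ""), allowed) is None
    ]
```

Running `find_uncontrolled(rows, "media_type", ALLOWED_MEDIA_TYPES)` over rows exported from the tracking spreadsheet lists the entries that still need cleanup.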
10
Schema Choices
METS, MODS, and PREMIS
Why?
–MODS and PREMIS align with Medusa terms
11
Schema Choices
PREMIS
–Record technical info of item pre-reformatting
–Encode actions and digital forensics reports as 'events'
–Can have full provenance of a digital object in a cohesive piece
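To make the idea of encoding an action as a PREMIS event concrete, here is a minimal sketch using Python's standard library. It assumes the PREMIS 3 namespace (the slides do not say which PREMIS version is used), and the function name and local identifier scheme are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS version 3 namespace.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def _sub(parent, name, text=None):
    """Append a PREMIS-namespaced child element, optionally with text."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        element.text = text
    return element


def premis_event(event_id, event_type, when, detail, outcome, object_id):
    """Build one premis:event element linked to a single object."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    identifier = _sub(event, "eventIdentifier")
    _sub(identifier, "eventIdentifierType", "local")
    _sub(identifier, "eventIdentifierValue", event_id)
    _sub(event, "eventType", event_type)
    _sub(event, "eventDateTime", when)
    detail_info = _sub(event, "eventDetailInformation")
    _sub(detail_info, "eventDetail", detail)
    outcome_info = _sub(event, "eventOutcomeInformation")
    _sub(outcome_info, "eventOutcome", outcome)
    link = _sub(event, "linkingObjectIdentifier")
    _sub(link, "linkingObjectIdentifierType", "local")
    _sub(link, "linkingObjectIdentifierValue", object_id)
    return event
```

For example, `premis_event("evt-001", "message digest calculation", "2015-03-02T10:15:00", "MD5 hash list exported", "success", "disk-0042")` yields an element that `ET.tostring(event, encoding="unicode")` serializes; every value in that call is invented for illustration.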
12
Schema Choices
The Catch:
–Medusa supports limited metadata
  Collection & file group level
  Event info does not pre-date ingest into repository
–Metadata file as content
  METS wraps up MODS & PREMIS info
  Deposit METS record with content
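Because repository event information begins at ingest, the pre-ingest description and events travel in a METS file deposited alongside the content. The sketch below shows one way such a wrapper could be assembled; the section IDs, the single-div structMap, and the function names are assumptions for illustration, not the authors' actual profile.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
    "premis": "http://www.loc.gov/premis/v3",  # assumption: PREMIS 3
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the namespace-qualified tag for prefix:name."""
    return f"{{{NS[prefix]}}}{name}"


def wrap(parent, section, sec_id, mdtype, payload):
    """Embed payload in parent as section/mdWrap/xmlData."""
    sec = ET.SubElement(parent, q("mets", section), {"ID": sec_id})
    md_wrap = ET.SubElement(sec, q("mets", "mdWrap"), {"MDTYPE": mdtype})
    ET.SubElement(md_wrap, q("mets", "xmlData")).append(payload)
    return sec


def build_mets(mods_record, premis_events):
    """Wrap one MODS record and a list of PREMIS events in a METS document."""
    mets = ET.Element(q("mets", "mets"))
    wrap(mets, "dmdSec", "dmd-1", "MODS", mods_record)
    amd = ET.SubElement(mets, q("mets", "amdSec"))
    for number, event in enumerate(premis_events, start=1):
        wrap(amd, "digiprovMD", f"event-{number}", "PREMIS:EVENT", event)
    # METS requires a structMap; a single div is the smallest valid form.
    struct_map = ET.SubElement(mets, q("mets", "structMap"))
    ET.SubElement(struct_map, q("mets", "div"), {"DMDID": "dmd-1"})
    return mets
```

Calling `ET.ElementTree(build_mets(mods, events)).write("mets.xml", encoding="utf-8", xml_declaration=True)` would produce the file to deposit with the content; the file name is arbitrary.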
13
Good Practice
Interoperability
–Various levels that will assist in the digital preservation life cycle
14
Summary: Work In Progress
Schema Choice:
–METS, MODS, and PREMIS
Controlled Vocabulary Choices:
–Data Type: PBCore instantiationMediaType
–Media Type: PBCore instantiationPhysical
15
Future Work
Creating centralized, web-based tracking tool
–Allow curating units to add descriptive information
–Avoid data duplication
Import metadata and reports
–Structured in schema
More controlled vocabulary
–Rights
16
Thank You!
Tracy Popp, Digital Preservation Coordinator, tpopp2@Illinois.edu
Ayla Stein, Metadata Librarian, astein@Illinois.edu, @TheStacksCat