But I Don't Have Access to Your Server, and My Grad Student Left Last Month! Meeting the Challenges of Research Data Curation via Metadata
Juliane Schneider
Research Data Curation Program, UC San Diego
Please take my dataset!
Collaboration Recognition Funding requirements Professional Legacy
“Fame and tranquility can never be bedfellows.” --Michel de Montaigne
Tenopir C, Dalton ED, Allard S, Frame M, Pjesivac I, Birch B, et al. (2015) Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide. PLoS ONE 10(8): e doi: /journal.pone
Metadata Before you
Know your scope! Search Discovery Sharing Curation Preservation Publishing Services Search Discovery Sharing Self-Deposit Preservation Dark Storage
Know your resources!
IT support
– A separate library IT department
– Dedicated IT staff from main IT department
– Person in a cube
Database
– SQL? Triplestore? Proprietary system?
System
– Proprietary or open source?
– Homegrown?
– What are the pieces? (back end vs. UI vs. indexing)
UC San Diego
Big IT department in the library
Triplestore
Homegrown system with a Solr index, Blacklight and Hydra
Moving to Fedora 4!
Updated our data model to move into line with DPLA and other Fedora/Hydra communities
Looked for namespaces in places like Bibframe
Grounding ourselves in the Portland Common Data Model
Our Data Model
Brought more in line with DPLA and other Hydra/Fedora institutions
Continue to conform to the Portland Common Data Model (PCDM)
Breaking out subjects into Scientific Name, Common Name and Anatomy
– Considering FAST headings
Not final! Still working on it...
Form 1
Working project title
Reason for inquiry
Purpose and value
Scope
Copyright status/data ownership
Sensitive data? FERPA? HIPAA?
Scheduling a consult
Form 2
Collection title
Personnel names, identifiers and roles
Dates created/collected
Full collection description
Brief collection description
Identifiers
Keywords
Related resources
Licensing
Embargoes
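As a rough illustration, the Form 2 fields can be held in a simple record so that gaps are easy to spot before the consultation. This is a minimal sketch, not the form's actual implementation: the class name `CollectionRecord`, the field names, the field types, and the helper `missing_fields` are all assumptions made for this example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CollectionRecord:
    """Hypothetical container mirroring the Form 2 fields (names are assumptions)."""
    collection_title: str = ""
    # Each entry might look like {"name": ..., "identifier": ..., "role": ...}
    personnel: list = field(default_factory=list)
    dates_created_collected: str = ""
    full_description: str = ""
    brief_description: str = ""
    identifiers: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    related_resources: list = field(default_factory=list)
    licensing: str = ""
    embargoes: str = ""


def missing_fields(record):
    """Return the names of fields that are still empty (empty string or empty list)."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

Calling `missing_fields(record)` on a partly completed record gives a checklist of what still needs to be asked of the data provider.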
Metadata Consultation
Players: Metadata Specialist, Project Manager, Data Provider
Actions:
Confirm Collection Level metadata
– Names of people, roles
– Rights
– Related resources
– Publications
– Preferred citation
Discuss item level metadata
Metadata Questions
Storage
– What is the total size of the collection?
– How many files?
– What is the current file structure, and where does it live?
Data
– What are the formats of the files?
– What is the basic structure of the data?
– Is the dataset complete or ongoing?
Metadata
– What kind of metadata does the data have?
– Does the metadata live with the data files, or is it in a separate place?
– What format is the metadata in?
– If there is existing metadata, is there a unique ID or some other way of connecting an object to its corresponding metadata?
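Several of the storage and metadata questions above can be answered by a short script run by whoever does have access to the server. The following is a minimal sketch under stated assumptions: file extensions stand in for formats, and the function names `inventory` and `unmatched_ids` are hypothetical, introduced only for this example.

```python
import os
from collections import Counter
from pathlib import Path


def inventory(root):
    """Walk a directory tree and report file count, total bytes, and extensions."""
    total_bytes = 0
    file_count = 0
    formats = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            total_bytes += path.stat().st_size
            file_count += 1
            # The lowercased extension is only a rough stand-in for file format.
            formats[path.suffix.lower() or "(no extension)"] += 1
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "formats": dict(formats),
    }


def unmatched_ids(file_ids, metadata_ids):
    """Compare object IDs with metadata record IDs and list IDs found on one side only."""
    files, records = set(file_ids), set(metadata_ids)
    return {
        "files_without_metadata": sorted(files - records),
        "metadata_without_files": sorted(records - files),
    }
```

The second function covers the last question: if objects and metadata share a unique ID, any ID that appears on only one side points to a file or record that needs follow-up.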
Code/Executable metadata
Base environment (language or software) and version required to run the code, besides the OS
Base language or software packages/libraries/modules needed, along with version numbers
Ideally, a system requirements document (machine-generated) that describes the environment needed to run the code
URL for base software home page
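For code written in Python, a machine-generated requirements document of the kind described above could be produced roughly as follows. This is a sketch assuming a Python environment only; the function name `describe_environment` and the output keys are hypothetical, and other languages would need their own equivalent.

```python
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect language version, OS description, and installed package versions."""
    packages = {
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with no readable name
    }
    ordered = sorted(packages, key=lambda p: (p[0].lower(), str(p[1])))
    return {
        "language": "Python",
        "language_version": platform.python_version(),
        "operating_system": platform.platform(),
        "packages": [{"name": n, "version": v} for n, v in ordered],
    }


if __name__ == "__main__":
    # Print the description as JSON so it can be deposited alongside the code.
    print(json.dumps(describe_environment(), indent=2))
```

The JSON output can travel with the deposited code, so the environment is recorded even after the person who wrote the code has moved on.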
Standards PREMIS METS MODS MADS VRA Bibframe Schema.org
Crosswalks
Excel spreadsheet template
MARC (from our ILS)
Archivists' Toolkit
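To make the idea of a crosswalk concrete, here is a minimal sketch that maps rows from a spreadsheet (assumed to be exported as CSV) to MODS elements. The column headings `Title`, `Description` and `Keyword`, the mapping itself, and the sample row are all hypothetical; the actual template and data model may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical spreadsheet column -> nested MODS element path.
CROSSWALK = {
    "Title": ("titleInfo", "title"),
    "Description": ("abstract",),
    "Keyword": ("subject", "topic"),
}


def row_to_mods(row):
    """Build a MODS record from one spreadsheet row, skipping empty cells."""
    root = ET.Element(f"{{{MODS_NS}}}mods")
    for column, path in CROSSWALK.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue
        parent = root
        for tag in path:
            # Create each level of the element path in turn.
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        parent.text = value
    return root


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = io.StringIO(
        "Title,Description,Keyword\n"
        "Example collection,An example description,example\n"
    )
    for row in csv.DictReader(sample):
        print(ET.tostring(row_to_mods(row), encoding="unicode"))
```

Keeping the mapping in one dictionary means the crosswalk itself is documented in a single place and can be reviewed by a metadata specialist without reading the rest of the code.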
Biggest Challenges
Acquiring the data/metadata
– Setting up a shared space can be difficult because of permissions.
NEVER METADATA UNPREPARED
Know What You Want To Do
Know What You've Got To Do It With
DOCUMENT YOUR METADATA
Links and such Juliane Schneider Twitter: JulianeS LinkedIn: