Implications of Using PCDM in the Face of a Major Repository Migration

Presentation transcript:

Leaving Flatland: Implications of Using PCDM in the Face of a Major Repository Migration
Steve Van Tuyl, Hui Zhang, Michael Boock
Center for Digital Scholarship and Services, Oregon State University Libraries & Press

Content Types in SA@OSU
  Theses and Dissertations: 24,842
  Technical Reports: 11,185
  Articles: 7,788
  Presentations and Posters: 1,041
  Audio and Video: 159
  Datasets: 59
Notes: Oregon State University’s ScholarsArchive@OSU institutional repository has been in production since 2005. It contains over 58,000 items, primarily theses and dissertations and technical reports (mostly government documents and extension and experiment station publications), along with items classified as books (mostly digitized books out of copyright), datasets, conference proceedings and presentations, multimedia, and, increasingly since the university passed an OA policy in 2013, faculty articles. Mediated deposit across many collections including ETD, which is require

Multi-Part Files in SA@OSU
  25% of objects in SA are multi-file
  11% of ETD objects are multi-file
Notes: How much content do we have that has multiple parts?
  ETDs with multiple bitstreams: diagram indicating what those bitstreams are (file extensions). Steve may have a script that does this.
  All file types with multiple bitstreams: diagram indicating what those bitstreams are (file extensions).
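The multi-file counts above could be produced by a short script along these lines. This is only a sketch under stated assumptions: the `items` list, its field names, and the sample file names are hypothetical stand-ins for an export of item and bitstream records, not the script mentioned in the notes.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical export: one record per repository item with its bitstream names.
items = [
    {"id": "item-1", "type": "ETD", "bitstreams": ["thesis.pdf"]},
    {"id": "item-2", "type": "ETD", "bitstreams": ["thesis.pdf", "appendix.zip"]},
    {"id": "item-3", "type": "Dataset", "bitstreams": ["data.nc", "readme.txt", "extra.tar"]},
    {"id": "item-4", "type": "Article", "bitstreams": ["article.pdf"]},
]


def multi_file_share(records):
    """Fraction of records that carry more than one bitstream."""
    if not records:
        return 0.0
    multi = [r for r in records if len(r["bitstreams"]) > 1]
    return len(multi) / len(records)


def extension_counts(records):
    """Count file extensions, looking only at multi-file records."""
    counts = Counter()
    for record in records:
        if len(record["bitstreams"]) > 1:
            for name in record["bitstreams"]:
                suffix = PurePosixPath(name).suffix.lower() or "(none)"
                counts[suffix] += 1
    return counts


etds = [r for r in items if r["type"] == "ETD"]
print(f"multi-file share, all items: {multi_file_share(items):.0%}")
print(f"multi-file share, ETDs: {multi_file_share(etds):.0%}")
print(extension_counts(items).most_common())
```

Restricting `extension_counts` to multi-file records mirrors the two diagrams described in the notes, which look only at objects with multiple bitstreams.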

ScholarsArchive@OSU - Migration Project
Current infrastructure doesn’t meet a number of needs:
  Flexible reporting & analytics
  Multi-file objects & relationships
Decision to Hydra-fy:
  Modularity
  Unification of developer base
  PCDM
Notes: Unification of developer base: most library development at OSULP is now in Rails (OregonDigital, SA, others), and there is less familiarity with Java (DSpace).

DSpace Simple Dissertation

DSpace Multi-file Example
Notes: Multi-file dataset and an article that used the dataset, both in ScholarsArchive.

PCDM (probably for the 100th time today)
  Collection
  Object/Work
  File
Notes: PCDM is extremely flexible, so we need to figure out how we want to represent different types of objects and their relationships with other objects.
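The three PCDM building blocks can be sketched as plain data structures. This is a minimal illustration, assuming only the core PCDM relationships (hasMember between collections and objects, hasFile between objects and files); the class names, field names, and sample titles are hypothetical and are not the Hydra implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class File:
    """Stands in for a pcdm:File (a bitstream)."""
    name: str
    mime_type: str = "application/octet-stream"


@dataclass
class Work:
    """Stands in for a pcdm:Object."""
    title: str
    members: List["Work"] = field(default_factory=list)  # pcdm:hasMember
    files: List[File] = field(default_factory=list)      # pcdm:hasFile


@dataclass
class Collection:
    """Stands in for a pcdm:Collection."""
    title: str
    members: List[Work] = field(default_factory=list)    # pcdm:hasMember


def iter_files(work: Work) -> Iterator[File]:
    """Walk a work and all of its member works, yielding every file."""
    yield from work.files
    for member in work.members:
        yield from iter_files(member)


# A multi-part dissertation: the parent work has its own PDF plus two article works.
dissertation = Work(
    "Multi-part dissertation",
    members=[
        Work("Article 1", files=[File("article1.pdf", "application/pdf")]),
        Work("Article 2", files=[File("article2.pdf", "application/pdf")]),
    ],
    files=[File("dissertation.pdf", "application/pdf")],
)
etd_collection = Collection("Theses and Dissertations", members=[dissertation])

print([f.name for f in iter_files(dissertation)])
```

Because works can contain works, the same three classes cover both the flat single-PDF case and nested cases, which is the flexibility the notes refer to.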

Simple Document (dissertation)
[Diagram labels: Work "Dissertation"; File "File [pdf]"]
Notes: Specific examples; these won’t look a lot different in Hydra.

Multi-Part Dissertation
[Diagram labels: Work; Work "Article 1"; Work "Article 2"; File "File [pdf]"; File "File [pdf]"; File]
Notes: With more complex objects, we’ll be able to represent them differently, assign metadata more granularly, and express relationships between objects that formerly had to be either packaged together as a single item or described separately, with no way to express the relationships between them (except in notes or by putting items together in collections, which still doesn’t make clear how they relate).

Simple Dataset
[Diagram labels: Work "Dataset"; Work "NetCDF Datafile", File "File [nc]"; Work "Compressed Files", File "File [tar]"; Work "Readme", File "File [txt]"]
Notes: For datasets, DSpace is unable to represent hierarchy. It just lists a bunch of files with minimal description (size, format, brief description).

Not so simple dataset
Same data, ‘properly’ represented
[Diagram labels: Work "Dataset"; Work "Readme", File "File [txt]"; Work "Data File", File "File [nc]"; "Logical File Grouping"; Works "Data File 1" through "Data File 7", each with File "File [nc]"]

Research Paper & Dataset
[Diagram labels: Work "Dataset"; Work "Readme", File "File [txt]"; four Works "Data File", each with File "File [csv]"; Related (dcterms:isReferencedBy); Work "Pre-Print", File "File [pdf]"; Work "Final Manuscript", File "File [pdf]"; Work "Figure", File "File [tif]"]
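The relationship in this diagram can be written out as RDF-style triples. The sketch below uses plain Python tuples rather than an RDF library; the `pcdm` and `dcterms` namespace URIs are the published ones, while the `http://example.org/repo/` base and every resource name are hypothetical.

```python
PCDM = "http://pcdm.org/models#"
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/repo/"  # hypothetical repository base URI


def triple(subject, predicate, obj):
    """Build one (subject, predicate, object) triple with full URIs."""
    return (BASE + subject, predicate, BASE + obj)


triples = [
    # The dataset work and its member works.
    triple("dataset", PCDM + "hasMember", "readme"),
    triple("dataset", PCDM + "hasMember", "datafile-1"),
    triple("readme", PCDM + "hasFile", "readme.txt"),
    triple("datafile-1", PCDM + "hasFile", "data-1.csv"),
    # The paper work and its final manuscript.
    triple("paper", PCDM + "hasMember", "final-manuscript"),
    triple("final-manuscript", PCDM + "hasFile", "manuscript.pdf"),
    # The link between the two: the dataset is referenced by the paper.
    triple("dataset", DCTERMS + "isReferencedBy", "paper"),
]


def to_ntriples(rows):
    """Serialize triples in N-Triples form (all terms here are URIs)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


def referenced_by(subject, rows):
    """Return the URIs of resources that reference the given subject."""
    wanted = (BASE + subject, DCTERMS + "isReferencedBy")
    return [o for s, p, o in rows if (s, p) == wanted]


print(to_ntriples(triples))
print(referenced_by("dataset", triples))
```

The single `dcterms:isReferencedBy` statement is what the flat DSpace representation cannot express: it connects two separately described works without packaging them into one item.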

Datasets, Paper, Dissertation
[Diagram labels: Work "Dataset"; Work "Readme", File "File [pdf]"; four Works "Data File", each with File "File [zip]"; Related (dcterms:isReferencedBy); Work "Dissertation"; Work "Final Manuscript", File "File [pdf]"; File "File [pdf]"]

Challenges - Data Modeling Consistency
Managing the flexibility of PCDM
Consistency in:
  Modeling structure (especially throughout the migration process)
  Internal representation of object types (consistent with the data modeling)
  Community representation of object types
Diversity of item types (documents, AV, data, etc.)
  A large percentage are similar (single documents)
Notes: As we've demonstrated, the current repository holds a wide variety of item types with different relationships. How do we ensure that these items and their relationships are migrated so that they are represented consistently in the new repository? For example, all theses with associated datasets should be represented according to our predetermined model.

Solutions - Data Modeling Consistency
Establish models for representation of broad types of content
  Single file objects
  Multi-file objects
  Versioned objects
Curation of future repository content will adhere to these models
  Being able to identify new objects that require different models
Engage with the PCDM community to identify content type guidance/standards
  Interoperability
  Common language for troubleshooting
Notes: Plan for how we represent multi-file content deposited to the new repository. Determine any retrospective work that needs to be done to build relationships between existing content.
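One way to keep model assignment consistent is to make the choice a single function that every migrated or newly deposited item passes through. The sketch below is an assumption-laden illustration: the model names follow the three broad types listed above, but the record fields (`files`, `versions`) and the `needs-review` fallback are hypothetical.

```python
def choose_model(item):
    """Pick a local data model name for an item record.

    Items that fit none of the established models are flagged for review,
    so new kinds of objects are noticed rather than forced into a model.
    """
    files = item.get("files", [])
    if not files:
        return "needs-review"
    if item.get("versions", 1) > 1:
        return "versioned"
    if len(files) == 1:
        return "single-file"
    return "multi-file"


# Hypothetical sample records.
samples = [
    {"id": "a", "files": ["thesis.pdf"]},
    {"id": "b", "files": ["data.nc", "readme.txt"]},
    {"id": "c", "files": ["article.pdf"], "versions": 2},
    {"id": "d", "files": []},
]

for record in samples:
    print(record["id"], choose_model(record))
```

Keeping the rules in one place means a later change to a model is applied uniformly, which is the consistency concern raised on the previous slide.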

Challenges - Intent
At times researchers know better than librarians do about the structure of their datasets
  The diversity of dataset sources and formats
  The volume/size of datasets
  The relationships among files inside a dataset and to external/derivative resources (e.g. an article)
How do users want to discover objects?
Notes: The diversity of dataset sources and formats: whether files should be zipped together or represented as individual files.

Solutions - Intent
Balance between researcher/depositor intent WRT their content and modeling
  None of us are right all the time
  Be transparent about expectations and procedures
  Try to be consistent with the existing data models
Need to consider at what level of granularity users want to discover content
  Representation of compound objects in a discovery interface? That's hard
Come to #OR2017

Challenges - Migration to Hydra Land
We cannot babysit every item we migrate.
  Migration of 58,000+ items in an automated manner
Retain all metadata, including structural information such as the collections and communities to which items belong in DSpace
  Though we don’t want to adhere too much…
  Strike a balance between familiar representation and more functional modeling
Are there issues with internal consistency if we allow changes in the structure of objects post migration? Yes.
Notes: The community/collection structure in DSpace represents the institutional chart. Collection names carry meaning, such as content type or owning institute. On internal consistency: we have to make compromises during migration, but new items will be held strictly to the rules.

Solutions - Migration to Hydra Land
Identify what types of objects require manual or hands-on migration
  Datasets
  Multi-file dissertations & supplementary files
  Faux relationships
Capture community & collection structure in item-level metadata to inform creation of PCDM collections (or not)
  Migrated collections will represent object groupings (e.g. Biochemistry) rather than object types (Biochemistry posters)
After migration, reevaluate data models for new content to allow flexibility but pay attention to consistency
Notes: To capture community and collection structure, manually audit the 400 collections in SA to create a crosswalk from community/collection structure to item-level metadata for migration. When reevaluating data models after migration, examples include the multi-part dissertation and Poster as a type facet.
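The community/collection crosswalk could take a form like the following. This is a sketch only: the dictionary layout, the metadata field names (`grouping`, `type`, `dspace_community`, `dspace_collection`), and the sample item are all hypothetical, with the Biochemistry entry modeled on the example in the slide.

```python
# Hypothetical crosswalk: (DSpace community, DSpace collection) -> item-level metadata.
CROSSWALK = {
    ("Biochemistry", "Biochemistry posters"): {
        "grouping": "Biochemistry",  # the migrated collection represents a grouping
        "type": "Poster",            # the object type becomes a facet value
    },
}


def apply_crosswalk(item, crosswalk=CROSSWALK):
    """Return migrated item-level metadata for one DSpace item record."""
    migrated = dict(item["metadata"])
    # Always retain the original structure so nothing is lost in migration.
    migrated["dspace_community"] = item["community"]
    migrated["dspace_collection"] = item["collection"]
    mapped = crosswalk.get((item["community"], item["collection"]))
    if mapped is None:
        # Collections missing from the audit are flagged, not guessed at.
        migrated["needs_manual_review"] = True
    else:
        migrated.update(mapped)
    return migrated


poster = {
    "community": "Biochemistry",
    "collection": "Biochemistry posters",
    "metadata": {"title": "Sample poster"},
}
print(apply_crosswalk(poster))
```

Storing the original community and collection names alongside the mapped values keeps the decision about whether to create matching PCDM collections open until after migration.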

What’s Next
A repository-wide analysis to find logical object types that require specific models
  Setting local model consistency
  Identifying relatedness of existing content
  Identify items that are too complex to automate (~10%?)
Migration
  Automation
  Pilot with test collections
  Migrate the majority of content by data models pertaining to item types
  Either hand migrate remaining content OR modify automation to meet complexity
  Service critical collections such as ETD
Discovery
Notes:
  Items that are too complex to automate (~10%?) include those with unusual structure or with parts that belong elsewhere (supplementary datasets with dissertations).
  Whether we hand migrate the remaining content or modify the automation depends on the number of items that require additional treatment.
  Service-critical collections such as ETD require special consideration to ensure a smooth transition with minimal downtime (use DSpace for ETD until Sufia is thoroughly tested).
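Splitting the repository into an automated batch and a hand-migrated remainder might look like the sketch below. The rules are assumptions drawn from the object types named in the slides (datasets and multi-file dissertations), and the record layout and sample data are hypothetical.

```python
def needs_manual_migration(item):
    """Flag items whose structure is likely too complex to automate."""
    if item["type"] == "Dataset":
        return True
    # Multi-file ETDs may carry supplementary parts that belong elsewhere.
    if item["type"] == "ETD" and len(item["files"]) > 1:
        return True
    return False


def triage(records):
    """Split records into (automated, manual) lists, preserving order."""
    automated, manual = [], []
    for record in records:
        (manual if needs_manual_migration(record) else automated).append(record)
    return automated, manual


# Hypothetical sample records.
records = [
    {"id": "1", "type": "ETD", "files": ["thesis.pdf"]},
    {"id": "2", "type": "ETD", "files": ["thesis.pdf", "data.zip"]},
    {"id": "3", "type": "Dataset", "files": ["data.nc"]},
    {"id": "4", "type": "Article", "files": ["article.pdf"]},
]

automated, manual = triage(records)
share = len(manual) / len(records)
print(f"automated: {len(automated)}, manual: {len(manual)} ({share:.0%})")
```

The size of the manual list is the number the notes say the decision depends on: a small list can be migrated by hand, while a large one argues for extending the automation.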

? Thank You

This is what our abstract says we’re talking about...
Increasingly, repository managers and data curators are identifying gaps between current repository functionality and dataset preservation and dissemination requirements. Supplementary files such as datasets and software code are increasingly deposited together with documents (e.g. dissertations, faculty research articles). Datasets are also increasingly being deposited to repository platforms with the expectation that the structure of the data can be properly represented, along with the relationships between the data and other repository content. In our current repository, deposits are realized without regard for representation of the relationships between file types or differentiation in the description of files. In 2015, Oregon State University Libraries and Press (OSULP) began to migrate the ScholarsArchive@OSU institutional repository from DSpace to the Hydra-Sufia platform. This presentation describes and demonstrates OSULP’s prototype repository architecture that explicitly defines relationships between datasets and the publications using these datasets. We demonstrate how we use the Portland Common Data Model (PCDM) to contextualize files in relationship with other resources in the repository. We provide concrete examples of how this architectural migration improves the representation and publication of repository content and the implications for migration of a large repository.

[Diagram: Hydra::Works classes and their PCDM types]
Hydra::Works::Collection a pcdm:Collection
  pcdm:hasMember
Hydra::Works::Work a pcdm:Object
  pcdm:hasMember
Hydra::Works::FileSet a pcdm:Object
  pcdm:hasFile
OriginalFile a pcdm:File
Thumbnail a pcdm:File
ExtractedText a pcdm:File
Key: A = Access, B = Bitstream, D = Descriptive, T = Technical

DSpace Flatness Example
Notes: The only bitstream metadata is file name, size, format, and description. It is not indexed, so it is hard to tell which file is which and which files are important.