Published by Austen Howard. Modified over 9 years ago.
1
Developing an Ingest Service for Fedora
Ryan Scherle, Muzaffer Ozakca
2
IUDL infrastructure project
2-year project funded by University Information Technology Services to reengineer digital library infrastructure around Fedora
Builds on experience with Fedora in context of EVIA Digital Archive (ethnomusicology video)
2 full-time staff, plus part-time from many others
Dozens of legacy collections with roughly 100,000 objects
New collections: some content-focused, some research-focused
3
Diversity
Multiple media types
Multiple brands
Multiple tools
4
The goal (diagram): Ingest
5
Required features
Ingest common content types:
▫ Images
▫ Paged documents
▫ Textual documents
Allow for easy creation of new content types
Must support several workflows
▫ Metadata or media may be primary
▫ Most objects include derived media
▫ Systematic changes to metadata may be desired
▫ May need to connect with external tools for metadata generation, validation, etc.
▫ A workflow engine may sit on top of the ingest system
6
Existing Ingest Tools
7
Criteria
Ease of install
Native content models
Custom content models (e.g. paged)
Workflow neutrality, including object modification
Batch ingest
Remember, we’re evaluating object ingest only, not object delivery!
8
But first, some disclaimers…
This is not an objective evaluation, just our experiences
We’re not experts in these systems
We’re evaluating ingest only, not delivery!
We’re evaluating ingest with a focus on our needs
We believe in community
9
Fedora admin client
Comes with Fedora
Geared towards admins rather than end users
No systematic way of entering data or attaching files
Very flexible
The only way to create disseminators
Tedious
11
Fez
End-to-end GUI system
Highly customizable content models, workflow, security
Customizable role- and group-based access control
Growing community
Originally developed as an Institutional Repository
Many preset content models
Can create “extension” metadata based on an XSD
External MySQL database for workflow/vocabulary data
GPL
12
Fez - ingest
Single object ingest
▫ Through Web UI
▫ ImageMagick/JHOVE integration
Bulk ingest:
▫ Upload files to a directory
▫ Can also import existing Fedora objects in bulk
▫ Templates for metadata common to all objects, manual updates for the rest
▫ Batches possible, but only one file per object
No disseminators
Custom metadata can be stored as a simple XML file
Objects must use “compound” content model
(Diagram labels: File, Custom MD, Fedora)
13
Fez – object organization (diagram)
Levels: Community level, Collection level, Content level
Nodes: Community, Collection, Image DO, Paper DO, Collection DO with Custom MD
15
Elated overview
End-to-end, complete system for digital collections
Simple customizable metadata and a simple workflow supported
GPL
“Elated is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository System, and could be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs.”
16
Elated ingest
Single object ingest
▫ Through Web UI
▫ Focused on DC metadata; custom fields can be added
Multi-object ingest via zipped folders and files
▫ Metadata template plus manual entry
▫ Batches possible, but only one file per object
Simple content model
Manually-attached disseminators
(Diagram labels: File, DC + Custom MD, Fedora)
17
Elated object organization (diagram)
Levels: Top level, Level 1, Level 2, Level n
Nodes: Collection, Folder, Image DO, PDF DO
18
Valet for ETDs
A component of the VTLS VITAL product focused on ETD submission
Allows submission of thesis and a simple workflow for approval
Part of a larger framework
Highly focused on ETDs
19
DirIngest overview
Ingests objects from a structured ZIP file
Highly flexible
User must create METS structure by hand
Doesn’t handle disseminators
Can create some RELS-EXT data, but not fully flexible
Cannot modify existing objects/collections
Easy to use
OhioLink Bulk Ingest
20
DirIngest (diagram)
Zip Archive: Collection, Images, Image File, Texts, Text File, METS.xml
Also shown: Crules.xml, Fedora
Levels: Top level, Folder level, Content level
Resulting objects: Collection, Images, Image DO, Texts, Text DO
21
Batch modify
A method of controlling API-M with simple XML statements
Can create “empty” objects and change them in systematic ways
Requires manual (or programmatic) creation of the modify scripts
Can be used in conjunction with other tools…
22
Summary (comparison table)
Columns: Fez, Elated, Valet, DirIngest, Batch Modify, Admin Client
Rows: Ease of install, Native CM, Custom CM, Workflow neutrality, Batch ingest
23
Indiana Ingest Tool
24
A structured interface between a workflow management or repository management GUI and the Fedora repository
Focused on simple input formats for maximum flexibility
Keeps the tools independent of the repository architecture
Builds the FOXML, rather than requiring a full structure to be pre-built
Binds disseminators
Creates RELS-EXT relationships
Can create and/or alter items in a collection
Auto-generates technical metadata with JHOVE or XSLT
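As an illustration of the "Creates RELS-EXT relationships" step, here is a minimal sketch of generating a RELS-EXT datastream body in Python. It is not the Indiana tool's code. The function name `build_rels_ext` and the object PID `iudl:1001` are hypothetical, and the choice of `isMemberOfCollection` (from Fedora's standard relationship ontology) is an assumption about how an item might be linked to its parent collection.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of Fedora's base relationship ontology.
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(pid: str, collection_pid: str) -> str:
    """Return RDF/XML stating that `pid` is a member of `collection_pid`.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    # Assumed relationship; the slides do not say which one the tool writes.
    ET.SubElement(
        description,
        f"{{{REL_NS}}}isMemberOfCollection",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{collection_pid}"},
    )
    return ET.tostring(root, encoding="unicode")


# "iudl:1001" is a made-up PID; "iudl:6" is the identifier on the sample config slide.
rels_ext = build_rels_ext("iudl:1001", "iudl:6")
print(rels_ext)
```

Generating this small RDF document from a single "link to parent" input is one way a tool can keep cataloging front ends unaware of repository internals.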
25
Ingest Tool (diagram)
Labels: Image Cataloging Tool, Sheet Music Cataloging Tool, MODS, EAD, PDF, JPG, SIP, Ingest Tool, FOXML, Datastreams, Fedora
26
Performing an ingest
Place source metadata in an accessible location (filesystem, website)
Place media files (both master and derivative) in an accessible location
Define the "collection configuration"
Run the ingest process
Receive report
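The last two steps, running the ingest and receiving a report, could be sketched roughly as follows. This is an illustrative outline rather than the actual tool: `IngestReport`, `run_ingest`, and the `exists` and `ingest_one` callbacks are hypothetical names, and skipping items that already exist is just one assumed policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class IngestReport:
    """Hypothetical report structure: what was created, skipped, or failed."""
    created: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def run_ingest(
    item_ids: Iterable[str],
    exists: Callable[[str], bool],
    ingest_one: Callable[[str], None],
) -> IngestReport:
    """Ingest each item, recording the outcome instead of stopping on errors."""
    report = IngestReport()
    for item_id in item_ids:
        if exists(item_id):
            # Assumed policy: leave existing items untouched.
            report.skipped.append(item_id)
            continue
        try:
            ingest_one(item_id)
            report.created.append(item_id)
        except Exception as err:  # collect the failure and continue the batch
            report.failed[item_id] = str(err)
    return report


# Toy stand-ins for the repository lookup and the per-item ingest.
already_in_repository = {"item-1"}


def fake_ingest(item_id: str) -> None:
    if item_id == "item-3":
        raise ValueError("no metadata record found")


report = run_ingest(
    ["item-1", "item-2", "item-3"],
    exists=lambda item_id: item_id in already_in_repository,
    ingest_one=fake_ingest,
)
print(report)
```

Collecting failures per item, rather than aborting, suits batch work where one bad record should not block the rest of a collection.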
27
Sample collection config file
Values: Hoagy Carmichael Correspondence; paged; hoagy; iudl:6; {path to master images}.tif; {path to derivative images here} -thumb.jpg -screen.jpg -full.jpg; {path to ead}......
Annotations: Collection defn; File defn; Desc. metadata; Tech. metadata; What to do if item exists
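To make the shape of such a configuration concrete, the sketch below models it as a Python dataclass. Every field name is hypothetical, since the slide's markup did not survive. The interpretation of `hoagy` as a short name and `iudl:6` as a parent PID is a guess, the `example/...` directories are placeholders rather than the elided paths, and `"skip"` is an assumed value for the "what to do if item exists" setting.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List


@dataclass
class CollectionConfig:
    """Hypothetical model of a collection configuration (field names assumed)."""
    title: str
    content_model: str
    short_name: str
    parent_pid: str
    master_dir: str
    derivative_dir: str
    master_suffix: str = ".tif"
    derivative_suffixes: List[str] = field(
        default_factory=lambda: ["-thumb.jpg", "-screen.jpg", "-full.jpg"]
    )
    if_exists: str = "skip"  # assumed value; the slide only names the setting

    def derivatives_for(self, master_path: str) -> List[str]:
        """Map a master image path to its expected derivative paths."""
        name = PurePosixPath(master_path).name
        if not name.endswith(self.master_suffix):
            raise ValueError(f"not a master file: {master_path}")
        stem = name[: -len(self.master_suffix)]
        return [
            f"{self.derivative_dir}/{stem}{suffix}"
            for suffix in self.derivative_suffixes
        ]


config = CollectionConfig(
    title="Hoagy Carmichael Correspondence",
    content_model="paged",
    short_name="hoagy",
    parent_pid="iudl:6",
    master_dir="example/masters",          # placeholder directory
    derivative_dir="example/derivatives",  # placeholder directory
)
derivatives = config.derivatives_for("example/masters/page-0001.tif")
print(derivatives)
```

Deriving the derivative filenames from the master name plus a fixed suffix list means the config only has to state the naming pattern once per collection.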
28
Example – Sheet Music (diagram)
Labels: Ingest Config, MODS, Images, Link to Parent, Ingest Tool, Tech MD, FOXML, Fedora
Datastreams: Images, METS, RELS-EXT
29
Example – preservation package (diagram)
Labels: Ingest Config, AES31 Metadata, Audio, Link to Parent, SIP, Ingest Tool, Tech MD, FOXML, Fedora
Datastreams: Images, METS, RELS-EXT
30
Summary (comparison table)
Columns: Fez, Elated, Valet, DirIngest, Batch Modify, Admin Client, IU Tool
Rows: Ease of install, Native CM, Custom CM, Workflow neutrality, Batch ingest
31
Major difficulties in any ingest tool
Providing flexibility in “style” of content model
Matching filenames with metadata records
Indicating the sequence of files in complex objects
Abstracting over differing local metadata standards (even in our own collections)
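Two of these difficulties, matching filenames with metadata records and sequencing the files of a complex object, can be illustrated with a small sketch. It assumes a made-up naming convention of `<item-id>-<page-number>.tif`. Real collections vary, which is exactly the problem the slide describes. The function `group_pages` and the sample identifiers are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed convention: "<item-id>-<page-number>.tif", e.g. "hc0042-3.tif".
PAGE_FILE = re.compile(r"^(?P<item>.+)-(?P<page>\d+)\.tif$")


def group_pages(
    filenames: Iterable[str], known_items: Set[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group page files by metadata record id and order them by page number.

    Returns the ordered pages per item and the files that matched no record.
    """
    pages = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = PAGE_FILE.match(name)
        if match and match.group("item") in known_items:
            pages[match.group("item")].append((int(match.group("page")), name))
        else:
            unmatched.append(name)
    # Sort numerically so that page 10 follows page 2, not page 1.
    ordered = {
        item: [name for _, name in sorted(entries)]
        for item, entries in pages.items()
    }
    return ordered, unmatched


files = ["hc0042-10.tif", "hc0042-2.tif", "hc0042-1.tif", "hc0043-1.tif", "notes.txt"]
ordered, unmatched = group_pages(files, known_items={"hc0042", "hc0043"})
print(ordered)
print(unmatched)
```

Returning the unmatched files explicitly gives the ingest report something to show when a filename cannot be tied to any metadata record.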
32
Topics for future discussion
What is the best structure for an ingest tool?
▫ Is our tool of interest to others?
▫ Would it be better to combine our capabilities with an existing tool?
Can we agree on some core content models?
33
Thank You!
Infrastructure project wiki:
▫ http://wiki.dlib.indiana.edu/confluence/display/INF
Contact info:
▫ Ryan Scherle rscherle@indiana.edu
▫ Muzaffer Ozakca mozakca@indiana.edu