eSciDoc, ViRR and Digitization Lifecycle: insights into an infrastructure for management of digitized resources
Natasa Bulatovic
Max Planck Digital Library, Research and Development
This work is licensed under a Creative Commons Attribution 2.0 Germany License.
The Max Planck Digital Library (MPDL) in a Nutshell
● The Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG)
● The MPG consists of about 80 institutes in three scientific sections
○ the Chemistry, Physics and Technology Section
○ the Biology and Medicine Section
○ the Human Sciences Section
● The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data
● The MPDL develops software solutions in close cooperation with scientists, librarians and technicians
● In the Human Sciences Section, several institutes have digitized cultural artefacts and want to make them available as open access
eSciDoc SOA Landscape
Which data are managed?
How?
● PubMan – Publication management
● ViRR – Management of textual digitized resources
● Imeji – Image management
PubMan: Management of publications
ViRR is about
● Collaboration of the MPDL with the Max Planck Institute for European Legal History
● Motivation: the period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.
ViRR Key features
● Web-based collaborative application
● Editor (bibliographic metadata, table of contents and structural metadata)
● Viewer (online representation)
● Browser
ViRR Editor
● Combines a set of tools
○ Paginator
○ Table of Contents Editor
○ Metadata Editor
● One complex, but flexible workspace
● No default order for the usage of the tools
ViRR Editor - Paginator
● Assign the logical page numbers to the physical ones
● Choose between different formats (Arabic, Latin, custom)
● Paginate manually or automatically
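Automatic pagination can be pictured as a mapping from physical scans, in their scan order, to consecutive logical page labels. The following is a minimal sketch under stated assumptions, not ViRR's actual implementation: the function names `to_roman` and `paginate` are hypothetical, and the "Latin" format is interpreted here as Roman numerals.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def paginate(scans, start=1, fmt="arabic"):
    """Assign consecutive logical page labels to physical scans, in order.

    `fmt` is "arabic" or "roman" (hypothetical option names).
    """
    render = {"arabic": str, "roman": to_roman}[fmt]
    return {scan: render(start + i) for i, scan in enumerate(scans)}


# Front matter numbered in Roman numerals, starting at iii.
front_matter = paginate(["scan_001", "scan_002", "scan_003"], start=3, fmt="roman")
```

Manual pagination would then amount to overriding individual entries of the returned mapping.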
ViRR Editor - ToC Editor
● Capture the logical structure of a work by breaking it down into structural elements
● Arrange the hierarchical order of structural elements in the tree
● Assign scans to structural elements
● Choose from fine-grained structural element types (over sixty)
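The tree of structural elements with assigned scans can be sketched as a small recursive data structure. This is an illustration only: the class `StructuralElement`, its fields and the element type names are assumptions, not the ViRR data model.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralElement:
    """One node of the table of contents (hypothetical model)."""
    kind: str                                      # e.g. "chapter"; ViRR offers over sixty types
    title: str
    scans: list = field(default_factory=list)      # scans assigned to this element
    children: list = field(default_factory=list)   # nested structural elements

    def add(self, child):
        """Append a child element and return it, so calls can be chained."""
        self.children.append(child)
        return child


def outline(element, depth=0):
    """Return the tree as indented lines, one per structural element."""
    lines = ["  " * depth + f"{element.kind}: {element.title} ({len(element.scans)} scans)"]
    for child in element.children:
        lines.extend(outline(child, depth + 1))
    return lines


work = StructuralElement("work", "Collected ordinances")
chapter = work.add(StructuralElement("chapter", "Chapter 1", scans=["scan_003", "scan_004"]))
chapter.add(StructuralElement("section", "Preamble", scans=["scan_003"]))
```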
ViRR Editor – Metadata Editor
● Assign descriptive metadata to structural elements
● Detailed description of every structural element
● Systematic browsing
● Dedicated search will be possible
ViRR Viewer
● Browse by scan
● Browse by ToC
● Navigate to page
● View metadata of structural element
● Page (web resolution)
● Page (full resolution) on click
ViRR: Sharing and reuse
From ViRR to Digitization Lifecycle
● Project goal: support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform
● Partners
○ MPI for European Legal History, Frankfurt
○ Kunsthistorisches Institut, Florence (KHI)
○ Bibliotheca Hertziana, Rome
○ MPI for Human Development, Berlin
● Related projects: ViRR (see XML-Workflow (see
Imeji: Management of image collections
Imeji: repository of digital images
● Organized into
○ Collections: created and defined by the institution, project or working group
○ Albums: created and defined by the researcher
Imeji: what is so different about it?
● Imeji is not Flickr, nor Facebook...
● Freely definable metadata profiles at collection level
● Controlled vocabularies may be integrated
● Smart search for dates and ranges (based on the metadata type)
● Helps to gather metadata more effectively
● Focuses on collaboration and metadata quality
● Repository: data can be exported at any time
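A typed metadata profile is what makes type-aware search possible: once a field is known to hold a date or a number, a range query is well defined. The sketch below illustrates the idea only. The profile layout, the field names and the functions `validate` and `search_range` are assumptions, not Imeji's API.

```python
from datetime import date

# Hypothetical collection-level profile: field name -> expected Python type.
PROFILE = {"title": str, "taken_on": date, "height_cm": float}


def validate(record, profile):
    """Return the names of fields whose values do not match the profile type."""
    return [
        name for name, expected in profile.items()
        if name in record and not isinstance(record[name], expected)
    ]


def search_range(records, field_name, low, high):
    """Return records whose `field_name` value lies in [low, high]."""
    return [
        r for r in records
        if field_name in r and low <= r[field_name] <= high
    ]


images = [
    {"title": "Facade", "taken_on": date(1900, 5, 1)},
    {"title": "Portal", "taken_on": date(1950, 1, 1)},
    {"title": "Undated sketch"},
]
around_1900 = search_range(images, "taken_on", date(1890, 1, 1), date(1910, 12, 31))
```

Records without the field are skipped rather than treated as errors, which suits collections whose metadata is still being gathered.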
eSciDoc and other services
eSciDoc SOA Landscape
eSciDoc core infrastructure
● Areas: Resources & Data, Statistics, Security
● Set Handler (OAI-PMH), Admin Handler, Aggregation Definition Handler, Statistics Data Handler, Scope Handler, Report Handler, Report Definition Handler, Item Handler, Container Handler, Context Handler, Organizational Unit Handler, Content Model Manager, User Account Handler, Role Handler, Group Handler, Content Relation Handler
CoNE Service
● Manages named entities
○ Journals
○ Persons
○ Dewey Decimal Classification (3 public levels)
○ Creative Commons Licenses (CC licenses)
○ ISO Languages
○ MIME Types
○ PACS classification
○ Custom classifications
● Reuse
○ Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list)
● Motivation
○ Metadata quality: autosuggest components in solutions during metadata editing
○ Disambiguation: each entity is a named graph
○ Data linking: CoNE identifiers in publication metadata
○ Technical facilitation: all lists in one place
○ Persons: Researcher Portfolio
● Extensions
○ Refresh data from external sources
CoNE – Control of Named Entities
● Content negotiation supported
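With content negotiation, a client asks for a representation of the same entity URL by sending an HTTP `Accept` header. The sketch below only builds such a request and does not send it. The entity URL is a placeholder, and the mapping of CoNE's formats to these standard media types is an assumption.

```python
from urllib.request import Request

# Standard media types; which ones CoNE actually serves is assumed here.
ACCEPT = {
    "json": "application/json",
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def cone_request(entity_url, fmt):
    """Build (but do not send) a request for one representation of an entity."""
    return Request(entity_url, headers={"Accept": ACCEPT[fmt]})


# Placeholder URL, not a real CoNE address.
req = cone_request("https://cone.example.org/persons/resource/123", "rdf")
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup against a real service.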
Transformation Service
● Transforms textual data formats
○ Metadata
○ Resources
○ Standard formats
○ Specific formats (e.g. EndNote custom fields)
● Motivation
○ Migration of data from MPI
○ Exports and dissemination
○ Imports
○ Continuous interoperability enhancement
○ Implement once, use wherever needed
Search&Export Service
● Searches and exports results
● Citation styles (Citation style manager)
○ EndNote
○ BibTeX
○ …
● Reuse
○ Data delivered in multiple formats (PDF, HTML, XML, ODT)
○ By external systems (content management, WordPress)
● Motivation
○ Search results should be available in various outputs
○ One service – many presentations (e.g. WordPress plug-in)
○ One interface – easy inclusion of various export formats
Syndication Service
● Provides the latest data updates
● RSS
● Atom
● Feeds: Recent releases in repository, Recent releases in repository (item versions), …
● Diagram: Syndication Service and eSciDoc Repository; step 2: Get feed definition; step 3: Search/retrieve items
● Reuse
○ Subscription to feeds and data reuse
○ By any external clients
● Extensions
○ Media RSS
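An external client can reuse such a feed by parsing its entries. The following is a minimal sketch using the Python standard library and a made-up Atom document. The feed content and the function name `latest_entries` are assumptions. Only the Atom namespace is taken from the Atom standard.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation

# Made-up sample feed; a real client would download it from the service.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent releases in repository</title>
  <entry>
    <title>Item A</title>
    <id>urn:example:item-a</id>
    <updated>2011-05-02T10:00:00Z</updated>
  </entry>
  <entry>
    <title>Item B</title>
    <id>urn:example:item-b</id>
    <updated>2011-05-03T09:30:00Z</updated>
  </entry>
</feed>"""


def latest_entries(feed_xml):
    """Return feed entries as dicts, newest first."""
    root = ET.fromstring(feed_xml)
    entries = [
        {
            "id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "updated": entry.findtext(ATOM + "updated"),
        }
        for entry in root.iter(ATOM + "entry")
    ]
    # Timestamps in the same UTC format sort correctly as plain strings.
    return sorted(entries, key=lambda e: e["updated"], reverse=True)


newest_first = latest_entries(SAMPLE_FEED)
```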
Validation service
● Semantic validation
● Contextual validation
● Validation rule editor (upcoming)
Data acquisition service
● Fetches data from known sources via identifier (unAPI interface)
● Transforms data to other formats
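unAPI defines three request forms on a single endpoint: no parameters (list the supported formats), `id` only (list the formats available for that identifier), and `id` plus `format` (fetch the object). The sketch below builds these URLs. The endpoint address, the identifier and the function name `unapi_url` are placeholders, not part of the eSciDoc service.

```python
from urllib.parse import urlencode


def unapi_url(base, identifier=None, fmt=None):
    """Build one of the three unAPI request URLs for the endpoint `base`."""
    if fmt is not None and identifier is None:
        raise ValueError("unAPI needs an id whenever a format is given")
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return base + ("?" + urlencode(params) if params else "")


BASE = "https://example.org/unapi"  # placeholder endpoint
list_formats = unapi_url(BASE)
formats_for_id = unapi_url(BASE, "abc123")
fetch_object = unapi_url(BASE, "abc123", "bibtex")
```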
PubMan SWORD Server
● Deposit of data packages (metadata and full texts)
● Logic implements a PubMan-specific workflow
PID Cache manager
● Fetches Handles from the GWDG Handle System (dummy resolution)
● Assigns a pre-fetched Handle to the resource
● Synchronizes the assigned Handle with the resolution to a resource in the Handle System
● EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland)
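The pre-fetching idea can be sketched as a small pool: identifiers are requested in batches ahead of time, so assigning one to a resource does not have to wait for the external service. This is an illustration under stated assumptions, not the eSciDoc implementation. The class `PidCache`, its parameters and the `fetch_handles` callable standing in for the Handle service are all hypothetical.

```python
from collections import deque


class PidCache:
    """Pool of pre-fetched handles (hypothetical sketch)."""

    def __init__(self, fetch_handles, low_water=2, batch=5):
        self._fetch = fetch_handles   # callable(n) -> list of n handle strings
        self._low_water = low_water   # refill when fewer handles than this remain
        self._batch = batch
        self._pool = deque()
        self.assigned = {}            # handle -> resource URL

    def _refill(self):
        if len(self._pool) < self._low_water:
            self._pool.extend(self._fetch(self._batch))

    def assign(self, resource_url):
        """Take a pre-fetched handle and bind it to the resource."""
        self._refill()
        handle = self._pool.popleft()
        # A real system would now update the handle's target in the Handle System.
        self.assigned[handle] = resource_url
        return handle


# Stand-in for the external service: returns numbered dummy handles.
_counter = iter(range(1000))


def fake_fetch(n):
    return [f"hdl:test/{next(_counter)}" for _ in range(n)]


cache = PidCache(fake_fetch, low_water=1, batch=3)
first = cache.assign("https://example.org/item/1")
second = cache.assign("https://example.org/item/2")
```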
A note on the metadata profiles
● DCAP based (Dublin Core Application Profile)
● DC terms (identified by URIs)
● eSciDoc solution-specific terms (identified by URIs)
● METS/MODS
● Publicly available
● Functional description
● Schemas
● Interoperability levels
● Shared term definitions (done)
● Semantic interoperability (done)
● Description set syntactic interoperability (prepared)
● Description set profile interoperability (prepared)
Premises
● Applications
○ Web-based
○ Internationalized
○ Integrated Help system
○ Easy to use
○ Easy to install
● Services and infrastructure
○ Reusable, interoperable, composed, technology-independent
○ Extensible, scalable and performant
● Data
○ Persistently identified, versioned, discoverable, provenance and authenticity information, fine-grained authorization
○ Described with published metadata profiles
○ Interoperable and enabled for reuse and repurpose
Related projects and new developments
● DARIAH Digital Research Infrastructure for Arts and Humanities (see
● Imeji
● AWOB Astronomers Workbench
● Resource Registries
● ECHO – European Cultural Heritage Online (see )
Thank you!