eSciDoc – Object and content modelling experiences
  Natasa Bulatovic, MPDL
  n.bulatovic@mpdl.mpg.de
eSciDoc Development teams
  FIZ Team: Fachinformationszentrum Karlsruhe
  MPDL Team: Max Planck Digital Library, Munich
  Service Management (SvM)
  User interface engineering (GUI)
  Software Development (DEV)
Max Planck Society: Subject domains
  Biology and Medicine Section
    Developmental and Evolutionary Biology/Genetics
    Immunobiology and Infection Biology/Medicine
    Cognition Research
    Microbiology/Ecology
    Neurosciences
    Plant Research
    Structural and Cell Biology
  Chemistry, Physics and Technology Section
    Astronomy/Astrophysics
    Chemistry
    Solid State Research/Material Sciences
    Earth Sciences and Climate Research
    High Energy and Plasma Physics/Quantum Optics
    Computer Science/Mathematics/Complex Systems
  Humanities Section
    Cultural Studies
    Jurisprudence
    Social and Behavioral Sciences
  20,000 potential users (scientists, librarians, research groups, project groups) in the 80 research institutes of the Max Planck Society
eSciDoc initial focus
  Main usage scenarios
    Publication data (institutional repository, metadata management, quality assurance workflows)
    Research data (digital collections of images, multimedia content, cooperative authoring and annotating)
  Common functions that have to be supported
    Generalized and unified data model for all content resources
    Content modeling of resources as a specialization instrument
    Versioning
    Persistent identifiers (PID)
    User management, authentication and authorization
Service orientation means:
  Reusability: regardless of whether immediate reuse opportunities exist, services are designed to support potential reuse.
  Formalized contracts: to interact, services need not share anything but a formal contract that describes each service and defines the terms of information exchange.
  Loose coupling: services must be designed to interact without tight, cross-service dependencies.
  Abstraction of logic: the only part of a service that is visible to the outside world is what is exposed via the service contract. Underlying logic, beyond what is expressed in the descriptions that comprise the contract, is invisible and irrelevant to service requestors.
  Composability: services may compose other services. This allows logic to be represented at different levels of granularity and promotes reusability and the creation of abstraction layers.
  Autonomy: the logic governed by a service resides within an explicit boundary. The service has control within this boundary and does not depend on other services to execute its governance.
  Statelessness: services should not be required to manage state information, as that can impede their ability to remain loosely coupled. Services should be designed to maximize statelessness, even if that means deferring state management elsewhere.
  Discoverability: services should allow their descriptions to be discovered and understood by humans and by service requestors that may be able to make use of their logic.
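The contract, loose-coupling and statelessness principles can be illustrated with a minimal Python sketch. The names `ItemService`, `InMemoryItemService` and `title_of` are hypothetical and invented for illustration only; they are not part of the eSciDoc API.

```python
from typing import Dict, Protocol


class ItemService(Protocol):
    """Hypothetical formal contract: the only thing a requestor may rely on."""

    def retrieve(self, item_id: str) -> dict: ...


class InMemoryItemService:
    """One possible provider; its storage is hidden behind the contract."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self._store = store  # data lives in the store, not in a client session

    def retrieve(self, item_id: str) -> dict:
        # Each call carries everything it needs (stateless interaction).
        return dict(self._store[item_id])


def title_of(service: ItemService, item_id: str) -> str:
    """A requestor coupled only to the contract, not to a concrete provider."""
    return service.retrieve(item_id)["title"]


service = InMemoryItemService({"demo:1": {"title": "Example item"}})
print(title_of(service, "demo:1"))
```

Any other provider that offers a compatible `retrieve` method could be passed to `title_of` without changing the requestor.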
eSciDoc core services
  “We need a high level of abstraction and some common functions …”
  Context handler
  Item handler
  Container handler
  Organizational units handler
  Role handler
  Content model handler
  Semantic store handler
  ...
  Service types: data-centric (CRUD), logic-centric (versioning, release, withdraw, etc.)
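The difference between data-centric and logic-centric operations can be sketched as follows. This is an in-memory illustration, not the eSciDoc item handler: the class `ItemHandler`, the status names (`pending`, `released`, `withdrawn`) and the transition rules are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    item_id: str
    versions: List[dict] = field(default_factory=list)  # one metadata snapshot per version
    status: str = "pending"  # assumed status names: pending, released, withdrawn


class ItemHandler:
    """Hypothetical in-memory stand-in for an item handler."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    # Data-centric operations (CRUD)
    def create(self, item_id: str, metadata: dict) -> Item:
        item = Item(item_id, [dict(metadata)])
        self._items[item_id] = item
        return item

    def retrieve(self, item_id: str) -> Item:
        return self._items[item_id]

    def update(self, item_id: str, metadata: dict) -> int:
        item = self._items[item_id]
        if item.status == "withdrawn":  # assumed rule
            raise ValueError("withdrawn items cannot be updated")
        item.versions.append(dict(metadata))  # versioning: keep every snapshot
        return len(item.versions)  # number of the new version

    def delete(self, item_id: str) -> None:
        del self._items[item_id]

    # Logic-centric operations
    def release(self, item_id: str) -> None:
        self._items[item_id].status = "released"

    def withdraw(self, item_id: str) -> None:
        item = self._items[item_id]
        if item.status != "released":  # assumed rule
            raise ValueError("only released items can be withdrawn")
        item.status = "withdrawn"


handler = ItemHandler()
handler.create("demo:1", {"title": "Draft"})
version = handler.update("demo:1", {"title": "Final"})
handler.release("demo:1")
print(version, handler.retrieve("demo:1").status)
```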
eSciDoc intermediate services
  “... but we also need some more added functionality …”
  Duplicate detection
  Image handling
  Metadata handler
  Validation of data
  Retrieval/download statistics
  Workflow management
  Service types: functionality-adding services, technology gateways, adapters, façades
eSciDoc application services
  “… we still need to create additional services …”
  Depositing
  Publishing
  Quality assurance
  Citation manager
  Export manager
  SearchAndOutput
  Controlled vocabularies
  Service types: process-centric, or simply public services shared with partner institutions
Towards specialization
  “... and in addition end user interfaces to manage different content …”
    Publication items (at least one author must exist)
    Face images (metadata describe the object in the image)
    Digitized books (law books from the Holy Roman Empire)
    Language resources (description of languages, features, other resources)
  “… different metadata schemas and controlled vocabularies …”
    Metadata profiles management
    Validation rules specific to metadata schema, context and content model
  “… with different workflows …”
    Publication management workflow (the “depositing” service enriches the core workflow)
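Content-model-specific validation rules can be sketched as a small rule registry. Only the "at least one author" rule comes from the slide; the `RULES` table, the content model names and the title rule are assumptions added for illustration.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], List[str]]  # a rule returns a list of error messages


def require_author(metadata: dict) -> List[str]:
    # Rule taken from the slide: a publication item needs at least one author.
    return [] if metadata.get("authors") else ["at least one author must exist"]


def require_title(metadata: dict) -> List[str]:
    # Assumed extra rule, added only to show several rules per content model.
    return [] if metadata.get("title") else ["title is missing"]


# Hypothetical registry: content model name -> validation rules.
RULES: Dict[str, List[Rule]] = {
    "publication": [require_title, require_author],
    "face-image": [require_title],
}


def validate(content_model: str, metadata: dict) -> List[str]:
    """Run every rule registered for the content model and collect errors."""
    errors: List[str] = []
    for rule in RULES.get(content_model, []):
        errors.extend(rule(metadata))
    return errors


print(validate("publication", {"title": "A paper", "authors": []}))
```

Keeping the rules in a table keyed by content model lets a generic validation service stay unchanged when a new solution adds its own rules.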
eSciDoc solutions Publication management Image collections Scanned books
eSciDoc project landscape
Solution development: towards data specialization
  Services work with generic object patterns such as Items, Containers, Metadata
  Solutions work with specific content models such as Publication Items, Albums, Law Books
  Solutions require different metadata sets such as Publication metadata, Face image metadata, Law Book and Page metadata
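A minimal sketch of how a generic object pattern can carry solution-specific metadata sets. The content model identifiers, the metadata set names and the `REQUIRED_METADATA` table are hypothetical and chosen only to mirror the examples on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Generic object pattern: an identifier, a content model, metadata records."""
    item_id: str
    content_model: str
    metadata: Dict[str, dict] = field(default_factory=dict)  # metadata set name -> record


# Hypothetical profiles: which metadata sets each content model expects.
REQUIRED_METADATA: Dict[str, List[str]] = {
    "publication-item": ["publication"],
    "faces-item": ["face-image"],
    "law-book-page": ["page"],
}


def missing_metadata(item: Item) -> List[str]:
    """Generic check that works for any content model listed in the table."""
    required = REQUIRED_METADATA.get(item.content_model, [])
    return [name for name in required if name not in item.metadata]


item = Item("demo:1", "faces-item", {"face-image": {"description": "portrait"}})
print(missing_metadata(item))
```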
eSciDoc generic data model
Publication data model
Publication item (styled XML)
  http://coreservice.mpdl.mpg.de:8080/ir/item/escidoc:28236
  Please use http://test-pubman.mpdl.mpg.de/ to check the latest available demo data.
Publication Item (View)
Faces data model Faces Album Faces Item
Faces Album and Item (XML)
  Faces Album: http://test-faces.mpdl.mpg.de/faces/album/escidoc:12005
  Faces Item: http://test-faces.mpdl.mpg.de/faces/details/escidoc:5345
  NOTE: Links may not be functional at all times, as the data are for demo purposes.
  Please use http://test-faces.mpdl.mpg.de/ to check the latest available demo data.
Faces Album (View)
Faces Item (View)
VIRR data model
  Containers: CM1: Multivolume, CM2: Volume
  Items: CM1: Scanned Page, CM2: Table of Contents
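The VIRR hierarchy can be sketched as nested containers and items. This is an illustration under assumptions: the content model identifiers and the demo IDs are invented, and the nesting (a multivolume holding volumes, a volume holding pages and a table of contents) is inferred from the content model names.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Item:
    item_id: str
    content_model: str  # "scanned-page" or "toc" (assumed identifiers)


@dataclass
class Container:
    container_id: str
    content_model: str  # "multivolume" or "volume" (assumed identifiers)
    members: List[Union["Container", Item]] = field(default_factory=list)


def pages(node: Union[Container, Item]) -> Iterator[Item]:
    """Depth-first walk that yields every scanned page below a node."""
    if isinstance(node, Item):
        if node.content_model == "scanned-page":
            yield node
        return
    for member in node.members:
        yield from pages(member)


work = Container("demo:mv", "multivolume", [
    Container("demo:v1", "volume", [
        Item("demo:toc1", "toc"),
        Item("demo:p1", "scanned-page"),
        Item("demo:p2", "scanned-page"),
    ]),
    Container("demo:v2", "volume", [Item("demo:p3", "scanned-page")]),
])
print([p.item_id for p in pages(work)])
```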
VIRR multivolume, volume (XML)
  Multivolume:
    http://dev-coreservice.mpdl.mpg.de:8080/ir/container/escidoc:88279
    http://dev-coreservice.mpdl.mpg.de:8080/ir/container/escidoc:88279:10/struct-map
  Volumes:
    http://dev-coreservice.mpdl.mpg.de:8080/ir/container/escidoc:88912 (has TOC item)
    http://dev-coreservice.mpdl.mpg.de:8080/ir/container/escidoc:88913
  Scanned page: http://dev-coreservice.mpdl.mpg.de:8080/ir/item/escidoc:16149
  TOC: http://dev-coreservice.mpdl.mpg.de:8080/ir/item/escidoc:139789
  NOTE: Links may not be functional at all times, as these data are for development purposes. See http://test-virr.mpdl.mpg.de/ to check whether demo data are available, or http://virr.mpdl.mpg.de for the latest version of the VIRR solution.
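The links above follow a visible pattern, which the following sketch reproduces. The helper `resource_url` is hypothetical, and reading the trailing `:10` as a version number is an assumption, not something the slides state. The function only builds strings and performs no network access.

```python
from typing import Optional


def resource_url(base: str, kind: str, object_id: str,
                 version: Optional[int] = None,
                 sub_resource: Optional[str] = None) -> str:
    """Build a URL of the form <base>/ir/<kind>/<id>[:<version>][/<sub-resource>]."""
    if kind not in ("item", "container"):
        raise ValueError(f"unsupported resource kind: {kind}")
    url = f"{base.rstrip('/')}/ir/{kind}/{object_id}"
    if version is not None:
        url += f":{version}"  # assumption: the trailing number is a version
    if sub_resource:
        url += f"/{sub_resource}"
    return url


BASE = "http://dev-coreservice.mpdl.mpg.de:8080"
print(resource_url(BASE, "container", "escidoc:88279", 10, "struct-map"))
print(resource_url(BASE, "item", "escidoc:16149"))
```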
VIRR Multivolume, volume (View)
VIRR Volume (View)
VIRR Page (View)
eSciDoc services and solutions: What next?
  Extending existing solutions and services
  Setting up a production environment (performance, service authorization, data ingestion)
  More data diversity
  Building more services and solutions (component reusability, tools, enhanced SOA)
  Improving data interoperability, data dissemination, data repurposing
  Community-based open source development
eSciDoc - Content model definition
  For more details, see:
    http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Content_Models
    http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Content_Model_Object
eSciDoc project resources
  eSciDoc project web pages, infrastructure download
    http://www.escidoc.org/
  MPDL collaboratory network
    http://colab.mpdl.mpg.de
  MPDL software download
    http://escidoc1.escidoc.mpg.de/projects/common_services/
    http://escidoc1.escidoc.mpg.de/projects/pubman/
    http://escidoc1.escidoc.mpg.de/projects/faces/
    http://escidoc1.escidoc.mpg.de/projects/virr/
    http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Admin
  The project web pages are a resource for developers and other interested readers. They cover the build environment, source code access, Javadocs, code analysis, etc.
Thank you! n.bulatovic@mpdl.mpg.de