Biblio-transformation-engine: An open source framework and use cases in the digital libraries domain 7th International Conference on Open Repositories.

Presentation transcript:

Biblio-transformation-engine: An open source framework and use cases in the digital libraries domain 7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012 Kostas Stamatis, Nikolaos Konstantinou, Anastasia Manta, Christina Paschou and Nikos Houssos National Documentation Centre / National Hellenic Research Foundation, Greece

Agenda
- Introduction
- Motivation - the recurring need for data transformations
- The proposed solution
- Use cases / experience reports
- Summary – conclusions – future work

Motivation
- Data transformations are needed everywhere in digital libraries / scholarly communication systems
- Painful and tedious procedure
- Many sub-tasks of the entire procedure recur and could be reused
- Need for a systematic framework for data transformations to accelerate the process, reduce errors and facilitate reuse

Analysis – basic steps in data transformations
- Retrieve source data records
- Apply processing:
  – (Optionally) Remove data records
  – (Optionally) Add/modify/delete field values within records
  – Transform data source to output format (implement the corresponding mapping)
- Generate desired output
  – Export to a file and/or directly update databases / external systems
- Need for incremental / selective data loading -> processing and output conditions may require repeated execution of the loading/processing cycle
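
The steps listed above can be sketched as a small pipeline. This is a minimal Python illustration, not the library's Java API: the function names, the CSV input, the sample rows and the target field names are all assumptions made for the example.

```python
import csv
import io
import json

# Made-up sample input; field names and values are illustrative only.
SOURCE = """id,title,year
1, Open Repositories ,2012
2,Draft record,
3,Digital Libraries,2011
"""


def load_records(text):
    """Step 1: retrieve source data records (here: rows of an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def remove_records(records):
    """Step 2a (optional): drop records, here those without a year."""
    return [r for r in records if r["year"]]


def modify_fields(records):
    """Step 2b (optional): change field values, here trimming titles."""
    return [{**r, "title": r["title"].strip()} for r in records]


def map_to_output_format(records):
    """Step 2c: map source fields to (hypothetical) target field names."""
    return [
        {"dc.identifier": r["id"], "dc.title": r["title"], "dc.date": r["year"]}
        for r in records
    ]


def generate_output(records):
    """Step 3: produce the desired output, here a JSON string."""
    return json.dumps(records, indent=2)


def run(text):
    records = load_records(text)
    records = remove_records(records)
    records = modify_fields(records)
    return generate_output(map_to_output_format(records))


if __name__ == "__main__":
    print(run(SOURCE))
```

Each step only depends on the list of records it receives, which is what makes the individual sub-tasks reusable across different sources and targets.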

Design goals
- Customisable, non-intrusive, easy to use, integrate and extend (e.g. support a variety of data source types)
- Separation of concerns in development
  – e.g. development of transformation logic independent of data sources
  – Example: No need to be aware of MARC to develop a function to harmonise encoding of dates
- Support for recurring execution of the data loading/processing cycle according to specific criteria (e.g. useful for OAI-PMH)
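
The date example above can be illustrated with a short Python sketch. It assumes a hypothetical record interface with `get_values` and `set_values`; the two record classes and the tag/subfield pair used by the MARC-like adapter are placeholders for illustration, not the library's actual types.

```python
import re


class DictRecord:
    """Hypothetical record: neutral field name -> list of string values."""

    def __init__(self, fields):
        self._fields = {name: list(values) for name, values in fields.items()}

    def get_values(self, field):
        return list(self._fields.get(field, []))

    def set_values(self, field, values):
        self._fields[field] = list(values)


class MarcLikeRecord:
    """Hypothetical adapter exposing a tag/subfield pair under a neutral name."""

    FIELD_TO_TAG = {"date": ("260", "c")}  # illustrative choice of tag/subfield

    def __init__(self, datafields):
        self._datafields = datafields  # {(tag, subfield): [values]}

    def get_values(self, field):
        return list(self._datafields.get(self.FIELD_TO_TAG[field], []))

    def set_values(self, field, values):
        self._datafields[self.FIELD_TO_TAG[field]] = list(values)


YEAR = re.compile(r"(1[0-9]{3}|20[0-9]{2})")


def harmonise_dates(record, field="date"):
    """Reduce free-text dates such as 'c1998.' or '[2003]' to a four-digit year.

    The function only uses get_values/set_values, so it never needs to know
    which concrete format sits behind the record.
    """
    cleaned = []
    for value in record.get_values(field):
        match = YEAR.search(value)
        cleaned.append(match.group(1) if match else value)
    record.set_values(field, cleaned)
    return record
```

The same `harmonise_dates` function works on both record types, which is the separation of concerns the slide describes: format knowledge stays in the adapter, transformation logic stays in the function.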

The biblio transformation engine

Components of the engine
- Data Loader: Retrieves data from input source(s) according to DataLoading Spec
- ProcessingStep: Transforms input in some way
  – Filter: removes records according to specific criteria
  – Modifier: updates records according to specific criteria
  – Initializer: initializes data in processing steps (e.g. load author names to Filter)
- Output Generator: Creates the desired output (e.g. export file, direct update of database)
- Record abstraction: simple common interface for all types of records that allows complex transformation functions
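
To make the roles of Filter, Modifier and Initializer concrete, here is a minimal Python sketch. The class and method names (`keep`, `modify`, `initialize`, `apply_steps`) are assumptions chosen for the example and do not reproduce the library's Java interfaces; records are plain dictionaries of value lists.

```python
from abc import ABC, abstractmethod


class ProcessingStep(ABC):
    """Common base for steps; initialize() plays the role of the Initializer."""

    def initialize(self):
        pass


class Filter(ProcessingStep):
    @abstractmethod
    def keep(self, record):
        """Return True if the record should stay in the result."""


class Modifier(ProcessingStep):
    @abstractmethod
    def modify(self, record):
        """Return an updated copy of the record."""


class AuthorFilter(Filter):
    """Keeps records with at least one author from a list loaded at init time."""

    def __init__(self, load_names):
        self._load_names = load_names  # callable supplying the author names
        self._names = set()

    def initialize(self):
        self._names = set(self._load_names())

    def keep(self, record):
        return any(author in self._names for author in record.get("author", []))


class TrimTitles(Modifier):
    """Strips surrounding whitespace from every title value."""

    def modify(self, record):
        return {**record, "title": [t.strip() for t in record.get("title", [])]}


def apply_steps(records, steps):
    """Run the steps one after another over the whole list of records."""
    for step in steps:
        step.initialize()
        if isinstance(step, Filter):
            records = [r for r in records if step.keep(r)]
        else:
            records = [step.modify(r) for r in records]
    return records
```

In this sketch the author names are injected as a callable, so the filter itself does not care whether they come from a file, a database or a hard-coded list.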

Processing workflow
- Load data
  – transform input to records
- If processing conditions are met, begin processing
  – sequential execution of Filters and Modifiers
- If output conditions are met, begin output
  – execution of OutputGenerator(s)
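
One plausible reading of this workflow is a loop that reloads data with an adjusted loading specification until the output condition holds. The Python sketch below is an assumption-laden illustration: `LoadingSpec`, `run_cycle`, the offset/limit fields and the concrete conditions are invented for the example and are not taken from the library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoadingSpec:
    """Hypothetical loading spec: which slice of the source to load next."""
    offset: int
    limit: int


def load(source, spec):
    """Load one batch of records as described by the spec."""
    return source[spec.offset: spec.offset + spec.limit]


def run_cycle(source, spec, keep, wanted):
    """Repeat load/process until `wanted` records passed the filter.

    Processing condition (assumed): the batch is not empty.
    Output condition (assumed): enough records have been collected.
    """
    collected = []
    while True:
        batch = load(source, spec)
        if not batch:  # processing condition not met: nothing left to load
            break
        collected.extend(r for r in batch if keep(r))  # apply the filter
        if len(collected) >= wanted:  # output condition met
            break
        # Output condition not met: modify the loading spec and load again.
        spec = replace(spec, offset=spec.offset + spec.limit)
    return collected[:wanted], spec
```

For example, with a source of the numbers 1 to 20, a batch size of 4 and a filter that keeps multiples of five, asking for two records takes three load cycles before output is generated.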

Processing workflow (flowchart; nodes: Load source data, Processing conditions OK, Apply Filters & Modifiers, Output conditions OK, Generate output, Modify LoadingSpec; edge labels: YES, NO)

The transformation engine – data model

Implementation
- FLOSS library developed in Java (Maven used as a build tool)
- Configuration outside the code: dependency injection mechanisms of the Spring framework core container
  – Specification of Data Loader, Processing Steps, Conditions, OutputGenerator
  – Mapping from source to target format (for one-to-one field mappings)
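
The idea of keeping one-to-one field mappings outside the code can be sketched in a few lines. The library itself relies on Spring configuration; the Python analogue below uses a JSON document instead, and the source field names, the target field names and the `apply_mapping` helper are assumptions made for illustration.

```python
import json

# Hypothetical external configuration: source field -> target field.
MAPPING_CONFIG = """
{
  "title": "dc.title",
  "author": "dc.contributor.author",
  "year": "dc.date.issued"
}
"""


def apply_mapping(record, mapping):
    """Copy values from source fields to their configured target fields.

    Source fields without a mapping entry are ignored; records are
    dictionaries of field name -> list of values.
    """
    target = {}
    for source_field, target_field in mapping.items():
        if source_field in record:
            target.setdefault(target_field, []).extend(record[source_field])
    return target


if __name__ == "__main__":
    mapping = json.loads(MAPPING_CONFIG)
    source_record = {"title": ["A title"], "author": ["A. Author"], "pages": ["12"]}
    print(apply_mapping(source_record, mapping))
```

Changing the target schema then means editing the configuration only; the mapping function stays untouched.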

Example of mapping configuration

FLOSS library
- Available under the European Union Public License
- Feel free to download and use it!
- Looking forward to feedback, questions,… (contributions also welcome)

Use case 1 – Generate Linked Open Data
- Sources: Repository records, legacy cultural material records, research information in CERIF
- Corresponding data loaders reused
- Filters/Modifiers can be totally agnostic of RDF and input formats
- Use Jena RDF library to generate RDF triples
- Add/generate appropriate identifiers/URIs for each entity (either at the modifier or output generator level)
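
As a rough illustration of the output generator's job in this use case, the Python sketch below turns a record into N-Triples lines using only string formatting. It does not use Jena; the `http://example.org/` namespace, the record layout and the choice of Dublin Core terms are assumptions made for the example.

```python
from urllib.parse import quote

BASE = "http://example.org/"          # placeholder namespace for minted URIs
DCT = "http://purl.org/dc/terms/"     # Dublin Core terms namespace


def uri(kind, local_id):
    """Mint a URI for an entity from its kind and a local identifier."""
    return f"{BASE}{kind}/{quote(str(local_id), safe='')}"


def literal(text):
    """Serialise a plain string literal with basic N-Triples escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(record):
    """Return one N-Triples line per statement about the record."""
    subject = f"<{uri('record', record['id'])}>"
    lines = [f"{subject} <{DCT}title> {literal(record['title'])} ."]
    for author in record.get("authors", []):
        lines.append(f"{subject} <{DCT}creator> <{uri('person', author)}> .")
    return lines


if __name__ == "__main__":
    sample = {"id": 42, "title": "A sample title", "authors": ["Doe, J."]}
    print("\n".join(to_ntriples(sample)))
```

Because URI minting lives in one small function, it could equally be called from a modifier or from the output generator, which mirrors the choice mentioned on the slide.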

Use case 2 – Import/export data to/from repositories
- Source record formats: EndNote, RIS, BibTeX, UNIMARC
- Developed data loaders for each format, reused output generator for DSpace
- Export to different formats and reference styles based on repository records
  – Implemented for DSpace
  – For reference styles uses the citeproc-js library and the Citation Style Language (CSL)
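
A format-specific data loader can be quite small. The Python sketch below parses RIS text into records of tag -> list of values; it is a simplified illustration (no continuation lines, no validation), and the function name and sample entry are made up rather than taken from the library.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, and the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def load_ris(text):
    """Parse RIS text into a list of records (tag -> list of values).

    A record starts at a TY line and ends at an ER line; lines outside a
    record or not matching the tag pattern are skipped.
    """
    records, current = [], None
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "TY":
            current = {"TY": [value]}
        elif tag == "ER":
            if current is not None:
                records.append(current)
                current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records


if __name__ == "__main__":
    # Made-up sample entry for demonstration.
    sample = "TY  - JOUR\nAU  - Doe, J.\nAU  - Roe, R.\nTI  - A sample title\nPY  - 2012\nER  - \n"
    print(load_ris(sample))
```

Once every loader returns the same record shape, a single output generator (for example one targeting DSpace) can serve all input formats.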

Use case 3 – Feed the VOA3R aggregator
- Get records of the Hellenic National Archive of Doctoral Dissertations (HEDI – didaktorika.gr) to the VOA3R aggregator (Virtual Open Access Agriculture & Aquaculture Repository)
- Developed subject-based filter and injected it into an enhanced OAI-PMH server using the library.
- ~1070 of approximately records, needed to apply techniques to cater for the distribution sparsity of “suitable” records combined with resumption token
- Seamless on-the-fly deployment and co-existence with sets targeted to other aggregators (DART, openarchives.gr)
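
The slide does not say which techniques were used to handle sparse matches, so the following Python sketch shows only one possible approach: the server keeps scanning the source until a response page is full and encodes the scan position in the resumption token. The function name, parameters and token format are assumptions for illustration.

```python
def list_records(source, matches, page_size, token=None, batch_size=100):
    """Return (page, next_token) for one harvesting request.

    The token is the position in the unfiltered source after the last
    record examined, so sparse matches never produce half-empty pages
    and no record is examined twice.
    """
    position = int(token) if token else 0
    page = []
    while len(page) < page_size and position < len(source):
        batch = source[position: position + batch_size]
        for record in batch:
            position += 1
            if matches(record):
                page.append(record)
                if len(page) == page_size:
                    break  # stop mid-batch so the token stays exact
    # A final page may be empty if no further matches remain.
    next_token = str(position) if position < len(source) else None
    return page, next_token


if __name__ == "__main__":
    source = list(range(50))
    is_match = lambda n: n % 10 == 0  # stands in for a subject-based filter
    token = None
    while True:
        page, token = list_records(source, is_match, page_size=2, token=token, batch_size=7)
        print(page, token)
        if token is None:
            break
```

Keeping the position in the token keeps the server stateless between requests, at the cost of exposing an internal offset to the harvester.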

Use case 4 – Feed Europeana
- Include in Europeana content from the Technical Chamber of Greece (TEE)
- Records in TEE library catalog (UNIMARC), available through a Z39.50 interface
- Developed Z39.50 data loader, appropriate filters and modifiers (independent of UNIMARC)
- Mapping to ESE implemented through a modifier
- ~6800 from the TEE records sent to Europeana
- Repeatable, automated procedure through an enhanced OAI-PMH server using the library
International Conference on Theory and Practice of Digital Libraries (TPDL 2011), Berlin, September 2011

Future work
- Support more types of data transformations (contributions welcome)
- Extend declarative specification of mappings to cover more sophisticated cases
- Configurable support for common data model to facilitate reuse of Filter and Modifier implementations
- Systematically study the user experience, identify and implement potential improvements

Thank you for your attention!
More info:
- kstamatis AT ekt.gr
- nkons AT ekt.gr
- amanta AT ekt.gr
- cpaschou AT ekt.gr
- nhoussos AT ekt.gr