Batch metadata update Draft Natasa Bulatovic 14.06.2010.

Presentation transcript:

Batch metadata update Draft Natasa Bulatovic

What we need
- eSciDoc repository
- Very fast metadata updates
- RDF metadata (preferred)
- Searching, indexing
- Versioning (not a high requirement for metadata)
- AA
- Relations, linking, etc.

How we can achieve it
- eSciDoc batch metadata update is very slow.
- Metadata should therefore be kept in a separate store.
- Splitting it completely from the eSciDoc repository would be a disadvantage, however, because metadata and content would no longer be treated as a single resource.
- Drawback: this proposal covers item-level metadata only (container- and component-level metadata are not covered).

What we can use
- eSciDoc handlers
- Current services (AA, indexing, etc.)
- eSciDoc components with external-url storage

How - 1?
[Diagram]
- eSciDoc Repository: Container Handler, Item Handler (core services)
- eSciDoc Metadata Store: Metadata Handler (additional or core service)

How - 2?
[Diagram]
- eSciDoc Repository: Container Handler, Item Handler (core services); current services (AA, indexing, etc.)
- eSciDoc Metadata Store (RDF): Metadata Handler (additional or core service)
  - GET metadata/escidoc:25/metadata-records/md-rec-1
  - POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
- Item (eSciDoc components with external-url storage)
  - Component 1 (image/fulltext): internal-managed
  - Component 2 (image/fulltext): external-url
  - Component 3 (metadata record): external-url
- External content (e.g. supplementary material)
- Metadata store nodes: MD-Face-1 (happiness, young, female); MD-Face-2 (happiness, young, male)
Some items (cmodel based) would store their metadata in an eSciDoc metadata store (link to graph node in the metadata store).
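The GET and POST/PUT paths shown for the Metadata Handler can be sketched as follows. This is a minimal illustration only: the path pattern is taken from the slide, but the function `md_record_path`, the class `InMemoryMetadataHandler`, and the use of a plain dictionary instead of an RDF store are assumptions made for the example, not part of eSciDoc.

```python
from typing import Dict, Optional


def md_record_path(item_id: str, record_name: str) -> str:
    """Build a metadata-store path following the pattern shown on the slide."""
    return f"metadata/{item_id}/metadata-records/{record_name}"


class InMemoryMetadataHandler:
    """Hypothetical stand-in for the Metadata Handler (not an eSciDoc class).

    Records are kept in a dict keyed by path; a real store would hold RDF graphs.
    """

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def put(self, item_id: str, record_name: str, rdf: str) -> None:
        # Corresponds to POST/PUT on the record path: create or overwrite.
        self._records[md_record_path(item_id, record_name)] = rdf

    def get(self, item_id: str, record_name: str) -> Optional[str]:
        # Corresponds to GET on the record path; None if nothing is stored.
        return self._records.get(md_record_path(item_id, record_name))


handler = InMemoryMetadataHandler()
handler.put("escidoc:25", "md-rec-1", "<rdf:RDF>...</rdf:RDF>")
print(md_record_path("escidoc:25", "md-rec-1"))
```

Because the item keeps only a locator (external-url) to this path, an update through `put` would not touch the item in the repository, which is the point of the proposal.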

How - 3?
[Diagram]
- eSciDoc Repository: Container Handler, Item Handler (core services)
- eSciDoc Metadata Store (RDF): Metadata Handler (additional or core service)
  - GET metadata/escidoc:25/metadata-records/md-rec-1
  - POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
- Item
  - Component 1 (image/fulltext): internal-managed
  - Component 2 (image/fulltext): external-url
  - Component 3 (metadata record): external-url
- External content (e.g. supplementary material)
- Metadata store node: MD-Face-1 (happiness, young, female), with Last-modification-date-metadata and eSciDoc-AA-properties
Notes:
- We would have to implement our own PDP for metadata updates, but AA rules/policies are already stored in eSciDoc AA.
- eSciDoc AA properties: context-id, created-by, last-date-of-modification, public-status.
- This is not completely clear yet and we need to work on it, but it is sufficient for a start.
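A possible shape for such a PDP check, using the four AA properties listed above, is sketched below. The decision rule (deny for withdrawn items, otherwise allow the creator or anyone with edit rights on the context) is an assumption for illustration; the real rules would come from the policies stored in eSciDoc AA. The names `AAProperties` and `may_update_metadata` are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class AAProperties:
    """The AA properties named on the slide, copied next to the metadata node."""
    context_id: str
    created_by: str
    last_date_of_modification: str
    public_status: str


def may_update_metadata(user_id: str, editable_contexts: Set[str],
                        props: AAProperties) -> bool:
    """Hypothetical policy decision for a metadata update (assumed rule)."""
    # Withdrawn resources can no longer be modified.
    if props.public_status == "withdrawn":
        return False
    # Assumed rule: the creator, or a user with edit rights on the context.
    return props.created_by == user_id or props.context_id in editable_contexts


props = AAProperties("escidoc:ctx1", "user:alice", "2010-06-14T00:00:00Z", "released")
print(may_update_metadata("user:alice", set(), props))
```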

Possible workflows (ingest)
1. Ingest (or create-items) into eSciDoc: escidoc:item1, escidoc:item2
2. Ingest into metadata store: escidoc:item1 (metadata), escidoc:item2 (metadata)

Possible workflows (metadata batch update)
1. Start metadata updates: lock container/all members in eSciDoc (escidoc:item1, escidoc:item2).
2. Update the metadata store (escidoc:item1 (metadata), escidoc:item2 (metadata)); only items that are not withdrawn can be modified.
3. Finish metadata updates: unlock container/all members in eSciDoc (escidoc:item1, escidoc:item2).
Notes:
- Statuses of items in eSciDoc core are independent of updates in the metadata store.
- Only prerequisite: withdrawn items cannot be modified any longer (must be checked).
- Modification of the content via external-url does not version the resource.
- If needed, versioning can be implemented on the same principle (in that case all metadata versions shall be kept in the metadata store).
- Filters and search that run on the metadata store alone have to be implemented separately.
- Additionally, the eSciDoc search service works with content referenced by external URL (according to FIZ); we might have to adapt the full-text indexing a bit (checking with FIZ).
- Submit/Release/Withdraw: purely eSciDoc operations, as so far.
- Who can update? Everyone who can also update in eSciDoc; we have to implement the PDP for the MDStore.
- Bookmarking: as before (only difference: via eSciDoc, metadata are retrieved as content via the locator).
- The metadata store must be as persistent as escidoc:core.
- See notes on locking on slide 13.
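The lock, update, unlock sequence above can be sketched in a few lines. Everything here is an in-memory assumption: `Item`, `batch_update`, and the dictionary used as metadata store are hypothetical names, and a real implementation would call the eSciDoc handlers and the metadata store instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Item:
    """Hypothetical, minimal view of an eSciDoc item for this sketch."""
    item_id: str
    public_status: str = "released"
    locked_by: Optional[str] = None


def batch_update(items: Iterable[Item], store: Dict[str, dict],
                 change: Callable[[dict], dict], user: str) -> List[str]:
    """Lock all members, update metadata of non-withdrawn items, then unlock."""
    members = list(items)
    # Refuse to start if another user already holds a lock on any member.
    for item in members:
        if item.locked_by not in (None, user):
            raise RuntimeError(f"{item.item_id} is locked by {item.locked_by}")
    # Start metadata updates: lock container/all members.
    for item in members:
        item.locked_by = user
    updated: List[str] = []
    try:
        for item in members:
            if item.public_status == "withdrawn":
                continue  # withdrawn items must not be modified
            store[item.item_id] = change(store.get(item.item_id, {}))
            updated.append(item.item_id)
    finally:
        # Finish metadata updates: unlock even if an update failed.
        for item in members:
            item.locked_by = None
    return updated


items = [Item("escidoc:item1"), Item("escidoc:item2", public_status="withdrawn")]
store: Dict[str, dict] = {"escidoc:item1": {"emotion": "happiness"}}
print(batch_update(items, store, lambda md: {**md, "age": "young"}, "editor"))
```

Note that the item status in the repository is never changed by the update itself, matching the note that item statuses are independent of updates in the metadata store.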

Possible workflows (metadata batch update – option)
1. Start metadata updates: lock container/all members in eSciDoc (escidoc:item1, escidoc:item2).
2. Update the metadata store (escidoc:item1 (metadata), escidoc:item2 (metadata)); only items that are not withdrawn can be modified.
3. Finish metadata updates: unlock container/all members in eSciDoc (escidoc:item1, escidoc:item2).
4. Release items/container (option): grab the referenced content and create another component as XML/RDF internal-managed content in the escidoc item.
Notes:
- After items/containers are unlocked, they can be released again.
- During this release (if necessary), metadata records can be stored as an additional component of the item.
- This would again require some time to finish all operations, but needs to be tested.
- See notes on locking on slide 13.

What is missing in this draft?
- Containers/components batch metadata edit
  - Why: because containers/components cannot have components!
  - Potential workaround: each container has an md-record which contains only a link to the metadata store (but this is quite cumbersome)
  - Stage 2 for an escidoc-core extension could be: allow for external metadata storage
- Integrity: in stage 1 the metadata store could be a separate storage, therefore integrity would be harder to achieve
  - To check: maybe only allow it for released items?
  - Otherwise: MDStore must implement integrity checking towards eSciDoc (e.g. if items in eSciDoc were deleted, MDStore would still have the graph)
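The integrity problem in the last point (graphs left behind in the MDStore after their items were deleted in eSciDoc) could be detected with a simple comparison of identifiers. The sketch below assumes both sides can list their item IDs; `find_orphaned_graphs` is a hypothetical helper, not an existing service.

```python
from typing import Iterable, List


def find_orphaned_graphs(mdstore_ids: Iterable[str],
                         escidoc_ids: Iterable[str]) -> List[str]:
    """Return IDs that still have a graph in the MDStore but no item in eSciDoc."""
    existing = set(escidoc_ids)
    return sorted(i for i in mdstore_ids if i not in existing)


orphans = find_orphaned_graphs(
    ["escidoc:item1", "escidoc:item2", "escidoc:item3"],  # graphs in MDStore
    ["escidoc:item1", "escidoc:item3"],                   # items left in eSciDoc
)
print(orphans)
```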

Which metadata to be managed in MD Store?
- Context vs. content model level settings
  - Recommended: Cmodel-level settings
- Future options:
  - Utility: temporarily put MD in a Temporary MD Store for update (on a selected context, independently of Cmodel)
  - Can be applied to any resource
  - Requires lock of resources
  - Requires time to finish the batch-update operations
  - If not in Cmodel (if metadata are taken for quick modification): items with updated records have to be batch-updated (possibly released, submitted) in escidoc core (this will take some time, but is possible)
- Whether to store metadata in MDStore or not?
  - Depends on use cases, e.g. whether users often need to do batch updates (if that is actually part of normal work)
  - ToDo: find recommended top limit for batch updates in eSciDoc ( thousand items)
- However, these would depend on whether escidoc-core will take our model as a native service or not (more modifications might be needed in this case)

On Locking
- eSciDoc resources will be locked in eSciDoc
- Only the user who locked them can unlock them
- But anyhow, only one user (e.g. collection editor) can mark this operation as finished (see "Finish metadata updates")
- Do we need it?
  - Depends; for stage 1 we may not need it
  - Purpose: to prevent updates via both regular ItemHandler and MDStore at the same time
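The locking rule stated here (only the user who locked a resource can unlock it) can be illustrated with a small registry. This is a sketch under assumptions: `LockRegistry` and `LockError` are hypothetical names, and eSciDoc's own lock operations are not modelled.

```python
from typing import Dict


class LockError(Exception):
    """Raised when a lock or unlock request conflicts with the current owner."""


class LockRegistry:
    """Hypothetical registry mapping resource IDs to the user holding the lock."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def lock(self, resource_id: str, user: str) -> None:
        owner = self._owners.get(resource_id)
        if owner is not None and owner != user:
            raise LockError(f"{resource_id} is already locked by {owner}")
        self._owners[resource_id] = user

    def unlock(self, resource_id: str, user: str) -> None:
        owner = self._owners.get(resource_id)
        if owner is None:
            return  # nothing to unlock
        if owner != user:
            raise LockError(f"only {owner} can unlock {resource_id}")
        del self._owners[resource_id]

    def is_locked(self, resource_id: str) -> bool:
        return resource_id in self._owners


locks = LockRegistry()
locks.lock("escidoc:item1", "collection-editor")
print(locks.is_locked("escidoc:item1"))
```

A regular ItemHandler update would consult `is_locked` before writing, so that an item cannot be changed through both paths while a batch update is running.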

What is the metadata store?
- RDF/Jena based?
- Run team to decide: check Willy’s tests with triple store updates

Next steps
- Test, test, test
- Check with FIZ
- Check indexing when storage is external-url
- Check possibility to put separate stylesheet
- Note: this proposal is not final for escidoc-core updates; to bring this into escidoc-core, a slightly different approach should be considered:
  - external storage for MDRecords shall be allowed
  - more integrity-level operations shall be implemented
  - the metadata locator has to be moved from the component level to the item/container/component level
  - metadata indexing, etc.