Batch metadata update Draft Natasa Bulatovic 14.06.2010.


1 Batch metadata update Draft Natasa Bulatovic 14.06.2010

2 What we need
- eSciDoc Repository
- Very fast metadata updates
- RDF metadata (preferred)
- Searching, indexing
- Versioning (not a high requirement for metadata)
- AA
- Relations, linking, etc.

3 How we can achieve it
- eSciDoc batch metadata update is very slow
- Metadata should be kept in a separate store
- But splitting it completely from the eSciDoc repository would be a disadvantage, as metadata and content would then not be considered a single resource
- Drawback: this proposal covers only item-level metadata (container- and component-level metadata are not covered)

4 What we can use
- eSciDoc handlers
- Current services (AA, indexing, etc.)
- eSciDoc component: external-url

5 How - 1?
Diagram labels: eSciDoc Repository; eSciDoc Metadata Store; Container Handler; Item Handler; Metadata Handler; Core services; Additional (or core) service

6 How - 2?
Diagram labels: eSciDoc Repository; eSciDoc Metadata Store (RDF); Container Handler; Item Handler; Metadata Handler; Core services; Additional (or core) service; Current services (AA, indexing, etc.); eSciDoc component: external-url
- Metadata Handler requests:
  - GET metadata/escidoc:25/metadata-records/md-rec-1
  - POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
- Some items (cmodel-based) would store their metadata in an eSciDoc metadata store (link to graph node in metadata store)
- Item components:
  - Component 1 (image/fulltext): internal-managed
  - Component 2 (image/fulltext): external-url
  - Component 3 (metadata record): external-url
- External content (e.g. supplementary material)
- Example metadata records:
  - MD-Face-1: happiness, young, female
  - MD-Face-2: happiness, young, male
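The request pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryMetadataHandler` is a hypothetical stand-in and not an eSciDoc API, the store is a plain dictionary rather than an RDF store, and the property names `emotion`, `age` and `gender` are invented labels for the MD-Face-1 values shown on the slide.

```python
class InMemoryMetadataHandler:
    """Hypothetical stand-in for the Metadata Handler (not an eSciDoc API)."""

    def __init__(self) -> None:
        # Maps a record path to its metadata (assumption: flat key/value pairs).
        self._records = {}

    @staticmethod
    def record_path(item_id: str, record_name: str) -> str:
        """Build the relative path following the pattern shown on the slide."""
        if not item_id or not record_name:
            raise ValueError("item_id and record_name must be non-empty")
        return f"metadata/{item_id}/metadata-records/{record_name}"

    def put(self, item_id: str, record_name: str, metadata: dict) -> str:
        """Mimic POST/PUT: store (or overwrite) one metadata record."""
        path = self.record_path(item_id, record_name)
        self._records[path] = dict(metadata)
        return path

    def get(self, item_id: str, record_name: str) -> dict:
        """Mimic GET: return a copy of one metadata record."""
        return dict(self._records[self.record_path(item_id, record_name)])


handler = InMemoryMetadataHandler()
path = handler.put(
    "escidoc:25",
    "md-rec-1",
    # Property names are assumptions; the slide only lists the values.
    {"emotion": "happiness", "age": "young", "gender": "female"},
)
print(path)
```

The point of the sketch is only that the item id and the metadata record name together address one record in the separate store.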

7 How - 3?
Diagram labels: eSciDoc Repository; eSciDoc Metadata Store (RDF); Container Handler; Item Handler; Metadata Handler; Core services; Additional (or core) service
- Metadata Handler requests:
  - GET metadata/escidoc:25/metadata-records/md-rec-1
  - POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
- Item components:
  - Component 1 (image/fulltext): internal-managed
  - Component 2 (image/fulltext): external-url
  - Component 3 (metadata record): external-url
- External content (e.g. supplementary material)
- Metadata store node: MD-Face-1 (happiness, young, female), with last-modification-date-metadata and eSciDoc-AA-properties
- We would have to implement our own PDP for metadata update, but AA rules/policies are already stored in eSciDoc AA
- eSciDoc AA properties: context-id, created-by, last-date-of-modification, public-status
- Not completely clear; we need to work on this, but it is sufficient for a start
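A policy decision point (PDP) for metadata updates could start from the AA properties listed on this slide. The sketch below is a guess at the shape of such a check, not the eSciDoc AA policy model: the rule itself (creator or editor of the context may update, withdrawn items are never updatable) and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AAProperties:
    """The AA properties named on the slide, copied next to the metadata."""
    context_id: str
    created_by: str
    last_date_of_modification: str
    public_status: str


def may_update_metadata(user_id: str, editor_contexts: set, props: AAProperties) -> bool:
    """Hypothetical PDP rule; the real policies live in eSciDoc AA."""
    # Withdrawn items cannot be modified any longer.
    if props.public_status == "withdrawn":
        return False
    # Assumption: the creator, or an editor of the item's context, may update.
    return user_id == props.created_by or props.context_id in editor_contexts


props = AAProperties(
    context_id="escidoc:ctx1",          # hypothetical identifiers
    created_by="escidoc:user1",
    last_date_of_modification="2010-06-14T00:00:00Z",
    public_status="released",
)
print(may_update_metadata("escidoc:user2", {"escidoc:ctx1"}, props))
```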

8 Possible workflows (ingest)
- Ingest (or create items) into eSciDoc: escidoc:item1, escidoc:item2
- Ingest into metadata store: escidoc:item1 (metadata), escidoc:item2 (metadata)

9 Possible workflows (metadata batch update)
Workflow:
1. Start metadata updates
2. Lock container/all members in eSciDoc (escidoc:item1, escidoc:item2)
3. Update the metadata store (escidoc:item1 (metadata), escidoc:item2 (metadata)); only items that are not withdrawn can be modified
4. Unlock container/all members in eSciDoc (escidoc:item1, escidoc:item2)
5. Finish metadata updates
Notes:
- Statuses of items in eSciDoc core are independent of updates in the metadata store
- Only prerequisite: withdrawn items cannot be modified any longer (must be checked)
- Modification of the content via external-url does not version the resource
- If needed, versioning can be implemented on the same principle (all metadata versions shall be kept in the metadata store in this case)
- Only metadata-store filters/search have to be implemented separately
- Additionally, the eSciDoc search service works with content referenced by external url (according to FIZ); we might have to adapt the indexing of full-text a bit, checking with FIZ
- Submit/Release/Withdraw: purely eSciDoc operations, as so far
- Who can update? All who can do so in eSciDoc as well; we have to implement the PDP for MDStore
- Bookmarking: as before (only difference: via eSciDoc, metadata are retrieved as content via locator)
- Metadata store must be as persistent as escidoc-core
- See notes on locking on slide 13
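The lock, update, unlock sequence on this slide can be sketched as follows. Everything here is an assumption for illustration: items and the metadata store are plain dictionaries, and `batch_update` is a hypothetical helper, not an eSciDoc service call.

```python
def batch_update(items: dict, md_store: dict, updates: dict, user: str) -> list:
    """Lock all items, update metadata of non-withdrawn ones, then unlock.

    items:    item id -> {"status": str, "locked_by": str or None}
    md_store: item id -> metadata dict (stand-in for the metadata store)
    updates:  item id -> new metadata dict
    Returns the ids that were skipped because they are withdrawn.
    """
    locked = []
    skipped = []
    try:
        # Lock container/all members before touching the metadata store.
        for item_id, item in items.items():
            if item["locked_by"] is not None:
                raise RuntimeError(f"{item_id} is already locked")
            item["locked_by"] = user
            locked.append(item_id)
        # Update the metadata store; withdrawn items must not be modified.
        for item_id, new_metadata in updates.items():
            if items[item_id]["status"] == "withdrawn":
                skipped.append(item_id)
                continue
            md_store[item_id] = dict(new_metadata)
        return skipped
    finally:
        # Unlock only what this call locked, even if an update failed.
        for item_id in locked:
            items[item_id]["locked_by"] = None


items = {
    "escidoc:item1": {"status": "released", "locked_by": None},
    "escidoc:item2": {"status": "withdrawn", "locked_by": None},
}
md_store = {}
skipped = batch_update(
    items,
    md_store,
    {"escidoc:item1": {"title": "A"}, "escidoc:item2": {"title": "B"}},
    user="collection-editor",
)
print(skipped)
```

Note that the item status in the repository is only read, never changed, which matches the slide's point that statuses in eSciDoc core are independent of updates in the metadata store.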

10 Possible workflows (metadata batch update - option)
Workflow:
1. Start metadata updates
2. Lock container/all members in eSciDoc (escidoc:item1, escidoc:item2)
3. Update the metadata store (escidoc:item1 (metadata), escidoc:item2 (metadata)); only items that are not withdrawn can be modified
4. Unlock container/all members in eSciDoc (escidoc:item1, escidoc:item2)
5. Finish metadata updates
6. Release items/container (option): grab the referenced content and create another component as XML/RDF internal-managed content in the eSciDoc item
Notes:
- After items/containers are unlocked, they can be released again
- During this release (if necessary), metadata records can be stored as an additional component of the item
- This would again require some time to finish all operations, but needs to be tested
- See notes on locking on slide 13
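The optional release step, copying the referenced metadata into the item as an additional internal-managed component, might look like the sketch below. The dictionary layout, the component name `metadata-snapshot` and the function name are all hypothetical.

```python
def snapshot_metadata_as_component(item: dict, md_store: dict) -> dict:
    """Return a copy of the item with its metadata added as an extra component.

    Assumption: the item is a dict with "id" and "components", and md_store
    maps item ids to metadata dicts. Neither is an eSciDoc data structure.
    """
    metadata = md_store[item["id"]]
    component = {
        "name": "metadata-snapshot",      # hypothetical component name
        "storage": "internal-managed",    # stored inside the repository
        "content": dict(metadata),        # copy taken at release time
    }
    return {**item, "components": [*item["components"], component]}


item = {
    "id": "escidoc:item1",
    "components": [{"name": "md-record", "storage": "external-url"}],
}
md_store = {"escidoc:item1": {"title": "A"}}
released = snapshot_metadata_as_component(item, md_store)
print(len(released["components"]))
```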

11 What is missing in this draft?
- Containers/components batch metadata edit
  - Why: because containers/components cannot have components!
  - Potential workaround: each container has an md-record which contains only a link to the metadata store (but this is quite cumbersome)
  - Stage 2 for the escidoc-core extension could be: allow for external metadata storage
- Integrity: in stage 1 the metadata store could be separate storage, therefore integrity would be harder to achieve
  - To check: maybe only allow it for released items?
  - Otherwise: MDStore must implement integrity checking towards eSciDoc (e.g. if items in eSciDoc were deleted, MDStore would still have the graph)
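The integrity concern on this slide (a deleted item leaving its graph behind in the MDStore) comes down to comparing two sets of identifiers. A minimal sketch, assuming both sides can list their item ids; the function name is hypothetical:

```python
def check_integrity(repository_ids, md_store_ids) -> dict:
    """Compare item ids known to the repository with those in the MDStore."""
    repository = set(repository_ids)
    store = set(md_store_ids)
    return {
        # Graphs whose item no longer exists in the repository.
        "orphaned_graphs": sorted(store - repository),
        # Items that have no metadata graph in the MDStore.
        "missing_graphs": sorted(repository - store),
    }


report = check_integrity(
    repository_ids=["escidoc:item1", "escidoc:item3"],
    md_store_ids=["escidoc:item1", "escidoc:item2"],
)
print(report)
```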

12 Which metadata to be managed in MD Store?
- Context vs. content model level settings
  - Recommended: cmodel-level settings
- Future options:
  - Utility: temporarily put metadata in a temporary MD Store for update, on a selected context (independently of cmodel)
  - Can be applied to any resource
  - Requires lock of resources
  - Requires time to finish the batch-update operations
  - If not in cmodel (if metadata are taken for quick modification): items with updated records have to be batch-updated (possibly released, submitted) in escidoc-core (will take some time, but possible)
- Whether to store metadata in MDStore or not?
  - Depends on use cases, e.g. whether users would often need to do batch updates (whether that is actually part of normal work)
  - ToDo: find recommended top limit for batch updates in eSciDoc (5000-6000 thousand items)
- However, these would depend on whether escidoc-core will take our model as a native service or not (more modifications might be needed in this case)
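Once a recommended top limit for batch updates is known, a large update can be split into batches that stay under it. A small sketch, with the limit passed in as a parameter because the slide leaves the actual value as a ToDo:

```python
from typing import Iterator, Sequence


def in_batches(item_ids: Sequence[str], limit: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of item_ids, each holding at most `limit` ids."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(item_ids), limit):
        yield item_ids[start:start + limit]


# The limit of 2 is only for demonstration, not a recommended value.
batches = list(in_batches(["a", "b", "c", "d", "e"], limit=2))
print(batches)
```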

13 On Locking
- eSciDoc resources will be locked in eSciDoc
- Only the user who locked them can unlock them
- But anyhow, only one user, e.g. the collection editor, can mark this operation as finished (see "Finish metadata updates")
- Do we need it?
  - Depends; for stage 1 we may not need it
  - Purpose: to prevent updates via both the regular ItemHandler and MDStore at the same time
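The locking rule on this slide (only the user who locked a resource can unlock it) can be captured in a small class. `ResourceLocks` is a hypothetical in-memory sketch and does not reflect how eSciDoc implements locks.

```python
class ResourceLocks:
    """In-memory lock table: resource id -> user who holds the lock."""

    def __init__(self) -> None:
        self._owners = {}

    def lock(self, resource_id: str, user: str) -> None:
        owner = self._owners.get(resource_id)
        if owner is not None and owner != user:
            raise PermissionError(f"{resource_id} is locked by {owner}")
        self._owners[resource_id] = user

    def unlock(self, resource_id: str, user: str) -> None:
        owner = self._owners.get(resource_id)
        if owner is None:
            return  # nothing to unlock
        if owner != user:
            # Only the user who locked the resource may unlock it.
            raise PermissionError(f"{resource_id} is locked by {owner}")
        del self._owners[resource_id]

    def is_locked(self, resource_id: str) -> bool:
        return resource_id in self._owners


locks = ResourceLocks()
locks.lock("escidoc:item1", "collection-editor")
try:
    locks.unlock("escidoc:item1", "someone-else")
    unlocked_by_other = True
except PermissionError:
    unlocked_by_other = False
print(unlocked_by_other, locks.is_locked("escidoc:item1"))
```

A regular ItemHandler update would check `is_locked` first, which is how the lock would prevent simultaneous updates through both paths.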

14 What is the metadata store?
- RDF/Jena based?
- Run team to decide: check Willy's tests with triple store updates

15 Next steps
- Test, test, test
- Check with FIZ
- Check indexing when storage is external-url
- Check possibility to put separate stylesheet
- Note: this proposal is not final for escidoc-core updates; to bring this into escidoc-core, a slightly different approach should be considered:
  - external storage for MDRecords shall be allowed
  - more integrity-level operations shall be implemented
  - metadata-locator has to be moved from the component level to the item/container/component level
  - metadata indexing
  - etc.

