Storing digital assets on Grid/EGI FedCloud with gLibrary Giuseppe La Rocca, INFN DARIAH ERIC.

1 Storing digital assets on Grid/EGI FedCloud with gLibrary
The INFN Digital Repository System
Giuseppe La Rocca, INFN (giuseppe.larocca@ct.infn.it)
DARIAH ERIC 5th General VCC Meeting, Ljubljana, Slovenia – 22 April 2015

2 Outline
- The De Roberto Cultural Heritage use case
- The gLibrary Digital Repository System
- High-level architecture and technologies used
- Data Management APIs
- (Some) examples of the Cultural Heritage VRC
- Summary and conclusions

3 De Roberto Cultural Heritage use case
Federico De Roberto, an Italian writer born in Naples who spent his life in Catania, left numerous works to the humanities community. They consist of valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, sketches, photos, etc.

4 Acquisition stage
Digitization of manuscripts, typescripts and printed works:
- TIFF files, one per page, 600 dpi, about 100 MB for A3
- High-resolution scans for in-depth examination
- 8000 sheets/scans, 3 TB of disk space
- Different physical formats: A3, A4, custom size
Embedded metadata:
- TIFF with embedded metadata describing the physical features of the scan and its content
- Fields: ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate, Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice
- Added with Photoshop after the digitization phase

5 Requirements
Make the works accessible to the communities:
- Always online (24 hours a day, 365 days a year) and available from everywhere
- Simple, easy-to-use interface for non-expert users
Quickly find the desired document:
- Document organization according to physical and semantic metadata
- Organization by type/collection
- Dynamic filtering of search result sets according to the selection of one or more document metadata values
Long-term preservation:
- Multiple copies (replicas) spread across different geographical sites
- Reliable storage systems and replica redundancy to achieve secure preservation
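
The dynamic filtering requirement can be illustrated with a minimal sketch. It is not gLibrary code: the sample assets, the attribute names `Type` and `Format`, and both helper functions are hypothetical, and the assets are held in memory rather than in a metadata service.

```python
from collections import Counter

# Hypothetical sample data: each asset is a plain dict of metadata values.
ASSETS = [
    {"Title": "Draft A", "Type": "manuscript", "Format": "A3"},
    {"Title": "Draft B", "Type": "typescript", "Format": "A4"},
    {"Title": "Sketch C", "Type": "manuscript", "Format": "A4"},
]


def filter_assets(assets, selections):
    """Keep the assets whose metadata match every selected attribute value."""
    return [
        asset for asset in assets
        if all(asset.get(key) == value for key, value in selections.items())
    ]


def facet_counts(assets, attribute):
    """Count how often each value of one attribute occurs in a result set."""
    return Counter(asset[attribute] for asset in assets if attribute in asset)


# Select one metadata value, then see which further values remain selectable.
manuscripts = filter_assets(ASSETS, {"Type": "manuscript"})
print([asset["Title"] for asset in manuscripts])
print(facet_counts(manuscripts, "Format"))
```

Each additional selection narrows the result set, and the counts over the remaining assets show which values can still be chosen.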

6 Requirements
- Store the 8000 scans of the De Roberto heritage and implement long-term digital preservation of the data: Grid & Cloud storage
- Enable 24-hour access for scientists: Web service
- Document organization for quick search: Metadata services
- Simple, easy-to-use system for searching, organizing, uploading and downloading digitized documents on e-Infrastructures

7 The INFN Digital Repository System (https://glibrary.ct.infn.it/)

8 In a nutshell
gLibrary is a platform developed by INFN that:
- provides a simple yet powerful system to organize, search, store and retrieve “digital assets” in distributed repositories built on Grid/Cloud/local storage infrastructures
- hides the underlying technical details from the users
“Digital asset”: a digital object plus its corresponding metadata

9 Abstractions
- Digital object: any file (PNG, JPG, PDF, TIFF, RAW, MP3, MP4, etc.)
- Metadata: a set of attributes describing a digital object (resolution, author, title, description, location (geo-coordinates), subject, etc.)
- Digital asset: a digital object plus its metadata
- Collection: a set of digital assets of the same type (e.g. presentations, manuscripts)
- Repository: a library of digital assets organized by collections (e.g. all the presentations and manuscripts)
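
The abstractions above can be modeled as a small set of data classes. This is only a sketch of how the terms relate to each other: the class and field names, the repository name and the file name are hypothetical and do not reflect the actual gLibrary schema.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Any file: a scan, a PDF, an audio track, etc."""
    filename: str
    size_bytes: int


@dataclass
class DigitalAsset:
    """A digital object plus the metadata that describe it."""
    obj: DigitalObject
    metadata: dict = field(default_factory=dict)


@dataclass
class Collection:
    """A set of digital assets of the same type."""
    name: str
    assets: list = field(default_factory=list)


@dataclass
class Repository:
    """A library of digital assets organized by collections."""
    name: str
    collections: dict = field(default_factory=dict)

    def add_asset(self, collection_name, asset):
        # Create the collection on first use, then append the asset to it.
        collection = self.collections.setdefault(
            collection_name, Collection(collection_name)
        )
        collection.assets.append(asset)
        return collection


repo = Repository("deroberto")  # hypothetical repository name
scan = DigitalObject("page_0001.tiff", 100_000_000)
repo.add_asset(
    "manuscripts",
    DigitalAsset(scan, {"Author": "Federico De Roberto", "XResolution": 600}),
)
print(sorted(repo.collections))
```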

10 Architecture
[Diagram] Components shown: front ends, Science Gateway, API Server Gateway, glibrary.ct.infn.it REST API, AuthN/AuthZ, authorization service, eToken service, User Tracking DB, Metadata Service, GridBOX, and local, Grid and Cloud storage. The gLibrary REST API is called through the API Server Gateway.

11 List of technologies used
- gLibrary core services are implemented in Python and Node.js
- The gLibrary Metadata and File Transfer Services are accessed through a set of REST APIs
- The REST APIs are developed as a WSGI module in an Apache container
- The Metadata service is deployed using the Django framework
- Data Transfer APIs are provided by GridBOX
- Grid-based and federated authentication are now supported
- An OAI-PMH interface has been implemented on top of the gLibrary Metadata services so that external harvesters can extract the metadata of gLibrary repositories
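
To show what an external harvester sends to an OAI-PMH interface, the sketch below builds two request URLs. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH standard itself. The base URL is a placeholder, because the slides do not give the gLibrary endpoint, and `oai_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real gLibrary OAI-PMH base URL is not in the slides.
BASE_URL = "https://repository.example.org/oai"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request: the verb plus its key=value arguments."""
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


# Ask the repository to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# Harvest all records in Dublin Core, the format every OAI-PMH server supports.
records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

A harvester would fetch these URLs over HTTP and parse the XML responses. That part is left out so that the sketch runs without network access.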

12 Repository Browser Web App

13 e-Cultural Science Gateway in INDICATE

14 Repository Uploader HTML5 Web App
It allows users to upload new assets to an existing repository and to specify their metadata using a predefined schema.

15 Native mobile clients for accessing repositories

16 Federated Authentication (implementation for mobile appliances)
1. Get available IdPs
2. Supported IdPs list
3. Open WebView
4. Extract Shibboleth token from response header
Now you can issue any API call to the gLibrary REST API.
[Diagram] Components shown: Science Gateway, glibrary.ct.infn.it REST API.
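
Step 4 can be sketched as follows, under an assumption the slide does not confirm: that the token is a Shibboleth Service Provider session cookie, whose name conventionally starts with `_shibsession_`, delivered in a `Set-Cookie` response header. The sample header values and both helper functions are hypothetical.

```python
from http.cookies import SimpleCookie


def extract_shib_session(set_cookie_headers):
    """Return (name, value) of the first cookie whose name starts with
    '_shibsession_', or None when no such cookie is present."""
    for header in set_cookie_headers:
        cookie = SimpleCookie()
        cookie.load(header)
        for name, morsel in cookie.items():
            if name.startswith("_shibsession_"):
                return name, morsel.value
    return None


def cookie_request_header(token):
    """Build the Cookie header to attach to later REST API calls."""
    name, value = token
    return {"Cookie": f"{name}={value}"}


# Made-up Set-Cookie values, as a client might see them after the IdP login.
headers = [
    "JSESSIONID=abc123; Path=/; HttpOnly",
    "_shibsession_64656661756c74=_f00dcafe; Path=/; Secure; HttpOnly",
]

token = extract_shib_session(headers)
print(token)
print(cookie_request_header(token))
```

Sending the extracted cookie back on every later request is one plausible way for the client to stay authenticated. The actual gLibrary mobile clients may carry the token differently.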

17 De Roberto Digital Repository from iPhone
Demo presented at the EGEE UF5 in Uppsala. Some screenshots here [1]; YouTube video here [2].

18 Data Management APIs for accessing Grid & Cloud Object Storages

19 Data Management APIs / Download from Grid SE

20 Data Management APIs / Upload to Grid SE (1/3)

21 Data Management APIs / Upload to Grid SE (2/3)

22 Data Management APIs / Upload to Grid SE (3/3)

23 Data Management APIs / Download from Cloud Object Storage

24 Data Management APIs / Upload to Cloud Object Storage

25 Repository Management for interacting with the Digital Repository System

27 Application's workflow
[Diagram] Labeled actions: browse digital assets, run jobs, register analytics on repository, register DOI. Components shown: Science Gateway, HPC clusters, glibrary.ct.infn.it REST API, Metadata Service, eToken service, User Tracking DB, and local, Grid and Cloud storage.

28 Application's workflow
[Diagram] Labeled actions: get digital assets, run jobs. Components shown: Science Gateway, HPC clusters, glibrary.ct.infn.it REST API, Metadata Service, eToken service, User Tracking DB, and local, Grid and Cloud storage.

29 Summary and conclusions
gLibrary aims to provide a simple framework to manage digital assets on distributed storage, hiding the details of the underlying technical infrastructure.
Current features:
- REST APIs to access the available digital assets
- Security: support for federated authentication
- Usability: several gLibrary front ends for web and mobile scenarios

30 Thank you!

