Published by Derrick Richards. Modified over 8 years ago.
1
Storing digital assets on Grid/EGI FedCloud with gLibrary
The INFN Digital Repository System
Giuseppe La Rocca, INFN (giuseppe.larocca@ct.infn.it)
DARIAH ERIC 5th General VCC Meeting
Ljubljana, Slovenia – 22 April 2015
2
Outline
The De Roberto Cultural Heritage use case
The gLibrary Digital Repository System
High-level architecture and technologies used
Data Management APIs
(Some) examples of the Cultural Heritage VRC
Summary and conclusions
3
De Roberto Cultural Heritage use case
Federico De Roberto, an Italian writer who was born in Naples but spent his life in Catania, left numerous works to the humanities community.
These works consist of valuable and hard-to-manage items: manuscripts, typescripts, drafts with handwritten corrections, magazines, sketches, photos, etc.
4
Acquisition stage
Digitization of manuscripts, typescripts and printed works
TIFF files, one per page, 600 dpi, about 100 MB for A3
High-resolution scans for in-depth examination
8000 sheets/scans, 3 TB of disk space
Different physical formats: A3, A4, custom size
Embedded metadata
TIFF with embedded metadata describing the physical features of the scan and its content: ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate, Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice
Metadata added with Photoshop after the digitization phase
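As a rough cross-check of the figures above, the uncompressed size of an A3 scan at 600 dpi can be estimated from the page dimensions. The slides do not state bit depth or compression, so the bit depths tried below and the ISO A3 dimensions (297 x 420 mm) are assumptions; this is a back-of-the-envelope sketch, not a description of the actual scanning setup.

```python
# Back-of-the-envelope estimate of uncompressed scan size.
# Assumptions (not stated in the slides): ISO A3 = 297 x 420 mm,
# no compression, and the bit depths tried below.
MM_PER_INCH = 25.4


def pixels(mm: float, dpi: int) -> int:
    """Number of pixels along one side of the page at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)


def uncompressed_bytes(width_mm: float, height_mm: float, dpi: int,
                       bytes_per_pixel: int) -> int:
    """Raw pixel data size, ignoring TIFF headers and embedded metadata."""
    return pixels(width_mm, dpi) * pixels(height_mm, dpi) * bytes_per_pixel


if __name__ == "__main__":
    for label, bpp in (("8-bit grayscale", 1), ("24-bit RGB", 3)):
        size = uncompressed_bytes(297, 420, 600, bpp)
        print(f"A3 @ 600 dpi, {label}: {size / 1e6:.0f} MB")
    # Average size implied by the slide's totals: 3 TB over 8000 scans.
    print(f"Average per scan: {3e12 / 8000 / 1e6:.0f} MB")
```

Under these assumptions an A3 page comes to roughly 70 MB in 8-bit grayscale and roughly 209 MB in 24-bit RGB, which brackets the "about 100 MB" quoted for A3. The slides do not say how the 3 TB total breaks down across formats.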
5
Requirements
Make those works accessible to the communities
Always online (24 hours a day, 365 days a year) and available from everywhere
Simple and easy-to-use interface for non-expert users
Quickly find the desired document
Document organization according to physical and semantic metadata
Organization by type/collection
Dynamic filtering of search result sets according to the selection of one or more document metadata values
Long-term preservation
Multiple copies (replicas) spread across different geographical sites
Reliability of storage systems and replica redundancy to achieve secure preservation
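The "dynamic filtering" requirement above can be illustrated with a small in-memory sketch: each selection of metadata values narrows the result set, and the remaining values are counted so that the next filter options can be shown. The record layout, the attribute names and the function names `filter_records` and `facet_counts` are hypothetical and chosen only for illustration; the slides do not describe how gLibrary implements filtering.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory records; in a real system the metadata would
# come from a metadata service rather than a Python list.
Record = Dict[str, str]


def filter_records(records: Iterable[Record],
                   selection: Dict[str, Set[str]]) -> List[Record]:
    """Keep records whose value for every selected attribute is among the chosen values."""
    return [
        r for r in records
        if all(r.get(attr) in allowed for attr, allowed in selection.items())
    ]


def facet_counts(records: Iterable[Record], attribute: str) -> Counter:
    """Count how many records carry each value of an attribute."""
    return Counter(r[attribute] for r in records if attribute in r)


if __name__ == "__main__":
    records = [
        {"type": "manuscript", "format": "A3"},
        {"type": "typescript", "format": "A4"},
        {"type": "manuscript", "format": "A4"},
    ]
    manuscripts = filter_records(records, {"type": {"manuscript"}})
    print(len(manuscripts), dict(facet_counts(manuscripts, "format")))
```

Records that lack a selected attribute are excluded, which keeps the filter strict; a real front end might prefer to show them under an "unknown" value instead.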
6
Requirements
Store the 8000 scans of the De Roberto Heritage and implement long-term digital preservation of the data: Grid and Cloud storage
Enable 24-hour access for scientists: Web service
Document organization for quick search: Metadata services
Simple and easy-to-use system for searching, organizing, uploading and downloading digitized documents on e-Infrastructures
7
The INFN Digital Repository System (https://glibrary.ct.infn.it/)
8
gLibrary in a nutshell
gLibrary is a platform developed by INFN that:
provides a simple yet powerful system to organize, search, store and retrieve “digital assets” in distributed repositories built on Grid/Cloud/local storage infrastructures
hides the underlying technical details from users
“Digital asset”: a digital object + its corresponding metadata
9
gLibrary abstractions
Digital Object: any file (PNG, JPG, PDF, TIFF, RAW, MP3, MP4, etc.)
Metadata: a set of attributes describing a digital object (resolution, author, title, description, location (geo-coordinates), subject, etc.)
Digital Asset: a digital object + its metadata
Collection: a set of digital assets of the same type (presentations, manuscripts)
Repository: a library of digital assets organized by collections (all the presentations and manuscripts)
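The five abstractions above map naturally onto a small data model. The following is a minimal sketch using Python dataclasses; the class and field names are illustrative assumptions and do not reflect gLibrary's actual schema. In particular, the `replica_urls` field is a hypothetical way to record the multiple copies mentioned in the requirements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """Any file (TIFF, PDF, MP4, ...)."""
    filename: str
    # Hypothetical field: where the copies of the file are stored.
    replica_urls: List[str] = field(default_factory=list)


@dataclass
class DigitalAsset:
    """A digital object plus the metadata describing it."""
    digital_object: DigitalObject
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """A set of digital assets of the same type."""
    name: str
    assets: List[DigitalAsset] = field(default_factory=list)


@dataclass
class Repository:
    """A library of digital assets organized by collections."""
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)

    def add_asset(self, collection_name: str, asset: DigitalAsset) -> None:
        """Add an asset, creating the collection on first use."""
        collection = self.collections.setdefault(
            collection_name, Collection(collection_name))
        collection.assets.append(asset)


if __name__ == "__main__":
    repo = Repository("De Roberto")
    page = DigitalAsset(DigitalObject("page_0001.tiff"),
                        {"Title": "Example page", "XResolution": "600"})
    repo.add_asset("manuscripts", page)
    print(len(repo.collections["manuscripts"].assets))
```

Keeping metadata as a free-form dictionary mirrors the slide's definition (a set of attributes) without fixing a schema per collection type.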
10
gLibrary architecture
[Architecture diagram. Components shown: front ends, Science Gateway, eToken service, AuthN/AuthZ, Authorization service, User Tracking DB, API Server Gateway, REST API (glibrary.ct.infn.it), Metadata Service, GridBOX, local storage, Grid storage, Cloud storage. Diagram label: "Call gLibrary REST API through API Server Gateway".]
11
gLibrary: list of technologies used
gLibrary core services are implemented in Python and Node.js
The gLibrary Metadata and File Transfer Services can be accessed through a set of REST APIs
The REST APIs are developed as a WSGI module in an Apache container
Grid-based and federated authentication are now supported
Data transfer APIs are provided by GridBOX
The metadata service has been deployed using the Django framework
An OAI-PMH interface has been implemented on top of the gLibrary Metadata services to allow external harvesters to extract the metadata of gLibrary repositories
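To make the OAI-PMH point concrete, here is a minimal sketch of how an external harvester could build a request URL. The `ListRecords` verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH protocol itself, and `oai_dc` is the Dublin Core format every OAI-PMH repository must support. The base URL below is a placeholder: the slides do not give the address of the gLibrary OAI-PMH endpoint, and the set name is likewise hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint: the real OAI-PMH base URL is not given in the slides.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix selects the metadata format; set_spec optionally
    restricts harvesting to one set (for example, one collection).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    # "manuscripts" is a hypothetical set name used only for illustration.
    print(list_records_url(BASE_URL, set_spec="manuscripts"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the list is complete.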
12
Repository Browser Web App
13
e-Cultural Science Gateway in INDICATE
14
Repository Uploader HTML5 Web App
It allows users to upload new assets to an existing repository and to specify metadata using a predefined schema
15
Native mobile clients for accessing repositories
16
Federated Authentication (implementation for mobile appliances)
1. Get available IDPs (Science Gateway)
2. Supported IDPs list
3. Open WebView
4. Extract Shibboleth token from response header
Now you can issue any API call to the gLibrary REST API (glibrary.ct.infn.it)
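Step 4 of the flow above can be sketched as follows. The slides do not say which header carries the token, so this example assumes it arrives as a session cookie in a `Set-Cookie` response header whose name starts with `_shibsession_`, the naming convention used by the Shibboleth Service Provider. The function names and the sample header values are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable, Optional, Tuple

# Assumption: the token is a Shibboleth SP session cookie, whose name
# conventionally starts with this prefix.
SHIB_COOKIE_PREFIX = "_shibsession_"


def extract_shib_session(
        set_cookie_headers: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (cookie name, cookie value) of the first Shibboleth session cookie, if any."""
    for header in set_cookie_headers:
        cookie = SimpleCookie()
        cookie.load(header)
        for name, morsel in cookie.items():
            if name.startswith(SHIB_COOKIE_PREFIX):
                return name, morsel.value
    return None


def auth_headers(name: str, value: str) -> Dict[str, str]:
    """Headers to attach to later API calls so they reuse the session."""
    return {"Cookie": f"{name}={value}"}


if __name__ == "__main__":
    # Made-up headers, as a WebView might expose them after the IdP login.
    headers = [
        "JSESSIONID=xyz; Path=/",
        "_shibsession_64656661756c74=_abc123; Path=/; Secure; HttpOnly",
    ]
    session = extract_shib_session(headers)
    if session is not None:
        print(auth_headers(*session))
```

Once extracted, the same cookie would be sent with each subsequent REST call, which is what lets the mobile client leave the WebView after login.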
17
De Roberto Digital Repository from iPhone
Demo presented at the EGEE UF5 in Uppsala
Some screenshots here [1]
YouTube video here [2]
18
Data Management APIs for accessing Grid & Cloud Object Storage
19
Data Management APIs / Download from Grid SE
20
Data Management APIs / Upload to Grid SE (1/3)
21
Data Management APIs / Upload to Grid SE (2/3)
22
Data Management APIs / Upload to Grid SE (3/3)
23
Data Management APIs / Download from Cloud Object Storage
24
Data Management APIs / Upload to Cloud Object Storage
25
Repository Management for interacting with the Digital Repository System
26
27
Application’s workflow
[Workflow diagram. Labeled steps: browse digital assets, run jobs, register analytics on repository, register DOI. Components shown: Science Gateway, HPC clusters, Metadata Service, eToken service, REST API (glibrary.ct.infn.it), User Tracking DB, local storage, Grid storage, Cloud storage.]
28
Application’s workflow
[Workflow diagram. Labeled steps: get digital assets, run jobs. Components shown: Science Gateway, HPC clusters, Metadata Service, eToken service, REST API (glibrary.ct.infn.it), User Tracking DB, local storage, Grid storage, Cloud storage.]
29
Summary and conclusions
gLibrary aims to provide a simple framework to manage digital assets on distributed storage, hiding the details of the underlying technical infrastructure
Current features:
REST APIs to access available digital assets
Security: support for federated authentication
Usability: several gLibrary front ends for web and mobile scenarios
30
Thank you!