
1 Digital Libraries on the Grid to Preserve Cultural Heritage
A use case: Federico De Roberto manuscripts
UNIONE EUROPEA
Leandro Ciuffo, on behalf of Dr. Antonio Calanducci (antonio.calanducci@ct.infn.it)
Istituto Nazionale di Fisica Nucleare – Catania

2 Federico De Roberto cultural heritage
De Roberto, an Italian writer of the 19th and 20th centuries who was born in Naples but spent his life in Catania, left numerous works to the humanities community.
These comprise valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, clippings, sketches, and photos.

3 Fondo letterario De Roberto: digitization
Digitization of manuscripts, typescripts, and printed works:
– TIFF files, one per page, 600 dpi, about 100 MB for an A3 page: high-resolution scans for in-depth examination
– Multipage PDF, one per work, 300 dpi, file sizes varying from 40 to 400 MB: for overall examination of works
– 8000 scans, 2 terabytes of disk space
– Different physical formats: A3, A4, custom sizes
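
As a rough check on the scan sizes quoted on this slide, the pixel dimensions of an A3 page scanned at 600 dpi can be worked out from the paper size. This is only a sketch under assumptions: the A3 dimensions (297 x 420 mm) are the ISO 216 values, and the bytes-per-pixel figures are illustrative, because the slides do not state the bit depth or compression of the TIFF files.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def uncompressed_megabytes(width_px, height_px, bytes_per_pixel):
    """Raw image size in megabytes (10**6 bytes), ignoring headers and compression."""
    return width_px * height_px * bytes_per_pixel / 1e6


# A3 is 297 mm x 420 mm (ISO 216); 600 dpi is the resolution quoted for the TIFF scans.
a3_at_600 = scan_pixels(297, 420, 600)

# Assumed bit depths, not taken from the slides:
# 1 byte/pixel (8-bit greyscale) and 3 bytes/pixel (24-bit colour).
grey_mb = uncompressed_megabytes(*a3_at_600, 1)
colour_mb = uncompressed_megabytes(*a3_at_600, 3)

if __name__ == "__main__":
    print(a3_at_600, round(grey_mb, 1), round(colour_mb, 1))
```

The quoted "about 100 MB" lies between the two raw estimates, which would be consistent with either compression or a different bit depth; the slides do not say which.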

4 Fondo letterario De Roberto: metadata
Embedded metadata:
– TIFF files carry embedded metadata describing the physical features of the scan and information about its content:
  o ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate
  o Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice
– Added with Photoshop after the digitization phase (Adobe XMP format)

5 Goals and requirements
Make the works accessible to the humanities research community.
Find the desired document immediately:
– Documents organized according to their physical and semantic metadata
  o By type
  o By category
– Dynamic filtering of the search result set by selecting one or more document metadata values
Long-term preservation (digital preservation):
– Multiple copies (replicas) spread across different geographical sites
– Reliable storage systems and replica redundancy to achieve secure preservation

6 Data Management in Grid
Storage Element (SE): a front-end server that aggregates a pool of hard disks, providing the illusion of a single big (virtual) disk
– container of users' files
– generally one SE per site
– mirrored disks to avoid data loss in case of hardware failure
– fine-grained file permissions: owner, group, and given lists of users and groups (Access Control Lists, ACLs)
– keeps the mapping between each file and its physical disk in the pool
File Catalogue: provides a single virtual file system across several Storage Elements
– keeps track of which SE (or SEs) contains a given file
– keeps track of replicas
– maps each file name to its Storage Element file name
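
The role of the File Catalogue can be illustrated with a toy in-memory version that maps one logical file name to the replicas stored on several Storage Elements. This is a sketch, not the catalogue used on the Grid: the class, its method names, the logical file name and the `example.org` storage URLs are all hypothetical.

```python
from urllib.parse import urlparse


class FileCatalogue:
    """Toy file catalogue: maps a logical file name to its replicas on Storage Elements."""

    def __init__(self):
        self._replicas = {}  # logical file name -> list of storage URLs

    def register(self, lfn, surl):
        """Record that a copy of `lfn` is stored at `surl` (duplicates are ignored)."""
        surls = self._replicas.setdefault(lfn, [])
        if surl not in surls:
            surls.append(surl)

    def replicas(self, lfn):
        """Return all known storage URLs for `lfn` (empty list if unregistered)."""
        return list(self._replicas.get(lfn, []))

    def storage_elements(self, lfn):
        """Return the distinct SE host names holding a replica of `lfn`."""
        hosts = []
        for surl in self.replicas(lfn):
            host = urlparse(surl).hostname
            if host not in hosts:
                hosts.append(host)
        return hosts


# Hypothetical logical name and storage URLs, for illustration only.
catalogue = FileCatalogue()
LFN = "/deroberto/scans/lupa-001.tif"
catalogue.register(LFN, "srm://se1.example.org/data/lupa-001.tif")
catalogue.register(LFN, "srm://se2.example.org/data/lupa-001.tif")
catalogue.register(LFN, "srm://se1.example.org/data/lupa-001.tif")  # ignored duplicate
```

A client only ever asks for the logical name; which Storage Element serves the file is a lookup in the catalogue.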

7 Data Management in Grid
Metadata Catalogue: stores and organizes the metadata of files saved on Storage Elements and registered in the File Catalogue
– metadata are organized by collection (a sort of directory)
– each collection has its own schema, a set of defined attributes, e.g. /deroberto/scans/manuscripts:
  o Title: La lupa
  o Author: Federico De Roberto, Giovanni Verga
  o Genre: Tragedia Lirica
  o Pages: 34
  o FileType: TIFF
  o surl: srm://infn-se-01.ct.pi2s2.it/dpm/ct.pi2s2.it/home/cometa/generated/2008-06-14/filede4d6266-56c4-4d66-95b6-3d69063ef081
– responsible for answering user queries against the metadata describing files, to find their physical location for later retrieval
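
A collection with a schema, and a query that resolves metadata to a physical location, might be sketched as follows. This is a minimal in-memory illustration and not the API of any real metadata catalogue: the `Collection` class and its methods are hypothetical, and the `surl` value is a placeholder rather than a real storage URL.

```python
class Collection:
    """Toy metadata collection: a fixed schema plus a list of entries."""

    def __init__(self, path, schema):
        self.path = path
        self.schema = tuple(schema)
        self._entries = []

    def add_entry(self, **attributes):
        """Store one entry, rejecting attributes that are not in the schema."""
        unknown = set(attributes) - set(self.schema)
        if unknown:
            raise ValueError(
                f"attributes not in schema of {self.path}: {sorted(unknown)}")
        self._entries.append(dict(attributes))

    def query(self, **criteria):
        """Return the entries whose attributes equal every given criterion."""
        return [entry for entry in self._entries
                if all(entry.get(key) == value for key, value in criteria.items())]

    def locate(self, **criteria):
        """Return the storage URLs of the entries matching the criteria."""
        return [entry["surl"] for entry in self.query(**criteria) if "surl" in entry]


manuscripts = Collection(
    "/deroberto/scans/manuscripts",
    ["Title", "Author", "Genre", "Pages", "FileType", "surl"],
)
manuscripts.add_entry(
    Title="La lupa",
    Author="Federico De Roberto, Giovanni Verga",
    Genre="Tragedia Lirica",
    Pages=34,
    FileType="TIFF",
    surl="srm://se.example.org/placeholder/la-lupa.tif",  # placeholder, not a real SURL
)
```

Calling `manuscripts.locate(Title="La lupa")` returns the stored location, which is the step that connects a metadata search to a later download.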

8 The Sicilian Grid COMETA

9 Current deployment (COMETA Grid)
300+ TBytes
International Workshop on Cyberinfrastructure and Archeology, San Miniato (PI), 16th-17th Oct 08

10 The gLibrary project
Challenge:
– to offer an intuitive, flexible, secure and multiplatform system to handle digital libraries on a Grid infrastructure
Digital assets (the items handled in a digital library):
– any kind of content and/or media represented as a digital file, e.g.:
  o Images (photos, scans, screenshots, logos, ...)
  o Audio (songs, soundtracks, ringtones, ...)
  o Video (movies, trailers, mobile phone videos, ...)
  o Presentations, letters, reports, invoices, receipts
  o E-books, e-mails, papers, magazines, etc.
gLibrary lets users store, organize, search and retrieve digital assets in a Grid environment.

11 gLibrary features
Intuitive front-end implemented as a web application:
– accessible from anywhere; it needs only Internet access
– usable from any web browser (Internet Explorer, Mozilla Firefox, Opera, Safari) on any operating system (Windows, Linux, Mac OS X), hence multiplatform
  o it requires a Java Virtual Machine (available on any OS)
– extensive use of AJAX (Asynchronous JavaScript and XML) makes the web application dynamic and interactive, providing a desktop-like user experience

12 Assets organization
Types and categories are defined by the repository providers.
Assets are organized by type:
– a list of specific attributes describing each kind of asset to be managed by the system
– hierarchical (a child type shares and extends its parent's attributes)
– queried during searches
and/or organized by category:
– groups together related assets of different types
– also useful for defining subsets of assets belonging to the same type
– multiple categories can be assigned to each asset (tagging)
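
The two organizing ideas on this slide, hierarchical types that inherit attributes and categories used as tags, can be sketched as below. The class, the type names `Document` and `Manuscript`, the asset identifiers and the category names are hypothetical; only the attribute names are borrowed from the metadata shown elsewhere in the slides.

```python
class AssetType:
    """Toy asset type: a child type shares and extends its parent's attributes."""

    def __init__(self, name, attributes, parent=None):
        self.name = name
        self.own_attributes = list(attributes)
        self.parent = parent

    def all_attributes(self):
        """Return inherited attributes first, then the ones this type adds."""
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + [a for a in self.own_attributes if a not in inherited]


# Hypothetical two-level hierarchy.
document = AssetType("Document", ["Title", "Author", "FileType"])
manuscript = AssetType("Manuscript", ["TotalPages", "ScanQuality"], parent=document)

# Categories behave like tags: one asset may carry several of them.
# Asset identifiers and category names are made up for illustration.
category_assignments = {
    "asset-001": {"La lupa", "Verismo"},
    "asset-002": {"Verismo"},
    "asset-003": {"La lupa"},
}


def assets_in(category, assignments):
    """Return the identifiers of all assets tagged with `category`, sorted."""
    return sorted(asset for asset, tags in assignments.items() if category in tags)
```

Here `manuscript.all_attributes()` yields the three `Document` attributes followed by the two added by `Manuscript`, while `assets_in` cuts across types because categories are independent of the type tree.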

13 Intuitive and instant search
Assets are browsed by selecting a type (or category) and then one or more filters:
– filters are attributes of the selected type, chosen from a defined list, used to narrow the result set
– filter application is cascading and context-sensitive: selecting a filter value dynamically influences the values offered by subsequent filters (à la iTunes browsing)
– classical search by description and keywords is available too
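
Cascading, context-sensitive filtering can be sketched with two small functions: one narrows the result set by the values already selected, the other lists the values a further filter can still offer. This is an illustration of the idea, not gLibrary's implementation; the function names are hypothetical and the sample assets are made up (the genre "novella" in particular is an invented value).

```python
def apply_filters(assets, selections):
    """Keep only the assets whose attributes match every selected filter value."""
    return [asset for asset in assets
            if all(asset.get(name) == value for name, value in selections.items())]


def available_values(assets, attribute, selections):
    """List the values `attribute` can still take after the current selections."""
    remaining = apply_filters(assets, selections)
    return sorted({asset[attribute] for asset in remaining if attribute in asset})


# Made-up sample data; attribute names follow the filters listed in the slides.
assets = [
    {"Title": "la lupa", "DocumentGenre": "tragedia lirica", "FileType": "PDF"},
    {"Title": "la lupa", "DocumentGenre": "tragedia lirica", "FileType": "TIFF"},
    {"Title": "gli illustri amanti", "DocumentGenre": "novella", "FileType": "TIFF"},
]

all_titles = available_values(assets, "Title", {})
titles_for_pdf = available_values(assets, "Title", {"FileType": "PDF"})
types_for_amanti = available_values(assets, "FileType", {"Title": "gli illustri amanti"})
```

With no selection both titles are offered; once `FileType` is set to `PDF`, the `Title` filter offers only "la lupa", which is the cascading behaviour the slide describes.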

14 Details of the selected asset

15 Assets storing and retrieval
Users can upload their local assets to one or more Storage Elements of the Grid (creating replicas):
– uploads are managed through Java applets
– files already on an SE can be included in a digital library through the File Catalogue browser
Download from SEs to the user's laptop/desktop:
– selection of a replica link from a list
– download through a Java applet

16 Security and user management
Being a Grid application, gLibrary inherits all the security features of the underlying technologies:
– X.509 digital certificate authentication
– transfers based on proxy authorization
– VOMS (Virtual Organization Membership Service) used to distinguish users and assign the right permissions
Three kinds of user role for each deployed digital library:
– gLibraryManager: defines the hierarchies of types and categories (with their attributes) and the filters; grants submission rights to generic users
– gLibrarySubmitter: uploads new assets and defines permissions on their entries (fine-grained rights assignment)
– generic users: can search and download (the assets they have rights to)
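
The three roles can be pictured as a mapping from role to permitted actions. The sketch below lists only the actions this slide names for each role; the action identifiers are hypothetical, and whether a higher role also includes the rights of a lower one is not stated in the slides, so the sets are kept separate here as an assumption.

```python
from enum import Enum


class Role(Enum):
    MANAGER = "gLibraryManager"
    SUBMITTER = "gLibrarySubmitter"
    GENERIC = "generic user"


# Action names are illustrative labels, not identifiers taken from gLibrary.
ALLOWED_ACTIONS = {
    Role.MANAGER: frozenset({
        "define_types", "define_categories", "define_filters",
        "grant_submission_rights",
    }),
    Role.SUBMITTER: frozenset({"upload_asset", "set_asset_permissions"}),
    Role.GENERIC: frozenset({"search", "download"}),
}


def is_allowed(role, action):
    """Return True if the slide lists `action` among the rights of `role`."""
    return action in ALLOWED_ACTIONS[role]
```

In the real system the role is obtained from VOMS; here it is simply passed in, and per-asset rights (which assets a generic user may download) are left out of the sketch.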

17 gLibrary architecture
Components shown in the diagram: User, Login applet, AMGA Metadata Catalogue, LFC File Catalogue, SE, Upload/Download applet, VOMS Server
Steps shown in the diagram:
1. local proxy creation
2. proxy transfer over HTTPS
3. get role
4. find the right asset
5. proxy retrieved over HTTPS
6. direct transfer from SE

18 Possible usage scenarios
Suitable for communities that need to share large amounts of digital resources in an easy and secure way. Some examples:
– consumer users: sharing of photos, music, movies, documents, office, etc.
– enterprise/industrial/research communities: presentations, invoices, layouts, sounds, scans, manuscripts :)
Each community defines how to describe its content (and how to search for it), setting permissions to grant or deny access to specific users, groups and whole organizations, and exploiting the huge storage capacity, organization and security features offered by a Grid infrastructure.
A use case: De Roberto Digital Repository

19 Goals:
– store the 8000 scans of the De Roberto heritage ---> Grid Storage Elements
– enable ubiquitous, round-the-clock access for scientists ---> web application
– organize documents for fast search ---> metadata services
– long-term digital preservation of data ---> redundancy through replicas of files on several Storage Elements
– easy-to-use interface for searching, organizing, uploading and downloading digitized documents --->

20 Metadata used in the DR digital library
Type definitions for the assets of the DR library.
Attribute definitions per type, e.g.:

  Attribute         Value
  Title             la lupa
  Author            federico de roberto, giovanni verga
  Description       manoscritto della tragedia lirica …
  Keywords          verismo, federico de roberto, la lupa, …
  CaptionWriter     stefania iannizzotto, alessandro …
  CopyrightStatus   copyrighted
  PageNum           5
  TotalPages        34
  DocumentGenre     tragedia lirica
  PublicationYear   1916
  Publisher         officine tipo-litografiche barravecchia e balestrini
  FileType          PDF
  Resolution        300
  ScanQuality       good

Filter definitions per type, e.g.: DocumentGenre, Title, FileType, ScanQuality, DocumentType, PublicationYear, PublicationStatus, Publisher, Location

21 Browsing and filtering (screenshot)

22 Downloading

23 Download completed

24 Upload

25 Automatic metadata extraction
Some libraries allow automatic metadata extraction from given file types:
– exiftool
– Imagero
Both were able to read the XMP metadata, e.g.:
  $ exiftool -E -XMP:Subject -XMP:Description -XMP:Rights -XMP:Title -XMP:Author -FileName -FileSize 001\ gli\ illustri\ amanti.tif
  Subject : federico de roberto, manoscritti letterari, verismo, gli illustri amanti, la.mu.s.a., facoltà di lettere e filosofia catania, società di storia patria per la sicilia orientale
  Description : manoscritto de gli illustri amanti, conservato presso la biblioteca della società di storia patria per la sicilia orientale
  Rights : società di storia patria per la sicilia orientale catania.la.mu.s.a., facoltà di lettere e filosofia, università degli studi di catania
  Title : gli illustri amanti
  File Name : 001 gli illustri amanti.tif
  File Size : 106 MB
We are working to integrate these libraries to speed up the acquisition stage.
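
One possible way to feed such output into a metadata catalogue is to parse the `Tag : value` lines into a dictionary. The sketch below is an assumption about how the integration might start, not the project's actual code: the function name is hypothetical, it works on text that has already been captured rather than invoking exiftool, and the column padding in the sample is only a guess at how the tool aligns tag names (the parser does not depend on it).

```python
def parse_exiftool_output(text):
    """Turn 'Tag Name : value' lines into a dict, splitting on the first colon."""
    metadata = {}
    for line in text.splitlines():
        tag, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank lines and anything without a colon
        metadata[tag.strip()] = value.strip()
    return metadata


# Sample lines taken from the output shown on this slide; the padding is assumed.
sample_output = """\
Title                           : gli illustri amanti
File Name                       : 001 gli illustri amanti.tif
File Size                       : 106 MB
"""

parsed = parse_exiftool_output(sample_output)
```

Splitting on the first colon only keeps values that themselves contain colons intact, which matters for fields such as dates or URLs.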

26 Summary
The gLibrary challenge is to offer a flexible, multiplatform, secure and easy-to-use system to handle digital libraries on the Grid:
– flexible: handles any kind of asset, as defined by the library admin
– multiplatform: implemented as a web application with Java applets, it can be accessed from any OS
– secure: fine-grained permissions (Grid certificate based) can be set on assets
– easy-to-use: its intuitive interface, with an à la iTunes browser, lets users find the desired asset with just a few mouse clicks
A prototype of the De Roberto Digital Repository was implemented with gLibrary in a few weeks. It will let scientists access the works from anywhere, at any time, in a simple and smart way, and it will allow the long-term preservation of this cultural heritage.

27 References
Contact: antonio.calanducci@ct.infn.it, lamusa@unict.it
Prototype of the De Roberto Digital Repository:
– https://glibrary.ct.infn.it/deroberto/
gLibrary project homepage (currently under maintenance):
– https://glibrary.ct.infn.it/
Papers:
– A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, A Digital Library Management System for the Grid, Fourth International Workshop on Emerging Technologies for Next-generation GRID (ETNGRID 2007) at 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2007), GET/INT Paris, France, June 18-20, 2007 (http://etngrid.diit.unict.it/2007/index.html).
– A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, gLibrary: Digital Asset Management System for the Grid, IEEE Hypermedia and Grid Systems Conference at 30th Jubilee International Convention MIPRO, Opatija, Croatia, May 21-25, 2007 (http://www.mipro.hr/).

28 Thanks for your attention
https://glibrary.ct.infn.it/deroberto/

