Digitization to preserve Cultural Heritage. A use case: Federico De Roberto works. Trujillo, 14 April 2009. Antonio Calanducci

Antonio Calanducci, INFN Catania
EGEE-III First Review
Data Grids to preserve Cultural Heritage. A use case: Federico De Roberto works

Data Grids for conservation of cultural inheritance - EGEE-III First Review

De Roberto cultural heritage
Federico De Roberto, an Italian writer of the 19th and 20th centuries, was born in Naples but spent his life in Catania. He left the humanities community numerous works, made up of valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, clippings, sketches, photos, etc.

Digitize to preserve them
- Some sheets are damaged (mold, crumbled pieces) and need physical restoration.
- Digitization prevents the loss of these works, some of which are still unpublished and relevant to the humanities community.

Acquisition stage
- Digitization of manuscripts, typescripts and printed works:
  - TIFF files, one per page, 600 dpi, about 100 MB for an A3 sheet: high-resolution scans for in-depth examination
  - PDF files, one per work, 300 dpi, varying file sizes: overall examination of works
- 8000 sheets/scans, 3 TB of disk space
- Different physical formats: A3, A4, custom sizes
- Embedded metadata: TIFF files carry embedded metadata describing the physical features of the scan and its content
  - physical features: ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate
  - content: Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice
  - added with Photoshop after the digitization phase (Adobe XMP format)
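
The embedded XMP block is an XML packet stored inside each TIFF file. The sketch below shows how its content fields could be read with the Python standard library alone, assuming the packet is stored uncompressed and that title and keywords use the standard Dublin Core properties dc:title and dc:subject. The function name `read_xmp_fields` and the sample bytes are hypothetical; a real reader would locate the packet through TIFF tag 700 (XMLPacket) rather than by scanning bytes.

```python
import xml.etree.ElementTree as ET

# Standard XMP, RDF and Dublin Core namespace URIs
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_xmp_fields(data: bytes) -> dict:
    """Hypothetical helper: find the XMP packet in raw bytes, return a few fields.

    Simplification: scans for the <x:xmpmeta> element instead of walking the
    TIFF directory to tag 700, and assumes the packet is not compressed.
    """
    start = data.find(b"<x:xmpmeta")
    end = data.find(b"</x:xmpmeta>")
    if start == -1 or end == -1:
        return {}
    root = ET.fromstring(data[start:end + len(b"</x:xmpmeta>")])

    fields = {}
    # dc:title is an rdf:Alt (language alternatives): take the first entry
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    if title is not None:
        fields["Title"] = title.text
    # dc:subject is an rdf:Bag holding one keyword per rdf:li
    fields["Keywords"] = [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]
    return fields


# Hypothetical sample: a TIFF byte header, an XMP packet, then more binary data
SAMPLE = (
    b"II*\x00<binary TIFF data>"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">gli illustri amanti</rdf:li></rdf:Alt></dc:title>'
    b"<dc:subject><rdf:Bag><rdf:li>federico de roberto</rdf:li><rdf:li>verismo</rdf:li></rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b"<more binary data>"
)

print(read_xmp_fields(SAMPLE))
# {'Title': 'gli illustri amanti', 'Keywords': ['federico de roberto', 'verismo']}
```

Scanning for the packet keeps the sketch short; it works because XMP packets are plain XML text even inside binary files, but it would miss compressed or multiple packets.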

Goals and requirements
- Make these works accessible to the humanities community:
  - always online: 24 x 365
  - available from everywhere
  - simple, easy-to-use interface for non-expert users
- Quickly find the desired document:
  - document organization according to physical and semantic metadata
  - organization by type/collection
  - dynamic filtering of search result sets by selecting one or more document metadata values
- Long-term (digital) preservation:
  - multiple copies (replicas) spread across different geographical sites
  - reliable storage systems and replica redundancy to achieve secure preservation

Data Management in Grid
- Storage Element (SE): a front-end server aggregating a pool of hard disks and presenting them as one big (virtual) disk
  - a "container" of users' files
  - generally one SE per site
  - mirrored disks to avoid data loss in case of hardware failure
  - fine-grained file permissions: owner, group, and given lists of users and groups (Access Control Lists, ACLs)
  - keeps the mapping between each file and the physical disk of the pool that holds it
- File Catalogue: provides a single virtual file system across several Storage Elements, keeping track of which SE (or SEs) contains a given file
  - keeps track of replicas
  - maps each file to its Storage Element file name
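
The File Catalogue's role, one logical file name mapped to several physical copies on different Storage Elements, can be illustrated with a small in-memory model. This is a hypothetical sketch under simplifying assumptions: the class name, the logical file name and the SURLs are made up, and it does not reproduce the LFC interface or its ACLs.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class FileCatalogue:
    """Hypothetical toy catalogue: logical file name (LFN) -> list of replica SURLs.

    Illustrates the concept only; not the LFC API.
    """
    replicas: dict[str, list[str]] = field(default_factory=dict)

    def register(self, lfn: str, surl: str) -> None:
        # A second, different SURL for the same LFN records a new replica
        surls = self.replicas.setdefault(lfn, [])
        if surl not in surls:
            surls.append(surl)

    def list_replicas(self, lfn: str) -> list[str]:
        return list(self.replicas.get(lfn, []))

    def storage_elements(self, lfn: str) -> set[str]:
        # The SE host is the network-location part of each SURL
        return {urlparse(s).hostname for s in self.list_replicas(lfn)}


cat = FileCatalogue()
lfn = "/grid/deroberto/scans/la_lupa_p05.tif"  # hypothetical LFN
cat.register(lfn, "srm://se-a.example.org/dpm/example.org/home/vo/la_lupa_p05.tif")
cat.register(lfn, "srm://se-b.example.net/dpm/example.net/home/vo/la_lupa_p05.tif")
print(len(cat.list_replicas(lfn)))        # 2
print(sorted(cat.storage_elements(lfn)))  # ['se-a.example.org', 'se-b.example.net']
```

Keeping the logical name separate from the physical locations is what lets a replica be added or lost on one site without changing how users refer to the file.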

Data Management in Grid (continued)
- Metadata Catalogue: stores and organizes the metadata of files saved on Storage Elements and registered in the File Catalogue
  - metadata organized by "collection" (a sort of directory)
  - each collection has its own schema, a set of defined attributes, e.g. /deroberto/scans/manuscripts:
    - Title: "La lupa"
    - Author: "Federico De Roberto, Giovanni Verga"
    - Genre: "Tragedia Lirica"
    - Pages: 34
    - FileType: TIFF
    - surl: srm://infn-se-01.ct.pi2s2.it/dpm/ct.pi2s2.it/home/cometa/generated/.../filede4d...c4-4d66-95b6-3d69063ef081
  - answers users' queries against the metadata describing files, to find their physical location for later retrieval
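
A collection with a fixed schema, answering attribute queries with the storage URL of each matching file, might be modelled as below. This is a minimal sketch assuming equality-only queries; the `Collection` class and the SURL are hypothetical, and the code does not reproduce the AMGA Metadata Catalogue's actual interface or query language.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical metadata collection: a path, a schema, one entry per file."""
    path: str
    schema: tuple[str, ...]
    entries: list[dict] = field(default_factory=list)

    def add(self, **attrs) -> None:
        # Every entry must point at its file and use only schema attributes
        if "surl" not in attrs:
            raise ValueError("each entry needs a surl")
        unknown = set(attrs) - set(self.schema)
        if unknown:
            raise ValueError(f"attributes not in schema: {sorted(unknown)}")
        self.entries.append(attrs)

    def query(self, **conditions) -> list[str]:
        """Return the surl of every entry matching all conditions (equality only)."""
        return [
            e["surl"]
            for e in self.entries
            if all(e.get(k) == v for k, v in conditions.items())
        ]


manuscripts = Collection(
    "/deroberto/scans/manuscripts",
    ("Title", "Author", "Genre", "Pages", "FileType", "surl"),
)
manuscripts.add(
    Title="La lupa", Author="Federico De Roberto, Giovanni Verga",
    Genre="Tragedia Lirica", Pages=34, FileType="TIFF",
    surl="srm://se.example.org/dpm/example.org/home/vo/lupa.tif",  # hypothetical SURL
)
print(manuscripts.query(Genre="Tragedia Lirica", FileType="TIFF"))
```

Returning the SURL rather than the file itself mirrors the two-step lookup described above: the metadata query finds where a file lives, and retrieval happens afterwards.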

What Data Grids can offer them
- store the 8000 scans of the De Roberto heritage --> Data Grid Storage Elements
- enable ubiquitous, 24/24h access for scientists --> Web Application
- organize documents for quick search --> Metadata Services
- long-term digital preservation of data --> redundancy through replicas of files on several Storage Elements
- a simple, easy-to-use system for searching, organizing, uploading and downloading digitized documents on the Grid

gLibrary features
- An INFN-developed tool, totally gLite-based
- It allows users to store, organize, browse, search and retrieve digital assets in a Grid environment through an intuitive front-end
(Figure: examples of digital assets)

gLibrary as the iTunes for the Grid

Browse & Search
- Assets can be browsed by selecting a type (or collection) and one or more filters:
  - filters are attributes of the selected type, chosen from a defined list and used to narrow the result set
- Filter application is cascading and context-sensitive: selecting a filter value dynamically restricts the values offered by subsequent filters ("à la iTunes" browsing)
- Classic search by description and keywords is available too
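
The cascading behaviour can be sketched as follows, assuming filters are applied in the order they are displayed and each asset is a flat attribute dictionary. The function name and the sample assets are hypothetical; in gLibrary the metadata lives in the AMGA Metadata Catalogue rather than in an in-memory list.

```python
def filter_options(assets, filters, selected):
    """Hypothetical cascading, context-sensitive filtering (à la iTunes).

    assets:   list of dicts (attribute -> value)
    filters:  ordered list of attribute names shown as filter columns
    selected: dict attribute -> chosen value

    Returns (matching assets, {filter: sorted values still offered}).
    Each column only offers values found among the assets that survive
    the selections made in the columns before it.
    """
    remaining = list(assets)
    options = {}
    for name in filters:
        options[name] = sorted({a[name] for a in remaining if name in a})
        if name in selected:
            remaining = [a for a in remaining if a.get(name) == selected[name]]
    return remaining, options


# Hypothetical sample assets
assets = [
    {"DocumentGenre": "tragedia lirica", "FileType": "PDF", "ScanQuality": "good"},
    {"DocumentGenre": "tragedia lirica", "FileType": "TIFF", "ScanQuality": "good"},
    {"DocumentGenre": "romanzo", "FileType": "TIFF", "ScanQuality": "poor"},
]
filters = ["DocumentGenre", "FileType", "ScanQuality"]
matches, opts = filter_options(assets, filters, {"DocumentGenre": "tragedia lirica"})
print(len(matches))  # 2
print(opts)
```

Choosing "tragedia lirica" in the first column hides the value "poor" from the ScanQuality column, because no remaining asset has it.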

Organize assets
"Types" and "Collections" are defined by repository providers/admins.
- Assets are organized by type:
  - a list of specific attributes describes each kind of asset managed by the system
  - types are hierarchical (a child type shares and extends its parent's attributes)
  - types are queried during searches
- and/or organized by collection:
  - collections group together related assets, even of different types
  - they are also useful to define subsets of assets belonging to the same type
  - an asset can be assigned to multiple collections (tagging-like)
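
Attribute inheritance between types and tag-like collection membership can be sketched like this. The type names `Document` and `Manuscript`, the asset ids and the collection names are hypothetical examples; the classes model the two concepts only, not how gLibrary stores them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetType:
    """Hypothetical type: declares its own attributes and inherits its parent's."""
    name: str
    own_attributes: list[str]
    parent: Optional["AssetType"] = None

    def attributes(self) -> list[str]:
        # Parent attributes come first; the child only extends them
        inherited = self.parent.attributes() if self.parent else []
        return inherited + [a for a in self.own_attributes if a not in inherited]


@dataclass
class Repository:
    """Hypothetical repository: collections work like tags (many per asset)."""
    collections: dict[str, set[str]] = field(default_factory=dict)

    def tag(self, asset_id: str, *collection_names: str) -> None:
        for name in collection_names:
            self.collections.setdefault(name, set()).add(asset_id)

    def collections_of(self, asset_id: str) -> list[str]:
        return sorted(n for n, ids in self.collections.items() if asset_id in ids)


document = AssetType("Document", ["Title", "Author", "FileType"])
manuscript = AssetType("Manuscript", ["PageNum", "TotalPages"], parent=document)
print(manuscript.attributes())
# ['Title', 'Author', 'FileType', 'PageNum', 'TotalPages']

repo = Repository()
repo.tag("la_lupa_p05", "La lupa", "Verismo")  # hypothetical ids and collections
repo.tag("la_lupa_photo", "La lupa")
print(repo.collections_of("la_lupa_p05"))  # ['La lupa', 'Verismo']
```

Types fix which attributes an asset has, while collections cut across types, which is why one asset can sit in several collections but has exactly one type.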

Store & Retrieve
- Users can upload their local assets to one or more Storage Elements of the Grid (uploading to several SEs creates replicas)
  - files already on a Grid SE can be registered in a gLibrary repository through the LFC File Catalogue browser
- Download from SEs to the user's laptop/desktop:
  - by selecting a replica link from a list
- Transfers are handled from the browser over HTTP/HTTPS, provided that users have imported their own X.509 Grid certificate

gLibrary Architecture

Technologies used
- Web standards:
  - JavaScript/AJAX/JSON on the client side
  - PHP5 classes implementing the business logic on the server side
- Grid technologies:
  - Storage Element SRM interface to get the TURLs (Transfer URLs)
  - transfers handled with GridFTP and with HTTPS using X.509 certificate authentication
  - X.509-based Globus Security Infrastructure with the VOMS extensions to handle authentication and authorization (ACL-based) on the Metadata Catalogue and Storage Elements
  - all grid services implemented with the EGEE gLite middleware (DPM Storage Elements, AMGA Metadata Catalogue, LFC File Catalogue, VOMS services)
- Other standards:
  - a subset of XMP metadata
  - the Dublin Core metadata set will be included in the next release

Metadata used in the DR digital library
Types are defined for the assets of the DR repository, and attributes are defined per type. E.g. (Manuscripts):
- Title: la lupa
- Author: federico de roberto, giovanni verga
- Description: manoscritto della tragedia lirica …
- Keywords: verismo, federico de roberto, la lupa, …
- CaptionWriter: stefania iannizzotto, alessandro …
- CopyrightStatus: copyrighted
- PageNum: 5
- TotalPages: 34
- DocumentGenre: tragedia lirica
- PublicationYear: 1916
- Publisher: officine tipo-litografiche barravecchia e balestrini
- FileType: PDF
- Resolution: 300
- ScanQuality: good
Filters defined per type, e.g.: DocumentGenre, Title, FileType, ScanQuality, DocumentType, PublicationYear, PublicationStatus, Publisher, Location

Data Grid currently used: the COMETA Consortium Grid Infrastructure (completely gLite-based)

gLibrary deployment
- At the moment gLibrary runs on the COMETA Consortium infrastructure, 100% gLite 3.1 based (DPM SEs, AMGA, VOMS, LFC)
- It could easily be deployed on the EGEE production infrastructure for any VO:
  - install the front-end on a VO server and enable the supported VOs in the AMGA server

gLibrary vs gCube
- gCube: a SOA system to create digital libraries (DLs) on the Grid
  - developed in the context of the EU-funded DILIGENT/D4Science projects
  - a collection of basic services (information, storage, metadata, indexing), 80% implemented as its own WSRF services and 20% based on gLite ones
  - more heavyweight; diverges from the traditional Grid infrastructure in Europe
- gLibrary:
  - best-effort, unfunded development
  - 100% gLite based
  - lightweight, easily deployable on the current European infrastructure
  - provides essential working features
  - fast deployment of new repositories

gLibrary and RESPECT
gLibrary is currently not proposed for the EGEE RESPECT program, for the following reasons:
- it is still a prototype
- it lacks abstract APIs, though these can be easily implemented
- once the APIs are ready, it will be submitted to EGEE for inclusion in RESPECT
Future development will continue in the context of IGI and future regional projects.

Who can benefit from gLibrary
- Communities with medium/large digital object repositories to share in a short time
  - files can stay on their own servers if these are reachable from the Internet, and/or be moved to Grid SEs
- Upcoming deployments of new repositories:
  - musical scores of ancient Neapolitan musicians
  - digitized documents from the Sicilian Library (Verga manuscripts)
  - deployment of INFN CDS Invenio (CERN Document Server software) repositories on gLibrary

Automatic metadata extraction
Some libraries allow automatic metadata extraction from given file types:
- exiftool
- Imagero
We have used exiftool to extract XMP metadata from TIFF images, e.g.:
    $ exiftool -E -XMP:Subject -XMP:Description -XMP:Rights -XMP:Title -XMP:Author -FileName -FileSize 001\ gli\ illustri\ amanti.tif
    Subject : federico de roberto, manoscritti letterari, verismo, gli illustri amanti, la.mu.s.a., facoltà di lettere e filosofia catania, società di storia patria per la sicilia orientale
    Description : manoscritto de gli illustri amanti, conservato presso la biblioteca della società di storia patria per la sicilia orientale
    Rights : società di storia patria per la sicilia orientale catania.la.mu.s.a., facoltà di lettere e filosofia, università degli studi di catania
    Title : gli illustri amanti
    File Name : 001 gli illustri amanti.tif
    File Size : 106 MB
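
Output in the "Tag Name : value" layout shown above can be turned into attribute values for a repository with a few lines of Python. This is a minimal sketch assuming exiftool's default human-readable output; the function names are hypothetical and the mapping onto gLibrary attributes is not shown. exiftool can also emit JSON with its -j option, which is the more robust choice when scripting.

```python
def parse_exiftool_output(text: str) -> dict[str, str]:
    """Hypothetical helper: turn "Tag Name : value" lines into a dict.

    Splits on the first " : " only, so values containing colons survive.
    Lines without the separator are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def to_keywords(subject: str) -> list[str]:
    # The XMP Subject value is a comma-separated keyword list
    return [k.strip() for k in subject.split(",") if k.strip()]


# Sample taken from the exiftool output above (some lines omitted)
SAMPLE = """Subject : federico de roberto, manoscritti letterari, verismo, gli illustri amanti, la.mu.s.a., facoltà di lettere e filosofia catania, società di storia patria per la sicilia orientale
Title : gli illustri amanti
File Name : 001 gli illustri amanti.tif
File Size : 106 MB"""

fields = parse_exiftool_output(SAMPLE)
print(fields["Title"])                     # gli illustri amanti
print(to_keywords(fields["Subject"])[:3])  # ['federico de roberto', 'manoscritti letterari', 'verismo']
```

Splitting the Subject value into separate keywords makes each one usable as a filter value rather than as one long string.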

Live demo

More screenshots...

Other screenshots

References
- Contact:
- Prototype of the De Roberto Digital Repository:
- YouTube video:
- Previous papers:
  - A. Calanducci, R. Barbera, J. Sevilla, A. De Filippo, M. Saso, S. Iannizzotto, F. De Mattia, F. Vicinanza. "Data Grids for Conservation of Cultural Inheritance", 1st International Workshop on Data Grids for e-Science (DaGreS09) at the ACM International Conference on Computing Frontiers, May 18-20, 2009.
  - A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci. "A Digital Library Management System for the Grid", Fourth International Workshop on Emerging Technologies for Next-generation GRID (ETNGRID 2007) at the 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2007), GET/INT Paris, France, June 18-20, 2007.

Questions?
Thank you for your attention.