Digital Library Storage using iRODS Data Grids Mark Hedges, Tobias Blanke Centre for e-Research, King’s College London Arts and Humanities Data Service.

Digital Library Storage using iRODS Data Grids
Mark Hedges, Tobias Blanke
Centre for e-Research, King’s College London
Arts and Humanities Data Service
Arts and Humanities e-Science Support Centre
Adil Hasan
Rutherford Appleton Laboratory
Science and Technology Facilities Council

Overview
- Background: AHDS and Centre for e-Research
- Background: the data deluge and the broader data challenge
- Digital libraries and e-research infrastructures
- Digital libraries and data grids (SRB/iRODS)

What is (or was) the AHDS?
- Arts and Humanities Data Service
- Established 1996, funded until 2008
- Distributed structure
- Mission: to collect, preserve and distribute digital resources produced by and for arts and humanities research (mainly in the UK)

What is CeRch?
- Centre for e-Research at King’s College London
- Established 2007
- Incorporates staff and expertise of the AHDS and other groups such as AHeSSC
- Continuity, but a change of focus

Research data management
- In use now? Future use?
- Data curation and data preservation
- Curation: the activity of managing and promoting the use of data from its creation, to ensure it is fit for purpose and remains available for discovery and re-use.
- Preservation: an archiving activity in which data are maintained over time so that they can still be accessed and understood through changes in technology.

Data Challenge in the Humanities
- Disciplines: history, archaeology, literature/linguistics, visual arts, performing arts
- Ongoing growth of corpora due to major digitisation projects
- Highly diverse in type and size: images, text, music, video, databases, multimedia
- Data require specialised knowledge
- Highly complex, contextual, fuzzy, uncertain, inconsistent, incomplete
- Rapid expansion: AHDS data size increased 20-fold between 2005 and 2008
- Increasing number of large objects (e.g. video, archaeology scans)

Digital library systems
- Fedora Commons (in use at AHDS/CeRch)
- Supports digital resources that are diverse and structurally complex
- Flexible metadata management
- Disseminator framework supporting more complex, application-specific processing of digital resources
- Not a stand-alone DL, but a component of an integrated research infrastructure

Issues
- Focuses on support for structure/complexity rather than on storage issues
- Does not natively support distribution of data
- Performance limitations when processing large objects

Data Grids
- Storage Resource Broker (SRB): a widely used data grid technology developed by the San Diego Supercomputer Center
- Addresses storage issues for digital repository and preservation environments
- Provides uniform, searchable access to virtualised, distributed resources, so the DL is insulated from:
    - the physical location of data
    - types of storage
    - migration to new hardware
- Scalable: as the library grows, new resources can be added dynamically
- Auditing facilities
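To make the idea of virtualised storage concrete, here is a minimal Python sketch of a logical namespace: the digital library addresses objects by a logical path, and a catalog maps that path to physical replicas on named storage resources. The class and method names (`LogicalCatalog`, `register`, `resolve`, `migrate`) and the resource names are hypothetical and are not the SRB API; the sketch only shows that adding a resource or migrating data changes the catalog, not the path the library uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    resource: str        # name of the storage resource holding the copy
    physical_path: str   # where the copy lives on that resource


@dataclass
class LogicalCatalog:
    """Toy catalog mapping logical paths to physical replicas (hypothetical)."""
    resources: Set[str] = field(default_factory=set)
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def add_resource(self, name: str) -> None:
        # New storage can be registered at any time; clients are unaffected.
        self.resources.add(name)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        if resource not in self.resources:
            raise KeyError(f"unknown resource: {resource}")
        self.entries.setdefault(logical_path, []).append(Replica(resource, physical_path))

    def resolve(self, logical_path: str) -> Replica:
        # Clients see only the logical path; this toy version uses the first replica.
        return self.entries[logical_path][0]

    def migrate(self, logical_path: str, old: str, new: str, new_path: str) -> None:
        # Moving data to new hardware updates the catalog, not the logical path.
        if new not in self.resources:
            raise KeyError(f"unknown resource: {new}")
        replicas = self.entries[logical_path]
        for i, replica in enumerate(replicas):
            if replica.resource == old:
                replicas[i] = Replica(new, new_path)
                return
        raise KeyError(f"no replica of {logical_path} on {old}")


catalog = LogicalCatalog()
catalog.add_resource("disk-a")
catalog.register("/ahds/images/ms001.tif", "disk-a", "/vol1/0001.tif")
catalog.add_resource("disk-b")  # added later, as the library grows
catalog.migrate("/ahds/images/ms001.tif", "disk-a", "disk-b", "/vol7/0001.tif")
```

After the migration, `catalog.resolve("/ahds/images/ms001.tif")` points at `disk-b`, while the logical path the library uses is unchanged.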

Limitations
- Not open source
- Not easy to exclude unwanted services
- Very effective for storage management, but not integrated with the wider infrastructure
- Not easy to integrate application-specific requirements (either change the core code, implement them in the client, or use proxy commands)
- No built-in implementation of workflow (this has to be scripted outside SRB, whether server- or client-side), or of asynchronous processing; this requires choreography between the SRB administrator and the person running the workflow
- Relatively restricted support for metadata extension (Fedora supports this, but how should the two be integrated?)

iRODS
- The open source successor to SRB
- Provides similar data virtualisation
- Rule Engine allows data management policies to be defined and realised as rules
- Policy virtualisation: insulation from how policies are implemented
- Execution of rules is driven by events
- System-level rules have great potential to ‘hide’ required data management operations from the user/application level
- Event-condition-action model
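The event-condition-action model can be illustrated with a toy dispatcher. This is a sketch under stated assumptions, not the iRODS rule engine: `Rule` and `RuleEngine` are hypothetical names, conditions and actions are plain Python callables, and the engine simply runs the first rule registered for an event whose condition holds (the real engine's selection semantics may differ). The event name `acPostProcForPut` is taken from the example rules in this talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event: str                         # event that triggers the rule
    condition: Callable[[Dict], bool]  # must hold for the rule to run
    action: Callable[[Dict], None]     # what the rule does


class RuleEngine:
    """Toy event-condition-action dispatcher (hypothetical, not iRODS)."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, context: Dict) -> bool:
        # Try the rules for this event in registration order and run the
        # first one whose condition holds; report whether any rule ran.
        for rule in self.rules:
            if rule.event == event and rule.condition(context):
                rule.action(context)
                return True
        return False


# The client only performs a "put"; the replication policy is applied
# at system level without the client asking for it.
log = []
engine = RuleEngine()
engine.add(Rule("acPostProcForPut",
                lambda ctx: ctx["size"] > 0,
                lambda ctx: log.append(("replicate", ctx["path"]))))
ran = engine.fire("acPostProcForPut", {"path": "/ahds/ms001.tif", "size": 1024})
skipped = engine.fire("acPostProcForPut", {"path": "/ahds/empty.txt", "size": 0})
```

Here `ran` is `True`, `skipped` is `False`, and only the non-empty object was queued for replication: the policy lives with the rules, not with the application.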

What are rules? (1)
- Rules (or policies) are sets of operations that you want to impose on an object (file, user, resource, etc.).
    - The operations are called “micro-services”.
    - Each micro-service is a C application that executes and does something (e.g. checksums data, or converts a file from one format to another).
    - Micro-services are transactional (a recovery operation is associated with each micro-service).
- In most cases you can define a server-side workflow as a rule controlling a set of {micro-services, rules}.
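The pairing of each micro-service with a recovery operation can be sketched as a chain runner that, when a step fails, undoes the completed steps in reverse order. This Python sketch is an assumption about how such transactional behaviour might work, not a description of iRODS internals; the step functions (`checksum`, `undo_checksum`, `convert`, `nop`) are hypothetical stand-ins for micro-services. Whether the failed step's own recovery action should also run is a design choice; here only completed steps are undone.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict], None]


def run_chain(workflow: List[Step], recovery: List[Step], state: Dict) -> bool:
    """Run steps in order; if one raises, undo the completed ones in reverse."""
    if len(workflow) != len(recovery):
        raise ValueError("each action needs a matching recovery action")
    completed: List[int] = []
    for i, step in enumerate(workflow):
        try:
            step(state)
        except Exception:
            # Roll back: apply recovery actions for completed steps, newest first.
            for j in reversed(completed):
                recovery[j](state)
            return False
        completed.append(i)
    return True


# Hypothetical stand-ins for micro-services and their recovery actions.
def checksum(s: Dict) -> None:
    s["done"].append("checksum")


def undo_checksum(s: Dict) -> None:
    s["done"].remove("checksum")


def convert(s: Dict) -> None:
    raise RuntimeError("format conversion failed")


def nop(s: Dict) -> None:
    pass


state = {"done": []}
ok = run_chain([checksum, convert], [undo_checksum, nop], state)
```

Because `convert` fails, `ok` is `False` and the checksum step is rolled back, leaving `state` as it was before the chain started.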

What are rules? (2)
- A rule is cast as {event : condition : action set : recovery set}.
    - Rules can be built from other rules.
    - This allows you to model complex workflows.
- Supports execution of rules on the most convenient resource (usually the server the client is connected to).
- Supports delayed execution of rules (e.g. “run this rule this evening”).
- Supports periodic execution of rules (e.g. “run this rule every evening”).
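Delayed and periodic execution amounts to a time-ordered queue of pending rules. The following Python sketch uses a priority queue (`heapq`) and a simulated clock measured in hours; `DelayQueue`, its methods and the rule names are hypothetical and do not reflect how iRODS actually stores delayed rules.

```python
import heapq
from typing import Callable, List, Optional, Tuple


class DelayQueue:
    """Toy scheduler for delayed and periodic rules on a simulated clock (hours)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str, Callable[[], None], Optional[float]]] = []
        self._seq = 0  # tie-breaker: equal times run in scheduling order

    def schedule(self, when: float, name: str, action: Callable[[], None],
                 period: Optional[float] = None) -> None:
        heapq.heappush(self._heap, (when, self._seq, name, action, period))
        self._seq += 1

    def run_until(self, now: float) -> List[Tuple[float, str]]:
        """Run every rule due at or before `now`; return (time, name) pairs."""
        ran: List[Tuple[float, str]] = []
        while self._heap and self._heap[0][0] <= now:
            when, _, name, action, period = heapq.heappop(self._heap)
            action()
            ran.append((when, name))
            if period is not None:
                # Periodic rules re-queue themselves for the next run.
                self.schedule(when + period, name, action, period)
        return ran


queue = DelayQueue()
queue.schedule(18, "integrity-audit", lambda: None)           # "this evening"
queue.schedule(20, "replica-check", lambda: None, period=24)  # "every evening"
ran = queue.run_until(50)
# ran == [(18, 'integrity-audit'), (20, 'replica-check'), (44, 'replica-check')]
```

The sequence number in each heap entry ensures that two rules due at the same time are never compared by their callables.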

iRODS rules
The components of a rule definition are as follows:
    actionDef | condition | workflowChain | recoveryChain
where:
- actionDef identifies the action to be carried out
- condition is the condition that must hold for the rule to execute
- workflowChain is the sequence of actions to be executed
- recoveryChain is the corresponding sequence of recovery actions (to ensure a consistent state)
A rule can be built up cumulatively from other rules. Data are passed into and within rules via parameters and context. Note: the syntax may change in the near future.
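Because a rule definition is a fixed sequence of `|`-separated fields, with chain elements separated by `##` (as in the example rules of this talk), it can be split mechanically. The sketch below assumes exactly that one-line layout; `parse_rule` and `RuleDef` are hypothetical names, the parser does not cope with `|` or `##` inside quoted arguments, and since the syntax was expected to change it illustrates the structure only. It also checks that the workflow and recovery chains have the same length, since each action needs a corresponding recovery action.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleDef:
    action: str
    condition: str
    workflow: List[str]
    recovery: List[str]


def parse_rule(text: str) -> RuleDef:
    """Split 'actionDef|condition|workflowChain|recoveryChain' into its parts."""
    fields = [f.strip() for f in text.split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    action, condition, workflow, recovery = fields
    wf = [s.strip() for s in workflow.split("##")]
    rc = [s.strip() for s in recovery.split("##")]
    if len(wf) != len(rc):
        raise ValueError("workflow and recovery chains differ in length")
    return RuleDef(action, condition, wf, rc)


rule = parse_rule(
    "acPostProcForPut||acCheckObjectIntegrity##acAnalyseObject##"
    "acNormaliseObject##msiSysReplDataObj(PresRescGrp,all)|"
    "nop##nop##nop##msiCleanUpReplicas"
)
pairs = list(zip(rule.workflow, rule.recovery))  # each action with its recovery
```

An empty condition field (the `||` after the action name) means the rule applies unconditionally.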

Example rule: preservation
Executed when an object has been ingested:
    acPostProcForPut||acCheckObjectIntegrity##acAnalyseObject##acNormaliseObject##msiSysReplDataObj(PresRescGrp,all)|nop##nop##nop##msiCleanUpReplicas
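The last action in the preservation rule replicates the object to a preservation resource group, with `msiCleanUpReplicas` as its recovery action. Here is a minimal Python sketch of that behaviour, assuming an MD5 fixity check first and in-memory dictionaries standing in for storage resources: if any copy fails, the replicas already written are removed so that no partial set is left behind. The function name, resource names and the `fail_on` parameter (used only to simulate a failure) are hypothetical, and the analysis and normalisation steps are omitted.

```python
import hashlib
from typing import Dict, List, Optional


def replicate_with_cleanup(key: str, data: bytes, expected_md5: str,
                           group: Dict[str, Dict[str, bytes]],
                           fail_on: Optional[str] = None) -> bool:
    """Check fixity, then store a copy of `data` on every resource in `group`."""
    if hashlib.md5(data).hexdigest() != expected_md5:
        return False  # integrity check failed: nothing is replicated
    written: List[str] = []
    try:
        for name, store in group.items():
            if name == fail_on:
                raise OSError(f"write to {name} failed")  # simulated failure
            store[key] = data
            written.append(name)
    except OSError:
        # Recovery: remove partial replicas so the group stays consistent.
        for name in written:
            del group[name][key]
        return False
    return True


scan = b"page image bytes"
digest = hashlib.md5(scan).hexdigest()

group_ok = {"pres-a": {}, "pres-b": {}}
ok = replicate_with_cleanup("ms001", scan, digest, group_ok)

group_fail = {"pres-a": {}, "pres-b": {}}
failed = replicate_with_cleanup("ms001", scan, digest, group_fail, fail_on="pres-b")
```

In the failing case the copy on `pres-a` is written and then removed again, so `group_fail` ends with no replicas rather than an incomplete set.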

Example rule: application
Executed when an object has been ingested:
    acPostProcForPut|$format == "image/tiff" && $objectcategory == "highResMS"|msiCheckForJPEGTiling##msiTiffToJPEGTiling##msiValidateTiffToJPEGTiling|nop##msiCleanUpJPEGTiling##msiCleanUpJPEGTiling
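The application rule's condition and its tiling step can be sketched in Python. `should_tile` mirrors the rule's condition over a metadata dictionary; `tile_grid` computes the rectangles a TIFF-to-JPEG tiling step would need to cut, assuming square tiles of a hypothetical default size of 256 pixels, with edge tiles clipped to the image boundary. No image library is used, and neither function is an actual iRODS micro-service.

```python
import math
from typing import Dict, List, Tuple


def should_tile(meta: Dict[str, str]) -> bool:
    # Mirrors the rule's condition: a TIFF in the high-resolution category.
    return meta.get("format") == "image/tiff" and meta.get("objectcategory") == "highResMS"


def tile_grid(width: int, height: int, tile: int = 256) -> List[Tuple[int, int, int, int, int, int]]:
    """Return (col, row, x, y, w, h) for each tile covering a width x height image."""
    tiles = []
    for row in range(math.ceil(height / tile)):
        for col in range(math.ceil(width / tile)):
            x, y = col * tile, row * tile
            # Edge tiles are clipped so the grid never extends past the image.
            tiles.append((col, row, x, y, min(tile, width - x), min(tile, height - y)))
    return tiles


meta = {"format": "image/tiff", "objectcategory": "highResMS"}
tiles = tile_grid(600, 300) if should_tile(meta) else []
```

A 600 x 300 image yields a 3 x 2 grid of six tiles; the bottom-right tile is clipped to 88 x 44 pixels.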

Example: retrieving large objects for processing
- Retrieving the entire object is not always necessary, and can be inefficient
- Move the processing to the data
- Disseminators -> rules
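The benefit of moving the processing to the data can be shown by counting the bytes that leave the storage server. In this Python sketch (the `StorageServer` class and its `get` and `execute` methods are hypothetical), a client that needs only a 16-byte header either fetches the whole object or asks the server to run the extraction next to the data and return just the result.

```python
from typing import Callable, Dict


class StorageServer:
    """Toy storage server that counts the bytes it sends to clients."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.bytes_sent = 0

    def get(self, name: str) -> bytes:
        # Conventional retrieval: the whole object crosses the network.
        data = self.objects[name]
        self.bytes_sent += len(data)
        return data

    def execute(self, name: str, func: Callable[[bytes], bytes]) -> bytes:
        # Processing at the data: only the (small) result is sent back.
        result = func(self.objects[name])
        self.bytes_sent += len(result)
        return result


def read_header(data: bytes) -> bytes:
    return data[:16]


fetch_all = StorageServer()
fetch_all.objects["scan"] = bytes(1_000_000)
header_a = read_header(fetch_all.get("scan"))

at_data = StorageServer()
at_data.objects["scan"] = bytes(1_000_000)
header_b = at_data.execute("scan", read_header)
```

Both clients obtain the same header, but the first moves a million bytes and the second only sixteen.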

Next steps/issues
- Prototypes -> production
- Developing a more comprehensive set of rules for managing digital objects
- Jobs requiring data from multiple locations
- Dynamic deployment of jobs
- Virtual workspaces

Contacts
- mark.hedges at kcl.ac.uk
- tobias.blanke at kcl.ac.uk
- a.hasan at rl.ac.uk