iRODS for Research Data Management


1 iRODS for Research Data Management
SURF Bootcamp - TU/e June 2017 Narges Zarrabi (SURFsara)

2 iRODS? (integrated Rule-Oriented Data System)
- Data (and storage) management middleware
- Open source (roughly 250k lines of C++ code)
- First release in 2009
- Governed by the iRODS Consortium, which standardises development and releases; operated at the University of North Carolina by RENCI (Renaissance Computing Institute) and DICE (Data Intensive Cyber Environments Centre)

Open-source software such as iRODS offers users several benefits. First, because the source code is publicly available, the user community can follow the entire development process. Second, developers in the user community can find and fix errors in the code, extend the existing code, and contribute new code. iRODS is open-source data management middleware that enables users to access, manage, and share data across any type or number of storage systems located anywhere, while maintaining redundancy and security control over their data through extensible rules that ensure the data is archived, described, and replicated according to their needs.

3 What is iRODS?
- Middleware, rather than a prepackaged solution
- Acts as a bridge between the user and the infrastructure
- A layer that sits above the file systems that contain the data
- Supports plugins and configurable policies
- Can manage large amounts of data

iRODS sits above the file systems that contain data and below domain-specific applications. Because iRODS has a plugin framework and is technology-agnostic, it insulates users from vendor lock-in. System administrators can place iRODS on top of an existing heterogeneous data infrastructure and build a flexible data grid. As middleware, iRODS lets administrators track and control access to the data under their care. Through Zone Reports (snapshots of an iRODS Zone produced by the izonereport iCommand), administrators can also monitor the status of the Zone, that is, of the iRODS deployment.

4 Data management & requirements
Users want to store data:
- Regardless of its size or sensitivity
- With easy access and high availability
- Without worrying too much about details: the technology should do most of the work
Data managers need to make sure policies are met:
- Data security, safe storage, long-term storage
- Low costs: when can data be moved to cheaper media?
- Data is stored on appropriate media (for its size and required availability)
Storage system administrators want users to behave:
- For example, no small files on my system, please
- Use the systems efficiently!
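To make the three viewpoints concrete, here is a minimal Python sketch (an illustration only, not an iRODS feature) that folds them into a single placement decision. The tier names, the thresholds and the `FileInfo`/`choose_tier` names are all hypothetical; in iRODS, logic like this would be expressed as rules executed by the rule engine.

```python
from dataclasses import dataclass

# Hypothetical site thresholds; real values are a data manager's decision.
SMALL_FILE_LIMIT = 1024**2         # under 1 MiB counts as a "small file"
LARGE_FILE_LIMIT = 10 * 1024**3    # over 10 GiB goes straight to tape
ARCHIVE_AFTER_DAYS = 180           # data untouched this long moves to cheaper media


@dataclass
class FileInfo:
    size_bytes: int
    days_since_access: int
    sensitive: bool = False


def choose_tier(f: FileInfo) -> str:
    """Pick a (hypothetical) storage tier from size, age and sensitivity."""
    if f.sensitive:
        return "secure-disk"            # security requirements win over cost
    if f.size_bytes < SMALL_FILE_LIMIT:
        return "bundle-then-archive"    # keep small files off the tape system
    if f.size_bytes > LARGE_FILE_LIMIT or f.days_since_access > ARCHIVE_AFTER_DAYS:
        return "tape"                   # cheap, high-latency media
    return "disk"                       # fast, more expensive media
```

The order of the checks encodes priorities: sensitivity first (data manager), then the administrators' small-file rule, then cost.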

5 What does iRODS provide?
- Storage virtualization of different disk and tape storage systems: all your storage in a single (logical) namespace
- Data discovery through system- and user-generated metadata
- A rule engine that automates data management according to defined policies
- Secure collaboration and data sharing between collaborating or distributed teams, including federation between iRODS instances

Virtualization provides a one-stop shop for all data, however heterogeneous the storage devices are. Whether data is stored on a local hard drive, a remote Ceph cluster, or Amazon's S3 object store, the iRODS virtualization layer presents data resources in the classic files-and-folders format within a single namespace. In summary, iRODS offers:
- Storage virtualization of different disk and tape storage systems
- A logical namespace across storage locations
- A policy engine that can automate data management according to defined rules
- A method to create and define user-specific procedures and functions
- Various client interfaces
- A flexible architecture

6 iRODS Components
- iRODS Server: installed at each storage location
- iRODS Rule Engine: controls operations performed at the storage location (policies)
- iRODS Metadata Catalog (iCAT)
- iRODS user clients

The iRODS Server contains both the driver that speaks the protocol of the local storage resource and the iRODS Rule Engine that controls operations performed at the storage location.

7 iRODS Rule Engine
A critical component of the iRODS system:
- Keeps track of and interprets both system- and user-defined rules
- Rules are definitions of actions to be performed by the server
- The built-in iRODS Rule Engine interprets the rules and calls the appropriate microservices

The Rule Engine, which keeps track of state and interprets both system-defined and user-defined rules, is a critical component of the iRODS system. Rules are definitions of actions that are to be performed by the server. These actions are defined in terms of microservices and other actions.
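The idea of rules as actions bound to server events can be sketched in a few lines of Python. This is a toy model, not the iRODS rule engine or its rule language: the `RuleEngine` class and the action functions are hypothetical, and only the event name `acPostProcForPut` is borrowed from the real post-upload policy hook of the iRODS rule base.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A "microservice" in this sketch is a Python function acting on a context dict.
Microservice = Callable[[dict], None]


class RuleEngine:
    """Toy rule engine: maps event names to ordered lists of actions."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[Microservice]] = defaultdict(list)

    def register(self, event: str, action: Microservice) -> None:
        self.rules[event].append(action)

    def fire(self, event: str, ctx: dict) -> dict:
        # Run every action registered for this event, in registration order.
        for action in self.rules.get(event, []):
            action(ctx)
        return ctx


def replicate_to_archive(ctx: dict) -> None:
    # Stand-in for a replication microservice such as msiDataObjRepl.
    ctx.setdefault("replicas", []).append("archiveResc")


def tag_owner(ctx: dict) -> None:
    # Attach an (attribute, value, unit) triple recording who uploaded the file.
    ctx.setdefault("avus", []).append(("owner", ctx["user"], ""))


engine = RuleEngine()
engine.register("acPostProcForPut", replicate_to_archive)
engine.register("acPostProcForPut", tag_owner)
```

Firing `acPostProcForPut` after an upload runs both actions; events without rules leave the context untouched, which mirrors the default of doing nothing.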

8 iRODS Metadata Catalog (iCAT)
- Each iRODS zone has one iCAT
- The iCAT manages the mapping between the logical and physical namespaces
- The iCAT server can be either local or remote
- It uses a relational database, which contains information about:
  - the iRODS zone (to manage the zone)
  - metadata
  - the virtual file system
  - resource configuration
  - user information

Each iRODS zone contains an iCAT server, which uses a relational database to organize the content of the zone and to maintain iRODS metadata. iRODS neither creates nor manages the database instance itself, only the tables within it. Therefore, the database instance should be created and configured before installing iRODS.
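As an illustration of the logical-to-physical mapping the iCAT maintains, the sketch below uses Python's built-in sqlite3 module as a stand-in for the relational database. The two-table schema and the `build_catalog`/`resolve` functions are hypothetical and far simpler than the real iCAT schema; a production zone runs on a server database such as PostgreSQL.

```python
import sqlite3
from typing import List, Tuple


def build_catalog() -> sqlite3.Connection:
    """Create a toy catalog in memory (hypothetical schema, not the real iCAT)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE collections (coll_name TEXT PRIMARY KEY);
        CREATE TABLE data_objects (
            coll_name TEXT NOT NULL REFERENCES collections(coll_name),
            data_name TEXT NOT NULL,
            resc_name TEXT NOT NULL,  -- storage resource holding this replica
            phys_path TEXT NOT NULL,  -- where the bytes actually live
            PRIMARY KEY (coll_name, data_name, resc_name)
        );
        """
    )
    return db


def resolve(db: sqlite3.Connection, logical_path: str) -> List[Tuple[str, str]]:
    """Map one logical path to every (resource, physical path) replica."""
    coll_name, _, data_name = logical_path.rpartition("/")
    rows = db.execute(
        "SELECT resc_name, phys_path FROM data_objects "
        "WHERE coll_name = ? AND data_name = ? ORDER BY resc_name",
        (coll_name, data_name),
    )
    return rows.fetchall()
```

Note that the `collections` table has no physical-path column: collections exist only in the catalog, while each data object row points at real storage.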

9 iRODS Clients
- Web-based and standalone GUIs: iRODS Cloud Browser, Metalnx, iDrop, Kanki
- WebDAV, for drag-and-drop access built into the OS
- APIs: Python, REST, Java, C++
- Portals and external systems: iPlant Discovery Environment, Islandora, Fedora Commons
- Command-line interface

Native iRODS client support:
- iCommands: a UNIX-like command-line interface
- iRODS Explorer for Windows
- iRODS web browser
- iDrop: a client-side transfer and data management application
- iRODS APIs: Jargon (Java API), PRODS (PHP API)

10 iRODS Zone
iRODS deployment (or Zone):
- An iRODS Zone has exactly one iCAT server (with an iCAT database)
- An iRODS Zone can have zero or more Resource servers (each connects to an existing Zone and can provide additional storage resources)
- The simplest iRODS installation consists of one iCAT server and zero Resource servers

Each iRODS deployment, or Zone, is composed of:
- the iRODS Metadata Catalog (iCAT) database
- a Catalog Provider
- optional Catalog Consumers

The iCAT is a relational database that holds all the information about your data, users, and zone that the iRODS servers need to manage and share your data. The iCAT contains information about:
- the zone, for the purposes of sharing across zones
- data and their metadata
- the virtual file system
- resource configuration
- user information

All iRODS servers in a Zone run the same core code and are peers. Each server may have its own set of policies, rules, and plugins. However, only the Catalog Provider holds the connection to, and communicates with, the iCAT database; Consumer servers must reach the database through the Catalog Provider. A Zone may have as many Consumer servers as needed. Using multiple Consumer servers can enhance the performance, security, and resilience of a Zone by providing redundancy, both within a single location and distributed geographically. A single computer cannot have both an iCAT server and a Resource server installed.
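The topology rules above can be checked mechanically. The following Python sketch is a hypothetical model (the `Server`/`Zone` classes, host names and role strings are invented for illustration); it validates a zone layout and shows that a Consumer's catalog request always passes through the Catalog Provider.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    hostname: str
    role: str  # "provider" (holds the iCAT connection) or "consumer"


@dataclass
class Zone:
    name: str
    servers: List[Server] = field(default_factory=list)

    def validate(self) -> None:
        """Check the topology rules listed on the slide."""
        unknown = {s.role for s in self.servers} - {"provider", "consumer"}
        if unknown:
            raise ValueError(f"unknown role(s): {sorted(unknown)}")
        providers = [s for s in self.servers if s.role == "provider"]
        if len(providers) != 1:
            raise ValueError(
                f"zone {self.name!r} needs exactly one catalog provider, found {len(providers)}"
            )
        hosts = [s.hostname for s in self.servers]
        if len(hosts) != len(set(hosts)):
            # Covers the slide's rule: one machine cannot be both iCAT and Resource server.
            raise ValueError("each machine may host only one iRODS server role")

    def database_route(self, hostname: str) -> List[str]:
        """Hops a catalog request takes from `hostname` to the iCAT database."""
        self.validate()
        if hostname not in {s.hostname for s in self.servers}:
            raise KeyError(hostname)
        provider = next(s for s in self.servers if s.role == "provider")
        if hostname == provider.hostname:
            return [hostname, "iCAT database"]
        # Consumers never talk to the database directly.
        return [hostname, provider.hostname, "iCAT database"]
```

For example, a zone with `icat.example.org` as provider and `res1.example.org` as consumer routes a request from `res1.example.org` via `icat.example.org`.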

11 iRODS Data organization
- Logical namespace (virtual layer managed by the iCAT)
- Physical namespace (physical resources)

12 iRODS Data organization
- Data Objects: files, which have a physical path as well as a virtual path
- Data Collections: directories, which exist only in the iCAT database and have no associated physical path
[Figure: Collection0 contains data objects A to B and Collection1; Collection1 contains data objects X to Z]
Example logical paths:
…/Collection0/
…/Collection0/DataObject A-B
…/Collection0/Collection1/
…/Collection0/Collection1/DataObject X-Z
- The logical namespace (iCAT) is independent of the physical resources
- Similar to Unix directories and files
- ichmod sets user/group ACLs
- Default resource: msiSetDefaultResc(Resource) in core.re, or irodsDefResource=Resource in .irodsEnv

In iRODS, files are stored as Data Objects on disk and have an associated physical path as well as a virtual path within the iRODS file system. iRODS collections, however, exist only in the iCAT database and have no associated physical path, which allows them to exist virtually across all resources.
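ichmod takes an access level (in its classic form null, read, write or own), a user or group name, and a path. The minimal Python sketch below shows how user and group entries could combine into one effective permission; the `effective_level`/`can` functions and the dictionary layout are hypothetical, and collection-level inheritance is not modelled.

```python
from typing import Dict, Set

# Access levels ordered from weakest to strongest, using ichmod's level names.
LEVELS = ["null", "read", "write", "own"]


def effective_level(acl: Dict[str, str], user: str, groups: Dict[str, Set[str]]) -> str:
    """Strongest level a user holds on one object, directly or through a group.

    acl:    principal (user or group name) -> level, e.g. {"alice": "own"}
    groups: group name -> set of member user names
    """
    best = 0
    for principal, level in acl.items():
        if principal == user or user in groups.get(principal, set()):
            best = max(best, LEVELS.index(level))
    return LEVELS[best]


def can(acl: Dict[str, str], user: str, groups: Dict[str, Set[str]], needed: str) -> bool:
    """True if the user's effective level is at least `needed`."""
    return LEVELS.index(effective_level(acl, user, groups)) >= LEVELS.index(needed)
```

With `acl = {"alice": "own", "project": "read"}` and `project` containing `bob`, Bob may read but not write, while Alice owns the object.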

13 iRODS Metadata
The iCAT server stores metadata in the relational database in the form of “triples”: Attribute-Value-Unit (AVU) triples. Metadata may be attached to:
- files (data objects)
- users
- groups
- collections (subdirectories)
- resources (data containers)
iRODS metadata can be searched and queried by users.
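A plain-Python sketch of AVU triples and a simple attribute query, loosely in the spirit of `imeta qu`. The `AVU` class, the in-memory store keyed by entity kind and name, and the numeric handling of `<` and `>` are assumptions of this sketch, not iRODS behaviour.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AVU:
    attribute: str
    value: str
    unit: str = ""  # the unit part of a triple may be empty


# Hypothetical store: (entity kind, entity name) -> AVUs attached to that entity.
Store = Dict[Tuple[str, str], List[AVU]]

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def add_avu(store: Store, kind: str, name: str, avu: AVU) -> None:
    store.setdefault((kind, name), []).append(avu)


def query(store: Store, kind: str, attribute: str, op: str, value: str) -> List[str]:
    """Sorted names of entities of one kind carrying a matching AVU."""
    compare = OPS[op]
    hits = []
    for (entity_kind, name), avus in store.items():
        if entity_kind != kind:
            continue
        for avu in avus:
            if avu.attribute != attribute:
                continue
            # "=" compares strings; "<" and ">" compare numerically in this sketch.
            lhs, rhs = (avu.value, value) if op == "=" else (float(avu.value), float(value))
            if compare(lhs, rhs):
                hits.append(name)
                break
    return sorted(hits)
```

Because the unit is a separate field, a query on `distance` compares only the value; keeping units alongside values is what distinguishes an AVU from a plain key-value pair.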

14 iRODS Resources (Storage)
- A resource is a software or hardware system that stores data
- A resource is a logical mapping of a resource name to a number of physical attributes of a resource
- 3 resource classes: cache, archival, compound
[Figure labels: “Virtual” Resource; High Latency (Tape); Low Latency (Disk)]
iRODS resources are pieces of a file system, external servers, or software in which data can be stored.

15 iRODS Data Request Workflow

16 Composable Resources
Composable resource types:
- Replication: synchronise resources
- Round Robin: rotate through the children for uploads
- Load balance
- Compound: a cache resource plus an archive resource
[Figure label: Low Latency Composable Resource]
Storage resources can be grouped and managed by composable resources. Several types of composable resources already implement a kind of data policy.
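How composable resources combine into a tree can be sketched in Python. The classes below are hypothetical stand-ins for the replication and round-robin types (load balance and compound are not modelled); in a real zone an administrator builds such trees with `iadmin mkresc` and `iadmin addchildtoresc` rather than in code.

```python
from itertools import cycle
from typing import Dict, List


class Storage:
    """Leaf resource standing in for one disk, tape or S3 backend."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> List[str]:
        self.files[path] = data
        return [self.name]


class Replication:
    """Composable resource: every child receives a copy."""

    def __init__(self, *children) -> None:
        self.children = children

    def put(self, path: str, data: bytes) -> List[str]:
        placed: List[str] = []
        for child in self.children:
            placed += child.put(path, data)
        return placed


class RoundRobin:
    """Composable resource: each new upload goes to the next child in turn."""

    def __init__(self, *children) -> None:
        self._next_child = cycle(children)

    def put(self, path: str, data: bytes) -> List[str]:
        return next(self._next_child).put(path, data)


# Hypothetical tree: two disks in rotation, with every file also copied to tape.
disk1, disk2, tape = Storage("disk1"), Storage("disk2"), Storage("tape")
root = Replication(RoundRobin(disk1, disk2), tape)
```

Each `put` on `root` returns the leaves that received the file: uploads alternate between `disk1` and `disk2`, and every one also lands on `tape`. The user only ever addresses `root`, which is the point of composing resources.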

17 The iRODS solution: the user's view
[Figure, two layers]
- Abstraction layer: mapping from the logical to the physical namespace in the iCAT (iRODS metadata catalogue), e.g. /irodsZone/home/user/Collection0/testfile.txt → /irodsVault/home/rods/testfile.txt
  + Extra information: attribute: distance, value: 12, units: miles; attribute: author, value: Alice, units: (none)
- Storage layer: different storage media (disk, S3, tape), with different protocols to steer data

In more detail: the usual situation for a user is that there are several storage media. The user needs to know the different protocols used to communicate with the storage devices. Furthermore, the user needs to be familiar with the data policies and move their data accordingly. For example, if a file is too big for some storage medium, the user has to find that out and move the data to a different resource. The user also has to keep track of which data is stored where.
iRODS abstracts away the different storage media: users do not have to see or worry about which media exist, which protocols they need, or which data policies the system requires. iRODS provides plugins for different storage systems, e.g. S3, tape, disk, and even certain software such as THREDDS/OPeNDAP. iRODS takes care of all this and exposes a single filesystem-like view of all data. The system manages data across storage resources, so data can be moved by the system, or users can decide (within certain limits) where to put their data.
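From a client's point of view, this abstraction means a program names only the logical path. The sketch below assumes the python-irodsclient package, whose session exposes `data_objects.put(local_path, irods_path)`, `data_objects.get(irods_path)` and `metadata.add(name, value, units)` on data objects; the `upload_with_metadata` helper itself is hypothetical. The session is passed in, so the function can be exercised with a stand-in object when no server is available.

```python
from typing import Iterable, Tuple

# (attribute, value, units) triples; units may be an empty string.
AVUTriple = Tuple[str, str, str]


def upload_with_metadata(session, local_path: str, logical_path: str,
                         avus: Iterable[AVUTriple]) -> None:
    """Upload a file and attach AVUs via a python-irodsclient-style session.

    `session` is expected to behave like irods.session.iRODSSession.
    """
    # Only the logical path is given; the server picks the physical location.
    session.data_objects.put(local_path, logical_path)
    obj = session.data_objects.get(logical_path)
    for attribute, value, units in avus:
        # Pass None rather than "" when a triple has no units.
        obj.metadata.add(attribute, value, units or None)


# Example against a real server (host, credentials and zone are placeholders):
# from irods.session import iRODSSession
# with iRODSSession(host="localhost", port=1247, user="rods",
#                   password="rods", zone="tempZone") as session:
#     upload_with_metadata(session, "testfile.txt",
#                          "/tempZone/home/rods/testfile.txt",
#                          [("distance", "12", "miles"), ("author", "Alice", "")])
```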

18 iRODS is …
- Not meant for ad hoc data management
- Overkill when you have decided on one and only one storage system
- Not a storage monitoring system

19 Use iRODS when …
- You combine several storage systems that you want your users to access and steer in a uniform way: this spares users from having to use many different protocols
- You want to scale out easily: simply plug in a new storage system
- You want to automate data management: execute data workflows regularly or upon an action, and enforce data policies
- You manage data across different administrative domains with different storage solutions

20 Websites
- Website: https://www.irods.org/
- Documentation
- Downloads
- iRODS Forum
- Bug Tracking

21 Hands-on
Setup: iCommands on Lisa (user interface machine, login: di4rX), connecting to the iRODS server “eveZone” (iCAT and rule engine)
Exercises with the iCommands on Lisa:
- Upload, download
- ACLs, metadata
- Optional: resources
Link to the training material:

22 Thank You! Questions?
Thanks to Christine Staiger (SURFsara) and Jeroen Engelberts (SURFsara)

