An Overview of iRODS Integrated Rule-Oriented Data System


1 An Overview of iRODS Integrated Rule-Oriented Data System
Michael Wan

2 Motivation for iRODS
Based on experience with SRB, a distributed data management system:
  Global logical name space: UNIX-like directories and files
  Single global user name space: single sign-on
  Federated middleware system
  Client/server model: federation of resource servers with uniform interfaces
  Robust access control
  MCAT: metadata catalog
SRB is used by many projects with many different requirements and configurations

3 BIRN: Biomedical Information Research Network

4 NOAO Data Flow

5 Archive Service Architecture
[Diagram: UK eScience archive process data path. Labels: Remote Institute Site (Ingestion Site) with Local Storage Filer, local machines, Local SRB Server and Local Vault disk; firewalls and the JANET WAN; Central 'Cache' Site at RAL with Central SRB Server and central "cache" Vault disk; SRB-ADS Server, ADS SRB Disk Cache Resource and ADS Tape Resource; host ads0sb01.cc.rl.ac.uk; tape traffic. Labeled operations: Sphymove into container, Sreplcont, Ssyncont. The numbers 1 to 4 in the diagram mark the steps below.]
(1) Archive Submission Interface
  Data ingestion of the collection hierarchy into SRB
  Uses the Java Jargon API interface (equivalent of Sput -b)
  Ingested to /bbsrc/institute/scratch/project/year/user/dateandtime
  At the end of ingestion, data is logically moved using Smv to /bbsrc/institute/local-archive/project/year/user/dateandtime
(2) Scheduled transfer to the Central SRB Server (driven from the Central SRB Server)
  Smkcont command used to create a container on the Central SRB Server
  Data moved from the site SRB to the container on the Central SRB Server using Sphymove
  Upon completion of the data transfer, archived data is logically moved with Smv to /bbsrc/institute/remote-archive/project/year/user/dateandtime
(3) Scheduled transfer to the ADS resource
  Implemented via a cron job using the Sreplcont command, driven by the Central SRB Server
  Entire container replicated using the Sreplcont command
  Logical structure preserved as /bbsrc/institute/remote-archive/project/year/user/dateandtime
(4) Synchronization of the container to the tape resource and removal of the original container from the Central SRB Server
  Ssyncont -d -a command used, allowing for a family of containers
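The logical-path convention used across steps 1 to 3 can be illustrated with a small Python sketch. Only the path layout comes from the slide; the function name `archive_paths`, the example values, and the timestamp format are hypothetical, since the slide only says "dateandtime".

```python
from datetime import datetime

# Stage names taken from the logical paths on the slide.
STAGES = ("scratch", "local-archive", "remote-archive")

def archive_paths(institute, project, user, when):
    """Return the logical path of one ingested collection at each stage.

    Layout from the slide:
      /bbsrc/<institute>/<stage>/<project>/<year>/<user>/<dateandtime>
    The timestamp format is an assumption made for this sketch.
    """
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return {
        stage: f"/bbsrc/{institute}/{stage}/{project}/{when.year}/{user}/{stamp}"
        for stage in STAGES
    }

# Hypothetical institute, project and user names.
paths = archive_paths("iah", "genomics", "alice", datetime(2007, 5, 30, 14, 0, 0))
for stage in STAGES:
    print(stage, "->", paths[stage])
```

Because only the stage component changes, each Smv in the workflow is a rename within the logical name space, independent of where the bytes physically sit.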

6 SRB BaBar architecture.
2 Zones (SLAC + Lyon)
[Diagram: SRB servers and an SRB MCAT at each site, CC-IN2P3 (Lyon) and SLAC (Stanford, CA), with HPSS/Lyon and HPSS/SLAC storage; arrows numbered (1) to (3).]

7 Motivation for iRODS (cont)
Implementing all these features becomes unmanageable, for example:
  Sput [-fprabvsmMkKV] [-c container] [-D dataType] [-n replNum] [-N numThreads] [-S resourceName] [-P pathName] [-R retry_count] localFileName|localDirectory ... TargetName
Need a more flexible way to configure the system
More power and flexibility for users
User-defined workflow to be executed on the server
SRB code requires major refactoring
Rewrite, and make it open source

8 iRODS features
Most SRB data grid features:
  Global logical name space, global user name space
  Fine-grained access control
  Federated resources: heterogeneous
  Zone federation: across organizations and administrative domains
  Data replication and synchronization
  Parallel I/O
  User-defined metadata

9 iRODS features (cont)
Improvements:
Total rewrite from scratch
New, more flexible and efficient protocol
  Client/server and server/server
  2 modes:
    Native (binary): more efficient
    XML: easier for developers in other languages, e.g. PHP and Java
Reduce the number of message exchanges for put/get/replicate/copy of small files
  SRB: 3 messages (create, write, close)
  iRODS: one message (data included in the request)
Reduce the number of tables in the metadata catalog
  Reduce the number of joins
  SRB: over 100 tables
  iRODS: a few large tables
Small-file upload/download: a factor of 3-4 improvement
Restart capability: restart file

10 iRODS Rules and Workflow System – Target applications
Target applications include:
  Data grids for sharing data
  Distributed workflow
  Persistent archival, data preservation
  Real-time sensor data collections
  Large-scale data analysis

11 iRODS rule and workflow system
Two basic levels
System level: used by the system administrator
  Automatic execution of data management policies
    Data integrity
      Validation of checksums
      Replication and synchronization of replicas
    Data distribution and archival
      Automatic caching (staging)
      Replication of data to remote sites
      Migration to archival
    Resource purging
      Replicas (cached copies)
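As an illustration of the "validation of checksums" policy, the following Python sketch compares each replica against a checksum registered at ingestion time. It is not iRODS code: the function names, the `archiveResc` resource name, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Hex digest of one replica's content (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()

def validate_replicas(registered: str, replicas: dict) -> list:
    """Return the resources whose replica no longer matches the registered checksum."""
    return sorted(
        name for name, data in replicas.items() if checksum(data) != registered
    )

original = b"sensor readings 2007-05-30"
registered = checksum(original)  # stored in the catalog at ingestion time

# Replica contents keyed by resource name.
replicas = {
    "demoResc": original,
    "nvoReplResc": original,
    "archiveResc": b"sensor readings 2007-05-3?",  # simulated corruption
}

bad = validate_replicas(registered, replicas)
print("replicas needing repair:", bad)
```

A policy of this kind would typically pair the check with the replication policy listed on the same slide, re-copying a good replica over each one that fails validation.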

12 iRODS rule and workflow system (cont)
System level
  Other data management policies
    Data ingestion: pre-processing, post-processing
    Resource selection for upload
    Copy selection for download
    Data retention and deletion policy
    Access controls: foreign zone user, public user
    Generation of archival packages: metadata, data bundle (zip, tar)
  Other administrative management policies
    Data transport tuning: parallel I/O, number of streams
    Audit trails

13 iRODS rule and workflow system (cont)
User level: workflow system
  Execution of user-designed workflows
  Request the server to perform a series of micro-services with a single call
  Micro-services are predefined functions that can be called by the workflow scripting language
  Most iRODS APIs have been converted to micro-services
  Depends on the user community for contributions to the micro-service library

14 iRODS - integrated Rule-Oriented Data System
[Diagram: iRODS architecture. Component labels: Client Interface, Admin Interface, Rule Invoker, Service Manager, Resources, Resource-based Services, Metadata-based Services, Micro Service Modules, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, three Consistency Check Modules (one labeled Rule), Engine, Current State, Confs, Rule Base, Metadata Persistent Repository.]

15 Rules and WorkFlow implementation
Two interfaces to the rule engine:
  Logic programming interface
    Cryptic
    Used mostly for system-level rules
  Scripting language interface
    Programming-language-like
    Supports conditions (if/else) and loops (while)
    Internally translated to logic programming rules

16 Rule - Logic programming interface
A rule is composed of four parts:
  name | condition | micro-service set | recovery set
Post-processing rule example (file replication):
  acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop
Pre-processing rule example (file staging):
  acPreprocForDataObjOpen | $objPath like /tempZone/home/rods/birn/* | msiStageDataObj(demoResc8) | nop##nop
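To make the four-part layout concrete, here is a minimal Python sketch that splits a rule line into its parts and evaluates a `like` condition against session variables. It is an illustration, not the iRODS rule engine: the `Rule` type and both functions are hypothetical, `##` is assumed to separate entries within a set (as `nop##nop` suggests), and `like` is assumed to behave as a shell-style wildcard match.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    condition: str
    actions: list
    recovery: list

def parse_rule(line: str) -> Rule:
    """Split 'name | condition | micro-service set | recovery set' into a Rule."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated parts, got {len(parts)}")
    name, condition, actions, recovery = parts
    # Assumption: '##' separates entries within a set.
    return Rule(name, condition, actions.split("##"), recovery.split("##"))

def condition_holds(condition: str, session: dict) -> bool:
    """Evaluate an empty condition or one of the form '$var like pattern'."""
    if not condition:
        return True
    var, op, pattern = condition.split(None, 2)
    if op != "like":
        raise ValueError(f"unsupported operator: {op}")
    # Assumption: 'like' is a shell-style wildcard match.
    return fnmatchcase(session[var.lstrip("$")], pattern)

rule = parse_rule(
    "acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* "
    "| msiSysReplDataObj(nvoReplResc,null) | nop"
)
print(rule.name, rule.actions, rule.recovery)
print(condition_holds(rule.condition, {"objPath": "/tempZone/home/rods/nvo/a.fits"}))
print(condition_holds(rule.condition, {"objPath": "/tempZone/home/rods/birn/b.img"}))
```

With this rule, a put into the nvo collection would trigger replication to nvoReplResc, while a put elsewhere would leave the rule unfired.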

17 Rule – Scripting Language interface
Easier to use
Mostly for user-level workflow
Work in progress
Example 1
  replFileSet(*condition,*resourceName)
  {
    /* query iCAT for the dataNames that meet the condition */
    acGetIcatResults("replicate", *condition, *result);
    /* for each tuple in the result */
    foreach (*result)
    {
      /* get the dataName from the result tuple */
      acGetDataName(*result,*dataName);
      /* ::: denotes the recovery operation; in this case, an error message is written */
      msiDataObjRepl(*dataName, *resourceName, *stat1) ::: writeLine(stdout,"Replication failed for *dataName with *stat1");
      writeLine(stdout,"Replicated *dataName to resource *resourceName with status *stat1");
    }
    writeLine(stdout,"Replication Finished Successfully for *condition");
  }
Example conditions:
  *condition = COLL_NAME = '/tempZone/home/rods'
  or
  *condition = DATA_TYPE = 'DICOM'
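The pairing of each micro-service with a recovery operation (the `:::` in Example 1) can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the rule engine's actual behavior: it assumes that when a step fails, its own recovery runs first, then the recoveries of the completed steps run in reverse order, and the chain reports failure. The names `run_chain` and `replicate` are hypothetical.

```python
def run_chain(steps, log):
    """Run (action, recovery) pairs in order.

    Assumption: on failure, the failing step's recovery runs first, then the
    recoveries of already completed steps run in reverse order, and the chain
    returns False.
    """
    done = []
    for action, recovery in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {exc}")
            recovery()
            for undo in reversed(done):
                undo()
            return False
        done.append(recovery)
    return True

log = []

def replicate(name, resource, fail=False):
    """Build a hypothetical replicate step and its recovery."""
    def action():
        if fail:
            raise RuntimeError(f"replication of {name} to {resource}")
        log.append(f"replicated {name} to {resource}")
    def recovery():
        log.append(f"recovery for {name}")
    return action, recovery

ok = run_chain(
    [replicate("a.dcm", "demoResc8"), replicate("b.dcm", "demoResc8", fail=True)],
    log,
)
print(ok)
for entry in log:
    print(entry)
```

Keeping the recovery next to the action it undoes, as the `:::` syntax does, means a workflow author states the cleanup at the point where the side effect is introduced.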

18 Rule – Scripting Language interface
Example 2
  apiTestWorkflow (*InFile, *OutFile1, *OutFile2)
  {
    /* open the input file and create the first output file */
    msiDataObjOpen(*InFile,*S_FD);
    msiDataObjCreate(*OutFile1,"null",*D_FD);
    /* seek into the input, read a buffer and write it to the output */
    msiDataObjLseek(*S_FD,10,SEEK_SET,*Stat1);
    msiDataObjRead(*S_FD,10000,*R_BUF);
    msiDataObjWrite(*D_FD,*R_BUF,*W_LEN);
    msiDataObjClose(*S_FD,*Stat2);
    msiDataObjClose(*D_FD,*Stat3);
    msiDataObjCopy(*OutFile1,*OutFile2,null,*Stat4);
    /* the replication is queued for delayed execution */
    delay ("<PLUSET>1m</PLUSET>") {
      msiDataObjRepl(*OutFile2,demoResc8,*Stat5);
    }
    msiDataObjUnlink(*OutFile1,*Stat6);
    writeParams(stdout,"*R_BUF,*W_LEN");
  }

19 Rules and micro-services implemented
Over 20 system-level rules
  Administrative
  Storage resource selection
  Data pre-processing
  Data post-processing
  Data deletion
  Parallel I/O
Over 20 user-level micro-services
  Operations on data: checksum, replicate, open, read, write
  Metadata extraction

20 Rule and Workflow system
"Rule exec" daemon
  Executes rules, workflows and micro-services in the background
  The delay function causes the rule execution environment to be checkpointed and saved
  Job submission is done by making an entry in the job table in the database
  The "rule exec" daemon checks the job table for jobs to execute
  Time of execution
    Delayed by a certain time
    At a certain time
    Frequency
  iqstat command: check status
  iqdel command: delete a job from the queue
Job scheduling and remote execution: future work
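The job-table mechanism described above can be sketched in Python with an in-memory SQLite database. This is a toy model, not the iRODS schema or daemon: the table layout, the `submit` and `poll` functions, the use of integer seconds for time, and the `checksumSweep` job are all hypothetical.

```python
import sqlite3

def submit(db, rule, run_at, every=None):
    """Queue a rule by inserting one row into the job table."""
    cur = db.execute(
        "INSERT INTO jobs (rule, run_at, every) VALUES (?, ?, ?)",
        (rule, run_at, every),
    )
    return cur.lastrowid

def poll(db, now, execute):
    """One pass of the daemon: run every job whose execution time has come."""
    due = db.execute(
        "SELECT id, rule, every FROM jobs WHERE run_at <= ? ORDER BY run_at, id",
        (now,),
    ).fetchall()
    for job_id, rule, every in due:
        execute(rule)
        if every is None:
            # One-shot job: remove it once it has run.
            db.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
        else:
            # Periodic job: schedule the next run.
            db.execute(
                "UPDATE jobs SET run_at = ? WHERE id = ?", (now + every, job_id)
            )
    return len(due)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, rule TEXT, run_at INTEGER, every INTEGER)"
)

ran = []
# Delayed by one minute, like the delay block in Example 2.
submit(db, "msiDataObjRepl(*OutFile2,demoResc8,*Stat5)", run_at=60)
# Hypothetical periodic job that repeats every hour.
submit(db, "checksumSweep", run_at=0, every=3600)

poll(db, now=0, execute=ran.append)   # only the periodic job is due
poll(db, now=60, execute=ran.append)  # the delayed job is now due
remaining = db.execute("SELECT rule, run_at FROM jobs").fetchall()
print(ran)
print(remaining)
```

In this model, a status command in the spirit of iqstat is a SELECT over the table and a removal command in the spirit of iqdel is a DELETE by job id. Storing the queue in the database means pending jobs would outlive a restart of the daemon.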

21 iRODS Status
Version 0.5 released Dec 21, 2006
Version 0.9 released May 30, 2007
  Contains sufficient features to be deployed as a data grid
  90,000 lines of C code: server, client C library and iCommands
Version 1.0 scheduled for fall 2007
  Web interface: PHP/JavaScript
  Java classes: Jargon
  Oracle iCAT
  Zone federation
Open source: BSD license

22 More Information Michael Wan

