An Overview of iRODS Integrated Rule-Oriented Data System

Presentation transcript:

An Overview of iRODS Integrated Rule-Oriented Data System Michael Wan mwan@sdsc.edu http://irods.sdsc.edu/

Motivation for iRODS
Based on experience with SRB, a distributed data management system:
- Global logical name space: UNIX-like directories and files
- Single global user name space: single sign-on
- Federated middleware system
- Client/server model: federation of resource servers with uniform interfaces
- Robust access control
- MCAT: metadata catalog
SRB is used by many projects with many different requirements and configurations.

BIRN: Biomedical Information Research Network

NOAO Data Flow

Archive Service Architecture: UK eScience archive process
[Diagram labels: Remote Institute Site; Local machines; Local Storage Filer; Local SRB Server; Local Vault; WAN; JANET WAN; Firewall; Central "Cache" Site; RAL; Ingestion Site; Central SRB Server; ads0sb01.cc.rl.ac.uk; Central "cache" Vault; SRB-ADS Server; ADS SRB Disk Cache Resource; ADS Tape Resource; Tape Traffic; Data Path steps 1 to 4 (Sphymove in to container, Sreplcont, Ssyncont).]
1. Archive Submission Interface
- Data ingestion of the collection hierarchy into SRB
- Uses the Java Jargon API interface (equivalent of Sput -b)
- Ingested to /bbsrc/institute/scratch/project/year/user/dateandtime
- At the end of ingestion, data is logically moved using Smv to /bbsrc/institute/local-archive/project/year/user/dateandtime
2. Scheduled transfer to the Central SRB Server (driven from the Central SRB Server)
- Smkcont command used to create a container on the Central SRB Server
- Data moved from the Site SRB to the container on the Central SRB Server using Sphymove
- Upon completion of the data transfer, archived data is logically moved with Smv to /bbsrc/institute/remote-archive/project/year/user/dateandtime
3. Scheduled transfer to the ADS resource
- Implemented via a cron job using the Sreplcont command, which is driven by the Central SRB Server
- Entire container replicated using the Sreplcont command
- Logical structure preserved as /bbsrc/institute/remote-archive/project/year/user/dateandtime
4. Synchronization of the container to the tape resource and removal of the original container from the Central SRB Server
- Ssyncont -d -a command used, allowing for a family of containers

SRB BaBar architecture: 2 zones (SLAC + Lyon)
[Diagram labels: CC-IN2P3 (Lyon); SLAC (Stanford, CA); SRB servers; SRB MCAT (one per site); HPSS/Lyon; HPSS/SLAC; steps (1), (2), (3).]

Motivation for iRODS (cont)
Implementing all these features becomes unmanageable, for example:
  Sput [-fprabvsmMkKV] [-c container] [-D dataType] [-n replNum] [-N numThreads] [-S resourceName] [-P pathName] [-R retry_count] localFileName|localDirectory ... TargetName
- Need a more flexible way to configure the system
- More power and flexibility for users
- User-defined workflows to be executed on the server
- SRB code requires major refactoring
- Rewrite, and make it open source

iRODS features
Most SRB data grid features:
- Global logical name space, global user name space
- Fine-grained access control
- Federated resources: heterogeneous
- Zone federation: across organizations and administrative domains
- Data replication and synchronization
- Parallel I/O
- User-defined metadata

iRODS Features (cont)
Improvements:
- Total rewrite from scratch
- New, more flexible and efficient protocol
  - Client/server and server/server
  - 2 modes:
    - Native (binary): more efficient
    - XML: easier for developers working in other languages such as PHP and Java
- Reduced number of message exchanges
  - Put/get/replicate/copy of small files
  - SRB: 3 messages (create, write, close)
  - iRODS: one message (data included in the request)
- Reduced number of tables in the metadata catalog
  - Reduces the number of joins
  - SRB: over 100 tables
  - iRODS: a few large tables
- Small file upload/download: a factor of 3-4 improvement
- Restart capability: restart file

iRODS Rules and Workflow System: Target applications
Target applications include:
- Data grids for sharing data
- Distributed workflow
- Persistent archival, data preservation
- Real-time sensor data collections
- Large-scale data analysis

iRODS rule and workflow system
Two basic levels.
System level (used by the system administrator): automatic execution of data management policies
- Data integrity
  - Validation of checksums
  - Replication and synchronization of replicas
- Data distribution and archival
  - Automatic caching (staging)
  - Replication of data to remote sites
  - Migration to archival
- Resource purging
  - Replicas (cached copies)

iRODS rule and workflow system (cont)
System level: other data management policies
- Data ingestion: pre-processing, post-processing
- Resource selection for upload
- Copy selection for download
- Data retention and deletion policy
- Access controls: foreign zone user, public user
- Generation of archival packages: metadata, data bundle (zip, tar)
System level: other administrative management policies
- Data transport tuning: parallel I/O, number of streams
- Audit trails

iRODS rule and workflow system (cont)
User level: workflow system
- Execution of user-designed workflows
- A client can request the server to perform a series of micro-services with a single call
- Micro-services are predefined functions that can be called by the workflow scripting language
- Most iRODS APIs have been converted to micro-services
- The micro-service library depends on contributions from the user community

iRODS - integrated Rule-Oriented Data System
[Architecture diagram labels: Client Interface; Admin Interface; Rule Invoker; Service Manager; Engine; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Module (three instances); Micro Service Modules (Resource-based Services, Metadata-based Services); Resources; Rule Base; Current State; Confs; Metadata Persistent Repository.]

Rules and Workflow implementation
Two interfaces to the rule engine:
- Logic programming interface
  - Cryptic
  - Used mostly for system level rules
- Scripting language interface
  - Programming-language-like
  - Supports conditions (if/else) and loops (while)
  - Internally translated to logic programming rules

Rule - Logic programming interface
A rule is composed of four parts:
  name | condition | micro-service set | recovery set
Postprocessing rule example (file replication):
  acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop
Preprocessing rule example (file staging):
  acPreprocForDataObjOpen | $objPath like /tempZone/home/rods/birn/* | msiStageDataObj(demoResc8) | nop##nop
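The four-part layout of a logic programming rule can be split mechanically. The following Python sketch is an illustration only, not the iRODS rule engine's parser: the `Rule` class and `parse_rule` function are hypothetical names, and it assumes that `|` separates the four parts and that `##` (as seen in `nop##nop`) separates the entries of a micro-service or recovery set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    condition: str
    micro_services: List[str]
    recovery: List[str]


def parse_rule(line: str) -> Rule:
    """Split 'name | condition | micro-service set | recovery set'."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {line!r}")
    name, condition, actions, recovery = parts
    # Assumption: '##' separates the entries within a set.
    return Rule(
        name=name,
        condition=condition,
        micro_services=[a.strip() for a in actions.split("##")],
        recovery=[r.strip() for r in recovery.split("##")],
    )


rule = parse_rule(
    "acPostProcForPut |$objPath like /tempZone/home/rods/nvo/* "
    "| msiSysReplDataObj(nvoReplResc,null) | nop"
)
print(rule.name)
print(rule.micro_services)
```

Keeping the condition as an uninterpreted string is deliberate: the slide does not define the condition grammar, so the sketch does not try to evaluate it.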

Rule – Scripting Language interface
- Easier to use
- Mostly for user level workflow
- Work in progress
Example 1:
  replFileSet(*condition,*resourceName)
  {
    /* query iCAT for the dataNames that meet the condition */
    acGetIcatResults("replicate", *condition, *result);
    foreach (*result)   /* for each tuple in the result */
    {
      /* get the dataName from the result tuple */
      acGetDataName(*result,*dataName);
      /* ::: denotes a recovery operation; here, an error message is written */
      msiDataObjRepl(*dataName, *resourceName, *stat1)
        ::: writeLine(stdout,"Replication failed for *dataName with *stat1");
      writeLine(stdout,"Replicated *dataName to resource *resourceName with status *stat1");
    }
    writeLine(stdout,"Replication Finished Successfully for *condition");
  }
Example values for *condition:
  *condition = COLL_NAME = '/tempZone/home/rods'
or
  *condition = DATA_TYPE = 'DICOM'
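The loop-with-recovery pattern in Example 1 can be mimicked in plain Python. This is a sketch of the control flow only: `replicate`, `replicate_set`, the in-memory `catalog`, and the resource name `demoResc` are hypothetical stand-ins, and nothing here calls iRODS. It assumes that the `:::` recovery operation runs only when the preceding micro-service fails and that the loop then continues with the next item; the slide does not say whether a failure aborts the whole rule.

```python
from typing import Callable, Dict, List


def replicate(name: str, resource: str, catalog: Dict[str, List[str]]) -> None:
    """Hypothetical stand-in for msiDataObjRepl: record a new replica."""
    if name not in catalog:
        raise KeyError(name)
    catalog[name].append(resource)


def replicate_set(
    names: List[str],
    resource: str,
    catalog: Dict[str, List[str]],
    log: Callable[[str], None] = print,
) -> int:
    """Replicate each name; on failure run the recovery step and move on."""
    done = 0
    for name in names:
        try:
            replicate(name, resource, catalog)
        except KeyError as err:
            # Plays the role of the ':::' recovery operation in the rule.
            log(f"Replication failed for {name} with {err!r}")
            continue
        log(f"Replicated {name} to resource {resource}")
        done += 1
    return done


catalog = {"a.dcm": ["demoResc"], "b.dcm": ["demoResc"]}
messages: List[str] = []
count = replicate_set(
    ["a.dcm", "missing.dcm", "b.dcm"], "nvoReplResc", catalog, messages.append
)
```

Passing the logger as a parameter keeps the sketch testable; the rule itself writes to stdout with writeLine.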

Rule – Scripting Language interface
Example 2:
  apiTestWorkflow (*InFile, *OutFile1, *OutFile2)
  {
    msiDataObjOpen(*InFile,*S_FD);
    msiDataObjCreate(*OutFile1,"null",*D_FD);
    msiDataObjLseek(*S_FD,10,SEEK_SET,*Stat1);
    msiDataObjRead(*S_FD,10000,*R_BUF);
    msiDataObjWrite(*D_FD,*R_BUF,*W_LEN);
    msiDataObjClose(*S_FD,*Stat2);
    msiDataObjClose(*D_FD,*Stat3);
    msiDataObjCopy(*OutFile1,*OutFile2,null,*Stat4);
    delay ("<PLUSET>1m</PLUSET>")
    {
      msiDataObjRepl(*OutFile2,demoResc8,*Stat5);
    }
    msiDataObjUnlink(*OutFile1,*Stat6);
    writeParams(stdout,"*R_BUF,*W_LEN");
  }
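For readers unfamiliar with the micro-service names, the data movement in Example 2 corresponds roughly to the following local-file operations. This Python sketch is an analogue under stated assumptions, not iRODS code: `api_test_workflow` and the temporary file names are hypothetical, ordinary files stand in for data objects, and the delayed replication step is omitted because it has no local equivalent.

```python
import os
import shutil
import tempfile


def api_test_workflow(in_file: str, out_file1: str, out_file2: str) -> int:
    """Local-file analogue of apiTestWorkflow; returns the bytes written."""
    with open(in_file, "rb") as src, open(out_file1, "wb") as dst:
        src.seek(10, os.SEEK_SET)   # like msiDataObjLseek(*S_FD,10,SEEK_SET,...)
        buf = src.read(10000)       # like msiDataObjRead(*S_FD,10000,*R_BUF)
        written = dst.write(buf)    # like msiDataObjWrite(*D_FD,*R_BUF,*W_LEN)
    # Leaving the 'with' block closes both files (the two msiDataObjClose calls).
    shutil.copyfile(out_file1, out_file2)   # like msiDataObjCopy
    os.remove(out_file1)                    # like msiDataObjUnlink
    return written


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.dat")
out1 = os.path.join(workdir, "out1.dat")
out2 = os.path.join(workdir, "out2.dat")
data = bytes(range(256)) * 100   # 25,600 bytes of sample input
with open(src_path, "wb") as f:
    f.write(data)

written = api_test_workflow(src_path, out1, out2)
```

The difference in the rule version is that all of these steps run on the server in response to a single client request.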

Rules and micro-services implemented
Over 20 system level rules:
- Administrative
- Storage resource selection
- Data pre-processing
- Data post-processing
- Data deletion
- Parallel I/O
Over 20 user level micro-services:
- Operations on data: checksum, replicate, open, read, write
- Metadata extraction

Rule and Workflow system: "rule exec" daemon
- Executes rules, workflows and micro-services in the background
- The delay function causes the rule execution environment to be checkpointed and saved
- Jobs are submitted by making an entry in the job table in the database
- The "rule exec" daemon checks the job table for jobs to execute
- Time of execution:
  - Delayed by a certain time
  - At a certain time
  - At a given frequency
- iqstat command: check status
- iqdel command: delete a job from the queue
- Job scheduling and remote execution: future work
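The job-table mechanism described on this slide can be sketched as a small in-memory model. The names `Job`, `JobTable`, `submit`, and `run_due` are hypothetical, a Python list stands in for the database table, and times are plain numbers supplied by the caller; the real daemon's schema and scheduling details are not given in the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    run_at: float                          # earliest execution time, in seconds
    action: Callable[[], None]
    repeat_every: Optional[float] = None   # None means run once


class JobTable:
    """In-memory stand-in for the job table kept in the database."""

    def __init__(self) -> None:
        self.jobs: List[Job] = []

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def status(self) -> List[str]:         # rough analogue of iqstat
        return [job.name for job in self.jobs]

    def delete(self, name: str) -> None:   # rough analogue of iqdel
        self.jobs = [job for job in self.jobs if job.name != name]

    def run_due(self, now: float) -> List[str]:
        """One polling pass: run every job whose time has come."""
        ran = []
        for job in list(self.jobs):
            if job.run_at <= now:
                job.action()
                ran.append(job.name)
                if job.repeat_every is None:
                    self.jobs.remove(job)                  # one-shot job is done
                else:
                    job.run_at = now + job.repeat_every    # reschedule
        return ran


log: List[str] = []
table = JobTable()
table.submit(Job("replicate", run_at=60.0, action=lambda: log.append("repl")))
table.submit(Job("checksum", run_at=0.0, action=lambda: log.append("chk"),
                 repeat_every=30.0))

first = table.run_due(now=0.0)     # only the periodic job is due
second = table.run_due(now=60.0)   # both jobs are due
```

Passing `now` explicitly instead of reading the clock keeps each polling pass deterministic; a daemon would call `run_due` in a loop with the current time.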

iRODS Status
- Version 0.5 released Dec 21, 2006
- Version 0.9 released May 30, 2007
  - Contains sufficient features to be deployed as a data grid
  - 90,000 lines of C code
  - Server, client C library and iCommands
- Version 1.0 scheduled for fall 2007
  - Web interface: PHP/JavaScript
  - Java classes: Jargon
  - Oracle iCAT
  - Zone federation
- Open source: BSD license

More Information Michael Wan mwan@sdsc.edu http://irods.sdsc.edu