Generic policy rules and principles Jean-Yves Nief.

1 Generic policy rules and principles Jean-Yves Nief

2 Talk overview
- An introduction to CC-IN2P3 activity.
- iRODS in production:
  - Why are we using it?
  - Who is using it?
  - Prospects.
- iRODS rule-based policies through examples:
  - Resource Monitoring System.
  - Biomedical applications: human data, animal data.
  - Arts and Humanities.
  - Other rules: mass storage system interface, access rights.
  - Pitfalls.
  - Future usages.
Repository workshop, Garching, 20/09/10

3 CC-IN2P3 activities
[Logo: dapnia]
- Federates the computing needs of the French scientific community in:
  - Nuclear and particle physics.
  - Astrophysics and astroparticles.
- Computing services for international collaborations: CERN (LHC), Fermilab, SLAC, …
- Now also open to biology and Arts & Humanities.

4 iRODS @ CC-IN2P3: why use it?
- National and international collaborations.
- Users are spread geographically (Europe, America, Australia, …).
=> Need for storage virtualization:
  - Federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases, …).
  - Transparent data access for end users.
  - Middleware working on heterogeneous operating systems.
  - Common logical name space.
  - Virtual organization (access rights, groups, etc.).
  - Metadata search.
  - Easy interfacing with any kind of client application (APIs, drivers).

5 iRODS @ CC-IN2P3: why use it?
- SRB in use since 2003:
  - 3 PB handled for 10 different experiments (HEP, astro, biology).
  - Decommissioning: end of 2012?
- Limitation:
  - No centralized data management (DM).
  => No enforcement of DM policy.
- iRODS rule-based policy:
  - Adequate solution.
  - From the user's point of view: virtualization of the data management policy.

6 iRODS @ CC-IN2P3: who is using it?
- Arts and Humanities (Adonis):
  - Long-term data preservation.
  - Web and batch job access.
- Biology (phylogenetics), fluid mechanics:
  - Grid jobs.
- Biomedical applications:
  - Human and animal imagery.
- High energy physics:
  - Neutrino experiment.

7 iRODS @ CC-IN2P3: who is going to use it?
- Astrophysics experiments: LSST, …
- Other biomedical and physics projects.
- iRODS will be part of the French NGI.
- All SRB instances are to be moved to iRODS.
=> 1 PB should be reached soon.

8 Rules examples: Arts and Humanities
Example: archival and data publication of audio files (CRDO).
[Diagram labels: CRDO, CINES (archive), CC-IN2P3 (Fedora)]
1. Data transfer from CRDO to CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-commons (delayed rule).
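Step 4 (automatic untar plus checksum) can be sketched outside iRODS in a few lines of Python. This is only an illustration of the idea, not the rule or micro-service used at CC-IN2P3: the function name `untar_and_checksum` is hypothetical, and SHA-256 is an assumption, since the slides do not say which checksum algorithm is applied.

```python
import hashlib
import tarfile
from pathlib import Path


def untar_and_checksum(archive: Path, dest: Path) -> dict[str, str]:
    """Extract a tar archive into dest and return {relative path: SHA-256 digest}."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive) as tar:
        # Refuse members that would be written outside the destination directory.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)

    # Checksum every extracted regular file (assumed algorithm: SHA-256).
    checksums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            checksums[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return checksums
```

In a real deployment the resulting digests would be compared with the values recorded at the sending site before the files are registered in Fedora-commons.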

9 Rules examples: biomedical data
- Human and animal data (fMRI, PET, MEG, etc.), usually in DICOM format.
- Main issue for human data: it needs to be anonymized!
- Need to do metadata searches on DICOM files.
=> Rule:
1. Check that the file is anonymized: send a warning if it is not.
2. Extract a subset of metadata (based on a list stored in iRODS) from the DICOM files.
3. Add these metadata as user-defined metadata in iRODS.
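The three steps of this rule can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the slides: the DICOM header is represented as a plain dictionary (a real implementation would use a DICOM parser), "anonymized" is taken to mean that a few identifying attributes are empty or absent, and the tag lists `IDENTIFYING_TAGS` and `WANTED_TAGS` are hypothetical stand-ins for the list stored in iRODS.

```python
import warnings

# Hypothetical lists: the real ones would be stored in iRODS.
IDENTIFYING_TAGS = ("PatientName", "PatientBirthDate", "PatientAddress")
WANTED_TAGS = ("Modality", "StudyDate", "Manufacturer")


def is_anonymized(header: dict) -> bool:
    """Assumed criterion: every identifying attribute is empty or missing."""
    return not any(header.get(tag) for tag in IDENTIFYING_TAGS)


def extract_user_metadata(header: dict, wanted=WANTED_TAGS) -> dict:
    """Keep only the wanted attributes that are present in the header."""
    return {tag: str(header[tag]) for tag in wanted if tag in header}


def process_dicom_header(name: str, header: dict) -> dict:
    """Warn if the file does not look anonymized, then return the metadata subset."""
    if not is_anonymized(header):
        warnings.warn(f"{name} does not look anonymized")
    return extract_user_metadata(header)
```

The returned dictionary corresponds to step 3: each key/value pair would be attached to the data object as user-defined metadata.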

10 Rules examples: resource monitoring system
[Diagram labels: iRODS iCAT server (DB), iRODS data server (perf script)]
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store metrics into the iCAT.
5. Compute a "quality factor" for each server, stored in another table: rule engine cron task (msi).

11 Other rules
- Mass Storage System integration:
  - Using compound resources: iRODS disk cache + tapes.
  - Data on the disk cache is replicated into the MSS asynchronously (1 h later) using a delayExec rule.
  - Recovery mechanism: retries until success; the delay between retries is doubled at each round.
- ACL management:
  - Rules are needed for fine-grained access rights management.
  - E.g.: 3 groups of users (admins, experts, users).
    ACLs on / /*/rawdata => admins: r/w, experts + users: r
    ACLs on all other subcollections => admins + experts: r/w, users: r
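Both policies on this slide are easy to state as code. The Python sketch below is illustrative only and does not use the iRODS rule language: `retry_until_success` shows the "retry until success, doubling the delay" recovery scheme, and `permission_for` encodes the three-group ACL example. The function names, the initial delay, and the choice to treat anything below a `rawdata` collection as raw data are assumptions.

```python
import time
from typing import Callable


def retry_until_success(action: Callable[[], bool],
                        first_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call action until it returns True, doubling the delay after each failure.

    Returns the number of attempts that were needed.
    """
    delay = first_delay
    attempts = 1
    while not action():
        sleep(delay)
        delay *= 2  # delay between retries doubles at each round
        attempts += 1
    return attempts


def permission_for(group: str, collection: str) -> str:
    """Return 'rw' or 'r' for a group on a collection, following the slide's example."""
    if group not in ("admins", "experts", "users"):
        raise ValueError(f"unknown group: {group}")
    # Assumption: any path with a 'rawdata' component counts as raw data.
    in_rawdata = "rawdata" in collection.strip("/").split("/")
    if group == "admins":
        return "rw"
    if group == "experts":
        return "r" if in_rawdata else "rw"
    return "r"
```

Passing `sleep` as a parameter keeps the retry loop testable without actually waiting; a production version would probably also cap the delay.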

12 Developments needed
- Scripts/binaries:
  - Metadata extraction from DICOM files.
  - Registration of files into Fedora-Commons.
  - …
  => Needed whatever storage system is used underneath.
- Micro-services:
  - ACLs, tar/untar of archive files, …
  => APIs already available; did not require a large amount of work (part of the iRODS distribution).
  - Resource Monitoring System: bigger development, includes modification of the iCAT schema.
- Rules:
  - Most of them are simple.
  - Some require more work (Adonis project), with a more complex workflow.

13 Pitfalls and bugs
- Writing complex rules:
  - Avoid writing them directly using the .irb syntax.
  - They become difficult to debug, especially with nested actions.
  => Solution: use ruleGen to generate rules in a more user-friendly manner.
- Some memory leaks found in irodsReServer with Oracle as a backend:
  => Fixed in 2.4.
- delayExec syntax bugs:
  => Fixed in 2.4 and 2.4.1.
- Rules are in a configuration file at the moment:
  - Must be consistent on all the iRODS servers.
  => Will be in the iCAT database in the future.

14 Prospects
- Rules for database interaction (in progress):
  - Will be used by DTM (developed at CC-IN2P3):
    - DTM manages a list of tasks to be processed by a batch cluster.
    - DTM requires a database to manage the tasks.
  - A rule launched by the client will interact with the DTM database through iRODS:
    - More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
    - Database schema upgrades are transparent for the client (no SQL code launched on the client side).
- Xmessaging system (part of iRODS):
  - Allows messages to be exchanged between different iRODS processes or clients.
  - E.g.: could be used to monitor job status in a distributed computing environment.

15 Acknowledgement
Thanks to:
- Pascal Calvat.
- Yonny Cardenas.
- Thomas Kachelhoffer.
- Pierre-Yves Jallud.
iRODS at CC-IN2P3, 03/25/10

