
iRODS usage at CC-IN2P3, Jean-Yves Nief


1 iRODS usage at CC-IN2P3 Jean-Yves Nief

2 Talk overview
What is CC-IN2P3?
Who is using iRODS?
iRODS administration:
– Hardware setup.
iRODS interaction with other services:
– Mass Storage System, backup system, Fedora Commons, etc.
– iRODS clients usage.
Architecture examples with collaborating sites.
Rules examples.
SRB to iRODS migration.
To-do list and prospects.
27/09/12, iRODS at CC-IN2P3

3 CC-IN2P3 activities
[Figure label: IRFU]
Federates the computing needs of the French scientific community in:
– Nuclear and particle physics.
– Astrophysics and astroparticles.
Computing services for international collaborations:
– CERN (LHC), Fermilab, SLAC, ...
Now open to biology and to Arts & Humanities.

4 iRODS setup @ CC-IN2P3
In production since early 2008.
14 servers:
– 2 iCAT servers (metacatalog): Linux SL4, Linux SL5.
– 12 data servers (520 TB): Sun Thor x454 with Solaris 10, DELL v510 with Linux SL5.
Metacatalog on a dedicated Oracle 11g cluster.
Monitoring and restart of the services fully automated (crontab + Nagios).
Automatic weekly reindexing of the iCAT databases.
Accounting: daily report on our web site.

5 iRODS setup @ CC-IN2P3
[Diagram: iCAT servers and data servers; DNS alias: ccirods.]
DNS alias:
– Load balanced.
– Improved redundancy.
– Avoids a single point of failure.
– Improved scalability.

6 iRODS monitoring: Nagios
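The slides do not show the actual Nagios checks. As a rough illustration of the kind of probe that crontab or Nagios could run, here is a minimal Python sketch that tests whether an iRODS server accepts TCP connections. The function name `check_irods_port` is hypothetical; port 1247 is the iRODS default, and the return codes follow the Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket

# Nagios plugin exit codes (plugin convention, not taken from the talk).
OK, CRITICAL = 0, 2


def check_irods_port(host, port=1247, timeout=5.0):
    """Return (status_code, message) for a plain TCP connection test.

    This only checks that something is listening; it does not
    authenticate or run any iRODS operation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would typically print the message and exit with the returned code, so that the scheduler or Nagios can decide whether a restart is needed.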

7 iRODS interaction with other services
Mass storage system: HPSS.
– Using compound resources.
– Interfaced using the universal MSS driver (RFIO protocol used).
– Staging requests ordered by tape using Treqs.
Backup system: TSM.
– Used for projects that cannot replicate precious data to other sites.
Fedora Commons:
– Storage backend based on iRODS using FUSE.
– Rules to register iRODS files into Fedora.
External databases:
– Rules using RDA.
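The idea behind ordering staging requests by tape can be sketched as follows. This is not Treqs code: the `StageRequest` fields (`tape`, `position`) and the function `order_by_tape` are hypothetical names chosen for illustration, assuming each request knows which tape holds the file and where on that tape it sits.

```python
from collections import defaultdict
from typing import NamedTuple


class StageRequest(NamedTuple):
    path: str      # file to recall from tape
    tape: str      # hypothetical tape label
    position: int  # hypothetical position of the file on that tape


def order_by_tape(requests):
    """Group requests per tape, then sort each group by position.

    Reading all files of one tape in a single pass avoids mounting
    the same tape several times.
    """
    by_tape = defaultdict(list)
    for request in requests:
        by_tape[request.tape].append(request)

    ordered = []
    for tape in sorted(by_tape):
        ordered.extend(sorted(by_tape[tape], key=lambda r: r.position))
    return ordered
```

With requests spread over two tapes, the result lists every file of the first tape in on-tape order before moving to the second tape.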

8 iRODS clients
Clients: from laptops to batch farms.
– Authentication: password or X509 certificates.
iCommands: most popular.
– From any platform: Windows, Mac OS X, Linux (RH, CentOS, Debian, ...), Solaris 10.
Java APIs: interaction with iRODS within workflows.
C APIs: direct access to files (open, read, write) for "random access".
Drivers for some viewers such as OsiriX (biomedical apps).
FUSE for legacy web sites and Fedora Commons.
Windows Explorer and iDrop.

9 Who is using iRODS?
High energy and nuclear physics:
– BaBar: data management of the entire data set between SLAC and CC-IN2P3; total foreseen: 2 PB.
– dChooz: neutrino experiment (France, USA, Japan, etc.): 500 TB.
Astroparticle and astrophysics:
– AMS: cosmic ray experiment on the International Space Station (500 TB).
– TREND, BAOradio: radioastronomy (170 TB).
Biology and biomedical applications: phylogenetics, neuroscience, cardiology (50 TB).
Arts and Humanities: Adonis (62 TB).

10 iRODS use cases
Data sharing and transfers for widespread communities.
Online data access from any kind of front-end app (web, home-grown clients, batch farm, ...), allowing data policies to be run on the underlying data.
Data archive.
Not intended for massive I/O operations from a batch farm (it is not a parallel file system)!

11 Who is using iRODS?

12 Architecture example: BaBar
Archival in Lyon of the entire BaBar data set (2 PB in total).
Automatic transfer from tape to tape: 3 TB/day (no limitation).
Automatic recovery of faulty transfers.
Ability for a SLAC admin to recover files directly from the CC-IN2P3 zone if data are lost at SLAC.

13 Architecture example: dChooz

14 Architecture example: embryogenesis and neuroscience

15 Rules examples (I)
Delayed replication to the MSS:
– Data in the disk cache are replicated into the MSS asynchronously (1 h later) using a delayExec rule.
– Recovery mechanism: retries until success; the delay between retries is doubled at each round.
ACL management:
– Rules needed for fine-grained access rights management.
– E.g. 3 groups of users (admins, experts, users):
ACLs on / /*/rawdata => admins: r/w, experts + users: r
ACLs on all other subcollections => admins + experts: r/w, users: r
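The two policies on this slide can be illustrated outside the iRODS rule language with a minimal Python sketch. It is not the production rule set. Assumptions: `replicate_with_retries`, `access_for` and `KNOWN_GROUPS` are hypothetical names; the first retry delay is taken equal to the initial 1 h delay (the slide only says the delay doubles); and "rawdata" is matched as any path component, since the exact collection pattern is not fully legible in the slide.

```python
import time


def replicate_with_retries(replicate, initial_delay=3600.0, sleep=time.sleep):
    """Wait, try to replicate, and retry until success.

    The wait is doubled after each failed attempt. Returns the number
    of attempts made. `replicate` is any callable returning True on success.
    """
    delay = initial_delay
    attempts = 0
    while True:
        sleep(delay)
        attempts += 1
        if replicate():
            return attempts
        delay *= 2  # delay doubles at each round


KNOWN_GROUPS = {"admins", "experts", "users"}


def access_for(group, collection_path):
    """Return "rw", "r" or None for a group on a collection path.

    Assumption: any collection with a "rawdata" path component is
    writable by admins only; everything else by admins and experts.
    """
    if group not in KNOWN_GROUPS:
        return None
    parts = [p for p in collection_path.split("/") if p]
    writers = {"admins"} if "rawdata" in parts else {"admins", "experts"}
    return "rw" if group in writers else "r"
```

Passing a fake `sleep` function makes the retry schedule easy to inspect without waiting, and `access_for` can be checked against the two ACL lines of the slide.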

16 Rules examples (II)
Fedora Commons:
– The content of tarballs stored in iRODS is automatically registered into Fedora Commons.
1. Automatic untar of the files + checksum on the iRODS side: msiTarFileExtract.
2. Automatic registration in Fedora Commons (delayed rule): msiExecCmd of a Java application.
Automatic metadata extraction from DICOM files (neuroscience, ...):
– A predefined list of metadata is extracted from the files using DCMTK (thanks to Yonny), then user metadata are created for each file.
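Step 1 above (untar plus checksum) is done inside iRODS by msiTarFileExtract. As a local stand-in that only illustrates the logic, here is a minimal Python sketch using the standard `tarfile` and `hashlib` modules. The function name `extract_and_checksum` is hypothetical, and SHA-256 is an arbitrary choice for the example, not necessarily the checksum used in production.

```python
import hashlib
import tarfile
from pathlib import Path


def extract_and_checksum(tar_path, dest_dir):
    """Extract regular files from a tar archive and checksum each one.

    Returns a dict mapping member name -> SHA-256 hex digest.
    Members that would land outside dest_dir are rejected.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    checksums = {}

    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, devices
            target = (dest / member.name).resolve()
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            data = archive.extractfile(member).read()
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            checksums[member.name] = hashlib.sha256(data).hexdigest()

    return checksums
```

The returned mapping is the kind of information a follow-up step could hand to a registration job. For large files, reading in chunks rather than all at once would be preferable.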

17 SRB to iRODS migration
SRB still in use: 1.7 PB still there.
Migration to iRODS already done for BioEmergence (embryogenesis) in 2010:
– The data workflow was using Jargon: transparent.
– Migration from Scommands to iCommands was needed.
– 2 hours of downtime to complete the migration (scripts were needed).
All the other projects need to be migrated by the end of 2012 or the beginning of 2013:
– SRB is deeply embedded in data management workflows, and projects cannot live without SRB.
=> Main issue: the migration should be as "transparent" as possible in order to keep up with the data activity.

18 To-do list
Complete the SRB to iRODS migration.
Connection control:
– Connections can come from anywhere, especially batch farms on the data grid.
– Servers can be overwhelmed (network, disk activity for hundreds of connections in parallel).
– Causes clients to exit with an error => not good.
– An improved version of CCMS (connection control) is needed.
Convert to rules the scripts used to manage cache space on compound resources.
Deal with filenames containing accented characters for iCommands on Windows.
Provide a lightweight transfer tool for every user (to ship files between CC-IN2P3 and remote sites).
Centralized administration through a GUI (15 instances of iRODS running so far).
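One possible shape for connection control is to make extra clients wait for a free slot rather than fail outright. The sketch below is not CCMS and does not reflect its design; `ConnectionLimiter` is a hypothetical class built on Python's standard `threading.BoundedSemaphore`, shown only to illustrate capping concurrent connections.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Cap the number of connections served at the same time."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def slot(self, timeout=None):
        """Hold one connection slot for the duration of the block.

        Waits up to `timeout` seconds (forever if None) and raises
        TimeoutError if no slot becomes free in time.
        """
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free connection slot")
        try:
            yield
        finally:
            self._slots.release()
```

A server loop would wrap each client session in `with limiter.slot(timeout=...)`, so a burst from a batch farm queues up to the timeout instead of overwhelming network and disks.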

19 Prospects
4 PB in iRODS as of Sep 2012 (should be 5 PB by the end of this year).
Future projects:
– Biomedical field: research in cardiology, MS (anonymization), with data from > 10 hospitals.
– Private companies (data encryption needed?).
– Astrophysics.
Grid: iRODS officially promoted by the French NGI.

20 Acknowledgement
Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Rachid Lemrani.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.

