Generic policy rules and principles
Jean-Yves Nief

Talk overview
– An introduction to CC-IN2P3 activities.
– iRODS in production:
  – Why are we using it?
  – Who is using it?
  – Prospects.
– iRODS rule policies through examples:
  – Resource Monitoring System.
  – Biomedical applications: human data, animal data.
  – Arts and Humanities.
  – Other rules: mass storage system interface, access rights.
  – Pitfalls.
  – Future usages.

CC-IN2P3 activities
– Federates the computing needs of the French scientific community in:
  – nuclear and particle physics;
  – astrophysics and astroparticle physics.
– Computing services to international collaborations: CERN (LHC), Fermilab, SLAC, ...
– Now also open to biology and Arts & Humanities.

CC-IN2P3: why use it?
– National and international collaborations; users spread geographically (Europe, America, Australia, ...).
– Hence the need for storage virtualization:
  – federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases, ...);
  – transparent data access for end users;
  – middleware running on heterogeneous operating systems;
  – common logical name space;
  – virtual organization (access rights, groups, etc.);
  – metadata search;
  – easy interfacing with any kind of client application (APIs, drivers).

CC-IN2P3: why use it?
– SRB in use since 2003:
  – 3 PBs handled for 10 different experiments (HEP, astrophysics, biology);
  – decommissioning: end of 2012?
– Limitation: no centralized data management (DM), hence no enforcement of DM policies.
– iRODS rule-based policies:
  – an adequate solution;
  – from the user's point of view: virtualization of the data management policy.

CC-IN2P3: who is using it?
– Arts and Humanities (Adonis):
  – long-term data preservation;
  – web and batch job access.
– Biology (phylogenetics), fluid mechanics: grid jobs.
– Biomedical applications: human and animal imagery.
– High energy physics: neutrino experiment.

CC-IN2P3: who is going to use it?
– Astrophysics experiments: LSST, ...
– Other biomedical and physics projects.
– iRODS will be part of the French NGI.
– All SRB instances to be migrated to iRODS: 1 PB should be reached soon.

Rule examples: Arts and Humanities
Example: archival and data publication of audio files from the CRDO.
1. Data transfer: CRDO to CINES (Montpellier).
2. Data archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration into Fedora Commons (delayed rule).
Steps 4 and 5 are implemented as iRODS rules (see the sketch below).
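A minimal sketch of such a rule, written in the current iRODS rule language rather than the .irb syntax of the 2010-era deployment; the ingest collection and the fedoraIngest.py script (assumed to be deployed in server/bin/cmd) are hypothetical placeholders.

    # Illustrative sketch only: unpack, checksum and register a freshly uploaded tarball.
    acPostProcForPut {
        on($objPath like "/ccin2p3/home/adonis/ingest/*.tar") {
            msiSplitPath($objPath, *parentColl, *fileName);
            # 4. Unpack the archive into a sibling collection and checksum the tarball.
            msiTarFileExtract($objPath, "*parentColl/extracted", $rescName, *status);
            msiDataObjChksum($objPath, "forceChksum=", *chksum);
            writeLine("serverLog", "unpacked $objPath, checksum *chksum");
            # 5. Delayed rule: register the content into Fedora Commons later on,
            #    via a hypothetical external script run on the server.
            *target = "*parentColl/extracted";
            delay("<PLUSET>1h</PLUSET>") {
                msiExecCmd("fedoraIngest.py", *target, "null", "null", "null", *cmdOut);
            }
        }
    }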

Rule examples: biomedical data
– Human and animal data (fMRI, PET, MEG, etc.), usually in DICOM format.
– Main issue for human data: it must be anonymized!
– Need to run metadata searches on DICOM files.
Rule (see the sketch below):
1. Check that the file is anonymized; send a warning if it is not.
2. Extract a subset of metadata (based on a list stored in iRODS) from the DICOM file.
3. Add this metadata as user-defined metadata in iRODS.
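A minimal sketch of the DICOM rule, assuming a hypothetical dicom_extract.py helper (deployed in server/bin/cmd) that prints the selected attributes as "attr=value%attr=value" pairs and includes a PatientName attribute whenever the file is not anonymized.

    # Illustrative sketch only: anonymization check + metadata extraction on put.
    acPostProcForPut {
        on($objPath like "/ccin2p3/home/biomed/*.dcm") {
            # 2. Extract the chosen subset of DICOM metadata with an external script.
            msiExecCmd("dicom_extract.py", $objPath, "null", "null", "null", *cmdOut);
            msiGetStdoutInExecCmdOut(*cmdOut, *pairs);
            # 1. Send a warning if the file does not look anonymized.
            if (*pairs like "*PatientName=*") {
                msiSendMail("datamanager@example.org", "anonymization warning", "please check $objPath");
            }
            # 3. Attach the extracted attributes as user-defined metadata (AVUs).
            msiString2KeyValPair(*pairs, *kvp);
            msiAssociateKeyValuePairsToObj(*kvp, $objPath, "-d");
        }
    }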

Rule examples: resource monitoring system
1. The iCAT server asks each iRODS data server for its metrics: rule-engine cron task (micro-service), see the sketch below.
2. A performance script is launched on each data server.
3. The results are sent back to the iCAT.
4. The metrics are stored in the iCAT database.
5. A "quality factor" is computed for each server and stored in another table: rule-engine cron task (micro-service).
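A minimal sketch of the cron-like part, assuming the monitoring micro-services shipped with the Resource Monitoring System (msiServerMonPerf to probe the servers and fill the iCAT tables; msiDigestMonStat, not shown, to compute the quality factor); the scheduling and parameter values are assumptions.

    # Illustrative sketch only: self-rescheduling delayed rule acting as the cron task.
    monitorResources {
        # First run after 1 minute, then repeated every 10 minutes.
        delay("<PLUSET>1m</PLUSET><EF>10m</EF>") {
            # Steps 1-4: launch the performance script on each data server and store
            # the returned metrics into the iCAT monitoring tables.
            msiServerMonPerf("default", "default");
        }
    }

Such a rule would be submitted once (e.g. with irule) and then keeps rescheduling itself.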

Other rules
– Mass Storage System integration:
  – uses compound resources: iRODS disk cache + tapes;
  – data on the disk cache is replicated into the MSS asynchronously (1 hour later) using a delayExec rule (see the sketch below);
  – recovery mechanism: retries until success, with the delay between retries doubled at each round.
– ACL management:
  – rules needed for fine-grained access rights management;
  – e.g. 3 groups of users (admins, experts, users):
    – ACLs on / /*/rawdata: admins r/w, experts + users r;
    – ACLs on all other subcollections: admins + experts r/w, users r.
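A minimal sketch of the asynchronous replication rule, in the current rule language; the resource names diskCacheResc and hpssResc are hypothetical placeholders, and the retry policy mirrors the recovery mechanism described above.

    # Illustrative sketch only: replicate from the disk cache into the MSS 1h after a put.
    acPostProcForPut {
        on($rescName == "diskCacheResc") {
            *obj = $objPath;
            delay("<PLUSET>1h</PLUSET><EF>1h DOUBLE UNTIL SUCCESS</EF>") {
                # The delayed rule is rescheduled, with the delay doubled at each round,
                # as long as the replication to the tape-backed resource fails.
                msiDataObjRepl(*obj, "destRescName=hpssResc", *status);
            }
        }
    }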

Developments needed
– Scripts/binaries:
  – metadata extraction from DICOM files;
  – registration of files into Fedora Commons;
  – ...
  – These are needed whatever storage system is used underneath.
– Micro-services:
  – ACLs, tar/untar of archive files, ...: the APIs were already available, so this did not require a large amount of work (they are part of the iRODS distribution);
  – Resource Monitoring System: a bigger development, including a modification of the iCAT schema.
– Rules:
  – most of them are simple;
  – some require more work (Adonis project), with a more complex workflow.

Pitfalls and bugs
– Writing complex rules:
  – avoid writing them directly in the .irb syntax;
  – they become difficult to debug, especially with nested actions;
  – solution: use ruleGen to generate the rules in a more user-friendly manner (see the raw .irb example below).
– Some memory leaks found in irodsReServer with Oracle as a backend: fixed in 2.4.
– delayExec syntax bugs: fixed in 2.4.
– Rules live in a configuration file at the moment:
  – they must be kept consistent across all the iRODS servers;
  – they will be stored in the iCAT database in the future.
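For illustration, a sketch of what a simple policy looks like when written directly in the pre-2.5 .irb syntax (rule name | condition | action chain | recovery chain, actions separated by "##"); the collection and resource names are hypothetical. Nesting and error handling quickly become unreadable in this form, hence the recommendation to go through ruleGen.

    # Illustrative sketch only: checksum then replicate on put, in raw .irb syntax.
    acPostProcForPut|$objPath like /ccin2p3/home/adonis/*|msiDataObjChksum($objPath,forceChksum=,*chksum)##msiSysReplDataObj(hpssResc,all)|nop##nop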

Prospects
– Rules for database interaction (in progress):
  – will be used by DTM (developed at CC-IN2P3):
    – DTM manages a list of tasks to be processed by a batch cluster;
    – DTM requires a database to manage the tasks.
  – A rule launched by the client will interact with the DTM database through iRODS:
    – more security: iRODS used as a proxy server (database behind a firewall, iRODS authentication reused);
    – database schema upgrades transparent for the client (no SQL code run on the client side).
– XMessaging system (part of iRODS):
  – allows different iRODS processes or clients to exchange messages;
  – e.g. could be used to monitor job status in a distributed computing environment.

Acknowledgements
Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.