Integrating SRB with the GIGGLE framework
Simon Metson, Owen Maroney, Tim Barrass
{s.metson,o.maroney,tim.barrass}@bristol.ac.uk
ACAT ’03, KEK, Japan
24/02/2019

Hello, my name is Simon Metson. My colleagues and I are members of the Particle Physics group at Bristol in the UK. I’m here today to talk about a project that we have been developing. We’ve been giving some thought to how the SDSC’s Storage Resource Broker might interoperate with the European DataGrid. We’ve identified three components to this system, and today I’m going to talk about the first: the integration of the SRB with the GIGGLE framework.

The GIGGLE Framework
Chervenak et al, http://www. globus
- Generic distributed data discovery system
- High level of redundancy, failover
- EDG/LCG and Globus are both developing implementations
[Diagram: RLI, LRC]

Framework proposed by Chervenak et al. It is a generic system of data discovery, based on a hierarchy of services to avoid a single point of failure:
- SE registers PFN:GUID mapping in LRC
- LRC pushes GUID:LRC mappings onto RLI
- RLI forwards queries to the relevant LRC
- LRC returns PFN
EDG/LCG and Globus are both developing implementations.

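The register, push, forward and return steps listed on this slide can be sketched as a toy in-memory model. This is only an illustration of the flow as the slide describes it, not the EDG/LCG or Globus implementation: the class names, method names and the example GUID/PFN strings are all hypothetical.

```python
class LocalReplicaCatalog:
    """Toy LRC: holds GUID -> PFN mappings for one site (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self._pfns = {}  # guid -> set of PFNs registered by storage elements

    def register(self, guid, pfn):
        # An SE registers a PFN:GUID mapping in its LRC.
        self._pfns.setdefault(guid, set()).add(pfn)

    def guids(self):
        return set(self._pfns)

    def lookup(self, guid):
        # The LRC returns the PFNs it knows for a GUID.
        return sorted(self._pfns.get(guid, ()))


class ReplicaLocationIndex:
    """Toy RLI: knows only which LRCs hold which GUIDs (hypothetical API)."""

    def __init__(self):
        self._lrcs = {}  # guid -> list of LRCs that reported it

    def receive_update(self, lrc):
        # The LRC pushes its GUID:LRC mappings onto the RLI.
        for guid in lrc.guids():
            holders = self._lrcs.setdefault(guid, [])
            if lrc not in holders:
                holders.append(lrc)

    def query(self, guid):
        # The RLI forwards the query to each relevant LRC and collects PFNs.
        pfns = []
        for lrc in self._lrcs.get(guid, []):
            pfns.extend(lrc.lookup(guid))
        return pfns


# Usage with made-up identifiers.
lrc = LocalReplicaCatalog("site-a")
lrc.register("guid-1", "srb://host.example/a/path/theFile_0")
rli = ReplicaLocationIndex()
rli.receive_update(lrc)
print(rli.query("guid-1"))
```

The RLI stores no PFNs of its own, which is why a lost RLI can be rebuilt from the LRCs beneath it.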
Storage Resource Broker
SDSC, http://www.npaci.edu/DICE/SRB
- Integrated data distribution tool
- Presents a single abstracted file space composed of distributed resources
  - Tape, disk, compound resources
- Produced by SDSC
- ~5 years old, a maturing product
- Used by many large projects (including CMS for its recent Monte Carlo pre-production)
[Diagram: MCat, SRB]

The Aim
- Could files stored in SRB be accessed by Grid tools?
  - Interoperability is one of the key benefits of grid middleware!
  - Extend (not replace or modify) tools available to user
- Active collaboration between members of CMS, BaBar and the SDSC SRB Group
- Scenarios:
  - Data discovery: locating SRB files using RLS
  - Job submission: publish resource information and send jobs to farms with ‘close’ SRB servers
  - File replication: consistent copying of files to and from SRB using grid tools

Data Discovery
- Grid uses GIGGLE framework to publish data management information (RLI, LRC)
- SRB uses MCat database (MCat, SRB)
- Create LRC interface to MCat: GMCat
[Diagram: RLI, LRC, GMCat, MCat, SRB]

Implementation Plan …
- Proof of concept prototype complete
  - Sync LRC with MCat contents
  - Test namespace mapping
    MCat: /zone/user.domain/my/path/myFile
    LRC:  srm://hostname/a/path/theFile
- Not performant
  - Compares the whole MCat database using “Scommands”
- New functionality of next SRB release expected to improve this
  - Time stamping: reduces number of files to check
  - GUID generated automatically: attaching GUID metadata is slow
[Diagram: GMCat pulls from MCat and pushes to LRC]

We held an extended workshop with SDSC, who took on board our ideas.
- SRB files are located by dataname and replica number
  - Dataname is a Unix-like path
  - Replica number enumerates replicas
- An LRC SURL has a scheme, a hostname, and a path and file
  - Scheme will be srb
  - Need to extract hostname from SRB
  - Create path from dataname and repl_enum
- To make the mapping:
  - Generate SURL and GUID
  - Store GUID in MCat for consistency
  - Push SURL:GUID mapping onto LRC when synching
Next: go through the mapping in detail on the following slides.

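The mapping steps in these notes can be sketched roughly as follows. This is a minimal sketch, not the GMCat code: the class and function names are hypothetical, joining dataname and repl_enum with an underscore is inferred from the SURL shown in the prototype output, and using a random (version 4) UUID as the GUID is an assumption.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SrbReplica:
    """One SRB replica, identified by dataname and replica number."""
    dataname: str   # Unix-like path in the SRB namespace
    repl_enum: int  # enumerates the replicas of that dataname


def make_surl(hostname, replica):
    """Build an srb:// SURL from hostname, dataname and repl_enum.

    Assumption: path = dataname + "_" + repl_enum, as seen in the
    prototype's LRC output.
    """
    path = replica.dataname
    if not path.startswith("/"):
        path = "/" + path
    return f"srb://{hostname}{path}_{replica.repl_enum}"


def make_mapping(hostname, replica, guid=None):
    """Return the (guid, surl) pair to store in MCat and push to the LRC.

    Assumption: when no GUID exists yet, a random UUID is generated.
    """
    if guid is None:
        guid = str(uuid.uuid4())
    return guid, make_surl(hostname, replica)


# Usage with the file from the prototype slides.
replica = SrbReplica(
    "/phy.bris.ac.uk/home/srbadmin.phy.bris.ac.uk/test/myfile.txt", 0
)
guid, surl = make_mapping("tuber13.phy.bris.ac.uk", replica)
print(guid, surl)
```

Passing an existing GUID into `make_mapping` keeps the value already stored in MCat, so a repeated sync would not mint a second identifier for the same replica.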
The Prototype In Detail
- User puts file into SRB with SRB client
    $ Sput myfile.txt /phy.bris.ac.uk/home/srbadmin.phy.bris.ac.uk/test/myfile.txt
- GMCat updates LRC
    $ ./server_interface.pl
    guid = e4fb1ff3-ae4f-4e38-9b88-3703a15a92d8
- File is visible through LRC
    $ edg-lrc -i mappingsByPfn --vo srbrls -h tuber15.phy.bris.ac.uk \
        "*/phy.bris.ac.uk/home/srbadmin.phy.bris.ac.uk/test/*"
    guid:e4fb1ff3-ae4f-4e38-9b88-3703a15a92d8, srb://tuber13.phy.bris.ac.uk/phy.bris.ac.uk/home/srbadmin.phy.bris.ac.uk/test/myfile.txt_0

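To show how the SURL returned by the LRC relates back to the SRB file, the sketch below splits it into hostname, dataname and replica number. The function name is hypothetical, and the `_<number>` suffix convention is inferred from the output above. The wildcard check uses Python's `fnmatch` purely as an illustration; the actual pattern semantics of `edg-lrc` may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def split_surl(surl):
    """Split an srb:// SURL into (hostname, dataname, repl_enum).

    Assumption: the path ends in "_<replica number>", as in the
    prototype output.
    """
    parts = urlsplit(surl)
    if parts.scheme != "srb":
        raise ValueError(f"not an srb SURL: {surl}")
    dataname, sep, repl = parts.path.rpartition("_")
    if not sep or not repl.isdigit():
        raise ValueError(f"no replica suffix in: {surl}")
    return parts.hostname, dataname, int(repl)


surl = ("srb://tuber13.phy.bris.ac.uk/phy.bris.ac.uk/home/"
        "srbadmin.phy.bris.ac.uk/test/myfile.txt_0")
host, dataname, repl_enum = split_surl(surl)
print(host, dataname, repl_enum)

# Illustrative only: shell-style matching of the query pattern.
pattern = "*/phy.bris.ac.uk/home/srbadmin.phy.bris.ac.uk/test/*"
print(fnmatchcase(surl, pattern))
```

Splitting on the last underscore keeps datanames that themselves contain underscores intact.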
Implementation Plan…
- Developing production system prototype
  - Modifications and further testing required to ensure correct operation within a production environment
  - Include features of future SRB releases as they become available
- Possible use for upcoming CMS Data Challenge
  - LCG POOL objects stored in SRB space
  - Need to access them via an EDG LRC…

Future Implementation
- Move to webservice implementation
  - Match EDG LRC interface
  - Provide dynamic mapping of MCat to EDG RLS namespace
  - No large database queries
  - More maintainable code
- No production-ready RLI currently available
  - CMS intend to use a single LRC for DC04
  - Currently this means we’ll not be able to use the dynamic webservice, unless MCat stores other LRC data

The Complete System
- Will need to address information publishing and file replication
- The former is straightforward: SRB servers can publish in the same way as an EDG Storage Element
- The latter is difficult
  - SRM interface to SRB? Most elegant, but there are difficulties
  - gsiFTP server on SRB? Has the problems that SRM is designed to solve
  - EDG Replica Manager talks to SRB natively? Not as generic a solution as an interface like SRM

Conclusion
- SRB and the EDG/LCG can interoperate
- Data Discovery component well understood, and useful in isolation
- Full interoperation requires some development effort
- Great interest from BaBar, SDSC, CMS and RAL on various aspects so far
- Website in production at http://www.cern.ch/bristol-escience/SRB-RLS-web/

Sync timings:
- 10,000 new files take ~45 minutes on a low-spec MCat box
- If no files have been added to a list of 10,000 files, the sync takes 4 minutes

Anticipated questions:
- Isn’t this what a federated MCat does?
- Difference in EDG / Globus RLS?
- Timescale for completion of Data Discovery? Full system?
- What are the difficulties in implementing an SRM interface?
- Why is a gsiFTP server bad?
- Are you intending to replace the EDG SE?
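The two sync timings quoted in the conclusion can be turned into rough per-file costs. This is simple arithmetic on the quoted figures, which are themselves approximate; the function name is made up for the example.

```python
def seconds_per_file(total_minutes, n_files):
    """Average sync cost per file, in seconds."""
    return total_minutes * 60.0 / n_files


N_FILES = 10_000
new_files = seconds_per_file(45, N_FILES)  # all 10,000 files are new
unchanged = seconds_per_file(4, N_FILES)   # nothing added since last sync

print(f"new files:   {new_files:.3f} s per file")
print(f"unchanged:   {unchanged:.3f} s per file")
print(f"cost ratio:  {new_files / unchanged:.1f}x")
```

On these figures a new file costs about 0.27 s and an unchanged file about 0.024 s, so checking files that have not changed is already roughly an eleventh of the cost of registering them.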