IT-SDC : Support for Distributed Computing DPM workshop 2015 Speaker: Fabrizio Furano, on behalf of the DPM team.


07 Dec 2015, IT-SDC: DPM status and directions

Welcome
- The DPM workshop took place on Monday and Tuesday
- Very dense program, very committed participants, many comments and good ideas
- We described the current status and the next technical plans
- These slides are just a distilled summary
- For more details:

DPM macro-features
- Clustering of pools of disks and storage in general, local or networked
- DB-based metadata persistency
- Aims to minimize unnecessary maintenance cost
- Multiple data protocols for the same cluster
  - HTTP, WebDAV, Xrootd, GridFTP1/2
- Historical support for the Grid standards
  - SRM, X509/VOMS
- Focus on giving simplified setup tools
  - Before, it was YAIM
  - Now it's a YAIM-style standalone Puppet template
- Focus on support and community-building
  - DPM exists thanks to its considerable and active user base

Country reports
- The morning was filled by the very high quality "Country site reports"
- Reports, requests, proposals from
  - Italian sites: successful Puppet pioneers, how to move/delete files FAST, heavy Dynafed+DPM tests
  - French sites: large community, would like more emphasis on releases, propose unifying more subsystems under the same version/packaging, SRM issues, wish for enhanced DM operations
  - Australian sites: upgrade concerns, SRM issues
  - UK sites: concerns over upgrading the setup, IO limiting, SRM issues
- All of these are active in the testbed activity of the DPM collaboration
- All of these are multi-petabyte SEs that foresee growth
- The contribution of friend sites has been extremely important to DPM

Belle II
- Presentation about the Belle II computing model
- 33% of the storage is/will be DPM
- Interest in ACLs (DPM has them)
- Uses LFC as a file catalogue
- Using Xrootd, interested in expanding to HTTP, seeking confirmation that ROOT supports it
  - We gave them … We actively support DAVIX and TDavixFile
- Keeping an eye on the HTTP federation initiatives

07 Dec 2015 IT-SDC DPM status and directions DPM status  Infosys numbers  60PBs in total  176 instances  6 sites larger than 2 PB, 18 sites larger than 1PB. The largest so far is 3.3PB  Quite a few plan multiPB expansions in the next year  Main aspects that have been considered and remain our focus  Consolidation, keeping sysadmin cost at the lowest  Performance, scalability  High quality HTTP, WebDAV, Xrootd, GridFTP support  Support. In touch with sysadmins as much as we can  We like how dpm-users-forum is doing

07 Dec 2015 IT-SDC DPM status and directions DPM status  About 240 tickets in 10 months  some trivial, others required months  DPM (DMLite bundle 0.7.3) went EPEL-test on the 1st of August  Consolidation, puppet polished  No known core performance issues  NB the old LCGDM stack remained untouched, that has some very well known ones  Epel-prod status in mid-October  Metapackages sent to EMI end of October, available mid Nov  Dmlite bugfix release beginning of November  DPM-dsi (gridFTP) recompiled for the new Globus, beginning of November

Highlights
- Space reporting on top directories, in the browser and according to RFC 4331
- Drain and replication through HTTP
- Many improvements to dmlite-shell, which becomes more complete
- Many improvements to DMLite logging: hundreds of new messages, a round of harmonization for readability
- The core can now store multiple checksum types coming from multiple frontends
- GridFTP redirection, latest Globus support, solved Globus race conditions
- DPMBox, new fancy WebDAV interface
- sendfile() support in disk nodes
- Important ACL fixes
- Puppet standalone setup is consolidated
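RFC 4331 defines two WebDAV properties, `DAV:quota-used-bytes` and `DAV:quota-available-bytes`, that a client can request with PROPFIND. The following is a minimal Python sketch of building such a request body and reading the values from a multistatus reply. The sample response, the path in it, and the function names are illustrative assumptions, not output from a real DPM endpoint.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"

def build_quota_propfind() -> bytes:
    """Build a PROPFIND body asking for the two RFC 4331 quota properties."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    ET.SubElement(prop, f"{{{DAV_NS}}}quota-used-bytes")
    ET.SubElement(prop, f"{{{DAV_NS}}}quota-available-bytes")
    return ET.tostring(root, encoding="utf-8")

def parse_quota(multistatus_xml: str) -> dict:
    """Map each href in a 207 Multi-Status body to its used/available bytes."""
    ns = {"D": DAV_NS}
    result = {}
    for resp in ET.fromstring(multistatus_xml).findall("D:response", ns):
        href = resp.findtext("D:href", namespaces=ns)
        used = resp.findtext(".//D:quota-used-bytes", namespaces=ns)
        avail = resp.findtext(".//D:quota-available-bytes", namespaces=ns)
        result[href] = {
            "used": int(used) if used is not None else None,
            "available": int(avail) if avail is not None else None,
        }
    return result

# Hypothetical reply; the path and the numbers are made up for illustration.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/dpm/example.org/home/atlas/</D:href>
    <D:propstat>
      <D:prop>
        <D:quota-used-bytes>1500</D:quota-used-bytes>
        <D:quota-available-bytes>8500</D:quota-available-bytes>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

if __name__ == "__main__":
    print(build_quota_propfind().decode("utf-8"))
    print(parse_quota(SAMPLE_RESPONSE))
```

The body would be sent with the PROPFIND method and a `Depth: 0` header by any HTTP client; the transport is left out so the sketch runs without a server.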

Release status
- Stable situation
- We support EPEL/Fedora 5 and 6
- CentOS 7 support is in an advanced state
- Plan to abandon metapackages: they work, yet they cause delays in the releases
  - They also cause double checks, which may be good
- Discussion: having individual versions for different packages is perceived as a complication. We should limit the proliferation of version numbers in a single release
  - We already took big steps in this direction. Admins want more
- Andrea Manzi gave a very exhaustive description of the release status

New Web presence
- Drupal
- Open to the public
- Indexed by public search engines
- Collaborators can add articles
- Links to the historical Wiki
  - TWiki is still somewhat more practical for creating cross-linked pages
- News gets published on Twitter

New build system
- The Bamboo service was disrupted during Spring '15
  - The recovered service was very poor
- We migrated all the plans to Jenkins (52 complex plans, for 3 platforms each!)
- New build system:
  - Needed a solution that continues to guarantee total separation between the build system and the stuff being built
  - Never touch the build system across commits and versions, just check results
  - For Jenkins the solution was a few closed scripts
  - No big complaints, mock builds stayed the same
- This gave us almost two difficult months. The situation has been OK since summer.

The GridFTP quest
- This release sees dpm-dsi 1.9.7: the GridFTP redirection
- The goal is to be able to use GridFTP in a scalable way without SRM "redirection"
- A long trip, needed to explore all the complex situations
- Difficulties interfacing with Globus required setting strict dependencies in EPEL, and a rebuild at every Globus update
  - My impression: a real pain that nevertheless guarantees that the software we release has been tested with the currently available set of packages. Not too bad to be forced to double-check Globus. It should be done anyway.
- The last improvement was about checksum calculations
- Special thanks to
  - Andrea Sartirana for helping with testing
  - Mattias Ellert for helping with the difficult rebuild issues
- More details in Andrey's talk later today

DPM shell
- Feature-rich toolbox for DPM administration
- Python-based, expandable, scriptable, command history, autocompletion, …
- We decided to harmonize it into an admin tool, rather than just an invoker of DMLite methods
- The friendly feeling gives a lot of power to investigate and fix things

HTTP and WebDAV
- Stable support
- Good performance, 3rd party copy, scalable
- Interesting initiatives about using a DPM together with clouds, federations, and S3 backends
- Soon WLCG will request HTTP support from sites
  - SAM tests for it are coming
- We are moving all the admin tools to use HTTP instead of RFIO. The HTTP-based dpm-drain has just arrived.
  - From now on, HTTP is mandatory in a DPM setup for the new tools to work
- Interesting discussions in the context of the HTTP TF

Used space reporting on directories
- Similar to 'du', always up to date for the first levels of the hierarchy
- Coincides with the SRM numbers only where spacetokens have been linked to directory prefixes
  - E.g. (very wisely) ATLAS: /atlasscratchdisk, /atlasdatadisk
- Writes (putdone requests) issued through SRM aren't accounted for automatically
  - Only the batch resync tool can see them, and running it regularly is NOT a good idea
- To avoid confusing people, for now it's disabled in Puppet by default. It's easy to enable.
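The idea of a 'du'-like counter kept only for the first levels of the hierarchy can be illustrated with a small Python sketch. This is not DPM's implementation: the function name, the `max_depth` parameter, and the sample paths are assumptions chosen for illustration. Each file size is added to the root and to its ancestor directories down to a fixed depth, so deeper directories carry no counter of their own.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def used_space(files: dict[str, int], max_depth: int = 2) -> dict[str, int]:
    """Sum file sizes per directory, only for the first max_depth levels."""
    totals: dict[str, int] = defaultdict(int)
    for path, size in files.items():
        # parts looks like ('/', 'atlas', 'atlasdatadisk', 'file'); drop root and file name
        dirs = PurePosixPath(path).parts[1:-1]
        totals["/"] += size
        for depth in range(1, min(len(dirs), max_depth) + 1):
            totals["/" + "/".join(dirs[:depth])] += size
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical namespace content, sizes in bytes.
    sample = {
        "/atlas/atlasdatadisk/a": 100,
        "/atlas/atlasdatadisk/sub/b": 50,
        "/atlas/atlasscratchdisk/c": 25,
    }
    for directory, used in sorted(used_space(sample).items()):
        print(directory, used)
```

A real system would update these counters incrementally on each write rather than rescanning, which is why writes that bypass the accounting path would need a separate resync.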

YAIM has been dead for one year
- YAIM cannot work anymore for DPM
- YAIM breaks the setup
- YAIM is not supported
- Don't use it.

07 Dec 2015 IT-SDC DPM status and directions Puppet setup  The setup of a DPM can be done with puppet in standalone mode  YAIM-ish feel. Run it manually in a machine to configure that machine  No need to puppetize the site  My impression is that it’s pretty good, doing several things that yaim could not do reliably or not at all in its best days

07 Dec 2015 IT-SDC DPM status and directions Puppet setup  Puppet is the recommended way to setup a DPM  Recommended also for LFC (e.g. for Belle2)  With LFC also YAIM (untouched since 3 years) and manual can be options  Andrea Manzi showed all the Puppet setup steps in the Puppet Hands on

Related: Dynafed/DataBridges
- The Dynamic Federation project (Dynafed) is a DMLite plugin, hence a close relative of the modern DPM
- Stripped-down, ultra-simplified, very low maintenance cost
- Used to do HTTP federations and to give secure uniform access to S3/Azure storage
- Mentioned a few times during the workshop
- When deployed on Cloud storage and instrumented with multiple Apache auth plugins, it's commonly called a Data Bridge
- Very promising solution to harmonize 3rd party and private Cloud storage with Grid storage in a seamless and extremely scalable way

Looking forward

Development direction
- Supposedly the final stages of a 4-year-long smooth transition
- Consolidation, removing the rough edges of such a complex system to make DPM the lowest cost grid storage available
- Improve support for SRM-less sites and other non-SRM sciences
  - Keeping of course all the relevant frontends: GridFTP, Xrootd, HTTP
- Cut the ultra-complex dependencies between the DMLite and the LCGDM stacks, making the latter optional
- Through simplification and cuts, deliver some historically difficult features
  - Coherent checksum calculations, recalculations, (re)checks
  - File pull/caching callouts (DPM still has CASTOR1-like code buried in it)
  - Explore lightweight DPMs only working as file caches
  - Free space reporting. Simplified quotas on directories. They can work as spacetokens if the VO treats them as directories
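As an illustration of handling several checksum types coherently, the Python sketch below computes MD5, Adler-32 and CRC-32 in a single pass over a stream, so the data is read only once. The choice of these three algorithms, the function name, and the chunk size are assumptions for the example; the slides do not say which checksum types DPM stores or how it computes them.

```python
import hashlib
import io
import zlib

def compute_checksums(stream, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Read the stream once and return several checksums as hex strings."""
    md5 = hashlib.md5()
    adler = 1  # Adler-32 starting value
    crc = 0    # CRC-32 starting value
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        adler = zlib.adler32(chunk, adler)  # running value carried across chunks
        crc = zlib.crc32(chunk, crc)
    return {
        "md5": md5.hexdigest(),
        "adler32": f"{adler & 0xFFFFFFFF:08x}",
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
    }

if __name__ == "__main__":
    # In-memory stand-in for a file on a disk node.
    print(compute_checksums(io.BytesIO(b"example payload"), chunk_size=4))
```

Storing the result keyed by checksum type lets different frontends ask for the type they need without triggering another full read of the file.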

An SRM-less DPM
- Almost everything now passes through DMLite, or soon will
- Compatibility maintained at the DB level
- All the relevant client tools use WebDAV
- GridFTP can do redirections by itself, without needing SRM calls
- HEP heavy workloads can use Xrootd or HTTP
- Bristol is already running in SRM-less mode for CMS (with DPM-over-HDFS)
- Real-world SRM-less ATLAS tests will be done by Sylvain Blunier
- For DPM the question is: can the newer DMLite components live without the historical LCGDM ones?
- At least, the system is now mature enough for a decisive technical step

The step (architecture diagram)
- DMLite section: dmlite core; WebDAV, Xrootd, gsiftp; mysql, profiler, adapter, mem cache; VFS, Oracle, S3, HDFS, Dynamic Feds (Ugr), Options/Others
- Legacy components: legacy daemons (rfio, dpns, srm, dpm daemon), legacy clients, CSec
- Annotations: "Very difficult to evolve", "DEV is frozen", "Got 3X faster in 1.8.7/8", "Used less and less", "Main slowdown until", "OPTIONAL"

The next step
- Challenge: the DMLite stack still relies on the older LCGDM components, through the adapter plugin
  - DMLite calls needing coordination (e.g. PUT or free space) are just forwarded to the old dpm daemons
  - This triggers old friends like rfio, libshift, Csec
  - The DMLite adapter also uses rfio directly
- Now that we have "used space reporting", we also need "total/free space reporting" with the same level of simplicity
- We also want straightforward quotas
- To move on and implement the new features cleanly (meaning cheap and good) we have to cut the dependency
- Eric Cheung did this as a proof-of-concept prototype using FastCGI, codename DPMRest
- We will soon start implementing this REST interface to DPM to cut the dependencies on the historical LCGDM software stack

Codename DPMRest - Main items
- One config file or directory with standard-looking files
  - Clear, readable syntax
  - Statements are position-independent
  - Config subsystem borrowed from Dynafed
- Simple, flexible space management
  - A cloud-storage-like version of quotas on path prefixes
  - Coincides with the spacetokens when spacetokens are used on directory prefixes (e.g. ATLAS)
  - Spacetokens often rely on the client to provide non-obvious info to be accounted. No more need for this.
- Multiple checksums management and checksum request queuing
- Queued file callouts for implementing site file caches
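Quotas on path prefixes can be sketched in a few lines of Python. The point of the approach is that the server derives the accounting bucket from the path alone, so the client does not have to pass a spacetoken. The class name, its methods, the refusal policy for unmatched paths, and the sample limits below are all hypothetical and are not taken from DPMRest.

```python
class PrefixQuotas:
    """Toy quota table keyed by directory prefix (hypothetical, not DPMRest code)."""

    def __init__(self, limits: dict[str, int]):
        self.limits = {self._norm(p): n for p, n in limits.items()}
        self.used = {p: 0 for p in self.limits}

    @staticmethod
    def _norm(prefix: str) -> str:
        return prefix.rstrip("/") or "/"

    def match(self, path: str):
        """Return the longest configured prefix containing path, or None."""
        best = None
        for prefix in self.limits:
            base = "" if prefix == "/" else prefix
            # Compare on component boundaries so /a/bX does not match /a/b.
            if path == prefix or path.startswith(base + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        return best

    def free(self, prefix: str) -> int:
        prefix = self._norm(prefix)
        return self.limits[prefix] - self.used[prefix]

    def record_write(self, path: str, size: int) -> bool:
        """Charge size to the matching prefix; refuse if it would exceed the limit.

        Assumed policy for this sketch: writes outside any configured prefix are
        refused, and only the longest matching prefix is charged.
        """
        prefix = self.match(path)
        if prefix is None or self.used[prefix] + size > self.limits[prefix]:
            return False
        self.used[prefix] += size
        return True

if __name__ == "__main__":
    quotas = PrefixQuotas({"/atlas/atlasdatadisk": 100, "/atlas/atlasscratchdisk": 50})
    print(quotas.record_write("/atlas/atlasdatadisk/f1", 80))
    print(quotas.record_write("/atlas/atlasdatadisk/f2", 30))
    print(quotas.free("/atlas/atlasdatadisk"))
```

When a VO already maps each spacetoken to one directory prefix, as in the ATLAS example, the per-prefix numbers from such a table would line up with the per-spacetoken numbers.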

DPMRest … when?
- The idea is to start in a few weeks and deliver in Q4/2016
- As you may know, CERN IT is undergoing a "reorganization", and our team goes into the Storage group together with the EOS team
  - This will likely have little impact on DPM in the short term
  - Our simplification plans stay

Conclusion
- It has been a great workshop, with very intense participation. THANKS!!
- DPM goals: flexibility, low maintenance, ease of setup
- Many discussions around moving from YAIM to standalone Puppet
- Goal: native frontends implement the protocols
- … was a good release, … is another step forward. It's time for the last mile of rationalization
- It's time to address historical topics:
  - Space reporting
  - SRM-less operation
  - Flexible checksums
  - Possibility to work with (maybe as?) file caches
- Next step: use REST tech, exportable to other systems, easy for others to understand
- Performance: got appreciation for the past work. Meets the CMS requirements, planned improvements for the AAA case
- Tuning hints: