
IT-SDC : Support for Distributed Computing DPM workshop 2015 Speaker: Fabrizio Furano, on behalf of the DPM team.


1 IT-SDC : Support for Distributed Computing DPM workshop 2015 Speaker: Fabrizio Furano, on behalf of the DPM team

2 07 Dec 2015 IT-SDC DPM status and directions Welcome
- On Monday and Tuesday the DPM workshop took place
- Very dense program, very committed participants, many comments and good ideas
- Described the current status and the next technical plans
- These slides are just a distilled summary
- For more details: https://indico.cern.ch/event/432642/

3 DPM macro-features
- Clustering of pools of disks and storage in general, local or networked
- DB-based metadata persistency
- Aims to minimize unnecessary maintenance cost
- Multiple data protocols for the same cluster
  - HTTP, WebDAV, Xrootd, GridFTP1/2
- Historical support for the Grid standards
  - SRM, X509/VOMS
- Focus on giving simplified setup tools
  - Before it was YAIM
  - Now it’s a YAIM-style standalone Puppet template
- Focus on support and community-building
  - DPM exists thanks to its considerable and active user base

4 Country reports
- The morning was filled by the very high-quality ‘Country site reports’
- Reports, requests, proposals from
  - Italian sites: successful Puppet pioneers, how to move/delete files FAST, heavy Dynafed+DPM tests
  - French sites: large community, would like more emphasis on releases, propose unifying more subsystems under the same version/packaging, SRM issues, wish for enhanced DM operations
  - Australian sites: upgrade concerns, SRM issues
  - UK sites: concerns over upgrading the setup, IO limiting, SRM issues
- All of these are active in the testbed activity of the DPM collaboration
- All of these are multi-petabyte SEs that foresee growth
- The contribution of friend sites has been extremely important to DPM

5 Belle II
- Presentation about the Belle II computing model
- 33% of the storage is/will be DPM
- Interest in ACLs (DPM has them)
- Uses LFC as a file catalogue
- Using Xrootd; interested in expanding to HTTP, seeking confirmation that ROOT supports it
  - We gave them … We actively support DAVIX and TDavixFile
- Keeping an eye on the HTTP federation initiatives

6 DPM status
- Infosys numbers
  - 60 PB in total
  - 176 instances
  - 6 sites larger than 2 PB, 18 sites larger than 1 PB. The largest so far is 3.3 PB
  - Quite a few plan multi-PB expansions in the next year
- Main aspects that have been considered and remain our focus
  - Consolidation, keeping sysadmin cost at the lowest
  - Performance, scalability
  - High-quality HTTP, WebDAV, Xrootd, GridFTP support
  - Support. In touch with sysadmins as much as we can
  - We like how dpm-users-forum is doing

7 DPM status
- About 240 tickets in 10 months
  - Some trivial, others required months
- DPM 1.8.10 (DMLite bundle 0.7.3) went to EPEL testing on the 1st of August
  - Consolidation, Puppet polished
  - No known core performance issues
  - NB: the old LCGDM stack remained untouched, and it has some very well-known ones
- EPEL production status in mid-October
- Metapackages sent to EMI at the end of October, available mid-November
- DMLite bugfix release at the beginning of November
- DPM-dsi (GridFTP) recompiled for the new Globus at the beginning of November

8 1.8.10 highlights
- Space reporting on top directories, in the browser and according to RFC 4331
- Drain and replication through HTTP
- Many improvements to dmlite-shell, which becomes more complete
- Many improvements to DMLite logging: hundreds of new messages, a round of harmonization for readability
- The core can now store multiple checksum types coming from multiple frontends
- GridFTP redirection, latest Globus support, solved Globus race conditions
- DPMBox, new fancy WebDAV interface
- sendfile() support in disk nodes
- Important ACL fixes
- Puppet standalone setup is consolidated
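
The RFC 4331 item above refers to the WebDAV properties DAV:quota-used-bytes and DAV:quota-available-bytes. The following is a minimal sketch, assuming a client asks for them with a PROPFIND request and parses the multistatus reply. The sample response, the path in it and the helper names are invented for illustration; nothing here is taken from the DPM code.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # WebDAV XML namespace


def build_quota_propfind() -> bytes:
    """Build a PROPFIND body asking for the two RFC 4331 quota properties."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    ET.SubElement(prop, f"{{{DAV_NS}}}quota-available-bytes")
    ET.SubElement(prop, f"{{{DAV_NS}}}quota-used-bytes")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_quota(multistatus_xml: str) -> dict:
    """Map each href in a multistatus reply to its used/available byte counts."""
    root = ET.fromstring(multistatus_xml)
    result = {}
    for response in root.iter(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        used = response.findtext(f".//{{{DAV_NS}}}quota-used-bytes")
        avail = response.findtext(f".//{{{DAV_NS}}}quota-available-bytes")
        if used is not None and avail is not None:
            result[href] = {"used": int(used), "available": int(avail)}
    return result


# Hypothetical server reply, hand-written for this example.
SAMPLE_REPLY = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/dpm/example.org/home/myvo/</D:href>
    <D:propstat>
      <D:prop>
        <D:quota-used-bytes>1200</D:quota-used-bytes>
        <D:quota-available-bytes>8800</D:quota-available-bytes>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

if __name__ == "__main__":
    print(build_quota_propfind().decode("utf-8"))
    print(parse_quota(SAMPLE_REPLY))
```

The request body would be sent with the PROPFIND method and a Depth header; that part is left out so the sketch stays runnable without a server.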

9 Release status
- Stable situation
- We support EPEL/Fedora 5 and 6
- CentOS 7 support is in an advanced state
- Plan to abandon metapackages: they work, yet cause delays in the releases
  - They also cause double checks, which may be good
- Discussion: having individual versions for different packages is perceived as a complication. We should move towards limiting the proliferation of version numbers in a single release
  - We have already taken big steps in this direction. Admins want more
- Andrea Manzi gave a very exhaustive description of the release status

10 New Web presence
- Drupal
- http://lcgdm.web.cern.ch/
- Open to the public
- Indexed by public search engines
- Collaborators can add articles
- Links to the historical Wiki
  - TWiki is still somewhat more practical for creating cross-linked pages
- News gets published on Twitter

11 New build system
- The Bamboo service was disrupted during spring ’15
  - The recovered service was very poor
- We migrated all the plans to Jenkins (52 complex plans, for 3 platforms each)
- New build system: https://jenkins.cern.ch/lcgdm/
- We needed a solution that continues to guarantee total separation between the build system and the software being built
  - Never touch the build system across commits and versions, just check results
  - For Jenkins the solution was a few closed scripts
- No big complaints, mock builds stayed the same
- This gave us almost two difficult months. The situation has been OK since summer.

12 The GridFTP quest
- This release of 1.8.10 sees dpm-dsi 1.9.7: the GridFTP redirection
- Goal is to be able to use GridFTP in a scalable way without SRM “redirection”
- A long trip, needed to explore all the complex situations
- Difficulties interfacing with Globus, which required setting strict dependencies in EPEL and a rebuild at every Globus update
  - My impression: a real pain that nevertheless guarantees that the software we release has been tested against the currently available set of packages. Not too bad to be forced to double-check Globus. It should be done anyway.
- Last improvement was about checksum calculations
- Special thanks to
  - Andrea Sartirana for helping with testing
  - Mattias Ellert for helping with the difficult rebuild issues
- More details in Andrey’s talk later today

13 DPM shell
- Feature-rich toolbox for DPM administration
- Python-based, expandable, scriptable, command history, autocompletion, …
- We decided to harmonize it into an admin tool, besides being just an invoker of DMLite methods
- The friendly feeling gives a lot of power to investigate and fix things

14 HTTP and WebDAV
- Stable support
- Good performance, 3rd party copy, scalable
- Interesting initiatives about using a DPM together with clouds, federations and S3 backends
- Soon WLCG will request HTTP support from sites
  - SAM tests for it are coming
- We are moving all the admin tools to use HTTP instead of RFIO. dpm-drain just came.
  - From now on, HTTP is mandatory in a DPM setup for the new tools to work
- Interesting discussions in the context of the HTTP TF

15 Used space reporting on directories
- Similar to ‘du’, always up to date for the first levels of the hierarchy
- Coincides with the SRM numbers only in the cases where spacetokens have been linked to directory prefixes
  - E.g. (very wisely) ATLAS: /atlasscratchdisk /atlasdatadisk
- Writes (putdone requests) issued through SRM aren’t accounted for automatically
  - Only the batch resync tool can see them, and running it regularly is NOT a good idea
  - To avoid confusing people, for now it’s disabled in Puppet by default. It’s easy to enable.
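
A ‘du’-like counter that stays up to date for the first levels of the hierarchy could be maintained incrementally, as in the sketch below. This is only an illustration under assumptions: the class name, the `max_depth` parameter and the update hooks are hypothetical, and the slides do not describe how DPM actually implements the feature.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class DirSpaceCounter:
    """Keep per-directory used-space totals for the top levels of a namespace.

    Hypothetical helper: every file creation or deletion updates the counters
    of its ancestor directories down to max_depth levels below the root.
    """

    def __init__(self, max_depth: int = 2):
        self.max_depth = max_depth
        self.used = defaultdict(int)  # directory path -> bytes used

    def _tracked_ancestors(self, file_path: str):
        # parents is ordered nearest-first and ends with the root "/".
        parents = list(PurePosixPath(file_path).parents)
        parents.reverse()  # root first
        # Skip the root itself, keep directories at depth 1..max_depth.
        return [str(p) for p in parents[1:self.max_depth + 1]]

    def file_added(self, file_path: str, size: int) -> None:
        for directory in self._tracked_ancestors(file_path):
            self.used[directory] += size

    def file_removed(self, file_path: str, size: int) -> None:
        for directory in self._tracked_ancestors(file_path):
            self.used[directory] -= size


if __name__ == "__main__":
    counter = DirSpaceCounter(max_depth=1)
    counter.file_added("/atlasdatadisk/run1/a.root", 100)
    counter.file_added("/atlasscratchdisk/b.root", 50)
    counter.file_removed("/atlasdatadisk/run1/a.root", 100)
    print(dict(counter.used))
```

A write that bypasses the update hooks, as the slide says of SRM putdone requests, would leave such counters stale until a full rescan, which is the role the slide gives to the batch resync tool.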

16 YAIM has been dead for one year
- YAIM cannot work anymore for DPM
- YAIM breaks the setup
- YAIM is not supported
- Don’t use it.

17 Puppet setup
- The setup of a DPM can be done with Puppet in standalone mode
- YAIM-ish feel. Run it manually on a machine to configure that machine
- No need to puppetize the site
- My impression is that it’s pretty good, doing several things that YAIM could not do reliably, or at all, even in its best days

18 Puppet setup
- Puppet is the recommended way to set up a DPM
- Recommended also for LFC (e.g. for Belle II)
  - With LFC, YAIM (untouched for 3 years) and manual setup can also be options
- Andrea Manzi showed all the Puppet setup steps in the Puppet hands-on

19 Related: Dynafed/DataBridges
- The Dynamic Federation project (Dynafed) is a DMLite plugin, hence a close relative of the modern DPM
- Stripped-down, ultra-simplified, very low maintenance cost
- Used to do HTTP federations and to give secure uniform access to S3/Azure storage
- Mentioned a few times during the workshop
- When deployed on Cloud storage and instrumented with multiple Apache auth plugins, it’s commonly called Data Bridge
- Very promising solution to harmonize 3rd party and private Cloud storage and Grid storage in a seamless and extremely scalable way

20 Looking forward

21 Development direction
- Supposedly the final stages of a 4-year-long smooth transition
- Consolidation, removing rough edges of such a complex system to make DPM the lowest-cost grid storage available
- Improve support for SRM-less sites and other non-SRM sciences
  - Keeping of course all the relevant frontends: GridFTP, Xrootd, HTTP
- Cut the ultra-complex dependencies between the DMLite and the LCGDM stacks, making the latter optional
- Through simplification and cuts, deliver some historically difficult features
  - Coherent checksum calculations, recalculations, (re)checks
  - File pull/caching callouts (DPM still has CASTOR1-like code buried inside)
  - Explore lightweight DPMs only working as file caches
  - Free space reporting. Simplified quotas on directories. Can work as spacetokens if they are treated as directories by the VO
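
Coherent handling of several checksum types usually means computing them all over the same bytes. The following is a minimal sketch, assuming the types of interest are adler32, crc32 and md5 (the slides do not list them) and using only the Python standard library; `multi_checksum` is a hypothetical helper, not a DMLite function.

```python
import hashlib
import io
import zlib


def multi_checksum(stream, chunk_size: int = 1024 * 1024) -> dict:
    """Read a binary stream once and return several checksums as hex strings."""
    md5 = hashlib.md5()
    adler = 1  # initial value of the Adler-32 running checksum
    crc = 0    # initial value of the CRC-32 running checksum
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        adler = zlib.adler32(chunk, adler)
        crc = zlib.crc32(chunk, crc)
    return {
        "adler32": f"{adler & 0xFFFFFFFF:08x}",
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
        "md5": md5.hexdigest(),
    }


if __name__ == "__main__":
    # In-memory data stands in for a file opened with open(path, "rb").
    print(multi_checksum(io.BytesIO(b"example payload"), chunk_size=4))
```

Reading the file in chunks keeps memory use flat for large files, and a single pass means every stored checksum type describes exactly the same content.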

22 An SRM-less DPM
- Almost everything now passes through DMLite or soon will
  - Compatibility maintained at the DB level
- All the relevant client tools use WebDAV
- GridFTP can do redirections by itself, without needing SRM calls
- HEP heavy workloads can use Xrootd or HTTP
- Bristol is already running in SRM-less mode for CMS (with DPM-over-HDFS)
- Real-world SRM-less ATLAS tests will be done by Sylvain Blunier
- For DPM the question is: can the newer DMLite components live without the historical LCGDM ones?
- At least, the system is now mature enough for a decisive technical step

23 The step
[Architecture diagram, reused from a slide dated 05 Jul 2013. Only its labels survive in the transcript:]
- DMLite section: dmlite core, WebDAV, Xrootd, gsiftp, mysql, profiler, adapter, mem cache
- Backends and plugins: VFS, Oracle, S3, HDFS, Options/Others, Dynamic Feds (Ugr)
- Legacy daemons: rfio, dpns, srm, dpm (daemon); Legacy clients; CSec
- Annotations: “Very difficult to evolve”, “DEV is frozen”, “Got 3X faster in 1.8.7/8”, “Used less and less”, “Main slowdown until 1.8.9”, “OPTIONAL”

24 The next step
- Challenge: the DMLite stack still relies on the older LCGDM components, through the adapter plugin
  - DMLite calls needing coordination (e.g. PUT or freespace) are just forwarded to the old dpm daemons
  - This triggers old friends like rfio, libshift, Csec
  - The DMLite adapter also uses rfio directly
- Now that we have “used space reporting”, we also need “total/free space reporting” with the same level of simplicity
- We also want straightforward quotas
- To move on and implement the new features cleanly (meaning cheaply and well), we have to cut the dependency
- Eric Cheung did this as a proof-of-concept prototype using FastCGI, codename DPMRest
- We will soon start implementing this REST interface to DPM to cut the dependencies on the historical LCGDM software stack

25 Codename DPMRest - Main items
- One config file or directory with standard-looking files
  - Clear, readable syntax
  - Statements are position-independent
  - Config subsystem borrowed from Dynafed
- Simple, flexible space management
  - A cloud-storage-like version of quotas on path prefixes
  - Coincides with the spacetokens when spacetokens are used on directory prefixes (e.g. ATLAS)
  - Spacetokens often rely on the client to provide non-obvious info to be accounted. No more need for this.
- Multiple checksums management and checksum request queuing
- Queued file callouts for implementing site file caches
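
Quotas on path prefixes can be pictured as a table of prefixes, each with a limit and a running total, where the server picks the longest matching prefix for every write. The sketch below is an illustration only: `QuotaTable`, its methods and the policy of allowing writes outside any prefix are assumptions, since DPMRest is described here as a plan and no interface is given.

```python
from dataclasses import dataclass


@dataclass
class PrefixQuota:
    limit_bytes: int
    used_bytes: int = 0


class QuotaTable:
    """Hypothetical quota bookkeeping keyed by directory prefix."""

    def __init__(self):
        self._quotas = {}  # normalized prefix -> PrefixQuota

    def set_quota(self, prefix: str, limit_bytes: int) -> None:
        self._quotas[prefix.rstrip("/") or "/"] = PrefixQuota(limit_bytes)

    def _match(self, path: str):
        """Return the longest configured prefix containing path, or None."""
        best = None
        for prefix in self._quotas:
            inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if inside and (best is None or len(prefix) > len(best)):
                best = prefix
        return best

    def try_write(self, path: str, size: int) -> bool:
        """Account for a write; refuse it if the prefix quota would overflow."""
        prefix = self._match(path)
        if prefix is None:
            return True  # assumed policy: no quota configured, no limit
        quota = self._quotas[prefix]
        if quota.used_bytes + size > quota.limit_bytes:
            return False
        quota.used_bytes += size
        return True


if __name__ == "__main__":
    table = QuotaTable()
    table.set_quota("/atlasdatadisk", 100)
    print(table.try_write("/atlasdatadisk/f1", 60))  # fits within the limit
    print(table.try_write("/atlasdatadisk/f2", 50))  # would exceed the limit
```

Because the prefix is derived from the destination path on the server side, the client does not have to supply any accounting information, which is the contrast with spacetokens drawn in the slide.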

26 DPMRest … when?
- The idea is to start in a few weeks and deliver in Q4/2016
- As you may know, CERN IT is undergoing a “reorganization”, and our team goes into the Storage group together with the EOS team
  - This will likely have little impact on DPM in the short term
  - Our simplification plans stay

27 Conclusion
- It has been a great workshop, with very intense participation. THANKS!!
- DPM goals: flexibility, low maintenance, ease of setup
- Many discussions around moving from YAIM to standalone Puppet
- Goal: native frontends implement the protocols
- 1.8.9 was a good release, 1.8.10 is another step forward. It’s time for the last mile of rationalization
- It’s time to address historical topics:
  - Space reporting
  - SRM-less operation
  - Flexible checksums
  - Possibility to work with (maybe as?) file caches
- Next step: use REST tech, exportable to other systems, easy for others to understand
- Performance: the past work was appreciated. Meets the CMS requirements, planned improvements for the AAA case
- Tuning hints: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/TuningHints

