1
Tier-2 Workshop
Kevin Thaddeus Flood, University of Wisconsin
19 April 2018
2
Tier-2 Workshop
Tier-2 workshop on Thursday, 16:00-18:00 (40-2-A01)
- Users/admins roundtable discussion
  - Identify/address problems
  - Share solutions
  - Standardize “best practices”
- Several EWK users/admins shared their experiences
  - Frank Wuerthwein
  - Dmytro Kovalskyi
  - Ezio Torassa
  - Pablo Garcia
  - Kalanand Mishra
- Thanks!
3
Tier-2 User Feedback
EWK user experience is generally very positive
- Able to successfully run/manage large numbers of jobs, merge output files, efficiently generate physics
  - submit jobs through both CRAB and local batch manager
  - publish output datasets to DBS
- T2s have good login pools, large RAM, fast (and multi-core) machines
  - shorten compile/link times
- PhEDEx dataset transfer tools are easy to use
- Fast network links between user sites and T2s
4
Tier-2 User Feedback
Some issues still to be addressed
- Batch priorities sometimes seem out of balance: what is the optimal assignment of batch slots when competing tasks collide?
- Widely varying job failure rates across sites
  - Job failure rate scales with number of input events, perhaps due to memory leaks, with more failures where less memory is available; more of a problem with earlier releases before 2_1_X
- Difficult to debug failures in writing output files for remote jobs back to user facility (FNAL user problem, so likely T2 issue)
- No efficient/easy method for user file transfer offsite from T2 login pool disk
  - “copy [file] into my /store/user at UCSD, then lcg-cp it into castor, and back out onto my desktop [at CERN]”
- Grid tools for managing /store/user are difficult; much easier when it is possible to manage the area interactively from the login pool
- No standardized Linux installation at T2s
  - favorite/useful tools missing in some cases
  - out-of-date/insecure tools in some cases (Firefox 1.5!!)
5
Tier-2 Admin Issues
- Need standard procedures to link PAG/POG users to T2 admins for routine and special communication
  - Something more than announcement/complaint lists
- Publishing datasets works well for storage management, but unpublished datasets provide an unregulated storage “backdoor”
  - “30% of the used disk space in Legnaro is written with this "back door". Is difficult to follow what people [are] doing, it will be a hard work to clean DBS and files when this usage will become too large.”
- No good replication/backup options for dCache
  - There are two, but they don’t scale well to large numbers
- Connecting problematic batch jobs (e.g., large wall time, small CPU cycles) to particular users can be difficult
- Could use more info on how to use VOMRS/siteDB
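The point about problematic batch jobs (large wall time, small CPU cycles) can be illustrated with a minimal sketch. It assumes the site can already export per-job accounting records containing a user name, wall-clock seconds, and CPU seconds; the `JobRecord` type, the `flag_inefficient_jobs` helper, and both default cutoffs are hypothetical and not taken from the talk or from any particular batch system.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical accounting record; field names are assumptions."""
    user: str
    wall_seconds: float
    cpu_seconds: float


def flag_inefficient_jobs(jobs, min_wall_seconds=3600.0, max_efficiency=0.1):
    """Group long-running, low-CPU jobs by user.

    A job is flagged when its wall time is at least `min_wall_seconds`
    and its CPU/wall ratio is at most `max_efficiency`. Both cutoffs
    are illustrative defaults, not recommended values.
    """
    flagged = {}
    for job in jobs:
        # Skip short jobs and records with no usable wall time.
        if job.wall_seconds <= 0 or job.wall_seconds < min_wall_seconds:
            continue
        efficiency = job.cpu_seconds / job.wall_seconds
        if efficiency <= max_efficiency:
            flagged.setdefault(job.user, []).append(job)
    return flagged


# Made-up records for illustration only.
example_jobs = [
    JobRecord("alice", 7200.0, 100.0),   # long wall time, almost no CPU
    JobRecord("bob", 7200.0, 7000.0),    # long but efficient
    JobRecord("alice", 60.0, 1.0),       # too short to judge
]
flagged_by_user = flag_inefficient_jobs(example_jobs)
```

Grouping by user gives the admin a direct contact list, which is the link the slide says is hard to establish; how the records are obtained from the batch system is left open here.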
6
Tier-2 PAG/POG Convenor Feedback
- Lots of spare CPU cycles available for private production, but data management tools are difficult for users/admins
- Analysis vs. production queues need balance in favor of physics users rather than production
  - in Wisconsin, production sleeps when there are pending analysis jobs
- Need management tools for PAG/POG convenors
  - available/consumed storage reports generated periodically and at thresholds (80/90/95/99% full)
  - formal accounting for the officially allocated storage space (exact specification of what counts against storage cap)
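The request for storage reports at thresholds (80/90/95/99% full) could be sketched roughly as follows. This assumes used and allocated byte counts for a group's storage area are already available from some site tool; the function names are hypothetical, and what counts against the cap is exactly the open accounting question raised above, so it is not modeled here.

```python
# Threshold percentages as listed on the slide.
THRESHOLDS = (80, 90, 95, 99)


def usage_percent(used_bytes, allocated_bytes):
    """Return storage usage as a percentage of the allocation."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    return 100.0 * used_bytes / allocated_bytes


def crossed_thresholds(previous_pct, current_pct, thresholds=THRESHOLDS):
    """List thresholds passed since the previous check.

    A threshold is reported once, when usage moves from below it to at
    or above it, so periodic checks do not repeat the same alert.
    """
    return [t for t in thresholds if previous_pct < t <= current_pct]


# Made-up numbers: usage grew from 78% to 91% between two checks.
alerts = crossed_thresholds(78.0, 91.0)
```

Comparing against the previous reading keeps threshold alerts separate from the periodic report, which would simply print `usage_percent` on a schedule.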