Accelerating Debugging In A Highly Distributed Environment CHEP 2015 OIST Okinawa, Japan April 28, 2015 Andrew Hanushevsky, SLAC
April 14, 20152CHEP 2015 Motivation Deploying a highly distributed system It’s federated across dozens of sites Sites are administratively independent The system competes with other site priorities Getting site attention can sometimes be difficult Site expertise is highly variable Sites span almost all time zones How do you resolve a site problem?
April 14, 20153CHEP 2015 Problematic Approaches ssh to the site’s problem node Multiple policies, accounts, password, etc This is not scalable May not even be given sufficient access the site admin Painfully slow iterative process The problem may disappear before you are done There must be a better way
April 14, 20154CHEP 2015 What Do You Really Need? Most of the time resolution simply needs Ability to look at the log files Check the configurations files Verify system settings Perhaps debug with a core file Remote access without site peculiarities That is, independent of the site’s setup File naming and locations
April 14, 20155CHEP 2015 Remote Debugging Is Not New Almost all browsers have this capability Standardization being attempted Windows remote desktop Remote gdb via gdbserver There are many more available schemes Usually overly intrusive Require expertise on both ends to use
April 14, 20156CHEP 2015 XRootD Minimalist XRootD Approach digFS An exportable pseudo file system - digFS Uniformly lays out all important information Site admin has complete control What information to export Who can actually see what Using strong authentication & authorization rules Detail logging of accesses and denials Information is strictly read-only Nothing in the site can be changed details
April 14, 20157CHEP 2015 digFS digFS Authorization digFS digFS consults an authorization file Created by the site and specified in config file xrootd.diglib * authfile all [-]conf [-]core [-]logs [-]proc allow gsi host krb5 pwd sss unix g=group h=host n=name o=org r=role ++
April 14, 20158CHEP 2015 core/cmsd core/xrootd digFS The digFS View /=/ logs/cmsd logs/xrootd proc/cmsd proc/xrootd conf conf/etc Virtual exported path (Linux only) (Site specific) Configuration files Core files Log files /proc/self file system All independent of actual Internal layout details
April 14, 20159CHEP 2015 Adding Additional Files Special directive in the config file dig.addconf path [fname] Adds the file pointed to by path to namespace Specifically, appears as /=/conf/etc/fname Defaults to last component of path Useful for supplying site specific information E.g. dig.addconf /etc/init.d/xrootd init.d.xrootd
April 14, CHEP 2015 Deployment Not A Slam-Dunk Firewalls may interfere Standard proxies are of little help Looking to see what can be done here Resistance to even minimalistic scheme Some sites forbid any kind of remote access Some admins dislike any kind of intrusion But, whatever we get is better than nothing
April 14, CHEP 2015 What The Brave Would Like Additional useful abilities… Change certain directive settings Tracing; perhaps others for a specific duration Generate a gcore or pstack Restart the daemon(s) These would be subject to site admin rules
April 14, CHEP 2015 Acknowledgements Current Software Contributors ATLAS: Doug Benjamin, Patrick McGuigan, Ilija Vukotic CERN: Lukasz Janyst, Andreas Peters, Sebastien Ponce, Elvin Sindrilaru Fermi: Tony Johnson Root: Gerri Ganis SLAC: Andrew Hanushevsky, Wilko Kroeger, Daniel Wang, Wei Yang UCSD: Alja Mrak-Tadel, Matevz Tadel UNL: Brian Bockelman WLCG: Mattias Ellert, Fabrizio Furano, David Smith US Department of Energy Contract DE-AC02-76SF00515 with Stanford University Now For The Demo!