RAL Tier A Status
Tim Adye, Rutherford Appleton Laboratory
BaBar UK Collaboration Meeting, Liverpool, 11th April 2003

BaBar Batch CPU Use at RAL

BaBar Batch Users at RAL (running at least one non-trivial job each week)

Kanga Disk Saga
In December we had filled up all ~20 TB at RAL.
Freed up some space by deleting (most) old Series-8 data, and started importing the backlog.
A minor upgrade of our old data server, csfsun02, on 19 Feb caused a major loss of data.
Recovered:
  1.3 TB scavenged from csfsun02 disks
  1.4 TB re-imported from SLAC disk
  0.3 TB restored from SLAC HPSS
Halfway through the recovery, we discovered that csfsun02 was still bad. All data were migrated to borrowed servers.
All Kanga data were restored and up to date with SLAC production on 28 March.

Security Incident
The SucKIT Linux root exploit has been spreading throughout the HEP community.
An infected machine records all passwords typed on that machine.
  This includes passwords used to connect to other machines.
  ssh is included; fortunately klog is not.
CSF passwords may well have been compromised via another system.
To protect CSF from further attack, all passwords that had been used recently were reset on Tuesday.
  Users contacted by phone and post.
  I can give you your new password today.

Linux Upgrade
Nearly all machines at RAL now run RedHat 7.2. The exceptions are:
  the babar-old.gridpp.rl.ac.uk front-end (AKA csfc), which will be switched off next week
  the babarbuild batch queue
RH72 batch workers can run RH6 jobs, but RH72 machines can't build code in release analysis-13 and earlier, so either:
  upgrade to analysis-13b or later, or
  use the babarbuild queue to compile and link, then run in the normal queues

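The compile-then-run split for older releases could be scripted as two submissions. The following is a minimal sketch, assuming that `bbrbsub` accepts the standard qsub `-q` option to select a queue. The script names `build.sh` and `run.sh` and the helper `build_then_run` are hypothetical, and nothing is actually submitted.

```python
def build_then_run(build_script, run_script, build_queue="babarbuild"):
    """Return two command lines: build in build_queue, run in the normal queues.

    Hypothetical helper: it only constructs the argument lists.
    """
    # Compile and link where RH6-era releases can still be built.
    build_cmd = ["bbrbsub", "-q", build_queue, build_script]
    # No -q option here, so the job goes to the normal (default) queues.
    run_cmd = ["bbrbsub", run_script]
    return build_cmd, run_cmd


build_cmd, run_cmd = build_then_run("build.sh", "run.sh")
print(" ".join(build_cmd))
print(" ".join(run_cmd))
```

The run job should only be submitted once the build job has finished and the executable exists.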
CSF Batch System
Much work behind the scenes on reliability and on optimising the queuing algorithms.
Use bbrbsub to submit, e.g.
  bbrbsub -l cput=01:00:00 BetaApp myAnalysis.tcl
bbrbsub is a wrapper for qsub, so you can use qsub options (see "man qsub").

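The submission command on this slide can also be generated from a script. The following is a minimal sketch, not part of the RAL tools: the helpers `format_cput` and `bbrbsub_command` are hypothetical. Only the `-l cput=HH:MM:SS` option and the `BetaApp myAnalysis.tcl` job come from the slide. The code formats a CPU-time limit and builds the command line without submitting anything.

```python
import shlex


def format_cput(seconds):
    """Format a CPU-time limit given in seconds as HH:MM:SS (e.g. 01:00:00)."""
    if seconds <= 0:
        raise ValueError("CPU-time limit must be positive")
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def bbrbsub_command(job, cput_seconds, extra_options=()):
    """Build the bbrbsub argument list (hypothetical helper; nothing is run).

    extra_options are passed through unchanged, since bbrbsub wraps qsub.
    """
    limit = f"cput={format_cput(cput_seconds)}"
    return ["bbrbsub", "-l", limit, *extra_options, *job]


cmd = bbrbsub_command(["BetaApp", "myAnalysis.tcl"], cput_seconds=3600)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a machine where `bbrbsub` is installed, the list could be passed to `subprocess.run(cmd, check=True)` to submit the job.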
Recently Planned Improvements (1): Since November
Install dedicated import-export machines
  Fast (Gigabit) network connection
  Special firewall rules to allow scp, bbftp, bbcp, etc.
  Two new RH72 Linux machines: csfmove01.rl.ac.uk for exports
AFS authentication improvements
  PBS token passing and renewal
  Integrated login (AFS token on login, like SLAC)
  Not yet implemented

Recently Planned Improvements (2): Since November
Objectivity support
  Works now for private federations, but no data import
  First step will be to provide Objy conditions database access
  Objy conditions snapshot installed by Tim Barrass… then we lost our Objy server, csfsun02
Upgrade Suns to Solaris 8 and integrate into PBS
  4 x 4-CPU Solaris 8 systems now available in the babarsol queue, e.g.
    bbrbsub -q babarsol job.sh

Recently Planned Improvements (3): Since November
Support Grid "generic accounts", so special RAL user registration is no longer necessary
  Users without an entry in the grid-mapfile will be assigned to a pool account: babar001, babar002, … babar050
  The pool account is then permanently bound to that certificate DN, so you will always run under the same babar0NN

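The pool-account behaviour described on this slide can be pictured with a small model. This is an illustrative sketch only, not the actual grid middleware: the class `PoolAccountMapper`, the example DNs, and the local account name `localuser` are hypothetical. It assumes that explicit grid-mapfile entries take precedence, and that otherwise the first free account from babar001 to babar050 is handed out and kept for that DN.

```python
class PoolAccountMapper:
    """Toy model of pool-account assignment (hypothetical, in-memory only)."""

    def __init__(self, grid_mapfile=None, pool_size=50):
        # Explicit DN -> local account entries, as in a grid-mapfile.
        self.grid_mapfile = dict(grid_mapfile or {})
        # Unassigned pool accounts: babar001 ... babar050 by default.
        self.free = [f"babar{n:03d}" for n in range(1, pool_size + 1)]
        # Permanent DN -> pool account bindings.
        self.bound = {}

    def account_for(self, dn):
        """Return the local account for a certificate DN."""
        if dn in self.grid_mapfile:
            return self.grid_mapfile[dn]
        if dn not in self.bound:
            if not self.free:
                raise RuntimeError("no free pool accounts left")
            # First use of this DN: bind it to the next free account.
            self.bound[dn] = self.free.pop(0)
        return self.bound[dn]


# Example DNs below are made up for illustration.
mapper = PoolAccountMapper(grid_mapfile={"/C=UK/O=Example/CN=registered": "localuser"})
first = mapper.account_for("/C=UK/O=Example/CN=alice")
second = mapper.account_for("/C=UK/O=Example/CN=bob")
again = mapper.account_for("/C=UK/O=Example/CN=alice")
print(first, second, again)
```

A real service would have to store the bindings persistently so that they survive restarts; the dictionary here only illustrates the rule that a DN always maps to the same babar0NN account.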
Support
For help, post to the "RAL Tier A" HyperNews forum, or contact Emmanuel Olaiya (at SLAC) or me (at RAL).