GridKa Tier1 Report
Christopher Jung
Steinbuch Centre for Computing (SCC)
KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

Steinbuch Centre for Computing, Dr. Christopher Jung – GridKa Cloud Meeting

CPU usage

For April 2010:

ATLAS group   # jobs    walltime [h]   CPU time [h]   CPU time / walltime   average wait time [h]
prd           542,569   1,825,189      1,380,
sgm           2,
plt           319,701   187,583        144,
usr           52,005    157,001        113,
d             18,541    17,199         8,
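
The "CPU time / walltime" column is a plain ratio, but the CPU time figures did not survive intact in this copy of the slide. As a rough sketch only, the snippet below brackets the ratio for the prd group. It assumes that the surviving "1,380," is the leading part of a seven-digit CPU time, so that exactly three trailing digits are missing; that assumption, and the names `efficiency_bounds`, `cpu_prefix` and `missing_digits`, are illustrative and not taken from the slides.

```python
def efficiency_bounds(walltime_h, cpu_prefix, missing_digits=3):
    """Bracket CPU time / walltime when the CPU time is only known by its leading digits.

    ASSUMPTION: the true CPU time is cpu_prefix followed by `missing_digits`
    unknown digits (e.g. 1,380,??? hours). This is not stated in the source.
    """
    scale = 10 ** missing_digits
    cpu_low = cpu_prefix * scale      # unknown digits all 0
    cpu_high = cpu_low + scale - 1    # unknown digits all 9
    return cpu_low / walltime_h, cpu_high / walltime_h


# prd group: walltime is complete on the slide, CPU time is truncated to "1,380,"
prd_low, prd_high = efficiency_bounds(walltime_h=1_825_189, cpu_prefix=1_380)
print(f"prd CPU time / walltime between {prd_low:.4f} and {prd_high:.4f}")
```

Under that assumption the prd ratio is pinned down to within a fraction of a percent, because the missing digits are small compared with the total.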

Space tokens

As of 5th of May, 08:30:

ATLAS space token   size [GB]   used [GB]   free [GB]   usage
DATADISK            600,000     294,571     305,        %
DATATAPE            30,                                 %
GROUPDISK           10,                                 %
HOTDISK             1,                                  %
MCDISK              600,000     539,680     60,         %
MCTAPE              20,                     ,839        0.8%
SCRATCHDISK         80,000      58,462      21,         %
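
For the three disk tokens whose size and used figures are complete, the remaining columns can be recomputed. This is a minimal sketch that assumes free = size - used and usage = used / size; the slide does not spell out these definitions, and the function name `summarize` is made up for illustration. The computed values are derived here, not quoted from the slide.

```python
# (size [GB], used [GB]) as given on the slide for the rows that are complete
tokens = {
    "DATADISK": (600_000, 294_571),
    "MCDISK": (600_000, 539_680),
    "SCRATCHDISK": (80_000, 58_462),
}


def summarize(size_gb, used_gb):
    """Return (free [GB], usage [%]) assuming free = size - used."""
    free_gb = size_gb - used_gb
    usage_pct = 100.0 * used_gb / size_gb
    return free_gb, usage_pct


summary = {name: summarize(size, used) for name, (size, used) in tokens.items()}

for name, (free_gb, usage_pct) in summary.items():
    print(f"{name:12s} free={free_gb:>8,} GB  usage={usage_pct:.1f}%")
```

The recomputed free space starts with the same leading digits that survive in the "free [GB]" column (305, 60 and 21 thousand), which suggests the assumed definitions match the ones used on the slide.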

PBS problems

- Started on 10th of April
- PBS mom daemon hanging on some WNs (could not access /proc)
- PBS server hangs, too
- Implemented a script that automatically restarts PBS when the PBS server hangs
- Script to reboot WNs with kswapd having high CPU load -> did not work, as reboot also needs to access /proc
- Reinstallation of all WNs with new kernel (with hard reboot if necessary) on 15th and 16th of April
- Stable PBS since then
- Possible reasons:
  - Massive NFS problems on 9th of April (might have been caused by raising the ATLAS job limit from 5,000 to 6,000 on 1st of April)
  - kswapd had high CPU load (kernel bug), which causes /proc problems? (also observed by other sites)
- Additionally, we updated PBS to the latest version this Monday
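
The slides do not show the restart script itself. The following is only a sketch of one way such a watchdog could work, not the script used at GridKa: a probe command is run with a timeout, and if it does not return in time the server is treated as hung and a restart command is issued. The function names, the timeout values and the stand-in commands are all assumptions. On a real installation the probe might be something like `qstat -B`, and the restart command is site-specific; neither is taken from the slides.

```python
import subprocess
import sys


def command_hangs(cmd, timeout_s):
    """Return True if `cmd` does not finish within `timeout_s` seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return True
    return False


def watchdog_step(probe_cmd, restart_cmd, timeout_s=60.0):
    """Run one check; issue `restart_cmd` only if the probe hangs.

    Returns True if a restart was triggered. A probe that fails quickly
    (non-zero exit) is NOT treated as a hang in this sketch.
    """
    if command_hangs(probe_cmd, timeout_s):
        subprocess.run(restart_cmd, check=False)
        return True
    return False


# Stand-in commands so the sketch runs without a PBS installation (hypothetical).
fast_cmd = [sys.executable, "-c", "pass"]
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]

print(watchdog_step(slow_cmd, fast_cmd, timeout_s=0.5))   # probe hangs
print(watchdog_step(fast_cmd, fast_cmd, timeout_s=10.0))  # probe returns in time
```

A check like this would typically be run periodically, for example from cron. Note that it shares the weakness described on the slide for the WN reboot script: if the restart command itself depends on the resource that is stuck, it will hang as well.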

June milestone

Major increase!

Questions:
- When will disk be needed?
- All new storage for disk-only?

            April 2009                        June 2010
disk        2,092 TB                          4,035 TB
tape        1,578 TB                          2,990 TB
computing   2843 kSI2k (=11372 HEPSPEC06)     4958 kSI2k (=19832 HEPSPEC06)
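
To put a number on the "major increase", the milestone figures can be turned into growth factors. The sketch below also checks the unit conversion: the factor of 4 HEPSPEC06 per kSI2k is inferred from the slide's own pairs (2843 and 11372, 4958 and 19832) rather than stated there, and the variable names are made up for illustration.

```python
# Conversion factor inferred from the slide's own number pairs (assumption).
HS06_PER_KSI2K = 4

# (April 2009, June 2010) as given on the slide
milestones = {
    "disk_tb": (2092, 4035),
    "tape_tb": (1578, 2990),
    "computing_ksi2k": (2843, 4958),
}

# Growth factor from April 2009 to June 2010 for each resource
growth = {name: new / old for name, (old, new) in milestones.items()}

# Computing capacity converted to HEPSPEC06
computing_hs06 = tuple(HS06_PER_KSI2K * v for v in milestones["computing_ksi2k"])

for name, factor in growth.items():
    print(f"{name:16s} x{factor:.2f}")
print("computing in HEPSPEC06:", computing_hs06)
```

Disk and tape both grow by a little under a factor of two, and computing by roughly three quarters.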