Status of the Production and Nagios news
ALICE TF Meeting 29/07/2010
Status of the production
Since yesterday (28/07/2010) ALICE has had no MC production running
– Raw data reconstruction: Currently running at CERN (LHC10e). Activity decreased during the week
– Analysis trains: Ongoing
– User analysis: Ongoing
– MC production: Finished for the moment. No new MC requests in the pipeline
Job profile this week
Decrease due to the stop of the MC production
Job profile per user
– Production clearly dominated by the MC jobs this week
– As usual, significant user analysis activity this week as well
Raw data transfers and production
– Low raw data transfer activity this week: 1.3 TB of raw data transferred (compatible with the raw data taking regime this week)
– Around 25 TB of raw data recorded in
Status of the sites
T1 sites
– CNAF: The site has been running a very low number of ALICE jobs for more than a week. A GPFS migration caused this problem. The number of jobs is still low today although the operation is finished. The number of jobs should increase in the next hours
– RAL: ALICE is running over the number of assigned resources. The site proposed to put a cap on the number of ALICE jobs at
This is about 25% of the farm, and is around 10 times ALICE's current fairshare allocation (ALICE's current usage is about 65%). This is necessary because the recent high volume of ALICE work caused CMS to run a high-priority workload elsewhere.
Status of the sites
T2 sites
– Subatech: Will be down from tomorrow, Friday, at 16:00 GMT+2 until Monday morning for electrical maintenance
– In addition, some French sites had cooling problems, already solved
– Grenoble: External network will be down on Saturday, July 31st, from 5:30 am until 6:00 pm
– Poznan: SE failed during the week, already solved
– IPNL: CREAM1.6 migration completed
– Torino: CREAM1.6 migration completed
– Madrid: SE failing today. Migration activities ongoing. The CREAM system has already been migrated to CREAM1.6
– Trujillo: Out of production for a long time; in addition, the SE is failing
– LBL: SE failing today
– Small activities at some Russian sites (new host certificates for the VOBOXes)
Pending issues
Issue reported last week:
– Large number of zombie or extremely long jobs running at the sites (over 46h). These were declared pathological jobs which should be killed. Sites were encouraged either to kill those jobs or to decrease the CPU time limit of the ALICE queues to 24h
– No news on this issue during this week
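As a rough illustration of the check described above, the sketch below flags jobs whose running time exceeds a threshold. The `Job` record and the `find_long_jobs` helper are hypothetical names introduced only for this example; how a site actually obtains job running times (batch system query, monitoring) is not covered in the slides and is assumed to be available as plain numbers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record: an identifier and the time it has been running.
    job_id: str
    runtime_hours: float


def find_long_jobs(jobs, threshold_hours=46.0):
    """Return the jobs that have been running longer than threshold_hours."""
    return [job for job in jobs if job.runtime_hours > threshold_hours]


# Made-up sample data, for illustration only.
jobs = [
    Job("a", 3.5),
    Job("b", 47.0),
    Job("c", 30.0),
    Job("d", 120.0),
]

# Jobs over the 46h mark mentioned in the report.
over_46h = [job.job_id for job in find_long_jobs(jobs)]

# Jobs that a 24h CPU time limit on the queue would have cut off.
over_24h = [job.job_id for job in find_long_jobs(jobs, threshold_hours=24.0)]

print(over_46h)
print(over_24h)
```

The threshold is a parameter so that the same helper covers both figures quoted in the slide (46h for the reported jobs, 24h for the proposed queue limit).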
Quattor recipe for the CREAM-CE migration
Thanks to Jerome for these instructions
– Available at: ent&view=article&id=46&Itemid=103
Status of Nagios
SAM will be switched off in September
ALL VOBOXES MUST BE PINGABLE AND ACCESSIBLE FROM samnag014.cern.ch
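A site administrator may want a quick sanity check that a VOBOX accepts incoming connections at all. The sketch below is a minimal TCP reachability test using only the Python standard library. It is an approximation under stated assumptions: it tests from whichever machine runs it, not from samnag014.cern.ch, it does not send an ICMP ping, and the host name and port in the commented example are placeholders, since the slides do not say which ports the Nagios tests use.

```python
import socket


def is_tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (placeholder host and port, not taken from the slides):
# is_tcp_reachable("vobox.example.org", 1234)
```

A firewall may allow the testing machine and still block the Nagios server, so a positive result here is only a first indication; the authoritative check is the one run from samnag014.cern.ch itself.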