
Julia Andreeva on behalf of the Dashboard team. 45 machines (14 dedicated to cms) 9% still on SLC4 (two weeks ago 30%) Services.


1 Julia Andreeva on behalf of the Dashboard team

2 http://dashb-monit.cern.ch
- 45 machines (14 dedicated to CMS)
- 9% still on SLC4 (30% two weeks ago)
- Services migrated during the last 2 weeks: Dashboard web server, build server, GoogleEarth (http://dashb-earth/?vo=cms, only from Windows), SAM availability portal, dashb-nagios-cms-dev, MonALISA collector, …
- All the migrations were transparent (install the new machine and, when it is ready, switch the alias)
- Still to migrate: 2 ML collectors and dashb-sam-cms

3 Needed for ATP and downtime calendar
- Up to now: checking the Dashboard DB for services that had executed SAM tests in the last month and were defined in SiteDB
  - This approach won't work with ATP/MDDB(POEM)/MRS/ACE: CMS is supposed to specify which services to test
  - Not easy for site administrators to spot problems
  - Need someone from the Dashboard team to fix it…
- New approach (thanks to Andrea & Pepe): checking the BDII and SiteDB
  - Independent of the Dashboard DB
  - Easier for site administrators to fix issues
  - Depends on the BDII (got several timeouts while testing it)
  - Introduce a cache in case the BDII is unavailable
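The cache fallback mentioned on this slide could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_from_bdii`, the cache file layout, and the `max_age_s` limit are hypothetical names and choices, not part of the Dashboard code.

```python
import json
import time
from pathlib import Path


def load_topology(fetch_from_bdii, cache_path, max_age_s=24 * 3600):
    """Return (services, source), preferring a live BDII query.

    fetch_from_bdii is a hypothetical callable that returns a list of
    service records and raises TimeoutError or ConnectionError when the
    BDII does not answer.
    """
    cache = Path(cache_path)
    try:
        services = fetch_from_bdii()
    except (TimeoutError, ConnectionError):
        # BDII unavailable: fall back to the last good result, if any.
        if not cache.exists():
            raise
        cached = json.loads(cache.read_text())
        # Refuse a cache that is older than the allowed age.
        if time.time() - cached["saved_at"] > max_age_s:
            raise
        return cached["services"], "cache"
    # Live query succeeded: refresh the cache for the next outage.
    cache.write_text(json.dumps({"saved_at": time.time(), "services": services}))
    return services, "bdii"
```

Returning the source next to the data lets the caller show whether the topology came from a live query or from the cache.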

4 Up to now, downtime announcements were verified once: if a downtime affected CMS services, it was inserted in the DB
- This does not work if the topology changes, for instance when services are retired (CERN in downtime until 2020)
- Now, we compare active and future downtimes with the current topology
- The 'future downtimes' will change if the services no longer belong to CMS
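The comparison of downtimes against the current topology can be sketched as a filter that is re-evaluated on every run rather than once at announcement time. The record fields (`host`, `start`, `end`) and the function name are assumptions made for illustration.

```python
from datetime import datetime


def relevant_downtimes(downtimes, cms_hosts, now):
    """Keep active and future downtimes whose host is in the current topology.

    downtimes: list of dicts with hypothetical keys "host", "start", "end".
    cms_hosts: set of host names currently belonging to CMS.
    """
    return [
        d for d in downtimes
        # end > now covers both active (already started) and future downtimes.
        if d["end"] > now and d["host"] in cms_hosts
    ]
```

With this shape, a long downtime declared for a retired service drops out as soon as the host leaves the topology, with no manual fix in the database.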

5 Progressing well
- JobExitCode and job execution metrics (CPU, I/O, etc.) are not yet reported from the WN for all WMAgent jobs, but they are already reported for some tasks
- Job execution metrics are not yet recorded in the Dashboard DB; this is on the to-do list
- We have to define with the dataops team how job execution metrics should be exposed, including the aggregation level (workflow?)

6 [Plots per site: KIT, PIC, CNAF, FNAL, RAL, ASGC]

7 Pretty good consistency over certain periods of time
- There are periods when Dashboard sees more jobs, because WMAgent test tasks are not seen in RequestOverview
- There also seem to be tasks that do not report to Dashboard (WMAgent sees more jobs than Dashboard does)
- In general, all differences appear to be explained by some tasks not being visible in one system or the other
- We will continue to keep an eye on the consistency of information in both sources
- Thanks a lot to Olga Kodolova for implementing the checks and to Steve Foulkes for guidance and collaboration
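A per-task comparison of the two sources might be sketched as follows. This is not the check that was actually implemented; the input shape (a mapping from task name to job count) and the function name are assumptions.

```python
def compare_job_counts(dashboard, wmagent):
    """Compare per-task job counts from two sources (dicts: task -> count).

    Returns tasks seen only by Dashboard, tasks seen only by WMAgent,
    and tasks seen by both with different counts.
    """
    only_dashboard = sorted(set(dashboard) - set(wmagent))
    only_wmagent = sorted(set(wmagent) - set(dashboard))
    mismatched = {
        task: (dashboard[task], wmagent[task])
        for task in set(dashboard) & set(wmagent)
        if dashboard[task] != wmagent[task]
    }
    return only_dashboard, only_wmagent, mismatched
```

Separating tasks that are missing from one side from tasks whose counts differ matches the two kinds of discrepancy described on the slide.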

