Published by Toby Sherman. Modified over 8 years ago.
1
Operational support for Tier1
P. Matteuzzi
2
Operational support for Tier1
Controls of:
  a) Infrastructural problems
  b) Software problems

a) Infrastructural problems are detected by the "alarm system" (http://alarm.cnaf.infn.it/webclient/accwebmgr.asp).
It allows control of: refrigerator, UTA (air handling unit), UPS, electric generator, transformer, temperature in the T1 room, access to the room, fire extinguishing system.
Alarms:
  minor alarms are sent by mail and SMS
  major alarms are sent by telephone call
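The alarm routing described on this slide can be sketched in a few lines of Python. This is only an illustration, not the actual alarm system: the names `Severity`, `CHANNELS` and `channels_for` are hypothetical, and only the severity-to-channel mapping comes from the slide.

```python
from enum import Enum
from typing import Tuple


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"


# Mapping taken from the slide: minor alarms go out by mail and SMS,
# major alarms by telephone call. The channel names are illustrative.
CHANNELS = {
    Severity.MINOR: ("mail", "sms"),
    Severity.MAJOR: ("telephone",),
}


def channels_for(severity: Severity) -> Tuple[str, ...]:
    """Return the notification channels used for an alarm of this severity."""
    return CHANNELS[severity]
```

For example, `channels_for(Severity.MAJOR)` returns `("telephone",)`.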
3
Operational support for Tier1
b) The Nagios system allows control of: switches, geographical links, log servers, DNS, mailing, storage servers, farming.
Alarms are sent by mail and SMS.

We are now organizing the service in this way:
For item a):
  ~30 persons (from all services, including Grid) on weekly shifts throughout the year; each person is on call 24 hours a day for 2 weeks per year
  For minor alarms: action as soon as possible
  For major alarms: immediate action (within 30 minutes)
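The "2 weeks/year for each person" figure for item a) can be checked with simple arithmetic. The sketch below assumes a 52-week year and an even round-robin rotation among exactly 30 people; the function name `weeks_per_person` is hypothetical and the real shift plan may differ.

```python
from typing import List


def weeks_per_person(n_people: int, n_weeks: int = 52) -> List[int]:
    """Spread n_weeks of weekly shifts as evenly as possible over n_people."""
    base, extra = divmod(n_weeks, n_people)
    # The first `extra` people take one week more than the others.
    return [base + 1 if i < extra else base for i in range(n_people)]


shifts = weeks_per_person(30)
print(max(shifts), sum(shifts))
```

With 30 people nobody needs more than 2 weeks on call per year, and some cover only 1, which is consistent with the slide's estimate.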
4
Operational support for Tier1
For item b):
  ~20 persons (synergy with the Grid team required)
  Direct control during working hours
  On-call outside working hours
  Action via remote access over ADSL
  Problem resolution on a best-effort basis
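The split between direct control and on-call coverage for item b) could be expressed as a small function. The working hours used here (Monday to Friday, 09:00 to 18:00) are an assumption made for illustration only; the slides do not state them, and `support_mode` is a hypothetical name.

```python
from datetime import datetime, time

# Assumed working hours; not given in the slides.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def support_mode(now: datetime) -> str:
    """Return which support mode applies at the given moment."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = WORK_START <= now.time() < WORK_END
    if is_weekday and in_hours:
        return "direct control"
    return "on-call (remote access)"
```

Public holidays are ignored in this sketch; a real schedule would need a holiday calendar as well.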
5
Tools for operational support for Tier1
  Alarm system
  Monitoring system for the farm and LSF queues
  Monitoring system for storage and Castor (by Nagios)
  Monitoring system for the network (by Nagios)
6
Human resources for the infrastructure problems
Now: 1 coordinator and 1 technician (strongly insufficient: a big effort is needed to maintain the current system and to study solutions to optimize the entire infrastructure)
Trend: 1 coordinator, 2 technicians + 2 temporary contracts (in future we must control more than 3.5 kW, 3 transformers, 3 electric generators, 3 UPS, refrigerator for 1.5 kW of refrigeration power)