ALMA Integrated Computing Team
ICT Coordination and Planning Meeting #2, Santiago, January 2014
Alarm system
A. Caproni

ICT-CPM January 2014
Alarm system status
- According to operators the alarm panel is useless
  - Too many alarms
  - Stale alarms
  - False alarms
- Result of a 4 h profiling by Patricio (mid Nov 2013): ~31k alarms
    ACTIVE  TERMINATE
    Pri 0: 41
    Pri 1: 1820
    Pri 2: 500
    Pri 3:
- Insufficient coverage: scripts and tools not provided by ALMA computing
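
A profiling run like the one summarized above comes down to counting alarm state changes per priority and per state. The following is a minimal sketch in Python, not the script actually used: the `AlarmEvent` record layout, the function names and the sample triplets are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class AlarmEvent(NamedTuple):
    """One alarm state change (hypothetical record layout)."""
    triplet: str    # "family:member:code"
    priority: int   # 0..3
    state: str      # "ACTIVE" or "TERMINATE"


def profile(events: Iterable[AlarmEvent]) -> Counter:
    """Count events per (priority, state) pair."""
    return Counter((e.priority, e.state) for e in events)


def busiest(events: Iterable[AlarmEvent], n: int = 3) -> list:
    """Return the n triplets with the most state changes."""
    return Counter(e.triplet for e in events).most_common(n)


# Made-up sample data, for illustration only.
SAMPLE = [
    AlarmEvent("DGCK:DA41:1", 1, "ACTIVE"),
    AlarmEvent("DGCK:DA41:1", 1, "TERMINATE"),
    AlarmEvent("DGCK:DA41:1", 1, "ACTIVE"),
    AlarmEvent("WCA:DV02:1", 2, "ACTIVE"),
]
```

Calling `profile(SAMPLE)` gives the per-priority, per-state counts, and `busiest(SAMPLE)` lists the triplets that change state most often, which is a natural starting point when looking for alarms to fix.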

Snapshot - 1
Snapshot - 2
Snapshot - 3

AS improvement plan (proposal)
- Show only “real alarms”, remove the others (trust)
- Useful documentation in panel (twiki?)
- Fix most chattering alarms
  - DGCK:*:1, DGCK:*:4
  - FLOOG,*,7
- Fix stale alarms
  - Manager,*,1
  - LO2BBpX:*:1, LO2BBpX:*:10, LO2BBpX:*:11
  - WCA:*:1
- Improve system startup and device initialization
- Profile during operations like array creation/destruction, total power…
- TMCDB configuration (input from System Engineering for BACI props)
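
The chattering and stale alarms listed above could be picked out of a recorded event stream roughly as follows. This is a sketch under stated assumptions: the event tuple layout, the function names and both thresholds (`chatter_min`, `stale_after`) are hypothetical, and triplets are assumed to be written with colon separators such as `DGCK:*:1`.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def matches(triplet: str, pattern: str) -> bool:
    """Match a triplet against a wildcard pattern such as 'DGCK:*:1'."""
    return fnmatchcase(triplet, pattern)


def classify(events, end_time, chatter_min=10, stale_after=3600.0):
    """Split triplets into chattering and stale ones.

    events: iterable of (timestamp_seconds, triplet, active) tuples,
            in time order.
    chatter_min and stale_after are arbitrary illustrative thresholds.
    """
    changes = defaultdict(int)   # state changes seen per triplet
    last = {}                    # triplet -> (timestamp, active) of last change
    for ts, triplet, active in events:
        prev = last.get(triplet)
        if prev is None or prev[1] != active:
            changes[triplet] += 1
            last[triplet] = (ts, active)
    # Chattering: many state changes within the observed period.
    chattering = sorted(t for t, n in changes.items() if n >= chatter_min)
    # Stale: still active and unchanged for at least stale_after seconds.
    stale = sorted(
        t for t, (ts, active) in last.items()
        if active and end_time - ts >= stale_after and t not in chattering
    )
    return chattering, stale
```

The result of `classify` can then be filtered with `matches` to check whether a given family, for example `DGCK:*:1`, still shows up after a fix.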

AS improvement plan (proposal)
- ACS next improvements
  - Alarm server to dump alarms on files (ICT-1908)
    - Offline profiling
    - Correlate alarms and logs while debugging (?)
    - After the fact
- GUIs and tools
  - Alarm panel to group alarms belonging to the same array (ICT-1760)
- Nominate an “Alarm System Manager”
  - Regularly profile the AS
  - Check and update the documentation

Scalability
- ACS handed over to OSF after fixing persistence and NCs
- RTI/DDS tested with 48 antennas
- Number of alarms expected to grow with more antennas
- Alarm system performance
  - AS persists alarms in memory
  - Already decoupled from source NC
- ACS “new” AlarmSource API
  - Avoids resending an alarm if its state did not change
  - Enable/disable alarm sending
  - Queuing of alarms
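
The three source-side behaviours named above (no resend on unchanged state, enable/disable, queuing) can be illustrated with a small Python sketch. This is not the ACS AlarmSource API: the class `DedupAlarmSource`, its method names and the `send` callback are hypothetical, and the choice to treat a never-sent alarm as inactive is an assumption of this sketch.

```python
from typing import Callable, Dict, Tuple

Triplet = Tuple[str, str, int]          # (fault family, fault member, fault code)
Sender = Callable[[Triplet, bool], None]


class DedupAlarmSource:
    """Illustrative source-side filter; not the ACS AlarmSource API."""

    def __init__(self, send: Sender) -> None:
        self._send = send
        self._sent: Dict[Triplet, bool] = {}      # last state actually sent
        self._pending: Dict[Triplet, bool] = {}   # latest state while queuing
        self._enabled = True
        self._queuing = False

    def set_alarm(self, triplet: Triplet, active: bool) -> None:
        if not self._enabled:
            return                                # sending disabled: drop
        if self._queuing:
            self._pending[triplet] = active       # keep only the latest state
            return
        self._emit(triplet, active)

    def _emit(self, triplet: Triplet, active: bool) -> None:
        # Assumption: an alarm that was never sent counts as inactive.
        if self._sent.get(triplet, False) == active:
            return                                # unchanged state: no resend
        self._sent[triplet] = active
        self._send(triplet, active)

    def start_queuing(self) -> None:
        self._queuing = True

    def flush(self) -> None:
        """Stop queuing and send the net state change of each queued alarm."""
        pending, self._pending = self._pending, {}
        self._queuing = False
        for triplet, active in pending.items():
            self._emit(triplet, active)

    def enable(self) -> None:
        self._enabled = True

    def disable(self) -> None:
        self._enabled = False
```

Queuing followed by `flush` collapses a burst of activations and terminations, for example during device initialization, into at most one message per alarm, which is the property that matters as the number of antennas grows.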