Technical Services: Unavailability Root Causes, Strategy and Limitations
Data and presentation in collaboration with Ronan LEDRU and Luigi SERIO
Introduction
- What we do today and how
- Systems monitored
- Fault recording
- Technical Infrastructure Organizational Committee
Analysis of events in 2016 and comparison with previous years
- Unavailability causes
- Shares of the faults
- Major contributors and reasons
- Electrical perturbations
Strategy to improve the analysis, fault tracking and root-cause identification
- Do more with AFT (Accelerator Fault Tracking)?
- Better classification of events?
Evian 2016 - Technical Infrastructure Jesper Nielsen – Luigi Serio
Technical Infrastructure: which systems do we monitor?
- Cooling
- Electricity
- Safety systems
- Access system
- IT network
- Vacuum
- RF
- Power converters
- QPS
- Collimation
- Controls for accelerators
- Etc.
Better classification: can we match events with AFT more easily?
Failure types (example: Ventilation):
- Mechanical
- PLC
- Human error
- Maintenance
- Instrumentation
Systems: Demi Water, Cooling & Ventilation, Cooling, Chilled Water, Access, Primary Water, Electricity
The “owner” can be EN-CV or another group.
AFT faults: are these systems really systems?
- Some “systems” are groups of “systems”.
- Some “systems” are very low level.
- Should we agree on a common standard for what counts as a system?
- Can we introduce “groups” of systems? Examples: Technical Infrastructure, Cooling, Ventilation, Doors.
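One way to picture “groups” of systems is a tree in which every system points to the group that contains it. Downtime can then be rolled up so that systems are compared at the same level. The sketch below is a hypothetical illustration. The names mirror the slides, but the `PARENT` hierarchy (for example, placing Doors under Access) and the `rollup` and `ancestor_at_depth` helpers are assumptions. They are not an existing AFT feature.

```python
from collections import defaultdict

# Hypothetical hierarchy: each system points to the group containing it.
# Names are taken from the slides; the parent links are an illustrative assumption.
PARENT = {
    "Ventilation": "Cooling",
    "Doors": "Access",
    "Cooling": "Technical Infrastructure",
    "Access": "Technical Infrastructure",
}


def ancestor_at_depth(system: str, depth: int) -> str:
    """Return the ancestor of `system` at `depth` (0 = top-level group).

    If the system sits higher in the tree than `depth`, the system itself is returned.
    """
    path = [system]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # top-level group first
    return path[min(depth, len(path) - 1)]


def rollup(downtime_by_system: dict[str, float], depth: int) -> dict[str, float]:
    """Sum downtime (hours) after mapping every system to the same tree level."""
    totals: dict[str, float] = defaultdict(float)
    for system, hours in downtime_by_system.items():
        totals[ancestor_at_depth(system, depth)] += hours
    return dict(totals)
```

For example, `rollup({"Ventilation": 2.0, "Doors": 1.0, "Cooling": 0.5}, depth=1)` groups the faults under Cooling and Access. With `depth=0`, everything is summed under Technical Infrastructure.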
Major events: where are they followed up? What is the TIOC?
Participants: equipment groups, experiment coordinators, Technical Infrastructure, machine coordination.
Mandate:
- Monitor, record and analyse events related to the infrastructure systems serving the accelerator complex, the experiments and the computer centre.
- Recommend consolidation paths to correct situations arising from reduced maintenance, non-conformities or weaknesses of the technical infrastructure.
- Coordinate major technical interventions and incidents.
How is a fault recorded?
- When an accelerator is stopped by a technical fault, a Major Event is created.
- A Major Event has one or more “facility stops” attached.
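The recording rule above maps naturally onto a parent/child data structure: one Major Event, with one or more facility stops attached. Below is a minimal sketch. It assumes hypothetical `MajorEvent` and `FacilityStop` classes, with start and end timestamps on each stop. It is not the actual schema of the Technical Infrastructure tools.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FacilityStop:
    """One facility stopped by the fault (hypothetical schema)."""
    facility: str
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


@dataclass
class MajorEvent:
    """A technical fault that stopped at least one accelerator (hypothetical schema)."""
    description: str
    stops: list[FacilityStop] = field(default_factory=list)

    def attach(self, stop: FacilityStop) -> None:
        """Attach a facility stop, rejecting stops that end before they start."""
        if stop.end < stop.start:
            raise ValueError("facility stop ends before it starts")
        self.stops.append(stop)

    def is_valid(self) -> bool:
        # Rule from the slide: a Major Event has one or more facility stops.
        return len(self.stops) >= 1

    def downtime_by_facility(self) -> dict[str, timedelta]:
        """Total stop duration per facility for this event."""
        totals: dict[str, timedelta] = {}
        for s in self.stops:
            totals[s.facility] = totals.get(s.facility, timedelta()) + s.duration
        return totals
```

A usage example: create `MajorEvent("cooling trip")`, then `attach` one `FacilityStop` for each machine that went down. `is_valid()` stays false until at least one stop has been attached.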
Are Major Events followed up?
- All Major Events are presented at the weekly TIOC meeting.
- Each group involved writes a report: the “responsible group” and the impacted “users”.
How are the faults distributed?
Faults by failure type (two pie charts):
- Chart 1: perturbations 46%, equipment faults 31%, controls and instrumentation 12%
- Chart 2: equipment faults 45%, controls and instrumentation 26%, perturbations 15%
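The percentages above are shares of a total, split by failure type. The slide does not say whether the charts weight faults by count or by downtime. This minimal sketch therefore supports both. The `(failure_type, downtime_hours)` record layout is a hypothetical assumption.

```python
from collections import Counter


def shares(records: list[tuple[str, float]], by_downtime: bool = True) -> dict[str, float]:
    """Percentage share per failure type, rounded to one decimal.

    records: (failure_type, downtime_hours) pairs (hypothetical layout).
    by_downtime=True weights each fault by its downtime; otherwise each fault counts once.
    """
    weights: Counter = Counter()
    for failure_type, hours in records:
        weights[failure_type] += hours if by_downtime else 1
    total = sum(weights.values())
    if total == 0:
        return {}
    return {k: round(100 * v / total, 1) for k, v in weights.items()}
```

The two weightings can rank categories differently. Many short faults dominate a count-based chart, while a few long ones dominate a downtime-based chart. This is one possible reason why two breakdowns of the same faults can disagree.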
Fault downtimes in Controls and Instrumentation
- PLC: 67% less downtime than in 2015!
- Calibration: water circuits were not calibrated at the best time (valves were opened for EPC after calibration).
- Software: IT router IP table update, and a BE-CO front-end crash.
- Electronics: front-end power supply, and internal voltage too low in an access rack.
- No communication = unhappy CRYO operators and very long downtimes…
Fault downtimes in equipment faults
- Many faults were caused by short circuits in other equipment.
- Calibration: can we better coordinate restarts?
- Not suited for usage: can we integrate reliability into the project phase?
More or less downtime compared to 2015?
- Was CRYO less impacted? Down by 30%.
Electrical perturbations
(Chart of perturbation downtime; values shown: 23 hours, 35 hours, 3 hours.)
- If we filter out all “perturbations” with a voltage dip below 10% that stop only the LHC, we reduce the total by 30%.
- Most of these are assigned to FMCM equipment. Is this related to beam intensity?
- These types of events have been seen since the second half of 2015. More time in Stable Beams?
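The filtering step can be written as a simple predicate over perturbation records. Below is a sketch. It assumes each hypothetical `Perturbation` record carries its voltage dip in percent, the set of machines it stopped and the resulting downtime. Measuring the reduction in downtime hours, rather than in number of events, is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perturbation:
    """Hypothetical record of one electrical perturbation."""
    voltage_dip_pct: float   # depth of the voltage dip, in percent
    stopped: frozenset       # names of the machines stopped by the perturbation
    downtime_h: float        # resulting downtime, in hours


def is_minor_lhc_only(p: Perturbation, dip_threshold_pct: float = 10.0) -> bool:
    """True for perturbations with a dip below the threshold that stopped only the LHC."""
    return p.voltage_dip_pct < dip_threshold_pct and p.stopped == frozenset({"LHC"})


def reduction_pct(events: list[Perturbation]) -> float:
    """Percentage of total downtime removed by filtering out minor LHC-only events."""
    total = sum(p.downtime_h for p in events)
    removed = sum(p.downtime_h for p in events if is_minor_lhc_only(p))
    return 100 * removed / total if total else 0.0
```

Keeping the threshold as a parameter makes it easy to check how sensitive the result is to the 10% cut.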
Are thunderstorms related to the number of events seen at CERN?
A general report from French analytics:
- 2016: 19% above normal in instability.
- 2015: among the most stable years of the last 30 years.
Conclusion
- TIOC coordination has proven very effective:
  - Coordinating events such as the EDF 400 kV intervention minimized downtime.
  - Best-effort on-call support in case of emergencies.
- Good follow-up was done in several cases this year:
  - Animal protection in HV areas.
  - GSM network follow-up and many improvements.
  - UJ33 flooding: racks installed higher, release valve modifications, IP67 components.
- We want to classify our events better and would like to use the AFT tool:
  - Compare systems at the same level.
  - Make synchronization possible: if an event is assigned to Technical Infrastructure, we would like a way to attach it to our Major Event.
- Perturbations: we cannot avoid them, but we can reduce their impact.
- Less downtime in 2016 than in 2015 (if we disregard the long downtime from the weasel incident).