Lessons Learned – SNS Linac PPS Phased Upgrade to Safety PLC
John Bombard, Protection Systems Engineer
Spallation Neutron Source
Overview
Summary of SNS Linac Safety PLC Upgrade
Design Lessons
Installation Lessons
Operational Lessons
Questions or Comments?
SNS Linac Safety PLC Upgrade
The SNS Linac Personnel Protection System (PPS) mitigates radiation hazards from the beam and from other radiation sources in the Front-End “Ion Source” and in the linear accelerator tunnel sections. It covers 5 monitored entry/exit doors, 15 emergency stop/sweep stations with status lights, and 43 controlled devices (3 ion source, 15 RF modulators, 25 RF amplifiers). The upgrade effort converts the radiation safety controls to SIL-rated, safety-class equipment, to get ahead of obsolescence issues and to address the use of non-safety controllers in a safety application. The first phase is the conversion of the Front-End Ion Source.
Historically, the PPS used Allen-Bradley (A-B) ControlLogix programmable logic controllers, arranged as two independent systems that we call “PLC-A” and “PLC-B”. The software is maintained by two engineer/programmers, and the design follows A-B recommendations for using ControlLogix in a SIL 2 configuration.
The upgrade replaces ControlLogix with its safety-rated equivalent, GuardLogix. Two GuardLogix systems maintain the overall PLC-A/PLC-B architecture. Safety field devices are wired in a “Category 2” (“SIL 1” or “PL b/c”) configuration: one input device and one output device per PLC, with diagnostic monitoring. Together, the duplex SIL 1 systems can meet SIL 2 reliability requirements.
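The duplex arrangement can be pictured as two fail-safe chains, either of which can remove the permit on its own. The sketch below is a simplified model, not the SNS logic: it assumes the PLC-A and PLC-B output devices are effectively in series, and the names `Channel` and `hazard_enabled` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One independent chain, e.g. PLC-A or PLC-B (hypothetical model)."""
    name: str
    input_safe: bool      # this chain's input device reports a safe state
    diagnostics_ok: bool  # this chain's diagnostic monitoring found no fault

    def output_energized(self) -> bool:
        # Fail-safe: energize the output device only when the input is
        # safe AND diagnostics pass; any fault de-energizes it.
        return self.input_safe and self.diagnostics_ok


def hazard_enabled(a: Channel, b: Channel) -> bool:
    # Assumption: the two output devices are in series, so the hazard is
    # enabled only when BOTH chains energize their outputs. Either chain
    # alone can therefore remove the permit (a one-out-of-two trip).
    return a.output_energized() and b.output_energized()


plc_a = Channel("PLC-A", input_safe=True, diagnostics_ok=True)
plc_b = Channel("PLC-B", input_safe=True, diagnostics_ok=False)
print(hazard_enabled(plc_a, plc_b))  # False: PLC-B's diagnostic fault removes the permit
```

Under this assumption, a single undetected failure in one chain still leaves the other chain able to trip, which is the intuition behind combining two SIL 1 chains toward a SIL 2 requirement.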
The upgrade is a “phased install”. The processor modules and one of the five large control racks were updated with GuardLogix safety hardware. The rack work was a complete remove-and-replace project with all-new hardware. The remaining four racks still use ControlLogix hardware to control safety devices and will be upgraded in a future effort. The upgrade also required new network infrastructure: the GuardLogix safety hardware works only over Ethernet, although it can still control non-safety ControlLogix hardware over other buses. Lesson: A complete remove-and-replace allows extended off-line testing and verification, which makes more efficient use of accelerator downtime because the installation starts with most of the intrusive testing already done.
Design Lessons – Hybrid Network Types
The new GuardLogix safety hardware requires Ethernet communications, while the existing ControlLogix hardware was installed using ControlNet. The original design used Ethernet between the new GuardLogix processor and the new field rack, but kept ControlNet between the upgraded field rack and the remaining four non-upgraded racks. Source: Rockwell Automation Knowledge Base
It was later identified that this topology would not work at all in practice because of incompatibilities. The problem was caught early in the design process, and the decision was made to convert all racks to Ethernet, including those not being upgraded. This added roughly $70k in unexpected equipment cost. Lesson: Validate a new network architecture with the manufacturer's support engineering or with internal testing.
Design Lessons – Integrating Pulse Testing
Wiring safety field devices in a “Category 2” configuration for GuardLogix requires the use of “pulse testing”: the voltage supply feeding a switch or actuator is replaced with a pulse train that continuously verifies the input's ability to sense transitions. In an ideal configuration, each device has a different pulse, so that a short between two devices' wiring can also be detected.
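The diagnostic idea can be illustrated with a small model of one pulse-tested input. This is a simplified sketch, not GuardLogix firmware behavior: it assumes each device's pulse train is a mostly-high signal with a brief low pulse at a different instant, that the input is sampled in step with the pulses, and the names `InputStatus` and `evaluate_input` are hypothetical.

```python
from enum import Enum


class InputStatus(Enum):
    CLOSED = "closed"  # input echoes this device's own pulse train
    OPEN = "open"      # input reads low at every sample
    FAULT = "fault"    # anything else, e.g. high while its own pulse is low


def evaluate_input(own_pulse: list[int], samples: list[int]) -> InputStatus:
    """Classify one pulse-tested input (simplified, hypothetical model).

    own_pulse: pulse train fed to this device's contact (1 = supply on)
    samples:   values the input read back at the same instants
    """
    if samples == own_pulse:
        return InputStatus.CLOSED
    if not any(samples):
        return InputStatus.OPEN
    return InputStatus.FAULT


# Each device gets a brief low pulse at a different instant.
pulse_a = [1, 1, 0, 1, 1, 1]
pulse_b = [1, 1, 1, 1, 0, 1]

print(evaluate_input(pulse_a, pulse_a))   # InputStatus.CLOSED
print(evaluate_input(pulse_a, [0] * 6))   # InputStatus.OPEN
print(evaluate_input(pulse_a, [1] * 6))   # InputStatus.FAULT (stuck high / short to supply)
print(evaluate_input(pulse_a, pulse_b))   # InputStatus.FAULT (cross-fault with device B)
```

If both devices shared `pulse_a`, a short between their wiring would read back as CLOSED and go undetected, which is why a distinct pulse per device is the ideal configuration.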
Most existing field devices at SNS used intermediate junction boxes to fan out DC power to actuators. For example, a 4-conductor cable was pulled for 3 signals, but a proper pulse-test implementation requires 6 conductors. Replacing these cables and boxes was outside the scope of this phase. Lesson: Properly implementing diagnostic pulse testing will require design changes to field devices and will likely require new cabling.
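The 4-versus-6 conductor count follows from a simple wiring model. The sketch assumes that a shared-supply design uses one common supply conductor plus one return per signal, that pulse testing needs a dedicated pulse source and return per signal, and it ignores shields, grounds, and spares; the function names are hypothetical.

```python
def conductors_shared_supply(signals: int) -> int:
    # One common DC supply conductor fanned out (e.g. in a junction box)
    # to every contact, plus one return conductor per signal.
    return signals + 1


def conductors_pulse_tested(signals: int) -> int:
    # Each contact needs its own pulse source and its own return,
    # so the common supply conductor can no longer be shared.
    return 2 * signals


def existing_cable_sufficient(signals: int, cable_conductors: int) -> bool:
    # True if the installed cable has enough conductors for pulse testing.
    return cable_conductors >= conductors_pulse_tested(signals)


print(conductors_shared_supply(3))      # 4
print(conductors_pulse_tested(3))       # 6
print(existing_cable_sufficient(3, 4))  # False
```

Under this model the shortfall grows with the number of signals (n - 1 extra conductors for n signals), so cables and junction boxes serving several devices are the ones most likely to need replacement.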
Design Lessons – Software Coding Methods
At a basic level, GuardLogix uses two user programs, a “Standard Task” and a “Safety Task”. The Safety Task is logic that runs on the redundant, high-reliability hardware; it can directly use only designated safety memory and safety I/O, and standard I/O may be used through a “mapping” procedure. The Standard Task is logic that runs as it would on a standard ControlLogix. At SNS, the PPS logic already existed as a standard ControlLogix program.
Two different programmers are responsible for the “PLC-A” and “PLC-B” systems. The PLC-A programmer chose to reuse the existing ControlLogix program, running it entirely in the Standard Task and using the Safety Task only to map to the safety I/O, which is accessible only from the Safety Task. The PLC-B programmer chose to re-code the logic from the ground up, correctly isolating standard and safety functions in their respective tasks. However, similar mapping then had to be done in reverse, so that safety logic could control the standard I/O in the four un-upgraded racks that still control safety devices.
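The two mapping directions in a hybrid system can be sketched as explicit copies between separate standard and safety memory. This is a conceptual Python model only, not Rockwell's programming model or API; the class, functions, scan order, and tag names (`legacy_door_closed`, `legacy_modulator_enable`, etc.) are all hypothetical.

```python
class HybridController:
    """Hypothetical model of separate standard and safety memory.

    Access rules are not enforced here; the model only shows where the
    explicit copies ("mapping") occur in each direction.
    """

    def __init__(self) -> None:
        self.standard: dict[str, bool] = {}  # standard tags, incl. legacy-rack I/O
        self.safety: dict[str, bool] = {}    # safety tags, incl. safety I/O

    def map_to_safety(self, std_tag: str, safety_tag: str) -> None:
        # Forward mapping: copy a standard value into safety memory,
        # because the Safety Task does not read standard tags directly.
        self.safety[safety_tag] = self.standard[std_tag]

    def map_to_standard(self, safety_tag: str, std_tag: str) -> None:
        # Reverse mapping: copy a safety result into a standard tag so
        # that it can drive standard I/O in an un-upgraded rack.
        self.standard[std_tag] = self.safety[safety_tag]


def safety_task(c: HybridController) -> None:
    # PLC-B style: the permit logic lives in the Safety Task and reads safety tags only.
    c.safety["permit"] = c.safety["door_closed"] and c.safety["estop_clear"]


def scan(c: HybridController) -> None:
    # Hypothetical tags: the door switch is on a legacy (standard) rack,
    # the e-stop is on new safety I/O, the modulator enable is a legacy output.
    c.map_to_safety("legacy_door_closed", "door_closed")
    safety_task(c)
    c.map_to_standard("permit", "legacy_modulator_enable")


ctrl = HybridController()
ctrl.standard["legacy_door_closed"] = True
ctrl.safety["estop_clear"] = True
scan(ctrl)
print(ctrl.standard["legacy_modulator_enable"])  # True
```

In the PLC-A style, by contrast, the permit logic would stay in standard memory and the Safety Task would only copy values between the safety I/O and standard tags; both styles need mapping while legacy racks remain.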
Lesson: Either way, a hybrid system can never fully match the intended use of the safety and standard tasks, so neither implementation is correct or incorrect; which approach to use depends on how much time is available during a particular upgrade. PLC-A worked exactly as it did before with little debugging, but will be harder to bring into line with the manufacturer's “correct” Safety PLC implementation once all the racks have been upgraded. PLC-B required a lot of debugging and tweaking in the field during this upgrade because the code was all-new, but will be easier to maintain and to integrate with future rack upgrades. Additionally, rewriting only one program at a time minimized the risk of introducing systemic errors into both programs.
Installation Issues – Operability of Other Systems
One of the refurbished cabinets contained a component of a separate Oxygen Deficiency Hazard (ODH) monitoring/alarm system for cryogenic equipment in the Linac tunnel. This system is a required Credited Control whenever workers occupy the Linac tunnel. Each refurbished cabinet had its own electrical feed, so all power had to be under lockout/tagout to work safely in these racks. The upgrade work was done during a maintenance outage, and locking workers out of the Linac tunnel for several weeks while the ODH system was down was not possible.
A small control box was built to house this equipment, and the system was switched over to it during a shorter outage. This temporary box needed safety-class redundant backup power equivalent to what was removed, which made it rather expensive and complex for something needed for only a few weeks. Lesson: Systems that operate independently should have their own enclosures and power systems, so that one can be removed from service without affecting the other.
Installation Issues – Field Cable Lengths
The upgraded racks were designed to separate standard and safety wiring and power, so they were laid out differently from the existing racks. The I/O was moved one cabinet closer to the field, intended to mitigate any cable-length problems when pulling the existing cables to the new I/O. This was not enough: many of the “PLC-A” cables were too short to reach their intended termination points, and additional rows of intermediate terminal blocks had to be added to extend these connections to their proper locations. Lesson: During construction, leave service loops adequate for one or two major rack changes over the life of a facility. When doing an upgrade, take the existing layout and cable lengths into account when designing a new rack.
Operational Issues – Default Configurations
Because the GuardLogix hardware requires creating a new software development project for the logic, many underlying settings are reset to their defaults. One of these is a “watchdog timer”, which detects when the program gets stuck in a loop or otherwise hangs or crashes: if the program does not finish its scan within the specified time, the controller faults and shuts everything down. The default is a conservative 20 ms, but it needs to be tuned for how much logic must execute, how long communication with the I/O takes, and so on. Shortly after commissioning, one of the systems faulted on a watchdog overrun because this value had not been changed from the default to one better suited to the communication speed and the amount of logic being evaluated. Lesson: Make sure you are familiar with all of the underlying configuration settings that need to be updated before releasing a system to operations.
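The watchdog check itself is simple: any single scan longer than the limit faults the controller. The sketch below illustrates that check and one possible way to pick a value from measured scan times. The margin factor, the function names, and the scan times are hypothetical illustrations, not SNS measurements or a vendor recommendation.

```python
def watchdog_trips(scan_times_ms: list[float], watchdog_ms: float) -> bool:
    # The controller faults if any single scan runs longer than the limit.
    return any(t > watchdog_ms for t in scan_times_ms)


def suggest_watchdog(scan_times_ms: list[float], margin: float = 2.0) -> float:
    # Hypothetical rule of thumb: worst observed scan times a margin factor.
    # Not a vendor recommendation.
    return max(scan_times_ms) * margin


observed = [12.0, 15.5, 23.0, 14.2]  # illustrative numbers, not SNS measurements
print(watchdog_trips(observed, 20.0))   # True: one scan exceeds a 20 ms default
limit = suggest_watchdog(observed)
print(limit)                            # 46.0
print(watchdog_trips(observed, limit))  # False
```

The margin is a trade-off: too tight risks nuisance faults like the one described above, while too loose delays detection of a genuinely hung program. Scan times should be measured under realistic logic and communication load before choosing a value.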
Final Questions or Comments?