NERC Lessons Learned Summary
LLs Published in September 2015
NERC Lessons Learned Published in 2015
Two NERC lessons learned (LLs) are to be published in September, and one is pending:
- LL: Loss of EMS Communications Due to Lack of Validation on EMS Database RTU Configuration Parameter
- LL: Relay Design and Testing Practices to Prevent Scheme Failure
- Pending: Loss of EMS Due to RTU LAN UPS Failure
Loss of EMS Communications Due to Lack of Validation on EMS Database RTU Configuration Parameter
- Commissioning a new remote terminal unit (RTU) resulted in an EMS outage
- One incorrectly configured point caused the remote communication server process to terminate
  - This process coordinates the polling of and commands to RTUs and the insertion of telemetry into the real-time database
- Automatic failover (between primary and backup) did not correct the issue, since the error was in the database configuration
- The EMS lost communications with all RTUs
- State Estimator and Contingency Analysis had stale data
- Most changes were verified prior to execution, but the database configuration parameters were not checked
Corrective Actions
- The parameter configuration was corrected
- A script was run to confirm that the configuration error does not exist elsewhere
- The procedure was revised to confirm, in quality assurance, that the EMS database RTU configuration is correct before a database change is uploaded to production
- A case was opened with the EMS vendor documenting the defect
- The vendor will provide a new release with the defect resolved
Lessons
- Evaluate the parameters checked by the database editor to determine whether there are any gaps in parameter validation; if so, establish procedures for manual validation
- SCADA software should be designed to generate error messages, rather than terminate, when it encounters an incorrect parameter
- Recovery strategies and procedures should be developed for quick recovery from failed updates
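The validation gap and the "report an error instead of terminating" lesson can be illustrated with a small pre-upload check. This is a minimal sketch under stated assumptions: the RtuPoint record, its fields, and the limits MAX_ADDRESS and MIN_SCAN_RATE_S are all hypothetical and do not describe the entity's EMS or any vendor's database editor. It collects every problem it finds and returns the list, so one bad point blocks the upload without crashing the checker.

```python
from dataclasses import dataclass


# Hypothetical RTU point record; field names are illustrative only.
@dataclass
class RtuPoint:
    rtu_id: str
    point_name: str
    address: int
    scan_rate_s: float


# Hypothetical limits; real limits would come from the EMS vendor's documentation.
MAX_ADDRESS = 65535
MIN_SCAN_RATE_S = 0.5


def validate_points(points):
    """Return a list of error messages; never raises on a bad record."""
    errors = []
    seen = set()
    for p in points:
        label = f"{p.rtu_id}/{p.point_name}"
        if not 0 <= p.address <= MAX_ADDRESS:
            errors.append(f"{label}: address {p.address} out of range")
        if p.scan_rate_s < MIN_SCAN_RATE_S:
            errors.append(f"{label}: scan rate {p.scan_rate_s}s below minimum")
        # Two points on the same RTU should not share an address.
        key = (p.rtu_id, p.address)
        if key in seen:
            errors.append(f"{label}: duplicate address {p.address}")
        seen.add(key)
    return errors


def ok_to_upload(points):
    """Print each validation error and allow upload only if there are none."""
    errors = validate_points(points)
    for e in errors:
        print("VALIDATION ERROR:", e)
    return not errors


if __name__ == "__main__":
    demo = [
        RtuPoint("RTU12", "BKR1_STATUS", 10, 2.0),
        RtuPoint("RTU12", "MW_FLOW", 70000, 2.0),  # out-of-range address
    ]
    print("Upload allowed:", ok_to_upload(demo))
```

The same validate_points function could also be run against the full production database, which mirrors the corrective action of scripting a check that the error does not exist elsewhere.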
Relay Design and Testing Practices to Prevent Scheme Failure
- A single-phase-to-ground fault occurred on a 230 kV three-terminal feeder
- The associated breakers tripped, then the line reclosed at one end (as designed)
- Two protection equipment failures on two separate relay systems prevented proper clearing after the automatic reclose:
  1. A loose connection in the trip auxiliary relay coil cutoff contact string
  2. A suspected intermittent connection issue
- The fault evolved to multi-phase and remained on the system for 58 seconds
- Breaker failure protection did not initiate, since it was tied to the same auxiliary trip relay
- The fault was cleared by backup ground protection on two 500 kV lines
Corrective Actions
- The Transmission Owner (TO) evaluated existing procedures to align periodic relay testing with planned transmission outages and to perform circuit breaker trip testing
- 52a contacts were rewired in parallel rather than in series
- A relay was replaced as needed; the new relay eliminates the need for the 52a contacts in the breaker trip circuit
Lessons
- Key issue: the Breaker Failure Initiate (BFI) signal originated from the auxiliary trip relay
- Options:
  - Use a separate contact to provide BFI
  - Use a dedicated auxiliary relay for BFI
  - Connect the protective relay trip contact directly to the breaker failure relay input, if the relay will accommodate the voltage input
- Avoid the use of 52a contacts in series
- Consider periodic functional tests
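Why series 52a contacts are a single point of failure can be shown with simple continuity logic. This is a minimal sketch, not a relay design tool: it assumes each contact either conducts or is open (for example, a loose connection), that open-circuit failures are independent, and it uses a hypothetical per-contact failure probability. It ignores other failure modes, such as a contact that fails to open.

```python
def series_continuity(contacts_closed):
    """A series string conducts only if every contact is closed."""
    return all(contacts_closed)


def parallel_continuity(contacts_closed):
    """A parallel string conducts if at least one contact is closed."""
    return any(contacts_closed)


def p_series(p_fail, n):
    """Probability a series string of n contacts conducts, assuming each
    contact independently fails open with probability p_fail."""
    return (1 - p_fail) ** n


def p_parallel(p_fail, n):
    """Probability a parallel string of n contacts conducts under the
    same independence assumption."""
    return 1 - p_fail ** n


if __name__ == "__main__":
    # One loose contact out of two.
    contacts = [True, False]
    print("series conducts:", series_continuity(contacts))      # False
    print("parallel conducts:", parallel_continuity(contacts))  # True
    # Hypothetical 1% open-circuit probability per contact, two contacts.
    print("series:", p_series(0.01, 2))
    print("parallel:", p_parallel(0.01, 2))
```

With one loose contact, the series string breaks the trip path while the parallel string still conducts. This matches the corrective action of rewiring the 52a contacts in parallel and the lesson to avoid them in series.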
Loss of EMS Due to RTU LAN UPS Failure
- A rack-mounted uninterruptible power supply (UPS) failed, resulting in loss of the remote terminal unit (RTU) local area network (LAN) and then loss of the EMS and ICCP for 50 minutes
- The UPS had been added to accommodate an EMS/SCADA hardware upgrade and remained in service as a dual-layer UPS
- The UPS battery pack failed, and the UPS did not have an internal bypass to house power
- Failover was not available, since the front-end processor (FEP) for the RTU LAN at the backup site was pointed at a replacement system
- The entity was in transition to new EMS software
Corrective Actions
- RTU LAN routers were plugged into another UPS and rebooted, restoring EMS visibility and ICCP link connectivity
- The FEP at the remote site was reconfigured, using additional IP addresses, to send data to both the replacement SCADA/EMS and the existing SCADA/EMS systems simultaneously
- Primary and backup routers at the primary site were plugged into alternate UPSs
- The additional (temporary) rack-mounted UPSs will be removed upon cutover to the replacement SCADA/EMS system (Vendor 1 to Vendor 2)
Lessons
- UPS systems should be checked to verify that they have an internal bypass
- Periodic maintenance and monitoring of any UPS system is beneficial
- Some UPS systems perform battery maintenance/cycling internally, but additional checks may be performed to verify functionality
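The UPS lessons lend themselves to a periodic inventory review. This is a minimal sketch under stated assumptions: the UpsRecord fields and the 180-day battery test interval (MAX_TEST_AGE) are hypothetical placeholders, not NERC guidance or any UPS vendor's data model. It flags units without an internal bypass and units whose last battery test is older than the chosen interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical inventory record for a UPS; fields are illustrative only.
@dataclass
class UpsRecord:
    name: str
    has_internal_bypass: bool
    last_battery_test: date


# Hypothetical maximum age of the last battery/functional test.
MAX_TEST_AGE = timedelta(days=180)


def ups_findings(inventory, today):
    """Return a list of findings for UPS units that need attention."""
    findings = []
    for ups in inventory:
        if not ups.has_internal_bypass:
            findings.append(f"{ups.name}: no internal bypass")
        if today - ups.last_battery_test > MAX_TEST_AGE:
            findings.append(f"{ups.name}: battery test overdue")
    return findings


if __name__ == "__main__":
    inventory = [
        UpsRecord("RACK-A", False, date(2015, 1, 1)),
        UpsRecord("RACK-B", True, date(2015, 8, 1)),
    ]
    for finding in ups_findings(inventory, date(2015, 9, 1)):
        print(finding)
```

A check like this supplements, rather than replaces, the internal battery maintenance/cycling some UPS systems already perform.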
Link to Lessons Learned
Directions to Lessons Learned: Go to > "Program Areas & Departments" tab > "Reliability Risk Management" (left-side menu) > "Event Analysis" (left-side menu) > "Lessons Learned" (left-side menu)
NERC's goal in publishing lessons learned is to provide industry with technical, understandable information that helps it maintain the reliability of the bulk power system.
NERC requests that industry provide input on lessons learned by taking the short survey. The survey link is provided on each lesson learned.
Questions?