Nagios in Power Transmission Utilities
Fernando Covatti

Introduction & Agenda
- Brazilian Electrical Sector Overview
- CEEE-GT experience with Nagios Core
- Motivation for different areas of the company
  - Telecommunications
  - Automation
  - Protection and Control
  - Supervision
- Results and Future Plans

Brazilian Electrical Sector
- Regionalized until the mid-1990s.
- Regional companies controlled their respective areas and could have vertical expertise: generation, transmission and distribution of electricity.
- In the second half of the 1990s, the rules changed.
- The increasing interconnection between states created the need to regulate and discipline the electrical sector.
- ANEEL (the regulator) and ONS (the National System Operator) were created.

Brazilian Electrical Sector: National Interconnected System (SIN)
- Biggest of its kind in the world.
- More than 100,000 km of transmission lines (230 kV or higher).
- Only 1.7% of the energy used in the country is outside the interconnected system.
- A failure in a substation or transmission line can affect the whole country (blackout).

Company Presentation
- CEEE was founded in 1943.
- Operates in the three main areas of the Brazilian electrical sector: power generation (G), transmission (T) and distribution (D).
- The state government holds equity control of the company.
- Eletrobras, the federal government's main electricity provider, holds a considerable stake (~32%).

Company Presentation
- 3,800 employees.
- 6th largest company in Rio Grande do Sul state (117th largest in Brazil).
- Generates 75% of the state's hydroelectricity.
- Owns km of transmission lines.
- Distributes electricity to one third of the state (3.5 million people).

Supervision Area
- The division was founded in the mid-1970s.
- Initially focused on data about the state of the electrical system.
- As the system grew, more demands were added.
- New devices were installed.
- New demands were made by the regulator.
- Need to reduce equipment downtime.
- Need to control substations remotely.

Supervision Area
- Composed mainly of electronic/electrical engineers and technicians.
- Weak computer knowledge among team members (none holds a degree in IT).
- Large gap between new and old employees, due to a long period without new hires.
- Old concepts and techniques are very difficult to change.

Motivation
- The amount of data has been growing exponentially.
- Much of this data is not directly linked to real-time operation.
- A growing number of data points to supervise versus users' selective interest.
- Several of these points stay in alarm for a long time, disrupting the work of the real-time staff.
- The maintenance staff is not informed of problems.
- This leads to the system being discredited.

Motivation
- Reduced revenue led to a long-term reduction in staff (retirements and no new hires).
- Telecontrol of substations became a priority in order to reduce the substation operator workforce.
- Higher system availability is required when telecontrol is used.

Motivation: Overview
- Substation Field Devices
- Substation Protection Relays
- Substation Automation Devices
- Operation Centers EMS
- Operation Center HMI
- National System Operator
- Database Servers
- Corporate Network

Motivation for Substation Devices
- Online graphical supervision of failures in Ethernet-based devices inside substations.
- Because of redundancy and the use of RSTP (or other redundancy protocols), faults often go unnoticed and failed devices are not replaced.
- Preventive maintenance, mainly in substations implemented with IEC 61850.
- Supervision also where there is a second communication channel to the Operation Center.

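The point about redundancy hiding faults can be illustrated with a small sketch. It assumes each redundant path (for example, each switch uplink or each channel to the Operation Center) is already probed separately and reported as up or down; the function name `evaluate_redundant_paths` and the path names are hypothetical and do not come from the talk. The only fixed convention used is the standard Nagios plugin status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_redundant_paths(paths):
    """Map per-path reachability to a Nagios status.

    paths: dict of path name -> True (reachable) or False (failed).
    The names and the way reachability is measured are assumptions
    made for this illustration.
    """
    if not paths:
        return UNKNOWN, "REDUNDANCY UNKNOWN - no paths configured"

    failed = sorted(name for name, is_up in paths.items() if not is_up)

    if not failed:
        return OK, f"REDUNDANCY OK - all {len(paths)} paths up"
    if len(failed) == len(paths):
        return CRITICAL, "REDUNDANCY CRITICAL - all paths down: " + ", ".join(failed)
    # The service still works through the surviving path, which is
    # exactly the case that goes unnoticed without per-path supervision.
    return WARNING, "REDUNDANCY WARNING - degraded, failed: " + ", ".join(failed)


if __name__ == "__main__":
    status, message = evaluate_redundant_paths({"ring-a": True, "ring-b": False})
    print(status, message)
```

Reporting the degraded case as WARNING rather than OK is the design choice that matters here: the end-to-end service is still up, so a check that only tests the service would stay green while a failed device waits to be replaced.
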
Motivation for Telecommunications
- Different communication devices and vendors:
  - Multiplexers (SDH and SONET).
  - Switches (Ethernet).
  - Analog and digital radio (serial communication).
  - Power line communication.
- Different management software.
- Architecture only drawn (without online state).
- Most susceptible to failures (links shared with other companies, weather, …).

Nagios Usage: Data Excess
- The problem of the excessive number of points became critical.
- There were a few attempts to solve it by reducing the number of points.
- This did not work, for obvious reasons.
- We need to monitor an increasing number of data points.
- Another attempt was to add alarm filters.
- These alarm filters end up making users forget most of the filtered points.

Nagios Usage: Data Separation
- Monitoring a growing number of points requires more elaborate solutions.
- Interest in this information is not the same for all teams.
- Nor does the monitoring frequency need to be the same.
- Thus, we sought to separate the data roughly into real-time information and maintenance information.
- The real-time system remains SAGE.
- Nagios was introduced as the maintenance system.

Nagios Usage: Why Nagios?
- Stable, developed over an extensive period of time.
- Expandable and customizable, with a wide range of add-ons.
- Open software, which matches the team's preferences.
- Active community of developers and users.

Nagios Usage: First Attempts
- In the past decade, the Telecommunications Area made an attempt to monitor with Nagios.
- This experiment was not successful.
- Lack of interest from potential users.
- Nagios needed fine tuning.
- Someone was needed to give daily attention to the maturation process, and no such person was available.
- Only part of the company's telecommunications system was monitored.

Nagios Usage: Installation Conditions
- In early 2011, half of the team was renewed.
- The arrival of new members brought new ideas.
- The telecommunications system expanded considerably with new multiplexers and switches.
- The telecommunications team experienced an influx of new employees.
- In 2012, the company lost more than 60% of its revenue due to the renewal of contracts by the Federal Government.
- This led to a pressing need for increased monitoring of the system and preventive action.

Nagios Usage: Installation
- Nagios was again considered as a way to monitor the status of various devices.
- Installation started in mid-2012.
- The primary focus was monitoring Linux systems.
- It soon expanded to other systems and areas, such as the communication status of remote systems.
- Several features were added over these two years.
- Use is increasing in other areas of the company, such as Substation Automation.
- In June 2014 the system expanded to a second version, now installed in Telecommunications.

Nagios Usage: Panorama
- Supervision
- Telecommunication

Nagios Usage: Customized Services
- Script to check RAID disks.
- Configuration backup (manually changed devices).
- Configuration check (differences between the database and the Operation Center configuration).
- Serial communication state (RX/TX bytes).
- Proprietary protocols of telecommunication system devices (via telnet).
- Expect language scripts.

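As an illustration of what a custom RAID check might look like, here is a minimal sketch. The talk does not say which RAID technology or scripting language was used, so this assumes Linux software RAID (md) and reads the `/proc/mdstat` status text; the names `check_mdstat`, `main` and `SAMPLE_MDSTAT` are hypothetical. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re

# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# An array header line looks like "md0 : active raid1 sda1[0] sdb1[1]".
ARRAY_RE = re.compile(r"^(md\d+) :")
# A status line ends like "[2/1] [U_]"; "_" marks a missing member disk.
STATE_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")

# Example input with one healthy and one degraded array (illustrative only).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      1048512 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdc1[0] sdd1[1](F)
      2096128 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def check_mdstat(text):
    """Return (status_code, message) for the contents of /proc/mdstat."""
    arrays = 0
    degraded = []
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE_RE.search(line)
        if state and current:
            arrays += 1
            if "_" in state.group(3):
                degraded.append(f"{current} [{state.group(3)}]")
            current = None
    if arrays == 0:
        return UNKNOWN, "RAID UNKNOWN - no md arrays found"
    if degraded:
        return CRITICAL, "RAID CRITICAL - degraded: " + ", ".join(degraded)
    return OK, f"RAID OK - {arrays} array(s) healthy"


def main(path="/proc/mdstat"):
    """Print the one-line plugin output and return the status code."""
    try:
        with open(path) as handle:
            code, message = check_mdstat(handle.read())
    except OSError as exc:
        code, message = UNKNOWN, f"RAID UNKNOWN - cannot read {path}: {exc}"
    print(message)
    return code
```

Deployed as a plugin, the script would end by passing the return value of `main()` to `sys.exit`, since Nagios reads the process exit code as the service state and the first printed line as the status text. Hardware RAID controllers would need a vendor tool instead of `/proc/mdstat`.
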
Results
- It has provided notices of failures that could not be detected in a normal situation:
  - Failure of one of the disks in a RAID system.
  - Failure of the Emergency Control Scheme system.
  - Failure of backup devices.
  - Failure of backup communication channels.
- Reduced response time of maintenance teams in attending occurrences.
- Fault location with an integrated view.

Future and Beyond
- Transferring real-time data points to Nagios.
- Can be expanded to obtain more data from protection relays.
- Integration within the substation (IEC 61850, DNP LAN).
- Increasingly networked substation devices, easily reaching 50 with today's equipment.
- Trend toward an increasing number of substation devices.
- Use of Nagios in smart grids (bigger networks).

Future and Beyond
- Use of Nagios reports to analyze potential points of future failure.
- Provides perspective on where to invest budget resources.
- Relieves the burden of repetitive work.
- Using Nagios as a "management" tool: enabling decentralized teams to provide maintenance on failed devices.
- Integration with other tools, such as automatic generation of maps, simulators, wiki, etc.

Questions?
Any questions? Thanks!

The End
Fernando Covatti