A. Srivastava, S. Pandey, P. Banerjee, Y. Wu

SyncAD: Ensemble-Based Data Mining Tool for Anomaly Detection in PMU Data and Event Detection
Smart Grid Demonstration and Research Investigation Lab (SGDRIL), Energy Systems Innovation Center, Washington State University
JSIS, October 2017

Synchrophasor Data Quality
Data quality issues may arise from:
- Dropouts/packet loss
- Latency
- Repeated values
- Measurement bias
- Noise in the measurement system
- Bad/missing timestamps
- Loss of GPS synchronization
- Incorrect signal metadata
- Poor server performance
- Improper device configurations
- Incorrect phase sequence
- Missing phase connection
- Cyber attack
How to detect anomalies that affect PMU data quality?
- Conformance testing: in-lab testing; remote testing after installation
- Statistical and data mining approaches: cleaning streaming data; cleaning archival data
- Physics-based bad data detection: hybrid or linear state estimation; substation-level state estimation

Options? Linear Regression and Chebyshev Method
Linear regression: find the straight line $y = \alpha + \beta x$ that gives the "best" fit to the data points in the least-squares sense, i.e., that minimizes $\sum_i (y_i - \alpha - \beta x_i)^2$.
Chebyshev method: determine a lower bound on the percentage of data that lies within k standard deviations of the mean, $P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$, where μ is the mean, σ is the standard deviation, and k is the number of standard deviations from the mean.
B. G. Amidan, T. A. Ferryman, and S. K. Cooley, "Data outlier detection using the Chebyshev theorem," 2005 IEEE Aerospace Conference, IEEE, 2005.
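A minimal sketch of both base detectors as per-sample outlier scores, assuming a single data window of one PMU channel. The function names, the 95% coverage level, and the reuse of the Chebyshev k as the regression threshold are illustrative assumptions; the Chebyshev detector here is a simplified one-pass version, not the full procedure of Amidan et al.

```python
import numpy as np

def regression_scores(t, y):
    """|residual| / std(residual) about the least-squares line y = alpha + beta * t."""
    beta, alpha = np.polyfit(t, y, 1)  # polyfit returns the highest-degree coefficient first
    residuals = y - (alpha + beta * t)
    return np.abs(residuals) / residuals.std()

def chebyshev_k(coverage):
    """Smallest k for which Chebyshev guarantees 1 - 1/k**2 >= coverage."""
    return 1.0 / np.sqrt(1.0 - coverage)

def chebyshev_scores(y):
    """Distance from the window mean in standard deviations, |y - mu| / sigma."""
    return np.abs(y - y.mean()) / y.std()

# Hypothetical 60-sample window of a slowly drifting voltage magnitude (p.u.)
rng = np.random.default_rng(0)
t = np.arange(60, dtype=float)
y = 1.0 + 0.0005 * t + rng.normal(0.0, 0.001, t.size)
y[30] += 0.2  # injected bad data point

k = chebyshev_k(0.95)  # about 4.47 standard deviations
print("regression outliers:", np.flatnonzero(regression_scores(t, y) > k))
print("Chebyshev outliers:", np.flatnonzero(chebyshev_scores(y) > k))
```

The regression detector scores deviation from the local trend, while the Chebyshev detector scores deviation from the window mean and makes no distributional assumption beyond a finite variance, so a strong trend inflates σ and makes it more conservative.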

Does a standalone method suffice?
DBSCAN: DBSCAN uses two thresholds, a radius ε and a minimum neighbor count min. A data point is a center (core) node if it has more than min ε-neighbors (points within distance ε). Two centers are reachable if they lie in each other's ε-neighborhood, and a cluster is a chain of reachable centers together with their ε-neighbors. Points far away from any cluster are outliers. During an event, the data points move out of reach of the existing cluster, i.e., they lie more than ε away from its boundary points; a new cluster is formed after the event ends.
[Figure: DBSCAN clusters of current vs. time]
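A minimal from-scratch DBSCAN sketch that follows the rule stated above (a center node has more than `min_nbrs` ε-neighbors). Clustering (scaled time, current) pairs and the parameter values are assumptions chosen for illustration; a library implementation would normally be used in practice.

```python
import numpy as np

def dbscan(points, eps, min_nbrs):
    """Minimal DBSCAN. Returns one label per point; -1 marks an outlier.

    A point is a core (center) node if more than `min_nbrs` other points
    lie within distance `eps` of it.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = [np.flatnonzero((dist[i] <= eps) & (np.arange(n) != i)) for i in range(n)]
    is_core = np.array([len(nb) > min_nbrs for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster            # start a new cluster from an unlabeled core point
        stack = [i]
        while stack:                   # grow it through chains of reachable core points
            p = stack.pop()
            if not is_core[p]:
                continue               # border points join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

# Hypothetical current magnitude (p.u.): steady at 1.0, one bad sample at index 10,
# then a step to 1.3 at index 20 (an event), after which a new cluster forms.
t = np.arange(40)
current = np.where(t < 20, 1.0, 1.3)
current[10] = 1.5
labels = dbscan(np.column_stack([0.01 * t, current]), eps=0.05, min_nbrs=3)
print(labels)
```

The isolated bad sample stays labeled -1, while the post-event samples form a second cluster instead of being flagged, which is the behavior the event detector later relies on.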

No single winner!
- Lack of training data
- Needs tuning effort

Anomaly detection with Ensemble
Pipeline for a data window X from the PMU/PDC:
1. Base detectors (online): regression, Chebyshev, and DBSCAN (D1, D2, D3) compute outlier scores f_i, f_j, f_k.
2. Normalization of the base detector scores gives F_Normalized.
3. MLE-Ensemble: learn the model Y_MLE (α, β) from data X.
4. Inference algorithm: label anomalies using Y_MLE.
5. Unflag anomalies detected in the transient window, which is detected using Prony analysis.
6. Bad data detected.
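The slides do not say how the base detector scores are normalized. A minimal sketch, assuming per-detector min-max scaling to [0, 1] followed by a 0.5 threshold that turns each detector's scores into binary outlier votes, which is the form a sensitivity/specificity model needs. The threshold and the score conventions in the example are hypothetical.

```python
import numpy as np

def normalize_scores(raw_scores):
    """Min-max normalize each detector's scores (one row per detector) to [0, 1]."""
    raw = np.asarray(raw_scores, dtype=float)
    lo = raw.min(axis=1, keepdims=True)
    span = raw.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0  # a detector with constant scores maps to all zeros
    return (raw - lo) / span

def to_votes(normalized, threshold=0.5):
    """Binary outlier votes (1 = outlier) from normalized scores; threshold is assumed."""
    return (normalized >= threshold).astype(int)

# Hypothetical raw scores for 5 samples from the three base detectors.
raw = [[0.2, 0.3, 6.0, 0.1, 0.4],   # regression: |residual| / sigma
       [0.5, 0.4, 7.1, 0.2, 0.6],   # Chebyshev: |x - mu| / sigma
       [0.0, 0.0, 1.0, 0.0, 1.0]]   # DBSCAN: 1 if the point is labeled noise
F = normalize_scores(raw)
votes = to_votes(F)
print(votes)
```

Min-max scaling keeps the sketch short but is sensitive to a single extreme score; rank- or quantile-based normalization would be a more robust alternative.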

Maximum Likelihood Estimator (MLE)
- No single winner -> ensemble of normalized scores
- Needs tuning effort -> learn the best integration
- Lack of training data -> unsupervised detection
Sensitivity: fraction of outliers that are correctly identified. Specificity: fraction of non-outliers that are correctly identified.
MLE-Ensemble: starting from data set X and the normalized scores F_Normalized, compute each base detector's sensitivity Ψ and specificity η, learn the weights α and β, and fit Y_MLE with the EM algorithm to obtain the final learned weights α, β.
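A minimal EM sketch of an unsupervised sensitivity/specificity ensemble, assuming binary votes from each base detector and a latent true label per sample (the classic two-class annotator model). Here `alpha` is each detector's sensitivity and `beta` its specificity; this is an illustration of the idea, not necessarily the exact update rules used in SyncAD.

```python
import numpy as np

def mle_ensemble(votes, n_iter=50, eps=1e-6):
    """EM estimate of each detector's sensitivity (alpha) and specificity (beta).

    votes: (n_detectors, n_samples) array of 0/1 outlier votes.
    Returns alpha, beta, the outlier prior p, and P(outlier) for each sample.
    """
    V = np.asarray(votes, dtype=float)
    mu = V.mean(axis=0)  # initial posterior: fraction of detectors voting "outlier"
    for _ in range(n_iter):
        # M-step: sensitivity, specificity and prior weighted by current posteriors.
        alpha = np.clip((V * mu).sum(axis=1) / max(mu.sum(), eps), eps, 1 - eps)
        beta = np.clip(((1 - V) * (1 - mu)).sum(axis=1) / max((1 - mu).sum(), eps),
                       eps, 1 - eps)
        p = float(np.clip(mu.mean(), eps, 1 - eps))
        # E-step: log-likelihood of each sample's votes under "outlier" and "normal".
        log_a = np.log(p) + (V * np.log(alpha)[:, None]
                             + (1 - V) * np.log(1 - alpha)[:, None]).sum(axis=0)
        log_b = np.log(1 - p) + ((1 - V) * np.log(beta)[:, None]
                                 + V * np.log(1 - beta)[:, None]).sum(axis=0)
        mu = 1.0 / (1.0 + np.exp(log_b - log_a))
    return alpha, beta, p, mu

# Hypothetical votes: 100 samples, samples 0-9 are the true outliers.
votes = np.zeros((3, 100), dtype=int)
votes[0, :10] = 1              # detector 1 catches all of them
votes[1, :10] = 1
votes[1, 50] = 1               # detector 2 adds one false alarm
votes[2, :5] = 1
votes[2, [60, 70]] = 1         # detector 3 misses half and adds two false alarms
alpha, beta, p, mu = mle_ensemble(votes)
print(np.round(alpha, 3), np.round(beta, 3), round(p, 3))
print("flagged:", np.flatnonzero(mu > 0.5))
```

No labels are needed: the detectors' agreement pattern alone lets EM learn that detector 3 is the least sensitive and down-weight its isolated votes.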

Inference and Anomaly Detection
After the MLE-Ensemble step, the weights of each base detector have been learned; together they form the model Y_MLE (α, β). For a new data set, the inference algorithm combines these learned weights with the normalized base detector scores F_Normalized to decide which points are bad data, labels them as outliers, and reports the detected outliers.
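A minimal inference sketch, assuming the learned model is a per-detector sensitivity `alpha`, specificity `beta`, and outlier prior `p`, and that a sample is labeled an outlier when its posterior exceeds 0.5. The optional `transient_mask` mimics the unflagging step; how the transient window is found (Prony analysis on the slides) is not shown and the mask is assumed to be given.

```python
import numpy as np

def infer_outliers(votes, alpha, beta, p, transient_mask=None, threshold=0.5):
    """Flag samples whose posterior P(outlier | votes) exceeds `threshold`.

    votes: (n_detectors, n_samples) 0/1 votes on new data.
    transient_mask: optional booleans marking samples inside a detected
    transient window; flags there are cleared (the unflagging step).
    """
    V = np.asarray(votes, dtype=float)
    a = np.asarray(alpha, dtype=float)[:, None]
    b = np.asarray(beta, dtype=float)[:, None]
    log_a = np.log(p) + (V * np.log(a) + (1 - V) * np.log(1 - a)).sum(axis=0)
    log_b = np.log(1 - p) + ((1 - V) * np.log(b) + V * np.log(1 - b)).sum(axis=0)
    posterior = 1.0 / (1.0 + np.exp(log_b - log_a))
    flags = posterior > threshold
    if transient_mask is not None:
        flags &= ~np.asarray(transient_mask, dtype=bool)
    return flags, posterior

# Hypothetical learned model and votes for four new samples (one column each).
alpha = [0.99, 0.95, 0.60]
beta = [0.99, 0.98, 0.95]
new_votes = [[1, 0, 0, 1],
             [1, 0, 0, 1],
             [1, 0, 1, 0]]
flags, post = infer_outliers(new_votes, alpha, beta, p=0.05)
print(flags)
flags_t, _ = infer_outliers(new_votes, alpha, beta, p=0.05,
                            transient_mask=[False, False, False, True])
print(flags_t)
```

A lone vote from the weak third detector is not enough to flag a sample, while agreement of the two reliable detectors is, even when the third disagrees.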

Performance Metrics for Ensemble Based Technique
Given a PMU anomaly detector D and PMU data X, denote the set of actual anomalies as $B_T$ and the set of anomalies reported by D as $B_D$. The performance of D is evaluated using three metrics.
Precision measures the fraction of true anomalies among those reported by D: $\mathrm{Precision} = \frac{|B_T \cap B_D|}{|B_D|}$.
Recall measures the ability of D to find all outliers: $\mathrm{Recall} = \frac{|B_T \cap B_D|}{|B_T|}$.
False positive (FP) evaluates the likelihood of falsely detecting anomalous data; the smaller, the better. In the tabulated results FP equals $1 - \mathrm{Precision}$, i.e., $\mathrm{FP} = \frac{|B_D \setminus B_T|}{|B_D|}$.
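The three metrics reduce to set operations on sample indices. A short sketch, assuming anomalies are identified by sample index; returning 0.0 for empty sets is an assumed convention.

```python
def detection_metrics(true_anomalies, reported):
    """Precision, recall and false-positive rate for sets of anomalous sample indices."""
    b_t, b_d = set(true_anomalies), set(reported)
    hits = len(b_t & b_d)                          # |B_T ∩ B_D|
    precision = hits / len(b_d) if b_d else 0.0
    recall = hits / len(b_t) if b_t else 0.0
    false_positive = len(b_d - b_t) / len(b_d) if b_d else 0.0  # |B_D \ B_T| / |B_D|
    return precision, recall, false_positive

# Hypothetical indices: 4 injected bad points, 5 points reported by a detector.
p, r, fp = detection_metrics({3, 7, 12, 20}, {3, 7, 12, 15, 30})
print(p, r, fp)  # 0.6 0.75 0.4
```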

Simulation results for SyncAD
Tests on the RTDS simulated PMU data (1.5 hours):
Method              Recall   Precision   False positive
Linear regression   0.9021   0.8565      0.1435
DBSCAN              -        0.8821      0.1179
Chebyshev           0.9154   0.8754      0.1246
MLE ensemble        0.9251   0.8913      0.1087
Tests on the RTDS simulated PMU data (1.5 hours, 5% bad data points, 5%-10% range):
Method              Recall   Precision   False positive
Linear regression   0.7854   0.7655      0.2345
DBSCAN              0.7216   0.7015      0.2985
Chebyshev           0.8125   0.7542      0.2458
MLE ensemble        0.8912   0.9021      0.0979
This result was presented by Dr. Wu.
Tests on the RTDS simulated PMU data (1.5 hours, 10% bad data points, 10%-20% range)

Results with SyncAD using Real PMU Data

Synchrophasor Anomaly Detection (SyncAD) tool

Flowchart for the proposed technique

Architecture of the Event Detection technique
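1. Get a new data window X from SyncAD containing V, I, and Fz.
2. Apply the DBSCAN algorithm to V, I, and Fz. If no cluster change is detected, get the next data window.
3. If a cluster change is detected, compute the active and reactive power flows P and Q.
4. Collect the cluster-change instances in V, I, P, Q, and Fz.
5. Classify them with a two-level decision tree: Level 1 identifies faults, and Level 2 separates active power events from reactive power events. Cluster changes that are not classified as events are reported as undetected events.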
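A sketch of steps 3 and 5, assuming per-phase RMS voltage and current phasors are available. The complex power formula is standard; the `classify_event` split rules are hypothetical placeholders that only mirror the tree's shape (Level 1: fault, Level 2: active vs. reactive power), not the authors' trained decision tree, and the input flags are assumed to come from DBSCAN cluster-change checks.

```python
import cmath
import math

def complex_power(v_phasors, i_phasors):
    """Three-phase P (W) and Q (var) from per-phase RMS phasors: S = sum V_k * conj(I_k)."""
    s = sum(v * i.conjugate() for v, i in zip(v_phasors, i_phasors))
    return s.real, s.imag

def classify_event(changed):
    """Two-level rule set over cluster-change flags for 'V', 'I', 'P', 'Q', 'Fz'.

    The split rules are hypothetical placeholders, not the authors' trained tree.
    """
    if not any(changed.values()):
        return "no event"
    # Level 1 (hypothetical): treat a simultaneous change in V, I and Fz as a fault.
    if changed["V"] and changed["I"] and changed["Fz"]:
        return "fault"
    # Level 2 (hypothetical): split on which power component changed cluster.
    if changed["P"]:
        return "active power event"
    if changed["Q"]:
        return "reactive power event"
    return "undetected event"

# Balanced load: 7.2 kV line-to-neutral, 100 A lagging by 30 degrees in each phase.
angles = [0.0, -120.0, 120.0]
v = [cmath.rect(7200.0, math.radians(a)) for a in angles]
i = [cmath.rect(100.0, math.radians(a - 30.0)) for a in angles]
P, Q = complex_power(v, i)
print(round(P), round(Q))  # 1870615 1080000

# Hypothetical flags for a capacitor bank switching: V, I and Q change clusters.
print(classify_event({"V": True, "I": True, "P": False, "Q": True, "Fz": False}))
```

Computing P and Q only after a cluster change keeps the per-window cost low, which is consistent with the real-time use claimed for the DBSCAN and decision tree approach.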

Simulation Results for Event Detection
PMU data were obtained from RTDS. PMUs were placed on Bus 6 (line 6-11), Bus 8 (line 8-load), Bus 9 (line 9-7), Bus 10 (line 9-10), and Bus 2 (line 2-Gen). Currents through lines 6-11, 8-load, 9-7, 10-9, and 2-Gen were observed. A capacitor bank was placed on Bus 9, a three-phase fault was simulated on Bus 10, and a load change (P, Q) was applied on Bus 8.

Results on Industry Data
Sno. Time (sec) Active Event (PMU No.) Reactive Event (PMU No.) Fault Event (PMU No.)
1 44 5 2 111 3 385 1,2,3,4,5 4 465 2,5 471 6 477 7 558 8 638
The fault event, which the Western Interconnection labeled as a frequency event, was detected. The algorithm also detected some local active and reactive power events, which can be seen in the graphs and are consistent with the RTDS simulation results.

Summary
- The MLE-Ensemble technique achieves high recall for bad data detection.
- The SyncAD tool successfully differentiated between bad data and events, resulting in high precision.
- The event detection algorithm detected the simulated events and also worked well on the industry data.
- The DBSCAN and decision-tree-based algorithm is fast enough for real-time use.
- The event detection algorithm was applied to labeled event data.
- If the topology is known and PMUs are placed to ensure full observability, it is possible to determine the exact event and its location.

Contact: anurag.k.srivastava@wsu.edu
