
SyncAD: Ensemble-Based Data Mining Tool for Anomaly Detection in PMU Data and Event Detection
A. Srivastava, S. Pandey, P. Banerjee, Y. Wu
Smart Grid Demonstration and Research Investigation Lab (SGDRIL), Energy System Innovation Center, Washington State University
Contact: anurag.k.srivastava@wsu.edu
JSIS, October 2017

Synchrophasor Data Quality
Data quality issues may develop from:
- Dropouts/packet loss
- Latency
- Repeated values
- Measurement bias
- Noise in the measurement system
- Bad/missing timestamps
- Loss of GPS synchronization
- Incorrect signal metadata
- Poor server performance
- Improper device configurations
- Incorrect phase sequence
- Missing phase connection
- Cyber attack
How to detect? Anomaly detection.
Approaches to PMU data quality:
- Conformance testing: in-lab testing; remote testing after installation
- Statistical and data mining approaches: cleaning streaming data; cleaning archival data
- Physics-based bad data detection: hybrid or linear state estimation; substation-level state estimation

Options?
- Linear regression: find the straight line 𝑦 = 𝛼 + 𝛽𝑥 that provides the "best" fit for the data points in the least-squares sense.
- Chebyshev method: determine a lower bound on the percentage of data that lies within k standard deviations of the mean; Chebyshev's theorem guarantees at least 1 − 1/k² of any distribution lies within k standard deviations (μ: mean, σ: standard deviation, k: number of standard deviations from the mean).
Reference: Amidan, Brett G., Thomas A. Ferryman, and Scott K. Cooley. "Data outlier detection using the Chebyshev theorem." 2005 IEEE Aerospace Conference.
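As a concrete illustration, here is a minimal single-pass sketch of the Chebyshev screen (the function name and the choice k = 3 are illustrative; Amidan et al. actually use a two-stage procedure):

```python
import numpy as np

def chebyshev_outliers(window, k=3.0):
    """Flag samples more than k standard deviations from the window mean.

    By Chebyshev's theorem at least 1 - 1/k**2 of any distribution lies
    within k standard deviations of the mean, so for k = 3 at most ~11%
    of clean data can fall outside the band mu +/- k*sigma.
    """
    mu, sigma = np.mean(window), np.std(window)
    return np.abs(window - mu) > k * sigma

# Example: a 30-sample voltage-magnitude window with one spike.
rng = np.random.default_rng(0)
v = np.r_[rng.normal(1.0, 0.001, 29), 1.05]
print(np.flatnonzero(chebyshev_outliers(v)))   # -> [29]
```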

Does a standalone method suffice? DBSCAN
- DBSCAN uses two thresholds: a radius ε and a minimum neighbor count min.
- A data point is a core point if it has more than min ε-neighbors (points within distance ε).
- Two core points are reachable if each lies in the other's ε-neighborhood; a cluster is a sequence of reachable core points together with their ε-neighbors.
- During an event, data points move out of reach of the current cluster, i.e. farther than ε from its boundary points; a new cluster forms after the event ends. Points far from any cluster are outliers.
(Figure: current vs. time, showing the cluster change during an event.)
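A hedged sketch of this density test using scikit-learn's DBSCAN implementation (the eps and min_pts values are illustrative, not the tool's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_outliers(window, eps=0.002, min_pts=5):
    """Label each sample of a 1-D measurement window.

    DBSCAN marks points that have fewer than min_pts neighbors within
    distance eps (and are not reachable from any core point) as noise,
    which it reports with the label -1.
    """
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(
        window.reshape(-1, 1))
    return labels == -1            # True where the point is an outlier

# Example: steady-state voltage samples plus two stray points.
rng = np.random.default_rng(0)
v = np.r_[rng.normal(1.0, 0.0005, 60), 1.03, 0.97]
print(np.flatnonzero(dbscan_outliers(v)))      # -> [60 61]
```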

No Single Winner! Each standalone method suffers from a lack of training data and needs per-method tuning effort.

Anomaly Detection with Ensemble
A data window X from the PMU/PDC flows through six stages:
1. Base detectors (online), learning on data X: regression, Chebyshev, and DBSCAN produce outlier scores fi, fj, fk (detectors D1, D2, D3).
2. Normalization of the base detector scores into FNormalized (see the sketch below).
3. MLE ensemble: fit the model YMLE(α, β).
4. Inference algorithm, plus detection of transient windows using Prony analysis.
5. Unflagging of anomalies detected inside a transient window.
6. Bad data detected.
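Stage 2 puts the three detectors' raw scores on a common scale before they are combined. The slide does not state the exact scheme, so the sketch below assumes simple per-detector min-max scaling:

```python
import numpy as np

def normalize_scores(raw):
    """Min-max scale each detector's outlier scores to [0, 1].

    raw: array of shape (n_detectors, n_samples), one row per base
    detector (regression residual, Chebyshev distance, DBSCAN flag).
    """
    lo = raw.min(axis=1, keepdims=True)
    hi = raw.max(axis=1, keepdims=True)
    return (raw - lo) / np.where(hi > lo, hi - lo, 1.0)

# Three detectors' raw scores over a five-sample window.
f = np.array([[0.1, 0.2, 0.1, 2.0, 0.1],    # regression residuals
              [0.5, 0.4, 0.6, 4.0, 0.5],    # Chebyshev distances
              [0.0, 0.0, 0.0, 1.0, 0.0]])   # DBSCAN noise flags
print(normalize_scores(f).round(2))
```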

Maximum Likelihood Estimator (MLE)
- No single winner → ensemble-based combination of detectors.
- Tuning effort needed → learn the best integration of their scores.
- Lack of training data → unsupervised detection.
Definitions: sensitivity Ψ is the fraction of correctly identified outliers; specificity Ƞ is the fraction of correctly identified non-outliers.
MLE ensemble: from the data set X and the normalized scores FNormalized, compute sensitivity Ψ and specificity Ƞ, then learn the weights α and β by fitting YMLE with the EM algorithm; the output is the final learned weights α, β.
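A minimal sketch of this style of unsupervised EM fit, written from the description above rather than from the authors' code (it mirrors the classic Dawid-Skene estimator): each base detector casts a 0/1 vote per sample, the latent variable is whether the sample is truly an outlier, and the sensitivities and specificities are re-estimated until convergence.

```python
import numpy as np

def em_ensemble(votes, n_iter=50):
    """Learn sensitivity (psi) / specificity (eta) per detector with EM.

    votes: (n_detectors, n_samples) array of 0/1 outlier votes.
    Returns psi, eta, and the posterior P(outlier) for every sample.
    """
    post = votes.mean(axis=0)                      # init from majority evidence
    for _ in range(n_iter):
        pi = np.clip(post.mean(), 1e-6, 1 - 1e-6)  # prior P(outlier)
        # M-step: re-estimate each detector's reliability.
        psi = (votes * post).sum(axis=1) / max(post.sum(), 1e-9)
        eta = ((1 - votes) * (1 - post)).sum(axis=1) / max((1 - post).sum(), 1e-9)
        psi = np.clip(psi, 1e-6, 1 - 1e-6)
        eta = np.clip(eta, 1e-6, 1 - 1e-6)
        # E-step: posterior P(outlier | votes) for every sample.
        log_out = (np.log(pi) + votes.T @ np.log(psi)
                   + (1 - votes.T) @ np.log(1 - psi))
        log_in = (np.log(1 - pi) + votes.T @ np.log(1 - eta)
                  + (1 - votes.T) @ np.log(eta))
        post = 1.0 / (1.0 + np.exp(log_in - log_out))
    return psi, eta, post
```

Here psi and eta play the role of the learned reliability weights (the α, β of the slide), and the fit needs no labelled data, which addresses the lack of training data.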

Inference and Anomaly Detection
After the MLE-ensemble step, the weight of each base detector has been learned, giving the model YMLE with parameters α, β. For a new data set, the inference algorithm combines these learned weights with the normalized base detector scores FNormalized to decide which points are bad data and labels them as outliers.
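With the model frozen, inference on a new window reduces to a single E-step. A sketch reusing psi, eta from the EM sketch above (the prior pi and the 0.5 threshold are assumptions):

```python
import numpy as np

def infer_outliers(votes, psi, eta, pi=0.05, threshold=0.5):
    """Label a new window's samples using the frozen ensemble model.

    votes: (n_detectors, n_samples) 0/1 votes from the base detectors;
    psi/eta: learned sensitivity/specificity; pi: prior outlier rate.
    """
    log_out = (np.log(pi) + votes.T @ np.log(psi)
               + (1 - votes.T) @ np.log(1 - psi))
    log_in = (np.log(1 - pi) + votes.T @ np.log(1 - eta)
              + (1 - votes.T) @ np.log(eta))
    post = 1.0 / (1.0 + np.exp(log_in - log_out))
    return post > threshold        # True where the sample is flagged bad
```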

Performance Metrics for the Ensemble-Based Technique
Given a detector D and PMU data X, denote the actual anomaly set as 𝐵𝑇 and the anomalies reported by D as 𝐵𝐷. The performance of D is evaluated using three metrics:
- Precision measures the fraction of true anomalies among those reported by D: Precision = |𝐵𝑇 ∩ 𝐵𝐷| / |𝐵𝐷|.
- Recall measures the ability of D to find all outliers: Recall = |𝐵𝑇 ∩ 𝐵𝐷| / |𝐵𝑇|.
- False positive (FP) measures the fraction of reported anomalies that are not true anomalies (the smaller, the better): FP = |𝐵𝐷 \ 𝐵𝑇| / |𝐵𝐷| = 1 − Precision.
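For illustration, treating 𝐵𝑇 and 𝐵𝐷 as sets of sample indices, the three metrics can be computed as follows:

```python
def detector_metrics(b_true, b_detected):
    """Precision, recall, and false-positive fraction from index sets."""
    hits = len(b_true & b_detected)          # correctly reported anomalies
    precision = hits / len(b_detected)
    recall = hits / len(b_true)
    fp = 1.0 - precision                     # reported points that are clean
    return precision, recall, fp

b_t = {3, 17, 42, 99}                        # actual anomalies (ground truth)
b_d = {3, 17, 42, 64}                        # anomalies reported by D
print(detector_metrics(b_t, b_d))            # -> (0.75, 0.75, 0.25)
```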

Simulation Results for SyncAD
Tests on RTDS-simulated PMU data (1.5 hours).

Test 1 (5% bad data points, 5%-10% deviation range):
Method              Recall    Precision   False positive
Linear Regression   0.9021    0.8565      0.1435
DBSCAN                        0.8821      0.1179
Chebyshev           0.9154    0.8754      0.1246
MLE ensemble        0.9251    0.8913      0.1087

Test 2 (10% bad data points, 10%-20% deviation range):
Method              Recall    Precision   False positive
Linear Regression   0.7854    0.7655      0.2345
DBSCAN              0.7216    0.7015      0.2985
Chebyshev           0.8125    0.7542      0.2458
MLE ensemble        0.8912    0.9021      0.0979

Results with SyncAD using Real PMU Data

Synchrophasor Anomaly Detection (SyncAD) tool

Flowchart for the proposed technique

Architecture of the Event Detection Technique
- A data window X from SyncAD supplies V, I, and frequency Fz; the active and reactive power flows P and Q are computed from it.
- Level 1: the DBSCAN algorithm checks each channel for a cluster change. If no change is detected, the next data window is fetched.
- If a change is detected, the cluster-change instances in V, I, P, Q, and Fz are collected.
- Level 2: a decision tree classifies the collected instances as an active power event, a reactive power event, or a fault; windows matching no class remain undetected events.
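A structural sketch of the two-level pipeline (the channel thresholds, the change-pattern features, and the toy training patterns are assumptions for illustration; the slide does not specify them):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.tree import DecisionTreeClassifier

CHANNELS = ("V", "I", "P", "Q", "Fz")
EPS = {"V": 0.002, "I": 0.05, "P": 0.05, "Q": 0.05, "Fz": 0.01}  # illustrative

def cluster_change(signal, eps, min_pts=5):
    """Level 1: True if DBSCAN sees more than one cluster or noise points."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(
        np.asarray(signal).reshape(-1, 1))
    return len(set(labels)) > 1

# Level 2: toy decision tree mapping channel-change patterns to classes.
patterns = np.array([[0, 0, 1, 0, 0],       # P changed          -> active
                     [0, 0, 0, 1, 0],       # Q changed          -> reactive
                     [1, 1, 1, 1, 1]])      # all channels moved -> fault
tree = DecisionTreeClassifier().fit(patterns, ["active", "reactive", "fault"])

def detect_event(window):
    """window: dict mapping each channel name to its 1-D sample array."""
    changed = [cluster_change(window[ch], EPS[ch]) for ch in CHANNELS]
    if not any(changed):
        return None                          # no cluster change: next window
    return tree.predict(np.array(changed, float).reshape(1, -1))[0]
```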

Simulation Results for Event Detection
PMU data was obtained from RTDS. PMUs were placed on Bus 6 (line 6-11), Bus 8 (line 8-load), Bus 9 (line 9-7), Bus 10 (line 9-10), and Bus 2 (line 2-Gen). Currents through lines 6-11, 8-load, 9-7, 10-9, and 2-Gen were observed. A capacitor bank was placed on bus 9, a three-phase fault was simulated on bus 10, and a load change (P, Q) was applied on bus 8.

Results on Industry Data

S.No.  Time (sec)  Active Event (PMU No.)  Reactive Event (PMU No.)  Fault Event (PMU No.)
1      44                                  5
2      111
3      385                                                           1,2,3,4,5
4      465         2,5
5      471
6      477
7      558
8      638

The fault event that the Western Interconnection had labelled as a frequency event was detected. The algorithm also detected some local active and reactive power events, which can be seen in the graphs and are backed up by the RTDS simulation results.

Summary
- The MLE-ensemble technique yields high recall for bad data detection.
- The SyncAD tool successfully differentiates between bad data and events, resulting in high precision.
- The event detection algorithm detected the simulated events and also worked well on industry data.
- The DBSCAN and decision-tree based algorithm is fast enough for real-time use.
- The event detection algorithm was applied to labelled event data.
- The exact location and type of an event can be determined if the topology is known and PMUs are placed to ensure full observability.

Contact: anurag.k.srivastava@wsu.edu