Data Mining, Distributed Computing and Event Detection at BPA
Tony Faris
JSIS Meeting
October 2017
Traditional Data Mining
- Open individual files in chronological order
- Parse, process, compute on one file at a time
- Works well for batch processes of short duration
  - Post-event analysis, quasi-real-time computation
- Long-term analytics unrealistic
  - Database extraction can be extremely slow
- PMU data is embarrassingly parallel
Distributed Storage at BPA
- Hadoop file structure
- Hierarchical data format, version 5 (HDF5)
  - Generic time-series information – support for unlimited data types
  - Can store PMU, SCADA, oscillography, weather, etc. in the same archive with the same format
  - Maintain one-minute file duration
  - Built-in lossless compression: 20-25% on BPA PMU data
Distributed Computing at BPA
- Process multiple files in parallel on a cluster
  - Single “master” node with backup (secondary), multiple “worker” nodes
- Apache Spark computing platform on Linux OS
  - Open source, community of users
- Initial data mining software written in Python, inherently supported by Spark
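The per-file parallelism described above can be illustrated with a small map-then-reduce sketch. This is not Spark code: a standard-library thread pool stands in for the cluster's worker nodes, and the file names, the in-memory `files` dictionary, and the functions `summarize_minute` and `summarize_archive` are hypothetical names chosen for illustration. The point is only that each one-minute file can be summarized independently and the summaries combined afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_minute(item):
    """Map step: reduce one one-minute file to (name, min, max) frequency."""
    name, freq_hz = item
    return name, min(freq_hz), max(freq_hz)


def summarize_archive(files, workers=4):
    """Summarize every file independently, then combine the results.

    `files` maps a (hypothetical) file name to its list of frequency
    samples; a real job would read each file from distributed storage.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results stay chronological
        # as long as the file names sort chronologically.
        per_file = list(pool.map(summarize_minute, sorted(files.items())))
    # Reduce step: combine the independent per-file summaries.
    overall_min = min(lo for _, lo, _ in per_file)
    overall_max = max(hi for _, _, hi in per_file)
    return per_file, (overall_min, overall_max)


if __name__ == "__main__":
    demo = {
        "minute_0001": [60.00, 60.01, 60.00],
        "minute_0002": [59.98, 60.00, 60.02],
    }
    print(summarize_archive(demo))
```

Because no file depends on any other, the same map function could in principle be handed to a cluster framework unchanged; only the scheduling layer differs.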
Current Implementation
- 12 compute nodes
- 9.6 TB SSDs per node (115 TB total)
- 10 Gbps local area network
Results
Data Mining Next Steps
- Integrate non-PMU data sets into HDF5
  - DFR, SCADA, weather – eliminate silos
- Three-year angle baselining with weather
- Sliding window algorithms
  - Frequency event detection
- Integrate distributed MATLAB with Spark
- Transition full .pdat archive to distributed environment
Event Detection
Event Detection
- Develop platform for performing event detection
  - Frequency event detection as proof of concept
  - Modular software in MATLAB, adaptable for new algorithm development
- Access to multiple synchrophasor data sets
  - Internal BPA PMUs (redundant pairs) and WECC partner PMUs
- Compare results to algorithms running in the operational environment
  - Goal: capture more events, improve performance in operations
  - Flexibility for some false positives in development, iterative refining of parameters
Frequency Event Detection
- Step 1: Identify periods of interest
  - Compute maximum ROCOF per minute, per signal
  - If ROCOF > threshold, pull data during period of interest
- Step 2: Run event detection algorithm
  - Calculate 30-second running average for each PMU
  - Compare “current value” with average
  - If difference > threshold, count = count + 1
  - If count > threshold (minimum number of PMUs to detect event), flag as event
- Step 3: Retrieve data for permanent storage
  - Pull .pdat files around event (e.g., 5 minutes before, 10 minutes after) and store in separate archive for post-event analysis
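The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BPA's implementation (which the slides describe as MATLAB): ROCOF is approximated by a two-sample difference, the running average covers the 30 seconds before the current sample, and every function name, parameter name, and default threshold here is hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def max_rocof(freq_hz, sample_rate_hz):
    """Largest absolute rate of change of frequency (Hz/s) in one signal."""
    return max(
        (abs(b - a) * sample_rate_hz for a, b in zip(freq_hz, freq_hz[1:])),
        default=0.0,
    )


def periods_of_interest(minutes, sample_rate_hz, rocof_threshold):
    """Step 1: indices of minutes where any signal exceeds the ROCOF threshold.

    `minutes` is a list of {pmu_name: [frequency samples]} dicts, one per minute.
    """
    return [
        i for i, signals in enumerate(minutes)
        if any(max_rocof(f, sample_rate_hz) > rocof_threshold
               for f in signals.values())
    ]


def detect_events(signals, sample_rate_hz, window_s=30.0,
                  deviation_hz=0.03, pmu_count_threshold=2):
    """Step 2: sample indices at which enough PMUs leave their running average.

    The numeric defaults are placeholders, not values from the slides.
    """
    window = int(window_s * sample_rate_hz)
    n = min(len(f) for f in signals.values())
    history = {name: deque(maxlen=window) for name in signals}
    sums = {name: 0.0 for name in signals}
    flagged = []
    for t in range(n):
        count = 0
        for name, freq in signals.items():
            h = history[name]
            if len(h) == window:
                average = sums[name] / window
                if abs(freq[t] - average) > deviation_hz:
                    count += 1
                sums[name] -= h[0]  # oldest sample is dropped by append below
            h.append(freq[t])
            sums[name] += freq[t]
        if count > pmu_count_threshold:
            flagged.append(t)
    return flagged


def files_to_archive(event_time, before_min=5, after_min=10):
    """Step 3: start times of the one-minute files to copy around an event."""
    start = event_time.replace(second=0, microsecond=0) - timedelta(minutes=before_min)
    return [start + timedelta(minutes=i) for i in range(before_min + after_min + 1)]
```

The window lengths, deviation threshold, and PMU count are exactly the parameters the slides propose to tune iteratively, so they are exposed as arguments rather than fixed.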
Frequency Calculation
- PMU-reported frequency
- Two-point derivative of phase angle
- 9-point linear regression of phase angle (MATLAB)
  - angle_{t-4} : angle_{t+4}
  - Frq_calculated = slope of regression line
- 11-point linear regression of phase angle (MATLAB)
  - angle_{t-5} : angle_{t+5}
- Apply wrapping/unwrapping as necessary
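The regression-based calculation above can be sketched in Python as a centered least-squares fit over unwrapped phase angle. The assumptions are mine rather than the slides': angles are in degrees and measured against a nominal 60 Hz reference, so a slope in degrees per second is divided by 360 and added to the nominal frequency. The function names and the `half_width` parameter (4 for the 9-point fit, 5 for the 11-point fit) are hypothetical.

```python
def unwrap_degrees(angles_deg):
    """Remove +/-360 degree jumps so the angle is continuous in time."""
    if not angles_deg:
        return []
    out = [angles_deg[0]]
    for a in angles_deg[1:]:
        # Smallest signed step from the previous unwrapped value.
        step = (a - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out


def regression_slope(values, dt):
    """Least-squares slope of evenly spaced samples, in units per second."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den / dt


def frequency_from_angle(angles_deg, sample_rate_hz, half_width=4,
                         nominal_hz=60.0):
    """Frequency estimate at each sample that has a full centered window.

    half_width=4 uses angle[t-4] .. angle[t+4] (9 points);
    half_width=5 gives the 11-point variant.
    """
    unwrapped = unwrap_degrees(angles_deg)
    dt = 1.0 / sample_rate_hz
    freqs = []
    for t in range(half_width, len(unwrapped) - half_width):
        window = unwrapped[t - half_width:t + half_width + 1]
        slope_deg_per_s = regression_slope(window, dt)
        freqs.append(nominal_hz + slope_deg_per_s / 360.0)
    return freqs
```

Setting `half_width` to a larger value smooths measurement noise at the cost of time resolution, which is the trade-off between the 9-point and 11-point fits.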
Frequency Event Example
Event Detection Next Steps
- Experiment with parameter changes (window lengths, number of positive samples, thresholds, etc.)
  - Iterative tuning, settle on final parameters
- Expand algorithms to other measurements
  - Voltage, phase angle, etc.
- Combine PMU measurements with other data
  - Digital fault recordings, SCADA analogs and digitals, weather, etc.
- Automated post-processing of events, and correlation of similar event types
Contact
Tony Faris
Bonneville Power Administration
Measurement Systems
ajfaris@bpa.gov