Data Mining, Distributed Computing and Event Detection at BPA

Presentation transcript:

Data Mining, Distributed Computing and Event Detection at BPA
Tony Faris
JSIS Meeting, October 2017

Traditional Data Mining
- Open individual files in chronological order
- Parse, process, compute on one file at a time
- Works well for batch processes of short duration
  - Post-event analysis, quasi-real-time computation
- Long-term analytics unrealistic
- Database extraction can be extremely slow
- PMU data is embarrassingly parallel

Distributed Storage at BPA
- Hadoop file structure
- Hierarchical Data Format, version 5 (HDF5)
  - Generic time-series information, with support for unlimited data types
  - Can store PMU, SCADA, oscillography, weather, etc. in the same archive with the same format
- Maintain one-minute file duration
- Built-in lossless compression
  - 20-25% on BPA PMU data
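The one-minute file duration can be illustrated with a minimal sketch of how timestamped samples might be grouped before each group is written to its own file. The function names and the sample values are hypothetical, and the sketch stops short of writing HDF5; it only shows the bucketing step, using the Python standard library.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_key(timestamp):
    """Truncate a datetime to the start of its minute."""
    return timestamp.replace(second=0, microsecond=0)


def group_by_minute(samples):
    """Group (timestamp, value) pairs into one bucket per minute."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[minute_key(timestamp)].append(value)
    return dict(buckets)


# Hypothetical frequency samples (Hz); the timestamps are made up for illustration.
samples = [
    (datetime(2017, 10, 1, 12, 0, 59, tzinfo=timezone.utc), 60.01),
    (datetime(2017, 10, 1, 12, 1, 0, tzinfo=timezone.utc), 60.00),
    (datetime(2017, 10, 1, 12, 1, 30, tzinfo=timezone.utc), 59.99),
]
buckets = group_by_minute(samples)
```

Each key in `buckets` would correspond to one file in the archive, so a query for a time range only needs to open the files whose minute falls inside that range.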

Distributed Computing at BPA
- Process multiple files in parallel on cluster
  - Single “master” node with backup (secondary), multiple “worker” nodes
- Apache Spark computing platform on Linux OS
  - Open source, community of users
- Initial data mining software written in Python, inherently supported by Spark
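Because each one-minute file can be processed independently, the work has the shape of a map over files. The sketch below illustrates that shape with a local thread pool from the Python standard library as a stand-in for a Spark cluster; it is not BPA's code. The per-file function, the file names, and the sample values are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def max_abs_deviation(samples, nominal=60.0):
    """Per-file work: largest absolute deviation from nominal frequency (Hz)."""
    return max(abs(s - nominal) for s in samples)


def process_files(files, workers=4):
    """Apply the per-file function to every file in parallel, keyed by file name."""
    names = sorted(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(max_abs_deviation, (files[n] for n in names)))
    return dict(zip(names, results))


# Hypothetical one-minute files, each holding a few frequency samples in Hz.
files = {
    "minute_00": [60.00, 60.01, 59.99],
    "minute_01": [60.00, 59.90, 59.95],
    "minute_02": [60.02, 60.00, 60.01],
}
summary = process_files(files)
worst_file = max(summary, key=summary.get)
```

No file's result depends on any other file, which is what makes the problem embarrassingly parallel. On a Spark cluster the same pattern would typically be expressed as a `map` over a distributed collection of file paths, with the worker nodes doing the per-file work.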

Current Implementation
- 12 compute nodes
- 9.6 TB SSDs per node (115 TB total)
- 10 Gbps local area network

Results

Data Mining Next Steps
- Integrate non-PMU data sets into HDF5
  - DFR, SCADA, weather: eliminate silos
- Three-year angle baselining with weather
- Sliding window algorithms
- Frequency event detection
- Integrate distributed MATLAB with Spark
- Transition full .pdat archive to distributed environment

Event Detection

Event Detection
- Develop platform for performing event detection
  - Frequency event detection as proof of concept
  - Modular software in MATLAB, adaptable for new algorithm development
- Access to multiple synchrophasor data sets
  - Internal BPA PMUs (redundant pairs) and WECC partner PMUs
- Compare results to algorithms running in operational environment
  - Goal: capture more events, improve performance in operations
- Flexibility for some false positives in development, iterative refining of parameters

Frequency Event Detection
Step 1: Identify periods of interest
- Compute maximum rate of change of frequency (ROCOF) per minute, per signal
- If ROCOF > threshold, pull data during period of interest
Step 2: Run event detection algorithm
- Calculate 30-second running average for each PMU
- Compare “current value” with average
- If difference > threshold, count = count + 1
- If count > threshold (minimum number of PMUs to detect event), flag as event
Step 3: Retrieve data for permanent storage
- Pull .pdat files around event (e.g., 5 minutes before, 10 minutes after) and store in separate archive for post-event analysis
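Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MATLAB implementation described in the slides: the function names, the threshold values, the 1 sample/s rate, and the synthetic data are all hypothetical, and the running average is taken over the 30 seconds preceding the current sample.

```python
def max_rocof(freq, rate_hz):
    """Step 1: largest absolute rate of change of frequency (Hz/s) in a block."""
    return max(abs(b - a) * rate_hz for a, b in zip(freq, freq[1:]))


def detect_events(signals, rate_hz, window_s=30, dev_threshold=0.03, min_pmus=2):
    """Step 2: return sample indices where more than min_pmus PMUs deviate
    from their trailing running average by more than dev_threshold (Hz)."""
    window = int(window_s * rate_hz)
    n = min(len(freq) for freq in signals.values())
    flagged = []
    for i in range(window, n):
        count = 0
        for freq in signals.values():
            average = sum(freq[i - window:i]) / window  # trailing 30 s average
            if abs(freq[i] - average) > dev_threshold:
                count += 1
        if count > min_pmus:
            flagged.append(i)
    return flagged


# Synthetic data: three PMUs at 1 sample/s see a 0.1 Hz drop at sample 40.
step = [60.0] * 40 + [59.9] * 5
signals = {"PMU_A": step, "PMU_B": list(step), "PMU_C": list(step)}

# Only run the detector if some signal exceeds the (hypothetical) ROCOF threshold.
period_of_interest = any(max_rocof(f, 1.0) > 0.05 for f in signals.values())
events = detect_events(signals, rate_hz=1.0) if period_of_interest else []
```

Requiring agreement from several PMUs is what keeps a single noisy or faulty measurement from being flagged as a system event. The average is recomputed from scratch at every sample here for readability; a production version would more likely maintain a rolling sum.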

Frequency Event Detection

Frequency Event Detection

Frequency Event Detection

Frequency Calculation
- PMU-reported frequency
- Two-point derivative of phase angle
- 9-point linear regression of phase angle (MATLAB)
  - angle(t-4) : angle(t+4)
  - Frq_calculated = slope of regression line
- 11-point linear regression of phase angle (MATLAB)
  - angle(t-5) : angle(t+5)
- Apply wrapping/unwrapping as necessary
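The regression approach can be sketched as follows. The slides state only that the calculated frequency is the slope of the regression line, so the rest is assumption: the phase angle is taken to be in degrees relative to a 60 Hz reference, and the slope is converted to Hz by dividing by 360 and adding the nominal frequency. The function names and the synthetic signal are hypothetical, and the sketch uses plain Python rather than MATLAB.

```python
def unwrap_degrees(angles):
    """Remove +/-360 degree jumps so the angle sequence is continuous."""
    out = [angles[0]]
    for a in angles[1:]:
        delta = (a - out[-1] + 180.0) % 360.0 - 180.0  # smallest signed step
        out.append(out[-1] + delta)
    return out


def regression_slope(t, y):
    """Least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def frequency_at(angles_deg, index, rate_hz, half_width=4, nominal_hz=60.0):
    """Frequency (Hz) at one sample from a (2 * half_width + 1)-point regression."""
    window = unwrap_degrees(angles_deg[index - half_width:index + half_width + 1])
    times = [k / rate_hz for k in range(len(window))]
    slope_deg_per_s = regression_slope(times, window)
    return nominal_hz + slope_deg_per_s / 360.0


# Synthetic 60.05 Hz signal at 60 samples/s: the angle advances 0.3 degrees per
# sample and wraps past +180 degrees inside the regression window.
angles = [(178.0 + 0.3 * k + 180.0) % 360.0 - 180.0 for k in range(20)]
f9 = frequency_at(angles, 10, rate_hz=60.0)                 # 9-point regression
f11 = frequency_at(angles, 10, rate_hz=60.0, half_width=5)  # 11-point regression
```

Unwrapping before the fit matters because a raw jump from +180 to -180 degrees would otherwise dominate the slope. A wider window (11 points instead of 9) smooths more noise at the cost of slower response to real frequency changes.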

Frequency Event Example

Frequency Event Example

Frequency Event Example

Event Detection Next Steps
- Experiment with parameter changes (window lengths, number of positive samples, thresholds, etc.)
  - Iterative tuning, settle on final parameters
- Expand algorithms to other measurements
  - Voltage, phase angle, etc.
- Combine PMU measurements with other data
  - Digital Fault Recordings, SCADA analogs and digitals, weather, etc.
- Automated post-processing of events, and correlation of similar event types

Contact
Tony Faris
Bonneville Power Administration, Measurement Systems
ajfaris@bpa.gov