Predicting Electricity Distribution Feeder Failures using Machine Learning
Marta Arias (1), Hila Becker (1,2). (1) Center for Computational Learning Systems, (2) Computer Science, Columbia University. LEARNING '06

Overview of the Talk: Introduction to the Electricity Distribution Network of New York City (what are we doing and why?); early solution using MartiRank, a boosting-like algorithm for ranking; current solution using online learning; related projects.

The Electrical System: 1. Generation, 2. Transmission, 3. Primary Distribution, 4. Secondary Distribution.

Electricity Distribution: Feeders

Problem: Distribution feeder failures result in automatic feeder shutdowns, called "Open Autos" or O/As. O/As stress networks, control centers, and field crews, and they are expensive (millions of dollars annually). Proactive replacement is much cheaper and safer than reactive repair.

Our Solution: Machine Learning. Leverage Con Edison's domain knowledge and resources, and learn to rank feeders by their susceptibility to failure. How? Assemble data, train a model on past data, and re-rank frequently by applying the model to current data.

New York City

Some facts about feeders and failures: about 950 feeders in total, of which 568 are in Manhattan, 164 in Brooklyn, 115 in Queens, and 94 in the Bronx.

Some facts about feeders and failures: about 60% of feeders failed at least once, and on average a feeder failed 4.4 times (between June 2005 and August 2006).

Some facts about feeders and failures mostly 0-5 failures per day more in the summer strong seasonality effects

Feeder data. Static data: compositional/structural and electrical attributes. Dynamic data: outage history (updated daily) and load measurements (updated every 5 minutes). Roughly 200 attributes per feeder, and new ones are still being added.

Feeder Ranking Application. Goal: rank feeders according to likelihood of failure, with high-risk feeders placed near the top. The application needs to integrate all types of data and to react and adapt to incoming dynamic data; hence, the feeder ranking is updated every 15 minutes.

Application Structure (diagram): static data, outage data, transformer stress data, and feeder load data feed a SQL Server DB; the ML Engine trains ML Models that produce Rankings; the Decision Support App (GUI, Action Driver, Action Tracker) consumes the rankings.

Goal: rank feeders according to likelihood of failure.

Overview of the Talk: Introduction to the Electricity Distribution Network of New York City (what are we doing and why?); early solution using MartiRank, a boosting-like algorithm for ranking (pseudo ROC and pseudo AUC, MartiRank, performance metric, early results); current solution using online learning; related projects.

(pseudo) ROC (figure): feeders sorted by score, plotted against the outages they account for.

(pseudo) ROC (figure): number of outages vs. number of feeders.

(pseudo) ROC (figure): fraction of outages vs. fraction of feeders, both axes running from 0 to 1; the summary statistic is the area under the ROC curve.

Some observations about the (p)ROC: it is adapted to positive outage counts as labels (not just 0/1), and the best pAUC is not always 1 (in fact it almost never is). For example, a ranking may achieve pAUC = 11/15 = 0.73 while the best pAUC attainable on the same data is 14/15 = 0.93.
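The slides do not spell out the pAUC formula, but one reading consistent with the 11/15 and 14/15 example above is a step curve whose area is the sum of cumulative outage counts at each list position, divided by (number of feeders × total outages). A minimal sketch under that assumption (function and variable names are illustrative):

```python
import numpy as np

def pauc(outages_in_ranked_order):
    """pAUC for outage-count labels under the step-curve reading above:
    sum of cumulative outages at each list position, normalized by N * total."""
    cum = np.cumsum(outages_in_ranked_order)
    n, total = len(cum), cum[-1]
    return cum.sum() / (n * total) if total else 1.0

# 5 feeders, 3 outages split 2 and 1 over two feeders:
print(pauc([2, 1, 0, 0, 0]))  # best achievable ordering: 14/15 = 0.93
print(pauc([1, 0, 2, 0, 0]))  # a weaker ordering:        11/15 = 0.73
```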

MartiRank: a boosting-like algorithm by [Long & Servedio, 2005], adapted to ranking. It is greedy and maximizes pAUC at each round. The weak learners are sorting rules: each attribute is a sorting rule, and attributes are numerical only (a categorical attribute is converted to a 0/1 indicator vector).

MartiRank rounds: the feeder list begins in random order; sort the list by the "best" variable; divide the list in two so the outages are split evenly, choose a separate "best" variable for each part, and sort within each part; divide the list in three, again splitting outages evenly, and sort each part by its own best variable; continue in this way (see the sketch below).
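A minimal sketch of this round structure, reusing the step-curve pAUC from the earlier sketch; the published MartiRank (Long & Servedio, 2005) differs in details such as tie handling, sort direction, and how the best variable is scored, so treat this as illustrative only:

```python
import numpy as np

def pauc(outages):
    """Same step-curve pAUC as in the earlier sketch."""
    cum = np.cumsum(outages)
    total = cum[-1]
    return cum.sum() / (len(outages) * total) if total else 1.0

def best_sort(X_seg, outages_seg):
    """Permutation of this segment given by the attribute whose descending
    sort maximizes pAUC on the segment."""
    best_score, best_perm = -1.0, np.arange(len(outages_seg))
    for col in range(X_seg.shape[1]):
        perm = np.argsort(-X_seg[:, col], kind="stable")
        score = pauc(outages_seg[perm])
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm

def martirank_sketch(X, outages, n_rounds=4):
    """Round r cuts the current list into r segments holding roughly equal
    numbers of outages, then re-sorts each segment by its own best attribute."""
    X, outages = np.asarray(X, dtype=float), np.asarray(outages)
    order = np.random.permutation(len(outages))          # start from a random order
    for r in range(1, n_rounds + 1):
        y = outages[order]
        cum = np.cumsum(y)
        cuts = [int(np.searchsorted(cum, cum[-1] * k / r)) for k in range(1, r)]
        bounds = [0] + cuts + [len(y)]
        new_order = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            if hi > lo:
                perm = best_sort(X[order[lo:hi]], y[lo:hi])
                new_order.extend(order[lo:hi][perm])
        order = np.asarray(new_order)
    return order   # feeder indices, most to least susceptible under this sketch
```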

MartiRank. Advantages: fast, easy to implement, interpretable, and only one tuning parameter (the number of rounds). Disadvantages: one tuning parameter (the number of rounds), which was set to 4 manually.

Using MartiRank for real-time ranking of feeders. MartiRank is a "batch" algorithm, so it must cope with a changing system by: continually generating new datasets from the latest data (use data within a window and aggregate the dynamic data within that period in various ways: quantiles, counts, sums, averages, etc.); re-training a new model and throwing out the old one (seasonality effects are not taken into account); and using the newest model to generate the ranking. This requires implementing "training strategies": re-train daily, weekly, every 2 weeks, monthly, and so on. A windowed-aggregation sketch follows.
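One way the windowed aggregation could look in practice is sketched below; the table and column names (feeder_id, timestamp, load_mw) are hypothetical stand-ins, not the actual Con Edison schema:

```python
import pandas as pd

def build_training_window(static_df, outage_df, load_df, end, days=14):
    """Sketch of one windowed training snapshot. Table and column names
    (feeder_id, timestamp, load_mw, ...) are illustrative, not the real schema."""
    start = end - pd.Timedelta(days=days)
    out_w = outage_df[(outage_df.timestamp >= start) & (outage_df.timestamp < end)]
    load_w = load_df[(load_df.timestamp >= start) & (load_df.timestamp < end)]

    # Aggregate the dynamic data over the window: counts, sums, averages, quantiles.
    outage_feats = out_w.groupby("feeder_id").size().rename("outage_count")
    load_feats = load_w.groupby("feeder_id")["load_mw"].agg(
        load_mean="mean",
        load_sum="sum",
        load_p95=lambda s: s.quantile(0.95),
    )

    # Join the window aggregates onto the (roughly 200) static attributes.
    return (static_df.set_index("feeder_id")
            .join(outage_feats)
            .join(load_feats)
            .fillna(0))
```

A training strategy would then rebuild such a snapshot daily, weekly, or every two weeks, fit a fresh MartiRank model on it, and discard the old model.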

Performance Metric: normalized average rank of failed feeders. It is closely related to the (pseudo) Area Under the ROC Curve when labels are 0/1: avgRank = pAUC + 1/(number of examples). Essentially, the difference comes from the pAUC being 0-based while ranks are 1-based.
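A small sketch of this metric under one convention consistent with the stated relation (ranks counted so that a failure near the top of the list contributes a value near 1, i.e. higher is better); the deployed system's exact convention is not spelled out on the slide, so this is an assumption:

```python
def normalized_avg_rank(ranking, failed):
    """Normalized average rank of the failed feeders.
    Assumed convention: the top feeder gets rank N and the bottom gets rank 1,
    so values near 1 mean the failures sat near the top of the ranking."""
    n = len(ranking)
    rank = {feeder: n - i for i, feeder in enumerate(ranking)}  # top -> n, bottom -> 1
    return sum(rank[f] for f in failed) / (len(failed) * n)

# 4 feeders, the single failure was ranked 2nd from the top:
print(normalized_avg_rank(["f1", "f2", "f3", "f4"], ["f2"]))    # 0.75
```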

Performance Metric Example (figure): a ranking with its outages marked; pAUC = 17/24 ≈ 0.7.

How to measure performance over time: every ~15 minutes, generate a new ranking based on the current model and the latest data; whenever there is a failure, look up its rank in the latest ranking generated before the failure; after a whole day, compute the normalized average rank.

MartiRank Comparison: training every 2 weeks

Using MartiRank for real-time ranking of feeders. MartiRank seems to work well, but the user decides when to re-train, how much data to use for re-training, and other things such as setting parameters and selecting algorithms. We want to make the system 100% automatic. Idea: still use MartiRank, since it works well with this data, but keep and re-use all models.

Overview of the Talk: Introduction to the Electricity Distribution Network of New York City (what are we doing and why?); early solution using MartiRank, a boosting-like algorithm for ranking; current solution using online learning (overview of learning from expert advice and the Weighted Majority Algorithm, new challenges in our setting and our solution, results); related projects.

Learning from expert advice: consider each model as an expert. Each expert has an associated weight (or score): experts are rewarded or penalized for good or bad predictions, and the weight is a measure of confidence in the expert's prediction. Predict using a weighted average of the top-scoring experts.

Learning from expert advice. Advantages: fully automatic (no human intervention needed); adaptive (changes in the system are learned as it runs); can use many types of underlying learning algorithms; good performance guarantees from learning theory (performance is never too far off from the best expert in hindsight). Disadvantages: computational cost (need to track many models "in parallel"), and the models are harder to interpret.

Weighted Majority Algorithm [Littlestone & Warmuth '88], introduced for binary classification. Experts make predictions in [0,1] and incur losses in [0,1]. Pseudocode: the main parameter is the learning rate β in (0,1]; there are N experts, each with initial weight 1; for t = 1, 2, 3, ...: predict using the weighted average of the experts' predictions; obtain the "true" label, with each expert i incurring loss l_i; update the experts' weights using w_{i,t+1} = w_{i,t} · β^{l_i}.
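A minimal sketch of that pseudocode (the absolute loss and all names are illustrative choices, not the deployed system's API):

```python
import numpy as np

def weighted_majority(experts, stream, beta=0.5):
    """Minimal Weighted Majority sketch following the slide's pseudocode.
    experts: list of callables x -> prediction in [0, 1]
    stream:  iterable of (x, true_label) pairs, true_label in [0, 1]
    beta:    learning rate in (0, 1]"""
    w = np.ones(len(experts))
    combined = []
    for x, y in stream:
        preds = np.array([expert(x) for expert in experts])
        combined.append(float(w @ preds / w.sum()))  # weighted-average prediction
        losses = np.abs(preds - y)                   # each expert's loss l_i in [0, 1]
        w *= beta ** losses                          # w_{i,t+1} = w_{i,t} * beta**l_i
    return combined
```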

In our case, can’t use WM directly Use ranking as opposed to binary classification More importantly, do not have a fixed set of experts

Dealing with ranking vs. binary classification: the ranking loss is the normalized average rank of failures, as seen before, so the loss is in [0,1]; to combine rankings, use a weighted average of the feeders' ranks (sketched below).
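A sketch of one such weighted-average combination; the function name and the assumption that every expert ranks the same set of feeders are illustrative:

```python
def combine_rankings(rankings, weights):
    """Merge experts' rankings by the weighted average of the position each
    expert assigns to each feeder. All rankings are assumed to list the same
    feeders, best first."""
    total_w = sum(weights)
    avg_pos = {feeder: 0.0 for feeder in rankings[0]}
    for ranking, w in zip(rankings, weights):
        for pos, feeder in enumerate(ranking):
            avg_pos[feeder] += w * pos / total_w
    # Lower weighted-average position = closer to the top of the combined ranking.
    return sorted(avg_pos, key=avg_pos.get)

# Two experts with weights 2 and 1:
print(combine_rankings([["a", "b", "c"], ["b", "a", "c"]], [2.0, 1.0]))  # ['a', 'b', 'c']
```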

Dealing with a moving set of experts: introduce new parameters B, the "budget" (maximum number of models), set to 100; p, the new models' weight percentile, in [0,100]; and α, the age penalty, in (0,1]. When new models are trained, they are added to the set with a weight corresponding to the p-th percentile of the current weights. If there are too many models (more than B), drop the models with a poor q-score, where q_i = w_i · α^{age_i}; that is, α is the rate of exponential decay.
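A sketch of this bookkeeping under the parameter names used above; the percentile initialization of the very first model and the drop-one-per-addition policy are assumptions filled in for illustration:

```python
import numpy as np

def add_model_with_budget(weights, ages, p=50, budget=100, alpha=0.9):
    """Sketch of the slide's bookkeeping for a moving set of experts.
    weights/ages are parallel lists for the current models; the new model
    enters at the p-th percentile of the current weights, and if the budget B
    is exceeded, the model with the worst q-score q_i = w_i * alpha**age_i
    is dropped. Function and argument names are illustrative."""
    new_w = float(np.percentile(weights, p)) if weights else 1.0
    weights, ages = weights + [new_w], ages + [0]
    if len(weights) > budget:
        q = [w * alpha ** a for w, a in zip(weights, ages)]
        worst = int(np.argmin(q))
        weights.pop(worst)
        ages.pop(worst)
    return weights, ages
```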

Other parameters. How often do we train and add new models? This was hand-tuned over the course of the summer to every 7 days, which seems to strike a balance between generating new models to adapt to changing conditions and not overflowing the system; alternatively, one could re-train when observed performance drops, but this is not used yet. How much data do we use to train models? Based on observed performance and early experiments: one week's worth of data and two weeks' worth of data.

Performance

Failures’ rank distribution

Daily average rank of failures

Other things that I have not talked about but that took a significant amount of time. Data: it is spread over many repositories, so it was difficult to identify useful data and to arrange access to it; the volume of data (gigabytes accumulated daily) required an optimized database layout and the addition of a preprocessing stage; and we had to gain an understanding of the data semantics. Software engineering (this is a deployed application).

Current Status. Summer 2006: the system has been debugged, fine-tuned, tested, and deployed. It is now fully operational and ready to be used next summer (in test mode). After this summer, we are going to do systematic studies of parameter sensitivity and comparisons to other approaches.

Related work in progress. Online learning: fancier weight updates with better guaranteed performance in "changing environments", and exploring "direct" online ranking strategies (e.g., the ranking perceptron). Data-mining project: aims to exploit seasonality by learning a "mapping" from environmental conditions to the characteristics of well-performing experts, so that when the same conditions arise in the future the weights of experts with those characteristics can be increased; we hope to learn this as the system runs, continually updating the mappings. MartiRank: in the presence of repeated or missing values, sorting is non-deterministic and the pAUC takes different values depending on the permutation of the data; we use statistics of the pAUC to improve the basic learning algorithm, e.g. instead of a fixed input number of rounds, stop when the pAUC increase is not significant, and use better estimators of the pAUC that are not sensitive to permutations of the data.

Other related projects within the collaboration with Con Edison: finer-grained component analysis (ranking of transformers, ranking of cable sections, ranking of cable joints, and merging of all systems into one); mixing ML and survival analysis.

Acknowledgments. Con Edison: Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Charles Lawson, Frank Doherty, Arthur Kressner, Matt Sniffen, Elie Chebli, George Murray, Bill McGarrigle, and the Van Nest team. Columbia: CCLS: Wei Chu, Martin Jansche, Ansaf Salleb, Albert Boulanger, David Waltz, Philip M. Long (now at Google), Roger Anderson; Computer Science: Philip Gross, Rocco Servedio, Gail Kaiser, Samit Jain, John Ioannidis, Sergey Sigelman, Luis Alonso, Joey Fortuna, Chris Murphy; Stats: Samantha Cook.