Danielle Dean (Microsoft), Data Science Lead @danielleodean

Presentation transcript:

Evaluating models for a needle in a haystack: Applications in predictive maintenance
Danielle Dean (Microsoft), Data Science Lead @danielleodean
Shaheen Gauher (Microsoft), Data Scientist @Shaheen_Gauher

Outline
  Predictive Maintenance Use Cases
  Data Science Process
  Modeling & Evaluation
    Random Guess, Weighted Guess (by distribution, by threshold), Majority Class
    Cost factor

Predictive Maintenance Concepts
Predictive maintenance in IoT compared with traditional predictive maintenance:
  Goal
    IoT: Improve production and/or maintenance efficiency
    Traditional: Ensure the reliability of machine operation
  Data
    IoT: Data stream (time varying features), multiple data sources
    Traditional: Very limited time varying features
  Scope
    IoT: Component level, system level
    Traditional: Parts level
  Approach
    IoT: Data driven
    Traditional: Model driven
  Tasks
    IoT: Failure prediction, fault/failure detection & diagnosis, maintenance actions recommendation, etc. Essentially any task that improves production/maintenance efficiency
    Traditional: Failure prediction (prognosis), fault/failure detection & diagnosis (diagnosis)

Predictive Maintenance Use Cases
Predict the possibility of failure of an asset in the near future so that assets can be monitored proactively and action taken before failures occur.
Sectors: Aerospace, Utilities, Manufacturing, Transportation & Logistics
Example questions:
  What is the likelihood of delay due to mechanical issues?
  When is this aircraft component likely to fail next?
  When is my solar panel or wind turbine going to fail next?
  Which circuit breakers in my system are likely to fail in the next month?
  Will the component pass the next stage of testing on the factory floor, or do I need to rework?
  What is the root cause of the test failure?
  Should I replace the brakes in my car now, or can I wait for another month?
  What maintenance task should I perform on my elevator?
  Is the ATM going to dispense the next 5 notes without failing?
5/6/2018 11:10 AM
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Predictive Maintenance – Qantas Airways
  Qantas A380 fleet: 12
  ~24,000 sensors per A380
  400-700 fault/warning messages per day
  50% of technical delays have potential for predictive modelling
How to identify the right messages to focus limited resources and reduce costly downtime?
Sample existing predictive maintenance journey (years):
  Develop ML model (MATLAB) alongside local university
  Optimise code
  Reduce runtime
  Develop user web front end
  Build evaluation module
  Refine model parameters
Microsoft Azure ML predictive maintenance journey (months):
  Orchestrate data pipeline in Azure Data Factory
  Configure model in AML PM template
  Evaluate & refine model data & parameters
  Visualize results in Power BI

Aka.ms/StrataPDM

Ready for ML approach?
The better the raw materials, the better the product.
  Question is sharp. E.g. Predict whether component X will fail in the next Y days; clear path of action with answer
  Data measures what they care about. E.g. Identifiers at the level they are predicting
  Data is accurate. E.g. Failures are really failures, human labels on root causes; domain knowledge translated into process
  Data is connected. E.g. Machine information linkable to usage information
  A lot of data. E.g. Will be difficult to predict failure accurately with few examples

Data Sources
  FAILURE HISTORY: The failure history of a machine or component within the machine.
  REPAIR HISTORY: The repair history of a machine, e.g. previous maintenance records, components replaced, maintenance activities performed.
  MACHINE CONDITIONS: The operating characteristics of a machine, e.g. data collected from sensors.
  MACHINE FEATURES: The features of a machine or its components, e.g. production date, technical specifications.
  OPERATING CONDITIONS: Environmental features that may influence a machine’s performance, e.g. location, temperature, other interactions.
  OPERATOR ATTRIBUTES: The attributes of the operator who uses the machine, e.g. driver.

Feature Engineering
Better features are better than better algorithms…
  Rolling Aggregates: E.g. Mean, Min, Max over the last 3 hours
  Tumbling Aggregates: E.g. Mean, Min, Max for every hour in the last 3 hours
  Static Features: E.g. Years in service, model
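The difference between rolling and tumbling aggregates can be sketched in a few lines. This is a minimal illustration rather than the presenters' code: the function names, the 30-minute sampling interval, and the temperature values are all assumptions made for the example.

```python
from statistics import mean


def rolling_aggregates(readings, window):
    """Mean/min/max over the most recent `window` readings (one feature set)."""
    recent = readings[-window:]
    return {"mean": mean(recent), "min": min(recent), "max": max(recent)}


def tumbling_aggregates(readings, window, bucket):
    """Mean/min/max for each non-overlapping bucket inside the last `window` readings."""
    recent = readings[-window:]
    buckets = [recent[i:i + bucket] for i in range(0, len(recent), bucket)]
    return [{"mean": mean(b), "min": min(b), "max": max(b)} for b in buckets]


# Hypothetical sensor values sampled every 30 minutes, oldest first,
# so 6 readings cover 3 hours and 2 readings cover 1 hour.
temps = [70.0, 71.0, 73.0, 72.0, 78.0, 80.0, 79.0, 85.0]

last_3h = rolling_aggregates(temps, window=6)            # one summary of the last 3 hours
hourly = tumbling_aggregates(temps, window=6, bucket=2)  # one summary per hour
```

The rolling version yields a single summary of the whole window, while the tumbling version keeps the hour-by-hour shape of the signal, which can expose a trend that the single summary hides.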

Modelling Techniques
  BINARY CLASSIFICATION: Predict failures within a future period of time. E.g. Is the engine going to fail in the next 24 hours?
  REGRESSION: Predict remaining useful life, the amount of time before the next failure. E.g. How long will an aircraft engine last before it fails?
  MULTICLASS CLASSIFICATION: Predict failures with their causes within a future time period. Predict remaining useful life within ranges of future periods.
  ANOMALY DETECTION: Identify change in normal trends to find anomalies.

Tips & Tricks
How to split into training and validation sets
  Be careful of “leakage”
  Best practice to consider: time based split etc.
Imbalanced Data
  Cost-sensitive learning
  Sampling methodologies
  Report appropriate metrics
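A time-based split can be sketched as follows. The record layout, field names, and cutoff date are hypothetical; the only point is that every training record precedes every validation record, so information from the future cannot leak into training.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Train on records strictly before `cutoff`; validate on the rest."""
    train = [r for r in records if r["date"] < cutoff]
    valid = [r for r in records if r["date"] >= cutoff]
    return train, valid


# Hypothetical labelled records for two machines.
records = [
    {"machine": "A", "date": date(2016, 1, 5), "failed": 0},
    {"machine": "A", "date": date(2016, 2, 9), "failed": 1},
    {"machine": "B", "date": date(2016, 3, 1), "failed": 0},
    {"machine": "B", "date": date(2016, 4, 2), "failed": 1},
]

train, valid = time_based_split(records, cutoff=date(2016, 3, 1))
```

A random shuffle of the same records could place a later reading of a machine in the training set and an earlier one in validation, which is one common form of leakage.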

How do you know if the model is good? 99% accuracy? Is 0.2 recall good?

Establishing Baseline Metrics for Classification Models
Building trivial classifiers – 4 different ways
  Random Guess
  Majority Class
  Weighted Guess by distribution
  Weighted Guess by decision threshold
Compute Metrics – Accuracy, Precision, Recall
Examples

Model Evaluation

Ground Truth
  Population size: $n$
  Fraction of instances labelled positive: $x$
  Fraction of instances labelled negative: $1 - x$
  Number of instances labelled positive: $P = xn$
  Number of instances labelled negative: $N = (1 - x)n$
$$\text{Accuracy} = \frac{TP + TN}{n}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}$$
$$TPR = \frac{TP}{P}, \qquad FPR = \frac{FP}{N}$$
For the derivation, please refer to the paper here.
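The definitions above translate directly into code. The sketch below is illustrative only: the function name and the example counts (1,000 instances with 10 failures, scored by a hypothetical model that never predicts failure) are assumptions, chosen to show how 99% accuracy can coexist with zero recall.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, TPR and FPR from confusion-matrix counts."""
    n = tp + tn + fp + fn
    positives = tp + fn   # P
    negatives = tn + fp   # N

    def ratio(numerator, denominator):
        # An empty denominator leaves the metric undefined (NaN).
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, n),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "tpr": ratio(tp, positives),
        "fpr": ratio(fp, negatives),
    }


# Hypothetical case: 1,000 instances, 10 real failures, model predicts "no failure" every time.
m = classification_metrics(tp=0, tn=990, fp=0, fn=10)
```

Here accuracy is 0.99 while recall is 0 and precision is undefined, which is why accuracy alone says little on heavily imbalanced data.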

Three steps from baseline
  Step 1: Express assumptions mathematically
  Step 2: Create a Confusion Matrix
  Step 3: Compute Baseline Metrics

Machine Learning, Analytics, & Data Science Conference
Random Guess
Randomly assign half of the labels to P and the other half to N.
$$TP + FP = \frac{n}{2}, \qquad TN + FN = \frac{n}{2}, \qquad TP = FN, \qquad FP = TN$$
Baseline metrics from the confusion matrix (TP, TN, FP, FN):
  Binary Classification: Accuracy = $0.5$; Recall = $0.5$; Precision = $x$; (TPR, FPR) = $(0.5, 0.5)$
  Multiclass Classification: Accuracy = $\frac{1}{k}$; Recall$_i$ = $\frac{1}{k}$; Precision$_i$ = $x_i$
$x_i$ is the fraction of instances belonging to class $i$ ($i = 1$ to $k$); $k$ = number of classes.
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
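The binary random-guess figures follow from the stated assumptions together with the ground-truth definitions $P = xn$ and $N = (1 - x)n$. The steps below are a reconstruction of that arithmetic, not a quotation from the referenced paper, and assume the amsmath package for the align environment.

```latex
\begin{align*}
TP + FN = P = xn,\ \ TP = FN \quad &\Rightarrow \quad TP = FN = \tfrac{xn}{2} \\
FP + TN = N = (1 - x)n,\ \ FP = TN \quad &\Rightarrow \quad FP = TN = \tfrac{(1 - x)n}{2} \\
\text{Accuracy} &= \frac{TP + TN}{n} = \frac{\tfrac{xn}{2} + \tfrac{(1 - x)n}{2}}{n} = \frac{1}{2} \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{\tfrac{xn}{2}}{xn} = \frac{1}{2} \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{\tfrac{xn}{2}}{\tfrac{n}{2}} = x
\end{align*}
```

Precision collapses to the positive fraction $x$, so on data with 1% failures a coin flip is right about only 1% of the time it raises an alarm.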

Majority Class
Assign all the labels to negative, assuming that is the majority class.
$$TN + FN = n, \qquad TP = 0, \qquad FP = 0$$
Baseline metrics from the confusion matrix (TP, TN, FP, FN):
  Binary Classification: Accuracy = $1 - x$; Recall = $0$; Precision = NaN; (TPR, FPR) = $(0, 0)$
  Multiclass Classification: Accuracy = $x_{i=m}$; Recall$_i$ = $1$ for $i = m$, $0$ for $i \neq m$; Precision$_i$ = $x_{i=m}$ for $i = m$, NaN for $i \neq m$
$x_{i=m}$ is the fraction of instances belonging to the majority class (the negative class is the majority class here, so $x_{i=m} = 1 - x$ in the binary case); $x_i$ is the fraction of instances belonging to class $i$ ($i = 1$ to $k$); $k$ = number of classes.

Weighted Guess by Distribution
A fraction $x$ of actual positives is assigned as positive by this model, and a fraction $(1 - x)$ of actual negatives is assigned as negative.
$$TP = xP, \qquad TN = (1 - x)N$$
Baseline metrics from the confusion matrix (TP, TN, FP, FN):
  Binary Classification: Accuracy = $x^2 + (1 - x)^2$; Recall = $x$; Precision = $x$; (TPR, FPR) = $(x, x)$
  Multiclass Classification: Accuracy = $\sum_{i=1}^{k} x_i^2$; Recall$_i$ = $x_i$; Precision$_i$ = $x_i$
$x_i$ is the fraction of instances belonging to class $i$ ($i = 1$ to $k$); $k$ = number of classes.
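A quick numerical check of the binary weighted-guess formulas, as a sketch only. The function name is made up for this example, and the counts are expressed as fractions of $n$ so no population size is needed; $x$ is assumed to lie strictly between 0 and 1.

```python
def weighted_guess_by_distribution(x):
    """Baseline metrics when TP = x * P and TN = (1 - x) * N, with P = x * n."""
    tp = x * x             # TP / n
    tn = (1 - x) ** 2      # TN / n
    fp = (1 - x) - tn      # FP / n = N / n - TN / n
    fn = x - tp            # FN / n = P / n - TP / n
    return {
        "accuracy": tp + tn,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Assumed example: 1% of instances are failures.
baseline = weighted_guess_by_distribution(0.01)
```

With 1% positives this trivial classifier already scores about 98% accuracy while recall and precision both sit at about 1%, which is the bar a real model has to clear.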

Weighted Guess by Threshold
A fraction $(1 - t)$ of actual positives (P) is assigned as positive by this model, and a fraction $t$ of actual negatives (N) is assigned as negative.
$$TP = (1 - t)P, \qquad TN = tN$$
Baseline metrics from the confusion matrix (TP, TN, FP, FN):
  Binary Classification: Accuracy = $\frac{P + t(N - P)}{n}$; Recall = $1 - t$; Precision = $\frac{P}{n}$; (TPR, FPR) = $(1 - t, 1 - t)$
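The threshold-based figures can be checked from the two assumptions $TP = (1 - t)P$ and $TN = tN$. The working below is a reconstruction using the ground-truth definitions, assumes $t < 1$ so that the precision denominator is non-zero, and uses the amsmath align environment.

```latex
\begin{align*}
FN &= P - TP = tP, \qquad FP = N - TN = (1 - t)N \\
\text{Accuracy} &= \frac{TP + TN}{n} = \frac{(1 - t)P + tN}{n} = \frac{P + t(N - P)}{n} \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{(1 - t)P}{P} = 1 - t \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{(1 - t)P}{(1 - t)(P + N)} = \frac{P}{n} \\
TPR &= \frac{TP}{P} = 1 - t, \qquad FPR = \frac{FP}{N} = 1 - t
\end{align*}
```

Precision stays at $P/n = x$ whatever threshold is chosen, while recall and accuracy move in opposite directions when negatives are the majority ($N > P$).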

Predict machine failing …
  Red = Actual failure points (label = 1)
  Blue = Non-failure points (label = 0)
  Threshold of 0.23
  Window of 5 days for labeling
  Example failure point: should be predicted as a failure, but given the 0.23 threshold it is predicted as 0 and is thus a false negative

Custom R module
https://gallery.cortanaintelligence.com/Experiment/Baseline-Metrics-for-Binary-Classifier-1
Full code in Github as well

The Cost Factor
  Classification – Cost sensitive methods
  Classification – Cost in-sensitive methods, manipulate the predictions
FP Error Rate = FP / n
FN Error Rate = FN / n
Error Rate = (FP + FN) / n
FP Cost = FP Error Rate * $ cost of FP
FN Cost = FN Error Rate * $ cost of FN
Cost = FP Cost + FN Cost
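The cost formulas can be applied as in the sketch below. The two models, their error counts, and the dollar costs are invented for illustration; only the formulas come from the slide.

```python
def expected_cost(fp, fn, n, cost_fp, cost_fn):
    """Cost per instance = FP error rate * cost of FP + FN error rate * cost of FN."""
    fp_error_rate = fp / n
    fn_error_rate = fn / n
    return fp_error_rate * cost_fp + fn_error_rate * cost_fn


# Hypothetical costs: a false alarm costs $100, a missed failure costs $5,000.
# Model A makes more errors overall (55 vs 19) but misses fewer failures.
cost_a = expected_cost(fp=50, fn=5, n=1000, cost_fp=100.0, cost_fn=5000.0)
cost_b = expected_cost(fp=10, fn=9, n=1000, cost_fp=100.0, cost_fn=5000.0)
```

Under these assumed costs, model A comes out at roughly $30 per instance and model B at roughly $46, even though B has the lower overall error rate, so ranking models by error rate and ranking them by cost can disagree.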

Predictive Maintenance Scenarios
Manufacturing line
  Less than 1% of failure data available
  Tolerant of false positives even at the expense of false negatives
  Tuned the model parameters for high AUC
  Chose a high recall ROC operation point; caught failures 75% of the time
Wind farm
  Only 1% of data constituting failure
  Tolerant of a false negative over a false positive
  Tuned the model parameters for high F1 score
  F1 score = 0.07! Still 3x what a random model would have produced (0.02)
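The 0.02 figure for the wind farm can be approximated from the baselines described earlier. The sketch assumes that "a random model" means the random-guess baseline (recall 0.5, precision equal to the positive fraction) and that the positive fraction is exactly 1%; the slide does not state either assumption explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


x = 0.01                                     # assumed fraction of failure instances
random_f1 = f1_score(precision=x, recall=0.5)  # random-guess baseline F1
model_f1 = 0.07                              # F1 reported on the slide
lift = model_f1 / random_f1                  # how many times better than the baseline
```

Under these assumptions the baseline F1 is just under 0.02 and the reported model is between three and four times better, in line with the slide's comparison.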

Think about the cost/benefit ratio, and remember that “99% accuracy” means nothing without context!

Learn and try yourself!
Learn from Cortana Intelligence Gallery
  Solution package material – deploy by hand to learn here
Try
  Cortana Intelligence Solution Template – Predictive Maintenance for Aerospace
  Azure IOT pre-configured solution for Predictive Maintenance
Read the Predictive Maintenance Playbook for more details on how to approach these problems
Run the Modelling Guide R Notebook for a DS walk-through
For more details on metrics: blogs and paper
  Baseline Metrics overview
  Performance Metrics overview
  http://blog.revolutionanalytics.com/2016/03/classification-models.html
  http://blog.revolutionanalytics.com/2016/03/com_class_eval_metrics_r.html
  https://github.com/shaheeng/ClassificationModelEvaluation/blob/master/Baseline%20Metrics_Shaheen_article.pdf
© 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.