IoT and ML for Predictive Maintenance
By: Mukul Joshi & Mitra Daram, Nitor Infotech Pvt. Ltd.
Global Summit 2018
Evolution of Maintenance
“Change is inevitable – except from a vending machine” – Robert Gallagher
Since the invention of the first machine, we have had to deal with maintenance and repair. Thankfully, our approach has evolved.
Reactive Maintenance
- Fix it when it breaks.
Preventive Maintenance
- Do not wait until things break; perform periodic checks.
- Creates unnecessary work.
- Labor intensive.
- Issues arise between schedules.
Predictive Maintenance
- Uses vibration analysis, infrared imaging, oil analysis etc. to do predictive maintenance.
IoT based Predictive Maintenance
- Uses IoT to gather data.
- Uses ML to understand behavior/patterns.
- ML models are used for predictive maintenance.
Business Problems in predictive maintenance
- Detect anomalies in equipment or system performance or functionality.
- Predict whether an asset may fail in the near future.
- Estimate the remaining useful life of an asset.
- Identify the main causes of failure of an asset.
- Identify what maintenance actions need to be done, by when, on an asset.
Qualification of problem
- A target or outcome to predict.
- A clear path of action to prevent failures when they are detected.
- The problem has to be predictive in nature.
- A record of the operational history of the equipment with both good & bad outcomes. Error reports, maintenance logs, repair logs etc. should be available.
- The recorded history should be reflected in relevant data of sufficient quality to support the use case.
- Finally, the business should have domain experts who have a clear understanding of the problem.
Sample use cases
- Aviation: flight delay and cancellation, engine part failure
- Finance: ATM failure
- Energy: wind turbine failure, circuit breaker failure
- Transportation and logistics: wheel failure, subway train door failure
Five steps for a predictive maintenance solution
1. Identify Assets
2. Gather Data
3. Design Predictive Model
4. Manage Work
5. Learn & Act Smart
Data Preparation
- Data should be relevant, sufficient, and of good quality.
- Relevant data sources are:
  - Failure history
  - Maintenance/repair history
  - Machine operating conditions
  - Equipment metadata
- There are two major types of data:
  - Temporal data
  - Static data
Static vs. Temporal treatment
- Visualize the data as a table of records: each row is a training instance, the columns are “predictor” features, and the final column is the “target”.
- Maintenance records: asset identifier, maintenance action, time etc.
- Failure records: failure reasons as error codes; correlation between conditions and codes.
- Machine & operation metadata: merged together to associate with assets.
- Missing value handling and normalization.
- Temporal data is divided into time units; each time unit for an asset offers distinct information.
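The table-of-records idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenters' pipeline: the asset identifiers, field names, forward-fill strategy, and z-score normalization are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Hypothetical static metadata, keyed by asset identifier.
metadata = {
    "A1": {"model": "M100", "age_years": 4},
    "A2": {"model": "M200", "age_years": 9},
}

# Hypothetical temporal records: one row per asset per time unit.
# None marks a missing sensor reading.
telemetry = [
    {"asset": "A1", "t": 1, "temperature": 70.0},
    {"asset": "A1", "t": 2, "temperature": None},
    {"asset": "A1", "t": 3, "temperature": 74.0},
    {"asset": "A2", "t": 1, "temperature": 80.0},
    {"asset": "A2", "t": 2, "temperature": 82.0},
    {"asset": "A2", "t": 3, "temperature": None},
]


def build_table(telemetry, metadata):
    """Merge static metadata onto temporal rows, fill gaps, normalize."""
    rows = []
    last_seen = {}  # last known temperature per asset, used for forward fill
    for rec in sorted(telemetry, key=lambda r: (r["asset"], r["t"])):
        temp = rec["temperature"]
        if temp is None:
            # Assumed strategy: carry the previous reading forward.
            temp = last_seen.get(rec["asset"])
        last_seen[rec["asset"]] = temp
        rows.append({**rec, **metadata[rec["asset"]], "temperature": temp})

    # Assumed normalization: z-score over all available readings.
    temps = [r["temperature"] for r in rows if r["temperature"] is not None]
    mu, sigma = mean(temps), pstdev(temps)
    for r in rows:
        if r["temperature"] is None:
            r["temperature_z"] = None
        else:
            r["temperature_z"] = (r["temperature"] - mu) / sigma if sigma else 0.0
    return rows


table = build_table(telemetry, metadata)
```

Each row of `table` is one asset at one time unit, carrying both its temporal reading and its static attributes; a target column would be appended in the labeling step.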
Feature engineering
- A feature is a “predictive” attribute such as temperature, pressure, vibration etc.
- To predict the future, decide how far back to look (the lag).
- Rolling aggregates: computed over overlapping windows that slide forward one time unit at a time.
- Tumbling aggregates: computed over consecutive, non-overlapping windows.
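The difference between the two aggregate types is easiest to see on a small series. The sketch below uses hypothetical function names and the mean as the aggregate; any other statistic (sum, standard deviation, count) could be substituted.

```python
from statistics import mean


def rolling_mean(values, window):
    """Mean over the last `window` readings, one output per time unit
    once a full window is available. Consecutive windows overlap."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def tumbling_mean(values, window):
    """Mean over consecutive non-overlapping windows.
    A trailing partial window is dropped."""
    return [
        mean(values[i : i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


readings = [10, 12, 14, 16, 18, 20]
rolling = rolling_mean(readings, 3)    # windows: [10,12,14], [12,14,16], ...
tumbling = tumbling_mean(readings, 3)  # windows: [10,12,14], [16,18,20]
```

With six readings and a window of three, the rolling version yields four values and the tumbling version yields two.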
Examples
- Flight delay: count of error codes over the last day/week.
- Aircraft engine part failure: rolling means, standard deviation, and sum over the past day, week etc. This metric should be determined along with the business domain expert.
- ATM failures: rolling means, median, range, standard deviations, count of outliers beyond three standard deviations.
- Subway train door failures: count of events over the past day, week, two weeks etc.
- Circuit breaker failures: failure counts over the past week, year, three years etc.
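Count-based features such as "error codes over the last day/week" reduce to counting timestamps inside a lookback window. A minimal sketch follows; the timestamps, the `count_events` helper, and the half-open window convention are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def count_events(event_times, as_of, lookback):
    """Number of events in the window (as_of - lookback, as_of]."""
    start = as_of - lookback
    return sum(1 for t in event_times if start < t <= as_of)


# Hypothetical error-code timestamps for one asset.
errors = [
    datetime(2018, 2, 20, 8, 0),
    datetime(2018, 3, 5, 9, 30),
    datetime(2018, 3, 7, 6, 15),
    datetime(2018, 3, 7, 18, 45),
]

as_of = datetime(2018, 3, 8, 0, 0)
errors_last_day = count_events(errors, as_of, timedelta(days=1))
errors_last_week = count_events(errors, as_of, timedelta(weeks=1))
```

Evaluating the same function at every time unit of an asset's history produces one feature column per lookback length.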
Modeling: Binary Classification
- Predict the probability that a piece of equipment fails within a future time period (future horizon period X).
- Records that fall within X before a failure are labeled 1 (“about to fail”); all other records are labeled 0.
Examples
- Flight delays: X may be chosen as 1 day, to predict delays in the next 24 hours. All flights within the 24 hours before a failure are then labeled 1.
- ATM cash dispense failures: a goal may be to determine the failure probability of a transaction in the next hour. In that case, all transactions that happened within the hour before the failure are labeled 1. To predict failure probability over the next N currency notes dispensed, all notes dispensed within the last N notes before a failure are labeled 1.
- Train door failures: X may be chosen as two days.
- Wind turbine failures: X may be chosen as two months.
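The labeling rule used in these examples can be sketched as follows. Time is expressed in integer hours, and the function name and sample data are hypothetical; records at or after a failure are left at 0 here, which is an assumption about how post-failure data is treated.

```python
def label_records(record_times, failure_times, horizon):
    """Label a record 1 if some failure occurs within `horizon`
    time units after it (inclusive), otherwise 0."""
    return [
        int(any(0 <= f - t <= horizon for f in failure_times))
        for t in record_times
    ]


# Hypothetical records every 6 hours, with one failure at hour 40.
record_hours = [0, 6, 12, 18, 24, 30, 36, 42, 48]
failure_hours = [40]

# X = 24 hours: records from hour 16 through hour 40 count as "about to fail".
labels = label_records(record_hours, failure_hours, horizon=24)
```

Choosing a larger X produces more positive examples but a coarser warning, so the horizon is a business decision as much as a modeling one.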
Regression
- Compute Remaining Useful Life (RUL).
- Assets that have not failed can’t be used in modelling.
- Survival analysis.
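As a sketch of how RUL targets might be derived, the code below labels each record with the time remaining until the asset's observed failure and skips assets with no failure on record. The data layout and names are assumptions. The skipped assets are what survival analysis calls censored observations, and survival methods are one way to make use of them.

```python
def rul_targets(history):
    """Return (asset, time, remaining_useful_life) rows.

    `history` maps an asset id to its record times and its observed
    failure time (None if the asset has not failed)."""
    rows = []
    for asset, h in history.items():
        if h["failure_time"] is None:
            continue  # no observed failure, so no RUL label is available
        for t in h["times"]:
            if t <= h["failure_time"]:
                rows.append((asset, t, h["failure_time"] - t))
    return rows


# Hypothetical histories: A1 failed at time 3, A2 is still running.
history = {
    "A1": {"times": [0, 1, 2, 3], "failure_time": 3},
    "A2": {"times": [0, 1, 2, 3], "failure_time": None},
}

targets = rul_targets(history)
```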
Multi class classification
- Predict two outcomes:
  - Range of time to failure
  - Likelihood of failure
- What is the probability that the asset will fail in the next nZ units of time?
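One way to turn time to failure into classes is to bucket it into intervals of width Z. The encoding below is an assumption for illustration: class 0 means no failure within n·Z units, and class k means failure in the k-th interval.

```python
import math


def interval_label(time_to_failure, z, n_intervals):
    """Map time to failure onto a class.

    0 -> no failure within n_intervals * z units (or no failure at all)
    k -> failure falls in the interval ((k - 1) * z, k * z]
    A time to failure of 0 is placed in class 1."""
    if time_to_failure is None or time_to_failure > n_intervals * z:
        return 0
    return max(1, math.ceil(time_to_failure / z))


# Hypothetical times to failure (None = no failure observed), Z = 5, n = 2.
times_to_failure = [0, 3, 5, 6, 10, 11, None]
classes = [interval_label(t, z=5, n_intervals=2) for t in times_to_failure]
```

A classifier trained on these labels outputs one probability per interval, which answers the "next nZ units" question directly.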
Multi class classification
- What is the probability that the asset will fail in the next X units of time due to root cause/problem Pi?
Training, Validation and Testing
- Failures are rare, so they are under-represented in the training data.
- Cost-sensitive learning: assign a high cost to misclassification of the minority class.
- Sampling techniques.
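Both remedies can be sketched without any ML library. The weight formula below, n_samples / (n_classes × class_count), is a common inverse-frequency heuristic, and random oversampling is only one of several sampling techniques; the function names and toy data are assumptions.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    Rare classes get larger weights, raising their misclassification cost."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    matches the size of the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical data: nine healthy records and one failure record.
rows = list(range(10))
labels = [0] * 9 + [1]

weights = balanced_class_weights(labels)
new_rows, new_labels = oversample_minority(rows, labels)
```

Resampling should be applied to the training split only, so that validation and test sets keep the real-world failure rate.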
Transformer Monitoring
Nitor case study
Transformer Monitoring
Business Situation
- 18,000+ transformers
- More than 250 models / variants
- Located across the globe (different geo locations with different conditions)
- An average of 10+ sensors capturing various readings (oil temperature, oil volume, load etc.)
Challenges
- Data generated from the transformers arrives at different frequencies
- Collection of historic data including events (failure, breakdown, down time etc.) and data quality
- Domain expertise
- Integration with external data and macro environmental factors
- Data storage, processing and archival mechanism
- Validating the model
Solution Approach
[Architecture diagram; labels recovered from the slide are listed below]
Step 1: Model Creation & Training with Historic Dataset
- Training data (Device 1, Device 2)
- Azure Batch AI: Job1, Job2, Job3
- Anomaly detection & failure prediction algorithms
- Test models, ML models
Step 2: Production Stage
- Save model per device
- Data collection: Device 1, Device 2, Device 3, Device 4, ... Device n (should be available over the internet to connect)
- Real-time data ingestion: Azure IoT Hub, Stream Analytics
- Time series data OR aggregated data at time interval
- Other related external data sources
- Azure SQL
- Azure Batch AI
- Visualization layer
- Predict, React
Possible Impact & Benefits
Business Impact
- Anticipating at least 30% reduction in maintenance costs
- Transformer productivity in terms of uptime / availability can go up by 10%
- Breakdowns / total failures can be avoided in at least 50% of the cases
- Optimization of spare parts inventory
- Reduce fuel costs and extend life
- Worker safety can be improved
- Maintenance actions can be triggered based on the threat level rather than ‘routine checks’
Transformer Performance Dashboard
Thank you Mukul Joshi & Mitra Daram Nitor Infotech Pvt. Ltd.