Published by Jerome Hubbard. Modified over 5 years ago.
1
Predicting Overflow: Using Machine Learning to Optimize Latrine Servicing
Nick Turman-Bryant
PhD Candidate, Systems Science, Portland State University
JSM 2018, Vancouver, BC
2
Collection Strategies
Three Strategies:
  - Daily Collection (BAU)
  - Demand Driven Collection (DDC)
      Fixed Schedule: Sanergy’s current strategy with weekly schedules
  - Predictive Collection (PC)
      Dynamic Schedule: Sensor-enabled strategy with daily schedules

Two Conflicting Goals:
  - Minimize overflow events
  - Maximize collection efficiency
3
How full is the latrine based on all available data?
Weights vs. Skips
  - First Question: How full is the latrine based on all available data?
  - Second Question: What is the probability of an overflow event based on all available data?
  - Needed a different model that could account for frequency of latrine use, variations in time, and the most recent sensor activity

We realized that there were really two questions related to predicting a skip. The figure on the right illustrates the challenge. Each dot represents the actual solid waste fill level on one day for one of the medium-use latrines. The color of the dot indicates whether the latrine would have experienced an overflow event if that day's collection had been skipped: an orange dot indicates an overflow event and a green dot indicates no overflow event.

As you can see, even if we know the fill level of the latrines exactly, it is difficult to determine the threshold that optimizes when latrines can and cannot be skipped. There are some instances when the latrine could have been skipped even though it was well over 50% of capacity. There are also instances when the latrine was under 50% of capacity but would have experienced an overflow event due to higher-than-normal usage the next day. As a result, we realized that we needed two predictors: one for the weights and one for the probability of an overflow event.
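The threshold problem described above can be sketched in a few lines of Python. The observations below are invented purely for illustration and are not Sanergy data; the function name and the candidate thresholds are likewise assumptions. The sketch applies a simple rule, "skip the collection when the fill level is below the threshold", and counts the two kinds of mistakes that rule makes.

```python
# Hypothetical observations, invented for illustration only (not study data):
# (fill fraction at collection time, would the latrine overflow if skipped?)
observations = [
    (0.30, False), (0.40, True), (0.45, False), (0.55, False),
    (0.60, True), (0.65, False), (0.80, True), (0.90, True),
]


def evaluate_threshold(obs, threshold):
    """Apply the rule 'skip when fill < threshold' and count its mistakes.

    Returns (overflows, unneeded): skipped latrines that would overflow,
    and collected latrines that could safely have been skipped.
    """
    overflows = sum(1 for fill, over in obs if fill < threshold and over)
    unneeded = sum(1 for fill, over in obs if fill >= threshold and not over)
    return overflows, unneeded


# Try a low, a middle, and a high threshold.
results = {t: evaluate_threshold(observations, t) for t in (0.35, 0.5, 0.7)}
for threshold, (overflows, unneeded) in results.items():
    print(f"threshold {threshold:.2f}: {overflows} overflows, "
          f"{unneeded} unneeded collections")
```

With overlapping data like this, every threshold trades one kind of mistake for the other, which is the motivation given in the talk for a separate predictor of overflow probability.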
4
Features Used in the Sanergy Prediction Models
Relative importance is determined by the incremental node purity in the random forest model. Without going too far into the weeds, a random forest can measure the level of uncertainty before and after each feature is integrated into a particular tree. The incremental node purity is the decrease in the residual sum of squares when the tree is split on a particular feature.
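As a rough sketch of the quantity behind incremental node purity, the Python below computes the drop in residual sum of squares for a single split. The toy numbers, variable names, and split threshold are hypothetical and do not come from the Sanergy models; a random forest would accumulate this decrease over every split made on a feature and average it across trees.

```python
def rss(values):
    """Residual sum of squares of `values` around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def purity_gain(feature, target, threshold):
    """Decrease in RSS when one node is split at `feature <= threshold`."""
    left = [t for f, t in zip(feature, target) if f <= threshold]
    right = [t for f, t in zip(feature, target) if f > threshold]
    return rss(target) - (rss(left) + rss(right))


# Hypothetical toy data: daily uses (feature) and waste weight in kg (target).
daily_uses = [5, 8, 12, 20, 25, 30]
weight_kg = [2.0, 3.0, 4.0, 9.0, 10.0, 11.0]

gain = purity_gain(daily_uses, weight_kg, threshold=12)
print(f"RSS before split: {rss(weight_kg):.1f}")
print(f"RSS decrease from splitting on daily uses: {gain:.1f}")
```

A feature whose splits remove a large share of the residual sum of squares, as this toy one does, would rank high in the importance plot.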
5
Model Performance Performance for Solid Waste Predictions
Accuracy of Overflow Predictions for Solid Waste

Confusion Matrix
                           Actual “No Overflow”                          Actual “Overflow”
Predicted “No Overflow”    834 True Negatives = successful skips         12 False Negatives = overflow events
Predicted “Overflow”       428 False Positives = unneeded collections    887 True Positives = successful collections

Model Performance
  - Total number of collections
      BAU: 4,760 from October – January
      DDC: 102 skips (2% fewer collections)
      PC: 770 skips (16% fewer collections)
  - Collection efficiency
      16% scheduled reduction in January
      12% actual reduction due to WC’s failsafe servicing
  - Collection efficacy
      Potential increases for lower-use latrines

Performance for Solid Waste Predictions
  - Overall Accuracy: 79.6%
  - Sensitivity: 98.7%
  - Specificity: 66.1%
  - Low cutoff reduces overflow events

Once we have a model that seems to be predicting with sufficient accuracy, we then tune the model to avoid the least desirable outcomes. In the case of latrine servicing, it is much worse to have an overflow event than it is to have an unnecessary servicing event. An overflow event can happen when there is a false negative: that is, the algorithm thought that the latrine could be skipped, but it was wrong. A false positive is when the algorithm thought that the latrine needed to be serviced, but it didn’t actually need to be serviced.

In order to balance false negatives against false positives, we generate a plot that shows us the sensitivity and specificity of the model. Sensitivity falls as the number of false negatives rises, and specificity falls as the number of false positives rises. As you can see, if we moved the cutoff to the right, we would have a higher overall accuracy, but we would also have more false negatives (or overflow events). Another way to read this is: Sanergy would have saved 834 servicing events, but they would have risked 12 overflow events.
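The reported metrics can be reproduced from the four confusion-matrix counts and the collection totals on the slide. The short Python check below uses only those numbers; the variable names are ours.

```python
# Counts taken from the confusion matrix on the slide.
tn, fn, fp, tp = 834, 12, 428, 887

total = tn + fn + fp + tp
accuracy = (tp + tn) / total     # share of all predictions that were right
sensitivity = tp / (tp + fn)     # share of actual overflows that were caught
specificity = tn / (tn + fp)     # share of safe skips that were identified

print(f"accuracy    = {accuracy:.1%}")     # 79.6%
print(f"sensitivity = {sensitivity:.1%}")  # 98.7%
print(f"specificity = {specificity:.1%}")  # 66.1%

# Reduction in collections relative to the 4,760 business-as-usual collections.
bau_collections = 4760
ddc_reduction = 102 / bau_collections
pc_reduction = 770 / bau_collections
print(f"DDC: {ddc_reduction:.0%} fewer, PC: {pc_reduction:.0%} fewer")  # 2%, 16%
```

The high sensitivity and lower specificity reflect the low cutoff: the model accepts 428 unneeded collections in exchange for holding missed overflow events to 12.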
6
Portland State University
THANK YOU Nick Turman-Bryant SweetSense, Inc. Portland State University