Trusting Machine Learning Algorithms for Safeguards Applications Nathan Shoman SAND2019-2539 C
Many different commercial machine learning applications (images/logos copyright of their respective owners)
Machine learning can be applied to domains relevant to safeguards: anomaly detection in multivariate data sets (Zhang, 2018, https://arxiv.org/abs/1811.08055) and anomaly detection in images (figure: neural network prediction vs. actual frame with anomaly highlighted, UCSD pedestrian dataset)
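As a concrete illustration of anomaly detection in multivariate data, the sketch below scores points by Mahalanobis distance from a nominal training distribution. This is a classical baseline chosen for brevity, not the correlation-based encoder-decoder method of Zhang (2018); all data here are synthetic.

```python
import numpy as np

def mahalanobis_anomaly_scores(train, test):
    """Score test rows by Mahalanobis distance from the training distribution.

    A classical multivariate baseline: points far from the bulk of the
    nominal training data (large distance) are flagged as anomalies.
    """
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against singular covariance
    diff = test - mean
    # Row-wise sqrt(d^T * Cov^-1 * d)
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 1.0, size=(500, 3))          # synthetic nominal data
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 3)),   # five nominal samples
                  np.full((1, 3), 8.0)])               # one injected anomaly
scores = mahalanobis_anomaly_scores(nominal, test)
print(int(scores.argmax()))  # -> 5, the injected anomaly
```

In practice a threshold on the score (e.g. a high percentile of training scores) would separate the alarm decision from the scoring itself.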
Understanding neural networks: forward pass, backpropagation, classification, and supervised learning (Stanford CS231n, 2018, http://cs231n.github.io/neural-networks-1/)
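The forward pass and backpropagation can be made concrete with a minimal sketch: a tiny two-layer network trained on XOR with hand-derived gradients. The architecture, learning rate, and loss are illustrative choices, not from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy 2-8-1 network trained on XOR with manually coded backpropagation.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.5
losses = []

for _ in range(5000):
    # Forward pass: input -> hidden -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(0.5 * ((out - y) ** 2).sum())
    # Backward pass: chain rule on the squared-error loss
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;  b1 -= lr * d_h.sum(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")  # loss decreases with training
```

Frameworks compute the backward pass automatically, but the gradient flow is exactly this chain-rule bookkeeping.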
More complex networks become difficult to interpret: the Inception network (Szegedy et al., 2015, https://arxiv.org/abs/1409.4842)
Practical considerations for evaluating NN performance: precision / recall / F1; the importance of validation and test data; exploring intermediate layers (figure: layer 3 and layer 5 feature visualizations, Zeiler, 2013, https://arxiv.org/abs/1311.2901; see also http://cs231n.github.io/understanding-cnn/ and https://arxiv.org/abs/1312.6034)
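The precision / recall / F1 metrics mentioned above reduce to simple counts of true positives, false positives, and false negatives; the small self-contained sketch below (labels are invented for illustration) shows the arithmetic.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive/anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of flagged items, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real items, how many were flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 3 true anomalies, detector catches 2 and raises 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))  # -> 0.667 0.667 0.667
```

For safeguards use, the precision/recall balance matters: false negatives (missed anomalies) and false positives (nuisance alarms) carry very different costs, so accuracy alone is a poor summary.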
Using LIME (Local Interpretable Model-agnostic Explanations): a general algorithm that explains the predictions of any classifier or regressor by approximating it locally with an interpretable model; fidelity–interpretability trade-off (Ribeiro et al., 2016, https://arxiv.org/abs/1602.04938)
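The core LIME idea — perturb around an instance, query the black box, weight samples by proximity, and fit an interpretable (here linear) surrogate — can be sketched in a few lines for tabular data. This is a minimal illustration of the algorithm, not the authors' `lime` library; the black-box model and kernel width are invented for the demo.

```python
import numpy as np

def lime_explain(model_fn, x, num_samples=2000, kernel_width=0.75, seed=0):
    """Minimal LIME-style local explanation for tabular data.

    Perturb x, query the black-box model, weight samples by proximity to x,
    and fit a weighted linear model whose coefficients are the explanation.
    """
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.size))  # local perturbations
    y = model_fn(Z)                                            # black-box predictions
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)                  # proximity kernel
    # Weighted least squares with intercept: (A^T W A) beta = A^T W y
    A = np.hstack([np.ones((num_samples, 1)), Z])
    AW = A * w[:, None]
    beta = np.linalg.lstsq(AW.T @ A, AW.T @ y, rcond=None)[0]
    return beta[1:]                                            # per-feature local weights

# Hypothetical black box: depends strongly on feature 0, weakly on feature 1.
black_box = lambda Z: 3.0 * Z[:, 0] + 0.1 * Z[:, 1]
weights = lime_explain(black_box, np.array([1.0, 1.0]))
print(weights.round(2))  # feature 0 dominates the local explanation
```

The kernel width controls the fidelity–interpretability trade-off noted above: a narrow kernel is faithful only very locally, while a wide one averages the black box over a region where a linear surrogate may fit poorly.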
Using LIME with CNNs for image recognition and classification (Ribeiro et al., 2016, https://arxiv.org/abs/1602.04938)
One Pixel Attack for Fooling Deep Neural Networks Su et al. (2017) https://arxiv.org/abs/1710.08864
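The one-pixel attack's premise — a single pixel change can flip a classifier's decision — can be illustrated with a toy sketch. Su et al. use differential evolution against real CNNs on CIFAR-10; the brute-force search and threshold "classifier" below are stand-ins invented purely to show the mechanics.

```python
import numpy as np

def one_pixel_attack(img, classify, candidate_values=(0.0, 1.0)):
    """Brute-force sketch of a one-pixel attack on a grayscale image.

    Tries every pixel position and candidate value; returns a perturbed copy
    that flips the classifier's label, or None if no single pixel succeeds.
    (Su et al. search with differential evolution rather than exhaustively.)
    """
    original = classify(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            for v in candidate_values:
                trial = img.copy()
                trial[i, j] = v
                if classify(trial) != original:
                    return trial, (i, j, v)
    return None

# Toy stand-in classifier: label 1 if mean intensity exceeds a threshold.
classify = lambda im: int(im.mean() > 0.5)

img = np.full((3, 3), 0.52)          # just above the threshold -> label 1
adv, (i, j, v) = one_pixel_attack(img, classify)
print(classify(img), classify(adv))  # -> 1 0: one pixel flipped the label
```

Against a deep network the decision boundary is far more complex, but the attack exploits the same sensitivity: a carefully chosen perturbation of tiny support can cross it.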
Conclusions: ML algorithms are powerful tools that could improve existing safeguards and security systems. Trust in machine learning algorithms is essential for acceptance by the safeguards community, and analysis with tools such as LIME is important when presenting results. Newly developed strategies such as Layer-wise Relevance Propagation (Binder et al., 2016, https://arxiv.org/abs/1604.00825) and Testing with Concept Activation Vectors (Kim et al., 2018, http://proceedings.mlr.press/v80/kim18d/kim18d.pdf) can provide further insight into ML classification logic. The one-pixel attack is detectable even when it is not perceptible to the human eye, but detection requires extra pre-processing (Xu et al., 2017, https://arxiv.org/abs/1704.01155; Liang et al., 2017, https://arxiv.org/abs/1705.08378)
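The pre-processing defense cited above can be sketched concretely. Xu et al.'s "feature squeezing" applies simple filters such as bit-depth reduction and flags inputs whose predictions shift sharply after squeezing; the demo below (the image values and detector threshold are illustrative, not the paper's exact setup) shows the squeeze erasing a small one-pixel perturbation.

```python
import numpy as np

def bit_depth_squeeze(img, bits=3):
    """Reduce color depth, one of the 'feature squeezing' filters of Xu et al."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def looks_adversarial(img, predict_probs, threshold=0.5):
    """Illustrative detector: flag inputs whose prediction shifts after squeezing.

    Benign inputs are largely invariant to mild squeezing; a perturbation
    tuned to the unsqueezed input often is not.
    """
    shift = np.abs(predict_probs(img) - predict_probs(bit_depth_squeeze(img)))
    return shift.sum() > threshold

img = np.full((3, 3), 0.5)   # benign grayscale patch (illustrative values)
adv = img.copy()
adv[0, 0] += 0.04            # small single-pixel perturbation
# Squeezing to 3 bits maps both versions to identical images,
# erasing the perturbation before it reaches the classifier.
print(np.array_equal(bit_depth_squeeze(img), bit_depth_squeeze(adv)))  # -> True
```

The cost is the "extra pre-processing" noted in the conclusion: every input must be classified both raw and squeezed, and the squeeze parameters must be tuned so benign accuracy is not degraded.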