Investigating the role of individual neurons as outlier detectors


1 Investigating the role of individual neurons as outlier detectors
Carlos López-Vázquez, Laboratorio LatinGEO, SGM+Universidad ORT del Uruguay. September 15th, 2015

2 Agenda
Motivation for the outlier detection stage
ANN as a regression tool
Formulation of the rule
Case 1 of application: small dataset
Case 2 of application: large dataset

3 Why worry about outliers?
Outliers are unusual (in some sense) events.
They might adversely affect further calculations, OR they might be the most valuable result!
An ANN usually produces an output given an input. Always! What about the consequences?
We might want to detect spurious inputs.

4 Example #1: Medical
From Lucila Ohno-Machado, Decision Systems Group, Brigham and Women’s Hospital, Department of Radiology, 2004.
Given some inputs, detect/classify a possible coronary disease.

5 Myocardial Infarction Network
[Diagram: inputs pain duration, pain intensity, ECG: ST elevation, smoker, age, male.]
The answer is just a number, y = “probability” of MI, e.g. 0.8. No room for I DON'T KNOW!

6 Example #2: Autonomous Land Vehicle
An NN learns to steer an autonomous vehicle: 960 input units, 4 hidden units, 30 output units; driving at speeds up to 70 miles per hour.
[Figures: the ALVINN system, the image from a forward-mounted camera, and the weight values for one of the hidden units.]
From bi.snu.ac.kr/Courses/g-video12s/files/NN_suppl.ppt, Biointelligence Laboratory, Department of Computer Engineering, Seoul National University.

7 Coming soon?

8 Goal
Identify unlikely incoming events, and thus (maybe) refuse to estimate outputs!
Supplement the ANN answer (numerical, categorical) with some credibility flag.
How?
Showing unlikely events during training (supervised)
Relying on an already trained ANN (unsupervised)

9 Multi Layer Perceptron (MLP)
[Diagram: inputs X1..X5 feed hidden neurons v1, v2, v3 through adjustable weights; y is the output.]
Examples of trained output layers:
y = 18.4*v1 - 22.1*v2 + 10.2*v3
y = 10.4*v1 + 5.12*v2 + 8.9*v3
y = 20.2*v1 + 0.18*v2 - 9.1*v3
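The three trained output layers above can be evaluated side by side; a minimal sketch (the hidden activations v1..v3 are hypothetical, only the output weights come from the slide):

```python
import numpy as np

# Output-layer weights of the three example networks shown above
weight_sets = np.array([
    [18.4, -22.1, 10.2],
    [10.4,  5.12,  8.9],
    [20.2,  0.18, -9.1],
])
v = np.array([0.9, 0.1, 0.5])  # hypothetical hidden activations v1, v2, v3

ys = weight_sets @ v           # y = w1*v1 + w2*v2 + w3*v3 for each network
```

The same hidden activations yield different outputs under each weight set, which motivates the next question: why do trained weights differ so much?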

10 Why are the weights so different?
Conjecture: it might denote a specific role for the neuron, and such a role can be connected to outliers. Wow!
Which ones are candidates? Large weights? Small weights?
Preliminary analysis suggested that large weights -> outlier detectors. But... convince me!
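The "large weights -> outlier detectors" conjecture can be sketched as a simple screening of the output weights (the factor-of-two threshold and the example weights are hypothetical, not from the talk):

```python
import numpy as np

def outlier_detector_candidates(w_out, factor=2.0):
    """Flag hidden neurons whose |output weight| is far above the median;
    per the conjecture, these are candidate outlier detectors.
    The factor-of-two threshold is a hypothetical choice."""
    mag = np.abs(np.asarray(w_out, dtype=float))
    return [i for i, m in enumerate(mag) if m > factor * np.median(mag)]

# Hypothetical output weights: neuron 2 dominates
print(outlier_detector_candidates([0.8, -1.1, 9.7]))  # -> [2]
```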

11 Two different problems
1) Does the rule indeed work?
If so:
2) How does it perform compared with other outlier detection procedures?

12 Example #3: Iris Flower Classification
Three species of Iris: Setosa, Versicolor, Virginica.
Each flower has parts called the sepal and the petal.
The length and width of the sepal and petal can be used to determine the iris type.
Data were collected on a large number of iris flowers.
For example, in one flower petal length = 6.7mm and petal width = 4.3mm, while sepal length = 22.4mm and sepal width = 62.4mm; the iris type was Setosa.
An ANN can be trained to determine the species of an iris for a given set of petal and sepal widths and lengths.

13 Iris training and testing data
Sepal Length | Sepal Width | Petal Length | Petal Width | Iris Class
0.224        | 0.624       | 0.067        | 0.043       | Setosa
0.749        | 0.502       | 0.627        | 0.541       | Versicolor
0.557        | 0.847       | 1.000        | -           | Virginica
0.110        | 0.051       | 0.722        | 0.459       | -
0.663        | 0.584       | 0.776        | 0.416       | -
0.831        | 0.196       | 0.667        | 0.612       | -
0.333        | 0.812       | 0.875        | 0.055       | -
0.082        | 0.165       | 0.208        | 0.592       | -
0.027        | 0.376       | 0.639        | 0.498       | -
0.710        | 0.306       | 0.086        | 0.000       | -
0.424        | 0.694       | 0.792        | 0.137       | -
(Entries marked "-" are missing from the transcript.)

14 Classification using regression
A somewhat unusual paper: it used regression with a single output instead of the common three binary outputs!
Even more unusual: the internal weights of the ANN were published!
[Diagram: inputs petal width, petal length, sepal width, sepal length; hidden neurons v1, v2, v3; output y.]
From Benítez et al., 1997.
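A minimal sketch of the single-output idea (the 1..3 class encoding and the rounding rule are illustrative assumptions; Benítez et al.'s actual network and weights are not reproduced here):

```python
import numpy as np

# Classification by regression: encode the three species as targets 1, 2, 3,
# fit a single-output model, and classify by rounding the prediction.
SPECIES = {1: "Setosa", 2: "Versicolor", 3: "Virginica"}

def classify(y):
    """Map a real-valued regression output to the nearest class code 1..3."""
    return int(np.clip(np.rint(y), 1, 3))

print(SPECIES[classify(1.08)])  # -> Setosa
print(SPECIES[classify(2.71)])  # -> Virginica
```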

15 Classification using regression
[Same diagram, from Benítez et al., 1997: inputs petal width, petal length, sepal width, sepal length; hidden neurons v1, v2, v3; output y.]
The ANN can be simplified...

16 Pruned ANN
The classification is still good, despite not being exact.
[Diagram: pruned network with inputs sepal length, sepal width, petal length, petal width.]
Which role did the other two neurons have?

17 Modified version
z = “credibility flag”; y = 2.143*v3
All misclassifications are now announced by z = 1!
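A sketch of the modified network (the 2.143 output weight is from the slide; the detector neuron's weights, the inputs, and the 0.5 threshold are hypothetical):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict_with_flag(x, w_keep, w_detect, threshold=0.5):
    """y comes from the kept hidden neuron (y = 2.143*v3, as on the slide);
    z is a credibility flag raised when the large-weight neuron fires."""
    v3 = sigmoid(np.dot(w_keep, x))        # kept hidden neuron
    v_det = sigmoid(np.dot(w_detect, x))   # hypothetical detector neuron
    y = 2.143 * v3
    z = 1 if v_det > threshold else 0
    return y, z

# Hypothetical weights and an extreme input, for illustration only
y, z = predict_with_flag(np.array([10.0, 0.0]),
                         np.array([0.3, 0.2]),
                         np.array([5.0, -4.0]))
```

With this extreme input the detector neuron saturates, so z = 1 and the class answer y can be treated as not credible.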

18 Example #4: daily rain dataset
Weather records typically have missing values, and many applications require complete databases.
Well-established linear methods exist for interpolating spatial observations, but their performance is poor for daily rain records.
Why not ANN?

19 Data and test area description
30 years of daily records for 10 stations were available; 30% of the events have missing values.
More than 80% of the readings are of zero rain, evenly distributed over the year.
Annual averages range from 500 to 1600 mm; time correlation is low.


23 Non-linear interpolants: ANN
We used ANNs as interpolators, with 9 inputs and 1 output.
Training was performed on one third of the dataset using backpropagation, minimizing the RMSE.
Several architectures were considered (one and two hidden layers, different numbers of neurons, etc.), as well as some transformations of the data.
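A minimal stand-in for this setup, assuming synthetic rainfall data and an illustrative architecture, learning rate, and iteration count (none of these are the paper's actual choices): a one-hidden-layer net trained by gradient descent to predict one station from its 9 neighbours.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for daily rain: ~80% zeros, like the dataset described.
# Inputs: rainfall at 9 neighbour stations; target: the 10th station.
X = rng.exponential(scale=2.0, size=(300, 9)) * (rng.random((300, 9)) < 0.2)
y = X.mean(axis=1) + 0.1 * rng.normal(size=300)

# One-hidden-layer MLP (9 inputs -> 4 tanh units -> 1 linear output),
# trained by plain gradient descent on the MSE (minimizing the RMSE too).
W1 = rng.normal(scale=0.1, size=(9, 4)); b1 = np.zeros(4)
w2 = rng.normal(scale=0.1, size=4);      b2 = 0.0
lr = 0.05
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)          # hidden activations
    err = h @ w2 + b2 - y             # residuals
    g = 2.0 * err / len(y)            # d(MSE)/d(prediction)
    dh = np.outer(g, w2) * (1.0 - h**2)
    w2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

pred = np.tanh(X @ W1 + b1) @ w2 + b2
rmse = np.sqrt(np.mean((pred - y) ** 2))
```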

24 Skipping other details…
We applied our rule to each of the 10 ANNs.
We ran a Monte Carlo experiment, seeding known outliers at random and locating them afterwards.
A thorough comparison against state-of-the-art alternatives was made (details in the paper).
The ANN-based outlier detection tool performed very well: best when outlier size (Mozilla effect) was ignored, satisfactory otherwise.
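The seeding step of such a Monte Carlo experiment can be sketched as follows (synthetic data, and a plain z-score detector as a stand-in for the ANN rule):

```python
import numpy as np

rng = np.random.default_rng(7)

# Clean synthetic series, then seed 20 outliers of known size at random spots.
clean = rng.normal(loc=5.0, scale=1.0, size=1000)
positions = rng.choice(len(clean), size=20, replace=False)
data = clean.copy()
data[positions] += 10.0               # seeded outliers

# Locate them afterwards with a simple z-score stand-in detector.
z = np.abs(data - np.median(data)) / np.std(data)
flagged = set(np.flatnonzero(z > 4.0))

hit_rate = len(flagged & set(positions)) / len(positions)
```

Because the seeded positions are known, the hit rate (and the false-alarm count) of any detector can be measured directly.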

25 Pros…
The training stage is as usual; no special routine is required.
We inspect the internal weights; no retraining is needed.
Unsupervised classification: outliers are not declared as such in advance.
Might offer an objective criterion to suspect underfitting.

26 Cons…
Weights might be sensitive to outliers (masking effect), which in turn might prevent detecting them.
Which outliers are located? Only some suitable ones?

27 Questions?
Carlos López-Vázquez, Laboratorio LatinGEO, SGM+Universidad ORT del Uruguay. September 15th, 2015

