1 Outlier Detection / Exception Mining
School of Computing Science, Simon Fraser University, Vancouver, Canada
Also known as: anomaly detection, outlier detection, exception mining.

2 Profile-Based Outlier Detection for Relational Data
Population database, e.g. IMDb. Individual database (a profile, interpretation, or egonet), e.g. Brad Pitt's movies.
Goal: identify exceptional individual databases. This is like learning from interpretations (cf. Blockeel).
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study in Geographical Information Systems', Expert Systems With Applications 39(5), 4718-4728.

3 Example: population data
[Diagram: actors with attributes gender (Man/Woman) and country (U.S.), linked by ActsIn edges carrying a salary (e.g. $500K, $2M, $5M) to movies with attributes runtime (e.g. 98 min, 111 min), drama, and action. Functors: gender, genre, who acts in what.]

4 Example: individual data
[Diagram: the individual database for one actor (gender = Man, country = U.S.) and the movies he acts in (e.g. runtime = 98 min, drama = true). Functors: gender, genre, who acts in what.]

5 Model-Based Relational Outlier Detection
Model-based: leverage the result of Bayesian network learning.
Feature generation based on the BN model.
Define an outlierness metric using the BN model.
Pipeline: population database → class-level Bayesian network; individual database. Like learning from interpretations (cf. Blockeel); see Maervoet et al. (2012), cited above.

6 Model-Based Feature Generation
Learning Bayesian Networks for Complex Relational Data

7 Model-Based Outlier Detection for Relational Data
Pipeline: population database → class-level Bayesian network; individual database → individual feature vector.
Propositionalization / relation elimination / ETL: feature vectors summarize the individual data, so we can leverage outlier detection methods for i.i.d. feature-matrix data.
Riahi, F. & Schulte, O. (2016), 'Propositionalization for Unsupervised Outlier Detection in Multi-Relational Data', in Proceedings FLAIRS 2016.

8 Example: Class Bayesian Network
BN structure: gender(A) → ActsIn(A,M) ← Drama(M).

9 Example: Feature Matrix
[Feature matrix omitted; entries are proportions such as 1 and 1/2.]
Each feature corresponds to a family configuration in the Bayesian network.
Similar to the feature matrix for classification.
For step-by-step construction, see the supplementary slides on the website.
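A minimal sketch of how such a feature vector can be computed. The data layout and names here are illustrative assumptions, not the tutorial's implementation: each grounding assigns values to the family (ActsIn(A,M); Drama(M)), and each feature value is the proportion of groundings realizing one configuration.

```python
from collections import Counter

# Hypothetical individual database for one actor: each entry gives the
# values of ActsIn(A, M) and Drama(M) for one movie M.
groundings = [
    {"ActsIn": False, "Drama": True},
    {"ActsIn": True,  "Drama": False},
]

# One feature per family configuration of the class Bayesian network;
# its value is the proportion of groundings realizing the configuration.
counts = Counter((g["ActsIn"], g["Drama"]) for g in groundings)
n = len(groundings)
feature_vector = {config: count / n for config, count in counts.items()}
print(feature_vector)  # proportions summing to 1
```

Stacking one such vector per individual yields the i.i.d. feature matrix used by single-table outlier detectors.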

10 Feature Generation/Propositionalization for Outlier Detection
Similar to feature generation for classification. Main difference: include all first-order random variables, not just the Markov blanket of the class variable.
Bayesian network learning discovers relevant conjunctive features.
Related work: the OddBall system also extracts a feature matrix from relational information, based on network analysis (Akoglu et al. 2010). It leverages existing i.i.d. outlier detection methods but does not define a "native" relational outlierness metric.
Akoglu, L.; Mcglohon, M. & Faloutsos, C. (2010), 'OddBall: Spotting Anomalies in Weighted Graphs', in PAKDD.
Akoglu, L.; Tong, H. & Koutra, D. (2015), 'Graph based anomaly detection and description: a survey', Data Mining and Knowledge Discovery 29(3).

11 Relational Outlierness Metrics

12 Exceptional Model Mining for Relational Data
EMM approach (Knobbe et al. 2011) for subgroup discovery in i.i.d. data:
Fix a model class with parameter vector θ.
Learn parameters θ_c for the entire class.
Learn parameters θ_g for a subgroup g.
The measured difference between θ_c and θ_g is the quality measure for subgroup g.
For relational data, an individual o = a subgroup g of size 1: compare a random individual against the target individual.
Knobbe, A.; Feelders, A. & Leman, D. (2011), 'Exceptional Model Mining', in Data Mining: Foundations and Intelligent Paradigms, Springer Verlag, Heidelberg, Germany.
Riahi, F. & Schulte, O. (2015), 'Model-based Outlier Detection for Object-Relational Data', IEEE SSCI.
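The EMM recipe above can be sketched in a few lines. The model class here (independent Bernoulli parameters, one per binary feature column) and all names are my illustrative assumptions, not the Knobbe et al. formulation:

```python
import numpy as np

def emm_quality(class_data: np.ndarray, subgroup_data: np.ndarray) -> float:
    """EMM sketch: learn theta_c on the whole class and theta_g on the
    subgroup, then use their L1 distance as the subgroup's quality."""
    theta_c = class_data.mean(axis=0)     # parameters for the entire class
    theta_g = subgroup_data.mean(axis=0)  # parameters for the subgroup
    return float(np.abs(theta_c - theta_g).sum())

# A relational individual is treated as a subgroup of size 1: the
# "subgroup" rows are that single individual's groundings.
population = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
individual = np.array([[1, 1], [1, 1]])
print(emm_quality(population, individual))
```

The larger the parameter difference, the more exceptional the individual.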

13 EMM-Based Outlier Detection for Relational Data
Population Database Individual Database Class Bayesian network (for random individual) Individual Bayesian network related to subgroup discovery: anomaly is like subgroup of size 1 Outlierness Metric (quality measure) = Measure of dissimilarity between class and individual BN e.g. KLD, ELD (new) “Model-based Outlier Detection for Object-Relational Data”. Riahi and Schulte (2015). IEEE SSCI.

14 Example: class and individual Bayesian network parameters
Class BN (random actor A): gender(A) → ActsIn(A,M) ← Drama(M), with P(gender(A)=M) = 0.5, P(Drama(M)=T) = 0.5, and a CPT for ActsIn(A,M)=T given gender(A) and Drama(M) (e.g. 1/2 for gender = M, Drama = T).
Individual BN (Brad Pitt): gender(bradPitt) → ActsIn(bradPitt,M) ← Drama(M), with P(gender(bradPitt)=M) = 1, P(Drama(M)=T) = 0.5, and Brad Pitt's own CPT for ActsIn(bradPitt,M).
Notes: assumes maximum likelihood estimation for P_B; implicitly uses the instantiation principle.
Raedt, L. D. (1998), 'Attribute-Value Learning Versus Inductive Logic Programming: The Missing Links (Extended Abstract)', in David Page, ed., ILP, Springer, pp. 1-8.

15 Outlierness Metric = Kullback-Leibler Divergence
KLD(P_Bo || P_Bc) = Σ_x P_Bo(x) ln( P_Bo(x) / P_Bc(x) ),
where B_c models the class database distribution and B_o models the individual database distribution D_o.
Assuming that P_Bo = P_Do (MLE), the KLD is the individual-data log-likelihood ratio: KLD = expected log-difference in probabilities.
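A small sketch of this metric for a single variable over a finite domain (the distributions below are toy assumptions in the spirit of the Brad Pitt example, not values from the paper):

```python
import math

def bn_kld(p_ind: dict, p_class: dict) -> float:
    """KLD between individual and class distributions over the same
    finite domain: sum_x P_ind(x) * ln(P_ind(x) / P_class(x)).
    Terms with P_ind(x) = 0 contribute 0 and are skipped."""
    return sum(p * math.log(p / p_class[x])
               for x, p in p_ind.items() if p > 0)

# Toy example: the individual is male with probability 1,
# a random actor with probability 0.5.
print(bn_kld({"M": 1.0, "W": 0.0}, {"M": 0.5, "W": 0.5}))  # ln 2 ≈ 0.69
```

For a full Bayesian network the metric sums one such term per family configuration, weighted by the individual's joint probabilities.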

16 Brad Pitt Example: total KLD = 0.69 + 0.35 = 1.04; KLD for Drama(M) = 0
gender(A) = M: individual joint = 1, individual cond. = 1, class cond. = 0.5; KLD term = 1 × (ln 1 − ln 0.5) ≈ 0.69.
ActsIn(A,M) = F given gender(A) = M, Drama(M) = T: individual joint = 1/2, individual cond. = 1, class cond. = 0.5; KLD term = 1/2 × (ln 1 − ln 0.5) ≈ 0.35.
Rows with individual probability = 0 are omitted.
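The slide's arithmetic can be checked directly (each KLD term is the individual joint probability times the difference of log conditional probabilities):

```python
import math

# KLD term = (individual joint prob.) * (ln individual cond. - ln class cond.)
gender_term = 1.0 * (math.log(1.0) - math.log(0.5))  # gender(A) = M
actsin_term = 0.5 * (math.log(1.0) - math.log(0.5))  # ActsIn = F | M, Drama = T
drama_term = 0.0  # individual and class agree on Drama(M), so its KLD is 0
total = gender_term + actsin_term + drama_term
print(round(gender_term, 2), round(actsin_term, 2), round(total, 2))
# → 0.69 0.35 1.04
```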

17 Mutual Information Decomposition
The interpretability of the metric can be increased by a mutual information decomposition of the KLD into:
the KLD with respect to the marginal single-variable distributions,
the lift of the parent condition in the individual distribution, and
the lift of the parent condition in the class distribution.
The first sum measures single-variable distribution differences; the second sum measures differences in the strength of associations.
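A hedged sketch of this decomposition in symbols (my notation, not necessarily the paper's): expand the BN factorization of the KLD and write each conditional as marginal times lift.

```latex
\mathrm{KLD}(P_o \,\|\, P_c)
  = \sum_i \mathrm{KL}\bigl(P_o(X_i) \,\|\, P_c(X_i)\bigr)
  + \sum_i \mathbb{E}_{P_o}\!\left[
      \ln \frac{P_o(X_i \mid \mathrm{Pa}_i)}{P_o(X_i)}
    - \ln \frac{P_c(X_i \mid \mathrm{Pa}_i)}{P_c(X_i)}
    \right]
```

The first sum compares single-variable marginals; each bracketed term is the log-lift of the parent condition in the individual distribution minus that in the class distribution.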

18 ELD = Expected Log-Distance
A problem with KLD: some log ratios are positive, some negative, so differences cancel, which reduces power.
Fix: take log-distances (absolute log-differences) instead.
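The cancellation effect is easy to demonstrate with toy distributions (the numbers below are illustrative assumptions, not from the paper):

```python
import math

def log_terms(p_ind: dict, p_class: dict) -> list:
    """Signed per-value terms p * ln(p / q); KLD sums them,
    ELD sums their absolute values."""
    return [p * math.log(p / p_class[x])
            for x, p in p_ind.items() if p > 0]

p_ind = {"low": 0.5, "high": 0.5}
p_class = {"low": 0.8, "high": 0.2}
kld = sum(log_terms(p_ind, p_class))              # signed terms cancel
eld = sum(abs(t) for t in log_terms(p_ind, p_class))  # distances do not
print(round(kld, 3), round(eld, 3))  # → 0.223 0.693
```

The negative "low" term partially cancels the positive "high" term in the KLD; the ELD keeps both contributions.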

19 Two Types of Outliers
Feature outlier: unusual distribution over a single attribute in isolation, e.g. DribbleEfficiency.
Correlation outlier: unusual relevance of a parent for its children (mutual information, lift), e.g. DribbleEfficiency → Win.

20 Example: Edin Dzeko, Marginals
Data are from a Premier League season.
Edin Dzeko: low DribbleEfficiency in 16% of his matches. Random striker: low DE in 50% of matches.
ELD contribution to the marginal sum: 16% × |ln(16%/50%)| ≈ 0.18, so Edin Dzeko's single-attribute distribution is unusual for a striker.
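Checking the marginal contribution from the slide's numbers:

```python
import math

# Marginal ELD term for Edin Dzeko's low dribble efficiency:
# P_individual(DE = low) = 0.16, P_class(DE = low) = 0.50.
contribution = 0.16 * abs(math.log(0.16 / 0.50))
print(round(contribution, 2))  # → 0.18
```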

21 Example: Edin Dzeko, Associations
Association: ShotEfficiency = high, TackleEfficiency = medium → DribbleEfficiency = low.
For Edin Dzeko: confidence = 50%, lift = ln(50%/16%) ≈ 1.14, support (joint probability) = 6%.
For a random striker: confidence = 38%, lift = ln(38%/50%) ≈ −0.27.
ELD contribution for the association: 6% × |1.14 − (−0.27)| = 6% × 1.41 ≈ 0.085.
So Edin Dzeko has a stronger association, and in the opposite direction.
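The association contribution follows the same recipe: the support times the absolute difference of the two log-lifts (all probabilities are the slide's numbers):

```python
import math

# Lift of the parent condition for DribbleEfficiency = low.
dzeko_lift = math.log(0.50 / 0.16)  # confidence 50% vs. marginal 16%
class_lift = math.log(0.38 / 0.50)  # confidence 38% vs. marginal 50%
support = 0.06                      # Dzeko's joint probability

contribution = support * abs(dzeko_lift - class_lift)
print(round(dzeko_lift, 2), round(class_lift, 2), round(contribution, 3))
# → 1.14 -0.27 0.085
```

The lifts have opposite signs, so the association is not only stronger for Dzeko but reversed relative to the class.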

22 Evaluation Metrics
Use precision as the evaluation metric: how many of the flagged outliers were correctly recognized.
Set the percentage of outliers to 1% and 5%.
Similar results with AUC and recall.
Gao, J.; Liang, F.; Fan, W.; Wang, C.; Sun, Y. & Han, J. (2010), 'On Community Outliers and Their Efficient Detection in Information Networks', in SIGKDD.

23 Methods Compared
Outlierness metrics:
KLD
|KLD|: replace log-differences by log-distances
ELD
LOG: negative log-likelihood of the generic class model on the individual database
FD: |KLD| with respect to marginals only
Aggregation methods: use counts of single feature values to form a data matrix, then apply standard single-table methods (LOF, KNN, OutRank).

24 Synthetic Datasets
Synthetic datasets: should be easy! Two features per player per match, and we also know the ground truth about the Bayesian network.
We generated three synthetic datasets for a soccer domain with normal and outlier players, using the distributions represented in the three Bayesian networks of Figure 2. Each player participates in 38 matches; each match assigns a value to each attribute F_i, i = 1, 2, for each player.
High Correlation: normal individuals exhibit a strong association between their attributes (e.g. ShotEff and Match Result), outliers no association. Both normals and outliers have a close-to-uniform distribution over single attributes. See Figure 2(a).
Low Correlation: normal individuals exhibit no association between their attributes, outliers a strong association. Both normals and outliers have a close-to-uniform distribution over single attributes. See Figure 2(b).
Single Attributes: both normal and outlier individuals exhibit a strong association between their attributes. In normals, attribute 1 has value 0 50% of the time; for outliers, only 10% of the time. See Figure 2(c).

25 Synthetic Data Results
Use precision as the evaluation metric: how many of the flagged outliers were correctly recognized. Set the percentage of outliers to 1% and 5%.

26 1D Scatter-Plots
Red points are outliers; blue points are normal class points. ELD maps the outliers to the largest range.

27 Case Study: Strikers and Movies
Strikers (normal class = Striker):
Edin Dzeko (Striker): ELD rank 1, max node DribbleEfficiency, FD max value DE = Low, object probability 0.16, class probability 0.50
Paul Robinson (Goalie): ELD rank 2, max node SavesMade, FD max value SM = Medium, object probability 0.30, class probability 0.04
Michel Vorm: ELD rank 3, object probability 0.37
Movies (normal class = Drama):
Brave Heart (Drama): ELD rank 1, max node Actor_Quality, FD max value a_quality = 4, object probability 0.93, class probability 0.42
Austin Powers (Comedy): ELD rank 2, max node Cast_position, FD max value cast_num = 3, object probability 0.78, class probability 0.49
Blues Brothers: ELD rank 3, object probability 0.88

28 Conclusion
Relational outlier detection: two approaches for leveraging BN structure learning.
Propositionalization: the BN structure defines features for single-table outlier detection.
Relational outlierness metric: use the divergence between the database distribution for the target individual and that for a random individual.
A novel variant of the Kullback-Leibler divergence works well: interpretable and accurate.

29 Tutorial Conclusion: First-Order Bayesian Networks
Many organizations maintain structured data in relational databases.
First-order Bayesian networks model probabilistic associations across the entire database.
Halpern/Bacchus probabilistic logic unifies logic and probability.
Random selection semantics for Bayesian networks: we can query frequencies across the entire database.
Learning Bayesian Networks for Relational Data

30 Conclusion: Learning First-Order Bayesian Networks
Extend the Halpern/Bacchus random selection semantics to statistical concepts:
a new random selection likelihood function
tractable parameter and structure learning
can also be used to learn Markov Logic Networks
a relational Bayesian network classification formula: a log-linear model whose predictors are the proportions of Bayesian network features
A new approach to relational anomaly detection: compare the probability distribution of a potential outlier with the distribution for a reference class.

