Published by Madison McLaughlin. Modified over 9 years ago.
1
A simple method for multi-relational outlier detection
Sarah Riahi and Oliver Schulte
School of Computing Science, Simon Fraser University, Vancouver, Canada
With tools that you probably have around the house lab.
2
2/13 A simple method for multi-relational outlier detection
3
3/13 System Flow
Input: model, database, target individual. Output: an outlier score.
Flow (recovered from the slide diagram):
1. Complete Database -> Parameter Learning Algorithm (with Model) -> Population Parameter Values
2. Complete Database -> restrict to target individual -> Individual Profile -> Parameter Learning Algorithm (with Model) -> Individual Parameter Values
3. Population Parameter Values and Individual Parameter Values -> vector norm -> outlier score
Reference: Flach, P. A. (1999), Knowledge representation for inductive learning, 'Symbolic and Quantitative Approaches to Reasoning and Uncertainty', Springer, pp. 160--167.
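The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: relative frequencies stand in for the parameter learning algorithm, and the record layout, the function names `learn_parameters` and `outlier_score`, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def learn_parameters(rows, formulas):
    """Stand-in parameter learner (assumption): the relative frequency
    of each formula among the given rows."""
    counts = Counter(row["pattern"] for row in rows)
    total = len(rows)
    return [counts[f] / total if total else 0.0 for f in formulas]

def outlier_score(database, target, formulas):
    """Average L1 distance between population and individual parameters."""
    # Step 1: learn parameters on the complete database.
    population = learn_parameters(database, formulas)
    # Step 2: restrict to the target individual and learn again.
    profile = [row for row in database if row["player"] == target]
    individual = learn_parameters(profile, formulas)
    # Step 3: vector norm of the difference (here: average L1).
    diffs = [abs(p - q) for p, q in zip(population, individual)]
    return sum(diffs) / len(formulas)

# Hypothetical toy database: one row per (player, match) pattern.
database = [
    {"player": "A", "pattern": "low,low"},
    {"player": "A", "pattern": "low,low"},
    {"player": "A", "pattern": "low,low"},
    {"player": "B", "pattern": "high,high"},
]
formulas = ["low,low", "high,high"]

score_a = outlier_score(database, "A", formulas)
score_b = outlier_score(database, "B", formulas)
```

In this toy data player B deviates more from the population than player A, so B receives the larger score.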
4
4/13 Example
Model = Markov Logic Network learned for Premier League Season 2011-2012.
Columns: Formula | Estimated Population Parameter | Estimated Parameter for P=van Persie
SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 0.02 | 0.56
SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 3.55 | 0.36
... (331 formulas)
5
5/13 Evaluation: Synthetic Data
Two features. Designed so that outliers are easy to distinguish from normals (sanity check).
1. Normals have a strong correlation, outliers none.
2. Outliers have a strong correlation, normals none.
3. Correlations are the same, but marginals are very different.
6
6/13 Bayesian Network Representation
[Figure: two panels, (a) and (b), each showing one Bayesian network for the normal class and one for the outlier class.]
Normal = Striker: F1 = ShotEfficiency, F2 = Match_Result.
Outlier = MidFielder: F1 = TackleEfficiency, F2 = Match_Result.
Parameters shown for a network with an F1 -> F2 dependency: P(F1=1) = 50%, P(F2=0|F1=0) = 90%, P(F2=1|F1=1) = 90%.
Parameters shown for a network without the dependency: P(F1=1) = 50%, P(F2=1) = 50%.
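The two kinds of network can be simulated directly from the probabilities on this slide. The sketch below is an assumption about how such synthetic data might be drawn, not the authors' generator; `sample_matches` and `agreement` are hypothetical helper names, and the sample size and seed are arbitrary.

```python
import random

def sample_matches(n, correlated, rng):
    """Draw n (F1, F2) pairs from a two-node Bayesian network.

    P(F1=1) = 0.5 in both cases. With the dependency,
    P(F2=1|F1=1) = P(F2=0|F1=0) = 0.9; without it, P(F2=1) = 0.5.
    """
    rows = []
    for _ in range(n):
        f1 = 1 if rng.random() < 0.5 else 0
        if correlated:
            # F2 copies F1 with probability 0.9.
            f2 = f1 if rng.random() < 0.9 else 1 - f1
        else:
            f2 = 1 if rng.random() < 0.5 else 0
        rows.append((f1, f2))
    return rows

def agreement(rows):
    """Fraction of rows where F1 and F2 take the same value."""
    return sum(f1 == f2 for f1, f2 in rows) / len(rows)

rng = random.Random(0)  # fixed seed for reproducibility
with_dependency = sample_matches(5000, True, rng)
without_dependency = sample_matches(5000, False, rng)
```

The agreement rate should land near 0.9 for the dependent network and near 0.5 for the independent one, which is the gap an outlier score is meant to pick up in this sanity check.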
7
7/13 Results
Metric = Area Under Curve
ELD = average L1-norm
KLD = average difference
AD = use single feature marginals only (unit clauses)
LOG = outlier score = log-likelihood
AD = Breunig, M.; Kriegel, H.-P.; Ng, R. T. & Sander, J. (2000), LOF: Identifying Density-Based Local Outliers, in 'ACM SIGMOD'.
LOG = Riahi, F.; Schulte, O. & Liang, Q. (2014), 'A Proposal for Statistical Outlier Detection in Relational Structures', AAAI-StarAI Workshop on Statistical-Relational AI.
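For readers unfamiliar with the metric, here is a small sketch of how an area under the curve can be computed from outlier scores, assuming the ROC curve is meant. It uses the pairwise-ranking definition: the probability that a randomly chosen outlier scores higher than a randomly chosen normal, with ties counted as one half. The function name `auc` and the example scores are illustrative only.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison; labels are 1 (outlier) or 0 (normal)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # outlier ranked above normal
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical scores: the two outliers get the two highest scores.
perfect = auc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
```

The quadratic loop is fine for small evaluations like this one; a sort-based version would be preferable for large data.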
8
8/13 Case Study: Single Features
Which formulas/rules influence the outlier score the most? (interpretability)
Which unit clauses influence the outlier score the most?
9
9/13 Case Study: Correlations
Which formulas/rules influence the outlier score the most? (interpretability)
Which associations influence the outlier score the most?
Related to exception mining (Novak et al. 2009).
Columns: Individual | Rule | Confidence (Individual) | Confidence (Class)
Edin Dzeko | ShotEff = high AND TackleEff = medium -> DribbleEff = low | 50% | 38%
Van Persie | ShotEff = high AND TimePlayed = high -> ShotsOnTarget = high | 70% | 50%
Confidence = conditional probability.
Novak, P. K.; Webb, G. I. & Wrobel, S. (2009), 'Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining', Journal of Machine Learning Research.
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study', Expert Systems with Applications.
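Since confidence is defined here as a conditional probability, it can be estimated as the fraction of rows matching the rule body that also match the rule head. The sketch below assumes one row per player-match with discretized feature values; the `confidence` helper and the toy rows are hypothetical and do not reproduce the percentages on the slide.

```python
def confidence(rows, body, head):
    """Estimate P(head | body) from rows; returns None if the body never holds."""
    def holds(row, conditions):
        return all(row.get(key) == value for key, value in conditions.items())

    body_rows = [row for row in rows if holds(row, body)]
    if not body_rows:
        return None
    hits = sum(holds(row, head) for row in body_rows)
    return hits / len(body_rows)

# Hypothetical player-match rows for two players X and Y.
rows = [
    {"player": "X", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "high"},
    {"player": "X", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
    {"player": "Y", "ShotEff": "high", "TimePlayed": "high", "ShotsOnTarget": "low"},
    {"player": "Y", "ShotEff": "low", "TimePlayed": "high", "ShotsOnTarget": "high"},
]
body = {"ShotEff": "high", "TimePlayed": "high"}
head = {"ShotsOnTarget": "high"}

class_confidence = confidence(rows, body, head)
individual_confidence = confidence(
    [row for row in rows if row["player"] == "X"], body, head
)
```

A large gap between the individual and class confidence for the same rule is what marks the rule as informative about that individual.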
10
10/13 Distribution Divergence Perspective
Columns: Joint Value Assignment | Frequency for Random Striker | Frequency for P=van Persie
SavesMade(P,M)=low AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | 22% | 10%
SavesMade(P,M)=low AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | 30% | 62%
...
Outlier score = dissimilarity measure between random individual and target individual.
In our work, dissimilarity measure = distribution divergence.
Could leverage other distance-type metrics as well.
Halpern, "An analysis of first-order logics of probability", AI Journal 1990.
De Raedt, L. (2008), Logical and Relational Learning, Springer. Ch. 9.
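One common distribution divergence is the Kullback-Leibler divergence. The sketch below assumes that choice and the direction KL(individual || population); the slide does not fix either. The function name and the two-outcome toy distributions are illustrative and are not taken from the table.

```python
import math

def kl_divergence(individual, population):
    """KL(individual || population) for two discrete distributions
    given as aligned lists of probabilities."""
    total = 0.0
    for p, q in zip(individual, population):
        if p == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q == 0:
            raise ValueError("population assigns zero probability to an observed outcome")
        total += p * math.log(p / q)
    return total

# Hypothetical distributions over two joint value assignments.
target = [0.5, 0.5]
random_individual = [0.25, 0.75]
score = kl_divergence(target, random_individual)
```

The divergence is zero when the target individual matches the population exactly and grows as the two distributions move apart, which is the behaviour wanted from an outlier score.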
11
11/13 Propositionalization for Outlier Detection
Columns: Player | SavesMade(P,M)=med AND shotsOnTarget(P,M)=low AND ShotEff(P,M)=low | SavesMade(P,M)=med AND shotsOnTarget(P,M)=high AND ShotEff(P,M)=high | (331 more)
Wayne Rooney | 13% | 10% | ...
van Persie | 50% | 62% | ...
...
Construct a 331-dimensional attribute vector for each individual.
One frequency/count value for each formula: a pseudo-i.i.d. data view, like n-grams.
Apply standard single-table analysis methods.
Could also use learned weights instead of sufficient statistics.
Lippi, M.; Jaeger, M.; Frasconi, P. & Passerini, A. (2011), 'Relational information gain', Machine Learning 83(2), 219--239.
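The propositionalization step can be sketched as follows: count how often each formula holds for each player, normalize to frequencies, and hand the resulting table to any single-table outlier method. The detector used here, distance to the k-th nearest neighbour, is a deliberately simple stand-in chosen for the example, not the method evaluated in the talk; all names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def propositionalize(records, formulas):
    """Turn (player, formula) observations into one frequency vector per player."""
    per_player = defaultdict(Counter)
    for player, formula in records:
        per_player[player][formula] += 1
    table = {}
    for player, counts in per_player.items():
        total = sum(counts.values())
        table[player] = [counts[f] / total for f in formulas]
    return table

def knn_distance_scores(table, k=1):
    """Stand-in single-table detector: Euclidean distance to the k-th nearest neighbour."""
    scores = {}
    for name, vector in table.items():
        distances = sorted(
            math.dist(vector, other)
            for other_name, other in table.items()
            if other_name != name
        )
        scores[name] = distances[k - 1]
    return scores

# Hypothetical observations: which formula held for which player in a match.
records = [
    ("A", "f1"), ("A", "f1"), ("A", "f2"),
    ("B", "f1"), ("B", "f1"), ("B", "f2"),
    ("C", "f2"), ("C", "f2"),
]
table = propositionalize(records, ["f1", "f2"])
scores = knn_distance_scores(table, k=1)
most_unusual = max(scores, key=scores.get)
```

With 331 formulas the vectors would simply be longer; the table construction is unchanged.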
12
12/13 Propositionalization Results
LowCor = Normals have low correlation.
HighCor = Normals have high correlation.
13
13/13 Summary
Outlier detection based on a statistical-relational model.
Basic idea: compare individual profile to entire population.
Leverage parameter learning:
1. Learn parameter values for individual.
2. Learn parameter values for entire population.
3. Outlier score = parameter vector difference, e.g. average L1-distance.
Leverage relational distance between individuals.
In our work, distance ≈ distribution divergence.
Outlier score = divergence between individual distribution and population distribution.
Another approach: model-based propositionalization for outlier detection.
Attribute-values = frequency counts for patterns in model structure.