Modeling and Finding Abnormal Nodes (Chapter 2)
駱宏毅 Hung-Yi Lo
Social Network Mining Lab Seminar, July 18, 2007

Mutual Information (MI) (1/2)
MI measures the mutual dependence of two random variables.
- The higher it is, the more dependent the two random variables are on each other.
- Its value is always non-negative.
The equation of MI:

I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}

Mutual Information (MI) (2/2)
For example, say a discrete random variable X represents visibility at a certain moment in time and a random variable Y represents wind speed at that moment. The mutual information between X and Y is then

I(X;Y) = \sum_{x \in \{Good,\,Bad\}} \; \sum_{y \in \{High,\,Low\}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}

Pointwise Mutual Information (PMI)
PMI measures the mutual dependence between two instances, or realizations, of random variables.
- Positive => positively correlated
- Zero => independent (no information)
- Negative => negatively correlated
The equation of PMI:

pmi(x, y) = \log \frac{p(x,y)}{p(x)\,p(y)}

Mutual Information and Pointwise Mutual Information: Examples (1/5)
Discrete random variable X (visibility), taking the values X=Good and X=Bad. [probability table not reproduced]

Mutual Information and Pointwise Mutual Information: Examples (2/5)
Discrete random variable Y (wind speed), taking the values Y=High and Y=Low. [probability table not reproduced]

Mutual Information and Pointwise Mutual Information: Examples (3/5 - 5/5)
Mutual information of X and Y, worked out from the joint distribution over X ∈ {Good, Bad} and Y ∈ {High, Low}. [tables and worked computation not reproduced]
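Since the original tables were not reproduced, here is a minimal sketch of the same computation in Python, using a hypothetical 2x2 joint distribution (the numbers are illustrative, not the ones from the slides):

```python
import numpy as np

# Hypothetical joint distribution p(X, Y); rows are X=Good, X=Bad,
# columns are Y=High, Y=Low. Illustrative numbers only.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal p(X)
p_y = p_xy.sum(axis=0)  # marginal p(Y)

# Pointwise mutual information of every (x, y) cell.
pmi = np.log2(p_xy / np.outer(p_x, p_y))

# Mutual information is the expectation of PMI under the joint distribution.
mi = float((p_xy * pmi).sum())
print(pmi)  # positive on the diagonal (Good/High, Bad/Low), negative elsewhere
print(mi)   # a single non-negative number (about 0.28 bits here)
```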

Pointwise Mutual Information Model for Nodes and Paths in an MRN (1/4)
We model the dependency of a node s and a path type pt as their pointwise mutual information:

pmi(s, pt) = \log \frac{p(s, pt)}{p(s)\,p(pt)}

The marginal probabilities and the joint probability depend on which random experiment one performs.

Pointwise Mutual Information Model for Nodes and Paths in an MRN (2/4)
For RE1, the conditional probability p(pt | s) was defined as the contribution, so:

pmi(s, pt) = \log \frac{p(pt \mid s)}{p(pt)}

Pointwise Mutual Information Model for Nodes and Paths in an MRN (3/4)
For RE2: [probability definitions not reproduced]

Pointwise Mutual Information Model for Nodes and Paths in an MRN (4/4)
For RE3: [probability definitions not reproduced]
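As a concrete illustration, here is a small sketch of an RE1-style computation, assuming the random experiment draws a path uniformly at random from all paths in the MRN (the data and helper names are hypothetical):

```python
import math
from collections import Counter

def pmi_features(paths):
    """Compute pmi(s, pt) for every (source node, path type) pair,
    assuming a path is drawn uniformly at random from `paths`.
    `paths` is a list of (source_node, path_type) pairs, one per
    concrete path in the multi-relational network."""
    n = len(paths)
    count_s = Counter(s for s, _ in paths)      # paths starting at s
    count_pt = Counter(pt for _, pt in paths)   # paths of type pt
    count_joint = Counter(paths)                # paths of type pt starting at s

    pmi = {}
    for (s, pt), c in count_joint.items():
        p_joint = c / n
        p_s = count_s[s] / n
        p_pt = count_pt[pt] / n
        pmi[(s, pt)] = math.log(p_joint / (p_s * p_pt))
    return pmi

# Example: two source nodes and two relation-only path types.
paths = [("a", "cites-cites"), ("a", "cites-cites"), ("a", "writes-cites"),
         ("b", "writes-cites"), ("b", "writes-cites"), ("b", "cites-cites")]
print(pmi_features(paths))
```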

Mutual Information for MRN (1/2)
Unlike PMI, which models the dependency of two instances of random variables, mutual information is generally used to compute the dependency of two random variables. However, as can be inferred from the examples above, the PMI models we proposed focus only on positive dependency and ignore the other situations. We therefore redefine S and PT as two binary random variables:
- S: whether the path starts from the node s
- PT: whether the path belongs to the path type pt

Mutual Information for MRN (2/2)
Note that the major difference between these two random variables and the previous ones is that both s and pt are included in the definition of the random variables, and their values can only be true or false.
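Under the standard definition, the mutual information of these two binary variables expands over the four truth-value combinations:

```latex
I(S; PT) = \sum_{a \in \{T,F\}} \sum_{b \in \{T,F\}}
           p(S{=}a, PT{=}b) \log \frac{p(S{=}a, PT{=}b)}{p(S{=}a)\, p(PT{=}b)}
```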

Meta Constraints (1/4)
Path types can be generated from paths by variable relaxation.
Goal: systematically select a set of path types and adapt the selection to users' preferences.
To do so, the system uses a number of meta-constraints as tunable parameters.

Meta Constraints (2/4)
Meta-constraint 1: maximum path length
- The farther away a node/link is from the source node, the less impact it has on the semantics of the source node.
- The longer a path is, the harder it is to make sense of it.
- In our experiments we chose a maximum path length between four and six.
Meta-constraint 2: relation-only constraints
- Treat paths with the same sequence of relations (links) as being of the same path type.
Based on meta-constraints 1 and 2, the system can fully automatically extract a set of path types from an MRN to represent the semantics of the nodes.
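A minimal sketch of this extraction step, assuming a simple adjacency-list graph and keeping only loop-free paths (the graph and helper names are hypothetical; the real system supports the richer constraints below):

```python
from collections import defaultdict

def relation_only_path_types(graph, max_len=4):
    """Enumerate paths of up to `max_len` links and group them by their
    relation sequence (meta-constraints 1 and 2). `graph` maps a node
    to a list of (relation, neighbor) pairs."""
    types = defaultdict(list)  # relation sequence -> list of (source, path)

    def walk(source, node, rels, visited):
        if rels:
            types[tuple(rels)].append((source, tuple(visited)))
        if len(rels) == max_len:
            return
        for rel, nxt in graph.get(node, ()):
            if nxt not in visited:  # keep paths simple (no revisits)
                walk(source, nxt, rels + [rel], visited + [nxt])

    for s in graph:
        walk(s, s, [], [s])
    return types

graph = {
    "p1": [("writes", "paperA")],
    "paperA": [("cites", "paperB")],
    "paperB": [("cites", "paperC")],
}
for pt, concrete_paths in relation_only_path_types(graph).items():
    print(pt, len(concrete_paths))
```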

Meta Constraints (3/4)
Meta-constraint 3: node and link type constraints
- One can specify that at least one of the nodes in a path type needs to be of type person, or that one link in the path needs to be a murder link.
Meta-constraint 4: exclusion constraints
- One can state that paths containing the rob relationship should not be considered in the analysis.
These constraints let users express their preferences and biases. They are particularly useful in situations where users are confident about which kinds of links or nodes are important and which are useless.

Meta Constraints (4/4)
Meta-constraint 5: structure constraints
- In one of our experiments we asked the system to consider only paths whose source and target nodes are the same, which we call the "loop constraint".
- We also define a "uniform link constraint" that considers only paths with a single link type, such as [A cites B cites C cites D].
Meta-constraint 6: guiding the computation of feature values
- We can tell the system to ignore all paths without a certain type of node while performing the path-choosing random experiments.
- Meta-constraints 3, 4, and 5 can be reused here.

Finding Abnormal Nodes in the MRN (1/2)
The semantic profile of a node is represented by using path types as features and the dependencies (i.e., contribution, PMI, or MI) of the node with respect to each path type as feature values. We then want to identify abnormal instances in our set of nodes. We can:
- Extract nodes that have high dependency values for many path types (e.g., highly connected nodes fall into this category).
- Identify nodes that have low dependency values, such as isolated or boundary nodes.
Our goal is not to find important nodes; instead, we are trying to find nodes that look relatively different from the others.
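As a sketch, the semantic profiles can be arranged into a node-by-path-type feature matrix, here built from the pmi dictionary of the earlier sketch (names are hypothetical):

```python
import numpy as np

def profile_matrix(pmi):
    """Arrange pmi(s, pt) values into a node-by-path-type feature matrix.
    `pmi` maps (node, path_type) -> dependency value; absent pairs get 0."""
    nodes = sorted({s for s, _ in pmi})
    types = sorted({pt for _, pt in pmi})
    M = np.zeros((len(nodes), len(types)))
    for (s, pt), v in pmi.items():
        M[nodes.index(s), types.index(pt)] = v
    return nodes, types, M
```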

Finding Abnormal Nodes in the MRN (2/2)
We transform the question of identifying abnormal nodes in a semantic graph into the question of identifying nodes with abnormal semantic profiles. Outlier detection techniques that can assist us in identifying abnormal nodes:
- clustering-based outliers
- distribution-based outliers
- distance-based outliers

Outlier Detection (1/3)
Clustering-based outlier detection
- e.g., CLARANS, BIRCH, and CURE.
- Extracts as outliers those points that cannot be clustered into any cluster by a given clustering method.
Distribution-based outlier detection
- Assumes some statistical distribution of the data and identifies deviating points as outliers.
In either case, there is no guarantee that a learnable distribution or distinguishable clusters exist.

Outlier Detection (2/3)
Distance-based outlier detection
- Identifies outlier points simply as those that look very different from their neighboring points.
- Distance-based outlier detectors look for outliers from a local point of view instead of a global one. That is, they do not look for a point that is significantly different from the rest of the world, but for a point that is significantly different from its closest points.
For example, a distribution-based outlier detector might not deem a researcher who publishes three papers per year very abnormal, while a distance-based outlier detector can flag that researcher as abnormal if it finds that the other researchers in the same area (i.e., the neighboring points) publish on average one paper every two years.

Outlier Detection (3/3)
Distance-based outlier detection is useful in a security domain, since malicious individuals who try to disguise themselves by playing certain roles, but fail to get everything right, are likely to still look different from genuine players of that role. We chose Ramaswamy's distance-based outlier algorithm, which ranks outlier points by their distance to their k-th nearest neighbor.
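A minimal sketch of this scoring rule (the idea only, not the original optimized algorithm):

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row of X by its distance to its k-th nearest neighbor;
    the larger the score, the more abnormal the point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    return np.sort(d, axis=1)[:, k - 1]  # distance to the k-th nearest neighbor

X = np.vstack([np.random.randn(50, 2), [[8.0, 8.0]]])  # one planted outlier
scores = knn_outlier_scores(X, k=5)
print(np.argmax(scores))  # index 50: the planted outlier ranks highest
```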

UNICORN: An Unsupervised Abnormal Node Discovery Framework
UNICORN, itself a suitably abnormal animal, is the abbreviation of "UNsupervised Instance disCOvery in multi-Relational Networks".

UNICORN Pseudo-Code (1/2)
[pseudocode not reproduced]

UNICORN Pseudo-Code (2/2)
[pseudocode not reproduced]
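Since the pseudocode slides were not reproduced, here is a sketch of the pipeline reconstructed from the steps described in this chapter, composing the hypothetical helpers from the earlier sketches (relation_only_path_types, pmi_features, profile_matrix, knn_outlier_scores):

```python
def unicorn(graph, max_len=4, k=5, top_n=10):
    """Sketch of the UNICORN pipeline: (1) extract path types under the
    meta-constraints, (2) compute a dependency value (here PMI) per node
    and path type, (3) rank nodes by a distance-based outlier score."""
    types = relation_only_path_types(graph, max_len)            # step 1
    paths = [(src, pt) for pt, ps in types.items() for src, _ in ps]
    pmi = pmi_features(paths)                                   # step 2
    nodes, _, M = profile_matrix(pmi)
    k = max(1, min(k, len(nodes) - 1))                          # keep k valid
    scores = knn_outlier_scores(M, k=k)                         # step 3
    return sorted(zip(scores, nodes), reverse=True)[:top_n]
```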

Local Node Discovery
Goal: identify a node that is abnormally connected to a given node s. We consider only the paths that start from s, which can be done by simply adding one meta-constraint of type 6 during the feature-value generation stage.
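In the earlier sketch's terms, this amounts to filtering the path list before feature generation (node name and data hypothetical):

```python
# Keep only paths that start from the given node s before computing features.
paths = [("s", "cites-cites"), ("t", "cites-cites"), ("s", "writes-cites")]
local_paths = [(src, pt) for (src, pt) in paths if src == "s"]
print(local_paths)  # only paths starting from s enter the random experiments
```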

Thank you, everyone.