1
NEIGHBORHOOD FORMATION AND ANOMALY DETECTION IN BIPARTITE GRAPHS
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos
Presented by Bhavana Dalvi
2
OUTLINE
  Motivation
  Problem Definition
  Neighborhood formation
  Anomaly detection
  Experiments
  Related work
  Conclusion and future work
3
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph (authors on one side, papers on the other); query author ‘a’.
4
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph (authors on one side, papers on the other); query author ‘a’.
Which authors are most related to ‘a’?
6
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph (authors on one side, papers on the other); query author ‘a’.
Which authors are most related to ‘a’?
[Figure: value 0.8 shown with author b]
7
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph (authors on one side, papers on the other); query author ‘a’.
Which authors are most related to ‘a’?
[Figure: values 0.8, 0.6, 0.2, 0.4 shown on the graph, with author b marked]
8
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph (authors on one side, papers on the other); query author ‘a’.
Which is the uncommon paper written by ‘a’?
[Figure: values 0.8, 0.6, 0.2, 0.4 shown on the graph]
10
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
P2P network (users and files)
Which users have preferences similar to those of a particular user?
Which files are downloaded by users with very different preferences?
(Source: Jimeng Sun’s presentation at ICDM 2005)
12
PROBLEM DEFINITION
Bipartite graph with node sets V1 and V2 and edge set E; query node q.
Neighborhood formation (NF)
  Input: query node q in V1
  Output: relevance scores of all the nodes in V1 to q
Anomaly detection (AD)
  Input: query node q in V1
  Output: normality scores for nodes in V2 that link to q
14
NEIGHBORHOOD FORMATION
Relevance(b, q) grows with the number of short paths from q to b.
A connection that links only b and q contributes more relevance than a connection that links b, q, and other nodes.
[Figure: two example graphs connecting b and q]
15
EXACT NF ALGORITHM: RANDOM WALK WITH RESTART
Input: a graph G and a query node q
Output: relevance scores to q
Construct the transition matrix of transition probabilities, where
  every node in the graph becomes a state
  every state has a restart probability c of jumping back to the query node q
Find the steady-state probability u, which gives the relevance score of every node to q
[Figure: graph in which each node has a restart edge, labeled c, back to q]
(Source: Jimeng Sun’s presentation at ICDM 2005)
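The random walk with restart described on this slide can be sketched as a power iteration. This is a minimal illustration, not the authors' implementation: the function name, the adjacency-dict representation, the default c = 0.15, and the stopping tolerance are all assumptions, and every node is assumed to have at least one edge.

```python
def random_walk_with_restart(adj, query, c=0.15, tol=1e-12, max_iter=10_000):
    """Return steady-state visiting probabilities of a random walk that
    restarts at `query` with probability `c` at every step.

    `adj` maps each node to a dict {neighbor: edge_weight}. Assumption:
    every node has at least one edge (no dangling nodes).
    """
    scores = {node: 0.0 for node in adj}
    scores[query] = 1.0  # the walk starts at the query node
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in adj}
        for node, mass in scores.items():
            total = sum(adj[node].values())
            for neighbor, weight in adj[node].items():
                # With probability 1 - c, follow an edge chosen in
                # proportion to its weight.
                nxt[neighbor] += (1.0 - c) * mass * weight / total
        nxt[query] += c  # with probability c, jump back to the query
        change = sum(abs(nxt[n] - scores[n]) for n in adj)
        scores = nxt
        if change < tol:
            break
    return scores


# Toy author-paper graph: a and b share paper p1, b and c share paper p2.
edges = [("a", "p1"), ("b", "p1"), ("b", "p2"), ("c", "p2")]
graph = {}
for author, paper in edges:
    graph.setdefault(author, {})[paper] = 1.0
    graph.setdefault(paper, {})[author] = 1.0

relevance = random_walk_with_restart(graph, "a")
```

In the toy graph, author b (a direct co-author of a) ends up with a higher score than author c, who is reachable only through b.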
16
FINDING STEADY-STATE PROBABILITIES
|V1| = k, |V2| = n
M: k × n matrix representing the weighted graph G
Adjacency matrix M_A, column-normalized into the transition matrix: P_A = col_norm(M_A)
q_A: the query node ‘a’ as a (k+n) × 1 vector whose a-th entry is 1 and all other entries are 0
u_A: steady-state probability vector with restart probability c
Bipartite structure: when k << n, the savings are significant
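The slide's equations did not survive extraction. The block below is a reconstruction of the standard random-walk-with-restart fixed point in the slide's notation, not a copy of the original formulas. The block form of M_A is an assumption (the usual way to embed a k × n bipartite weight matrix in a (k+n) × (k+n) adjacency matrix), and the closed form assumes 0 < c ≤ 1 and that every node has at least one edge so that col_norm is defined.

```latex
% k = |V_1|, n = |V_2|, M is the k x n weighted matrix of G.
M_A = \begin{pmatrix} 0 & M \\ M^{\top} & 0 \end{pmatrix},
\qquad
P_A = \operatorname{col\_norm}(M_A)

% Fixed point of the walk: follow an edge with probability 1 - c,
% restart at the query node with probability c.
u_A = (1 - c)\, P_A\, u_A + c\, q_A
\quad\Longrightarrow\quad
u_A = c\,\bigl(I - (1 - c)\, P_A\bigr)^{-1} q_A
```

In practice the first form is iterated until u_A stops changing, which avoids forming the inverse explicitly.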
17
EXTENSIONS TO NF ALGORITHM
Parallel NF
  If there are multiple queries, the computation can be done in parallel.
Approximate NF
  Cluster the nodes into k partitions (preprocessing)
  Given query node q, find the partition G_i it belongs to
  Run the exact NF algorithm only on G_i
  Set relevance = 0 for nodes not in G_i
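The approximate NF steps above can be sketched as a thin wrapper around any exact NF routine. This is an illustration under assumptions: the function name, the `partition_of` lookup table (produced offline by whatever partitioner is used), and the `exact_nf` callable signature are hypothetical, not taken from the slides.

```python
def approximate_nf(adj, partition_of, query, exact_nf):
    """Run `exact_nf` only on the partition that contains `query`.

    adj          : dict node -> {neighbor: weight}
    partition_of : dict node -> partition id (assumed precomputed offline)
    exact_nf     : callable (sub_adj, query) -> {node: relevance}
    """
    part = partition_of[query]
    members = {node for node in adj if partition_of[node] == part}
    # Induced subgraph: keep only edges with both endpoints in the partition.
    sub_adj = {
        node: {nb: w for nb, w in adj[node].items() if nb in members}
        for node in members
    }
    local = exact_nf(sub_adj, query)
    # Nodes outside the query's partition get relevance 0.
    return {node: local.get(node, 0.0) for node in adj}
```

Cutting edges at the partition boundary can leave a node with no remaining neighbors, so the exact routine passed in needs to tolerate that case.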
18
O UTLINE Motivation Problem Definition Neighborhood formation Anomaly detection Experiments Related work Conclusion and future work
19
ANOMALY DETECTION
A node x in V2 is normal if the nodes in V1 that link to x are in the same neighbourhood.
[Figure: two example V1-V2 graphs, one in which x has low normality and one in which x has high normality]
20
ANOMALY DETECTION ALGORITHM
Input: node t in V2, bipartite transition matrix P
Output: normality score(t)
1. Set S_t = neighbours of t in V1
2. RS_t: pairwise relevance scores for the nodes in S_t
3. Normality score ns(t) = function(RS_t), e.g. the mean over the non-diagonal elements of RS_t
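The three steps can be sketched as follows, using the mean over off-diagonal pairs that the slide gives as an example. The function name and the `relevance` callable are hypothetical stand-ins for whatever NF routine supplies the pairwise scores, and returning None for a node with fewer than two neighbours is a choice made here, not something the slides specify.

```python
from itertools import permutations


def normality_score(neighbors, relevance):
    """Mean pairwise relevance among the V1 neighbors of a V2 node t.

    neighbors : list of V1 nodes linked to t (the set S_t)
    relevance : callable (q, b) -> relevance of b with respect to query q
    """
    pairs = list(permutations(neighbors, 2))  # ordered pairs, no diagonal
    if not pairs:
        return None  # fewer than two neighbors: no off-diagonal entries
    return sum(relevance(q, b) for q, b in pairs) / len(pairs)
```

A low score means the neighbours of t are only weakly relevant to one another, which is the slide's notion of an anomalous node.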
22
DATASETS
Dataset                  |V1|   |V2|   |E|    Avg deg (V1)  Avg deg (V2)
Conference-Author (CA)   2687   288K   662K   510           5
Author-Paper (AP)        316K   472K   1M     3             2
IMDB                     553K   204K   2.2M   4             11
23
DO THE NEIGHBORHOODS MAKE SENSE?
[Figure: relevance score (y-axis) of the most relevant neighbors (x-axis)]
The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node.
24
HOW ACCURATE IS THE APPROXIMATE NF?
[Figures: precision of approximate NF; settings shown: neighborhood size = 20, num of partitions = 10]
Precision = fraction of overlap between ApprNF and NF among the top k neighbors
Precision drops slowly as the number of partitions increases
Precision remains high for a wide range of neighborhood sizes
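The precision measure defined on this slide can be sketched as a set overlap between two top-k lists. The function names and the tie-breaking rule (by node name) are assumptions made for this illustration.

```python
def top_k(scores, k):
    """Return the k highest-scoring nodes; ties are broken by node name."""
    return sorted(scores, key=lambda n: (-scores[n], str(n)))[:k]


def precision_at_k(exact_scores, approx_scores, k):
    """Fraction of the exact top-k neighbors that the approximation also
    places in its top k."""
    exact = set(top_k(exact_scores, k))
    approx = set(top_k(approx_scores, k))
    return len(exact & approx) / k


exact = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.1}
approx = {"a": 0.6, "b": 0.0, "c": 0.3, "d": 0.1}
print(precision_at_k(exact, approx, 2))  # prints 0.5
```

Here the exact top 2 is {a, b} and the approximate top 2 is {a, c}, so one of two neighbors is recovered.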
25
DO THE ANOMALIES MAKE SENSE?
[Figure: avg. normality score]
Injection: inject 100 nodes into V2, each connected to k nodes in V1, where k = avg. degree of nodes in V2
The nodes in V1 are randomly picked such that degree = 10 * avg. degree of nodes in V1
Assumption: the injected nodes will induce connections across neighbourhoods
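One possible reading of the injection procedure is sketched below. The slide's degree condition is ambiguous, so this sketch treats "degree = 10 * avg. degree" as a lower bound on the degree of the sampled V1 nodes, which is an assumption. The function name, parameter names, and the `injected_<i>` labels are hypothetical.

```python
import random


def inject_anomalies(v1_adj, v2_adj, num_injected=100, degree_factor=10, seed=0):
    """Return {new_v2_node: [chosen V1 nodes]} without mutating the inputs.

    v1_adj : dict V1 node -> set of V2 neighbors
    v2_adj : dict V2 node -> set of V1 neighbors
    """
    rng = random.Random(seed)
    avg_deg_v1 = sum(len(nbrs) for nbrs in v1_adj.values()) / len(v1_adj)
    avg_deg_v2 = sum(len(nbrs) for nbrs in v2_adj.values()) / len(v2_adj)
    k = max(1, round(avg_deg_v2))  # each injected node links to k V1 nodes
    # Assumption: the degree condition is read as "at least
    # degree_factor times the average V1 degree".
    pool = sorted(
        node for node, nbrs in v1_adj.items()
        if len(nbrs) >= degree_factor * avg_deg_v1
    )
    if len(pool) < k:
        raise ValueError("not enough high-degree V1 nodes to sample from")
    # Sample k distinct high-degree V1 nodes for every injected node.
    return {f"injected_{i}": rng.sample(pool, k) for i in range(num_injected)}
```

The returned mapping can then be merged into a copy of the graph before computing normality scores for the injected nodes.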
26
WHAT ABOUT THE COMPUTATIONAL COST?
28
RELATED WORK
Random walk on graphs
  PageRank [ISDN 1998], Topic-Sensitive PageRank [WWW 2002]
Outlier detection
  Outlier detection in high-dimensional data: Aggarwal and Yu [SIGMOD 2001]
  Outlier Detection Using Random Walks [ICTAI 2006]
  Find outlier clusters
Graph partitioning
  METIS package
  Spectral clustering methods
  Neighbourhoods can become personalized clusters
30
CONCLUSIONS AND FUTURE WORK
Solution to two problems for bipartite graphs
  Neighborhood Formation (NF)
  Anomaly Detection (AD)
Random walk with restart, along with graph partitioning, can be used to solve NF efficiently.
AD can be done based on the relevance scores generated by NF.
Experiments on real datasets show good results.
Proximity Tracking on Time-Evolving Graphs (SIAM 2008 paper)
  Defines proximity scores in a dynamic setting.
  Efficient incremental updates
31
THANK YOU