Neighborhood Formation and Anomaly Detection in Bipartite Graphs. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos. Speaker: Jimeng Sun.

2 Bipartite Graphs: G = (V1 ∪ V2, E), where every edge connects a node in V1 to a node in V2. Many applications can be modeled as bipartite graphs. The key is to exploit the links across these two natural groups for data mining.

3 Problem Definition: Neighborhood formation (NF): given a query node a in V1, what are the relevance scores of all the nodes in V1 to a? Anomaly detection (AD): given a query node a in V1, what are the normality scores of the nodes in V2 that link to a?

4 Application I: Publication network. Authors vs. papers in research communities. Interesting queries: Which authors are most related to Dr. Carman? Which is the most unusual paper written by Dr. Carman?

5 Application II: P2P network. Users vs. files in P2P systems. Interesting queries: Find the users whose preferences are similar to mine. Locate files that are downloaded by users with very different preferences.

6 Application III: Financial trading. Traders vs. stocks in stock markets. Interesting queries: Which stocks are most similar to that of company A? Find the most unusual traders (i.e., those who trade across sectors).

7 Application IV: Collaborative filtering. Customers vs. products in a collaborative filtering recommendation system.

8 Outline: Problem definition, Motivation, Neighborhood formation, Anomaly detection, Experiments, Related work, Conclusion and future work.

10 Neighborhood formation - intuition. Input: a graph G and a query node q. Output: relevance scores to q. Run a random walk with restart from q in V1 and record the probability of visiting each node in V1; the nodes with higher probability are the neighbors of q.
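The intuition on this slide can be illustrated with a direct simulation. The following is a minimal sketch, not the authors' implementation: the toy graph, the function name `simulate_rwr`, the restart probability `c = 0.15`, and the step count are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_rwr(neighbors, q, c=0.15, steps=200_000, seed=0):
    """Estimate visit probabilities by simulating one long walk that starts at q."""
    rng = random.Random(seed)
    visits = Counter()
    node = q
    for _ in range(steps):
        if rng.random() < c:
            node = q                            # restart at the query node
        else:
            node = rng.choice(neighbors[node])  # follow a random edge
        visits[node] += 1
    return {v: visits[v] / steps for v in neighbors}

# Hypothetical toy graph: authors a1..a3 in V1, papers t1 and t2 in V2.
neighbors = {
    "a1": ["t1"], "a2": ["t1", "t2"], "a3": ["t2"],
    "t1": ["a1", "a2"], "t2": ["a2", "a3"],
}
freq = simulate_rwr(neighbors, "a1")
# Rank the other V1 nodes by how often the walk visited them.
v1_ranking = sorted(["a2", "a3"], key=lambda v: -freq[v])
```

In this toy graph a2 shares a paper with a1 while a3 does not, so a2 should be visited more often and rank first.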

11 Exact neighborhood formation. Input: a graph G and a query node q. Output: relevance scores to q. Construct the transition matrix P, where every node in the graph becomes a state and every state has a restart probability c of jumping back to the query node q; with probability 1 - c the walk follows an edge. Find the steady-state probability vector u, which gives the relevance score of every node to q.
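One way to obtain the steady-state vector is to iterate the update u <- (1 - c) P u + c e_q until it stops changing, where e_q is the indicator vector of q. The sketch below assumes that form of the update, a fixed number of iterations, and that every node has at least one neighbor; the name `rwr_scores`, the value `c = 0.15`, and the toy graph are illustrative assumptions, not taken from the slides.

```python
def rwr_scores(neighbors, q, c=0.15, iters=200):
    """Iterate u <- (1 - c) * P u + c * e_q on an adjacency-list graph.

    Assumes every node has at least one neighbor, so each column of the
    (implicit) transition matrix P sums to 1.
    """
    u = {v: 0.0 for v in neighbors}
    u[q] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in neighbors}
        for v, mass in u.items():
            share = (1 - c) * mass / len(neighbors[v])
            for w in neighbors[v]:
                nxt[w] += share     # spread mass evenly over the edges of v
        nxt[q] += c                 # restart mass always returns to q
        u = nxt
    return u

# Hypothetical toy graph: authors a1..a3 in V1, papers t1 and t2 in V2.
neighbors = {
    "a1": ["t1"], "a2": ["t1", "t2"], "a3": ["t2"],
    "t1": ["a1", "a2"], "t2": ["a2", "a3"],
}
u = rwr_scores(neighbors, "a1")
relevance_in_v1 = {v: u[v] for v in ("a1", "a2", "a3")}
```

Because the restart term shrinks the error by a factor of 1 - c per iteration, a fixed iteration count is enough for a small example; a convergence check would be the more careful choice on real data.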

12 Approximate neighborhood formation. Scalability problem with exact neighborhood formation: it is too expensive to run for every single node in V1. Observation: nodes that are far away from q have relevance scores of almost 0. Idea: partition the graph and apply neighborhood formation only to the partition containing q.
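The idea can be sketched as follows, assuming a partition assignment is already available from some graph partitioner. The `partition` mapping, the helper names, the toy graph, and the choice to send mass from nodes left without neighbors back to q are all assumptions made for this illustration.

```python
def rwr_scores(neighbors, q, c=0.15, iters=200):
    """Random walk with restart; mass at a node with no neighbors returns to q."""
    u = {v: 0.0 for v in neighbors}
    u[q] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in neighbors}
        for v, mass in u.items():
            if not neighbors[v]:
                nxt[q] += (1 - c) * mass   # illustrative handling of dead ends
                continue
            for w in neighbors[v]:
                nxt[w] += (1 - c) * mass / len(neighbors[v])
        nxt[q] += c
        u = nxt
    return u

def approx_nf(neighbors, partition, q, c=0.15):
    """Run RWR only inside the partition containing q; score 0 elsewhere."""
    inside = {v for v in neighbors if partition[v] == partition[q]}
    # Keep only edges whose endpoints both lie in the partition of q.
    sub = {v: [w for w in neighbors[v] if w in inside] for v in inside}
    scores = {v: 0.0 for v in neighbors}
    scores.update(rwr_scores(sub, q, c))
    return scores

# Hypothetical graph with two groups joined by the single edge a2-t2.
neighbors = {
    "a1": ["t1"], "a2": ["t1", "t2"], "t1": ["a1", "a2"],
    "a3": ["t2"], "a4": ["t2"], "t2": ["a2", "a3", "a4"],
}
partition = {"a1": 0, "a2": 0, "t1": 0, "a3": 1, "a4": 1, "t2": 1}
approx = approx_nf(neighbors, partition, "a1")
```

The cost of the walk now depends on the size of one partition rather than the whole graph, at the price of ignoring edges that cross partition boundaries.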

14 Anomaly detection - intuition. A node t in V2 is normal if all the nodes a in V1 that link to t belong to the same neighborhood. [Figure: two example nodes t, one with low normality and one with high normality]

15 Anomaly detection - method. Input: a query node q from V2. Output: the normality score of q. Find the set S of nodes connected to q. Compute the relevance scores of the elements in S, denoted rs. Apply a score function f(rs) to obtain the normality score, e.g. f(rs) = mean(rs).
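A minimal sketch of this scoring step, under one reading of the slide: rs is taken to be the pairwise relevance scores among the members of S. The `relevance` table holds made-up toy values standing in for the output of neighborhood formation, and the function name is hypothetical.

```python
from itertools import permutations
from statistics import mean

def normality_score(t, neighbors, relevance, f=mean):
    """Score node t in V2 by the relevance of its V1 neighbors to each other."""
    S = neighbors[t]
    # rs: relevance of b to a, for every ordered pair of distinct nodes in S.
    rs = [relevance[a][b] for a, b in permutations(S, 2)]
    return f(rs) if rs else None   # undefined when t has fewer than 2 neighbors

# Toy relevance values (invented for illustration): a1 and a2 are close, a3 is far.
relevance = {
    "a1": {"a2": 0.20, "a3": 0.01},
    "a2": {"a1": 0.20, "a3": 0.02},
    "a3": {"a1": 0.01, "a2": 0.02},
}
neighbors = {"t1": ["a1", "a2"], "t2": ["a1", "a3"]}

score_t1 = normality_score("t1", neighbors, relevance)
score_t2 = normality_score("t2", neighbors, relevance)
```

Here t1 links two nodes from the same neighborhood and gets the higher score, while t2 links two unrelated nodes and is the more anomalous of the two.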

17 Datasets ("?" marks values that are not recoverable from the transcript):
Dataset                   |V1|    |V2|    |E|     Avgdeg(V1)  Avgdeg(V2)
Conference-Author (CA)    ?       ?       662K    510         5
Author-Paper (AP)         316K    472K    1M      3           2
IMDB                      553K    204K    2.2M    4           11

18 Goals. [Q1]: Do the neighborhoods make sense? (NF) [Q2]: How accurate is the approximate NF? [Q3]: Do the anomalies make sense? (AD) [Q4]: What is the computational cost?

19 [Q1] Exact NF. The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node. The relevance score quantifies how closely related a node is to the query node. [Figures: relevance score vs. most relevant neighbors, for the query nodes ICDM (CA) and Robert DeNiro (IMDB)]

20 [Q2] Approximate NF. Precision = the fraction of the top k neighbors on which ApprNF and NF overlap. The precision drops slowly as the number of partitions increases. The precision remains high for a wide range of neighborhood sizes. [Figures: precision vs. number of partitions (neighborhood size = 20); precision vs. neighborhood size (number of partitions = 10)]
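The precision measure defined on this slide can be sketched as a top-k overlap. The function names and the score values below are made up for illustration, and the score dictionaries are assumed to exclude the query node itself.

```python
def top_k(scores, k):
    """Return the k highest-scoring nodes, breaking ties by node name."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:k])

def precision_at_k(exact, approx, k):
    """Fraction of the exact top-k neighbors that the approximation also returns."""
    return len(top_k(exact, k) & top_k(approx, k)) / k

# Toy relevance scores to some query node (invented values).
exact = {"a2": 0.30, "a3": 0.20, "a4": 0.10, "a5": 0.05}
approx = {"a2": 0.35, "a3": 0.00, "a4": 0.25, "a5": 0.10}

p2 = precision_at_k(exact, approx, 2)   # exact top 2: a2, a3; approx top 2: a2, a4
p3 = precision_at_k(exact, approx, 3)   # exact top 3: a2, a3, a4; approx top 3: a2, a4, a5
```

In this toy case a3 falls outside the partition and scores 0 in the approximation, so one of the two exact top-2 neighbors is missed.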

21 [Q3] Anomaly detection. Randomly inject some nodes and edges (biased towards high-degree nodes). On average, the genuine nodes have higher normality scores than the injected ones. [Figure: normality scores of genuine vs. injected nodes]

22 [Q4] Computational cost. Even with a small number of partitions, the computational cost can be reduced dramatically. [Figure: approximate NF running time (sec) vs. number of partitions]

23 Related Work. Random walk: [Brin & Page 98], [Haveliwala WWW02]. Graph partitioning: [Karypis and Kumar 98], [Kannan et al. FOCS00]. Collaborative filtering: [Shardanand & Maes 95] … Anomaly detection: [Aggarwal & Yu SIGMOD01], [Noble & Cook KDD03], [Newman 03].

24 Conclusion. Two important queries on bipartite graphs: NF and AD. An efficient method for NF using random walk with restart and graph partitioning techniques. Based on the results of NF, we can also spot anomalies (AD). Effectiveness is confirmed on real datasets.

25 Future work and Q & A. Future work: What about time-evolving graphs? Contact: Jimeng Sun