1/52 Overlapping Community Search Graph Data Management Lab, School of Computer Science

Presentation transcript:

1/52 Overlapping Community Search
Graph Data Management Lab, School of Computer Science
Online Search of Overlapping Communities
Wanyun Cui, Fudan University
Yanghua Xiao, Fudan University
Haixun Wang, Microsoft Research Asia
Yiqi Lu, Fudan University
Wei Wang, Fudan University
Presenter: Wanyun Cui

2/52 Outline
- Motivation
- Model
- Algorithm
- Experiments
- Applications

3/52 Outline
- Motivation
- Model
- Algorithm
- Experiments
- Applications

4/52 Complex network
- Complex networks are everywhere.
Social Network

5/52 Complex network
- Complex networks are everywhere.
Internet

6/52 Complex network
- Complex networks are everywhere.
Protein Network

7/52 Complex network
- Complex networks are everywhere.
Internet, Social Network, Protein Network

8/52 Community structures
- Complex networks are everywhere.
- Most real-life networks have community structures.
  The graph can be divided into groups such that the vertices within each group are densely connected and the vertices in different groups are sparsely connected.
Internet, Social Network, Protein Network

9/52 Overlapping community structure
- Overlapping community: a vertex may belong to multiple communities

10/52 Overlapping community structure
- Overlapping community: a vertex may belong to multiple communities
  C1: small boat
  C2: meaning of bucket
  C3: big boat
  C4: tableware

11/52 Finding community structures
- Two possible ways to find the community structure
  OCD: overlapping community detection
  OCS: overlapping community search

12/52 OCD vs. OCS
- OCD: divides the entire network to find communities

13/52 OCD vs. OCS
- Disadvantages of OCD
  - Too costly
  - Global criterion
  - Unfriendly to dynamic graphs
- Facebook network: over 800 million nodes and 100 billion links

Algorithm                 Complexity
Girvan–Newman algorithm   O(|E|^3)
LPA                       Almost linear
LA                        O(|C||E| + |V|)

14/52 OCD vs. OCS
- Disadvantages of OCD
  - Too costly
  - Global criterion
  - Unfriendly to dynamic graphs
- A fixed parameter or criterion is not appropriate for all vertices and queries.
Communities of a student
Communities of Barack Obama

15/52 OCD vs. OCS
- Disadvantages of OCD
  - Too costly
  - Global criterion
  - Unfriendly to dynamic graphs
- Real-life graphs are always evolving over time.
- We cannot afford to run OCD very frequently.
- OCD results lose their freshness and effectiveness.

16/52 OCD vs. OCS
- Disadvantages of OCD
  - Too costly
  - Global criterion
  - Unfriendly to dynamic graphs
- Usually performed in an offline fashion

17/52 OCS: problem definition
- OCS
  Given: a graph G and a query vertex v
  Return: all communities that v belongs to

18/52 OCD vs. OCS
- Advantages of OCS
  - More efficient
  - Personalized criterion
  - Lightweight
- We only need to find communities within the local neighborhood of the query vertex.
- Our OCS solution needs only a few milliseconds to find an answer.

19/52 OCD vs. OCS
- Advantages of OCS
  - More efficient
  - Personalized criterion
  - Friendly to dynamic graphs

20/52 OCD vs. OCS
- Advantages of OCS
  - More efficient
  - Personalized criterion
  - Lightweight
- A good choice for finding communities in an online fashion

21/52 Applications of OCS
- Friend recommendation on Facebook
- Semantic expansion
- Infectious disease control
- Etc.

22/52 Challenges of OCS
Modeling; Complexity and scalability
- A community should be dense enough
- Overlapping aware
- Generality

23/52 Challenges of OCS
Modeling; Complexity and scalability
- In the worst case, OCS may need to enumerate an exponential number of valid communities.
  Computationally hard
- Approximate approach

24/52 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

25/52 Model
- Community structure awareness
- Overlapping awareness
- Generality
- The inner edges of a community should be dense
- Clique as the unit of community
A clique of 6 vertices

26/52 Model
- Community structure awareness
- Overlapping awareness
- Generality
- Two k-cliques are adjacent if they share k-1 vertices
- A community is a component in the k-clique graph
Original graph; Clique graph (k=4)
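
The two definitions on slide 26 can be illustrated with a small sketch. It assumes an undirected graph stored as adjacency sets and enumerates k-cliques by brute force, which is only reasonable for toy inputs. The function names and the example graph are made up for illustration and do not come from the slides.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_cliques(adj, k):
    """Every k-vertex clique, found by brute force (toy graphs only)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def k_clique_communities(adj, k):
    """Vertex sets of the connected components of the k-clique graph."""
    cliques = k_cliques(adj, k)
    seen, communities = set(), []
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            clique = stack.pop()
            members |= clique
            for other in cliques:
                # Two k-cliques are adjacent if they share k-1 vertices.
                if other not in seen and len(clique & other) == k - 1:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities

# Hypothetical toy graph: triangles a-b-c and b-c-d share an edge,
# while triangle d-e-f touches them only at vertex d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
communities = k_clique_communities(build_adjacency(edges), k=3)
```

In this toy graph, vertex d ends up in both communities, which is the overlapping behaviour the model is meant to capture.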

27/52 Model
- Community structure awareness
- Overlapping awareness
- Generality

28/52 Model
- Community structure awareness
- Overlapping awareness
- Generality
It is acceptable for a few edges to be missing from the clique.

29/52 Model
- Community structure awareness
- Overlapping awareness
- Generality
If two cliques share at least α vertices, they are adjacent.
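
The two relaxations on slides 28 and 29 can be sketched as simple predicates. The sketch rests on two assumptions: a k-vertex set counts as a relaxed clique when it has at least floor(gamma * k(k-1)/2) edges (the slides do not state this threshold), and the sharing threshold is called alpha. The function names and the example graph are hypothetical.

```python
from itertools import combinations
from math import floor

def is_quasi_clique(adj, vertices, gamma):
    """True if the induced subgraph keeps at least a gamma fraction of the
    k(k-1)/2 possible edges (assumed threshold, rounded down)."""
    k = len(vertices)
    present = sum(1 for u, v in combinations(vertices, 2) if v in adj[u])
    return present >= floor(gamma * k * (k - 1) / 2)

def are_adjacent(clique_a, clique_b, alpha):
    """True if the two vertex sets share at least alpha vertices."""
    return len(set(clique_a) & set(clique_b)) >= alpha

# Hypothetical 4-vertex graph with 5 of the 6 possible edges (c-d is missing).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
strict = is_quasi_clique(adj, ["a", "b", "c", "d"], gamma=1.0)
relaxed = is_quasi_clique(adj, ["a", "b", "c", "d"], gamma=0.8)
share_two = are_adjacent({"a", "b", "c"}, {"b", "c", "d"}, alpha=2)
share_three = are_adjacent({"a", "b", "c"}, {"b", "c", "d"}, alpha=3)
```

With gamma = 1 and alpha = k-1, these predicates reduce to the strict k-clique model of slide 26.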

30/52 Model
- Community structure awareness
- Overlapping awareness
- Generality
Original graph

31/52
k=4

32/52 Alpha-gamma OCS
k=3

33/52 Parameter selection

34/52 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

35/52 Algorithm
- Exact algorithm
- Approximate algorithm

36/52 Exact Algorithm
- Example: k=4, (3,1)-OCS
  Query vertex = Bob

37/52 Exact Algorithm
- Example: k=4, (3,1)-OCS
  Query vertex = Bob
- Drawback: exponential enumerations
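
The slides do not spell out the exact algorithm, so the following is only a sketch of one way a local, exhaustive search could work under the strict setting (every unit is a true clique, and two cliques are adjacent when they share at least `alpha` vertices). It starts from the k-cliques containing the query vertex and repeatedly enumerates adjacent cliques. All function names are hypothetical, the graph is assumed to have no self-loops, and the "bob" graph is invented rather than taken from the slide figure.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_clique(adj, vertices):
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def cliques_containing(adj, query, k):
    """All k-cliques that contain the query vertex."""
    return [frozenset(rest) | {query}
            for rest in combinations(sorted(adj[query]), k - 1)
            if is_clique(adj, rest)]

def adjacent_cliques(adj, clique, k, alpha):
    """All other k-cliques sharing at least alpha vertices with `clique`."""
    found = set()
    for kept in combinations(sorted(clique), alpha):
        # Any vertex added next to `kept` must neighbour all of `kept`.
        common = set.intersection(*(adj[v] for v in kept))
        for extra in combinations(sorted(common), k - alpha):
            candidate = frozenset(kept) | frozenset(extra)
            if candidate != clique and is_clique(adj, extra):
                found.add(candidate)
    return found

def search_communities(adj, query, k, alpha):
    """Communities of `query`: components of the clique graph reachable
    from the cliques that contain it."""
    visited, communities = set(), []
    for seed in cliques_containing(adj, query, k):
        if seed in visited:
            continue
        visited.add(seed)
        frontier, members = [seed], set()
        while frontier:
            clique = frontier.pop()
            members |= clique
            for other in adjacent_cliques(adj, clique, k, alpha):
                if other not in visited:
                    visited.add(other)
                    frontier.append(other)
        communities.append(members)
    return communities

# Invented example: bob sits in two groups of 4-cliques.
edges = [("bob", "a"), ("bob", "b"), ("bob", "c"), ("a", "b"), ("a", "c"),
         ("b", "c"), ("a", "d"), ("b", "d"), ("c", "d"),
         ("bob", "x"), ("bob", "y"), ("bob", "z"),
         ("x", "y"), ("x", "z"), ("y", "z")]
communities = search_communities(build_adjacency(edges), "bob", k=4, alpha=3)
```

The nested `combinations` calls in `adjacent_cliques` are where the enumeration cost grows quickly with vertex degree, which matches the drawback noted on slide 37.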

38/52 Approximate Algorithm
- Example: k=4, (3,1)-OCS
  Query vertex = Bob
- Approximation: each new clique must contain at least one new vertex

39/52 Approximate Algorithm
- Example: k=4, (3,1)-OCS
  Query vertex = Bob
- Approximation: each new clique must contain at least one new vertex
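
One possible reading of the rule on slides 38 and 39 is sketched below for the strict case only (true cliques, adjacency on k-1 shared vertices): a neighbouring clique is formed by swapping one vertex of the current clique for a vertex that is not yet in the community, and swaps that would only revisit known vertices are skipped. This is an interpretation under stated assumptions, not the authors' procedure. The function name and the graph are hypothetical, and the graph is assumed to have no self-loops.

```python
def approximate_community(adj, seed):
    """Grow one community from a seed clique, following only cliques that
    add a vertex not yet in the community (assumes alpha = k - 1, k >= 2)."""
    community = set(seed)
    frontier = [frozenset(seed)]
    while frontier:
        clique = frontier.pop()
        for dropped in sorted(clique):
            kept = clique - {dropped}
            # A replacement vertex must neighbour every kept vertex.
            common = set.intersection(*(adj[v] for v in kept))
            for new_vertex in sorted(common - community):
                community.add(new_vertex)
                frontier.append(kept | {new_vertex})
    return community

# Invented adjacency sets: 4-cliques {bob, a, b, c} and {a, b, c, d}
# share three vertices; {bob, x, y, z} is a separate 4-clique.
adj = {
    "bob": {"a", "b", "c", "x", "y", "z"},
    "a": {"bob", "b", "c", "d"},
    "b": {"bob", "a", "c", "d"},
    "c": {"bob", "a", "b", "d"},
    "d": {"a", "b", "c"},
    "x": {"bob", "y", "z"},
    "y": {"bob", "x", "z"},
    "z": {"bob", "x", "y"},
}
community = approximate_community(adj, {"bob", "a", "b", "c"})
```

Because every clique pushed onto the frontier adds a vertex to the community, the loop runs at most once per vertex of the graph. The price is that cliques made up entirely of already-known vertices are never expanded, so some members can be missed.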

40/52 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

41/52 Experiments
- Setup
- Dataset
- Intel Core2 2.13GHz
- 4GB memory
- 64-bit Windows 7

42/52 Experiments
- Setup
- Dataset

Dataset       |V|   |E|
WordNet
DBLP
Google
Livejournal

43/52 Effectiveness
- It successfully unveils multiple research interests
- Example: Jiawei Han, k=6
  C1: multimedia data mining
  C2: stream data mining
  C3: information network

44/52 Effectiveness
- Our model is flexible to support different parameters.
- Example: Jiawei Han, k=9

45/52 Effectiveness
- For most vertices, the OCS model can find non-trivial results.

46/52 Performance
- OCS is more efficient than OCD.
- Competitors: LA, OSLOM
- Amortized time = (total time of OCD) / n

47/52 Performance: influence of parameters

48/52 Accuracy of approximate algorithm
- More than 70% accuracy is achieved consistently; in some cases accuracy reaches almost 90%.

49/52 Outline
- Introduction
- Model
- Algorithm
- Experiments
- Applications

50/52 Diversity-based Social Network Analysis
- What is the distribution of diversity?
- Can we find people with really large diversity?

51/52 Name disambiguation
- Ambiguous names that cover a significant number of entities also have a large number of communities.
- A real person has fewer communities than these ambiguous names.

52/52 Contributions
- Problem definition
- Model
- Guide for parameter selection
- Algorithms
- Extensive experiments and applications

53/52 Q&A
Thank you!