Download presentation
Presentation is loading. Please wait.
Published byMckenna Heighton Modified over 9 years ago
1
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.orghttp://www.mmds.org
2
2 Nodes: Football Teams Edges: Games played Can we identify node groups? (communities, modules, clusters) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
3
3 NCAA conferences Nodes: Football Teams Edges: Games played J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
4
4 Can we identify functional modules? Nodes: Proteins Edges: Physical interactions J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
5
5 Functional modules Nodes: Proteins Edges: Physical interactions J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
6
6 Can we identify social communities? Nodes: Facebook Users Edges: Friendships J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
7
7 High school Summer internship Stanford (Squash) Stanford (Basketball) Social communities Nodes: Facebook Users Edges: Friendships J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
8
Non-overlapping vs. overlapping communities J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org8
9
9 Network Adjacency matrix Nodes J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
10
What is the structure of community overlaps: Edge density in the overlaps is higher! 10 Communities as “tiles” J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
11
11 This is what we want! Communities in a network J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
12
1) Given a model, we generate the network: 2) Given a network, find the “best” model J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org12 C A B D E H F G C A B D E H F G Generative model for networks
13
Goal: Define a model that can generate networks The model will have a set of “parameters” that we will later want to estimate (and detect communities) Q: Given a set of nodes, how do communities “generate” edges of the network? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org13 C A B D E H F G Generative model for networks
14
Generative model B(V, C, M, { p c }) for graphs: Nodes V, Communities C, Memberships M Each community c has a single probability p c Later we fit the model to networks to detect communities 14 Model Network Communities, C Nodes, V Model pApA pBpB Memberships, M J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
15
15 Model Network Communities, C Nodes, V Community Affiliations pApA pBpB Memberships, M Think of this as an “OR” function: If at least 1 community says “YES” we create an edge J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
16
16 Model Network J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
17
AGM can express a variety of community structures: Non-overlapping, Overlapping, Nested 17J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
19
Detecting communities with AGM: 19 C A B D E H F G 1) Affiliation graph M 2) Number of communities C 3) Parameters p c J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
20
20
21
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org21
22
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org22 00.10 0.04 0.1000.020.06 0.100.0200.06 0.040.06 0 0100 1011 0101 0110 Flip biased coins
23
Given graph G(V,E) and Θ, we calculate likelihood that Θ generated G: P(G|Θ) 00.9 0 0 0 0 00 0 Θ= B(V, C, M, {p c }) 0110 1010 1101 0010 G P(G|Θ ) G 23 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org AB
24
24 P(|) AGM arg max
25
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org25
26
26 u v J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
27
27 j Communities Nodes
28
28 Node community membership strengths J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 01.200.2 0.5000.8 01.810
29
29J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
30
30J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
31
31J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
32
BigCLAM takes 5 minutes for 300k node nets Other methods take 10 days Can process networks with 100M edges! 32J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
34
34J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
35
Extension: Make community membership edges directed! Outgoing membership: Nodes “sends” edges Incoming membership: Node “receives” edges 35J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
36
36
37
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org37
38
38J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
39
Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach by J. Yang, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2013. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks by J. Yang, J. McAuley, J. Leskovec. ACM International Conference on Web Search and Data Mining (WSDM), 2014. Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks Community Detection in Networks with Node Attributes by J. Yang, J. McAuley, J. Leskovec. IEEE International Conference On Data Mining (ICDM), 2013. Community Detection in Networks with Node Attributes J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org39
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.