gApprox: Mining Frequent Approximate Patterns from a Massive Network


gApprox: Mining Frequent Approximate Patterns from a Massive Network. Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han [ICDM 2007]. Reporter: Che-Wei, Liang, 10/16

Outline Introduction; Problem Formulation; Algorithm (Pattern Space Exploration, Support Counting); Experiment; Conclusions

Introduction A set of graphs vs. a single network. Recently, a large number of graphs with massive sizes and complex structures have appeared in many applications: biological networks, social networks, the Web. They demand powerful data mining methods. The interest here is in patterns that appear frequently at many different places in a single network.

Introduction Protein-Protein Interaction (PPI) network. Δ = degree of approximation = 5

Two major complications 1. Mining frequent patterns in a single network: partition it into regions, each containing one occurrence of the pattern. 2. Because of inherent noise and data diversity, it is crucial to account for approximations so that all potentially interesting patterns can be captured.

Problem Formulation

Approximate Pattern Occurrences An injective function m: Vp → VG maps each vertex v ∈ Vp to m(v) ∈ VG. Quantify the degree of approximation that m incurs. Approximations can only happen within the matchable list.
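The slide's formulas did not survive in the transcript, so the following is only a minimal sketch of the idea, under stated assumptions. It checks that a mapping is injective and that every label substitution stays inside a matchable list, then sums a per-substitution penalty. The function name `approximation_cost`, the `penalty` table, and the additive cost are hypothetical and are not taken from the paper; edge mismatches are ignored entirely.

```python
import math
from typing import Dict, Hashable, Tuple

Vertex = Hashable


def approximation_cost(
    pattern_labels: Dict[Vertex, str],
    graph_labels: Dict[Vertex, str],
    mapping: Dict[Vertex, Vertex],
    penalty: Dict[Tuple[str, str], float],
) -> float:
    """Return the summed label penalty of `mapping`, or math.inf if it is invalid.

    ASSUMPTION: penalty[(a, b)] is a hypothetical cost of matching pattern
    label a to network label b. A missing pair means b is not in a's
    matchable list, so the mapping is rejected.
    """
    # Every pattern vertex must be mapped.
    if set(mapping) != set(pattern_labels):
        return math.inf
    # Injective: no two pattern vertices share a network vertex.
    if len(set(mapping.values())) != len(mapping):
        return math.inf
    total = 0.0
    for v, u in mapping.items():
        a, b = pattern_labels[v], graph_labels[u]
        if a == b:
            continue  # exact label match costs nothing
        if (a, b) not in penalty:
            return math.inf  # substitution outside the matchable list
        total += penalty[(a, b)]
    return total


pattern_labels = {"p1": "A", "p2": "B"}
graph_labels = {1: "A", 2: "C", 3: "B"}
penalty = {("B", "C"): 2.0}  # label B may be approximated by C at cost 2

exact = approximation_cost(pattern_labels, graph_labels, {"p1": 1, "p2": 3}, penalty)
approx = approximation_cost(pattern_labels, graph_labels, {"p1": 1, "p2": 2}, penalty)
```

Under this sketch, an occurrence would count as approximate when its cost is at most the threshold Δ.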

Pattern Support with Approximation

Algorithm Two major issues: 1. Pattern space exploration. 2. Support counting: enumerate the approximate occurrences of each pattern in the network, then decide the maximal number of disjoint occurrences.

Pattern Space Exploration Decompose the pattern space: find all connected vertex sets in G that contain vertex 1. Then remove vertex 1 from G and find all connected vertex sets in the new graph G' that contain vertex 2. And so on.

Pattern Space Exploration Example: generating all connected vertex sets starting from vertex 1. Stage 1. Start from 1 and mark 1. Stage 2. Expand from 1 to reach 2, 5, 6. Mark 2, 5, 6. There are seven connected vertex sets in total at this stage: {1,2}, {1,5}, {1,6}, {1,2,5}, {1,2,6}, {1,5,6}, {1,2,5,6}. Stage 3. Taking each of the seven connected vertex sets from stage 2 as a starting point, continue the expansion. Stage 4. Repeat until there are no more unmarked vertices.
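The decomposition and staged expansion described above can be sketched as follows. This is not the paper's Algorithm 1 (which is not reproduced in the transcript); it is a standard include/exclude enumeration that has the same effect: every connected vertex set is produced exactly once, grouped by its smallest vertex. The example graph is an assumption, chosen only so that vertex 1 is adjacent to 2, 5 and 6 as in the slide.

```python
from typing import Dict, FrozenSet, Iterator, List, Set


def _grow(adj, allowed, current, frontier, forbidden) -> Iterator[FrozenSet[int]]:
    """Yield `current` and every connected superset that avoids `forbidden`.

    Invariant: `frontier` lists all allowed neighbours of `current` that are
    neither in `current` nor in `forbidden`.
    """
    yield current
    for i, v in enumerate(frontier):
        # Branch i: include v, and exclude frontier[:i] for good, so that no
        # set is produced by two different branches.
        excluded = forbidden | set(frontier[:i])
        rest = frontier[i + 1:]
        seen = current | excluded | set(rest) | {v}
        fresh = [u for u in sorted(adj[v]) if u in allowed and u not in seen]
        yield from _grow(adj, allowed, current | {v}, rest + fresh, excluded)


def all_connected_sets(adj: Dict[int, Set[int]]) -> Iterator[FrozenSet[int]]:
    """Enumerate all connected vertex sets of an undirected graph, each once."""
    allowed = set(adj)
    for root in sorted(adj):
        frontier: List[int] = [u for u in sorted(adj[root]) if u in allowed]
        yield from _grow(adj, allowed, frozenset([root]), frontier, frozenset())
        allowed.discard(root)  # "remove the root from G" before the next root


# Hypothetical graph: 1 is adjacent to 2, 5, 6; edges 2-3 and 5-4 are invented.
adj = {1: {2, 5, 6}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {1, 4}, 6: {1}}
sets = list(all_connected_sets(adj))
stage2 = [s for s in sets if 1 in s and len(s) >= 2 and s <= {1, 2, 5, 6}]
print(len(stage2))  # 7, matching the seven sets listed for stage 2
```

The sketch assumes integer (sortable) vertex identifiers and an adjacency map that lists each undirected edge in both directions.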

Theorem 1 Explore() in Algorithm 1 is both complete and redundancy-free, i.e., given a network G: (1) it generates only connected vertex sets in G; (2) it generates all connected vertex sets in G; (3) it does not generate the same connected vertex set more than once.
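The three properties can be checked empirically on small graphs by comparing an enumerator's output against brute force. The sketch below is a test harness only, not a proof and not part of the paper; the name `check_enumeration` is hypothetical, and single-vertex sets are assumed to count as connected.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def is_connected(adj: Dict[int, Set[int]], vertices: FrozenSet[int]) -> bool:
    """Breadth-first search restricted to `vertices`."""
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(vertices)


def check_enumeration(adj: Dict[int, Set[int]], produced: Iterable[Iterable[int]]) -> Dict[str, bool]:
    """Check the three properties of Theorem 1 by brute force (small graphs only)."""
    produced = [frozenset(s) for s in produced]
    # Ground truth: test every non-empty subset of the vertex set.
    truth = {
        frozenset(c)
        for k in range(1, len(adj) + 1)
        for c in combinations(sorted(adj), k)
        if is_connected(adj, frozenset(c))
    }
    return {
        "only_connected": all(s in truth for s in produced),    # property (1)
        "complete": truth <= set(produced),                     # property (2)
        "redundancy_free": len(produced) == len(set(produced)), # property (3)
    }


path = {1: {2}, 2: {1, 3}, 3: {2}}  # path graph 1 - 2 - 3
good = [{1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}]
result = check_enumeration(path, good)
```

Because the ground truth visits all 2^n - 1 subsets, this harness is only practical for graphs with a few dozen vertices at most.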

Support Counting A pattern P's support is defined as the maximal number of "disjoint" occurrences that can be chosen from P's approximate occurrences in the network. This is the NP-complete maximum independent set problem. Algorithm 2 can provide an upper bound.
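Algorithm 2 itself is not reproduced in the transcript, so the following is a hedged sketch of how cheap bounds on such a support can be obtained; it should not be read as the paper's method. It assumes that "disjoint" means vertex-disjoint and that each occurrence is given as a non-empty set of network vertices. The lower bound is a greedy selection; the upper bound groups occurrences that share a common vertex, since at most one occurrence per group can be chosen. Both function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List


def greedy_lower_bound(occurrences: Iterable[Iterable[int]]) -> int:
    """Size of one greedily chosen family of pairwise disjoint occurrences."""
    used = set()
    count = 0
    for occ in sorted((frozenset(o) for o in occurrences), key=len):
        if used.isdisjoint(occ):
            used |= occ
            count += 1
    return count


def shared_vertex_upper_bound(occurrences: Iterable[Iterable[int]]) -> int:
    """Upper bound on the maximum number of pairwise disjoint occurrences.

    Repeatedly pick the vertex covered by the most remaining occurrences and
    put all of those occurrences in one group. Members of a group overlap
    pairwise, so any disjoint selection takes at most one per group.
    ASSUMPTION: every occurrence is non-empty.
    """
    remaining: List[FrozenSet[int]] = [frozenset(o) for o in occurrences]
    groups = 0
    while remaining:
        counts = Counter(v for occ in remaining for v in occ)
        hub, _ = counts.most_common(1)[0]
        remaining = [occ for occ in remaining if hub not in occ]
        groups += 1
    return groups


occurrences = [{1, 2}, {2, 3}, {3, 4}, {5, 6}]
low = greedy_lower_bound(occurrences)
high = shared_vertex_upper_bound(occurrences)
```

When the two bounds coincide, as they do for this small example, the exact support is known without solving the independent set problem; otherwise the upper bound can still be used to prune patterns whose support cannot reach the frequency threshold.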

gApprox gApprox combines pattern space exploration with support counting, through a conditional branch on the 3rd line of Algorithm 1's DFS_horizontal() function.

Experiment

Conclusions The work gives an approximation measure and shows its impact on mining. It counts a pattern's support based on its approximate occurrences in the network. The techniques are general and can be applied to networks from other domains. They can be modified to reach bigger, more interesting patterns even faster, with some sacrifice in the completeness of the mining results.