Yinghui Wu LFCS Lab Lunch 2010.8.17 Homomorphism and Simulation Revised for Graph Matching.



Outline
- Graph Matching Problem
- State of the Art
- Homomorphism Revised
- Bounded Simulation
- Graph Queries
- Conclusion

Real life graphs
- Real-life graphs are everywhere: Web graph, social graph, food web…

Graph Matching in Real life graphs
- Applications: Web mirror, schema matching, information retrieval, pattern recognition, plagiarism detection, social pattern, keyword search, proximity search, web service composition…
- Graph matching problem
  - Input: two graphs and a similarity metric
  - Output: a matching relation

Graph Matching in Real life graphs
- “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)
- Very long mean path length of 4.75 for a network of fewer than 20 nodes.
- Relation types: bank, business, telephone, real estate, vehicle sale, school, kinship…

Graph matching: state of the art
- Structure-based approaches: graph homomorphism; subgraph isomorphism / maximum common subgraph; edit distance; graph simulation
- None of these is capable of capturing graph similarity in real-life applications

Outline
- Graph Matching Problem
- State of the Art
- Homomorphism Revised
- Bounded Simulation
- Graph Queries
- Conclusion

Graph Homomorphism Revisited
- Graph homomorphism: a graph homomorphism (resp. subgraph isomorphism) f from a graph G = (V, E) to a graph G' = (V', E') is a mapping (resp. 1-1 mapping) from V to V' such that (u, v) in E implies (f(u), f(v)) in E'.
- The maximum common subgraph problem is to find the largest subgraph of G that is isomorphic to a subgraph of G'.
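The definition above can be checked mechanically. The following is a minimal sketch, assuming graphs are given as sets of directed edges and the mapping f as a dictionary; the function name and the toy graphs are hypothetical and not taken from the talk.

```python
def is_homomorphism(f, edges_g, edges_h, injective=False):
    """Return True if f sends every edge of G to an edge of G'.

    With injective=True the mapping must also be 1-1, which is the
    extra condition the slide attaches to subgraph isomorphism.
    """
    if injective and len(set(f.values())) != len(f):
        return False
    return all((f[u], f[v]) in edges_h for (u, v) in edges_g)


# Hypothetical toy graphs: G is the path a -> b -> c, G' is the 2-cycle x <-> y.
edges_g = {("a", "b"), ("b", "c")}
edges_h = {("x", "y"), ("y", "x")}
f = {"a": "x", "b": "y", "c": "x"}

plain = is_homomorphism(f, edges_g, edges_h)
one_to_one = is_homomorphism(f, edges_g, edges_h, injective=True)
print(plain, one_to_one)
```

Here f is a homomorphism because both edges of G land on edges of G', but it is not 1-1 since a and c share the image x.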

Website Matching: Example
[Figure: two website graphs rooted at A.index and B.index; node labels include books, audio, textbook, abook, album, sports, digital, categorie, arts, schoolbooks, audiobooks, bookset, DVD, CD, features, genres, albums]

Website Matching: Example (cont.)
[Figure: the same two website graphs, continued over two slides]

Homomorphism revised: a first step
- Notations
  - $G = (V, E, L)$: a labeled directed graph
  - Similarity matrix $M$ over $V_1$ and $V_2$: a matrix of size $|V_1||V_2|$, with $M(u, v)$ the similarity score of nodes $u$ and $v$
  - Similarity threshold $\xi$

P-homomorphism
- $G_1$ is P-homomorphic to $G_2$ w.r.t. a similarity matrix $M$ and threshold $\xi$, denoted by $G_1 \le_{(e,p)} G_2$, if there exists a mapping $\rho$ from $V_1$ to $V_2$ such that
  - for each $v \in V_1$, if $\rho(v) = u$, then $M(v, u) \ge \xi$; and
  - for each edge $(v, v')$ in $E_1$, there is a nonempty path $u/\ldots/u'$ in $G_2$ with $\rho(v) = u$ and $\rho(v') = u'$.
- Graph homomorphism is a special case of P-homomorphism.
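To make the edge-to-path condition concrete, here is a minimal sketch of a checker for a given mapping. It assumes the similarity matrix is a dictionary keyed by (node of G_1, node of G_2) pairs and precomputes which pairs of G_2 are joined by a nonempty path; all names and the toy data are hypothetical.

```python
from itertools import product


def reachability(nodes, edges):
    """reach[(a, b)] is True iff G_2 has a nonempty path from a to b."""
    reach = {(a, b): (a, b) in edges for a, b in product(nodes, repeat=2)}
    # Warshall's algorithm: the intermediate node k is the outermost loop.
    for k, i, j in product(nodes, repeat=3):
        if reach[(i, k)] and reach[(k, j)]:
            reach[(i, j)] = True
    return reach


def is_p_hom(rho, edges1, nodes2, edges2, M, xi, injective=False):
    """Check the two P-hom conditions for a mapping rho defined on all of V_1."""
    if injective and len(set(rho.values())) != len(rho):
        return False
    if any(M[(v, u)] < xi for v, u in rho.items()):
        return False  # some matched pair is not similar enough
    reach = reachability(nodes2, edges2)
    return all(reach[(rho[v], rho[w])] for v, w in edges1)


# Hypothetical toy sites: one edge in G_1 is matched to a two-edge path in G_2.
edges1 = {("books", "textbook")}
nodes2 = ["books2", "categories", "school"]
edges2 = {("books2", "categories"), ("categories", "school")}
M = {("books", "books2"): 1.0, ("textbook", "school"): 0.7}
rho = {"books": "books2", "textbook": "school"}

ok_loose = is_p_hom(rho, edges1, nodes2, edges2, M, xi=0.6)
ok_strict = is_p_hom(rho, edges1, nodes2, edges2, M, xi=0.8)
print(ok_loose, ok_strict)
```

The mapping passes at threshold 0.6 even though (books2, school) is not an edge of G_2, which is exactly what a plain homomorphism would reject. At threshold 0.8 it fails on similarity alone.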

1-1 P-homomorphism
- $G_1$ is 1-1 P-homomorphic to $G_2$, denoted by $G_1 \le^{1-1}_{(e,p)} G_2$, if there exists a 1-1 (injective) P-hom mapping $\rho$ from $V_1$ to $V_2$, i.e., for any distinct nodes $v_1, v_2$ in $G_1$, $\rho(v_1) \ne \rho(v_2)$.
- Subgraph isomorphism is a special case of 1-1 P-homomorphism.

Measuring graph similarity
- Let $\rho$ be a P-hom mapping from a subgraph $G_1' = (V_1', E_1', L_1')$ of $G_1$ to $G_2$.
- Maximum cardinality: $\mathrm{Card}(\rho) = |V_1'| / |V_1|$
  - Maximum cardinality problem CPH (resp. CPH$^{1-1}$): find a P-hom (resp. 1-1 P-hom) mapping $\rho$ with the maximum $\mathrm{Card}(\rho)$.
  - Maximum Common Subgraph (MCS) is a special case of CPH$^{1-1}$.
- Overall similarity: $\mathrm{Sim}(\rho) = \sum_{v \in V_1'} w(v)\, M(v, \rho(v)) \,/\, \sum_{v \in V_1} w(v)$
  - Maximum overall similarity problem SPH (resp. SPH$^{1-1}$): find a P-hom (resp. 1-1 P-hom) mapping $\rho$ with the maximum $\mathrm{Sim}(\rho)$.
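A small worked computation may help with the two quality measures. This sketch assumes that Card divides by the node count of G_1, that the numerator of Sim ranges over the mapped nodes, and that its denominator ranges over all nodes of G_1; the weights, scores and node names are hypothetical.

```python
def card(rho, nodes1):
    """Fraction of the nodes of G_1 that the partial mapping rho covers."""
    return len(rho) / len(nodes1)


def sim(rho, nodes1, w, M):
    """Weighted similarity of the matched pairs, normalized by the total weight of G_1."""
    matched = sum(w[v] * M[(v, u)] for v, u in rho.items())
    return matched / sum(w[v] for v in nodes1)


# Hypothetical data: four nodes in G_1, three of them mapped.
nodes1 = ["a", "b", "c", "d"]
w = {"a": 2, "b": 1, "c": 1, "d": 1}
M = {("a", "x"): 1.0, ("b", "y"): 0.5, ("c", "z"): 0.5}
rho = {"a": "x", "b": "y", "c": "z"}

c = card(rho, nodes1)       # 3 / 4
s = sim(rho, nodes1, w, M)  # (2*1.0 + 0.5 + 0.5) / 5
print(c, s)
```

The unmapped node d lowers both measures: it is missing from the numerator but still counted in each denominator.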

Complexity results
- Intractability
  - P-Hom and 1-1 P-Hom are NP-complete (reduction from 3SAT).
  - CPH, CPH$^{1-1}$, SPH, SPH$^{1-1}$ are NP-hard (reduction from X3C).
- Approximation hardness
  - Unless P = NP, CPH, CPH$^{1-1}$, SPH, SPH$^{1-1}$ are not approximable within $O(1/n^{1-\epsilon})$ for any constant $\epsilon$, where $n$ is the number of nodes in the input graphs.
  - Shown by an approximation-factor-preserving reduction (AFP-reduction) from the maximum weighted independent set problem.

Approximation Algorithms
- Approximation ratio: CPH, CPH$^{1-1}$, SPH, SPH$^{1-1}$ are all approximable within $O(\log^2(|V_1||V_2|) / (|V_1||V_2|))$.
- Proof: AFP-reduction to the maximum weighted independent set problem (WIS).
- Greedy approximation algorithm, running in $O(|V_1|^3 |V_2|^2 + |V_1||E_1||V_2|^3)$ time.

Approximation Algorithm for CPH
- Algorithm compMaxCard($G_1$, $G_2$, $M$, $\xi$)
  - Initialize the matching list for each node in $G_1$.
  - Starting from a match pair, recursively choose and include new matches in the match set, via a greedy strategy, until it can no longer be extended.
- Intuitively, compMaxCard approximately finds the maximum clique in a revised product graph of $G_1$ and the transitive closure of $G_2$, without constructing that product graph directly.
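The slide gives only the outline of compMaxCard, so the following is not that algorithm. It is a much simpler greedy sketch of the same idea (extend a match set one pair at a time while the edge-to-path condition still holds) and carries no approximation guarantee. The function names, the dictionary encoding of M and the toy data are all hypothetical.

```python
from itertools import product


def reachability(nodes, edges):
    """reach[(a, b)] is True iff there is a nonempty path from a to b."""
    reach = {(a, b): (a, b) in edges for a, b in product(nodes, repeat=2)}
    for k, i, j in product(nodes, repeat=3):  # Warshall: k is the outermost loop
        if reach[(i, k)] and reach[(k, j)]:
            reach[(i, j)] = True
    return reach


def greedy_p_hom(nodes1, edges1, nodes2, edges2, M, xi):
    """Greedily build a P-hom mapping from an induced subgraph of G_1 to G_2."""
    reach = reachability(nodes2, edges2)
    rho = {}
    for v in nodes1:
        # Candidates for v: similar enough, most similar first.
        candidates = sorted(
            (u for u in nodes2 if M.get((v, u), 0.0) >= xi),
            key=lambda u: -M[(v, u)],
        )
        for u in candidates:
            trial = {**rho, v: u}
            # Every G_1 edge between matched nodes must map to a path in G_2.
            if all(reach[(trial[a], trial[b])]
                   for a, b in edges1 if a in trial and b in trial):
                rho = trial
                break
    return rho


# Hypothetical toy input: "audio" has a similar node in G_2, but no path leads to it.
nodes1 = ["books", "textbook", "audio"]
edges1 = {("books", "textbook"), ("books", "audio")}
nodes2 = ["books2", "categories", "school", "dvd"]
edges2 = {("books2", "categories"), ("categories", "school")}
M = {("books", "books2"): 1.0, ("textbook", "school"): 0.8, ("audio", "dvd"): 0.9}

mapping = greedy_p_hom(nodes1, edges1, nodes2, edges2, M, xi=0.6)
print(mapping)
```

The result maps two of the three nodes of G_1, so its cardinality measure would be 2/3. A different visiting order of nodes1 can give a different mapping, which is why a real algorithm needs a more careful selection rule.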

Running example
[Figure: the two website graphs rooted at A.index and B.index, with node labels books, audio, textbook, abook, album, sports, digital, categorie, arts, schoolbooks, audiobooks, bookset, DVD, CD, features, genres, albums]

Running example (cont.)
[Figure: the same two graphs over three further slides, in which bookset, then textbook and schoolbooks, then abook and audiobooks are set apart from the remaining labels]

Experiment Results

Outline
- Graph Matching Problem
- State of the Art
- Homomorphism Revised
- Bounded Simulation
- Conclusion

Graph pattern matching: Example
[Figure, shown over two slides: a pattern and a collaboration network; node labels AI, CS, Bio, DB, Soc, Med, Gen, Chem, Soc, Eco; edge bounds * and 3; panel titles "Collaboration Network" and "Pattern Matching"]

Graph Pattern Matching
- Pattern graph $P = (V_p, E_p, f_v, f_e)$
  - $f_v$: a node predicate of the form (A op a)
  - $f_e$: an integer $k$ or '*'
- Data graph $G = (V, E, f_A)$
  - $f_A$: assigns an attribute/value list to each node in the data graph

Simulation revised
- Bounded simulation: a data graph $G = (V, E, f_A)$ matches the pattern $P = (V_p, E_p, f_v, f_e)$, denoted by $P \trianglelefteq G$, if there exists a binary relation $S$ from $V_p$ to $V$ such that for each $(u, v) \in S$,
  - $f_A(v)$ satisfies $f_v(u)$, and
  - for each edge $(u, u')$ in $E_p$, there is a nonempty path $\rho = v/\ldots/v'$ in $G$ such that $(u', v') \in S$, and $\mathrm{len}(\rho) \le k$ if $f_e(u, u') = k$.

Maximum match
- For any graph $G$ and pattern $P$, if $P \trianglelefteq G$, then there is a unique maximum match in $G$ for $P$.

Result Graph
[Figure: result graph over the collaboration network; node labels CS, Bio, DB, Soc, Med, Gen, Soc, Eco; edge bounds * and 3]

Computing Bounded Simulation
- The graph pattern matching problem: given any data graph $G$ and pattern graph $P$, find the maximum match in $G$ for $P$ if $P \trianglelefteq G$.
- The graph pattern matching problem can be solved in cubic time.

Computing Bounded Simulation
- Algorithm Match(P, G)
  - Compute the distance matrix M of G.
  - Initialize the candidate matches for each pattern node u.
  - Iteratively refine the candidate set of u according to each edge (v, u) in P, in a bottom-up way, until a fixpoint is reached.
  - Collect the matching result.
- Match(P, G) runs in $O(|V||E| + |E_p||V|^2 + |V_p||V|)$ time.
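The four steps can be sketched as a naive fixpoint computation. This is an illustration under stated assumptions, not the talk's Match procedure: it recomputes candidate sets from scratch on every pass, so it does not attain the stated running time, and it assumes that a pattern node with no remaining candidate means there is no match. The data encoding (predicates as Python callables, bounds as an integer or "*") and all names are hypothetical.

```python
from collections import deque


def distances(nodes, adj):
    """dist[(a, b)] = length of the shortest nonempty path from a to b (BFS per source)."""
    dist = {}
    for s in nodes:
        frontier = deque((t, 1) for t in adj.get(s, ()))
        while frontier:
            t, d = frontier.popleft()
            if (s, t) not in dist:
                dist[(s, t)] = d
                frontier.extend((n, d + 1) for n in adj.get(t, ()))
    return dist


def match(pattern_nodes, pattern_edges, data_nodes, adj):
    """pattern_nodes: {u: predicate}; pattern_edges: {(u, u2): k or "*"}."""
    dist = distances(data_nodes, adj)                       # step 1: distance matrix
    cand = {u: {v for v, attrs in data_nodes.items() if pred(attrs)}
            for u, pred in pattern_nodes.items()}           # step 2: initial candidates
    changed = True
    while changed:                                          # step 3: refine to a fixpoint
        changed = False
        for (u, u2), bound in pattern_edges.items():
            keep = {v for v in cand[u]
                    if any((v, v2) in dist
                           and (bound == "*" or dist[(v, v2)] <= bound)
                           for v2 in cand[u2])}
            if keep != cand[u]:
                cand[u], changed = keep, True
    if any(not vs for vs in cand.values()):
        return set()                                        # assumed: empty set means no match
    return {(u, v) for u, vs in cand.items() for v in vs}   # step 4: collect the result


# Hypothetical collaboration network: c1 reaches b1 in 2 hops, c2 needs 3.
data_nodes = {"c1": {"dept": "CS"}, "c2": {"dept": "CS"}, "d1": {"dept": "DB"},
              "x1": {"dept": "DB"}, "b1": {"dept": "Bio"}}
adj = {"c1": ["d1"], "d1": ["b1"], "c2": ["x1"], "x1": ["d1"]}
pattern_nodes = {"CS": lambda a: a["dept"] == "CS",
                 "Bio": lambda a: a["dept"] == "Bio"}

bounded = match(pattern_nodes, {("CS", "Bio"): 2}, data_nodes, adj)
unbounded = match(pattern_nodes, {("CS", "Bio"): "*"}, data_nodes, adj)
print(sorted(bounded), sorted(unbounded))
```

With the bound 2, only c1 survives as a match for the CS pattern node; replacing the bound by "*" admits c2 as well, since only reachability is required.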

Running example
[Figure, shown over six slides: the pattern and the collaboration network with node labels AI, CS, Bio, DB, Soc, Med, Gen, Chem, Soc, Eco and edge bounds * and 3]
- Step 1: initialize the candidate sets for each pattern node.
- Step 2: for each edge (u, v) in P, refine the candidate set of u w.r.t. v, f_e(u, v) and the candidates of v (repeated over four slides).
- Step 3: result collection.

Experiment Results

Experiment Results (cont.)

Conclusion
- Traditional homomorphism-based and simulation-based graph matching is not capable of capturing real-life graph similarity.
- (1-1) P-homomorphism: edge-to-path matching, with provable guarantees on match quality.
- Bounded simulation: specifies bounded connectivity; PTIME.

Thank you !