New Models for Graph Pattern Matching
Shuai Ma (马帅)


Food Web: Predator-Prey Interactions

Social Networks: Relationships
Real-life graph data processing is challenging!

Outline
- Graph pattern matching
- P-homomorphism
- Bounded graph simulation
- Graph pattern queries
- Strong simulation

Graph Pattern Matching
- Given two graphs G1 (pattern graph) and G2 (data graph),
  - decide whether G1 matches G2 (Boolean queries)
  - identify "subgraphs" of G2 that match G1
- Applications
  - Web mirror detection / Web site classification
  - Complex object identification
  - Software plagiarism detection
  - Social network / biology analyses
  - …
- Challenges
  - Identifying matching models (matching semantics)
  - Balance between complexity and expressive power
A variety of emerging real-life applications!

Traditional Subgraph Isomorphism
- Pattern graph Q(V_Q, E_Q), subgraph Gs(V_S, E_S) of data graph G
- Q matches Gs if there exists a bijective function f: V_Q → V_S satisfying
  - for each node u in Q, u and f(u) have the same label; and
  - (u, u') is an edge in Q iff (f(u), f(u')) is an edge in Gs
- Strengths
  - Preserves the structural topology between Q and Gs
- Weaknesses
  - May return an exponential number of matched subgraphs
  - Decision problem: NP-complete (low efficiency)
  - In emerging applications, too restrictive to find sensible matches
New matching models are needed in practice!
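The definition above can be checked directly by brute force. The following is a minimal Python sketch, not the method of any cited work: the function name `subgraph_isomorphisms` and the dictionary-based graph encoding are assumptions made for illustration. It tries every injective assignment of pattern nodes to data nodes and takes Gs to be the image of Q under f, so the "iff" condition on the edges of Gs holds by construction. The exhaustive search is exponential in the worst case, which is the inefficiency the slide refers to.

```python
from itertools import permutations

def subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges):
    """Return every injective, label-preserving mapping f from Q's nodes to
    G's nodes such that each edge (u, w) of Q maps to an edge (f(u), f(w)) of G.
    The matched subgraph Gs is the image of Q under f."""
    q_nodes = list(q_labels)
    g_edge_set = set(g_edges)
    matches = []
    # Each permutation of len(q_nodes) data nodes is one injective assignment.
    for image in permutations(g_labels, len(q_nodes)):
        f = dict(zip(q_nodes, image))
        if any(q_labels[u] != g_labels[f[u]] for u in q_nodes):
            continue  # label mismatch
        if all((f[u], f[w]) in g_edge_set for u, w in q_edges):
            matches.append(f)
    return matches

# Hypothetical toy graphs: pattern A -> B, data graph with one A and two Bs.
q_labels = {"a": "A", "b": "B"}
q_edges = [("a", "b")]
g_labels = {1: "A", 2: "B", 3: "B"}
g_edges = [(1, 2), (1, 3)]
matches = subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges)
```

Even this tiny pattern yields one match per B-labelled neighbour, which hints at how the number of matched subgraphs can grow quickly on larger data graphs.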

P-Homomorphism
[Figure: two Web site graphs G1 and G2 with roots labeled A.Home and B.Index; node labels include books, textbooks, audio, abooks, albums, sports, digital, categories, arts, school, audio books, booksets, DVDs, CDs, features, genres; edge-to-path mappings between the two graphs]
Subgraph isomorphism/graph homomorphism is too restrictive!

P-Homomorphism
- A new matching model referred to as P-homomorphism
  - Label matching is enforced
  - Edges are allowed to be mapped to nonempty paths
- Complexity bounds of decision and optimization problems
  - NP-hardness
  - Approximation hardness
- Approximation algorithms with performance guarantees
- Publication on P-homomorphism (alphabetical order)
  - Wenfei Fan, Jianzhong Li, Shuai Ma, Hongzhi Wang, and Yinghui Wu, Graph Homomorphism Revisited for Graph Matching, VLDB 2010
A first step towards revising conventional notions of graph matching
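To make the edge-to-path idea concrete, here is a small Python sketch that verifies whether a given node mapping satisfies the two conditions listed above: labels must match, and every pattern edge must map to a nonempty path in the data graph. It only checks a candidate mapping; it does not search for one, which is where the NP-hardness lies. The names `is_p_hom` and `reachable_by_nonempty_path`, and the use of plain label equality, are assumptions for illustration.

```python
from collections import deque

def reachable_by_nonempty_path(adj, src):
    """Nodes reachable from src via a path with at least one edge.
    src itself is included only if it lies on a cycle."""
    seen, queue = set(), deque(adj.get(src, ()))
    while queue:
        v = queue.popleft()
        if v not in seen:
            seen.add(v)
            queue.extend(adj.get(v, ()))
    return seen

def is_p_hom(mapping, labels1, edges1, labels2, adj2):
    """Check that `mapping` (total on G1's nodes) preserves labels and sends
    each edge of G1 to a nonempty path in G2."""
    if any(labels1[u] != labels2[mapping[u]] for u in labels1):
        return False
    reach = {v: reachable_by_nonempty_path(adj2, v)
             for v in set(mapping.values())}
    return all(mapping[w] in reach[mapping[u]] for u, w in edges1)

# Hypothetical example: the edge books -> audio in G1 is matched by the
# path books -> categories -> audio in G2.
labels1 = {"a": "books", "b": "audio"}
edges1 = [("a", "b")]
labels2 = {1: "books", 2: "categories", 3: "audio"}
adj2 = {1: [2], 2: [3]}
ok = is_p_hom({"a": 1, "b": 3}, labels1, edges1, labels2, adj2)
```

Under subgraph isomorphism or plain homomorphism the same mapping would be rejected, because G2 has no direct edge from node 1 to node 3.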

Traditional Graph Simulation
- Pattern graph Q(V_Q, E_Q) matches data graph G(V, E), via graph simulation, if there exists a binary relation S ⊆ V_Q × V such that
  - for each (u, v) ∈ S, u and v have the same label; and
  - for each node u in Q, there exists v in G such that
    - (u, v) ∈ S, and
    - for each edge (u, u') in Q, there exists an edge (v, v') in G such that (u', v') ∈ S
- Strengths
  - Solvable in quadratic time
- Weaknesses
  - Loses structural topology (however, there are applications that do not need strong restrictions)
Graph simulation is in PTIME!
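The simulation relation can be computed by a simple refinement loop: start from all label-compatible pairs and repeatedly discard pairs that violate the edge condition. The sketch below is a naive fixpoint in Python, written for clarity rather than speed; it is not the quadratic-time algorithm mentioned above, and the function name `graph_simulation` and the graph encoding are assumptions.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation as {pattern node: set of data nodes},
    or None if some pattern node has no match."""
    succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)
    # Start with every label-compatible pair.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]}
           for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # v can match u only if some child of v matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else None

# Hypothetical pattern: a 2-cycle A -> B -> A.
q_labels = {"a": "A", "b": "B"}
q_edges = [("a", "b"), ("b", "a")]
# Data graph 1: a chain A -> B -> A -> B (no cycle).
chain = graph_simulation(q_labels, q_edges,
                         {1: "A", 2: "B", 3: "A", 4: "B"},
                         [(1, 2), (2, 3), (3, 4)])
# Data graph 2: a cycle of length 4 with the same labels.
long_cycle = graph_simulation(q_labels, q_edges,
                              {1: "A", 2: "B", 3: "A", 4: "B"},
                              [(1, 2), (2, 3), (3, 4), (4, 1)])
```

The chain is rejected, but the 4-cycle matches the 2-cycle pattern. This is one small instance of simulation not preserving topology.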

Traditional Graph Simulation
Set up a team to develop a new software product
Subgraph isomorphism is too strict for emerging applications

Terrorist Collaboration Network
"Those who were trained to fly didn't know the others. One group of people did not know the other group." (Osama Bin Laden, 2001)

Bounded Graph Simulation
[Figure: Drug trafficking: pattern and data graphs]
Identify all suspects in the drug ring
Subgraph isomorphism is too strict for emerging applications

Bounded Graph Simulation
A departure from traditional graph simulation
- G = (V, E) matches P = (V_p, E_p) via bounded simulation if there exists a binary relation S ⊆ V_p × V such that:
  - for each u ∈ V_p, there exists v ∈ V such that (u, v) ∈ S
  - for each (u, v) ∈ S, the attributes f_A(v) satisfy the predicate f_v(u)
  - for each (u, v) ∈ S, each edge (u, u') in E_p is mapped to a bounded path from v to v' in G, with (u', v') ∈ S
- Graph simulation
  - A special case of bounded graph simulation
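The three conditions above can be turned into a refinement loop similar to plain simulation, with the "child" test replaced by a bounded-reachability test. The Python sketch below is a naive illustration, not the cubic-time algorithm from the publication. The names `within` and `bounded_simulation`, the encoding of predicates as Python callables, and the convention that a bound of `None` means "any nonempty path" are all assumptions.

```python
from collections import deque

def within(adj, src, bound):
    """Nodes reachable from src by a nonempty path of at most `bound` edges
    (bound=None means a path of any positive length)."""
    dist = {}
    queue = deque((w, 1) for w in adj.get(src, ()))
    while queue:
        v, d = queue.popleft()
        if v in dist or (bound is not None and d > bound):
            continue
        dist[v] = d  # BFS order guarantees d is the shortest length
        queue.extend((w, d + 1) for w in adj.get(v, ()))
    return set(dist)

def bounded_simulation(preds, p_edges, attrs, adj):
    """preds: pattern node -> predicate over a data node's attributes.
    p_edges: {(u, u2): bound}. Returns the maximum match or None."""
    sim = {u: {v for v in attrs if preds[u](attrs[v])} for u in preds}
    reach = {}  # cache of bounded-reachability sets
    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_edges.items():
            keep = set()
            for v in sim[u]:
                if (v, k) not in reach:
                    reach[(v, k)] = within(adj, v, k)
                if reach[(v, k)] & sim[u2]:
                    keep.add(v)
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim if all(sim.values()) else None

# Hypothetical data: a boss reaches workers only through an assistant.
preds = {"B": lambda a: a == "boss", "W": lambda a: a == "worker"}
attrs = {1: "boss", 2: "assistant", 3: "worker", 4: "worker"}
adj = {1: [2], 2: [3], 3: [4]}
two_hops = bounded_simulation(preds, {("B", "W"): 2}, attrs, adj)
one_hop = bounded_simulation(preds, {("B", "W"): 1}, attrs, adj)
```

With a bound of 1 on every pattern edge the procedure behaves like plain graph simulation, which is the sense in which simulation is a special case.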

Bounded Graph Simulation
- A new matching model referred to as bounded simulation
- A cubic-time algorithm for bounded simulation
- Incremental algorithms with performance guarantees
  - Analyses of incremental complexity
- Publication on bounded simulation (alphabetical order)
  - Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu, Graph Pattern Matching: From Intractable to Polynomial Time, VLDB 2010
A second step towards revising conventional notions of graph matching: from intractable to PTIME

Graph Pattern Queries
- A further extension of graph simulation, by
  - allowing edge types;
  - enforcing node matching conditions;
  - mapping edges to paths specified with regular expressions;
  - changing node mapping to edge matching.
- Reachability queries and bounded simulation are special cases of graph pattern queries
Further extensions of graph simulation that remain in PTIME
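As an illustration of mapping a pattern edge to a path constrained by a regular expression over edge types, the Python sketch below answers a single reachability query. The expression format is an assumption chosen for simplicity: a nonempty list of segments `(edge_type, max_len)`, each meaning "between 1 and `max_len` consecutive edges of that type", with `max_len=None` meaning "one or more". The function name `regex_reachable` and the sample edge types are hypothetical.

```python
from collections import deque

def regex_reachable(edges, src, dst, segments):
    """edges: node -> list of (edge_type, target).
    True if some path from src to dst spells out the segments in order."""
    last = len(segments) - 1
    start = (src, 0, 0)  # (node, current segment, edges used in it)
    seen, queue = {start}, deque([start])
    while queue:
        node, i, used = queue.popleft()
        if node == dst and i == last and used >= 1:
            return True
        nxt = []
        if used >= 1 and i < last:
            nxt.append((node, i + 1, 0))  # move on to the next segment
        etype, max_len = segments[i]
        if max_len is None or used < max_len:
            # For unbounded segments, cap the counter at 1 to keep states finite.
            step = 1 if max_len is None else used + 1
            nxt.extend((w, i, step)
                       for t, w in edges.get(node, ()) if t == etype)
        for state in nxt:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False

# Hypothetical typed graph: two "fa" edges followed by one "fn" edge.
edges = {"a": [("fa", "b")], "b": [("fa", "c")], "c": [("fn", "d")]}
hit = regex_reachable(edges, "a", "d", [("fa", 2), ("fn", 1)])
miss = regex_reachable(edges, "a", "d", [("fa", 1), ("fn", 1)])
```

The search explores at most (nodes × segments × bound) states, so a single query of this restricted form stays polynomial in the size of the graph and the expression.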

Graph Pattern Queries
- A new matching model referred to as graph pattern queries
- Fundamental problems
  - Query containment, query equivalence, query minimization
  - All are solvable in cubic time
- Two cubic-time algorithms for graph pattern queries
- Publication on graph pattern queries (alphabetical order)
  - Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu, Adding Regular Expressions to Graph Reachability and Pattern Queries, ICDE 2011
A third step towards revising conventional notions of graph matching

Strong Simulation
- Subgraph isomorphism
  - Strengths
    - Keeps (strong) structural topology
  - Weaknesses
    - May return an exponential number of matched subgraphs
    - Decision problem: NP-complete
    - In certain scenarios, too restrictive to find sensible matches
- Graph simulation
  - Strengths
    - Solvable in quadratic time
  - Weaknesses
    - Loses structural topology (how much is an open question)
    - Only returns a single matched subgraph
Balance between complexity and the capability to capture topology!

Strong Simulation
- Graph simulation loses graph structures
[Figure panels: Disconnected, Tree, Long cycle]

Strong Simulation
- Duality (dual simulation)
  - Both child and parent relationships
  - Simulation considers only child relationships
- Locality
  - Restricting matches within a ball
  - When social distance increases, the closeness of relationships decreases and the relationships may become irrelevant
- The semantics of strong simulation is well defined
  - The results are unique
Strong simulation: bring duality and locality into graph simulation
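Duality and locality can each be sketched in a few lines of Python. `dual_simulation` below extends the simulation refinement loop with a parent check, and `ball` extracts the subgraph within a given undirected distance of a centre node. This is a simplified illustration only: the function names, the graph encoding, and the choice of radius 1 in the example are assumptions, and further steps of full strong simulation (such as keeping only the connected part of the match that contains the ball's centre) are omitted.

```python
from collections import deque

def dual_simulation(q_labels, q_edges, g_labels, g_edges):
    """Maximum relation preserving both child and parent relationships,
    or None if some pattern node has no match."""
    succ = {v: set() for v in g_labels}
    pred = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)
        pred[w].add(v)
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]}
           for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}   # child check
            if keep != sim[u]:
                sim[u], changed = keep, True
            keep = {w for w in sim[u2] if pred[w] & sim[u]}   # parent check
            if keep != sim[u2]:
                sim[u2], changed = keep, True
    return sim if all(sim.values()) else None

def ball(g_labels, g_edges, center, radius):
    """Subgraph induced by nodes within `radius` undirected hops of center."""
    nbrs = {v: set() for v in g_labels}
    for v, w in g_edges:
        nbrs[v].add(w)
        nbrs[w].add(v)
    dist, queue = {center: 0}, deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] < radius:
            for w in nbrs[v] - dist.keys():
                dist[w] = dist[v] + 1
                queue.append(w)
    labels = {v: g_labels[v] for v in dist}
    edges = [(v, w) for v, w in g_edges if v in dist and w in dist]
    return labels, edges

# Hypothetical pattern: 2-cycle A -> B -> A; data: a 4-cycle with the same labels.
q_labels, q_edges = {"a": "A", "b": "B"}, [("a", "b"), ("b", "a")]
g_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
g_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
whole = dual_simulation(q_labels, q_edges, g_labels, g_edges)
per_ball = [dual_simulation(q_labels, q_edges, *ball(g_labels, g_edges, c, 1))
            for c in g_labels]
```

On the whole 4-cycle, dual simulation still accepts the 2-cycle pattern, but no ball of radius 1 does. Restricting matches to balls is what rules out the long-cycle match in this example.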

Strong Simulation
Topology preservation and bounded matches
[Figure: comparison of subgraph isomorphism, strong simulation, dual simulation, and graph simulation]

Strong Simulation
- A new matching model referred to as strong simulation
- A cubic-time algorithm
- Three main optimization techniques
  - Query minimization
    - An O(n^2) algorithm
  - Dual simulation filtering
    - First compute the match graph of dual simulation, then project on each ball of the data graph
  - Connectivity pruning
    - Based on the connectivity theorem
- A distributed algorithm
  - Data locality property
  - Boundary nodes and radius
- Publication on strong simulation (alphabetical order)
  - Yang Cao, Wenfei Fan, Jinpeng Huai, Shuai Ma, and Tianyu Wo, Capturing Topology in Graph Pattern Matching, VLDB 2012
A fourth step towards revising conventional notions of graph matching

Summary
- Weaknesses of traditional matching models
  - Subgraph isomorphism
  - Graph simulation
- New matching models for emerging applications
  - P-homomorphism
  - Bounded graph simulation
  - Graph pattern queries
  - Strong simulation
  - Well-balanced between complexity and expressive power
- Future work
  - More to be done …
New models that capture the needs of emerging applications!

Questions? OR Homepage: