Adding Regular Expressions to Graph Reachability and Pattern Queries. Yinghui Wu, ICDE 2011. Wenfei Fan, Shuai Ma, Nan Tang, Yinghui Wu (University of Edinburgh).

Presentation transcript:

Adding Regular Expressions to Graph Reachability and Pattern Queries
Wenfei Fan, Shuai Ma, Nan Tang, Yinghui Wu (University of Edinburgh); Jianzhong Li (Harbin Institute of Technology)
Yinghui Wu, ICDE 2011

Real-life networks are huge and complex. Is the traditional function-based querying model capable enough?
Reachability queries and graph pattern queries: a novel query model and method for querying large, complex networks.
Example: a terrorist collaboration network. "Those who were trained to fly didn't know the others. One group of people did not know the other group." (Bin Laden)

Outline
  Real-life graphs bear multiple edge types
    traditional models and methods may not be capable enough
  Reachability queries and graph pattern queries
    nodes carrying predicates
    edges carrying regular expressions
  Fundamental problems
    query containment and equivalence
    query minimization
  Query evaluation
    join-based and split-based algorithms
  Conclusion
A first step towards revising simulation for graph pattern matching.

Graph pattern matching: the problem
Given a pattern graph (a query) P and a data graph G, decide whether G matches P and, if so, find all the matches of P in G.
Applications:
  social queries, social matching
  biology and chemistry network querying
  keyword search, proximity search, ...
Graph pattern matching is widely employed in a variety of emerging real-life applications. The open question is how a match should be defined.

Subgraph isomorphism and graph simulation
  node label equivalence
  edge-to-edge function/relation
Both rely on identical label matching and edge-to-edge functions or relations. Are they capable enough?
[Figure: two examples, each with a pattern graph P and a data graph G over node labels A, B, D, E; data nodes v1 and v2 are marked.]

Considering edge types
Real-life graphs have multiple edge types.
Essembly: a social voting network with four edge types:
  friends-allies
  friends-nemeses
  strangers-nemeses
  strangers-allies
[Figure: Essembly network fragment with nodes labeled Biologist, Businessman, Doctors, and Alice the journalist.]

Querying the Essembly network: an example
Pattern queries with multiple edge types.
Edge types: fa (friends-allies), fn (friends-nemeses), sn (strangers-nemeses), sa (strangers-allies).
[Figure: a pattern over the Essembly network with nodes Alice, "Biologists supporting cloning" and "Doctors against cloning", and edge constraints fa^{≤2}, sn, fa^{≤2}, sa^{≤2}, fn, fa^+.]

Graph reachability and pattern queries
Real-life graphs usually bear different edge types.
Data graph: G = (V, E, f_A, f_C).
Reachability query (RQ): Q_r = (u_1, u_2, f_u1, f_u2, f_e), where f_e is a regular expression drawn from the subclass
  F ::= c | c^{≤k} | c^+ | FF
Q_r(G) is the set of node pairs (v_1, v_2) such that there is a nonempty path from v_1 to v_2 and the edge colors on the path match the pattern specified by f_e.
[Figure: an RQ with node predicates job='biologist', sp='cloning' and job='doctors', and edge constraint fa^{≤2} fn.]
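The RQ definition above can be sketched as a breadth-first search over (node, position in the expression) states. This is a minimal illustration, not the authors' algorithm. It makes several assumptions: c^{≤k} is read as one to k occurrences of color c, the end points v_1 and v_2 must satisfy the node predicates f_u1 and f_u2, and the names eval_rq, atoms, pred_src and pred_dst as well as the toy graph are hypothetical.

```python
from collections import deque

def eval_rq(nodes, edges, pred_src, pred_dst, atoms):
    """Return node pairs (v1, v2) joined by a nonempty path matching `atoms`.

    nodes: {node_id: attribute dict}; every edge end point must appear here
    edges: iterable of (source, target, color)
    atoms: non-empty list of (color, max_reps); max_reps=1 encodes c,
           max_reps=k encodes c^{<=k}, max_reps=None encodes c^+
    """
    adj = {}
    for s, d, c in edges:
        adj.setdefault(s, []).append((d, c))
    last = len(atoms) - 1
    result = set()
    for src, attrs in nodes.items():
        if not pred_src(attrs):
            continue
        # State: (node, index of current atom, edges consumed in that atom).
        start = (src, 0, 0)
        seen = {start}
        queue = deque([start])
        while queue:
            v, i, cnt = queue.popleft()
            if i == last and cnt >= 1 and pred_dst(nodes[v]):
                result.add((src, v))
            color_i, max_i = atoms[i]
            for w, color in adj.get(v, []):
                successors = []
                # Stay in the current atom if its repetition bound allows it.
                if color == color_i and (max_i is None or cnt < max_i):
                    successors.append((w, i, 1 if max_i is None else cnt + 1))
                # Move to the next atom once the current one has >= 1 edge.
                if cnt >= 1 and i < last and color == atoms[i + 1][0]:
                    successors.append((w, i + 1, 1))
                for state in successors:
                    if state not in seen:
                        seen.add(state)
                        queue.append(state)
    return result

# Toy data graph (made up for illustration).
nodes = {
    "b1": {"job": "biologist", "sp": "cloning"},
    "x": {"job": "businessman"},
    "y": {"job": "businessman"},
    "z": {"job": "businessman"},
    "d1": {"job": "doctors"},
    "d2": {"job": "doctors"},
}
edges = [
    ("b1", "x", "fa"), ("x", "d1", "fn"), ("x", "y", "fa"),
    ("y", "z", "fa"), ("z", "d2", "fn"), ("b1", "d1", "sn"),
]

def is_biologist(a):
    return a.get("job") == "biologist" and a.get("sp") == "cloning"

def is_doctor(a):
    return a.get("job") == "doctors"

bounded = eval_rq(nodes, edges, is_biologist, is_doctor, [("fa", 2), ("fn", 1)])
unbounded = eval_rq(nodes, edges, is_biologist, is_doctor, [("fa", None), ("fn", 1)])
```

In this toy graph d2 is three fa edges plus one fn edge away from b1, so it is reachable under fa^+ fn but not under fa^{≤2} fn.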

Graph pattern queries
Graph pattern query (PQ): Q_p = (V_p, E_p, f_v, f_e), where for each edge e = (u, u'), Q_e = (u, u', f_v(u), f_v(u'), f_e(e)) is an RQ.
Q_p(G) is the maximum set of pairs (e, S_e), which is unique, where each S_e is a set of node pairs matching Q_e, such that:
  for any two edges e_1 = (u_1, u_2) and e_2 = (u_2, u_3), if (v_1, v_2) is in S_e1, then there is a v_3 such that (v_2, v_3) is in S_e2;
  for any two edges e_1 = (u_1, u_2) and e_2 = (u_1, u_3), if (v_1, v_2) is in S_e1, then there is a v_3 such that (v_1, v_3) is in S_e2.
PQ vs. simulation:
  search conditions on query nodes
  edges are mapped to paths
  the edges on each path are constrained by a regular expression
RQ and simulation are special cases of PQ.
[Figure: a PQ with node predicates id='Alice'; job='biologist', sp='cloning'; job='doctors', dsp='cloning'; and edge constraints fa^{≤2}, sa^{≤2}, fn, fa^{≤2}, sn, fa^+.]
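The two conditions on the sets S_e can be read as a pruning loop: start from candidate pairs per pattern edge (for example, RQ answers) and delete pairs until both conditions hold. The sketch below is one naive way to reach that fixpoint and is not the cubic-time procedure from the talk. The names refine_pq, pattern_edges and candidates are hypothetical, and the final rule (an empty S_e makes the whole result empty) is an assumption that the slide does not state.

```python
def refine_pq(pattern_edges, candidates):
    """Shrink candidate pair sets to the largest sets meeting both conditions.

    pattern_edges: {edge_name: (u, u_prime)} over pattern nodes
    candidates:    {edge_name: set of (v, v_prime)} over data nodes
    """
    S = {e: set(pairs) for e, pairs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for e1, (u1, u2) in pattern_edges.items():
            for e2, (w1, _) in pattern_edges.items():
                sources2 = {a for a, _ in S[e2]}
                keep = S[e1]
                if w1 == u2:
                    # e2 leaves the target of e1: v2 needs a match under e2.
                    keep = {(a, b) for a, b in keep if b in sources2}
                if w1 == u1:
                    # e2 leaves the source of e1: v1 needs a match under e2.
                    keep = {(a, b) for a, b in keep if a in sources2}
                if keep != S[e1]:
                    S[e1] = keep
                    changed = True
    # Assumption: if some pattern edge has no match, the answer is empty.
    if any(not pairs for pairs in S.values()):
        return {e: set() for e in S}
    return S

# Hypothetical pattern A -> B -> D with precomputed candidate pairs.
pattern_edges = {"e1": ("A", "B"), "e2": ("B", "D")}
candidates = {
    "e1": {("alice", "b1"), ("alice", "b2")},
    "e2": {("b1", "d1")},
}
matches = refine_pq(pattern_edges, candidates)
```

Here the pair (alice, b2) is dropped from S_e1 because b2 has no outgoing match for e2.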

Reachability and graph pattern query: examples
[Figure: the RQ (node predicates job='biologist', sp='cloning' and job='doctors'; edge constraint fa^{≤2} fn) and the PQ (node predicates id='Alice'; job='biologist', sp='cloning'; job='doctors', dsp='cloning'; edge constraints fa^{≤2}, sa^{≤2}, fn, fa^{≤2}, sn, fa^+), each shown with matching paths in a data graph whose edges are colored fa, fn, sn, sa.]

Fundamental problems: query containment
A PQ Q_1 = (V_1, E_1, f_v1, f_e1) is contained in Q_2 = (V_2, E_2, f_v2, f_e2) if there exists a mapping λ from E_1 to E_2 such that, for any data graph G and any edge e in E_1, S_e is a subset of S_λ(e). In other words, λ is a renaming function under which Q_1(G) is mapped into Q_2(G).
Query containment and equivalence can both be determined in cubic time:
  query similarity is defined as a revision of graph simulation
  query similarity can be determined in cubic time
Query containment and equivalence for PQs can be solved efficiently.

Query containment: example
[Figure: three queries. Q_1 has nodes B_1, C_1, C_2, C_3 and edge constraints h^{≤1}, h^{≤2}, h^{≤3}. Q_2 has nodes B_2, C_4 and edge constraint h^{≤1}. Q_3 has nodes B_3, C_5, C_6 and edge constraint h^{≤3}.]
Q_2 is contained in Q_1 and Q_3.
Q_1 and Q_3 are equivalent.
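One small ingredient of such a comparison is checking whether one edge constraint admits only color strings that another also admits. The sketch below covers single atoms only (c, c^{≤k}, c^+) and is not the simulation-based containment test from the talk. The function name and the (color, max_reps) encoding are assumptions made for illustration, with c^{≤k} read as one to k occurrences.

```python
def atom_contained(a1, a2):
    """True if every color string allowed by atom a1 is also allowed by a2.

    An atom is (color, max_reps): max_reps=1 for c, k for c^{<=k},
    None for c^+ (any positive number of repetitions).
    """
    (c1, k1), (c2, k2) = a1, a2
    if c1 != c2:
        return False
    if k2 is None:
        # c^+ admits every positive repetition count.
        return True
    # A bounded atom can only contain another bounded atom with a smaller bound.
    return k1 is not None and k1 <= k2

# h^{<=1} versus h^{<=3}, as in the edge labels of the example.
narrow_in_wide = atom_contained(("h", 1), ("h", 3))
wide_in_narrow = atom_contained(("h", 3), ("h", 1))
```

For the labels in the example, h^{≤1} is contained in h^{≤3} but not the other way round.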

Fundamental problems: query minimization
Size of a query: |V_p| + |E_p|.
Query minimization problem:
  input: a PQ Q_p
  output: a minimized PQ Q_m equivalent to Q_p
The problem can be solved in cubic time in the size of the query:
  compute the maximum node equivalence classes, based on a revision of graph simulation;
  determine the number of redundant nodes and edges from the equivalence classes;
  remove redundant and isolated nodes and edges.
Query minimization for PQs can be solved efficiently.

Query minimization: example
[Figure: three PQs Q_1, Q_2 and Q_3 over nodes labeled R, B and C, with edge constraints f, g, h^{≤2} and g^{≤3}.]

Evaluating graph pattern queries
PQs can be answered in cubic time.
Join-based algorithm JoinMatch:
  matrix index vs. distance cache
  a join operation for each edge in the PQ, repeated until a fixpoint is reached (with respect to a reversed topological order)
Split-based algorithm SplitMatch:
  blocks: pattern nodes and data nodes are treated uniformly
  partition-relation pair
Graph pattern matching can be solved in polynomial time.
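The "matrix index" can be pictured as a per-color table of shortest path lengths, which lets a constraint such as c^{≤k} or c^+ be checked with a lookup. The sketch below is one plausible reading of that idea, assuming the index stores, for each color, the length of the shortest nonempty single-color path between node pairs. The names color_distance_index and satisfies are hypothetical, and the slide does not describe the actual index layout.

```python
from collections import defaultdict, deque

def color_distance_index(edges):
    """Map color -> {(v, w): length of shortest nonempty path using only that color}."""
    adj = defaultdict(lambda: defaultdict(list))
    nodes = set()
    for s, d, c in edges:
        adj[c][s].append(d)
        nodes.update((s, d))
    index = {}
    for color, succ in adj.items():
        dist = {}
        for src in nodes:
            seen = {}
            # Seed with direct successors so that only nonempty paths count.
            queue = deque((w, 1) for w in succ.get(src, []))
            while queue:
                w, k = queue.popleft()
                if w in seen:
                    continue
                seen[w] = k
                queue.extend((x, k + 1) for x in succ.get(w, []))
            for w, k in seen.items():
                dist[(src, w)] = k
        index[color] = dist
    return index

def satisfies(index, v, w, color, max_reps=None):
    """Check c^{<=k} (max_reps=k), c (max_reps=1) or c^+ (max_reps=None) from v to w."""
    d = index.get(color, {}).get((v, w))
    return d is not None and (max_reps is None or d <= max_reps)

# Toy edge list (made up for illustration).
toy_edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "d", "fa"), ("a", "d", "fn")]
idx = color_distance_index(toy_edges)
```

A bounded constraint holds exactly when the shortest single-color path is within the bound, so one breadth-first search per source node and color is enough to answer every single-atom check afterwards.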

Example of JoinMatch
[Figure: the PQ with node predicates id='Alice'; job='biologist', sp='cloning'; job='doctors', dsp='cloning' and edge constraints fa^{≤2}, sa^{≤2}, fn, fa^{≤2}, sn, fa^+, evaluated on a data graph with edge colors fa, fn, sn, sa.]
Step 1: identify the candidates for each query node.
Step 2: filter the candidate sets for each query edge.
Step 3: return the final result.

Experimental results: effectiveness of PQs
[Figure: effectiveness of PQs, edge-to-path relations.]

Experimental results: querying real-life graphs
The evaluation algorithms are sensitive to pattern edges.
[Figure panels: varying |V_p|; varying |E_p|.]
Average query size: (8, 15, 3, 4, 5) for (|V|, |E|, |pred|, |c|, |b|).

Experimental results: querying real-life graphs (continued)
The algorithms are sensitive to the number of predicates.
[Figure panels: varying |pred|; varying b.]

Experimental results: querying synthetic graphs
The algorithms scale well over large synthetic graphs.
[Figure panels: varying |V| (×10^5); varying b.]

Experimental results: querying synthetic graphs (continued)
The algorithms scale well over large synthetic graphs.
[Figure panels: varying α; varying cr, where E = V^α and |sim(u)| ≤ V · cr.]

Conclusion
Simulation revised for graph pattern matching.
Reachability queries and graph pattern queries:
  query containment and minimization: cubic time
  query evaluation: cubic time
Future work:
  extending RQs and PQs to support general regular expressions
  incremental evaluation of RQs and PQs

Thank you! Q&A