Path-Hop: efficiently indexing large graphs for reachability queries
Tylor Cai and C.K. Poon, CityU of Hong Kong


Reachability Query
Query(u,v): Is there a directed path from vertex u to vertex v in graph G?
The answer is "yes" if such a path exists and "no" otherwise.

Graph Indexing for Reachability Queries
Given a directed graph G, build a reachability index that supports a sequence of reachability queries (u_1,v_1), (u_2,v_2), (u_3,v_3), …

Applications
- XML with IDREF/ID reference links: modelled as a directed graph (not just a tree)
- Semantic Web: ontology graph
- Bioinformatics: data such as metabolic pathways, protein interactions, etc. modelled as directed graphs

The challenge
The graph is huge:
- DFS/BFS at query time is too slow: O(m) time per query
- Pre-computing the Transitive Closure (TC) requires too much storage: O(n^2)
Goal: to design an index with
- small size (better than TC),
- fast query time (better than DFS/BFS), and
- reasonable construction time

Assumptions & Terminology
- Assume G is a Directed Acyclic Graph (DAG), a standard assumption (each strongly connected component can be collapsed into a single vertex)
- (u,v) is called a reachable pair iff u can reach v
- Set of all reachable pairs = Transitive Closure (TC)
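The two baselines that the index has to beat can be sketched as follows. This is a minimal illustration, not code from the talk: the function names, the toy DAG, and the convention that every vertex reaches itself are assumptions made for the example.

```python
from collections import deque

def reaches(adj, u, v):
    """Answer Query(u, v) by BFS from u; touches up to all m edges."""
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for nxt in adj.get(w, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def transitive_closure(adj):
    """Materialise every reachable pair; up to n^2 entries."""
    vertices = list(adj)
    return {(u, v) for u in vertices for v in vertices if reaches(adj, u, v)}

# Hypothetical toy DAG, not taken from the slides.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
tc = transitive_closure(adj)
```

With the closure stored, a query is a single set lookup such as `("a", "d") in tc`, which is the storage-for-time trade that a reachability index tries to improve on.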

Previous Work: Tree Cover
- Find a tree cover (spanning tree)
- Label vertices with an interval labeling
  - Reachable pairs (u,v) where u is a tree-ancestor of v are covered
- Uncovered reachable pairs are compressed using the tree cover (to be explained…)
- Example interval labels: [1,13] [2,4] [3,3] [5,12] [6,11] [7,7] [8,10] [9,9]
[Agrawal et al. SIGMOD'89]
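A minimal sketch of one possible interval labeling follows. The tree shape and vertex names are hypothetical (the slide's figure is not part of the transcript); they were picked so that this particular numbering scheme yields the eight example intervals listed above. The scheme itself (bump a counter on entering each vertex and again on leaving an internal vertex) is an assumption and may differ from the one in the cited paper.

```python
def interval_labels(children, root):
    """Assign (start, end) to every tree vertex with one DFS.

    The counter is incremented when a vertex is entered and once more
    when an internal vertex is left, so a leaf receives (k, k).
    """
    labels = {}
    counter = 0

    def visit(v):
        nonlocal counter
        counter += 1
        start = counter
        kids = children.get(v, [])
        for c in kids:
            visit(c)
        if kids:
            counter += 1
        labels[v] = (start, counter)

    visit(root)
    return labels

def is_tree_ancestor(labels, u, v):
    """u is a tree-ancestor of v (or u == v) iff v's interval lies inside u's."""
    return labels[u][0] <= labels[v][0] and labels[v][1] <= labels[u][1]

# Hypothetical spanning tree (assumption, not the slide's figure).
children = {"r": ["a", "b"], "a": ["a1"], "b": ["c"], "c": ["c1", "d"], "d": ["d1"]}
labels = interval_labels(children, "r")
```

Here `is_tree_ancestor(labels, "b", "d1")` holds because [9,9] lies inside [5,12], while `is_tree_ancestor(labels, "a", "d1")` does not because [9,9] is outside [2,4]. Each test costs two comparisons, which is why tree-ancestor pairs need no further storage.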

Previous Work: Chain Cover
- Find a chain cover (collection of disjoint chains covering all vertices)
- Label each vertex by Chain ID & Sequence No. in its chain
  - Reachable pairs (u,v) where u and v are in the same chain are covered
- Uncovered reachable pairs are compressed using chains (to be explained…)
- Example chains: C1, C2, C3
[Jagadish TODS'90]
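The transcript leaves the compression step unexplained. One commonly described way to do it, assumed here rather than taken from the slides, is to store for every vertex the smallest sequence number it can reach on each chain. The sketch below takes a chain decomposition as given; the graph, the chains, and all function names are hypothetical.

```python
def chain_index(adj, chains):
    """Return (pos, first): pos[v] = (chain id, sequence no.) and
    first[u][c] = smallest sequence no. on chain c reachable from u."""
    pos = {v: (cid, seq)
           for cid, chain in enumerate(chains)
           for seq, v in enumerate(chain)}
    first = {}

    def solve(u):
        if u in first:
            return first[u]
        cid, seq = pos[u]
        best = {cid: seq}  # u trivially reaches its own position
        for w in adj.get(u, ()):
            for c, s in solve(w).items():
                if s < best.get(c, s + 1):
                    best[c] = s
        first[u] = best
        return best

    for u in pos:
        solve(u)
    return pos, first

def chain_query(pos, first, u, v):
    """u reaches v iff u reaches some vertex at or before v on v's chain."""
    cid, seq = pos[v]
    return first[u].get(cid, float("inf")) <= seq

# Hypothetical DAG with two chains: a -> b -> c and d -> e.
adj = {"a": ["b"], "b": ["c", "e"], "c": [], "d": ["e", "c"], "e": []}
chains = [["a", "b", "c"], ["d", "e"]]
pos, first = chain_index(adj, chains)
```

Each vertex stores at most one number per chain, so the index size grows with the number of chains; this is why a cover consisting of many short chains compresses poorly.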

Previous Work: 2-Hop Cover
Each vertex v keeps two sets:
- L_in(v): a set of some vertices that can reach v
- L_out(v): a set of some vertices that v can reach
For each reachable pair (u,v): find an intermediate vertex c along some path from u to v; add c to L_out(u) and L_in(v)
Processing Query(u,v): (u,v) is a reachable pair iff L_out(u) ∩ L_in(v) ≠ Ø
[Cohen et al. SODA'02]
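The query rule can be illustrated with hand-built labels. The graph below (three sources feeding a hub vertex h that feeds three sinks) and the label assignment are assumptions chosen for the example; they are not produced by any particular construction algorithm.

```python
def two_hop_query(l_out, l_in, u, v):
    """(u, v) is reported reachable iff the two label sets share a vertex."""
    return not l_out[u].isdisjoint(l_in[v])

# Hypothetical graph: a1, a2, a3 -> h -> b1, b2, b3.
sources = ["a1", "a2", "a3"]
sinks = ["b1", "b2", "b3"]
vertices = sources + ["h"] + sinks

l_out = {v: set() for v in vertices}
l_in = {v: set() for v in vertices}
for s in sources:
    l_out[s].add("h")   # every source can reach the hub
for t in sinks:
    l_in[t].add("h")    # the hub can reach every sink
l_out["h"].add("h")     # the hub acts as its own intermediate vertex
l_in["h"].add("h")

label_size = sum(len(s) for s in l_out.values()) + sum(len(s) for s in l_in.values())
```

In this toy case 8 label entries answer all 15 reachable pairs between distinct vertices, which is the kind of compression over the transitive closure that a good 2-hop cover aims for.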

Finding a good 2-hop cover
Aim: to minimize Σ_u (|L_in(u)| + |L_out(u)|)
Equivalent to a weighted set cover problem:
- Ground set: the set of reachable pairs
- Collection of candidate sets: { C_in × C_out | C_in is a subset of the vertices that reach w, C_out is a subset of the vertices that w reaches }
- Weight of C_in × C_out = |C_in| + |C_out|
Apply the greedy set cover algorithm:
- Exponentially many candidates for each w
- Consider only some of them
Slow construction time: O(n^4)
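The greedy rule itself is short once the candidates are listed explicitly. The sketch below is the textbook greedy heuristic for weighted set cover, applied to a tiny hypothetical instance; how the real construction chooses which of the exponentially many candidates to consider is not described in the transcript, so the explicit candidate list and the helper names are assumptions.

```python
def product_candidate(c_in, c_out):
    """Candidate C_in x C_out with weight |C_in| + |C_out| (u != v pairs only)."""
    pairs = frozenset((u, v) for u in c_in for v in c_out if u != v)
    return (len(c_in) + len(c_out), pairs)

def greedy_weighted_set_cover(ground, candidates):
    """Repeatedly pick the candidate with the lowest weight per newly covered pair."""
    uncovered = set(ground)
    chosen = []
    while uncovered:
        best = min(
            (c for c in candidates if c[1] & uncovered),
            key=lambda c: c[0] / len(c[1] & uncovered),
            default=None,
        )
        if best is None:
            raise ValueError("candidates do not cover the ground set")
        chosen.append(best)
        uncovered -= best[1]
    return chosen

# Hypothetical instance: sources a1..a3 reach hub h, which reaches sinks b1..b3.
sources, sinks = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
hub = product_candidate(sources + ["h"], ["h"] + sinks)
ground = set(hub[1])                                   # the 15 reachable pairs
singles = [product_candidate([u], [v]) for (u, v) in sorted(ground)]
chosen = greedy_weighted_set_cover(ground, singles + [hub])
total_weight = sum(w for w, _ in chosen)
```

On this instance the hub candidate costs 8 and covers all 15 pairs, so the greedy rule selects it first and stops; covering the same pairs with single-pair candidates would cost 30.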

Previous Work (cont'd)
- Dual Labeling [Wang et al. ICDE'06]
  - Tree Cover plus TLC matrix
- Path-Tree Cover [Jin et al. SIGMOD'08]
  - Combines Tree and Chain Cover
  - Fast query time
  - Space-efficient for sparse graphs
- 3-Hop Cover [Jin et al. SIGMOD'09]
  - Enhanced Chain Cover + 2-Hop
  - Most space-efficient, slow construction time

3-Hop Cover
(Figure: TC blocks for chains C1, C2, C3)
- Diagonal blocks are covered by Chain ID and Sequence No.
- TC Contour: corners of off-diagonal blocks, compressed using 2-hop

TC Contour
- (1,5) is dominated by (1,4)
- (2,6) is dominated by (3,6)

An Observation
Chain cover:
- CiteSeer: 50% of chains are single vertices, 87% of chains have ≤ 2 vertices
- Yago: 87% of chains are single vertices, 96% of chains have ≤ 2 vertices
Many short chains.
Can we cover the graph using graph structures other than chains?

Tree Cover & Residual TC (RTC)
Only roots of subtrees need to be stored

Dataset    TC Contour   RTC
Citeseer   152K         61K
Go         46K          14K
Pubmed     288K         82K
Yago       30K          22K
The size of the residual TC is much smaller than the size of the TC Contour

Path-Hop
Combine Tree Cover & 2-Hop
- 2-hop: (u,v) in TC
- 3-hop: (u,v) in TC Contour
- path-hop: (u,v) in Residual TC

Path-Hop
Index Construction:
- Find a tree cover; label vertices by an interval labeling scheme
- Compute RTC
- Repeat
    select a path-hop x → y
    Add x to L_out(u) if (u,x) in RTC
    Add y to L_in(v) if (y,v) in RTC
  Until all reachable pairs in RTC are covered

Path-Hop
Processing Query(u,v):
- If u is a tree-ancestor of v, return "YES"
- For each ancestor z of v
  - If there is x in L_out(u) and y in L_in(z) such that x is a tree-ancestor of y, return "YES"
- Return "NO"
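The query steps above can be sketched as follows. Everything concrete in the example is an assumption: the six-vertex DAG, the hand-written intervals, and the hand-picked labels (including letting a vertex appear in its own label set) were chosen to exercise the procedure and were not generated by the construction algorithm. The loop over ancestors of the target is taken to include the target itself.

```python
def is_tree_ancestor(iv, a, b):
    """Interval containment test; also true when a == b."""
    return iv[a][0] <= iv[b][0] and iv[b][1] <= iv[a][1]

def path_hop_query(iv, parent, l_out, l_in, u, v):
    if is_tree_ancestor(iv, u, v):          # covered by the tree cover
        return True
    z = v
    while z is not None:                    # v and each of its tree-ancestors
        for y in l_in.get(z, ()):
            for x in l_out.get(u, ()):
                if is_tree_ancestor(iv, x, y):   # tree path x -> y is the hop
                    return True
        z = parent[z]
    return False

# Hypothetical DAG. Tree edges: r->s, r->x, r->z, x->y, z->t.
# Non-tree edges: s->x, y->z.
parent = {"r": None, "s": "r", "x": "r", "z": "r", "y": "x", "t": "z"}
# Hand-written intervals: (preorder no., largest preorder no. in subtree).
iv = {"r": (1, 6), "s": (2, 2), "x": (3, 4), "y": (4, 4), "z": (5, 6), "t": (6, 6)}
# Hand-picked labels (assumption).
l_out = {"s": {"x"}, "x": {"x"}, "y": {"y"}}
l_in = {"x": {"x"}, "z": {"y"}}
```

For example, `path_hop_query(iv, parent, l_out, l_in, "s", "t")` succeeds at z = "z": x is in L_out(s), y is in L_in(z), and x is a tree-ancestor of y. Storing L_in only at the subtree root z is enough to answer the query for its descendant t.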

Experiments
- Implemented in C++ on OpenSolaris
- Intel 3.4 GHz x86 / 2GB RAM
- 4 synthetic datasets, 4 real datasets (available online)
(Table: Dataset, |V_DAG|, |E_DAG|)
- Synthetic: rand2k_8, rand2k_10, rand2k_12, Xmark
- Real life: Citeseer, Go, PubMed, Yago

Methods tested
- Dual labeling (Dual II)
- Path-Tree (PTree I)
- 3-Hop labeling (3-Hop Contour)
- Path-Hop

Index Size
(Table: index size per dataset, rounded to the nearest Kbyte. Columns: Dataset, Dual II, PTree I, 3-Hop Contour, Path-Hop, TC. Rows: the three Rand2k datasets, Xmark, Citeseer, Go, PubMed, Yago)

Query Processing Time
(Figure: query processing time (ms) on synthetic and real datasets)
- Total time for processing 100,000 random queries
- Dual II is too slow

Index Construction Time

Future Directions
- Can we design an even more succinct index structure for reachability?
- Can we design a scheme that offers a tradeoff between index size and query time?
- Is randomization helpful?
- What about other queries (e.g. shortest distance/path)?

The END Questions?