Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs Mohamed Sarwat (Arizona State University) Sameh Elnikety.

Presentation transcript:

Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs Mohamed Sarwat (Arizona State University) Sameh Elnikety (Microsoft Research) Yuxiong He (Microsoft Research) Mohamed Mokbel (University of Minnesota)

Motivation
Social network
Queries
– Find Alice’s friends
– How Alice & Ed are connected
– Find Alice’s photos with friends

Data Model
Attributed multi-graph
Node
– Represent entities
– ID, type, attributes
Edge
– Represent binary relationship
– Type, direction, weight, attrs
[Figure labels: App, Horton]
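The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `neighbours` helper, and the "Mentions" edge type are hypothetical and do not come from Horton+.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    type: str
    attrs: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    type: str
    directed: bool = True
    weight: float = 1.0
    attrs: dict[str, Any] = field(default_factory=dict)


class Graph:
    """Attributed multi-graph: the edge list may hold parallel edges."""

    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def neighbours(self, node_id: str, edge_type: str):
        """Yield nodes reachable from node_id over one edge of edge_type."""
        for e in self.edges:
            if e.type != edge_type:
                continue
            if e.src == node_id:
                yield self.nodes[e.dst]
            elif not e.directed and e.dst == node_id:
                yield self.nodes[e.src]


g = Graph()
g.add_node(Node("Alice", "Person"))
g.add_node(Node("Photo1", "Photo", {"year": 2012}))
g.add_edge(Edge("Photo1", "Alice", "Tags"))
g.add_edge(Edge("Photo1", "Alice", "Mentions"))  # hypothetical parallel edge
print([n.id for n in g.neighbours("Photo1", "Tags")])
```

Keeping edges in a list rather than a dictionary keyed by endpoint pair is what allows two edges between the same pair of nodes.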

Horton+ Contributions
1. Defining reachability queries formally
2. Introducing graph operators for distributed graph engine
3. Developing query optimizer
4. Evaluating the techniques experimentally

Graph Reachability Queries
Query is a regular expression
– Sequence of node and edge predicates
1. Hello world in reachability
» Photo-Tags-‘Alice’
» Search for path with node: type=Photo, edge: type=Tags, node: id=‘Alice’
2. Attribute predicate
» Photo{date.year=‘2012’}-Tags-‘Alice’
3. Or
» (Photo | video)-Tags-‘Alice’
4. Closure for path with arbitrary length
» ‘Alice’(-Manages-Person)*
» Kleene star to find Alice’s org chart
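To make the closure example concrete, here is a minimal sketch of how ‘Alice’(-Manages-Person)* could be evaluated on a toy org chart. The data, the function name `closure_paths`, and the cycle check are assumptions made for illustration; the slides do not describe how Horton+ evaluates the Kleene star internally.

```python
from collections import deque

# Hypothetical toy data: manager -> direct reports (the Manages edges).
MANAGES = {"Alice": ["Bob", "Carol"], "Bob": ["Dan"], "Carol": [], "Dan": []}
NODE_TYPE = {name: "Person" for name in MANAGES}


def closure_paths(start, edges, types, node_type="Person"):
    """Return every path matching start(-Manages-<node_type>)*, shortest first."""
    results = [[start]]          # zero repetitions of the starred group
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in edges.get(path[-1], []):
            # Skip nodes already on the path so cyclic data cannot loop forever.
            if types.get(nxt) == node_type and nxt not in path:
                new_path = path + [nxt]
                results.append(new_path)
                queue.append(new_path)
    return results


paths = closure_paths("Alice", MANAGES, NODE_TYPE)
for p in paths:
    print("-".join(p))
```

Each repetition of the starred group extends a path by one Manages edge, so the result set contains one path per person in Alice's org chart plus the trivial path holding Alice alone.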

Declarative Query Language

Declarative:
    Photo-Tags-‘Alice’

Navigational:
    foreach (n1 in graph.Nodes.SelectByType(Photo)) {
        foreach (n2 in n1.GetNeighboursByEdgeType(Tags)) {
            // Return the first path whose end node is Alice
            if (n2.id == ‘Alice’) {
                return path(n1, Tags, n2)
            }
        }
    }

Comparison to SQL & SPARQL
[Figure labels: SQL, RL, SPARQL]
SPARQL
– Pattern matching
» Find sub-graph in a bigger graph

Compile into Algebraic Query Plan
– ‘Alice’-Tags-Photo [Figure: plan with nodes ‘Alice’, Tags, Photo]
– ‘Alice’(-Manages-Person)* [Figure: plan with nodes ‘Alice’, Manages, Person]

Centralized Query Execution
Query: ‘Alice’-Tags-Photo
Breadth First Search [Figure: plan with nodes ‘Alice’, Tags, Photo]
Answer Paths:
– ‘Alice’-Tags-Photo1
– ‘Alice’-Tags-Photo8
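The breadth-first evaluation on this slide can be sketched as follows. The toy graph, the predicate encoding, and the assumption that edges are stored from ‘Alice’ to the photos are all hypothetical; the sketch only shows the level-by-level expansion, not Horton's engine.

```python
# Hypothetical toy graph; edges are (source, edge type, target) triples.
NODE_TYPE = {"Alice": "Person", "Bob": "Person", "Photo1": "Photo", "Photo8": "Photo"}
EDGES = [
    ("Alice", "Tags", "Photo1"),
    ("Alice", "Tags", "Photo8"),
    ("Alice", "FriendOf", "Bob"),
]


def node_matches(pred, node_id):
    """pred is ("id", value) or ("type", value)."""
    kind, value = pred
    return node_id == value if kind == "id" else NODE_TYPE[node_id] == value


def evaluate(query):
    """query alternates node and edge predicates: [node, edge, node, ...]."""
    node_preds, edge_types = query[0::2], query[1::2]
    # Level 0: every node that satisfies the first node predicate.
    frontier = [[n] for n in NODE_TYPE if node_matches(node_preds[0], n)]
    # Each further level extends all partial paths by one edge and one node.
    for edge_type, node_pred in zip(edge_types, node_preds[1:]):
        next_frontier = []
        for path in frontier:
            for src, etype, dst in EDGES:
                if src == path[-1] and etype == edge_type and node_matches(node_pred, dst):
                    next_frontier.append(path + [etype, dst])
        frontier = next_frontier
    return frontier


answers = evaluate([("id", "Alice"), "Tags", ("type", "Photo")])
for path in answers:
    print("-".join(path))
```

On this toy graph the two answer paths correspond to the ones listed on the slide, Photo1 and Photo8.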

Distributed Query Execution
Query: ‘Alice’-Tags-Photo-Tags-‘Bob’
[Figure: graph split across Partition 1 and Partition 2]

Distributed Query Execution
Query: ‘Alice’-Tags-Photo-Tags-‘Bob’
[Figure: FSM with states ‘Alice’, Tags, Photo, Tags, ‘Bob’; Steps 1, 2, 3 over Partition 1 and Partition 2; nodes Alice, Photo1, Photo8, Bob]
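One way to picture the step-by-step distributed execution is a round-based simulation in which each partial path carries its FSM state and is forwarded to the partition that owns the next node. Everything below is an assumption made for illustration: the node placement, the message format, and the rule that the owning partition checks the node predicate are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical placement of nodes on two partition servers.
OWNER = {"Alice": 1, "Photo1": 1, "Photo8": 2, "Bob": 2}
NODE_TYPE = {"Alice": "Person", "Bob": "Person", "Photo1": "Photo", "Photo8": "Photo"}
# Adjacency lists, assumed to live on the partition that owns the source node.
ADJ = {
    "Alice": [("Tags", "Photo1"), ("Tags", "Photo8")],
    "Photo1": [("Tags", "Bob")],
    "Photo8": [("Tags", "Bob")],
}


def node_matches(pred, node_id):
    kind, value = pred
    return node_id == value if kind == "id" else NODE_TYPE[node_id] == value


def run_distributed(node_preds, edge_types):
    """Return (answer paths, number of cross-partition messages)."""
    inbox = defaultdict(list)
    for node, part in OWNER.items():
        if node_matches(node_preds[0], node):
            inbox[part].append((0, [node]))      # (FSM state, partial path)
    answers, remote_messages = [], 0
    while inbox:                                  # one loop iteration per step
        outbox = defaultdict(list)
        for part, items in inbox.items():
            for state, path in items:
                # The owning partition checks the node predicate locally.
                if not node_matches(node_preds[state], path[-1]):
                    continue
                if state == len(node_preds) - 1:  # accepting state reached
                    answers.append(path)
                    continue
                for etype, dst in ADJ.get(path[-1], []):
                    if etype == edge_types[state]:
                        outbox[OWNER[dst]].append((state + 1, path + [dst]))
                        if OWNER[dst] != part:
                            remote_messages += 1
        inbox = outbox
    return answers, remote_messages


answers, remote = run_distributed(
    [("id", "Alice"), ("type", "Photo"), ("id", "Bob")], ["Tags", "Tags"]
)
print(answers, remote)
```

Counting only the messages that cross a partition boundary gives a rough feel for the communication cost that the evaluation slides separate from computation.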

Architecture
[Figure: Distributed Execution Engine]

Algebraic Operators
1. Select
– Find set of starting nodes
2. Traverse
– Traverse graph to construct paths
3. Join
– Construct longer paths
Example: ‘Alice’-Tags-Photo [Figure: plan with nodes ‘Alice’, Tags, Photo]
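The three operators can be sketched as plain functions over lists of paths. The signatures, the toy graph, and the path encoding (alternating node ids and edge types) are assumptions for illustration rather than Horton+'s actual operator interfaces.

```python
# Hypothetical toy graph; edges are (source, edge type, target) triples.
NODE_TYPE = {"Alice": "Person", "Bob": "Person", "Photo1": "Photo", "Photo8": "Photo"}
EDGES = [
    ("Alice", "Tags", "Photo1"),
    ("Alice", "Tags", "Photo8"),
    ("Photo1", "Tags", "Bob"),
]


def is_id(value):
    return lambda node: node == value


def has_type(type_name):
    return lambda node: NODE_TYPE[node] == type_name


def select(pred):
    """Select: find the set of starting nodes, each as a one-node path."""
    return [[n] for n in NODE_TYPE if pred(n)]


def traverse(paths, edge_type, pred):
    """Traverse: extend each path by one matching edge and end node."""
    return [
        p + [etype, dst]
        for p in paths
        for src, etype, dst in EDGES
        if src == p[-1] and etype == edge_type and pred(dst)
    ]


def join(left, right):
    """Join: glue paths whose shared node is the end of left and start of right."""
    return [l + r[1:] for l in left for r in right if l[-1] == r[0]]


left = traverse(select(is_id("Alice")), "Tags", has_type("Photo"))
right = traverse(select(has_type("Photo")), "Tags", is_id("Bob"))
joined = join(left, right)
print(joined)
```

Here `left` is the plan for ‘Alice’-Tags-Photo from the slide, and `joined` shows how two shorter plans combine into the longer query ‘Alice’-Tags-Photo-Tags-‘Bob’.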

Plan Enumeration for Query Optimization
Query: ‘Mike’-Tags-Photo-Tags-Person-FriendOf-‘Mike’
Example plans
1. Left to right
» ‘Mike’-Tags-Photo-Tags-Person-FriendOf-‘Mike’
2. Right to left
» ‘Mike’-FriendOf-Person-Tags-Photo-Tags-‘Mike’
3. Split then join
» (‘Mike’-FriendOf-Person) ⋈ (Person-Tags-Photo-Tags-‘Mike’)
4. Split then join
» (‘Mike’-FriendOf-Person-Tags-Photo) ⋈ (Photo-Tags-‘Mike’)
5. …

Enumeration Algorithm
Query: $Q[1,n] = N_1 E_1 N_2 E_2 \dots N_{n-1} E_{n-1} N_n$
Selectivity of query $Q[i,j]$: $\mathrm{Sel}(Q[i,j])$
Minimum cost of query $Q[i,j]$: $F(Q[i,j])$
– Apply dynamic programming
– Store intermediate results of all $F(Q[i,j])$ pairs
– Complexity: $O(n^3)$

$$F(Q[i,j]) = \min\Big\{ \mathrm{SequentialCost}_{LR}(Q[i,j]),\ \mathrm{SequentialCost}_{RL}(Q[i,j]),\ \min_{i<k<j} \big( F(Q[i,k]) + F(Q[k,j]) + \mathrm{Sel}(Q[i,k]) \cdot \mathrm{Sel}(Q[k,j]) \big) \Big\}$$

Base step: $F(Q_i) = F(N_i)$ = cost of matching predicate $N_i$
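The recurrence above maps directly onto a memoized function. In the sketch below only the recurrence itself comes from the slide; the selectivity table `SEL` and the sequential cost model (sum of intermediate result sizes) are invented for illustration.

```python
from functools import lru_cache


def min_plan_cost(n, node_cost, seq_lr, seq_rl, sel):
    """Return (cost, plan label) for Q[1, n] using the slide's recurrence."""

    @lru_cache(maxsize=None)              # stores every F(Q[i, j]) once
    def F(i, j):
        if i == j:                        # base step: cost of matching N_i
            return node_cost(i), f"N{i}"
        best = min((seq_lr(i, j), f"LR[{i},{j}]"), (seq_rl(i, j), f"RL[{i},{j}]"))
        for k in range(i + 1, j):         # split then join at node N_k
            (cost_l, plan_l), (cost_r, plan_r) = F(i, k), F(k, j)
            cand = (cost_l + cost_r + sel(i, k) * sel(k, j), f"({plan_l} JOIN {plan_r})")
            best = min(best, cand)
        return best

    return F(1, n)


# Hypothetical statistics for 'Alice'-Tags-Photo-Tags-'Bob' (n = 3):
# SEL[(i, j)] is the assumed number of paths matching Q[i, j].
SEL = {(1, 1): 1, (2, 2): 1000, (3, 3): 1, (1, 2): 10, (2, 3): 20, (1, 3): 2}


def sel(i, j):
    return SEL[(i, j)]


def seq_lr(i, j):
    # Assumed cost model: sum of intermediate result sizes, left to right.
    return sum(sel(i, m) for m in range(i, j + 1))


def seq_rl(i, j):
    # Same assumed model, evaluated right to left.
    return sum(sel(m, j) for m in range(i, j + 1))


cost, plan = min_plan_cost(3, lambda i: sel(i, i), seq_lr, seq_rl, sel)
print(cost, plan)
```

There are O(n²) subqueries Q[i,j] and each one tries O(n) split points, which gives the O(n³) bound stated on the slide. With these made-up statistics the left-to-right plan is cheapest; other statistics can favor a split-then-join plan.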

Experimental Evaluation
Graphs
– Real dataset (codebook graph: 4M nodes, 14M edges, 20 types)
– Synthetic dataset (RMAT graph: 1024M nodes, 5120M edges)
Machines
– Commodity servers
– Intel Core 2 Duo 2.26 GHz, 16 GB RAM

Query Workload
Q1: Short
– Find the person who committed checkin 400 and the WorkItemRevisions it modifies:
» Person-Committer-Checkin{id=400}-Modifies-WorkItemRevision
Q2: Selective
– Find Dave’s checkins that modified a WorkItem created by Tim:
» ‘Dave’-Committer-Checkin-Modifies-WorkItem-CreatedBy-‘Tim’
Q3: Report
– For each checkin, find the person (and his/her manager) who committed it, as well as all the work items and their WebURLs that are modified by that checkin:
» Person-Manages-Person-Committer-Checkin-Modifies-WorkItemRevision-Modifies-WorkItem-Links-WebURL
Q4: Closure
– Retrieve all checkins committed by any employee in Dave’s organizational chart (working under him):
» ‘Dave’(-Manages-Person)*-Checkin

Query Execution Time (Small Graph)

Query Execution Time
RMAT graph
– Does not fit in one server: 1024M nodes, 5120M edges
– 16 partition servers
Execution time dominated by computations

Query | Total Execution | Communication | Computation
Q     | sec             | 0.723 sec     | sec
Q     | sec             | 0.693 sec     | sec
Q     | sec             | 1.258 sec     | sec

Query Optimization
Synthetic graphs
– Vary graph size
Centralized (1 Server)
[Figure: Execution time for queries Q1, Q2, Q3]

Horton+ Contributions
1. Defining reachability queries formally
2. Introducing graph operators for distributed graph engine
3. Developing query optimizer
4. Evaluating the techniques experimentally