G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs. Sherif Sakr, Sameh Elnikety, Yuxiong He. NICTA & UNSW, Sydney, Australia; Microsoft Research, Redmond, WA

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
Sherif Sakr (NICTA & UNSW, Sydney, Australia), Sameh Elnikety and Yuxiong He (Microsoft Research, Redmond, WA)
CIKM 2012

Example 1: Social Network

Example 2: Bibliographical Network

Contributions
1. G-SPARQL language
  – Pattern matching
  – Reachability
2. Hybrid execution engine
  – Graph topology in main memory
  – Graph data in relational database
3. Algebraic transformation
  – Operators
  – Optimizations
4. Experimental evaluation

1. G-SPARQL Query Language
Extends a subset of SPARQL
  – Based on the triple pattern: (subject, predicate, object)
Subgraph matching patterns on
  – Graph structure
  – Node attributes
  – Edge attributes
Reachability patterns on
  – Paths
  – Shortest paths
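To make the triple-pattern idea concrete, here is a minimal Python sketch of matching a conjunction of triple patterns against a graph stored as (subject, predicate, object) triples. It is a naive nested-loop evaluator for plain edge patterns only (no attribute, edge-variable, or path patterns), not the G-SPARQL engine; the function names and the toy graph are hypothetical, and the two "affiliated" edges are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # As in SPARQL, a term that starts with "?" is a variable.
    return term.startswith("?")


def match_pattern(pattern: Triple, triples: Iterable[Triple], binding: Binding) -> List[Binding]:
    """Return every extension of `binding` under which `pattern` equals some stored triple."""
    results: List[Binding] = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable must agree with any value it is already bound to.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.append(candidate)
    return results


def match_patterns(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by joining bindings one pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [ext for b in bindings for ext in match_pattern(pattern, triples, b)]
    return bindings


# Toy graph: the "know" and "supervise" edges follow the example tables in this talk;
# the two "affiliated" edges are hypothetical and exist only for illustration.
GRAPH: List[Triple] = [
    ("John", "know", "Alice"),
    ("Alice", "supervise", "Smith"),
    ("Alice", "affiliated", "Microsoft"),
    ("Smith", "affiliated", "UNSW"),
]

if __name__ == "__main__":
    query = [("?X", "know", "?Y"), ("?Y", "affiliated", "Microsoft")]
    print(match_patterns(query, GRAPH))  # [{'?X': 'John', '?Y': 'Alice'}]
```

Joining pattern by pattern mirrors how a basic graph pattern is a conjunction: each new pattern can only extend, never contradict, the bindings found so far.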

G-SPARQL Syntax

G-SPARQL Pattern Matching
Node attribute
  – “518”
Edge attribute
  – “Programmer”
Structural
  – ?Person worksAt Microsoft
  – ?Person ?E(worksAt) Microsoft
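The difference between the two structural forms can be sketched as follows: ?Person worksAt Microsoft binds only the person, while ?Person ?E(worksAt) Microsoft also binds the edge, so conditions can then be placed on that edge's attributes. This is a hypothetical Python illustration: the people, edge IDs, and the attribute names "office" (for “518”) and "title" (for “Programmer”) are assumptions, since the transcript does not show which attributes those values belong to.

```python
from typing import Dict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    eid: int
    src: str
    label: str
    dst: str


# Hypothetical toy data: the people, the edge IDs, and the attribute names
# "office" and "title" are assumptions made for illustration.
EDGES: List[Edge] = [
    Edge(1, "Bob", "worksAt", "Microsoft"),
    Edge(2, "Carol", "worksAt", "Microsoft"),
]
NODE_ATTRS: Dict[str, Dict[str, str]] = {"Bob": {"office": "518"}, "Carol": {"office": "212"}}
EDGE_ATTRS: Dict[int, Dict[str, str]] = {1: {"title": "Programmer"}, 2: {"title": "Manager"}}


def match_edge(label: str, dst: str) -> List[Tuple[str, Edge]]:
    """?Person ?E(label) dst: bind the source node and the matching edge itself."""
    return [(e.src, e) for e in EDGES if e.label == label and e.dst == dst]


def match_structural(label: str, dst: str) -> List[str]:
    """?Person label dst: only the source node is bound; the edge stays anonymous."""
    return [src for src, _ in match_edge(label, dst)]


def with_node_attr(label: str, dst: str, attr: str, value: str) -> List[str]:
    """Structural match plus a condition on an attribute of the bound node."""
    return [p for p in match_structural(label, dst) if NODE_ATTRS.get(p, {}).get(attr) == value]


def with_edge_attr(label: str, dst: str, attr: str, value: str) -> List[str]:
    """Edge-variable match plus a condition on an attribute of the bound edge ?E."""
    return [p for p, e in match_edge(label, dst) if EDGE_ATTRS.get(e.eid, {}).get(attr) == value]


if __name__ == "__main__":
    print(match_structural("worksAt", "Microsoft"))  # ['Bob', 'Carol']
    print(with_node_attr("worksAt", "Microsoft", "office", "518"))  # ['Bob']
    print(with_edge_attr("worksAt", "Microsoft", "title", "Programmer"))  # ['Bob']
```

Without the edge variable there is no handle on the edge, which is why an edge-attribute condition needs the ?E(...) form.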

G-SPARQL Reachability
Path
  – Subject ??PathVar Object
Shortest path
  – Subject ?*PathVar Object
Path filters
  – Path length
  – All edges
  – All nodes

Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?L1.
  ?L2.
  ?Age1.
  ?Age2.
  ?X affiliated UNSW.
  ?Y ?E(affiliated) Microsoft.
  ?X livesIn "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
  FILTERPATH( Length( ??P, <= 3) ).
}
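The FILTERPATH(Length(??P, <= 3)) condition restricts ??P to paths of at most three edges. One way to check such a bounded reachability condition on an in-memory topology is a breadth-first search that stops expanding at the length limit, sketched below in Python. It assumes directed edges and measures length in edges; the function name and the example chain are hypothetical, and G-SPARQL's exact path semantics may differ.

```python
from collections import deque
from typing import Dict, List, Optional


def bounded_path(adj: Dict[str, List[str]], src: str, dst: str, max_len: int) -> Optional[List[str]]:
    """Return a path from src to dst with at most max_len edges, or None if none exists.

    Breadth-first search explores nodes in order of hop distance, so the first
    path found is also a shortest one (in edges)."""
    parent: Dict[str, Optional[str]] = {src: None}
    frontier = deque([(src, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == dst:
            # Walk the parent pointers back to src to rebuild the path.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if depth == max_len:
            continue  # going one hop further would violate the length filter
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append((nxt, depth + 1))
    return None


# Hypothetical directed chain A -> B -> C -> D -> E.
CHAIN: Dict[str, List[str]] = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}

if __name__ == "__main__":
    print(bounded_path(CHAIN, "A", "D", 3))  # ['A', 'B', 'C', 'D']
    print(bounded_path(CHAIN, "A", "E", 3))  # None
```

Cutting the search off at the limit keeps the work proportional to the neighbourhood within three hops rather than the whole graph.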

Outline
1. G-SPARQL language
  – Pattern matching
  – Reachability
2. Hybrid execution engine
  – Graph topology in main memory
  – Graph data in relational database
3. Algebraic transformation
  – Operators
  – Optimizations
4. Experimental evaluation

2. Hybrid Execution Engine
Reachability queries
  – Main-memory algorithms
  – Examples: BFS and Dijkstra’s algorithm
Pattern matching queries
  – Relational database
  – Indexing
    » Example: B-tree
  – Query optimizations
    » Examples: selectivity estimation and join ordering
  – Recursive queries
    » Not efficient: large intermediate results and multiple joins
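As an illustration of the main-memory side, here is a textbook Dijkstra's algorithm over an adjacency list, the kind of routine a shortest-path pattern (Subject ?*PathVar Object) could be dispatched to. It is a Python sketch, not the engine's C++ code; non-negative edge weights are an assumption (with every weight equal to 1 it returns a hop-shortest path), and the graph and names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Adjacency list: node -> list of (neighbor, non-negative edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: return (total weight, path) from src to dst, or None if unreachable."""
    dist: Dict[str, float] = {src: 0.0}
    parent: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter distance was already recorded
        if node == dst:
            # Follow parent pointers back to src.
            path = [node]
            while path[-1] != src:
                path.append(parent[path[-1]])
            return d, path[::-1]
        for nxt, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                parent[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    return None


# Hypothetical weighted example graph.
WEIGHTED: Graph = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)]}

if __name__ == "__main__":
    print(shortest_path(WEIGHTED, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

Doing this over an in-memory adjacency list avoids the repeated self-joins that a recursive SQL formulation would need.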

Graph Representation
Node label table (ID, Value): (1, John), (2, Paper 2), (3, Alice), (4, Microsoft), (5, VLDB’12), (6, Paper 1), (7, UNSW), (8, Smith)
Attribute tables, one per attribute name (ID, Value):
  – office: (8, 518)
  – location: (3, Sydney), (5, Istanbul)
  – keyword: (2, XML), (6, graph)
  – type: (2, Demo)
  – country: (4, USA), (7, Australia)
  – title: (3, Senior Researcher), (8, Professor)
  – age, established, order, month: rows not shown
Edge tables, one per edge label (eID, sID, dID):
  – citedBy: (9, 6, 2)
  – supervise: (7, 3, 8)
  – know: (2, 1, 3)
  – authorOf, affiliated, published: rows not shown
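The layout above (one label table, one (ID, Value) table per attribute, and one (eID, sID, dID) table per edge label) can be sketched with Python's built-in sqlite3 module standing in for IBM DB2. The table name NodeLabel and the helper functions are hypothetical; the rows are read off the example tables, decoding edge rows such as "962" as eID 9, sID 6, dID 2, which is an assumption. The query shows how a structural pattern plus a node attribute becomes a single SQL join.

```python
import sqlite3
from typing import List, Tuple

# Rows as read off the example tables; only some attribute and edge tables are included.
NODE_LABELS = [(1, "John"), (2, "Paper 2"), (3, "Alice"), (4, "Microsoft"),
               (5, "VLDB'12"), (6, "Paper 1"), (7, "UNSW"), (8, "Smith")]
NODE_ATTRIBUTES = {
    "office": [(8, "518")],
    "location": [(3, "Sydney"), (5, "Istanbul")],
    "keyword": [(2, "XML"), (6, "graph")],
    "country": [(4, "USA"), (7, "Australia")],
}
EDGE_TABLES = {
    "know": [(2, 1, 3)],
    "supervise": [(7, 3, 8)],
    "citedBy": [(9, 6, 2)],
}


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE NodeLabel (ID INTEGER PRIMARY KEY, Value TEXT)")
    conn.executemany("INSERT INTO NodeLabel VALUES (?, ?)", NODE_LABELS)
    # One two-column (ID, Value) table per attribute name.
    for attr, rows in NODE_ATTRIBUTES.items():
        conn.execute(f"CREATE TABLE {attr} (ID INTEGER, Value TEXT)")
        conn.executemany(f"INSERT INTO {attr} VALUES (?, ?)", rows)
    # One three-column (eID, sID, dID) table per edge label.
    for label, rows in EDGE_TABLES.items():
        conn.execute(f"CREATE TABLE {label} (eID INTEGER PRIMARY KEY, sID INTEGER, dID INTEGER)")
        conn.executemany(f"INSERT INTO {label} VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def known_people_with_location(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    """Pattern '?X know ?Y' with ?X labeled `name`, plus ?Y's location, as one SQL join."""
    sql = """
        SELECT dst.Value, loc.Value
        FROM NodeLabel AS src
        JOIN know AS k ON k.sID = src.ID
        JOIN NodeLabel AS dst ON dst.ID = k.dID
        JOIN location AS loc ON loc.ID = dst.ID
        WHERE src.Value = ?
    """
    return conn.execute(sql, (name,)).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(known_people_with_location(db, "John"))  # [('Alice', 'Sydney')]
```

Because each attribute and edge label gets its own narrow table, a pattern only touches the tables it mentions, and the database's indexes and join ordering apply directly.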

Hybrid Execution Engine: Interfaces
(Figure: the engine takes a G-SPARQL query and issues SQL commands and traversal operations)

3. Intermediate Language & Compilation
(Figure: G-SPARQL query → Step 1, front-end compilation → algebraic query plan → Step 2, back-end compilation → physical execution plan of SQL commands and traversal operations)

Intermediate Language
Objective
  – Generate a query plan and split it
    » Reachability part -> main-memory algorithms on the topology
    » Pattern matching part -> relational database
  – Optimizations
Features
  – Independent of the execution engine and the graph representation
  – Algebraic query plan

G-SPARQL Algebra
Variant of “Tuple Algebra”
Algebra details
  – Data: tuples
    » Sets of nodes, edges, and paths
  – Operators
    » Relational: select, project, join
    » Graph-specific: node and edge attributes, adjacency
    » Path operators

Relational operators (figure)

Relational vs. non-relational operators (figure)

Front-end Compilation (Step 1)
Input
  – G-SPARQL query
Output
  – Algebraic query plan
Technique
  – Map
    » from triple patterns
    » to G-SPARQL operators
  – Use inference rules

Front-end Compilation: Inference Rules

Front-end Compilation: Optimizations
Objective
  – Delay execution of traversal operations
Technique
  – Order triple patterns based on restrictiveness
Heuristics: triple pattern P1 is more restrictive than P2 if
  1. P1 has fewer path variables than P2
  2. P1 has fewer variables than P2
  3. P1’s variables have more filter statements than P2’s variables
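A minimal Python sketch of the ordering step: each triple pattern gets a sort key built from the three heuristics, and patterns are sorted so that the most restrictive come first and path patterns (which become traversal operations) come last. Treating the heuristics as a lexicographic priority (rule 1, then 2, then 3) is an assumption, as are the regular expression used to find variables and the per-variable filter counts (e.g. FILTER(?Age1 >= 40) counts once for ?Age1).

```python
import re
from typing import Dict, List, Tuple

# "?X" is a variable; "??P" and "?*P" are path variables; "?E(label)" declares edge variable "?E".
VAR_RE = re.compile(r"\?[?*]?\w+")


def variables(pattern: str) -> List[str]:
    return VAR_RE.findall(pattern)


def is_path_var(var: str) -> bool:
    return var.startswith("??") or var.startswith("?*")


def restrictiveness_key(pattern: str, filter_counts: Dict[str, int]) -> Tuple[int, int, int]:
    """Smaller key = more restrictive: compare (1) path variables, (2) variables, (3) filters."""
    vs = set(variables(pattern))
    n_path = sum(1 for v in vs if is_path_var(v))
    n_filters = sum(filter_counts.get(v, 0) for v in vs)
    # Fewer path variables, then fewer variables, then MORE filters rank first.
    return (n_path, len(vs), -n_filters)


def order_patterns(patterns: List[str], filter_counts: Dict[str, int]) -> List[str]:
    return sorted(patterns, key=lambda p: restrictiveness_key(p, filter_counts))


if __name__ == "__main__":
    where = ["?X ??P ?Y", "?X affiliated UNSW", "?Y ?E(affiliated) Microsoft"]
    print(order_patterns(where, {}))
    # ['?X affiliated UNSW', '?Y ?E(affiliated) Microsoft', '?X ??P ?Y']
```

Putting the path pattern last means the traversal runs only for node pairs that survived the cheaper, more selective patterns.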

Back-end Compilation (Step 2)
Input
  – G-SPARQL algebraic plan
Output
  – SQL commands
  – Traversal operations
Technique
  – Substitute G-SPARQL relational operators with standard select-project-join (SPJ) operators
  – Traverse the plan
    » Bottom up
    » Stop when reaching the root or a non-relational operator
    » Transform the relational algebra into SQL commands
  – Send non-relational operations to the main-memory algorithms
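The bottom-up traversal can be sketched in Python as a post-order walk that marks each subtree as fully relational or not; the climb stops at the first non-relational operator, and every maximal relational subtree below it becomes one SQL fragment. The operator names and the PlanNode type are hypothetical. The slides do not say where an operator sitting above a traversal is executed, so this sketch simply reports it together with the non-SQL operators.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical operator names; only these are treated as relational (SPJ plus table scan).
RELATIONAL_OPS = {"scan", "select", "project", "join"}


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)


def split_plan(root: PlanNode) -> Tuple[List[PlanNode], List[str]]:
    """Split an algebraic plan into maximal relational subtrees (to become SQL)
    and the remaining operators, which run outside the database."""
    sql_fragments: List[PlanNode] = []
    other_ops: List[str] = []

    def visit(node: PlanNode) -> bool:
        # Post-order (bottom-up): children are classified before their parent.
        child_flags = [visit(c) for c in node.children]
        relational = node.op in RELATIONAL_OPS and all(child_flags)
        if not relational:
            # The upward climb stops here: each fully relational child subtree
            # is emitted as one SQL fragment, and this operator is handed on.
            sql_fragments.extend(c for c, ok in zip(node.children, child_flags) if ok)
            other_ops.append(node.op)
        return relational

    if visit(root):
        sql_fragments.append(root)  # the climb reached the root: the whole plan is SQL
    return sql_fragments, other_ops


if __name__ == "__main__":
    plan = PlanNode("join", [PlanNode("select", [PlanNode("scan")]),
                             PlanNode("path", [PlanNode("scan")])])
    frags, ops = split_plan(plan)
    print([f.op for f in frags], ops)  # ['scan', 'select'] ['path', 'join']
```

Emitting maximal subtrees, rather than one SQL statement per operator, lets the database optimize each relational fragment as a whole.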

Back-end Compilation: Optimizations
Optimize a fragment of the query plan
  – Before generating the SQL command
  – All operators in the fragment are select/project/join
Apply standard techniques
  – For example, pushing selections down
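Pushing selections is one such standard rewrite. Below is a minimal Python sketch on a tiny plan tree: a selection whose predicate refers only to tables on one side of a join is moved below that join, so rows are filtered before they are joined. The plan classes are hypothetical, predicates are opaque strings tagged with the tables they mention, and join conditions are omitted for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Join:
    # Join condition omitted to keep the sketch short.
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Select:
    predicate: str            # kept as opaque text, e.g. "age.Value >= 40"
    tables: FrozenSet[str]    # tables the predicate refers to
    child: "Plan"


Plan = Union[Scan, Join, Select]


def tables_of(plan: Plan) -> FrozenSet[str]:
    if isinstance(plan, Scan):
        return frozenset({plan.table})
    if isinstance(plan, Join):
        return tables_of(plan.left) | tables_of(plan.right)
    return tables_of(plan.child)


def push_selections(plan: Plan) -> Plan:
    """Move each selection below a join when its predicate only touches one join input."""
    if isinstance(plan, Scan):
        return plan
    if isinstance(plan, Join):
        return Join(push_selections(plan.left), push_selections(plan.right))
    child = push_selections(plan.child)
    if isinstance(child, Join):
        if plan.tables <= tables_of(child.left):
            return Join(push_selections(Select(plan.predicate, plan.tables, child.left)), child.right)
        if plan.tables <= tables_of(child.right):
            return Join(child.left, push_selections(Select(plan.predicate, plan.tables, child.right)))
    # The predicate spans both inputs (or there is no join below): keep it here.
    return Select(plan.predicate, plan.tables, child)


if __name__ == "__main__":
    before = Select("age.Value >= 40", frozenset({"age"}), Join(Scan("NodeLabel"), Scan("age")))
    print(push_selections(before))
```

In the example, the age filter ends up directly above the scan of the age table, so only matching rows reach the join.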

Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?L1.
  ?L2.
  ?Age1.
  ?Age2.
  ?X affiliated UNSW.
  ?Y ?E(affiliated) Microsoft.
  ?X livesIn "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
}

Example: Query Plan

4. Experimental Evaluation
Objective
  – Show that the hybrid approach is a good idea
  – Good performance from both the DBMS and the main-memory topology
Data sets
  – Real ACM bibliographic network
  – Synthetic graphs
    » See technical report

Experimental Environment
Workload
  – Created queries Q1 … Q12
Process
  – Compare to Neo4j (non-optimized, optimized)
Environment
  – Implementation
    » Main-memory algorithms in C++
    » IBM DB2
  – PC server

Results on Real Dataset

Response Time on ACM Bibliographic Network

Conclusions
G-SPARQL language
  – Expresses pattern matching and reachability queries on attributed graphs
Hybrid engine
  – Graph topology in main memory
  – Graph data in a relational database
Compilation into an algebraic plan
  – Operators and optimizations
Evaluation
  – Real and synthetic datasets
  – Good performance
    » Leveraging the database engine and the main-memory topology