Published by Adam Willis. Modified over 9 years ago.
1
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
Sherif Sakr (NICTA & UNSW, Sydney, Australia)
Sameh Elnikety (Microsoft Research, Redmond, WA)
Yuxiong He (Microsoft Research, Redmond, WA)
CIKM 2012
2
Example 1: Social Network
3
Example 2: Bibliographical Network
4
Contributions
1. G-SPARQL language
   – Pattern matching
   – Reachability
2. Hybrid execution engine
   – Graph topology in main memory
   – Graph data in relational database
3. Algebraic transformation
   – Operators
   – Optimizations
4. Experimental evaluation
5
1. G-SPARQL Query Language
Extends a subset of SPARQL
   – Based on triple pattern: (subject, predicate, object)
Sub-graph matching patterns on
   – Graph structure
   – Node attribute
   – Edge attribute
Reachability patterns on
   – Path
   – Shortest path
6
G-SPARQL Syntax
7
G-SPARQL Pattern Matching
Node attribute
   – ?Person @officeNumber "518"
Edge attribute
   – ?E @Role "Programmer"
Structural
   – ?Person worksAt Microsoft
   – ?Person ?E(worksAt) Microsoft
8
G-SPARQL Reachability
Path
   – Subject ??PathVar Object
Shortest path
   – Subject ?*PathVar Object
Path filters
   – Path length
   – All edges
   – All nodes
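A minimal sketch of how a shortest-path pattern with a path-length filter could be evaluated over an in-memory topology using breadth-first search. The adjacency-list layout, the function name `shortest_path`, the `max_len` parameter, and the treatment of edges as directed are assumptions made for illustration; the slides do not show the engine's actual data structures.

```python
from collections import deque

def shortest_path(adj, source, target, max_len=None):
    """Return one shortest path (list of node IDs) from source to target,
    or None if no path exists within max_len edges."""
    parent = {source: None}          # also serves as the visited set
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            # Walk parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if max_len is not None and dist == max_len:
            continue                 # path-length filter: do not expand further
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, dist + 1))
    return None

# Hypothetical directed topology: node ID -> list of successor IDs.
adj = {1: [2, 3], 2: [5], 3: [6, 8], 6: [5], 8: [6]}
print(shortest_path(adj, 1, 5))             # [1, 2, 5]
print(shortest_path(adj, 1, 5, max_len=1))  # None
```

Bounding the expansion depth keeps the traversal from exploring parts of the graph that the length filter would reject anyway.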
9
Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @Label ?L1.
  ?Y @Label ?L2.
  ?X @Age ?Age1.
  ?Y @Age ?Age2.
  ?X Affiliated UNSW.
  ?Y ?E(Affiliated) Microsoft.
  ?X LivesIn Sydney.
  ?E @Title "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
  FILTERPATH( Length( ??P, <= 3) ).
}
10
Outline
1. G-SPARQL language
   – Pattern matching
   – Reachability
2. Hybrid execution engine
   – Graph topology in main memory
   – Graph data in relational database
3. Algebraic transformation
   – Operators
   – Optimizations
4. Experimental evaluation
11
2. Hybrid Execution Engine
Reachability queries
   – Main memory algorithms
   – Example: BFS and Dijkstra’s algorithm
Pattern matching queries
   – Relational database
   – Indexing
      » Example: B-tree
   – Query optimizations
      » Example: selectivity estimation and join ordering
   – Recursive queries
      » Not efficient: large intermediate results and multiple joins
12
Graph Representation
Node attribute tables (ID, Value)
   – Label: (1, John), (2, Paper 2), (3, Alice), (4, Microsoft), (5, VLDB’12), (6, Paper 1), (7, UNSW), (8, Smith)
   – age: (1, 45), (3, 42), (8, 28)
   – office: (8, 518)
   – location: (3, Sydney), (5, Istanbul)
   – keyword: (2, XML), (6, graph)
   – type: (2, Demo)
   – country: (4, USA), (7, Australia)
   – established: (4, 1975), (7, 1949)
Edge tables (eID, sID, dID)
   – authorOf: (1, 1, 2), (5, 3, 2), (6, 3, 6), (11, 8, 6)
   – affiliated: (3, 1, 4), (8, 3, 7), (12, 8, 7)
   – published: (4, 2, 5), (10, 6, 5)
   – citedBy: (9, 6, 2)
   – supervise: (7, 3, 8)
   – know: (2, 1, 3)
Edge attribute tables (ID, Value), keyed by eID
   – title: (3, Senior Researcher), (8, Professor)
   – order: (1, 2), (5, 1), (6, 2), (11, 1)
   – month: (4, 3), (10, 1)
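To illustrate how a pattern-matching query could run against this kind of layout, here is a small sketch using Python's built-in `sqlite3` module (the slides use IBM DB2; SQLite is swapped in only so the example is self-contained). It loads a subset of the rows from this slide and hand-translates the pattern `?P ?E(affiliated) UNSW. ?P @age ?A. ?E @title "Professor". FILTER(?A >= 40)` into a join. The column names, the SQL text, and the assumption that a constant such as `UNSW` is resolved through the Label table are illustrative, not the engine's generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One two-column table per attribute, one three-column table per edge type.
cur.executescript("""
CREATE TABLE label      (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE age        (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE affiliated (eid INTEGER PRIMARY KEY, sid INTEGER, did INTEGER);
CREATE TABLE title      (id INTEGER PRIMARY KEY, value TEXT);  -- keyed by edge ID
""")
cur.executemany("INSERT INTO label VALUES (?, ?)",
                [(1, "John"), (3, "Alice"), (4, "Microsoft"), (7, "UNSW"), (8, "Smith")])
cur.executemany("INSERT INTO age VALUES (?, ?)", [(1, 45), (3, 42), (8, 28)])
cur.executemany("INSERT INTO affiliated VALUES (?, ?, ?)",
                [(3, 1, 4), (8, 3, 7), (12, 8, 7)])
cur.executemany("INSERT INTO title VALUES (?, ?)",
                [(3, "Senior Researcher"), (8, "Professor")])

# Hand-written translation of the pattern into select/project/join form.
sql = """
SELECT pl.value, a.value
FROM affiliated e
JOIN label dl ON dl.id = e.did      -- destination node label
JOIN label pl ON pl.id = e.sid      -- source node label
JOIN age   a  ON a.id  = e.sid      -- node attribute
JOIN title t  ON t.id  = e.eid      -- edge attribute
WHERE dl.value = 'UNSW' AND t.value = 'Professor' AND a.value >= 40
"""
rows = cur.execute(sql).fetchall()
print(rows)  # [('Alice', 42)]
```

Each triple pattern becomes one join against a narrow table, which is the kind of workload where B-tree indexes and join ordering in the database help.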
13
Hybrid Execution Engine: interfaces
[Diagram labels: G-SPARQL query; SQL commands; Traversal operations]
14
3. Intermediate Language & Compilation
[Diagram: G-SPARQL query -> front-end compilation (Step 1) -> algebraic query plan -> back-end compilation (Step 2) -> physical execution plan (SQL commands, traversal operations)]
15
Intermediate Language
Objective
   – Generate query plan and chop it
      » Reachability part -> main-memory algorithms on topology
      » Pattern matching part -> relational database
   – Optimizations
Features
   – Independent of execution engine and graph representation
   – Algebraic query plan
16
G-SPARQL Algebra
Variant of “Tuple Algebra”
Algebra details
   – Data: tuples
      » Sets of nodes, edges, paths
   – Operators
      » Relational: select, project, join
      » Graph specific: node and edge attributes, adjacency
      » Path operators
17
[Figure label: Relational]
18
[Figure labels: Relational; NOT Relational]
19
Front-end Compilation (Step 1)
Input
   – G-SPARQL query
Output
   – Algebraic query plan
Technique
   – Map
      » From triple patterns
      » To G-SPARQL operators
   – Use inference rules
20
Front-end Compilation: Inference Rules
21
Front-end Compilation: Optimizations
Objective
   – Delay execution of traversal operations
Technique
   – Order triple patterns based on restrictiveness
Heuristics
   – Triple pattern P1 is more restrictive than P2 if
      1. P1 has fewer path variables than P2
      2. P1 has fewer variables than P2
      3. P1’s variables have more filter statements than P2’s variables
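The three heuristics can be sketched as a sort key. This is only an illustration: the slide lists the heuristics but does not say how they are combined, so applying them lexicographically (1, then 2, then 3) is an assumption, as are the tuple representation of triple patterns and the helper names `var_name` and `restrictiveness_key`.

```python
def var_name(term):
    """Strip an edge-type annotation, e.g. '?E(affiliated)' -> '?E'."""
    return term.split("(")[0]

def restrictiveness_key(pattern, filters_per_var):
    """Smaller key = more restrictive pattern (assumed lexicographic order)."""
    variables = [var_name(t) for t in pattern if t.startswith("?")]
    path_vars = [v for v in variables if v.startswith(("??", "?*"))]
    n_filters = sum(filters_per_var.get(v, 0) for v in variables)
    # 1. fewer path variables, 2. fewer variables, 3. more filters
    return (len(path_vars), len(variables), -n_filters)

# Triple patterns as (subject, predicate, object) tuples.
patterns = [
    ("?X", "??P", "?Y"),
    ("?X", "@label", "?L1"),
    ("?X", "@age", "?Age1"),
    ("?X", "affiliated", "UNSW"),
    ("?Y", "?E(affiliated)", "Microsoft"),
]
filters_per_var = {"?Age1": 1, "?Age2": 1}  # one FILTER on each age variable

ordered = sorted(patterns, key=lambda p: restrictiveness_key(p, filters_per_var))
for p in ordered:
    print(p)
```

With this key the reachability pattern `?X ??P ?Y` sorts last, which matches the stated objective of delaying traversal operations.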
22
Back-end Compilation (Step 2)
Input
   – G-SPARQL algebraic plan
Output
   – SQL commands
   – Traversal operations
Technique
   – Substitute G-SPARQL relational operators with SPJ
   – Traverse
      » Bottom up
      » Stop when reaching root or reaching non-relational operator
      » Transform relational algebra to SQL commands
   – Send non-relational commands to main memory algorithms
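The bottom-up traversal can be sketched as a post-order walk that grows a relational fragment upward until it hits the root or a non-relational operator. The `Op` class, the operator names (including `"path"`), and the function names are hypothetical; the sketch only identifies the cut points and does not generate SQL.

```python
from dataclasses import dataclass, field

RELATIONAL = {"scan", "select", "project", "join"}  # assumed operator names

@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)

def split_plan(op, sql_fragments, traversal_ops):
    """Post-order walk; returns True if the subtree at op is purely relational."""
    child_flags = [split_plan(c, sql_fragments, traversal_ops) for c in op.children]
    if op.name in RELATIONAL and all(child_flags):
        return True  # keep growing the relational fragment upward
    # The fragment stops here: each purely relational child is one SQL unit.
    for child, flag in zip(op.children, child_flags):
        if flag:
            sql_fragments.append(child)
    if op.name not in RELATIONAL:
        traversal_ops.append(op)
    # A relational operator above a traversal operator lands in neither list;
    # how such operators are executed is outside this sketch.
    return False

def compile_backend(root):
    sql_fragments, traversal_ops = [], []
    if split_plan(root, sql_fragments, traversal_ops):
        sql_fragments.append(root)  # the whole plan is relational
    return sql_fragments, traversal_ops

# Hypothetical plan: a path operator fed by two relational subtrees.
left = Op("select", [Op("join", [Op("scan"), Op("scan")])])
right = Op("select", [Op("scan")])
root = Op("path", [left, right])

sql, trav = compile_backend(root)
print([f.name for f in sql])   # ['select', 'select']
print([t.name for t in trav])  # ['path']
```

Each entry in `sql` is a maximal select/project/join subtree, so it can be optimized and translated as a single SQL command.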
23
Back-end Compilation: Optimizations
Optimize a fragment of query plan
   – Before generating SQL command
All operators are Select/Project/Join
Apply standard techniques
   – For example, pushing selection
24
Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @label ?L1.
  ?Y @label ?L2.
  ?X @age ?Age1.
  ?Y @age ?Age2.
  ?X affiliated UNSW.
  ?Y ?E(affiliated) Microsoft.
  ?X livesIn Sydney.
  ?E @title "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
}
25
Example: Query Plan
26
4. Experimental Evaluation
Objective
   – Show that the hybrid approach is a good idea
   – Good performance from DBMS and main memory topology
Data sets
   – Real ACM bibliographic network
   – Synthetic graphs
      » See technical report
27
Experimental Environment
Workload
   – Created Q1 … Q12
Process
   – Compare to Neo4J (non-optimized, optimized)
Environment
   – Implementation
      » Main memory algorithms in C++
      » IBM DB2
   – PC Server
28
Results on Real Dataset
29
Response time on ACM Bibliographic Network
30
Conclusions
G-SPARQL Language
   – Expresses pattern matching and reachability queries on attributed graphs
Hybrid engine
   – Graph topology in main memory
   – Graph data in database
Compilation into algebraic plan
   – Operators and optimizations
Evaluation
   – Real and synthetic datasets
   – Good performance
      » Leveraging database engine and main memory topology