G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs. Sherif Sakr (NICTA & UNSW, Sydney, Australia), Sameh Elnikety and Yuxiong He (Microsoft Research, Redmond, WA)


1 G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
Sherif Sakr (NICTA & UNSW, Sydney, Australia)
Sameh Elnikety (Microsoft Research, Redmond, WA)
Yuxiong He (Microsoft Research, Redmond, WA)
CIKM 2012

2 Example 1: Social Network

3 Example 2: Bibliographical Network

4 Contributions
1. G-SPARQL language
– Pattern matching
– Reachability
2. Hybrid execution engine
– Graph topology in main memory
– Graph data in relational database
3. Algebraic transformation
– Operators
– Optimizations
4. Experimental evaluation

5 1. G-SPARQL Query Language
Extends a subset of SPARQL
– Based on triple pattern: (subject, predicate, object)
Sub-graph matching patterns on
– Graph structure
– Node attribute
– Edge attribute
Reachability patterns on
– Path
– Shortest path

6 G-SPARQL Syntax

7 G-SPARQL Pattern Matching
Node attribute
– ?Person @officeNumber “518”
Edge attribute
– ?E @Role “Programmer”
Structural
– ?Person worksAt Microsoft
– ?Person ?E(worksAt) Microsoft

8 G-SPARQL Reachability
Path
– Subject ??PathVar Object
Shortest path
– Subject ?*PathVar Object
Path filters
– Path length
– All edges
– All nodes
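A shortest-path pattern combined with a path-length filter can be pictured as a bounded breadth-first search. The following is only a minimal sketch under stated assumptions: the adjacency dictionary, the function name `shortest_path_within`, and the choice to follow edges in their stored direction are illustrative and not taken from the slides.

```python
from collections import deque


def shortest_path_within(adj, source, target, max_len):
    """Return a shortest path (list of nodes) from source to target that
    uses at most max_len edges, or None if no such path exists.

    adj is a hypothetical in-memory topology: node -> iterable of successors.
    """
    parent = {source: None}          # also serves as the visited set
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth == max_len:
            continue                 # length filter: do not expand further
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, depth + 1))
    return None
```

Calling `shortest_path_within(adj, x, y, 3)` corresponds roughly to a pattern `x ?*P y` with a length bound of 3 on `?*P`.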

9 Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @Label ?L1.
  ?Y @Label ?L2.
  ?X @Age ?Age1.
  ?Y @Age ?Age2.
  ?X Affiliated UNSW.
  ?Y ?E(Affiliated) Microsoft.
  ?X LivesIn Sydney.
  ?E @Title "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
  FILTERPATH( Length( ??P, <= 3) ).
}

10 Outline
1. G-SPARQL language
– Pattern matching
– Reachability
2. Hybrid execution engine
– Graph topology in main memory
– Graph data in relational database
3. Algebraic transformation
– Operators
– Optimizations
4. Experimental evaluation

11 2. Hybrid Execution Engine
Reachability queries
– Main memory algorithms
– Example: BFS and Dijkstra’s algorithm
Pattern matching queries
– Relational database
– Indexing
  » Example: B-tree
– Query optimizations
  » Example: selectivity estimation and join ordering
– Recursive queries
  » Not efficient: large intermediate results and multiple joins
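Dijkstra's algorithm, named above as one of the main-memory algorithms, can be sketched over an in-memory adjacency structure as follows. This is an illustrative sketch, not the engine's code: the slides do not say how edge weights are defined, so the `(neighbor, weight)` pairs and the function name `dijkstra` are assumptions.

```python
import heapq


def dijkstra(adj, source):
    """Return a dict of shortest distances from source to every reachable node.

    adj is a hypothetical in-memory topology:
    node -> list of (neighbor, non-negative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                 # stale heap entry, already improved
        for nxt, weight in adj.get(node, ()):
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```

Running such traversals directly on the in-memory topology avoids the repeated self-joins that a recursive relational query would need.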

12 Graph Representation
Node attribute tables (ID, Value)
– Label: (1, John), (2, Paper 2), (3, Alice), (4, Microsoft), (5, VLDB’12), (6, Paper 1), (7, UNSW), (8, Smith)
– age: (1, 45), (3, 42), (8, 28)
– office: (8, 518)
– location: (3, Sydney), (5, Istanbul)
– keyword: (2, XML), (6, graph)
– type: (2, Demo)
– country: (4, USA), (7, Australia)
– established: (4, 1975), (7, 1949)
Edge tables (eID, sID, dID)
– authorOf: (1, 1, 2), (5, 3, 2), (6, 3, 6), (11, 8, 6)
– affiliated: (3, 1, 4), (8, 3, 7), (12, 8, 7)
– published: (4, 2, 5), (10, 6, 5)
– citedBy: (9, 6, 2)
– supervise: (7, 3, 8)
– know: (2, 1, 3)
Edge attribute tables (ID, Value)
– title: (3, Senior Researcher), (8, Professor)
– order: (1, 2), (5, 1), (6, 2), (11, 1)
– month: (4, 3), (10, 1)
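To make the decomposed layout concrete, the sketch below loads a few rows that can be read off this slide into an in-memory SQLite database and answers a pattern such as `?Person ?E(affiliated) UNSW. ?E @title "Professor"` with joins. SQLite stands in for the relational back end purely for illustration; the table names `node_label`, `affiliated`, `edge_title` and the helper `professors_at` are assumptions, not the engine's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per node attribute and one per edge label (hypothetical names).
CREATE TABLE node_label (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE affiliated (eid INTEGER PRIMARY KEY, sid INTEGER, did INTEGER);
CREATE TABLE edge_title (id INTEGER PRIMARY KEY, value TEXT);

INSERT INTO node_label VALUES
  (1, 'John'), (3, 'Alice'), (4, 'Microsoft'), (7, 'UNSW'), (8, 'Smith');
INSERT INTO affiliated VALUES (3, 1, 4), (8, 3, 7), (12, 8, 7);
INSERT INTO edge_title VALUES (3, 'Senior Researcher'), (8, 'Professor');
""")


def professors_at(org):
    """Labels of nodes affiliated with `org` via an edge titled 'Professor'."""
    rows = conn.execute(
        """
        SELECT person.value
        FROM affiliated AS a
        JOIN node_label AS person ON person.id = a.sid
        JOIN node_label AS target ON target.id = a.did
        JOIN edge_title AS t      ON t.id = a.eid
        WHERE target.value = ? AND t.value = 'Professor'
        """,
        (org,),
    ).fetchall()
    return [r[0] for r in rows]
```

Each attribute or edge label touched by the pattern adds one join, which is the kind of work a relational optimizer can reorder and index.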

13 Hybrid Execution Engine: Interfaces
[Figure: a G-SPARQL query is translated into SQL commands and traversal operations]

14 3. Intermediate Language & Compilation
G-SPARQL query
– Step 1: Front-end compilation -> Algebraic query plan
– Step 2: Back-end compilation -> Physical execution plan (SQL commands, traversal operations)

15 Intermediate Language
Objective
– Generate query plan and split it into two parts
  » Reachability part -> main-memory algorithms on topology
  » Pattern matching part -> relational database
– Optimizations
Features
– Independent of execution engine and graph representation
– Algebraic query plan

16 G-SPARQL Algebra
Variant of “Tuple Algebra”
Algebra details
– Data: tuples
  » Sets of nodes, edges, paths
– Operators
  » Relational: select, project, join
  » Graph specific: node and edge attributes, adjacency
  » Path operators

17 [Figure: G-SPARQL algebra operators, marked “Relational”]

18 [Figure: G-SPARQL algebra operators, marked “Relational” and “NOT Relational”]

19 Front-end Compilation (Step 1)
Input
– G-SPARQL query
Output
– Algebraic query plan
Technique
– Map triple patterns to G-SPARQL operators
– Use inference rules

20 Front-end Compilation: Inference Rules

21 Front-end Compilation: Optimizations
Objective
– Delay execution of traversal operations
Technique
– Order triple patterns, based on restrictiveness
Heuristics
– Triple pattern P1 is more restrictive than P2 if:
  1. P1 has fewer path variables than P2
  2. P1 has fewer variables than P2
  3. P1’s variables have more filter statements than P2’s variables
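The three heuristics can be read as a sort key over triple patterns. The sketch below is one possible interpretation: it assumes the heuristics are applied in the listed order as tie-breakers, represents a pattern as a tuple of three strings, and takes the number of FILTER statements per variable as a plain dictionary. The function names and this representation are hypothetical.

```python
def restrictiveness_key(pattern, filter_counts):
    """Smaller key = more restrictive pattern (assumed lexicographic order)."""
    # Path variables are written ??P (path) or ?*P (shortest path).
    path_vars = [t for t in pattern if t.startswith(("??", "?*"))]
    # Any token starting with "?" is a variable; strip an edge label
    # such as "(affiliated)" from "?E(affiliated)".
    variables = [t.split("(")[0] for t in pattern if t.startswith("?")]
    filters = sum(filter_counts.get(v, 0) for v in variables)
    return (len(path_vars), len(variables), -filters)


def order_patterns(patterns, filter_counts):
    """Return patterns sorted from most to least restrictive."""
    return sorted(patterns, key=lambda p: restrictiveness_key(p, filter_counts))
```

With this ordering, a pattern such as `?X affiliated UNSW` is placed first and the path pattern `?X ??P ?Y` last, which matches the stated objective of delaying traversal operations.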

22 Back-end Compilation (Step 2)
Input
– G-SPARQL algebraic plan
Output
– SQL commands
– Traversal operations
Technique
– Substitute G-SPARQL relational operators with SPJ
– Traverse
  » Bottom up
  » Stop when reaching root or reaching non-relational operator
  » Transform relational algebra to SQL commands
– Send non-relational commands to main memory algorithms
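The bottom-up traversal can be sketched as a walk over the plan tree that collects maximal all-relational subtrees (candidates for a single SQL command) and lists every other operator separately. This is a simplified illustration: the `Op` class, the `relational` flag, and `split_plan` are hypothetical names, and the sketch does not model how results flow between the two parts.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A plan operator; `relational` marks select/project/join style operators."""
    name: str
    relational: bool
    children: list = field(default_factory=list)


def collect(op):
    """Names of all operators in the subtree rooted at op (pre-order)."""
    names = [op.name]
    for child in op.children:
        names.extend(collect(child))
    return names


def split_plan(root):
    """Return (sql_fragments, remaining_ops).

    sql_fragments: one list of operator names per maximal relational subtree.
    remaining_ops: non-relational operators and anything sitting above them.
    """
    fragments, remaining = [], []

    def visit(op):
        flags = [visit(child) for child in op.children]   # bottom up
        if op.relational and all(flags):
            return True            # subtree is still purely relational
        # Stop here: close each purely relational child subtree as a fragment.
        for child, ok in zip(op.children, flags):
            if ok:
                fragments.append(collect(child))
        remaining.append(op.name)
        return False

    if visit(root):                # reached the root without stopping
        fragments.append(collect(root))
    return fragments, remaining
```

For a plan whose root is a path operator over a join of two selections, the join subtree becomes one SQL fragment and the path operator is left for the main-memory side.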

23 Back-end Compilation: Optimizations
Optimize a fragment of query plan
– Before generating SQL command
All operators are Select/Project/Join
Apply standard techniques
– For example, pushing selection

24 Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @label ?L1.
  ?Y @label ?L2.
  ?X @age ?Age1.
  ?Y @age ?Age2.
  ?X affiliated UNSW.
  ?Y ?E(affiliated) Microsoft.
  ?X livesIn Sydney.
  ?E @title "Researcher".
  FILTER(?Age1 >= 40).
  FILTER(?Age2 >= 40).
}

25 Example: Query Plan

26 4. Experimental Evaluation
Objective
– Show that the hybrid approach is a good idea
– Good performance from DBMS and main memory topology
Data sets
– Real ACM bibliographic network
– Synthetic graphs
  » See technical report

27 Experimental Environment
Workload
– Created Q1 … Q12
Process
– Compare to Neo4j (non-optimized, optimized)
Environment
– Implementation
  » Main memory algorithms in C++
  » IBM DB2
– PC Server

28 Results on Real Dataset

29 Response time on ACM Bibliographic Network

30 Conclusions
G-SPARQL Language
– Expresses pattern matching and reachability queries on attributed graphs
Hybrid engine
– Graph topology in main memory
– Graph data in database
Compilation into algebraic plan
– Operators and optimizations
Evaluation
– Real and synthetic datasets
– Good performance
  » Leveraging database engine and main memory topology

