Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,


1 Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2011-2324P
Triangle Finding: How Graph Theory can Help the Semantic Web
Edward Jimenez, Eric Goodman

2 The Semantic Web as a Graph

3

4 Optimizing Queries with Graph Theory
- Graph theory has a lot to offer the semantic web
- One example: triangle finding
  - O(|E|^1.5)
  - Much more efficient than what a typical database would do.

Query2
    SELECT ?X, ?Y, ?Z
    WHERE {
      ?X rdf:type ub:GraduateStudent.
      ?Y rdf:type ub:University.
      ?Z rdf:type ub:Department.
      ?X ub:memberOf ?Z.
      ?Z ub:subOrganizationOf ?Y.
      ?X ub:undergraduateDegreeFrom ?Y}

Query9
    SELECT ?X, ?Y, ?Z
    WHERE {
      ?X rdf:type ub:Student.
      ?Y rdf:type ub:Faculty.
      ?Z rdf:type ub:Course.
      ?X ub:advisor ?Y.
      ?Y ub:teacherOf ?Z.
      ?X ub:takesCourse ?Z}

5 Experiment
- Compare these three approaches, finding all triangles in a graph
  - Sesame
  - Jena
  - MultiThreaded Graph Library (MTGL)
- MTGL
  - Open source library of graph algorithms, targeted towards shared memory supercomputers
  - Used MTGL’s implementation of J. Cohen’s triangle finding algorithm
  - Had to modify slightly to allow for multiple edges between vertices.

6 Data
- Data: a Recursive Matrix (R-MAT) graph
- Specify
  - |V|
  - edge factor (average number of edges per vertex)
  - Probabilities a, b, c, d, where a+b+c+d=1.
- Has properties similar to real-world graphs such as short diameters and small-world properties.
- Used as basis of Graph500 benchmark.
- Nodes are given a unique IRI and edges are given a random value.
- |V| = 2^5 through 2^19
- Edge factor: {16, 32, 64}
[Figure: matrix divided into quadrants labeled a, b, c, d]
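The R-MAT parameters listed on this slide can be illustrated with a small generator. This is a minimal sketch, not the generator used in the experiment: the function name `rmat_edges`, the `scale` parameter (so that |V| = 2^scale), the seed, and the default probabilities (the Graph500 values 0.57, 0.19, 0.19, 0.05) are all assumptions, since the slides do not state which probabilities were used.

```python
import random

def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, d=0.05, seed=0):
    """Return a list of directed (src, dst) edges for an R-MAT graph.

    Assumed interface: |V| = 2**scale and |E| = edge_factor * |V|.
    The default a, b, c, d are the Graph500 values, not values from the slides.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("a + b + c + d must equal 1")
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        src = dst = 0
        # Each level picks one quadrant of the adjacency matrix and
        # contributes one bit to the source and destination ids.
        for _ in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass              # top-left quadrant: both bits 0
            elif r < a + b:
                dst |= 1          # top-right quadrant
            elif r < a + b + c:
                src |= 1          # bottom-left quadrant
            else:
                src |= 1          # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges

# Smallest configuration mentioned on the slide: |V| = 2^5, edge factor 16.
sample = rmat_edges(scale=5, edge_factor=16)
```

Because quadrants are drawn independently for every edge, the output can contain repeated edges and self-loops, which is consistent with the slide's remark that the triangle finder had to tolerate multiple edges between vertices.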

7 Possible Triangles

8 Trying to Find Triangles via SPARQL
    SELECT ?X ?Y ?Z
    WHERE {
      {?X ?a ?Y. ?Y ?b ?Z. ?Z ?c ?X}
      UNION {?Y ?a ?X. ?Z ?b ?Y. ?X ?c ?Z}
      UNION {?X ?a ?Y. ?Y ?b ?Z. ?X ?c ?Z}
      UNION {?X ?a ?Y. ?Z ?b ?Y. ?X ?c ?Z}
      UNION {?Y ?a ?X. ?Y ?b ?Z. ?X ?c ?Z}
      UNION {?Y ?a ?X. ?Z ?b ?Y. ?Z ?c ?X}
      UNION {?X ?a ?Y. ?Z ?b ?Y. ?Z ?c ?X}
      UNION {?Y ?a ?X. ?Y ?b ?Z. ?Z ?c ?X}}
Redundant Solutions

9 The Problem: Graph Isomorphism
[Figure: triangle patterns iii and iv over ?X, ?Y, ?Z, each matched against a triangle on Alice, Bob, Charlie]
- One binding: ?X = Alice, ?Y = Bob, ?Z = Charlie
- The other binding: ?X = Alice, ?Y = Charlie, ?Z = Bob

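10 The Other Problem: Automorphism
[Figure: triangle pattern i over ?X, ?Y, ?Z, matched twice against the same triangle on Alice, Bob, Charlie]
- One binding: ?X = Alice, ?Y = Bob, ?Z = Charlie
- Another binding: ?X = Charlie, ?Y = Alice, ?Z = Bob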

11 Possible Triangles

12 The SPARQL Query
    SELECT ?X ?Y ?Z
    WHERE {
      { ?X ?a ?Y. ?Y ?b ?Z. ?Z ?c ?X
        FILTER (STR(?X) < STR(?Y))
        FILTER (STR(?Y) < STR(?Z)) }
      UNION
      { ?X ?a ?Y. ?Y ?b ?Z. ?Z ?c ?X
        FILTER (STR(?Y) > STR(?Z))
        FILTER (STR(?Z) > STR(?X)) }
      UNION
      { ?X ?a ?Y. ?Y ?b ?Z. ?X ?c ?Z }}
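The effect of the two FILTER branches on the cyclic pattern can be checked with a short sketch. It is an illustration only: the function `cyclic_matches` and the three-person graph are hypothetical, and Python string comparison stands in for the STR() comparison in the query.

```python
from itertools import permutations

# Hypothetical example graph: one directed 3-cycle.
edges = {("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")}
vertices = {v for edge in edges for v in edge}

def cyclic_matches(edges, vertices, ordered=False):
    """Bindings (x, y, z) for the pattern x -> y, y -> z, z -> x.

    With ordered=True, keep only bindings that satisfy one of the two
    orderings used in the query: x < y < z, or x < z < y.
    """
    matches = []
    for x, y, z in permutations(sorted(vertices), 3):
        if (x, y) in edges and (y, z) in edges and (z, x) in edges:
            if ordered and not (x < y < z or x < z < y):
                continue
            matches.append((x, y, z))
    return matches

unfiltered = cyclic_matches(edges, vertices)
filtered = cyclic_matches(edges, vertices, ordered=True)
print(len(unfiltered), len(filtered))  # prints "3 1"
```

Without the ordering constraint the single cycle is reported once per rotation. Both orderings force ?X to be the smallest vertex, so exactly one rotation survives. The third branch of the query needs no filter because the pattern ?X to ?Y, ?Y to ?Z, ?X to ?Z has no non-trivial automorphism.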

13 Cohen’s Triangle Algorithm
- Assumptions
  - Simplified graph
  - Completely connected
- Map 1: O(m)
  - Use v_1 < v_2 < ··· < v_n for tie-breaking

14 Cohen’s Triangle Algorithm
- Reduce: O(m^(3/2)), …

15 Cohen’s Triangle Algorithm
- Map 2: O(m^(3/2))
  - Identity mapping of previous reduce step.
  - Map edges
- Reduce 2: O(m^(3/2))
  - Emit triangles for the contents of each bin when the edge exists between v_i and v_j.
[Figure: a bin for the vertex pair (v8, v20) holding records through v1, v3, v2, …, together with the edge (v8, v20)]
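The map and reduce steps on slides 13 to 15 can be sketched sequentially in plain Python. This is a sketch based on one reading of Cohen's published algorithm, not the MTGL implementation used in the experiment. The function name `cohen_triangles` is hypothetical, and binning each edge under its lower-degree endpoint in Map 1 is an assumption, since the slides only state the tie-breaking rule.

```python
from collections import defaultdict
from itertools import combinations

def cohen_triangles(edges):
    """Return the set of triangles (as sorted vertex triples) in an undirected graph."""
    # Simplify: drop self-loops, direction and duplicate edges.
    simple = {tuple(sorted((u, v))) for u, v in edges if u != v}

    degree = defaultdict(int)
    for u, v in simple:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        # Lower degree first; the vertex order breaks ties.
        return (degree[v], v)

    # Map 1: bin every edge under its lower-ranked endpoint.
    bins = defaultdict(list)
    for u, v in simple:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        bins[low].append(high)

    # Reduce 1: for each bin, emit every pair of neighbours (an open triad),
    # keyed by the pair of outer vertices.
    triads = defaultdict(list)
    for apex, neighbours in bins.items():
        for a, b in combinations(neighbours, 2):
            triads[tuple(sorted((a, b)))].append(apex)

    # Map 2 and Reduce 2: a bin keyed by (a, b) yields triangles only
    # when the edge between a and b exists.
    triangles = set()
    for (a, b), apexes in triads.items():
        if (a, b) in simple:
            for apex in apexes:
                triangles.add(tuple(sorted((apex, a, b))))
    return triangles

# Complete graph on four vertices, with one repeated and one reversed edge.
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (2, 1), (3, 4)]
found = cohen_triangles(k4)
```

Every triangle has exactly one lowest-ranked vertex, and both of its triangle edges land in that vertex's bin, so each triangle is produced from exactly one open triad. Binning by the lower-ranked endpoint also keeps high-degree vertices from generating a quadratic number of triads.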

16 Results: Growth of Triangles

17 Results

18 Comparison at Larger Scales
- With 1 billion edges, assuming the same constant
  - An O(x^1.39) implementation versus an O(x^1.58) is 50x faster
  - An O(x^1.39) implementation versus an O(x^1.83) is 9000x faster
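The two ratios follow from dividing the growth rates at x = 10^9 with the constants cancelled. A quick check, where the helper name `speedup` is hypothetical:

```python
def speedup(edges, slow_exponent, fast_exponent):
    """Ratio edges**slow_exponent / edges**fast_exponent, same constant assumed."""
    return edges ** (slow_exponent - fast_exponent)

EDGES = 1e9  # 1 billion edges, as on the slide

ratio_158 = speedup(EDGES, 1.58, 1.39)  # 10**(9 * 0.19) = 10**1.71
ratio_183 = speedup(EDGES, 1.83, 1.39)  # 10**(9 * 0.44) = 10**3.96
print(round(ratio_158), round(ratio_183))
```

The results are roughly 51 and 9,100, which the slide rounds to 50x and 9000x.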

19 Conclusions
- The Semantic Web is a graph
- Graph theory can add a lot in terms of speeding up queries
- It also has other approaches for analyzing the data
- SPARQL has unexpected issues when graph isomorphism or automorphisms arise.

