Published by Mavis Cummings. Modified over 9 years ago.
1
Triangle Finding: How Graph Theory can Help the Semantic Web
Edward Jimenez, Eric Goodman
SAND2011-2324P
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2
The Semantic Web as a Graph
4
Optimizing Queries with Graph Theory
Graph theory has a lot to offer the semantic web.
One example: triangle finding in O(|E|^1.5) time, much more efficient than what a typical database would do.
Query2
  SELECT ?X ?Y ?Z
  WHERE {
    ?X rdf:type ub:GraduateStudent .
    ?Y rdf:type ub:University .
    ?Z rdf:type ub:Department .
    ?X ub:memberOf ?Z .
    ?Z ub:subOrganizationOf ?Y .
    ?X ub:undergraduateDegreeFrom ?Y
  }
Query9
  SELECT ?X ?Y ?Z
  WHERE {
    ?X rdf:type ub:Student .
    ?Y rdf:type ub:Faculty .
    ?Z rdf:type ub:Course .
    ?X ub:advisor ?Y .
    ?Y ub:teacherOf ?Z .
    ?X ub:takesCourse ?Z
  }
5
Experiment
Compare these three approaches for finding all triangles in a graph:
  Sesame
  Jena
  MultiThreaded Graph Library (MTGL)
MTGL
  Open source library of graph algorithms, targeted towards shared memory supercomputers.
  Used MTGL’s implementation of J. Cohen’s triangle finding algorithm.
  Had to modify it slightly to allow for multiple edges between vertices.
6
Data
Data: A Recursive Matrix (R-MAT) graph
  Specify |V|, the edge factor (average number of edges per vertex), and probabilities a, b, c, d, where a + b + c + d = 1.
  Has properties similar to real-world graphs, such as short diameters and small-world properties.
  Used as the basis of the Graph500 benchmark.
Nodes are given a unique IRI and edges are given a random value.
|V| = 2^5 to 2^19
Edge factor: {16, 32, 64}
[Figure: matrix quadrants labeled a, b, c, d]
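The R-MAT construction described on this slide can be sketched in a few lines. This is a minimal illustration, not the generator used in the experiment: the function name `rmat_edges`, the `scale` parameter (so that |V| = 2^scale), the `seed` argument, and the mapping of a, b, c, d to the top-left, top-right, bottom-left, and bottom-right quadrants are all assumptions. The probabilities in the usage line are the Graph500 defaults, chosen only for illustration; the slides do not state which values were used.

```python
import random


def rmat_edges(scale, edge_factor, a, b, c, d, seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale vertices.

    Each edge is placed by descending `scale` levels of the adjacency
    matrix, picking one quadrant per level with probabilities a, b, c, d.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("a + b + c + d must equal 1")
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        row = col = 0
        for _ in range(scale):
            r = rng.random()
            row <<= 1
            col <<= 1
            if r < a:
                pass                  # quadrant a: neither bit set (assumed top-left)
            elif r < a + b:
                col |= 1              # quadrant b: column bit set
            elif r < a + b + c:
                row |= 1              # quadrant c: row bit set
            else:
                row |= 1              # quadrant d: both bits set
                col |= 1
        edges.append((row, col))
    return edges


# Smallest configuration from the slide: |V| = 2^5, edge factor 16.
edges = rmat_edges(scale=5, edge_factor=16, a=0.57, b=0.19, c=0.19, d=0.05, seed=1)
```

Because edges are drawn independently, the same vertex pair can be produced more than once, which is consistent with the slide's remark about multiple edges between vertices.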
7
Possible Triangles
8
Trying to Find Triangles via SPARQL
  SELECT ?X ?Y ?Z
  WHERE {
    { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X }
    UNION { ?Y ?a ?X . ?Z ?b ?Y . ?X ?c ?Z }
    UNION { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
    UNION { ?X ?a ?Y . ?Z ?b ?Y . ?X ?c ?Z }
    UNION { ?Y ?a ?X . ?Y ?b ?Z . ?X ?c ?Z }
    UNION { ?Y ?a ?X . ?Z ?b ?Y . ?Z ?c ?X }
    UNION { ?X ?a ?Y . ?Z ?b ?Y . ?Z ?c ?X }
    UNION { ?Y ?a ?X . ?Y ?b ?Z . ?Z ?c ?X }
  }
Redundant Solutions
9
The Problem: Graph Isomorphism
[Figure: triangle patterns iii and iv over ?X, ?Y, ?Z, each matched against a triangle on Alice, Bob, and Charlie]
  ?X = Alice, ?Y = Bob, ?Z = Charlie
  ?X = Alice, ?Y = Charlie, ?Z = Bob
10
The Other Problem: Automorphism
[Figure: triangle pattern i over ?X, ?Y, ?Z, matched against a triangle on Alice, Bob, and Charlie]
  ?X = Alice, ?Y = Bob, ?Z = Charlie
  ?X = Charlie, ?Y = Alice, ?Z = Bob
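The automorphism issue can be reproduced with a brute-force matcher. The sketch below assumes pattern i is the directed cycle ?X -> ?Y -> ?Z -> ?X and that the data is the cycle Alice -> Bob -> Charlie -> Alice; both are readings of the figure, and `match_cycle` is a hypothetical helper, not part of any SPARQL engine. Predicates are ignored, and bindings are restricted to distinct nodes, which makes no difference for data without self-loops.

```python
from itertools import permutations


def match_cycle(edges):
    """Return every binding (x, y, z) with edges x->y, y->z, and z->x."""
    nodes = {n for edge in edges for n in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edges and (y, z) in edges and (z, x) in edges
    )


edges = {("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")}
solutions = match_cycle(edges)
for x, y, z in solutions:
    print(f"?X = {x}, ?Y = {y}, ?Z = {z}")
```

A single triangle in the data produces three bindings, one per rotation of the cycle, including the two listed on the slide.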
11
Possible Triangles
12
The SPARQL Query
  SELECT ?X ?Y ?Z
  WHERE {
    { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
      FILTER (STR(?X) < STR(?Y))
      FILTER (STR(?Y) < STR(?Z)) }
    UNION
    { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
      FILTER (STR(?Y) > STR(?Z))
      FILTER (STR(?Z) > STR(?X)) }
    UNION
    { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
  }
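To see why the two filtered branches report each directed cycle exactly once, the query can be mimicked in plain Python. This is only a sketch under simplifying assumptions: `triangles_filtered` is a hypothetical name, edges are (subject, object) pairs with predicates dropped, and ordinary string comparison stands in for the STR() comparisons. With multiple edges between the same vertices, a real SPARQL engine would return one row per predicate combination.

```python
from itertools import product


def triangles_filtered(edges):
    """Mimic the three UNION branches of the filtered query."""
    nodes = sorted({n for edge in edges for n in edge})

    def has(s, o):
        return (s, o) in edges

    rows = []
    for x, y, z in product(nodes, repeat=3):
        cycle = has(x, y) and has(y, z) and has(z, x)
        if cycle and x < y < z:        # branch 1: X < Y and Y < Z
            rows.append((x, y, z))
        if cycle and x < z < y:        # branch 2: Y > Z and Z > X
            rows.append((x, y, z))
        if has(x, y) and has(y, z) and has(x, z):  # branch 3: no filter
            rows.append((x, y, z))
    return rows


clockwise = {("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")}
counter = {("Alice", "Charlie"), ("Charlie", "Bob"), ("Bob", "Alice")}
acyclic = {("Alice", "Bob"), ("Bob", "Charlie"), ("Alice", "Charlie")}
```

Each cycle has three rotations, and exactly one of them begins with the smallest vertex. Branch 1 accepts that rotation when the remaining two vertices appear in increasing order, and branch 2 accepts it when they appear in decreasing order. The third pattern has no non-trivial automorphism, so it needs no filter.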
13
Cohen’s Triangle Algorithm
Assumptions
  Simplified graph
  Completely connected
Map 1: O(m)
  Use v_1 < v_2 < ··· < v_n for tie-breaking
14
Cohen’s Triangle Algorithm
Reduce: O(m^(3/2)), …
15
Cohen’s Triangle Algorithm
Map 2: O(m^(3/2))
  Identity mapping of previous reduce step.
  Map edges
[Figure: example bin for the vertex pair v_8, v_20, with entries involving v_1, v_3, v_2, …]
Reduce 2: O(m^(3/2))
  Emit triangles for the contents of each bin when the edge exists between v_i and v_j.
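The map and reduce steps on these slides can be simulated on a single machine with dictionaries standing in for the shuffle phase. The sketch below is one reading of the slides, not MTGL's implementation: `cohen_triangles` is a hypothetical name, edges are keyed by their lower-degree endpoint with the vertex order as tie-breaker (an assumption about what Map 1 does), and duplicate edges are collapsed, whereas the authors modified the algorithm to keep them.

```python
from collections import defaultdict
from itertools import combinations


def cohen_triangles(edges):
    """Return the triangles of an undirected graph as sorted vertex triples."""
    # Simple-graph assumption: drop self-loops, merge duplicate and reversed edges.
    edge_set = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = defaultdict(int)
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        return (degree[v], v)  # degree first, vertex order breaks ties

    # Map 1: key every edge by its lower-ranked endpoint.
    by_low = defaultdict(list)
    for u, v in edge_set:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        by_low[low].append(high)

    # Reduce 1: each pair of edges sharing a key is an open triad,
    # placed in the bin of the vertex pair that would close it.
    bins = defaultdict(list)
    for low, highs in by_low.items():
        for a, b in combinations(highs, 2):
            bins[tuple(sorted((a, b)))].append(low)

    # Map 2 and Reduce 2: a bin emits its triads only when the
    # closing edge between the two binned vertices exists.
    triangles = set()
    for (a, b), apexes in bins.items():
        if (a, b) in edge_set:
            for apex in apexes:
                triangles.add(tuple(sorted((apex, a, b))))
    return triangles


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sorted(cohen_triangles(k4)))
```

Keying by the lower-ranked endpoint means each triangle is discovered once, from its lowest-ranked vertex, and it keeps high-degree vertices from generating a quadratic number of triads.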
16
Results: Growth of Triangles
17
Results
18
Comparison at Larger Scales
With 1 billion edges, assuming the same constant:
  An O(x^1.39) implementation versus an O(x^1.58) one is 50x faster.
  An O(x^1.39) implementation versus an O(x^1.83) one is 9000x faster.
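The speedup figures follow from dividing the two growth terms: with equal constants, x^p versus x^q differs by a factor of x^(q - p). A quick check of the arithmetic, where `speedup` is just an illustrative helper name:

```python
def speedup(num_edges, fast_exponent, slow_exponent):
    """Ratio of x**slow to x**fast, assuming identical constant factors."""
    return num_edges ** (slow_exponent - fast_exponent)


x = 10 ** 9  # 1 billion edges
print(round(speedup(x, 1.39, 1.58)))  # on the order of 50
print(round(speedup(x, 1.39, 1.83)))  # on the order of 9000
```

The ratios come out to about 51 and about 9100, which the slide rounds to 50x and 9000x.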
19
Conclusions
The Semantic Web is a graph.
Graph theory can add a lot in terms of speeding up queries.
It also has other approaches for analyzing the data.
SPARQL has unexpected issues when graph isomorphisms or automorphisms arise.