Triangle Finding: How Graph Theory can Help the Semantic Web
Edward Jimenez, Eric Goodman

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL
SAND P

The Semantic Web as a Graph

Optimizing Queries with Graph Theory
- Graph theory has a lot to offer the Semantic Web
- One example: triangle finding
  - O(|E|^{1.5})
  - Much more efficient than what a typical database would do

Query2
SELECT ?X ?Y ?Z
WHERE {
  ?X rdf:type ub:GraduateStudent .
  ?Y rdf:type ub:University .
  ?Z rdf:type ub:Department .
  ?X ub:memberOf ?Z .
  ?Z ub:subOrganizationOf ?Y .
  ?X ub:undergraduateDegreeFrom ?Y
}

Query9
SELECT ?X ?Y ?Z
WHERE {
  ?X rdf:type ub:Student .
  ?Y rdf:type ub:Faculty .
  ?Z rdf:type ub:Course .
  ?X ub:advisor ?Y .
  ?Y ub:teacherOf ?Z .
  ?X ub:takesCourse ?Z
}

Experiment
- Compare these three approaches, finding all triangles in a graph
  - Sesame
  - Jena
  - MultiThreaded Graph Library (MTGL)
- MTGL
  - Open source library of graph algorithms, targeted towards shared-memory supercomputers
  - Used MTGL’s implementation of J. Cohen’s triangle finding algorithm
  - Had to modify it slightly to allow for multiple edges between vertices

Data
- Data: a Recursive Matrix (R-MAT) graph
- Specify
  - |V|
  - Edge factor (average number of edges per vertex)
  - Probabilities a, b, c, d, where a + b + c + d = 1
- Has properties similar to real-world graphs, such as short diameters and small-world properties
- Used as the basis of the Graph500 benchmark
- Nodes are given a unique IRI and edges are given a random value
- |V| = { }
- Edge factor: {16, 32, 64}
[Figure: adjacency matrix subdivided into quadrants a, b, c, d]
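The R-MAT recipe on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rmat_edges` and its `scale` parameter (so that |V| = 2**scale) are hypothetical, and the default probabilities are the values commonly quoted for the Graph500 generator, not necessarily the ones used in the experiment, since the slide's own values did not survive.

```python
import random


def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, d=0.05, seed=0):
    """Generate directed R-MAT edges over 2**scale vertices.

    Each edge is placed by descending `scale` levels of the adjacency
    matrix, choosing one of the four quadrants at every level with
    probabilities a, b, c, d (assumed values, see lead-in).
    """
    assert abs(a + b + c + d - 1.0) < 1e-9, "probabilities must sum to 1"
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        src = dst = 0
        for _ in range(scale):
            src <<= 1
            dst <<= 1
            r = rng.random()
            if r < a:
                pass          # top-left quadrant: both bits stay 0
            elif r < a + b:
                dst |= 1      # top-right quadrant
            elif r < a + b + c:
                src |= 1      # bottom-left quadrant
            else:
                src |= 1      # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges


if __name__ == "__main__":
    sample = rmat_edges(scale=4, edge_factor=16)
    print(len(sample), "edges over", 1 << 4, "vertices")
```

Because edges are drawn independently, the same vertex pair can appear more than once, which is consistent with the deck's remark that the triangle finder had to tolerate multiple edges between vertices.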

Possible Triangles

Trying to Find Triangles via SPARQL
SELECT ?X ?Y ?Z
WHERE {
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X }
  UNION { ?Y ?a ?X . ?Z ?b ?Y . ?X ?c ?Z }
  UNION { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
  UNION { ?X ?a ?Y . ?Z ?b ?Y . ?X ?c ?Z }
  UNION { ?Y ?a ?X . ?Y ?b ?Z . ?X ?c ?Z }
  UNION { ?Y ?a ?X . ?Z ?b ?Y . ?Z ?c ?X }
  UNION { ?X ?a ?Y . ?Z ?b ?Y . ?Z ?c ?X }
  UNION { ?Y ?a ?X . ?Y ?b ?Z . ?Z ?c ?X }
}
- Redundant solutions
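To see why the eight-way UNION produces redundant solutions, the query can be imitated in plain Python. This is a rough sketch, not how Sesame or Jena evaluate SPARQL: the function `naive_union_matches` is hypothetical, it ignores predicates, and it assumes at most one edge per ordered vertex pair.

```python
from itertools import permutations, product


def naive_union_matches(edges):
    """Mimic the 8-branch UNION: try every binding of (X, Y, Z) against
    every combination of edge directions between X-Y, Y-Z and Z-X."""
    edges = set(edges)
    nodes = sorted({v for e in edges for v in e})
    results = []
    for x, y, z in permutations(nodes, 3):
        for xy_fwd, yz_fwd, zx_fwd in product((True, False), repeat=3):
            e1 = (x, y) if xy_fwd else (y, x)
            e2 = (y, z) if yz_fwd else (z, y)
            e3 = (z, x) if zx_fwd else (x, z)
            if e1 in edges and e2 in edges and e3 in edges:
                results.append((x, y, z))
    return results


if __name__ == "__main__":
    one_triangle = [("Alice", "Bob"), ("Bob", "Charlie"), ("Alice", "Charlie")]
    matches = naive_union_matches(one_triangle)
    print(len(matches), "bindings for",
          len({frozenset(m) for m in matches}), "distinct triangle")
```

For a single triangle, every ordering of the three people satisfies exactly one branch, so the same triangle is reported six times.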

The Problem: Graph Isomorphism
[Figure: patterns iii and iv over ?X, ?Y, ?Z, each matched against a triangle of Alice, Bob, and Charlie]
- Pattern iii: ?X = Alice, ?Y = Bob, ?Z = Charlie
- Pattern iv: ?X = Alice, ?Y = Charlie, ?Z = Bob

The Other Problem: Automorphism
[Figure: pattern i over ?X, ?Y, ?Z matched twice against a triangle of Alice, Bob, and Charlie]
- ?X = Alice, ?Y = Bob, ?Z = Charlie
- ?X = Charlie, ?Y = Alice, ?Z = Bob

Possible Triangles

The SPARQL Query
SELECT ?X ?Y ?Z
WHERE {
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
    FILTER (STR(?X) < STR(?Y))
    FILTER (STR(?Y) < STR(?Z)) }
  UNION
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
    FILTER (STR(?Y) > STR(?Z))
    FILTER (STR(?Z) > STR(?X)) }
  UNION
  { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
}
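The effect of the three branches and their FILTER clauses can be checked with a small Python analogue. This is a sketch under assumptions: `deduplicated_triangles` is a hypothetical name, Python string comparison stands in for the `STR(...)` comparisons, predicates are ignored, and each ordered vertex pair carries at most one edge.

```python
from itertools import permutations


def deduplicated_triangles(edges):
    """Imitate the three-branch query: two filtered cycle branches plus
    one unfiltered transitive branch."""
    edges = set(edges)
    nodes = sorted({v for e in edges for v in e})
    results = []
    for x, y, z in permutations(nodes, 3):
        is_cycle = (x, y) in edges and (y, z) in edges and (z, x) in edges
        is_transitive = (x, y) in edges and (y, z) in edges and (x, z) in edges
        # Branches 1 and 2: a directed cycle has three rotations; keep only
        # the one where X is the smallest name (X < Y < Z or X < Z < Y).
        if is_cycle and (x < y < z or x < z < y):
            results.append(("cycle", x, y, z))
        # Branch 3: a transitive triangle fits only one binding of this
        # fixed pattern, so no filter is needed.
        if is_transitive:
            results.append(("transitive", x, y, z))
    return results


if __name__ == "__main__":
    cycle = [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")]
    print(deduplicated_triangles(cycle))
```

The two cycle branches together say "X is the smallest of the three", which picks one representative out of the three rotations that the automorphism slide illustrates.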

Cohen’s Triangle Algorithm
- Assumptions
  - Simplified graph
  - Completely connected
- Map 1: O(m)
  - Use v_1 < v_2 < ··· < v_n for tie-breaking

Cohen’s Triangle Algorithm
- Reduce: O(m^{3/2})

Cohen’s Triangle Algorithm
- Map 2: O(m^{3/2})
  - Identity mapping of previous reduce step
  - Map edges
[Figure: a bin for the vertex pair (v_8, v_20), holding entries through v_1, v_3, and v_2 together with the edge (v_8, v_20)]
- Reduce 2: O(m^{3/2})
  - Emit triangles for the contents of each bin when the edge exists between v_i and v_j
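The map and reduce phases above can be approximated in a single-process Python sketch. This is not MTGL's implementation and not a real MapReduce job: the function `cohen_style_triangles` is hypothetical, and binning each edge under its lower-degree endpoint (ties broken by vertex order) is an assumption drawn from the usual description of Cohen's algorithm, which the slides only hint at. Vertex labels are assumed to be mutually comparable, for example integers.

```python
from collections import defaultdict
from itertools import combinations


def cohen_style_triangles(edges):
    """Find each undirected triangle exactly once."""
    # Simplify the graph: drop self-loops, direction, and duplicate edges.
    simple = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    degree = defaultdict(int)
    for u, v in simple:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        # Order by degree, using the vertex label for tie-breaking.
        return (degree[v], v)

    # Map 1: bin every edge under its lower-ranked endpoint.
    bins = defaultdict(list)
    for u, v in simple:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        bins[low].append(high)

    # Reduce 1: every pair of edges in a bin is an open triad.
    triads = defaultdict(list)
    for apex, neighbours in bins.items():
        for p, q in combinations(neighbours, 2):
            triads[tuple(sorted((p, q)))].append(apex)

    # Map 2 / Reduce 2: a triad closes into a triangle when the edge
    # between its two outer vertices exists.
    triangles = set()
    for pair, apexes in triads.items():
        if pair in simple:
            for apex in apexes:
                triangles.add(tuple(sorted((apex, *pair))))
    return triangles


if __name__ == "__main__":
    k4_plus_tail = [(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3), (3, 4)]
    print(sorted(cohen_style_triangles(k4_plus_tail)))
```

Binning by the lower-ranked endpoint means each triangle is generated only from its lowest-ranked vertex, which avoids the duplicate reports seen in the naive SPARQL query and keeps high-degree vertices from producing a quadratic number of triads.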

Results: Growth of Triangles

Results

Comparison at Larger Scales
- With 1 billion edges, assuming the same constant:
  - An O(x^{1.39}) implementation is about 50x faster than an O(x^{1.58}) one
  - An O(x^{1.39}) implementation is about 9000x faster than an O(x^{1.83}) one
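The speedup figures follow directly from the exponents: with equal constants, the ratio of x^p to x^q is x^(p - q). A short check, where the helper name `speedup` is hypothetical and the exponents are the ones quoted on the slide:

```python
def speedup(num_edges, slow_exponent, fast_exponent):
    """Ratio of running times x**slow / x**fast, assuming equal constants."""
    return num_edges ** (slow_exponent - fast_exponent)


if __name__ == "__main__":
    x = 1e9  # 1 billion edges
    print(f"1.58 vs 1.39: {speedup(x, 1.58, 1.39):,.0f}x")
    print(f"1.83 vs 1.39: {speedup(x, 1.83, 1.39):,.0f}x")
```

The two ratios come out near 51 and 9,100, which the slide rounds to 50x and 9000x.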

Conclusions
- The Semantic Web is a graph
- Graph theory can add a lot in terms of speeding up queries
- It also has other approaches for analyzing the data
- SPARQL has unexpected issues when graph isomorphisms or automorphisms arise