
Benchmarking traversal operations over graph databases
Marek Ciglan (1), Alex Averbuch (2) and Ladislav Hluchý (1)
(1) Institute of Informatics, Slovak Academy of Sciences, Bratislava
(2) Swedish Institute of Computer Science, Stockholm, Sweden

Overview
21 November 2011
Graph data management
Graph databases
  – Characteristics
  – Unique features
  – Challenges
GDB benchmarking
  – Motivation
  – Related work
Graph traversal benchmark
  – Goals
  – Design
Preliminary results

Graph data management
Booming area of R&D in recent years
Reasons:
  – Increased availability and importance of graph data
  – Natural way of modelling various real-world phenomena (networks: social, information, communication)
Two dominant data management directions:
  – Distributed graph processing frameworks
      Mining/processing of large graphs
      Pregel and clones (GoldenOrb, Giraph)
  – Graph databases
      Persistent management of graph data
      Neo4J, OrientDB, Dex

Graph databases
Property graph data model
  – Graph structure
  – Elements have properties
[Figure: example property graph with four nodes (K1, K2, K3, K4), each carrying attributes I1, I2 and I3, connected by edges labelled L1, L2 and L3]
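The property graph model from this slide can be illustrated with a small in-memory sketch. The class and method names (`PropertyGraph`, `add_node`, `add_edge`, `neighbours`) are hypothetical and are not taken from any of the benchmarked systems. The transcript does not say which edge connects which pair of nodes, so the endpoints below are an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    key: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: both nodes and edges can carry properties."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out_edges: Dict[str, List[Edge]] = {}
        self.in_edges: Dict[str, List[Edge]] = {}

    def add_node(self, key: str, **props: Any) -> Node:
        node = Node(key, dict(props))
        self.nodes[key] = node
        self.out_edges.setdefault(key, [])
        self.in_edges.setdefault(key, [])
        return node

    def add_edge(self, source: str, target: str, label: str, **props: Any) -> Edge:
        edge = Edge(source, target, label, dict(props))
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)
        return edge

    def neighbours(self, key: str) -> List[str]:
        """Keys of nodes reachable over one outgoing edge."""
        return [edge.target for edge in self.out_edges[key]]


# Four nodes with three attributes each, as in the figure.
g = PropertyGraph()
for k in ("K1", "K2", "K3", "K4"):
    g.add_node(k, I1="val", I2="val", I3="val")

# Edge endpoints are assumed for illustration only.
g.add_edge("K1", "K2", "L1")
g.add_edge("K1", "K3", "L2")
g.add_edge("K3", "K4", "L3")
```

Keeping separate outgoing and incoming adjacency lists is one simple way to make traversal in either direction cheap, which is the access pattern the rest of the talk benchmarks.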

Graph databases
Unique feature
  – Graph topology captures the relations between objects
  – A graph database should
      be efficient in exploiting topology
      allow fast traversal
Challenges
  – Traditionally, graph processing/traversal is done in memory
  – Reasons:
      Data-driven computation
      Random access pattern for data access

Graph database benchmarking
Motivation
  – A number of emerging graph data management solutions.
  – Which is the right one for a specific problem?
  – Fair measurement of performance for distinct use cases.
  – Identify limits: which use cases have good performance.

Graph database benchmarking
Related work
  – Only a few works directly address graph databases
      D. Dominguez-Sal et al.:
        – Adoption of an HPC benchmark for graph data processing
        – Design of a benchmark suitable for graph database systems
      GraphBench: basic benchmarking framework implementation

Graph database benchmarking
Traversal operation benchmarking
  – Graph topology is the unique feature of graph databases
  – Test the ability to do:
      Local traversals (exploring the k-hop neighbourhood)
      Global traversals (traversals of the whole graph)
  – Perform traversals in a memory-constrained environment (can we deal efficiently with data sets exceeding the physical memory?)
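As an illustration of a local traversal, the following is a minimal sketch of a breadth-first exploration of a k-hop neighbourhood. It assumes the graph is available as a plain adjacency dictionary. The function name `k_hop_neighbourhood` is hypothetical, and the code is not the benchmark's actual implementation.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def k_hop_neighbourhood(
    adj: Dict[Hashable, Iterable[Hashable]], start: Hashable, k: int
) -> Set[Hashable]:
    """Return all nodes reachable from `start` in at most k hops (excluding start)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            # Do not expand beyond the k-th hop.
            continue
        for neighbour in adj.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


# A simple undirected chain a - b - c - d - e.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
three_hops = k_hop_neighbourhood(chain, "a", 3)
```

A global traversal corresponds to the same loop without the depth cut-off, repeated until every node has been visited. This is why its cost grows with the whole graph rather than with the size of one neighbourhood.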

Benchmark design
Fairness
  – Blueprints API: an effort to provide a common API
  – Using Blueprints: one implementation of the benchmark for all the benchmarked systems
      Avoids the bias of different benchmark implementations for different systems
  – Execution of the same sequence of operations on the same data
      Operations and their parameters are logged in the first run over the defined data
      Logs are persistent, allowing benchmarks to be rerun on different versions of a product, so the change in performance can be measured
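The record-then-replay idea on this slide might look roughly like the sketch below. It is written in Python for brevity and does not use the Blueprints API. The JSON-lines log format and the names `record_operations` and `replay_operations` are assumptions made for illustration.

```python
import json
import os
import random
import tempfile
from typing import Any, Callable, Dict, List


def record_operations(node_ids: List[str], n_ops: int, seed: int, path: str) -> List[Dict[str, Any]]:
    """First run: pick operation parameters once and persist them to a log."""
    rng = random.Random(seed)
    ops = [{"op": "bfs", "start": rng.choice(node_ids), "hops": 3} for _ in range(n_ops)]
    with open(path, "w", encoding="utf-8") as fh:
        for op in ops:
            fh.write(json.dumps(op) + "\n")
    return ops


def replay_operations(path: str, handlers: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Later runs: execute exactly the logged operations against a system under test."""
    results = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            op = json.loads(line)
            name = op.pop("op")
            results.append(handlers[name](**op))
    return results


# Hypothetical stand-in for a system under test: it just echoes its inputs.
def fake_bfs(start: str, hops: int):
    return (start, hops)


log_path = os.path.join(tempfile.mkdtemp(), "ops.jsonl")
recorded = record_operations(["n1", "n2", "n3"], n_ops=5, seed=42, path=log_path)
run_a = replay_operations(log_path, {"bfs": fake_bfs})
run_b = replay_operations(log_path, {"bfs": fake_bfs})
```

Because the parameters are drawn only once and then read back from the log, every system (or every version of one system) sees an identical workload.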

Benchmark design
Data
  – Different data properties / distributions affect benchmark results
      E.g. dense vs. sparse graphs
  – Ideally, data set properties are similar to those of real-world data sets
  – Use: scale-free networks with small-world properties
      social networks, the Internet, traffic networks, biological networks, and term co-occurrence networks
      LFR-Benchmark generator: networks with power-law degree distribution and implanted communities within the network

Benchmark design
Traversal operations
  – Local traversals
      Compute local clustering coefficient (2-hop breadth-first traversal)
      3-hop breadth-first traversal
  – Global traversals
      Compute connected components
        – Incoming / outgoing edges
      k iterations of the HITS algorithm
Memory-constrained environment
Intermediate results for global traversal operations:
  – Kept in memory
  – Kept as properties on nodes
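Two of the operations listed above can be sketched in a few lines each. This assumes an undirected graph stored as a dictionary of neighbour sets for the clustering coefficient, and a directed graph stored as a dictionary of out-neighbour lists for HITS. The function names are hypothetical. The L2 normalisation in HITS is one common choice, not necessarily the one used in the benchmark.

```python
import math
from typing import Dict, Hashable, List, Set, Tuple


def local_clustering(adj: Dict[Hashable, Set[Hashable]], v: Hashable) -> float:
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v] - {v}
    d = len(nbrs)
    if d < 2:
        return 0.0
    # Each edge between two neighbours is counted once from each endpoint.
    twice_links = sum(len(adj[u] & nbrs) for u in nbrs)
    return twice_links / (d * (d - 1))


def _normalise(scores: Dict[Hashable, float]) -> Dict[Hashable, float]:
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {n: s / norm for n, s in scores.items()}


def hits(out_adj: Dict[Hashable, List[Hashable]], k: int) -> Tuple[dict, dict]:
    """Run k iterations of HITS; returns (hub scores, authority scores)."""
    nodes = set(out_adj) | {t for targets in out_adj.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(k):
        # Authority of v: sum of hub scores of nodes pointing to v.
        new_auth = {n: 0.0 for n in nodes}
        for u, targets in out_adj.items():
            for v in targets:
                new_auth[v] += hub[u]
        # Hub of u: sum of authority scores of nodes u points to.
        new_hub = {n: 0.0 for n in nodes}
        for u, targets in out_adj.items():
            for v in targets:
                new_hub[u] += new_auth[v]
        auth = _normalise(new_auth)
        hub = _normalise(new_hub)
    return hub, auth


# Triangle a-b-c with a pendant node d attached to a.
undirected = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cc_a = local_clustering(undirected, "a")

# Two pages linking to a third.
hub_scores, auth_scores = hits({"a": ["c"], "b": ["c"]}, k=3)
```

The clustering coefficient only touches a node's neighbours and their adjacency lists, which is why the slide classes it as a 2-hop local traversal. Each HITS iteration, in contrast, visits every edge of the graph.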

Benchmark implementation
Implemented on top of the Blueprints API
Tests performed on:
  – Neo4J
  – DEX
  – OrientDB 6
  – Native RDF repository (NativeSail)
  – SGDB (research prototype)
Challenge: dealing with differences in the underlying systems, e.g.:
  – Triple stores: naming constraints
  – Some implementations do not support properties on some elements
  – Some implementations do not support iteration over nodes/edges
  – Node ID generation: user-provided vs. autogenerated
  – Transaction support / no transactions

Benchmark runs
Performed on older hardware:
  – 2 GB memory
Data set sizes:
  – 1K, 10K, 40K, 50K, 100K, 200K, 400K, 800K, 1M
  – Most systems were not able to load networks with 400K+ edges (constraint: load 10K edges in less than 60 sec.)

Graph loading – elements insertion

Local traversal – BFS 3 hops

Global traversals – connected components
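The connected-components traversal, together with the two options for holding intermediate results mentioned in the benchmark design (in memory, or as properties on nodes), can be sketched as follows. The store classes and the `"component"` property name are hypothetical. A plain dictionary stands in for the database's node properties, and for simplicity the undirected view of the graph is built in memory, which a real memory-constrained run would avoid.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


class InMemoryStore:
    """Component labels kept in a dictionary outside the graph."""

    def __init__(self) -> None:
        self._labels: Dict[Hashable, int] = {}

    def get(self, node: Hashable) -> Optional[int]:
        return self._labels.get(node)

    def set(self, node: Hashable, label: int) -> None:
        self._labels[node] = label


class NodePropertyStore:
    """Component labels written back as a property on each node."""

    def __init__(self, node_props: Dict[Hashable, dict]) -> None:
        self._props = node_props

    def get(self, node: Hashable) -> Optional[int]:
        return self._props.setdefault(node, {}).get("component")

    def set(self, node: Hashable, label: int) -> None:
        self._props.setdefault(node, {})["component"] = label


def weakly_connected_components(out_adj: Dict[Hashable, List[Hashable]], store) -> int:
    """Label every node with a component id, following edges in both directions."""
    undirected: Dict[Hashable, set] = {}
    for u, targets in out_adj.items():
        undirected.setdefault(u, set())
        for v in targets:
            undirected[u].add(v)
            undirected.setdefault(v, set()).add(u)

    count = 0
    for start in undirected:
        if store.get(start) is not None:
            continue  # already reached from an earlier start node
        store.set(start, count)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in undirected[node]:
                if store.get(neighbour) is None:
                    store.set(neighbour, count)
                    queue.append(neighbour)
        count += 1
    return count


edges = {"a": ["b"], "c": ["b"], "d": ["e"], "f": []}

mem_store = InMemoryStore()
n_mem = weakly_connected_components(edges, mem_store)

props = {n: {} for n in "abcdef"}
n_prop = weakly_connected_components(edges, NodePropertyStore(props))
```

Both stores produce the same labelling. The difference a benchmark would measure is the cost of each `get`/`set`, which becomes a database read or write when labels live on the nodes.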

Conclusion
Extending work on benchmarking graph databases
Focusing on graph traversal operations
Local/global traversals
Preliminary results:
  – Even loading larger datasets into GDBs is a problem
  – Stable performance for local traversals with 2-3 hops
      Suitable for most ego-centric node property analyses
  – Bad performance for global traversal operations on larger networks

Thank you for your attention

SemSets – activation spreading over network