1 Analysis on the performance of graph query languages: Comparative study of Cypher, Gremlin and native access in Neo4j Athiq Ahamed, ITIS, TU-Braunschweig.

Presentation transcript:

1 Analysis on the performance of graph query languages: Comparative study of Cypher, Gremlin and native access in Neo4j
Athiq Ahamed, ITIS, TU-Braunschweig
Supervised by: Dr. Lena Wiese, Georg-August-University Göttingen
Original paper: Prof. Dr. René Peinl and Florian Holzschuher, "Performance of graph query languages: Comparison of Cypher, Gremlin and native access in Neo4j"

2 Agenda
– RDBMS
– Reasons for NoSQL
– Categories of NoSQL databases
– Comparison of popular NoSQL databases
– Motivation
– Neo4j and query languages
– Comparison of Neo4j to other databases
– Testing (importance of benchmarking, different suites)
– Results
– Limitations and future work

3 Introduction 1: RDBMS
– For decades, relational databases have been the dominant choice
– Structured Query Language (SQL) makes data retrieval easy
– Today, very large volumes of dynamic data are being generated
– Strict schemas, and queries that must join several tables, make relational databases a poor fit for this situation
– We therefore need dynamic schemas, high scalability and high performance, among other things
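To make the join argument concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema (person, friendship) and its rows are hypothetical, not the schema used in the study. A friend-of-a-friend lookup already has to join the friendship table with itself and then with person, and each further hop in the social graph adds another self-join.

```python
import sqlite3

# Hypothetical schema: one table of people, one table of directed friendship rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendship (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (person_id, friend_id)
    );
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob'), (3, 'carl'), (4, 'dora');
    INSERT INTO friendship VALUES (1, 2), (2, 3), (3, 4);
    """
)

# Friends of friends: friendship is joined with itself for the second hop,
# then with person to fetch the names.
FOAF_SQL = """
    SELECT DISTINCT p.name
    FROM friendship AS f1
    JOIN friendship AS f2 ON f2.person_id = f1.friend_id
    JOIN person AS p ON p.id = f2.friend_id
    WHERE f1.person_id = ? AND f2.friend_id <> f1.person_id
    ORDER BY p.name
"""

print([name for (name,) in conn.execute(FOAF_SQL, (1,))])  # ['carl']
```

A third hop would need a third alias of friendship (f3) and another JOIN clause; a graph database instead follows stored relationships from node to node.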

4 Introduction 2
– NoSQL databases are now the first choice and solve most of these problems
– Graph databases are best suited for storing networks of data (e.g. social networks)
– Features:
  – NoSQL databases have their own query languages (e.g. Cypher for Neo4j)
  – NoSQL databases trade either availability or consistency in favor of partition tolerance (CAP theorem)
  – Examples: Neo4j, Cassandra, MongoDB and BigTable
– NoSQL databases are an ideal choice for Web 2.0 applications

5 NoSQL databases
Four important categories of NoSQL databases:
– Key-value stores: the simplest and easiest to implement; a hash table in which a unique key points to the value (e.g. Redis, Oracle BDB, Voldemort)
– Column family stores: widely used for data distribution; keys point to multiple columns (e.g. Google's BigTable model)
– Document stores: used for semi-structured data stored as JSON-like documents; similar to a key-value store (e.g. MongoDB)
– Graph databases: used for storing graph-like data such as social networks (e.g. Neo4j)
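The four categories differ mainly in the shape of what is stored under a key. The sketch below is a deliberately simplified, hypothetical illustration using plain Python structures; real stores add persistence, distribution and indexing, and none of the names below come from a real product's API.

```python
import json

# Key-value store: an opaque value looked up by a unique key (a hash table).
kv_store: dict[str, bytes] = {"session:42": b"opaque-blob"}

# Column family store (BigTable-style, simplified): a row key points to
# several named columns, and rows need not share the same columns.
column_family: dict[str, dict[str, str]] = {
    "user:1": {"name": "Alice", "city": "Braunschweig"},
    "user:2": {"name": "Bob"},
}

# Document store: a semi-structured, JSON-like document with nesting.
document = {
    "_id": 1,
    "name": "Alice",
    "tags": ["graph", "nosql"],
    "address": {"city": "Braunschweig"},
}

# Graph database: nodes plus typed relationships between them.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carl"}}
relationships = [(1, "FRIEND_OF", 2), (2, "FRIEND_OF", 3)]


def neighbours(node_id: int, rel_type: str) -> list[str]:
    """Names of nodes reachable from node_id over one rel_type relationship."""
    return [nodes[end]["name"] for start, t, end in relationships
            if start == node_id and t == rel_type]


print(kv_store["session:42"])               # lookup by key only
print(column_family["user:2"].get("city"))  # missing column -> None
print(json.dumps(document["address"]))      # nested sub-document
print(neighbours(1, "FRIEND_OF"))           # ['Bob']
```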

6 Comparison between popular NoSQL databases
– MongoDB (document-oriented), rank no. 1: replication and failover for high availability; consistency by default; auto-sharding to ease scaling; full index support
– Cassandra (wide column), rank no. 2: trades consistency for high availability; incremental scalability; eventually consistent
– Neo4j (graph), rank no. 5: very similar to MongoDB, with blocking replication and a cluster setup for high availability; scalable clustering support; runtime failover; live backup support

7 Different types of databases and their languages
– Relational databases: SQL
– XML databases: XPath, XQuery
– RDF: RQL, SPARQL
– Object-oriented: OQL
– Multidimensional: MDX
– Graph: Cypher, Gremlin

8 Motivation
– Measure the performance of different graph query languages and of native access in Neo4j
– Compare the languages for ease of understanding, code readability and maintainability
– Test the performance and correctness of these graph database back-ends
– Use Apache Shindig, which hosts OpenSocial applications, as the test environment
– Compare the performance of different back-ends on Neo4j

9 Neo4j and query languages
– Neo4j is an open-source NoSQL graph database that implements the property graph data model
– Neo4j has a native Java API with a traversal framework
– Features:
  – Supports ACID properties
  – Runtime failover
  – High performance
  – Scalability
  – Very good documentation
  – A very good query language, Cypher
– Cypher: a declarative query language similar to SQL
– Gremlin: a Groovy-based query language
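The property graph model can be pictured with a toy in-memory sketch. This is an illustration only, not Neo4j's storage engine or its Java API: nodes and relationships both carry key/value properties, relationships are typed and directed, and each node keeps a list of its outgoing relationships, so a traversal follows references instead of looking rows up in a join table (loosely the idea Neo4j calls index-free adjacency). All class and method names are hypothetical; the displayName property is borrowed from the Gremlin query in the results, and since is made up.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: nodes are equal only if they are the same object
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)
    # Outgoing relationships stored on the node itself (kept out of repr).
    outgoing: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    rel_type: str  # e.g. "FRIEND_OF"
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: typed, directed relationships; properties on both."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}

    def add_node(self, node_id: int, **properties) -> Node:
        node = Node(node_id, dict(properties))
        self.nodes[node_id] = node
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties) -> Relationship:
        rel = Relationship(rel_type, start, end, dict(properties))
        start.outgoing.append(rel)  # traversals follow this list directly
        return rel

    def neighbours(self, node: Node, rel_type: str) -> list[Node]:
        """Nodes reached by following outgoing rel_type relationships."""
        return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


g = PropertyGraph()
alice = g.add_node(1, displayName="Alice")
bob = g.add_node(2, displayName="Bob")
g.relate(alice, "FRIEND_OF", bob, since=2012)  # relationship with its own property
print([n.properties["displayName"] for n in g.neighbours(alice, "FRIEND_OF")])  # ['Bob']
```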

10 Comparison of Neo4j to other DBs (existing work)
Neo4j and MySQL:
– Neo4j retrieved results faster than relational databases
– More flexible than MySQL
– Query times were 2-5 times lower than with MySQL for their 500-object data set
– Neo4j performed better than SQL on structural queries
– Neo4j was slower than MySQL with integer data
– So Neo4j is used for queries such as friendships, movie favorites and more complicated commercial queries
Neo4j and other graph databases:
– Test data: 1k, 32k and 1M nodes, with 9k to 8.4 million relationships
– Jena and HypergraphDB could not load the database within the specified time
– DEX and Neo4j were able to load the largest benchmark sizes
– Jena loaded the graph with 1M nodes faster than Neo4j, but could not scale
– Neo4j was faster than DEX on the large dataset, and the reverse held for the small dataset
– DEX scaled better, whereas Neo4j achieved good throughput

11 Setup
– Apache Shindig 2.5, for hosting OpenSocial applications
– Neo4j's native Java API, which provides retrieval and traversal methods; it is directly accessible when Neo4j runs in embedded mode
– A RESTful (Representational State Transfer) web service interface
– Several wrappers for various programming languages, such as Python and Java
– Cypher is used for all CRUD (create, read, update and delete) operations
– Gremlin supports both imperative and declarative querying
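For the REST suites, every query crosses HTTP and JSON serialization, whereas in embedded mode the same query runs in-process through the Java API. The sketch below only builds (and does not send) such a request with Python's standard library. It assumes the legacy Neo4j REST Cypher endpoint of the 1.x/2.x era, POST /db/data/cypher with a JSON body holding query and params, on the default localhost:7474; this should be checked against the documentation of the Neo4j version actually in use, since newer versions expose a different HTTP API. The helper name and the person id are hypothetical.

```python
import json
import urllib.request

# Assumption: legacy Neo4j REST Cypher endpoint on the default host and port.
LEGACY_CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_cypher_request(query: str, params: dict,
                         url: str = LEGACY_CYPHER_URL) -> urllib.request.Request:
    """Wrap a parameterised Cypher query in an HTTP POST request (not sent)."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


request = build_cypher_request(
    "START person = node:people(id = {id}) RETURN person",
    {"id": "john.doe"},  # hypothetical person id
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would mean urllib.request.urlopen(request), which needs a running server.
```

Passing the id as a parameter rather than splicing it into the query string keeps the query text constant, which also lets the server reuse a cached query plan.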

12 Data used for testing
– 2011 people, 26,982 messages, 24,365 activities, 2000 addresses, 200 groups and 100 organizations
– Each person had at least 1 and at most 667 friends, out of 25,000 friendship relationships in total
– A bigger dataset was also tested: 10,003 people, with 137,000 friendships in total and a maximum of 1,448 friends for one person
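A quick back-of-the-envelope check on the two datasets, taking the smaller dataset's friendship count as 25,000: the bigger dataset has about five times as many people and about five and a half times as many friendships, so the number of relationships per person stays in the same range. This is only arithmetic on the figures above. Whether it equals the average number of friends per person depends on whether each friendship is stored once or in both directions, which the slide does not say.

```python
# Figures from the slide; the smaller dataset's friendship count is taken as 25,000.
small = {"people": 2_011, "friendships": 25_000}
large = {"people": 10_003, "friendships": 137_000}


def relationships_per_person(dataset: dict) -> float:
    """Friendship relationships divided by people (not the same as average
    friends per person if each friendship is stored in both directions)."""
    return dataset["friendships"] / dataset["people"]


people_growth = large["people"] / small["people"]
friendship_growth = large["friendships"] / small["friendships"]

print(f"relationships per person: {relationships_per_person(small):.1f} "
      f"vs {relationships_per_person(large):.1f}")  # 12.4 vs 13.7
print(f"growth: people x{people_growth:.2f}, friendships x{friendship_growth:.2f}")  # x4.97, x5.48
```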

13 Suites used for testing
– Neo4j embedded
– Neo4j REST
– Neo4j Cypher embedded
– Neo4j Cypher REST
– Neo4j Gremlin REST
– MySQL JPA
These suites retrieve profiles, friends, group recommendations and other social networking features.
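A minimal timing harness of the kind such suites could use, sketched in Python. It is not the study's harness: the suite names and the stand-in workloads are hypothetical placeholders for calls into each back-end. It runs a few untimed warm-up iterations (to fill caches and, on the JVM, let the JIT compile hot code), then reports the median together with min, max and standard deviation, because single measurements of database queries tend to fluctuate.

```python
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], repetitions: int = 20,
              warmup: int = 3) -> dict:
    """Time run_query after some untimed warm-up runs; return summary stats in ms."""
    for _ in range(warmup):
        run_query()  # not measured
    samples_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "stdev_ms": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
    }


def compare_suites(suites: dict[str, Callable[[], object]], **kwargs) -> dict[str, dict]:
    """Run the same benchmark settings against every suite."""
    return {name: benchmark(fn, **kwargs) for name, fn in suites.items()}


# Hypothetical stand-ins; a real run would call each back-end's friend query.
suites = {
    "embedded (stand-in)": lambda: sum(range(1_000)),
    "REST (stand-in)": lambda: sum(range(10_000)),
}
for name, stats in compare_suites(suites, repetitions=10).items():
    print(f"{name}: median {stats['median_ms']:.3f} ms "
          f"(min {stats['min_ms']:.3f}, max {stats['max_ms']:.3f})")
```

Reporting the median rather than the mean keeps one slow outlier (a garbage-collection pause, a network hiccup) from dominating the comparison.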

14 Results 1 - Comparison of query languages and native access
– Native object access: provides retrieval and traversal methods, including a traversal framework; difficult to learn; needs several lines of code even for simple retrieval; comparable performance
– Cypher: declarative query language that handles all CRUD operations; easy to learn; simple and easy to understand; good for complex retrieval
– Gremlin: Groovy-based query language with a compact syntax; difficult to learn; compact but hard to understand; good for small retrievals
– SQL: Structured Query Language, simple to understand; easy to learn; needs several lines of code; slows down for complicated queries

15 Results 2 - Gremlin vs. Cypher: friend suggestion for a person
Cypher:
    START person = node:people(id = {id})
    MATCH person-[:FRIEND_OF]->friend-[:FRIEND_OF]->friend_of_friend
    WHERE NOT (friend_of_friend<-[:FRIEND_OF]-person)
    RETURN friend_of_friend, COUNT(*)
    ORDER BY COUNT(*) DESC
Gremlin:
    t = new Table();
    x = [];
    g.idx('persons')[[id:id_param]].out('FRIEND_OF').fill(x);
    g.idx('persons')[[id:id_param]].out('FRIEND_OF').
      out('FRIEND_OF').dedup().except(x).id.as('ID').
      back(1).displayName.as('name').
      table(t, ['ID', 'name']){it}{it}.iterate();
    t
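The logic of the Cypher query above can be restated as a plain Python sketch over an in-memory friendship map (hypothetical data): walk two FRIEND_OF hops, skip people who are already friends, count how many paths reach each candidate, and sort by that count. Two differences are assumptions of this sketch: it also skips the start person, which matters if each friendship is stored in both directions, and it breaks ties by name so the output is deterministic. Unlike the Gremlin version, which deduplicates candidates without counting them, it keeps the number of mutual friends, as the Cypher query does.

```python
from collections import Counter


def suggest_friends(friends: dict[str, set[str]], person: str) -> list[tuple[str, int]]:
    """Friends of friends of `person`, ranked by number of mutual friends."""
    direct = friends.get(person, set())
    counts = Counter()
    for friend in direct:                              # first FRIEND_OF hop
        for candidate in friends.get(friend, set()):   # second FRIEND_OF hop
            if candidate != person and candidate not in direct:
                counts[candidate] += 1                 # one more mutual friend
    # Highest count first (like ORDER BY COUNT(*) DESC); ties broken by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


friends = {
    "ann": {"bob", "carl"},
    "bob": {"ann", "dora", "eve"},
    "carl": {"ann", "dora"},
    "dora": {"bob", "carl"},
    "eve": {"bob"},
}
print(suggest_friends(friends, "ann"))  # [('dora', 2), ('eve', 1)]
```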

16 Results 3 - Gremlin vs. Cypher
Performance by query type:
– Friend queries (simple): Gremlin is a bit faster than Cypher
– People queries: Gremlin is slower than Cypher
– Message queries: Gremlin is on par with Cypher
– FOAF queries (complicated): Cypher is better than Gremlin
Observations:
– Gremlin is slower when complicated pattern matching is involved
– For complex queries with many properties and relationships, Cypher outperformed Gremlin
– Gremlin is better for simple cases

17 Results 4 - from the original paper
[Figure 1: query times for 2000 people, in ms] [Figure 2: Gremlin vs. Cypher, in ms]

18 Results 5
– The embedded instance is much faster than a DBMS accessed over the network
– Neo4j query languages outperform JPA for friend queries
– Remote access via REST is slower than embedded Neo4j native object access
– The comparison of JPA with RESTful Cypher and Gremlin is very interesting: for person profiles, the JPA back-end performs as well as RESTful Cypher

19 Results 6
– Friend queries are more than one order of magnitude slower with JPA
– Neo4j showed constant performance when the dataset grew from 2000 to 10,000 people
– MySQL's performance drops by a factor of 5 for people queries
– MySQL's performance drops by a factor of 7-9 for people's friends queries
– The RESTful case is slower than JPA in most cases

20 Limitations
– The test data was realistic only to an extent
– Results always showed some fluctuations, which makes them less suitable for benchmarking or as a basis for further research
– Different Cypher queries were used for the embedded and REST benchmarks
– Neo4j's normal server settings were used
– Neo4j's advanced version with load balancing was not tested

21 Conclusion and future work
– Analyzed the performance and programming effort of different back-ends
– Compared a JPA back-end using MySQL with Cypher and Gremlin on Neo4j
– Neo4j with Cypher had better performance overall
– Gremlin performed better with simple queries
– Cypher performed better with complicated queries
– Neo4j is a good replacement for traditional RDBMSs for Web 2.0 applications
Future work: implement and test an interesting approach based on Spring Data Neo4j
