Analysis on the performance of graph query languages: Comparative study of Cypher, Gremlin and native access in Neo4j. Athiq Ahamed, ITIS, TU-Braunschweig.

1 Supervised by: Dr. Lena Wiese, Georg-August-University Göttingen
Based on the paper by Prof. Dr. René Peinl and Florian Holzschuher, "Performance of graph query languages: Comparison of Cypher, Gremlin and native access in Neo4j"

2 Agenda
– RDBMS
– Reason for NoSQL
– Categories of NoSQL databases
– Comparison of popular NoSQL databases
– Motivation
– Neo4j and query languages
– Comparison of Neo4j to other databases
– Testing (importance of benchmarking, different suites)
– Results
– Limitations and future work

3 Introduction 1: RDBMS
– For decades, relational databases have been the dominant choice
– Structured Query Language (SQL) retrieves data with ease
– Today, outsized volumes of dynamic data are being generated
– Strict schemas and joining several tables to answer queries make RDBMS a poor fit for this situation
– So we require dynamic schemas, high scalability, high performance and so on
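To make "joining several tables" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical two-table schema (`person`, `friendship`) with made-up data and computes friend-of-friend suggestions, the kind of FOAF query benchmarked in this study: the friendship table has to be joined with itself, plus a subquery to exclude existing friends.

```python
import sqlite3

# Hypothetical schema: one row per directed FRIEND_OF relationship.
SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendship (
    person_id INTEGER NOT NULL REFERENCES person(id),
    friend_id INTEGER NOT NULL REFERENCES person(id)
);
"""

# Each additional hop in the social graph costs one more self-join.
FOAF_SQL = """
SELECT f2.friend_id, COUNT(*) AS paths
FROM friendship AS f1
JOIN friendship AS f2 ON f2.person_id = f1.friend_id
WHERE f1.person_id = :pid
  AND f2.friend_id <> :pid
  AND f2.friend_id NOT IN (
      SELECT friend_id FROM friendship WHERE person_id = :pid)
GROUP BY f2.friend_id
ORDER BY paths DESC, f2.friend_id
"""


def build_demo_db():
    """Create an in-memory database with a tiny, made-up friendship graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee"), (5, "Eve")])
    conn.executemany("INSERT INTO friendship VALUES (?, ?)",
                     [(1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 5)])
    return conn


def suggest_friends(conn, person_id):
    """Return (candidate_id, path_count) pairs, most connected candidate first."""
    return conn.execute(FOAF_SQL, {"pid": person_id}).fetchall()


conn = build_demo_db()
print(suggest_friends(conn, 1))  # [(4, 2), (5, 1)]
```

Dee (4) is reachable from Ann through both Bob and Cid, so she ranks first; Ann herself is filtered out even though Bob lists her as a friend.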

4 Introduction 2
– NoSQL databases are now the first choice and solve most of these problems
– Graph databases are best suited for storing networks of data (e.g. social networks)
– Features:
  – NoSQL databases offer their own query languages
  – NoSQL databases trade either availability or consistency in favor of partition tolerance (CAP theorem)
  – Neo4j, Cassandra, MongoDB and BigTable, to name a few
– NoSQL is an ideal choice for Web 2.0 applications

5 NoSQL databases
Four important categories of NoSQL databases:
– Key-value stores: the simplest and easiest to implement; essentially a hash table in which a unique key points to the value. Examples: Redis, Oracle BDB, Voldemort
– Column family stores: widely used for data distribution; keys point to multiple columns. Example: Google's BigTable model
– Document stores: used for semi-structured data stored in JSON format; similar to a key-value store. Example: MongoDB
– Graph databases: used for storing graph-like data, e.g. social networks. Example: Neo4j
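The four models can be contrasted with a small, illustrative sketch in plain Python; no real database is involved, and the keys (such as "person:1") and all data are hypothetical. It only shows where each model puts structure: nowhere (key-value), in named columns, inside a JSON document, or in explicit typed relationships.

```python
import json

# 1. Key-value store: a hash table mapping a unique key to an opaque value.
key_value = {"person:1": "Ann", "person:2": "Bob"}

# 2. Column family store: a row key points to several named columns.
column_family = {
    "person:1": {"name": "Ann", "email": "ann@example.org"},
    "person:2": {"name": "Bob", "email": "bob@example.org"},
}

# 3. Document store: the value is a semi-structured (JSON) document
#    whose fields the store can inspect and index.
documents = {
    "person:1": json.dumps({"name": "Ann", "friends": ["person:2"]}),
    "person:2": json.dumps({"name": "Bob", "friends": []}),
}

# 4. Graph database: nodes plus explicit, typed relationships between them.
nodes = {"person:1": {"name": "Ann"}, "person:2": {"name": "Bob"}}
relationships = [("person:1", "FRIEND_OF", "person:2")]


def neighbours(node_id, rel_type):
    """Follow outgoing relationships of one type (a linear scan, for brevity)."""
    return [end for start, rel, end in relationships
            if start == node_id and rel == rel_type]


print(key_value["person:1"])                         # Ann
print(column_family["person:2"]["email"])            # bob@example.org
print(json.loads(documents["person:1"])["friends"])  # ['person:2']
print(neighbours("person:1", "FRIEND_OF"))           # ['person:2']
```

A real graph database would keep relationships with each node rather than scanning a global list; the scan here just keeps the sketch short.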

6 Comparison between popular NoSQL databases
– MongoDB (document-oriented), rank no. 1: replication and failover for high availability; consistency by default, auto-sharding to ease scalability, replication, full index support
– Cassandra (wide column), rank no. 2: trades consistency for high availability; incremental scalability, high availability, eventual consistency
– Neo4j (graph), rank no. 5: similar to MongoDB, with blocking replication and a cluster setup for high availability; scalable clustering support, runtime failover, live backup support

7 Different types of databases and their languages
– Relational databases: SQL
– XML databases: XPath, XQuery
– RDF: RQL, SPARQL
– Object-oriented: OQL
– Multidimensional: MDX
– Graph: Cypher, Gremlin

8 Motivation
– Measure the performance of different graph query languages and native access in Neo4j
– Compare the languages' ease of understanding, code readability and maintainability
– Test the performance and correctness of these graph database back-ends
– Use case: Apache Shindig, for hosting OpenSocial applications
– Compare the performance of different back-ends on Neo4j

9 Neo4j and query languages
– Neo4j is an open-source NoSQL graph database
– It implements the property graph data model
– Neo4j has a native Java API with a traversal framework
– Features:
  – Supports ACID properties
  – Runtime failover
  – High performance
  – Scalability
  – Very good documentation
  – Very good query language, Cypher
– Cypher: declarative query language similar to SQL
– Gremlin: Groovy-based query language
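The property graph model can be illustrated with a toy in-memory structure: nodes carry key-value properties, and relationships are directed, typed and can carry properties too. This is a hedged Python sketch, not Neo4j's storage engine or its Java API; the class names are hypothetical, and only the property name `displayName` is borrowed from the Gremlin query used later in the study.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int            # id of the start node (relationships are directed)
    end: int              # id of the end node
    type: str             # e.g. "FRIEND_OF"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph; not Neo4j's storage engine."""

    def __init__(self):
        self.nodes = {}      # node id -> Node
        self.outgoing = {}   # node id -> list of Relationships starting there

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        self.outgoing.setdefault(node_id, [])
        return self.nodes[node_id]

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.outgoing[start].append(rel)
        return rel

    def out(self, node_id, rel_type):
        """Nodes reachable over one outgoing relationship of the given type."""
        return [self.nodes[r.end] for r in self.outgoing[node_id]
                if r.type == rel_type]


g = PropertyGraph()
g.add_node(1, displayName="Ann")
g.add_node(2, displayName="Bob")
g.relate(1, "FRIEND_OF", 2, since=2012)
print([n.properties["displayName"] for n in g.out(1, "FRIEND_OF")])  # ['Bob']
```

Keeping each node's outgoing relationships next to it means a hop is a direct lookup rather than a join, which is the intuition behind graph databases doing well on friendship-style queries.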

10 Comparison of Neo4j to other DBs (existing work)
Neo4j and MySQL:
– Neo4j retrieved results faster than relational databases
– More flexible than MySQL
– Query times were 2-5 times lower than MySQL's for their 500-object data set
– Neo4j performed better than SQL on structural queries
– Neo4j was slower than MySQL on integer data
– Hence Neo4j is used for queries such as friendships, movie favorites and more complicated commercial queries
Neo4j and other graph databases:
– Data used for testing performance: 1k, 32k and 1M nodes, with 9k to 8.4 million relationships
– Jena and HypergraphDB were not able to load the database within the specified time
– DEX and Neo4j were able to load the largest benchmark sizes
– Jena could load the graph with 1M nodes faster than Neo4j, but it could not scale
– Neo4j is faster than DEX on the large dataset; the reverse holds for the small dataset
– DEX scales better, whereas Neo4j achieved good throughput

11 Setup
– Apache Shindig 2.5, for hosting OpenSocial applications
– Neo4j has a native Java API for retrieving and traversing nodes and relationships
– This API is directly accessible when Neo4j runs in embedded mode
– A RESTful (REST: Representational State Transfer) web service interface
– Several wrappers for various programming languages, such as Python and Java
– Cypher is used for all CRUD (create, read, update and delete) operations
– Gremlin supports both imperative and declarative querying
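With the REST interface, every query travels over HTTP. A hedged sketch with Python's standard urllib that only builds the request; it assumes the legacy Cypher endpoint `/db/data/cypher` of Neo4j 1.x/2.x-era servers on `localhost:7474`, which accepts a JSON body with `query` and `params` (newer Neo4j versions expose a different HTTP API, so treat the URL as an assumption).

```python
import json
import urllib.request

# Assumption: legacy Cypher REST endpoint of a Neo4j 1.x/2.x server.
CYPHER_URL = "http://localhost:7474/db/data/cypher"


def build_cypher_request(query, params, url=CYPHER_URL):
    """Wrap a parameterised Cypher query in a JSON POST request (not sent)."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )


req = build_cypher_request(
    "START person=node:people(id = {id}) RETURN person", {"id": 42})
print(req.get_method(), req.full_url)
# Sending it requires a running server, e.g.:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Passing the id as a parameter instead of splicing it into the query string keeps the query text constant. Each call is still a network round trip plus JSON (de)serialisation, which is one plausible reason the REST suites trail the embedded ones in the results.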

12 Data used for testing
– 2011 people, 26,982 messages, 24,365 activities, 2000 addresses, 200 groups and 100 organizations
– Each person had at least 1 and at most 667 friends, out of 25,0000 friendship relationships in total
– A bigger dataset was also tested: 10,003 people with 137,000 friendships in total and a maximum of 1,448 friends for one person

13 Suites used for testing
– Neo4j embedded
– Neo4j REST
– Neo4j Cypher embedded
– Neo4j Cypher REST
– Neo4j Gremlin REST
– MySQL JPA
These suites retrieve profiles, friends, group recommendations and other social networking features.
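The slides do not show the study's own benchmark harness. As a hedged sketch of how such suites can be timed, and of one common way to cope with run-to-run fluctuations, the following repeats each query, discards warm-up runs and reports the median and standard deviation. The function names and the two stand-in "suites" are hypothetical placeholders for real back-end calls.

```python
import statistics
import time


def time_suite(run_query, repetitions=10, warmup=2):
    """Run one query function repeatedly; return (median_ms, stdev_ms)."""
    if repetitions < 2:
        raise ValueError("need at least two repetitions to estimate spread")
    for _ in range(warmup):
        run_query()                      # warm caches before measuring
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), statistics.stdev(samples)


def compare_suites(suites, **kwargs):
    """Time every suite and list them from fastest to slowest median."""
    results = {name: time_suite(fn, **kwargs) for name, fn in suites.items()}
    return sorted(results.items(), key=lambda item: item[1][0])


# Hypothetical stand-ins for real back-end calls (e.g. an embedded Cypher
# query and the same query over REST).
suites = {
    "slow stand-in": lambda: time.sleep(0.005),
    "fast stand-in": lambda: sum(range(100)),
}
for name, (median_ms, stdev_ms) in compare_suites(suites, repetitions=5):
    print(f"{name}: median {median_ms:.2f} ms, stdev {stdev_ms:.2f} ms")
```

The median is less sensitive than the mean to occasional outliers such as garbage-collection pauses, and reporting the spread alongside it makes fluctuations visible instead of hiding them.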

14 Results 1: Comparison of query languages and native access
– Native object access: retrieval and traversal through a traversal framework; difficult to learn; several lines of code even for simple retrieval; comparable performance
– Cypher: declarative query language that handles all CRUD operations; easy to learn; simple and easy to understand; good for complex retrieval
– Gremlin: Groovy-based query language with a compact syntax; difficult to learn; compact but hard to understand; good for small retrievals
– SQL: Structured Query Language, simple to understand; easy to learn; several lines of code; slows down for complicated queries

15 Results 2: Gremlin vs. Cypher (friend suggestion for a person)
Cypher:
    START person=node:people(id = {id})
    MATCH person-[:FRIEND_OF]->friend-[:FRIEND_OF]->friend_of_friend
    WHERE NOT (friend_of_friend<-[:FRIEND_OF]-person)
    RETURN friend_of_friend, COUNT(*)
    ORDER BY COUNT(*) DESC
Gremlin:
    t = new Table(); x = [];
    g.idx('persons')[[id:id_param]].out('FRIEND_OF').fill(x);
    g.idx('persons')[[id:id_param]].out('FRIEND_OF').out('FRIEND_OF').dedup().except(x).id.as('ID').back(1).displayName.as('name').table(t,['ID','name']){it}{it}.iterate();
    t
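What the Cypher query computes can be spelled out imperatively, which also hints at why native access needs "several lines of code". A minimal Python sketch over a plain adjacency map (the `friend_of` dictionary and the ids are hypothetical): it follows two outgoing FRIEND_OF hops, skips existing friends, counts how many paths reach each candidate and sorts by that count. It adds an explicit check excluding the person themselves, which the Cypher query above does not state; the Gremlin variant, for its part, de-duplicates with dedup() instead of counting.

```python
from collections import Counter


def suggest_friends(friend_of, person):
    """friend_of maps a person id to the set of ids reached by an outgoing
    FRIEND_OF relationship. Returns (candidate, path_count), highest first."""
    direct = friend_of.get(person, set())
    counts = Counter()
    for friend in direct:                               # person-[:FRIEND_OF]->friend
        for candidate in friend_of.get(friend, set()):  # friend-[:FRIEND_OF]->fof
            # WHERE NOT (fof<-[:FRIEND_OF]-person), plus an extra self-check
            if candidate != person and candidate not in direct:
                counts[candidate] += 1                  # COUNT(*) per candidate
    # ORDER BY COUNT(*) DESC; ties broken by id so the output is deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


friend_of = {1: {2, 3}, 2: {1, 4}, 3: {4, 5}}
print(suggest_friends(friend_of, 1))  # [(4, 2), (5, 1)]
```

The nested loops are the "pattern" that Cypher's MATCH clause expresses declaratively; the query engine is free to choose how to execute it, while the imperative version fixes the traversal order by hand.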

16 Results 3: Gremlin vs. Cypher
– Friend queries (simple): Gremlin is a bit faster than Cypher
– People queries: Gremlin is slower than Cypher
– Message queries: Gremlin is on par with Cypher
– FOAF queries (complicated): Cypher is better than Gremlin
Gremlin is slower when complicated pattern matching is involved. For complex queries with many properties and relationships, Cypher outperformed Gremlin; Gremlin is better for simple cases.

17 Results 4: from the original paper
[Figure 1: query times for 2000 people (ms). Figure 2: Gremlin vs. Cypher (ms).]

18 Results 5
– An embedded instance is much faster than a DBMS accessed over the network
– Neo4j query languages outperform JPA for friend queries
– Remote access via REST is slower than embedded Neo4j native object access
– JPA vs. RESTful Cypher and Gremlin is very interesting: for the person profile, the JPA back-end performs as well as RESTful Cypher

19 Results 6
– Friend queries are more than one order of magnitude slower with JPA
– Neo4j showed constant performance when increasing from 2000 to 10,000 persons
– MySQL performance drops by a factor of 5 for people queries
– MySQL performance drops by a factor of 7-9 for people's friends queries
– The RESTful case is slower than JPA in most cases

20 Limitations
– The data used was realistic only to an extent
– Results always showed some fluctuations
– Because of these fluctuations, the results are not well suited for benchmarking or as a basis for further research
– Different Cypher queries were used for the embedded and the REST benchmarks
– Neo4j's default server settings were used
– Neo4j's advanced version with load balancing was not tested

21 Conclusion and future work
– Analyzed the performance and programming effort of different back-ends
– Compared a JPA back-end using MySQL with Cypher and Gremlin on Neo4j
– Neo4j with Cypher had better performance overall
– Gremlin performed better on simple queries
– Cypher performed better on complicated queries
– Neo4j is a good replacement for a traditional RDBMS for Web 2.0 applications
– Future work: implement and test an approach based on Spring Data Neo4j


