David Ostrovsky | Couchbase Who’s afraid of graphs?
The Seven Bridges of Königsberg Problem
Leonhard Euler
Devise a route through the city that crosses each bridge exactly once. Euler's paper on the problem, published in 1736, is regarded as the first paper on graph theory. Königsberg, Prussia, is today Kaliningrad, Russia.
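Euler's argument comes down to counting how many bridges touch each land mass: a connected graph has a walk that uses every edge exactly once only if zero or two nodes have an odd number of edges. The sketch below applies that check to the seven bridges. The labels A to D for the four land masses (C being the central island) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical labels: A = north bank, B = south bank,
# C = central island, D = eastern island. One tuple per bridge.
BRIDGES = [
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a walk using every edge exactly once can exist.

    Assumes the graph is connected; only the degree condition is checked.
    """
    return len(odd_degree_nodes(edges)) in (0, 2)
```

All four land masses have an odd number of bridges, so `has_euler_walk(BRIDGES)` is false and no such route exists.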
Graph Databases
Use nodes, edges and properties to store data. It is important to note that a graph database has:
  Native graph storage: the engine is built to handle graph data.
  Native graph processing capability, including index-free adjacency to facilitate traversals.
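To make the property model and index-free adjacency concrete, here is a minimal in-memory sketch. It is not how any particular engine stores data; the `Node`, `Edge`, `connect` and `neighbours` names are hypothetical. The point is that each node holds direct references to its own edges, so following a relationship needs no global index lookup.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    properties: dict = field(default_factory=dict)
    # Index-free adjacency: the node itself points at its outgoing edges.
    out_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


def connect(source, label, target, **properties):
    """Create an edge and attach it directly to its source node."""
    edge = Edge(label, source, target, properties)
    source.out_edges.append(edge)
    return edge


def neighbours(node, label):
    """Follow outgoing edges with the given label; no index is consulted."""
    return [edge.target for edge in node.out_edges if edge.label == label]
```

For example, after `connect(alice, "FRIEND", bob, since=2015)`, calling `neighbours(alice, "FRIEND")` walks only Alice's own edge list, so the cost depends on her number of edges rather than on the size of the graph.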
Use Cases For Graph Databases
  Social, of course.
  Recommendation systems (a logical extension of the social graph, or stand-alone: find all customers who bought a book that X customers liked, then find all books similar to that one, etc.)
  Managing interconnected datasets: networks, organization hierarchies, ACLs, in-game economies, etc.
  Geo-location and routing (think Waze or network routing).
Use cases for migrating from an RDBMS:
  Problems with JOIN performance
  A continually evolving dataset or open-ended business requirements
  A domain that is naturally suited to graph representation
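The recommendation use case is essentially a two-hop traversal: from a book to the customers who bought it, then on to the other books those customers bought. A rough sketch follows, using a plain dictionary in place of a graph database; the `PURCHASES` data and the `recommend` function are made up for illustration.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of books bought.
PURCHASES = {
    "ann": {"Dune", "Neuromancer"},
    "ben": {"Dune", "Neuromancer", "Snow Crash"},
    "cat": {"Dune", "Snow Crash"},
    "eve": {"Dune", "Snow Crash"},
    "dan": {"Emma"},
}


def recommend(book, purchases):
    """Rank other books by how many buyers of `book` also bought them."""
    counts = Counter()
    for books in purchases.values():
        if book in books:                  # hop 1: book -> customer
            counts.update(books - {book})  # hop 2: customer -> other books
    return counts.most_common()
```

In a relational schema the same question typically needs a self-join on the purchases table per hop, which is where the JOIN performance problems tend to show up as the traversal gets deeper.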
Meet the Players For comparison – MongoDB has a score of 330.47, Cassandra 124.21
Databases vs Frameworks
Databases: real-time queries; smaller datasets; standard NoSQL features (scaling, HA, etc.)
Frameworks: offline/batch processing; larger datasets; rely on a big data platform (usually Hadoop)
Frameworks:
  Giraph: Apache project, used by Facebook to power its graph search and process trillions of connections.
  GraphX: integrated with Apache Spark, has a library of built-in algorithms and ETL functionality. Doesn't perform as well as Giraph.
  Faunus (from the same team as Titan)
  GraphLab: open source graph toolkit.
Querying and Traversal
Cypher (Neo4j)
(a)-[:FRIEND]->(b)
SQL-Derivatives (OrientDB)
g.v(1).outE('friend').inV.name
// Starting with vertex 1,
// find the outgoing 'friend' edges,
// follow each one to the next vertex,
// and return the property 'name'.
Gremlin is the graph traversal language of Apache TinkerPop, which in turn is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
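The Gremlin traversal reads as a pipeline of steps, each feeding the next. As an illustration only, the same steps can be mimicked in plain Python. This is not the TinkerPop API; the `VERTICES` and `EDGES` data and the `out_e` and `in_v` helpers are hypothetical stand-ins.

```python
# Hypothetical toy graph: vertex id -> properties, plus (source, label, target) edges.
VERTICES = {
    1: {"name": "alice"},
    2: {"name": "bob"},
    3: {"name": "carol"},
    4: {"name": "dave"},
}
EDGES = [
    (1, "friend", 2),
    (1, "friend", 3),
    (1, "colleague", 4),
    (2, "friend", 4),
]


def out_e(vertex_id, label):
    """Outgoing edges of a vertex with the given label (like outE('friend'))."""
    return [e for e in EDGES if e[0] == vertex_id and e[1] == label]


def in_v(edges):
    """The vertices each edge points to (like inV)."""
    return [VERTICES[target] for _, _, target in edges]


# Equivalent in spirit to g.v(1).outE('friend').inV.name
friend_names = [v["name"] for v in in_v(out_e(1, "friend"))]
```

Here `friend_names` holds the names of vertex 1's friends; the 'colleague' edge is filtered out at the `out_e` step, just as `outE('friend')` would ignore it.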
Scaling Graphs is Hard
Most graph partitioning problems fall into the NP-hard category, which is the set of problems that are at least as hard as the hardest problems in NP. Some specialized graph partitioning problems are NP-complete. So unless P=NP, graph partitioning solutions will continue to rely on approximations and various statistical approaches.
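A small sketch of what "relying on approximations" can look like in practice: instead of searching every possible split, start from some split into two equal halves and keep swapping the pair of nodes that most reduces the number of edges crossing the partition. This is a simple local-search heuristic written for illustration (loosely in the spirit of swap-based refinement, not a specific published algorithm), and it can stop at a local optimum. The function names are assumptions.

```python
from itertools import product


def edge_cut(edges, part_a):
    """Count edges with exactly one endpoint inside part_a."""
    return sum((u in part_a) != (v in part_a) for u, v in edges)


def refine_bisection(edges, part_a, part_b):
    """Greedily swap node pairs between the halves while the cut shrinks."""
    part_a, part_b = set(part_a), set(part_b)
    while True:
        current = edge_cut(edges, part_a)
        best = None  # (cut, node_from_a, node_from_b) of the best swap found
        for a, b in product(sorted(part_a), sorted(part_b)):
            candidate = (part_a - {a}) | {b}
            cut = edge_cut(edges, candidate)
            if cut < current and (best is None or cut < best[0]):
                best = (cut, a, b)
        if best is None:
            # No single swap improves the cut: a local optimum, not
            # necessarily the best possible partition.
            return part_a, part_b
        _, a, b = best
        part_a = (part_a - {a}) | {b}
        part_b = (part_b - {b}) | {a}


# Two triangles joined by a single edge (2-3).
TRIANGLES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
```

Starting from the poor split `{0, 1, 3}` / `{2, 4, 5}`, one swap is enough to separate the two triangles. On real graphs the starting split and the order of swaps matter, which is why production partitioners layer further heuristics on top.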
Neo4j Clustering Architecture
Polyglot Persistence To the Rescue