Graph Databases and Java
Chuck Olson, Software Engineer
October 2015
Outline
- Assumptions
- What is a graph, and what are graphs good for?
- What is a graph database?
- What is Neo4j, and how does one use it?
- Case: Subway Model
- Results
- Compilation
- Questions
Audience Assumptions
Working knowledge of:
- Java
- Relational databases
What is a graph?
- A collection of nodes and edges
- Edges can be directed (or not)
- Edges can represent many things
[Figure: example graph with nodes Chuck, Jim, Jay, Gary, and Coco, and edges labeled "Knows" and "Annoys"]
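The definition above can be sketched in plain Java as an adjacency list whose edges carry a label. This is only an illustration: the class name SimpleGraph and its methods are hypothetical, and the specific edges between the people are invented, since the connections in the original figure are not recoverable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal directed graph with labeled edges, stored as adjacency lists. */
class SimpleGraph {
    /** A directed edge: the target node plus a label naming the relationship. */
    static final class Edge {
        final String to;
        final String label;
        Edge(String to, String label) { this.to = to; this.label = label; }
    }

    private final Map<String, List<Edge>> adjacency = new LinkedHashMap<>();

    void addNode(String name) {
        adjacency.putIfAbsent(name, new ArrayList<>());
    }

    /** Adds a directed edge; an undirected edge can be modeled as two directed ones. */
    void addEdge(String from, String to, String label) {
        addNode(from);
        addNode(to);
        adjacency.get(from).add(new Edge(to, label));
    }

    /** Returns the nodes that 'from' points to over edges with the given label. */
    List<String> neighbors(String from, String label) {
        List<String> result = new ArrayList<>();
        for (Edge e : adjacency.getOrDefault(from, Collections.<Edge>emptyList())) {
            if (e.label.equals(label)) {
                result.add(e.to);
            }
        }
        return result;
    }

    int nodeCount() {
        return adjacency.size();
    }

    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph();
        // Hypothetical edges, invented for illustration only.
        g.addEdge("Chuck", "Jim", "Knows");
        g.addEdge("Chuck", "Jay", "Knows");
        g.addEdge("Jim", "Gary", "Knows");
        g.addEdge("Coco", "Chuck", "Annoys");
        System.out.println("Chuck knows: " + g.neighbors("Chuck", "Knows"));
        System.out.println("Coco annoys: " + g.neighbors("Coco", "Annoys"));
    }
}
```

Storing the label on the edge rather than on the node is what lets the same pair of nodes be connected by several different kinds of relationship.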
Transportation Example
[Figure: graph of cities (Los Angeles, Denver, Dallas, Chicago, New York) whose edges are labeled with times: 12:00, 13:00, 15:00, 16:00, 17:00, 18:00, 20:00]
What are graphs good for?
- They often map more directly to the structure of some object-oriented problems.
- They work best for storing “richly connected” data.
- Many algorithms exist to extract useful information:
    – Dijkstra’s shortest path
    – Minimum spanning tree (Kruskal and others)
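As one example of the algorithms listed above, here is a minimal sketch of Dijkstra's shortest path in plain Java. The class ShortestPath and its methods are hypothetical names, and the edge weights between the cities are invented for illustration; they are not taken from the transportation figure.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Dijkstra's shortest path over an undirected graph with non-negative edge weights. */
class ShortestPath {
    /** A node paired with the total weight of one path that reaches it. */
    private static final class QueueItem {
        final String node;
        final int distance;
        QueueItem(String node, int distance) { this.node = node; this.distance = distance; }
    }

    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    /** Adds an undirected edge with the given weight. */
    void connect(String a, String b, int weight) {
        edges.computeIfAbsent(a, k -> new HashMap<>()).put(b, weight);
        edges.computeIfAbsent(b, k -> new HashMap<>()).put(a, weight);
    }

    /** Returns the minimum total weight from source to every reachable node. */
    Map<String, Integer> distancesFrom(String source) {
        Map<String, Integer> settled = new HashMap<>();
        PriorityQueue<QueueItem> queue =
                new PriorityQueue<>(Comparator.comparingInt((QueueItem q) -> q.distance));
        queue.add(new QueueItem(source, 0));
        while (!queue.isEmpty()) {
            QueueItem current = queue.poll();
            if (settled.containsKey(current.node)) {
                continue; // a path at least as short was already settled for this node
            }
            settled.put(current.node, current.distance);
            Map<String, Integer> outgoing =
                    edges.getOrDefault(current.node, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> edge : outgoing.entrySet()) {
                if (!settled.containsKey(edge.getKey())) {
                    queue.add(new QueueItem(edge.getKey(), current.distance + edge.getValue()));
                }
            }
        }
        return settled;
    }

    public static void main(String[] args) {
        ShortestPath graph = new ShortestPath();
        // Hypothetical travel times in hours, invented for illustration only.
        graph.connect("Los Angeles", "Denver", 2);
        graph.connect("Denver", "Chicago", 2);
        graph.connect("Chicago", "New York", 2);
        graph.connect("Los Angeles", "Dallas", 3);
        graph.connect("Dallas", "New York", 4);
        graph.connect("Denver", "Dallas", 2);
        System.out.println(graph.distancesFrom("Los Angeles"));
    }
}
```

The priority queue may hold several entries for the same node; the first one polled is the shortest, and later ones are skipped.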
What is a graph database?
A NoSQL database that stores nodes and edges and provides a mechanism for querying them easily.
- Can contain nodes of different types
- Can have free-form attributes within nodes
- Can have edges (relationships) of different types
- Can have attributes attached to edges (distance, cost, relationship)
- Provides a query mechanism
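The features above can be modeled in a few lines of plain Java. The sketch below is an in-memory illustration only, not how any graph database is implemented: PropertyGraph, find, and outgoing are hypothetical names, and the property values are sample data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of typed nodes and edges that carry free-form attributes. */
class PropertyGraph {
    static final class Node {
        final String label; // node type, e.g. "Station"
        final Map<String, Object> properties = new HashMap<>();
        Node(String label) { this.label = label; }
    }

    static final class Edge {
        final Node from;
        final Node to;
        final String type; // edge type, e.g. "ROUTE_TO"
        final Map<String, Object> properties = new HashMap<>();
        Edge(Node from, Node to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    Node createNode(String label) {
        Node node = new Node(label);
        nodes.add(node);
        return node;
    }

    Edge createEdge(Node from, Node to, String type) {
        Edge edge = new Edge(from, to, type);
        edges.add(edge);
        return edge;
    }

    /** Stand-in for a query: nodes with the given label whose property equals value. */
    List<Node> find(String label, String key, Object value) {
        List<Node> matches = new ArrayList<>();
        for (Node n : nodes) {
            if (n.label.equals(label) && value.equals(n.properties.get(key))) {
                matches.add(n);
            }
        }
        return matches;
    }

    /** Edges of the given type that leave the given node. */
    List<Edge> outgoing(Node from, String type) {
        List<Edge> matches = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == from && e.type.equals(type)) {
                matches.add(e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PropertyGraph graph = new PropertyGraph();
        Node a = graph.createNode("Station");
        a.properties.put("number", "100");
        Node b = graph.createNode("Station");
        b.properties.put("number", "101");
        Edge route = graph.createEdge(a, b, "ROUTE_TO");
        route.properties.put("line", "Red");
        System.out.println(graph.find("Station", "number", "101").size());
    }
}
```

A real graph database adds persistence, indexing, transactions, and a query language on top of this basic shape.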
Why would I ever use one?
- Some problems are easier to solve when the data is framed as a graph.
- “The right tool for the job”
Neo4j
- Open source (GPLv3 for Community Edition)
- V1.0 released in 2010
- Written in Java and Scala
- Managed by Neo Technology
- Uses the Property Graph Model
- Embedded or server
- Fully transactional
- Set of jar files (~30 MB)
- Query language: Cypher
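How do you use Neo4j?
Creating a database

    // Imports assume the Neo4j 2.x embedded API
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.factory.GraphDatabaseBuilder;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    // Location of database
    String dbPath = "/Users/chuck/myneodb";
    GraphDatabaseFactory factory = new GraphDatabaseFactory();
    GraphDatabaseBuilder builder = factory.newEmbeddedDatabaseBuilder(dbPath);
    GraphDatabaseService dbService = builder.newGraphDatabase();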
How do you use Neo4j?
Creating fixed node and edge types

    // Node types
    public enum NodeLabel implements Label { Station }

    // Relationship types
    public enum RelType implements RelationshipType { TRACKS_TO, ROUTE_TO, AIRWAY_TO }
How do you use Neo4j?
Adding nodes to a database

    // Note: with the Neo4j 2.x embedded API, run this inside a transaction
    // opened with dbService.beginTx().

    // Create Station node
    Node node1 = dbService.createNode(NodeLabel.Station);
    // Set properties on the Station
    node1.setProperty("number", "100");
    node1.setProperty("name", "State St");

    // Add another
    Node node2 = dbService.createNode(NodeLabel.Station);
    node2.setProperty("number", "101");
    node2.setProperty("name", "Lake St");
How do you use Neo4j?
Adding edges to a database

    // Note: with the Neo4j 2.x embedded API, run this inside a transaction
    // opened with dbService.beginTx().

    // Create edge from node1 to node2
    Relationship edge = node1.createRelationshipTo(node2, RelType.ROUTE_TO);
    // Set props on the edge
    edge.setProperty("route", "State St Subway");
    edge.setProperty("line", "Red");

    // Create another edge of a different type.
    edge = node1.createRelationshipTo(node2, RelType.TRACKS_TO);
How do you use Neo4j?
Querying the database

    // Returns station numbers of all stations in graph.
    String queryText = "MATCH (stn:Station) RETURN stn.number";
    ExecutionEngine engine = new ExecutionEngine(dbService);
    ExecutionResult result = engine.execute(queryText);
    Iterator<Object> stnIt = result.columnAs("stn.number");

    // Print results
    while (stnIt.hasNext()) {
        System.out.println(stnIt.next());
    }
Case: Studying Subways
Case: Studying Subways
Questions we might want to ask:
- “Find all the stations that have air connectivity paths to station X that are less than K km”
- “Find all the train routes that go through all stations that are N stops from station X”
Case: Studying Subways
Relational attempt…

Stations
- Number (id)
- Name
- Lat
- Lon

Segments
- SegmentId (id)
- StationFromNumber
- StationToNumber
- Length
- SegmentType

SegmentTypes
- SegmentTypeId (id)
- TypeName

LineSegments
- LineId (id)
- SegmentId
- SegmentIndex

Lines
- LineId (id)
- LineName
- LineDirection
Case: Studying Subways
Graph attempt…
[Figure: graph model with a Station node (Station Name, Station Number) and three edge types: ROUTE_TO, TRACKS_TO, and AIRWAY_TO. Edge attributes shown include Route Name and Distance.]
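Case: Studying Subways
[Figure: example network of six stations: St. Paul’s (100), Bank (101), Cannon Street (200), Monument (201), Tower Hill (202), and Tower Gateway (300). Edges are labeled with line colors (Red, Green, Yellow, White/Blue) and lengths (1 km, .7 km, .2 km, 1.8 km, .1 km).]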
Case: Studying Subways
Answering a question: the query below returns all stations exactly 2 track segments from station 200 (Cannon Street)

    MATCH p=(fromStn:Station)-[edge:TRACKS_TO*2..2]-(toStn:Station {number:'200'})
    WHERE fromStn.number <> toStn.number
    RETURN DISTINCT fromStn, toStn, fromStn.number
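For comparison, roughly the same question can be answered by hand in plain Java. The sketch below walks undirected track segments for an exact number of hops, never reusing a segment within one path and excluding the starting station, which is how the Cypher pattern and WHERE clause above are assumed to behave. TrackHops and its methods are hypothetical names, and the track layout in main is invented, since the connectivity in the network figure is not recoverable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Finds stations that are an exact number of track segments away from a start station. */
class TrackHops {
    /** One end of an undirected segment: the neighboring station and the segment's id. */
    private static final class Hop {
        final String station;
        final int segmentId;
        Hop(String station, int segmentId) { this.station = station; this.segmentId = segmentId; }
    }

    private final Map<String, List<Hop>> adjacency = new HashMap<>();
    private int nextSegmentId = 0;

    /** Adds an undirected track segment between two stations. */
    void addTrack(String a, String b) {
        int id = nextSegmentId++;
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(new Hop(b, id));
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(new Hop(a, id));
    }

    /** Stations reached by paths of exactly 'hops' segments from 'start', excluding 'start'. */
    Set<String> stationsAtHops(String start, int hops) {
        Set<String> found = new TreeSet<>();
        walk(start, start, hops, new HashSet<Integer>(), found);
        return found;
    }

    private void walk(String start, String current, int remaining,
                      Set<Integer> usedSegments, Set<String> found) {
        if (remaining == 0) {
            if (!current.equals(start)) {
                found.add(current);
            }
            return;
        }
        for (Hop hop : adjacency.getOrDefault(current, Collections.<Hop>emptyList())) {
            if (usedSegments.add(hop.segmentId)) { // skip segments already on this path
                walk(start, hop.station, remaining - 1, usedSegments, found);
                usedSegments.remove(hop.segmentId);
            }
        }
    }

    public static void main(String[] args) {
        TrackHops tracks = new TrackHops();
        // Hypothetical track layout, invented for illustration only.
        tracks.addTrack("100", "101");
        tracks.addTrack("101", "200");
        tracks.addTrack("200", "201");
        tracks.addTrack("201", "202");
        tracks.addTrack("202", "300");
        System.out.println(tracks.stationsAtHops("200", 2));
    }
}
```

The hand-written traversal takes about fifty lines, where the query expresses the same intent in three.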
Drawbacks
- No standard query language like SQL; query languages are vendor-specific.
- Query language learning curve.
- Lack of built-in visualization tools.
For Further Reading…
- Ian Robinson, Jim Webber, and Emil Eifrem. Graph Databases, 2nd Edition. O’Reilly Media.
- Rik Van Bruggen. Learning Neo4j. Packt Publishing.
Questions