A gentle introduction to graph databases

Presentation transcript:

A gentle introduction to graph databases. Michael Green.

Michael Green. Slides available afterwards. DBA.SE chat.

Plan of Attack: what graphs are; databases, graph versus relational; SQL Server’s functionality (demo); where it might end up.

What are Graphs: “In a mathematician's terminology, a graph is a collection of points and lines connecting some ... of them.” [1] Graph databases have a mathematical basis, as do relational databases. Point = node = vertex. Line = edge = arc. Points need not be connected. Edges connect exactly two nodes. [1] http://mathworld.wolfram.com/Graph.html

Graph Features: A node need not be connected. There is no upper limit on how many other nodes a node can connect to. An edge connects exactly two nodes.
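
The three rules on this slide can be sketched in a few lines of Python. This is an illustration only, not code from the talk; the class name `Graph` and its methods are hypothetical.

```python
class Graph:
    """A minimal undirected graph: a set of nodes plus a list of two-node edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = []  # each edge is a (node_a, node_b) pair

    def add_node(self, node):
        # A node may exist without any edge attached to it.
        self.nodes.add(node)

    def add_edge(self, a, b):
        # An edge connects exactly two nodes, both of which must already exist.
        if a not in self.nodes or b not in self.nodes:
            raise ValueError("both endpoints must be nodes in the graph")
        self.edges.append((a, b))

    def degree(self, node):
        # Count edge endpoints at this node; nothing caps how high this can go.
        return sum((a == node) + (b == node) for a, b in self.edges)


g = Graph()
for name in ["A", "B", "C", "D"]:
    g.add_node(name)
g.add_edge("A", "B")
g.add_edge("A", "C")

print(g.degree("A"))  # 2
print(g.degree("D"))  # 0: D is a node but is not connected to anything
```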

Graph Features (some terminology): directed / undirected; cyclic / acyclic; property graphs (weighted); connected / discrete components; simple / multigraph (a simple graph has at most one edge between two nodes); self-connected nodes; labelled. Directed: *I* sent an email *to* you, so the edge has a direction. Labels: multiple labels are OK.
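
To make "cyclic / acyclic" and "self-connected" concrete, here is a small sketch that tests whether a directed graph contains a cycle, using a depth-first search. The function name `has_cycle` and the sample graphs are invented for illustration; every successor is assumed to appear as a key in the dictionary.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> list of successors) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # reached a node already on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


acyclic = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
self_connected = {"a": ["a"]}  # a node with an edge to itself

print(has_cycle(acyclic))         # False
print(has_cycle(cyclic))          # True
print(has_cycle(self_connected))  # True
```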

A simple example: nodes Michael (Person), SQL Saturday (Event) and SQL Server (Product); edges Presenting at, Is about and Works with. This is a directed graph. Edges are directed: there is a “from” and a “to”. Both nodes and edges can have properties (the name) and labels. Each node has an in-degree and an out-degree. A node can be disconnected: in-degree = out-degree = zero. This one is cyclic.
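
The example can be written down as plain Python data to show labels, properties, and in/out degree. The slide's diagram is not recoverable from the transcript, so the edge directions below are an assumption, and the dictionary keys are hypothetical.

```python
from collections import Counter

# Nodes: key -> label and properties.
nodes = {
    "michael": {"label": "Person", "name": "Michael"},
    "sqlsat": {"label": "Event", "name": "SQL Saturday"},
    "sqlserver": {"label": "Product", "name": "SQL Server"},
}

# Directed edges: (from, to, label). Directions are assumed, not taken from the slide.
edges = [
    ("michael", "sqlsat", "Presenting at"),
    ("sqlsat", "sqlserver", "Is about"),
    ("michael", "sqlserver", "Works with"),
]

# Out-degree counts edges leaving a node; in-degree counts edges arriving at it.
out_degree = Counter(src for src, _, _ in edges)
in_degree = Counter(dst for _, dst, _ in edges)

for key, props in nodes.items():
    print(props["name"], "in:", in_degree[key], "out:", out_degree[key])
```

A `Counter` returns zero for a key it has never seen, which is exactly the "disconnected node" case: in-degree and out-degree both zero.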

Example: Tree. Directed / undirected; cyclic / acyclic; property graphs (weighted); connected / discrete components; simple / multigraph (at most one edge between two nodes); self-connected nodes. Examples: an organisation’s org chart, a B-Tree, a query plan. An ERD is a graph but not a tree.

Example: Roads. Directed / undirected: one-way streets. Cyclic / acyclic. Property graphs (weighted): speed limits, tolls. Connected / discrete components (Tasmania?). Simple / multigraph (at most one edge between two nodes). Self-connected nodes.
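
A road network shows why direction and weights matter. The sketch below uses Dijkstra's algorithm to find the cheapest route over a directed, weighted graph. The junction names and costs are made up, and `shortest_cost` is a hypothetical helper rather than anything from the talk.

```python
import heapq

# Directed, weighted edges: roads[from][to] = travel cost (for example, minutes).
roads = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 4},
    "C": {"B": 1, "D": 8},  # C -> B is one-way: there is no B -> C entry
    "D": {},
}


def shortest_cost(graph, start, goal):
    """Dijkstra's algorithm: lowest total weight from start to goal, or None."""
    queue = [(0, start)]
    best = {start: 0}
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # goal is not reachable from start


print(shortest_cost(roads, "A", "D"))  # 7, via A -> C -> B -> D
print(shortest_cost(roads, "D", "A"))  # None: every road out of A is one-way
```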

Directed? Service versus line. Weighted (journey time). Cyclic: mostly a tree except for the loop (the clue is in the name) and a few others. Connected: we are not interested in stations to which we cannot take a train.

The internet in 2003 - http://www.opte.org/the-internet/ Directed, unweighted, cyclic, multigraph, self-connected.

What are Graph Databases: “A database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.” [1] If the DBMS presents its interface as nodes & edges, it is a graph database. [1] https://en.wikipedia.org/wiki/Graph_database

What are Graph Databases: A node is the “thing”. An edge is how things are connected. Both can have properties. Edges and nodes are “labelled”, i.e. enumerable, i.e. they can have a PK.

What are Graph Databases: It’s about how the DBMS interfaces with consumers. Many internal representations are possible. On-disk storage is not a determinant. Key-value, relational and graph can solve any given problem. In-memory, columnstore, fixedvar: they’re all relational.

Graph on the persistence spectrum: flat file, key-value, column family, relational, graph.

Thinking in graphs. Files & key-value: rows & fields. Relational: sets, selections & projections. Graph: sets & paths.

Graph DB Features (some terminology): directed / undirected; cyclic / acyclic; property graphs (weighted); connected / discrete components; simple / multigraph (at most one edge between two nodes); self-connected nodes. Specifically, graph DBs are directed.


Graph versus Relational. Relational: entity type -> table; entity instance -> row; relationship -> FK; normalisation; DRI. Graph: entity type -> ? (label); entity instance -> node; relationship -> edge; multi-role permitted; not mandated. The model enforces no container that corresponds to a table. Products are moving toward stronger schema. Labels take the role of defining types. Edges can have properties; foreign keys cannot. No limit on which nodes an edge can connect (cf. joining on non-FKs, e.g. shoe_size <-> house_number <-> description).
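
The mapping can be illustrated by holding the same fact ("Ann works for Acme") in both shapes. This is a sketch in plain Python data, not a real DBMS. All names and values are invented, and the `since` property is hypothetical, there to show that an edge can carry data where a foreign key cannot.

```python
# Relational shape: the relationship is a foreign-key column on the row.
employees = [{"id": 1, "name": "Ann", "employer_id": 10}]
companies = [{"id": 10, "name": "Acme"}]

# Graph shape: the relationship is its own object and can carry properties.
nodes = {
    1: {"labels": {"Person", "Employee"}, "name": "Ann"},  # several labels (roles) at once
    10: {"labels": {"Company"}, "name": "Acme"},
}
edges = [
    {"from": 1, "to": 10, "label": "WORKS_FOR", "since": 2015},
]

# Relational lookup: join on the foreign key.
company_by_id = {c["id"]: c for c in companies}
relational_answer = [
    (e["name"], company_by_id[e["employer_id"]]["name"]) for e in employees
]

# Graph lookup: follow the edges that carry a given label.
graph_answer = [
    (nodes[e["from"]]["name"], nodes[e["to"]]["name"], e["since"])
    for e in edges
    if e["label"] == "WORKS_FOR"
]

print(relational_answer)  # [('Ann', 'Acme')]
print(graph_answer)       # [('Ann', 'Acme', 2015)]
```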

Use Cases: Where connectedness is as important as, or more important than, content. Social: friends of friends (of friends of friends ...), especially to indeterminate depth. Fraud detection: “Is X connected to failed companies?”, “Are the parties in this transaction suspiciously connected?” Network modelling: “If this router goes down, what services are lost?” Code dependency analysis: “If I change this data type, where must I re-program?” Crime detection: metadata. Many other examples.
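
The "friends of friends to indeterminate depth" case is a breadth-first traversal. The sketch below runs it over an invented friendship graph; the function name `within` and its `max_depth` parameter are hypothetical. The same traversal answers the router and code-dependency questions if the edges mean "depends on" instead of "is a friend of".

```python
from collections import deque

# Undirected friendships stored as adjacency lists (sample data, made up).
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}


def within(graph, start, max_depth=None):
    """Return {person: hops} for everyone reachable from start.

    max_depth limits the number of hops; None means unlimited depth.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if max_depth is not None and distance[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the start person is not their own friend
    return distance


print(within(friends, "ann", max_depth=2))  # {'bob': 1, 'cat': 2}
print(within(friends, "ann"))               # {'bob': 1, 'cat': 2, 'dan': 3}
```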

SQL Server Demo

Alternatives: MS GraphEngine. MS Azure CosmosDB. Many other vendors: see DB-Engines, Wikipedia. Neo4j, with the Cypher query language (http://neo4j.com/docs/cypher-refcard/current/). DataStax: Graph on Cassandra, with the Gremlin programming API (http://tinkerpop.apache.org/). NodeXL for MS Excel. Cypher versus Gremlin is like SQL versus LINQ.

Neo4j Example

Where it might end up: RC1 release blog [1]. ALTER existing tables to graph tables. Extended to temporary, in-memory etc. Transitive closure. Polymorphism. Improved syntax, as for joins. [1] https://blogs.technet.microsoft.com/dataplatforminsider/2017/04/20/graph-data-processing-with-sql-server-2017/
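
For readers unfamiliar with the term, transitive closure means adding an edge from X to Z wherever Z can be reached from X in one or more hops. The sketch below computes it with Warshall's algorithm over a made-up edge list. It only illustrates the concept and says nothing about how SQL Server might implement it; `transitive_closure` is a hypothetical name.

```python
def transitive_closure(edges):
    """Return the set of (x, z) pairs such that z is reachable from x.

    edges is an iterable of directed (from, to) pairs. Uses Warshall's algorithm.
    """
    nodes = {n for edge in edges for n in edge}
    reach = set(edges)
    for k in nodes:  # allow k as an intermediate stop
        for i in nodes:
            for j in nodes:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach


chain = [("a", "b"), ("b", "c"), ("c", "d")]
closure = transitive_closure(chain)

print(sorted(closure))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```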

Where it might end up: GPUs (node-wise parallel execution). R / Python / data science & AI. Visualisation in SSMS, SSRS. SSAS: OLAP graphs. LINQ to SQL Graph. Path analytic functions. SSMS, SSRS present geography results differently.

Some links:
Graph processing with SQL Server: https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview
Graph version of Wide World Importers: https://github.com/Microsoft/sql-server-samples/tree/master/samples/features/sql-graph
Graph Engine: https://www.graphengine.io/
Azure Cosmos DB: https://docs.microsoft.com/en-us/azure/cosmos-db/