Joe Sack, Principal Program Manager, Microsoft

Joe.Sack@Microsoft.com
SQLintersection: Graph Database Processing with Azure SQL DB and SQL Server 2017

Session Objectives
- Understand graph scenarios and when they may be useful
- Learn about what’s being offered in SQL Server 2017 and Azure SQL Database
- Learn enough to be able to get started quickly on your own
- Questions, feedback, or issues we don’t get to? Contact me by email

SQL Server 2017 Themes
- Adaptability: adapt based on customer workload characteristics
- Choice: provide customers with a choice
- Mindshare: leverage the strength of strong technical communities

Ingredient Recall Scenario

Understanding the impact

Relational vs. Graph
- Graph and relational designs can answer the same questions
- But if traversal of relationships defines the primary application requirements, graph can solve this more intuitively and with less code

Define: Graph
- Nodes: entities, for example stores, people, products, businesses
- Edges: relationships between nodes, lines that connect nodes to other nodes
- Properties or attributes: information associated with specific nodes and edges
- Graph: a collection of nodes and edges (or entities and relationships)

What is a graph database?
- Edges or relationships are first-class entities in a graph database and can have attributes or properties associated with them
- A single edge can flexibly connect multiple nodes in a graph database
- You can express pattern matching and multi-hop navigation queries easily
- Supports OLTP and OLAP (analytics) just like SQL databases

Why Graph Databases?
- Manage hierarchical or interconnected data and entities with multiple parents.
- Handle complex many-to-many relationships, and organically grow connections as the business evolves.
- Analyze interconnected data, materialize new information from existing facts, and identify non-obvious connections.

[Diagrams: example graphs of nodes A through F and organization O, connected by Manages, Leads, works_for, Location, and collaborates edges.]

Section 1: Hierarchical data is somewhat complex to implement in relational databases, because entities might have bi-directional links or multiple parents. For example, node F has two parents: node D, who manages and leads F, and node A, who leads F on another project that F is working on.

Section 2: Your schema may contain complex many-to-many relationships, and those relationships can evolve as the business grows. For example, you have managers and leads in your organization. Now you also want to find out which people or teams are collaborating with each other on projects, so you want to add another relationship, “collaborates”. In a relational database, you have to modify the existing schema to keep that information, which can be time-consuming depending on the size of your schema and may complicate your queries later on. In a graph database, you can add new node or edge types at any time, so your schema evolves organically with growing business needs.

Section 3: Graph databases help you derive conclusions from existing information. For example, we know that A -manages-> B and A -works_for-> organization O, so we can conclude that B works for O. In a large data set, graph databases can help you identify non-obvious connections between different entities.

Graph Scenarios
- Recommendation engines
- Social networks
- Networks and IT infrastructure topologies
- Fraud detection
- Product catalog with sales and marketing data
- IoT device telemetry

SQL Server 2017 Support for graph data
- Native node and edge table support
- Query language extension provides multi-hop navigation using join-free pattern matching
- Query across regular SQL tables and graph data
- Interoperability, for example support for Columnstore indexes

Why Extend SQL Server to support Graphs?
- Trusted by many customers for enterprise and mission-critical applications
- Mature product that supports many advanced technologies such as high availability and disaster recovery
- Also comes with cutting-edge technologies like Columnstore, advanced analytics, and ML
- One platform, with no need to extract, transform, and load data into another system to analyze relationships

Recommendation Engine
Scenario examples
- Product sales data: if a person P bought product A, find friends of P who bought A and also bought other products. Make a recommendation based on that.
- Social network: in a Yelp-like application, find friends who like the same restaurant that I like, and also recommend other restaurants.

[Diagram: Friends, location, and Product nodes connected by hasBought, isLooking, and sellsAt edges.]

Recommending Songs: Approach
[Diagram: users User1 through User5 linked to songs Song1 through Song5.]

The Million Song Dataset
Data courtesy of the Million Song Dataset and the Echo Nest Taste Profile Subset.
- Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
- The Echo Nest Taste Profile Subset, the official user data collection for the Million Song Dataset.

Implementation using SQL Graph
- UniqueUser (node table)
- UniqueSong (node table)
- Likes (edge table)

CREATE TABLE UniqueUser (UserId VARCHAR(80)) AS NODE;
CREATE TABLE UniqueSong (SongId VARCHAR(50), SongTitle VARCHAR(500), ArtistName VARCHAR(500)) AS NODE;
CREATE TABLE Likes (ListenCount BIGINT) AS EDGE;

Demo
SQL Graph (special thanks to Arvind Shyamsundar from the SQLCAT team!)

Storing edges and nodes in tables
- Natural choice for us (no need to re-invent)
- Storing edge data in a separate table allows us to benefit from the query optimizer, which can pick the optimal join strategy for large queries
- Depending on the complexity of the query and the data statistics, the optimizer can pick a nested loop join, hash join, or other join strategy

$node_id
- Node tables have an implicit $node_id column that uniquely identifies a given node in the database
- The value is a combination of the object_id of the node table and an internally generated bigint value
- When the $node_id column is selected, a computed value in the form of a JSON string is displayed
- $node_id is a pseudo-column that maps to an internal name with a hex string in it. When you select $node_id from the table, the column name appears as $node_id_<hex_string>
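To see the pseudo-column in practice, a minimal sketch along these lines may help. It assumes the UniqueUser node table from the implementation slide exists; the sample value 'user-001' is hypothetical.

```sql
-- Assumes the UniqueUser node table from the implementation slide.
-- 'user-001' is a hypothetical sample value.
INSERT INTO UniqueUser (UserId) VALUES ('user-001');

-- $node_id is not supplied on insert; it is generated automatically.
-- Selecting it returns a computed JSON string, and the result column
-- header shows the internal name ($node_id_<hex_string>).
SELECT $node_id, UserId
FROM UniqueUser
WHERE UserId = 'user-001';
```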

Edge Columns

Metadata
- sys.tables: is_node and is_edge
- sys.columns: graph_type and graph_type_desc, which show graph column types like “from ID” and “to ID”
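The catalog columns named on this slide can be queried directly. The following is a sketch rather than a canonical script; the column aliases are illustrative choices.

```sql
-- List every node and edge table in the current database.
SELECT name, is_node, is_edge
FROM sys.tables
WHERE is_node = 1 OR is_edge = 1;

-- List the graph-related columns (including hidden internal ones)
-- together with their graph column type description.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name                   AS column_name,
       c.graph_type,
       c.graph_type_desc
FROM sys.columns AS c
WHERE c.graph_type IS NOT NULL
ORDER BY table_name, c.column_id;
```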

Indexing
- Nodes: we create a unique, non-clustered index on $node_id by default
- Edges: we automatically create a unique, non-clustered index on $edge_id
- For OLTP scenarios, we recommend that users create indexes on the $from_id and $to_id columns for faster lookups in the direction of the edge
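The OLTP recommendation could look roughly like this on the Likes edge table from the implementation slide. The index names are hypothetical, and the second, reverse-order index is an assumption for workloads that also traverse edges backwards; the slide only recommends indexing in the direction of the edge.

```sql
-- Assumes the Likes edge table from the implementation slide.
-- Supports lookups that start at the source node and follow the edge.
CREATE INDEX IX_Likes_From_To ON Likes ($from_id, $to_id);

-- Assumption: only worth adding if queries also walk edges in reverse
-- (for example, "which users like this song?").
CREATE INDEX IX_Likes_To_From ON Likes ($to_id, $from_id);
```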

DDL (beyond creating the table)
- ALTER TABLE: supported for user-defined columns, indexes, or constraints
- CREATE INDEX: supported on user-defined columns and pseudo-columns for node and edge tables, including CCI and NCCI
- DROP TABLE: supported, but we don’t automatically cascade deletion of edges or nodes
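A short sketch of the first two statements, using the tables from the implementation slide. The ReleaseYear column and the index name are hypothetical, introduced only for illustration.

```sql
-- Assumes the UniqueSong and Likes tables from the implementation slide.
-- Add a user-defined column to a node table (hypothetical column).
ALTER TABLE UniqueSong ADD ReleaseYear INT NULL;

-- Add a nonclustered columnstore index (NCCI) over a user-defined
-- column of the edge table, for analytic scans over listen counts.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Likes_ListenCount
    ON Likes (ListenCount);
```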

Supported table types
- Regular tables are supported
- Not supported for node or edge tables (for this release):
  - Memory-optimized tables
  - Temporary tables (global and local)
  - Temporal tables
  - Stretch tables
  - External tables (PolyBase)

Can I alter an existing table into a node or edge table?
- In the first release, ALTER TABLE to convert an existing relational table into a node or edge table is not supported
- Instead, users can create a node table and use INSERT INTO … SELECT FROM to populate data into the node table
- To populate an edge table from an existing table, proper $from_id and $to_id values must be obtained from the node tables
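As an illustration of that migration path, the sketch below copies rows from a relational table into the graph tables from the implementation slide. ListenHistory and its columns are hypothetical stand-ins for whatever relational source you already have.

```sql
-- Hypothetical relational source table; not part of the original deck.
CREATE TABLE ListenHistory (
    UserId      VARCHAR(80),
    SongId      VARCHAR(50),
    SongTitle   VARCHAR(500),
    ArtistName  VARCHAR(500),
    ListenCount BIGINT
);

-- Populate the node tables; $node_id values are generated automatically.
INSERT INTO UniqueUser (UserId)
SELECT DISTINCT UserId FROM ListenHistory;

INSERT INTO UniqueSong (SongId, SongTitle, ArtistName)
SELECT DISTINCT SongId, SongTitle, ArtistName FROM ListenHistory;

-- Populate the edge table by looking up $node_id for both endpoints.
INSERT INTO Likes ($from_id, $to_id, ListenCount)
SELECT u.$node_id, s.$node_id, h.ListenCount
FROM ListenHistory AS h
JOIN UniqueUser AS u ON u.UserId = h.UserId
JOIN UniqueSong AS s ON s.SongId = h.SongId;
```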

INSERTs into Graph tables
- Inserting into a node table is the same as inserting into any relational table; the values for the $node_id column are automatically generated
- For an edge table, users must provide values for the $from_id and $to_id columns
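A minimal sketch of both cases, assuming the UniqueUser, UniqueSong, and Likes tables from the implementation slide. All literal values are hypothetical sample data.

```sql
-- Node inserts look like ordinary relational inserts.
INSERT INTO UniqueUser (UserId) VALUES ('user-001');
INSERT INTO UniqueSong (SongId, SongTitle, ArtistName)
VALUES ('song-001', 'Example Title', 'Example Artist');

-- Edge inserts must supply $from_id and $to_id, here looked up
-- from the node tables with scalar subqueries.
INSERT INTO Likes ($from_id, $to_id, ListenCount)
VALUES (
    (SELECT $node_id FROM UniqueUser WHERE UserId = 'user-001'),
    (SELECT $node_id FROM UniqueSong WHERE SongId = 'song-001'),
    5
);
```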

DELETE, UPDATE, MERGE
- DELETE: works the same as for a regular table. There are no constraints that check for edges pointing to deleted nodes, or for nodes used by edges (no cascade)
- UPDATE: you can update user-defined columns, but not $from_id or $to_id
- MERGE: not supported in this version
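Because deletes do not cascade, one possible pattern is to remove the edges yourself before removing the node. The sketch assumes the tables from the implementation slide; the identifiers 'user-001' and 'song-001' are hypothetical.

```sql
-- Manual "cascade": delete edges that point at the song first,
-- otherwise they would be left dangling after the node is deleted.
DELETE FROM Likes
WHERE $to_id IN (SELECT $node_id FROM UniqueSong WHERE SongId = 'song-001');

DELETE FROM UniqueSong
WHERE SongId = 'song-001';

-- UPDATE may change user-defined columns such as ListenCount,
-- but not the $from_id or $to_id endpoints of an edge.
UPDATE Likes
SET ListenCount = ListenCount + 1
WHERE $from_id IN (SELECT $node_id FROM UniqueUser WHERE UserId = 'user-001');
```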

How can I ingest unstructured data?
- Since we are storing data in tables, users must know the schema at the time of creation
- Users can always add new types of nodes or edges to their schema
- Users who want to modify an existing node or edge table can use ALTER TABLE to add or delete attributes
- If you expect unknown attributes in your schema, you could either use sparse columns or create a column to hold JSON strings and use that as a placeholder for unknown attributes
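The JSON placeholder option might be sketched as follows. The ExtraAttributes column, the constraint name, and the sample JSON are hypothetical; GO is the batch separator used by sqlcmd and SSMS, needed here because the new column cannot be referenced in the same batch that adds it.

```sql
-- Assumes the UniqueSong node table from the implementation slide.
-- Add a catch-all column that must hold valid JSON when populated.
ALTER TABLE UniqueSong
ADD ExtraAttributes NVARCHAR(MAX) NULL
    CONSTRAINT CK_UniqueSong_ExtraAttributes_IsJson
    CHECK (ISJSON(ExtraAttributes) = 1);
GO

-- Store attributes that were not known when the table was created.
UPDATE UniqueSong
SET ExtraAttributes = N'{"genre":"jazz","tempo":120}'
WHERE SongId = 'song-001';

-- Read one attribute back out of the JSON document.
SELECT SongId, JSON_VALUE(ExtraAttributes, '$.genre') AS Genre
FROM UniqueSong
WHERE ExtraAttributes IS NOT NULL;
```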

Transitive Closure
How do I find a node connected to me, an arbitrary number of hops away, in my graph?
- The ability to navigate through a combination of nodes and edges, an arbitrary number of times, is called transitive closure
- For example, find all the people connected to me through three levels of indirection, or find the employee chain for a given employee in an organization
- Transitive closure is not supported in the first release, and we’re going to publish alternative working examples in the meantime
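When the number of hops is known in advance, one workaround is to spell out each hop in the pattern. This is a sketch under assumptions, not the alternative the speaker refers to: the Person and FriendOf tables and the name 'Alice' are hypothetical, and the query covers exactly three hops rather than an arbitrary number.

```sql
-- Hypothetical schema, not part of the original deck.
CREATE TABLE Person (PersonId INT PRIMARY KEY, Name VARCHAR(100)) AS NODE;
CREATE TABLE FriendOf AS EDGE;

-- People exactly three hops away from Alice.
-- Each hop needs its own edge alias; MATCH requires comma-style FROM.
SELECT DISTINCT p4.Name
FROM Person AS p1, FriendOf AS f1,
     Person AS p2, FriendOf AS f2,
     Person AS p3, FriendOf AS f3,
     Person AS p4
WHERE MATCH(p1-(f1)->p2-(f2)->p3-(f3)->p4)
  AND p1.Name = 'Alice';
```

To cover "up to three hops", the one-hop and two-hop variants of the same query can be combined with UNION.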

Graph Polymorphism
How do I find ANY node connected to me in my graph?
- The ability to find any type of node connected to a given node in a graph is called polymorphism
- SQL Graph does not support polymorphism in the first release
- A possible workaround is to write queries with a UNION clause over a known set of node and edge types
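The UNION workaround could be sketched like this. It reuses the tables from the implementation slide and adds a hypothetical Artist node table and Follows edge table so that there are two node types to combine; 'user-001' is a hypothetical value.

```sql
-- Hypothetical additional node and edge types for illustration.
CREATE TABLE Artist (ArtistId INT, ArtistName VARCHAR(500)) AS NODE;
CREATE TABLE Follows AS EDGE;

-- One branch per known (edge type, node type) combination.
SELECT 'UniqueSong' AS NodeType, s.SongTitle AS Label
FROM UniqueUser AS u, Likes AS l, UniqueSong AS s
WHERE MATCH(u-(l)->s)
  AND u.UserId = 'user-001'
UNION ALL
SELECT 'Artist' AS NodeType, a.ArtistName AS Label
FROM UniqueUser AS u, Follows AS f, Artist AS a
WHERE MATCH(u-(f)->a)
  AND u.UserId = 'user-001';
```

Each new node or edge type needs another branch, which is the cost of not having built-in polymorphism.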

MATCH
Specifies the search condition for a graph
- Can only be used with graph node and edge tables
- Usable in a SELECT statement as part of the WHERE clause
- Syntax: MATCH (<graph_search_pattern>)
- Uses ASCII art syntax to traverse a path in the graph
- Goes from one node to another via an edge, in the direction of the arrow provided
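A single-hop query is the simplest instance of this syntax. The sketch assumes the UniqueUser, UniqueSong, and Likes tables from the implementation slide; 'user-001' is a hypothetical value.

```sql
-- Songs liked by one user. The tables are listed comma-separated in
-- FROM, and MATCH in the WHERE clause describes how they connect.
SELECT UniqueSong.SongTitle, Likes.ListenCount
FROM UniqueUser, Likes, UniqueSong
WHERE MATCH(UniqueUser-(Likes)->UniqueSong)
  AND UniqueUser.UserId = 'user-001';
```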

MATCH
- Node names or aliases appear at the two ends of the arrow
  MATCH(UniqueUser-(LikesThis)->MySong)
- Edge names or aliases are provided inside parentheses
- The arrow can go in either direction in the pattern
  MATCH(SimilarSong<-(LikesOther)-UniqueUser-(LikesThis)->MySong)
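Placed in a full query, the two-direction pattern from this slide might look as follows. The aliases match the slide; the song title, the TOP (10) cutoff, and ranking by number of shared listeners are assumptions made for illustration, not necessarily what the demo did.

```sql
-- Assumes the tables from the implementation slide.
-- Songs liked by users who also like a given song.
SELECT TOP (10)
       SimilarSong.SongTitle,
       COUNT(*) AS SharedListeners
FROM UniqueSong AS MySong,
     Likes      AS LikesThis,
     UniqueUser,
     Likes      AS LikesOther,
     UniqueSong AS SimilarSong
WHERE MATCH(SimilarSong<-(LikesOther)-UniqueUser-(LikesThis)->MySong)
  AND MySong.SongTitle = 'Example Title'       -- hypothetical title
  AND SimilarSong.SongId <> MySong.SongId      -- exclude the seed song
GROUP BY SimilarSong.SongTitle
ORDER BY SharedListeners DESC;
```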

MATCH
- The node names inside MATCH can be repeated
- An edge name cannot be repeated inside MATCH
- Use an alias for the edge if you want to use the same edge table again to form a new path
- An edge can point in either direction, but it must have an explicit direction
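These rules can be seen together in a query that finds users who like two particular songs. The sketch assumes the tables from the implementation slide; the song identifiers are hypothetical.

```sql
-- The node alias u appears in both paths (allowed), while the Likes
-- edge table is used twice under two different aliases, l1 and l2,
-- because a single edge name cannot be repeated inside MATCH.
SELECT u.UserId
FROM UniqueUser AS u,
     Likes      AS l1, UniqueSong AS s1,
     Likes      AS l2, UniqueSong AS s2
WHERE MATCH(u-(l1)->s1 AND u-(l2)->s2)
  AND s1.SongId = 'song-001'
  AND s2.SongId = 'song-002';
```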

MATCH
- MATCH can be combined with other expressions using AND in the WHERE clause
- OR and NOT operators are not supported in the MATCH pattern
- But you can have a sub-query using EXISTS or NOT EXISTS as part of the overall query
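One way to combine these pieces is to recommend songs that similar listeners like but the current user has not liked yet. This is a sketch assuming the tables from the implementation slide; the aliases and 'user-001' are hypothetical, and the NOT EXISTS sub-query uses plain predicates on $from_id and $to_id rather than a nested MATCH.

```sql
SELECT DISTINCT Other.SongTitle
FROM UniqueUser AS Me,   Likes AS L1, UniqueSong AS Shared,
     Likes      AS L2,   UniqueUser AS Peer,
     Likes      AS L3,   UniqueSong AS Other
WHERE MATCH(Me-(L1)->Shared<-(L2)-Peer-(L3)->Other)
  AND Me.UserId = 'user-001'          -- ordinary predicate, joined with AND
  AND NOT EXISTS (                    -- exclude songs Me already likes
      SELECT 1
      FROM Likes AS Already
      WHERE Already.$from_id = Me.$node_id
        AND Already.$to_id   = Other.$node_id
  );
```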

SQL Graph Customer Scenarios
- Fraud detection for online gaming: look at devices, IPs, and logins, and connect them to gaming activity
- Sales lead management: traverse a hierarchy of child accounts and, given an opportunity, see who responded to specific emails and promotions
- Product hierarchies and Bill of Materials (BoM) management: find the hierarchy for a given product or the hierarchies for a set of products; find all products that contain a given product according to the BoM hierarchy

(Revisit) Ingredient Recall Scenario

(Revisit) Understanding the impact

Questions?
Don’t forget to complete an online evaluation! Your evaluation helps organizers build better conferences and helps speakers improve their sessions. Thank you!

Save the Date www.SQLintersection.com Oct 30-Nov 2, 2017
We’re back in Vegas baby!
