Hierarchies & Trees in SQL by Joe Celko copyright 2008.

Presentation transcript:

Hierarchies & Trees in SQL by Joe Celko copyright 2008

Trees in SQL
Trees are graph structures used to represent:
– Hierarchies
– Parts explosions
– Organizational charts
Three methods in SQL:
– Adjacency list model
– Nested set model
– Path enumeration

Trees in SQL -2
Trees are not hierarchies:
– Hierarchies have subordination
– Kill your captain, and you still have to take orders from your general
– Break an edge in a tree, and you have two or more disjoint trees
This means an adjacency list model is a tree, but not a hierarchy.

Tree Terminology

Tree as Graph

Trees as Nested Sets
[Figure: nested sets labeled Root, A0, A1, A2, B0]

Graphs as Tables
Nodes and edges are not the same kind of thing:
– Organizational chart & Personnel file
You should use separate tables for the structure and the elements:
– The structure table will be small (two integers and a key)
– You can put more than one structure table on the same elements

Adjacency List Model
node  parent  cost
==================
Root  NULL    2.50
A0    Root    1.75
A1    A0      2.00
A2    A0      3.50
B0    Root    4.00
Cost really should not be in the table, but most adjacency list tables mix nodes and edges (see Oracle's Scott/Tiger sample database). This is the most common method in use.

Adjacency List Model
Programmers do not add constraints:
    CHECK ((SELECT COUNT(*) FROM Tree) - 1
           = (SELECT COUNT(*)
                FROM ((SELECT child FROM Tree)
                      UNION
                      (SELECT parent FROM Tree)) AS AllNodes))

Path Enumeration Model
Tree
node  cost  path
========================
Root  2.50  'Root'
A0    1.75  'Root,A0'
A1    2.00  'Root,A0,A1'
A2    3.50  'Root,A0,A2'
B0    4.00  'Root,B0'
Cost really should not be in the table, but most path enumeration tables mix nodes and edges. Paths are searched with path LIKE 'Root,%' predicates.

Graph with Traversal

Nested Sets with Numbers
[Figure: nested sets labeled Root, A0, A1, A2, B0, with numbers]

Nested Sets as Numbers
Split nodes and edges into two tables. You can join them back together later.
This could be personnel and org chart:
– Tree.node would be job titles
– Nodes would need job titles and the person holding each one
Tree                 Nodes
node  lft  rgt       node  cost
==============       ==========
Root   1   10        Root  2.50
A0     2    7        A0    1.75
A1     3    4        A1    2.00
A2     5    6        A2    3.50
B0     8    9        B0    4.00

Problems with Adjacency List
– You have to use cursors or self-joins to traverse the tree
– Cursors are not a table: their order has meaning, which is a closure violation!
– Cursors take MUCH longer than queries
– Ten-level self-joins are worse than cursors
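
The self-join problem can be seen on the five-node sample tree. The sketch below is an illustration, not part of the original slides: it assumes an in-memory SQLite database driven from Python's sqlite3 module, and the variable name grandchildren is made up for the example. Each extra level of depth needs one more copy of the table in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, parent TEXT, cost REAL);
INSERT INTO Tree VALUES
  ('Root', NULL, 2.50), ('A0', 'Root', 1.75), ('A1', 'A0', 2.00),
  ('A2', 'A0', 3.50), ('B0', 'Root', 4.00);
""")

# One self-join reaches exactly two levels below Root; it says nothing
# about deeper levels, which would each need another join.
grandchildren = [row[0] for row in conn.execute("""
    SELECT T2.node
      FROM Tree AS T1
      JOIN Tree AS T2 ON T2.parent = T1.node
     WHERE T1.parent = 'Root'
     ORDER BY T2.node
""")]
print(grandchildren)  # ['A1', 'A2']
```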

Problems with Path Enumeration
– Paths can get long in a deep tree
– Great for searching down the tree, but not up the tree
    SELECT * FROM Tree WHERE path LIKE 'Root,%';
    SELECT * FROM Tree WHERE path LIKE '%,B0';
– Inserting and deleting nodes is complicated: it requires string manipulation to change all the paths beneath the insertion or deletion point
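
A minimal sketch of both search directions, assuming SQLite through Python's sqlite3 module and the sample paths from the path enumeration table. The upward query is one possible formulation and not taken from the slides: it joins the table to itself and treats each candidate superior's path as a prefix pattern, which is the kind of string work that makes upward searches awkward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, cost REAL, path TEXT NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 2.50, 'Root'), ('A0', 1.75, 'Root,A0'),
  ('A1', 2.00, 'Root,A0,A1'), ('A2', 3.50, 'Root,A0,A2'),
  ('B0', 4.00, 'Root,B0');
""")

# Down the tree: every node under A0 has A0's path as a prefix.
below_a0 = [row[0] for row in conn.execute(
    "SELECT node FROM Tree WHERE path LIKE 'Root,A0,%' ORDER BY node")]

# Up the tree: a superior of A1 is any node whose path, followed by a
# comma, is a prefix of A1's path.
above_a1 = [row[0] for row in conn.execute("""
    SELECT Sup.node
      FROM Tree AS T1, Tree AS Sup
     WHERE T1.node = 'A1'
       AND T1.path LIKE Sup.path || ',%'
     ORDER BY LENGTH(Sup.path)
""")]

print(below_a0)  # ['A1', 'A2']
print(above_a1)  # ['Root', 'A0']
```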

Tree Aggregates
Give me the total cost for all subtrees:
– (Root, 13.75) -- sum of every node in tree
– (A0, 7.25) -- sum of "A0" subtree
– (A1, 2.00)
– (A2, 3.50)
Dropping A2 would reduce all superior rows by 3.50, but would not change A1.

Find Root of Tree
    SELECT * FROM Tree WHERE lft = 1;
It helps to have an index on the lft column.
The root's rgt value will be twice the number of nodes in the tree.
General rule: the number of nodes in any subtree is ((rgt - lft) + 1) / 2.

Find All Leaf Nodes
    SELECT * FROM Tree WHERE lft = rgt - 1;
An index on lft will help.
A covering index on (lft, rgt) is even better.
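
The root, subtree-size, and leaf rules can be checked against the sample numbering (Root 1-10, A0 2-7, A1 3-4, A2 5-6, B0 8-9). This is a sketch assuming SQLite via Python's sqlite3 module; the index name tree_lft_rgt is hypothetical, and the numbering is assumed to have no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
-- Covering index, as suggested for the leaf query (name is made up here).
CREATE INDEX tree_lft_rgt ON Tree (lft, rgt);
""")

# The root is the row whose lft is 1; its rgt is twice the node count.
root = conn.execute("SELECT node, rgt FROM Tree WHERE lft = 1").fetchone()

# Subtree size for every node: ((rgt - lft) + 1) / 2.
sizes = dict(conn.execute("SELECT node, ((rgt - lft) + 1) / 2 FROM Tree"))

# Leaves have nothing nested inside them, so lft and rgt are adjacent.
leaves = [row[0] for row in conn.execute(
    "SELECT node FROM Tree WHERE lft = rgt - 1 ORDER BY lft")]

print(root)         # ('Root', 10)
print(sizes["A0"])  # 3
print(leaves)       # ['A1', 'A2', 'B0']
```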

Find Superiors of X
    SELECT Sup.*
      FROM Tree AS T1, Tree AS Sup
     WHERE T1.node = 'X'
       AND T1.lft BETWEEN Sup.lft AND Sup.rgt;
This is the most important trick in this method.
The BETWEEN predicates preserve subordination in the hierarchy.
One query works for a tree of any depth.

Find Subordinates of X
    SELECT Sub.*
      FROM Tree AS T1, Tree AS Sub
     WHERE T1.node = 'X'
       AND Sub.lft BETWEEN T1.lft AND T1.rgt;
This is the same pattern as finding superiors.
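
Both directions can be tried on the sample tree. A minimal sketch, assuming SQLite via Python's sqlite3 module; the ORDER BY clauses are added only to make the output predictable. Because BETWEEN is inclusive, each result also contains the starting node itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# Superiors: every set that contains the node's lft value.
superiors = [row[0] for row in conn.execute("""
    SELECT Sup.node
      FROM Tree AS T1, Tree AS Sup
     WHERE T1.node = ?
       AND T1.lft BETWEEN Sup.lft AND Sup.rgt
     ORDER BY Sup.lft
""", ("A1",))]

# Subordinates: every set whose lft falls inside the node's range.
subordinates = [row[0] for row in conn.execute("""
    SELECT Sub.node
      FROM Tree AS T1, Tree AS Sub
     WHERE T1.node = ?
       AND Sub.lft BETWEEN T1.lft AND T1.rgt
     ORDER BY Sub.lft
""", ("A0",))]

print(superiors)     # ['Root', 'A0', 'A1']
print(subordinates)  # ['A0', 'A1', 'A2']
```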

Find Depth of Tree
    SELECT T1.node, COUNT(T2.node) AS level
      FROM Tree AS T1, Tree AS T2
     WHERE T1.lft BETWEEN T2.lft AND T2.rgt
     GROUP BY T1.node;
Count the containing nested sets to get each node's level; the depth of the tree is the largest level.
Along any path, the closer to the root a node is, the greater the value of (rgt - lft).
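
Run against the sample tree, the level query gives one row per node, and the tree depth is the maximum of those levels. This sketch assumes SQLite via Python's sqlite3 module; the alias lvl replaces level only because LEVEL is a reserved word in some products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# Level of a node = number of sets (including its own) that contain it.
levels = dict(conn.execute("""
    SELECT T1.node, COUNT(T2.node) AS lvl
      FROM Tree AS T1, Tree AS T2
     WHERE T1.lft BETWEEN T2.lft AND T2.rgt
     GROUP BY T1.node
"""))
depth = max(levels.values())

print(levels["Root"], levels["A0"], levels["A1"])  # 1 2 3
print(depth)                                       # 3
```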

Totals by Level in Tree
    SELECT T1.node, SUM(T2.cost) AS tot_level_cost
      FROM Tree AS T1, Tree AS T2
     WHERE T2.lft BETWEEN T1.lft AND T1.rgt
     GROUP BY T1.node;
Any aggregate function can be used the same way.
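
The subtree totals quoted earlier (13.75 for Root, 7.25 for A0) can be reproduced with this pattern. The sketch assumes SQLite via Python's sqlite3 module and keeps cost in a separate Nodes table, following the split recommended in the nested sets slides, so the query carries one extra join that the slide's single-table version does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Nodes (node TEXT PRIMARY KEY, cost REAL NOT NULL);
CREATE TABLE Tree (node TEXT PRIMARY KEY REFERENCES Nodes(node),
                   lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Nodes VALUES
  ('Root', 2.50), ('A0', 1.75), ('A1', 2.00), ('A2', 3.50), ('B0', 4.00);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# For each node T1, sum the cost of every node T2 nested inside it.
totals = dict(conn.execute("""
    SELECT T1.node, SUM(N.cost) AS tot_cost
      FROM Tree AS T1, Tree AS T2, Nodes AS N
     WHERE T2.lft BETWEEN T1.lft AND T1.rgt
       AND N.node = T2.node
     GROUP BY T1.node
"""))

print(totals["Root"])  # 13.75
print(totals["A0"])    # 7.25
```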

Delete a Subtree
Remove the subtree rooted at :my_node
    DELETE FROM Tree
     WHERE lft BETWEEN (SELECT lft FROM Tree WHERE node = :my_node)
                   AND (SELECT rgt FROM Tree WHERE node = :my_node);
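
A minimal sketch of the subtree delete on the sample tree, assuming SQLite via Python's sqlite3 module, which accepts the same :my_node named parameter style. Removing A0 also removes A1 and A2 and leaves the numbers 2 through 7 unused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# Delete every row whose lft lies inside the range of the subtree root.
conn.execute("""
    DELETE FROM Tree
     WHERE lft BETWEEN (SELECT lft FROM Tree WHERE node = :my_node)
                   AND (SELECT rgt FROM Tree WHERE node = :my_node)
""", {"my_node": "A0"})

remaining = conn.execute(
    "SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(remaining)  # [('Root', 1, 10), ('B0', 8, 9)]
```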

Delete a Single Node
– Method one: promote a child to the parent's prior position in the tree. Oldest son inherits family business.
– Method two: subordinate the entire subtree to the grandparent. Orphans go live with grandmother.

Delete & Promote Oldest - 1 Delete A0 node

Delete & Promote Oldest - 2

Delete & Promote Subtree - 1 Delete A0 node

Delete & Promote Subtree - 2

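Closing gaps in nested set model -1
Deleted nodes leave gaps in the numbering of the lft and rgt values. Fill in the gaps by sliding everyone over to the left until there are no gaps. Each column must shift only when its own value lies past the gap; otherwise a superior's lft, which sits before the gap, would be moved too:
    UPDATE Tree
       SET lft = CASE WHEN lft >= gap_start THEN lft - gap_size ELSE lft END,
           rgt = CASE WHEN rgt >= gap_start THEN rgt - gap_size ELSE rgt END
     WHERE rgt >= gap_start OR lft >= gap_start;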
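
The slide-left step can be sketched end to end, assuming SQLite via Python's sqlite3 module. The example deletes the A0 subtree, takes gap_start as its old lft and gap_size as its old width (rgt - lft + 1), and uses CASE expressions so that each column shifts only when its own value lies to the right of the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# Remember where the subtree sat before removing it.
lft, rgt = conn.execute(
    "SELECT lft, rgt FROM Tree WHERE node = 'A0'").fetchone()
gap = {"gap_start": lft, "gap_size": rgt - lft + 1}
conn.execute("DELETE FROM Tree WHERE lft BETWEEN ? AND ?", (lft, rgt))

# Shift only the values that lie past the gap; Root keeps lft = 1.
conn.execute("""
    UPDATE Tree
       SET lft = CASE WHEN lft >= :gap_start THEN lft - :gap_size ELSE lft END,
           rgt = CASE WHEN rgt >= :gap_start THEN rgt - :gap_size ELSE rgt END
     WHERE rgt >= :gap_start
""", gap)

closed = conn.execute(
    "SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(closed)  # [('Root', 1, 4), ('B0', 2, 3)]
```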

Closing gaps in nested set model -2
    CREATE VIEW LftRgt (i)
    AS SELECT lft FROM Tree
       UNION ALL
       SELECT rgt FROM Tree;

    UPDATE Tree
       SET lft = (SELECT COUNT(*) FROM LftRgt WHERE i <= lft),
           rgt = (SELECT COUNT(*) FROM LftRgt WHERE i <= rgt);
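
The renumbering idea replaces each lft or rgt value with its rank among all the values in use. The sketch below assumes SQLite via Python's sqlite3 module and deliberately deviates from the slide in one respect: it snapshots the old-to-new mapping into a temporary table (Renumber, a hypothetical name) before updating, so the result does not depend on whether an engine lets the UPDATE see its own partial changes through the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
-- Sample tree after the A0 subtree was deleted: 2..7 are unused.
INSERT INTO Tree VALUES ('Root', 1, 10), ('B0', 8, 9);

CREATE VIEW LftRgt AS
SELECT lft AS i FROM Tree
UNION ALL
SELECT rgt FROM Tree;

-- Assumption for this sketch: freeze the rank of each value first.
CREATE TEMP TABLE Renumber AS
SELECT L1.i AS old_i,
       (SELECT COUNT(*) FROM LftRgt AS L2 WHERE L2.i <= L1.i) AS new_i
  FROM LftRgt AS L1;

UPDATE Tree
   SET lft = (SELECT new_i FROM Renumber WHERE old_i = Tree.lft),
       rgt = (SELECT new_i FROM Renumber WHERE old_i = Tree.rgt);
""")

renumbered = conn.execute(
    "SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(renumbered)  # [('Root', 1, 4), ('B0', 2, 3)]
```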

Inserting into a Tree
The real trick is numbering the subtree correctly before inserting it. The basic idea is to spread the nested set numbers apart to make a gap the size of the subtree, then add the subtree. The position of the subtree among the siblings under its new parent is another decision.
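
For the simplest case, a one-node subtree, the gap is two numbers wide. The sketch below is one common way to do it, not necessarily the slides' exact procedure: it assumes SQLite via Python's sqlite3 module, a hypothetical new node named A3, and the choice to place it as the rightmost child of A0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO Tree VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# The new leaf goes where the parent's rgt currently sits.
(parent_rgt,) = conn.execute(
    "SELECT rgt FROM Tree WHERE node = 'A0'").fetchone()

# Spread the numbers: everything at or past parent_rgt moves right by 2.
conn.execute("""
    UPDATE Tree
       SET lft = CASE WHEN lft > :p THEN lft + 2 ELSE lft END,
           rgt = CASE WHEN rgt >= :p THEN rgt + 2 ELSE rgt END
     WHERE rgt >= :p
""", {"p": parent_rgt})

# Drop the new node into the gap.
conn.execute("INSERT INTO Tree VALUES ('A3', ?, ?)",
             (parent_rgt, parent_rgt + 1))

after = conn.execute(
    "SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(after)
```

With the sample data, A3 receives (7, 8), A0 widens to (2, 9), B0 moves to (10, 11), and Root becomes (1, 12).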

Inserting into a Tree -2
If you are worried about having to update the tree structure too often, then use a bigger spread in the numbering. At higher levels, use steps of 100,000, then 10,000, and so forth. Most SQL products can handle DECIMAL(p,s) of 30 or more digits. Since insertions are done on the right side of the siblings, you can reorganize the tree by sliding everyone to the left and closing the gaps.

Inserting into a Tree -4
[Figure: tree with nodes Root, A0, A1, A2, B; caption: Slide everyone to the left]

Creating a Tree -1
If you want to have all the constraints for a proper hierarchy, then it is complicated.
    CREATE TABLE Tree
    (node_id INTEGER NOT NULL
       REFERENCES Nodes(node_id),
     lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
     rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
     UNIQUE (lft, rgt), -- redundant, but useful
     CHECK (lft < rgt)
    );
You can also declare node_id to be the PRIMARY KEY, but then one person cannot hold two jobs.

Creating a Tree -2
Other needed constraints: no overlaps in the nodes. Any row this query returns starts inside another node's range but ends outside it:
    SELECT *
      FROM Tree AS T1
     WHERE EXISTS
           (SELECT *
              FROM Tree AS T2
             WHERE T1.lft BETWEEN T2.lft AND T2.rgt
               AND T1.rgt NOT BETWEEN T2.lft AND T2.rgt);

Creating a Tree -3
Other needed constraints: no disjoint nodes. Any row this query returns lies outside the root's range:
    SELECT *
      FROM Tree AS T1
     WHERE T1.lft > (SELECT rgt FROM Tree WHERE lft = 1);

Creating a Tree -4
If you do not have triggers or CREATE ASSERTION, you can use an updatable view:
    CREATE VIEW GoodTree (node, i, j)
    AS SELECT T1.node, T1.lft, T1.rgt
         FROM Tree AS T1
        WHERE NOT EXISTS ( )
          AND NOT EXISTS ( )
    WITH CHECK OPTION;

Converting an Adjacency Model into a Nested Set Model
– The current best method is to load the nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers
– The adjacency list method does not order siblings; the nested set model does automatically
– The classic push-down stack algorithm works
– You can keep both models in one table with a column for the immediate superior
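
The push-down stack traversal can be sketched in a host language as follows. This is an illustration under stated assumptions rather than the author's code: the function name to_nested_sets is made up, the input is a list of (node, parent) pairs, and siblings are numbered in the order they appear in that list, since the adjacency list itself carries no sibling order.

```python
from collections import defaultdict


def to_nested_sets(edges, root):
    """Return {node: (lft, rgt)} for a tree given as (node, parent) pairs."""
    children = defaultdict(list)
    for node, parent in edges:
        if parent is not None:
            children[parent].append(node)

    counter = 1
    lft = {root: counter}
    numbers = {}
    # Each stack entry is a node plus an iterator over its unvisited children.
    stack = [(root, iter(children[root]))]
    while stack:
        node, kids = stack[-1]
        child = next(kids, None)
        counter += 1
        if child is None:
            # All children done: the counter becomes this node's rgt.
            numbers[node] = (lft[node], counter)
            stack.pop()
        else:
            # Entering a child: the counter becomes its lft.
            lft[child] = counter
            stack.append((child, iter(children[child])))
    return numbers


edges = [("Root", None), ("A0", "Root"), ("A1", "A0"),
         ("A2", "A0"), ("B0", "Root")]
nested = to_nested_sets(edges, "Root")
print(nested["Root"], nested["A0"], nested["B0"])  # (1, 10) (2, 7) (8, 9)
```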

Converting a Nested Set Model into an Adjacency Model
This is actually pretty straightforward; you can put it into a single view:
    SELECT B.emp AS boss, P.emp
      FROM OrgChart AS P
           LEFT OUTER JOIN
           OrgChart AS B
           ON B.lft = (SELECT MAX(lft)
                         FROM OrgChart AS S
                        WHERE P.lft > S.lft
                          AND P.lft < S.rgt);
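
The boss of a node is the containing set with the largest lft, which is what the MAX(lft) subquery picks out. A minimal sketch, assuming SQLite via Python's sqlite3 module and an OrgChart table loaded with the sample tree's node names in the emp column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrgChart (emp TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
INSERT INTO OrgChart VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
""")

# Map each employee to the immediate superior; the root has no
# containing set, so the outer join leaves its boss as NULL (None).
boss_of = {emp: boss for boss, emp in conn.execute("""
    SELECT B.emp AS boss, P.emp
      FROM OrgChart AS P
           LEFT OUTER JOIN
           OrgChart AS B
           ON B.lft = (SELECT MAX(lft)
                         FROM OrgChart AS S
                        WHERE P.lft > S.lft
                          AND P.lft < S.rgt)
""")}

print(boss_of["A1"], boss_of["B0"], boss_of["Root"])  # A0 Root None
```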

Structure versus Contents
The nested set model allows the structure of trees to be compared.
– For each tree, find the lft value of its root node (lftmost)
– Make a canonical form and UNION ALL them
If the two structures differ, some (lft, rgt) pair appears only once:
    EXISTS
    (SELECT *
       FROM (SELECT (lft - lftmost), (rgt - lftmost)
               FROM Tree1
             UNION ALL
             SELECT (lft - lftmost), (rgt - lftmost)
               FROM Tree2) AS Both (lft, rgt)
      GROUP BY Both.lft, Both.rgt
     HAVING COUNT(*) = 1)
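
A sketch of the comparison, assuming SQLite via Python's sqlite3 module. Several details are assumptions made for the example: lftmost is computed as MIN(lft) of each table, the helper name differs and the tables Tree2 and Tree3 are invented, and the table names are spliced into the SQL text as trusted identifiers. Tree2 has the same shape as Tree1 with different names and an offset numbering; Tree3 has five nodes in a different shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree1 (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER);
CREATE TABLE Tree2 (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER);
CREATE TABLE Tree3 (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER);
INSERT INTO Tree1 VALUES
  ('Root', 1, 10), ('A0', 2, 7), ('A1', 3, 4), ('A2', 5, 6), ('B0', 8, 9);
INSERT INTO Tree2 VALUES
  ('P', 101, 110), ('Q', 102, 107), ('R', 103, 104), ('S', 105, 106), ('T', 108, 109);
INSERT INTO Tree3 VALUES
  ('P', 1, 10), ('Q', 2, 9), ('R', 3, 4), ('S', 5, 6), ('T', 7, 8);
""")


def differs(a, b):
    """True when tables a and b do not have the same tree structure."""
    sql = f"""
        SELECT EXISTS (
            SELECT 1
              FROM (SELECT lft - (SELECT MIN(lft) FROM {a}) AS l,
                           rgt - (SELECT MIN(lft) FROM {a}) AS r
                      FROM {a}
                    UNION ALL
                    SELECT lft - (SELECT MIN(lft) FROM {b}),
                           rgt - (SELECT MIN(lft) FROM {b})
                      FROM {b}) AS Pairs
             GROUP BY l, r
            HAVING COUNT(*) = 1)
    """
    return bool(conn.execute(sql).fetchone()[0])


print(differs("Tree1", "Tree2"))  # False
print(differs("Tree1", "Tree3"))  # True
```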

Questions & Answers ?