CS 583 Analysis of Algorithms

Presentation transcript:

B-Trees
CS 583 Analysis of Algorithms
11/23/2018 CS583 Fall'06: B-Trees

Outline
- Data Structures on Secondary Storage
    - Magnetic disks
    - Efficient operations
- B-Trees
    - Definitions
    - Searching
    - Inserting
- Self-test
    - 18.1-1, 18.1-2, 18.2-1, 18.2-2

Magnetic Disks
- The main memory of a computer system consists of silicon memory chips. It is typically two orders of magnitude more expensive per bit than magnetic storage.
- Magnetic disks are cheaper and have higher capacity than main memory. However, they are much slower because they have moving parts.
- To amortize the time spent on mechanical movement, disks access several items at a time:
    - Information is divided into equal-size pages.
    - Pages appear as consecutive bits within cylinders.
    - Once the read/write head is positioned at the desired page, large amounts of data can be accessed quickly.

Disk Operations
- When x is an object that resides on disk, the following pseudocode conventions are used:
    x = <a pointer to some object>
    Disk-Read(x)
    <access and modify fields of x>
    Disk-Write(x)
- In most systems the running time of a B-tree algorithm is determined mainly by the number of Disk-Read and Disk-Write operations.
- Hence, a B-tree node is usually as large as a disk page.
- Example: a B-tree with a branching factor of 1001 and height 2 can store more than one billion keys. Since the root node is kept in main memory, at most two disk accesses are needed to find any key.
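The "more than one billion keys" figure can be checked with a few lines of arithmetic. The sketch below assumes every node is full, so each node has `branching_factor` children and `branching_factor - 1` keys; the function name `max_keys` is only illustrative.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Key capacity of a B-tree of the given height when every node is full."""
    keys_per_node = branching_factor - 1
    # With full nodes, depth d holds branching_factor ** d nodes.
    nodes = sum(branching_factor ** d for d in range(height + 1))
    return nodes * keys_per_node


# 1 root + 1001 nodes at depth 1 + 1,002,001 leaves, 1000 keys in each node.
print(max_keys(1001, 2))  # 1003003000
```

The same count is 1001^3 - 1, because a completely full tree of height h with branching factor b holds b^(h+1) - 1 keys.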

B-tree Definition
- We assume that any satellite information associated with a key is stored in the same node as the key.
- A B-tree is a rooted tree with the following properties:
    - Every node x has the following fields:
        - n[x], the number of keys stored in x.
        - n[x] keys stored in non-decreasing order: key_1[x] <= key_2[x] <= ... <= key_{n[x]}[x]
        - leaf[x] = true if x is a leaf, and false otherwise.
    - Each internal node x contains n[x]+1 pointers to its children: c_1[x], c_2[x], ..., c_{n[x]+1}[x]

B-tree Definition (cont.)
- Properties (cont.):
    - The keys key_i[x] separate the ranges stored in each subtree: if k_i is any key stored in the subtree with root c_i[x], then
      k_1 <= key_1[x] <= k_2 <= key_2[x] <= ... <= key_{n[x]}[x] <= k_{n[x]+1}
    - All leaves have the same depth, which is the tree's height h.
    - There are lower and upper bounds on the number of keys in a node. They are expressed in terms of an integer t >= 2 called the minimum degree:
        - Every node other than the root must have at least t-1 keys.
        - Every node can contain at most 2t-1 keys. We say the node is full if it contains exactly 2t-1 keys.
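The properties above can be read as a checklist. The following is a minimal in-memory sketch of such a check, not part of the slides: the class `BTreeNode`, the function `check_btree`, and the use of Python lists in place of disk pages are all assumptions made for illustration, and indices are 0-based.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # n[x] is len(keys)
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def check_btree(x: BTreeNode, t: int, is_root: bool = True,
                lo: Optional[int] = None, hi: Optional[int] = None) -> int:
    """Return the height of the subtree rooted at x; raise AssertionError
    if one of the B-tree properties is violated."""
    n = len(x.keys)
    assert n <= 2 * t - 1, "node holds more than 2t-1 keys"
    assert is_root or n >= t - 1, "non-root node holds fewer than t-1 keys"
    assert x.keys == sorted(x.keys), "keys are not in non-decreasing order"
    assert all((lo is None or lo <= k) and (hi is None or k <= hi)
               for k in x.keys), "key lies outside the range set by its ancestors"
    if x.leaf:
        return 0
    assert len(x.children) == n + 1, "internal node needs n[x]+1 children"
    # Child i may only hold keys between key i-1 and key i of x.
    bounds = [lo] + x.keys + [hi]
    heights = {check_btree(c, t, False, bounds[i], bounds[i + 1])
               for i, c in enumerate(x.children)}
    assert len(heights) == 1, "leaves are at different depths"
    return heights.pop() + 1


root = BTreeNode([10], [BTreeNode([3, 7]), BTreeNode([12])])
print(check_btree(root, t=2))  # 1
```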

Height of the Tree
- The number of disk accesses for a B-tree is proportional to the height of the tree.
- Theorem 18.1. If $n \ge 1$, then for any n-key B-tree T of height h and minimum degree $t \ge 2$:
  $$h \le \log_t \frac{n+1}{2}$$
- Proof. If a B-tree has height h, the root contains at least one key and all other nodes contain at least $t-1$ keys. Thus there are at least 2 nodes at depth 1, at least $2t$ nodes at depth 2, and so on, until at least $2t^{h-1}$ nodes at depth h.

Height of the Tree (cont.)
- The number of keys n satisfies the inequality:
  $$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t-1} = 2t^h - 1$$
  $$\Rightarrow\; t^h \le \frac{n+1}{2} \;\Rightarrow\; h \le \log_t \frac{n+1}{2}$$
- Hence the height of a B-tree grows as $O(\log_t n)$. The height of a red-black tree, $O(\lg n)$, is also logarithmic in n, but the base t of the B-tree logarithm can be many times larger, so the B-tree is shallower by a factor of about $\lg t$.
- This means that the number of disk accesses is substantially reduced for most tree operations.
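As a rough numerical check of the bound, the sketch below computes the minimum key count of a B-tree of height h directly from the node counts in the proof, and the largest height the theorem allows for n keys. The function names are illustrative. Integer arithmetic is used instead of a floating-point logarithm so that exact powers of t are handled correctly.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of height h: 1 key in the root, and
    2 * t**(i-1) nodes with t-1 keys each at every depth i = 1..h."""
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def max_height(n: int, t: int) -> int:
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2)), for n >= 1."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


print(min_keys(3, 4))             # 161, which equals 2 * 3**4 - 1
print(max_height(10 ** 6, 2))     # 18
print(max_height(10 ** 6, 1000))  # 1
```

With one million keys, raising the minimum degree from 2 to 1000 lowers the worst-case height from 18 to 1.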

Basic Operations
- The root of the B-tree is always in main memory:
    - Disk-Read on the root is never required.
    - Disk-Write is required when the root node is changed.
- Any nodes that are passed as parameters have already had Disk-Read performed on them.
- All basic procedures are "one-pass" algorithms: they proceed downward from the root of the tree, without having to back up.

Searching
- The search algorithm takes as input a pointer to the root node x of a subtree and a key k.
- It returns a pair (y, i) such that key_i[y] = k, or NIL if k is not in the subtree.

    B-Tree-Search(x,k)
    1  i = 1
    2  while i <= n[x] and k > key_i[x]
    3      i++
    4  if i <= n[x] and k = key_i[x]
    5      return (x,i)
    6  if leaf[x]
    7      return NIL
    8  else
    9      Disk-Read(c_i[x])   // read ith child of x
    10     return B-Tree-Search(c_i[x],k)
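A direct Python transcription of B-Tree-Search might look like the sketch below. It assumes the whole tree is in memory, so the Disk-Read call becomes a plain child access, and it uses 0-based indices. The `Node` class is a hypothetical stand-in for a disk page.

```python
class Node:
    """Hypothetical in-memory stand-in for a B-tree page."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # an empty list marks a leaf


def btree_search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    # Linear scan for the first key that is >= k (lines 2-3 of the pseudocode).
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:
        return None
    # On disk, this is where Disk-Read(c_i[x]) would happen.
    return btree_search(x.children[i], k)


root = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25])])
node, idx = btree_search(root, 15)
print(node.keys, idx)          # [12, 15] 1
print(btree_search(root, 11))  # None
```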

Searching: Performance
- The nodes encountered during the recursion form a path downward from the root of the tree.
- The number of disk pages accessed by B-Tree-Search is $O(h) = O(\log_t n)$.
- For each node, n[x] < 2t, hence the while loop of lines 2-3 takes O(t) time.
- Therefore the total CPU time is $O(th) = O(t \log_t n)$.

Inserting
- General algorithm:
    - Search for the leaf node y at which to insert the new key.
    - If the node y is full (having 2t-1 keys), split it around its median key key_t[y]:
        - Create two nodes with t-1 keys each.
        - Move the median key up to y's parent.
        - If y's parent is also full, it must be split as well, and the splits may propagate up the tree.
- To avoid backing up, the key is inserted in a single pass down the tree:
    - Each full node encountered along the way is split.
    - This ensures that whenever a node y needs to be split, its parent is not full.

Splitting a Node
- The procedure B-Tree-Split-Child takes as input a non-full internal node x, an index i, and a full child y of x: y = c_i[x].
- The procedure then splits y in two and adjusts x so that it has an additional child.
- When the root needs to be split, a new root needs to be created. The tree grows in height by one.
- Splitting is the only means by which the tree grows in height.

Splitting Node: Pseudocode

    B-Tree-Split-Child(x,i,y)
    1  z = Allocate-Node()        // allocate a disk page
    2  leaf[z] = leaf[y]
    3  n[z] = t-1
    4  for j = 1 to t-1           // move the largest t-1 keys of y to z
    5      key_j[z] = key_{j+t}[y]
    6  if not leaf[y]
    7      for j = 1 to t         // move the last t children of y to z
    8          c_j[z] = c_{j+t}[y]
    9  n[y] = t-1
       // shift children of x to the right
    10 for j = n[x]+1 downto i+1
    11     c_{j+1}[x] = c_j[x]
    12 c_{i+1}[x] = z             // add z as a new child

Splitting Node: Pseudocode (cont.)

       // make room for the median
    13 for j = n[x] downto i
    14     key_{j+1}[x] = key_j[x]
    15 key_i[x] = key_t[y]
    16 n[x]++
    17 Disk-Write(y)
    18 Disk-Write(z)
    19 Disk-Write(x)

- The CPU time is determined by the loops of lines 4-5 and 7-8, which take $\Theta(t)$ time. Note that the other loops perform O(t) iterations.
- The procedure performs $\Theta(1)$ disk operations.
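The effect of B-Tree-Split-Child is easier to see on a small case. The sketch below is an in-memory approximation with 0-based indices: list slicing replaces the copy loops, `list.insert` replaces the shift loops, and the three Disk-Write calls are omitted. The `Node` class and the function name `split_child` are assumptions for illustration.

```python
class Node:
    """Hypothetical in-memory stand-in for a B-tree page."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "only a full child may be split"
    median = y.keys[t - 1]
    # z takes the largest t-1 keys of y and, if y is internal, its last t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # Shift x's children and keys right by one and link in z and the median.
    x.children.insert(i + 1, z)
    x.keys.insert(i, median)


x = Node([10, 30], [Node([5]), Node([15, 20, 25]), Node([35])])
split_child(x, 1, t=2)
print(x.keys)                        # [10, 20, 30]
print([c.keys for c in x.children])  # [[5], [15], [25], [35]]
```

The median 20 moves up into x, and the full child becomes two nodes with t-1 = 1 key each.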

Inserting a Key: Algorithm

    B-Tree-Insert(T,k)
    1  r = root[T]
    2  if n[r] = 2t-1             // full node
    3      s = Allocate-Node()
    4      root[T] = s
    5      leaf[s] = FALSE
    6      n[s] = 0
    7      c_1[s] = r
           // split the old root
    8      B-Tree-Split-Child(s,1,r)
    9      B-Tree-Insert-Nonfull(s,k)
    10 else
    11     B-Tree-Insert-Nonfull(r,k)

Inserting a Key: Algorithm (cont.)

    // Insert key k into a non-full node x
    B-Tree-Insert-Nonfull(x,k)
    1  i = n[x]
    2  if leaf[x]
           // k is inserted in the ordered list
    3      while i >= 1 and k < key_i[x]
    4          key_{i+1}[x] = key_i[x]
    5          i--
    6      key_{i+1}[x] = k
    7      n[x]++
    8      Disk-Write(x)
    9  else
           // search the leaf to insert into

Inserting a Key: Algorithm (cont.)

    10     while i >= 1 and k < key_i[x]
    11         i--
    12     i++
    13     Disk-Read(c_i[x])
    14     if n[c_i[x]] = 2t-1    // full node
    15         B-Tree-Split-Child(x,i,c_i[x])
    16         if k > key_i[x]
    17             i++
    18     B-Tree-Insert-Nonfull(c_i[x], k)
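Putting the insertion procedures together, a compact in-memory sketch could look like the following. It mirrors B-Tree-Insert, B-Tree-Insert-Nonfull and B-Tree-Split-Child with 0-based indices and without the disk operations. The class names `Node` and `BTree` and the helper `inorder` are assumptions added for illustration.

```python
class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []
        self.children = []


class BTree:
    def __init__(self, t):
        self.t = t          # minimum degree
        self.root = Node()  # an empty tree is a single empty leaf

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(leaf=y.leaf)
        median = y.keys[t - 1]
        z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
        z.children, y.children = y.children[t:], y.children[:t]
        x.children.insert(i + 1, z)
        x.keys.insert(i, median)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:  # full root: the tree grows by one level
            s = Node(leaf=False)
            s.children.append(r)
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _insert_nonfull(self, x, k):
        i = len(x.keys) - 1
        if x.leaf:
            x.keys.append(k)  # make room, then shift larger keys right
            while i >= 0 and k < x.keys[i]:
                x.keys[i + 1] = x.keys[i]
                i -= 1
            x.keys[i + 1] = k
        else:
            while i >= 0 and k < x.keys[i]:
                i -= 1
            i += 1
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)
                if k > x.keys[i]:  # the median moved up; pick the correct half
                    i += 1
            self._insert_nonfull(x.children[i], k)

    def inorder(self, x=None):
        """Return all keys in sorted order (helper for checking the result)."""
        x = self.root if x is None else x
        if x.leaf:
            return list(x.keys)
        out = []
        for i, k in enumerate(x.keys):
            out += self.inorder(x.children[i]) + [k]
        return out + self.inorder(x.children[-1])


tree = BTree(t=2)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(key)
print(tree.inorder())  # [5, 6, 7, 10, 12, 17, 20, 30]
```

Because every full node on the way down is split before the descent continues, the recursion never has to return to a parent to make room.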

Inserting a Key: Performance
- The number of disk accesses performed by B-Tree-Insert is O(h) for a B-tree of height h.
    - Only O(1) Disk-Read and Disk-Write operations are performed at each level in B-Tree-Insert-Nonfull.
- The total CPU time is $O(th) = O(t \log_t n)$.
    - At each level of the tree, the number of CPU operations is determined by the while loops in B-Tree-Insert-Nonfull.
    - The maximum number of iterations of these loops is 2t-1, hence the total time at each level is O(t).