0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.

Slides:



Advertisements
Similar presentations
Data Structures Haim Kaplan and Uri Zwick November 2012 Lecture 5 B-Trees.
Advertisements

0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
Tirgul 6 B-Trees – Another kind of balanced trees Some notes regarding Home Work.
6/14/2015 6:48 AM(2,4) Trees /14/2015 6:48 AM(2,4) Trees2 Outline and Reading Multi-way search tree (§3.3.1) Definition Search (2,4)
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
(B+-Trees, that is) Steve Wolfman 2014W1
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Tirgul 6 B-Trees – Another kind of balanced trees.
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
Splay Trees and B-Trees
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Advanced Data Structures and Algorithms COSC-600 Lecture presentation-6.
CPSC 335 BTrees Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
B-trees (Balanced Trees) A B-tree is a special kind of tree, similar to a binary tree. However, It is not a binary search tree. It is not a binary tree.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
Storage CMSC 461 Michael Wilson. Database storage  At some point, database information must be stored in some format  It’d be impossible to store hundreds.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
October 10, Algorithms and Data Structures Lecture IX Simonas Šaltenis Nykredit Center for Database Research Aalborg University
INTRODUCTION TO MULTIWAY TREES P INTRO - Binary Trees are useful for quick retrieval of items stored in the tree (using linked list) - often,
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
Starting at Binary Trees
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CompSci 100E 39.1 Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
Indexing CS 400/600 – Data Structures. Indexing2 Memory and Disk  Typical memory access: 30 – 60 ns  Typical disk access: 3-9 ms  Difference: 100,000.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
November 12, Algorithms and Data Structures Lecture IX Simonas Šaltenis Nykredit Center for Database Research Aalborg University
Data Structures and Algorithms (AT70.02) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Prof. Sumanta Guha Slide Sources: CLRS “Intro.
CompSci Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
Data Structures – Week #6 Special Trees. January 14, 2016Borahan Tümer, Ph.D.2 Outline Adelson-Velskii-Landis (AVL) Trees Splay Trees B-Trees.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
October 24, 2005L11.1 Introduction to Algorithms LECTURE (Chapter 18) B-Trees ‧ 18.1 Definition of B-Tree ‧ 18.2 Basic operations on B-Tree ‧ 18.3 Deleting.
ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: ec2620m.htm Office: TEL 3049.
B-Tree Michael Tsai 2017/06/06.
CSE 332 Data Abstractions B-Trees
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Data Structures – Week #6
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
B+-Trees (Part 1).
B-Tree Presenter: Jun Tao.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
B-Trees.
Presentation transcript:

0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures (Ch. 6) n Balanced Search Trees: Splay Trees (Ch. 4.5) n Union-Find data structure (Ch. 8.1–8.5) n Graphs: Representations and basic algorithms  Topological Sort (Ch )  Minimum spanning trees (Ch. 9.5)  Shortest-path algorithms (Ch ) n B-Trees: External-Memory data structures (Ch. 4.7) n kD-Trees: Multi-Dimensional data structures (Ch. 12.6) n Misc.: Streaming data, randomization

1B-Trees n B-Trees are designed to minimize I/O operations n Large database stored in secondary memory (disks, cloud)  Too large for main memory  Non-uniform data access cost: memory vs. disk  Similar considerations for cache vs. main memory

2B-Trees n Data access costs vary significantly  Disk access times slower than silicon memory.  Somewhat outdated numbers, but illustrative:  Disks rotate at about 7200 RPM; typical range 5K-15K.  One rotation takes 8.33 ms, which is about 5 orders slower than a 50 nsec access for current silicon memory.

3B-Trees n To amortize huge disk access cost, data transferred in large chunks.  Information (e.g. set of key values) chunked into large "pages” laid out consecutively within each disk cylinder  Typical page size: 2 11 to 2 14 bytes (2K-16K)  Think of B-Tree as a BST with very fat nodes n Processing an entire page takes less time than to fetch it n So, in disk-bound data structures, we consider two factors separately:  number of disk accesses,  the CPU time.

4B-Trees  B-Tree algorithms operate at the granularity of pages. I.e., the unit operations READ or WRITE a page  The main memory can accommodate only so many pages, so older pages will be flushed out as new ones are fetched.

5B-Trees  To optimize the number of page accesses, we choose the size of the B-Tree node to match the page size.  That is, each node will store keys for about items, and will have the similar branching factor  With a branching factor of 1001 (1000 keys per node), 1 billion keys can be accessed by a tree of height 2.  Just 2 disk accesses!

6 B-Trees: an example

7 B-Tree: Structural Details

8 B-Tree: Formal Definitions

9

10 B-Tree: Properties n Simplest B-tree when t=2 (also called trees) n Height of a B-tree: h <= log t (n + 1)/2

11 B-Tree: Height Bound

12 B-Tree: Searching for a Key Root node always in main memory, so no disk-access required there. But access to any other node requires disk-read.

13 B-Tree: Inserting a key k n Search for the leaf node to insert k.  What happens if that node is full (2t-1 keys)  All B-Tree nodes must be at same depth and have between t-1 and 2t-1 keys.  For instance, insert key I into the leaf [J, K, L]

14 B-Tree: Splitting a node n If insert leaf is full, we split it around its middle t th key.  Median key moves to the parent, and  Two new children, each with t-1 keys formed.  If parent becomes full, recursively split upwards

15B-Tree-Split-Child y is the i th child of x, and is the node being split. y originally has 2t children, and is reduced to t (and t-1 keys) by this operation. z adopts the t largest children of y, and z becomes a new child of x, positioned just after y in x's table. The median key of y moves up to x, separating y and z.

16B-Tree-Insert

17 B-Tree: Splitting a Node

18 B-Tree-Insert: main routine 1 st while loop = x is a leaf. When x non-leaf, insert into appropriate leaf in the sub-tree rooted at x. 2 nd while loop determines the child of x to descend to. The if condition checks if that child is a full node. If full, B_TREE_SPLIT splits it into two non-full nodes, and the next if determines which of the children to descend to.

19 B-Tree-Insert: Illustration

20 B-Tree-Insert: Illustration

21 B-Trees: Insert Complexity The number of disk accesses by B-TREE-INSERT is O(h): only O(1) disk read or writes between calls to B-TREE-NONFULL. n The total CPU time is O(t h) = O(t log t n).

22 B-Trees: Deleting a key Deleted key may be in a non-leaf node. n If the size of a node drops below t-1 after deletion, fix it. n Suppose we from sub-tree rooted at x. n We ensure that when B-TREE-DELETE is called on a node x, the number of keys in x is at least t. n This allows us to perform deletion in one pass, without backing up.

23 B-Trees: Deleting a key (Complicated!) n If k is in leaf-node x, delete k from x. n If k is in non-leaf node x, do:  if the child y that precedes k in node x has t or more keys, then find predecessor k' of k in sub-tree rooted at y. Recursively delete k', and replace k with k' in x.  Symmetrically, if child z that follows k in node x has >= t keys, find the successor k' of k in sub-tree rooted at z. Recursively delete k', and replace k with k' in x.  Otherwise, if both y and z have only t-1 keys, merge k and all of z into y so that x loses both k and the pointer to z, and y now has 2t-1 keys. Free z and recursively delete k from y.

24 B-Trees: Deleting a key (Complicated!) n If the key k is not present in node x,  determine the root c i (x) of the appropriate sub-tree containing k.  If c i (x) has only t-1 keys, execute steps (3a) or (3b) to guarantee we descend to a node with >= t keys.  Then finish by recursive call on appropriate child of x.  3a. If c i (x) has only t-1 keys but has an immediate sibling with t or more keys, give c i (x) an extra key by moving a key from x down to c i (x), moving a key from c i (x)'s immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into c i (x).  3b. If c i (x) and both of c i (x)'s immediate siblings have t-1 keys, merge c i (x) with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.

25 B-Tree Deletion: Illustration

26 B-Tree Deletion: Illustration

27 B-Tree Deletion: Illustration

28 B-Tree: Deletion Complexity n Also O(h) disk accesses and O(t log t n) CPU.