CSE 373, Autumn 2001, Oct 29, 2001: External Storage

Presentation transcript:

External Storage
For large data sets, the computer will have to access the disk. A disk access can take 200,000 times longer than a machine instruction. The RAM model does not account for disk I/O.
memory: 128 MB, fast, expensive
disk: 60 GB, slow, cheap

Disks, continued
The difference between memory speed and disk speed is increasing.
Example: State of Florida driving records (256 bytes each), 10,000,000 items, 6 disk accesses per second on a time-sharing system.
Unbalanced binary search tree: possibly 10,000,000 accesses.
BST: 32 accesses on average (about 5 sec.).
AVL tree: worst case 1.44 log n; typical case about log n, 25 accesses (about 4 sec.).
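The arithmetic behind these estimates can be checked with a short sketch. The record count and the rate of 6 accesses per second come from the slide; the factor 1.38 for the average depth of a randomly built BST is a standard estimate (2 ln n is about 1.38 log2 n) that is assumed here because it matches the slide's figure of 32.

```python
import math

N = 10_000_000          # number of driving records (from the slide)
ACCESSES_PER_SEC = 6    # disk accesses per second (from the slide)

def seconds(accesses):
    """Time needed for the given number of disk accesses."""
    return accesses / ACCESSES_PER_SEC

log_n = math.log2(N)          # depth of a perfectly balanced binary tree
avg_bst = 1.38 * log_n        # assumed estimate for a randomly built BST
avl_worst = 1.44 * log_n      # AVL worst-case bound quoted on the slide

for name, accesses in [("BST, average", avg_bst),
                       ("AVL, worst case", avl_worst),
                       ("AVL, typical (slide's 25)", 25)]:
    print(f"{name}: {accesses:.1f} accesses, {seconds(accesses):.1f} sec")
```

Even in the balanced case, every lookup costs several seconds, which is why the following slides try to reduce the number of accesses rather than the work done per access.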

Disk accesses
Goal: reduce the number of disk accesses. We are willing to do more complicated computations in memory in order to save disk time.
Idea: increase the branching of the tree so that the height is decreased.
Definition: An M-ary search tree allows up to M children per node.

B-Trees
1. All the data items are stored at the leaves.
2. The non-leaf nodes store up to M-1 keys. The ith key represents the smallest key in subtree i+1.
3. The root is either a leaf or has between 2 and M children.
4. All non-leaf nodes (except the root) have between $\lceil M/2 \rceil$ and M children.
5. All leaves are at the same depth and have between $\lceil L/2 \rceil$ and L data items.
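Rule 2 is what makes searching work: counting how many keys in a node are less than or equal to the search key gives the index of the subtree to descend into. The following is a minimal sketch of a lookup under that rule; the class names `Leaf` and `Internal`, the function `find`, and the hand-built tree (M = 3, L = 3) are all hypothetical and only illustrate the idea. Insertion and deletion are not covered.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, items):
        self.items = sorted(items)      # data items live only in leaves

class Internal:
    def __init__(self, keys, children):
        # keys[i] is the smallest key stored in children[i + 1]
        assert len(children) == len(keys) + 1
        self.keys = keys
        self.children = children

def find(node, key):
    """Return True if key is stored in a leaf below node."""
    while isinstance(node, Internal):
        # The number of keys <= key is the index of the child to follow.
        node = node.children[bisect_right(node.keys, key)]
    return key in node.items

# Hand-built example tree with M = 3 and L = 3 (assumed values).
root = Internal([10, 20],
                [Leaf([1, 4, 7]), Leaf([10, 12, 15]), Leaf([20, 25])])

print(find(root, 12), find(root, 8))
```

Each loop iteration corresponds to reading one node, so the number of disk accesses equals the height of the tree.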

B-Trees: Choices
Choose M and L based on the size of the keys K and on the size of the record R. Suppose a disk block is of size B (bytes).
Choose M so that a non-leaf node fits into one block: $B \ge (M-1) \cdot K + M \cdot 4$
Choose L so that a leaf node fits into one block: $B \ge L \cdot R$
Accesses: $\log_2 N$ vs. $\log_{\lceil M/2 \rceil} N$
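As a worked example of these two inequalities, the sketch below solves them for the largest M and L. The block size B = 8192 bytes and key size K = 32 bytes are assumed values chosen for illustration; R = 256 bytes and N = 10,000,000 come from the driving-records example. The constant 4 is read here as the size of a child pointer in bytes.

```python
import math

POINTER_BYTES = 4   # the "4" in the slide's inequality, read as bytes per child pointer

def choose_m(block, key):
    """Largest M with (M - 1) * key + M * POINTER_BYTES <= block."""
    return (block + key) // (key + POINTER_BYTES)

def choose_l(block, record):
    """Largest L with L * record <= block."""
    return block // record

B, K, R = 8192, 32, 256     # B and K are assumptions; R is from the slides
N = 10_000_000              # number of records, from the slides

M = choose_m(B, K)
L = choose_l(B, R)

binary_accesses = math.log2(N)                    # log_2 N
btree_accesses = math.log(N, math.ceil(M / 2))    # log_{ceil(M/2)} N

print(f"M = {M}, L = {L}")
print(f"binary tree: about {binary_accesses:.1f} accesses")
print(f"B-tree:      about {btree_accesses:.1f} accesses")
```

Under these assumptions the estimate drops from roughly two dozen accesses per lookup to a handful.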

Hash Tables
Constant time accesses!
A hash table is an array of some fixed size; the size is usually a prime number.
General idea: the hash function h(K) maps each key in the key space (e.g., strings) to a cell of the hash table, numbered 0 … TableSize-1.

Desirable Properties
We want a hash function to:
1. be simple/fast to compute,
2. map different keys to different cells (impossible: why?),
3. have keys distributed evenly among cells.
Idea: If #1 and #3 are true and the hash table is not very full, then it should be fast to do a find.

Example
key space = integers
h(K) = K mod TableSize
We lose all ordering information: findMin, findMax, inorder traversal, and printing items in sorted order are no longer efficient.
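A small sketch shows why the ordering is lost. The table size of 11 and the sample keys are assumptions made for illustration, and the keys were picked so that no two collide.

```python
TABLE_SIZE = 11     # assumed table size (a prime)

def h(key):
    """Map an integer key to a cell index."""
    return key % TABLE_SIZE

keys = [10, 23, 35, 47, 61]          # sorted input, chosen to avoid collisions
table = [None] * TABLE_SIZE
for k in keys:
    table[h(k)] = k

# Reading the cells from left to right no longer yields sorted order.
in_table_order = [k for k in table if k is not None]
print(in_table_order)

# findMin has to examine every cell instead of following a tree path.
smallest = min(k for k in table if k is not None)
print(smallest)
```

The smallest key ends up in the last cell, so nothing about a cell's position says anything about the relative order of its key.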

Example 2
key space = strings $s = s_0 s_1 s_2 \dots s_{k-1}$
h(s) = $s_0$ mod TableSize (bad hash function)
h(s) = … mod TableSize (better hash function)
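The better formula is incomplete in this transcript, so the sketch below does not try to reproduce it. Instead it contrasts the first-character hash with a polynomial hash over all characters, which is one common choice and is an assumption here; the function names, the base 37, and the table size 101 are likewise illustrative.

```python
def bad_hash(s, table_size):
    """Use only the first character, so all strings sharing it collide.

    Assumes s is non-empty.
    """
    return ord(s[0]) % table_size

def poly_hash(s, table_size, base=37):
    """Polynomial hash over every character (assumed stand-in, see lead-in)."""
    h = 0
    for ch in s:
        # Reduce at every step to keep the running value small.
        h = (h * base + ord(ch)) % table_size
    return h

TABLE_SIZE = 101    # assumed prime table size
for word in ["apple", "avocado"]:
    print(word, bad_hash(word, TABLE_SIZE), poly_hash(word, TABLE_SIZE))
```

With the first-character hash, at most as many cells are ever used as there are distinct first characters, no matter how large the table is.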

Collision Resolution
Separate chaining: all keys that map to the same hash value are kept in a list.
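A minimal sketch of separate chaining, assuming one Python list per cell and Python's built-in `hash` as the hash function; the class name `ChainedHashTable` and its method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, table_size=11):
        # One (initially empty) chain per cell.
        self.buckets = [[] for _ in range(table_size)]

    def _bucket(self, key):
        """Return the chain that key hashes to."""
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:       # ignore duplicates
            bucket.append(key)

    def find(self, key):
        return key in self._bucket(key)

    def delete(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)

table = ChainedHashTable()
for k in (1, 12, 23):               # all three map to cell 1 when the size is 11
    table.insert(k)
print(table.buckets[1], table.find(12))
```

A find costs one hash computation plus a scan of a single chain, so it stays fast as long as the chains are short, which is the case when keys are spread evenly and the table is not very full.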