Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001.

Similar presentations


Presentation on theme: "CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001."— Presentation transcript:

1 CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001

2 Extendible Hashing Hashing technique for huge data sets –optimizes to reduce disk accesses –each hash bucket fits on one disk block –better than B-Trees if order is not important Table contains –buckets, each fitting in one disk block, with the data –a directory that fits in one disk block used to hash to the correct bucket

3 001010011 110 111 101 Extendible Hash Table Directory contains entries labeled by k bits plus a pointer to the bucket with all keys starting with its bits Each block contains keys+data matching on the first j  k bits 000 100 (2) 00001 00011 00100 00110 (2) 01001 01011 01100 (3) 10001 10011 (3) 10101 10110 10111 (2) 11001 11011 11100 11110 directory for k = 3

4 Inserting (easy case) 001010011 110 111 101000 100 (2) 00001 00011 00100 00110 (2) 01001 01011 01100 (3) 10001 10011 (3) 10101 10110 10111 (2) 11001 11011 11100 11110 insert(11011) 001010011 110 111 101000 100 (2) 00001 00011 00100 00110 (2) 01001 01011 01100 (3) 10001 10011 (3) 10101 10110 10111 (2) 11001 11100 11110

5 Splitting 001010011 110 111 101000 100 (2) 00001 00011 00100 00110 (2) 01001 01011 01100 (3) 10001 10011 (3) 10101 10110 10111 (2) 11001 11011 11100 11110 insert(11000) 001010011 110 111 101000 100 (2) 00001 00011 00100 00110 (2) 01001 01011 01100 (3) 10001 10011 (3) 10101 10110 10111 (3) 11000 11001 11011 (3) 11100 11110

6 Rehashing 01101100 (2) 01101 (2) 10000 10001 10011 10111 (2) 11001 11110 insert(10010) 001010011 110 111 101000 100 No room to insert and no adoption! Expand directory Now, it’s just a normal split.

7 Rehash of Hashing Hashing is a great data structure for storing unordered data that supports insert, delete, and find Both separate chaining (open) and open addressing (closed) hashing are useful –separate chaining flexible –closed hashing uses less storage, but performs badly with load factors near 1 –extendible hashing for very large disk-based data Hashing pros and cons + very fast + simple to implement, supports insert, delete, find - lazy deletion necessary in open addressing, can waste storage - does not support operations dependent on order: min, max, range

8 Recall: AVL Tree Dictionary Data Structure 4 121062 115 8 141379 Binary search tree properties –binary tree property –search tree property Balance property –balance of every node is: -1  b  1 –result: depth is  (log n) 15

9 Splay Trees “blind” rebalancing – no height info kept amortized time for all operations is O(log n) worst case time is O(n) insert/find always rotates node to the root! –Good locality – most common keys move high in tree

10 Idea 17 10 92 5 3 You’re forced to make a really deep access: Since you’re down there anyway, fix up a lot of deep nodes!

11 Splay Operations: Find Find(x) 1.do a normal BST search to find n such that n->key = x 2.move n to root by series of zig-zag and zig-zig rotations, followed by a final zig if necessary

12 Zig-Zag * g X p Y n Z W * This is just a double rotation n Y g W p ZX Helped Unchanged Hurt

13 Zig-Zig n Z Y p X g W g W X p Y n Z

14 Zig p X n Y Z n Z p Y X root

15 Why Splaying Helps Node n and its children are always helped (raised) Except for final zig, nodes that are hurt by a zig- zag or zig-zig are later helped by a rotation higher up the tree! Result: –shallow (zig) nodes may increase depth by one or two –helped nodes may decrease depth by a large amount If a node n on the access path is at depth d before the splay, it’s at about depth d/2 after the splay –Exceptions are the root, the child of the root, and the node splayed

16 Locality Assume m  n access in a tree of size n –Total amortized time O(m log n) –O(log n) per access on average Gets better when you only access k distinct items in the m accesses. –Exercise.

17 Splaying Example 2 1 3 4 5 6 Find(6) 2 1 3 6 5 4 zig-zig

18 Still Splaying 6 zig-zig 2 1 3 6 5 4 1 6 3 25 4

19 Almost There, Stay on Target zig 1 6 3 25 4 6 1 3 25 4

20 Splay Again Find(4) zig-zag 6 1 3 25 4 6 1 4 35 2

21 Example Splayed Out zig-zag 6 1 4 35 2 61 4 35 2

22 Splay Tree Summary All operations are in amortized O(log n) time Splaying can be done top-down; better because: –only one pass –no recursion or parent pointers necessary Invented by Sleator and Tarjan (1985), now widely used in place of AVL trees Splay trees are very effective search trees –relatively simple –no extra fields required –excellent locality properties: frequently accessed keys are cheap to find

23 Coming Up Project 3: implement a “smart” web server. Heaps! Disjoint Sets Graphs and more.


Download ppt "CSE 326: Data Structures Lecture #13 Extendible Hashing and Splay Trees Alon Halevy Spring Quarter 2001."

Similar presentations


Ads by Google