File Structures, Chap. 12: Extendible Hashing
Seoul National University, School of Computer Science and Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA Lab)
Prof. Hyoung-Joo Kim
Based on File Structures by Folk, Zoellick and Riccardi


Slide 2: Chapter Objectives
• Describe the problem solved by extendible hashing and related approaches
• Explain how extendible hashing works; show how it combines tries with conventional, static hashing
• Use the buffer, file, and index classes of previous chapters to implement extendible hashing, including deletion
• Review studies of extendible hashing performance
• Examine alternative approaches to the same problem, including dynamic hashing, linear hashing, and hashing schemes that control splitting by allowing for overflow buckets

Slide 3: Contents
• 12.1 Introduction
• 12.2 How extendible hashing works
• 12.3 Implementation
• 12.4 Deletion
• 12.5 Extendible hashing performance
• 12.6 Alternative approaches

Slide 4: 12.1 Introduction
• Dynamic files
  - grow substantially over time
• Static hashing
  - described in Chapter 11 (direct hashing)
  - typically worse than a B-tree for dynamic files
  - eventually requires file reorganization
• Extendible hashing
  - hashing for dynamic files
  - Fagin, Nievergelt, Pippenger, and Strong (ACM TODS, 1979)

Slide 5: Overview (1)
• Direct-access (hashed) files have a static size, so they are not suitable for files whose size is unknown in advance
• What is wanted is a dynamic file structure that retains fast retrieval by primary key and also expands and contracts as the number of records in the file fluctuates, without reorganizing the whole file
• The motivation is similar in both cases:
  - Indexed-sequential file ==> B-tree
  - Hashing ==> extendible hashing

Slide 6: Overview (2)
• Extendible hashing look-up path
  - Primary key -> hashing function -> H(key)
  - Extract the first d bits of H(key) -> directory index
  - Table look-up in the directory -> file pointer

Slide 7: 12.2 How Extendible Hashing Works
• The idea comes from tries (radix searching)
  - The branching factor of the tree equals the number of alternative symbols in each position of the key
  - Example: a radix-26 trie over able, abrahms, adams, anderson, andrews, baird
  - Use the first n characters for branching
[Figure: radix-26 trie branching on the leading characters of able, abrahms, adams, anderson, andrews, baird]

Slide 8: Extendible Hashing
• H maps keys to a fixed address space whose size is the largest prime less than a power of 2 (for example, 65521 < $2^{16}$)
• File pointers point to blocks of records known as buckets. An entire bucket is read in one physical data transfer, and buckets may be added to or removed from the file dynamically
• The first d bits of H(key) are used as an index into a directory array of $2^d$ entries, which usually resides in primary memory
• The value d, the directory size ($2^d$), and the number of buckets change automatically as the file expands and contracts

Slide 9: Extendible Hashing Example
• Directory with d = 3 and 4 buckets:
  - cells 000, 001, 010, 011 -> B_0 (d' = 1, H(key) = 0..)
  - cell 100 -> B_100 (d' = 3, H(key) = 100..)
  - cell 101 -> B_101 (d' = 3, H(key) = 101..)
  - cells 110, 111 -> B_11 (d' = 2, H(key) = 11..)

Slide 10: Turning the Trie into a Directory
• Using a trie for extendible hashing
  (1) Use a radix-2 trie:
      Keys in A: beginning with 0
      Keys in B: beginning with 10
      Keys in C: beginning with 11
  (2) Retrieve from secondary storage the buckets containing keys, instead of individual keys
[Figure: radix-2 trie whose root branches on 0 to bucket A and on 1 to a node that branches on 0 to B and on 1 to C]

Slide 11: Representation of the Trie (1)
• A tree is not the preferred representation (the directory is not big)
• Use a flattened array instead
  1. Extend the trie to a complete binary tree
  2. Collapse it into the directory structure
• Resulting directory: 00 -> A, 01 -> A, 10 -> B, 11 -> C

Slide 12: Representation of the Trie (2)
• The directory is a complete binary tree
• Directory entry: a pointer to the associated bucket
• Given an address beginning with the bits 10, directory entry $10_2$ holds the pointer to the associated bucket
• Introduced for uniform distribution

Slide 13: Retrieve a Record
• Steps in retrieving a record with a given key
  - find H(given key)
  - extract the first d bits of H(given key)
  - use this value as an index into the directory to find a pointer
  - use this pointer to read a bucket into primary memory
  - locate the desired record within the bucket (scan)
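The retrieval steps can be sketched in a few lines of C++. This is a minimal in-memory illustration, not the deck's implementation: it assumes a 16-bit hash value, works directly on already-hashed keys, and takes the leading d bits as the directory index. The names `DirectoryIndex`, `SimpleBucket`, `SimpleDirectory`, `Search`, and `MakeSlide9Directory` are hypothetical; the directory layout is the d = 3 example from slide 9.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr int kHashBits = 16;  // assumed width of H(key)

// Leading `depth` bits of a 16-bit hash value, used as the directory index.
inline int DirectoryIndex(std::uint16_t hashVal, int depth) {
    assert(depth >= 0 && depth <= kHashBits);
    return depth == 0 ? 0 : hashVal >> (kHashBits - depth);
}

// Hypothetical stand-in for a bucket read from the file.
struct SimpleBucket {
    std::vector<std::pair<std::uint16_t, int>> records;  // (H(key), record address)
};

// Hypothetical stand-in for the directory plus the bucket file.
struct SimpleDirectory {
    int depth = 0;                      // d
    std::vector<int> cells;             // 2^d bucket numbers
    std::vector<SimpleBucket> buckets;  // plays the role of the bucket file
};

// Returns the record address stored under hashVal, or -1 if it is absent.
// A real bucket would compare the keys themselves, not only their hashes.
int Search(const SimpleDirectory& dir, std::uint16_t hashVal) {
    int cell = DirectoryIndex(hashVal, dir.depth);               // first d bits
    const SimpleBucket& bucket = dir.buckets[dir.cells[cell]];   // follow the pointer
    for (const auto& rec : bucket.records)                       // scan the bucket
        if (rec.first == hashVal) return rec.second;
    return -1;
}

// Directory of slide 9: d = 3, four buckets (B_0, B_100, B_101, B_11).
SimpleDirectory MakeSlide9Directory() {
    SimpleDirectory dir;
    dir.depth = 3;
    dir.cells = {0, 0, 0, 0, 1, 2, 3, 3};
    dir.buckets.resize(4);
    return dir;
}
```

Several cells can share one bucket number, which is how a bucket with d' < d is reached from every address that starts with its prefix.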

Slide 14: Expansion & Contraction (1)
• Two adjacent buckets with the same value of d' that share the first d'-1 bits of H(key) can be combined when their average load is below 50%, so that all of their records fit into one bucket
• File contraction is the reverse of expansion: the directory can be compacted and d decremented whenever every pair of adjacent pointers holds the same value

Slide 15: Expansion & Contraction (2)
• Bucket B_0 overflows, then splits into B_00 and B_01
• Directory with d = 3:
  - cells 000, 001 -> B_00 (d' = 2, H(key) = 00..)
  - cells 010, 011 -> B_01 (d' = 2, H(key) = 01..)
  - cell 100 -> B_100 (d' = 3, H(key) = 100..)
  - cell 101 -> B_101 (d' = 3, H(key) = 101..)
  - cells 110, 111 -> B_11 (d' = 2, H(key) = 11..)

Slide 16: Expansion & Contraction (3)
• Bucket B_100 overflows and d increases to 4
• Directory with d = 4:
  - cells 0000, 0001, 0010, 0011 -> B_00 (d' = 2, H(key) = 00..)
  - cells 0100, 0101, 0110, 0111 -> B_01 (d' = 2, H(key) = 01..)
  - cell 1000 -> B_1000 (d' = 4, H(key) = 1000..)
  - cell 1001 -> B_1001 (d' = 4, H(key) = 1001..)
  - cells 1010, 1011 -> B_101 (d' = 3, H(key) = 101..)
  - cells 1100, 1101, 1110, 1111 -> B_11 (d' = 2, H(key) = 11..)

Slide 17: Splitting to Handle Overflow (1)
• Example 1: bucket A overflows
  - Split A into A and D
  - Start using an address bit that was previously unused
  - No need to expand the directory
• Directory before: 00 -> A, 01 -> A, 10 -> B, 11 -> C
• Directory after: 00 -> A, 01 -> D, 10 -> B, 11 -> C

Slide 18: Splitting to Handle Overflow (2)
• Example 2: bucket B overflows
  - There are no unused address bits left, so the directory must be expanded
  1. Divide B using 3 bits of the hash address
  2. Extend the trie to a complete binary tree
  3. Collapse it into the directory structure
• Directory before: 00 -> A, 01 -> A, 10 -> B, 11 -> C

Slide 19: Splitting to Handle Overflow (3)
[Figure: 1. Result of the overflow of bucket B (B splits into B and D). 2. Complete binary tree. 3. Directory.]
• Resulting directory: 000, 001, 010, 011 -> A; 100 -> B; 101 -> D; 110, 111 -> C
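The two overflow cases (a split that reuses an unused bit, and a split that first doubles the directory) can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not the deck's Bucket and Directory classes: hashed keys are distinct 16-bit values, the leading bits form the address, and the bucket capacity `kBucketCapacity` is set to 2 only to force splits quickly. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kHashBits = 16;               // assumed hash width
constexpr std::size_t kBucketCapacity = 2;  // tiny on purpose

struct Bucket {
    int depth = 0;                    // d': address bits this bucket uses
    std::vector<std::uint16_t> keys;  // hashed keys (assumed distinct)
};

struct Directory {
    int depth = 0;                // d
    std::vector<int> cells;       // bucket number per directory cell
    std::vector<Bucket> buckets;  // stands in for the bucket file

    Directory() : cells(1, 0), buckets(1) {}

    int Index(std::uint16_t h) const {
        return depth == 0 ? 0 : h >> (kHashBits - depth);
    }

    // Each old cell i becomes cells 2i and 2i+1, both with the old pointer.
    void DoubleSize() {
        std::vector<int> bigger(cells.size() * 2);
        for (std::size_t i = 0; i < cells.size(); i++)
            bigger[2 * i] = bigger[2 * i + 1] = cells[i];
        cells.swap(bigger);
        depth++;
    }

    void Split(int b) {
        assert(buckets[b].depth < kHashBits);
        if (buckets[b].depth == depth) DoubleSize();  // no unused bit left
        int newDepth = ++buckets[b].depth;
        int nb = static_cast<int>(buckets.size());
        Bucket fresh;
        fresh.depth = newDepth;
        buckets.push_back(fresh);
        // Cells of b whose newly used bit is 1 now point to the new bucket.
        int shift = depth - newDepth;
        for (std::size_t i = 0; i < cells.size(); i++)
            if (cells[i] == b && ((i >> shift) & 1)) cells[i] = nb;
        // Redistribute the old contents between the two buckets.
        std::vector<std::uint16_t> old;
        old.swap(buckets[b].keys);
        for (std::uint16_t h : old) buckets[cells[Index(h)]].keys.push_back(h);
    }

    void Insert(std::uint16_t h) {
        int b = cells[Index(h)];
        while (buckets[b].keys.size() >= kBucketCapacity) {
            Split(b);
            b = cells[Index(h)];
        }
        buckets[b].keys.push_back(h);
    }
};
```

With this capacity, inserting 0x0000, 0x4000, 0x8000, 0xC000, 0x2000 doubles the directory twice and leaves the cells as 0, 2, 1, 1. Inserting 0xA000 afterwards splits bucket 1 without any doubling, giving cells 0, 2, 1, 3.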

Slide 20: Creating Address
• Function hash(KEY)
  - Fold-and-add hashing algorithm
  - Do not take the hash value modulo the address space, since no fixed address space exists
• Output from the hash function for a number of keys

    bill        0000 0011 0110 1100
    lee         0000 0100 0010 1000
    pauline     0000 1111 0110 0101
    alan        0100 1100 1010 0010
    julie       0010 1110 0000 1001
    mike        0000 0111 0100 1101
    elizabeth   0010 1100 0110 1010
    mark        0000 1010 0000 0111

Slide 21: Function Hash

    #include <cstring>

    int Hash (char * key)
    {
        int sum = 0;
        int len = strlen(key);
        // make len even; for an odd-length key the trailing '\0' is folded in
        if (len % 2 == 1) len ++;
        for (int j = 0; j < len; j += 2)
            sum = (sum + 100 * key[j] + key[j+1]) % 19937;
        return sum;
    }

Figure 12.7 Function Hash (key) returns an integer hash value for key for a 15 bit

Slide 22: Function MakeAddress

    int MakeAddress (char * key, int depth)
    {
        int retval = 0;
        int hashVal = Hash(key);
        // reverse the bits: the low-order bits of the hash value
        // become the high-order bits of the address
        for (int j = 0; j < depth; j++)
        {
            retval = retval << 1;
            int lowbit = hashVal & 1;
            retval = retval | lowbit;
            hashVal = hashVal >> 1;
        }
        return retval;
    }

Figure 12.9 Function MakeAddress(key,depth)
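As a worked check of the two functions, take the key bill from the table on slide 20, with ASCII codes b = 98, i = 105, l = 108. The arithmetic below is worked by hand from the code as listed and agrees with the table entry for bill; the depth of 3 is chosen only for illustration.

```latex
\begin{aligned}
\text{sum}_1 &= (0 + 100 \cdot 98 + 105) \bmod 19937 = 9905 \\
\text{sum}_2 &= (9905 + 100 \cdot 108 + 108) \bmod 19937 = 20813 \bmod 19937 = 876 \\
876 &= 0000\,0011\,0110\,1100_2 \\
\text{MakeAddress}(\texttt{"bill"}, 3) &: \text{low-order bits } 0, 0, 1 \text{ (read right to left)} \;\Rightarrow\; 001_2 = 1
\end{aligned}
```

Because MakeAddress consumes the hash value from its low-order end, increasing the depth appends one more bit to the right of the address and leaves the existing bits unchanged.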

Slide 23: Class Bucket

    class Bucket: protected TextIndex
    {protected:
        Bucket (Directory & dir, int maxKeys = defaultMaxKeys);
        int Insert (char * key, int recAddr);
        int Remove (char * key);
        Bucket * Split ();
        int NewRange (int & newStart, int & newEnd);
        int Redistribute (Bucket & newBucket);
        int FindBuddy ();
        int TryCombine ();
        int Combine (Bucket * buddy, int buddyIndex);
        int Depth;
        Directory & Dir;
        int BucketAddr;
        friend class Directory;
        friend class BucketBuffer;
    };

Figure 12.10 Main members of class Bucket

Slide 24: Class Directory

    class Directory
    {public:
        Directory (…..);
        ~Directory();
        int Open (..);
        int Create(…);
        int Close();
        int Insert(…);
        int Delete(…);
        int Search(…);
    protected:
        int DoubleSize();
        int Collapse();
        int InsertBucket (….);
        int Find (…);
        int StoreBucket(…);
        int LoadBucket(…);
        …..
    };

Figure 12.11 Definition of class Directory

Slide 25: 12.4 Deletion
• When to combine buckets
  - Buddy buckets: buckets that are siblings at the leaf level of the trie ("buddy" here means something like a friend); for example, B and D on slide 19 are buddy buckets
• Examine the directory to see whether it can be changed as well
  - Shrink the directory if none of the buckets requires the depth of address information currently available in the directory

Slide 26: Buddy Bucket
• Given a bucket with an address uvwxy, where u, v, w, x, and y each have the value 0 or 1, the buddy bucket, if it exists, has the address uvwxz, where z = y XOR 1
• If enough keys are deleted, the contents of buddy buckets can be combined into a single bucket
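The buddy rule amounts to flipping the last address bit. The free function below is a hedged sketch of that rule, not the deck's `Bucket::FindBuddy` member. Its three parameters are assumptions made for illustration, and it further assumes that only a bucket whose depth equals the directory depth has a single buddy cell.

```cpp
#include <cassert>

// Returns the directory address of the buddy of the bucket at `address`,
// or -1 when the bucket has no buddy.
int FindBuddy(int address, int bucketDepth, int directoryDepth) {
    assert(bucketDepth >= 0 && bucketDepth <= directoryDepth);
    if (directoryDepth == 0) return -1;           // single cell: nothing to pair with
    if (bucketDepth < directoryDepth) return -1;  // bucket spans several cells
    return address ^ 1;                           // uvwxy -> uvwxz with z = y XOR 1
}
```

For example, with a directory depth of 5 the bucket at address 10110 (decimal 22) has its buddy at 10111 (decimal 23), and vice versa.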

Slide 27: Collapsing the Directory
• Collapse condition
  - If the directory has a single cell, downsizing is impossible
  - If any pair of directory cells does not point to the same bucket, collapsing is impossible
• Allocating space
  - Allocate a new directory half the size of the original
  - Copy the bucket reference shared by each cell pair to a single cell in the new directory
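The collapse test and the copy step can be sketched on a directory represented as a plain vector of bucket numbers. This representation and the function name `Collapse` as a free function are assumptions for illustration; the deck's `Directory::Collapse` is only declared, not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tries to halve a directory given as a vector of bucket numbers.
// Returns true and shrinks `cells` on success; returns false and leaves
// `cells` untouched when the directory cannot be collapsed.
bool Collapse(std::vector<int>& cells) {
    if (cells.size() <= 1) return false;  // single cell: cannot downsize
    assert(cells.size() % 2 == 0);
    for (std::size_t i = 0; i < cells.size(); i += 2)
        if (cells[i] != cells[i + 1]) return false;  // a pair differs
    std::vector<int> smaller(cells.size() / 2);      // half the original size
    for (std::size_t i = 0; i < smaller.size(); i++)
        smaller[i] = cells[2 * i];                   // one reference per pair
    cells.swap(smaller);
    return true;
}
```

A directory 0, 0, 1, 1 collapses to 0, 1, which cannot be collapsed further because its only pair points to two different buckets.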

Slide 28: 12.5 Extendible Hashing Performance
• Time: O(1)
  - If the directory can be kept in RAM: a single access
  - Otherwise: two accesses are necessary
• Space utilization of the buckets
  - r (number of records), b (block size), N (number of blocks)
  - Utilization = $r / (bN)$
  - Average utilization ==> 0.69
• Space utilization for the directory
  - How large a directory should we expect to have, given an expected number of keys?
  - Expected value for the directory size by Flajolet (1983)
  - Estimated directory size = $\frac{3.92}{b}\, r^{(1 + 1/b)}$
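The directory-size estimate is easy to evaluate numerically. The helper below simply codes the formula from this slide, reading the estimate as (3.92 / b) times r raised to the power (1 + 1/b); the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Estimated directory size for r records and block size b,
// using the estimate quoted on slide 28: (3.92 / b) * r^(1 + 1/b).
double EstimatedDirectorySize(double r, double b) {
    assert(r > 0 && b > 0);
    return 3.92 / b * std::pow(r, 1.0 + 1.0 / b);
}
```

For r = 1000 and b = 10 this evaluates to roughly 782 directory cells. The exponent 1 + 1/b shows that the directory grows slightly faster than linearly in r, and that larger blocks bring the growth closer to linear.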

Slide 29: Space Utilization for Buckets
• Periodic and fluctuating
  - With uniformly distributed addresses, all the buckets tend to fill up at the same time and therefore split at the same time
  - As the buckets fill up: 90%
  - After a concentrated series of splits: 50%
• r: number of records, b: block size, N: number of blocks
  - $N \approx r / (b \ln 2)$
  - Utilization = $r / (bN) \approx \ln 2 \approx 0.69$
  - Average utilization of 69%
• B-tree space utilization
  - Normal B-tree: 67%; B-tree with redistribution on insertion: 85%

Slide 30: 12.6 Alternative Approaches (1): Dynamic Hashing
• Similar to extendible hashing
  - Uses a directory to track bucket addresses
  - Extends the directory through the use of tries
• Starts with a hash function that covers an address space of a fixed size
• When overflow occurs
  - The bucket splits, forming the leaves of a trie that grows down from the original address node

Slide 31: Alternative Approaches (2): Dynamic Hashing
• Two kinds of nodes
  - External node: references a data bucket
  - Internal node: points to two child index nodes
  - When a node is split, it changes from an external node to an internal node
• Two hash functions
  - Apply the first hash function to reach a node in the original address space
  - If an external node is found, the search is complete
  - If an internal node is found, apply the second hash function

Slide 32: Growth of the Index in Dynamic Hashing
[Figure: (a) original address space with nodes 1, 2, 3, 4. (b) Node 4 has split into 40 and 41. (c) Node 2 has split into 20 and 21, and node 41 into 410 and 411.]
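The search described on slides 31 and 32 can be sketched as a walk down a small binary trie. This is an illustration under assumptions the slides do not spell out: the second hash function is modelled as an integer whose successive low-order bits choose the 0 or 1 child, and the names `TrieNode`, `SplitNode`, and `FindBucket` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// External nodes reference a bucket; internal nodes have two children.
struct TrieNode {
    int bucket = -1;                     // meaningful only for external nodes
    std::unique_ptr<TrieNode> child[2];  // both set for internal nodes
    bool IsExternal() const { return !child[0]; }
};

// Turns an external node into an internal one with two external children:
// the old bucket on the 0 side and newBucket on the 1 side.
void SplitNode(TrieNode& node, int newBucket) {
    assert(node.IsExternal());
    node.child[0].reset(new TrieNode);
    node.child[1].reset(new TrieNode);
    node.child[0]->bucket = node.bucket;
    node.child[1]->bucket = newBucket;
    node.bucket = -1;
}

// h1 selects a node in the original address space; the bits of h2 then
// steer the walk down the trie until an external node is reached.
int FindBucket(const std::vector<TrieNode>& roots, unsigned h1, unsigned h2) {
    const TrieNode* node = &roots[h1 % roots.size()];
    while (!node->IsExternal()) {
        node = node->child[h2 & 1].get();
        h2 >>= 1;
    }
    return node->bucket;
}
```

Unlike the extendible hashing directory, this index is a linked structure, so each level of the walk may touch a different node in memory or on disk.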

Slide 33: Dynamic Hashing vs. Extendible Hashing (1)
• Overflow handling
  - Both schemes extend the hash function locally, as a binary search trie
  - Both schemes use a directory structure
    Dynamic hashing: a linked structure
    Extendible hashing: a perfect tree expressible as an array
• Space utilization
  - The same for both schemes (69%)

Slide 34: Dynamic Hashing vs. Extendible Hashing (2)
• Growth of the directory
  - Dynamic hashing: slower, more gradual growth
  - Extendible hashing: extends the directory by doubling it
• Actual size of an index node
  - A dynamic hashing node is larger than a directory cell in extendible hashing (because of its pointers)
• Page faults
  - Dynamic hashing: more than one page fault (with a linked structure for the directory)
  - Extendible hashing: a single page fault

Slide 35: Alternative Approaches (3): Linear Hashing
• Unlike extendible hashing and dynamic hashing, linear hashing does not use a directory
• The actual address space is extended one bucket at a time as buckets overflow
• Because the extension of the address space does not necessarily correspond to the bucket that is overflowing, linear hashing necessarily involves the use of overflow buckets, even as the address space expands
• No directory: avoids the additional seek that an additional layer would cause
• Uses more bits of the hashed value
  - $h_d(k)$: depth-d hashing function (using function make_address)

Slide 36: The Growth of Address Space in Linear Hashing (1)
[Figure: (a) buckets a, b, c, d at addresses 00, 01, 10, 11. (b) Bucket A added at address 100. (c) Bucket B added at address 101. (d) Bucket C added at address 110. Overflow buckets w, x, and y appear along the way. (continued...)]

Slide 37: The Growth of Address Space in Linear Hashing (2)
[Figure: (e) buckets a, b, c, d, A, B, C, D at addresses 00, 01, 10, 11, 100, 101, 110, 111, with overflow bucket x.]
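The address calculation behind this growth pattern can be sketched as follows. This is a hedged illustration: it assumes that $h_d(k)$ keeps the d low-order bits of the hashed key, which reproduces the addresses in the figure (bucket 00 splits into 000 and 100), and it uses a hypothetical parameter `next` for the number of original buckets already split in the current round.

```cpp
#include <cassert>

// h_d(k): the d low-order bits of the hashed key (assumed form of h_d).
inline unsigned HashAtDepth(unsigned hashVal, int d) {
    return hashVal & ((1u << d) - 1u);
}

// Address of hashVal when the file started the round with 2^d buckets and
// the first `next` of them have already been split.
unsigned LinearAddress(unsigned hashVal, int d, unsigned next) {
    assert(d >= 0 && d < 31 && next < (1u << d));
    unsigned addr = HashAtDepth(hashVal, d);
    if (addr < next)                         // that bucket was already split,
        addr = HashAtDepth(hashVal, d + 1);  // so one more bit decides
    return addr;
}
```

With d = 2 and one bucket split (next = 1), a hashed key ending in 100 goes to the new bucket at address 100, while a key ending in 101 still goes to bucket 01 until that bucket is split in turn.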

Slide 38: Alternative Approaches (4): Approaches to Controlling Splitting
• Postpone splitting to increase space utilization
  - B-tree: redistribution rather than splitting
  - Hashing: place records in chains of overflow buckets to postpone splitting
• Triggering event for splitting
  - Linear hashing
    Splits every time any bucket overflows
    The bucket that is split is not necessarily the one that overflowed
  - Litwin (1980): use the overall load factor of the file
    Fewer than 2 seeks, 75% to 80% storage utilization

Slide 39: Alternative Approaches (5): Approaches to Controlling Splitting
• Postpone splitting for extendible hashing
  - Use chained overflow buckets
  - Avoid doubling the directory space
  - 1.1 seeks, 76% to 81% storage utilization

Slide 40: Let's Review
• 12.1 Introduction
• 12.2 How extendible hashing works
• 12.3 Implementation
• 12.4 Deletion
• 12.5 Extendible hashing performance
• 12.6 Alternative approaches

