1
Chapter 11. Hashing
2
Contents
  Introduction
  A Simple Hashing Algorithm
  Hashing Functions and Record Distributions
  How Much Extra Memory Should Be Used?
  Collision Resolution by Progressive Overflow
  Storing More Than One Record per Address: Buckets
  Making Deletions
  Other Collision Resolution Techniques
  Patterns of Record Access
3
1. Introduction
O-notation
  O(1)
  O(N) : sequential searching
  O(log2 N)
  O(logk N) : B-tree (k : leaf node size)
What is Hashing?
  a = h(K), where h is the hash function, K is the key, and a is the home address
Example
  K = BASS
  h = (first char * second char) mod 1000, using the ASCII codes of the characters
  a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
4
Introduction
Collision
  Example keys:
    LOWELL  => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
    OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4
  Both keys hash to the same address, so they collide.
Several ways to reduce the number of collisions
  1. Spread out the records: use a good hashing algorithm
  2. Use extra memory
  3. Put more than one record at a single address: buckets
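The example hash function from the introduction can be sketched in a few lines of Python. This is only an illustration of the slide's formula; the function name `simple_hash` and the default address count are choices made here, not part of the original.

```python
def simple_hash(key: str, address_count: int = 1000) -> int:
    """Multiply the character codes of the first two letters and
    keep the remainder after dividing by the number of addresses."""
    return (ord(key[0]) * ord(key[1])) % address_count


# BASS lands on 290; LOWELL and OLIVIER collide because
# multiplication does not care about the order of the two letters.
for key in ("BASS", "LOWELL", "OLIVIER"):
    print(key, simple_hash(key))
```

Because only the first two characters are used and their order is irrelevant, any two keys starting with the same pair of letters in either order are synonyms under this function.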
5
2. A Simple Hashing Algorithm
3 Steps
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address
Example
  Step 1. Represent the Key in Numerical Form
    LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
    (the ASCII codes of L O W E L L, followed by six blanks, code 32)
6
A Simple Hashing Algorithm
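Example (continued)
  Step 2. Fold and Add
    76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
    7679 + 8769 + 7676 + 3232 + 3232 + 3232 = 33820
    (33820 exceeds the 2-byte maximum, so the running sum is reduced mod 19937 after each addition)
    7679 + 8769 = 16448  => 16448 mod 19937 = 16448
    16448 + 7676 = 24124 => 24124 mod 19937 = 4187
    4187 + 3232 = 7419   => 7419 mod 19937 = 7419
    7419 + 3232 = 10651  => 10651 mod 19937 = 10651
    10651 + 3232 = 13883 => 13883 mod 19937 = 13883
  Step 3. Divide by the Size of the Address Space
    a = s mod n (n : number of addresses in the file)
    a = 13883 mod 100 = 83
    a = 13883 mod 101 = 46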
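The three steps can be sketched together as one function. This is a minimal illustration, not a definitive implementation: the 12-character key width and the intermediate modulus 19937 are assumptions chosen because they reproduce the folded sum 13883 shown on the slide, and the name `fold_and_add_hash` is hypothetical.

```python
def fold_and_add_hash(key: str, address_count: int,
                      width: int = 12, max_intermediate: int = 19937) -> int:
    """Fold-and-add hash: pad the key with blanks, fold it into
    two-character pieces, add the pieces, and divide by the file size."""
    padded = key.ljust(width)[:width]          # Step 1: fixed width, blank padded
    total = 0
    for i in range(0, width, 2):               # Step 2: fold and add
        piece = ord(padded[i]) * 100 + ord(padded[i + 1])   # e.g. 76, 79 -> 7679
        total = (total + piece) % max_intermediate          # keep the sum small
    return total % address_count               # Step 3: remainder is the address


print(fold_and_add_hash("LOWELL", 100))   # address in a 100-address file
print(fold_and_add_hash("LOWELL", 101))   # address in a 101-address file
```

Using a prime such as 101 for the number of addresses tends to spread remainders more evenly than a round number such as 100.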
7
3. Hashing Functions and Record Distributions
Distributing Records among Addresses
<Figure 11.3> Different distributions of records A-G among addresses 1-10. (a) Uniform distribution (best) (b) Worst case (c) Random distribution (acceptable)
8
Hashing Functions and Record Distributions
Some Other Hashing Methods
Better than random
  Examine keys for a pattern (e.g., resident registration numbers)
  Divide the key by a prime number
Random
  Square the key and take the middle
  Radix transformation
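The two "random" methods can be illustrated with a short sketch. The key 453, the choice of two middle digits, and base 11 are illustrative assumptions, and the function names are hypothetical.

```python
def mid_square(key: int, digits: int = 2) -> int:
    """Square the key and keep `digits` digits from the middle of the square."""
    squared = str(key * key)
    start = len(squared) // 2 - digits // 2
    return int(squared[start:start + digits])


def radix_transform(key: int, base: int = 11) -> int:
    """Write the key in another base and read the resulting digits as a
    decimal number. In this sketch a digit value of 10 or more simply
    contributes its decimal spelling."""
    digits = []
    while key > 0:
        digits.append(str(key % base))
        key //= base
    return int("".join(reversed(digits))) if digits else 0


print(mid_square(453))             # 453 * 453 = 205209, middle two digits
print(radix_transform(453) % 99)   # 453 written in base 11, reduced to a 99-address file
```

Both results still have to be reduced to the address range, for example with a final `mod`, as in the last line.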
9
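4. How Much Extra Memory Should Be Used?
Packing Density
  Packing density = number of records / number of addresses = r / N
Example
  r = 75 records, N = 100 addresses
  Packing density = 75 / 100 = 0.75 = 75%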
10
How Much Extra Memory Should Be Used?
Predicting Collisions for Different Packing Densities
<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses (columns: Packing density (%), Synonyms (%))
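One common way to predict these percentages is to assume the hash function behaves randomly, so the number of records per address follows a Poisson distribution with mean r/N. That model is an assumption here (the slide does not state how its figures were obtained), and `synonym_percent` is a hypothetical name. With one record per address, the expected share of records not stored at home has a simple closed form:

```python
import math


def synonym_percent(packing_density: float) -> float:
    """Expected percentage of records that cannot be stored at their home
    address, for one record per address and a random (Poisson) hash."""
    d = packing_density                      # r / N
    occupied = 1.0 - math.exp(-d)            # fraction of addresses holding a record
    overflow_per_address = d - occupied      # records left over, per address
    return 100.0 * overflow_per_address / d  # as a percentage of all records


for density in (0.2, 0.5, 0.8, 1.0):
    print(f"{density:.0%}: {synonym_percent(density):.1f}% synonyms")
```

The printed values can be compared against the single-record column of a packing-density table to check how well the random model fits.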
11
5. Collision Resolution by Progressive Overflow
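Open addressing (linear probing)
[Figure: a key K is hashed to its home address h(K). York hashes to address 3, York's home address, which already holds Jasper, so York is stored at address 4. Novak hashes to address 2, Novak's home address, which already holds Rosen.]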
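Progressive overflow can be sketched as follows: start at the home address and step forward one address at a time, wrapping around at the end of the file, until a free slot is found. The 5-address file and the pre-loaded records are illustrative assumptions, and home addresses are passed in directly rather than computed by a hash function.

```python
from typing import List, Optional


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key at the first free address at or after its home address."""
    n = len(table)
    for step in range(n):
        address = (home + step) % n          # wrap around at the end of the file
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("file is full")


def search(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe forward from the home address; an empty slot ends the search."""
    n = len(table)
    for step in range(n):
        address = (home + step) % n
        if table[address] is None:
            return None
        if table[address] == key:
            return address
    return None


table: List[Optional[str]] = [None, None, "Rosen", "Jasper", None]
print(insert(table, "York", 3))    # home address 3 is taken by Jasper
print(insert(table, "Novak", 2))   # addresses 2, 3, 4 are taken, so the probe wraps
print(search(table, "York", 3))
```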
12
Collision Resolution by Progressive Overflow
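Search Length
Search length: the number of accesses required to retrieve a record
[Table: Key, Home Address, Number of Accesses (Search Length) for Adams, Bates, Cole, Dean, Evans; the file holds these records in that order at consecutive addresses.]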
13
Collision Resolution by Progressive Overflow
Search Length (continued)
Example
<Figure 11.7> Average search length versus packing density in a hashed file
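Average search length can be computed by loading records with progressive overflow and counting how many addresses each record had to try. The home addresses below are hypothetical values chosen for illustration (the slide's own values were not preserved), as are the 6-address file and the function name.

```python
from typing import List, Optional, Sequence, Tuple


def average_search_length(records: Sequence[Tuple[str, int]], table_size: int) -> float:
    """Load (key, home address) pairs with linear probing and return the
    mean number of accesses needed to find each record again."""
    table: List[Optional[str]] = [None] * table_size
    total_accesses = 0
    for key, home in records:
        address = home % table_size
        accesses = 1
        while table[address] is not None:    # slot taken: try the next address
            address = (address + 1) % table_size
            accesses += 1
            if accesses > table_size:
                raise RuntimeError("file is full")
        table[address] = key
        total_accesses += accesses           # same count a later search would need
    return total_accesses / len(records)


# Hypothetical home addresses for the five keys on the slide.
records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
print(average_search_length(records, 6))
```

With these assumed addresses, Evans alone needs five accesses, which shows how one long probe chain can dominate the average.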
14
6. Storing More Than One Record per Address : Buckets
[Figure: keys Green, Hall, Jenks, King, Land, Marks, Nutt with their home addresses, and the resulting buckets in address order: (Green, Hall), (Jenks), (King, Land, Marks), (Nutt).]
15
Storing More Than One Record per Address : Buckets
Effects of Buckets on Performance
  r : number of records
  N : number of addresses
  b : number of records in a bucket
  Packing density = r / (b * N)

                                   File without buckets   File with buckets
  Number of records                r = 750                r = 750
  Number of addresses              N = 1000               N = 500
  Bucket size                      b = 1                  b = 2
  Packing density                  0.75                   0.75
  Ratio of records to addresses    r/N = 0.75             r/N = 1.5
16
Storing More Than One Record per Address : Buckets
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes

                     Bucket size
  Packing density    1       2       5       10
  20%                9.4     2.2     0.1     0.0
  50%                21.3    10.4    2.5     0.4
  80%                31.2    20.4    10.3    5.3
  100%               36.8    27.1    17.6    12.5
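Percentages of this kind can be estimated with a Poisson model: if the hash function is random, the number of records hashing to one address has mean r/N = packing density × b, and every record beyond the first b at an address overflows. The model and the names below are assumptions made for this sketch rather than something stated on the slide.

```python
import math


def poisson(x: int, mean: float) -> float:
    """Probability that exactly x records hash to a given address."""
    return mean ** x * math.exp(-mean) / math.factorial(x)


def overflow_percent(packing_density: float, bucket_size: int) -> float:
    """Expected percentage of records that do not fit in their home bucket."""
    mean = packing_density * bucket_size     # r / N, records per bucket address
    # Expected overflow per address, sum over x > b of (x - b) * p(x),
    # rewritten as mean - b + sum over x < b of (b - x) * p(x).
    unused = sum((bucket_size - x) * poisson(x, mean) for x in range(bucket_size))
    overflow_per_address = mean - bucket_size + unused
    return 100.0 * overflow_per_address / mean


for density in (0.2, 0.5, 0.8, 1.0):
    row = [f"{overflow_percent(density, b):5.1f}" for b in (1, 2, 5, 10)]
    print(f"{density:>5.0%}", *row)
```

The output has the same layout as the table, so the two can be compared row by row; larger buckets reduce overflow at every packing density.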
17
7. Making Deletions
Initial state
[Table: Key, Home Address, Actual Address for Adams, Jones, Morris, Smith; the file holds Adams, Jones, Morris, Smith at consecutive addresses.]
18
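Making Deletions
(1) Tombstones for Handling Deletions
  Before: Adams, Jones, Morris, Smith at consecutive addresses
  Deletion of Morris: Adams, Jones, ###, Smith
  If Morris's slot were simply left empty, a search would stop there and report "Smith cannot be found".
  ### : tombstone. This mark indicates that a record once lived there but no longer does, so the search continues past it.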
19
Making Deletions
(2) Implications of Tombstones for Insertions
  Example: inserting "Smith"
(3) Effects of Deletions and Additions on Performance
  Solution to the problem of deteriorating average search length: reorganization
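Tombstone handling can be sketched as a small class. This is a minimal sketch under stated assumptions: the home addresses (Adams 0, Jones 1, Morris 1, Smith 0), the 5-address file, and the extra key Taylor are hypothetical, and `ProbedFile` is not a name from the original. The point it illustrates is that an insertion must search past tombstones before reusing one, otherwise a key that is already in the file (such as Smith) could be stored twice.

```python
from typing import List, Optional

TOMBSTONE = "###"   # marks a slot whose record was deleted


class ProbedFile:
    def __init__(self, size: int) -> None:
        self.slots: List[Optional[str]] = [None] * size

    def search(self, key: str, home: int) -> Optional[int]:
        n = len(self.slots)
        for step in range(n):
            address = (home + step) % n
            if self.slots[address] is None:      # truly empty: key is absent
                return None
            if self.slots[address] == key:       # tombstones never match a key
                return address
        return None

    def delete(self, key: str, home: int) -> bool:
        address = self.search(key, home)
        if address is None:
            return False
        self.slots[address] = TOMBSTONE          # keep later probe chains intact
        return True

    def insert(self, key: str, home: int) -> int:
        n = len(self.slots)
        reusable = None
        for step in range(n):
            address = (home + step) % n
            slot = self.slots[address]
            if slot == key:                      # already stored: no duplicate
                return address
            if slot == TOMBSTONE and reusable is None:
                reusable = address               # remember it, but keep searching
            elif slot is None:
                if reusable is None:
                    reusable = address
                break
        if reusable is None:
            raise RuntimeError("file is full")
        self.slots[reusable] = key
        return reusable


f = ProbedFile(5)
for name, home in [("Adams", 0), ("Jones", 1), ("Morris", 1), ("Smith", 0)]:
    f.insert(name, home)
f.delete("Morris", 1)
print(f.slots)                 # Morris's slot now holds the tombstone
print(f.search("Smith", 0))    # still found, because the search skips the tombstone
print(f.insert("Smith", 0))    # finds the existing Smith instead of reusing the tombstone
print(f.insert("Taylor", 1))   # a new key may reuse the tombstone slot
```

Tombstones accumulate as records come and go, which is why periodic reorganization is needed to restore short search lengths.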
20
8. Other Collision Resolution Techniques
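(1) Double Hashing
  On a collision, apply a second hashing function to the key to produce an increment c
  Add c to the home address repeatedly until a free address is found
  Drawback: seek time overhead, since overflow records are spread across the file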
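A sketch of double hashing follows. The second hash function used here (sum of character codes) and the 7-address file are illustrative assumptions; any function producing an increment that is relatively prime to the number of addresses would serve, and a prime table size guarantees that for every increment from 1 to size - 1.

```python
from typing import List, Optional


def second_hash(key: str, table_size: int) -> int:
    """Hypothetical second hash: an increment c in the range 1 .. table_size - 1."""
    return 1 + sum(ord(ch) for ch in key) % (table_size - 1)


def probe_sequence(home: int, increment: int, table_size: int) -> List[int]:
    """Addresses visited: home, home + c, home + 2c, ... (mod table size)."""
    return [(home + i * increment) % table_size for i in range(table_size)]


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key at the first free address along its own probe sequence."""
    c = second_hash(key, len(table))
    for address in probe_sequence(home, c, len(table)):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("file is full")


table: List[Optional[str]] = [None] * 7      # prime size, so every c works
table[3] = "Jasper"
print(second_hash("York", 7))                # increment used for York
print(insert(table, "York", 3))              # skips ahead by c instead of by 1
```

Because synonyms with different keys usually get different increments, they do not pile up in the same cluster, at the cost of jumping further across the file.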
21
Other Collision Resolution Techniques
(2) Chained Progressive Overflow
[Table: Key, Home Address, Actual Address, Search Length (1), Search Length (2) for Adams, Bates, Cole, Dean, Evans, Flint.]
File with link fields (-1 marks the end of a chain):
  Address  Record  Link
  0        Adams    2
  1        Bates    3
  2        Cole     5
  3        Dean    -1
  4        Evans   -1
  5        Flint   -1
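The effect of the link fields can be checked with a short sketch that compares search lengths with and without chaining. The link values come from the slide; the addresses 0-5 and the home addresses are assumptions inferred from the chains (records linked together are taken to be synonyms), and the function names are hypothetical.

```python
# (record, link) pairs at addresses 0..5; -1 marks the end of a chain.
records = [("Adams", 2), ("Bates", 3), ("Cole", 5),
           ("Dean", -1), ("Evans", -1), ("Flint", -1)]

# Assumed home addresses, inferred from which records are chained together.
home = {"Adams": 0, "Bates": 1, "Cole": 0, "Dean": 1, "Evans": 4, "Flint": 0}


def chained_search_length(key: str) -> int:
    """Follow link fields from the home address, counting accesses."""
    address, accesses = home[key], 0
    while address != -1:
        accesses += 1
        name, link = records[address]
        if name == key:
            return accesses
        address = link
    raise KeyError(key)


def probing_search_length(key: str) -> int:
    """Plain progressive overflow: examine every address from home onward."""
    n = len(records)
    for step in range(n):
        if records[(home[key] + step) % n][0] == key:
            return step + 1
    raise KeyError(key)


keys = [name for name, _ in records]
print(sum(probing_search_length(k) for k in keys) / len(keys))   # without chaining
print(sum(chained_search_length(k) for k in keys) / len(keys))   # with chaining
```

Chaining helps because a search visits only the synonyms of the key instead of every record that happens to lie between the home address and the target.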
22
Other Collision Resolution Techniques
(3) Chaining with a Separate Overflow Area
Primary data area:
  Home address  Record  Link
  0             Adams    0
  1             Bates    1
  2
  3
  4             Evans   -1
Overflow area:
  Address  Record  Link
  0        Cole     2
  1        Dean    -1
  2        Flint   -1
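A separate overflow area can be sketched with two lists: a primary area indexed by home address and an overflow area that only ever holds synonyms. The class name, the insertion order, and the home addresses used below are assumptions for illustration.

```python
from typing import List, Optional


class OverflowFile:
    def __init__(self, primary_size: int) -> None:
        # Each stored entry is [key, link]; link is an index into the
        # overflow area, or -1 at the end of a chain.
        self.primary: List[Optional[list]] = [None] * primary_size
        self.overflow: List[list] = []

    def insert(self, key: str, home: int) -> None:
        if self.primary[home] is None:
            self.primary[home] = [key, -1]       # home address is free
            return
        self.overflow.append([key, -1])          # synonym goes to the overflow area
        new_index = len(self.overflow) - 1
        entry = self.primary[home]
        while entry[1] != -1:                    # walk to the end of the chain
            entry = self.overflow[entry[1]]
        entry[1] = new_index

    def search_length(self, key: str, home: int) -> Optional[int]:
        entry, accesses = self.primary[home], 0
        while entry is not None:
            accesses += 1
            if entry[0] == key:
                return accesses
            entry = self.overflow[entry[1]] if entry[1] != -1 else None
        return None


f = OverflowFile(5)
for name, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                   ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    f.insert(name, home)
print(f.primary)
print(f.overflow)
```

Keeping overflow records out of the primary area means a home address is never occupied by a record that does not belong there.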
23
Other Collision Resolution Techniques
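(4) Scatter Tables: Indexing Revisited
The hash file is an index (addresses 0-4) of pointers into a separate data file:
  Record number  Record  Link
  0              Adams    1
  1              Cole     3
  2              Bates    4
  3              Flint   -1
  4              Dean    -1
  5              Evans   -1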
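A scatter table can be sketched as a hashed index of pointers plus a data file in which records are simply appended and chained. The class name, the home addresses, and the insertion order below are assumptions chosen for illustration.

```python
from typing import List, Optional


class ScatterTable:
    def __init__(self, index_size: int) -> None:
        self.index: List[int] = [-1] * index_size   # pointer to first record, or -1
        self.records: List[list] = []               # data file: [key, link]

    def insert(self, key: str, home: int) -> int:
        self.records.append([key, -1])              # always append to the data file
        new = len(self.records) - 1
        if self.index[home] == -1:
            self.index[home] = new                  # first record for this address
            return new
        current = self.index[home]
        while self.records[current][1] != -1:       # walk to the end of the chain
            current = self.records[current][1]
        self.records[current][1] = new
        return new

    def search(self, key: str, home: int) -> Optional[int]:
        current = self.index[home]
        while current != -1:
            if self.records[current][0] == key:
                return current
            current = self.records[current][1]
        return None


t = ScatterTable(5)
for name, home in [("Adams", 0), ("Cole", 0), ("Bates", 1),
                   ("Flint", 0), ("Dean", 1), ("Evans", 4)]:
    t.insert(name, home)
print(t.index)
print(t.records)
```

Because the index holds only pointers, the data file can be an ordinary entry-sequenced file, possibly with variable-length records.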
24
Patterns of Record Access
A small percentage of the records in a file account for a large percentage of the accesses
80/20 Rule: 80% of the accesses are performed on 20% of the records