Published by Sharon Simmons. Modified over 9 years ago.
1
File Processing - Hash File Considerations MVNC1 Hash File Considerations
2
Hashing - Hash File Considerations
• Statistical Considerations
  » Record distribution is important
  » Ideal - one record per location
  » Load Factor - how full the file is
    – Load Factor = r / (b * m)
    – r - number of records stored
    – b - bucket size
    – m - number of addresses
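As a quick check of the load factor definition, the sketch below reads the formula as records stored divided by total record slots, r / (b * m). The function name `load_factor` and the sample figures are illustrative assumptions, not taken from the slides.

```python
def load_factor(r: int, b: int, m: int) -> float:
    """Return r / (b * m): records stored over total record slots."""
    if b <= 0 or m <= 0:
        raise ValueError("bucket size and address count must be positive")
    return r / (b * m)


# Hypothetical file: 750 records, buckets of 5 records, 200 addresses.
print(load_factor(750, 5, 200))  # 0.75
```

A value above 1.0 means the file holds more records than it has home slots, so some records must live in overflow.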
3
Hashing - Statistical Considerations
• Graphing Record Distribution
  » Frequency Distribution Graph
    – Y axis - records per address
    – X axis - RRP
  » Alternate Frequency Distribution Graph
    – Y axis - number of addresses with x records
    – X axis - x records assigned
• Example - (x DIV 5) MOD 4
  » Data: 22, 1, 14, 56, 25, 13, 43, 62, 11
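The example hash function and data set can be tabulated directly. The sketch below computes both distributions described on the slide; the names `home_address`, `records_per_address`, and `addresses_with_x` are assumptions introduced for illustration.

```python
from collections import Counter


def home_address(key: int) -> int:
    """Hash function from the slide: (x DIV 5) MOD 4."""
    return (key // 5) % 4


keys = [22, 1, 14, 56, 25, 13, 43, 62, 11]
NUM_ADDRESSES = 4

# Frequency distribution: records per address (addresses 0..3).
per_address = Counter(home_address(k) for k in keys)
records_per_address = [per_address[a] for a in range(NUM_ADDRESSES)]
print(records_per_address)  # [4, 1, 3, 1]

# Alternate distribution: number of addresses holding exactly x records.
addresses_with_x = Counter(records_per_address)
for x in range(max(records_per_address) + 1):
    print(x, addresses_with_x[x])
```

For this data, address 0 receives four keys (22, 1, 43, 62) and address 2 receives three (14, 13, 11), so the distribution is far from the ideal of one record per location.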
4
Hashing - Overall Guidelines
• If possible, use uniformly distributed keys
• Use a carefully designed hashing scheme
  » Do statistical studies if possible
  » Monitor performance
  » Should be computationally efficient
• Tailor bucket size and load factor to the particular I/O device
5
Hashing - Advantages
• Flexibility
  » Adaptable to a variety of situations
  » Useful for both disk-based and memory-based retrieval
• Efficiency of record access
  » Can achieve O(1) access times
6
Hashing - Disadvantages
• No ordered record access by PK
• Data (key set) dependency
  » Must be specifically tailored for each key distribution and form
  » If characteristics change, hashing scheme may need to change
• Fixed upper limit on file size
  » Size determined at creation time
  » Must "rehash" to a larger file if expansion is needed
  » May need to redesign hash algorithm as well
7
Hashing Considerations
• Static vs. Dynamic Files
  » Static files
    – Fixed key data
    – Entire domain of keys known a priori (key set)
    – By experimentation, may be able to find a collision-free solution
    – Examples
      • Assembler opcode table
      • FAX Group 3 compression table
8
Hashing Considerations
• Static vs. Dynamic Files
  » Dynamic files
    – Key set not known in advance
    – Patterns/samples of keys may be known
    – Collision-free solution not generally possible
    – Experimentation may be used to find a good hash algorithm and configuration
      • Hash algorithm technique
      • File size
      • Bucket size
      • Overflow strategy
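One way such experimentation might look is to run a sample of keys through candidate configurations and count how many records fail to fit in their home bucket. The sketch below is only an illustration: the division-remainder hash, the randomly generated sample keys, the candidate (file size, bucket size) pairs, and the name `overflow_count` are all assumptions.

```python
import random


def overflow_count(keys, num_addresses, bucket_size):
    """Count records that do not fit in their home bucket."""
    load = [0] * num_addresses
    overflow = 0
    for key in keys:
        home = key % num_addresses  # assumed hash: division remainder
        if load[home] < bucket_size:
            load[home] += 1
        else:
            overflow += 1
    return overflow


random.seed(1)  # fixed seed so the experiment is repeatable
sample_keys = random.sample(range(100_000), 500)

# Candidate configurations: (number of addresses, bucket size).
for num_addresses, bucket_size in [(127, 5), (251, 3), (701, 1)]:
    packing = len(sample_keys) / (num_addresses * bucket_size)
    print(num_addresses, bucket_size, round(packing, 2),
          overflow_count(sample_keys, num_addresses, bucket_size))
```

With real data, the sample keys would come from the application rather than a random generator, and the overflow counts would be weighed against storage cost.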
9
Hashing Considerations
• Static vs. Dynamic Hashing
  » Static Hashing
    – File size fixed over life of file
    – Must rebuild to make larger
  » Dynamic Hashing
    – File may expand and contract over time
    – Called extendible hashing
10
Hashing Considerations
• Distribution of keys
  » May know some information about key distribution in advance
    – Complete set
    – Patterns are predictable
    – Completely unpredictable
11
Hashing Considerations
• Files versus arrays
  » Hashing is suitable for both primary- and secondary-memory retrieval
  » Primary memory based systems
    – I/O time not a consideration
      • Buckets not really helpful
    – Other factors gain in importance
      • Hash algorithm complexity
      • Overflow technique
12
Hashing Considerations
• Hash Algorithms - general forms
  » Division
    – Division remainder scheme is an example
    – Choice of divisor is important
      • Should be prime relative to the file size
      • Should not be a power of two
      • Bad choices result in simple truncation, so part of the key is simply discarded
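The truncation effect of a power-of-two divisor is easy to demonstrate. In the sketch below, the key values and the helper name `division_hash` are illustrative assumptions; the keys were chosen so that they differ only in their high-order bits.

```python
def division_hash(key: int, divisor: int) -> int:
    """Division-remainder hashing: home address = key MOD divisor."""
    return key % divisor


# Hypothetical keys that differ only in their high-order bits.
keys = [16, 32, 48, 64, 80, 96, 112, 128]

# Divisor 8 = 2**3 keeps only the low three bits, which are all zero here.
print([division_hash(k, 8) for k in keys])  # [0, 0, 0, 0, 0, 0, 0, 0]

# A prime divisor of similar size lets the whole key influence the result.
print([division_hash(k, 7) for k in keys])  # [2, 4, 6, 1, 3, 5, 0, 2]
```

With divisor 8 every key collides at address 0, while divisor 7 spreads the same keys over all seven addresses.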
13
Hashing Considerations
• Hash Algorithms - general forms
  » Multiplication
    – Multiplicative techniques tend to use ALL of the information in the key (no truncation)
    – Mid-square technique is an example
  » Compression, extraction, folding
    – Useful for large keys
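A minimal sketch of mid-square and folding follows, working on decimal digits for readability. The function names, the digit counts, and the table size are assumptions chosen for illustration; production schemes often work on bits rather than decimal digits.

```python
def mid_square(key: int, digits: int = 3) -> int:
    """Square the key and return `digits` digits from the middle of the result."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])


def fold(key: int, chunk: int = 3, table_size: int = 1000) -> int:
    """Split the key's decimal digits into chunks, add them, reduce modulo table_size."""
    text = str(key)
    parts = [int(text[i:i + chunk]) for i in range(0, len(text), chunk)]
    return sum(parts) % table_size


print(mid_square(4567))  # 4567**2 = 20857489 -> middle digits 857
print(fold(123456789))   # 123 + 456 + 789 = 1368 -> 368
```

In both cases every digit of the key contributes to the address, which is the property the slide attributes to multiplicative techniques and to folding of large keys.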
14
Hashing Considerations
• Hash Algorithms - general forms
  » Double Hashing
    – Rather than progressive overflow on collision, use a secondary hash function to generate a step length for the next probe
    – Helps reduce secondary clustering of linear probing with step size greater than one
    – Non-linear, or random probing
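Double hashing can be sketched with an in-memory table. The table size, the two hash functions `h1` and `h2`, and the sample keys below are assumptions picked so that all keys share one home address.

```python
TABLE_SIZE = 11  # prime, so every step length 1..10 eventually visits all slots


def h1(key: int) -> int:
    """Primary hash: home address."""
    return key % TABLE_SIZE


def h2(key: int) -> int:
    """Secondary hash: step length in 1..TABLE_SIZE-1, never zero."""
    return 1 + key % (TABLE_SIZE - 1)


def insert(table, key):
    """Place key in the first free slot along its probe sequence; return the slot."""
    address, step = h1(key), h2(key)
    for _ in range(TABLE_SIZE):
        if table[address] is None:
            table[address] = key
            return address
        address = (address + step) % TABLE_SIZE
    raise RuntimeError("table is full")


table = [None] * TABLE_SIZE
for key in (22, 33, 44):
    print(key, "->", insert(table, key))
# 22 -> 0
# 33 -> 4
# 44 -> 5
```

All three keys hash to home address 0, but 33 steps by 4 and 44 steps by 5, so their probe sequences diverge instead of piling up on the same chain of slots.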
15
Hashing Considerations
• Hash Algorithms - general forms
  » Multi-Attribute Hashing
    – Base the calculation for home address on more than the primary key attribute
    – Useful if the primary key exhibits certain bad hashing attributes (clustering, etc.)
    – Example - use part number (PK) and distributor fields
  » Extendible Hashing
    – See text
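The part number and distributor example might be sketched as follows. The way the two fields are combined, the helper names, the file size of 100, and the sample records are all assumptions; the slides do not specify a particular combining function.

```python
def string_value(text: str) -> int:
    """Turn a string into an integer using a simple polynomial in base 31."""
    value = 0
    for ch in text:
        value = value * 31 + ord(ch)
    return value


def multi_attribute_hash(part_number: int, distributor: str, num_addresses: int) -> int:
    """Home address computed from the primary key plus a second attribute."""
    return (part_number + string_value(distributor)) % num_addresses


# Hypothetical records: part numbers are all multiples of 100, so with
# 100 addresses the primary key alone sends every record to address 0.
records = [(1100, "ACME"), (2300, "BOLT"), (4500, "CRANE"), (7800, "DELTA")]
NUM_ADDRESSES = 100

print([part % NUM_ADDRESSES for part, _ in records])  # [0, 0, 0, 0]
print([multi_attribute_hash(part, dist, NUM_ADDRESSES) for part, dist in records])
```

The trade-off is that a lookup must now supply both attributes, since the home address can no longer be computed from the part number alone.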