File Processing - Hash File Considerations MVNC1 Hash File Considerations.



Hashing - Hash File Considerations
• Statistical Considerations
  » Record distribution is important
  » Ideal: one record per location
  » Load factor: how full the file is
    – Load factor = r / (b × m)
    – r: number of records stored
    – b: bucket size (records per address)
    – m: number of addresses
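As a quick illustration of the load factor definition, here is a minimal Python sketch. The function name `load_factor` and the sample figures (750 records, buckets of 5, 200 addresses) are made up for the example and do not come from the slides.

```python
def load_factor(r: int, b: int, m: int) -> float:
    """Return r / (b * m): records stored divided by total record slots.

    r = number of records stored
    b = bucket size (records per address)
    m = number of addresses
    """
    if b <= 0 or m <= 0:
        raise ValueError("bucket size and number of addresses must be positive")
    return r / (b * m)


# Hypothetical file: 750 records, 200 addresses, 5 records per bucket.
print(load_factor(750, 5, 200))  # 0.75
```

A value above 1.0 means more records than slots, so some records must live in overflow.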

Hashing - Statistical Considerations
• Graphing Record Distribution
  » Frequency Distribution Graph
    – Y axis: records per address
    – X axis: RRP
  » Alternate Frequency Distribution Graph
    – Y axis: number of addresses with x records
    – X axis: x records assigned
• Example: hash function (x DIV 5) MOD 4
  » Data: 22, 1, 14, 56, 25, 13, 43, 62, 11
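The example can be worked through in a few lines of Python. This sketch tabulates both views described on the slide; the variable names are illustrative, and the four addresses (0 to 3) follow from the MOD 4 in the hash function.

```python
from collections import Counter

NUM_ADDRESSES = 4  # implied by MOD 4 in the slide's hash function


def home_address(x: int) -> int:
    """The slide's example hash: (x DIV 5) MOD 4."""
    return (x // 5) % NUM_ADDRESSES


keys = [22, 1, 14, 56, 25, 13, 43, 62, 11]

# View 1: records per address (the first frequency distribution graph).
per_address = Counter(home_address(k) for k in keys)
records_per_address = [per_address.get(a, 0) for a in range(NUM_ADDRESSES)]
print(records_per_address)  # [4, 1, 3, 1]

# View 2: number of addresses holding exactly x records (the alternate graph).
addresses_with_x = Counter(records_per_address)
print(sorted(addresses_with_x.items()))  # [(1, 2), (3, 1), (4, 1)]
```

Address 0 receives four of the nine keys, which is far from the ideal of an even spread.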

Hashing - Overall Guidelines
• If possible, use uniformly distributed keys
• Use a carefully designed hashing scheme
  » Do statistical studies if possible
  » Monitor performance
  » Should be computationally efficient
• Tailor bucket size and load factor to the particular I/O device

Hashing - Advantages
• Flexibility
  » Adaptable to a variety of situations
  » Useful both for disk and memory based retrieval
• Efficiency of record access
  » Can achieve O(1) access times

Hashing - Disadvantages
• No ordered record access by primary key (PK)
• Data (key set) dependency
  » Must be specifically tailored for each key distribution and form
  » If characteristics change, hashing scheme may need to change
• Fixed upper limit on file size
  » Size determined at creation time
  » Must "rehash" to a larger file if expansion is needed
  » May need to redesign hash algorithm as well

Hashing Considerations
• Static vs. Dynamic Files
  » Static files
    – Fixed key data
    – Entire domain of keys known a priori (key set)
    – By experimentation, may be able to find a collision-free solution
    – Examples
      • Assembler OP code table
      • FAX group three compression table
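The "by experimentation" idea for static files can be sketched as a brute-force search. Everything here is an assumption made for illustration: the polynomial string hash, the search bounds, and the sample mnemonics are not taken from the slides or from any real assembler.

```python
from typing import Optional, Sequence, Tuple


def string_hash(key: str, multiplier: int, table_size: int) -> int:
    """Polynomial hash of the characters, reduced by division remainder."""
    h = 0
    for ch in key:
        h = h * multiplier + ord(ch)
    return h % table_size


def find_collision_free(
    keys: Sequence[str], max_multiplier: int = 64, max_extra_slots: int = 16
) -> Optional[Tuple[int, int]]:
    """Try (multiplier, table_size) pairs until every key gets its own address.

    Returns the first pair that works, or None if the bounded search fails.
    Table sizes are tried smallest first, so a tight table is preferred.
    """
    n = len(keys)
    for table_size in range(n, n + max_extra_slots + 1):
        for multiplier in range(1, max_multiplier + 1):
            addresses = {string_hash(k, multiplier, table_size) for k in keys}
            if len(addresses) == n:
                return multiplier, table_size
    return None


# Hypothetical op code mnemonics; the full key set is known in advance.
OPCODES = ["LOAD", "STORE", "ADD", "SUB", "MUL", "DIV", "JMP", "HALT"]
print(find_collision_free(OPCODES))
```

Because the key set never changes, the search only has to be run once, when the table is built.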

Hashing Considerations
• Static vs. Dynamic Files
  » Dynamic files
    – Key set not known in advance
    – Patterns/samples of keys may be known
    – Collision-free solution not generally possible
    – Experimentation may be used to find a good hash algorithm and configuration
      • Hash algorithm technique
      • File size
      • Bucket size
      • Overflow strategy

Hashing Considerations
• Static vs. Dynamic Hashing
  » Static Hashing
    – File size fixed over life of file
    – Must rebuild to make larger
  » Dynamic Hashing
    – File may expand and contract over time
    – Called extensible hashing

Hashing Considerations
• Distribution of keys
  » May know some information about key distribution in advance
    – Complete set known
    – Patterns are predictable
    – Completely unpredictable

Hashing Considerations
• Files versus arrays
  » Hashing is suitable for both primary (memory) and secondary (disk) retrieval purposes
  » Primary memory based systems
    – I/O time not a consideration
      • Buckets not really helpful
    – Other factors gain in importance
      • Hash algorithm complexity
      • Overflow technique

Hashing Considerations
• Hash Algorithms - general forms
  » Division
    – Division remainder scheme is an example
    – Choice of divisor is important
      • Should be prime relative to the file size
      • Should not be a power of two
      • Bad choices result in simple truncation, so part of the key is simply discarded
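The truncation problem with a power-of-two divisor is easy to see in a small Python sketch. The key values and the two divisors (16 and the nearby prime 17) are chosen only for illustration.

```python
def division_hash(key: int, divisor: int) -> int:
    """Division remainder scheme: home address = key MOD divisor."""
    return key % divisor


# Hypothetical keys that differ only in their high-order bits.
keys = [16, 32, 48, 64, 80, 96]

# Divisor 16 (a power of two) keeps only the low 4 bits of each key,
# so the high-order information is discarded and every key collides.
power_of_two = [division_hash(k, 16) for k in keys]
print(power_of_two)  # [0, 0, 0, 0, 0, 0]

# Divisor 17 (a prime) lets every bit of the key influence the address.
prime = [division_hash(k, 17) for k in keys]
print(prime)  # [16, 15, 14, 13, 12, 11]

# For non-negative keys, MOD 16 is the same as masking off the low 4 bits.
assert all(k % 16 == (k & 0b1111) for k in keys)
```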

Hashing Considerations
• Hash Algorithms - general forms
  » Multiplication
    – Multiplicative techniques tend to use ALL of the information in the key (no truncation)
    – Mid-square technique is an example
  » Compression, extraction, folding
    – Useful for large keys
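A rough Python sketch of mid-square and folding follows. The slides name the techniques without fixing details, so the decimal-digit treatment, the function names, and the sample keys are assumptions; other variants (for example, working on bits rather than decimal digits) are equally valid.

```python
def mid_square(key: int, digits: int) -> int:
    """Square the key and return the middle `digits` decimal digits."""
    squared = str(key * key).zfill(digits)  # pad so short squares still work
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def fold(key: int, chunk: int, table_size: int) -> int:
    """Split the key into `chunk`-digit pieces, add them, reduce by MOD."""
    text = str(key)
    total = sum(int(text[i:i + chunk]) for i in range(0, len(text), chunk))
    return total % table_size


# 4567 squared is 20857489; the middle four digits are 8574.
print(mid_square(4567, 4))  # 8574

# 123 + 456 + 789 = 1368, and 1368 MOD 1000 = 368.
print(fold(123456789, 3, 1000))  # 368
```

Folding lets every part of a long key contribute to the address instead of keeping only one slice of it.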

Hashing Considerations
• Hash Algorithms - general forms
  » Double Hashing
    – Rather than progressive overflow on collision, use a secondary hash function to generate a step length for the next probe
    – Helps reduce secondary clustering of linear probing with step size greater than one
    – Non-linear, or random probing
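Double hashing can be sketched as a small in-memory table. The class name `DoubleHashTable`, the table size of 11, and the particular secondary hash `1 + key MOD (size - 1)` are assumptions for this example; the slides do not prescribe them. A prime table size is used so that any step length eventually visits every address.

```python
class DoubleHashTable:
    """Open-addressing table for integer keys, probed by double hashing."""

    def __init__(self, size: int = 11) -> None:
        self.size = size  # assumed prime, so every step covers the table
        self.slots = [None] * size

    def _home(self, key: int) -> int:
        return key % self.size

    def _step(self, key: int) -> int:
        # Secondary hash: always in 1..size-1, so the probe never stalls.
        return 1 + (key % (self.size - 1))

    def probe_sequence(self, key: int) -> list:
        home, step = self._home(key), self._step(key)
        return [(home + i * step) % self.size for i in range(self.size)]

    def insert(self, key: int) -> int:
        for addr in self.probe_sequence(key):
            if self.slots[addr] is None or self.slots[addr] == key:
                self.slots[addr] = key
                return addr
        raise OverflowError("table is full")

    def find(self, key: int):
        for addr in self.probe_sequence(key):
            if self.slots[addr] is None:
                return None  # an empty slot ends the search
            if self.slots[addr] == key:
                return addr
        return None


table = DoubleHashTable()
# 22, 33 and 44 share home address 0 but get different step lengths (3, 4, 5).
print([table.insert(k) for k in (22, 33, 44)])  # [0, 4, 5]
```

With progressive overflow the same three keys would land in addresses 0, 1, and 2 and start a cluster; here each key follows its own probe path. Deletion is not handled in this sketch.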

Hashing Considerations
• Hash Algorithms - general forms
  » Multi-Attribute hashing
    – Base the calculation for home address on more than the primary key attribute
    – Useful if the primary key exhibits certain bad hashing attributes (clustering, etc.)
    – Example: use part number (PK) and distributor fields
  » Extendible Hashing
    – See text
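The part number and distributor example might look like the Python sketch below. The slides do not say how the two fields are combined; joining them with a separator and running CRC-32 (from the standard `zlib` module) over the result is simply one convenient, repeatable choice, and the field values are invented.

```python
import zlib


def multi_attribute_address(
    part_number: str, distributor: str, num_addresses: int
) -> int:
    """Home address computed from two fields instead of the PK alone.

    The fields are joined with a separator byte that is assumed never to
    appear in the data, so ("AB", "C") and ("A", "BC") hash differently.
    """
    combined = f"{part_number}\x1f{distributor}".encode("utf-8")
    return zlib.crc32(combined) % num_addresses


# Hypothetical records whose part numbers cluster on a common prefix.
records = [("P-1000", "Acme"), ("P-1001", "Borg"), ("P-1002", "Acme")]
for part, dist in records:
    print(part, dist, multi_attribute_address(part, dist, 101))
```

Note that a lookup must then supply both fields, since the home address can no longer be derived from the primary key alone.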