Hash Functions Andy Wang Data Structures, Algorithms, and Generic Programming.


Introduction
Hash function
– Maps keys to integers (buckets): Hash(Key) = Integer
– Ideally in a random-like manner: bucket values are evenly distributed, even if the input data is not

An Example
ID number generation
– Key = your name
– Hash(Key) = a number
Not a great hash function…
– Two people with the same name will have the same number…

Simple Hash Functions
Assumptions:
– K: an unsigned 32-bit integer
– M: the number of buckets (the number of entries in a hash table)
Goal:
– If a bit is changed in K, all bits of Hash(K) are equally likely to change

A Simple Hash Function…
What if K = M?
Hash(K) = K
What is wrong?
Your student ID = SSN
– I can’t use your SSN to post your grades…

Another Simple Function
If K > M
Hash(K) = K % M
What is wrong?
Suppose M = 4, K = 2, 4, 6, 8
K % M = 2, 0, 2, 0
Even keys only ever reach the even buckets, so buckets 1 and 3 stay empty.

Yet Another Simple Function
If K > P, P = prime number
Hash(K) = K % P
Suppose P = 3, K = 2, 4, 6, 8
K % P = 2, 1, 0, 2
More uniform distribution… but still problematic for other cases
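
The effect of the modulus can be checked by counting how many keys land in each bucket. The following is a minimal sketch; the helper names hash_mod and count_buckets are illustrative and do not come from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Modulo hash: maps key k to a bucket in [0, m). */
static uint32_t hash_mod(uint32_t k, uint32_t m)
{
    return k % m;
}

/* Counts how many of the n keys land in each of the m buckets.
   counts must have room for m entries. */
static void count_buckets(const uint32_t *keys, size_t n,
                          uint32_t m, unsigned *counts)
{
    for (uint32_t b = 0; b < m; b++)
        counts[b] = 0;
    for (size_t i = 0; i < n; i++)
        counts[hash_mod(keys[i], m)]++;
}
```

For the keys 2, 4, 6, 8, a modulus of 4 fills only buckets 0 and 2, while a modulus of 3 reaches all three buckets (8 % 3 is 2, so bucket 2 receives two keys).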

More on Prime Numbers
K > P1 > P2, where P1 and P2 are prime numbers
Hash(K) = (K % P1) % P2
Suppose P1 = 5, P2 = 3, K = 2, 4, 6, 8, 10
K % 5 = 2, 4, 1, 3, 0
(K % 5) % 3 = 2, 1, 1, 0, 0
Still a roughly uniform distribution
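
The two-prime scheme is a one-liner in C. This sketch assumes the caller supplies primes with p1 > p2; the function name hash_two_primes is illustrative.

```c
#include <stdint.h>

/* Hash(K) = (K % P1) % P2. The caller is assumed to pass
   primes with p1 > p2; nothing here checks primality. */
static uint32_t hash_two_primes(uint32_t k, uint32_t p1, uint32_t p2)
{
    return (k % p1) % p2;
}
```

With p1 = 5 and p2 = 3, the keys 2, 4, 6, 8, 10 map to buckets 2, 1, 1, 0, 0.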

Polynomial Functions
If K > P, P = prime number
Hash(K) = K(K + 3) % P
Slightly better than pure modulo functions
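
One practical detail when coding the polynomial: for a 32-bit key, K(K + 3) can exceed even 64 bits for the largest keys, so a careful version reduces both factors modulo P before multiplying. The sketch below does that; the name hash_poly is illustrative, and p is assumed to be a nonzero prime that fits in 32 bits.

```c
#include <stdint.h>

/* Hash(K) = K(K + 3) % P, computed without overflow.
   Both factors are reduced mod p first, so their product is
   below 2^64 and the result equals the exact K(K + 3) % p. */
static uint32_t hash_poly(uint32_t k, uint32_t p)
{
    uint64_t a = k % p;
    uint64_t b = ((uint64_t)k + 3u) % p;
    return (uint32_t)((a * b) % p);
}
```

For example, with P = 7 the key 2 gives 2 * 5 = 10, and 10 % 7 = 3.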

How About…
Hash(K) = rand()
What is wrong?
Not repeatable: the same key gets a different value on every call, so an item can never be found again

How About…
K > P, P = prime number
Hash(K) = rand(K) % P
Better randomness
Can be expensive to compute random numbers
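
One way to read rand(K) is "a random number generator seeded with K". The sketch below takes that reading, which is an assumption, and uses the C library's srand and rand; the name hash_seeded_rand is illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Seeds the C library generator with the key and draws one value.
   Assumption: this is one interpretation of the slide's rand(K).
   The same key always yields the same result within one C library,
   but the call resets the program-wide rand() state. */
static uint32_t hash_seeded_rand(uint32_t k, uint32_t p)
{
    srand((unsigned)k);
    return (uint32_t)rand() % p;
}
```

Reseeding on every call is part of why this approach is costly, and the bucket chosen for a given key may differ between C library implementations.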

Pre-generated Randomness
Two prime numbers: P1 and P2
K > P1 and K > P2
A table R[P1], with R[i] pre-initialized to rand(i) % P2
Hash(K) = R[K % P1]
Slight problem: possible duplicate mapping (two table entries can hold the same value)

To Avoid Duplicate Mapping…
Two prime numbers: P1 and P2
K > P1 and K > P2
A table R[P1], with R[i] pre-initialized to unique random numbers
Hash(K) = R[K % P1]
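
The slides do not say how the unique random numbers are produced. One possible approach, sketched here as an assumption, is a partial Fisher-Yates shuffle over the values 0 to P2 - 1, which requires P1 <= P2. The function name fill_unique_table and the seed parameter are illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fills r[0..p1-1] with distinct values drawn from [0, p2).
   Requires p1 <= p2. Returns 0 on success, -1 on failure. */
static int fill_unique_table(uint32_t *r, uint32_t p1, uint32_t p2,
                             unsigned seed)
{
    if (p1 > p2)
        return -1;
    uint32_t *pool = malloc(p2 * sizeof *pool);
    if (pool == NULL)
        return -1;
    for (uint32_t i = 0; i < p2; i++)
        pool[i] = i;
    srand(seed);
    /* Partial Fisher-Yates shuffle: position i receives a random
       element from the not-yet-used tail pool[i..p2-1]. */
    for (uint32_t i = 0; i < p1; i++) {
        uint32_t j = i + (uint32_t)rand() % (p2 - i);
        uint32_t tmp = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        r[i] = pool[i];
    }
    free(pool);
    return 0;
}
```

Taking rand() modulo the remaining pool size introduces a slight bias, which is acceptable for a sketch but worth replacing in production code.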

An Example
K = 0…2^32 - 1, P1 = 3, P2 = 5
R[3] = {0, 4, 1}
Hash(K) = R[K % 3]
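
The example table translates directly into a lookup. In this sketch the table contents {0, 4, 1} are the ones given on the slide; the function name hash_table_lookup is illustrative.

```c
#include <stdint.h>

/* Pre-generated table from the example: P1 = 3 entries,
   each a distinct value below P2 = 5. */
static const uint32_t R[3] = {0, 4, 1};

/* Hash(K) = R[K % 3] */
static uint32_t hash_table_lookup(uint32_t k)
{
    return R[k % 3];
}
```

For instance, key 7 has 7 % 3 = 1, so it hashes to R[1] = 4.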

Hashing a Sequence of Keys
K = {K1, K2, …, Kn}
E.g., Hash(“test”) =
Design principles
– Use the entire key
– Use the ordering information
– Use pre-generated randomness

Use the Entire Key

    unsigned int Hash(const char *Key)
    {
        unsigned int hash = 0;
        /* Walk the whole NUL-terminated key, folding in every character. */
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
        }
        return hash;
    }

Problem: Hash(“ab”) == Hash(“ba”)
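
The collision is easy to confirm. Below is a small self-contained version of the XOR-only hash, assuming the key is a NUL-terminated string; the name xor_hash is illustrative.

```c
#include <stddef.h>

/* XOR of all characters in a NUL-terminated key.
   XOR is commutative, so character order has no effect. */
static unsigned int xor_hash(const char *key)
{
    unsigned int hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++)
        hash ^= (unsigned char)key[j];
    return hash;
}
```

In ASCII, 'a' is 97 and 'b' is 98, so both "ab" and "ba" hash to 97 ^ 98 = 3. A repeated character cancels itself out: "aa" hashes to 0.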

Use the Ordering Information

    unsigned int Hash(const char *Key)
    {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ (unsigned char)Key[j];
            /* hash = hash with some shifting applied */
        }
        return hash;
    }

Problem: Hash(short keys) will not perturb all 32 bits (clustering)
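
The slide leaves the shifting step open. As one concrete possibility, the sketch below shifts left by one bit after each XOR; the shift amount and the name shift_xor_hash are assumptions made for illustration.

```c
#include <stddef.h>

/* XOR each character in, then shift left by one bit (an assumed
   choice), so a character's contribution depends on its position. */
static unsigned int shift_xor_hash(const char *key)
{
    unsigned int hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        hash ^= (unsigned char)key[j];
        hash <<= 1;
    }
    return hash;
}
```

Order now matters: in ASCII, "ab" hashes to 320 and "ba" to 330. The clustering problem is also visible, since a two-character key can never set any bit above bit 9.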

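Use Pre-generated Randomness

    /* R: a table of pre-generated random numbers, one per character value */
    unsigned int Hash(const char *Key)
    {
        unsigned int hash = 0;
        for (unsigned int j = 0; Key[j] != '\0'; j++) {
            hash = hash ^ R[(unsigned char)Key[j]];
            /* hash = hash with some shifting applied */
        }
        return hash;
    }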

CRC Variant
Do 5-bit circular shift of hash
XOR hash and K[j]

    …
    for (…) {
        highorder = hash & 0xf8000000;    /* save the top 5 bits */
        hash = hash << 5;
        hash = hash ^ (highorder >> 27);  /* wrap them around to the bottom */
        hash = hash ^ K[j];
    }
    …
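
A complete version of the loop might look like the following. It assumes a NUL-terminated key, a 32-bit hash, and a mask of 0xf8000000, which is the value that selects the top five bits moved by the shift of 27. The name crc_variant_hash is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-variant hash: rotate the 32-bit hash left by 5 bits,
   then XOR in the next character of the key. */
static uint32_t crc_variant_hash(const char *key)
{
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0xf8000000u; /* top 5 bits */
        hash <<= 5;
        hash ^= highorder >> 27;                 /* wrap to the bottom */
        hash ^= (unsigned char)key[j];
    }
    return hash;
}
```

In ASCII, "ab" hashes to 3138 and "ba" to 3105, so the ordering information is preserved.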

CRC Variant
+ For long keys, all 32 bits are exercised
+ More randomness toward lower bits
- Not all bits are changed for short keys

BUZ Hash
Set up an array R to store precomputed random numbers

    …
    for (…) {
        highorder = hash & 0x80000000;    /* save the top bit */
        hash = hash << 1;
        hash = hash ^ (highorder >> 31);  /* wrap it around to the bottom */
        hash = hash ^ R[K[j]];
    }
    …
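
A runnable sketch of BUZ hash needs some way to fill R. The slides do not specify one, so the version below uses a xorshift32 generator as a stand-in; the generator, the seed, and the names buz_init and buz_hash are all assumptions. The mask 0x80000000 is the value that selects the single top bit moved by the shift of 31.

```c
#include <stddef.h>
#include <stdint.h>

/* One precomputed random word per possible byte value. */
static uint32_t R[256];

/* Fills R from a xorshift32 generator. The generator and seed
   are illustrative choices, not part of the original slides. */
static void buz_init(uint32_t seed)
{
    uint32_t x = seed ? seed : 1u;  /* xorshift state must be nonzero */
    for (int i = 0; i < 256; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        R[i] = x;
    }
}

/* BUZ hash: rotate left by 1 bit, then XOR in the table entry
   for the next character. Call buz_init once beforehand. */
static uint32_t buz_hash(const char *key)
{
    uint32_t hash = 0;
    for (size_t j = 0; key[j] != '\0'; j++) {
        uint32_t highorder = hash & 0x80000000u;
        hash <<= 1;
        hash ^= highorder >> 31;
        hash ^= R[(unsigned char)key[j]];
    }
    return hash;
}
```

Because every character contributes a full 32-bit random word, even a one-character key can affect all bits of the result, which addresses the short-key weakness of the CRC variant.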

References
Aho, Sethi, and Ullman. Compilers: Principles, Techniques, and Tools.
Cormen, Leiserson, and Rivest. Introduction to Algorithms, 1990.
Knuth. The Art of Computer Programming, 1973.
Kuenning. Hash Functions, 2003.