DESIGN & ANALYSIS OF ALGORITHM 01 – HASHING
Informatics Department, Parahyangan Catholic University


MOTIVATION
We have seen many data structures: array, linked list, stack, and queue. Each has its own strengths and weaknesses.
Consider the case when we want to find an element in a data structure:
- Unsorted array → sequential search, O(n)
- Sorted array → binary search, O(lg n)
- Linked list → sequential search, O(n)
Can we achieve O(1) performance?

ANALOGY
After hours of playing the same hidden object game, we remember where each item is located, so our search is no longer “sequential”: we can find each item right away.

ANALOGY
We remember which rack holds our favorite item in a supermarket, so we go directly to that rack without checking the other racks.

ANALOGY
We know where to find “Universitas Parahyangan” in the Yellow Pages: it must be near the beginning of the “P” section.

HOW DOES HASHING WORK?
1. Associate keys with values. Given a key (e.g. a company's name), retrieve the value (e.g. phone number, address, etc.) for that key.
2. Define a hash function that maps the key to an index of a table where the value is stored, e.g. “Universitas Parahyangan” is stored in section “P”.
This gives O(1) insertion and lookup. Is it really O(1)?

ANOTHER EXAMPLE
Dairy products are barcoded (given a key) and put on the “Dairy” rack (mapped to a table).

ANOTHER EXAMPLE
[Figure: the keys John Smith, Lisa Smith, and Sam Doe are mapped to hash table slots J (74), K (75), L (76), …, S (83). Data = key + value.]

DIRECT ADDRESSING
If the universe of keys is small and the keys are unique, we can set up a table whose size equals the size of the universe.
The slot with index k stores the element with key k. If there is no element with key k, the slot with index k is empty.
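A minimal sketch of a direct-address table, assuming the keys are integers in the range 0..universe_size-1. The class and method names are illustrative only, not part of the lecture.

```python
class DirectAddressTable:
    """Direct-address table: the key itself is the slot index."""

    def __init__(self, universe_size):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        # Returns None when no element with this key is stored.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "three")
print(table.search(3))  # three
print(table.search(4))  # None
```

Every operation is a single array access, which is where the O(1) bound comes from. The cost is one slot per possible key, whether or not it is used.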

DIRECT ADDRESSING :: EXAMPLE
Student's NPM (student ID number): XXXXYYZZZZ
- XXXX = year
- YY = faculty and department number
- ZZZZ = student number
The key universe covers all 10-digit numbers (10,000,000,000 keys) → very big!
Let's say we only want to store the data of the Informatics Department (YY = 73). The key universe then has 100,000,000 keys → still a lot!

DIRECT ADDRESSING :: EXAMPLE
Student records only start from the first year of Parahyangan's Informatics Department, and let's say we only want to store student information up to a certain year. The key universe then has 24,009,999 keys → still a lot!
But the “73” part is always the same, so let's cut it out! The key universe then becomes 249,999 keys → better!
We can save even more by noting that the number of students per year never exceeds 999 (so the 4th digit of the student number is not needed) and by writing the year in 2-digit format.

PROBLEMS IN DIRECT ADDRESSING
Only implementable if the size of the universe is small.
- What if we want to store IP addresses? 256^4 = 4,294,967,296 keys = 4 GB of space at one byte per slot
- What if we want to store 10-character names? 26^10 = 141,167,095,653,376 keys
- What if we want to store 16-digit KTP (national ID card) numbers? 10^16 = 10,000,000,000,000,000 keys
- What about 50-character addresses?
When the size is big:
- It requires too much memory space
- It is inefficient if only a small portion of the keys is stored
Solution: use a hash table with size |K| = the number of keys stored, e.g. in the previous example we don't need to prepare space for data before year 1996. This requires less storage space, but lookup is still O(1).
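The universe sizes quoted above can be checked with a few lines of arithmetic. The dictionary labels are illustrative, and the name alphabet is assumed to be A..Z only.

```python
# Number of slots a direct-address table would need for each key type.
universes = {
    "IP addresses (4 bytes)": 256 ** 4,
    "10-character names (A..Z)": 26 ** 10,
    "16-digit numbers": 10 ** 16,
}

for name, size in universes.items():
    print(f"{name}: {size:,} slots")
```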

HASH FUNCTION & HASH TABLE
A hash function h(k) maps the key k to an index of a table where the element with key k is stored.
The value h(k) is called the hash value.
[Figure: the keys John Smith, Lisa Smith, and Sam Doe pass through the hash function (e.g. take the ASCII number of the first character) into hash table slots J (74), K (75), L (76), …, S (83).]
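The first-character hash function from the figure can be sketched as follows. The table size of 128 is an assumption (one slot per ASCII code), and `first_char_hash` is an illustrative name.

```python
TABLE_SIZE = 128  # assumption: every key starts with an ASCII character


def first_char_hash(key):
    """Hash a non-empty string to the ASCII code of its first character."""
    return ord(key[0])


table = [None] * TABLE_SIZE
for name in ["John Smith", "Lisa Smith", "Sam Doe"]:
    table[first_char_hash(name)] = name

print(first_char_hash("John Smith"))  # 74
print(table[76])                      # Lisa Smith
print(table[83])                      # Sam Doe
```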

EXAMPLE :: NPM
Hash function:
1. Extract the last 2 digits of the year
2. Extract the last 3 digits of the student number
3. Concatenate the two
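The three steps can be sketched in code, assuming the NPM is given as a 10-character digit string in the XXXXYYZZZZ layout. The sample NPMs below are made up for illustration.

```python
def npm_hash(npm):
    """Hash an NPM string laid out as XXXXYYZZZZ."""
    year = npm[0:4]             # XXXX
    student_number = npm[6:10]  # ZZZZ
    # Last 2 digits of the year followed by last 3 digits of the student number.
    return int(year[-2:] + student_number[-3:])


print(npm_hash("2010730013"))  # 10013
print(npm_hash("1996730001"))  # 96001
```

The result has at most 5 digits, so a table of 100,000 slots is enough under these assumptions.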

COLLISION
Since the storage size is reduced, two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2).
This condition is known as a collision → a resolution strategy is required (we shall see later).
Example: with the hash function “take the ASCII number of the first character”, John Smith and Jane Smith both map to slot J (74).
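A small sketch that detects collisions for the first-character hash function. `find_collisions` is a hypothetical helper written for this illustration; it only reports collisions and does not resolve them.

```python
def first_char_hash(key):
    """Hash a non-empty string to the ASCII code of its first character."""
    return ord(key[0])


def find_collisions(keys, hash_fn):
    """Return (index, stored_key, new_key) for each key that hits a used slot."""
    table = {}
    collisions = []
    for key in keys:
        index = hash_fn(key)
        if index in table:
            collisions.append((index, table[index], key))
        else:
            table[index] = key
    return collisions


names = ["John Smith", "Jane Smith", "Sam Doe"]
print(find_collisions(names, first_char_hash))
# [(74, 'John Smith', 'Jane Smith')]
```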

CHOOSING HASH FUNCTION
- Deterministic: h(k) always gives the same result for the same k
- Easy to compute: needs to be O(1), otherwise insertion and lookup become expensive
- The range has to agree with the table size: must not map any key outside the hash table

TYPES OF HASH FUNCTION
- Modular/Division
- Truncation
- Multiplicative
- Folding
- Length-dependent

HASH FUNCTION MODULAR/DIVISION
Define the table size M.
h(k) = k mod M
M should be a prime number, since prime numbers provide a better distribution in the table.
Why should M be prime?

Suppose we want to store NPMs in a hash table with hash function h(k) = k mod 100. Then only the last 2 digits of the NPM determine the hash value.
Why should M be prime?
Observe that there are more students with small NPMs than students with large NPMs. Additionally, NPMs ≥ 100 are also hashed to indexes 0..99, so the smaller indexes have more collisions.
Using a prime number for M gives a better distribution (and thus fewer collisions) because every digit of the key contributes to the hash value.
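A short sketch of the effect, using made-up keys that share their last two digits. M = 97 is chosen here only as an example of a prime close to 100.

```python
def mod_hash(k, m):
    """Modular/division hash: h(k) = k mod M."""
    return k % m


keys = [12, 112, 212, 312]  # illustrative keys with the same last 2 digits

print([mod_hash(k, 100) for k in keys])  # [12, 12, 12, 12]
print([mod_hash(k, 97) for k in keys])   # [12, 15, 18, 21]
```

With M = 100 the hundreds digit is ignored and all four keys collide. With the prime M = 97 the higher digits also influence the result.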

HASH FUNCTION TRUNCATION
Take the last n digits/characters as the table index, e.g. the last 3 digits of your NPM.
Fast, but often cannot distribute the keys evenly in the table.
What is the difference from the modular/division method? Similar reason as in the previous example.

HASH FUNCTION MULTIPLICATIVE
Suppose we have a floating-point key k, 0 ≤ k < 1, and a hash table of size M.
Define h(k) = ⌊M × k⌋
E.g. with M = 10, h(k) is the first decimal digit of k.

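HASH FUNCTION MULTIPLICATIVE
What if the key's domain is not a floating point in that range?
Choose a floating point A in the range 0 < A < 1.
Define h(k) = ⌊M × (k × A mod 1)⌋
Here k × A is a floating point, and (k × A mod 1), its fractional part, is a floating point in the range 0..1.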

HASH FUNCTION MULTIPLICATIVE
Does the value of M matter?
- M doesn't matter
- Usually M is a power of 2, since that is easier to implement on most computers
Does the value of A matter?
- This method works in practice with any valid A, but some values work better than others
- Knuth suggests that A ≈ (√5 - 1)/2 = … (the reciprocal of the golden ratio) is likely to work reasonably well
Disadvantage?
- Computing the hash value is slower than with the modular method
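A minimal sketch of the multiplicative method, assuming the standard formula h(k) = ⌊M × (k × A mod 1)⌋ with Knuth's suggested A. The function name and the choice M = 16 are illustrative.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant


def multiplicative_hash(k, m, a=A):
    """Multiplicative hash: scale the fractional part of k * a up to 0..m-1."""
    fractional = (k * a) % 1.0  # floating point in the range [0, 1)
    return math.floor(m * fractional)


M = 16  # a power of 2, as the slide suggests
print([multiplicative_hash(k, M) for k in range(1, 4)])  # [9, 3, 13]
```

Consecutive keys land far apart in the table, which is one reason this choice of A tends to spread keys well.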

HASH FUNCTION FOLDING/SHIFTING
Just like folding a piece of paper.

HASH FUNCTION FOLDING/SHIFTING
Like cutting the paper into pieces and stacking them up.
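One common reading of the two analogies, sketched under assumptions: the key is split into fixed-length digit groups, "cutting and stacking" adds the groups as they are (shift folding), and "folding" reverses every second group before adding (boundary folding). The group length, table size, and function names are illustrative and not taken from the slides.

```python
def split_digits(key, part_len):
    """Cut the decimal digits of key into groups of part_len digits."""
    digits = str(key)
    return [digits[i:i + part_len] for i in range(0, len(digits), part_len)]


def shift_fold(key, part_len, m):
    """Stack the groups on top of each other and add them up."""
    return sum(int(part) for part in split_digits(key, part_len)) % m


def boundary_fold(key, part_len, m):
    """Fold like paper: every second group is reversed before adding."""
    total = 0
    for i, part in enumerate(split_digits(key, part_len)):
        total += int(part[::-1] if i % 2 == 1 else part)
    return total % m


print(shift_fold(123456789, 3, 1000))     # (123 + 456 + 789) % 1000 = 368
print(boundary_fold(123456789, 3, 1000))  # (123 + 654 + 789) % 1000 = 566
```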

HASH FUNCTION LENGTH-DEPENDENT
Useful when the keys do not all have the same length → use the length of the key as one of the hash function's parameters.
E.g. if the keys are names of people, take the sum of the first 5 characters plus the name's length to get the table index (can be combined with the modular method if needed).
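The name example can be sketched as follows, assuming "sum of characters" means the sum of their character codes and combining the result with the modular method. M = 97 is an arbitrary prime chosen for illustration.

```python
def length_dependent_hash(name, m):
    """Sum the codes of the first 5 characters, add the length, reduce mod m."""
    total = sum(ord(ch) for ch in name[:5]) + len(name)
    return total % m


print(length_dependent_hash("John Smith", 97))  # (431 + 10) % 97 = 53
print(length_dependent_hash("Sam Doe", 97))     # (389 + 7) % 97 = 8
```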

STRING TO INTEGER KEY
What if the key is not a number (e.g. a string)?
Treat the string as a base-n number:
- Base 26 if the string consists of A..Z only, e.g. A is digit 0, B is 1, …, and Z is 25
- Base 52 if the string consists of A..Z and a..z only, e.g. a = 0, …, z = 25, A = 26, …, Z = 51
- Base 256 if the string may contain any ASCII character
A similar approach can be used to encode other key types.
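A sketch of the base-26 case, assuming the string contains only the uppercase letters A..Z. `string_to_int` is an illustrative name.

```python
def string_to_int(s):
    """Read an A..Z string as a base-26 number with A = 0, ..., Z = 25."""
    value = 0
    for ch in s:
        value = value * 26 + (ord(ch) - ord("A"))
    return value


print(string_to_int("Z"))    # 25
print(string_to_int("BA"))   # 1 * 26 + 0 = 26
print(string_to_int("ABC"))  # 0 * 676 + 1 * 26 + 2 = 28
```

The resulting integer can then be fed into any of the numeric hash functions, for example the modular method.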

STRING TO INTEGER KEY
Be careful when choosing the number's base and M! The two numbers should be coprime (have no common factor other than 1).
Example: the string is treated as a base-26 number and M = 13.
ABC_26 mod 13 = ((C × 26^0) mod 13 + (B × 26^1) mod 13 + (A × 26^2) mod 13) mod 13
- 26^0 = 1, so the first term is C mod 13
- 26^1 = 2 × 13 is a multiple of 13, so the second term is 0
- 26^2 = (2 × 13)^2 is a multiple of 13, so the third term is 0
Only the term for the last digit is nonzero, so not every digit contributes to the hash value.
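The effect can be checked with a short sketch that hashes three made-up strings ending in the same letter. M = 11 is used as an example of a table size that is coprime to 26, and the helper names are illustrative.

```python
def string_to_int(s):
    """Read an A..Z string as a base-26 number with A = 0, ..., Z = 25."""
    value = 0
    for ch in s:
        value = value * 26 + (ord(ch) - ord("A"))
    return value


def hash_string(s, m):
    return string_to_int(s) % m


words = ["ABC", "XYC", "QRC"]  # different strings, same last letter

print([hash_string(w, 13) for w in words])  # [2, 2, 2]
print([hash_string(w, 11) for w in words])  # [6, 4, 7]
```

With M = 13 only the last letter (C = 2) decides the slot, so all three strings collide. With M = 11 every letter contributes.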