Hashing & Hash Tables. Hash tables are a common approach to the storing/searching problem. Dr. Yousef Qawqzeh, CSI 312 Data Structures.

Presentation transcript:

1 Hashing & Hash Tables
Hash tables are a common approach to the storing/searching problem. Dr. Yousef Qawqzeh, CSI 312 Data Structures.

2 What Is a Hash Table?
The simplest kind of hash table is an array of records. This example has 701 records, at indices [0] through [700].

3 What Is a Hash Table?
Each record has a special field, called its key. In this example, the key is a long integer field called Number.

4 What Is a Hash Table?
The number might be a person's identification number, and the rest of the record has information about the person.

5 What Is a Hash Table?
When a hash table is in use, some spots contain valid records, and other spots are "empty".

6 Inserting a New Record
In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key.

7 Inserting a New Record
Typical way to create a hash value: (Number mod 701). What is (Number mod 701) for the new record?

8 Inserting a New Record
Typical way to create a hash value: (Number mod 701). For the new record, (Number mod 701) is 3.

9 Inserting a New Record
The hash value is used for the location of the new record, here [3].

10 Inserting a New Record
The hash value is used for the location of the new record. [Figure: the new record now occupies [3].]
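
The "Number mod 701" step can be sketched in Java as follows. This is only an illustration: the class name HashValueDemo and the sample key 1405 are assumptions made for the example, not values from the slides.

```java
// Minimal sketch of the division method: hash value = Number mod 701.
class HashValueDemo {
    static final int TABLE_SIZE = 701; // table size used in the slides

    // Maps a long key (the "Number" field) to an index in [0, TABLE_SIZE - 1].
    static int hash(long number) {
        // Math.floorMod keeps the result non-negative even for negative keys.
        return (int) Math.floorMod(number, (long) TABLE_SIZE);
    }

    public static void main(String[] args) {
        long key = 1405L; // hypothetical key: 1405 = 2 * 701 + 3
        System.out.println(hash(key)); // prints 3
    }
}
```

Math.floorMod is used instead of the % operator so that a negative key would still give a valid array index.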

11 Collisions
Here is another new record to insert, with a hash value of 2.

12 Collisions
This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot.

15 Collisions
This is called a collision, because there is already another valid record at [2]. The new record goes in the empty spot.
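
The "move forward until you find an empty spot" rule is commonly called linear probing. A minimal Java sketch, assuming keys are plain long values and are never inserted twice; the class name ProbingInsertTable and the sample keys are hypothetical:

```java
// Sketch of insertion with linear probing (assumes each key is inserted once).
class ProbingInsertTable {
    private final long[] keys;
    private final boolean[] occupied; // false marks an empty spot

    ProbingInsertTable(int capacity) {
        keys = new long[capacity];
        occupied = new boolean[capacity];
    }

    private int hash(long key) {
        return (int) Math.floorMod(key, (long) keys.length);
    }

    // Returns the index where the key was stored, or -1 if the table is full.
    int insert(long key) {
        int start = hash(key);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length; // wrap around past the last index
            if (!occupied[i]) {
                keys[i] = key;
                occupied[i] = true;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingInsertTable table = new ProbingInsertTable(701);
        System.out.println(table.insert(2L));   // prints 2
        System.out.println(table.insert(703L)); // 703 mod 701 = 2, collision: prints 3
    }
}
```

The wrap-around with % is an assumption about what happens at the end of the array; the slides do not show that case.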

16 A Quiz
Where would you be placed in this table, if there is no collision? Use your social security number or some other favorite number.

17 Searching for a Key
The data that's attached to a key can be found fairly quickly.

18 Searching for a Key
Calculate the hash value. Check that location of the array for the key. [Figure: the key being searched for has hash value [2]; the record at [2] answers "Not me."]

19 Searching for a Key
Keep moving forward until you find the key, or you reach an empty spot. [Figure: the next record checked also answers "Not me."]

21 Searching for a Key
Keep moving forward until you find the key, or you reach an empty spot. [Figure: a later record answers "Yes!"]

22 Searching for a Key
When the item is found, the information can be copied to the necessary location.
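
The search procedure above (hash, check, keep moving forward, stop at the key or at an empty spot) might look like this in Java. It is a sketch under the assumption that a null slot means "empty"; the class name ProbingSearchTable and the keys are made up for the example.

```java
// Sketch of search with linear probing; a null slot marks an empty spot.
class ProbingSearchTable {
    private final Long[] slots;

    ProbingSearchTable(int capacity) {
        slots = new Long[capacity];
    }

    private int hash(long key) {
        return (int) Math.floorMod(key, (long) slots.length);
    }

    void insert(long key) {
        int start = hash(key);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;
            if (slots[i] == null) {
                slots[i] = key;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the index holding the key, or -1 if the key is absent.
    int find(long key) {
        int start = hash(key);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;
            if (slots[i] == null) {
                return -1; // reached an empty spot: the key cannot be further on
            }
            if (slots[i].longValue() == key) {
                return i; // "Yes!"
            }
            // otherwise "Not me": keep moving forward
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingSearchTable table = new ProbingSearchTable(701);
        table.insert(2L);
        table.insert(703L);
        table.insert(1404L); // all three keys have hash value 2
        System.out.println(table.find(1404L)); // prints 4
        System.out.println(table.find(2105L)); // hash value 2, but absent: prints -1
    }
}
```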

23 Deleting a Record
Records may also be deleted from a hash table. [Figure: one record asks "Please delete me."]

24 Deleting a Record
But the location must not be left as an ordinary "empty spot" since that could interfere with searches.

25 Deleting a Record
The location must be marked in some special way so that a search can tell that the spot used to have something in it.
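
One common way to mark a deleted location "in some special way" is a third slot state, often called a tombstone. The sketch below assumes that approach; the State enum, the class name TombstoneTable, and the sample keys are illustrative choices, and the code assumes a key is not re-inserted while it is still in the table.

```java
import java.util.Arrays;

// Sketch of deletion in an open-addressed table using a DELETED marker.
class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final long[] keys;
    private final State[] states;

    TombstoneTable(int capacity) {
        keys = new long[capacity];
        states = new State[capacity];
        Arrays.fill(states, State.EMPTY);
    }

    private int hash(long key) {
        return (int) Math.floorMod(key, (long) keys.length);
    }

    boolean insert(long key) {
        int start = hash(key);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (states[i] != State.OCCUPIED) { // EMPTY and DELETED spots are reusable
                keys[i] = key;
                states[i] = State.OCCUPIED;
                return true;
            }
        }
        return false; // table is full
    }

    int find(long key) {
        int start = hash(key);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;
            if (states[i] == State.EMPTY) {
                return -1; // a never-used spot ends the search
            }
            if (states[i] == State.OCCUPIED && keys[i] == key) {
                return i;
            }
            // DELETED spots are skipped, so later records stay reachable
        }
        return -1;
    }

    boolean delete(long key) {
        int i = find(key);
        if (i < 0) {
            return false;
        }
        states[i] = State.DELETED; // mark the spot instead of emptying it
        return true;
    }

    public static void main(String[] args) {
        TombstoneTable table = new TombstoneTable(701);
        table.insert(2L);
        table.insert(703L);
        table.insert(1404L); // stored at [4] after two collisions
        table.delete(703L);  // [3] becomes DELETED, not EMPTY
        System.out.println(table.find(1404L)); // prints 4
    }
}
```

If [3] were reset to an ordinary empty spot, the search for 1404 would stop at [3] and wrongly report the key as missing.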

26 Summary
· Hash tables store a collection of records with keys.
· The location of a record depends on the hash value of the record's key.
· When a collision occurs, the next available location is used.
· Searching for a particular key is generally quick.
· When an item is deleted, the location must be marked in a special way, so that the searches know that the spot used to be used.

27 Review of Searching Techniques
The efficiency of these search strategies depends on the number of items in the container being searched. Search methods whose efficiency is independent of data size would be better. Consider the following Java class that describes a student record:

    class StudentRecord {
        String name;   // Student name
        double height; // Student height
        long id;       // Unique id
    }

The id field in this class can be used as a search key for records in the container.

28 Hashing
Suppose that we want to store 10,000 student records (each with a 5-digit ID) in a given container.
· A linked list implementation would take O(n) access time.
· A height-balanced tree would give O(log n) access time.
· Using an array of size 100,000 would give O(1) access time but would waste a lot of space.
Is there some way that we could get O(1) access without wasting a lot of space? The answer is hashing.

29 Example 1: Illustrating Hashing
Use the function f(r) = r.id % 13 to load the following records into an array:
· Al-Otaibi, Ziyad
· Al-Turki, Musab Ahmad Bakeer
· Al-Saegh, Radha Mahdi
· Al-Shahrani, Adel Saad
· Al-Awami, Louai Adnan Muhammad
· Al-Amer, Yousuf Jauwad
· Al-Helal, Husain Ali AbdulMohsen

30 Example 1: Introduction to Hashing (cont'd)
[Table: columns Name, ID, and h(r) = id % 13 for the seven records above; the ID and hash values are not preserved in the transcript. Figure: the resulting array holds, in index order, Husain, Yousuf, Louai, Ziyad, Radha, Musab, Adel.]

31 Hash Tables
There are two types of hash tables: open-addressed hash tables and separate-chained hash tables.
An open-addressed hash table is a one-dimensional array indexed by integer values that are computed by an index function called a hash function.
A separate-chained hash table is a one-dimensional array of linked lists indexed by integer values that are computed by an index function called a hash function.
Hash tables are sometimes referred to as scatter tables.
Typical hash table operations are:
· Initialization
· Insertion
· Searching
· Deletion
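
A separate-chained table with the four operations listed above can be sketched as follows. The class name ChainedTable and the sample keys are hypothetical, and an ArrayList stands in for the one-dimensional array only to avoid Java's restrictions on generic arrays.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a separate-chained hash table: one linked list per index.
class ChainedTable {
    private final List<LinkedList<Long>> buckets = new ArrayList<>();

    // Initialization: create one empty chain per table index.
    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int hash(long key) {
        return (int) Math.floorMod(key, (long) buckets.size());
    }

    // Insertion: colliding keys simply share a chain.
    void insert(long key) {
        buckets.get(hash(key)).add(key);
    }

    // Searching: only the chain at the key's hash value is scanned.
    boolean contains(long key) {
        return buckets.get(hash(key)).contains(key);
    }

    // Deletion: remove the key from its chain; no special marker is needed.
    boolean remove(long key) {
        return buckets.get(hash(key)).remove(Long.valueOf(key));
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(13);
        table.insert(5L);
        table.insert(18L); // 18 % 13 = 5, same chain as 5
        System.out.println(table.contains(18L)); // prints true
        System.out.println(table.remove(5L));    // prints true
        System.out.println(table.contains(5L));  // prints false
    }
}
```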

32 Types of Hashing
There are two types of hashing:
1. Static hashing: the hash function maps search-key values to a fixed set of locations.
2. Dynamic hashing: the hash table can grow to handle more items, and the associated hash function must change as the table grows.
The load factor of a hash table is the ratio of the number of keys in the table to the size of the hash table. Note: The higher the load factor, the slower the retrieval.
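
The load factor definition translates directly into code. In the sketch below, the threshold MAX_LOAD = 0.75 is a hypothetical value chosen for illustration (the slides do not name one), as are the class name and the sample counts.

```java
// Sketch: computing the load factor and deciding when a dynamic table should grow.
class LoadFactorDemo {
    static final double MAX_LOAD = 0.75; // hypothetical threshold, not from the slides

    // Load factor = number of keys in the table / size of the table.
    static double loadFactor(int keyCount, int tableSize) {
        return (double) keyCount / tableSize;
    }

    // A dynamic hashing scheme might grow the table once this returns true.
    static boolean shouldGrow(int keyCount, int tableSize) {
        return loadFactor(keyCount, tableSize) > MAX_LOAD;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(350, 700)); // prints 0.5
        System.out.println(shouldGrow(350, 700)); // prints false
        System.out.println(shouldGrow(600, 700)); // prints true
    }
}
```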

33 Hash Functions
A hash function, h, is a function which transforms a key from a set, K, into an index in a table of size n:
h: K -> {0, 1, ..., n-2, n-1}
A key can be a number, a string, a record, etc. The size of the set of keys, |K|, is usually very large relative to n. It is therefore possible for different keys to hash to the same array location. This situation is called a collision, and the colliding keys are called synonyms.

34 Hash Functions (cont'd)
A good hash function should:
· Minimize collisions.
· Be easy and quick to compute.
· Distribute key values evenly in the hash table.
· Use all the information provided in the key.

35 Common Hashing Functions
1. Division Remainder (using the table size as the divisor)
· Computes the hash value from the key using the % operator.
· Table sizes that are powers of 2, like 32 and 1024, should be avoided: key % 2^k keeps only the k low-order bits of the key, which leads to more collisions when those bits are not evenly distributed.
· Powers of 10 are also poor table sizes when the keys are decimal integers.
· Prime numbers not close to powers of 2 are better table sizes.
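
The effect of the table size can be seen with a small experiment. The keys below are hypothetical decimal values that all end in "00"; the table sizes 100 and 97 are likewise chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: counting how many distinct indices a set of keys occupies
// under the division-remainder method for a given table size.
class DivisionDemo {
    static int hash(long key, int tableSize) {
        return (int) Math.floorMod(key, (long) tableSize);
    }

    static int distinctIndices(long[] keys, int tableSize) {
        Set<Integer> seen = new HashSet<>();
        for (long key : keys) {
            seen.add(hash(key, tableSize));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        long[] keys = {1200L, 3400L, 5600L, 7800L}; // hypothetical keys
        System.out.println(distinctIndices(keys, 100)); // prints 1: every key maps to index 0
        System.out.println(distinctIndices(keys, 97));  // prints 4: indices 36, 5, 71, 40
    }
}
```

With a table size of 100, only the last two decimal digits of each key matter, so these four keys all collide; the prime 97 spreads them over four different indices.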

36 Common Hashing Functions (cont'd)
2. Truncation or Digit/Character Extraction
· Works based on the distribution of digits or characters in the key.
· The more evenly distributed digit positions are extracted and used for hashing.
· For instance, student IDs or ISBN codes may contain common subsequences, which may increase the likelihood of collision if those positions are used.
· Very fast, but the distribution of digits/characters in the keys may not be very even.
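
Digit extraction can be sketched as picking chosen decimal positions out of the key. The key 379452 and the positions used below are hypothetical; in practice the positions would be the ones observed to be most evenly distributed. The sketch assumes non-negative keys.

```java
// Sketch of digit extraction: build a hash value from selected digit positions.
class DigitExtractionDemo {
    // Positions are counted from the right, starting at 0 for the last digit.
    // The extracted digits are concatenated in the order the positions are given.
    static int extract(long key, int[] positions) {
        int result = 0;
        for (int p : positions) {
            long divisor = 1;
            for (int i = 0; i < p; i++) {
                divisor *= 10; // divisor = 10^p
            }
            int digit = (int) ((key / divisor) % 10);
            result = result * 10 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical key 379452: positions 4, 2, 0 hold the digits 7, 4, 2.
        System.out.println(extract(379452L, new int[] {4, 2, 0})); // prints 742
    }
}
```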

37 Common Hashing Functions (cont'd)
3. Folding
· It involves splitting keys into two or more parts and then combining the parts to form the hash addresses.
· To map the key 25936715 to a range between 0 and 9999, we can split the number into two parts, 2593 and 6715, and add these two to obtain 9308 as the hash value.
· Very useful if we have keys that are very large.
· Fast and simple, especially with bit patterns.
· A great advantage is the ability to transform non-integer keys into integer values.
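
A possible Java sketch of folding, using the parts 2593 and 6715 from the slide (that is, the key 25936715). The final reduction modulo 10^partDigits is an assumption added here so that sums of five or more digits still land in the range 0 to 9999; the class and method names are illustrative.

```java
// Sketch of folding: split the key into fixed-width decimal parts and add them.
class FoldingDemo {
    static int fold(long key, int partDigits) {
        long modulus = 1;
        for (int i = 0; i < partDigits; i++) {
            modulus *= 10; // modulus = 10^partDigits
        }
        long sum = 0;
        while (key > 0) {
            sum += key % modulus; // take the lowest partDigits digits
            key /= modulus;       // drop them and continue with the rest
        }
        // Assumption: wrap the sum back into [0, modulus - 1] if it overflows.
        return (int) (sum % modulus);
    }

    public static void main(String[] args) {
        System.out.println(fold(25936715L, 4)); // 6715 + 2593: prints 9308
    }
}
```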

38 Common Hashing Functions (cont'd)
4. Radix Conversion
· Transforms a key into another number base to obtain the hash value.
· Typically uses a number base other than base 10 and base 2 to calculate the hash addresses.
· To map the key 55354 into the range 0 to 9999 using base 11, we have: 55354 (base 10) = 38652 (base 11).
· We may truncate the high-order 3 to yield 8652 as our hash address within 0 to 9999.
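
Radix conversion can be sketched as follows. The key 55354 is used because its base-11 digits are 3, 8, 6, 5, 2, which is consistent with the truncated address 8652 quoted above. The method name radixHash is hypothetical, and a base-11 digit of 10 would carry into the next decimal place, which this sketch simply accepts.

```java
// Sketch of radix conversion: rewrite the key in another base, read the digits
// as a decimal number, then truncate to the table's address range.
class RadixDemo {
    static int radixHash(long key, int base, int tableSize) {
        long converted = 0;
        long place = 1;
        while (key > 0) {
            converted += (key % base) * place; // next digit in the new base
            place *= 10;                       // written at the next decimal place
            key /= base;
        }
        // Keeping the low-order digits truncates the high-order ones.
        return (int) (converted % tableSize);
    }

    public static void main(String[] args) {
        // 55354 in base 11 is 38652; dropping the leading 3 leaves 8652.
        System.out.println(radixHash(55354L, 11, 10000)); // prints 8652
    }
}
```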

39 Common Hashing Functions (cont'd)
5. Mid-Square
· The key is squared and the middle part of the result is taken as the hash value.
· To map the key 3121 into a hash table of size 1000, we square it, 3121^2 = 9740641, and extract 406 as the hash value.
· Works well if the keys do not contain a lot of leading or trailing zeros.
· Non-integer keys have to be preprocessed to obtain corresponding integer values.
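
The mid-square example with key 3121 can be reproduced with a short Java sketch. Extracting the middle digits through a string is just one convenient way to do it; the class and method names are assumptions, and the key is assumed small enough that its square fits in a long.

```java
// Sketch of the mid-square method: square the key, keep the middle digits.
class MidSquareDemo {
    static int midSquare(long key, int digits) {
        String squared = Long.toString(key * key);
        if (squared.length() <= digits) {
            return Integer.parseInt(squared); // too short to have a "middle"
        }
        int start = (squared.length() - digits) / 2;
        return Integer.parseInt(squared.substring(start, start + digits));
    }

    public static void main(String[] args) {
        // 3121 * 3121 = 9740641; the middle three digits are 406.
        System.out.println(midSquare(3121L, 3)); // prints 406
    }
}
```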

40 Common Hashing Functions (cont'd)
6. Use of a Random-Number Generator
Given a seed as parameter, the method generates a random number. The algorithm must ensure that:
· It always generates the same random value for a given key.
· It is unlikely for two keys to yield the same random value.
The random number produced can be transformed to produce a valid hash value.
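
One way to sketch this idea in Java is to seed java.util.Random with the key, so the same key always produces the same value. Using the key itself as the seed and the table size 701 are assumptions made for this illustration.

```java
import java.util.Random;

// Sketch: using a seeded pseudo-random generator as a hash function.
class SeededHashDemo {
    static int hash(long key, int tableSize) {
        Random rng = new Random(key);   // same seed, same sequence
        return rng.nextInt(tableSize);  // value in [0, tableSize - 1]
    }

    public static void main(String[] args) {
        int first = hash(12345L, 701);
        int second = hash(12345L, 701);
        System.out.println(first == second); // prints true: the value is repeatable
    }
}
```

nextInt(tableSize) already performs the "transform to a valid hash value" step by restricting the result to the table's index range.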

41 Some Applications of Hash Tables
· Database systems: Specifically, those that require efficient random access. Generally, database systems try to optimize between two types of access methods: sequential and random. Hash tables are an important part of efficient random access because they provide a way to locate data in a constant amount of time.
· Symbol tables: The tables used by compilers to maintain information about symbols from a program. Compilers access information about symbols frequently. Therefore, it is important that symbol tables be implemented very efficiently.
· Data dictionaries: Data structures that support adding, deleting, and searching for data. Although the operations of a hash table and a data dictionary are similar, other data structures may be used to implement data dictionaries. Using a hash table is particularly efficient.
· Network processing algorithms: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, and network monitoring.
· Browser caches: Hash tables are used to implement browser caches.

42 End of Lecture