Hashing. Alan, Tam Siu Lung 96397999 99967891.

Presentation transcript:

1 Hashing
Alan, Tam Siu Lung

2 Prerequisites
List ADT
– Linked List
Table ADT
– Array
Mathematics
– Modular Arithmetic
Computer Organization
– ASCII
Algorithm
– Order Analysis

3 Basic Data Types
Pascal Type | Storage | Operations
Word | A positive integer | +, -, *, div, mod
Double | A real number | +, -, *, /, int, frac
Array[1..12] of Boolean | A sequence of 12 bits | y := a[] (get), a[] := y (set)

4 Abstract Data Types (ADT)
Stack
– Can add and remove in LIFO order
Queue
– Can add and remove in FIFO order
Priority Queue
– Can add. Removes the largest element first. The elements (v) must be comparable.

5 Data Structure
An ADT, implemented by a Data Type
E.g.
– ArrayList, using an array to implement a List ADT
– ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)

6 Dictionary ADT
Add(k, v)
– Add a key-value pair
Remove(k)
– Remove a key-value pair given the key
Search(k) : v
– Search for the value given the key
A Table ADT differs only in that the key is an integer in a fixed range.

7 Direct Addressing
Use the Table ADT
The key is the location
Efficient: O(1) for all operations
Infeasible: if the key range is very large, or if the key is not numeric...
Example table (slots 0 to 99): 0: Ant; 5: Boy; 99: Car
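A minimal sketch of direct addressing, assuming integer keys 0 to 99 as in the example table. The class name `DirectAddressTable` and its method names are illustrative and do not come from the slides.

```python
class DirectAddressTable:
    """Table ADT in which the integer key is itself the slot index."""

    def __init__(self, size):
        self.slots = [None] * size  # None marks an empty slot

    def add(self, key, value):
        self.slots[key] = value     # O(1): the key is the location

    def remove(self, key):
        self.slots[key] = None      # O(1)

    def search(self, key):
        return self.slots[key]      # O(1); None if nothing is stored


table = DirectAddressTable(100)
table.add(0, "Ant")
table.add(5, "Boy")
table.add(99, "Car")
```

The array needs one slot per possible key, which is why the approach breaks down once the key range grows large.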

8 Time Complexity
Average Case | Add | Remove | Search
Array | O(1) | O(n) | O(n)
Sorted Array | O(n) | O(n) | O(lg n)
Linked List | O(1) | O(n) | O(n)
BST | O(lg n) | O(lg n) | O(lg n)
Hash Table | ~O(1) | ~O(1) | ~O(1)
Note: For sorted array and BST, the keys must have an ordering.

9 Hash Function
Hash Function: h_m(k)
Maps all keys into an integer domain, e.g. 0 to m - 1
E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32)
Sample keys: Alan, Max, Man, On
Note: We won't use such a big m in our programs!
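As a rough illustration, Python's standard `zlib.crc32` can play the role of the CRC32 hash. The helper names `crc32_hash` and `small_hash`, and the table size 100, are assumptions made for this sketch.

```python
import zlib


def crc32_hash(key: str) -> int:
    """Hash a string to a 32-bit unsigned integer, so m = 2**32."""
    return zlib.crc32(key.encode("ascii"))


def small_hash(key: str, m: int) -> int:
    """Reduce the 32-bit hash to a slot index in 0..m-1."""
    return crc32_hash(key) % m


for name in ["Alan", "Max", "Man", "On"]:
    print(name, crc32_hash(name), small_hash(name, 100))
```

Taking the 32-bit value modulo a small m is one simple way to get a table of practical size.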

10 Hash Table
Use a Table ADT of size m
Use h(k) as the key
All operations can then be done as with a Table
Solved, except for:
– Collision: what to do if two different k have the same h(k)
– How to find a suitable hash function

11 Hash Functions
If k is an integer, use h(k) = k mod m
More advanced: h(k) = floor(m*frac(k*A)) for some 0 < A < 1
If k is a string, convert it to an integer, e.g. h('Alan') = [ASC('A')*256^3 + ASC('l')*256^2 + ASC('a')*256 + ASC('n')] mod m
If k is of another data type, try to combine all features of the type
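The three hash functions on this slide can be sketched as follows. The function names are illustrative, and the default A = 0.6180339887 (about (sqrt(5) - 1)/2) is an assumed choice; the slide only requires 0 < A < 1.

```python
import math


def hash_division(k: int, m: int) -> int:
    """h(k) = k mod m."""
    return k % m


def hash_multiplication(k: int, m: int, A: float = 0.6180339887) -> int:
    """h(k) = floor(m * frac(k * A)) for some 0 < A < 1."""
    frac = (k * A) % 1.0
    # min() guards against floating-point rounding pushing the result up to m
    return min(math.floor(m * frac), m - 1)


def hash_string(s: str, m: int) -> int:
    """Treat the string as a base-256 number and reduce it mod m."""
    value = 0
    for ch in s:
        # Horner's rule; reducing at every step keeps the number small
        value = (value * 256 + ord(ch)) % m
    return value
```

Reducing inside the loop gives the same result as building the full base-256 number first and taking the remainder once at the end.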

12 Chaining (a.k.a. Open Hashing)
Use a Table whose slots each hold a list of key-value pairs instead
When several k's have the same h(k), add each pair to the list in that slot (usually a linked list)
When searching or removing, scan that list for the key
Order: O(length of the list in slot h(k))

13 to 20 Chaining Samples
Table slots 0 to 99; h('Alan') = h('Man') = h('On') = 0, h('Max') = 5
Operations: Add, Search for Max, Remove Man
13 Slot 0: Alan D
14 Slot 0: Alan D; slot 5: Max Z
15 Slot 0: Man X, Alan D; slot 5: Max Z
16 Slot 0: On Y, Man X, Alan D; slot 5: Max Z
17 to 19 Search for Max; table unchanged (slot 0: On Y, Man X, Alan D; slot 5: Max Z)
20 After Remove Man, slot 0: On Y, Alan D; slot 5: Max Z
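The chaining samples can be replayed with a small sketch. It assumes that new pairs are inserted at the front of a slot's list and that no key is added twice. The class `ChainedHashTable` and the lookup table `SAMPLE_HASH`, which simply hard-codes the h values given on the slides, are illustrative names.

```python
class ChainedHashTable:
    """Hash table with chaining: every slot holds a list of (key, value)."""

    def __init__(self, m, hash_fn):
        self.hash_fn = hash_fn
        self.slots = [[] for _ in range(m)]  # a Python list stands in for a linked list

    def add(self, key, value):
        # Assumed policy: insert at the front of the chain
        self.slots[self.hash_fn(key)].insert(0, (key, value))

    def search(self, key):
        for k, v in self.slots[self.hash_fn(key)]:
            if k == key:
                return v
        return None  # key not present

    def remove(self, key):
        chain = self.slots[self.hash_fn(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


# Hash values taken from the slides: Alan, Man and On collide in slot 0.
SAMPLE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}

table = ChainedHashTable(100, SAMPLE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    table.add(key, value)
found = table.search("Max")
table.remove("Man")
```

Each operation only touches the list in slot h(k), so its cost grows with the length of that one chain.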

21 Chaining (Optional)
Note that the Table can be a Table of Containers, for any Container supporting Add, Remove and Search.
Why not consider other things, say another hash table? A BST?

22 Open Addressing (a.k.a. Closed Hashing)
On a collision, find another slot for the entry
E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc.
Define the probe sequence as the sequence of slots to try (it should be a permutation of the slot indices 0 to m - 1)
Add and search then try the same sequence, so a search must find the pair before it reaches an empty slot
How about delete? Search and mark it empty? Marking the slot empty would stop later searches too early, so a deleted slot needs a special mark (Del in the samples)
Order: O(length of probe sequence)

23 Open Addressing Samples
Start: 0: Alan D; all other slots Nil
After Add Max: 0: Alan D; 5: Max Z; all other slots Nil
After Add Man: 0: Alan D; 1: Man X; 5: Max Z; all other slots Nil

24 Open Addressing Samples
Operations: Search for Max, Add Man
Table (the same in all three frames): 0: Alan D; 1: Man X; 2: On Y; 5: Max Z; all other slots Nil

25 Open Addressing Samples
Operations: Search for Max, Delete Man
Before: 0: Alan D; 1: Man X; 2: On Y; 5: Max Z; all other slots Nil
After Delete Man: 0: Alan D; 1: Del X; 2: On Y; 5: Max Z; all other slots Nil
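A sketch of open addressing that replays these samples, assuming the probe sequence h(k), h(k)+1, h(k)+2, ... taken mod m and distinct keys. The names `LinearProbingTable`, `EMPTY`, `DELETED` and `SAMPLE_HASH` are illustrative; `DELETED` plays the role of the Del mark in the samples.

```python
EMPTY = object()    # slot never used ("Nil" in the samples)
DELETED = object()  # slot whose entry was deleted ("Del" in the samples)


class LinearProbingTable:
    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # Probe sequence: h(k), h(k)+1, ... wrapping around the table
        start = self.hash_fn(key)
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, key, value):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = (key, value)
                return idx
        raise RuntimeError("hash table is full")

    def _find(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None          # an empty slot ends the search
            if slot is not DELETED and slot[0] == key:
                return idx
        return None

    def search(self, key):
        idx = self._find(key)
        return None if idx is None else self.slots[idx][1]

    def delete(self, key):
        idx = self._find(key)
        if idx is not None:
            self.slots[idx] = DELETED  # mark, do not empty, the slot


SAMPLE_HASH = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
table = LinearProbingTable(100, SAMPLE_HASH.__getitem__)
for key, value in [("Alan", "D"), ("Max", "Z"), ("Man", "X"), ("On", "Y")]:
    table.add(key, value)
table.delete("Man")
```

Because slot 1 is marked as deleted rather than emptied, a later search for On still walks past it and finds On in slot 2.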

26 Collision Resolution
The method outlined above is called linear probing
– In general, h(k, i) = (h(k) + c*i) mod m
– Forms Primary Clustering
There is also quadratic probing
– In general, h(k, i) = (h(k) + c_1*i^2 + c_2*i) mod m
– Still forms Secondary Clustering

27 Double Hashing (Optional)
h(k, i) = (h(k) + i*h'(k)) mod m
Note: h'(k) cannot be 0
A meaningful h'(k) should be in [1, m)
E.g. h'(k) = (m - 1) - (k mod (m - 1))
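The three probing schemes can be compared by listing their first m probes. This is a sketch under assumptions: the function names are illustrative, h(k) = k mod m is used as the primary hash, and the second hash (m - 1) - (k mod (m - 1)) is just one choice that stays within [1, m).

```python
def linear_probe(h, m, c=1):
    """Slots visited by h(k, i) = (h + c*i) mod m for i = 0..m-1."""
    return [(h + c * i) % m for i in range(m)]


def quadratic_probe(h, m, c1=1, c2=0):
    """Slots visited by h(k, i) = (h + c1*i^2 + c2*i) mod m."""
    return [(h + c1 * i * i + c2 * i) % m for i in range(m)]


def double_hash_probe(k, m):
    """Slots visited by h(k, i) = (h(k) + i*h'(k)) mod m."""
    h1 = k % m                      # assumed primary hash
    h2 = (m - 1) - (k % (m - 1))    # assumed second hash, always in [1, m-1]
    return [(h1 + i * h2) % m for i in range(m)]


m = 7  # a prime table size, so every non-zero step visits all slots
print(linear_probe(3, m))       # [3, 4, 5, 6, 0, 1, 2]
print(quadratic_probe(3, m))    # [3, 4, 0, 5, 5, 0, 4]
print(double_hash_probe(10, m)) # [3, 5, 0, 2, 4, 6, 1]
```

With these parameters the quadratic sequence reaches only four distinct slots, so it is not a permutation of 0 to m - 1, whereas the linear and double-hashing sequences are.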

28 How good is Hashing?
Nearly constant time if the lists are very short (chaining) or the probe sequences are very short (open addressing)
So we need
– A uniform hash function (your job)
– A larger hash table (trade it off against the memory limit)

29 Size too small? (Optional)
Create a new, larger hash table and re-hash all entries (not useful for OI use)
With open addressing, you need to re-hash anyway to remove the deleted items
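Re-hashing into a larger table might look like the following for a chained table. The function name `rehash`, the list-of-lists layout and the sample data are assumptions made for this sketch.

```python
def rehash(old_slots, new_m, hash_fn):
    """Build a table with new_m slots and re-insert every (key, value) pair."""
    new_slots = [[] for _ in range(new_m)]
    for chain in old_slots:
        for key, value in chain:
            # The slot must be recomputed because m has changed
            new_slots[hash_fn(key) % new_m].append((key, value))
    return new_slots


# A table with m = 4 and h(k) = k mod 4, grown to m = 8
old = [[(8, "a"), (4, "b")], [(5, "c")], [], []]
new = rehash(old, 8, lambda k: k)
```

Every entry is visited once, so the rebuild costs O(n + m) time.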

30 Extensible Hashing (Optional)
Use a Table of Ptr (Ptr is like the list in chaining)
The size m = 2^k (here k is a number of bits, not the key)
Given any uniform hash function h(k), g(k) = last k bits of h(k)
Each Ptr points to an array of size r, each element storing an entry
The problem: what to do when the array is full
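Taking the last few bits of a hash value is a single bit-mask operation. In this sketch the names `last_bits` and `as_binary` are illustrative, and `bits` stands for the exponent in m = 2^k.

```python
def last_bits(h: int, bits: int) -> int:
    """Return the lowest `bits` bits of h as an integer."""
    return h & ((1 << bits) - 1)


def as_binary(h: int, bits: int) -> str:
    """Return the lowest `bits` bits of h as a zero-padded binary string."""
    return format(last_bits(h, bits), f"0{bits}b")


print(as_binary(9, 2))  # 01
print(as_binary(4, 3))  # 100
```

A hash value of 9 gives 01 with two bits, and a hash value of 4 gives 100 with three bits.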

31 Extensible Hashing (Optional)
h('Alan') = 0, h('Man') = 4, h('On') = 12, h('Ben') = 5, h('Max') = 5
Arrays in the diagram: [Alan, Man, On], [Ben, Max]

32 Extensible Hashing (Optional)
Add Si where h('Si') = 9, i.e. g('Si') = 01
Arrays in the diagram: [Alan, Man, On], [Ben, Si, Max]

33 Extensible Hashing (Optional)
Add Unu where h('Unu') = 4, i.e. g('Unu') = 100
The first array is split according to the h(k) of its entries
Still need to chain?
Arrays in the diagram: [Alan, On], [Ben, Si, Max], [Man, Unu]