Advanced Algorithms Analysis and Design
Lecture 10: Hashing, Heaps, and Binomial Trees

HASHING

Hash Tables
All the search structures we have seen so far rely on a comparison operation, giving performance O(n) or O(log n). Suppose instead we have a function f(key) -> integer, i.e. one that maps a key to an integer. What performance might we expect now?

Hash Tables - Keys are integers
We need a hash function h(key) -> integer, i.e. one that maps a key to an integer. Applying this function to a key produces an address. If h maps each key to a unique integer in the range 0..m-1, then search is O(1).

Hash Tables - Hash functions
Form of the hash function: for example, using an n-character key:

    int hash( char *s, int n ) {
        int sum = 0;
        while ( n-- )
            sum = sum + *s++;   /* add up the character codes */
        return sum % 256;       /* returns a value in 0..255 */
    }

The xor operation is also commonly used in place of addition:

    sum = sum ^ *s++;

Example: hash( "AB", 2 ) and hash( "BA", 2 ) return the same value! This is called a collision. A variety of techniques are used for resolving collisions.
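To see the collision concretely, here is a minimal test program (the main function is added for illustration and is not part of the original slides):

    #include <stdio.h>

    int hash( char *s, int n ) {
        int sum = 0;
        while ( n-- )
            sum = sum + *s++;
        return sum % 256;
    }

    int main( void ) {
        /* 'A' = 65 and 'B' = 66, so both strings sum to 131 */
        printf( "hash(\"AB\", 2) = %d\n", hash( "AB", 2 ) );
        printf( "hash(\"BA\", 2) = %d\n", hash( "BA", 2 ) );
        return 0;   /* both lines print 131: a collision */
    }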

Hashing: Collision Resolution Schemes
- Collision Resolution Techniques
- Separate Chaining
- Separate Chaining with String Keys
- Separate Chaining versus Open Addressing
- Implementation of Separate Chaining
- Introduction to Collision Resolution using Open Addressing
- Linear Probing

Collision Resolution Techniques
There are two broad ways of resolving collisions:
1. Separate chaining: an array-of-linked-lists implementation.
2. Open addressing: an array-based implementation, with three variants:
   (i) linear probing (linear search)
   (ii) quadratic probing (nonlinear search)
   (iii) double hashing (uses two hash functions)

Separate Chaining
The hash table is implemented as an array of linked lists. Inserting an item r that hashes to index i is simply an insertion into the linked list at position i. Keys that hash to the same index are chained together in the same linked list.

Separate Chaining (cont'd)
Retrieval of an item r with hash address i is simply retrieval from the linked list at position i. Deletion of an item r with hash address i is simply deleting r from the linked list at position i.

Example: Load the keys 23, 13, 21, 14, 7, 8, and 15, in this order, into a hash table of size 7 using separate chaining with the hash function h(key) = key % 7:

h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0   collision
h(7)  = 7 % 7  = 0   collision
h(8)  = 8 % 7  = 1
h(15) = 15 % 7 = 1   collision
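As a minimal C sketch of this idea (not from the original slides; the names ChainNode, table_insert, and table_find are illustrative):

    #include <stdlib.h>

    #define TABLE_SIZE 7

    typedef struct ChainNode {
        int key;
        struct ChainNode *next;
    } ChainNode;

    ChainNode *table[TABLE_SIZE];      /* array of linked lists, all NULL initially */

    int hash( int key ) {
        return key % TABLE_SIZE;       /* h(key) = key % 7 */
    }

    /* Insert at the head of the chain at index h(key). */
    void table_insert( int key ) {
        int i = hash( key );
        ChainNode *node = malloc( sizeof *node );
        node->key  = key;
        node->next = table[i];
        table[i]   = node;
    }

    /* Search the chain at index h(key); returns 1 if found, 0 otherwise. */
    int table_find( int key ) {
        for ( ChainNode *p = table[hash( key )]; p != NULL; p = p->next )
            if ( p->key == key )
                return 1;
        return 0;
    }

Inserting 23, 13, 21, 14, 7, 8, 15 in order with table_insert reproduces the example above: 21, 14, and 7 all end up chained at index 0, and 8 and 15 at index 1.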

Separate Chaining with String Keys
Recall that search keys can be numbers, strings, or some other object. A hash function for a string s = c0 c1 c2 ... c(n-1) can be defined as:

    hash = (c0 + c1 + c2 + ... + c(n-1)) % tableSize

This can be implemented as:

    public static int hash(String key, int tableSize) {
        int hashValue = 0;
        for (int i = 0; i < key.length(); i++) {
            hashValue += key.charAt(i);
        }
        return hashValue % tableSize;
    }

Example: The following class describes commodity items:

    class CommodityItem {
        String name;      // commodity name
        int    quantity;  // commodity quantity needed
        double price;     // commodity price
    }

Separate Chaining with String Keys (cont'd)
Use the hash function hash to load the following commodity items into a hash table of size 13 using separate chaining: onion, tomato, cabbage, carrot, okra, mellon, potato, Banana, olive, salt, cucumber, mushroom, orange.

Solution (summing the ASCII character codes):
hash(onion)  = (111 + 110 + 105 + 111 + 110) % 13 = 547 % 13 = 1
hash(salt)   = (115 + 97 + 108 + 116) % 13 = 436 % 13 = 7
hash(orange) = (111 + 114 + 97 + 110 + 103 + 101) % 13 = 636 % 13 = 12

Separate Chaining with String Keys (cont'd)
[Figure: the size-13 hash table after inserting all thirteen commodity items using separate chaining, with a companion table listing each item, its quantity, price, and hash address h(key); items with equal hash addresses are chained together.]

Separate Chaining versus Open Addressing

Organization | Advantages                                                    | Disadvantages
Chaining     | Unlimited number of elements; unlimited number of collisions | Overhead of multiple linked lists

Introduction to Open Addressing
All items are stored in the hash table itself. In addition to the cell data (if any), each cell keeps one of three states: EMPTY, OCCUPIED, or DELETED.

While inserting, if a collision occurs, alternative cells are tried until an empty cell is found.

Deletion (lazy deletion): when a key is deleted, the slot is marked DELETED rather than EMPTY; otherwise subsequent searches for keys that hash at the deleted cell would stop there and be unsuccessful.

Probe sequence: a probe sequence is the sequence of array indexes that is followed in searching for an empty cell during an insertion, or in searching for a key during find or delete operations. The most common probe sequences are of the form

    h_i(key) = [h(key) + c(i)] % n,   for i = 0, 1, ..., n-1,

where h is a hash function and n is the size of the hash table. The function c(i) is required to have the following two properties:

Property 1: c(0) = 0.
Property 2: the set of values {c(0) % n, c(1) % n, c(2) % n, ..., c(n-1) % n} must be a permutation of {0, 1, 2, ..., n-1}; that is, it must contain every integer between 0 and n-1 inclusive.

Introduction to Open Addressing (cont'd)
The function c(i) is used to resolve collisions. To insert item r, we examine array location h_0(r) = h(r). If there is a collision, array locations h_1(r), h_2(r), ..., h_(n-1)(r) are examined until an empty slot is found. Similarly, to find item r, we examine the same sequence of locations in the same order.

Note: for a given hash function h(key), the only difference between the open addressing collision resolution techniques (linear probing, quadratic probing, and double hashing) is the definition of the function c(i). Common definitions of c(i) are:

Collision resolution technique | c(i)
Linear probing                 | i
Quadratic probing              | +/- i^2
Double hashing                 | i * h_p(key)

where h_p(key) is another hash function.

Introduction to Open Addressing (cont'd)
Advantages of open addressing:
- All items are stored in the hash table itself; there is no need for another data structure (no linked lists).
- Open addressing is more efficient storage-wise.

Disadvantages of open addressing:
- The keys of the objects to be hashed must be distinct.
- Performance depends on choosing a proper table size.
- It requires a three-state (Occupied, Empty, or Deleted) flag in each cell.

Open Addressing Facts
With any open addressing method of collision resolution, performance can degrade severely as the table fills, so choosing a good table size is the most important tuning decision.

Hashing has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased: when the number of entries in the table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by rehashing. As a general rule, a default load factor of 0.75 offers a good tradeoff between time and space costs.

The load factor of the table is m/N, where m is the number of records currently in the table and N is the size of the array used to implement it. Load factors between 0.6 and 0.7 are common; load factors above 0.7 are undesirable.
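As an illustrative sketch only (the names TableStats and needs_rehash, and the 0.75 threshold constant, are assumptions added here, not from the slides), the resize test just described might look like:

    #include <stddef.h>

    #define MAX_LOAD_FACTOR 0.75   /* the default discussed above */

    typedef struct {
        size_t capacity;   /* N: the number of buckets */
        size_t count;      /* m: the records currently stored */
    } TableStats;

    /* Returns 1 when the next insertion should first trigger a rehash,
       i.e. when m would exceed load factor * capacity. */
    int needs_rehash( const TableStats *t ) {
        return (double)(t->count + 1) > MAX_LOAD_FACTOR * (double)t->capacity;
    }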

Open Addressing: Linear Probing (cont'd)
Example: Perform the operations given below, in the given order, on an initially empty hash table of size 13 using linear probing with c(i) = i and the hash function h(key) = key % 13:

insert(18), insert(26), insert(35), insert(9), find(15), find(48), delete(35), delete(40), find(9), insert(64), insert(47), find(35)

The required probe sequences are given by:
h_i(key) = (h(key) + i) % 13,   i = 0, 1, 2, ..., 12

Linear Probing (cont'd)
The table after all the operations (E = Empty, O = Occupied, D = Deleted):

Index | Status | Value
  0   |   O    |  26
  1   |   E    |
  2   |   E    |
  3   |   E    |
  4   |   E    |
  5   |   O    |  18
  6   |   E    |
  7   |   E    |
  8   |   O    |  47
  9   |   D    |  35
 10   |   O    |   9
 11   |   E    |
 12   |   O    |  64
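A compact C sketch of linear probing with lazy deletion, under the same h(key) = key % 13 (names such as lp_insert and lp_find are illustrative, not from the slides):

    #define N 13
    enum { EMPTY, OCCUPIED, DELETED };

    int status[N];                       /* all EMPTY initially */
    int value[N];

    int h( int key ) { return key % N; }

    /* Probe h(key), h(key)+1, ... until an insertable slot is found. */
    void lp_insert( int key ) {
        for ( int i = 0; i < N; i++ ) {
            int j = (h( key ) + i) % N;      /* h_i(key) with c(i) = i */
            if ( status[j] != OCCUPIED ) {   /* EMPTY or DELETED slots are reusable */
                status[j] = OCCUPIED;
                value[j]  = key;
                return;
            }
        }
    }

    /* Returns the index of key, or -1. Searching must skip over
       DELETED cells but may stop at the first EMPTY one. */
    int lp_find( int key ) {
        for ( int i = 0; i < N; i++ ) {
            int j = (h( key ) + i) % N;
            if ( status[j] == EMPTY ) return -1;
            if ( status[j] == OCCUPIED && value[j] == key ) return j;
        }
        return -1;
    }

    void lp_delete( int key ) {
        int j = lp_find( key );
        if ( j >= 0 ) status[j] = DELETED;   /* lazy deletion */
    }

Running the operation sequence from the example against this sketch yields exactly the table shown above, with slot 9 left in the DELETED state after delete(35).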

Disadvantage of Linear Probing: Primary Clustering
Linear probing is subject to a primary clustering phenomenon: elements tend to cluster around the table locations that they originally hash to, and primary clusters can combine to form larger clusters. This leads to long search sequences and hence deterioration in hash table efficiency.

Example of a primary cluster: insert the keys 18, 41, 22, 44, 59, 32, 31, 73, in this order, into an originally empty hash table of size 13, using the hash function h(key) = key % 13 and c(i) = i:

h(18) = 5
h(41) = 2
h(22) = 9
h(44) = 5 + 1 = 6   (slot 5 occupied)
h(59) = 7
h(32) = 6 + 2 = 8   (slots 6 and 7 occupied)
h(31) = 5 + 5 = 10  (slots 5 through 9 occupied)
h(73) = 8 + 3 = 11  (slots 8 through 10 occupied)

The keys pile up into one long run of occupied slots around index 5.

HEAPS

Heaps
A heap is a special kind of rooted tree that can be implemented efficiently in an array without any explicit pointers. It can be used for heapsort and for the efficient representation of certain dynamic priority lists, such as the event list in a simulation or the list of tasks to be scheduled by an operating system. Structurally, a heap is an essentially complete binary tree.

Heaps
The figure illustrates an essentially complete binary tree containing 10 nodes. The five internal nodes occupy level 3 (the root), level 2, and the left side of level 1; the five leaves fill the right side of level 1 and then continue at the left of level 0. If an essentially complete binary tree has height k, then there is one node (the root) on level k, there are two nodes on level k-1, and so on; there are 2^(k-1) nodes on level 1, and at least 1 and not more than 2^k on level 0.

A heap is an essentially complete binary tree, each of whose nodes includes an element of information called the value of the node, and which has the property that the value of each internal node is greater than or equal to the values of its children.
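A short derivation (added here for clarity; not on the original slide): levels k down to 1 of such a tree together hold exactly 1 + 2 + ... + 2^(k-1) = 2^k - 1 nodes, and level 0 holds between 1 and 2^k more, so the number of nodes n satisfies

    2^k <= n <= 2^(k+1) - 1.

Hence the height of a heap containing n nodes is k = floor(lg n), which is why the heap operations described below take O(log n) time.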

[Figure: an essentially complete binary tree with 10 nodes, stored in an array T[1..10]; the root is T[1], and the children of T[i] are T[2i] and T[2i+1]. The nodes appear level by level: T[1]; T[2], T[3]; T[4] through T[7]; T[8], T[9], T[10].]

A heap
The figure shows an example of a heap with 10 nodes.

Heaps
Now we have marked each node with its value; the same heap can be represented by an array. The crucial characteristic of this data structure is that the heap property can be restored efficiently if the value of a node is modified.

If the value of a node increases to the extent that it becomes greater than the value of its parent, it suffices to exchange these two values and then to continue the same process upwards in the tree, if necessary, until the heap property is restored. The modified value is percolated up to its new position in the heap; this operation is often called sifting up.

For example, if the value 1 in the figure is modified so that it becomes 8, we can restore the heap property by exchanging the 8 with its parent 4, and then exchanging it again with its new parent 7.

[Figure: the heap, after percolating 8 up to its place]

Heaps
If, on the contrary, the value of a node is decreased so that it becomes less than the value of at least one of its children, it suffices to exchange the modified value with the larger of the values of the children, and then to continue this process downwards in the tree, if necessary, until the heap property is restored. The modified value has been sifted down to its new position.

[Figure: the heap, after sifting 3 (originally 10) down to its place]

Heaps
The following procedures describe more formally the basic processes for manipulating a heap.

Procedure alter-heap(T[1..n], i, v)
{T[1..n] is a heap. The value of T[i] is set to v and the heap property is re-established. Suppose that 1 <= i <= n.}
    x <- T[i]
    T[i] <- v
    if v < x then sift-down(T, i)
             else percolate(T, i)

Procedure sift-down(T[1..n], i)
{This procedure sifts node i down so as to re-establish the heap property in T[1..n]. Suppose that T would be a heap if T[i] were sufficiently large and that 1 <= i <= n.}
    k <- i
    repeat
        j <- k
        {find the larger child of node j}
        if 2j <= n and T[2j] > T[k] then k <- 2j
        if 2j < n and T[2j+1] > T[k] then k <- 2j+1
        exchange T[j] and T[k]
        {if j = k, then the node has arrived at its final position}
    until j = k

Procedure percolate(T[1..n], i)
{This procedure percolates node i up so as to re-establish the heap property in T[1..n]. Suppose that T would be a heap if T[i] were sufficiently small and that 1 <= i <= n. The parameter n is not used here.}
    k <- i
    repeat
        j <- k
        if j > 1 and T[j / 2] < T[k] then k <- j / 2
        exchange T[j] and T[k]
        {if j = k, then the node has arrived at its final position}
    until j = k
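A direct C translation of the two procedures above (a sketch added here, not from the slides; to keep the 2j and j/2 index arithmetic identical to the pseudocode, T[0] is left unused and the heap occupies T[1..n]):

    void exchange( int T[], int a, int b ) {
        int tmp = T[a]; T[a] = T[b]; T[b] = tmp;
    }

    void sift_down( int T[], int n, int i ) {
        int k = i, j;
        do {
            j = k;
            /* find the larger child of node j */
            if ( 2*j <= n && T[2*j] > T[k] )    k = 2*j;
            if ( 2*j <  n && T[2*j+1] > T[k] )  k = 2*j + 1;
            exchange( T, j, k );   /* no-op when j == k */
        } while ( j != k );
    }

    void percolate( int T[], int i ) {
        int k = i, j;
        do {
            j = k;
            if ( j > 1 && T[j/2] < T[k] )       k = j / 2;
            exchange( T, j, k );
        } while ( j != k );
    }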

Heaps
The heap is an ideal data structure for finding the largest element of a set, removing it, adding a new node, or modifying a node. These are exactly the operations we need to implement dynamic priority lists efficiently: the value of a node gives the priority of the corresponding event, the event with the highest priority is always found at the root of the heap, and the priority of an event can be changed dynamically at any time. This is particularly useful in computer simulations and in the design of schedulers for an operating system. Some typical procedures are illustrated below.

Function find-max(T[1..n])
{Returns the largest element of the heap T[1..n]}
    return T[1]

Procedure delete-max(T[1..n])
{Removes the largest element of the heap T[1..n] and restores the heap property in T[1..n-1]}
    T[1] <- T[n]
    sift-down(T[1..n-1], 1)

Procedure insert-node(T[1..n], v)
{Adds an element whose value is v to the heap T[1..n] and restores the heap property in T[1..n+1]}
    T[n+1] <- v
    percolate(T[1..n+1], n+1)
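Continuing the same C sketch (the heap occupies T[1..*n]; passing the size by pointer is an assumption added here so that it can shrink and grow):

    int find_max( const int T[] ) {
        return T[1];                  /* the largest element sits at the root */
    }

    void delete_max( int T[], int *n ) {
        T[1] = T[*n];                 /* move the last element to the root */
        (*n)--;
        sift_down( T, *n, 1 );        /* restore the heap in T[1..n-1] */
    }

    void insert_node( int T[], int *n, int v ) {
        (*n)++;
        T[*n] = v;                    /* place the new value in the first free slot */
        percolate( T, *n );           /* and percolate it up */
    }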

Heaps
There exists a cleverer algorithm for making a heap. Suppose, for example, that our starting point is the array represented by the tree in the figure.

[Figure: the starting situation]

Heaps
We first make each of the subtrees whose roots are at level 1 into a heap; this is done by sifting down these roots, as illustrated in the figure.

[Figure: the level 1 subtrees are made into heaps]

Heaps
This figure shows the process for the left subtree; the other subtree at level 2 is already a heap. This results in an essentially complete binary tree corresponding to the array shown in the figure.

[Figure: one level 2 subtree is made into a heap (the other already is a heap)]

It only remains to sift down its root to obtain the desired heap. This process thus goes as follows:

How to Sort a Heap
Example: construct a heap from the array A = (16, 4, 10, 14, 7, 9, 3, 2, 8, 1), maintaining the heap property at each step.

[Figure: the initial configuration]

How to Sort a Heap (cont'd)
After make-heap, the array is A = (16, 14, 10, 8, 7, 9, 3, 2, 4, 1).

Procedure heapsort(T[1..n])
    make-heap(T)
    for i <- n downto 2 do
        exchange T[1] and T[i]
        sift-down(T[1..i-1], 1)
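A C sketch of make-heap and the loop above, reusing the sift_down and exchange defined earlier (the trace that follows applies exactly these steps to A):

    /* make-heap: every leaf is already a heap, so it suffices to
       sift down the internal nodes n/2, n/2 - 1, ..., 1 in turn. */
    void make_heap( int T[], int n ) {
        for ( int i = n / 2; i >= 1; i-- )
            sift_down( T, n, i );
    }

    void heapsort( int T[], int n ) {
        make_heap( T, n );
        for ( int i = n; i >= 2; i-- ) {
            exchange( T, 1, i );        /* move the current maximum to position i */
            sift_down( T, i - 1, 1 );   /* restore the heap property in T[1..i-1] */
        }
    }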

i = 10: exchange T[1] and T[10], then sift-down(T[1..9], 1)
i = 9:  exchange T[1] and T[9],  then sift-down(T[1..8], 1)
i = 8:  exchange T[1] and T[8],  then sift-down(T[1..7], 1)
i = 7:  exchange T[1] and T[7],  then sift-down(T[1..6], 1)
i = 6:  exchange T[1] and T[6],  then sift-down(T[1..5], 1)
i = 5:  exchange T[1] and T[5],  then sift-down(T[1..4], 1)

i = 4:  exchange T[1] and T[4],  then sift-down(T[1..3], 1)
i = 3:  exchange T[1] and T[3],  then sift-down(T[1..2], 1)
i = 2:  exchange T[1] and T[2],  then sift-down(T[1..1], 1)
End of sorting.

[Figure: the sorted array A = (1, 2, 3, 4, 7, 8, 9, 10, 14, 16)]

BINOMIAL TREES

[Figure: the binomial trees B_0 to B_4. B_0 is a single node; B_k is formed by linking two copies of B_(k-1), making the root of one the leftmost child of the root of the other, so B_k has 2^k nodes.]

[Figure: a max binomial heap containing 11 items. Every parent node is greater than or equal to its children. Since 11 = 1011 in binary, the heap consists of the trees B_0, B_1, and B_3.]

[Figure: linking two B_2's to make a B_3]
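The link step in the figure is simple to express in code. A sketch (the node layout with parent, child, and sibling pointers plus a degree field is a common representation, assumed here rather than taken from the slides):

    typedef struct BinomialNode {
        int key;
        int degree;                      /* k for the root of a B_k */
        struct BinomialNode *parent;
        struct BinomialNode *child;      /* leftmost child */
        struct BinomialNode *sibling;    /* next root, or next child */
    } BinomialNode;

    /* Link two trees of equal degree k-1 into one B_k: for a max-heap,
       the root with the smaller key becomes the leftmost child of the other. */
    BinomialNode *link_max( BinomialNode *a, BinomialNode *b ) {
        if ( a->key < b->key ) { BinomialNode *t = a; a = b; b = t; }
        b->parent  = a;
        b->sibling = a->child;
        a->child   = b;
        a->degree++;
        return a;                        /* root of the new B_k */
    }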

[Figure: merging two binomial heaps]

[Figure: BINOMIAL-HEAP-MERGE applied to two heaps H_1 and H_2. Note: check the heap type (max/min) before starting.]
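The first phase of the merge just combines the two root lists into one list sorted by increasing degree, exactly like the merge step of merge sort; roots of equal degree are linked together (with link_max above) in a later pass. A sketch using the same node type:

    #include <stddef.h>

    BinomialNode *merge_root_lists( BinomialNode *h1, BinomialNode *h2 ) {
        BinomialNode head, *tail = &head;   /* dummy head simplifies the loop */
        while ( h1 != NULL && h2 != NULL ) {
            if ( h1->degree <= h2->degree ) { tail->sibling = h1; h1 = h1->sibling; }
            else                            { tail->sibling = h2; h2 = h2->sibling; }
            tail = tail->sibling;
        }
        tail->sibling = (h1 != NULL) ? h1 : h2;  /* append whichever list remains */
        return head.sibling;
    }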

[Figures: successive stages of the merged heap H as equal-degree roots are linked]

[Figure: (a) the node with value 1 is to be deleted from H; (b) removing that root separates H into two heaps]

[Figure: the node with value 1 has been deleted, leaving two heaps H and H'; the two heaps are then merged back together]

[Figure: the value of node y is decreased from 26 to 7]

[Figures: successive states of H as the heap property is restored after the change to y]