Data Structures and Algorithms Lecture (Searching) Instructor: Quratulain Date: 4 and 8 December, 2009 Faculty of Computer Science, IBA.

Basic search terminologies A table or file is a group of elements, each of which is called a record. Each record is associated with a key that distinguishes it from the other records. The key may be internal (embedded in the record itself) or external (stored outside the table). A key that is unique for every record is called the primary key. A table of records in which a key is used for retrieval is often called a search table or dictionary.

Basic search terminologies A table is organized to suit a specific search technique. The table may be held completely in memory, completely in auxiliary storage, or divided between the two. In this course we concentrate on internal search (tables held completely in memory).

Dictionary as an Abstract Data Type A dictionary is an ordered or unordered list of key-element pairs, where keys are used to locate elements in the list. Example: ◦ a data structure that stores student information can be viewed as a dictionary in which the Student ID serves as the key identifying each Student object. Operations on a dictionary: ◦ Size(), empty(), finditem(key), removeitem(key), insertitem(key, element) Note: Java has a built-in abstract class java.util.Dictionary (now obsolete; new Java code implements the java.util.Map interface instead).
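
A minimal sketch of the dictionary ADT, assuming an unordered Python list of (key, element) pairs as the storage. The class name ListDictionary, the snake_case method names and the sample student data are hypothetical stand-ins for the operations listed above, not a prescribed interface.

```python
from typing import Any, List, Optional, Tuple


class ListDictionary:
    """Unordered dictionary ADT stored as a list of (key, element) pairs."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[Any, Any]] = []

    def size(self) -> int:
        return len(self._pairs)

    def empty(self) -> bool:
        return len(self._pairs) == 0

    def find_item(self, key: Any) -> Optional[Any]:
        # Scan the pairs; return the element stored under key, or None if absent.
        for k, element in self._pairs:
            if k == key:
                return element
        return None

    def insert_item(self, key: Any, element: Any) -> None:
        # Keys act as primary keys: an existing key has its element replaced.
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, element)
                return
        self._pairs.append((key, element))

    def remove_item(self, key: Any) -> Optional[Any]:
        # Delete the pair with this key and return its element (None if absent).
        for i, (k, element) in enumerate(self._pairs):
            if k == key:
                del self._pairs[i]
                return element
        return None


students = ListDictionary()
students.insert_item(1001, "Ayesha")
students.insert_item(1002, "Bilal")
print(students.size(), students.find_item(1002))  # 2 Bilal
```

Because the pairs are unordered, every operation other than size() and empty() scans the list, which is exactly the cost the search techniques below try to reduce.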

Search Techniques for Dictionary ADT
◦ Sequential search
◦ Indexed sequential search (book handout)
◦ Binary search
◦ Hashing
◦ Tree searching: binary search trees, multiway search trees, B-trees, AVL trees, red-black trees

Sequential Search The simplest search algorithm for a dictionary; it can be implemented with either an array or a linked list. The algorithm examines each key in turn; upon finding one that matches the search argument, it returns that record's index or value. In the worst case the number of comparisons is O(n).
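
A minimal sketch of sequential search over a Python list of integer keys, returning the index together with the number of comparisons made so the O(n) worst case is visible. The function name and sample keys are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def sequential_search(keys: Sequence[int], target: int) -> Tuple[Optional[int], int]:
    """Return (index of target or None, number of key comparisons made)."""
    comparisons = 0
    for index, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return index, comparisons
    return None, comparisons


keys = [53, 41, 91, 75, 6]
print(sequential_search(keys, 75))   # (3, 4)
print(sequential_search(keys, 100))  # (None, 5): worst case, all n keys compared
```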

Indexed Sequential Search This technique increases search efficiency for a sorted file but needs extra space to store an index. The index can be used with both arrays and linked lists.
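
A minimal sketch of indexed sequential search, assuming a sorted Python list and an index that records every step-th key with its position; the names build_index and indexed_sequential_search and the block size are hypothetical choices for illustration. The search first scans the small index, then sequentially searches only one block of the file.

```python
from typing import List, Optional, Tuple


def build_index(keys: List[int], step: int) -> List[Tuple[int, int]]:
    """Index entries (key, position) for every step-th key of a sorted list."""
    return [(keys[pos], pos) for pos in range(0, len(keys), step)]


def indexed_sequential_search(keys: List[int], index: List[Tuple[int, int]],
                              step: int, target: int) -> Optional[int]:
    # Step 1: scan the index for the last entry whose key is <= target.
    start = None
    for key, pos in index:
        if key <= target:
            start = pos
        else:
            break
    if start is None:
        return None  # target is smaller than every key in the file
    # Step 2: sequential search inside that one block of at most `step` keys.
    for pos in range(start, min(start + step, len(keys))):
        if keys[pos] == target:
            return pos
    return None


keys = [3, 8, 15, 21, 27, 34, 40, 46, 52, 60]
idx = build_index(keys, step=3)   # [(3, 0), (21, 3), (40, 6), (60, 9)]
print(indexed_sequential_search(keys, idx, 3, 46))  # 7
print(indexed_sequential_search(keys, idx, 3, 47))  # None
```

The extra space is the index list itself; a larger step makes the index smaller but each block scan longer.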

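Binary Search An efficient search technique for sorted data. It requires a sorted array, because it must jump directly to the middle element; it cannot be implemented efficiently with a linked list. The worst-case complexity is O(log n).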
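
A minimal sketch of iterative binary search on a sorted Python list (the function name and sample keys are illustrative). Each comparison halves the remaining range, which gives the O(log n) worst case; the direct access keys[mid] is what a linked list cannot provide cheaply.

```python
from typing import List, Optional


def binary_search(keys: List[int], target: int) -> Optional[int]:
    """Return the index of target in the sorted list keys, or None."""
    low, high = 0, len(keys) - 1
    while low <= high:
        mid = (low + high) // 2      # middle of the remaining range
        if keys[mid] == target:
            return mid
        if keys[mid] < target:
            low = mid + 1            # target can only be in the right half
        else:
            high = mid - 1           # target can only be in the left half
    return None


keys = [3, 8, 15, 21, 27, 34, 40, 46, 52, 60]
print(binary_search(keys, 34))  # 5
print(binary_search(keys, 35))  # None
```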

Hashing Hashing is a method for directly referencing an element in a table by applying arithmetic transformations that convert keys into table addresses. This is carried out in two steps: Step 1: computing the so-called hash function H: K -> A. Step 2: collision resolution, which handles cases where two or more different keys hash to the same table address.

Hashing A hash function maps each key to a location in memory, which minimizes the number of comparisons needed to insert or find an item. A key's location does not depend on the other elements and does not change after insertion, unlike in a sorted list. A good hash function should be easy to compute, minimize hash collisions, and distribute records uniformly over the table. With such a hash function, the dictionary operations can be implemented in O(1) average time.

Hashing Suppose a company has 100 inventory records. If each record had a two-digit code, an array with two-digit indices (00 to 99) would be sufficient. But if the 100 records have 5-digit codes, direct indexing needs an array with 5-digit indices (100,000 slots) to store only 100 records, which wastes memory. Hashing instead maps key values to hash table addresses: keys -> hash table address. This mapping applies to find, insert, and remove. Usually: integers -> {0, 1, 2, …, Hsize-1}. Typical example: f(n) = n mod Hsize. Non-numeric keys are first converted to numbers, for example: ◦ the sum of the ASCII values of the characters ◦ the first three characters
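
A minimal sketch of the hash functions named above: f(n) = n mod Hsize for integer keys, and two ways to turn a string into a number first. The table size 17 is only an example, and reading "first three characters" as "sum the character codes of the first three characters" is an assumption; the function names are hypothetical.

```python
HSIZE = 17  # example table size (assumption)


def hash_int(n: int, hsize: int = HSIZE) -> int:
    """Typical integer hash: f(n) = n mod Hsize, an index in 0..Hsize-1."""
    return n % hsize


def ascii_sum(key: str) -> int:
    """Convert a string key to a number by summing its character codes."""
    return sum(ord(ch) for ch in key)


def first_three(key: str) -> int:
    """Convert a string key to a number using only its first three characters."""
    return ascii_sum(key[:3])


print(hash_int(53))                           # 2
print(hash_int(ascii_sum("CAT")))             # 12, since 216 % 17 = 12
print(ascii_sum("CAT") == ascii_sum("ACT"))   # True: anagrams always collide
```

The last line shows why the ASCII-sum conversion is a weak choice on its own: any two keys made of the same characters receive the same number, and therefore the same table address.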

Hash collision When h(k1) = h(k2) for two different keys k1 and k2, both records cannot be placed in the same location; this is called a hash collision or hash clash. Two methods deal with clashes: ◦ Rehashing (apply a secondary hash function repeatedly until a free location is found) ◦ Chaining (build a linked list of all items with the same hash value)

Resolve hash clashes using open addressing (rehashing) The simplest method of resolving clashes is to place the record in the next available position in the array that is still open. This rehashing technique is called linear probing. Rehash functions: ◦ rh = (h(k) + 1) % m [covers all indices] ◦ rh = (h(k) + 2) % m [when m is even, a key with an even hash value probes only even indices] The probing loop executes forever if the hash table is full, so maintain a count of stored records and stop when the table is full.
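
A minimal sketch of linear probing with the record count described above, assuming non-negative integer keys and h(k) = k % m; the class and method names are hypothetical. The count is what stops insert from probing forever on a full table, and search gives up after m probes.

```python
from typing import List, Optional


class LinearProbingTable:
    """Open addressing with linear probing: the rehash is rh(i) = (i + 1) % m."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[Optional[int]] = [None] * m
        self.count = 0  # number of stored keys; used to detect a full table

    def insert(self, key: int) -> int:
        """Store key and return its slot; raise OverflowError if the table is full."""
        found = self.search(key)
        if found is not None:
            return found                   # key already present
        if self.count == self.m:
            raise OverflowError("hash table is full")  # avoids probing forever
        i = key % self.m                   # home address h(k)
        while self.slots[i] is not None:   # a free slot exists, so this loop ends
            i = (i + 1) % self.m
        self.slots[i] = key
        self.count += 1
        return i

    def search(self, key: int) -> Optional[int]:
        """Return the slot holding key, or None; never probes more than m slots."""
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None:
                return None                # an empty slot ends the probe sequence
            if self.slots[i] == key:
                return i
            i = (i + 1) % self.m
        return None


table = LinearProbingTable(7)
print(table.insert(10), table.insert(17))  # 3 4: both keys hash to 3
print(table.search(17))                    # 4
```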

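Resolve hash clashes using Open Address Example with H(k) = k mod 17, so every index lies between 0 and 16:
Record   k    H(k)
A        53   2
B        41   7
C        91   6
D        75   7
E        …
F        6    6
G        …
H        67   16
I        88   3
J        36   2
K        …
With linear probing, A ends up at 2, I at 3, J at 4, C at 6, B at 7, D at 8, F at 9 and H at 16; E, G and K fill further slots by the same rule.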
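
A short sketch that replays the example with linear probing. It uses only the records whose key and H(k) agree with k mod 17 (A, B, C, D, F, H, I and J) and assumes they are inserted in alphabetical order; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def place_with_linear_probing(records: List[Tuple[str, int]], m: int) -> Dict[str, int]:
    """Insert (name, key) records in order with H(k) = k mod m and linear probing."""
    if len(records) > m:
        raise ValueError("more records than table slots")
    table: List[Optional[str]] = [None] * m
    position: Dict[str, int] = {}
    for name, key in records:
        i = key % m                      # home address H(k)
        while table[i] is not None:      # occupied: try the next slot
            i = (i + 1) % m
        table[i] = name
        position[name] = i
    return position


records = [("A", 53), ("B", 41), ("C", 91), ("D", 75),
           ("F", 6), ("H", 67), ("I", 88), ("J", 36)]
print(place_with_linear_probing(records, 17))
# {'A': 2, 'B': 7, 'C': 6, 'D': 8, 'F': 9, 'H': 16, 'I': 3, 'J': 4}
```

D, F and J are the records that collide: D and J move one and two slots past their home addresses, while F, hashing to 6, has to walk past C, B and D before reaching slot 9.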

Hashing A good hash function is one that minimizes collisions. Because hashing gives direct access to a record's location, it is often preferable to other search techniques. It is more efficient to initialize the hash table with extra space, which reduces the number of rehashes.

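Resolve hash clashes by open addressing ◦ Store all elements in the table itself. ◦ If a cell is occupied, try another cell. ◦ Linear probing tries the cells H(k), (H(k) + 1) mod m, (H(k) + 2) mod m, … If the rehash function rh(i) = (i + 2) % m is used with an even table size m, any key that hashes to an even integer is rehashed only to successive even integers, so it may find no free location even when odd locations are empty; the same holds for keys that hash to odd integers. With m = 17, rh(i) = (i + 2) % 17 eventually visits every slot, because a rehash step c reaches all m slots exactly when c and m have no common factor.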
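
A small sketch that lists the distinct slots visited by rh(i) = (i + step) % m from a given start, to check the coverage claim for an even and an odd table size. The function name is hypothetical; math.gcd is the Python standard-library function.

```python
from math import gcd
from typing import List


def probe_sequence(start: int, step: int, m: int) -> List[int]:
    """Distinct slots visited by rh(i) = (i + step) % m, starting from slot start."""
    seen: List[int] = []
    i = start
    while i not in seen:
        seen.append(i)
        i = (i + step) % m
    return seen


print(probe_sequence(4, 2, 16))        # [4, 6, 8, 10, 12, 14, 0, 2]: even slots only
print(len(probe_sequence(4, 2, 17)))   # 17: every slot is reached
print(gcd(2, 16), gcd(2, 17))          # 2 1
```

In general the sequence visits m / gcd(step, m) slots, which is why a prime table size makes any step from 1 to m - 1 safe.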

Primary Clustering The phenomenon in which two keys that hash to different values compete with each other for the same locations in successive rehashes is called primary clustering. Example: with linear probing, if positions 4, 5 and 6 are occupied, any new record whose key hashes to 4, 5, 6 or 7 will be placed in position 7, whereas an empty position that does not follow a cluster is filled only by a record whose key hashes directly to it.

Primary Clustering One way to eliminate primary clustering is to let the rehash depend on the number of times the hash function has been applied: rh(i, j) is the rehash of value i when the key is being rehashed for the jth time, e.g. rh(i, j) = (i + j) % tablesize. Another approach is to use a random permutation of the numbers between 1 and tablesize as the successive probe offsets, so that no two rehashes of the same key conflict. These approaches eliminate primary clustering but not secondary clustering.
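
A small sketch of the probe path produced by applying rh(i, j) = (i + j) % m repeatedly, with j counting the rehash attempts; the function name and the table size 17 are illustrative assumptions. Because the step grows with j, the offsets from the home slot are 1, 3, 6, 10, …, so paths that start at different home slots split apart instead of merging as they do under linear probing.

```python
from typing import List


def rehash_path(home: int, m: int, attempts: int) -> List[int]:
    """Slots probed when the j-th rehash is rh(i, j) = (i + j) % m."""
    path = [home]
    i = home
    for j in range(1, attempts + 1):
        i = (i + j) % m   # the step grows with the attempt number j
        path.append(i)
    return path


print(rehash_path(4, 17, 4))  # [4, 5, 7, 10, 14]
print(rehash_path(5, 17, 4))  # [5, 6, 8, 11, 15]
```

A key with home slot 4 passes through slot 5 and then jumps to 7, while a key with home slot 5 continues to 6, so the two no longer compete along one run. Two keys with the same home slot, however, still follow identical paths, which is the secondary clustering described next.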

Secondary clustering Secondary clustering is the phenomenon in which different keys that hash to the same value follow the same rehash path. To eliminate all clustering, use a technique called double hashing.

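Double hashing Double hashing uses two hash functions, h1(key) and h2(key). h1 is the primary hash function; if the position it gives is occupied, the rehash rh(i, key) = (i + h2(key)) % tablesize is applied repeatedly until an empty location is found. The rehash function therefore uses the original key as well as the current hash value. h2(key) must never be 0, otherwise the probe would not move, and it should share no common factor with tablesize so that every slot can be reached. As long as h2(key1) does not equal h2(key2), records do not compete for the same set of locations.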
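
A minimal sketch of double hashing, assuming a prime table size M = 17, h1(key) = key % M and h2(key) = 1 + key % (M - 1). These particular functions are a common textbook choice rather than ones given above: h2 is never 0 and always lies between 1 and M - 1, so with a prime M every probe sequence reaches all slots.

```python
from typing import List, Optional

M = 17  # prime table size (assumption)


def h1(key: int) -> int:
    return key % M


def h2(key: int) -> int:
    # Secondary hash: in 1..M-1, never 0, so each rehash actually moves.
    return 1 + key % (M - 1)


def double_hash_insert(table: List[Optional[int]], key: int) -> int:
    i = h1(key)
    for _ in range(M):
        if table[i] is None:
            table[i] = key
            return i
        i = (i + h2(key)) % M   # rh(i, key) = (i + h2(key)) % tablesize
    raise OverflowError("hash table is full")


table: List[Optional[int]] = [None] * M
for k in (36, 53, 2):          # all three keys have h1 = 2
    print(k, double_hash_insert(table, k))
# 36 2
# 53 8
# 2 5
```

Although 36, 53 and 2 share the home slot 2, their steps h2 are 5, 6 and 3, so the colliding keys follow different rehash paths (53 goes to 8, 2 goes to 5) instead of the same one.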

Analysis The efficiency of a hashing method is measured by the average number of table positions that must be examined when searching for a particular item. Let n be the number of items currently in the hash table and tablesize the number of positions in the table. For a large table, it has been proved that the average number of probes required for a successful search can be expressed in terms of the ratio n/tablesize.

Analysis The load factor x is defined as the ratio between the number of occupied slots and the table size: x = n/tablesize. If the fraction of locations that are occupied equals the load factor x, we can approximate the number of probes (checks) needed for a successful search under linear rehashing. When the load factor reaches about 0.8 (the table is 80% full), create a hash table of double the size and reinsert the records, which keeps the number of rehashes small.
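
For reference, the widely cited approximations for linear probing come from Knuth's analysis, which assumes a large table and uniformly distributed hash values. S(x) and U(x) are names introduced here for the expected number of probes in a successful and an unsuccessful search:

```latex
x = \frac{n}{\text{tablesize}}, \qquad
S(x) \approx \frac{1}{2}\left(1 + \frac{1}{1 - x}\right), \qquad
U(x) \approx \frac{1}{2}\left(1 + \frac{1}{(1 - x)^{2}}\right)
```

For example, at x = 0.5 a successful search needs about 1.5 probes and an unsuccessful one about 2.5; at x = 0.8 these grow to about 3 and 13. The extra probes beyond the first, S(x) - 1 = x / (2(1 - x)), grow in proportion to x / (1 - x).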

Analysis Worst case: all keys hash to the same slot, so with open addressing insert, find and delete each take O(N) probes. In linear probing the load factor can be at most 1. The performance of hash tables based on open addressing is very sensitive to the table's load factor: if the load factor exceeds a threshold of about 0.7, the table's speed degrades drastically. The expected length of a probe sequence grows roughly in proportion to loadFactor / (1 - loadFactor). In the extreme case, as loadFactor approaches 1, the length of the probe sequence grows without bound; in practice this means there are no free slots left in the table and the algorithm can never find a place to insert a new element. Hence this kind of hash table should support dynamic resizing in order to stay efficient.
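
A minimal sketch of dynamic resizing for a linear-probing table, assuming the 0.7 threshold mentioned above and an initial size of 8 (both illustrative); the class name is hypothetical. When the next insert would push the load factor past the limit, the table doubles and every key is reinserted, because each key's home address k % m changes with m.

```python
from typing import List, Optional


class ResizingTable:
    """Linear-probing table that doubles its size when the load factor passes a limit."""

    def __init__(self, m: int = 8, max_load: float = 0.7) -> None:
        self.slots: List[Optional[int]] = [None] * m
        self.n = 0
        self.max_load = max_load

    def load_factor(self) -> float:
        return self.n / len(self.slots)

    def _place(self, key: int) -> bool:
        # Linear probing; returns True if a new key was stored.
        m = len(self.slots)
        i = key % m
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return False             # key already present
            i = (i + 1) % m
        self.slots[i] = key
        return True

    def insert(self, key: int) -> None:
        if (self.n + 1) / len(self.slots) > self.max_load:
            old = [k for k in self.slots if k is not None]
            self.slots = [None] * (2 * len(self.slots))   # double the table
            for k in old:                                  # reinsert every key
                self._place(k)
        if self._place(key):
            self.n += 1


t = ResizingTable()
for k in range(20):
    t.insert(k)
print(len(t.slots), t.n, t.load_factor())  # 32 20 0.625
```

Doubling keeps the total cost of all reinsertions proportional to the number of inserts, so the average insert stays O(1) while the load factor never exceeds the limit.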

Chaining One simple scheme is to chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and does not require a priori knowledge of how many elements the collection will contain. The trade-off is the same as with linked-list versus array implementations of collections.
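
A minimal sketch of separate chaining, assuming Python lists stand in for the linked lists and Python's built-in hash() reduced modulo the number of slots serves as the hash function; the class and method names are hypothetical. Storing more items than there are slots is allowed; it only makes the chains longer.

```python
from typing import Any, List, Optional, Tuple


class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m: int = 7) -> None:
        self.buckets: List[List[Tuple[Any, Any]]] = [[] for _ in range(m)]
        self.n = 0  # number of stored items

    def _bucket(self, key: Any) -> List[Tuple[Any, Any]]:
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key: Any, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # existing key: replace its value
                return
        bucket.append((key, value))        # a collision just extends the chain
        self.n += 1

    def find(self, key: Any) -> Optional[Any]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def remove(self, key: Any) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.n -= 1
                return True
        return False

    def load_factor(self) -> float:
        return self.n / len(self.buckets)


table = ChainedHashTable(7)
for k in range(20):                 # 20 items in only 7 slots
    table.insert(k, k * k)
print(table.find(13), table.load_factor())  # 169 2.857142857142857
```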

Chaining In the worst case, all keys map to the same index, so a good hash function should distribute keys uniformly over all indices. The load factor is x = n/tablesize, where n is the number of items stored. A search then takes O(1 + x) expected time: O(1) to compute the hash and reach the slot, plus on average x comparisons in the linked list (or array list) referenced by that index.

Analysis In chaining, the load factor is n/m. ◦ Worst case: all keys hash to the same bucket. Insert takes O(1) (adding at the head of the list), but delete and find take O(N). ◦ Average case: if keys are uniformly distributed, all buckets are about the same size, and each linked list has expected size O(N/M). Thus insert is still O(1), but delete and find become O(N/M). N/M is called the load factor.

Chaining vs Open addressing
◦ Collision resolution: chaining uses an external data structure; open addressing uses the hash table itself.
◦ Memory waste: chaining has a pointer-size overhead per entry (list heads are stored in the table); open addressing has no pointer overhead, although unused slots still take space.
◦ Performance dependence on the table's load factor: chaining, directly proportional; open addressing, proportional to loadFactor / (1 - loadFactor).
◦ Can store more items than the hash table size: chaining, yes; open addressing, no, and it is recommended to keep the table's load factor below 0.7.
◦ Hash function requirements: chaining needs a uniform distribution; open addressing needs a uniform distribution that also avoids clustering.
◦ Handling removals: chaining handles removals without trouble; in open addressing, removals clog the hash table with "DELETED" entries.
◦ Implementation: chaining is simple; a correct implementation of an open-addressing hash table is quite tricky.
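
A minimal sketch of why removals in open addressing leave "DELETED" entries, assuming linear probing with h(k) = k % m and a sentinel object as the tombstone; the class name and the DELETED marker are hypothetical. Resetting a removed slot to empty would cut the probe sequence, so keys stored further along the same cluster would become unreachable.

```python
from typing import List, Optional

DELETED = object()   # tombstone marking a removed entry


class TombstoneTable:
    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[object] = [None] * m

    def search(self, key: int) -> Optional[int]:
        i = key % self.m
        for _ in range(self.m):
            slot = self.slots[i]
            if slot is None:
                return None              # a truly empty slot ends the search
            if slot is not DELETED and slot == key:
                return i
            i = (i + 1) % self.m         # skip tombstones and other keys
        return None

    def insert(self, key: int) -> int:
        found = self.search(key)
        if found is not None:
            return found                 # key already stored
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None or self.slots[i] is DELETED:
                self.slots[i] = key      # reuse empty slots and tombstones
                return i
            i = (i + 1) % self.m
        raise OverflowError("hash table is full")

    def remove(self, key: int) -> bool:
        i = self.search(key)
        if i is None:
            return False
        self.slots[i] = DELETED          # not None: keep the cluster searchable
        return True


t = TombstoneTable(7)
t.insert(3)
t.insert(10)                 # 10 % 7 == 3, so it probes on to slot 4
t.remove(3)
print(t.search(10))          # 4: still found by probing past the tombstone
```

Tombstones keep searches correct, but they also lengthen probe sequences until the table is rebuilt, which is the clogging effect noted in the comparison above.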

Hash Maps A map data structure stores (key, value) pairs. Values are retrieved by searching for the appropriate key. Maps are also sometimes called dictionaries or associative arrays. Maps can be implemented in a number of ways. One of the most common ways is to store the (key, value) pairs in a hash table. This is called a hash map.