Searching Chapter 2

Outline
Linear List Searches
   Sequential Search: the sentinel search, the probability search, the ordered list search
   Binary Search
Hashed List Searches
Collision Resolution

Linear List Searches We study searches that work with arrays. Figure 2-1

Linear List Searches There are two basic searches for arrays: the sequential search, which can be used to locate an item in any array, and the binary search, which requires an ordered list.

Linear List Searches Sequential Search The list is not ordered! We use this technique only for small arrays. We start searching at the beginning of the list and continue until we find the target: either we find it, or we reach the end of the list!

Locating data in an unordered list. Figure 2-2

Linear List Searches Sequential Search Algorithm RETURN: The algorithm must tell the calling algorithm two things: Did it find the data? If it did, what is the index (address)?

Linear List Searches Sequential Search Algorithm The search algorithm requires four parameters: the list, an index to the last element in the list, the target, and the address where the found element's index is to be stored. It returns a Boolean that indicates whether the data were found.

Sequential Search Algorithm

algorithm SeqSearch (val list <array>, val last <index>, val target <keyType>, ref locn <index>)
   Locate the target in an unordered list.
   PRE    list must contain at least one element.
          last is the index of the last element in the list.
          target contains the data to be located.
          locn is the address of an index in the calling algorithm.
   POST   if found: the matching index is stored in locn and found is TRUE.
          if not found: last is stored in locn and found is FALSE.
   RETURN found <boolean>

Sequential Search Algorithm looker = 1 loop (looker < last AND target not equal list(looker)) looker = looker + 1 locn = looker if (target equal list(looker)) found = true else found = false return found end SeqSearch Big-O(n)

Variations on Sequential Search There are three variations of the sequential search algorithm: the sentinel search, the probability search, and the ordered list search.

Sequential Search Algorithm The Sentinel Search If we can guarantee that the target will be found in the list, we can eliminate the test for the end of the list. We do this by placing a copy of the target (a sentinel) just past the last element.

algorithm SentinelSearch (val list <array>, val last <index>, val target <keyType>, ref locn <index>)
   Locate the target in an unordered list.
   PRE    list must contain an extra element at the end for the sentinel.
          last is the index of the last data element in the list.
          target contains the data to be located.
          locn is the address of an index in the calling algorithm.
   POST   if found: the matching index is stored in locn and found is TRUE.
          if not found: last is stored in locn and found is FALSE.
   RETURN found <boolean>

Sequential Search Algorithm The Sentinel Search

   list[last + 1] = target
   looker = 1
   loop (target not equal list[looker])
      looker = looker + 1
   if (looker <= last)
      found = true
      locn = looker
   else
      found = false
      locn = last
   return found
end SentinelSearch                               Big-O(n)
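Below is a hedged C++ sketch of the sentinel idea; the names and types are mine, not the textbook's. This version appends the sentinel to the vector and removes it again before returning.

#include <cstddef>
#include <vector>

// Sentinel search: plant the target one slot past the real data so the
// scanning loop needs no end-of-list test, then check whether the match
// happened before the sentinel position.
bool sentinelSearch(std::vector<int>& list, int target, std::size_t& locn)
{
    const std::size_t last = list.size();   // index of the sentinel slot
    list.push_back(target);                 // plant the sentinel
    std::size_t looker = 0;
    while (list[looker] != target)          // guaranteed to terminate
        ++looker;
    list.pop_back();                        // remove the sentinel again
    const bool found = (looker < last);
    locn = found ? looker : (last == 0 ? 0 : last - 1);
    return found;
}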

Sequential Search Algorithm The Probability Search

algorithm ProbabilitySearch (val list <array>, val last <index>, val target <keyType>, ref locn <index>)
   Locate the target in a list ordered by the probability of each element being the target: most probable first, least probable last.
   PRE    list must contain at least one element.
          last is the index of the last element in the list.
          target contains the data to be located.
          locn is the address of an index in the calling algorithm.
   POST   if found: the matching index is stored in locn, found is TRUE, and the element is moved up in priority.
          if not found: last is stored in locn and found is FALSE.
   RETURN found <boolean>

Sequential Search Algorithm The Probability Search

   looker = 1
   loop (looker < last AND target not equal list[looker])
      looker = looker + 1
   if (target equal list[looker])
      found = true
      if (looker > 1)
         temp = list[looker - 1]
         list[looker - 1] = list[looker]
         list[looker] = temp
         looker = looker - 1
   else
      found = false
   locn = looker
   return found
end ProbabilitySearch                            Big-O(n)
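A C++ sketch of the same move-toward-the-front heuristic follows; it is an illustration only, and the names are mine. The list is taken by reference because a successful search reorders it.

#include <cstddef>
#include <utility>   // std::swap
#include <vector>

// Probability search: like sequential search, but every hit is swapped one
// position toward the front, so frequently requested keys migrate forward.
bool probabilitySearch(std::vector<int>& list, int target, std::size_t& locn)
{
    if (list.empty()) { locn = 0; return false; }
    std::size_t looker = 0;
    while (looker < list.size() - 1 && list[looker] != target)
        ++looker;
    const bool found = (list[looker] == target);
    if (found && looker > 0) {
        std::swap(list[looker - 1], list[looker]);  // promote the element
        --looker;
    }
    locn = looker;
    return found;
}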

Sequential Search Algorithm The Ordered List Search When an ordered list is small, a sequential search can be more efficient than a binary search. We can stop the search loop as soon as the target becomes less than or equal to the element we are testing.

algorithm OrderedListSearch (val list <array>, val last <index>, val target <keyType>, ref locn <index>)
   Locate the target in a list ordered on the key.
   PRE    list must contain at least one element.
          last is the index of the last element in the list.
          target contains the data to be located.
          locn is the address of an index in the calling algorithm.
   POST   if found: the matching index is stored in locn and found is TRUE.
          if not found: last is stored in locn and found is FALSE.
   RETURN found <boolean>

Sequential Search Algorithm The Ordered List Search

   if (target <= list[last])
      looker = 1
      loop (target > list[looker])
         looker = looker + 1
   else
      looker = last
   if (target equal list[looker])
      found = true
   else
      found = false
   locn = looker
   return found
end OrderedListSearch                            Big-O(n)
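The same early-exit idea in C++, as a sketch under the assumption that the list is sorted in ascending order (the function name and types are mine):

#include <cstddef>
#include <vector>

// Ordered-list sequential search: stop as soon as the target is less than
// or equal to the element being tested.
bool orderedListSearch(const std::vector<int>& list, int target, std::size_t& locn)
{
    if (list.empty()) { locn = 0; return false; }
    std::size_t looker = list.size() - 1;   // default: the last element
    if (target <= list.back()) {
        looker = 0;
        while (target > list[looker])       // cannot run past the end: target <= last element
            ++looker;
    }
    locn = looker;
    return list[looker] == target;
}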

Sequential Search The sequential search algorithm is very slow for large lists: Big-O(n). If the list is ordered, we can use a more efficient algorithm called the binary search.

Binary Search Test the data in the element at the middle of the array. If the target is in the first half, repeat the test on the middle of the first half; if it is in the second half, repeat it on the middle of the second half, and continue halving until the target is located or no elements remain.

mid = (first + last) / 2. If target > list[mid], set first = mid + 1; if target < list[mid], set last = mid - 1. Figure 2-4

When the target is not in the list, first eventually becomes larger than last! Figure 2-5

Binary Search Algorithm

algorithm BinarySearch (val list <array>, val last <index>, val target <keyType>, ref locn <index>)
   Search an ordered list using binary search.
   PRE    list is ordered; it must contain at least one element.
          last is the index of the largest element in the list.
          target is the value of the element being sought.
          locn is the address of an index in the calling algorithm.
   POST   Found: locn is assigned the index of the target element and found is set TRUE.
          Not found: locn holds the element just below or above the target and found is set FALSE.
   RETURN found <boolean>

Binary Search Algorithm

   first = 1
   loop (first <= last)
      mid = (first + last) / 2
      if (target > list[mid])
         first = mid + 1                         (look in the upper half)
      else if (target < list[mid])
         last = mid - 1                          (look in the lower half)
      else
         first = last + 1                        (found equal: force exit)
   locn = mid
   if (target equal list[mid])
      found = true
   else
      found = false
   return found
end BinarySearch                                 Big-O(log2 n)
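An iterative C++ version of the pseudocode is sketched below; again an illustration with my own names and 0-based indexing, and the mid calculation is written to avoid overflow of first + last.

#include <cstddef>
#include <vector>

// Binary search on an ascending list.  On a miss, locn ends up at the last
// probe position examined, mirroring the pseudocode's behaviour.
bool binarySearch(const std::vector<int>& list, int target, std::size_t& locn)
{
    if (list.empty()) { locn = 0; return false; }
    std::size_t first = 0, last = list.size() - 1, mid = 0;
    while (first <= last) {
        mid = first + (last - first) / 2;
        if (target > list[mid]) {
            first = mid + 1;                // look in the upper half
        } else if (target < list[mid]) {
            if (mid == 0) break;            // unsigned guard: nothing below index 0
            last = mid - 1;                 // look in the lower half
        } else {
            break;                          // found: force exit
        }
    }
    locn = mid;
    return list[mid] == target;             // O(log2 n) comparisons
}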

Comparison of binary and sequential searches

Size          Binary    Sequential (average)    Sequential (worst case)
16            4         8                       16
50            6         25                      50
256           8         128                     256
1.000         10        500                     1.000
10.000        14        5.000                   10.000
100.000       17        50.000                  100.000
1.000.000     20        500.000                 1.000.000

Hashed List Searches In an ideal search, we would know exactly where the data are and go directly there. We use a hashing algorithm to transform the key into the index of the array element that contains the data we need to locate.

It is a key-to-address transformation! Figure 2-6

We call the set of keys that hash to the same location in our list synonyms. A collision is the event that occurs when a hashing algorithm produces an address for an insertion key and that address is already occupied. Each calculation of an address and test for success is known as a probe. Figure 2-7

Hashing Methods Figure 2-8

Direct Hashing Method The key is the address, without any algorithmic manipulation. The data structure must contain an element for every possible key. This guarantees that there are no synonyms, but it means direct hashing can be used only in very limited situations!

Direct Hashing Method Direct hashing of employee numbers. Figure 2-9

Subtraction Hashing Method The keys are consecutive but do not start from one. Example: A company has 100 employees whose employee numbers run from 1001 to 1100. Using the function hash(x) = x - 1000, employee 1001 (Ali Esin) hashes to address 1, 1002 (Sema Metin) to address 2, ..., and 1100 (Filiz Yılmaz) to address 100.

Modulo Division Hashing Method The modulo-division method divides the key by the list size and uses the remainder plus one as the address: address = key mod listSize + 1. If the list size is chosen to be a prime number, this produces fewer collisions than other list sizes.

Modulo Division Hashing Method We have 300 employees, and the first prime greater than 300 is 307. For key 121267: 121267 / 307 = 395 with remainder 2, so hash(121267) = 2 + 1 = 3. Figure 2-10
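A one-line C++ version of this hash, matching the worked example (the function name is mine):

#include <cstddef>

// Modulo-division hashing with 1-based addresses, as in the slides.
// listSize should be a prime such as 307 to spread the keys more evenly.
std::size_t moduloHash(unsigned long key, std::size_t listSize)
{
    return key % listSize + 1;              // moduloHash(121267, 307) == 3
}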

Digit Extraction Method Selected digits are extracted from the key and used as the address. Example (using the first, third, and fourth digits): 379452 → 394, 121267 → 112, 378845 → 388, 526842 → 568.
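A small C++ sketch, assuming six-digit keys and that the first, third, and fourth digits are the ones kept (which is consistent with the examples above):

// Digit extraction: keep digits 1, 3 and 4 of a six-digit key.
// digitExtractHash(379452) == 394, digitExtractHash(121267) == 112.
int digitExtractHash(int key)
{
    int d1 = (key / 100000) % 10;   // 1st digit
    int d3 = (key / 1000)   % 10;   // 3rd digit
    int d4 = (key / 100)    % 10;   // 4th digit
    return d1 * 100 + d3 * 10 + d4;
}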

Midsquare Hashing Method The key is squared and the address is selected from the middle of the squared number. The most obvious limitation of this method is the size of the key. Example: 9452 * 9452 = 89340304 → 3403 is the address. To limit the size of the product, we can square only part of the key: 379452 → 379 * 379 = 143641 → 364.
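A C++ sketch of the second variation above, which squares only the first three digits of a six-digit key and keeps the middle three digits of the product (the exact digits kept are an assumption based on the 379452 example):

// Midsquare hashing on part of the key:
// 379452 -> 379, 379 * 379 = 143641 -> middle digits 364.
long midsquareHash(long key)
{
    long part    = key / 1000;      // first three digits, e.g. 379
    long squared = part * part;     // 143641
    return (squared / 10) % 1000;   // drop the last digit, keep the next three
}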

Folding Hashing Method The key is divided into parts, which are then folded together to form the address: in fold shift the parts are simply added, and in fold boundary the outer parts are reversed before they are added. Figure 2-11

Pseudorandom Hashing Method The key is used as the seed in a pseudorandom-number generator, and the resulting random number is then scaled into the possible address range using modulo division. Use a function such as: y = ((ax + b) mod m) + 1, where x is the key value, a is a coefficient, b is a constant, m is the number of elements in the list, and y is the address.

Pseudorandom Hashing Method

y = ((ax + b) mod m) + 1, with a = 17, b = 7, m = 307:  y = ((17x + 7) mod 307) + 1

For the key x = 121267:
   y = (((17 * 121267) + 7) mod 307) + 1
   y = ((2061539 + 7) mod 307) + 1
   y = (2061546 mod 307) + 1
   y = 41 + 1
   y = 42
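The same computation as a C++ function (a sketch; the defaults a = 17, b = 7, m = 307 are taken from the example above):

// Pseudorandom hashing: y = ((a*x + b) mod m) + 1.
unsigned long pseudorandomHash(unsigned long key,
                               unsigned long a = 17,
                               unsigned long b = 7,
                               unsigned long m = 307)
{
    return (a * key + b) % m + 1;   // pseudorandomHash(121267) == 42
}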

Rotation Hashing Method Rotation is often used in combination with folding and pseudorandom hashing. Figure 2-12

Collision Resolution Methods All of these methods of handling collisions are independent of the hashing algorithm. Figure 2-13

Collision Resolution Concepts “Load Factor” We define a full list as a list in which all elements except one contain data. Rule: A hashed list should not be allowed to become more than 75% full! Load factor = (the number of filled elements in the list / the total number of elements in the list) x 100, that is, α = (k / n) x 100, where k is the number of filled elements and n is the total number of elements.

Collision Resolution Concepts “Clustering” Some hashing algorithms tend to cause data to group within the list. This is known as clustering, and it is created by collisions. If the list contains a high degree of clustering, the number of probes needed to locate an element grows and the processing efficiency of the list is reduced.

Collision Resolution Concepts “Clustering” The clustering types are: Primary clustering: data cluster around a home address in the list. Secondary clustering: the data are widely distributed across the whole list, so the list appears to be well distributed, but the time to locate a requested element can still become large.

Collision Resolution Methods Open Addressing When a collision occurs, the home area addresses are searched for an open or unoccupied element where the new data can be placed. There are four different methods: linear probe, quadratic probe, double hashing, and key offset.

Open Addressing “Linear Probe” When data cannot be stored in the home address, we resolve the collision by adding one to the current address. Advantages: the implementation is simple, and data tend to remain near their home address. Disadvantages: it tends to produce primary clustering, and the search algorithm may become more complex, especially after data have been deleted.

Open Addressing “Linear Probe” 15352 / 307 = 50 with remainder 2, so hash(15352) = 2 + 1 = 3. Address 3 is already occupied, so the new address = 3 + 1 = 4.
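A minimal C++ sketch of inserting with a linear probe follows. It is not the textbook's code: it uses 0-based addresses rather than the slides' address + 1 convention, and it reserves the key value 0 to mark an empty slot.

#include <cstddef>
#include <vector>

// Linear probing: start at the home address and step forward one slot at a
// time (wrapping around) until an empty slot is found.
// Returns the index used, or -1 if the table is full.
int linearProbeInsert(std::vector<unsigned long>& table, unsigned long key)
{
    const std::size_t size = table.size();
    const std::size_t home = key % size;        // 0-based home address
    for (std::size_t i = 0; i < size; ++i) {
        std::size_t addr = (home + i) % size;
        if (table[addr] == 0) {                 // empty slot found
            table[addr] = key;
            return static_cast<int>(addr);
        }
    }
    return -1;                                  // table full
}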

Open Addressing “Linear Probe” Figure 2-14

Open Addressing “Quadratic Probe” Primary clustering can be eliminated by adding a value other than one to the current address. The increment is the square of the collision probe number: 1² for the first probe, 2² for the second probe, 3² for the third probe, and so on, until we either find an empty element or exhaust the possible elements. We use the modulo of the quadratic sum for the new address.

Open Addressing “Quadratic Probe” The increment added at each step grows by the next odd number, so the increment factor increases by two for each probe.

Probe number    Collision location    Probe² and increment    New address
1               1                     1² = 1                  2
2               2                     2² = 4                  6
3               6                     3² = 9                  15
4               15                    4² = 16                 31
5               31                    5² = 25                 56
6               56                    6² = 36                 92
7               92                    7² = 49                 141
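A hedged C++ sketch of the quadratic probe, with the same simplifications as the linear-probe sketch (0-based addresses, key 0 means empty). Quadratic probing is not guaranteed to visit every slot, so this version simply gives up after one pass worth of attempts.

#include <cstddef>
#include <vector>

// Quadratic probing: the i-th probe adds i*i to the previous address, so the
// sequence from home address h is h+1, h+1+4, h+1+4+9, ... (mod table size).
int quadraticProbeInsert(std::vector<unsigned long>& table, unsigned long key)
{
    const std::size_t size = table.size();
    std::size_t addr = key % size;              // home address (0-based)
    for (std::size_t probe = 1; probe <= size; ++probe) {
        if (table[addr] == 0) {
            table[addr] = key;
            return static_cast<int>(addr);
        }
        addr = (addr + probe * probe) % size;   // add 1, 4, 9, 16, ...
    }
    return -1;                                  // no empty slot reached
}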

Open Addressing – Double Hashing “Pseudorandom Collision Resolution” In this method, rather than using an arithmetic probe function, the address is rehashed: y = ((ax + c) mod listSize) + 1. For example, with a = 3, c = -1 and the collided address x = 2: y = ((3 * 2 + (-1)) mod 307) + 1 = 6. Figure 2-15

Open Addressing – Double Hashing “Key Offset Collision Resolution” Key offset is another double-hashing method; it produces different collision paths for different keys. Key offset calculates the new address as a function of the old address and the key.

Open Addressing – Double Hashing “Key Offset Collision Resolution”

offSet = [key / listSize]
address = ((offSet + old address) mod listSize) + 1

For key 166702: offSet = [166702 / 307] = 543
   Probe 1: address = ((543 + 2) mod 307) + 1 = 239
   Probe 2: address = ((543 + 239) mod 307) + 1 = 169

Key       Home Address    Key Offset    Probe 1    Probe 2
166702    2               543           239        169
572556    2               1865          26         50
67234     2               219           222        135
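The key-offset path can be sketched in C++ as below (again 0-based and with key 0 as the empty marker; the fallback to a step of 1 when the offset is a multiple of the table size is my own guard, not part of the slides).

#include <cstddef>
#include <vector>

// Key-offset collision resolution:
//   offset      = key / tableSize
//   nextAddress = (offset + oldAddress) mod tableSize
// Different keys that share a home address follow different probe paths.
int keyOffsetInsert(std::vector<unsigned long>& table, unsigned long key)
{
    const std::size_t size = table.size();
    std::size_t offset = static_cast<std::size_t>(key / size) % size;
    if (offset == 0) offset = 1;                // make sure the probe moves
    std::size_t addr = key % size;              // home address (0-based)
    for (std::size_t probe = 0; probe < size; ++probe) {
        if (table[addr] == 0) {
            table[addr] = key;
            return static_cast<int>(addr);
        }
        addr = (addr + offset) % size;          // rehash using the key offset
    }
    return -1;                                  // no free slot found
}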

Collision Resolution Open Addressing Resolution A major disadvantage of open addressing is that each collision resolution increases the probability of future collisions!

Collision Resolution Linked List Resolution A linked list is an ordered collection of data in which each element contains the location of the next element. Each home address in the hashed list stores a link head pointer to the chain of synonyms for that address. Figure 2-16
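A compact C++ sketch of linked-list (separate chaining) resolution, using one std::list chain per home address; the class name and interface are mine.

#include <cstddef>
#include <list>
#include <vector>

// Each home address owns a small linked list, so synonyms share a chain.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t size) : buckets_(size) {}

    void insert(unsigned long key) { buckets_[home(key)].push_back(key); }

    bool contains(unsigned long key) const {
        for (unsigned long k : buckets_[home(key)])
            if (k == key) return true;
        return false;
    }

private:
    std::size_t home(unsigned long key) const { return key % buckets_.size(); }

    std::vector<std::list<unsigned long>> buckets_;
};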

Collision Resolution Bucket Hashing Resolution Keys are hashed to buckets, nodes that can hold multiple pieces of data, so a collision is postponed until a bucket becomes full. Figure 2-17

Load your HW-2 to the FTP site by 15 Mar. 06 at 17:00.
Create an array containing 100 random integers between 0 and 150; this should be an unordered list.
Use the sentinel search algorithm to find the target value in the array.
Use the probability search algorithm to find the target value in the array.
Create an ordered list containing 100 numbers between 0 and 150.
Use the ordered list search algorithm to find the target value in the array.
Use the binary search algorithm to find the target value in the array.

Hw #2 Run each search algorithm 10 times and report the following performance values for each of them. Write your comments about the result table.

                                        Sentinel Search    Probability    Ordered    Binary
Number of completed searches
Number of successful searches
Average number of tests per search