Searching Tables Table: sequence of (key,information) pairs

Searching Tables Table: sequence of (key,information) pairs
(key,information) pair is a record key uniquely identifies information, so no duplicate records Sometimes the key is the whole record Searching a table Given a key k and a table T = (k1,i1),…, (kn,in), find the pair (kj,ij) in T such that k=kj (if it exists) Possible approaches: Sequential search: simple, but only effective for small tables Binary search: fast, but table must be sorted Hashing COSC 2P03 Week 11

Hash Tables Idea: use a function h such that for every possible key k, h(k) = index of record with key k: O(1) search time Hash Function: maps keys to addresses Build the table using the hash function: if h(k)=1 then put record (k,i) in cell 1 of table (etc) Tables are generally sparse Hash functions should ideally be: Easy to compute, and Ensure different keys are always mapped to different cells Perfect hash functions are not always possible COSC 2P03 Week 11

Hashing and Collisions
Collision: the effect of more than one key being mapped to the same cell Given kx ≠ ky, we have f(kx) = f(ky) Ideally collisions would never happen (this is not realistic) Approaches to dealing with collisions: Allow >1 record to be stored in each table index Buckets: each index is a fixed-size bucket of records Separate chaining: each index has a linked list of records Open addressing: allow only 1 record at each index When a collision occurs, use a collision resolution policy to find a new index for the item, e.g. linear probing etc. COSC 2P03 Week 11

Hash tables with buckets
Generally used for storing files on disk Each index has a bucket of fixed size (block) Within the bucket, records are in unsorted order To insert record (k,i): Apply hash function h(k) to determine in which bucket (k,i) belongs and add to next empty space in bucket To search for record (k,i): Compute h(k) and read corresponding block from disk Perform linear search of block to find record with key k COSC 2P03 Week 11

Hash tables – separate chaining
Each cell of the table is a separate linked list To insert record (k,i): Apply hash function h(k) to determine in which linked list the record belongs and add to front of list To search for record (k,i): Compute h(k) and access corresponding linked list Perform linear search of linked list to find record with key k COSC 2P03 Week 11

Hash table with Open Addressing findPos – Linear Probing
If a record is hashed to index j, which is already occupied, then look in index j+1, j+2, … and put the record in the next available index (each attempt is called a probe) int findPos(int k) // search for index that should store // record with key k { current = hash(k, tableSize); while(array[current] != null && array[current].record.key != k) current = (current+1) % tableSize; } return current; COSC 2P03 Week 11

Hash Table search int find(int k) // search for record with key k {
current = findPos(k); if(isActive(current)) return current; else return -1; // not found } COSC 2P03 Week 11

Hash Table – insertion void insert(record R)
// insert R if not already in table { current = findPos(R.key); if(isActive(current)) // already in hash table return; else array[current].record = R; array[current].isActive = true; } COSC 2P03 Week 11

Hash table – deletion void remove(int k) // delete record with key k {
current = findPos(R.key); if(isActive(currentPos)) array[current].isActive = false; // lazy deletion } COSC 2P03 Week 11

Open addressing – collision resolution policies
Linear probing: If a record is hashed to index j, which is already occupied, then look in index j+1, j+2, … and put the record in the next available index Clusters can form when items are hashed to the same address Anything hashed to any index within the cluster makes the cluster bigger → the bigger it gets, the faster it grows Sparse tables reduce the problem but it still exists This is called primary clustering. COSC 2P03 Week 11

Quadratic probing: Attempts to avoid clustering problem by checking indices that are further apart If a record is hashed to index j, which is already occupied, then look in index j+1, j+4, j+9, …, and put the record in the next available index. Items hashed to the same index will all check the same sequence of indices (secondary clustering) COSC 2P03 Week 11

Double hashing: Attempts to avoid both primary and secondary clustering by using a second hashing function to determine which other indices should be tried after a collision Uses 2 hash functions, h1(k) ≠ h2(k) If h1(k) hashes a record to index j, which is already occupied, then use h2(k) as a step size for subsequent probes COSC 2P03 Week 11

Searching Tables Table: sequence of (key,information) pairs

Similar presentations

Presentation on theme: "Searching Tables Table: sequence of (key,information) pairs"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Searching Tables Table: sequence of (key,information) pairs

Similar presentations

Presentation on theme: "Searching Tables Table: sequence of (key,information) pairs"— Presentation transcript:

Similar presentations

About project

Feedback