Hash Tables Buckets/Chaining

Presentation transcript:

CS 261 – Data Structures
Hash Tables: Buckets/Chaining

Hash Tables
Hash tables are similar to Vectors, except:
- Elements can be indexed by values other than integers
- A single position may hold more than one element
Arbitrary values (hash keys) map to integers by means of a hash function.
Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index
Example: storing names
1. Compute an integer from a name
2. Map the integer to an index in a table (i.e., a vector, array, etc.)
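The two-step process can be sketched in C as follows. This is only an illustration: the names stringHash and hashIndex and the multiply-by-31 scheme are assumptions made for the example, not the hash function used in the course.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: transform a string key into an integer.
   Hypothetical helper: a multiply-by-31 polynomial hash, chosen only
   for illustration. Unsigned arithmetic wraps on overflow, so the
   result is never negative. */
unsigned long stringHash(const char *key) {
    unsigned long h = 0;
    while (*key != '\0') {
        h = h * 31 + (unsigned char)*key;
        key++;
    }
    return h;
}

/* Step 2: map that integer to a valid table index in 0..tableSize-1. */
size_t hashIndex(const char *key, size_t tableSize) {
    assert(tableSize > 0);
    return (size_t)(stringHash(key) % tableSize);
}
```

Because the intermediate integer is unsigned here, the modulus can never yield a negative index. A signed hash value would need the kind of correction shown on the Add slide.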

Hash Tables
Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John.
The hash function maps each name to a table index:

Index  Names
0      Angie, Robert
1      Linda
2      Joe, Max, John
3      (empty)
4      Abigail, Mark

Hash Tables: Resolving Collisions
There are several general approaches to resolving collisions:
- Open address hashing: if a spot is full, probe for the next empty spot
- Chaining (or buckets): keep a collection at each table entry
- Caching: save the most recently accessed values, with a slower search otherwise
Today we will examine Chaining/Buckets.

Resolving Collisions: Chaining / Buckets
Maintain a collection (i.e., a Bag ADT) at each table entry:
- Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry
- Overflow: keep a separate overflow area that is linked (maintains a link to the next free position)

Chaining/buckets diagram:
0  Angie -> Robert
1  Linda
2  Joe -> Max -> John
3  (empty)
4  Abigail -> Mark

Overflow diagram labels: 0 Angie; 1 Linda; 2 Joe, Max; 3 Robert; 4 Abigail, John, Mark; Next free

Hash Table Implementation: Initialization

    #include <assert.h>
    #include <stdlib.h>

    struct HashTable {
        struct List **table;  /* Hash table: array of Lists. */
        int cnt;              /* Number of elements stored. */
        int size;             /* Number of buckets. */
    };

    void initHashTable(struct HashTable *ht, int size) {
        int i;
        ht->size = size;
        ht->cnt = 0;
        ht->table = (struct List **) malloc(size * sizeof(struct List *));
        assert(ht->table != 0);
        for (i = 0; i < size; i++)
            ht->table[i] = newList();
    }

Hash Table Implementation: Add

    void addHashTable(struct HashTable *ht, TYPE val) {
        /* Compute hash table bucket index. */
        int idx = HASH(val) % ht->size;
        if (idx < 0) idx += ht->size;  /* % can be negative for a negative hash. */
        /* Add to bucket. */
        addList(ht->table[idx], val);
        ht->cnt++;
        /* Next step: Reorganize if load factor too large. */
    }

Hash Table: Contains & Remove
Both just use linked list functions on the correct bucket.
- Contains: find the correct bucket, then see if the element is there
- Remove: slightly more tricky, because you want to decrement the count only if the element is actually in the list
Alternative: instead of keeping a count in the hash table, call count on each list. What are the pros and cons of this?
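A minimal sketch of contains and remove, assuming int values and a hand-rolled singly linked chain in place of the course's List type. The names struct Link, struct ChainTable and the four functions are hypothetical, and the value itself stands in for HASH(val).

```c
#include <assert.h>
#include <stdlib.h>

struct Link { int value; struct Link *next; };

struct ChainTable {
    struct Link **table;  /* Array of bucket heads. */
    int cnt;              /* Number of elements stored. */
    int size;             /* Number of buckets. */
};

/* Assumption: the value itself serves as the hash. */
static int bucketIndex(const struct ChainTable *ht, int val) {
    int idx = val % ht->size;
    if (idx < 0) idx += ht->size;
    return idx;
}

void initChainTable(struct ChainTable *ht, int size) {
    int i;
    ht->table = malloc((size_t)size * sizeof(struct Link *));
    assert(ht->table != NULL);
    for (i = 0; i < size; i++) ht->table[i] = NULL;
    ht->size = size;
    ht->cnt = 0;
}

void addChainTable(struct ChainTable *ht, int val) {
    int idx = bucketIndex(ht, val);
    struct Link *lnk = malloc(sizeof *lnk);
    assert(lnk != NULL);
    lnk->value = val;
    lnk->next = ht->table[idx];   /* Push on the front of the bucket. */
    ht->table[idx] = lnk;
    ht->cnt++;
}

/* Contains: find the correct bucket, then walk its list. */
int containsChainTable(const struct ChainTable *ht, int val) {
    const struct Link *cur = ht->table[bucketIndex(ht, val)];
    for (; cur != NULL; cur = cur->next)
        if (cur->value == val) return 1;
    return 0;
}

/* Remove: unlink the first match; cnt changes only if one was found. */
void removeChainTable(struct ChainTable *ht, int val) {
    struct Link **pp = &ht->table[bucketIndex(ht, val)];
    while (*pp != NULL) {
        if ((*pp)->value == val) {
            struct Link *dead = *pp;
            *pp = dead->next;
            free(dead);
            ht->cnt--;
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Keeping cnt in the table makes the element count an O(1) lookup at the cost of updating it on every add and remove. Summing the list counts avoids that bookkeeping but touches every bucket.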

Hash Table Size
Load factor: λ = n / m, where n is the number of elements and m is the size of the table
- Load factor represents the average number of elements at each table entry
- For chaining, the load factor can be greater than 1
- Want the load factor to remain small
- Same as open address hashing: if the load factor becomes larger than some fixed limit (say, 8), double the table size
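One way the resize rule might look in C, again assuming int values and a hand-rolled chain. The names struct Buckets, loadFactor, resizeBuckets and MAX_LOAD are hypothetical. The limit of 8 is the one quoted on the slide.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LOAD 8.0   /* Fixed limit quoted on the slide. */

struct Node { int value; struct Node *next; };

struct Buckets {
    struct Node **table;
    int cnt;    /* n: number of elements. */
    int size;   /* m: number of buckets. */
};

static int indexFor(int val, int size) {
    int idx = val % size;          /* Assumption: value is its own hash. */
    if (idx < 0) idx += size;
    return idx;
}

void initBuckets(struct Buckets *ht, int size) {
    int i;
    assert(size > 0);
    ht->table = malloc((size_t)size * sizeof(struct Node *));
    assert(ht->table != NULL);
    for (i = 0; i < size; i++) ht->table[i] = NULL;
    ht->size = size;
    ht->cnt = 0;
}

double loadFactor(const struct Buckets *ht) {
    return (double)ht->cnt / (double)ht->size;
}

/* Double the table and relink every node into its new bucket. */
static void resizeBuckets(struct Buckets *ht) {
    int newSize = ht->size * 2, i;
    struct Node **newTable = malloc((size_t)newSize * sizeof(struct Node *));
    assert(newTable != NULL);
    for (i = 0; i < newSize; i++) newTable[i] = NULL;
    for (i = 0; i < ht->size; i++) {
        struct Node *cur = ht->table[i];
        while (cur != NULL) {
            struct Node *next = cur->next;
            int idx = indexFor(cur->value, newSize);
            cur->next = newTable[idx];
            newTable[idx] = cur;
            cur = next;
        }
    }
    free(ht->table);
    ht->table = newTable;
    ht->size = newSize;
}

void addBuckets(struct Buckets *ht, int val) {
    int idx = indexFor(val, ht->size);
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = val;
    n->next = ht->table[idx];
    ht->table[idx] = n;
    ht->cnt++;
    if (loadFactor(ht) > MAX_LOAD)   /* Reorganize when too full. */
        resizeBuckets(ht);
}
```

Every element must be rehashed during a resize because its index depends on the table size. The existing nodes are relinked rather than reallocated.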

Hash Tables: Algorithmic Complexity
Assumptions:
- Time to compute the hash function is constant
- Chaining uses a linked list
- Worst case analysis: all values hash to the same position
- Best case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)
Find element operation:
- Worst case for open addressing: O(n)
- Worst case for chaining: O(n), or O(log n) if the buckets use an AVL tree
- Best case for open addressing: O(1)
- Best case for chaining: O(1)

Hash Tables: Average Case
Assume the hash function distributes elements uniformly (a BIG if).
- Average case for all operations: O(λ), where λ = n / m is the load factor
- Want to keep the load factor relatively small
- Resize the table (doubling its size) if the load factor is larger than some fixed limit (e.g., 8)
- Resizing only improves things IF the hash function distributes values uniformly
What happens if the hash value is always zero?
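The closing question can be explored with a small experiment. The sketch below is an assumption-laden illustration, not course code: it hashes the integers 0..n-1 into m buckets and reports the longest bucket, using two hypothetical hash functions, identityHash and zeroHash.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*HashFn)(int);

/* Two hypothetical hash functions for comparison. */
int identityHash(int v) { return v; }           /* Spreads 0..n-1 evenly. */
int zeroHash(int v)     { (void)v; return 0; }  /* Every value collides. */

/* Hash the values 0..n-1 into m buckets and return the length of the
   longest bucket, i.e. the worst-case chain a search must walk. */
int longestChain(HashFn hash, int n, int m) {
    int *len = calloc((size_t)m, sizeof(int));
    int i, longest = 0;
    assert(len != NULL);
    for (i = 0; i < n; i++) {
        int idx = hash(i) % m;
        if (idx < 0) idx += m;
        len[idx]++;
    }
    for (i = 0; i < m; i++)
        if (len[i] > longest) longest = len[i];
    free(len);
    return longest;
}
```

With 1,000 values and 100 buckets, identityHash gives chains of length 10, matching the load factor. With zeroHash every value lands in bucket 0, so the chain has length 1,000 whether the table has 100 buckets or 200, and a search degrades to O(n) no matter how often the table is doubled.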

When should you use hash tables?
- Data values must have good hash functions defined (e.g., string, double), or you must write your own hash function
- Need to know that values are uniformly distributed
- Otherwise, a skip list or AVL tree is often faster

Your Turn
Worksheet 25: Hash Tables using Buckets
- Use a linked list for buckets
- Keep track of the number of elements
- Resize the table if the load factor is bigger than 8
Questions?

CS 261 – Data Structures
Hash Tables: Hash-like Sorting

Hash Tables: Sorting
Can create very fast sort programs using hash tables.
Unfortunately, these sorts are not general purpose: they only work with non-negative integer values (or other data that is readily mapped into non-negative integer values).
Examples:
- Counting sort
- Radix sort

Hash Table Sorting: Counting Sort
Quickly sort non-negative integer values from a limited range:
1. Count (tally) the occurrences of each value
2. Recreate sorted values according to the tally
Example: sort 1,000 integer elements with values between 0 and 19.
Count (tally) the occurrences of each value (value - count):

 0 - 47     4 - 32      8 - 41    12 - 43     16 - 12
 1 - 92     5 - 114     9 - 3     13 - 17     17 - 15
 2 - 12     6 - 16     10 - 36    14 - 132    18 - 63
 3 - 14     7 - 37     11 - 92    15 - 93     19 - 89

Recreate sorted values according to the tally: 47 zeros, 92 ones, 12 twos, …

Counting Sort: Implementation

    #include <stdlib.h>

    /* Sort an array of integers, each element between 0 and max. */
    void countSort(int data[], int n, int max) {
        int i, j, k;
        /* Array of all possible values. */
        int *cnt = (int *)calloc(max + 1, sizeof(int));

        for (i = 0; i < n; i++)   /* Count the occurrences */
            cnt[data[i]]++;       /* of each value. */

        /* cnt holds the number of occurrences of numbers from 0 to max. */
        i = 0;                            /* Now put values */
        for (j = 0; j <= max; j++)        /* back into the array. */
            for (k = cnt[j]; k > 0; k--)
                data[i++] = j;

        free(cnt);                        /* Release the tally array. */
    }

Radix Sort
- Another specialized sorting algorithm
- Has historical ties to punch cards

Sorting Punch Cards
- It was far too easy to drop a tray of cards, which could be a disaster
- The convention became to put a sequence number on each card, typically in positions 72-80
- The deck could then be rebuilt by sorting on these positions
- A machine called a sorter was used to re-sort the cards

Mechanical Sorter: Sorts a Single Column

Mechanical Sorter
1. First sort on column 80
2. Then collect the piles, keeping them in order, and sort on column 79
3. Repeat for each of the columns down to 72
4. At the end, the result is completely sorted
Try it.

Hash Table Sorting: Radix Sort
Sorts positive integer values over any range:
- Hash table size of 10 (0 through 9)
- Values are hashed according to their least significant digit (the “ones” digit)
- Values are then rehashed according to the next significant digit (the tens digit) while keeping their relative ordering
- The process is repeated until we run out of digits
Can also sort by hashing on:
- Characters in a String: table size of 26 (‘A’ through ‘Z’)
- Bytes in an integer: table size of 256 (as opposed to 10 above)

Radix Sort: Example
Data: 624 762 852 426 197 987 269 146 415 301 730 78 593

Bucket  Pass 1      Pass 2      Pass 3
0       730         301         78
1       301         415         146 - 197
2       762 - 852   624 - 426   269
3       593         730         301
4       624         146         415 - 426
5       415         852         593
6       426 - 146   762 - 269   624
7       197 - 987   78          730 - 762
8       78          987         852
9       269         593 - 197   987
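The passes in the example can be sketched in C as below. This is one possible implementation, not the worksheet solution: it uses a counting array per digit in place of explicit bucket lists, which gives the same stable distribution. The names radixPass and radixSort are hypothetical, and the code assumes non-negative values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RADIX 10   /* Table size of 10: digits 0 through 9. */

/* One pass: stable distribution of data[] by the digit selected by
   exp (1 = ones, 10 = tens, 100 = hundreds, ...). */
static void radixPass(int data[], int tmp[], int n, int exp) {
    int count[RADIX] = {0};
    int i;
    for (i = 0; i < n; i++)
        count[(data[i] / exp) % RADIX]++;
    /* Turn the tallies into the end position of each bucket. */
    for (i = 1; i < RADIX; i++)
        count[i] += count[i - 1];
    /* Walk backwards so values with equal digits keep their order. */
    for (i = n - 1; i >= 0; i--)
        tmp[--count[(data[i] / exp) % RADIX]] = data[i];
    memcpy(data, tmp, (size_t)n * sizeof(int));
}

/* Sort non-negative integers, least significant digit first. */
void radixSort(int data[], int n) {
    int i, max = 0, exp;
    int *tmp;
    if (n <= 1) return;
    tmp = malloc((size_t)n * sizeof(int));
    assert(tmp != NULL);
    for (i = 0; i < n; i++) {
        assert(data[i] >= 0);
        if (data[i] > max) max = data[i];
    }
    for (exp = 1; ; exp *= 10) {
        radixPass(data, tmp, n, exp);
        if (max / exp < 10) break;   /* No more digits to process. */
    }
    free(tmp);
}
```

Run on the 13 example values, the first pass leaves the array in bucket order 730, 301, 762, 852, 593, 624, 415, 426, 146, 197, 987, 78, 269. This matches the Pass 1 column read top to bottom.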

Your Turn
Worksheet 26: Radix Sorting
Questions?