Dictionaries, Tables, Hashing (TCSS 342)

Presentation transcript:

1/51 Dictionaries, Tables Hashing TCSS 342

2/51 The Dictionary ADT: a dictionary (table) is an abstract model of a database. Like a priority queue, a dictionary stores key-element pairs. The main operation supported by a dictionary is searching by key.

3/51 Examples: telephone directory; library catalogue; books in print (key: ISBN); FAT (File Allocation Table).

4/51 Main Issues: Size. Operations: search, insert, delete, ??? Create reports??? List? What will be stored in the dictionary? How will items be identified?

5/51 The Dictionary ADT simple container methods: –size() –isEmpty() –elements() query methods: –findElement(k) –findAllElements(k)

6/51 The Dictionary ADT update methods: –insertItem(k, e) –removeElement(k) –removeAllElements(k) special element –NO_SUCH_KEY, returned by an unsuccessful search

7/51 Implementing a Dictionary with a Sequence unordered sequence –searching and removing takes O(n) time –inserting takes O(1) time –applications to log files (frequent insertions, rare searches and removals)

8/51 Implementing a Dictionary with a Sequence array-based ordered sequence (assumes keys can be ordered) - searching takes O(log n) time (binary search) - inserting and removing takes O(n) time - application to look-up tables (frequent searches, rare insertions and removals)

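9/51 Binary Search: narrow down the search range in stages, as in the “high-low” game. Example: findElement(22). [Figure: sorted array with low, mid, and high markers]

10/51 Binary Search (continued). [Figure: low, mid, and high move closer at each stage until low = mid = high at key 22]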

11/51 Pseudocode for Binary Search
    Algorithm BinarySearch(S, k, low, high)
        if low > high then
            return NO_SUCH_KEY
        else
            mid ← (low + high) / 2
            if k = key(mid) then
                return key(mid)
            else if k < key(mid) then
                return BinarySearch(S, k, low, mid - 1)
            else
                return BinarySearch(S, k, mid + 1, high)
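The pseudocode above might translate to Java roughly as follows. This is a minimal sketch that assumes the sequence S is a sorted int array and returns the index of the match rather than the key; the class name and the -1 sentinel standing in for NO_SUCH_KEY are assumptions, not part of the slides.

```java
final class BinarySearchSketch {
    /** Stand-in for the slides' NO_SUCH_KEY; the value -1 is an assumption. */
    static final int NO_SUCH_KEY = -1;

    /** Returns the index of k in the sorted array s within [low, high], or NO_SUCH_KEY. */
    static int binarySearch(int[] s, int k, int low, int high) {
        if (low > high) {
            return NO_SUCH_KEY; // empty range: k is not present
        }
        int mid = low + (high - low) / 2; // same midpoint as (low + high) / 2, without int overflow
        if (k == s[mid]) {
            return mid;
        } else if (k < s[mid]) {
            return binarySearch(s, k, low, mid - 1); // continue in the left half
        } else {
            return binarySearch(s, k, mid + 1, high); // continue in the right half
        }
    }

    public static void main(String[] args) {
        int[] keys = {2, 5, 8, 14, 17, 22, 31}; // hypothetical sorted keys
        System.out.println(binarySearch(keys, 22, 0, keys.length - 1)); // prints 5
        System.out.println(binarySearch(keys, 9, 0, keys.length - 1));  // prints -1
    }
}
```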

12/51 Running Time of Binary Search: the range of candidate items to be searched is halved after each comparison.
    Comparison   Search range
    0            n
    1            n/2
    …            …
    i            n/2^i
    log_2 n      1

13/51 Running Time of Binary Search In the array-based implementation, access by rank takes O(1) time, thus binary search runs in O(log n) time Binary Search is applicable only to Random Access structures (Arrays, Vectors…)

14/51 Implementations: Sorted? Non-sorted? Elementary: arrays, vectors, linked lists –Organization: none (log file), sorted, hashed. Advanced: balanced trees

15/51 Skip Lists Simulate Binary Search on a linked list. Linked list allows easy insertion and deletion.

16/51 A FAT Example. Directory: key: file name; data: time, date, size, …, and the location of the first block in the FAT. If the first block is in physical location #23 (disk block number), look up position #23 in the FAT. That entry either shows end of file or holds the number of the file's next block on disk. Example: the directory entry gives block #4, and following the FAT from there shows that the file occupies blocks 4, 5, 6, 10, 3.
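The chain-following lookup described above can be sketched in Java. The FAT contents below are hypothetical values chosen only so that the chain reproduces the block list 4, 5, 6, 10, 3 from the slide; the class name and the -1 end-of-file marker are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FatChainSketch {
    /** Stand-in for the end-of-file marker; the value -1 is an assumption. */
    static final int END_OF_FILE = -1;

    /** Returns the disk blocks of a file in order, starting from its first block. */
    static List<Integer> blocksOf(int[] fat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        // fat[block] holds the number of the next block, or END_OF_FILE
        for (int block = firstBlock; block != END_OF_FILE; block = fat[block]) {
            blocks.add(block);
        }
        return blocks;
    }

    /** Hypothetical FAT whose entries encode the chain 4 -> 5 -> 6 -> 10 -> 3. */
    static int[] exampleFat() {
        int[] fat = new int[11];
        Arrays.fill(fat, END_OF_FILE);
        fat[4] = 5;
        fat[5] = 6;
        fat[6] = 10;
        fat[10] = 3;
        fat[3] = END_OF_FILE; // last block of the file
        return fat;
    }

    public static void main(String[] args) {
        System.out.println(blocksOf(exampleFat(), 4)); // prints [4, 5, 6, 10, 3]
    }
}
```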

17/51 Hashing Place item with key k in position h(k). Hope: h(k) is 1-1. Requires: unique key (unless multiple items allowed). Key must be protected from change (use abstract class that provides only a constructor). Keys must be “comparable”.

18/51 Key class
    public abstract class KeyID {
        private Comparable searchKey;

        // Only one constructor, so the key cannot change after creation
        public KeyID(Comparable m) {
            searchKey = m;
        }

        public Comparable getSearchKey() {
            return searchKey;
        }
    }

19/51 Hashing Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone numbers are in the range 0 to R = –1 –n is the number of phone numbers used –want to do this as efficiently as possible

20/51 Hashing Problem: We know two ways to design this dictionary. A balanced search tree (AVL, red-black) or a skip list with the phone number as the key has O(log n) query time and O(n) space: good space usage and search time, but can we reduce the search time to constant? A bucket array indexed by the phone number has optimal O(1) query time, but there is a huge amount of wasted space: O(n + R)

21/51 Bucket Array: Each cell is thought of as a bucket or a container –Holds key-element pairs –In array A of size N, an element e with key k is inserted in A[k]. [Figure: bucket array whose cells are null except the one holding Roberto]

22/51 Generalized indexing Hash table –Data storage associated with a key –The key need not be an integer

23/51 Hash Tables: A data structure. The location of an item is determined –directly as a function of the item itself –not by a sequence of trial and error comparisons. Commonly used to provide faster searching –O(n) for linear search –O(log n) for binary search –O(1) for hash table

24/51 Example: A symbol table constructed by a compiler –Stores identifiers and information about them

25/51 Another Solution A Hash Table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table Like an array, but with a function to map the large range of keys into a smaller one –e.g., take the original key, mod the size of the table, and use that as an index

26/51 Example: Insert item ( , Roberto) into a table of size 5. The key mod 5 = 4, so item ( , Roberto) is stored in slot 4 of the table. A lookup uses the same process: map the key to an index, then check the array cell at that index.
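A small Java sketch of this map-then-index process follows. The phone numbers are hypothetical stand-ins (the slide's actual numbers are not shown here), and the class and method names are assumptions.

```java
final class ModIndexSketch {
    /** Maps a non-negative key into 0..tableSize-1 by taking it mod the table size. */
    static int indexFor(long key, int tableSize) {
        return (int) (key % tableSize);
    }

    public static void main(String[] args) {
        String[] table = new String[5];
        long key = 5551234569L; // hypothetical phone number

        // Insert: 5551234569 mod 5 = 4, so the element goes in slot 4
        table[indexFor(key, table.length)] = "Roberto";

        // Lookup: map the key to an index the same way, then check that cell
        System.out.println(indexFor(key, table.length));        // prints 4
        System.out.println(table[indexFor(key, table.length)]); // prints Roberto

        // A different key can map to the same slot, which is a collision
        System.out.println(indexFor(5550000004L, table.length)); // prints 4
    }
}
```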

27/51 Collision Insert ( , Andy) And insert ( , Devin). We have a collision!

28/51 Collision Resolution How to deal with two keys which map to the same cell of the array? Use chaining –Set up lists of items with the same index

29/51 Chaining

30/51 Chaining: The expected search/insertion/removal time is O(n/N), provided the indices are uniformly distributed. The performance of the data structure can be fine-tuned by changing the table size N
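One possible shape for a chained table is sketched below, assuming long keys, String elements, and null as a stand-in for NO_SUCH_KEY. The method names echo the Dictionary ADT from the earlier slides, but the class itself is illustrative and not the course's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class ChainedTableSketch {
    private static final class Entry {
        final long key;
        final String element;
        Entry(long key, String element) { this.key = key; this.element = element; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();
    private int n = 0; // number of stored items

    ChainedTableSketch(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<Entry>()); // one (initially empty) chain per index
        }
    }

    /** Picks the chain for a key: key mod N, kept non-negative by floorMod. */
    private LinkedList<Entry> chainFor(long key) {
        return buckets.get((int) Math.floorMod(key, (long) buckets.size()));
    }

    void insertItem(long key, String element) {
        chainFor(key).add(new Entry(key, element)); // colliding keys share a chain
        n++;
    }

    /** Returns the element stored under key, or null if there is none. */
    String findElement(long key) {
        for (Entry e : chainFor(key)) {
            if (e.key == key) return e.element;
        }
        return null;
    }

    /** Removes and returns one element stored under key, or null if there is none. */
    String removeElement(long key) {
        Iterator<Entry> it = chainFor(key).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key == key) {
                it.remove();
                n--;
                return e.element;
            }
        }
        return null;
    }

    int size() { return n; }
}
```

Each operation only walks one chain, which is why the expected cost depends on the average chain length n/N.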

31/51 Hash Function Function h defined by h(i) = i –Determines the location of an item i in the hash table Called a hash function. To reduce the large size of a hash table use h(i) = i mod 25;

32/51 From Keys to Indices: The mapping of keys to indices of a hash table is called a hash function. A hash function is usually the composition of two maps: –hash code map: key → integer –compression map: integer → [0, N - 1]. An essential requirement of the hash function is to map equal keys to equal indices. A “good” hash function minimizes the probability of collisions

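33/51 Java Hash: Java provides a hashCode() method for the Object class, which by default returns a 32-bit value tied to the object's identity (traditionally described as derived from its memory address) rather than to its contents. This default hash code would work poorly for Integer and String objects, where equal values must produce equal hash codes. The hashCode() method should be suitably redefined by classes.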
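As an illustration of redefining hashCode(), here is a sketch of a key class whose hash code depends on its contents. The class PhoneKeySketch is hypothetical; the only library call used is the standard Long.hashCode(long).

```java
final class PhoneKeySketch {
    private final long number;

    PhoneKeySketch(long number) {
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        // Two keys are equal when they wrap the same number
        return other instanceof PhoneKeySketch
                && ((PhoneKeySketch) other).number == number;
    }

    @Override
    public int hashCode() {
        // Derived from the contents, so equal keys get equal hash codes
        return Long.hashCode(number);
    }

    public static void main(String[] args) {
        PhoneKeySketch a = new PhoneKeySketch(5551234569L); // hypothetical number
        PhoneKeySketch b = new PhoneKeySketch(5551234569L);
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true
    }
}
```

Without the two overrides, a and b would be distinct objects with unrelated default hash codes, and a hash table lookup with b would not find an item stored under a.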

34/51 Popular Hash-Code Maps. Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int. Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.

35/51 Popular Hash-Code Maps. Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) $a_0 a_1 \dots a_{n-1}$ by viewing them as the coefficients of a polynomial: $a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$

36/51 Popular Hash-Code Maps –The polynomial is computed with Horner’s rule, ignoring overflows, at a fixed value x: $a_0 + x (a_1 + x (a_2 + \dots + x (a_{n-2} + x a_{n-1}) \dots ))$ –The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words. Why is the component-sum hash code bad for strings?
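The sketch below contrasts the two hash-code maps for strings and suggests one answer to the closing question: a component sum ignores character order, so anagrams collide, while polynomial accumulation weights each position differently. Class and method names are assumptions; x = 33 is one of the values the slide mentions.

```java
final class StringHashSketch {
    /** Component sum: adds the character values, so character order is ignored. */
    static int componentSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    /** Polynomial a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluated with Horner's rule. */
    static int polynomial(String s, int x) {
        int h = 0;
        // Start from the innermost term a_{n-1} and work outwards to a_0
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * x + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(componentSum("stop") == componentSum("pots"));     // prints true
        System.out.println(polynomial("stop", 33) == polynomial("pots", 33)); // prints false
    }
}
```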

37/51 Random Hashing Random hashing –Uses a simple random number generation technique –Scatters the items “randomly” throughout the hash table

38/51 Popular Compression Maps. Division: h(k) = |k| mod N –the choice N = 2^k is bad because not all the bits are taken into account –the table size N is usually chosen as a prime number –certain patterns in the hash codes are propagated. Multiply, Add, and Divide (MAD): –h(k) = |ak + b| mod N –eliminates patterns provided a mod N ≠ 0 –same formula used in linear congruential (pseudo) random number generators
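Both compression maps can be written in a few lines of Java. This is a sketch under the assumption that hash codes are ints and that a and b are small int constants picked by the caller; the class and method names are illustrative.

```java
final class CompressionMapSketch {
    /** Division method: h(k) = |k| mod N. */
    static int division(int hashCode, int n) {
        // Widen to long first so that |Integer.MIN_VALUE| does not overflow
        return (int) (Math.abs((long) hashCode) % n);
    }

    /** MAD method: h(k) = |a*k + b| mod N, which requires a mod N != 0. */
    static int mad(int hashCode, int a, int b, int n) {
        if (a % n == 0) {
            throw new IllegalArgumentException("a mod N must not be 0");
        }
        long scaled = (long) a * hashCode + b; // long arithmetic avoids int overflow
        return (int) (Math.abs(scaled) % n);
    }

    public static void main(String[] args) {
        System.out.println(division(-7, 5));   // |-7| mod 5, prints 2
        System.out.println(mad(10, 3, 7, 11)); // |3*10 + 7| mod 11, prints 4
    }
}
```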

39/51 More on Collisions: A key is mapped to an already occupied table location –what to do?!? Use a collision handling technique –We’ve seen Chaining –Can also use Open Addressing: Double Hashing, Linear Probing

40/51 Linear Probing: if the current location is used, try the next table location.
    linear_probing_insert(K)
        if (table is full) error
        probe = h(K)
        while (table[probe] occupied)
            probe = (probe + 1) mod M
        table[probe] = K

41/51 Linear Probing Lookups walk along table until the key or an empty slot is found Uses less memory than chaining –don’t have to store all those links Slower than chaining –may have to walk along table for a long way Deletion is more complex –either mark the deleted slot –or fill in the slot by shifting some elements down

42/51 Linear Probing Example h(k) = k mod 13 Insert keys:
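The linear probing insert and lookup described on the preceding slides can be sketched in Java with h(k) = k mod M. The keys used in the usage note are hypothetical, chosen to force collisions in a table of size 13; the class is illustrative and leaves out deletion.

```java
final class LinearProbingSketch {
    private final Integer[] table; // null marks an empty slot
    private int count = 0;

    LinearProbingSketch(int m) {
        table = new Integer[m];
    }

    private int h(int k) {
        return Math.floorMod(k, table.length); // k mod M, never negative
    }

    /** Inserts k and returns the slot it landed in. */
    int insert(int k) {
        if (count == table.length) {
            throw new IllegalStateException("table is full");
        }
        int probe = h(k);
        while (table[probe] != null) {
            probe = (probe + 1) % table.length; // try the next table location
        }
        table[probe] = k;
        count++;
        return probe;
    }

    /** Walks along the table from h(k) until k or an empty slot is found. */
    boolean contains(int k) {
        int probe = h(k);
        for (int steps = 0; steps < table.length && table[probe] != null; steps++) {
            if (table[probe] == k) return true;
            probe = (probe + 1) % table.length;
        }
        return false;
    }
}
```

With M = 13, inserting the hypothetical keys 15, 28, 4, 17, 41 in that order places them in slots 2, 3, 4, 5, 6: each collision pushes the new key one slot further along, which is how long runs of occupied cells build up.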

43/51 Double Hashing: use two hash functions. If M is prime, eventually will examine every position in the table.
    double_hash_insert(K)
        if (table is full) error
        probe = h1(K)
        offset = h2(K)
        while (table[probe] occupied)
            probe = (probe + offset) mod M
        table[probe] = K

44/51 Double Hashing Many of same (dis)advantages as linear probing Distributes keys more uniformly than linear probing does

45/51 Double Hashing Example: h1(K) = K mod 13, h2(K) = 8 - (K mod 8) –we want h2 to be an offset to add
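Using the two functions from this example, a double hashing insert might look like the sketch below. It assumes non-negative int keys and a table of size M = 13; the class name and the keys in the usage note are hypothetical.

```java
final class DoubleHashingSketch {
    static final int M = 13; // prime table size, as in the example

    private final Integer[] table = new Integer[M]; // null marks an empty slot
    private int count = 0;

    static int h1(int k) {
        return k % M;
    }

    /** Offset in the range 1..8, so it is never 0 and the probe always moves. */
    static int h2(int k) {
        return 8 - (k % 8);
    }

    /** Inserts a non-negative key k and returns the slot it landed in. */
    int insert(int k) {
        if (k < 0) {
            throw new IllegalArgumentException("this sketch assumes non-negative keys");
        }
        if (count == M) {
            throw new IllegalStateException("table is full");
        }
        int probe = h1(k);
        int offset = h2(k);
        while (table[probe] != null) {
            probe = (probe + offset) % M; // jump by the key's own offset
        }
        table[probe] = k;
        count++;
        return probe;
    }
}
```

Inserting the hypothetical keys 15, 28, 4, 17, 41 in that order gives slots 2, 6, 4, 11, 9. The keys 15, 28, and 41 all start at slot 2, but their offsets differ, so they scatter instead of forming one contiguous run as they would under linear probing.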

46/51 Hash code static int hashCode(long i) { return (int)((i >> 32) + (int) i);}

47/51 Hash code
    static int hashCode(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) | (h >>> 27); // 5-bit cyclic shift of the running sum
            h += (int) s.charAt(i);    // add in next character
        }
        return h;
    }

48/51 Linear Probing Hash Table
    public class LinearProbingHashTable implements Dictionary {
        /** Marker for deactivated buckets */
        private static Item AVAILABLE = new Item(null, null);
        /** number of items in the dictionary */
        private int n = 0;
        /** capacity of the bucket array */
        private int N;
        /** bucket array */
        private Item[] A;

49/51 Linear Probing Hash Table
        /** hash comparator */
        private HashComparator h;

        /** constructor providing the hash comparator */
        public LinearProbingHashTable(HashComparator hc) {
            h = hc;
            N = 1023; // default capacity
            A = new Item[N];
        }

50/51 Linear Probing Hash Table
        /** constructor providing the hash comparator and the capacity
         * of the bucket array */
        public LinearProbingHashTable(HashComparator hc, int bN) {
            h = hc;
            N = bN;
            A = new Item[N];
        }

51/51 Linear Probing Hash Table
        // auxiliary methods
        private boolean available(int i) {
            return (A[i] == AVAILABLE);
        }

        private boolean empty(int i) {
            return (A[i] == null);
        }

52/51 Linear Probing Hash Table
        private Object key(int i) {
            return A[i].key();
        }

        private Object element(int i) {
            return A[i].element();
        }

        private void check(Object k) {
            if (!h.isComparable(k))
                throw new InvalidKeyException("Invalid key.");
        }

53/51 Helper search method
        /** helper search method */
        private int findItem(Object key) throws InvalidKeyException {
            check(key);
            // division method compression map; floorMod keeps the index
            // non-negative even if hashValue returns a negative number
            int i = Math.floorMod(h.hashValue(key), N);
            int j = i; // remember where we are starting

54/51 Helper search method
            do {
                if (empty(i))
                    return -1; // item is not found
                if (available(i))
                    i = (i + 1) % N; // bucket is deactivated
                else if (h.isEqualTo(key(i), key)) // we have found our item
                    return i;
                else // we must keep looking
                    i = (i + 1) % N;
            } while (i != j);
            return -1; // item is not found
        }

55/51 Dictionary
        // methods of the dictionary ADT
        public Object findElement(Object key) throws InvalidKeyException {
            int i = findItem(key); // helper method for finding a key
            if (i < 0)
                return Dictionary.NO_SUCH_KEY;
            return element(i);
        }

56/51 Dictionary
        public void insertItem(Object key, Object element) throws InvalidKeyException {
            check(key);
            // division method compression map; floorMod keeps the index
            // non-negative even if hashValue returns a negative number
            int i = Math.floorMod(h.hashValue(key), N);
            int j = i; // remember where we are starting

57/51 Dictionary
            do {
                if (empty(i) || available(i)) { // this slot is available
                    A[i] = new Item(key, element);
                    n++;
                    return;
                }
                i = (i + 1) % N; // check next slot
            } while (i != j); // repeat until we return to start
            throw new HashTableFullException("Hash table is full.");
        }

58/51 Dictionary
        public Object removeElement(Object key) throws InvalidKeyException {
            int i = findItem(key); // find this key first
            if (i < 0)
                return Dictionary.NO_SUCH_KEY; // nothing to remove
            Object toReturn = element(i);
            A[i] = AVAILABLE; // mark this slot as deactivated
            n--;
            return toReturn;
        }
    }