Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 10 CS2013.

Similar presentations


Presentation on theme: "1 Lecture 10 CS2013."— Presentation transcript:

1 1 Lecture 10 CS2013

2 2 Hashing An object may contain an arbitrary amount of data, and searching a data structure that contains many large objects can be expensive suppose your set of Strings stores the text of various books, you are adding a book, and you need to make sure you are preserving the Set definition – ie that no book already in the Set has the same text as the one you are adding. A hash function maps each datum to a value of a fixed size. This reduces the search space and makes searching less expensive Hash functions must be deterministic, since when we search for an item we will search for its hashed value. If an identical item is in the list, it must have received the same hash value

3 3 Hashing Any function that maps larger data to smaller ones must map more than one possible original datum to the same mapped value Diagram from Wikipedia When more than one item in a collection receives the same hash value, a collision is said to occur. There are various ways to deal with this. The simplest is to create a list of all items with the same hash code, and do a sequential or binary search of the list if necessary

4 4 Hashing Hash functions often use the data to calculate some numeric value, then perform a modulo operation. This results in a bounded-length hash value. If the calculation ends in % n, the maximum hash value is n-1, no matter how large the original data is. It also tends to produce hash values that are relatively uniformly distributed, minimizing collisions. Many hash-based data structures apply a compression function with a second modulo operation to scale the hash codes to a desired size and to resize the map (ie create new, different-sized array) when desired to either save space or reduce collisions Read the book’s descriptions of various ways to create hash functions; the only details that are important now are that Hash functions usually involve one or two modulo operations for scaling Hash functions are designed to disperse the hash codes in a way that appears random in order to avoid collisions

5 Hash Functions A hash function is usually specified as the composition of two functions: Hash code: h1: keys  integers Compression function: h2: integers  [0, N- 1] The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x)) The goal of the hash function is to “disperse” the keys in an apparently random way © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 5

6 Sparse: Not Sparse: Rural addresses: Juror numbers: 2113 Quail Road
6 Hashing The more sparse the data, the more useful hashing is Sparse: Rural addresses: 1920 Quail Road 2113 Quail Road 3047 Quail Road 3310 Quail Road 4111 Quail Road Not Sparse: Juror numbers: Juror 1 Juror 2 Juror 3 Juror 4 Juror 5 Juror 6

7 7 Hashing A hash table uses a hash function to map data to array indices. Where the size of the array is N and the hash function is h, store an item with key k at index h(k)

8 Example We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x 1 2 3 4 9997 9998 9999 © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 8

9 9 Hashing This one uses a better hash function:

10 Collision Handling Collisions occur when different elements are mapped to the same cell Separate Chaining: let each cell in the table point to a linked list of entries (bucket) that map there 1 2 3 4 Separate chaining is simple, but requires additional memory outside the table © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 10

11 Map with Separate Chaining
Delegate operations to a list-based map at each cell: Algorithm get(k): return A[h(k)].get(k) Algorithm put(k,v): t = A[h(k)].put(k,v) if t = null then {k is a new key} n = n + 1 return t Algorithm remove(k): t = A[h(k)].remove(k) if t ≠ null then {k was found} n = n - 1 © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 11

12 Linear Probing Example: h(x) = x mod 13
Open addressing: the colliding item is placed in a different cell of the table Linear probing: handles collisions by placing the colliding item in the next (circularly) available table cell Each table cell inspected is referred to as a “probe” Colliding items lump together, causing future collisions to cause a longer sequence of probes Example: h(x) = x mod 13 Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order 1 2 3 4 5 6 7 8 9 10 11 12 41 18 44 59 32 22 31 73 1 2 3 4 5 6 7 8 9 10 11 12 © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 12

13 Search with Linear Probing
Consider a hash table A that uses linear probing get(k) We start at cell h(k) We probe consecutive locations until one of the following occurs An item with key k is found, or An empty cell is found, or N cells have been unsuccessfully probed Algorithm get(k) i  h(k) p  0 repeat c  A[i] if c =  return null else if c.getKey () = k return c.getValue() else i  (i + 1) mod N p  p + 1 until p = N return null © 2014 Goodrich, Tamassia, Godlwasser Hash Tables 13

14 Hashing Performance of most hash table operations:
14 Hashing Performance of most hash table operations: O(1) if the keys are perfectly distributed (no collisions) O(n) in the worst case (all the hash codes turn out the same, so that in every case we search a bucket list (separate chaining) or do a linear probe (open addressing)

15 15 Maps A List or array can be thought of as a set of key-value pairs in which the keys are integers (the indexes) and the values are the data being stored. Suppose we want to be able to look up values using a key other than an integer index. For example, we need to look up friends' addresses based on their names. We could write a class with instance variables for name and address and then construct a List or Set. When we need to look up an address, we iterate through the list looking for a match for the name of the person whose address we want to look up. Maps provide a simpler alternative by mapping keys of any type to values of any other type.

16 16 The Map Interface The Map interface contains methods that map keys to the elements. The keys serve the same function as array or lists indices. In List, the indexes are integer, but in Map, the keys can be any objects. An instance of Map represents a group of objects (values) and a group of keys. You get the object from a Map by searching for the key. Correspondingly, you supply the key when you add a value to the map. A Map is parameterized with two types, the type of the key and the type of the value. For example, a Map <Long, Student> would be convenient for looking up students by CIN

17 17 The Map Interface

18 18 Entry

19 19 HashMap and TreeMap The HashMap and TreeMap classes are two concrete implementations of the Map interface. HashMap is implemented with an underlying hash table and is efficient for locating a value, inserting a mapping, and deleting a mapping. TreeMap is implemented with a well- balanced Binary Search Tree (future week) and is efficient for traversing the keys in a sorted order.

20 20 HashMap Map<String, String> myDict = new HashMap<String, String>(); myDict.put("evacuate", "remove to a safe place"); myDict.put("descend", "move or fall downwards"); myDict.put("hypochondriac", "a person who is abnormally anxious about their health"); myDict.put("injunction", "an authoritative warning or order"); myDict.put("creek", "a stream, brook, or a minor tributary of a river"); myDict.put("googol", "10e100"); String defString = "The definition of "; System.out.println(defString + "descend : " + myDict.get("descend")); System.out.println(defString + "injunction : " + myDict.get("injunction")); System.out.println(defString + "googol : " + myDict.get("googol")); //

21 21 LinkedHashMap The entries in a HashMap are not ordered in a way that is typically useful to the programmer. LinkedHashMap extends HashMap with a linked list implementation that supports an ordering of the entries in the map. Entries in a LinkedHashMap can be retrieved in the order in which they were inserted into the map (known as the insertion order), or the order in which they were last accessed, from least recently accessed to most recently (access order). The no-arg constructor constructs a LinkedHashMap with the insertion order. LinkedHashMap<key type, value type>(initialCapacity, loadFactor, true).

22 Example: LinkedHashMap
22 Example: LinkedHashMap // adapted from public static void main(String args[]) { // Create a hash map LinkedHashMap<String, Double> lhm = new LinkedHashMap<String, Double>(); // Put elements to the map lhm.put("Zara", new Double( )); lhm.put("Mahnaz", new Double(123.22)); lhm.put("Ayan", new Double( )); lhm.put("Daisy", new Double(99.22)); lhm.put("Qadir", new Double(-19.08)); // Get a set of the entries Set<Entry<String, Double>> set = lhm.entrySet(); // Get an iterator Iterator<Entry<String, Double>> i = set.iterator(); // Display elements while (i.hasNext()) { Entry<String, Double> me = i.next(); System.out.print(me.getKey() + ": "); System.out.println(me.getValue()); } System.out.println(); // Deposit 1000 into Zara's account double balance = lhm.get("Zara").doubleValue(); lhm.put("Zara", new Double(balance )); System.out.println("Zara's new balance: " + lhm.get("Zara"));

23 TreeMap 23 package demos; import java.util.TreeMap;
public class Demo { //adapted from public static void main(String[] args) { TreeMap<Integer, String> tMap = new TreeMap<Integer, String>(); // inserting data in alphabetical order by entry value tMap.put(6, "Friday"); tMap.put(2, "Monday"); tMap.put(7, "Saturday"); tMap.put(1, "Sunday"); tMap.put(5, "Thursday"); tMap.put(3, "Tuesday"); tMap.put(4, "Wednesday"); // data ends up sorted by key value System.out.println("Keys of tree map: " + tMap.keySet()); System.out.println("Values of tree map: " + tMap.values()); System.out.println("Key: 5 value: " + tMap.get(5) + "\n"); System.out.println("First key: " + tMap.firstKey() + " Value: " + tMap.get(tMap.firstKey()) + "\n"); System.out.println("Last key: " + tMap.lastKey() + " Value: " + tMap.get(tMap.lastKey()) + "\n");

24 24 TreeMap System.out.println("All values with an enhanced for loop: "); for(String s: tMap.values()) System.out.println(s); System.out.println("First three values: "); for(String s: tMap.headMap(4).values()) System.out.println("Values starting with fourth value: "); for(String s: tMap.tailMap(4).values()) System.out.println("Removing first datum: " + tMap.remove(tMap.firstKey())); System.out.println("Now the tree map Keys: " + tMap.keySet()); System.out.println("Now the tree map contain: " + tMap.values() + "\n"); System.out.println("Removing last entry: " + tMap.remove(tMap.lastKey())); System.out.println("Now the tree map contain: " + tMap.values()); }

25 Counting the Occurrences of Words in a Text
25 Counting the Occurrences of Words in a Text This program counts the occurrences of words in a text and displays the words and their occurrences in ascending alphabetical order. The program uses a hash map to store a pair consisting of a word and its count. Algorithm for handling input: For each word in the input file Remove non-letters (such as punctuation marks) from the word. If the word is already present in the frequencies map Increment the frequency for that entry. Else Insert an entry with frequency = 1 To sort the map, convert it to a tree map.   Source: Liang

26 26 package frequency; import java.io.File;
import java.io.FileNotFoundException; import java.util.Map; import java.util.Scanner; import java.util.TreeMap; public class WordFrequency { public static void main(String[] args) throws FileNotFoundException { Map<String, Integer> frequencies = new TreeMap<String, Integer>(); // in your labs, ask the user for the path to the input file; don’t hardcode it like this. Scanner in = new Scanner(new File("romeojuliet.txt")); while (in.hasNext()) { String word = clean(in.next()); // Get the old frequency count if (word != "") { Integer count = frequencies.get(word); // If there was none, put 1; otherwise, increment the count if (count == null) { count = 1; } else { count = count + 1; } frequencies.put(word, count); // Print all words and counts for (String key : frequencies.keySet()) { System.out.println(key + ": " + frequencies.get(key));

27 27 /** * Removes non-letter characters from a String * * @param s
a string with all the letters from s */ public static String clean(String s) { String r = ""; for (int i = 0; i < s.length(); i++) { char c = s.charAt(i); if (Character.isLetter(c)) { r = r + c; } return r.toLowerCase();


Download ppt "1 Lecture 10 CS2013."

Similar presentations


Ads by Google