LECTURE 34: MAPS & HASH CSC 212 – Data Structures.

Entry ADT
• Entry ADT represents searchable data
• Two methods declared in Entry: key() & value()
• Entry implementations need key & value fields
• Each Entry instance holds a single key-value pair
• setValue() also included in most implementations
• Does NOT define setKey()
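The Entry ADT above could be sketched in Java roughly as follows. The generic parameters K and V, the class names SimpleEntry and EntryDemo, and the choice to have setValue() return the old value are assumptions made for illustration, not the course's actual code.

```java
// Demo class comes first so the file can be run directly.
class EntryDemo {
    public static void main(String[] args) {
        SimpleEntry<Integer, String> e = new SimpleEntry<>(9, "c");
        System.out.println(e.key() + " -> " + e.value());
        e.setValue("cc");                 // value may change; key may not
        System.out.println(e.key() + " -> " + e.value());
    }
}

// The two methods the slide says Entry declares.
interface Entry<K, V> {
    K key();    // the key used when searching
    V value();  // the data associated with that key
}

// Hypothetical implementation holding a single key-value pair.
class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;  // final: there is no setKey()
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public K key() { return key; }

    @Override
    public V value() { return value; }

    // Replaces the value and returns the previous one (an assumed convention).
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Making the key field final is one way to enforce the "no setKey()" rule at compile time.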

Sequence:Element::Map:___
• Sequence is a collection of elements
• Many implementations possible for this ADT
• All of them could hold a number of elements
• Map defines a collection of Entry objects
• Possible to have many implementations of Map
• Entry objects are stored in each of these implementations
[Figure: view of the Map, holding the Entry objects (9, “c”), (11, “xd”), (1, “ab”), (-4, “dc”)]

Sequence:Element::Map:___ (continued)
[Figure: view of the Sequence used by the Map; its Positions hold as elements the Entry objects (9, “c”), (11, “xd”), (1, “ab”), (-4, “dc”)]

Lessons from Polly…
1. When searching, key get(s) value
2. Each key is unique & has at most 1 value
3. Failed search is the usual case, not an exceptional one
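These three lessons can be seen with Java's standard java.util.HashMap, used here only as a stand-in for the Map ADT of the lecture. The class name MapLessons and the sample keys are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class MapLessons {
    // Builds a small map; key 9 is stored twice on purpose.
    static Map<Integer, String> sample() {
        Map<Integer, String> map = new HashMap<>();
        map.put(9, "c");
        map.put(11, "xd");
        map.put(9, "cc");  // same key again: replaces "c", adds no new entry
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = sample();
        System.out.println(map.get(9));   // lesson 1: key gets value -> cc
        System.out.println(map.size());   // lesson 2: keys are unique -> 2
        System.out.println(map.get(42));  // lesson 3: miss returns null, no exception
    }
}
```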

Map Performance
• In all seriousness, can be a matter of life or death
• 911 operators immediately need addresses
• Google’s search performance in TB/s
• O(log n) time too slow for these uses
• Would love to use arrays
• Get O(1) time to add, remove, or look up data
• This HUGE array needs massive RAM purchase

Monster Amounts of RAM
• Java requires ints be used as array indices
• Unfortunately, int and RAM have limits
• Integer.MAX_VALUE = 2,147,483,647
• Items in Google index = ~8,200,000,000 (2005)
• Possible phone numbers = 10,000,000,000
• Enabling O(1) array use requires we do more
• As with all life’s problems, we turn to hash
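A quick check of the arithmetic behind these limits, as a rough sketch. The class name ArrayLimits is hypothetical, and the 8 bytes per array slot is an assumption (reference size depends on the JVM), so the memory figure is only an estimate.

```java
class ArrayLimits {
    // Counts quoted on the slide; they need long because they exceed int.
    static final long GOOGLE_INDEX_2005 = 8_200_000_000L;
    static final long PHONE_NUMBERS = 10_000_000_000L;

    // True if one slot per item could still be addressed by an int index.
    static boolean fitsInIntIndex(long slots) {
        return slots <= Integer.MAX_VALUE;
    }

    // Approximate size in GiB of an array of references,
    // ASSUMING 8 bytes per reference.
    static double referenceArrayGiB(long slots) {
        return slots * 8.0 / (1L << 30);
    }

    public static void main(String[] args) {
        System.out.println(fitsInIntIndex(GOOGLE_INDEX_2005)); // false
        System.out.println(fitsInIntIndex(PHONE_NUMBERS));     // false
        System.out.printf("%.1f GiB%n", referenceArrayGiB(PHONE_NUMBERS));
    }
}
```

Under that assumption, one slot per possible phone number comes to roughly 75 GiB before a single Entry is stored.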

Hashing To The Rescue
• Hash function turns key into an int from 0 to N-1
• Result is usable as index for an array
• Function is specific to the key type; cannot be reused
• Store the Entry objects in an array: a HASH TABLE
• (Great name for shop in Amsterdam, too)
• Compute index with hash function
• Entry stored in array at that index
• If computing the hash takes O(1) time
• Could get an Entry in O(1) time
• Adding & removing in O(1) time, too

Hash Table Example
• Table is array of Entry
• Simple hash function is h(x) = x mod 10,000
• Key used is x
• h(x) is Entry’s index
• Always mod array length
• Not all locations used
• Holes can appear in array
• Empty slots left null
[Figure: hash table with slots up to index 9999 holding the Entry objects “Jay Doe”, “Bob Doe”, “Jill Roe”, and “Rhi Smith”]
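A minimal sketch of such a table, assuming int keys and String values. The names SimpleHashTable and Slot are hypothetical, the sample key is made up, and collisions are deliberately ignored here: a second key that hashes to an occupied index simply overwrites it.

```java
class SimpleHashTable {
    private static final int N = 10_000;  // table length from the slide

    // One key-value pair; plays the role of an Entry.
    private static class Slot {
        final int key;
        final String value;
        Slot(int key, String value) { this.key = key; this.value = value; }
    }

    private final Slot[] table = new Slot[N];  // unused slots stay null

    // h(x) = x mod N; floorMod keeps the index non-negative for negative keys.
    static int h(int x) {
        return Math.floorMod(x, N);
    }

    // Stores the pair at the key's index (overwrites on collision).
    void put(int key, String value) {
        table[h(key)] = new Slot(key, value);
    }

    // Returns the value for key, or null if its slot is empty
    // or currently holds a different key.
    String get(int key) {
        Slot s = table[h(key)];
        return (s != null && s.key == key) ? s.value : null;
    }

    public static void main(String[] args) {
        SimpleHashTable t = new SimpleHashTable();
        t.put(120_012, "Jay Doe");             // lands at index 12
        System.out.println(h(120_012));        // 12
        System.out.println(t.get(120_012));    // Jay Doe
        System.out.println(t.get(7));          // null (a hole in the array)
    }
}
```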

What Hash Does
• Implement Map with a hash table
• Given a key, easily look up its Entry
• Always computes same index for that key
• Hash must be computed on each access
• O(1) efficiency of array utilized
• But is wasted if hash is slow
• Spreads out Entry objects, ideally
• Want to use entire hash table

Bad Hash
• h(x) = 0: fast, repeatable, little use of table
• h(x) = random.nextInt(): fast, not repeatable, uses entire table
• h(x) = current index -or- free index: slow, repeatable, uses entire table
• h(x) = x x x x 31 …: moderate, repeatable, but too large
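One way to see "little use of table" is to count how many distinct slots a hash function actually reaches. The sketch below is an illustration only: the class name HashSpread, the helper slotsUsed, the table length of 100, and the 1,000 sample keys are all assumptions.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

class HashSpread {
    // Counts the distinct slots of a length-n table that the hash function
    // reaches for the keys 0 .. keyCount-1.
    static int slotsUsed(IntUnaryOperator hash, int keyCount, int n) {
        boolean[] used = new boolean[n];
        int count = 0;
        for (int key = 0; key < keyCount; key++) {
            int index = Math.floorMod(hash.applyAsInt(key), n);
            if (!used[index]) {
                used[index] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(slotsUsed(x -> 0, 1000, 100));  // 1: every key in slot 0
        System.out.println(slotsUsed(x -> x, 1000, 100));  // 100: whole table used

        // A random "hash" spreads keys out but is not repeatable:
        // asking twice about key 5 almost never gives the same answer.
        Random random = new Random();
        IntUnaryOperator randomHash = x -> random.nextInt();
        System.out.println(randomHash.applyAsInt(5) == randomHash.applyAsInt(5));
    }
}
```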

Really Bad Hash
• Using only part of the key
• Inevitably, you will guess wrong
[Figure labels: “Portion of key that matters” and “Use this portion of this key”]

Good Hash
• Hash first turns key into int
• Easy to do for numbers, at least
• For a String, could add value of each character
• Would hash “spot”, “pots”, “stop” to the same index
• Instead use a polynomial code, evaluated with Horner’s method: $x_0 a^{k-1} + x_1 a^{k-2} + \dots + x_{k-2} a^{1} + x_{k-1}$
• Example: “spot” = $\text{‘s’} \cdot a^{3} + \text{‘p’} \cdot a^{2} + \text{‘o’} \cdot a^{1} + \text{‘t’}$
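The difference between the additive code and the polynomial code can be sketched as below. The class and method names are hypothetical, characters are taken at their char values, and the polynomial is evaluated left to right with Horner's method so no explicit powers of a are needed.

```java
class PolynomialHash {
    // Adds the character values: any anagram gets the same code.
    static int additiveHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Horner's method for x0*a^(k-1) + x1*a^(k-2) + ... + x(k-1):
    // one multiply and one add per character.
    static int polynomialHash(String s, int a) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result = result * a + s.charAt(i);  // may overflow int and wrap around
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"spot", "pots", "stop"}) {
            // The additive code is 454 for all three; the polynomial codes differ.
            System.out.println(word + ": " + additiveHash(word)
                    + " vs " + polynomialHash(word, 33));
        }
    }
}
```

Because character position now affects the result, the three anagrams no longer share a code. For long keys the int wraps around, which is harmless as long as the final index is made non-negative.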

Compression
• Hash’s only use is computing array indices
• Useless if larger than table’s length: no index exists!
• “spot” = 4,258,502 when a = 33 (character codes ‘s’ = 115, ‘p’ = 112, ‘o’ = 111, ‘t’ = 116)
• “triskaidekaphobia” = too big for my calculator
• Instead use modulus (%) to compress result: result = ((result % length) + length) % length
• Remember that modulus returns the remainder, which Java leaves negative when result is negative; adding length and taking % again fixes that
• Keeps result within array (just like array-based queue)
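A compression step might look like the sketch below. It leans on Math.floorMod (available since Java 8), which always returns a value from 0 to length-1 for a positive length, whereas a plain % can return a negative remainder when the hash code is negative. The class name Compression and the sample values are assumptions.

```java
class Compression {
    // Maps any int hash code, including a negative one, into 0 .. length-1.
    static int compress(int hashCode, int length) {
        return Math.floorMod(hashCode, length);
    }

    public static void main(String[] args) {
        System.out.println(123_456_789 % 10_000);           // 6789
        System.out.println(compress(123_456_789, 10_000));  // 6789

        // A negative hash code (e.g. after int overflow) shows the difference.
        System.out.println(-7 % 5);                         // -2: not a valid index
        System.out.println(compress(-7, 5));                // 3
    }
}
```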

Collisions
• Occurs when 2 keys hash to same index
• Ideal hash spreads keys out evenly across table
• As much as possible, this limits collisions
• Small table size important also, since RAM is limited
• Unfortunately, there is no such thing as an ideal hash
• Must handle collisions if you want it to work
• Ultimately, this could kill our O(1) efficiency buzz

Bucket Arrays
• Make hash table an array of linked-list Nodes
• Each array location references the first Node of a linked list
• Whenever we have a collision, we “chain” the Entry objects
• Create new Node that stores the Entry
• The linked list will have the new Node at its front
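Chaining as described above could be sketched like this, assuming int keys, String values, and a simple mod hash. ChainedHashTable and its Node class are hypothetical names, and each Node stores the key and value directly rather than wrapping a separate Entry object.

```java
class ChainedHashTable {
    // One node of a bucket's singly linked list; holds one key-value pair.
    private static class Node {
        final int key;
        String value;
        Node next;

        Node(int key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets;  // each slot references the first Node of a chain

    ChainedHashTable(int length) {
        buckets = new Node[length];
    }

    private int index(int key) {
        return Math.floorMod(key, buckets.length);
    }

    // Adds or replaces the value for key; returns the old value or null.
    String put(int key, String value) {
        int i = index(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key == key) {          // key already present: replace value
                String old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]);  // new Node goes at the front
        return null;
    }

    // Walks the chain at the key's index; returns null if the key is absent.
    String get(int key) {
        for (Node n = buckets[index(key)]; n != null; n = n.next) {
            if (n.key == key) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(10);
        table.put(3, "a");
        table.put(13, "b");   // collides with 3: chained in bucket 3
        System.out.println(table.get(3) + " " + table.get(13));  // a b
    }
}
```

Inserting at the front keeps the link step itself constant-time; the scan for an existing key before it is what grows with the length of the chain.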

Bucket Arrays
• But what if we have a really bad hash?
• Hashes to same index in every situation
• All Entry objects now found in a single linked list
• O(n) execution times would now be required

Before Next Lecture…
• Continue week #12 assignment
• Due at usual time, whatever that may be
• Read sections – of the book
• Examine better approaches to handling collisions
• Consider what we should do in following situation: