1 i206: Lecture 12: Hash Tables (Dictionaries); Intro to Recursion Marti Hearst Spring 2012.

Slides:



Advertisements
Similar presentations
CSCI 2720 Hashing   Spring 2005.
Advertisements

The Dictionary ADT Definition A dictionary is an ordered or unordered list of key-element pairs, where keys are used to locate elements in the list. Example:
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
© 2004 Goodrich, Tamassia Hash Tables1  
Dictionaries1 © 2010 Goodrich, Tamassia m l h m l h m l.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Maps. Hash Tables. Dictionaries. 2 CPSC 3200 University of Tennessee at Chattanooga – Summer 2013 © 2010 Goodrich, Tamassia.
TTIT33 Algorithms and Optimization – Lecture 5 Algorithms Jan Maluszynski - HT TTIT33 – Algorithms and optimization Lecture 5 Algorithms ADT Map,
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Spring 2003 ECE569 Lecture ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Data Structures Hash Table (aka Dictionary) i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,
Hashing General idea: Get a large array
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 14: Intro to Recursion.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
CS 221 Analysis of Algorithms Data Structures Dictionaries, Hash Tables, Ordered Dictionary and Binary Search Trees.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
Hash Table March COP 3502, UCF.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Data Structures Week 6 Further Data Structures The story so far  We understand the notion of an abstract data type.  Saw some fundamental operations.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Final Review Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2010.
Information and Computer Sciences University of Hawaii, Manoa
Computer Science Department Data Structure & Algorithms Lecture 8 Recursion.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
Hash Tables1   © 2010 Goodrich, Tamassia.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing as a Dictionary Implementation Chapter 19.
CS201: Data Structures and Discrete Mathematics I Hash Table.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
ISOM MIS 215 Module 5 – Binary Trees. ISOM Where are we? 2 Intro to Java, Course Java lang. basics Arrays Introduction NewbieProgrammersDevelopersProfessionalsDesigners.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
COMP 103 Course Review. 2 Menu  A final word on hash collisions in Open Addressing / Probing  Course Summary  What we have covered  What you should.
Lecture14: Hashing Bohyung Han CSE, POSTECH CSED233: Data Structures (2014F)
Hashing CSE 2011 Winter July 2018.
Hashing Exercises.
Dictionaries Dictionaries 07/27/16 16:46 07/27/16 16:46 Hash Tables 
© 2013 Goodrich, Tamassia, Goldwasser
Hash Tables 3/25/15 Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and M.
i206: Lecture 13: Recursion, continued Trees
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
Maps.
CS210- Lecture 17 July 12, 2005 Agenda Collision Handling
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Presentation transcript:

1 i206: Lecture 12: Hash Tables (Dictionaries); Intro to Recursion Marti Hearst Spring 2012

2 Outline What is a data structure Basic building blocks: arrays and linked lists Data structures (uses, methods, performance): –List, stack, queue –Dictionary –Tree –Graph

3 Dictionary Also known as hash table, lookup table, associative array, or map A searchable collection of key-value pairs –each key is associated with one value Many possible applications, e.g., –Address book –Student record database –Routing tables –…

4 Slide adapted from lecture by Andreas Veneris Hash Tables We want a data structure that, given a collection of n keys, implements the dictionary operations Insert(), Delete() and Search() efficiently. Binary search trees: can do that in O(log n) time and are space efficient. Arrays: can do this in O(1) time but they are not space efficient. Hash Tables: A generalization of an array that under some reasonable assumptions is O(1) for Insert/Delete/Search of a key

5 Slide adapted from lecture by Andreas Veneris How can you store all Social security numbers in an array and have O(1) access? –Use an array with range ,999,999 –This will give you O(1) access time but … –…considering there are approx. 32,000,000 people in Canada you waste 1,000,000,000-32,000,000 array entries! Problem: The range of key values we are mapping is too large (0-999,999,999) when compared to the # of keys (American citizens) So an array is an inefficent use of space in this example. Why Not Arrays?

6 Dictionary Methods get(k): if the dictionary has an entry with key k, return its associated value; else return null put(k,v): insert entry (k,v) into the dictionary; if key is not already in dictionary, return null; else return old value associated with k remove(k): if dictionary has an entry with key k, remove it from dictionary and return its associated value; else return null size() isEmpty()

7 Dictionary in Python

8 Desirable Properties Fast search, insert and delete (get, put, and remove) Efficient space usage

9 Hash Table A generalization of an array that, if properly designed, can realize fast insert/delete/search operations with good space efficiency –average run-time of O(1) –worst case run-time of O(n) A hash table is: –Good for storing and retrieving key-value pairs –Not good for iterating through a list of items

10 Hash Table A hash table consists of two major components: –Bucket array (for storing entries) –Hash function (for mapping keys to buckets) Obj5 key=1 obj1 key=15 Obj4 key=2 Obj2 key= table Obj3 key=4 buckets hash value/index

11 Hash Table Design Bucket array is an array A of size N, where each cell of A is considered a “ bucket ” Hash function h maps each key k to an integer in the range [0, N -1] Store entry (k,v) in the bucket A[h(k)] Search for entry (k,v) in the bucket A[h(k)] Choose hash function such that –Can take arbitrary objects (keys) as input –Hash computation is fast –Key mappings evenly distributed across [0, N -1]

12 Example Hash Function h(k) = k mod N mod stands for modulo, the remainder of the division of two numbers. For example: –8 mod 5 = 3 –9 mod 5 = 4 –10 mod 5 = 0 –15 mod 5 = 0 Observe that collisions are possible: –two different keys hash to the same value

13 Example h(k) = k mod key value Insert (2,x) 2 x key value Insert (21,y) 2 x 21 y key value Insert (34,z) 2 x 21 y 34 z Insert (54,w) There is a collision at array entry #4 ???

14 Slide adapted from lecture by Andreas Veneris Dealing with Collisions Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entry other key key data Insert CHAIN Insert

15 Slide adapted from lecture by Andreas Veneris Hashing with Chaining The problem is that keys 34 and 54 hash in the same entry (4). We solve this collision by placing all keys that hash in the same hash table entry in a LIFO list (chain or bucket) pointed by this entry: other key key data Insert CHAIN Insert

16 Dictionary Performance  What is the run time for insert/search/delete? -Insert: It takes O(1) time to compute the hash function and insert at head of linked list -Search: It is proportional to the length of the linked list -Delete: Same as search key value Insert (54,w) 2 x 21 y 54 w34 z CHAIN Insert (101,x) 21 y 2 x 101 x 54 w34 z

17 Load Factor Average length of the linked lists is a function of the load factor –Load factor = number of items in hash table / array size In Python, the implementation details are transparent to the programmer (load factor kept under 2/3) In Java, the programmer can set the initial table size (default=16) and/or the load factor (default = 0.75)

18 Slide adapted from lecture by Hector Garcia- Molina Rule of thumb: Try to keep space utilization (load factor) between 50% and 80% Load factor = _ # keys used___ total # slots in table If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

19 Slide adapted from lecture by Andreas Veneris Choosing a Hash Function The performance of the hash table depends on a having a hash function which evenly distributes the keys. Choosing a good hash function requires taking into account the kind of data that will be used. –The statistics of the key distribution needs to be accounted for. –E.g., Choosing the first letter of a last name will cause problems depending on the nationality of the population Most programming languages (including java) have hash functions built in.

20 Recursion An algorithmic technique in which a function, in order to accomplish a task, calls itself with some part of the task.

21 Recursion A method that invokes itself Examples: –GNU stands for: GNU’s Not Unix (part of its own definition!) –A dog is a Collie if its name is Lassie or its mother was a Collie. Start with a given dog. Is its name Lassie? If so, done. Else is its mother a Collie? –Is the mother named Lassie? If so, done. –Else is its mother a Collie? –… and so on.

22 Recursion vs. Loops We could use recursion as an alternative to a loop for our various summation problems. This does not reduce (or increase) the O(g(n)) of the problem.

23 Recursion Example Let’s illustrate with a mother-child relationship. Definition: A dog is a collie if its name is Lassie or if its mother is Lassie, or if its mother’s mother is Lassie, or … if any of its female ancestors is Lassie.

24 Recursion Example Definition: A dog is a collie if its name is Lassie or if its mother is Lassie, or if its mother’s mother is Lassie, or … if any of its female ancestors is Lassie.

25 Recursion Methods – Recursive methods need: A base case / terminating condition (or else it doesn’t halt) A body (containing the recursive call) Base cases Recursive step

26

27 Recursion & the Runtime Stack Each call of the recursive method is placed on the stack. When we finally reach the base case, all the frames are popped off of the stack. –Say Grover’s mother is Fifi Rover’s mother is Fifi too Fifi’s mother is Fido Fido’s mother is Lassie –Call the isCollie() method starting with Grover Push the frame associated with each dog’s method call on the stack When we reach Lassie, we return “true” This value gets passed as the result from the 4 th frame to the 3 rd frame The 3 rd frame passes this to the 2 nd frame The 2 nd frame passes this to the 1 st frame And “true” is the final result.

28 From Goodrich & Tamassia

29 Methods and the Operating System’s Runtime Stack Each time a method is called –A new copy of the method ’ s data is pushed on the stack This data includes local variables –This applies to all methods, not only recursive ones The “ return ” command does two things: –Pops the data for the method off the stack Local variables disappear –Passes the return value to the calling method If there is no return statement –The method is popped –Nothing is passed to the calling method

30 Summary: Recursion Recursion is a useful problem-solving technique –We will see it again when we study trees and sorting But it takes some getting used to. To help: –Remember that each method call gets put on the stack until the base case is found; then the calls are popped off the stack. –You HAVE TO understand basic programming concepts like Method calls Returning from methods Passing parameters