CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p 515-522 (Tables)] [Chapter 12 p 598-604 (Hashing)]

Slides:



Advertisements
Similar presentations
Space-for-Time Tradeoffs
Advertisements

CSCE 3400 Data Structures & Algorithm Analysis
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Data Structures Using C++ 2E
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
Hashing Techniques.
Hashing CS 3358 Data Structures.
Maps, Dictionaries, Hashtables
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Lecture 11 March 5 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
COSC 2007 Data Structures II
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
CS 221 Analysis of Algorithms Data Structures Dictionaries, Hash Tables, Ordered Dictionary and Binary Search Trees.
Hash Table March COP 3502, UCF.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Hashing Dr. Yingwu Zhu.
Appendix E-A Hashing Modified. Chapter Scope Concept of hashing Hashing functions Collision handling – Open addressing – Buckets – Chaining Deletions.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
LECTURE 34: MAPS & HASH CSC 212 – Data Structures.
Storage and Retrieval Structures by Ron Peterson.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
Data Structures and Algorithms Lecture (Searching) Instructor: Quratulain Date: 4 and 8 December, 2009 Faculty of Computer Science, IBA.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing Fundamental Data Structures and Algorithms Margaret Reid-Miller 18 January 2005.
Hashing, Hashing Tables Chapter 8. Class Hierarchy.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Copyright © Curt Hill Hashing A quick lookup strategy.
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Week 9 - Monday.  What did we talk about last time?  Practiced with red-black trees  AVL trees  Balanced add.
1 i206: Lecture 12: Hash Tables (Dictionaries); Intro to Recursion Marti Hearst Spring 2012.
Hash Tables ADT Data Dictionary, with two operations – Insert an item, – Search for (and retrieve) an item How should we implement a data dictionary? –
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
CSC 212 – Data Structures Lecture 28: More Hash and Dictionaries.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
CSE373: Data Structures & Algorithms Lecture 6: Hash Tables
Hashing CS2110 Spring 2018.
Hash Table.
Hashing CS2110.
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Presentation transcript:

CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]

CSC 143T 2 Looking Up Data  A common pattern in many programs is to look up data  Find student record, given ID#  Find phone #, given person’s  A CS example: a compiler’s “symbol table” look up identifier to find its type, location, etc.  Because it is so common, many data structures for it have been investigated  We could use arrays, linked lists, general trees, binary search trees, etc.  Could also step back and consider look-up as an abstract operation

CSC 143T 3 Table ADT  Characteristics of table ADT type  Set of key/value pairs  No duplicate keys  Operations on tables  retrieve value from key  insert value at key  delete key and associated value from table  Uses:  Phone book, class roster, book index, databases  Sometimes also called a dictionary or map

CSC 143T 4 Table Terminology  Key: Portion of pair used to look up data, like an index (aka domain value)  Value: Portion of pair that contains data (aka range value) aTable : “ K” “ E” “24601JVJ” 23,440 “994802WE” “ A” Key (employee ID)Value (salary) 27,640 15,203 45,210 28,776

CSC 143T 5 Example of Operations // phone book example: name is key, // phone# is value Table pb; pb.insert(“Sarah”, ); pb.insert(“Richard”, ); pb.insert(“Bart”, ); int bartsNumber = pb.retrieve(“Bart”); pb.remove(“Richard”);

CSC 143T 6 Structures for Efficient Retrieval  In many Table applications, retrieve is the most common operation  So retrieve needs to be efficient  Many different data structures are possible  Array (sorted or unsorted?), List, Binary tree, Binary search tree. But not… stack, queue  Footnote: Some languages (Perl, Javascript) contain a built-in “associative array” construct  With a sorted array and binary search, retrieve would be O(log N)  Same for binary search tree find

CSC 143T 7 Can we do better than O(log N)?  Answer: Yes … sort of, if we're lucky.  General idea: take the key of the data record you’re inserting, and use that number directly as the item number in a list (array).  Example:  Assume you want quick access to a table of your friends. All of them have unique social security numbers (in the range to ).  If you had an array with 1,000,000,000 elements, each friend could be instantly located in the array by their social security number.  What’s wrong with the above scheme?

CSC 143T 8 Hash Functions  Basic idea:  Don't use the data value directly.  Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index (“bucket”) in the array. 0 1 h(x) B-1 x h

CSC 143T 9 Hashing and Tables  Hashing gives us another implementation of Table ADT  Hash the key; this gives an index; use it to find the value stored in the table  If this scheme worked, it would be O(1)  Great improvement over Log N.  Main problems  Finding a good hash function  Collisions  Wasted space in the table

CSC 143T 10 An Apparent Sidetrack  Problem to solve: Given a list of n integers, determine if there is a pair of duplicate values

CSC 143T 11 Element Uniqueness: Two Solutions 1. Nested loop: for each element in the array, scan the rest of the array to see if there is a duplicate.  O(n 2 ) 2. Sort the data. Then scan array (once) for duplicates.  O(nlog n) time to sort, O(n) to scan.  Anything simpler??  Any solution that looks like hashing?

CSC 143T 12  Step 1: Assign to buckets, based on value mod Element Uniqueness (2)

CSC 143T 13  Step 2: Look inside each bucket for duplicates Element Uniqueness (3)

CSC 143T 14 Hashing Functions  The hash function we choose depends on the type of the key field (the key we use to do our lookup).  Finding a good one can be hard  Example:  Student Ids (integers) h(idNumber) = idNumber % B eg. h(678921) = % 100 = 21  Names (char strings) h(name) = (sum over the ascii values) % B eg. h(“Bill”) = ( ) % 100 = 87

CSC 143T 15 Collisions  Collisions occur when multiple items are mapped to same cell h(idNumber) = idNumber % B h(678921) = 21 h(354521) = 21  Issues  number of buckets vs number of data items  Choice of hash function  With a bad choice of hash function we can have lots of collisions  Even with a good choice of hash functions there may be some collisions

CSC 143T 16 Collision Resolution  One strategy: Bucket hashing (aka open hashing)  Each cell (bucket) in the array contains a linked list of items: 0 1 h(x) B-1 AJK Q XB x h Many other solutions have been studied

CSC 143T 17 Analysis of hash table ops  Insert is easy to analyze:  It is just the cost of calculating the hash value O(1), plus the cost of inserting into the front of a linked list O(1)  Retrieve and Delete are harder. To do the analysis, we need to know:  The number of elements in the table (N)  The number of buckets (B)  The quality of the hash function

CSC 143T 18 Hashing Analysis (2)  We’ll assume that our hash function distributes items evenly through the table, so each bucket contains N/B items (N/B is called the load factor)  On average, doing a lookup or a deletion is O(N/B) (Which is O(1), if N/B is constant)  Using a good hash function and keeping B large with respect to N, we can guarantee constant time insertion, deletion, and lookup  Note that this means growing (rehashing) the hash table as more items are inserted.

CSC 143T 19 Open vs. Closed Hashing  Open hashing uses linked lists for the buckets  What is closed hashing??  All data is stored in the array  If there is a collision, the next available cell is used  Avoids overhead of linked lists and works well in practice  Example: hash the following into a table of size 10 17, 23, 47, 52, 71, 86, 63,

CSC 143T 20 Dynamic Hashing  Another implementation concept  As number of stored records increases, dynamically increase the number of buckets  Brute force: make a new (larger) array and copy (rehash) all the data to it  More subtle implementations are also possible

CSC 143T 21 Hashing and Files  We’ve spoken of the hashed data as being stored in an array (in memory)  Hashing is also very appropriate for disk files  Efficient look-up techniques for disk data are essential  Disks are thousands of times slower than memory  Even a LogN look-up algorithm is too slow for a database application!

CSC 143T 22 Drawbacks to Hashing  Finding a good hash function  Small risk of bad behavior  Dealing with collisions  Simplest method to use linked list for buckets  Wasted space in the array  Not a big deal if memory is cheap

CSC 143T 23 Summary  Hash tables are specialized for dictionary operations: Insert, Delete, Lookup  Principle: Turn the key field of the record into a number, which we use as an index for locating the item in an array.  O(1) in the ideal case; less in practice  Problems: collisions, wasted space  Implementations: open hashing, closed hashing, dynamic hashing  Highly suitable for database files, too