CSC 172 DATA STRUCTURES

SETS and HASHING
- Unadvertised in-store special: SETS!
- In Java, see Weiss 4.8
- Simple idea: characteristic vector
- HASHING... the main event

Representation of Sets
- List: simple, O(n) dictionary operations
- Binary search trees: O(log n) average time; support range queries and sorting
- Characteristic vector: O(1) dictionary ops, but limited to small sets
- Hash table: O(1) average for dictionary ops; tricky to expand, no range queries

Characteristic Vectors
Boolean strings whose positions correspond to the members of some fixed "universal" set. A "1" in a location means that the element is in the set; a "0" means that it is not.
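A minimal sketch of the idea, assuming the universal set is {0, 1, ..., n-1} so that an element can index the vector directly. The class name `CharacteristicVector` and its methods are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a set over the universal set {0, ..., n-1}.
// Element x is in the set exactly when bits[x] is true ("1").
class CharacteristicVector {
    private final boolean[] bits;

    CharacteristicVector(int universeSize) {
        bits = new boolean[universeSize];
    }

    void insert(int x)      { bits[x] = true; }   // put a "1" at position x
    void delete(int x)      { bits[x] = false; }  // put a "0" at position x
    boolean contains(int x) { return bits[x]; }

    // Listing the members means scanning every position of the universal set.
    List<Integer> members() {
        List<Integer> result = new ArrayList<>();
        for (int x = 0; x < bits.length; x++) {
            if (bits[x]) result.add(x);
        }
        return result;
    }

    public static void main(String[] args) {
        CharacteristicVector s = new CharacteristicVector(10);
        s.insert(2);
        s.insert(7);
        s.delete(2);
        System.out.println(s.members()); // prints [7]
    }
}
```

Each dictionary operation touches a single position, which is why they are O(1); the price is one slot per member of the universal set, whether or not it is in the set.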

MUSIC THEORY
- A chord is a set of notes played at the same time
- Represented by a 12-bit vector called a "pitch class", with positions {B, A#, A, G#, G, F#, F, E, D#, D, C#, C}
- 000010010001 (C, E, G) represents C major
- 000010001001 (C, D#, G) represents C minor
- Rotation is "transposition"
- Bit reversal is "inversion"
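The slide's operations map directly onto integer bit tricks. This is a sketch assuming pitch class p (C = 0, C# = 1, ..., B = 11) is stored in bit p, so the bits read left to right as B down to C, as on the slide. The class `PitchClassSet` and its helpers are hypothetical names; `Integer.reverse` and `Math.floorMod` are standard Java methods.

```java
// Hypothetical class: chords as 12-bit vectors, bit p = pitch class p.
class PitchClassSet {
    static final String[] NAMES =
        {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};
    static final int MASK = 0xFFF; // keep only the low 12 bits

    // Build a vector from pitch-class numbers.
    static int of(int... pitchClasses) {
        int v = 0;
        for (int p : pitchClasses) v |= 1 << p;
        return v;
    }

    // Transposing up k semitones moves bit p to bit (p + k) mod 12,
    // i.e. rotates the 12-bit vector left by k.
    static int transpose(int v, int k) {
        k = Math.floorMod(k, 12);
        return ((v << k) | (v >>> (12 - k))) & MASK;
    }

    // Reversing the 12 bits maps pitch class p to 11 - p
    // (inversion about C, followed by a transposition).
    static int invert(int v) {
        return Integer.reverse(v & MASK) >>> 20;
    }

    // The vector as 12 characters, leftmost = B, rightmost = C.
    static String bits(int v) {
        return String.format("%12s", Integer.toBinaryString(v & MASK)).replace(' ', '0');
    }

    // Names of the pitch classes present, from C upward.
    static String names(int v) {
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < 12; p++) {
            if ((v & (1 << p)) != 0) sb.append(NAMES[p]).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int cMajor = of(0, 4, 7);                        // C, E, G
        int cMinor = of(0, 3, 7);                        // C, D#, G
        System.out.println(bits(cMajor));                // 000010010001
        System.out.println(bits(cMinor));                // 000010001001
        System.out.println(names(transpose(cMajor, 7))); // D G B (G major)
        System.out.println(names(invert(cMajor)));       // E G B (E minor)
    }
}
```

Note that plain bit reversal gives inversion only up to a transposition: it sends C major to E minor, which is a minor triad as expected, but not the one rooted where an inversion about C would put it.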

UNIX file privileges
{user, group, others} x {read, write, execute}: 9 possible privileges
Type "ls -l" on UNIX:
    total 142
    -rw-rw-r-- 1 pawlicki none 76 Jun PKG416.desc
    -rw-rw-r-- 1 pawlicki none Jun PKG416.pdf
    -rw-rw-r-- 1 pawlicki none 1849 Jun let.1
    -rw-rw-r-- 1 pawlicki none 0 Apr 2 13:03 out
    -rw-rw-r-- 1 pawlicki none Jun stapp.uu

UNIX files
The order is rwx for each of user (owner), group, and others. So a protection mode of 110100000 (rw-r-----) means that the owner may read and write (but not execute), the group can only read, and others cannot even read.
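A small sketch of reading the 9-bit protection mode as a characteristic vector over {user, group, others} x {read, write, execute}, producing the same letters that "ls -l" shows after the file-type character. The class `Permissions` and method `toRwx` are hypothetical names.

```java
// Hypothetical helper: bit 8 = user read, bit 7 = user write, ..., bit 0 = others execute.
class Permissions {
    static String toRwx(int mode) {
        char[] letters = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder();
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            // (8 - bit) % 3 cycles r, w, x for each of the three classes
            sb.append(set ? letters[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mode = 0b110100000;                          // owner rw-, group r--, others ---
        System.out.println(toRwx(mode));                 // rw-r-----
        System.out.println(Integer.toOctalString(mode)); // 640
        System.out.println(toRwx(0b110110100));          // rw-rw-r--, as in the listing above
    }
}
```

Grouping the 9 bits in threes is why modes are usually written in octal: each octal digit is one rwx triple.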

GAMBLING
- A deck has 52 cards: {2C, 2H, 2S, 2D, 3C, ..., KD, AC, AH, AS, AD}
- Represent a "hand" as a vector of 52 bits; for example, a vector whose only 1s are at two of the four ace positions is a pair of aces
- In "Texas Hold'em" everyone gets two "hole" cards and 5 "board" cards
- We can use bitwise & to find "hands"
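Fifty-two bits fit in one Java `long`, so a hand can be a single machine word. This sketch assumes the slide's card order, giving card index 4 * rank + suit with rank 0 = deuce through 12 = ace and suits in the order C, H, S, D; the letter "T" for ten and the class and method names are assumptions for illustration.

```java
// Hypothetical class: a hand as a 52-bit vector packed into a long.
class Hands {
    static final String RANKS = "23456789TJQKA"; // "T" = ten (assumed notation)
    static final String SUITS = "CHSD";          // order from the slide: C, H, S, D

    static final long ACES = 0xFL << 48;         // bits 48..51: AC, AH, AS, AD

    // "AC" -> bit 48, "2C" -> bit 0, "AD" -> bit 51.
    static long card(String name) {
        int rank = RANKS.indexOf(name.charAt(0));
        int suit = SUITS.indexOf(name.charAt(1));
        return 1L << (4 * rank + suit);
    }

    static long hand(String... cards) {
        long v = 0L;
        for (String c : cards) v |= card(c);
        return v;
    }

    // Bitwise & isolates the ace bits; bitCount says how many aces are held.
    static boolean hasPairOfAces(long h) {
        return Long.bitCount(h & ACES) >= 2;
    }

    public static void main(String[] args) {
        long hole = hand("AC", "AH");
        long board = hand("2D", "7S", "KH", "9C", "TD");
        System.out.println(hasPairOfAces(hole));         // true
        System.out.println(hasPairOfAces(hole | board)); // true: | combines hole and board cards
        System.out.println(hasPairOfAces(board));        // false
    }
}
```

Other patterns (all four cards of one rank, all clubs, and so on) work the same way: build a mask once, then test each hand with one `&` and one `bitCount`.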

CV advantages
- If the universal set is small, sets can be represented by bits packed 32 to a word
- Insert, delete, and lookup are O(1) on the proper bit
- Union, intersection, and difference are implemented word by word: O(m), where m is the size of the universal set
- Small constant factor (1/32): only about m/32 fast machine operations
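A sketch of the packed representation, assuming the universal set is {0, ..., m-1} and both operands of a set operation share it. `PackedSet` is a hypothetical class; the standard library's `java.util.BitSet` offers comparable operations through its `or`, `and`, and `andNot` methods.

```java
// Hypothetical class: element x lives in word x / 32, bit x % 32.
class PackedSet {
    final int[] words;

    PackedSet(int m) { words = new int[(m + 31) / 32]; }

    private PackedSet(int[] words) { this.words = words; }

    // O(1): each operation touches one bit of one word.
    void insert(int x)      { words[x >>> 5] |= 1 << (x & 31); }
    void delete(int x)      { words[x >>> 5] &= ~(1 << (x & 31)); }
    boolean contains(int x) { return (words[x >>> 5] & (1 << (x & 31))) != 0; }

    // O(m / 32): one machine operation per word handles 32 elements at once.
    PackedSet union(PackedSet o) {
        int[] r = new int[words.length];
        for (int i = 0; i < r.length; i++) r[i] = words[i] | o.words[i];
        return new PackedSet(r);
    }

    PackedSet intersection(PackedSet o) {
        int[] r = new int[words.length];
        for (int i = 0; i < r.length; i++) r[i] = words[i] & o.words[i];
        return new PackedSet(r);
    }

    PackedSet difference(PackedSet o) {
        int[] r = new int[words.length];
        for (int i = 0; i < r.length; i++) r[i] = words[i] & ~o.words[i];
        return new PackedSet(r);
    }

    public static void main(String[] args) {
        PackedSet a = new PackedSet(100), b = new PackedSet(100);
        a.insert(3); a.insert(40); a.insert(99);
        b.insert(40); b.insert(64);
        System.out.println(a.intersection(b).contains(40)); // true
        System.out.println(a.difference(b).contains(40));   // false
    }
}
```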

Hashing
- A cool way to get from an element x to the place where x can be found
- An array [0..B-1] of buckets; each bucket contains a list of set elements; B = number of buckets
- A hash function that takes potential set elements and quickly produces a "random" integer in [0..B-1]

Example
If the set elements are integers, the simplest and usually best hash function is h(x) = x % B (adjusted so that the result is never negative when x can be negative).
Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}. They belong in buckets 4, 5, 3, 4, 5, 4, 4, and 0.
Note: if B = 7, they belong in buckets 0, 4, 1, 3, 6, 6, 1, and 2.
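The bucket numbers above can be checked with a few lines. This sketch uses `Math.floorMod`, which equals `x % B` for non-negative x and stays in [0, B-1] for negative x (Java's `%` can return a negative result). The class name `ModHash` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical class: h(x) = x mod B applied to the slide's example keys.
class ModHash {
    static int h(int x, int b) {
        return Math.floorMod(x, b); // same as x % b when x >= 0, never negative
    }

    static int[] buckets(int[] keys, int b) {
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) result[i] = h(keys[i], b);
        return result;
    }

    public static void main(String[] args) {
        int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
        System.out.println(Arrays.toString(buckets(keys, 6))); // [4, 5, 3, 4, 5, 4, 4, 0]
        System.out.println(Arrays.toString(buckets(keys, 7))); // [0, 4, 1, 3, 6, 6, 1, 2]
    }
}
```

With B = 6, four of the eight keys share bucket 4; with B = 7 the largest bucket holds two.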

Pitfalls of Hash Function Selection
We want a uniform distribution of elements into buckets. Beware of data patterns that cause a non-uniform distribution.

Example
If the integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill. If we hashed the words in the UNIX dictionary into 10 buckets by word length, then 20% would go into bucket 7.
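A quick way to see the even-integer pitfall is to count bucket sizes. This sketch uses a hypothetical input of the 42 even numbers 0, 2, ..., 82 and a hypothetical class `BucketSpread`.

```java
import java.util.Arrays;

// Hypothetical class: how many keys land in each bucket under h(x) = x % B.
class BucketSpread {
    static int[] counts(int[] keys, int b) {
        int[] c = new int[b];
        for (int k : keys) c[k % b]++; // keys are non-negative here
        return c;
    }

    public static void main(String[] args) {
        int[] evens = new int[42];
        for (int i = 0; i < evens.length; i++) evens[i] = 2 * i; // 0, 2, ..., 82
        System.out.println(Arrays.toString(counts(evens, 6))); // [14, 0, 14, 0, 14, 0]
        System.out.println(Arrays.toString(counts(evens, 7))); // [6, 6, 6, 6, 6, 6, 6]
    }
}
```

Because 6 shares the factor 2 with every key, half the buckets can never be used; 7 shares no factor with 2, so the same keys spread evenly. This is one reason B is often chosen to be prime.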

Dictionary Operations
- Lookup: go to the head of bucket h(x) and search the bucket's list for x
- Insertion: append x to the list if it is not found
- Deletion: list deletion from the bucket's list
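The three operations can be sketched as a small hash set of integers with B fixed buckets, each a `java.util.LinkedList`. The class `ChainedHashSet` and its method names are hypothetical; the bucket index uses `Math.floorMod` so negative keys still map into [0, B-1].

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class: a chained hash set with a fixed number of buckets.
class ChainedHashSet {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedHashSet(int b) {
        for (int i = 0; i < b; i++) buckets.add(new LinkedList<>());
    }

    // h(x): pick the bucket list for x.
    private LinkedList<Integer> bucket(int x) {
        return buckets.get(Math.floorMod(x, buckets.size()));
    }

    // Lookup: go to bucket h(x) and search its list.
    boolean contains(int x) { return bucket(x).contains(x); }

    // Insertion: append x only if the search did not find it.
    boolean insert(int x) {
        LinkedList<Integer> list = bucket(x);
        if (list.contains(x)) return false;
        list.addLast(x);
        return true;
    }

    // Deletion: ordinary list deletion (remove by value, not by index).
    boolean delete(int x) { return bucket(x).remove(Integer.valueOf(x)); }

    int bucketSize(int i) { return buckets.get(i).size(); }

    public static void main(String[] args) {
        ChainedHashSet s = new ChainedHashSet(6);
        for (int k : new int[]{70, 53, 99, 94, 83, 76, 64, 30}) s.insert(k);
        System.out.println(s.bucketSize(4)); // 4: 70, 94, 76, 64
        System.out.println(s.contains(83));  // true
    }
}
```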

Analysis
If we pick B to be about n, the number of elements in the set, then the average list is O(1) long. Thus, dictionary ops take O(1) time on average. Worst case: all elements go into one bucket, giving O(n).

Managing Hash Table Size
- If n gets as high as 2B, create a new hash table with 2B buckets
- "Rehash" every element into the new table: O(n) time total
- There were at least n/2 inserts since the last "rehash" (n grew from half its current value), and all these inserts took time O(n)
- Thus, we "amortize" the cost of rehashing over the inserts since the last rehash: a constant factor per insert, at worst
- So, even with rehashing we get O(1) average time per op
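A sketch of the doubling rule that also counts how many element moves rehashing costs, so the amortized claim can be checked. The initial B = 4, the class name `RehashingTable`, and the `moves` counter are assumptions for illustration; duplicate checks are omitted to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical class: chained table that doubles B whenever n reaches 2B.
class RehashingTable {
    List<LinkedList<Integer>> buckets = newBuckets(4); // initial B = 4 (assumed)
    int n = 0;
    long moves = 0; // total elements re-inserted by all rehashes

    static List<LinkedList<Integer>> newBuckets(int b) {
        List<LinkedList<Integer>> t = new ArrayList<>(b);
        for (int i = 0; i < b; i++) t.add(new LinkedList<>());
        return t;
    }

    void insert(int x) {
        buckets.get(Math.floorMod(x, buckets.size())).add(x);
        n++;
        if (n >= 2 * buckets.size()) rehash();
    }

    // Build a table with 2B buckets and move every element: O(n).
    private void rehash() {
        List<LinkedList<Integer>> bigger = newBuckets(2 * buckets.size());
        for (LinkedList<Integer> list : buckets) {
            for (int x : list) {
                bigger.get(Math.floorMod(x, bigger.size())).add(x);
                moves++;
            }
        }
        buckets = bigger;
    }

    public static void main(String[] args) {
        RehashingTable t = new RehashingTable();
        for (int i = 0; i < 1000; i++) t.insert(i);
        System.out.println(t.buckets.size()); // 512
        System.out.println(t.moves);          // 1016 = 8 + 16 + ... + 512
    }
}
```

For 1000 inserts the rehashes move 1016 elements in total, about one extra move per insert, which is the constant-factor overhead the amortized argument predicts.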

Collisions
A collision occurs when two values in the set hash to the same bucket. There are several ways to deal with this:
- Chaining (using a linked list or some other secondary structure)
- Open addressing: double hashing, linear probing

Chaining
[Figure: keys 64, 83, 76, 94, 53, and 30 stored in bucket lists]
Chaining is very efficient time-wise; the other approaches use less space.

Open Addressing
When a collision occurs, if the table is not full, find an available space:
- Linear probing
- Quadratic probing
- Double hashing

Linear Probing
If the current location is occupied, try the next table location.
    LinearProbingInsert(K) {
        if (table is full) error;
        probe = h(K);
        while (table[probe] is occupied)
            probe = (probe + 1) % M;
        table[probe] = K;
    }
- Walk along the table until an empty spot is found
- Uses less memory than chaining (no links)
- Takes more time than chaining (long walks)
- Deleting is a pain (mark a slot as having been deleted)

Linear Probing example (M = 13)
h(K) = K % 13
Insert: 18, 41, 22, 59, 32, 31, 73
h(K):   5,  2,  9,  7,  6,  5,  8
31 finds slots 5, 6, and 7 occupied and lands in slot 8; 73 then finds slots 8 and 9 occupied and lands in slot 10.
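A runnable version of LinearProbingInsert, plus search and deletion by marking slots as deleted ("tombstones"). This sketch assumes non-negative integer keys so that -1 and -2 can serve as EMPTY and DELETED sentinels; `LinearProbingTable` is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical class: open addressing with linear probing and tombstone deletes.
class LinearProbingTable {
    static final int EMPTY = -1, DELETED = -2; // sentinels; assumes keys >= 0
    final int[] table;
    int size = 0;

    LinearProbingTable(int m) {
        table = new int[m];
        Arrays.fill(table, EMPTY);
    }

    int h(int k) { return k % table.length; }

    void insert(int k) {
        if (size == table.length) throw new IllegalStateException("table is full");
        int probe = h(k);
        while (table[probe] >= 0)                 // occupied by a real key
            probe = (probe + 1) % table.length;   // walk to the next slot
        table[probe] = k;                         // reuse an EMPTY or DELETED slot
        size++;
    }

    // Returns the slot holding k, or -1. Stops at EMPTY but walks past DELETED.
    int find(int k) {
        int probe = h(k);
        for (int i = 0; i < table.length && table[probe] != EMPTY; i++) {
            if (table[probe] == k) return probe;
            probe = (probe + 1) % table.length;
        }
        return -1;
    }

    // Deleting must not break later walks, so mark the slot instead of emptying it.
    boolean delete(int k) {
        int slot = find(k);
        if (slot < 0) return false;
        table[slot] = DELETED;
        size--;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(13);
        for (int k : new int[]{18, 41, 22, 59, 32, 31, 73}) t.insert(k);
        System.out.println(t.find(31) + " " + t.find(73)); // 8 10
        t.delete(59);
        System.out.println(t.find(31)); // still 8: the walk passes the tombstone at 7
    }
}
```

If slot 7 were simply reset to EMPTY, the search for 31 would stop there and wrongly report it missing; that is why deletion is "a pain" under open addressing.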

Double Hashing
If the current location is occupied, try another table location. Use two hash functions: h1 gives the starting slot and h2 the step size. If M is prime (and h2 never returns 0), the probes eventually examine every location.
    DoubleHashInsert(K) {
        if (table is full) error;
        probe = h1(K);
        offset = h2(K);
        while (table[probe] is occupied)
            probe = (probe + offset) % M;
        table[probe] = K;
    }
- Many of the same (dis)advantages as linear probing
- Distributes keys more evenly than linear probing

Quadratic Probing
Don't step by 1 each time: add i² to the hashed location h(x) (mod B, of course) for i = 1, 2, ...
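A sketch of quadratic probing run on the same keys as the linear probing example, assuming non-negative keys, EMPTY = -1 as a sentinel, and the hypothetical class name `QuadraticProbingTable`. A standard result (see Weiss) is that if the table size is prime and the table is at most half full, an empty slot is always found; this sketch simply gives up after M attempts.

```java
import java.util.Arrays;

// Hypothetical class: probe h(k), h(k) + 1, h(k) + 4, h(k) + 9, ... (mod M).
class QuadraticProbingTable {
    static final int EMPTY = -1; // sentinel; assumes keys >= 0
    final int[] table;

    QuadraticProbingTable(int m) {
        table = new int[m];
        Arrays.fill(table, EMPTY);
    }

    // Returns the slot where k was stored.
    int insert(int k) {
        int m = table.length;
        int home = k % m;
        for (int i = 0; i < m; i++) {
            int probe = (int) ((home + (long) i * i) % m); // long avoids overflow of i*i
            if (table[probe] == EMPTY) {
                table[probe] = k;
                return probe;
            }
        }
        throw new IllegalStateException("no empty slot found");
    }

    public static void main(String[] args) {
        QuadraticProbingTable t = new QuadraticProbingTable(13);
        for (int k : new int[]{18, 41, 22, 59, 32, 31, 73}) {
            System.out.print(t.insert(k) + " "); // 5 2 9 7 6 1 8
        }
        System.out.println();
    }
}
```

Here 31 tries slots 5, 6, and 9 before landing in slot 1 instead of joining the run at 5 to 8, so 73 then finds its home slot 8 free. Jumping away from occupied runs is what reduces the clustering seen with linear probing.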

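Double Hashing example (M = 13)
h1(K) = K % 13
h2(K) = 8 - K % 8
Insert: 18, 41, 22, 59, 32, 31, 73
h1(K):  5,  2,  9,  7,  6,  5,  8
h2(K):  6,  7,  2,  5,  8,  1,  7
31 collides at slot 5 and steps by 1 through slots 6 and 7 to slot 8; 73 collides at slot 8 and steps by 7 through slots 2 and 9 to slot 3.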

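DoubleHashInsert made runnable with the example's two hash functions. This sketch assumes non-negative keys, EMPTY = -1 as a sentinel, and M > 8 so that h2(K) = 8 - K % 8 (always between 1 and 8) is never a multiple of M; with M = 13 prime, every step size is coprime to M, so the loop can reach every slot. `DoubleHashingTable` is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical class: open addressing with h1 for the start and h2 for the step.
class DoubleHashingTable {
    static final int EMPTY = -1; // sentinel; assumes keys >= 0
    final int[] table;
    int size = 0;

    DoubleHashingTable(int m) {
        table = new int[m];
        Arrays.fill(table, EMPTY);
    }

    int h1(int k) { return k % table.length; }
    int h2(int k) { return 8 - k % 8; } // in 1..8, never 0

    // Returns the slot where k was stored.
    int insert(int k) {
        if (size == table.length) throw new IllegalStateException("table is full");
        int probe = h1(k), offset = h2(k);
        while (table[probe] != EMPTY)
            probe = (probe + offset) % table.length; // jump by this key's own step
        table[probe] = k;
        size++;
        return probe;
    }

    public static void main(String[] args) {
        DoubleHashingTable t = new DoubleHashingTable(13);
        for (int k : new int[]{18, 41, 22, 59, 32, 31, 73}) {
            System.out.print(t.insert(k) + " "); // 5 2 9 7 6 8 3
        }
        System.out.println();
    }
}
```

Keys that share a home slot usually get different step sizes, so they follow different probe sequences; that is why double hashing distributes keys more evenly than linear probing.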

Theoretical Results
[Table: expected probes for double hashing, linear probing, and chaining; columns: Found, Not Found]
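For reference, the commonly cited approximations for the expected number of probes are sketched below in terms of the load factor α = n/M (n keys, M slots or buckets). They are the classical textbook forms: Knuth's analysis for linear probing, the random-probing model that double hashing approximates, and, for chaining, the count of list nodes examined as in Weiss. The lecture's own table may differ in detail, and the open-addressing formulas require α < 1.

```latex
\alpha = \frac{n}{M}

\begin{array}{lcc}
 & \text{Found} & \text{Not found} \\
\text{Chaining} & 1 + \dfrac{\alpha}{2} & \alpha \\[8pt]
\text{Linear probing} & \dfrac{1}{2}\left(1 + \dfrac{1}{1-\alpha}\right) & \dfrac{1}{2}\left(1 + \dfrac{1}{(1-\alpha)^{2}}\right) \\[8pt]
\text{Double hashing} & \dfrac{1}{\alpha}\ln\dfrac{1}{1-\alpha} & \dfrac{1}{1-\alpha}
\end{array}
```

At α = 1/2, linear probing needs about 1.5 probes for a found key and 2.5 for a missing one, while double hashing needs about 1.39 and 2. As α approaches 1, the (1 - α)² term makes unsuccessful linear-probing searches degrade much faster.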

Expected Probes
[Figure: expected probes for linear probing, double hashing, and chaining]