CS 61B Data Structures and Programming Methodology July 17, 2008 David Sun.

Deletion Delete a node with a given key, if such a node exists.
1. Find a node with key k using the same algorithm as find().
2. Return null if k is not in the tree.
3. Otherwise, let n be the first node found with key k. If n has no children, detach it from its parent and throw it away.

Deletion
4. If n has one child, move n's child up to take n's place: n's parent becomes the parent of n's child, and n's child becomes the child of n's parent. Dispose of n.

Deletion
5. If n has two children:
– Let x be the node in n's right subtree with the smallest key. Remove x; since x has the minimum key in the subtree, x has no left child and is easily removed.
– Replace n's entry with x's entry. x has the closest key to k that isn't smaller than k, so the binary search tree invariant still holds.
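As an illustration (not the CS 61B project's actual code), here is a minimal Java sketch of this deletion procedure; the BSTNode fields and method names below are assumptions for the example, and parent pointers are assumed to be maintained by insertion (not shown).

    // Illustrative sketch only; names are assumptions, not the course's API.
    class BSTNode {
        int key;
        Object entry;
        BSTNode parent, left, right;
        BSTNode(int key, Object entry) { this.key = key; this.entry = entry; }
    }

    class BST {
        BSTNode root;

        public BSTNode remove(int k) {
            BSTNode n = find(root, k);                 // 1. same algorithm as find()
            if (n == null) { return null; }            // 2. k is not in the tree
            if (n.left != null && n.right != null) {   // 5. two children
                BSTNode x = n.right;                   //    smallest key in n's right subtree
                while (x.left != null) { x = x.left; }
                n.key = x.key;                         //    replace n's entry with x's
                n.entry = x.entry;
                n = x;                                 //    x has no left child, so splice x out
            }
            // 3-4. n now has at most one child; move that child up into n's place.
            BSTNode child = (n.left != null) ? n.left : n.right;
            if (child != null) { child.parent = n.parent; }
            if (n.parent == null) { root = child; }
            else if (n.parent.left == n) { n.parent.left = child; }
            else { n.parent.right = child; }
            return n;
        }

        private BSTNode find(BSTNode node, int k) {
            if (node == null || node.key == k) { return node; }
            return (k < node.key) ? find(node.left, k) : find(node.right, k);
        }
    }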

Running Times In a perfectly balanced (full) binary tree with height/depth h, the number of nodes is n = 2^(h+1) − 1. Therefore, no node has depth greater than log2 n. The running times of find(), insert(), and remove() are all proportional to the depth of the last node encountered, so they all run in O(log n) worst-case time on a perfectly balanced tree.

Running Times What's the running time for a completely unbalanced binary tree, where the nodes form a single chain? The running times of find(), insert(), and remove() are all proportional to the depth of the last node encountered, but here that depth can be d = n − 1, so they all run in O(n) worst-case time.

Running Times The middle ground: reasonably well-balanced binary trees.
– Search tree operations will run in O(log n) time.
– You may need to resort to experiment to determine whether any particular application will use binary search trees in a way that tends to generate balanced trees or not.

Running Times Binary search trees offer O(log n) performance on insertions of randomly chosen or randomly ordered keys (with high probability). Technically, all operations on binary search trees have Theta(n) worst-case running time. Algorithms exist for keeping search trees balanced, e.g., 2-3-4 trees.

“Holy Grail” Given a set of objects and an object x, determine immediately (in constant time) whether x is in the set. What's a situation where you can determine set membership in constant time?
– The set contains integers with bounded values, i.e., for every x in the set, L < x < R, and L and R are known.
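For example, with known bounds a plain boolean array already answers membership queries in constant time. A minimal sketch; the class name and the strict-inequality convention follow the slide, everything else is illustrative:

    class BoundedIntSet {
        // Stores integers x with L < x < R; slot i stands for the value L + 1 + i.
        private final boolean[] present;
        private final int L;

        BoundedIntSet(int L, int R) {
            this.L = L;
            this.present = new boolean[R - L - 1];
        }

        void add(int x)         { present[x - L - 1] = true; }

        boolean contains(int x) { return present[x - L - 1]; }
    }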

General Pattern What we've seen in a variety of data structures is the following behavior: given a key X, a search answers yes or no (is X in the collection?). The search may be slow if you are looking at a linear data structure, and faster in the case of a binary search tree, where each step rules out half of the remaining candidates.

Array-like Search If we know where the item should be located in an array (given its index), search can be implemented in constant time. The key is a small amount of computation that maps an item X to an integer index k, so that looking up Set[k] answers yes or no.

Dictionaries Problem:
– You have a large set of (word, definition) pairs.
– You want to be able to look up the definition of any word very quickly.
– How can we do this efficiently?

Naïve Data Structure Consider a limited version of the previous problem:
– You are building a dictionary for only the 2-letter words in the English language.
– How many 2-letter combinations are there? 26 * 26 = 676 possible two-letter words.
Now we can:
– Create an array with 676 references, initially all null.
– Define a function hashCode() that maps each 2-letter word to a unique integer between 0 and 675.
– This unique integer is an index into the array, and the element at that index contains the definition of the word.
– We can retrieve a definition in constant time, if it exists.

public class WordDictionary {
    private Definition[] defTable = new Definition[Word.WORDS];

    public void insert(Word w, Definition d) {
        defTable[w.hashCode()] = d;
    }

    Definition find(Word w) {
        return defTable[w.hashCode()];
    }
}

public class Word {
    public static final int LETTERS = 26, WORDS = LETTERS * LETTERS;
    public String word;

    // This function maps a 2-letter word to a number between 0 and 675.
    public int hashCode() {
        return LETTERS * (word.charAt(0) - 'a') + (word.charAt(1) - 'a');
    }
}

Note: Java converts char to int automatically, so you can use chars in arithmetic operations.

Dictionaries What if we want to store every English word, not just the two-letter words?
– The table defTable must be long enough to accommodate pneumonoultramicroscopicsilicovolcanoconiosis, 45 letters long (according to the Oxford Dictionary, "a factitious word alleged to mean 'a lung disease caused by the inhalation of very fine silica dust causing inflammation in the lungs,' occurring chiefly as an instance of a very long word").
– Unfortunately, declaring an array of length 26^45 is out of the question.
– English has fewer than one million words, so we should be able to do better.

Hash Table Suppose n is the number of keys (words) whose definitions we want to store, and suppose we use a table of N buckets, where N is a bit larger than n but much smaller than the number of possible keys. A hash table is an array of size N that maps a huge set of possible keys into its N elements, called buckets, by applying a compression function to each hash code. The obvious compression function is
h(hashCode) = hashCode mod N
so every key lands in a bucket numbered 0 to N − 1. Example: with N = 6, hashCode(WordA) = 1000 gives h(hashCode(WordA)) = 1000 mod 6 = 4, so WordA's definition (DefA) is stored in bucket 4.
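A one-method Java version of this compression function; Math.floorMod is used here (an addition beyond the slide) so that negative hash codes still land in 0 to N − 1:

    class Compression {
        // Compress an arbitrary hash code into a bucket index in [0, N-1].
        // Math.floorMod (unlike %) never returns a negative result.
        static int compress(int hashCode, int N) {
            return Math.floorMod(hashCode, N);
        }
    }
    // From the example above: Compression.compress(1000, 6) == 4.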

Another Example N = 200 buckets. Keys are longs, evenly spread over the range 0 to 2^63 − 1. hashCode(K) = K, and h(hashCode(K)) = hashCode(K) mod N. Most keys land in different buckets (for example, 433 goes into bucket 33), but keys that differ by a multiple of 200 collide: 10 and 210 both go into bucket 10.

Collision Several keys are hashed to the same bucket in the table if h(hashCode(K1)) = h(hashCode(K2)). How do we deal with collisions? How do we design the hash code to reduce the likelihood of collisions? Example (continuing the earlier one): hashCode(WordB) = 742 and h(hashCode(WordB)) = 742 mod 6 = 4, so WordB collides with WordA in bucket 4, which now must hold both DefA and DefB.

Chaining Idea:
– Each bucket stores a chain (a linked list) of all entries whose keys hash to that bucket.
– For a new item, find its bucket and append the item to the end of that bucket's list.
For this to work well, the hash code and compression function must avoid sending too many keys to the same bucket. Example (from the lecture figure): a table with N = 100 buckets, where colliding entries share a bucket's chain.

Hash Table Operations Hash tables usually support at least three operations.
– public Entry insert(key, value)
1. Compute the key's hash code and compress it to determine the entry's bucket.
2. Insert the entry (key and value together) into that bucket's list.
– public Entry find(key)
1. Hash the key to determine its bucket.
2. Search the list for an entry with the given key. If found, return the entry; otherwise, return null.
– public Entry remove(key)
1. Hash the key to determine its bucket.
2. Search the list for an entry with the given key. Remove it from the list if found. Return the entry, or null if it was not found.
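A minimal sketch of these three operations using chaining with java.util.LinkedList; the Entry class, the Object keys, and the fixed bucket count are illustrative choices, not the official CS 61B hash table interface.

    import java.util.LinkedList;

    class Entry {
        Object key, value;
        Entry(Object key, Object value) { this.key = key; this.value = value; }
    }

    class ChainedHashTable {
        private LinkedList<Entry>[] buckets;
        private final int N;

        @SuppressWarnings("unchecked")
        ChainedHashTable(int N) {
            this.N = N;
            buckets = new LinkedList[N];          // one chain per bucket
            for (int i = 0; i < N; i++) {
                buckets[i] = new LinkedList<Entry>();
            }
        }

        // Compress the key's hash code to a bucket index in [0, N-1].
        private int bucketOf(Object key) {
            return Math.floorMod(key.hashCode(), N);
        }

        public Entry insert(Object key, Object value) {
            Entry e = new Entry(key, value);
            buckets[bucketOf(key)].add(e);        // append to the end of the chain
            return e;
        }

        public Entry find(Object key) {
            for (Entry e : buckets[bucketOf(key)]) {
                if (e.key.equals(key)) { return e; }
            }
            return null;                          // no entry with this key
        }

        public Entry remove(Object key) {
            LinkedList<Entry> chain = buckets[bucketOf(key)];
            for (Entry e : chain) {
                if (e.key.equals(key)) {
                    chain.remove(e);              // unlink the entry from its chain
                    return e;
                }
            }
            return null;
        }
    }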

Open Addressing Idea:
– Put one data item in each bucket.
– When there is a collision, just use another bucket.
Various ways to do this:
– Linear probes: if there is a collision at h(K), try h(K)+m, h(K)+2m, etc. (wrapping around at the end of the table).
– Quadratic probes: h(K) + m, h(K) + m^2, ...
– Double hashing: h(K) + h'(K), h(K) + 2h'(K), etc.
Example:
– hashCode(K) = K, h(hashCode(K)) = K mod N, with N = 10 and linear probes with m = 1.
– Add 1, 2, 11, 3, 102, 9, 18, 108, 309 to an empty table.
– Things can get slow, even when the table is far from full.
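A sketch of insertion with linear probing for the integer-key setup above (h(K) = K mod N, m = 1); the use of boxed Integer values with null marking an empty bucket is an assumption made for the example.

    class LinearProbingTable {
        private final Integer[] table;            // null marks an empty bucket
        private final int N;

        LinearProbingTable(int N) {
            this.N = N;
            this.table = new Integer[N];
        }

        // Insert key K: start at K mod N and step forward by m = 1, wrapping around.
        // Assumes the table is not already full.
        public void insert(int K) {
            int i = Math.floorMod(K, N);
            while (table[i] != null) {
                i = (i + 1) % N;                  // collision: probe the next bucket
            }
            table[i] = K;
        }
    }

    // Inserting 1, 2, 11, 3, 102, 9, 18, 108, 309 with N = 10 produces longer and
    // longer probe sequences as runs of occupied buckets (clusters) build up.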

Load Factors The load factor of a hash table is n/N,
– where n is the number of keys in the table and
– N is the number of buckets.
– n/N is the average length of a bucket's list if entries are truly uniformly distributed.
If the hash code and compression function are "good" and the load factor stays within a small constant (< 1), the linked lists are all short and each operation takes O(1) time. However, if the load factor grows too large, performance is dominated by linked list operations and degenerates to O(n) time.

Hash Code and Compression Function How do we design a "good" hash code and compression function?
– Unfortunately, it's a bit of a black art.
– Ideally, the hash code and compression function map each key to a bucket that looks uniformly random between 0 and N − 1, for any input.
– Note: random does not mean that the hash code gives a random value each time. The hash code on the same object must return the same value each time!
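As one concrete example of a deterministic hash code that still spreads keys around, here is the polynomial form used by java.lang.String.hashCode, shown only as an illustration, not as the sole good choice:

    class StringHashExample {
        // Polynomial hash code for a string:
        //   s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
        // The same string always gets the same code, yet similar strings
        // tend to map to different values.
        static int stringHashCode(String s) {
            int h = 0;
            for (int i = 0; i < s.length(); i++) {
                h = 31 * h + s.charAt(i);
            }
            return h;
        }

        public static void main(String[] args) {
            System.out.println(stringHashCode("cat"));   // same value on every run
        }
    }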

A Bad Compression Function Consider integer keys:
– Try hashCode(i) = i.
– Then h(hashCode) = hashCode mod N, where N is a multiple of 4.
– What's wrong with this?
Consider an application that only generates integers divisible by 4:
– Any integer divisible by 4, taken mod a multiple of 4, is still divisible by 4.
– Three quarters of the buckets are wasted!
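A small experiment showing the waste; N = 100 and the key range are illustrative choices, not values from the slide.

    import java.util.HashSet;
    import java.util.Set;

    public class BadCompressionDemo {
        public static void main(String[] args) {
            int N = 100;                              // bucket count divisible by 4 (illustrative)
            Set<Integer> usedBuckets = new HashSet<>();
            for (int i = 0; i < 10000; i += 4) {      // keys all divisible by 4
                usedBuckets.add(i % N);               // hashCode(i) = i, compressed by mod N
            }
            // Prints 25: only buckets 0, 4, 8, ..., 96 are ever used.
            System.out.println("Buckets used: " + usedBuckets.size() + " of " + N);
        }
    }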

Reading Objects, Abstraction, Data Structures and Design using Java 5.0 – Chapter 8.