MA/CSSE 473 Day 28 – Hashing review, B-tree overview, Dynamic Programming


MA/CSSE 473 Day 28 Hashing review B-tree overview Dynamic Programming

MA/CSSE 473 Day 28 Announcements: HW 11 is a good one to try if you want to earn an extra late day; it is shorter than most assignments. Take-home exam available by Oct 29 (Friday) at 9:55 AM, due Nov 1 (Monday) at 8 AM. Today: Student questions; Hashing summary; B-Trees – a quick look; Dynamic Programming; Convex Hull. Late day: until Saturday at 8 AM. Convocation schedule Monday: Section 1 at 9:35 AM, Section 2 at 10:20 AM.

Take-Home Exam Available no later than 9:55 AM on Friday, October 29. Due 8 AM Monday, Nov 1. Two parts, each with a time limit of 3-4 hours –the exact limit will be set after I finish writing the questions. –Measured from the time you view the problems on ANGEL to the time you submit to the ANGEL drop box. Covers through HW 12 and Section 8.2. A small number of problems (3-5 in each part). For most problems there is partial credit for good ideas, even if you don't entirely solve the problem. There is a blackout on communicating with other students about this course during the entire period from exam availability to exam due time.

Some Hashing Details The next slides are from CSSE 230. They are here in case you didn't "get it" the first time. We will not go over all of them in detail in class. If you don't understand the effect of clustering, you might find the animation that is linked from the slides especially helpful.

Collision Resolution: Linear Probing – When an item hashes to a table location occupied by a non-equal item, simply use the next available space: try H+1, H+2, H+3, …, with wraparound at the end of the array. Problem: clustering (picture on next slide).
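A minimal sketch of linear-probing insert and find, assuming the table is never completely full and ignoring deletion and resizing (the class and method names here are illustrative, not taken from the slides):

    public class LinearProbingTable {
        private Object[] table;

        public LinearProbingTable(int capacity) {
            table = new Object[capacity];
        }

        // Probe H, H+1, H+2, ... (with wraparound) until an empty slot or an equal item is found.
        // Assumes the table is never completely full, so the loop terminates.
        private int findSlot(Object key) {
            int h = (key.hashCode() & 0x7fffffff) % table.length;   // non-negative home index H
            while (table[h] != null && !table[h].equals(key)) {
                h = (h + 1) % table.length;                         // wraparound at the end of the array
            }
            return h;
        }

        public void insert(Object key) { table[findSlot(key)] = key; }

        public boolean find(Object key) { return table[findSlot(key)] != null; }
    }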

Weiss Figure 20.4: Linear probing hash table after each insertion (Data Structures & Problem Solving Using Java, 2nd ed., Mark Allen Weiss, © 2002 Addison Wesley). Animation: …ac.nz/software/AlgAnim/hash_tables.html Q1a

Analysis of linear probing – The analysis depends on the load factor λ, which is the ratio of the number of items in the table to the size of the table; thus 0 ≤ λ ≤ 1. For a given λ, what is the expected number of probes before an empty location is found? For simplicity, assume that all locations are equally likely to be occupied, and equally likely to be the next one we look at. Then the probability that a given cell is empty is 1 - λ, and thus the expected number of probes before finding an empty cell is 1/(1 - λ) (write it as a summation). For example, if λ is 0.75, the expected value is 4.
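The summation the slide asks for, under the (unrealistic) assumption that each probed cell is independently empty with probability 1 - λ:

    E[\text{probes}] \;=\; \sum_{i=1}^{\infty} i\,\lambda^{\,i-1}(1-\lambda) \;=\; \frac{1}{1-\lambda},
    \qquad \text{so } \lambda = 0.75 \;\Rightarrow\; E[\text{probes}] = 4.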

Analysis of linear probing (continued) – The "equally likely" assumption is not realistic, because of clustering: large blocks of consecutive occupied cells are formed. Any attempt to place a new item in any of those cells extends the cluster by at least one item. Thus items collide not only because of identical hash values, but also because of hash values that happen to put them into the cluster. Average number of probes when λ is large: 0.5 [1 + 1/(1-λ)²]. For a proof, see Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd ed., Addison-Wesley, Reading, MA. What are the values for λ = 0, 0.5, 0.75, 0.9? When λ approaches 1, this gets bad! But if λ is close to zero, then the average is near 1.0.
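To answer the "what are the values" question, here is a throwaway computation of the estimate above (it prints 1.0, 2.5, 8.5, and 50.5 probes for λ = 0, 0.5, 0.75, 0.9):

    public class LinearProbingEstimate {
        // Average number of probes for linear probing at load factor lambda:
        // 0.5 * (1 + 1/(1-lambda)^2)
        static double avgProbes(double lambda) {
            return 0.5 * (1 + 1.0 / ((1 - lambda) * (1 - lambda)));
        }

        public static void main(String[] args) {
            for (double lambda : new double[] {0.0, 0.5, 0.75, 0.9}) {
                System.out.printf("lambda = %.2f  ->  %.1f probes%n", lambda, avgProbes(lambda));
            }
        }
    }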

So why consider linear probing? – It is easy to implement, and the simple code has a fast run time per probe. It works well when the load factor is low. –It can be more efficient just to rehash into a bigger table once the table starts to fill. –What is often done in practice: rehash into an array of double the size once the load factor reaches 0.5. What about other fast, easy-to-implement strategies?
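A sketch of the "double the array once the load factor reaches 0.5" policy, using linear probing over Integer keys; the class, its initial capacity, and the method names are assumptions made for illustration:

    public class ResizingProbeTable {
        private Integer[] table = new Integer[8];
        private int size = 0;

        public void insert(int key) {
            if (2 * (size + 1) > table.length) {   // load factor would exceed 0.5: rehash first
                rehash();
            }
            int slot = slotFor(key, table);
            if (table[slot] == null) {             // don't count a duplicate key twice
                table[slot] = key;
                size++;
            }
        }

        private int slotFor(int key, Integer[] t) {
            int h = (Integer.hashCode(key) & 0x7fffffff) % t.length;
            while (t[h] != null && t[h] != key) {
                h = (h + 1) % t.length;            // linear probing with wraparound
            }
            return h;
        }

        private void rehash() {
            Integer[] old = table;
            table = new Integer[2 * old.length];   // double the capacity
            for (Integer key : old) {
                if (key != null) {
                    table[slotFor(key, table)] = key;   // every key gets a new home index
                }
            }
        }
    }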

Quadratic probing – With linear probing, if there is a collision at H, we try H, H+1, H+2, H+3, ... until we find an empty spot. –Causes (primary) clustering. With quadratic probing, we try H, H+1², H+2², H+3², ... –Eliminates primary clustering, but can cause secondary clustering. Q1b

Hints for quadratic probing – Choose a prime number for the array size. –If the array used for the table is not more than half full, finding a place to do the insertion is guaranteed, and no cell is probed twice. –Suppose the array size is p, a prime number greater than 3. –Show by contradiction that if i and j are ≤ ⌊p/2⌋ and i ≠ j, then H + i² ≢ H + j² (mod p). Use an algebraic trick to calculate the next index. –Replaces mod and general multiplication with subtraction and a bit shift. –Difference between successive probes: H + (i+1)² = H + i² + (2i+1) [a bit shift can be used for the multiplication by 2]: nextProbe = nextProbe + (2i+1); if (nextProbe >= P) nextProbe -= P; (a fuller sketch follows this slide). Q2
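A fuller sketch of that trick, finding the probe location for a key; it assumes the array size P is prime and the table is at most half full, so the loop is guaranteed to stop (the class name is illustrative):

    public class QuadraticProbe {
        // Probe H, H+1^2, H+2^2, ... using only an addition, a shift,
        // and at most one subtraction per step.
        static int findSlot(Object[] table, Object key) {
            int P = table.length;                               // assumed prime
            int nextProbe = (key.hashCode() & 0x7fffffff) % P;  // home index H
            int i = 0;
            while (table[nextProbe] != null && !table[nextProbe].equals(key)) {
                nextProbe += (i << 1) + 1;      // (H + (i+1)^2) - (H + i^2) = 2i + 1
                if (nextProbe >= P) {
                    nextProbe -= P;             // one subtraction replaces the general mod
                }
                i++;
            }
            return nextProbe;
        }
    }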

Quadratic probing analysis – No one has been able to analyze it precisely. Experimental data shows that it works well –provided that the array size is prime and the table is less than half full.

Other approaches to collision resolution – Double hashing: –a second hash function is used to calculate an offset d to use in probing; try locations h+d, h+2d, h+3d, etc. Separate chaining: –rather than an array of items, we use an array of linked lists; when multiple items hash to the same location, we add them to the list for that location. –Picture on next slide. –No clustering effect, but we use space for the links (space that could have been used to make the array larger), and if many items have the same hash code, the chains can become long. Q1c
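A minimal separate-chaining sketch, using Java's built-in LinkedList for the chains (class and method names are illustrative; resizing is omitted):

    import java.util.LinkedList;

    public class ChainedHashTable<K> {
        private LinkedList<K>[] chains;

        @SuppressWarnings("unchecked")
        public ChainedHashTable(int capacity) {
            chains = new LinkedList[capacity];
            for (int i = 0; i < capacity; i++) {
                chains[i] = new LinkedList<>();
            }
        }

        private int indexOf(K key) {
            return (key.hashCode() & 0x7fffffff) % chains.length;
        }

        // Every key that hashes to the same index simply goes on that slot's list.
        public void insert(K key) {
            LinkedList<K> chain = chains[indexOf(key)];
            if (!chain.contains(key)) {
                chain.add(key);
            }
        }

        public boolean find(K key) { return chains[indexOf(key)].contains(key); }

        public void remove(K key)  { chains[indexOf(key)].remove(key); }
    }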

Hashing with Chaining

Analysis: Hashing with Chaining – With chaining, the load factor λ may be > 1. Assume a hash function that distributes keys evenly in the table. If there are n keys in the table (backed by an array of size m), the average chain should be λ = n/m elements long. So a search takes constant time to compute the hash function plus about λ/2 to search within the chain. If λ ≈ 1, this is VERY fast. But there is the extra space for the pointers, which could have been used to make the table larger if open addressing were used.
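In symbols, under the slide's even-distribution assumption (the unsuccessful-search line is a standard added fact, not from the slide):

    \lambda = \frac{n}{m}, \qquad
    T_{\text{successful}} \approx 1 + \frac{\lambda}{2}, \qquad
    T_{\text{unsuccessful}} \approx 1 + \lambda.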

B-trees – We will do a quick overview here. For the whole scoop on B-trees (actually B+ trees), take CSSE 433, Advanced Databases. Nodes can contain multiple keys and pointers to subtrees.

B-tree nodes – Each node can represent a block of disk storage; pointers are disk addresses. This way, when we look up a node (requiring a disk access), we can get a lot more information than if we used a binary tree. In an n-node of a B-tree, there are n pointers to subtrees, and thus n-1 keys. All keys in T_i are ≥ K_i and < K_{i+1}; K_i is the smallest key that appears in T_i.

B-tree nodes (tree of order m) – All nodes have at most m-1 keys. All keys and associated data are stored in special leaf nodes (which thus need no child pointers). The other (parent) nodes are index nodes. All index nodes except the root have between ⌈m/2⌉ and m children; the root has between 2 and m children. All leaves are at the same level. The space-time tradeoff comes from duplicating some keys at multiple levels of the tree. Example on next slide.
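A minimal sketch of these node shapes for an order-m B-tree with separate index and leaf nodes; the class and field names are assumptions made for illustration, not from the text:

    import java.util.ArrayList;
    import java.util.List;

    abstract class BTreeNode<K extends Comparable<K>, V> { }

    // Index nodes hold only keys and child pointers
    // (in a disk-based tree, the child "pointers" are disk addresses).
    class IndexNode<K extends Comparable<K>, V> extends BTreeNode<K, V> {
        List<K> keys = new ArrayList<>();                    // at most m-1 keys
        List<BTreeNode<K, V>> children = new ArrayList<>();  // ceil(m/2)..m children (2..m for the root)
    }

    // Leaves hold the actual key/data pairs and need no child pointers;
    // all leaves sit at the same level.
    class LeafNode<K extends Comparable<K>, V> extends BTreeNode<K, V> {
        List<K> keys = new ArrayList<>();
        List<V> values = new ArrayList<>();
    }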

Example B-tree (order 4)

B-tree Animation: http://slady.net/java/bt/view.php?w=800&h=600

Search for an item – Within each parent or leaf node, the items are sorted, so we can use binary search (log m), which is a constant with respect to n, the number of items in the table. Thus the search time is proportional to the height of the tree. Max height is approximately log_{⌈m/2⌉} n. Exercise for you: read and understand the straightforward analysis in the textbook. Insert and delete times are also proportional to the height of the tree.
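A worked instance of that height bound, with assumed numbers (m = 256 and n = 10^9 items, so ⌈m/2⌉ = 128):

    \text{height} \;\lesssim\; \log_{\lceil m/2 \rceil} n
    \;=\; \log_{128} 10^{9}
    \;=\; \frac{9 \ln 10}{\ln 128}
    \;\approx\; 4.3,

so a search touches only about 4 or 5 nodes (disk blocks), compared with roughly log₂ 10⁹ ≈ 30 for a balanced binary tree.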