CompSci 100e 6.1 Hashing: log(10^100) is a big number

Presentation transcript:

CompSci 100e 6.1 Hashing: log(10^100) is a big number
- Comparison-based searches are too slow for lots of data
  - How many comparisons are needed for a billion elements?
  - What if one billion web pages are indexed?
- Hashing is a search method: average case O(1) search
  - Worst case is very bad, but in practice hashing is good
  - Associate a number with every key, use the number to store the key
    - Like a library catalog: given a book title, find the book
- A hash function generates the number from the key
  - Goal: efficient to calculate
  - Goal: distributes keys evenly in the hash table
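As a rough illustration of the first question, the sketch below counts how many times a sorted range must be halved before one element is left, which is the worst-case number of steps for a binary search. The class and method names are made up for this example, and an actual search may spend one or two comparisons per step.

```java
// Hypothetical helper, not from the slides: cost of a comparison-based search.
class SearchCost {
    // Number of halvings needed to shrink a range of n sorted items to one item.
    static int halvings(long n) {
        int count = 0;
        while (n > 1) {
            n = (n + 1) / 2; // keep the larger half, which is the worst case
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvings(1_000_000_000L)); // prints 30
    }
}
```

About 30 steps per lookup is already small, so the case for hashing is its average O(1) cost, which does not grow with the amount of data.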

CompSci 100e 6.2 Hashing details
- There will be collisions: two keys will hash to the same value
  - We must handle collisions and still have efficient search
  - What about the birthday “paradox”: using birthday as the hash function, will there be collisions in a room of 25 people?
- Several ways to handle collisions; in general an array/vector is used
  - Linear probing: look in the next spot if not found
    - Hash to index h, try h+1, h+2, …, wrap at end
    - Clustering problems, deletion problems, growing problems
  - Quadratic probing
    - Hash to index h, try h+1^2, h+2^2, h+3^2, …, wrap at end
    - Fewer clustering problems
  - Double hashing
    - Hash to index h, with another hash function to j
    - Try h, h+j, h+2j, …
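The three probing schemes differ only in which slots they try after the first one. The sketch below lists the first few slots each scheme would visit. The class name, the table size of 11, and the values h = 9 and j = 4 are assumptions chosen for illustration. Both h and j are assumed to be small non-negative numbers.

```java
import java.util.Arrays;

// Hypothetical demo class: the probe order for each collision strategy.
class ProbeSequences {
    // Linear probing: h, h+1, h+2, ... wrapping at the end of the table.
    static int[] linear(int h, int tableSize, int count) {
        int[] slots = new int[count];
        for (int i = 0; i < count; i++) {
            slots[i] = (h + i) % tableSize;
        }
        return slots;
    }

    // Quadratic probing: h, h+1^2, h+2^2, ... wrapping at the end.
    static int[] quadratic(int h, int tableSize, int count) {
        int[] slots = new int[count];
        for (int i = 0; i < count; i++) {
            slots[i] = (h + i * i) % tableSize;
        }
        return slots;
    }

    // Double hashing: h, h+j, h+2j, ... where j comes from a second hash function.
    static int[] doubleHash(int h, int j, int tableSize, int count) {
        int[] slots = new int[count];
        for (int i = 0; i < count; i++) {
            slots[i] = (h + i * j) % tableSize;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(9, 11, 5)));        // [9, 10, 0, 1, 2]
        System.out.println(Arrays.toString(quadratic(9, 11, 5)));     // [9, 10, 2, 7, 3]
        System.out.println(Arrays.toString(doubleHash(9, 4, 11, 5))); // [9, 2, 6, 10, 3]
    }
}
```

Linear probing visits neighbouring slots, which is where clustering comes from. The other two schemes spread their probes across the table.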

CompSci 100e 6.3 Chaining with hashing
- With n buckets, each bucket stores a linked list
  - Compute hash value h, look up key in linked list table[h]
  - Hopefully linked lists are short, searching is fast
  - Unsuccessful searches often faster than successful
    - Empty linked lists searched more quickly than non-empty
  - Potential problems?
- Hash table details
  - Size of hash table should be a prime number
  - Keep load factor small: number of keys/size of table
  - On average, with reasonable load factor, search is O(1)
  - What if load factor gets too high? Rehash or other method
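A minimal sketch of chaining follows, assuming String keys and a rehash once the load factor exceeds 1.0. The class name `ChainedSet`, the threshold, and the growth rule are illustrative choices, not something the slides specify. The new size `2 * old + 1` is odd but not guaranteed to be prime.

```java
import java.util.LinkedList;

// Hypothetical class: a set of strings stored with chaining.
class ChainedSet {
    private LinkedList<String>[] table;
    private int size; // number of keys stored

    ChainedSet(int buckets) {
        table = newTable(buckets);
    }

    @SuppressWarnings("unchecked")
    private static LinkedList<String>[] newTable(int buckets) {
        LinkedList<String>[] t = new LinkedList[buckets];
        for (int i = 0; i < buckets; i++) {
            t[i] = new LinkedList<>();
        }
        return t;
    }

    // hashCode() may be negative, so floorMod keeps the index in range.
    private int index(String key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    boolean contains(String key) {
        return table[index(key)].contains(key); // search only one chain
    }

    // Adds key if absent; returns true when the set changed.
    boolean add(String key) {
        LinkedList<String> chain = table[index(key)];
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        size++;
        if (size > table.length) { // assumed threshold: load factor above 1.0
            rehash();
        }
        return true;
    }

    double loadFactor() {
        return (double) size / table.length;
    }

    // Moves every key into a larger table so chains stay short.
    private void rehash() {
        LinkedList<String>[] old = table;
        table = newTable(2 * old.length + 1);
        for (LinkedList<String> chain : old) {
            for (String key : chain) {
                table[index(key)].add(key);
            }
        }
    }
}
```

For example, `new ChainedSet(11)` starts with 11 empty chains, and adding a twelfth distinct key triggers the rehash into 23 buckets.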

CompSci 100e 6.4 Hashing problems
- Linear probing, hash(x) = x (mod tablesize)
  - Insert 24, 12, 45, 14, delete 24, insert 23 (where?)
- Same numbers, use quadratic probing (clustering better?)
- What about chaining, what happens?
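The slide does not give a table size, so the sketch below assumes 11 slots in order to work through the linear probing sequence. It also assumes distinct non-negative keys and marks deleted slots with a tombstone, which is one common way to handle the deletion problem. The class name and sentinel values are hypothetical.

```java
import java.util.Arrays;

// Hypothetical class: linear probing with tombstones for deleted keys.
class LinearProbingTable {
    private static final int EMPTY = -1;   // slot never used
    private static final int DELETED = -2; // tombstone left behind by delete
    private final int[] slots;

    LinearProbingTable(int tableSize) {
        slots = new int[tableSize];
        Arrays.fill(slots, EMPTY);
    }

    // Inserts a distinct non-negative key; returns the slot used, or -1 if full.
    int insert(int key) {
        int h = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int slot = (h + i) % slots.length;
            if (slots[slot] == EMPTY || slots[slot] == DELETED) {
                slots[slot] = key; // tombstones are reused
                return slot;
            }
        }
        return -1;
    }

    // Returns the slot holding key, or -1 if the key is absent.
    int find(int key) {
        int h = key % slots.length;
        for (int i = 0; i < slots.length; i++) {
            int slot = (h + i) % slots.length;
            if (slots[slot] == EMPTY) {
                return -1; // a never-used slot ends the probe
            }
            if (slots[slot] == key) {
                return slot;
            }
            // otherwise: another key or a tombstone, keep probing
        }
        return -1;
    }

    void delete(int key) {
        int slot = find(key);
        if (slot >= 0) {
            slots[slot] = DELETED;
        }
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(11); // assumed table size
        System.out.println(t.insert(24)); // 2
        System.out.println(t.insert(12)); // 1
        System.out.println(t.insert(45)); // 3 (slots 1 and 2 are taken)
        System.out.println(t.insert(14)); // 4 (slot 3 is taken)
        t.delete(24);
        System.out.println(t.insert(23)); // 2 (reuses the tombstone)
    }
}
```

Under these assumptions 23 lands in the slot 24 vacated. If delete simply emptied slot 2 instead of leaving a tombstone, a later search for 45 would stop at slot 2 and wrongly report the key as missing.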

CompSci 100e 6.5 What about hash functions
- Hashing often done on strings, consider two alternatives

    public static int hash(String s) {
        int k, total = 0;
        for (k = 0; k < s.length(); k++) {
            total += s.charAt(k);
        }
        return total;
    }

  - Consider total += (k+1)*s.charAt(k), why might this be better?
  - Other functions used, always mod result by table size
- What about hashing other objects?
  - Need conversion of key to index, not always simple
  - Every object has method hashCode()!
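One way to see the difference between the two alternatives is to hash a pair of anagrams. The plain sum ignores character order, so "stop" and "pots" collide, while the position-weighted sum separates them. The class and method names below are illustrative. The `bucket` helper uses `Math.floorMod` because a hash value such as the result of `hashCode()` can be negative.

```java
// Hypothetical demo class comparing the two string hash functions.
class StringHashes {
    // First alternative: sum of character codes.
    static int sumHash(String s) {
        int total = 0;
        for (int k = 0; k < s.length(); k++) {
            total += s.charAt(k);
        }
        return total;
    }

    // Second alternative: each character weighted by its position.
    static int weightedHash(String s) {
        int total = 0;
        for (int k = 0; k < s.length(); k++) {
            total += (k + 1) * s.charAt(k);
        }
        return total;
    }

    // Maps any int hash, including a negative one, to a valid table index.
    static int bucket(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(sumHash("stop") + " " + sumHash("pots"));           // 454 454
        System.out.println(weightedHash("stop") + " " + weightedHash("pots")); // 1128 1142
        System.out.println(bucket("stop".hashCode(), 11));
    }
}
```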

CompSci 100e 6.6 Tools: Solving Computational Problems
- Algorithmic techniques and paradigms
  - Brute-force/exhaustive, greedy algorithms, dynamic programming, divide-and-conquer, …
  - Transcend a particular language
  - Designing algorithms, may change when turned into code
- Programming techniques and paradigms
  - Recursion, memo-izing, compute-once/lookup, tables, …
  - Transcend a particular language
  - Help in making code work
    - Avoid software problems (propagating changes, etc.)
    - Avoid performance problems

CompSci 100e 6.7 Quota Exceeded
- You’re running out of disk space
  - Buy more
  - Compress files
  - Delete files
- How do you find your “big” files?
  - What’s big?
  - How do you do this?

CompSci 100e 6.8 Recursive structure matches code

    import java.io.File;

    public static long THRESHOLD = 1000000L; // one million bytes

    public static void findBig(File dir, String tab) {
        File[] dirContents = dir.listFiles();
        System.out.println(tab + "**:" + dir.getPath());
        if (dirContents == null) { // not a directory, or not readable
            return;
        }
        for (File f : dirContents) {
            if (f.isDirectory()) {
                findBig(f, tab + "\t"); // recurse into the subdirectory
            } else {
                if (f.length() > THRESHOLD) {
                    System.out.printf("%s%s%8d\n", tab, f.getName(), f.length());
                }
            }
        }
    }

- Does findBig call itself?

CompSci 100e 6.9 Solving Problems Recursively
- Recursion is an indispensable tool in a programmer’s toolkit
  - Allows many complex problems to be solved simply
  - Elegance and understanding in code often leads to better programs: easier to modify, extend, verify (and sometimes more efficient!!)
  - Sometimes recursion isn’t appropriate; when it’s bad it can be very bad. Every tool requires knowledge and experience in how to use it
- The basic idea is to get help solving a problem from coworkers (clones) who work and act like you do
  - Ask clone to solve a simpler but similar problem
  - Use clone’s result to put together your answer
- Need both concepts: call on the clone and use the result

CompSci 100e 6.10 Print words read, but print backwards
- Could store all the words and print in reverse order, but …
  - Probably the best approach, recursion works too

    public void printReversed(Scanner s) {
        if (s.hasNext()) {               // reading succeeded?
            String word = s.next();      // store word
            printReversed(s);            // print rest
            System.out.println(word);    // print the word
        }
    }

- The function printReversed reads a word, prints the word only after the clones finish printing in reverse order
  - Each clone has own version of the code, own word variable
  - Who keeps track of the clones?
  - How many words are created when reading N words? Can we do better?
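For comparison, here is a sketch of the "store all the words" approach mentioned in the first bullet, using an explicit stack. The class and method names are made up for this example, and the method returns the text instead of printing it so the result is easy to check.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Scanner;

// Hypothetical class: iterative counterpart to printReversed.
class ReverseWords {
    // Reads every word, then emits them last-in, first-out, one per line.
    static String reversed(Scanner s) {
        Deque<String> stack = new ArrayDeque<>();
        while (s.hasNext()) {
            stack.push(s.next()); // newest word goes on top
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "three", "two", "one" on separate lines.
        System.out.print(reversed(new Scanner("one two three")));
    }
}
```

The explicit stack plays the role that the call stack plays in the recursive version: both hold all N words until the input runs out.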

CompSci 100e 6.11 Exponentiation
- Computing x^n means multiplying n numbers (or does it?)
  - What’s the simplest value of n when computing x^n?
  - If you want to multiply only once, what can you ask a clone?

    public static double power(double x, int n) {
        if (n == 0) {
            return 1.0;
        }
        return x * power(x, n - 1);
    }

- Number of multiplications?
  - Note base case: no recursion, no clones
  - Note recursive call: moves toward base case (unless …)

CompSci 100e 6.12 Faster exponentiation
- How many recursive calls are made to compute the result?
  - How many multiplies on each call? Is this better?

    public static double power(double x, int n) {
        if (n == 0) {
            return 1.0;
        }
        double semi = power(x, n / 2);
        if (n % 2 == 0) {
            return semi * semi;
        }
        return x * semi * semi;
    }

- What about an iterative version of this function?
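One possible answer to the closing question is sketched below: an iterative version that squares the base while halving the exponent. It assumes n is non-negative. The `calls` helper is a hypothetical addition that counts how many calls the recursive halving version would make for a given n.

```java
// Hypothetical class: iterative fast exponentiation plus a call counter.
class PowerIterative {
    // Computes x^n for n >= 0 using repeated squaring.
    static double power(double x, int n) {
        double result = 1.0;
        double base = x;
        while (n > 0) {
            if (n % 2 == 1) {
                result *= base; // this bit of n contributes base to the product
            }
            base *= base;       // base becomes x^2, x^4, x^8, ...
            n /= 2;
        }
        return result;
    }

    // Calls made by the recursive halving version, counting the first call.
    static int calls(int n) {
        return n == 0 ? 1 : 1 + calls(n / 2);
    }

    public static void main(String[] args) {
        System.out.println(power(2.0, 10)); // 1024.0
        System.out.println(calls(1024));    // 12
    }
}
```

For n = 1024 the halving version makes 12 calls, where the version that recurses on n-1 makes 1025.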

CompSci 100e 6.13 Back to Recursion
- Recursive functions have two key attributes
  - There is a base case, sometimes called the exit case, which does not make a recursive call
    - See print reversed, exponentiation
  - All other cases make a recursive call, with some parameter or other measure that decreases or moves towards the base case
    - Ensure that sequence of calls eventually reaches the base case
    - “Measure” can be tricky, but usually it’s straightforward
- Example: finding large files in a directory (on a hard disk)
  - Why is this inherently recursive?
  - How is this different from exponentiation?

CompSci 100e 6.14 Recognizing recursion:

    public static void change(String[] a, int first, int last) {
        if (first < last) {
            String temp = a[first]; // swap a[first], a[last]
            a[first] = a[last];
            a[last] = temp;
            change(a, first + 1, last - 1);
        }
    }
    // original call (why?): change(a, 0, a.length-1);

- What is base case? (no recursive calls)
- What happens before recursive call made?
- How is recursive call closer to the base case?

Recursive methods sometimes use extra parameters; helper methods set this up
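The remark about helper methods can be sketched as follows. A one-argument method (named `reverse` here as an assumption) supplies the extra `first` and `last` parameters so callers never see them.

```java
import java.util.Arrays;

// Hypothetical wrapper class around the slide's change method.
class ReverseArray {
    // Helper with the signature callers want; it sets up the extra parameters.
    static void reverse(String[] a) {
        change(a, 0, a.length - 1);
    }

    // Swaps the outer pair, then recurses on the inner range.
    static void change(String[] a, int first, int last) {
        if (first < last) {
            String temp = a[first];
            a[first] = a[last];
            a[last] = temp;
            change(a, first + 1, last - 1);
        }
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c", "d", "e"};
        reverse(words);
        System.out.println(Arrays.toString(words)); // [e, d, c, b, a]
    }
}
```

Each call shrinks the range by two elements, so the range eventually has one element or none. The test first < last then fails and no further call is made.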

CompSci 100e 6.15 The Power of Recursion: Brute force
- Consider the TypingJob APT problem: What is the minimum number of minutes needed to type n term papers given page counts and three typists typing one page/minute? (Assign papers to typists to minimize minutes to completion.)
  - Example: {3, 3, 3, 5, 9, 10, 10} as page counts
- How can we solve this in general? Suppose we're told that there are no more than 10 papers on a given day.
  - How does this constraint help us?
  - What is complexity of using brute-force?

CompSci 100e 6.16 Recasting the problem
- Instead of writing this function, write another and call it

    // returns min minutes to type papers in pages
    int bestTime(int[] pages) {
        return best(pages, 0, 0, 0, 0);
    }

- What cases do we consider in function below?

    int best(int[] pages, int index, int t1, int t2, int t3)
    // returns min minutes to type papers in pages
    // starting with index-th paper and given
    // minutes assigned to typists, t1, t2, t3
    {
    }
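The slide leaves the body of best as an exercise. The sketch below is one possible brute-force completion, not necessarily the intended solution. It tries giving the current paper to each of the three typists in turn and keeps the smallest finishing time. The class name is made up.

```java
// Hypothetical class: one possible brute-force solution to TypingJob.
class TypingJob {
    static int bestTime(int[] pages) {
        return best(pages, 0, 0, 0, 0);
    }

    // Min minutes to finish papers index..end, given minutes already assigned.
    static int best(int[] pages, int index, int t1, int t2, int t3) {
        if (index == pages.length) {
            // Base case: all papers assigned; the slowest typist sets the time.
            return Math.max(t1, Math.max(t2, t3));
        }
        int p = pages[index];
        int first = best(pages, index + 1, t1 + p, t2, t3);  // typist 1 takes it
        int second = best(pages, index + 1, t1, t2 + p, t3); // typist 2 takes it
        int third = best(pages, index + 1, t1, t2, t3 + p);  // typist 3 takes it
        return Math.min(first, Math.min(second, third));
    }

    public static void main(String[] args) {
        int[] pages = {3, 3, 3, 5, 9, 10, 10};
        System.out.println(bestTime(pages)); // 15
    }
}
```

Each paper has three choices, so n papers give 3^n assignments. With at most 10 papers that is 59,049 assignments, small enough for exhaustive search. For the example, the 43 pages cannot be split three ways with every share under 15, and the split {10, 5}, {10, 3}, {9, 3, 3} reaches 15.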

CompSci 100e 6.17 Recursive example 1

    double power(double x, int n)
    // post: returns x^n
    {
        if (n == 0) {
            return 1.0;
        }
        return x * power(x, n - 1);
    }

x:    n:    Return value:

CompSci 100e 6.18 Recursive example 2

    double fasterPower(double x, int n)
    // post: returns x^n
    {
        if (n == 0) {
            return 1.0;
        }
        double semi = fasterPower(x, n / 2);
        if (n % 2 == 0) {
            return semi * semi;
        }
        return x * semi * semi;
    }

x:    n:    Return value:

CompSci 100e 6.19 Recursive example 3

    String mystery(int n) {
        if (n < 2) {
            return "" + n;
        } else {
            return mystery(n / 2) + (n % 2);
        }
    }

n:    Return value:
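One way to check a hand trace is to run the method on a few small values. The sketch below does that and compares the result with the standard library's `Integer.toBinaryString`, which agrees with mystery for non-negative n. The class name and the sample values are assumptions made for this example.

```java
// Hypothetical class for checking a hand trace of mystery.
class MysteryTrace {
    static String mystery(int n) {
        if (n < 2) {
            return "" + n; // base case: a single digit, 0 or 1
        } else {
            // Digits of n/2 first, then the last binary digit of n.
            return mystery(n / 2) + (n % 2);
        }
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 5, 13};
        for (int n : samples) {
            System.out.println(n + " -> " + mystery(n)
                    + " (library: " + Integer.toBinaryString(n) + ")");
        }
    }
}
```

For n = 13 the calls unwind as mystery(1) = "1", mystery(3) = "11", mystery(6) = "110", mystery(13) = "1101".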