Hash Tables and Constant Access Time: CS-2301, System Programming for Non-Majors, B-Term 2009

Presentation transcript:

Hash Tables and Constant Access Time
CS-2301, System Programming for Non-Majors, B-Term
(Slides include materials from The C Programming Language, 2nd edition, by Kernighan and Ritchie, and from C: How to Program, 5th and 6th editions, by Deitel and Deitel)

New Challenge
  What if we require a data structure that has to be accessed by value in constant time?
    I.e., O(log n) is not good enough!
  Need to be able to add or delete items
  Total number of items unknown
    But an approximate maximum might be known

Examples
  Anti-virus scanner
  Symbol table of compiler
  Virtual memory tables in operating system
  Bank or credit card account for a person

Example: Validate a Credit Card
  16-digit credit card numbers: 10^16 possible card numbers
  Sparsely populated space
    E.g., 10^8 MasterCard holders, similar for Visa
  Not “random” enough for a binary tree
    Too many single branches → really deep searches
  Need to respond to customer in 1-2 seconds
    1000s or tens of 1000s of customers per second!
  Same is true for
    ATM card numbers
    Bank account numbers
    Etc.
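The "single branches" problem can be made concrete with a small sketch. It is not from the slides: `struct tnode`, `bst_insert`, `bst_build`, and `bst_depth` are hypothetical names, and the tree is a plain unbalanced binary search tree. When keys arrive in increasing order, as blocks of issued account numbers might, every node gets only a right child and the depth equals the number of keys.

```c
#include <assert.h>
#include <stdlib.h>

/* Node of a plain (unbalanced) binary search tree. */
struct tnode {
    long key;
    struct tnode *left, *right;
};

/* Insert key and return the root; duplicate keys are ignored. */
struct tnode *bst_insert(struct tnode *root, long key)
{
    struct tnode **link = &root;
    while (*link != NULL) {
        if (key == (*link)->key)
            return root;                 /* already present */
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    struct tnode *n = malloc(sizeof *n);
    if (n == NULL)
        abort();                         /* sketch: treat out-of-memory as fatal */
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return root;
}

/* Build a tree by inserting keys[0..n-1] in the order given (nodes are never freed here). */
struct tnode *bst_build(const long *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    struct tnode *root = NULL;
    for (size_t i = 0; i < n; i++)
        root = bst_insert(root, keys[i]);
    return root;
}

/* Number of nodes on the longest path from the root down to a leaf. */
int bst_depth(const struct tnode *t)
{
    if (t == NULL)
        return 0;
    int l = bst_depth(t->left);
    int r = bst_depth(t->right);
    return 1 + (l > r ? l : r);
}
```

Building from 1000 keys in sorted order gives a depth of 1000, so a search degrades to a linear scan, while the seven keys 4, 2, 6, 1, 3, 5, 7 inserted in that order give a depth of only 3.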

Example: Anti-Virus Scanner
  Look at each sequence of bytes in a file
  See if it matches against library of virus patterns
    If so, flag it as a possible problem
  How many possible patterns?
    Tens of thousands!

Anti-Virus Scanner (continued)
  Time to scan a file?
    O(length) × O(# of patterns)
  Can we do better?
    Store patterns in a tree
    O(length) × O(log (# of patterns))
  Can we do even better?
    Yes: a hash table. Today's topic.
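A rough worked comparison, using assumed numbers that are not from the slides: a file of L = 10^6 bytes and P = 5 × 10^4 patterns ("tens of thousands"), counting one pattern check per unit of work and assuming the tree is balanced.

```latex
\begin{align*}
\text{linear scan of patterns:} \quad & L \times P = 10^{6} \times 5\times10^{4} = 5\times10^{10} \\
\text{patterns in a balanced tree:} \quad & L \times \log_2 P \approx 10^{6} \times 15.6 \approx 1.6\times10^{7} \\
\text{patterns in a hash table:} \quad & L \times O(1) \approx 10^{6}
\end{align*}
```

Under these assumptions the tree saves a factor of a few thousand over the linear scan, and constant-time lookup saves roughly another factor of 15.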

Requirement
  In these applications (and many like them), need constant time access
    I.e., O(1)
  Need to access by value!

Observation
  Arrays provide constant time access …
    … but you have to know which element you want!
  We only know the contents of the item we want!
  Also
    Not easy to grow or shrink
    Not open-ended
  Can we do better?

Definition: Hash Table
  A data structure comprising
    An array for constant time access
    A set of linked lists, one list for each array element
    A hashing function to convert search key to array index
      A randomizing function to assure uniform distribution of values across array indices
      Also known as a hash function
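The array and the lists in this definition might be declared in C roughly as follows. This is a sketch under assumptions: the names `struct item`, `data`, `next`, `hashtab`, and `HASHSIZE` follow the code later in these slides, the payload is assumed to be a string, and `list_head` is a hypothetical helper added only to show bounds-checked access to one list.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101            /* number of array elements, one list per element */

/* One node in a linked list hanging off an array element. */
struct item {
    char *data;                 /* payload that contains the search key */
    struct item *next;          /* next item in the same list, or NULL at the end */
};

/* The array: each element is the head of one list.
   Static storage starts out zeroed, so every list begins empty (NULL). */
static struct item *hashtab[HASHSIZE];

/* Hypothetical helper: address of the list head stored at a given array index. */
struct item **list_head(unsigned int index)
{
    assert(index < HASHSIZE);
    return &hashtab[index];
}
```

The third component, the hashing function, is what turns a search key into the `index` passed here.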

Definition: Search Key
  A value stored as (part of) the payload of the item you are looking for
  E.g.,
    Your credit card number
    Your account number at Amazon
    A pattern characteristic of a virus
  Need to find the item containing that value (i.e., that key)

Definition: Hash Function
  A function that randomizes the search key to produce an index into the array
    So that non-random keys don't concentrate around a subset of the indices in the array
  Always returns the same value for the same key
  See §6.6 in Kernighan & Ritchie

Hash Table Structure
  [Figure: the array of items; each array element points to a linked list of nodes, each with data and next fields. Labels: "The array", "The lists"]

Hash Table Structure (continued)
  [Figure: the same array of items with its linked lists of data/next nodes]
  Note that some of the lists are empty
  Average length of list should be in single digits

Guidelines for Hash Tables
  Lists from each item should be short
    I.e., with short search time (approximately constant)
  Size of array should be based on expected # of entries
    Err on large side if possible
  Hashing function
    Should “spread out” the values relatively uniformly
    Multiplication and division by prime numbers usually work well
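One way to turn the sizing guideline into numbers is sketched below. The function names (`is_prime`, `choose_table_size`, `average_list_length`) are hypothetical, and rounding the size up to a prime is an assumption that echoes the prime `HASHSIZE` of 101 in the K&R example, not a rule stated on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Trial-division primality test; adequate for table sizes in a sketch. */
static int is_prime(size_t n)
{
    if (n < 2)
        return 0;
    for (size_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Smallest prime array size that keeps the average list length
   at or below target_len for the expected number of items. */
size_t choose_table_size(size_t expected_items, size_t target_len)
{
    assert(target_len > 0);
    size_t size = (expected_items + target_len - 1) / target_len;  /* ceiling division */
    if (size < 2)
        size = 2;
    while (!is_prime(size))
        size++;
    return size;
}

/* Average list length: items stored per array element. */
double average_list_length(size_t items, size_t table_size)
{
    assert(table_size > 0);
    return (double)items / (double)table_size;
}
```

For example, 1000 expected items with a target list length of 5 gives an array of 211 elements and an average list length of about 4.7, comfortably in single digits.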

Example Hashing Function
  P. 144 of K & R

    #define HASHSIZE 101

    /* hash: form hash value for string s */
    unsigned int hash(char *s)
    {
        unsigned int hashval;

        for (hashval = 0; *s != '\0'; s++)
            hashval = *s + 31 * hashval;
        return hashval % HASHSIZE;
    }

  Note prime numbers to “mix it up”
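The sketch below repeats the K&R-style function so that its behavior can be checked on a few keys. Two details are assumptions added here rather than taken from the slide: the cast to `unsigned char`, which keeps characters above 127 from being treated as negative, and the hypothetical helper `longest_bucket`, which reports the largest number of keys that share one array index.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE 101

/* Same arithmetic as the slide: multiply the running value by 31, add the next character. */
unsigned int hash(const char *s)
{
    unsigned int hashval = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;      /* always in the range 0 .. HASHSIZE-1 */
}

/* Hypothetical helper: the largest number of the n keys that land on one array index. */
size_t longest_bucket(const char *keys[], size_t n)
{
    size_t counts[HASHSIZE] = {0};
    size_t longest = 0;

    for (size_t i = 0; i < n; i++) {
        size_t c = ++counts[hash(keys[i])];
        if (c > longest)
            longest = c;
    }
    return longest;
}
```

With ASCII characters, `hash("a")` is 97 and `hash("ab")` is (98 + 31 × 97) mod 101 = 75. Calling the function twice on the same string always yields the same index, which is what makes lookup possible.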

Using a Hash Table

    #include <string.h>     /* strcmp */

    struct item *lookup(char *s)
    {
        struct item *np;

        /* hash table is indexed by hash value of s */
        for (np = hashtab[hash(s)]; np != NULL; np = np->next)
            if (strcmp(s, np->data) == 0)   /* traverse the linked list to find item s */
                return np;      /* found */
        return NULL;            /* not found */
    }

Using a Hash Table (continued)

    #include <stdlib.h>     /* malloc */

    struct item *addItem(char *s, …)
    {
        struct item *np;
        unsigned int hv;

        if ((np = lookup(s)) == NULL) {
            np = malloc(sizeof(struct item));
            if (np == NULL)
                return NULL;    /* out of memory */
            /* fill in s and data */
            /* insert new item at head of the list indexed by hash value */
            np->next = hashtab[hv = hash(s)];
            hashtab[hv] = np;
        }
        return np;
    }
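The slides ask for a structure that supports both adding and deleting items but show only lookup and insertion. The following is a minimal self-contained sketch of all three operations, assuming the payload is just a copy of the key string. `removeItem` is a hypothetical name and is not part of the slides or of K&R, and `addItem` here takes only the key because the other parameters are elided in the original.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101

struct item {
    char *data;                 /* assumed payload: a private copy of the key string */
    struct item *next;
};

static struct item *hashtab[HASHSIZE];

static unsigned int hash(const char *s)
{
    unsigned int hashval = 0;
    for (; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % HASHSIZE;
}

/* Walk the one list selected by the hash value of s. */
struct item *lookup(const char *s)
{
    for (struct item *np = hashtab[hash(s)]; np != NULL; np = np->next)
        if (strcmp(s, np->data) == 0)
            return np;          /* found */
    return NULL;                /* not found */
}

/* Returns the existing or newly added item, or NULL if memory runs out. */
struct item *addItem(const char *s)
{
    struct item *np = lookup(s);
    if (np == NULL) {
        np = malloc(sizeof *np);
        if (np == NULL)
            return NULL;
        np->data = malloc(strlen(s) + 1);
        if (np->data == NULL) {
            free(np);
            return NULL;
        }
        strcpy(np->data, s);
        unsigned int hv = hash(s);
        np->next = hashtab[hv];     /* insert at head of the list */
        hashtab[hv] = np;
    }
    return np;
}

/* Hypothetical: unlink and free the item with key s. Returns 1 if removed, 0 if absent. */
int removeItem(const char *s)
{
    assert(s != NULL);
    for (struct item **link = &hashtab[hash(s)]; *link != NULL; link = &(*link)->next) {
        if (strcmp(s, (*link)->data) == 0) {
            struct item *victim = *link;
            *link = victim->next;   /* bypass the node in its list */
            free(victim->data);
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Deletion walks a pointer to the link rather than a pointer to the node, so removing the head of a list and removing an interior node are the same case. Each operation touches only one list, so its cost stays roughly constant as long as the lists stay short.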

Challenge
  In what kinds of situations in your field might you need a hash table?

Hash Table Summary
  Widely used for constant time access
  Easy to build and maintain
  There is an art and science regarding the choice of hashing functions
    Consult textbooks, web, etc.

Questions?