Hash C and Data Structure Baojian Hua



Searching
A dictionary-like data structure contains a collection of tuple data: (k1, v1), (k2, v2), …
Keys are comparable and pairwise distinct.
It supports these operations:
    new ()
    insert (dict, k, v)
    lookup (dict, k)
    delete (dict, k)

Examples
Application    Purpose        Key          Value
Phone Book     phone          name         phone No.
Bank           transaction    visa         $$$
Dictionary     lookup         word         meaning
compiler       symbol         variable     type
om             search         key words    contents
…              …              …            …

Summary So Far
rep \ op: array, sorted array, linked list, sorted linked list, binary search tree
lookup(): O(n), O(lg n), O(n)
insert(): O(n)
delete(): O(n)

What’s the Problem?
For every mapping (k, v): after we insert it into the dictionary dict, we don’t know its position!
Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), … and then lookup (d, "zhang");
[Figure: ("li", 97) … ("wang", 99) ("zhang", 100) stored at unknown positions]

Basic Plan
Start from the array-based approach.
Use an array A to hold the elements (k, v).
For every key k: if we can compute its position (array index) i from k, then lookup, insert and delete are simple: access A[i], done in constant time O(1).
[Figure: array A with (k, v) stored at index i]

Example
Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang");
[Figure: array holding ("li", 97) at an unknown index "?"]
Problem #1: How to calculate the index from the key?
Problem #2: How long should the array be?

Basic Plan
Save the (k, v) pairs in an array, with the index calculated from k.
Hash function: a method for computing an index from a given key.
[Figure: ("li", 97) stored at index hash ("li")]

Hash Function
Given any key, compute an index.
It must be efficiently computable.
Ideal goals:
  indexes are spread uniformly over the array
  different keys map to different indexes
However, designing such a function is a thorough research problem. :-(
Next, we assume that the array is of infinite length, so the hash function has type:
    int hash (key k);
Next is a “case analysis” on how different key types affect “hash”.

Hash Function on “int”
If the key is of “int” type, the hash function is trivial:
    int hash (int i)
    {
      return i;
    }

Hash Function on “char”
If the key is of “char” type, the hash function is just a type conversion:
    int hash (char c)
    {
      return c;
    }

Hash Function on “float”
Also type conversion:
    int hash (float f)
    {
      return (int)f;   // truncates toward zero
    }
    // how to deal with 0.aaa, say 0.5? Every key in (-1, 1) hashes to 0.
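One possible answer to the question on this slide is to hash the bit pattern of the float rather than its truncated value. The sketch below is not from the slides: the name hash_float_bits is hypothetical, and it assumes a 32-bit IEEE 754 float. Masking the sign bit keeps the result non-negative, at the cost of making x and -x collide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the slides): hash a float by its bits.
   Assumes float is a 32-bit IEEE 754 value. */
int hash_float_bits (float f)
{
  uint32_t bits;
  assert (sizeof bits == sizeof f);
  memcpy (&bits, &f, sizeof bits);      /* well-defined way to read the bits */
  return (int) (bits & 0x7fffffffu);    /* drop the sign bit: x and -x collide */
}
```

With this version 0.5 and 0.25 get different hash values, whereas (int)f maps both to 0. Since the sign bit is dropped, 0.0 and -0.0 (which compare equal) also hash equally.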

Hash Function on “string”
    int hash (char *s)
    {
      int i = 0, sum = 0;
      while (s[i]) {
        sum += s[i];
        i++;
      }
      return sum;
    }
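The additive string hash ignores character order, so any two strings made of the same letters collide. The sketch below reproduces the slide's function as hash_sum and contrasts it with a polynomial hash. hash_poly and its multiplier 31 are assumptions for illustration, not something the slides prescribe.

```c
#include <assert.h>

/* The slide's additive hash: permutations of the same letters collide. */
int hash_sum (const char *s)
{
  int sum = 0;
  for (int i = 0; s[i]; i++)
    sum += s[i];
  return sum;
}

/* Assumed alternative (not from the slides): polynomial hash, multiplier 31.
   Unsigned arithmetic wraps around instead of overflowing. */
unsigned int hash_poly (const char *s)
{
  unsigned int h = 0;
  assert (s != 0);
  for (int i = 0; s[i]; i++)
    h = 31u * h + (unsigned char) s[i];
  return h;
}
```

For example, hash_sum gives 195 for both "ab" and "ba", while hash_poly gives 3105 and 3135 respectively.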

From “int” Hash to Index
Problems with the “int” hash type:
  at any time, the array is finite
  there is no negative index (say -10)
Our goal: int i ==> [0, N-1]
Aha, that’s easy! It’s just:
    abs(i) % N

Bug!
Note that the range of “int” (32 bits) is -2^31 ~ 2^31-1.
So abs(-2^31) = 2^31 (overflow!)
The key step is to wipe the sign bit off:
    int t = i & 0x7fffffff;
    int hc = t % N;
In summary:
    hc = (i & 0x7fffffff) % N;
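The masking step can be wrapped in a small function and checked on the problematic inputs. This is a minimal sketch: the name hash_index is hypothetical, and it assumes a 32-bit two's-complement int, as the slide does.

```c
#include <assert.h>

/* Map any int hash value to an index in [0, n-1].
   Assumes a 32-bit two's-complement int. */
int hash_index (int i, int n)
{
  assert (n > 0);
  return (i & 0x7fffffff) % n;   /* clear the sign bit, then reduce */
}
```

Note that clearing the sign bit is not the same as taking the absolute value: with n = 8, the key 10 maps to index 2, while -10 maps to index 6. The smallest int, -2^31, maps to index 0 without any overflow.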

Collision
Given two keys k1 and k2, we compute two hash values h1, h2 ∈ [0, N-1].
If k1 != k2 but h1 == h2, then a collision occurs.
[Figure: (k1, v1) and (k2, v2) both map to index i]

Collision Resolution
  Open Addressing
  Re-hash
  Chaining

Chaining
For a collision at index i, we keep a separate linear list (chain) at index i.
[Figure: (k1, v1) and (k2, v2) collide at index i; the chain at i holds k1 and k2]
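Chaining can be sketched in a few lines for int keys and values. Everything here is illustrative and not taken from the slides: the names chain_insert, chain_lookup and chain_delete, the fixed size N_BUCKETS, and the choice to overwrite the value when a key is inserted twice.

```c
#include <assert.h>
#include <stdlib.h>

#define N_BUCKETS 8   /* assumed fixed size for this sketch */

struct node { int key; int value; struct node *next; };

static struct node *chains[N_BUCKETS];   /* every chain starts empty (NULL) */

static int index_of (int key) { return (key & 0x7fffffff) % N_BUCKETS; }

/* Insert at the head of the chain, or overwrite an existing key. */
void chain_insert (int key, int value)
{
  int hc = index_of (key);
  for (struct node *p = chains[hc]; p; p = p->next)
    if (p->key == key) { p->value = value; return; }
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->key = key;
  n->value = value;
  n->next = chains[hc];
  chains[hc] = n;
}

/* Return 1 and store the value in *out if key is present, else return 0. */
int chain_lookup (int key, int *out)
{
  for (struct node *p = chains[index_of (key)]; p; p = p->next)
    if (p->key == key) { *out = p->value; return 1; }
  return 0;
}

/* Unlink and free the node holding key, if there is one. */
void chain_delete (int key)
{
  struct node **pp = &chains[index_of (key)];
  while (*pp && (*pp)->key != key)
    pp = &(*pp)->next;
  if (*pp) {
    struct node *dead = *pp;
    *pp = dead->next;
    free (dead);
  }
}
```

Keys 1 and 9 both land in bucket 1 here, so inserting both exercises a chain of length two. Deleting one of them leaves the other reachable.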

Load Factor
loadFactor = numItems / numBuckets
defaultLoadFactor: the default value of the load factor
[Figure: buckets with chains holding k1, k2, k5, k8, k43]
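In C the division has to be carried out in floating point, otherwise the load factor of any table with fewer items than buckets comes out as 0. A minimal sketch, with the hypothetical helper name load_factor:

```c
#include <assert.h>

/* Load factor as a double; integer division would truncate it to 0
   whenever numItems < numBuckets. */
double load_factor (int numItems, int numBuckets)
{
  assert (numBuckets > 0);
  return (double) numItems / numBuckets;
}
```

For the table used in the later examples (5 keys in 8 buckets) this gives 0.625, while the integer expression 5 / 8 evaluates to 0.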

“hash” ADT: interface
    #ifndef HASH_H
    #define HASH_H

    #define T Hash_t
    typedef struct T *T;

    T Hash_new ();
    T Hash_new2 (double lf);
    void Hash_insert (T h, poly key, poly value);
    poly Hash_lookup (T h, poly key);
    void Hash_delete (T h, poly key);

    #undef T
    #endif

Implementation
    #include "linked-list.h"
    #include "hash.h"

    #define EXT_FACTOR 2
    #define INIT_BUCKETS 16
    #define T Hash_t

    struct T
    {
      LinkedList_t *buckets;   // heap-allocated array of chains, so it can grow
      int numBuckets;
      int numItems;
      double defaultLoadFactor;
    };

In Figure
[Figure: hash table h; its buckets field points to an array of chains holding k1, k2, k5, k8, k43]

“newHash ()”
    T Hash_new ()
    {
      T h;
      NEW (h);
      // room for INIT_BUCKETS chains; each one must still be set to an empty list
      h->buckets = checkedMalloc (INIT_BUCKETS * sizeof (LinkedList_t));
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->defaultLoadFactor = 0.25;
      return h;
    }

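“newHash2 ()”
    T Hash_new2 (double lf)
    {
      T h;
      NEW (h);
      // room for INIT_BUCKETS chains; each one must still be set to an empty list
      h->buckets = checkedMalloc (INIT_BUCKETS * sizeof (LinkedList_t));
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->defaultLoadFactor = lf;   // caller-supplied load factor
      return h;
    }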

“lookup (hash, key)”
    poly Hash_lookup (T h, poly k)
    {
      int i = k->hashCode ();   // how to take this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      poly t = List_Search ((h->buckets)[hc], k);   // walk the chain at bucket hc
      return t;
    }

Ex: lookup (ha, k43)
    hc = (hash (k43) & 0x7fffffff) % 8; // hc = 1
Walk the chain at bucket 1:
  compare k43 with k8, no match
  compare k43 with k43, found!
[Figure: table ha whose buckets hold the keys k1, k2, k5, k8, k43]

“insert (hash, key, value)”
    void Hash_insert (T h, poly k, poly v)
    {
      if (1.0 * h->numItems / h->numBuckets >= h->defaultLoadFactor) {
        // buckets extension & items re-hash
      }
      int i = k->hashCode ();   // how to do this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      Tuple_t x = Tuple_new (k, v);
      List_insertHead ((h->buckets)[hc], x);
      h->numItems++;   // keep the load factor up to date
      return;
    }
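The step left as a comment above, "buckets extension & items re-hash", could look roughly like the following. This is a self-contained sketch with int keys and hand-rolled chains rather than the author's LinkedList_t and poly types. All names (struct table, table_new, table_grow, table_insert, table_lookup) are hypothetical, and only EXT_FACTOR and the load-factor test are taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>

#define EXT_FACTOR 2   /* growth factor, as in the slides */

struct entry { int key; int value; struct entry *next; };

struct table {
  struct entry **buckets;
  int numBuckets;
  int numItems;
  double loadFactor;
};

static int bucket_of (int key, int numBuckets)
{
  return (key & 0x7fffffff) % numBuckets;
}

struct table *table_new (int numBuckets, double lf)
{
  struct table *t = malloc (sizeof *t);
  assert (t != NULL && numBuckets > 0);
  t->buckets = calloc (numBuckets, sizeof *t->buckets);   /* empty chains */
  assert (t->buckets != NULL);
  t->numBuckets = numBuckets;
  t->numItems = 0;
  t->loadFactor = lf;
  return t;
}

/* Allocate a larger bucket array and relink every node into it.
   Indexes must be recomputed because they depend on numBuckets. */
static void table_grow (struct table *t)
{
  int newSize = t->numBuckets * EXT_FACTOR;
  struct entry **fresh = calloc (newSize, sizeof *fresh);
  assert (fresh != NULL);
  for (int i = 0; i < t->numBuckets; i++) {
    struct entry *p = t->buckets[i];
    while (p) {
      struct entry *next = p->next;
      int hc = bucket_of (p->key, newSize);
      p->next = fresh[hc];
      fresh[hc] = p;
      p = next;
    }
  }
  free (t->buckets);
  t->buckets = fresh;
  t->numBuckets = newSize;
}

/* Assumes keys are pairwise distinct, as the slides do. */
void table_insert (struct table *t, int key, int value)
{
  if (1.0 * t->numItems / t->numBuckets >= t->loadFactor)
    table_grow (t);
  int hc = bucket_of (key, t->numBuckets);
  struct entry *e = malloc (sizeof *e);
  assert (e != NULL);
  e->key = key;
  e->value = value;
  e->next = t->buckets[hc];
  t->buckets[hc] = e;
  t->numItems++;
}

/* Return 1 and store the value in *out if key is present, else return 0. */
int table_lookup (const struct table *t, int key, int *out)
{
  struct entry *p = t->buckets[bucket_of (key, t->numBuckets)];
  for (; p; p = p->next)
    if (p->key == key) { *out = p->value; return 1; }
  return 0;
}
```

Starting from 2 buckets with a load factor of 1.0, inserting the keys 0 through 9 grows the table to 4, 8 and then 16 buckets, and every key remains reachable after each re-hash.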

Ex: insert (ha, k13)
    hc = (hash (k13) & 0x7fffffff) % 8; // suppose hc==4
k13 is inserted at the head of the chain at bucket 4.
[Figure: table ha before and after the insertion; its buckets hold k1, k2, k5, k8, k43 and then also k13]

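Complexity
rep \ op: array, sorted array, linked list, sorted linked list, hash
lookup(): O(n), O(lg n), O(n); hash: O(1)
insert(): O(n); hash: O(1)
delete(): O(n); hash: O(1)
The O(1) bounds for hash are expected costs, assuming a well-distributed hash function and a bounded load factor.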