Bitwise Hashing.



Hash Tables One of the most convenient forms of a hash table implementation is as an array of linked lists.

Data is stored in records containing:
1. A key value
2. The actual data
3. A pointer to another record
[Figure: a record drawn as three fields: KEY, DATA, POINTER]

Keys and Data Often the key value and the actual value are the same thing, as in the case of a dictionary, where the word being stored can also be used as the key for the hash function. [Figure: a record drawn as two fields: WORD, POINTER]

Hashing Hashing is the process of turning the key value into an integer index, used to locate a storage location in a larger array. Hash codes should be designed to give different codes for different keys, although this cannot be guaranteed.

Collisions When two keys hash to the same code a collision occurs that must be dealt with. Linked lists offer an efficient solution for collision processing. Records with the same hash code are stored in the same linked list.

Limitations Ideally, you will not have long linked lists: your hash function should be designed so that there are few collisions. The problem with long linked lists is that they are sequential search structures, so a lookup costs O(n) in the length of the list, as opposed to O(1) for a simple, non-colliding hash.

[Figure: ht = hash table, drawn as an array whose slots point to linked lists of records]

Advantages An advantage of using linked lists to implement hash tables is that adding and deleting records is not difficult.

Adding a record Hash into the list. If the list pointer is NULL, then assign it to be the pointer to the record. [Figure: ht with NULL slots and chains of records; the record to be added is shown beside the table]

[Figure: ht with the new record linked into its slot] Record is now added.

Deleting a record Hash into the list. If the list pointer is not NULL, then follow the list along until the word is found. [Figure: ht with NULL slots and chains of records]

Assign the pointer to the record to the next record, then free the old record. [Figure: ht with the record unlinked from its chain] Record is now deleted.

Hash table example

Name structure
#define SSIZE 20
struct name {
    char last[SSIZE];
    char first[SSIZE];
    char mi;
    char title[SSIZE];
};

Address structure
struct addr {
    char street[4*SSIZE];
    char city[SSIZE];
    char state[SSIZE];
    char zip[SSIZE];
};

Address_entry structure
struct addr_entry {
    struct name name;
    struct addr addr;
};
typedef struct addr_entry Addr_entry;
typedef struct addr Addr;
typedef struct name Name;

[Figure: an addr_entry containing name (last, first, mi, title) and addr (street, city, state, zip)]

A single node of the linked list
struct addr_list_item {
    Addr_entry *addr;
    struct addr_list_item *next;
};

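[Figure: a list node whose addr pointer refers to an addr_entry (name: last, first, mi, title; addr: street, city, state, zip) and whose next pointer refers to the following node]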

typedef struct addr_list_item Addr_list_item;
typedef Addr_list_item *Addr_list;
#define EMPTY_LIST NULL

The hash table definition
#define TABLE_SIZE 256
static Addr_list addr_ht[TABLE_SIZE];
Because this array has static storage duration, all of its elements are initialized to NULL before the program starts. This is what we want to start out with. As we hash into this table later on we will build linked lists from these pointers.

Hashing has two parts to it.
1. a function to convert C strings to unsigned integers
2. a function to convert unsigned integers to hash codes (table indices)
The basic idea is that a person’s last name, first name and middle initial will be combined into one string, converted to an integer and then hashed into a table location between 0 and 255.

Character conversion Conversion of strings to unsigned integers. There are conversion functions available in C to convert strings to numbers.
to integer: atoi
to double: atof
to long integer: atol

Assumptions The presupposition is that the number written in the string does not need more bits than can be represented by the data type involved. When you are trying to convert longer strings there are other functions that can be used.

Available conversions
double: strtod(CSTR str, char **rest)
long: strtol(CSTR str, char **rest, int base)
unsigned long: strtoul(CSTR str, char **rest, int base)
where CSTR is const char * and rest is the address of a pointer that the function sets to the rest of the string (the portion that was not converted, starting at the first character that could not be part of the number).

Whole strings to integers? If, however, we want to take an entire string (like one consisting of a last name, first name and middle initial) and process the whole thing into an integer we must write this function ourselves.

#include <string.h>   /* strlen, strcpy */

unsigned int str_to_int(const char *str)
{
    unsigned value = 0u, tmp = 0u;
    int size = sizeof(int)/sizeof(char);
    int len = strlen(str);

    while ( len >= size ) {
        value ^= *(unsigned *)str;   /* xor & typecast; assumes str may be read as an unsigned */
        str += size;
        len -= size;
    }
    if ( len > 0 ) {
        strcpy( (char *)&tmp, str );   /* copy the leftover characters into tmp */
        value ^= tmp;
    }
    return value;
}

Explanation of str_to_int This function converts a potentially long string to an unsigned integer. The size of an unsigned integer is machine-dependent. Let us assume that it is 32 bits. Then, value and tmp are initialized as follows:

Other key variables
int size = sizeof(int)/sizeof(char) = 32 bits / 8 bits = 4 chars (bytes) per unsigned integer
int len = strlen(str) /* the number of characters in the string */

[Figure: value and tmp, each drawn as four 1-byte cells (32 bits)]

The while loop then reads while ( len >= size ): while the number of characters remaining to be processed into our integer >= the number of characters that can fit into one unsigned integer, do the following.

In other words, the loop takes blocks of 4 characters (32 bits) from the string and processes them into value. The loop ends when there are not enough unprocessed characters left in the string to take up 32 bits.

The if condition after the loop will process the remaining characters into the unsigned integer value.

Handout Given str = “ABCDE”; how does the function come up with ‘value’?

Bitwise operators
Expression   Comment
n & 017      Bitwise and; value is n with all but the lower 4 bits masked away
i | j        Bitwise i or j
i ^ j        Bitwise i exclusive or j
i << 4       Value is i shifted left by 4 bits
j >> 5       Value is j shifted right by 5 bits
~n           Value is the 1’s complement of n

Truth tables (& | ^)
a   b   a & b   a | b   a ^ b
T   T     T       T       F
T   F     F       T       T
F   T     F       T       T
F   F     F       F       F

[Figure: value as four 1-byte cells (32 bits) above str = A B C D E; the first four characters are XORed into value: value = value ^ str]

Dealing with leftovers if ( len > 0 ) { strcpy( (char *)&tmp, str ); [Figure: value and the remaining part of str]

[Figure: the remaining part of str, and tmp after strcpy]

The final value value = value ^ tmp [Figure: tmp and the resulting value]

What next? Now that we have the function that converts strings to unsigned integers, we can write the function that hashes unsigned integers into our hash table.

The hash function
unsigned hash(const Name *name)
{
    char h_str[HSTR_LEN];
    unsigned int val;

    sprintf(h_str, "%s%s%c", name->last, name->first, name->mi);
    val = str_to_int(h_str);
    return( val >> SHIFT_AMT );
}

SHIFT_AMT Where SHIFT_AMT is determined by
#define SHIFT_AMT (8*sizeof(unsigned int) - TABLE_BITS)

TABLE_BITS and TABLE_BITS is defined as
#define TABLE_BITS 8
TABLE_BITS is the exponent that gives the size of your hash table as a power of 2. Therefore, since our hash table has 256 entries, TABLE_BITS is set to 8 (as in 2^8 = 256).

The necessary conversion We want to convert name to an unsigned int and then map the results into a hash table of size 256. To do this last step we only need to select 8 bits of the 32-bit hash value.

Bitwise operations One easy method of selecting 8 bits is to use bitwise operations to right-shift the value so that only the top 8 bits (TABLE_BITS) remain. This is a number between 0 and 255 and is used as the index into the hash table.

The hashing process

Value before and after right shifting [Figure: bit patterns of the result of str_to_int(h_str), of that result multiplied by HASH_MULT, and of the value after a right shift of 32 - 8 = 24 bits; the final value is 81]

Results The right shift of 24 bits forces the result into an integer that fits in 8 bits. The largest such integer is 255 and the smallest is 0. We now have our hash value. In this case, 81.

Fundamental operations There are a host of hash table operations that must be performed. Some of them need to be able to hash values into the table as well.
1.) data retrieval: Your data is stored by name. You want to enter a person’s name and have the program look them up in the table so you can print their address, etc.

2.) data removal: You no longer need a person’s record.
3.) data insertion: You wish to add a new record.
4.) record creation: Needed by the data insertion function.

Data retrieval
Addr_list retrieve(const Name *name)
{
    Addr_list list_a = addr_ht[hash(name)];

    for (; list_a != EMPTY_LIST; list_a = list_a->next)
        if (cmp_name(name, &list_a->addr->name) == 0)
            return(list_a);   /* entry found */
    return(EMPTY_LIST);       /* entry not found */
}

Data Removal
static int found = 0;

int erase(const Name *name)
{
    unsigned hashcode = hash(name);
    Addr_list list_a = addr_ht[hashcode];
    Addr_list delete_list(Addr_list, const Name *);   /* prototype */

    found = 1;   /* delete_list resets this to 0 if the name is not in the list */
    if (list_a != EMPTY_LIST) {
        addr_ht[hashcode] = delete_list(list_a, name);
        return(found - 1);   /* 0 if deleted, -1 if not found */
    }
    return(-1);   /* no entry to delete */
}

Obviously, this needs a deletion routine.

Deletion
Addr_list delete_list(Addr_list list_a, const Name *name)
{
    Addr_list ans;

    if (list_a == EMPTY_LIST) {   /* reached the end without a match */
        found = 0;
        return(NULL);
    }

    if (cmp_name(&list_a->addr->name, name) == 0) {
        ans = list_a->next;   /* unlink the matching node */
        free(list_a);
        return(ans);
    }
    list_a->next = delete_list(list_a->next, name);
    return(list_a);
}

Adding a new entry
void entr_add(Addr_list a)
{
    Addr_list b = retrieve(&a->addr->name);
    unsigned hashcode;

    if (b != EMPTY_LIST) {   /* replace existing entry */
        a->next = b->next;
        *b = *a;   /* structure assignment */
        free(a);
    }

    else {   /* install new entry */
        hashcode = hash(&a->addr->name);
        b = addr_ht[hashcode];
        a->next = b;   /* b is NULL or points to first item in a linked list */
        addr_ht[hashcode] = a;
    }
}
New entries are inserted in the first position in the linked list.