Chapter 8 1. Symbol Table Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management.

Slides:



Advertisements
Similar presentations
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Advertisements

CS Data Structures Chapter 8 Hashing.
Review of Chapter 8 張啟中.
Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
CSCE 3400 Data Structures & Algorithm Analysis
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Skip List & Hashing CSE, POSTECH.
NCUE CSIE Wireless Communications and Networking Laboratory CHAPTER 8 Hashing 1.
Data Structures Using C++ 2E
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Hashing as a Dictionary Implementation
File Processing - Indirect Address Translation MVNC1 Hashing Indirect Address Translation Chapter 11.
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Part II Chapter 8 Hashing. Dynamic Hashing Also called extendible hashing Motivation Limitations of static hashing When the table is to be full, overflows.
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
Symbol Table Symbol table is used widely in many applications.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
Data Structures Using Java1 Chapter 8 Search Algorithms.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CS Data Structures Chapter 8 Hashing (Concentrating on Static Hashing)
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Hash Table March COP 3502, UCF.
Data Structures Using Java1 Chapter 8 Search Algorithms.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Appendix E-A Hashing Modified. Chapter Scope Concept of hashing Hashing functions Collision handling – Open addressing – Buckets – Chaining Deletions.
Comp 335 File Structures Hashing.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashing Static Hashing Dynamic Hashing. – 2 – Sungkyunkwan University, Hyoung-Kee Choi © Symbol table ADT  We define the symbol table as a set of name-attribute.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
1 CSCD 326 Data Structures I Hashing. 2 Hashing Background Goal: provide a constant time complexity method of searching for stored data The best traditional.
Data Structures Using C++
Chapter 9 Hashing Dr. Youssef Harrath
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
Data Structures Chapter 8: Hashing 8-1. Performance Comparison of Arrays and Trees Is it possible to perform these operations in O(1) ? ArrayTree Sorted.
Data Structures Using C++ 2E
Hashing Alexandra Stefan.
Hash Tables (Chapter 13) Part 2.
EEE2108: Programming for Engineers Chapter 8. Hashing
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Review Graph Directed Graph Undirected Graph Sub-Graph
Advanced Associative Structures
Hash Table.
CSCE 3110 Data Structures & Algorithm Analysis
CS202 - Fundamental Structures of Computer Science II
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
What we learn with pleasure we never forget. Alfred Mercier
Presentation transcript:

Chapter 8 1

Symbol Table Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management In general, the following operations are performed on a symbol table determine if a particular name is in the table retrieve the attribute of that name modify the attributes of that name insert a new name and its attributes delete a name and its attributes 2

Symbol Table Popular operations on a symbol table include search, insertion, and deletion A binary search tree could be used to represent a symbol table. The worse-case complexities for the operations are O(n). Hashing: insertions, deletions & finds in O(1). Static hashing Dynamic hashing 3

Static Hashing Dictionary pairs are stored in a fixed-size table called hash table. The address of location of an dictionary pairs, x, is obtained by computing some arithmetic function h(x). The memory available to maintain the symbol table (hash table) is assumed to be sequential. The hash table consists of b buckets and each bucket contains s records. h(x) maps the set of possible dictionary pairs onto the integers 0 through b-1. 4

Hash Tables The key density of a hash table is the ratio n/T, where n is the number of dictionary pairs in the table and T is the total number of possible keys. The loading density or loading factor of a hash table is α= n/(sb). 5

Hash Tables Two keys, I 1, and I 2, are said to be synonyms with respect to h if h(I 1 ) = h(I 2 ). An overflow occurs when a new key i is mapped or hashed by h into a full bucket. A collision occurs when two non-identical keys are hashed into the same bucket. If the bucket size is 1, collisions and overflows occur at the same time. 6

Example 8.1 b=26, s=2 Assume there are 10 distinct keys GA, A,G,L,A2,A1,A3,A4 and E If no overflow occur, the time required for hashing depends only on the time required to compute the hash function h. A1? 7

Hash Functions Requirements Simple to compute Minimize the number of collisions Dependent upon all the characters in the keys Uniform hash function If there are b buckets, we hope to have h(x)=i with the probability being (1/b) Mid-square, division, folding, digit analysis 8

Division Using the modulo (%) operator. A key k is divided by some number D and the remainder is used as the hash address for k. The bucket addresses are in the range of 0 through D-1. If D is a power of 2, then h(k) depends only on the least significant bits of k. 9

Division If a division function h is used as the hash function, the table size should not be a power of two. Since programmers have a tendency to use many variables with the same suffix  too many collisions If D is divisible by two, the odd keys are mapped to odd buckets and even keys are mapped to even buckets. Thus, the hash table is biased. 10

Mid-Square Mid-Square function It is computed by squaring the key and then using an appropriate number of bits from the middle of the square to obtain the bucket address. Table size is a power of two 11

Folding The key k is partitioned into several parts, all but the last being of the same length. All partitions are added together to obtain the hash address for k. Shift folding: different partitions are added together to get h(k). Folding at the boundaries: key is folded at the partition boundaries, and digits falling into the same position are added together to obtain h(k). This is similar to reversing every other partition and then adding. 12

Example

Secure Hash Functions Can be applied to any sized message M Produces fixed-length output h Is easy to compute h=H(M) for any message M Given h is infeasible to find x s.t. H(x)=h one-way property Given x is infeasible to find y s.t. H(y)=H(x) weak collision resistance Is infeasible to find any x,y s.t. H(y)=H(x) strong collision resistance 14

Secure Hash Algorithm (SHA) 步驟 1: 對訊息做前處理使得它的長度成為 q*512 位元,其中 q 是 某個整數。前處理的程序可能得在訊息的尾端加上許多個 0 的字 串。 步驟 2: 初始化 160- 位元的輸出緩衝區 OB ,它包含五個 32- 位元的 暫存器 A, B, C, D, E ,其中 A = , B = efcdab89 , C = 98badcfe , D = , E = c3d2e1f0 。(所有的值為 16 進位)。 步驟 3: for (int i = 1; i <= q; i++) { 令 B i = 訊息中第 i 個 512 位元區塊 ; OB = F (OB, B i ); } 步驟 4: 輸出 OB 。 15

Atomic SHA Operation 16 t= 步驟數; 0 ≤ t ≤ 79 f t (B, C, D)= 步驟 t 的基本(位元的)邏輯函數; 例如, (B^C) v (B^D) S k = 把 32 位元的暫存器左迴旋 k 個位元 W t = 從 Bi 推導出來的 32 位元值 K t = 常數

Example 8.4 A = , B = efcdab89 , C = 98badcfe , D = , E = c3d2e1f0 f t (B,C,D)=(B  C  D), K 0 =5a82799, W 0 = B= S 30 (B)= =7bf36ae2=C A=6e3edeb6 B= , D=98badcfe, E=

Overflow Handling There are two ways to handle overflow: Open addressing Linear probing Quadratic probing Rehasing Chaining 18

Open Addressing Assumes the hash table is an array The hash table is initialized so that each slot contains the null key. When a new key is hashed into a full bucket, find the closest unfilled bucket. linear probing or linear open addressing 19

Linear Probing (Example 8.6) Assume 26-bucket table with one slot per bucket and the following keys: GA, D, A, G, L, A2, A1, A3, A4, Z, ZA, E. Let the hash function h(k) = first character of k. When entering G, G collides with GA and is entered at ht[7] instead of ht[6]. 0A 1A2 2A1 3D 4A3 5A4 6GA 7G 8ZA 9E 10 11L ~… 24 25Z 20

Linear Probing When linear open address is used to handle overflows, a hash table search for key k proceeds as follows: compute h(k) examine key at positions ht[h(k)], ht[(h(k)+1) % b], …, ht[(h(k)+j) % b], in this order until one of the following condition happens: ht[(h(k)+j) % b]=k; in this case k is found ht[h(k)+j] is empty; k is not in the table We return to the starting position ht[h(k)]; the table is full and k is not in the table 21

Linear Probing When linear probing is used to resolve overflows, keys tend to cluster together. This increases search time. e.g., to find ZA, you need to examine ht[25], ht[0], …, ht[8] (total of 10 comparisons). The number of comparisons to look up a key is approximated (2-α)/(2-2α), where α is the load density 0A 1A2 2A1 3D 4A3 5A4 6GA 7G 8ZA 9E 10 11L ~… 24 25Z 22

Quadratic Probing One of the problems of linear open addressing is that it tends to create clusters of keys. These clusters tend to merge as more keys are entered, leading to bigger clusters. It is difficult to find an unused bucket. A quadratic probing scheme improves the growth of clusters. A quadratic function of i is used as the increment when searching through buckets. Perform search by examining bucket h(k), (h(k)+i 2 )%b, (h(k)-i 2 )%b for 1 ≤ i ≤ (b-1)/2. 23

Example (h(k)+i 2 )%b, (h(k)-i 2 )%b for 1 ≤ i ≤ (b-1)/2. Example: b=10, insert: 89, 18, 49, 58, % 10=9, 18 % 10=8 49 % 10=9, (9+1 2 ) % 10=0 58 % 10=8, (8+1 2 ) % 10=9, (8+2 2 ) % 10=2 69 % 10=9, (9+1 2 ) % 10=0, (9+2 2 ) % 10= Index

Rehashing Another way to control the growth of clusters is to use a series of hash functions h 1, h 2, …, h m. This is called rehashing. Buckets h i (k), 1 ≤ i ≤ m are examined in that order. 25

Chaining Linear probing performs poorly because the search for a key involves comparisons with keys that have different hash values. Unnecessary comparisons can be avoided if all the synonyms are put in the same list, where one list per bucket. Each chain has a head node. Head nodes are stored sequentially. 26

Example 27 Average search length is (6*1+3*2+1*3+1*4+1*5)/12 = 2

Dynamic Hashing Retain the fast retrieval time Dynamically increasing and decreasing file size without penalty Dynamic hashing is to minimize access to pages 28

An Example Hash Function kh(k) A A B B C C C C A: 100, B: 101, C: 110

Hashing With Directory Accessing any page only requires two steps: First step: use the hash function to find the address of the directory entry. Second step: retrieve the page associated with the address 30

Dynamic Hash Tables with Directories 31 Insert C5 into the hash table: h(C5,2)=01, d[01]  A1, B1  overflow  determine the least u such that h(k,u) is not the same for all keys in the overflow bucket  suppose that u=3  h(A1,3)=001, h(B1,3)=001, h(C5,3)=101 Insert C1 into the hash table: h(C1,2)=01, d[01]  A1, B1  overflow  determine the least u such that h(k,u) is not the same for all keys in the overflow bucket  suppose that u=4  h(A1,4)=0001, h(B1,4)=1001, h(C1,4)=0001

Directoryless Dynamic Hashing Hashing with directory requires at least one level of indirection Directoryless hashing assume a continuous address space in the memory to hold all the records. Therefore, the directory is not needed. Thus, the hash function must produce the actual address of a page containing the key. Contrasting to the directory scheme in which a single page might be pointed at by several directory entries, in the directoryless scheme one unique page is assigned to every possible address. 32

Inserting into a Directoryless Dynamic Hash Table 33 0~2 r +q, 0 ≤q<2 r Insert C5 (110101) into the hash table: h(C5,2)=01, d[01]  A1, B5  overflow  activated 2 r +q  reallocating  (b)  000, 100 indexed using h(k,3) 0~q-1 and 2 r ~2 r +q-1 are indexed using h(k,r+1); else, h(k,r)