IS 2610: Data Structures Searching March 29, 2004.

Slides:



Advertisements
Similar presentations
Hashing.
Advertisements

The Dictionary ADT Definition A dictionary is an ordered or unordered list of key-element pairs, where keys are used to locate elements in the list. Example:
CSCE 3400 Data Structures & Algorithm Analysis
Data Structures Using C++ 2E
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
TTIT33 Algorithms and Optimization – Lecture 5 Algorithms Jan Maluszynski - HT TTIT33 – Algorithms and optimization Lecture 5 Algorithms ADT Map,
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Techniques.
Hashing CS 3358 Data Structures.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
Hashing (Ch. 14) Goal: to implement a symbol table or dictionary (insert, delete, search)  What if you don’t need ordered keys--pred, succ, sort, select?
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
CSCE 3110 Data Structures & Algorithm Analysis Binary Search Trees Reading: Chap. 4 (4.3) Weiss.
Hash Table March COP 3502, UCF.
Spring 2015 Lecture 6: Hash Tables
1 Road Map Associative Container Impl. Unordered ACs Hashing Collision Resolution Collision Resolution Open Addressing Open Addressing Separate Chaining.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
1 HashTable. 2 Dictionary A collection of data that is accessed by “key” values –The keys may be ordered or unordered –Multiple key values may/may-not.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing Hashing is another method for sorting and searching data.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
Data Structures and Algorithms Lecture (Searching) Instructor: Quratulain Date: 4 and 8 December, 2009 Faculty of Computer Science, IBA.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
IS 2610: Data Structures Priority Queue, Heapsort, Searching March 15, 2004.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
CSCI 210 Data Structures and Algorithms
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Advanced Associative Structures
Data Structures and Algorithms
Data Structures and Algorithms
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Advanced Implementation of Tables
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
EE 312 Software Design and Implementation I
Presentation transcript:

IS 2610: Data Structures Searching March 29, 2004

Symbol Table A symbol table is a data structure of items with keys that supports two basic operations: insert a new item, and return an item with a given key  Examples: Account information in banks Airline reservations

Symbol Table ADT Key operations  Insert a new item  Search for an item with a given key  Delete a specified item  Select the k th smallest item  Sort the symbol table  Join two symbol tables void STinit(int); int STcount(); void STinsert(Item); Item STsearch(Key); void STdelete(Item); Item STselect(int); void STsort(void (*visit)(Item));

Key-indexed ST Simplest search algorithm is based on storing items in an array, indexed by the keys static Item *st; static int M = maxKey; void STinit(int maxN) { int i; st = malloc((M+1)*sizeof(Item)); for (i = 0; i <= M; i++) st[i] = NULLitem; } int STcount() { int i, N = 0; for (i = 0; i < M; i++) if (st[i] != NULLitem) N++; return N; } void STinsert(Item item) { st[key(item)] = item; } Item STsearch(Key v) { return st[v]; } void STdelete(Item item) { st[key(item)] = NULLitem; } Item STselect(int k) { int i; for (i = 0; i < M; i++) if (st[i] != NULLitem) if (k-- == 0) return st[i]; } void STsort(void (*visit)(Item)) { int i; for (i = 0; i < M; i++) if (st[i] != NULLitem) visit(st[i]); }

Sequential Search based ST When a new item is inserted, we put it into the array by moving the larger elements over one position (as in insertion sort) To search for an element  Look through the array sequentially  If we encounter a key larger than the search key – we report an error

Binary Search Divide and conquer methodology  Divide the items into two parts  Determine which part the search key belongs to and concentrate on that part Keep the items sorted Use the indices to delimit the part searched. Item search(int l, int r, Key v) { int m = (l+r)/2; if (l > r) return NULLitem; if eq(v, key(st[m])) return st[m]; if (l == r) return NULLitem; if less(v, key(st[m])) return search(l, m-1, v); else return search(m+1, r, v); } Item STsearch(Key v) { return search(0, N-1, v); }

Binary Search Tree NST is a binary tree  A key is associated with each of its internal nodes  Key in any node is larger than (or equal to) the keys in all nodes in that node’s left subtree is smaller than (or equal to) the keys in all nodes in that node’s right subtree What is the output of inorder traversal on BST?

BST insertion Insert L !! void STinsert(Item item) { Key v = key(item); link p = head, x = p; if (head == NULL) { head = NEW(item, NULL, NULL, 1); return; } while (x != NULL) { p = x; x->N++; x = less(v, key(x->item)) ? x->l : x->r; } x = NEW(item, NULL, NULL, 1); if (less(v, key(p->item))) p->l = x; else p->r = x; } A A E E R R A A I I M M G G S S N N P P O O T T X X link insertR(link h, Item item) { Key v = key(item), t = key(h->item); if (h == z) return NEW(item, z, z, 1); if less(v, t) h->l = insertR(h->l, item); else h->r = insertR(h->r, item); (h->N)++; return h; } void STinsert(Item item) { head = insertR(head, item); }

BST Complexities Best and worst case heights  ln N and N Search costs  Internal path length is related to – search hit  External path length is related to – search miss N random keys  Average: Insertion, Search hit and Search miss require about 2 ln N comparisons  Worst case search: N comparisons

Basic Rotations Transformations to rearrange nodes in a tree  Maintain BST  Changes three pointers link rotL(link h) { link x = h->r; h->r = x->l; x->l = h; return x; } link rotR(link h) { link x = h->l; h->l = x->r; x->r = h; return x; }

Balanced Trees BST – worst case is bad!! Keep trees balanced so that searches can be done in less than ln N + 1 comparisons  Maintenance cost incurred! Splay trees (Self-adjusting)  Tree automatically reorganizes itself after each op  When insert or search for x, rotate x up to root using “double rotations”  Tree remains “balanced” without explicitly storing any balance information

Splay trees Check two links above current node  ZIG-ZAG: if orientations differ, same as root insertion  ZIG-ZIG: if orientations match, do top rotation first (unlike bottom rotation in root insertion using basic rotations)

2-3-4 Trees Nodes can hold more than one key  2-nodes : 1 key; two links  3-nodes : 2 keys; three links  4-nodes : 3 keys; four links A balanced tree  Links to empty trees are at the same hieght A A S S R R A, C S S R R A, C, H S S R R S S C, R H H A A

2-3-4 Trees How doe you Search? Insert  Search to bottom for key  2-node at bottom: convert to 3-node  3-node at bottom: convert to 4-node  4-node at bottom – split Whenever root becomes 4 node – split it into a triangle of three 2-nodes Add E

Red black trees Represent trees as binary trees

Hashing Save items in a key-indexed table  Index is a function of the key Hash function  function to compute table index from search key Collision resolution strategy  Algorithms and data structures to handle two keys that hash to the same index  One approach – use linked list

Hashing Time-space complexity  No space limitation Any search can be done in one memory access  No time limitation Use limited memory and do sequential search  Limitation on both Hashing to balance

Hash function: h Given a hash table of size M  h(Key) is a value in [0,.., M]  Ideally, for each input, every output should be equally likely Simple methods  Modular hash function h(K) = K mod M; choose M as prime  Multiplicative and modular methods h(K) = (K  ) mod M; choose M as prime A popular choice is  = (golden ration)

Hash Function: h Strings of characters  26 4 .5 Million 4-char keys  Table size M = 101 abcd hashes to 11  0x % 101 = % 101 = 11 dcba hashes to 57  Collision is inevitable Binary Hex Dec asciiabcd

Hash function: h Horner’s method 0x = 256*(256*(256*97+98) + 99)+100 0x mod 101 = 256*(256*(256*97+98) + 99)+100 mod 101 Can take mod after each op  (256*97+98) mod 101 = 84  (256*84+99) mod 101 = 90  (256*90+100) mod 101 = 11 N add, multiply and mod ops Binary Hex Dec asciiabcd int hash(char *v, int M) { int h = 0, a = 127; for (; *v != '\0'; v++) h = (a*h + *v) % M; return h; } Why 127 instead of 128?

Universal Hashing and collision Universal function  Chance of collision for two distinct keys for table size M is precisely 1/M How to handle the case when two keys hash to the same value  Separate chaining  Open addressing – linear probe Double hashing  Dynamic hash – increase table size dynamically int hashU(char *v, int M) { int h, a = 31415, b = 27183; for (h = 0; *v != '\0'; v++, a = a*b % (M-1)) h = (a*h + *v) % M; return h; } Performs well in practice!

Separate Chaining A linked list for each hash address  M linked lists M much smaller than N Property 14.1: Number of comparisons  Reduced by factor of M  Average length of the lists is N/M Search the list  Unordered: insert takes constant time Search is proportional to N/M

Open Addressing Open addressing  M is much larger than N  Plenty of empty table slots  When a new key collides find an empty slot  Complex collision patterns Linear Probing  When collision occurs, check (probe) the next position in the table Wrap around the table to find an empty slot

Linear Probing Load factor   - fraction of the table positions that are occupied (less than 1) Search increases with the value of  Search loops infinitely when  = 1  Insert: ½(1+ (/(1-  )2) ASERCHINGXMP

Double Hashing Avoid clustering using second hash Take hash function relatively prime to avoid from probe sequence to be very short  Make M prime  Choose second has value that returns values less than M A useful second hash: (k mod 97) + 1