Hashing, B-Trees and Red-Black Trees using Parallel Algorithms & Sequential Algorithms. By Yazeed K. Almarshoud.

Road map: Introduction. Definitions. Hashing. B-Trees. Red-Black Trees. Sequential algorithms. Parallel algorithms.

Introduction: In this presentation I am glad to present the importance of parallel computation, a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones which are then solved concurrently.

Hashing

Definitions. Hashing: a hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer, that may serve as an index into an array. For example, keys made up of alphabetic characters can be replaced by their ASCII equivalents. Two standard hashing techniques are the division method and the multiplication method.

Symbol-table problem

Hash functions

Choosing a hash function: The assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided. Desiderata: a good hash function should distribute the keys uniformly into the slots of the table, and regularity in the key distribution should not affect this uniformity.

Division method: Assume all keys are integers, and define h(k) = k mod m. Deficiency: don't pick an m that has a small divisor d; a preponderance of keys that are congruent modulo d can adversely affect uniformity. Extreme deficiency: if m = 2^r, then the hash doesn't even depend on all the bits of k. For example, if k = 1011000111011010 (binary) and r = 6, then h(k) = 011010 (binary).
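As a concrete illustration, here is a minimal sketch of the division method in Python; the key and the choice of 701 as a prime table size are purely illustrative.

```python
# A minimal sketch of the division method, assuming integer keys; the table
# size 701 is an illustrative prime not too close to a power of 2 or 10.
def division_hash(k: int, m: int) -> int:
    """h(k) = k mod m."""
    return k % m

print(division_hash(123456, 701))  # slot index in the range [0, 700]
```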

Division method (continued): h(k) = k mod m. Pick m to be a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment. Annoyance: sometimes making the table size a prime is inconvenient. Still, this method is popular, although the next method we'll see is usually superior.

Multiplication method: Assume that all keys are integers, m = 2^r, and our computer has w-bit words. Define h(k) = (A·k mod 2^w) rsh (w − r), where rsh is the bit-wise right-shift operator and A is an odd integer in the range 2^(w−1) < A < 2^w. Don't pick A too close to the ends of that range. Multiplication modulo 2^w is fast, and the rsh operator is fast.
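Below is a minimal sketch of the multiplication method, assuming w = 64, r = 10 and an arbitrarily chosen odd 64-bit constant A; all of these values are illustrative, not prescribed by the slides.

```python
# A minimal sketch of the multiplication method for w-bit words.
W = 64                           # word size (illustrative)
R = 10                           # table size m = 2**R (illustrative)
A = 0x9E3779B97F4A7C15           # an odd 64-bit constant, 2**63 < A < 2**64 (illustrative)

def multiplication_hash(k: int) -> int:
    """h(k) = (A*k mod 2^w) >> (w - r): keep the top r bits of the low word."""
    return ((A * k) & ((1 << W) - 1)) >> (W - R)

print(multiplication_hash(123456))  # slot index in the range [0, 2**R - 1]
```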

Multiplication method example

Resolving collisions by chaining
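The slide illustrates chaining with a figure that is not reproduced here; the following minimal sketch (with illustrative names and table size) shows the idea of keeping one list per slot.

```python
# A minimal sketch of a chained hash table using the division method.
class ChainedHashTable:
    def __init__(self, m: int = 701):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one list (chain) per slot

    def _h(self, k: int) -> int:
        return k % self.m                     # division method

    def insert(self, k, v):
        self.slots[self._h(k)].append((k, v))

    def find(self, k):
        for key, val in self.slots[self._h(k)]:
            if key == k:
                return val
        return None                           # key not present

    def remove(self, k):
        chain = self.slots[self._h(k)]
        self.slots[self._h(k)] = [(key, val) for key, val in chain if key != k]
```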

Analysis of chaining

Search cost

Resolving collisions by open addressing: No storage is used outside of the hash table itself. The hash function depends on both the key and the probe number: h : U × {0, 1, …, m−1} → {0, 1, …, m−1}. For example, h(k,i) = (k+i) mod m, or h(k,i) = (k+i²) mod m. Inserting a key k: check T[h(k,0)]; if it is empty, insert k there. Otherwise check T[h(k,1)]; if empty, insert k there. Otherwise continue with h(k,2), h(k,3), …, h(k,m−1). Finding a key k: check whether T[h(k,0)] is empty and whether it equals k; if not, check T[h(k,1)], and so on. Deleting a key k: find it and replace it with a dummy marker, so that later probe sequences passing through that slot are not cut short.
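A minimal sketch of open addressing with the linear probe h(k,i) = (k+i) mod m and a dummy ("tombstone") marker for deletion; class and marker names are illustrative.

```python
# A minimal sketch of an open-addressed table with a dummy deletion marker.
EMPTY, DELETED = object(), object()

class OpenAddressTable:
    def __init__(self, m: int = 17):
        self.m = m
        self.table = [EMPTY] * m

    def _probe(self, k: int, i: int) -> int:
        return (k + i) % self.m               # h(k, i) = (k + i) mod m

    def insert(self, k: int) -> int:
        for i in range(self.m):
            j = self._probe(k, i)
            if self.table[j] in (EMPTY, DELETED):
                self.table[j] = k
                return j
        raise RuntimeError("hash table overflow")

    def find(self, k: int):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.table[j] is EMPTY:        # never occupied: k cannot be further on
                return None
            if self.table[j] == k:
                return j
        return None

    def delete(self, k: int):
        j = self.find(k)
        if j is not None:
            self.table[j] = DELETED           # dummy keeps later probe sequences intact
```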

Example of open addressing

Probing strategies. Linear probing: given an ordinary hash function h′(k), linear probing uses the hash function h(k,i) = (h′(k) + i) mod m. This method, though simple, suffers from primary clustering, where long runs of occupied slots build up, increasing the average search time; moreover, the long runs of occupied slots tend to get longer.
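For comparison, here is a small sketch of two probe sequences, linear and quadratic, using an illustrative h′(k) = k mod m; the quadratic variant is shown only for contrast and is not discussed further on the slides.

```python
# A minimal sketch contrasting probe sequences.
def h_prime(k: int, m: int) -> int:
    return k % m

def linear_probe(k: int, i: int, m: int) -> int:
    return (h_prime(k, m) + i) % m            # h(k,i) = (h'(k) + i) mod m

def quadratic_probe(k: int, i: int, m: int) -> int:
    return (h_prime(k, m) + i * i) % m        # h(k,i) = (h'(k) + i^2) mod m

m = 11
print([linear_probe(7, i, m) for i in range(5)])     # consecutive slots: 7, 8, 9, 10, 0
print([quadratic_probe(7, i, m) for i in range(5)])  # spread out: 7, 8, 0, 5, 1
```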

Red-Black Trees

Roadmap: definition; height; insertion; deletion; restructuring; recoloring; adjustment.

Red-Black Tree: A red-black tree can also be defined as a binary search tree that satisfies the following properties. Root property: the root is black. External property: every leaf is black. Internal property: the children of a red node are black. Depth property: all the leaves have the same black depth.
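A minimal sketch of a red-black node and a check for the internal property; the field and function names are illustrative.

```python
# A minimal sketch of a red-black tree node.
RED, BLACK = "red", "black"

class RBNode:
    def __init__(self, key, color=RED, left=None, right=None, parent=None):
        self.key = key
        self.color = color          # every node is either red or black
        self.left = left
        self.right = right
        self.parent = parent

def violates_internal_property(node) -> bool:
    """True if node is red and has a red child (a 'double red')."""
    if node is None or node.color != RED:
        return False
    return any(c is not None and c.color == RED for c in (node.left, node.right))
```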

Height of a Red-Black Tree. Theorem: a red-black tree storing n items has height O(log n). The search algorithm for a red-black tree is the same as that for a binary search tree, so by the theorem, searching in a red-black tree takes O(log n) time.

Insertion: To perform operation insertItem(k, o), we execute the insertion algorithm for binary search trees and color the newly inserted node z red, unless it is the root. We preserve the root, external, and depth properties. If the parent v of z is black, we also preserve the internal property and we are done. Else (v is red) we have a double red (i.e., a violation of the internal property), which requires a reorganization of the tree. Example where the insertion of 4 causes a double red: the new red node z = 4 is inserted as a child of the red node v = 3 (under root 6), so both z and its parent are red.

Remedying a Double Red: Consider a double red with child z and parent v, and let w be the sibling of v. Case 1: w is black. The double red is an incorrect replacement of a 4-node; restructuring: we change the 4-node replacement. Case 2: w is red. The double red corresponds to an overflow; recoloring: we perform the equivalent of a split.

Local invariants example: a local invariant involves only the fields of an object and the fields of its tree children. We specify local invariants using the repOkLocal method.

Restructuring: A restructuring remedies a child-parent double red when the parent red node has a black sibling. It is equivalent to restoring the correct replacement of a 4-node. The internal property is restored and the other properties are preserved.

Restructuring (cont.): There are four restructuring configurations, depending on whether each of the two double-red nodes is a left or a right child.

Recoloring: A recoloring remedies a child-parent double red when the parent red node has a red sibling. The parent v and its sibling w become black and the grandparent u becomes red, unless u is the root. The double red violation may propagate up to the grandparent u.

Analysis of Insertion.
Algorithm insertItem(k, o):
  1. Search for key k to locate the insertion node z.
  2. Add the new item (k, o) at node z and color z red.
  3. while doubleRed(z)
       if isBlack(sibling(parent(z)))
           z ← restructure(z)
           return
       else  { sibling(parent(z)) is red }
           z ← recolor(z)
Recall that a red-black tree has O(log n) height. Step 1 takes O(log n) time because we visit O(log n) nodes. Step 2 takes O(1) time. Step 3 takes O(log n) time because we perform O(log n) recolorings, each taking O(1) time, and at most one restructuring, taking O(1) time. Thus, an insertion in a red-black tree takes O(log n) time.
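The following Python sketch corresponds to step 3 above, reusing the RBNode class, RED and BLACK from the earlier sketch. It is a sketch under stated assumptions (the root is kept black, keys are distinct), not a full insertItem implementation; helper names are illustrative.

```python
# A minimal sketch of the double-red fix-up loop (restructure / recolor).
def sibling(n):
    p = n.parent
    return None if p is None else (p.right if n is p.left else p.left)

def is_black(n):
    return n is None or n.color == BLACK        # external (None) nodes count as black

def restructure(z):
    """Case 1 (black sibling w): trinode restructuring of z, its parent v and
    grandparent u. The middle key b becomes the black root of the subtree and
    the other two nodes become red; returns b."""
    v, u = z.parent, z.parent.parent
    a, b, c = sorted((z, v, u), key=lambda n: n.key)
    # The four subtrees hanging off a, b, c, in left-to-right key order.
    t1 = a.left
    t2 = a.right if b.left is a else b.left
    t3 = c.left if b.right is c else b.right
    t4 = c.right
    g = u.parent                                 # b takes u's place under g
    b.parent = g
    if g is not None:
        if g.left is u:
            g.left = b
        else:
            g.right = b
    b.left, b.right = a, c
    a.parent = c.parent = b
    a.left, a.right, c.left, c.right = t1, t2, t3, t4
    for t, p in ((t1, a), (t2, a), (t3, c), (t4, c)):
        if t is not None:
            t.parent = p
    b.color, a.color, c.color = BLACK, RED, RED
    return b

def remedy_double_red(root, z):
    """Restore the internal property after coloring the new node z red;
    returns the (possibly new) root of the tree."""
    while z is not root and z.color == RED and z.parent.color == RED:
        if is_black(sibling(z.parent)):
            b = restructure(z)                   # at most one restructuring
            return root if b.parent is not None else b
        # Case 2 (red sibling): recoloring, which may propagate upward.
        v, u = z.parent, z.parent.parent
        v.color = BLACK
        sibling(v).color = BLACK
        if u is not root:
            u.color = RED
        z = u
    return root
```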

Deletion: To perform operation remove(k), we first execute the deletion algorithm for binary search trees. Let v be the internal node removed, w the external node removed, and r the sibling of w. If either v or r was red, we color r black and we are done. Else (v and r were both black) we color r double black, which is a violation requiring a reorganization of the tree. Example where the deletion of 8 causes a double black.

Remedying a Double Black: The algorithm for remedying a double black node w with sibling y considers three cases. Case 1: y is black and has a red child. We perform a restructuring, equivalent to a transfer, and we are done. Case 2: y is black and its children are both black. We perform a recoloring, equivalent to a fusion, which may propagate up the double black violation. Case 3: y is red. We perform an adjustment, equivalent to choosing a different representation of a 3-node, after which either Case 1 or Case 2 applies. Deletion in a red-black tree takes O(log n) time.

Red-Black Tree Reorganization

Insertion (remedy double red):
  Red-black tree action | (2,4) tree action               | Result
  restructuring         | change of 4-node representation | double red removed
  recoloring            | split                           | double red removed or propagated up

Deletion (remedy double black):
  Red-black tree action | (2,4) tree action               | Result
  restructuring         | transfer                        | double black removed
  recoloring            | fusion                          | double black removed or propagated up
  adjustment            | change of 3-node representation | restructuring or recoloring follows

Binary Trees

Binary Trees: A tree in which no node can have more than two children. The depth of an "average" binary tree is considerably smaller than N, even though in the worst case the depth can be as large as N − 1.

Example: Expression Trees. Leaves are operands (constants or variables); the other nodes (internal nodes) contain operators. The tree will not be a binary tree if some operators are not binary.

Binary Trees. Possible operations on the Binary Tree ADT: parent, left_child, right_child, sibling, root, etc. Implementation: because a binary tree node has at most two children, we can keep direct pointers to them.
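A minimal sketch of such a node with direct child pointers; the parent field and the sibling helper are optional conveniences, and the names are illustrative.

```python
# A minimal sketch of a binary tree node with direct child pointers.
class BinaryNode:
    def __init__(self, element, left=None, right=None, parent=None):
        self.element = element
        self.left = left            # pointer to the left child (or None)
        self.right = right          # pointer to the right child (or None)
        self.parent = parent        # optional pointer back to the parent

    def sibling(self):
        if self.parent is None:
            return None
        return self.parent.right if self is self.parent.left else self.parent.left
```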

Compare: implementation of a general tree

Binary Search Trees: Store keys in the nodes so that searching, insertion and deletion can be done efficiently. Binary search tree property: for every node X, all the keys in its left subtree are smaller than the key value in X, and all the keys in its right subtree are larger than the key value in X.

Binary Search Trees: an example of a binary search tree, and of a tree that is not a binary search tree.

Binary search trees: Two binary search trees can represent the same set. The average depth of a node is O(log N); the maximum depth of a node is O(N).

Implementation

Searching a BST: If we are searching for 15 (the key at the root of the example), we are done. If we are searching for a key < 15, we search in the left subtree. If we are searching for a key > 15, we search in the right subtree.


Searching (Find). Find X: return a pointer to the node that has key X, or NULL if there is no such node. Time complexity: O(height of the tree).
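A minimal sketch of the search, reusing the BinaryNode class above and treating node.element as the key.

```python
# A minimal sketch of BST search.
def find(root, key):
    """Return the node holding key, or None if there is no such node."""
    node = root
    while node is not None:
        if key < node.element:
            node = node.left        # smaller keys live in the left subtree
        elif key > node.element:
            node = node.right       # larger keys live in the right subtree
        else:
            return node             # found it
    return None                     # reached an empty subtree: not present
```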

Inorder traversal of a BST prints all the keys in sorted order. Inorder: 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20.

FindMin / FindMax: return the node containing the smallest (largest) element in the tree. For findMin, start at the root and go left as long as there is a left child; the stopping point holds the smallest element. FindMax is symmetric. Time complexity: O(height of the tree).

Insert: Proceed down the tree as you would with a find. If X is found, do nothing (or update something). Otherwise, insert X at the last spot on the path traversed. Time complexity: O(height of the tree).
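A minimal sketch of the insertion, again on BinaryNode; duplicates are ignored, matching the "if X is found, do nothing" rule above.

```python
# A minimal sketch of BST insertion that returns the (possibly new) root.
def insert(root, key):
    if root is None:
        return BinaryNode(key)
    node = root
    while True:
        if key < node.element:
            if node.left is None:
                node.left = BinaryNode(key, parent=node)
                return root
            node = node.left
        elif key > node.element:
            if node.right is None:
                node.right = BinaryNode(key, parent=node)
                return root
            node = node.right
        else:
            return root             # key already present: do nothing
```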

Delete: When we delete a node, we need to consider how to take care of the children of the deleted node. This has to be done in such a way that the binary search tree property is maintained.

Delete: three cases. (1) The node is a leaf: delete it immediately. (2) The node has one child: adjust a pointer from the parent to bypass that node.

Delete: (3) the node has two children. Replace the key of that node with the minimum element of its right subtree, then delete that minimum element. The minimum has either no child or only a right child (if it had a left child, that child would be smaller and would have been chosen), so case 1 or 2 applies. Time complexity: O(height of the tree).
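A minimal sketch of deletion covering all three cases; it reuses BinaryNode, uses a small find_min helper, and, for brevity, does not maintain parent pointers.

```python
# A minimal sketch of BST deletion; returns the new root of the subtree.
def find_min(node):
    """Follow left children until none remains: that node holds the minimum."""
    while node.left is not None:
        node = node.left
    return node

def delete(root, key):
    if root is None:
        return None                              # key not present
    if key < root.element:
        root.left = delete(root.left, key)
    elif key > root.element:
        root.right = delete(root.right, key)
    else:
        # Cases 1 and 2: leaf or one child -- bypass the node.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children -- copy the minimum of the right subtree
        # into this node, then delete that minimum (case 1 or 2).
        successor = find_min(root.right)
        root.element = successor.element
        root.right = delete(root.right, successor.element)
    return root
```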

Sequential algorithms

Sequential algorithms: Is the search space a tree or a graph? The space of a 0/1 integer program is a tree, while that of the 8-puzzle is a graph. This has important implications for search, since unfolding a graph into a tree can have significant overheads.

Sequential algorithms: two examples of unfolding a graph into a tree.

Best-First Search (BFS) Algorithms: BFS algorithms use a heuristic to guide the search. The core data structure is a list, called the Open list, that stores unexplored nodes sorted on their heuristic estimates. The best node is selected from the list, expanded, and its offspring are inserted at the right positions. If the heuristic is admissible, BFS finds an optimal solution.

Best-First Search (BFS) Algorithms: BFS of graphs must be slightly modified to account for multiple paths to the same node. A Closed list stores all the nodes that have been previously seen. If a newly expanded node already exists in the Open or Closed list with a better heuristic value, the node is not inserted into the Open list.
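A minimal sketch of sequential best-first search with Open and Closed lists; the heuristic, successors and is_goal callables are assumed to be supplied by the problem (for the 8-puzzle, the Manhattan distance and the legal moves of the blank).

```python
# A minimal sketch of sequential best-first search.
import heapq
import itertools

def best_first_search(start, heuristic, successors, is_goal):
    counter = itertools.count()                  # tie-breaker for equal h-values
    open_list = [(heuristic(start), next(counter), start)]
    closed = {start: heuristic(start)}           # best h-value seen per node
    while open_list:
        h, _, node = heapq.heappop(open_list)    # best (lowest-h) node first
        if is_goal(node):
            return node
        for child in successors(node):
            h_child = heuristic(child)
            # Skip nodes already seen with an equal or better heuristic value.
            if child in closed and closed[child] <= h_child:
                continue
            closed[child] = h_child
            heapq.heappush(open_list, (h_child, next(counter), child))
    return None                                  # search space exhausted
```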

Best-First Search: Example. Applying best-first search to the 8-puzzle: (a) initial configuration; (b) final configuration; and (c) the states resulting from the first four steps of best-first search. Each state is labeled with its h-value (that is, the Manhattan distance from the state to the final state).

Search Overhead Factor: The amount of work done by serial and parallel formulations of search algorithms is often different. Let W be the serial work and WP the parallel work. The search overhead factor s is defined as WP/W. The upper bound on speedup is p × (W/WP).

Parallel algorithms

Parallel algorithms: How is the search space partitioned across processors? Different subtrees can be searched concurrently; however, subtrees can be very different in size, and it is difficult to estimate the size of a subtree rooted at a node. Dynamic load balancing is therefore required.

Parallel algorithms: The entire space is assigned to one processor to begin with, and unexplored states can be conveniently stored as local stacks at the processors. When a processor runs out of work, it gets more work from another processor. This is done using work requests and responses in message-passing machines, and by locking and extracting work in shared-address-space machines. On reaching a final state at any processor, all processors terminate.

Parallel Best-First Search: The core data structure is the Open list (typically implemented as a priority queue). Each processor locks this queue, extracts the best node, and unlocks it. Successors of the node are generated, their heuristic values estimated, and the nodes inserted into the open list as necessary, after appropriate locking. Termination is signaled when we find a solution whose cost is better than the best heuristic value in the open list. Since we expand more than one node at a time, we may expand nodes that would not be expanded by a sequential algorithm.
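A minimal sketch of this centralized strategy using threads and a single lock around the shared Open list; the termination test is simplified to "stop at the first goal found" rather than the cost-based test described above, and all names are illustrative.

```python
# A minimal sketch of parallel best-first search with a centralized Open list.
import heapq
import threading

def parallel_best_first(start, heuristic, successors, is_goal, n_workers=4):
    open_list = [(heuristic(start), id(start), start)]
    lock = threading.Lock()
    solution = [None]

    def worker():
        while solution[0] is None:
            with lock:                            # serialize queue access
                if not open_list:
                    return
                _, _, node = heapq.heappop(open_list)
            if is_goal(node):
                solution[0] = node
                return
            children = [(heuristic(c), id(c), c) for c in successors(node)]
            with lock:                            # lock again for insertion
                for item in children:
                    heapq.heappush(open_list, item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return solution[0]
```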

Parallel Best-First Search: a general schematic for parallel best-first search using a centralized strategy. The locking operation is used here to serialize queue access by the various processors.

Parallel Best-First Search: The open list is a point of contention. Let t_exp be the average time to expand a single node, and t_access the average time to access the open list for a single-node expansion. If there are n nodes to be expanded by both the sequential and parallel formulations (assuming that they do an equal amount of work), then the sequential run time is n(t_access + t_exp). The parallel run time will be at least n·t_access, so the upper bound on the speedup is (t_access + t_exp)/t_access.

Parallel Best-First Search: Avoid contention by having multiple open lists. Initially, the search space is statically divided across these open lists, and processors operate on them concurrently. Since the heuristic values of nodes in these lists may diverge significantly, we must periodically balance the quality of nodes across the lists. A number of balancing strategies based on ring, blackboard, or random communication are possible.

Parallel Best-First Search: a message-passing implementation of parallel best-first search using the ring communication strategy.

Parallel Best-First Search: an implementation of parallel best-first search using the blackboard communication strategy.

Parallel Best-First Graph Search: Graph search involves a closed list, where the major operation is a lookup (on a key corresponding to the state). The classic data structure is a hash table. Hashing can be parallelized by using two functions: the first hashes each node to a processor, and the second hashes within that processor. This strategy can be combined with the idea of multiple open lists. If a node does not exist in the closed list, it is inserted into the open list at the target of the first hash function. In addition to facilitating lookup, randomization also equalizes the quality of nodes in the various open lists.
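A minimal sketch of the two-level hashing idea, with illustrative processor count and local table size: the first hash picks the owning processor, the second picks the slot in that processor's local closed-list table.

```python
# A minimal sketch of two-level hashing for a distributed closed list.
import hashlib

P = 8            # number of processors (illustrative)
M = 1024         # local hash-table size per processor (illustrative)

def state_digest(state) -> int:
    """A stable integer digest of a state's string representation."""
    return int(hashlib.md5(repr(state).encode()).hexdigest(), 16)

def owner_processor(state) -> int:
    return state_digest(state) % P                # first-level hash: which processor

def local_slot(state) -> int:
    return (state_digest(state) // P) % M         # second-level hash: which local slot

state = (1, 2, 3, 4, 5, 6, 7, 8, 0)               # e.g. an 8-puzzle configuration
print(owner_processor(state), local_slot(state))
```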

Speedup Anomalies in Parallel Search: Since the search space explored by the processors is determined dynamically at runtime, the actual work might vary significantly. Executions yielding speedups greater than p using p processors are referred to as acceleration anomalies; speedups of less than p using p processors are called deceleration anomalies. Speedup anomalies also manifest themselves in best-first search algorithms: if the heuristic function is good, the work done in parallel best-first search is typically more than that in its serial counterpart.

Speedup Anomalies in Parallel Search: the difference in the number of nodes searched by sequential and parallel formulations of DFS. In this example, parallel DFS reaches a goal node after searching fewer nodes than sequential DFS.

Speedup Anomalies in Parallel Search: a parallel DFS formulation that searches more nodes than its sequential counterpart.

This concludes the presentation. By Yazeed K. Almarshoud, CS6260.

Presentation questions: (1) What are the two methods of resolving collisions in hashing mentioned in the presentation? (2) In parallel systems, when a processor runs out of work, how does it get more work from another processor? Part one: the two methods are resolving collisions by chaining and resolving collisions by open addressing. Part two: this is done using work requests and responses in message-passing machines, and by locking and extracting work in shared-address-space machines.