CMSC 341 Skip Lists. 8/3/2007 UMBC CMSC 341 SkipList 2 Looking Back at Sorted Lists Sorted Linked List What is the worst case performance of find( ),

Slides:



Advertisements
Similar presentations
CS 225 Lab #11 – Skip Lists.
Advertisements

The Dictionary ADT: Skip List Implementation
MATH 224 – Discrete Mathematics
Alon Efrat Computer Science Department University of Arizona SkipList.
CMSC 341 Binary Heaps Priority Queues. 8/3/2007 UMBC CSMC 341 PQueue 2 Priority Queues Priority: some property of an object that allows it to be prioritized.
Skip List & Hashing CSE, POSTECH.
QuickSort 4 February QuickSort(S) Fast divide and conquer algorithm first discovered by C. A. R. Hoare in If the number of elements in.
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
CSE 326 Randomized Data Structures David Kaplan Dept of Computer Science & Engineering Autumn 2001.
Algorithm Efficiency and Sorting
Skip Lists1 Skip Lists William Pugh: ” Skip Lists: A Probabilistic Alternative to Balanced Trees ”, 1990  S0S0 S1S1 S2S2 S3S3 
Searching Chapter Chapter Contents The Problem Searching an Unsorted Array Iterative Sequential Search Recursive Sequential Search Efficiency of.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1 HEAPS & PRIORITY QUEUES Array and Tree implementations.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
Week 11 Introduction to Computer Science and Object-Oriented Programming COMP 111 George Basham.
© The McGraw-Hill Companies, 2006 Chapter 4 Implementing methods.
Searching: Binary Trees and Hash Tables CHAPTER 12 6/4/15 Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education,
Chapter 12 Recursion, Complexity, and Searching and Sorting
Data Structures & Algorithms CHAPTER 4 Searching Ms. Manal Al-Asmari.
INTRODUCTION TO BINARY TREES P SORTING  Review of Linear Search: –again, begin with first element and search through list until finding element,
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
CS 162 Intro to Programming II Searching 1. Data is stored in various structures – Typically it is organized on the type of data – Optimized for retrieval.
BINARY SEARCH TREE. Binary Trees A binary tree is a tree in which no node can have more than two children. In this case we can keep direct links to the.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
1 Heaps and Priority Queues Starring: Min Heap Co-Starring: Max Heap.
CMSC 341 Disjoint Sets. 8/3/2007 UMBC CMSC 341 DisjointSets 2 Disjoint Set Definition Suppose we have an application involving N distinct items. We will.
CSC 211 Data Structures Lecture 13
CMSC 341 Disjoint Sets Textbook Chapter 8. Equivalence Relations A relation R is defined on a set S if for every pair of elements (a, b) with a,b  S,
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
A first look an ADTs Solving a problem involves processing data, and an important part of the solution is the careful organization of the data In order.
1 Searching Searching in a sorted linked list takes linear time in the worst and average case. Searching in a sorted array takes logarithmic time in the.
1 Heaps and Priority Queues v2 Starring: Min Heap Co-Starring: Max Heap.
Skip Lists 二○一七年四月二十五日
Winter 2006CISC121 - Prof. McLeod1 Stuff Deadline for assn 3 extended to Monday, the 13 th. Please note that the testing class for assn 3 has changed.
Binary Trees Chapter 10. Introduction Previous chapter considered linked lists –nodes connected by two or more links We seek to organize data in a linked.
© Love Ekenberg Hashing Love Ekenberg. © Love Ekenberg In General These slides provide an overview of different hashing techniques that are used to store.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
1 C++ Classes and Data Structures Jeffrey S. Childs Chapter 15 Other Data Structures Jeffrey S. Childs Clarion University of PA © 2008, Prentice Hall.
Tree Data Structures. Heaps for searching Search in a heap? Search in a heap? Would have to look at root Would have to look at root If search item smaller.
Chapter 3 The easy stuff. Lists If you only need to store a few things, the simplest and easiest approach might be to put them in a list Only if you need.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
2005MEE Software Engineering Lecture 7 –Stacks, Queues.
CMSC 202, Version 5/02 1 Trees. CMSC 202, Version 5/02 2 Tree Basics 1.A tree is a set of nodes. 2.A tree may be empty (i.e., contain no nodes). 3.If.
Recursion ITI 1121 N. El Kadri. Reminders about recursion In your 1 st CS course (or its equivalent), you have seen how to use recursion to solve numerical.
Amortized Analysis and Heaps Intro David Kauchak cs302 Spring 2013.
Course: Programming II - Abstract Data Types HeapsSlide Number 1 The ADT Heap So far we have seen the following sorting types : 1) Linked List sort by.
Skip Lists – Why? BSTs –Worse case insertion, search O(n) –Best case insertion, search O(log n) –Where your run fits in O(n) – O(log n) depends on the.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Arrays, Link Lists, and Recursion Chapter 3. Sorting Arrays: Insertion Sort Insertion Sort: Insertion sort is an elementary sorting algorithm that sorts.
Data Structures and Algorithm Analysis Dr. Ken Cosh Linked Lists.
CMSC 202 Computer Science II for Majors. CMSC 202UMBC Topics Templates Linked Lists.
List data structure This is a new data structure. The List data structure is among the most generic of data structures. In daily life, we use shopping.
Searching an Array: Binary Search
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
Prof. Neary Adapted from slides by Dr. Katherine Gibson
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Data Structures and Analysis (COMP 410)
CMSC 341 Skip Lists 1.
CMSC 341 Lecture 14 Priority Queues & Heaps
B- Trees D. Frey with apologies to Tom Anastasio
B- Trees D. Frey with apologies to Tom Anastasio
CMSC 341 Skip Lists.
CMSC 341 Skip Lists.
Trees CMSC 202, Version 5/02.
B- Trees D. Frey with apologies to Tom Anastasio
Data Structures Sorted Arrays
Data Structures & Algorithms
CSCI 104 Skip Lists Mark Redekopp.
CMSC 341 Skip Lists.
Presentation transcript:

CMSC 341 Skip Lists

8/3/2007 UMBC CMSC 341 SkipList 2 Looking Back at Sorted Lists Sorted Linked List What is the worst case performance of find( ), insert( )? Sorted Array What is the worst case performance of find( ), insert( )?

8/3/2007 UMBC CMSC 341 SkipList 3 An Alternative Sorted Linked List What if you skip every other node?  Every other node has a pointer to the next and the one after that Find :  follow “skip” pointer until target < this.skip.element Resources  Additional storage Performance of find( )?

8/3/2007 UMBC CMSC 341 SkipList 4 Skipping Every 2 nd Node The value stored in each node is shown below the node and corresponds to the the position of the node in the list. It’s clear that find( ) does not need to examine every node. It can skip over every other node, then do a final examination at the end. The number of nodes examined is no more than  n/2  + 1. For example the nodes examined finding the value 15 would be 2, 4, 6, 8, 10, 12, 14, 16, a total of  16/2  + 1 = 9.

8/3/2007 UMBC CMSC 341 SkipList 5 Skipping Every 2 nd and 4 th Node The find operation can now make bigger skips than the previous example. Every 4 th node is skipped until the search is confined between two nodes of size 3. At this point as many as three nodes may need to be scanned. It’s also possible that some nodes may be examined more than once. The number of nodes examined is no more than  n / 4  + 3. Again, look at the nodes examined when searching for 15.

8/3/2007 UMBC CMSC 341 SkipList 6 New and Improved Alternative Add hierarchy of skip pointers  every 2 i -th node points 2 i nodes ahead  For example, every 2 nd node has a reference 2 nodes ahead; every 8 th node has a reference 8 nodes ahead

8/3/2007 UMBC CMSC 341 SkipList 7 Skipping Every 2 i -th node Suppose this list contained 32 nodes and we want to search for some value in it. Working down from the top, we first look at node 16 and have cut the search in half. When we look again one level down in either the right or left half, we have cut the search in half again. We continue in this manner until we find the node being sought (or not). This is just like binary search in an array. Intuitively we can understand why the max number of nodes examined is O(lg N).

8/3/2007 UMBC CMSC 341 SkipList 8 Some Serious Problems This structure looks pretty good, but what happens when we insert or remove a value from the list? Reorganizing the the list is O(N). For example, suppose the first element of the list was removed. Since it’s necessary to maintain the strict pattern of node sizes, it’s easiest to move all the values toward the head and remove the end node. A similar situation occurs when a new node is added.

8/3/2007 UMBC CMSC 341 SkipList 9 Skip Lists Concept: A skip list maintains the same distribution of nodes, but without the requirement for the rigid pattern of node sizes 1/2 have 1 pointer 1/4 have 2 pointers 1/8 have 3 pointers … 1/2 i have i pointers It’s no longer necessary to maintain the rigid pattern by moving values around for insert and remove. This gives us a high probability of still having O(lg N) performance. The probability that a skip list will behave badly is very small.

8/3/2007 UMBC CMSC 341 SkipList 10 A Probabilistic Skip List The number of forward reference pointers a node has is its “size”. The distribution of node sizes is exactly the same as the previous figure, the nodes just occur in a different pattern.

8/3/2007 UMBC CMSC 341 SkipList 11 Inserting a Node When inserting a new node, we choose the size of the node probabilistically. Every skip list has an associated (and fixed) probability, p, that determines the distribution of node sizes. A fraction, p, of the nodes that have at least r forward references also have r + 1 forward references.

8/3/2007 UMBC CMSC 341 SkipList 12 Skip List Insert To insert node:  Create new node with random size.  For each pointer, i, connect to next node with at least i pointers. int generateNodeSize(double p, int maxSize) { int size = 1; while (drand48() < p) size++; return (size >maxSize) ? maxSize : size; }

8/3/2007 UMBC CMSC 341 SkipList 13 An Aside on Node Distribution Given an infinitely long skip list with associated probability p, it can be shown that 1 – p nodes will have just one forward reference. This means that p(1 – p) nodes will have exactly two forward references and in general p k (1 – p) nodes will have k + 1 forward reference pointers. For example, with p = (1/2 of the nodes will have exactly one forward reference) 0.5 (1 – 0.5) = 0.25 (1/4 of the nodes will have 2 references) (1 – 0.5) = (1/8 of the nodes will have 3 references) (1 – 0.5) = (1/16 of the nodes will have 4 references) Work out the distribution for p = 0.25 (1/4) for yourself.

8/3/2007 UMBC CMSC 341 SkipList 14 Determining the Size of the Header Node The size of the header node (the number of forward references it has) is the maximum size of any node in the skip list and is chosen when the empty skip list is constructed (i.e. it must be predetermined) Dr. Pugh has shown that the maximum size should be chosen as log 1/p N. For p = ½, the maximum size for a skip list with 65,536 elements should be no smaller than log = 16.

8/3/2007 UMBC CMSC 341 SkipList 15 Performance Considerations The expected time to find an element (and therefore to insert or remove) is O( lg N ). It is possible for the time to be substantially longer if the configuration of nodes is unfavorable for a particular operation. Since the node sizes are chosen randomly, it is possible to get a “bad” run of sizes. For example, it is possible that each node will be generated with the same size, producing the equivalent of an ordinary linked list. A “bad” run of sizes will be less important in a long skip list than in a short one. The probability of poor performance decreases rapidly as the number of nodes increases.

8/3/2007 UMBC CMSC 341 SkipList 16 More performance The probability that an operation takes longer than expected is function of the associated probability p. Dr. Pugh calculated that with p = 0.5 and 4096 elements, the probability that the actual time will exceed the expected time by more than a factor of 3 is less than one in 200 million. The relative time and space performance depends on p. Dr. Pugh suggests p = 0.25 for most cases. If the predictability of performance is important, then he suggests using p = 0.5 (the variability of the performance decreases with larger p). Interestingly, the average number of references per node is only 1.33 when p = 0.25 is used. A BST has 2 references per node, so a skip list is more space-efficient.

8/3/2007 UMBC CMSC 341 SkipList 17 Skip List Implementation public class SkipList >{ private static class SkipListNode { void setDatum(AnyType datum){ } void setForward(int i, SkipListNode f){ } void setSize(int size){ } SkipListNode(){ } SkipListNode(AnyType datum, int size){ } SkipListNode(SkipListNode c){ } AnyType getDatum(){ } int getSize(){ } SkipListNode getForward(int level){ } private int m_size; private Vector m_forward; private Vector m_datum; }

8/3/2007 UMBC CMSC 341 SkipList 18 Skip List Implementation (cont.) SkipList(){} SkipList(int max_node_size, double probab){} SkipList(SkipList ref) {} int getHighNodeSize(){} int getMaxNodeSize(){} double getProbability(){} void insert( AnyType item){} boolean find( AnyType item){} void remove( AnyType item){} private SkipListNode find(AnyType item, SkipListNode start){} private SkipListNode getHeader(){} private SkipListNode findInsertPoint( AnyType item, int nodesize){} private boolean insert( AnyType item, int nodesize){} private int m_high_node_size; private int m_max_node_size; private double m_prob; SkipListNode m_head; }

8/3/2007 UMBC CMSC 341 SkipList 19 find boolean find(Comparable x) { node = header node for(reference level of node from (nodesize-1) down to 0) while (the node referred to is less than x) node = node referred to if (node referred to has value x) return true else return false }

8/3/2007 UMBC CMSC 341 SkipList 20 findInsertPoint Ordinary list insertion: Have handle (iterator) to node to insert in front of Skip list insertion: Need handle to all nodes that skip to node of given size at insertion point (all “see-able” nodes). Use backLook structure with a pointer for each level of node to be inserted

8/3/2007 UMBC CMSC 341 SkipList 21 Insert 6.5

8/3/2007 UMBC CMSC 341 SkipList 22 In the figure, the insertion point is between nodes 6 and 7. “Looking” back towards the header, the nodes you can “see” at the various levels are level node seen header We construct a “backLook” node that has its forward pointers set to the relevant “see-able” nodes. This is the type of node returned by the findInsertPoint method

8/3/2007 UMBC CMSC 341 SkipList 23 insert Method Once we have the backLook node returned by findInsertPoint and have constructed the new node to be inserted, the insertion is easy. The public insert( AnyType x) decides on the new nodes size by random choice, then calls the overloaded private insert( AnyType x, int nodeSize) to do the work. Code in C is available in Dr. Anastasio’s HTML version of these notes.