CMSC 341 Skip Lists.

Slides:



Advertisements
Similar presentations
CMSC420: Skip Lists Kinga Dobolyi Based off notes by Dave Mount.
Advertisements

CS 225 Lab #11 – Skip Lists.
The Dictionary ADT: Skip List Implementation
CMSC 341 Binary Heaps Priority Queues. 8/3/2007 UMBC CSMC 341 PQueue 2 Priority Queues Priority: some property of an object that allows it to be prioritized.
Skip List & Hashing CSE, POSTECH.
1 HEAPS & PRIORITY QUEUES Array and Tree implementations.
INTRODUCTION TO AVL TREES P. 839 – 854. INTRO  Review of Binary Trees: –Binary Trees are useful for quick retrieval of items stored in the tree –order.
HASHING Section 12.7 (P ). HASHING - have already seen binary and linear search and discussed when they might be useful (based on complexity)
Searching: Binary Trees and Hash Tables CHAPTER 12 6/4/15 Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education,
INTRODUCTION TO BINARY TREES P SORTING  Review of Linear Search: –again, begin with first element and search through list until finding element,
BINARY SEARCH TREE. Binary Trees A binary tree is a tree in which no node can have more than two children. In this case we can keep direct links to the.
CSC 211 Data Structures Lecture 13
CMSC 341 Binary Heaps Priority Queues. 2 Priority: some property of an object that allows it to be prioritized WRT other objects (of the same type) Priority.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CMSC 341 Skip Lists. 8/3/2007 UMBC CMSC 341 SkipList 2 Looking Back at Sorted Lists Sorted Linked List What is the worst case performance of find( ),
© Love Ekenberg Hashing Love Ekenberg. © Love Ekenberg In General These slides provide an overview of different hashing techniques that are used to store.
Tree Data Structures. Heaps for searching Search in a heap? Search in a heap? Would have to look at root Would have to look at root If search item smaller.
CMSC 341 Binary Heaps Priority Queues. 2 Priority: some property of an object that allows it to be prioritized WRT other objects (of the same type) Priority.
Chapter 3 The easy stuff. Lists If you only need to store a few things, the simplest and easiest approach might be to put them in a list Only if you need.
CSCI  Sequence Containers – store sequences of values ◦ vector ◦ deque ◦ list  Associative Containers – use “keys” to access data rather than.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
AVL Trees. AVL Tree In computer science, an AVL tree is the first-invented self-balancing binary search tree. In an AVL tree the heights of the two child.
Data Structures and Algorithm Analysis Dr. Ken Cosh Linked Lists.
CSC317 Selection problem q p r Randomized‐Select(A,p,r,i)
Chapter 4 The easy stuff.
Top 50 Data Structures Interview Questions
Multiway Search Trees Data may not fit into main memory
Design & Analysis of Algorithm Priority Queue
COMP 53 – Week Eleven Hashtables.
Binary Search Trees A binary search tree is a binary tree
CSC212 Data Structure - Section AB
Data Structure and Algorithms
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Searching an Array: Binary Search
Hashing Exercises.
Review for Final Exam Non-cumulative, covers material since exam 2
Review for Final Exam Non-cumulative, covers material since exam 2
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Linked Lists.
Object Oriented Programming COP3330 / CGS5409
Arrays and Linked Lists
CMSC 341 Binary Search Trees 11/19/2018.
CMSC 341 Skip Lists 1.
CMSC 341 Lecture 14 Priority Queues & Heaps
B- Trees D. Frey with apologies to Tom Anastasio
CMSC 341 Binary Search Trees.
B- Trees D. Frey with apologies to Tom Anastasio
CMSC 341 Skip Lists.
CMSC 341 Skip Lists.
Trees CMSC 202, Version 5/02.
CMSC 341 Binary Search Trees 1/3/2019.
Searching CLRS, Sections 9.1 – 9.3.
B- Trees D. Frey with apologies to Tom Anastasio
Data Structures Sorted Arrays
CMSC 341 Lists 3.
CMSC 341 Lists 3.
Advanced Implementation of Tables
CMSC 341 Binary Search Trees 2/23/2019.
CMSC 341 Binary Search Trees.
Data Structures & Algorithms
Amortized Analysis and Heaps Intro
CSCI 104 Skip Lists Mark Redekopp.
Mark Redekopp David Kempe
CMSC 341 Binary Search Trees 5/8/2019.
Data Structures and Algorithm Analysis Priority Queues (Heaps)
Lists CMSC 202, Version 4/02.
CMSC 341 Binary Search Trees 5/28/2019.
Lists CMSC 202, Version 4/02.
Lecture-Hashing.
Presentation transcript:

CMSC 341 Skip Lists

Looking Back at Lists Sorted List Performance find, insert: what is worst case? If the sorted array implementation is acceptable, it has one big advantage over a sorted Linked List which is….. Can do binary search on an array, but not on LL

An Alternative What if you skip every other node? Every other node has a pointer to the next and the one after that Find : follow skip pointer until target < this.skip.element Resources Additional storage: Performance:

Skipping every 2nd Node The value stored in each node is shown below the node and corresponds to the the position of the node in the list. It’s clear that find does not need to visit every node. It can skip over every other node, then do a final visit at the end. The number of nodes visited is no more than n/2 + 1. For example the nodes visited finding for the value 15 would be 2, 4, 6, 8, 10, 12, 14, 16, 15 a total of 16/2 + 1 = 9.

Skipping every 2nd and 4th Node The find operation can now make bigger skips than the previous example. Every 4th node is skipped until the search is confined between two nodes of size 3. At this point as many as three nodes may need to be scanned. It’s also possible that some nodes may be visited more than once. The number of nodes visited is no more than n / 4 + 3. Again, look at the nodes visited when searching for 15 Searching for 15: visit 4, 8, 12, 16, 14, 16, 15 for a total of 16/4 + 3 = 7.

New, Improved Alternative Add hierarchy of skip pointers every 2i-th node points 2i nodes ahead For example, every 2nd node has a reference 2 nodes ahead; every 8th node has a reference 8 nodes ahead But what about insert? Reorganization costs O(n)

Skipping every 2i-th node Suppose this list contained 32 nodes and we want to search for some value in it. Working down from the top, we first look at node 16 and have cut the search in half. When we look again one level down in either the right or left half, we have cut the search in half again. We continue in this manner until we find the node being sought (or not). This is just like binary search in an array. Intuitively we can understand why the max number of nodes visited is O(lg N).

Some serious problems This structure looks pretty good, but what happens when we insert or remove a value from the list? Reorganizing the the list is O(N) For example, suppose the first element of the list was removed. Since it’s necessary to maintain the strict pattern of node sizes, it’s easiest to move all the values toward the head and remove the end node. A similar situation occurs when a new node is added

Skip Lists Concept: A skip list maintains the same distribution of nodes, but without the requirement for the rigid pattern of node sizes 1/2 have 1 pointer 1/4 have 2 pointers 1/8 have 3 pointers … 1/2i have i pointers It’s no longer necessary to maintain the rigid pattern by moving values around for insert and remove. This gives us a high probability of still having O(lg N) performance. The probability that a skip list will behave badly is very small. The number of forward reference pointers a node has is its “size”

A Probabilistic Skip List The distribution of node sizes is exactly the same as the previous figure, the nodes just occur in a different pattern.

Inserting a node When inserting a new node, we choose the size of the node probabilistically. Every skip list has an associated (and fixed) probability, p, that determines the distribution of node sizes. A fraction, p, of the nodes that have at least r forward references also have r + 1 forward references.

Skip List Insert To insert node: create new node with random size for each pointer, i , connect to next node with at least i pointers int generateNodeSize(double p, int maxSize) { int size = 1; while (drand48() < p) size++; return (size >maxSize) ? maxSize : size; }

An aside on Node Distribution Given an infinitely long skip list with associated probability p, it can be shown that 1 – p nodes will have just one forward reference. This means that p(1 – p) nodes will have exactly two forward references and in general pk(1 – p) nodes will have k + 1 forward reference pointers. For example, with p = 0.5 0.5 (1/2) of the nodes will have exactly one forward reference 0.5 (1 – 0.5) = 0.25 (1/4) of the nodes will have 2 references 0.52 (1 – 0.5) = 0.125 (1/8) of the nodes will have 3 references 0.53 (1 – 0.5) = 0.0625 (1/16) of the nodes will have 4 references Work out the distribution for p = 0.25 (1/4) for yourself

Determining the size of the Header Node The size of the header node (the number of forward references it has) is the maximum size of any node in the skip list and is chosen when the empty skip list is constructed (i.e. it must be predetermined) Dr. Pugh has shown that the maximum size should be chosen as log 1/p N. For p = ½, the maximum size for a skip list with 65,536 elements should be no smaller than log 2 65536 = 16.

Performance considerations The expected time to find an element (and therefore to insert or remove) is in O(lg N). It is possible for the time to be substantially longer if the configuration of nodes is unfavorable for a particular operation. Since the node sizes are chosen randomly, it is possible to get a “bad” run of sizes. For example, it is possible each node will be generated with the same size, producing the equivalent of an ordinary linked list. A “bad” run of sizes will be less important in a long skip list than in a short one. The probability of poor performance decreases rapidly as the number of nodes increases.

More performance The probability that an operation takes longer than expected is function of the associated probability p. Dr. Pugh calculated that with p = 0.5 and 4096 elements, the probability that the actual time will exceed the expected time by more than a factor of 3 is less than one in 200 million. The relative time and space performance depends on p. Dr. Pugh suggests p = 0.25 for most cases. If the variability of performance is important, then he suggests using p = 0.5 (the variability decreases with larger p). Interestingly, the average number of references per node is only 1.33 when p = 0.25 is used. A BST has 2 references per node, so a skip list is more space-efficient.

Skip List Implementation template <class Comparable> class SkipList { private: class SkipListNode { public: void setDatum(const Comparable &datum); void setForward(int I, SkipListNode *f); void setSize(int sz); SkipListNode(); SkipListNode(const Comparable& datum, int size); SkipListNode(const SkipListNode &c); ~SkipListNode(); const Comparable & getDatum() const; int getSize() const; SkipListNode *getForward(int level);

Skip List Implementation (cont) private: // SkipListNode int _size; vector <SkipListNode*> _forward; Comparable _datum; }; // SkipListNode public: SkipList() SkipList(int max_node_size, double probab); SkipList(const SkipList &); ~SkipList(); int getHighNodeSize() const; int getMaxNodeSize() const; double getProbability() const; void insert (const Comparable &item); const Comparable &find(const Comparable &item, bool &success); void remove(const Comparable &item);

Skip List Implementation (cont) private: // SkipList SkipListNode *find(const Comparable *item, SkipListNode *startnode); SkipListNode *getHeader() const; SkipListNode *findInsertPoint(const Comparable &item, int nodesize); void insert(const Comparable &item, int nodesize, bool &success); int _high_node_size; int _max_node_size; double _prob; SkipListNode *_head; };

find Comparable &find(const Comparable &x) { node = header node for(reference level of node from (nodesize-1) down to 0) while (the node referred to is less than x) node = node referred to if (node referred to has value x) return value in node referred to else return item_not_found }

findInsertPoint Ordinary list insertion: have handle to node to insert after Skip list insertion: need handle to all nodes that skip to node of given size at insertion point (all “see-able” nodes) Use backLook structure with a pointer for each level of node to be inserted

Insert 6.5 Want to insert 6.5.

In the figure, the insertion point is between nodes 6 and 7 In the figure, the insertion point is between nodes 6 and 7. “Looking” back towards the header, the nodes you can “see” at the various levels are level node seen 0 6 1 6 2 4 3 header We construct a “backLook” node that has its forward pointers set to the relevant “see-able” nodes. This is the type of node returned by the findInsertPoint method

insert method Once we have the backLook node returned by findInsertPoint and have constructed the new node to be inserted, the insertion is easy. The public insert( const Comparable& x) decides on the new nodes size by random choice, then calls the overloaded private insert( const Comparable& x, int nodeSize) to do the work. Code is available in Dr. Anastasio’s HTML version of these notes