Selection in heaps and row-sorted matrices

Presentation transcript:

Selection in heaps and row-sorted matrices using soft heaps
Haim Kaplan, László Kozma, Or Zamir, Uri Zwick
(武熠) Institute for Interdisciplinary Information Sciences, Tsinghua University
May 15, 2018

Generalized selection
Given 𝑛 items from a totally ordered domain, with a partial order on them already known, find the 𝑘-th smallest item, or the set of 𝑘 smallest items.
All algorithms in this talk are comparison-based.
[Figure: a partial order drawn with smaller items at one end and larger items at the other.]

Generalized selection
The generalized sorting problem is well understood: the information-theoretic lower bound is essentially tight.
The information-theoretic lower bound for generalized selection may be extremely loose (e.g., for finding the minimum).
A general answer for generalized selection is not known.
Some interesting special cases have been studied.

Some interesting selection problems
Collection of sorted lists
Doubly sorted matrix
Binary min-heap
“Set of pairwise sums” $X+Y=\{x+y \mid x\in X,\ y\in Y\}$ ($X$, $Y$ are not sorted)
Studied by [Frederickson-Johnson (1982)] and [Frederickson (1993)].
We give simpler and somewhat improved algorithms.

Selecting the 𝑘-th smallest item in a heap
[Figure: a binary min-heap whose nodes are marked "already extracted", "currently in the heap", or "not seen yet".]
Trivial algorithm: 𝑂(𝑘 log 𝑘)
Insert the root into an auxiliary priority queue 𝑄.
Repeat 𝑘 times:
    Extract the minimum item from 𝑄.
    Insert its two children into 𝑄.
Returns the 𝑘 smallest items in sorted order.
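The trivial algorithm above can be sketched in Python as follows. This is a minimal illustration, assuming the input heap is stored in the usual implicit array layout (children of position i at 2i+1 and 2i+2); the function name `k_smallest_in_heap` is hypothetical, and the standard `heapq` module plays the role of the auxiliary queue 𝑄.

```python
import heapq

def k_smallest_in_heap(heap, k):
    """Return the k smallest items of an array-stored binary min-heap, in sorted order.

    Assumption: heap[i] <= heap[2*i+1] and heap[i] <= heap[2*i+2] wherever defined.
    Runs in O(k log k) time, because the auxiliary queue never holds more than k+1 items.
    """
    if k <= 0 or not heap:
        return []
    out = []
    queue = [(heap[0], 0)]            # auxiliary priority queue Q of (key, position)
    while queue and len(out) < k:
        key, i = heapq.heappop(queue)  # extract the minimum item from Q
        out.append(key)
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):      # insert its (existing) children into Q
                heapq.heappush(queue, (heap[child], child))
    return out
```

Only items whose parent has already been extracted ever enter the queue, which is why the input heap itself is never modified.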

Selecting the 𝑘-th smallest item in a heap
[Figure: a binary min-heap with 15 items.]
Non-trivial algorithm: 𝑂(𝑘) [Frederickson (1993)]
$k \log k \;\to\; k \log\log k \;\to\; k \log\log\log k \;\to\; k\,3^{\log^* k} \;\to\; k\,2^{\log^* k} \;\to\; k$
Matching the information-theoretic lower bound.
We get an 𝑂(𝑘)-time algorithm by essentially running the trivial algorithm using a soft heap.

Heaps vs. Soft Heaps [Chazelle ’00] [Kaplan-Tarjan-Zwick ’13]
Operation      Fibonacci heaps (Hollow heaps)    Soft heaps
Make-heap      𝑂(1)                              𝑂(1)
Insert         𝑂(1)                              𝑂(1)
Extract-min    𝑂(log 𝑛)                          𝑂(log 1/𝜀)
Meld           𝑂(1)                              𝑂(1)
All bounds are amortized.
Soft heaps may increase the keys of items. Items with increased keys are corrupt.
At most 𝜀𝐼 items in the heaps are corrupt, where 𝐼 is the total number of insertions.

Previous applications of Soft Heaps
A deterministic 𝑂(𝑚𝛼(𝑚,𝑛))-time algorithm for finding minimum spanning trees [Chazelle ’00]
An optimal deterministic algorithm for finding minimum spanning trees (with a yet unknown running time) [Pettie-Ramachandran ’02]
New selection and approximate sorting algorithms [Chazelle ’00]

Deletions from a binary heap
[Figure: step-by-step animation of successive deletions from a 15-item binary heap.]

Binary heaps with lists [Kaplan-Tarjan-Zwick ’13]
[Figure: a binary tree in which each node holds a list of items, shown with their original keys, and carries the corrupt key of all items in its list.]
Each node has a list of items.
The tree is heap-ordered with respect to corrupt keys.
(Most) items in lists of length > 1 are corrupt.

Binary heaps with lists [Kaplan-Tarjan-Zwick ’13]
[Figure: the same tree, heap-ordered with respect to corrupt keys.]
Deleting an item of smallest corrupt key is easy, until the list at the root becomes empty…

Double even refill [Kaplan-Tarjan-Zwick ’13]
When a node of even rank $k \ge t$ is empty, fill it twice, concatenating the two lists of items.
Move the list of the smaller child to the root.
Recursively fill the child.
If $k \ge t$ is even, do it again!
$t = \log(3/\varepsilon)$
Moving lists takes constant time.
[Figure: three animation frames showing the root of rank 𝑘 being refilled from its smaller child, twice.]

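Controlling corruption
The size of a node of rank $k$ is at most $s_k = 2^{(k-t)/2}$ if $k > t$, and $1$ otherwise.
Nodes of rank $< t$ hold only uncorrupt items; corrupt items appear only in nodes of rank $\ge t$.
Claim: The number of nodes of rank $k$ is at most $n/2^k$.
Number of corrupt items: $n \sum_{k \ge t} \frac{s_k}{2^k} = O\!\left(\frac{n}{2^t}\right) = O(\varepsilon n)$, where $t = \log(3/\varepsilon)$.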
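The sum in the corruption bound can be worked out explicitly. The derivation below is a sketch that assumes $s_k = 2^{(k-t)/2}$ exactly as transcribed on the slide (any rounding of the exponent is ignored, which only changes the constant), and substitutes $k = t + j$.

```latex
\[
n \sum_{k \ge t} \frac{s_k}{2^k}
  \;=\; \frac{n}{2^t} \sum_{j \ge 0} 2^{-j/2}
  \;=\; \frac{n}{2^t} \cdot \frac{1}{1 - 2^{-1/2}}
  \;=\; \frac{(2+\sqrt{2})\,n}{2^t}
  \;=\; O\!\left(\frac{n}{2^t}\right).
\]
\[
t = \log\frac{3}{\varepsilon}
  \;\Longrightarrow\; 2^t = \frac{3}{\varepsilon}
  \;\Longrightarrow\; \frac{n}{2^t} = \frac{\varepsilon n}{3} = O(\varepsilon n).
\]
```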

Soft Heaps – assumptions
Insert and Meld do not corrupt items.
All corruptions are caused by Extract-min, after extraction.
Extract-min returns a list of newly corrupted items: (𝑒, 𝐶) ← Extract-min(𝑄)
𝑒: the item with the smallest (corrupt) key, extracted from the heap.
𝐶: the list of newly corrupt items, corrupted after extracting 𝑒. (They remain in the heap.)
𝑒.𝑐𝑜𝑟𝑟𝑢𝑝𝑡: is 𝑒 corrupt?

Selection from a heap using a soft heap
Run the naïve algorithm using a soft heap. When an item is extracted, insert its children, and the children of all items corrupted by the extraction, into the soft heap.
𝑄 ← Soft-Heap(𝜀)
Insert(𝑄, 𝑟)
for 𝑖 ← 1 to 𝑘−1:
    (𝑒, 𝐶) ← Extract-min(𝑄)
    if not 𝑒.𝑐𝑜𝑟𝑟𝑢𝑝𝑡: 𝐶 ← 𝐶 ∪ {𝑒}
    for 𝑒 ∈ 𝐶:
        Insert(𝑄, 𝑒.𝑙𝑒𝑓𝑡)
        Insert(𝑄, 𝑒.𝑟𝑖𝑔ℎ𝑡)
Claim 1: After 𝑘−1 iterations, the 𝑘 smallest items were inserted into 𝑄.
Claim 2: The total number of items inserted into 𝑄 is 𝑂(𝑘).
Finish by finding the 𝑘-th smallest among the inserted items, using a standard selection algorithm.
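The loop above can be exercised in Python against a stand-in for the soft heap. The sketch below is illustrative only: `ToySoftHeap` and `select_kth` are hypothetical names, and the toy structure is not a real soft heap. It merely honours the assumed interface (corruption happens only after an extraction, newly corrupted items are reported, and at most 𝜀𝐼 items in the heap are corrupt) by corrupting random items, and it takes linear time per extraction. The final step uses `sorted` for brevity where a linear-time selection algorithm would be used.

```python
import random

class ToySoftHeap:
    """Toy stand-in for a soft heap (hypothetical; not a real soft heap)."""

    def __init__(self, eps, seed=0):
        self.eps = eps
        self.inserts = 0               # I: total number of insertions so far
        self.items = []                # entries: [current_key, position, corrupt]
        self.rng = random.Random(seed)

    def insert(self, key, position):
        self.items.append([key, position, False])
        self.inserts += 1

    def extract_min(self):
        # Extract the entry with the smallest current (possibly corrupt) key.
        pos = min(range(len(self.items)), key=lambda j: self.items[j][0])
        e = self.items.pop(pos)
        # Only now, after the extraction, corrupt random clean entries,
        # keeping at most eps * I corrupt entries in the heap.
        clean = [x for x in self.items if not x[2]]
        corrupt_now = len(self.items) - len(clean)
        self.rng.shuffle(clean)
        newly = []
        for x in clean:
            if corrupt_now + 1 > self.eps * self.inserts:
                break
            x[0], x[2] = float("inf"), True   # keys are only ever increased
            corrupt_now += 1
            newly.append(x)
        return e, newly


def select_kth(heap, k, eps=0.25, seed=0):
    """Return (k-th smallest item, number of insertions) for an array-stored min-heap."""
    if not 1 <= k <= len(heap):
        raise ValueError("k must satisfy 1 <= k <= len(heap)")
    q = ToySoftHeap(eps, seed)
    q.insert(heap[0], 0)
    inserted = [heap[0]]
    for _step in range(k - 1):
        e, newly = q.extract_min()
        if not e[2]:                   # children of a corrupt e were inserted earlier
            newly.append(e)
        for entry in newly:
            i = entry[1]
            for child in (2 * i + 1, 2 * i + 2):
                if child < len(heap):
                    q.insert(heap[child], child)
                    inserted.append(heap[child])
    # Stand-in for a linear-time selection among the O(k) inserted items.
    return sorted(inserted)[k - 1], len(inserted)
```

With 𝜀 = 1/4 the insertion count returned by this sketch stays below 8𝑘, matching the bound of Claim 2, even though the toy structure chooses which items to corrupt arbitrarily.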

Selection from a heap using a soft heap: Proof of Claim 1
Claim 1: After 𝑘−1 iterations, the 𝑘 smallest items were inserted into 𝑄.
After each iteration, 𝑄 contains a barrier of uncorrupt items, and possibly some corrupt items above them. All other items above the barrier were already extracted from 𝑄.
[Figure: the heap with the barrier marked and the corrupt items above it.]
The item extracted at each iteration is smaller than or equal to the smallest item on the barrier. (We rely on the assumption that corruption occurs only after extractions.)
After 𝑖 iterations, the rank of the smallest item on the barrier is at least 𝑖+1.
After 𝑘−1 iterations, the rank of the smallest item on the barrier is at least 𝑘.
After 𝑘−1 iterations, the 𝑘 smallest items must be on or above the barrier, i.e., they were inserted into 𝑄, as claimed.

Selection from a binary heap: Proof of Claim 2
Claim 2: The total number of items inserted into 𝑄 is 𝑂(𝑘).
𝐼: number of insertions. 𝐶: number of corrupt items.
$I < 2k + 2C$ and $C < k + \varepsilon I$, hence $C < \frac{1+2\varepsilon}{1-2\varepsilon}\,k$ and $I < 2\left(1 + \frac{1+2\varepsilon}{1-2\varepsilon}\right)k$.
It is thus enough to take $\varepsilon < \frac{1}{2}$, e.g., $\varepsilon = \frac{1}{4}$.
Each soft heap operation takes 𝑂(1) time.
Total running time (and number of comparisons) is 𝑂(𝑘). Simple!
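The two inequalities of Claim 2 combine as follows. This is a worked version of the substitution, using only the symbols on the slide; the numeric instance for $\varepsilon = \frac{1}{4}$ is added for illustration.

```latex
\begin{align*}
C &< k + \varepsilon I < k + \varepsilon\,(2k + 2C)
   \;\Longrightarrow\; (1 - 2\varepsilon)\,C < (1 + 2\varepsilon)\,k
   \;\Longrightarrow\; C < \frac{1+2\varepsilon}{1-2\varepsilon}\,k
   \qquad (\varepsilon < \tfrac{1}{2}), \\
I &< 2k + 2C < 2\left(1 + \frac{1+2\varepsilon}{1-2\varepsilon}\right) k, \\
\varepsilon = \tfrac{1}{4}: \quad
C &< \frac{3/2}{1/2}\,k = 3k, \qquad I < 2\,(1 + 3)\,k = 8k .
\end{align*}
```

The division by $1 - 2\varepsilon$ is where the requirement $\varepsilon < \frac{1}{2}$ comes from.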

Row-sorted matrices
Select the 𝑘-th smallest item from a collection of 𝑚 sorted lists.
Known bounds: $O(m+k)$ and $O\!\left(m \log\frac{k}{m}\right)$ [Frederickson-Johnson (1982)]
We immediately get $O(m+k)$ using soft heaps.
We also get a simple $O\!\left(m \log\frac{k}{m}\right)$ algorithm.
New “output-sensitive” result: $O\!\left(m + \sum_{i=1}^{m} \log(k_i + 1)\right)$, where $k_i$ is the number of items in the 𝑖-th row that are among the 𝑘 smallest.
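For orientation, the problem statement can be made concrete with a baseline that uses an exact priority queue instead of a soft heap. This is a sketch under stated assumptions, not the algorithm of the talk: it treats each sorted row as a chain whose next item is the "child" of the current one, runs in O(𝑚 + 𝑘 log 𝑚) time rather than 𝑂(𝑚+𝑘), and the function name is hypothetical.

```python
import heapq

def kth_smallest_in_sorted_rows(rows, k):
    """Return the k-th smallest item (1-based) among m sorted lists.

    Baseline with an exact heap from the standard library; the queue holds
    at most one item per row, namely the smallest item of that row not yet extracted.
    """
    total = sum(len(row) for row in rows)
    if not 1 <= k <= total:
        raise ValueError("k must satisfy 1 <= k <= total number of items")
    # Start with the first item of every non-empty row: O(m) via heapify.
    queue = [(row[0], r, 0) for r, row in enumerate(rows) if row]
    heapq.heapify(queue)
    for _step in range(k - 1):
        _key, r, c = heapq.heappop(queue)
        if c + 1 < len(rows[r]):       # the successor in the same row enters the queue
            heapq.heappush(queue, (rows[r][c + 1], r, c + 1))
    return queue[0][0]
```

Replacing the exact queue by a soft heap, and inserting the successors of corrupted items as in the heap-selection loop, is one natural reading of how the 𝑂(𝑚+𝑘) bound is obtained.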

Row-sorted matrices - $O\!\left(m \log\frac{k}{m}\right)$
Split each row into blocks of size $\frac{k}{2m}$.
Select the smallest $2m$ block leaders (using the $O(m+k)$ algorithm).
The 𝑘 smallest items must reside in at least $2m$ blocks. Thus, the selected $2m$ leaders must be among the 𝑘 smallest.
At least 𝑚 blocks, i.e., the non-last blocks in each row, are fully contained in the set of 𝑘 smallest, and can be eliminated.
In 𝑂(𝑚) time, 𝑘 was reduced to about $k/2$.
After $\log\frac{k}{m}$ iterations, 𝑘 is down to 𝑚, and we use the previous algorithm.

Row-sorted matrices - $O\!\left(m + \sum_{i=1}^{m} \log n_i\right)$
Let $n_i$ be the length of the 𝑖-th list.
Long rows: $n_i \ge k/2m$. Let $m'$ be the number of long rows.
The short rows contain at most $k/2$ items and are put “on hold”.
The long rows contain at least $k/2$ items of the solution.
Split the long rows into blocks of size $k/(4m')$.
The $k/2$ smallest items in long rows reside in at least $2m'$ blocks.
Select the $2m'$ smallest leaders in the long rows.
At least $m'$ non-last blocks, containing at least $k/4$ items, are eliminated.

Row-sorted matrices - $O\!\left(m + \sum_{i=1}^{m} \log n_i\right)$
Let $n_i$ be the length of the 𝑖-th list.
The cost of an iteration is $O(m')$, proportional to the number of participating rows.
Each iteration reduces 𝑘 to at most $3k/4$.
The threshold ($= k/2m$) decreases exponentially to a constant.
Row 𝑖 participates in at most the last $O(\log n_i)$ iterations.
Total running time is $O\!\left(m + \sum_{i=1}^{m} \log n_i\right)$.

Row-sorted matrices - $O\!\left(m + \sum_{i=1}^{m} \log(k_i + 1)\right)$
Select the 𝑘 smallest items from a collection of 𝑚 sorted lists.
Let $k_i$ be the number of items selected from the 𝑖-th row.
Let $L = \sum_{i=1}^{m} \log(k_i + 1)$.
Split each row into blocks of sizes 1, 2, 4, …
Note that 𝐿 is exactly the number of blocks that cover the 𝑘 smallest items.
If we know 𝐿, or a tight upper bound ℓ on it, we can select the ℓ smallest leaders and use the $O\!\left(m + \sum_{i=1}^{m} \log n_i\right)$ algorithm.
If 𝐿 is not known, try ℓ = 𝑚, 2𝑚, 4𝑚, …
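The remark that 𝐿 counts the covering blocks can be checked with a few lines of Python. The sketch assumes the logarithm is base 2 and rounded up, so that a prefix of $k_i$ items needs $\lceil \log_2(k_i + 1) \rceil$ blocks of sizes 1, 2, 4, …; the function names are hypothetical.

```python
def blocks_covering(prefix_len):
    """Number of leading blocks of sizes 1, 2, 4, ... needed to cover prefix_len items."""
    blocks, covered, size = 0, 0, 1
    while covered < prefix_len:
        covered += size                # the next block extends the covered prefix
        size *= 2                      # block sizes double along the row
        blocks += 1
    return blocks


def total_blocks(selected_per_row):
    """L for given k_1, ..., k_m: sum of covering blocks over all rows."""
    return sum(blocks_covering(k_i) for k_i in selected_per_row)
```

For non-negative integers, `blocks_covering(k_i)` coincides with `k_i.bit_length()`, which is $\lceil \log_2(k_i + 1) \rceil$; a row contributing no items to the answer contributes no blocks.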

Concluding remarks
Results for general partial orders?
More applications of Soft Heaps?
Thank you for your attention!