Lower and Upper Bounds on Obtaining History Independence

Presentation transcript:

Lower and Upper Bounds on Obtaining History Independence Niv Buchbinder and Erez Petrank Technion, Israel

What Is a History-Independent Data Structure? Sometimes data structures keep unnecessary information: information that is not accessible via the legitimate interface of the data structure, but can be restored from the data-structure layout. This is a privacy issue if an adversary gains control over the data-structure layout. The core problem: the history of operations applied to the data structure may be revealed.

Example. A data structure with three operations: Insert(D, x), Remove(D, x), Print(D). Used for a wedding invitee list. Naive implementation: an array. Insert adds the new entry last. Remove entry i moves entries i+1 to n back by one position (a wiser implementation: a linked list on an array). The layout implies the order; for example, who was invited last!

Weak History Independence [Naor-Teague]: A data structure implementation is (weakly) history independent if any two sequences of operations S1 and S2 that yield the same content induce the same distribution on the memory layout. Security: nothing is gained from the layout beyond the content.

Example – cont. Making the previous data structure weakly history independent.
Insert(x) (say, n elements in the data structure): choose r ∈ {1, 2, …, n+1} uniformly at random; set A[n+1] ← A[r]; A[r] ← x.
Remove entry i: A[i] ← A[n].
The array is a uniformly chosen permutation of the elements.
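The insert and remove rules on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class name `ShuffledArray`, the 0-based indexing, and the invitee names are assumptions made for the example.

```python
import random


class ShuffledArray:
    """Array kept as a uniformly random permutation of its content (0-indexed)."""

    def __init__(self, rng=None):
        self.a = []
        self.rng = rng or random.Random()

    def insert(self, x):
        n = len(self.a)
        r = self.rng.randrange(n + 1)   # uniform over the n+1 slots
        self.a.append(x)                # open the new last slot (holds x for now)
        if r != n:
            self.a[n] = self.a[r]       # A[n+1] <- A[r]
            self.a[r] = x               # A[r] <- x

    def remove_at(self, i):
        self.a[i] = self.a[-1]          # A[i] <- A[n]
        self.a.pop()                    # drop the now-duplicated last slot


# Hypothetical usage: the layout no longer says who was invited last.
invitees = ShuffledArray(random.Random(0))
for name in ["Ann", "Bob", "Cy", "Dee"]:
    invitees.insert(name)
invitees.remove_at(invitees.a.index("Bob"))
```

Only the content is fixed by the sequence of operations; the position of each name depends on the random choices.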

Weak History Independence: Problems. No information leaks if the adversary gets the layout once (e.g., the laptop was stolen). But what if the adversary may get the layout several times? Information on content modifications leaks. We want: no more information leakage.

Strong History Independence [Naor-Teague]: A data structure implementation is (strongly) history independent if, for any pair of sequences S1, S2 and any two lists of stop points in S1 and S2: if the content is the same at each pair of corresponding stop points, then the joint distribution of memory layouts at the stop points is identical in the two sequences. Security: we cannot distinguish between any two such sequences.

Strong History Independence.
S1 = ins(1), ins(2), ins(3), ins(4)
S2 = ins(2), ins(1), ins(5), ins(4), ins(3), del(5)
[Figure: a first and a second stop point marked on each sequence.]
We should not be able to tell from the layouts which of the two sequences happened.

Example – cont. Recall the example.
Insert(x) (say, n elements in the data structure): choose r ∈ {1, 2, …, n+1} uniformly at random; set A[n+1] ← A[r]; A[r] ← x.
Remove entry i: A[i] ← A[n].
Is this implementation strongly history independent? No!

Example – cont. Assume you get the layout of the array twice. First time you see: 1 2 3 4 5. Second time you see: 5 2 3 4 1. What could not have happened in between: the empty sequence; Remove(4), Insert(4). Lots of other constraints…
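The claim about Remove(4), Insert(4) can be checked by enumerating every layout that pair of operations can produce from 1 2 3 4 5. This is a small sketch; the helper names `after_remove` and `after_insert` are made up for the illustration.

```python
def after_remove(layout, value):
    """Remove the entry holding `value`: A[i] <- A[n], then shrink by one."""
    a = list(layout)
    i = a.index(value)
    a[i] = a[-1]
    a.pop()
    return a


def after_insert(layout, x):
    """All layouts Insert(x) can produce, one per choice of r."""
    results = []
    for r in range(len(layout) + 1):
        a = list(layout) + [x]   # open the new last slot
        a[-1] = a[r]             # A[n+1] <- A[r]
        a[r] = x                 # A[r] <- x
        results.append(tuple(a))
    return results


first = (1, 2, 3, 4, 5)
second = (5, 2, 3, 4, 1)
reachable = set(after_insert(after_remove(first, 4), 4))
print(second in reachable)   # False: this pair of operations cannot explain `second`
```

An adversary who sees both layouts therefore learns something about the operations performed between the two observations.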

Example – last. Making the data structure strongly history independent: we can keep the array aligned left and sorted. Each content has only one possible layout. Problem: the time complexity of Insert and Remove is Ω(n) (“usually” Ω(n) elements are shifted during an insert or delete).
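A sorted, left-aligned array is easy to sketch with Python's standard `bisect` module. The class name `CanonicalArray` is an assumption for the example; the two operation sequences are the S1 and S2 from the strong history independence slide.

```python
import bisect


class CanonicalArray:
    """Sorted, left-aligned array: each content has exactly one layout."""

    def __init__(self):
        self.a = []

    def insert(self, x):
        bisect.insort(self.a, x)        # shifts up to n elements: O(n) time

    def remove(self, x):
        i = bisect.bisect_left(self.a, x)
        if i < len(self.a) and self.a[i] == x:
            del self.a[i]               # shifts up to n elements: O(n) time


s1 = CanonicalArray()
for k in (1, 2, 3, 4):
    s1.insert(k)

s2 = CanonicalArray()
for k in (2, 1, 5, 4, 3):
    s2.insert(k)
s2.remove(5)

print(s1.a == s2.a)   # True: same content, same layout
```

The binary search finds the position quickly, but the shifting of elements is what costs Ω(n).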

A Short History of History Independence. [Micciancio97] Weakly history-independent 2-3 tree (motivated by the problem of private incremental cryptography [BGG95]). [Naor-Teague01] History-independent hash table and union-find; weakly history-independent memory allocation. All of the above results are efficient. [HHMPR02] Strong history independence means canonical layout; a relaxation of strong history independence; history-independent memory resize.

Our Results. Strong history independence implies canonical memory layout. Separations between strong and weak (lower bounds): strong requires a much higher efficiency penalty in the comparison-based model. (Almost) the same lower bounds are proved for a relaxed version of strong history independence. Implementations (upper bounds): the heap has a weakly history-independent implementation with no time-complexity penalty.

Bounds Summary
Operation | Weak History Independence | Strong History Independence
heap: insert | O(log n) | Ω(n)
heap: increase-key | O(log n) | Ω(n)
heap: extract-max | O(log n) | No lower bound
heap: build-heap | O(n) | Ω(n log n)
queue: max{insert-first, remove-last} | O(1) | Ω(n)

Why Is a Comparison-Based Implementation Important? It is “natural”: standard implementations of most data-structure operations are like that. Therefore, we should know not to design this way when seeking strong history independence. Library functions are easy to use: one only implements the comparison operation on data-structure elements.

What’s Next Strong History Independence means Canonical Representation. Lower Bounds on strong history independence. Lower Bounds on relaxed strong history independence. Obtaining a weak history independent heap.

Strong History Independence = Canonical Representation. Definition [content graph]: the content graph of a data structure has as vertices the possible contents, and an edge C1 → C2 if ∃ an operation OP and parameters σ such that OP(C1, σ) = C2. Definition [well behaved]: an abstract data structure is well behaved if its content graph is strongly connected.

Strong History Independence = Canonical Representation. Lemma: For any strongly history-independent implementation of a well-behaved data structure: ∀ layout L, ∀ operation Op, Op(L) yields only one possible layout. Corollary: Any strongly history-independent implementation of a well-behaved data structure is canonical.

Canonical Representation: Proof cont. Corollary: Any strongly history-independent implementation of a well-behaved data structure is canonical. Proof sketch (assuming the lemma): Let S be a sequence of operations yielding content C. Each operation in S generates one layout. ⇒ By induction, S yields one possible layout. By strong history independence, any other sequence yielding C creates the same layout.

Canonical Representation: Proof of Lemma. Lemma: For any strongly history-independent implementation of a well-behaved data structure: ∀ layout L, ∀ operation Op, Op(L) yields only one possible layout. Since the data structure is well behaved, any operation Op has a sequence Op⁻¹ that “reverses” Op. Since the implementation is strongly history independent, we may choose any two sequences and any stop points in them.

Canonical Representation: Proof of Lemma. Proof sketch: Fix any layout L and any operation Op. We need to show that Op(L) yields a single specific layout L’. Let S be any sequence of operations yielding L with probability > 0. Consider the following sequences and ‘stop’ points: S1 = S, with both stop points at its end; S2 = S ◦ Op ◦ Op⁻¹, with the first stop point after S and the second at its end. The two stop points are the same in S1, so the same layout appears at both. The same layout must also appear at both stop points in S2.

Canonical Representation: Proof of Lemma. S1 = S, S2 = S ◦ Op ◦ Op⁻¹. Suppose L appears after S. L must appear again at the end of S2; otherwise, we could distinguish between the two sequences. ⇒ For any Li = Op(L), Op⁻¹ must transform Li to L with probability 1. [Figure: Op maps L to the possible layouts L1, L2, …, Lk; Op⁻¹ maps each of them back to L.]

Canonical Representation: Proof. Now let’s extend the sequences and modify the stop points: S3 = S ◦ Op, with both stop points at its end; S4 = S ◦ Op ◦ Op⁻¹ ◦ Op, with the first stop point after S ◦ Op and the second at its end. Suppose some Li = Op(L) appears after S ◦ Op. Li must also appear at the end of S4; otherwise, we could distinguish between the two sequences. After Op⁻¹ the layout is again L. The operation of Op depends only on L, so Op cannot “know” which Li to create. ⇒ There is only one Li = Op(L). [Figure: Op maps L to the possible layouts L1, L2, …, Lk; Op⁻¹ maps each of them back to L.]

Lower Bounds: an example. Lemma: Let D be a data structure whose content is the set of keys stored inside it, and let I be an implementation of D that is comparison-based and canonical. Then the operation Insert(D, x) requires time Ω(n). This lemma applies, for example, to heaps, dictionaries, and search trees.

Lower Bounds – cont. Proof sketch: Comparison-based: keys are treated as ‘black boxes’ according to the comparison order. ⇒ The algorithm treats any n keys only according to their total order. ⇒ The canonical layout of any n different keys is the same no matter what their real values are. d1, d2, …, dn: the memory addresses of n keys in the layout, according to their total order. d’1, d’2, …, d’n+1: the memory addresses of n+1 keys in the layout, according to their total order.

Lower Bounds – cont. Δ: the number of indices i for which di ≠ d’i. Consider the content C = {k2, k3, …, kn+1} with k2 < k3 < … < kn+1.
Case 1, Δ > n/2: consider insert(C, kn+2). It puts kn+2 in address d’n+1 and moves each ki (2 ≤ i ≤ n+1) from di-1 to d’i-1. ⇒ The operation moves at least n/2 keys.
Case 2, Δ ≤ n/2: consider insert(C, k1). It puts k1 in d’1 and moves each ki (2 ≤ i ≤ n+1) from di-1 to d’i. Whenever di-1 = d’i-1, the addresses di-1 and d’i differ, because d’i-1 and d’i are distinct. ⇒ The operation moves at least n/2 keys.

More Lower Bounds By similar methods we can show: Remove-key requires time Ω(n). For a Heap: Increase-key requires time Ω(n). Build-Heap Operation requires time Ω(n log n). For a queue: either Insert-first or Remove-Last requires time Ω(n).

Relaxed strong history independence. Strong history independence implies very strong lower bounds. How can we relax the definition to allow more efficient data structures? One possible way [HHMPR02]: allow the adversary to distinguish between the empty sequence and other sequences. Does this definition imply canonical memory layout?

Relaxed strong history independence (cont.). The relaxed definition does not imply canonical memory layout. A possible implementation of the previous data structure: in each operation, choose a new, independent, uniformly chosen permutation of the elements. Not canonical, yet relaxed strongly history independent. Each operation takes O(n).

Relaxed strong history independence. Is this relaxation enough (for efficient implementations)? No. We may prove almost the same lower bounds using a different property of these data structures.

The Binary Heap. Binary heap: a simple implementation of a priority queue. The keys are stored in an almost full binary tree. Heap property: for each node i, V(parent(i)) ≥ V(i). Assume that all values in the heap are unique. [Figure: example heap on the keys 1 to 10.]

The Binary Heap: Heapify. Heapify is used to preserve the heap property. Input: a root and two proper sub-heaps of height ≤ h-1. Output: a proper heap of height h. The node always sifts down in the direction of the larger value. [Figure: example tree on the keys 1 to 10 with the key 2 at the root.]
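Heapify on an array-stored max-heap might be sketched as below. This assumes the usual 0-indexed array layout, with the children of position i at 2i+1 and 2i+2; the example keys are chosen for illustration and are not read off the slide's figure.

```python
def heapify(a, i):
    """Sift a[i] down, assuming both subtrees of i are proper max-heaps."""
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]   # move toward the larger child
        i = largest


tree = [2, 10, 9, 8, 6, 7, 3, 1, 5, 4]   # both subtrees of the root are heaps
heapify(tree, 0)
print(tree)   # [10, 8, 9, 5, 6, 7, 3, 1, 2, 4]
```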

Heapify Operation. [Figure: an example tree on the keys 1 to 10 before and after heapify.]

Reversing Heapify. Heapify⁻¹: “reversing” heapify. Heapify⁻¹(H: heap, i: position): Root ← vi; all the nodes on the path from the root to node i are shifted down. The parameter i is a position in the heap H. [Figure: example heap on the keys 1 to 10.]

Heapify⁻¹ Operation. Property: if all the keys in the heap are unique, then for any i: Heapify(Heapify⁻¹(H, i)) = H. [Figure: an example heap on the keys 1 to 10 before and after Heapify⁻¹.]
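The round-trip property can be tried out on a small array-stored heap. The sketch below assumes a 0-indexed array layout; `heapify_inv` is a hypothetical name for the inverse operation, and the example heap is made up.

```python
def heapify(a, i):
    """Standard sift-down toward the larger child."""
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapify_inv(a, i):
    """Move the key at position i to the root; shift the path above it down."""
    v = a[i]
    while i > 0:
        parent = (i - 1) // 2
        a[i] = a[parent]      # each key on the path drops one level
        i = parent
    a[0] = v


heap = [10, 8, 9, 5, 6, 7, 3, 1, 2, 4]
round_trips = []
for pos in range(len(heap)):
    t = list(heap)
    heapify_inv(t, pos)
    heapify(t, 0)
    round_trips.append(t == heap)
print(all(round_trips))   # True when keys are unique
```

Uniqueness of the keys matters: with ties, heapify could sift down a different path than the one the inverse operation used.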

The Binary Heap: Build-heap in O(n). Building a heap: applying heapify to every sub-tree in the heap in a bottom-up manner. Time complexity: O(n). [Figure: example heap on the keys 1 to 10.]

Reversing Build-heap. Works in a top-down manner.
Build-heap⁻¹(H: heap): Tree
If size(H) = 1 then return H;
Choose a node i uniformly at random among the nodes in the heap H;
H ← Heapify⁻¹(H, i);
Return TREE(root(H), Build-heap⁻¹(HL), Build-heap⁻¹(HR));
For any random choice: Build-heap(Build-heap⁻¹(H)) = H
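One way to sketch the top-down procedure on an array-stored heap is shown below. The function names and the helper `subtree_nodes` are assumptions made for the example. For simplicity it lists the nodes of each subtree explicitly, so it runs in O(n log n) rather than the O(n) a more careful version would achieve.

```python
import random


def heapify(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_heap(a):
    """Bottom-up: heapify every internal node, last one first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i)


def subtree_nodes(n, root):
    """Positions of the subtree rooted at `root` in an n-element array."""
    nodes, stack = [], [root]
    while stack:
        j = stack.pop()
        if j < n:
            nodes.append(j)
            stack += [2 * j + 1, 2 * j + 2]
    return nodes


def build_heap_inv(a, rng, root=0):
    """Top-down: random heapify-inverse at the root, then recurse on both subtrees."""
    nodes = subtree_nodes(len(a), root)
    if len(nodes) <= 1:
        return
    i = rng.choice(nodes)          # uniform among the nodes of this sub-heap
    v = a[i]
    while i != root:               # heapify-inverse restricted to this subtree
        parent = (i - 1) // 2
        a[i] = a[parent]
        i = parent
    a[root] = v
    build_heap_inv(a, rng, 2 * root + 1)
    build_heap_inv(a, rng, 2 * root + 2)


heap = [10, 8, 9, 5, 6, 7, 3, 1, 2, 4]
restored = []
for seed in range(20):
    t = list(heap)
    build_heap_inv(t, random.Random(seed))
    build_heap(t)
    restored.append(t == heap)
print(all(restored))   # True for any random choices
```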

Uniformly Chosen Heaps. Build-heap is a many-to-one procedure. Build-heap⁻¹ is a one-to-many procedure, depending on the random choices. Support(H): the set of permutations (trees) T such that build-heap(T) = H. Facts (without proof): for each heap H, the size of Support(H) is the same. Build-heap⁻¹ returns one of the trees in Support(H) uniformly.
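The first fact can be checked by brute force for a small case. The sketch below runs the standard build-heap on all 24 permutations of four keys and counts how many land on each heap; the choice of n = 4 and the helper names are assumptions for the illustration.

```python
from collections import Counter
from itertools import permutations


def heapify(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_heap(keys):
    a = list(keys)
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i)
    return tuple(a)


# |Support(H)| for every heap H on the keys 1..4
support_sizes = Counter(build_heap(p) for p in permutations(range(1, 5)))
for heap, size in sorted(support_sizes.items()):
    print(heap, size)   # three heaps, each with a support of size 8
```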

How to Obtain a Weakly History-Independent Heap. Main idea: keep a uniformly random heap at all times. We want: Build-heap returns one of the possible heaps uniformly. Other operations preserve this property.

An Easy Implementation: Build-Heap. Apply a random permutation to the input elements and then use the standard build-heap. Analysis: each heap has a Support of the same size ⇒ each heap has the same probability. More intuition: applying a random permutation to the elements erases all data about the order of the elements. ⇒ There is no information on the history.
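A minimal sketch of this build-heap, assuming a 0-indexed array heap and Python's `random.Random.shuffle` as the source of the uniform permutation (the function name `random_build_heap` is made up):

```python
import random


def heapify(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def random_build_heap(keys, rng):
    a = list(keys)
    rng.shuffle(a)                           # erase the input order
    for i in range(len(a) // 2 - 1, -1, -1): # standard bottom-up build-heap
        heapify(a, i)
    return a


h = random_build_heap([3, 1, 4, 15, 9, 2, 6], random.Random(7))
```

The shuffle adds only O(n) work, so the build stays linear.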

Another Easy Implementation: Increase-key. Standard increase-key: change the value of the element and sift it up until it gets to the correct place. [Figure: an example heap before and after an increase-key.]

Increase-key – cont. The standard increase-key is good for us. The increase-key operation is reversible: decreasing the value of the key back will return the key to its previous location. The number of heaps with n different keys is the same regardless of the actual values of the keys. ⇒ The increase-key function is 1-1. ⇒ If we had a uniformly chosen heap, then after increase-key it remains a uniformly chosen heap.
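The reversibility claim can be illustrated on a small example. This sketch uses a single hypothetical `change_key` helper that sifts up or down as needed; the heap and the new key 11 are made up.

```python
def sift_up(a, i):
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]
        i = parent
    return i


def sift_down(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return i
        a[i], a[largest] = a[largest], a[i]
        i = largest


def change_key(a, i, new_key):
    """Set a[i] to new_key and restore the heap; returns the key's new position."""
    a[i] = new_key
    return sift_down(a, sift_up(a, i))


original = [10, 8, 9, 5, 6, 7, 3, 1, 2, 4]
heap = list(original)
pos = change_key(heap, 8, 11)     # increase the key 2 to 11
increased = list(heap)
back = change_key(heap, pos, 2)   # decrease it back to 2
print(heap == original, back)     # True 8
```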

Not So Easy: Extract-max and Insert The standard operation of extract-max: Extract-max(H) Replace the value at the root with the value of the last leaf. Let the value sift down to the right position. Is this good for us ? No !

Standard Extract-max Is Not Good. There are three possible heaps with 4 elements, each with probability 1/3. [Figure: the three heaps on the keys 1 to 4, and the 3-element heaps that the standard extract-max produces from them.] One resulting heap has probability 1/3 while the other has probability 2/3!
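The 1/3 versus 2/3 split can be reproduced by enumeration. The sketch assumes a 0-indexed array heap and lists every heap on the keys 1 to 4 by filtering permutations; the helper names are made up for the example.

```python
from collections import Counter
from itertools import permutations


def is_heap(a):
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def heapify(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def standard_extract_max(heap):
    a = list(heap)
    a[0] = a[-1]          # replace the root with the last leaf
    a.pop()
    if a:
        heapify(a, 0)     # let it sift down
    return tuple(a)


heaps = [p for p in permutations((1, 2, 3, 4)) if is_heap(p)]
outcomes = Counter(standard_extract_max(h) for h in heaps)
print(outcomes)   # (3, 2, 1) arises from two of the three heaps, (3, 1, 2) from one
```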

Naive Implementation: Extract-max.
Extract-max(H):
T = build-heap⁻¹(H).
Remove the last node v in the tree (T’).
H’ = build-heap(T’).
If we already removed the maximal value, return H’.
Otherwise: replace the root with v and let v sift down to its correct position.
Build-heap⁻¹ and build-heap work in O(n) … but this implementation is history independent.
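The naive procedure might be sketched as follows on an array-stored heap. The function names are assumptions made for the example, and `build_heap_inv` lists subtree nodes explicitly, so this sketch is O(n log n) rather than O(n).

```python
import random


def heapify(a, i):
    n = len(a)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_heap(a):
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i)


def subtree_nodes(n, root):
    nodes, stack = [], [root]
    while stack:
        j = stack.pop()
        if j < n:
            nodes.append(j)
            stack += [2 * j + 1, 2 * j + 2]
    return nodes


def build_heap_inv(a, rng, root=0):
    nodes = subtree_nodes(len(a), root)
    if len(nodes) <= 1:
        return
    i = rng.choice(nodes)
    v = a[i]
    while i != root:
        parent = (i - 1) // 2
        a[i] = a[parent]
        i = parent
    a[root] = v
    build_heap_inv(a, rng, 2 * root + 1)
    build_heap_inv(a, rng, 2 * root + 2)


def naive_extract_max(heap, rng):
    """Removes and returns the maximum, modifying `heap` in place."""
    maximum = heap[0]
    build_heap_inv(heap, rng)   # T = build-heap-inverse(H)
    v = heap.pop()              # remove the last node v, leaving T'
    build_heap(heap)            # H' = build-heap(T')
    if v != maximum:            # the maximum is still at the root of H'
        heap[0] = v             # replace the root with v
        heapify(heap, 0)        # and let v sift down
    return maximum


h = [10, 8, 9, 5, 6, 7, 3, 1, 2, 4]
m = naive_extract_max(h, random.Random(3))
```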

Analysis: Extract-max.
T = build-heap⁻¹(H): T is a uniformly random permutation of the n+1 keys of the heap.
Remove the last node v in the tree (T’): T’ is a uniformly random permutation of n keys of the heap, excluding the random key v.
H’ = build-heap(T’): H’ is a uniformly random heap on the n original keys of the heap, excluding a random key v.

Analysis: Extract-max. If we already removed the maximal value, return H’: we are done. Otherwise, replace the root with v and let v sift down to its correct position: this is just applying increase/decrease-key on the value at the root (this is a 1-1 function).

Improving Complexity: Extract-max. First 3 steps of Extract-max(H): (1) T = build-heap⁻¹(H); (2) remove the last node v in the tree; (3) H’ = build-heap(T’). Main problem: steps 1 to 3 take O(n). A simple observation reduces the complexity of these steps to O(log² n) instead of O(n).

Reducing the Complexity to O(log² n). Observation: most of the operations of build-heap⁻¹ are redundant; they are always canceled by the operations of build-heap. Only the operations applied to nodes lying on the path from the root to the last leaf are really needed. [Figure: example heap on the keys 1 to 10.]

Reducing the Complexity to O(log² n). Complexity analysis: each heapify⁻¹ and heapify operation takes at most O(log n). There are O(log n) such operations. [Figure: example heap on the keys 1 to 10.]

Reducing the Complexity: O(log n) Expected Time. This is the most complex part. Main ideas: we can show that there are actually O(1) operations of heapify⁻¹ and heapify that make a difference (on average over the random choices made by the algorithm in each step). We can detect these operations and apply only them.

The Insert Operation. The standard implementation of insert is not good for us. A good implementation must use randomization in order to be efficient (otherwise it would be canonical). Making insert history independent is not easy. The general method is similar to Extract-max. The most difficult part is again reducing the complexity from O(log² n) to O(log n) expected time.

Conclusions. Demanding strong history independence usually requires a high efficiency penalty in the comparison-based model. A weakly history-independent heap exists in the comparison-based model without penalty. Complexity: build-heap: O(n) worst case. increase-key: O(log n) worst case. extract-max, insert: O(log n) expected time, O(log² n) worst case.

Open Questions. Can we show a separation between weak and strong history independence in the non-comparison model? History-independent implementations of other, more complex, data structures. Thank you.