Chapter 7 Data Structure Transformations Basheer Qolomany
Outline 7.1 Making Structures Dynamic 7.2 Making Structures Persistent
7.1 Making Structures Dynamic Two types of data structures for solving searching problems: – A static structure is built once and then searched many times; insertions and deletions of elements are not allowed. – A dynamic structure is initially empty and supports three operations: inserting a new element, deleting a current element, and performing a search.
Static & Dynamic Structures To describe the performance of the static structure A we give three functions of N: – $P_A(N)$ = the preprocessing time required to build A, – $Q_A(N)$ = the query time required to perform a search in A, and – $S_A(N)$ = the storage required to represent A. We analyze the performance of the dynamic structure B by giving the functions – $I_B(N)$ = the insertion time for B, – $D_B(N)$ = the deletion time for B, – $Q_B(N)$ = the query time required to perform a search in B, and – $S_B(N)$ = the storage required to represent B.
The Transformations The two well-studied problems here are how to make a static structure dynamic and how to allow queries in old states of a dynamic data structure. Static structures, such as interval trees, are built once and then allow queries, but no changes to the underlying data. To make them dynamic, we want to allow changes to the underlying data. There are efficient construction methods that take the static data structure as a black box and use it to build the new dynamic structure. The most important class of problems for which this works is the decomposable searching problems.
Decomposable searching problems The notion of decomposable search problems, and the idea of a static-to-dynamic transformation, goes back to Bentley (1979). The underlying idea is always that the current set is partitioned into a number of blocks $X = X_1 \cup \cdots \cup X_m$. Each block is stored by one static structure; queries are answered by querying each of these static structures, and updates are performed by rebuilding one or several blocks.
Decomposable searching problems The original method in Bentley (1979) uses only blocks whose size is a power of two, and only one block of each size. Thus there are at most log n blocks. This gives a bad worst-case complexity because we might have to rebuild everything into one structure; but the structure of size $2^i$ is rebuilt only when the ith bit of n changes, which is every $2^{i-1}$th step.
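The block scheme above can be sketched in a few lines. This is a minimal illustration and not Bentley's original formulation: it assumes membership search as the decomposable problem and uses a sorted array with binary search as the static black box. The names `SortedArray`, `ExponentialBlocks`, `block_sizes`, and `rebuilt` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left


class SortedArray:
    """Stand-in static structure: built once, then only searched."""

    def __init__(self, items):
        self.items = sorted(items)  # preprocessing

    def contains(self, x):
        i = bisect_left(self.items, x)  # binary search
        return i < len(self.items) and self.items[i] == x


class ExponentialBlocks:
    """Insert-only dynamization: slot i is empty or holds a block of size 2**i."""

    def __init__(self):
        self.blocks = []   # blocks[i] is None or a SortedArray with 2**i items
        self.rebuilt = 0   # total number of items handed to the static builder

    def insert(self, x):
        carry = [x]
        i = 0
        # Binary addition: absorb every occupied slot until a free one is found.
        while i < len(self.blocks) and self.blocks[i] is not None:
            carry.extend(self.blocks[i].items)
            self.blocks[i] = None
            i += 1
        if i == len(self.blocks):
            self.blocks.append(None)
        self.blocks[i] = SortedArray(carry)  # rebuild one block of size 2**i
        self.rebuilt += len(carry)

    def contains(self, x):
        # Decomposability: the answer is the OR of the per-block answers.
        return any(b is not None and b.contains(x) for b in self.blocks)

    def block_sizes(self):
        return [len(b.items) for b in self.blocks if b is not None]
```

After 13 insertions the occupied slots mirror the binary representation 1101 of 13, that is, blocks of sizes 1, 4, and 8.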
Insertion
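If preproc(k) is the time to build a static structure of size k, then the total time of the first n inserts is $O\left(\sum_{i=0}^{\lfloor \log n \rfloor} \frac{n}{2^i}\,\mathrm{preproc}(2^i)\right)$, because the structure of size $2^i$ is rebuilt $O(n/2^i)$ times. Thus the amortized insertion time in a set of n elements is $O\left(\sum_{i=0}^{\lfloor \log n \rfloor} \frac{\mathrm{preproc}(2^i)}{2^i}\right)$.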
Theorem Given a static structure for a decomposable searching problem that can be built in time $O(n(\log n)^c)$ and that answers queries in time $O(\log n)$ for an n-element set, the exponential-blocks transformation gives a structure for the same problem that supports insertion in amortized $O((\log n)^{c+1})$ time and queries in worst-case $O((\log n)^2)$ time.
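Both bounds of the theorem can be checked with a short calculation. The sketch below assumes that during the first n insertions the block of size $2^i$ is rebuilt $O(n/2^i)$ times, and it writes $(i+1)$ instead of $i$ so that the block of size 1 is also charged a constant.

```latex
\begin{align*}
\text{amortized insertion:}\quad
\frac{1}{n}\sum_{i=0}^{\lfloor \log n \rfloor} O\!\left(\frac{n}{2^i}\right) O\!\left(2^i (i+1)^c\right)
  &= O\!\left(\sum_{i=0}^{\lfloor \log n \rfloor} (i+1)^c\right)
   = O\!\left((\log n)^{c+1}\right), \\
\text{query:}\quad
\sum_{i=0}^{\lfloor \log n \rfloor} O(i+1)
  &= O\!\left((\log n)^{2}\right).
\end{align*}
```

The insertion sum has $O(\log n)$ terms, each at most $(\log n + 1)^c$; the query sum adds one $O(\log 2^i)$ search for each of the at most $\lfloor \log n \rfloor + 1$ blocks.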
This method is not useful for deletion; if we delete an element from the largest block, we have to rebuild everything, so we can easily construct a sequence of alternating insert and delete operations in which the entire structure has to be rebuilt each time. A method that also supports deletion partitions the set into Θ(√n) blocks of size O(√n).
Theorem Given a static structure for a decomposable searching problem that can be built in time $O(n(\log n)^c)$ and that answers queries in time $O(\log n)$ for an n-element set, the √n-blocks transformation gives a structure for the same problem that supports insertion and deletion in $O(\sqrt{n}(\log n)^c)$ time and queries in $O(\sqrt{n}\log n)$ time, all times worst case.
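A rough sketch of the √n-blocks idea follows, again with a sorted array as the assumed static black box and membership search as the problem. The class names and the regrouping rule (re-partition everything once n leaves the range cap²/4 to 4·cap²) are choices made for this illustration. Because the regrouping is done all at once, the sketch only achieves the bounds in an amortized sense, not the worst-case bounds of the theorem. Elements are assumed to be distinct.

```python
import math
from bisect import bisect_left


class SortedArray:
    """Stand-in static structure: built once, then only searched."""

    def __init__(self, items):
        self.items = sorted(items)

    def contains(self, x):
        i = bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x


class SqrtBlocks:
    """Keeps O(sqrt(n)) blocks of at most cap = Theta(sqrt(n)) items each."""

    def __init__(self):
        self.blocks = []
        self.n = 0
        self.cap = 1  # block capacity, kept within a factor 2 of sqrt(n)

    def _regroup_if_needed(self):
        # Nothing to do while cap**2 / 4 <= n <= 4 * cap**2.
        if self.cap ** 2 <= 4 * self.n and self.n <= 4 * self.cap ** 2:
            return
        items = [x for b in self.blocks for x in b.items]
        self.cap = max(1, math.isqrt(len(items)))
        self.blocks = [SortedArray(items[i:i + self.cap])
                       for i in range(0, len(items), self.cap)]

    def insert(self, x):
        # Rebuild one block that still has room; open a new block only if all are full.
        target = next((i for i, b in enumerate(self.blocks)
                       if len(b.items) < self.cap), None)
        if target is None:
            self.blocks.append(SortedArray([x]))
        else:
            self.blocks[target] = SortedArray(self.blocks[target].items + [x])
        self.n += 1
        self._regroup_if_needed()

    def delete(self, x):
        # Locate the block by querying each one, then rebuild only that block.
        for i, b in enumerate(self.blocks):
            if b.contains(x):
                rest = list(b.items)
                rest.remove(x)
                if rest:
                    self.blocks[i] = SortedArray(rest)
                else:
                    del self.blocks[i]
                self.n -= 1
                self._regroup_if_needed()
                return True
        return False

    def contains(self, x):
        return any(b.contains(x) for b in self.blocks)
```

Each update rebuilds a single block of at most cap items, and each query visits every block once, which is where the √n factors in the theorem come from.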
Weak deletion A weak deletion deletes the element, so that the queries are answered correctly, but the time bound for subsequent queries and weak deletions does not decrease. If we combine the weak deletion with the exponential-blocks idea, we get the following structure: The current set is partitioned into blocks, where each block has a nominal size and an actual size. The nominal size is a power of 2, with each power occurring at most once. The actual size of a block with nominal size $2^i$ is between $2^{i-1} + 1$ and $2^i$.
To delete an element, we find its block and perform a weak deletion, decreasing the actual size. If by this the actual size of the block becomes $2^{i-1}$, we check whether there is a block of nominal size $2^{i-1}$; if there is none, we rebuild the block of actual size $2^{i-1}$ as a block of nominal size $2^{i-1}$. Otherwise, we rebuild the block of actual size $2^{i-1}$ together with the elements of the block of nominal size $2^{i-1}$ as a block of nominal size $2^i$. To insert an element, we create a block of size 1 and perform the binary addition of the blocks, based on their nominal size. To query, we perform the query for each block.
Theorem Given a static structure for a decomposable searching problem that can be built in time preproc(n) and that supports weak deletion in time weakdel(n), and answers queries in time query(n) for an n-element set, the exponential-blocks transformation with weak deletion gives a structure for the same problem that supports insertion in amortized O((log n) · preproc(n)/n) time, deletions in amortized O(weakdel(n) + preproc(n)/n) time, and queries in worst-case O(log n · query(n)) time.
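The insert and delete rules for blocks with nominal and actual sizes can be sketched as follows. This is an illustration under stated assumptions: the static structure is a sorted array whose weak deletion only marks an entry as dead, elements are distinct, and a block of nominal size 1 that loses its only element is simply dropped (a boundary case the description above leaves implicit). `WeakArray`, `WeakDeletionBlocks`, and `sizes` are hypothetical names.

```python
from bisect import bisect_left


class WeakArray:
    """Static sorted array with weak deletion: removed items are only marked dead."""

    def __init__(self, items):
        self.items = sorted(items)
        self.dead = [False] * len(self.items)
        self.live = len(self.items)  # actual size

    def _find(self, x):
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x and not self.dead[i]:
            return i
        return -1

    def contains(self, x):
        return self._find(x) >= 0

    def weak_delete(self, x):
        i = self._find(x)
        if i < 0:
            return False
        self.dead[i] = True
        self.live -= 1
        return True

    def live_items(self):
        return [x for x, d in zip(self.items, self.dead) if not d]


class WeakDeletionBlocks:
    def __init__(self):
        self.blocks = {}  # exponent i -> WeakArray of nominal size 2**i

    def insert(self, x):
        # Binary addition on nominal sizes, starting from a new block of size 1.
        carry, i = [x], 0
        while i in self.blocks:
            carry += self.blocks.pop(i).live_items()
            i += 1
        self.blocks[i] = WeakArray(carry)

    def contains(self, x):
        return any(b.contains(x) for b in self.blocks.values())

    def delete(self, x):
        for i, b in self.blocks.items():
            if b.weak_delete(x):
                break
        else:
            return False  # x was not stored in any block
        if b.live == 0:  # only possible for nominal size 1
            del self.blocks[i]
        elif i > 0 and b.live == 2 ** (i - 1):
            items = self.blocks.pop(i).live_items()
            if i - 1 in self.blocks:
                # Merge with the block of nominal size 2**(i-1); result has nominal size 2**i.
                items += self.blocks.pop(i - 1).live_items()
                self.blocks[i] = WeakArray(items)
            else:
                # Shrink: rebuild as a block of nominal size 2**(i-1).
                self.blocks[i - 1] = WeakArray(items)
        return True

    def sizes(self):
        """Map nominal size -> actual size, for inspection."""
        return {2 ** i: b.live for i, b in sorted(self.blocks.items())}
```

Inserting 0..19 gives blocks of nominal sizes 4 and 16. Deleting 0..7 then halves the large block into one of nominal size 8, and deleting 8..11 triggers the merge case, leaving a single block of nominal size 8.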
7.2 Making Structures Persistent A dynamic data structure changes over time, and sometimes it is useful if we can access old versions of it. Obvious applications are revision control, the implementation of the “undo” command in editors, multiple file versions, and error recovery. For example, given a database containing a company's personnel administration, it might be important to be able to ask questions like: how many people had a salary ≥ x one year ago? To answer this kind of so-called in-the-past query, we require that the data structure can remember relevant information concerning its own history.
Techniques to access earlier versions – “Partial persistence”: the most natural form of persistence; it allows queries to previous versions, which can be identified by timestamps or version numbers. – “Full persistence”: past versions can also be changed, giving rise to a version tree without any special current version. – “Confluent persistence”: studied first for double-ended queues; in a confluently persistent structure, one may also join different versions. These stronger variants of persistence seem to be of theoretical interest only. – “Backtracking”: setting the current version back to an old version and discarding all changes since then. The use of a stack for old versions predates all persistence considerations.
Fat nodes The “fat nodes” method is a transformation that replaces each node of the pointer-based structure by a search tree over the versions of that node, keyed by version time, so that a query for a given time finds the version of the node that was current then. Each time the underlying structure is modified, any “fat” node whose content is modified just receives a new version entry in its search tree; newly created nodes contain new search trees, initially with one version only.
Theorem Any dynamic structure in the pointer-machine model that supports queries in time query(n) and updates in time update(n) on a set with n elements can be made persistent, allowing queries to past versions, with a query time O(log n · query(n)) for past versions, O(query(n)) for the current version, and update time O(update(n)), using the “fat nodes” method combined with a search tree that allows constant-time queries and updates at the maximum end.
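To make the fat-node idea concrete, here is a small sketch of a partially persistent, insert-only, unbalanced binary search tree. It swaps the per-node search tree for a plain Python list of versions searched with `bisect`: appending at the newest end and reading the newest entry are cheap, while a past version costs a binary search. All class and method names are hypothetical, and version numbers are simply a counter incremented on every insert.

```python
from bisect import bisect_right


class FatField:
    """Version history of one pointer field, ordered by version number."""

    def __init__(self, version, value):
        self.versions = [version]
        self.values = [value]

    def set(self, version, value):
        # Updates always happen at the newest version, so appending keeps the order.
        if self.versions[-1] == version:
            self.values[-1] = value
        else:
            self.versions.append(version)
            self.values.append(value)

    def get(self, version=None):
        if version is None:  # current version: read the newest entry
            return self.values[-1]
        i = bisect_right(self.versions, version) - 1  # latest entry <= version
        return self.values[i] if i >= 0 else None


class FatNode:
    def __init__(self, key, version):
        self.key = key
        self.left = FatField(version, None)
        self.right = FatField(version, None)


class PersistentBST:
    """Insert-only search tree whose past versions remain queryable."""

    def __init__(self):
        self.version = 0
        self.root = FatField(0, None)

    def insert(self, key):
        self.version += 1
        field = self.root
        node = field.get()
        while node is not None:
            if key == node.key:
                return self.version  # key already present; version still advances
            field = node.left if key < node.key else node.right
            node = field.get()
        field.set(self.version, FatNode(key, self.version))
        return self.version

    def contains(self, key, version=None):
        node = self.root.get(version)
        while node is not None:
            if key == node.key:
                return True
            field = node.left if key < node.key else node.right
            node = field.get(version)
        return False
```

A query passes the version it is interested in to every field access, so the same traversal code serves both the current version and any past one.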
Theorem Any dynamic structure in the pointer-machine model that supports queries in time query(n) and updates in time update(n) on a set with n elements can be made to support backtracking, using stacks for “fat nodes,” with a query time O(query(n)), update time O(update(n)), and backtrack time amortized O(1), with a sequence of a updates, b queries, and c backtracks, starting on an initially empty set, taking O(a update(a) + b query(a) + c).
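One possible way to realize backtracking with stacks is sketched below. It is an assumption of this example, not necessarily the construction behind the theorem: every field keeps a stack of (version, value) pairs, and a global log records which field each update touched, so that backtracking pops exactly the entries made after the target version. Each pop undoes one earlier push, which is why the cost can be charged to the updates. `StackField` and `BacktrackingStore` are hypothetical names.

```python
class StackField:
    """A field whose history is a stack of (version, value) pairs."""

    def __init__(self, value=None):
        self.stack = [(0, value)]

    def get(self):
        return self.stack[-1][1]  # the current value is on top


class BacktrackingStore:
    def __init__(self):
        self.version = 0
        self.log = []  # one entry per push, in chronological order

    def set(self, field, value):
        """One update step: start a new version and push the value onto the field."""
        self.version += 1
        field.stack.append((self.version, value))
        self.log.append(field)
        return self.version

    def backtrack(self, version):
        """Discard every change made after the given version."""
        if version > self.version:
            raise ValueError("cannot backtrack to a future version")
        # The last log entry always matches the top of that field's stack.
        while self.log and self.log[-1].stack[-1][0] > version:
            self.log.pop().stack.pop()
        self.version = version
```

A short usage example: after `v1 = store.set(a, 1)`, `store.set(b, "y")`, and `store.set(a, 2)`, calling `store.backtrack(v1)` restores `a` to 1 and `b` to its initial value, and the next update continues with version 2.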
Thank You